---
title: "Shape-recognition sensitivity study"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Shape-recognition sensitivity study}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.align = "center",
  fig.width = 8,
  fig.height = 6,
  dpi = 110
)
```

`janusplot()` assigns every fitted smooth to one of 24 shape categories via a `(n_turning_points, n_inflections)` dispatch, with additional `(monotonicity_index, convexity_index)` disambiguation for the monotone cases (see the `janusplot` vignette for the full definition of the indices). How reliably does this classifier recover the ground-truth shape of a noisy sample? This vignette answers that question with a full-factorial sensitivity sweep.

## Design

For each combination of ground-truth shape, sample size `n`, and noise level `sigma`, the sweep:

1. Generates `n` points from the noiseless canonical curve on `x ∈ [0, 1]`, with `y` normalised to `[0, 1]` so that `sigma` is the fraction of the y-range contributed by Gaussian noise — an SNR-comparable scale across shapes.
2. Fits `mgcv::gam(y ~ s(x), method = "REML")`.
3. Classifies the fit via `janusplot_shape_metrics()`.
4. Records correctness at the **fine** (24-category) and **archetype** (7-family) levels.

The design factors are orthogonal and replicated. See `?janusplot_shape_sensitivity` for the function surface. The 14 canonical ground-truth shapes cover five of the seven archetypes (`chaotic` and `degenerate` have no realistic deterministic generator).

```{r setup-pkg}
library(janusplot)
library(ggplot2)

janusplot_shape_sensitivity_shapes()
```

## Pre-registered hypotheses

The sweep's hypotheses are pinned in `simulation/PLAN.md` (Scenario 4):

- **H1.** At `n = 500`, `sigma = 0.05`, archetype accuracy exceeds 0.90 for every shape.
- **H2.** Fine-category accuracy exceeds 0.75 at `n = 500`, `sigma = 0.05` for monotone and unimodal shapes; wave and multimodal shapes tolerate less noise.
- **H3.** Rippled variants require `n ≥ 200` and `sigma ≤ 0.10` to resolve.
- **H4.** At `sigma = 0.40`, archetype accuracy collapses below 0.50 for all but the simplest shapes.

## Precomputed demo

The package ships a small-footprint precomputed sweep — 6 shapes (one per non-degenerate archetype) × 3 sample sizes × 4 noise levels × 30 replicates = 2160 fits — so you can explore the API without running the full sweep yourself.

```{r demo-data}
data("shape_sensitivity_demo")
str(shape_sensitivity_demo, vec.len = 2)
```

### Recovery curves (headline figure)

```{r recovery-curves}
janusplot_shape_sensitivity_plot(shape_sensitivity_demo, "recovery_curves")
```

Every shape is recovered near-perfectly at low noise; the informative picture is where each shape's curve falls off as `sigma` grows. The unimodal and monotone-curved families tolerate more noise than the multimodal ones.

### Archetype confusion

```{r archetype-confusion, fig.width = 6, fig.height = 5}
janusplot_shape_sensitivity_plot(shape_sensitivity_demo, "confusion_archetype")
```

The off-diagonals reveal the classifier's failure modes. A `unimodal` truth misclassified as `wave` or `multimodal` means the spline invented extra turning points under noise.

### Archetype-level accuracy grid

```{r accuracy-grid}
janusplot_shape_sensitivity_plot(shape_sensitivity_demo, "accuracy_grid")
```

Per-shape heatmap of `P(archetype correct)` across the `(n, sigma)` design. Reading across a row shows the noise-tolerance profile at one sample size; reading up a column shows the sample-size sensitivity at one noise level.

### Numerical summary

```{r summary}
head(janusplot_shape_sensitivity_summary(shape_sensitivity_demo, level = "archetype"), 10)
```

## Running your own sweep

The demo is a starting point.
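Before scaling up, it can help to see what a single cell of the sweep does by hand. The sketch below follows the three Design steps for one replicate, assuming a unimodal truth `y = sin(pi * x)` (the shape choice and seed are illustrative, not the package's internals); only `mgcv` is needed, and the final classification call from step 3 is shown commented out:

```r
# One replicate of the sweep, sketched by hand (unimodal truth assumed).
set.seed(1)
n     <- 500
sigma <- 0.05

# Step 1: sample the canonical curve, normalise y to [0, 1], add noise.
x  <- sort(runif(n))                          # design points on [0, 1]
y0 <- sin(pi * x)                             # canonical unimodal curve
y0 <- (y0 - min(y0)) / (max(y0) - min(y0))    # y-range normalisation
y  <- y0 + rnorm(n, sd = sigma)               # sigma = fraction of y-range

# Step 2: penalised spline fit.
fit <- mgcv::gam(y ~ s(x), method = "REML")

# Step 3 would classify the fitted smooth:
# janusplot_shape_metrics(fit)
```

The sweep function automates exactly this loop over the full factorial grid and tallies correctness at both classification levels.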
For the publication-grade figure, use the full default grid (14 shapes × 4 sample sizes × 5 noise levels × 200 reps = 56 000 fits):

```{r full-sweep, eval = FALSE}
# Configure parallel execution (optional) — you control the plan.
future::plan(future::multisession, workers = 4L)

res <- janusplot_shape_sensitivity(parallel = TRUE)

# Save for your paper
saveRDS(res, "shape_sensitivity_full.rds")

janusplot_shape_sensitivity_plot(res, "recovery_curves")
```

### Custom shape subsets + cutoffs

Every argument is tunable. Below, we rerun only the bimodal/wave family under stricter monotonicity thresholds to see whether tightening `mono_strong` buys any fine-accuracy improvement for these categories.

```{r custom-subset, eval = FALSE}
strict <- janusplot_shape_cutoffs(mono_strong = 0.95, curv_low = 0.1)

res_strict <- janusplot_shape_sensitivity(
  shapes     = c("wave", "bimodal", "bi_wave"),
  n_grid     = c(200L, 500L),
  sigma_grid = c(0.05, 0.10, 0.20),
  n_rep      = 100L,
  cutoffs    = strict
)

janusplot_shape_sensitivity_summary(res_strict, level = "fine")
```

## References

- Pya, N., & Wood, S. N. (2015). Shape constrained additive models. *Statistics and Computing*, 25(3), 543–559.
- Calabrese, E. J. (2008). Hormesis: why it is important to toxicology and toxicologists. *Environmental Toxicology and Chemistry*, 27(7), 1451–1474.
- Milnor, J. (1963). *Morse Theory*. Princeton University Press.
- Meyer, M. C. (2008). Inference using shape-restricted regression splines. *Annals of Applied Statistics*, 2(3), 1013–1033.

```{r session-info}
sessionInfo()
```