| Type: | Package |
| Title: | Bayesian Model Selection Approach for Parsimonious Gaussian Mixture Models |
| Version: | 1.3.1 |
| Date: | 2026-05-26 |
| Depends: | R(≥ 3.1.0) |
| Imports: | methods (≥ 3.5.1), mcmcse (≥ 1.3-2), pgmm (≥ 1.2.3), mvtnorm (≥ 1.0-10), MASS (≥ 7.3-51.1), parallel, Rcpp (≥ 1.0.1), gtools (≥ 3.8.1), label.switching (≥ 1.8), fabMix (≥ 5.0), mclust (≥ 5.4.3) |
| Author: | Yaoxiang Li [aut, cre], Xiang Lu [aut], Tanzy Love [aut] |
| Maintainer: | Yaoxiang Li <liyaoxiang@outlook.com> |
| Description: | Model-based clustering using Bayesian parsimonious Gaussian mixture models. MCMC (Markov chain Monte Carlo) is used for parameter estimation, and RJMCMC (reversible-jump Markov chain Monte Carlo) is used for model selection. See Green (1995) <doi:10.1093/biomet/82.4.711>. |
| License: | GPL-3 |
| URL: | https://github.com/YaoxiangLi/bpgmm, https://yaoxiangli.github.io/bpgmm/, https://doi.org/10.1007/s00357-021-09391-8 |
| BugReports: | https://github.com/YaoxiangLi/bpgmm/issues |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.2 |
| Suggests: | knitr, rmarkdown, testthat |
| LinkingTo: | Rcpp, RcppArmadillo |
| VignetteBuilder: | knitr |
| NeedsCompilation: | yes |
| Packaged: | 2026-05-28 03:48:04 UTC; Li |
| Repository: | CRAN |
| Date/Publication: | 2026-05-28 07:10:17 UTC |
Hyperparameter set for the Bayesian PGMM sampler.
Description
Hyperparameter set for the Bayesian PGMM sampler.
Slots
alpha1 |
First Dirichlet prior parameter for component weights. |
alpha2 |
Second Dirichlet prior parameter for component weights. |
delta |
Shape parameter used in prior updates. |
ggamma |
Prior rate parameter used in covariance updates. |
bbeta |
Prior scale parameter used in covariance updates. |
ThetaYList-class
Description
Parameter set for sampled PGMM component parameters.
Slots
tao |
Numeric vector of component mixing weights. |
psy |
List of diagonal noise covariance matrices. |
M |
List of component mean vectors. |
lambda |
List of component factor loading matrices. |
Y |
List of latent factor score matrices. |
Convert PGMM Constraint Codes to Paper Model Names
Description
Converts a binary constraint vector into the corresponding three-letter model name. Lu, Li, and Love (2021) represent the eight PGMM covariance structures with three-letter model names. Each letter is either 'C' for constrained or 'U' for unconstrained. The letters indicate, in order, whether the loading matrix is shared across clusters, whether the noise covariance is shared across clusters, and whether the noise covariance is isotropic within clusters.
Usage
constraint_to_model(constraint)
Arguments
constraint |
Integer or numeric vector of length three with entries '0' or '1'. '1' maps to 'C'; '0' maps to 'U'. |
Value
A character scalar, one of 'CCC', 'CCU', 'CUC', 'CUU', 'UCC', 'UCU', 'UUC', or 'UUU'.
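The mapping described above can be sketched in base R. The helper name `sketch_constraint_to_model` is hypothetical and is not the package's implementation; it only mirrors the documented rule that '1' maps to 'C' and '0' maps to 'U'.

```r
# Hypothetical re-implementation of the documented mapping (not package code).
sketch_constraint_to_model <- function(constraint) {
  # Expect exactly three binary entries.
  stopifnot(length(constraint) == 3, all(constraint %in% c(0, 1)))
  # 1 -> "C" (constrained), 0 -> "U" (unconstrained)
  paste(ifelse(constraint == 1, "C", "U"), collapse = "")
}

# Shared loadings, shared noise covariance, non-isotropic noise.
sketch_constraint_to_model(c(1, 1, 0))  # "CCU"
# Fully unconstrained model.
sketch_constraint_to_model(c(0, 0, 0))  # "UUU"
```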
Convert PGMM Paper Model Names to Constraint Codes
Description
Converts a three-letter PGMM model name into the binary constraint vector of length three. This is the inverse of 'constraint_to_model()'.
Usage
model_to_constraint(model)
Arguments
model |
Character scalar naming one of the eight PGMM covariance structures: 'CCC', 'CCU', 'CUC', 'CUU', 'UCC', 'UCU', 'UUC', or 'UUU'. |
Value
Integer vector of length three. '1' means constrained and '0' means unconstrained.
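As a rough illustration of the inverse mapping, the base-R sketch below splits a model name into its letters and codes 'C' as 1 and 'U' as 0. The helper `sketch_model_to_constraint` and the column labels are hypothetical names chosen for this example, not part of the package.

```r
# Hypothetical inverse mapping (not package code).
sketch_model_to_constraint <- function(model) {
  stopifnot(is.character(model), length(model) == 1, nchar(model) == 3)
  model_letters <- strsplit(model, "")[[1]]
  stopifnot(all(model_letters %in% c("C", "U")))
  # "C" -> 1 (constrained), "U" -> 0 (unconstrained)
  as.integer(model_letters == "C")
}

# Enumerate all eight covariance structures with their codes.
models <- c("CCC", "CCU", "CUC", "CUU", "UCC", "UCU", "UUC", "UUU")
codes <- t(sapply(models, sketch_model_to_constraint))
# Illustrative column labels following the documented letter order.
colnames(codes) <- c("loading_shared", "noise_shared", "noise_isotropic")
print(codes)
```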
Bayesian Model-Based Clustering with Parsimonious Gaussian Mixture Models
Description
Carries out model-based clustering using parsimonious Gaussian mixture models. MCMC is used for parameter estimation and RJMCMC is used for model selection.
Usage
pgmm_rjmcmc(
  X,
  m_init,
  m_range,
  q_new,
  delta = 2,
  ggamma = 2,
  burn = 20,
  niter = 1000,
  constraint = c(0, 0, 0),
  d_vec = c(1, 1, 1),
  s_vec = c(1, 1, 1),
  m_step = 0,
  v_step = 0,
  split_combine = 0,
  verbose = TRUE
)
Arguments
X |
the observation matrix with variables in rows and observations in columns. |
m_init |
the number of initial clusters. |
m_range |
the allowed range for the number of clusters. |
q_new |
the number of latent factors for a new cluster. |
delta |
scalar hyperparameter for the noise covariance prior. |
ggamma |
scalar hyperparameter used in covariance-structure proposals. |
burn |
the number of burn-in iterations. |
niter |
the number of posterior sampling iterations. |
constraint |
initial PGMM covariance constraint. Use a three-letter model label such as '"CCC"' or '"UUU"', or a numeric vector of length three with binary entries. For example, 'c(1, 1, 1)' is 'CCC', the fully constrained model, and 'c(0, 0, 0)' is 'UUU', the fully unconstrained model. |
d_vec |
a length-three vector of shape hyperparameters for alpha1, alpha2, and bbeta, respectively. |
s_vec |
a length-three vector of rate hyperparameters for alpha1, alpha2, and bbeta, respectively. |
m_step |
indicator for RJMCMC model selection on the number of clusters. |
v_step |
indicator for RJMCMC model selection on covariance structures. |
split_combine |
indicator for using split/combine moves in the cluster-number RJMCMC step. |
verbose |
logical; if 'TRUE', print iteration progress. |
Details
The 'constraint' argument follows the three-letter PGMM model notation used in Lu, Li, and Love (2021). The first entry indicates whether loading matrices are shared across clusters, the second whether noise covariance matrices are shared across clusters, and the third whether the noise covariance is isotropic within each cluster. Use 'model_to_constraint()' to convert model names such as 'CCC', 'CCU', 'CUC', 'CUU', 'UCC', 'UCU', 'UUC', and 'UUU' into the numeric vector used internally.
Value
A list of posterior samples with snake_case fields: 'tau_samples', 'psi_samples', 'mean_samples', 'lambda_samples', 'factor_score_samples', 'allocation_samples', 'constraint_samples', 'alpha1_samples', 'alpha2_samples', 'beta_samples', and 'active_cluster_samples'.
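A call using the arguments documented above might look like the following sketch. The simulated data, the `c(min, max)` form of 'm_range', and the use of '1' to switch on 'm_step' and 'v_step' are assumptions made for illustration; the documentation only states that these are a range and indicators. The call is skipped when this version of the package is not installed.

```r
set.seed(1)
p <- 4   # variables (rows)
n <- 60  # observations (columns)

# Two well-separated groups of 30 observations each,
# stored with variables in rows and observations in columns.
X <- cbind(
  matrix(rnorm(p * n / 2, mean = -2), nrow = p),
  matrix(rnorm(p * n / 2, mean = 2), nrow = p)
)

# Run only if bpgmm is installed and exports pgmm_rjmcmc().
has_bpgmm <- requireNamespace("bpgmm", quietly = TRUE) &&
  "pgmm_rjmcmc" %in% getNamespaceExports("bpgmm")

if (has_bpgmm) {
  fit <- bpgmm::pgmm_rjmcmc(
    X = X,
    m_init = 2,
    m_range = c(1, 4),   # assumed format: c(min, max)
    q_new = 1,
    burn = 20,
    niter = 100,
    constraint = "UUU",  # fully unconstrained starting model
    m_step = 1,          # assumed: 1 enables the cluster-number move
    v_step = 1,          # assumed: 1 enables the covariance-structure move
    verbose = FALSE
  )
  # Field names such as 'tau_samples' and 'allocation_samples'.
  print(names(fit))
}
```

The short 'burn' and 'niter' values keep the sketch quick to run; a real analysis would normally use far longer chains.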
Run Multiple Independent Bayesian PGMM Chains
Description
Runs independent 'pgmm_rjmcmc()' chains, optionally in parallel. Because each MCMC iteration depends on the previous state, a single chain cannot be split across CPU cores. Independent chains, by contrast, can be evaluated concurrently, which makes this the safest way to use multiple cores.
Usage
pgmm_rjmcmc_chains(
  X,
  m_init,
  m_range,
  q_new,
  delta = 2,
  ggamma = 2,
  burn = 20,
  niter = 1000,
  constraint = c(0, 0, 0),
  d_vec = c(1, 1, 1),
  s_vec = c(1, 1, 1),
  m_step = 0,
  v_step = 0,
  split_combine = 0,
  verbose = FALSE,
  chains = 2,
  cores = min(chains, available_cores()),
  seed = NULL
)
Arguments
X |
the observation matrix with variables in rows and observations in columns. |
m_init |
the number of initial clusters. |
m_range |
the allowed range for the number of clusters. |
q_new |
the number of latent factors for a new cluster. |
delta |
scalar hyperparameter for the noise covariance prior. |
ggamma |
scalar hyperparameter used in covariance-structure proposals. |
burn |
the number of burn-in iterations. |
niter |
the number of posterior sampling iterations. |
constraint |
initial PGMM covariance constraint. Use a three-letter model label such as '"CCC"' or '"UUU"', or a numeric vector of length three with binary entries. For example, 'c(1, 1, 1)' is 'CCC', the fully constrained model, and 'c(0, 0, 0)' is 'UUU', the fully unconstrained model. |
d_vec |
a length-three vector of shape hyperparameters for alpha1, alpha2, and bbeta, respectively. |
s_vec |
a length-three vector of rate hyperparameters for alpha1, alpha2, and bbeta, respectively. |
m_step |
indicator for RJMCMC model selection on the number of clusters. |
v_step |
indicator for RJMCMC model selection on covariance structures. |
split_combine |
indicator for using split/combine moves in the cluster-number RJMCMC step. |
verbose |
logical; if 'TRUE', print iteration progress. |
chains |
positive integer giving the number of independent chains. |
cores |
positive integer giving the number of worker processes to use. Values greater than 'chains' are reduced to 'chains'. |
seed |
optional integer seed used to generate deterministic per-chain seeds. |
Value
A list with one fitted 'pgmm_rjmcmc()' result per chain. The result has class 'bpgmm_rjmcmc_chains' and stores the per-chain seeds in the 'chain_seeds' attribute.
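The idea of deterministic per-chain seeds can be illustrated with a toy sampler in base R. Everything here is a sketch under stated assumptions: `derive_chain_seeds` and `toy_chain` are hypothetical helpers, and the way the package actually derives its seeds is not documented above.

```r
# Hypothetical helper: derive one distinct integer seed per chain
# from a single master seed.
derive_chain_seeds <- function(seed, chains) {
  set.seed(seed)
  sample.int(.Machine$integer.max, chains)
}

# Toy stand-in for a sampler: each state depends on the previous one,
# so iterations within a chain have to run sequentially.
toy_chain <- function(chain_seed, niter = 100) {
  set.seed(chain_seed)
  state <- numeric(niter)
  for (i in 2:niter) {
    state[i] <- 0.9 * state[i - 1] + rnorm(1)
  }
  state
}

chain_seeds <- derive_chain_seeds(seed = 123, chains = 3)

# Chains do not share state, so lapply() could be replaced by
# parallel::parLapply() or parallel::mclapply() to use several cores.
draws <- lapply(chain_seeds, toy_chain)

# The same master seed reproduces every chain exactly.
rerun <- lapply(derive_chain_seeds(seed = 123, chains = 3), toy_chain)
reproducible <- identical(draws, rerun)

# Keep the seeds with the result, mirroring the 'chain_seeds' attribute.
attr(draws, "chain_seeds") <- chain_seeds
```

Storing the seeds alongside the draws means any single chain can be rerun in isolation later.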
Summarize RJMCMC Samples from a Bayesian PGMM Fit
Description
Summarizes posterior samples from 'pgmm_rjmcmc()' into the modal allocation, posterior counts for the number of clusters, posterior counts for the eight PGMM covariance-constraint models, and optionally the adjusted Rand index against a known reference partition.
Usage
summarize_pgmm_rjmcmc(fit, true_cluster = NULL)
Arguments
fit |
Result list from 'pgmm_rjmcmc()'. |
true_cluster |
Optional true or reference cluster allocation. |
Value
A list with 'allocation', 'n_clusters', 'n_constraints', and optionally 'ari'.
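The kinds of summaries listed above can be sketched on toy draws in base R. The storage layout (one row per iteration, one column per observation) and the character coding of the constraint draws are assumptions for this example, not the documented structure of a 'pgmm_rjmcmc()' fit. The sketch also assumes cluster labels are already comparable across iterations, and it omits the adjusted Rand index.

```r
# Toy posterior draws: 4 iterations (rows) by 5 observations (columns).
# This layout is assumed for illustration only.
allocation_samples <- rbind(
  c(1, 1, 2, 2, 2),
  c(1, 1, 2, 2, 1),
  c(1, 1, 1, 2, 2),
  c(1, 2, 2, 2, 2)
)
constraint_samples <- c("UUU", "UUC", "UUU", "UUU")

# Modal allocation: the most frequent label for each observation.
modal_label <- function(z) as.integer(names(which.max(table(z))))
allocation <- apply(allocation_samples, 2, modal_label)

# Posterior counts for the number of occupied clusters per iteration.
n_clusters <- table(
  apply(allocation_samples, 1, function(z) length(unique(z)))
)

# Posterior counts for the sampled covariance-constraint models.
n_constraints <- table(constraint_samples)

print(allocation)
print(n_clusters)
print(n_constraints)
```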