Help for package bpgmm

Type: Package
Title: Bayesian Model Selection Approach for Parsimonious Gaussian Mixture Models
Version: 1.3.1
Date: 2026-05-26
Depends: R (≥ 3.1.0)
Imports: methods (≥ 3.5.1), mcmcse (≥ 1.3-2), pgmm (≥ 1.2.3), mvtnorm (≥ 1.0-10), MASS (≥ 7.3-51.1), parallel, Rcpp (≥ 1.0.1), gtools (≥ 3.8.1), label.switching (≥ 1.8), fabMix (≥ 5.0), mclust (≥ 5.4.3)
Author: Yaoxiang Li [aut, cre], Xiang Lu [aut], Tanzy Love [aut]
Maintainer: Yaoxiang Li <liyaoxiang@outlook.com>
Description: Model-based clustering using Bayesian parsimonious Gaussian mixture models. MCMC (Markov chain Monte Carlo) is used for parameter estimation. RJMCMC (reversible-jump Markov chain Monte Carlo) is used for model selection; see Green (1995) <doi:10.1093/biomet/82.4.711>.
License: GPL-3
URL: https://github.com/YaoxiangLi/bpgmm, https://yaoxiangli.github.io/bpgmm/, https://doi.org/10.1007/s00357-021-09391-8
BugReports: https://github.com/YaoxiangLi/bpgmm/issues
Encoding: UTF-8
RoxygenNote: 7.3.2
Suggests: knitr, rmarkdown, testthat
LinkingTo: Rcpp, RcppArmadillo
VignetteBuilder: knitr
NeedsCompilation: yes
Packaged: 2026-05-28 03:48:04 UTC; Li
Repository: CRAN
Date/Publication: 2026-05-28 07:10:17 UTC

Hyperparameter set for the Bayesian PGMM sampler.

Description

Hyperparameter set for the Bayesian PGMM sampler.

Slots

alpha1: First Dirichlet prior parameter for component weights.
alpha2: Second Dirichlet prior parameter for component weights.
delta: Shape parameter used in prior updates.
ggamma: Prior rate parameter used in covariance updates.
bbeta: Prior scale parameter used in covariance updates.

ThetaYList-class

Description

Parameter set for sampled PGMM component parameters.

Slots

tao: Numeric vector of component mixing weights.
psy: List of diagonal noise covariance matrices.
M: List of component mean vectors.
lambda: List of component factor loading matrices.
Y: List of latent factor score matrices.
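
Examples

A minimal sketch of how the loading and noise slots relate to a component covariance, assuming the usual mixture-of-factor-analyzers decomposition Sigma = Lambda Lambda' + Psi that underlies PGMM. The helper name 'component_covariance' and the toy values are hypothetical and not part of the package.

```r
# Hypothetical helper (not part of bpgmm): implied covariance of one
# component under the assumed decomposition Lambda %*% t(Lambda) + Psi.
component_covariance <- function(lambda, psy) {
  lambda %*% t(lambda) + psy
}

# Toy component with p = 3 variables and q = 1 latent factor.
lambda <- matrix(c(1, 0.5, -0.5), nrow = 3, ncol = 1)  # p x q loading matrix
psy <- diag(c(0.2, 0.3, 0.4))                          # diagonal noise covariance

sigma <- component_covariance(lambda, psy)
print(sigma)
```

With q much smaller than p, the loading matrix carries far fewer free parameters than a full covariance matrix, which is what makes the model parsimonious.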

Convert PGMM Constraint Codes to Paper Model Names

Description

The paper represents the eight PGMM covariance structures with three-letter model names. Each letter is either 'C' for constrained or 'U' for unconstrained. The letters indicate whether the loading matrix is shared across clusters, whether the noise covariance is shared across clusters, and whether the noise covariance is isotropic within clusters.

Usage

constraint_to_model(constraint)

Arguments

constraint

Integer or numeric vector of length three with entries '0' or '1'. '1' maps to 'C'; '0' maps to 'U'.

Value

A character scalar, one of 'CCC', 'CCU', 'CUC', 'CUU', 'UCC', 'UCU', 'UUC', or 'UUU'.
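
Examples

The mapping described above can be sketched in base R. The helper 'sketch_constraint_to_model' is a hypothetical re-implementation for illustration only, not the package's own code; it assumes nothing beyond the documented rule that '1' maps to 'C' and '0' maps to 'U'.

```r
# Hypothetical stand-in for constraint_to_model(), for illustration only.
sketch_constraint_to_model <- function(constraint) {
  # Validate the documented input shape: length three, binary entries.
  stopifnot(length(constraint) == 3, all(constraint %in% c(0, 1)))
  # 1 -> "C" (constrained), 0 -> "U" (unconstrained), joined into one label.
  paste(ifelse(constraint == 1, "C", "U"), collapse = "")
}

fully_constrained <- sketch_constraint_to_model(c(1, 1, 1))
mixed <- sketch_constraint_to_model(c(0, 1, 0))
print(c(fully_constrained, mixed))
```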

Convert PGMM Paper Model Names to Constraint Codes

Description

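Converts a three-letter PGMM model name, as used in the paper, into the corresponding binary constraint vector. This is the inverse of 'constraint_to_model()'.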

Usage

model_to_constraint(model)

Arguments

model

Character scalar naming one of the eight PGMM covariance structures: 'CCC', 'CCU', 'CUC', 'CUU', 'UCC', 'UCU', 'UUC', or 'UUU'.

Value

Integer vector of length three. '1' means constrained and '0' means unconstrained.
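
Examples

The conversion from a model name to a constraint vector can be sketched in base R. The helper 'sketch_model_to_constraint' is a hypothetical re-implementation for illustration only, not the package's own code; it assumes only the documented rule that 'C' maps to '1' and 'U' maps to '0'.

```r
# Hypothetical stand-in for model_to_constraint(), for illustration only.
sketch_model_to_constraint <- function(model) {
  stopifnot(is.character(model), length(model) == 1, nchar(model) == 3)
  # Split the label into its three letters.
  model_letters <- strsplit(model, "")[[1]]
  stopifnot(all(model_letters %in% c("C", "U")))
  # "C" -> 1 (constrained), "U" -> 0 (unconstrained).
  as.integer(model_letters == "C")
}

cuu <- sketch_model_to_constraint("CUU")
uuc <- sketch_model_to_constraint("UUC")
print(rbind(CUU = cuu, UUC = uuc))
```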

Bayesian Model-Based Clustering with Parsimonious Gaussian Mixture Models

Description

Carries out model-based clustering using parsimonious Gaussian mixture models. MCMC is used for parameter estimation and RJMCMC is used for model selection.

Usage

pgmm_rjmcmc(
  X,
  m_init,
  m_range,
  q_new,
  delta = 2,
  ggamma = 2,
  burn = 20,
  niter = 1000,
  constraint = c(0, 0, 0),
  d_vec = c(1, 1, 1),
  s_vec = c(1, 1, 1),
  m_step = 0,
  v_step = 0,
  split_combine = 0,
  verbose = TRUE
)

Arguments

X

the observation matrix with variables in rows and observations in columns.

m_init

the number of initial clusters.

m_range

the allowed range for the number of clusters.

q_new

the number of latent factors for a new cluster.

delta

scalar hyperparameter for the noise covariance prior.

ggamma

scalar hyperparameter used in covariance-structure proposals.

burn

the number of burn-in iterations.

niter

the number of posterior sampling iterations.

constraint

initial PGMM covariance constraint. Use a three-letter model label such as '"CCC"' or '"UUU"', or a numeric vector of length three with binary entries. For example, 'c(1, 1, 1)' is 'CCC', the fully constrained model, and 'c(0, 0, 0)' is 'UUU', the fully unconstrained model.

d_vec

a vector of three hyperparameters: the shape parameters for alpha1, alpha2, and bbeta, respectively.

s_vec

a vector of three hyperparameters: the rate parameters for alpha1, alpha2, and bbeta, respectively.

m_step

indicator for RJMCMC model selection on the number of clusters.

v_step

indicator for RJMCMC model selection on covariance structures.

split_combine

indicator for using split/combine moves in the cluster-number RJMCMC step.

verbose

logical; if 'TRUE', print iteration progress.

Details

The 'constraint' argument follows the three-letter PGMM model notation used in Lu, Li, and Love (2021). The first entry indicates whether loading matrices are shared across clusters, the second whether noise covariance matrices are shared across clusters, and the third whether the noise covariance is isotropic within each cluster. Use 'model_to_constraint()' to convert model names such as 'CCC', 'CCU', 'CUC', 'CUU', 'UCC', 'UCU', 'UUC', and 'UUU' into the numeric vector used internally.

Value

A list of posterior samples with snake_case fields: 'tau_samples', 'psi_samples', 'mean_samples', 'lambda_samples', 'factor_score_samples', 'allocation_samples', 'constraint_samples', 'alpha1_samples', 'alpha2_samples', 'beta_samples', and 'active_cluster_samples'.
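
Examples

A minimal usage sketch on simulated data, following the argument descriptions above. Two points are assumptions rather than documented facts: that 'm_range' takes the form 'c(min, max)', and that the small 'burn' and 'niter' values are enough for a quick smoke run (they are far too small for real inference). The call is guarded so the data preparation still runs when the package is not installed.

```r
# Simulate two well-separated clusters in p = 4 dimensions.
set.seed(1)
n_per_cluster <- 30
cluster_a <- matrix(rnorm(n_per_cluster * 4, mean = -3), ncol = 4)
cluster_b <- matrix(rnorm(n_per_cluster * 4, mean = 3), ncol = 4)

# pgmm_rjmcmc() expects variables in rows and observations in columns,
# so transpose the usual observations-by-variables layout.
X <- t(rbind(cluster_a, cluster_b))

if (requireNamespace("bpgmm", quietly = TRUE)) {
  fit <- bpgmm::pgmm_rjmcmc(
    X,
    m_init = 2,
    m_range = c(1, 5),   # assumption: c(min, max) number of clusters
    q_new = 1,
    burn = 5,            # illustrative only; use longer runs in practice
    niter = 20,
    constraint = "UUU",  # fully unconstrained starting model
    verbose = FALSE
  )
  # Inspect which posterior sample fields were returned.
  print(names(fit))
}
```

Setting 'm_step', 'v_step', or 'split_combine' to non-default values would additionally switch on the corresponding RJMCMC moves described under Arguments.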

Run Multiple Independent Bayesian PGMM Chains

Description

Runs independent 'pgmm_rjmcmc()' chains, optionally in parallel. Because each MCMC iteration depends on the previous state, a single chain cannot be split across cores; running independent chains concurrently is therefore the safest way to use multiple CPU cores.

Usage

pgmm_rjmcmc_chains(
  X,
  m_init,
  m_range,
  q_new,
  delta = 2,
  ggamma = 2,
  burn = 20,
  niter = 1000,
  constraint = c(0, 0, 0),
  d_vec = c(1, 1, 1),
  s_vec = c(1, 1, 1),
  m_step = 0,
  v_step = 0,
  split_combine = 0,
  verbose = FALSE,
  chains = 2,
  cores = min(chains, available_cores()),
  seed = NULL
)

Arguments

X

the observation matrix with variables in rows and observations in columns.

m_init

the number of initial clusters.

m_range

the allowed range for the number of clusters.

q_new

the number of latent factors for a new cluster.

delta

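scalar hyperparameter for the noise covariance prior.

ggamma

scalar hyperparameter used in covariance-structure proposals.

burn

the number of burn-in iterations.

niter

the number of posterior sampling iterations.

constraint

initial PGMM covariance constraint. Use a three-letter model label such as '"CCC"' or '"UUU"', or a numeric vector of length three with binary entries. For example, 'c(1, 1, 1)' is 'CCC', the fully constrained model, and 'c(0, 0, 0)' is 'UUU', the fully unconstrained model.

d_vec

a vector of three hyperparameters: the shape parameters for alpha1, alpha2, and bbeta, respectively.

s_vec

a vector of three hyperparameters: the rate parameters for alpha1, alpha2, and bbeta, respectively.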

m_step

indicator for RJMCMC model selection on the number of clusters.

v_step

indicator for RJMCMC model selection on covariance structures.

split_combine

indicator for using split/combine moves in the cluster-number RJMCMC step.

verbose

logical; if 'TRUE', print iteration progress.

chains

positive integer giving the number of independent chains.

cores

positive integer giving the number of worker processes to use. Values greater than 'chains' are reduced to 'chains'.

seed

optional integer seed used to generate deterministic per-chain seeds.

Value

A list with one fitted 'pgmm_rjmcmc()' result per chain. The result has class 'bpgmm_rjmcmc_chains' and stores the per-chain seeds in the 'chain_seeds' attribute.
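
Examples

The idea of deterministic per-chain seeds and independent chains can be sketched with the base 'parallel' package. This is an illustration of the general pattern, not the package's implementation: 'make_chain_seeds' and 'run_toy_chain' are hypothetical helpers, and the toy random walk stands in for a real 'pgmm_rjmcmc()' call.

```r
library(parallel)

# Hypothetical helper: derive one reproducible seed per chain from a master seed.
make_chain_seeds <- function(seed, chains) {
  set.seed(seed)
  sample.int(.Machine$integer.max, chains)
}

# Stand-in for one chain; a real run would call pgmm_rjmcmc() here.
run_toy_chain <- function(chain_seed, niter = 100) {
  set.seed(chain_seed)
  cumsum(rnorm(niter))  # each state depends on the previous one
}

chains <- 2
chain_seeds <- make_chain_seeds(seed = 42, chains = chains)

# Forking is unavailable on Windows, so fall back to a single core there.
cores <- if (.Platform$OS.type == "windows") 1L else min(chains, 2L)
fits <- mclapply(chain_seeds, run_toy_chain, mc.cores = cores)

# Keep the seeds with the results so any chain can be reproduced later.
attr(fits, "chain_seeds") <- chain_seeds
```

Because each chain sets its own seed, rerunning a single chain with its stored seed reproduces it regardless of how many cores were used.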

Summarize RJMCMC Samples from a Bayesian PGMM Fit

Description

Summarizes posterior samples from 'pgmm_rjmcmc()' into the modal allocation, posterior counts for the number of clusters, posterior counts for the eight PGMM covariance-constraint models, and optionally the adjusted Rand index against a known reference partition.

Usage

summarize_pgmm_rjmcmc(fit, true_cluster = NULL)

Arguments

fit

Result list from 'pgmm_rjmcmc()'.

true_cluster

Optional true or reference cluster allocation.

Value

A list with 'allocation', 'n_clusters', 'n_constraints', and optionally 'ari'.
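
Examples

The kind of summary described above can be sketched in base R on a toy set of allocation samples. This is an illustration, not the package's implementation: the iterations-by-observations layout of 'allocation_samples' is an assumption, the helper 'modal_label' is hypothetical, and label switching between iterations is ignored.

```r
# Hypothetical allocation samples: one row per retained iteration,
# one column per observation (this layout is an assumption).
allocation_samples <- rbind(
  c(1, 1, 2, 2, 2),
  c(1, 1, 2, 2, 3),
  c(1, 1, 2, 2, 2),
  c(1, 2, 2, 2, 2)
)

# Modal allocation: the most frequent label for each observation.
modal_label <- function(labels) {
  counts <- table(labels)
  as.integer(names(counts)[which.max(counts)])
}
allocation <- apply(allocation_samples, 2, modal_label)

# Posterior counts for the number of occupied clusters per iteration.
occupied <- apply(allocation_samples, 1, function(z) length(unique(z)))
n_clusters <- table(occupied)

print(allocation)
print(n_clusters)

# Optional agreement with a reference partition, if mclust is available.
true_cluster <- c(1, 1, 2, 2, 2)
if (requireNamespace("mclust", quietly = TRUE)) {
  ari <- mclust::adjustedRandIndex(allocation, true_cluster)
  print(ari)
}
```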
