Package 'cyclicwave'


Type: Package
Title: Cyclic Wave Analysis for Time-Series Clustering
Version: 0.1.0
Description: A modular toolkit for feature extraction and density-based clustering of time-series data. It provides classical statistical, discrete wavelet, Hilbert-based phase, and circular statistical features. The Hilbert-based phase representation can support the analysis of periodic patterns, phase relationships, and circular behavior in time-series data. The package supports DBSCAN and OPTICS clustering, cluster evaluation, visualization, data preparation, and comparison of multiple feature extraction and clustering combinations. Methods are described in Karakaya and Purutcuoglu (2026) <doi:10.15672/hujms.1821412> and Karakaya et al. (2026) <doi:10.1007/978-3-032-17020-0_27>.
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
Imports: stats, utils, dbscan, gsignal, waveslim, MASS, e1071, ggplot2
Suggests: FNN, testthat (≥ 3.0.0), knitr, rmarkdown
Config/testthat/edition: 3
Config/roxygen2/version: 8.0.0
Depends: R (≥ 3.5.0)
VignetteBuilder: knitr
NeedsCompilation: no
Packaged: 2026-06-25 23:37:58 UTC; buaht
Author: Şule Şevval Karakaya [aut, cre], Ahmet Bursalı [aut], Vilda Purutçuoğlu [aut]
Maintainer: Şule Şevval Karakaya <sule.karakaya@metu.edu.tr>
Repository: CRAN
Date/Publication: 2026-07-03 12:00:08 UTC

M Statistic

Description

Calculates a leave-one-out circular statistic that measures how strongly one angular observation affects the mean resultant length.

Usage

M_statistic(theta, weights)

Arguments

theta

Numeric vector containing angles in radians.

weights

Numeric vector containing observation weights.

Value

Maximum M statistic across all observations.


A-Star Statistic

Description

Calculates an observation-level circular distance measure using the shortest angular separation between pairs of angles.

Usage

a_star_statistic(theta)

Arguments

theta

Numeric vector containing angles in radians.

Value

Maximum normalized angular distance value.
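The entry does not state how the pairwise separations are normalized or reduced to a per-observation value, so the base R sketch below shows only the shared ingredient: the shortest angular separation between two angles. shortest_sep is a hypothetical helper, not a cyclicwave function.

```r
# Hypothetical helper: shortest angular separation between angles (radians).
# The absolute difference is reduced modulo 2*pi, then the shorter way around
# the circle is taken, so the result always lies in [0, pi].
shortest_sep <- function(a, b) {
  d <- abs(a - b) %% (2 * pi)
  pmin(d, 2 * pi - d)
}

theta <- c(0.1, 6.2, 3.0)
# Pairwise matrix of shortest separations (symmetric, zero diagonal)
D <- outer(theta, theta, shortest_sep)
round(D, 3)
```

Note that 0.1 and 6.2 are only about 0.18 rad apart on the circle, even though their arithmetic difference is 6.1.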


A Statistic

Description

Calculates an observation-level circular distance measure based on pairwise cosine differences.

Usage

a_statistic(theta)

Arguments

theta

Numeric vector containing angles in radians.

Value

Maximum normalized distance value.
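A common reading of "pairwise cosine differences" is the circular distance 1 - cos(a - b), which is 0 for identical angles and 2 for opposite ones. The sketch below assumes that form and omits the normalization and maximum step, which are not described here; cos_dist is a hypothetical name.

```r
# Hypothetical sketch (not the cyclicwave implementation):
# cosine-based circular distance between two angles in radians.
cos_dist <- function(a, b) 1 - cos(a - b)

theta <- c(0, pi / 2, pi)
D <- outer(theta, theta, cos_dist)   # 3 x 3 pairwise distance matrix
D
```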


Analytic Signal

Description

Calculates the complex analytic signal of a numeric vector or matrix using the Hilbert transform.

Usage

analytic_signal(X, axis = c("time", "feature"))

Arguments

X

Numeric vector, matrix, or data frame.

axis

Direction in which the Hilbert transform is applied. Available options are "time" and "feature".

Value

A complex matrix with the same dimensions as the input.
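For a single numeric vector, the analytic signal is commonly built in the frequency domain: keep the zero-frequency (and, for even lengths, the Nyquist) component, double the positive frequencies, and zero the negative ones. The base R sketch below uses that standard FFT construction for one vector only; it ignores the axis argument, and analytic_fft is a hypothetical name, not necessarily how cyclicwave computes the transform.

```r
# Hypothetical helper: FFT-based analytic signal of a numeric vector.
analytic_fft <- function(x) {
  n <- length(x)
  h <- numeric(n)
  h[1] <- 1                                 # keep the zero-frequency component
  if (n %% 2 == 0) {
    h[n / 2 + 1] <- 1                       # keep the Nyquist component
    h[seq_len(n / 2 - 1) + 1] <- 2          # double the positive frequencies
  } else {
    h[seq_len((n - 1) / 2) + 1] <- 2
  }
  # Negative frequencies keep weight 0; R's inverse fft is unnormalized
  fft(fft(x) * h, inverse = TRUE) / n
}

t <- 0:63
x <- cos(2 * pi * 4 * t / 64)
z <- analytic_fft(x)   # real part is x, imaginary part is its Hilbert transform
phase <- Arg(z)        # instantaneous phase in radians, in (-pi, pi]
```

For a cosine sampled over a whole number of periods, the imaginary part recovers the matching sine, which is why Arg(z) can be read as an instantaneous phase.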


Match Predicted Cluster Labels

Description

Matches each predicted cluster label to the true label with which it has the largest overlap.

Usage

best_map(true_labels, pred_labels)

Arguments

true_labels

Vector containing the known class labels.

pred_labels

Vector containing the predicted cluster labels.

Value

A numeric vector containing the matched predicted labels.
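The matching rule can be sketched with a contingency table: for each predicted label, pick the true label it shares the most observations with. This majority rule allows several predicted clusters to map to the same true label; best_map_sketch is a hypothetical name, and numeric labels are assumed.

```r
# Hypothetical sketch of majority-overlap label matching (numeric labels assumed).
best_map_sketch <- function(true_labels, pred_labels) {
  tab <- table(pred_labels, true_labels)        # rows: predicted, columns: true
  # For each predicted label, the true label with the largest overlap
  mapping <- colnames(tab)[apply(tab, 1, which.max)]
  names(mapping) <- rownames(tab)
  as.numeric(mapping[as.character(pred_labels)])
}

true_lab <- c(1, 1, 1, 2, 2, 2)
pred_lab <- c(5, 5, 7, 7, 7, 7)
mapped <- best_map_sketch(true_lab, pred_lab)
mapped                      # 1 1 2 2 2 2
mean(mapped == true_lab)    # proportion of matched labels: 5/6
```

The last line illustrates the idea behind an accuracy computed after matching; how cyclicwave treats noise labels in that step is not described here.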


Chord Length Statistic

Description

Calculates an observation-level distance measure using chord lengths between angular values on the unit circle.

Usage

chord_length(theta)

Arguments

theta

Numeric vector containing angles in radians.

Value

Maximum normalized chord-length value.
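The chord between two angles on the unit circle has length 2|sin((a - b) / 2)|, which equals the Euclidean distance between the points (cos a, sin a) and (cos b, sin b). The sketch below checks that identity in base R; chord is a hypothetical helper, and the normalization used by chord_length is not shown.

```r
# Hypothetical helper: chord length between angles on the unit circle.
chord <- function(a, b) 2 * abs(sin((a - b) / 2))

theta <- c(0, pi / 2, pi)
D <- outer(theta, theta, chord)

# Same distances computed directly from the points (cos(theta), sin(theta))
P <- cbind(cos(theta), sin(theta))
D_check <- as.matrix(dist(P))
all.equal(D, D_check, check.attributes = FALSE)
```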


Circular Mean

Description

Calculates the weighted or unweighted mean direction of angular values.

Usage

circ_mean(theta, weights = NULL)

Arguments

theta

Numeric vector containing angles in radians.

weights

Optional numeric vector containing observation weights.

Value

Mean direction in radians between -pi and pi.
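The weighted mean direction is usually defined as the angle of the weighted sum of unit vectors, atan2(sum(w * sin(theta)), sum(w * cos(theta))). The base R sketch below assumes that definition; circ_mean_sketch is a hypothetical name.

```r
# Hypothetical sketch of the (weighted) circular mean direction.
circ_mean_sketch <- function(theta, weights = NULL) {
  if (is.null(weights)) weights <- rep(1, length(theta))   # unweighted case
  atan2(sum(weights * sin(theta)), sum(weights * cos(theta)))
}

# Angles on either side of 0: the circular mean is near 0,
# whereas the arithmetic mean (3.15) points the opposite way.
circ_mean_sketch(c(0.1, 6.2))
```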


Mean Resultant Length

Description

Measures the concentration of angular values around their mean direction. Values close to 1 indicate strong concentration, while values close to 0 indicate greater angular dispersion.

Usage

circ_r(theta, weights = NULL)

Arguments

theta

Numeric vector containing angles in radians.

weights

Optional numeric vector containing observation weights.

Value

Numeric value between 0 and 1.
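The mean resultant length is typically the length of the weighted sum of unit vectors divided by the total weight. The sketch below assumes that definition; circ_r_sketch is a hypothetical name.

```r
# Hypothetical sketch of the (weighted) mean resultant length.
circ_r_sketch <- function(theta, weights = NULL) {
  if (is.null(weights)) weights <- rep(1, length(theta))
  C <- sum(weights * cos(theta))
  S <- sum(weights * sin(theta))
  sqrt(C^2 + S^2) / sum(weights)
}

circ_r_sketch(c(1, 1, 1))                     # 1: identical angles
circ_r_sketch(c(0, pi / 2, pi, 3 * pi / 2))   # about 0: evenly spread angles
```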


Circular Standard Deviation

Description

Calculates the circular standard deviation from the mean resultant length. Larger values indicate greater angular dispersion.

Usage

circ_std(theta, weights = NULL)

Arguments

theta

Numeric vector containing angles in radians.

weights

Optional numeric vector containing observation weights.

Value

Non-negative numeric value.
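A common definition of the circular standard deviation is sqrt(-2 log R), where R is the mean resultant length. The sketch below assumes that definition (the entry does not give the formula); circ_std_sketch is a hypothetical name.

```r
# Hypothetical sketch: circular standard deviation sqrt(-2 * log(R)).
circ_std_sketch <- function(theta, weights = NULL) {
  if (is.null(weights)) weights <- rep(1, length(theta))
  R <- sqrt(sum(weights * cos(theta))^2 + sum(weights * sin(theta))^2) / sum(weights)
  R <- min(R, 1)            # guard against rounding slightly above 1
  sqrt(-2 * log(R))
}

circ_std_sketch(c(0.1, 0.2, 0.3))   # small value: tightly grouped angles
```

The min(R, 1) guard matters for identical angles, where floating-point rounding can push R just above 1 and make the logarithm positive.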


Circular Variance

Description

Measures angular dispersion using the mean resultant length. Values close to 0 indicate low dispersion, while values close to 1 indicate high dispersion.

Usage

circ_var(theta, weights = NULL)

Arguments

theta

Numeric vector containing angles in radians.

weights

Optional numeric vector containing observation weights.

Value

Numeric value between 0 and 1.
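Circular variance is commonly defined as 1 - R, where R is the mean resultant length. The sketch below assumes that definition; circ_var_sketch is a hypothetical name.

```r
# Hypothetical sketch: circular variance as 1 - R.
circ_var_sketch <- function(theta, weights = NULL) {
  if (is.null(weights)) weights <- rep(1, length(theta))
  R <- sqrt(sum(weights * cos(theta))^2 + sum(weights * sin(theta))^2) / sum(weights)
  1 - R
}

circ_var_sketch(c(0, pi))            # 1: two opposite angles cancel out
circ_var_sketch(c(0, pi), c(3, 1))   # 0.5: the weighted resultant has length 2 / 4
```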


Calculate Circular Distance Measures

Description

Calculates one or more circular statistics for the same angular vector.

Usage

circular_distance_measures(
  theta,
  weights = NULL,
  measures = c("mardia_kurtosis", "M_statistic", "a_statistic", "a_star_statistic",
    "chord_length")
)

Arguments

theta

Numeric vector containing angles in radians.

weights

Optional numeric vector containing observation weights. Weights are required for Mardia kurtosis and the M statistic.

measures

Character vector containing the names of the measures to calculate.

Value

Named numeric vector containing the selected measure values.


Clustering Accuracy

Description

Calculates clustering accuracy after matching predicted cluster labels to the known class labels.

Usage

cluster_accuracy(true_labels, pred_labels)

Arguments

true_labels

Vector containing the known class labels.

pred_labels

Vector containing the predicted cluster labels.

Value

A numeric accuracy value between 0 and 1.


Compare Feature and Clustering Methods

Description

Applies multiple feature extraction and clustering methods to the same dataset. Each feature and clustering combination is evaluated using the selected performance measures.

Usage

compare_methods(
  data,
  feature_methods,
  cluster_methods,
  metrics = c("dbi", "n_clusters", "n_noise"),
  true_labels = NULL,
  normalize = NULL,
  verbose = TRUE
)

Arguments

data

Numeric matrix or data frame containing the input data.

feature_methods

Named list of feature extraction functions. Each function must accept data as input and return a feature matrix.

cluster_methods

Named list of clustering method specifications. Each entry must contain a clustering function and may contain its parameter values.

metrics

Character vector containing the evaluation measures. Available options are "dbi", "accuracy", "n_clusters", and "n_noise".

true_labels

Optional vector containing the known class labels. Required when "accuracy" is included in metrics.

normalize

Optional normalization method passed to prepare_features. Available options are "zscore" and "range"; NULL skips normalization.

verbose

Logical value indicating whether progress information should be displayed.

Value

A data frame containing one row for each feature and clustering method combination and the selected evaluation results.


Instantaneous Phase

Description

Calculates instantaneous phase from the Hilbert analytic signal.

Usage

compute_phase(X, axis = c("time", "feature"))

Arguments

X

Numeric vector, matrix, or data frame.

axis

Direction in which the Hilbert transform is applied. Available options are "time" and "feature".

Value

A numeric matrix with the same dimensions as the input. Phase values are returned in radians between -pi and pi.


Davies-Bouldin Index

Description

Calculates the Davies-Bouldin Index for a clustering result. The index compares within-cluster spread with the distance between cluster centers. Lower values indicate more compact and separated clusters.

Usage

davies_bouldin(X, labels, noise_label = 0)

Arguments

X

Numeric feature matrix. Rows represent observations and columns represent features.

labels

Integer vector containing the cluster label of each observation.

noise_label

Label used for noise observations. Default is 0.

Value

A numeric value containing the Davies-Bouldin Index. Returns NA when fewer than two clusters remain after removing noise.
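The index as described can be sketched in base R: drop noise, compute each cluster's average distance to its centroid (s_i), compute centroid distances (d_ij), and average over clusters the worst ratio (s_i + s_j) / d_ij. This uses the standard Davies-Bouldin definition with Euclidean distances, which is an assumption here; db_sketch is a hypothetical name.

```r
# Hypothetical sketch of the Davies-Bouldin Index (Euclidean distances assumed).
db_sketch <- function(X, labels, noise_label = 0) {
  X <- as.matrix(X)
  keep <- labels != noise_label                 # drop noise observations
  X <- X[keep, , drop = FALSE]
  labels <- labels[keep]
  ks <- sort(unique(labels))
  if (length(ks) < 2) return(NA_real_)          # index undefined for < 2 clusters
  # Cluster centroids, one row per cluster
  cent <- do.call(rbind, lapply(ks, function(k) colMeans(X[labels == k, , drop = FALSE])))
  # Within-cluster scatter: mean Euclidean distance of members to their centroid
  s <- vapply(seq_along(ks), function(i) {
    Xi <- X[labels == ks[i], , drop = FALSE]
    mean(sqrt(rowSums(sweep(Xi, 2, cent[i, ])^2)))
  }, numeric(1))
  M <- as.matrix(dist(cent))                    # distances between centroids
  R <- outer(s, s, "+") / M
  diag(R) <- -Inf                               # exclude i == j from the maximum
  mean(apply(R, 1, max))
}

X <- rbind(c(0, 0), c(0, 1), c(10, 0), c(10, 1))
db_sketch(X, c(1, 1, 2, 2))   # (0.5 + 0.5) / 10 = 0.1
db_sketch(X, c(1, 1, 0, 0))   # NA: only one cluster left after removing noise
```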


Extract Circular Features

Description

Extracts row-based circular features from a phase matrix. Each row is treated as one observation and each column as one phase value.

Usage

extract_circular_features(phase)

Arguments

phase

Numeric matrix containing phase values in radians.

Value

Numeric matrix containing mean phase, chord-based phase difference, and circular correlation distance for each row.


Find an Elbow Point

Description

Finds the point with the largest perpendicular distance from the line connecting the first and last values of a sorted sequence.

Usage

find_elbow(values)

Arguments

values

Numeric vector sorted in ascending order.

Value

Integer index of the detected elbow point.
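The rule above is the point-to-line distance from each (index, value) pair to the line through the first and last points. A base R sketch, using x = 1, ..., n as the horizontal coordinate (an assumption), with the hypothetical name find_elbow_sketch:

```r
# Hypothetical sketch: index with the largest perpendicular distance to the
# line joining (1, values[1]) and (n, values[n]).
find_elbow_sketch <- function(values) {
  n <- length(values)
  x <- seq_len(n)
  x1 <- 1; y1 <- values[1]
  x2 <- n; y2 <- values[n]
  # |(y2 - y1) x - (x2 - x1) y + x2 y1 - y2 x1| / ||P2 - P1||
  num <- abs((y2 - y1) * x - (x2 - x1) * values + x2 * y1 - y2 * x1)
  den <- sqrt((y2 - y1)^2 + (x2 - x1)^2)
  which.max(num / den)
}

v <- c(1, 1.1, 1.2, 1.3, 5, 9)   # sorted, with a sharp rise after position 4
find_elbow_sketch(v)             # 4
```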


First Differences

Description

Calculates the difference between consecutive observations in each column. A row of zeros is added at the beginning so that the output has the same number of rows as the input.

Usage

first_difference(X)

Arguments

X

Numeric vector, matrix, or data frame.

Value

A numeric matrix containing first differences.
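The behaviour described above (column-wise differences with a leading row of zeros) can be sketched in two lines of base R; first_diff_sketch is a hypothetical name.

```r
# Hypothetical sketch: column-wise first differences, padded with a zero row
# so that the output has as many rows as the input.
first_diff_sketch <- function(X) {
  X <- as.matrix(X)      # a vector becomes a one-column matrix
  rbind(0, diff(X))      # diff() on a matrix differences consecutive rows
}

first_diff_sketch(c(1, 4, 9, 16))   # one column: 0, 3, 5, 7
```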


Flatten Data with Group Labels

Description

Converts a wide matrix into a single vector of values and creates a matching group label for each value, indicating the original column it came from.

Usage

flatten_with_zones(X)

Arguments

X

Numeric matrix. Rows represent observations and columns represent groups or zones.

Value

A list containing the flattened values and their corresponding group labels.
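A minimal sketch of this flattening, assuming column-major order (all of column 1, then column 2, and so on) and column names as group labels. The function and element names (flatten_sketch, values, zone) are hypothetical and may differ from what flatten_with_zones returns.

```r
# Hypothetical sketch: stack matrix columns and record each value's source column.
flatten_sketch <- function(X) {
  X <- as.matrix(X)
  zones <- if (is.null(colnames(X))) seq_len(ncol(X)) else colnames(X)
  list(values = as.vector(X),               # column-major stacking
       zone   = rep(zones, each = nrow(X)))  # one label per value
}

X <- matrix(1:6, ncol = 2, dimnames = list(NULL, c("Z1", "Z2")))
flatten_sketch(X)
```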


Create Labels from Quantile Groups

Description

Divides numeric values into groups using selected quantile cut points and assigns an integer label to each value.

Usage

label_by_quantile(values, probs = c(1/3, 2/3))

Arguments

values

Numeric vector to be labelled.

probs

Numeric vector containing quantile cut points between 0 and 1. The default values create three groups.

Value

An integer vector containing one group label for each value.
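Quantile labelling can be sketched with quantile() and findInterval(). Labels starting at 1, R's default quantile type, and values equal to a cut point going to the upper group are all assumptions of this sketch; label_sketch is a hypothetical name.

```r
# Hypothetical sketch: integer labels from quantile cut points.
label_sketch <- function(values, probs = c(1/3, 2/3)) {
  cuts <- quantile(values, probs = probs, names = FALSE)
  # findInterval gives 0 below the first cut, 1 between cuts, ...; shift to start at 1
  findInterval(values, cuts) + 1L
}

label_sketch(1:9)   # 1 1 1 2 2 2 3 3 3
```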


Mardia Kurtosis

Description

Calculates a weighted multivariate kurtosis measure after representing angular values as points on the unit circle.

Usage

mardia_kurtosis(theta, weights)

Arguments

theta

Numeric vector containing angles in radians.

weights

Numeric vector containing observation weights.

Value

Numeric kurtosis value.
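One way to obtain a weighted Mardia-type kurtosis is to map each angle to (cos, sin), form the weighted mean and covariance, and average the squared Mahalanobis distances squared. The normalized weighting scheme below is an assumption (the entry does not give the formula), and mardia_sketch is a hypothetical name.

```r
# Hypothetical sketch of a weighted Mardia kurtosis for angles on the unit circle.
mardia_sketch <- function(theta, weights) {
  p <- weights / sum(weights)                  # normalized weights
  U <- cbind(cos(theta), sin(theta))           # points on the unit circle
  mu <- colSums(p * U)                         # weighted mean vector
  Z <- sweep(U, 2, mu)                         # centered points
  S <- crossprod(Z * sqrt(p))                  # weighted covariance: sum_i p_i z_i z_i'
  d2 <- rowSums((Z %*% solve(S)) * Z)          # squared Mahalanobis distances
  sum(p * d2^2)                                # weighted mean of d2^2
}

mardia_sketch(c(0, pi / 2, pi, 3 * pi / 2), rep(1, 4))   # 4 for four evenly spaced angles
```

solve() fails when the covariance is singular (for example, when all angles are equal). A generalized inverse such as MASS::ginv() is one possible substitute.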


Normalize Feature Columns

Description

Normalizes each feature column using z-score or range scaling.

Usage

normalize_features(X, method = c("zscore", "range"))

Arguments

X

Numeric matrix or data frame. Rows represent observations and columns represent features.

method

Normalization method. Available options are "zscore" and "range".

Value

Numeric matrix with the same dimensions as the input.
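Both scalings can be sketched column-wise with sweep(). This sketch assumes the z-score uses the sample standard deviation (sd()) and that range scaling maps each column to [0, 1]. Constant columns would cause division by zero. normalize_sketch is a hypothetical name.

```r
# Hypothetical sketch of column-wise z-score and range scaling.
normalize_sketch <- function(X, method = c("zscore", "range")) {
  method <- match.arg(method)
  X <- as.matrix(X)
  if (method == "zscore") {
    # (x - column mean) / column standard deviation
    sweep(sweep(X, 2, colMeans(X)), 2, apply(X, 2, sd), "/")
  } else {
    # (x - column min) / (column max - column min)
    mins <- apply(X, 2, min)
    rng <- apply(X, 2, max) - mins
    sweep(sweep(X, 2, mins), 2, rng, "/")
  }
}

X <- cbind(a = c(1, 2, 3), b = c(10, 20, 40))
normalize_sketch(X, "range")   # column b becomes 0, 1/3, 1
```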


Plot Clusters Using PCA

Description

Projects a feature matrix onto its first two principal components and displays the clustering result in two dimensions.

Usage

plot_clusters_pca(X, labels, noise_label = 0)

Arguments

X

Numeric feature matrix. Rows represent observations and columns represent features.

labels

Cluster label assigned to each observation.

noise_label

Label used for noise observations. Default is 0.

Value

A ggplot object containing the two-dimensional cluster graph.
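The projection step can be sketched with stats::prcomp(). Whether plot_clusters_pca centers or scales the features first is not stated, so centering without scaling is an assumption here, and pca_scores_sketch is a hypothetical name. The resulting two score columns, coloured by label, are what a two-dimensional cluster plot would show.

```r
# Hypothetical sketch: scores on the first two principal components.
pca_scores_sketch <- function(X) {
  pc <- prcomp(as.matrix(X), center = TRUE, scale. = FALSE)
  pc$x[, 1:2, drop = FALSE]   # columns PC1 and PC2, one row per observation
}

set.seed(1)
X <- cbind(rnorm(50), rnorm(50), rnorm(50))
head(pca_scores_sketch(X))
```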


Plot k-Distance Graph

Description

Creates a sorted k-nearest-neighbor distance graph to support the selection of the epsilon parameter for density-based clustering.

Usage

plot_k_distance(X, k = 5, distance = c("euclidean", "manhattan"))

Arguments

X

Numeric feature matrix. Rows represent observations and columns represent features.

k

Number of nearest neighbors used in the distance calculation.

distance

Distance measure. Available options are "euclidean" and "manhattan".

Value

A ggplot object containing the sorted k-distance graph.
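The values behind a k-distance graph can be sketched in base R: for each observation, take the distance to its k-th nearest neighbour, then sort. The elbow of the sorted curve is a common heuristic for eps. k_distances is a hypothetical name. The dbscan package, which cyclicwave imports, also provides kNNdist() and kNNdistplot() for this purpose.

```r
# Hypothetical sketch: sorted k-nearest-neighbour distances.
k_distances <- function(X, k = 5, distance = c("euclidean", "manhattan")) {
  distance <- match.arg(distance)
  D <- as.matrix(dist(as.matrix(X), method = distance))
  # Position 1 of each sorted row is the point itself (distance 0), so take k + 1
  kd <- apply(D, 1, function(r) sort(r)[k + 1])
  sort(unname(kd))
}

X <- matrix(c(0, 1, 3, 6), ncol = 1)
k_distances(X, k = 1)   # 1 1 2 3
```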


Plot OPTICS Reachability

Description

Creates a reachability graph from an OPTICS result. Lower values generally represent denser regions, while higher values may indicate sparse regions or transitions between clusters.

Usage

plot_reachability(optics_result, log_scale = FALSE, epsilon = NULL)

Arguments

optics_result

Result returned by run_optics.

log_scale

Logical value indicating whether reachability values should be shown on a logarithmic scale.

epsilon

Optional numeric value displayed as a horizontal reference line.

Value

A ggplot object containing the reachability graph.


Tetouan Power Consumption Data

Description

Electricity consumption and weather measurements recorded at 10-minute intervals in three zones of Tetouan, Morocco.

Usage

power_consumption

Format

A data frame with 13,906 rows and 9 variables:

Datetime

Date and time of the observation.

Temperature

Ambient temperature.

Humidity

Relative humidity.

WindSpeed

Wind speed.

GeneralDiffuseFlows

General diffuse solar radiation.

DiffuseFlows

Diffuse solar radiation.

PowerConsumption_Zone1

Electricity consumption in Zone 1.

PowerConsumption_Zone2

Electricity consumption in Zone 2.

PowerConsumption_Zone3

Electricity consumption in Zone 3.

Source

UCI Machine Learning Repository, Tetouan City Power Consumption dataset.


Prepare a Feature Matrix

Description

Converts input features to a matrix, replaces missing values, and optionally applies normalization before clustering.

Usage

prepare_features(X, normalize = NULL)

Arguments

X

Numeric matrix or data frame containing extracted features.

normalize

Optional normalization method. Available options are "zscore" and "range"; NULL skips normalization.

Value

A numeric matrix prepared for further analysis.


Rolling Statistics

Description

Calculates moving summary statistics for each column of a numeric signal. The window is centered on each observation and becomes shorter near the beginning and end of the signal.

Usage

rolling_stats(X, window_size = 10, stats = c("mean", "sd", "max", "min"))

Arguments

X

Numeric vector, matrix, or data frame. Rows represent observations and columns represent signals.

window_size

Positive integer defining the moving window length.

stats

Character vector containing the statistics to calculate. Available options are "mean", "sd", "max", and "min".

Value

A named list containing one numeric matrix for each selected statistic.
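A centered window that shrinks at the edges can be sketched for one signal as follows. How cyclicwave positions an even-length window is not stated; this sketch assumes (window_size - 1) %/% 2 observations before each point and the rest after it. rolling_sketch and its FUN argument are hypothetical.

```r
# Hypothetical sketch: centered rolling statistic for a single numeric vector.
rolling_sketch <- function(x, window_size = 10, FUN = mean) {
  x <- as.numeric(x)
  n <- length(x)
  left <- (window_size - 1) %/% 2        # observations before position i (assumed)
  right <- window_size - 1 - left        # observations after position i (assumed)
  vapply(seq_len(n), function(i) {
    FUN(x[max(1, i - left):min(n, i + right)])   # window is truncated near both ends
  }, numeric(1))
}

rolling_sketch(1:5, window_size = 3)              # 1.5 2.0 3.0 4.0 4.5
rolling_sketch(1:5, window_size = 3, FUN = max)   # 2 3 4 5 5
```

Applying this to each column of a matrix, once per statistic, would give the named-list-of-matrices shape described above. Note that sd() of a one-value window is NA.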


DBSCAN Clustering

Description

Applies DBSCAN to a numeric feature matrix. The algorithm identifies dense groups of observations and labels observations outside these groups as noise.

Usage

run_dbscan(X, eps, min_pts)

Arguments

X

Numeric matrix or data frame. Rows represent observations and columns represent features.

eps

Maximum distance used to define the neighbourhood of a point.

min_pts

Minimum number of points required to form a dense region.

Value

A list containing cluster labels, the number of clusters, the number of noise observations, the method name, and parameter values.

Examples

set.seed(1)

X <- rbind(
  matrix(rnorm(60, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(60, mean = 5, sd = 0.5), ncol = 2)
)

result <- run_dbscan(X, eps = 1, min_pts = 5)

result$n_clusters
result$n_noise
table(result$cluster)


OPTICS Clustering

Description

Applies an OPTICS-based procedure to a numeric feature matrix. The function orders observations according to local density and calculates reachability values used to represent cluster structure.

Usage

run_optics(X, eps, min_pts, distance = c("euclidean", "manhattan"))

Arguments

X

Numeric matrix or data frame. Rows represent observations and columns represent features.

eps

Maximum distance used to find neighbouring observations.

min_pts

Minimum number of neighbouring points required to define a dense region.

distance

Distance measure. Available options are "euclidean" and "manhattan".

Value

A list containing cluster labels, the number of clusters, the number of noise observations, reachability values, observation order, the method name, and parameter values.

Examples

set.seed(1)

X <- rbind(
  matrix(rnorm(60, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(60, mean = 5, sd = 0.5), ncol = 2)
)

result <- run_optics(
  X,
  eps = 1.5,
  min_pts = 5,
  distance = "euclidean"
)

result$n_clusters
result$n_noise
head(result$reachability)


Segment a Signal

Description

Divides a numeric signal into consecutive non-overlapping windows. Values that do not complete a full window are omitted.

Usage

segment_signal(x, window_size)

Arguments

x

Numeric vector containing the signal.

window_size

Positive integer defining the number of values in each window.

Value

A list of numeric vectors with equal window lengths.
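The segmentation rule (consecutive, non-overlapping windows, with any incomplete tail dropped) can be sketched as follows; segment_sketch is a hypothetical name.

```r
# Hypothetical sketch: split a signal into full, non-overlapping windows.
segment_sketch <- function(x, window_size) {
  n_windows <- length(x) %/% window_size   # the incomplete tail is dropped
  lapply(seq_len(n_windows), function(j) {
    x[((j - 1) * window_size + 1):(j * window_size)]
  })
}

segment_sketch(1:10, 3)   # 1:3, 4:6, 7:9; the value 10 is omitted
```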


Select Numeric Columns

Description

Extracts numeric columns from a data frame and returns them as a matrix.

Usage

select_numeric_columns(data)

Arguments

data

Data frame containing numeric and non-numeric columns.

Value

A numeric matrix containing only the numeric columns.
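A base R sketch of this selection, keeping the columns for which is.numeric() is TRUE. select_numeric_sketch is a hypothetical name.

```r
# Hypothetical sketch: keep numeric columns and return them as a matrix.
select_numeric_sketch <- function(data) {
  keep <- vapply(data, is.numeric, logical(1))   # one TRUE/FALSE per column
  as.matrix(data[, keep, drop = FALSE])
}

df <- data.frame(id = c("a", "b"), x = c(1, 2), y = 3:4)
select_numeric_sketch(df)   # 2 x 2 matrix with columns x and y
```

Date-time columns such as POSIXct are not numeric under is.numeric(), so a timestamp column would be dropped by this sketch.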


Steel Industry Energy Consumption Data

Description

Energy consumption and production-related measurements recorded at 15-minute intervals during 2018 at a steel production facility in South Korea.

Usage

steel_industry

Format

A data frame with 35,040 rows and 11 variables:

date

Date and time of the observation.

Usage_kWh

Electricity consumption in kilowatt-hours.

Lagging_Current_Reactive.Power_kVarh

Lagging current reactive power, in kVarh.

Leading_Current_Reactive_Power_kVarh

Leading current reactive power, in kVarh.

CO2.tCO2.

Carbon dioxide emissions, in tonnes of CO2 (tCO2).

Lagging_Current_Power_Factor

Lagging current power factor.

Leading_Current_Power_Factor

Leading current power factor.

NSM

Number of seconds since midnight.

WeekStatus

Weekday or weekend category.

Day_of_week

Day of the week.

Load_Type

Electricity load category.

Source

UCI Machine Learning Repository, Steel Industry Energy Consumption dataset.


Thin a Dataset

Description

Reduces the number of rows by keeping only every step-th row.

Usage

thin_data(data, step = 1)

Arguments

data

Data frame or matrix.

step

Positive integer indicating the interval between retained rows. A value of 1 keeps all rows.

Value

A data frame or matrix containing the selected rows.
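A minimal sketch of row thinning, assuming the retained rows start at row 1. thin_sketch is a hypothetical name. As an illustration of the arithmetic, step = 4 on the 15-minute steel_industry data would keep one row per hour, as would step = 6 on the 10-minute power_consumption data.

```r
# Hypothetical sketch: keep rows 1, 1 + step, 1 + 2 * step, ...
thin_sketch <- function(data, step = 1) {
  data[seq(1, nrow(data), by = step), , drop = FALSE]
}

thin_sketch(data.frame(t = 1:10), step = 3)   # rows 1, 4, 7, 10
```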


Multi-Level Wavelet Approximation

Description

Applies a multi-level discrete wavelet decomposition and reconstructs the signal using only the approximation information.

Usage

wavelet_approx(x, wavelet = "d4", n_levels = 2)

Arguments

x

Numeric vector containing the signal.

wavelet

Wavelet filter name accepted by the waveslim package.

n_levels

Positive integer defining the number of decomposition levels.

Value

Numeric vector containing the reconstructed approximation signal.
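To show what "reconstruct using only the approximation" means, the sketch below uses the Haar filter instead of the default "d4". For Haar with the details set to zero, the level-J reconstruction replaces each block of 2^J samples by its mean. This is a swapped-in, simplified case that assumes the length is a multiple of 2^J; haar_approx_sketch is a hypothetical name. With waveslim itself, a comparable result could be obtained by zeroing the detail elements (d1, d2, ...) of a dwt() result before calling idwt(). This is a suggestion, not a description of wavelet_approx.

```r
# Hypothetical sketch: Haar multi-level approximation = block means of length 2^n_levels.
haar_approx_sketch <- function(x, n_levels = 2) {
  block <- 2^n_levels
  stopifnot(length(x) %% block == 0)   # sketch assumes full blocks only
  m <- matrix(x, nrow = block)         # one column per block
  rep(colMeans(m), each = block)       # block means, expanded to full length
}

haar_approx_sketch(c(1, 3, 5, 7, 2, 2, 4, 4), n_levels = 2)   # 4 4 4 4 3 3 3 3
```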


Single-Level Wavelet Transform

Description

Applies a single-level discrete wavelet transform to a numeric signal. The function returns approximation and detail coefficients.

Usage

wavelet_transform(x, wavelet = c("db1", "db4"))

Arguments

x

Numeric vector containing the signal.

wavelet

Wavelet type. Available options are "db1" and "db4".

Value

A list containing the approximation coefficients in cA and the detail coefficients in cD.
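For "db1" (the Haar wavelet), one level of the transform reduces to scaled pairwise sums and differences. The sketch below assumes an even-length signal and the sign convention cD = (first - second) / sqrt(2), as used by PyWavelets. "db4" needs longer filters and boundary handling that are not shown. haar_dwt_sketch is a hypothetical name.

```r
# Hypothetical sketch: single-level Haar ("db1") transform of an even-length signal.
haar_dwt_sketch <- function(x) {
  stopifnot(length(x) %% 2 == 0)
  odd <- x[c(TRUE, FALSE)]             # 1st, 3rd, 5th, ... values
  even <- x[c(FALSE, TRUE)]            # 2nd, 4th, 6th, ... values
  list(cA = (odd + even) / sqrt(2),    # approximation: scaled pairwise sums
       cD = (odd - even) / sqrt(2))    # detail: scaled pairwise differences
}

haar_dwt_sketch(c(1, 2, 3, 4))   # cA = c(3, 7) / sqrt(2), cD = c(-1, -1) / sqrt(2)
```

Because the Haar transform is orthonormal, the squared coefficients sum to the signal energy (30 in this example).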


Window Summary Statistics

Description

Divides a numeric signal into consecutive non-overlapping windows and calculates selected summary statistics for each window.

Usage

window_moments(
  x,
  window_size = 4,
  stats = c("mean", "sd", "skewness", "kurtosis")
)

Arguments

x

Numeric vector containing the signal.

window_size

Positive integer defining the number of observations in each window.

stats

Character vector containing the statistics to calculate. Available options are "mean", "sd", "skewness", "kurtosis", "range", and "energy".

Value

A numeric matrix in which rows represent windows and columns represent the selected statistics.
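A base R sketch of per-window statistics: reshape the complete windows into a matrix with one column per window, then summarize each column. Defining "energy" as the sum of squares is an assumption, and window_stats_sketch is a hypothetical name. e1071::skewness() and e1071::kurtosis() (e1071 is imported by the package) could fill the remaining columns, but their default estimator types may differ from what window_moments uses.

```r
# Hypothetical sketch: summary statistics over non-overlapping windows.
window_stats_sketch <- function(x, window_size = 4) {
  n_windows <- length(x) %/% window_size
  # One column per complete window; trailing values are dropped
  m <- matrix(x[seq_len(n_windows * window_size)], nrow = window_size)
  cbind(mean   = colMeans(m),
        sd     = apply(m, 2, sd),
        range  = apply(m, 2, function(w) diff(range(w))),
        energy = colSums(m^2))          # assumed definition: sum of squares
}

window_stats_sketch(c(1, 2, 3, 4, 2, 2, 2, 2, 9))   # two windows; the 9 is dropped
```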


Wrap Angles

Description

Converts angular values to the standard interval from -pi to pi.

Usage

wrap_to_pi(x)

Arguments

x

Numeric vector or matrix containing angles in radians.

Value

Values with the same dimensions as the input, wrapped to the interval from -pi inclusive to pi exclusive.
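The interval [-pi, pi) follows directly from shifting by pi, reducing modulo 2*pi, and shifting back. R's %% returns a result with the sign of the divisor and keeps matrix dimensions. wrap_sketch is a hypothetical name.

```r
# Hypothetical sketch: wrap angles (radians) to [-pi, pi).
wrap_sketch <- function(x) ((x + pi) %% (2 * pi)) - pi

wrap_sketch(c(pi, -pi, 3 * pi / 2, 7))   # -pi, -pi, -pi/2, 7 - 2*pi
```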