API Documentation
himap.ab module
Functions:
_curr_u – Provides the current value of u after checking whether t - d is non-negative, t is less than n_samples, and d is greater than or equal to t - (n_samples - 1).
_forward – Computes the forward variable alpha needed for the likelihood computation and the parameter re-estimation.
_backward – Computes the backward variable beta needed for the likelihood computation and the parameter re-estimation.
_u_only – Computes the u values needed for the forward variable computation.
- himap.ab._curr_u(n_samples, u, t, j, d)
Provides the current value of u after checking whether t - d is non-negative, t is less than n_samples, and d is greater than or equal to t - (n_samples - 1). Utilized by the _forward auxiliary function.
- Parameters:
n_samples (int) – Number of samples.
u (np.ndarray) – Array of shape (n_samples, n_states, n_durations) containing the u values as produced by the _u_only function.
t (int) – Current time step.
j (int) – Current state.
d (int) – Current duration.
- Returns:
curr_u – Current value of u.
- Return type:
float
See also
_forward – Function that computes the forward variable.
himap.base.HSMM._core_u_only – Method that computes the u values.
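The three index conditions named in the description can be spelled out as a small helper. This is only a sketch: `u_index_checks` is a hypothetical name that is not part of himap, and it merely evaluates the conditions. It does not reproduce which value `_curr_u` returns in each case, since this page does not specify that.

```python
def u_index_checks(n_samples, t, d):
    """Evaluate the three conditions from the _curr_u description.

    Hypothetical helper for illustration only; not part of himap.
    """
    return {
        "t_minus_d_non_negative": t - d >= 0,
        "t_less_than_n_samples": t < n_samples,
        "d_at_least_t_minus_last_index": d >= t - (n_samples - 1),
    }


# Early in the sequence a long duration reaches before the first sample.
checks = u_index_checks(n_samples=10, t=3, d=5)
print(checks)
```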
- himap.ab._forward(n_samples, n_states, n_durations, log_startprob, log_transmat, log_durprob, left_censor, right_censor, eta, u, xi)
Computes the forward variable alpha needed for the likelihood computation and the parameter re-estimation. Utilized by the HSMM._core_forward method.
- Parameters:
n_samples (int) – Number of samples.
n_states (int) – Number of states.
n_durations (int) – Number of durations.
log_startprob (np.ndarray) – Array of shape (n_states,) containing the log of the initial state probabilities.
log_transmat (np.ndarray) – Array of shape (n_states, n_states) containing the log of the transition probabilities.
log_durprob (np.ndarray) – Array of shape (n_states, n_durations) containing the log of the duration probabilities.
left_censor (int) – 0 if no left censoring, 1 if left censoring (Default is 0).
right_censor (int) – 0 if no right censoring, 1 if right censoring (Default is 0).
eta (np.ndarray) – Array of shape (n_samples, n_states, n_durations) containing the eta values.
u (np.ndarray) – Array of shape (n_samples, n_states, n_durations) containing the u values as produced by the _u_only auxiliary function.
xi (np.ndarray) – Array of shape (n_samples, n_states, n_states) containing the xi values.
- Returns:
alpha – Array of shape (n_states,) containing the alpha values.
- Return type:
np.ndarray
See also
himap.base.HSMM._core_forward – Method that computes the forward variable.
- himap.ab._backward(n_samples, n_states, n_durations, log_startprob, log_transmat, log_durprob, right_censor, beta, u, betastar)
Computes the backward variable beta needed for the likelihood computation and the parameter re-estimation. Utilized by the HSMM._core_backward method.
- Parameters:
n_samples (int) – Number of samples.
n_states (int) – Number of states.
n_durations (int) – Number of durations.
log_startprob (np.ndarray) – Array of shape (n_states,) containing the log of the initial state probabilities.
log_transmat (np.ndarray) – Array of shape (n_states, n_states) containing the log of the transition probabilities.
log_durprob (np.ndarray) – Array of shape (n_states, n_durations) containing the log of the duration probabilities.
right_censor (int) – 0 if no right censoring, 1 if right censoring (Default is 0).
beta (np.ndarray) – Array of shape (n_samples, n_states) containing the initialized beta values.
u (np.ndarray) – Array of shape (n_samples, n_states, n_durations) containing the u values as produced by the _u_only auxiliary function.
betastar (np.ndarray) – Array of shape (n_samples, n_states) containing the beta* values.
- Return type:
None
See also
himap.base.HSMM._core_backward – Method that computes the backward variable.
Notes
The beta values are computed in place.
- himap.ab._u_only(n_samples, n_states, n_durations, log_obsprob, u)
Computes the u values needed for the forward variable computation. Utilized by the HSMM._core_u_only method.
- Parameters:
n_samples (int) – Number of samples.
n_states (int) – Number of states.
n_durations (int) – Number of durations.
log_obsprob (np.ndarray) – Array of shape (n_samples, n_states) containing the log of the observation probabilities.
u (np.ndarray) – Array of shape (n_samples, n_states, n_durations) containing the u values.
- Return type:
None
See also
himap.base.HSMM._core_u_only – Method that computes the u values.
Notes
The u values are computed in place.
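The in-place convention means the caller allocates u, the function fills it, and nothing is returned. The sketch below illustrates that calling pattern with nested lists instead of np.ndarray. `fill_u` is a hypothetical name, and the recursion inside it is an assumption: this page does not state how the u values are defined, so the sketch assumes u[t][j][d] accumulates the log observation probabilities of state j over the d + 1 steps ending at t.

```python
def fill_u(n_samples, n_states, n_durations, log_obsprob, u):
    """Fill u in place and return None (mirrors the in-place convention).

    ASSUMPTION (not confirmed by the documentation): u[t][j][d] is the sum of
    log_obsprob[.][j] over the last d + 1 time steps ending at t, truncated
    at the first sample.
    """
    for t in range(n_samples):
        for j in range(n_states):
            for d in range(n_durations):
                if t < 1 or d < 1:
                    u[t][j][d] = log_obsprob[t][j]
                else:
                    u[t][j][d] = u[t - 1][j][d - 1] + log_obsprob[t][j]


n_samples, n_states, n_durations = 3, 2, 2
log_obsprob = [[-1.0, -2.0], [-0.5, -1.5], [-0.25, -3.0]]

# The caller owns the buffer; the function only writes into it.
u = [[[0.0] * n_durations for _ in range(n_states)] for _ in range(n_samples)]
result = fill_u(n_samples, n_states, n_durations, log_obsprob, u)
```

After the call, `result` is None and `u` holds the computed values, which is the same contract the Return type and Notes entries describe.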
himap.base module
Classes:
HSMM – Base class for Hidden Semi-Markov Models (HSMMs).
GaussianHSMM – The GaussianHSMM class models Hidden Semi-Markov processes with Gaussian-distributed emissions.
HMM – The HMM class models Hidden Markov processes with discrete emissions.
- class himap.base.HSMM(n_states=2, n_durations=5, n_iter=20, tol=0.01, left_to_right=False, obs_state_len=None, f_value=None, random_state=None, name='', results_parent_path=None)
Bases: object
Base class for Hidden Semi-Markov Models (HSMMs).
- Parameters:
n_states (int) – Number of hidden states. Must be ≥ 2.
n_durations (int) – Number of duration categories per state. Must be ≥ 1.
n_iter (int) – Maximum number of iterations for training.
tol (float) – Convergence threshold for stopping the training.
left_to_right (bool) – Indicates whether the model follows a left-to-right topology.
obs_state_len (int, optional) – Length of the observed state (required if f_value is provided).
f_value (int/float, optional) – Final observed value of the state (required if obs_state_len is provided).
random_state (int/None, optional) – Seed for reproducibility.
name (str, optional) – Name of the model. Defaults to “hsmm” if not provided.
results_parent_path (str, optional) – The path to create the himap_results directory tree where the results (models, figures, dictionaries, performance metrics) are saved
Methods:
_init([X]) – Initializes model parameters if they are not already set.
_init_mc() – Initializes the model parameters for MC sampling (to be implemented in child class).
_require_core() – Ensures that the HiMAP Cython extension is available.
_check() – Validates the initialized parameters.
_dur_init(*args) – Initializes the duration parameters if none are set yet (to be implemented in child class).
_dur_check(*args) – Checks whether the properties of the duration parameters are satisfied (to be implemented in child class).
_dur_probmat(*args) – Computes the probability of each duration per state (to be implemented in child class).
_dur_mstep(*args) – Computes the duration parameters (to be implemented in child class).
_emission_logl(*args) – Computes the log-likelihood of each observation under each state (to be implemented in child class).
_emission_pre_mstep(*args) – Prepares for emission parameter re-estimation (processes gamma and saves the output to emission_var) (to be implemented in child class).
_emission_mstep(*args) – Computes the emission parameters (to be implemented in child class).
_state_sample(*args) – Generates an observation sequence for a given state (to be implemented in child class).
sample([n_samples, random_state]) – Generates a sequence of observations and corresponding state sequence performing a random walk on the model (MC Sampling).
mc_dataset(num, timesteps) – Generates a dataset of a number of observations and corresponding state sequences utilizing the sample method.
_core_u_only(logframe) – Computes auxiliary matrix u for duration probabilities utilizing the ab._u_only method.
_core_forward(u, logdur) – Performs the forward step of the HSMM algorithm using duration and transition probabilities, utilizing the ab._forward method.
_core_backward(u, logdur) – Implements the backward algorithm for the HSMM.
_core_smoothed(beta, betastar, eta, xi) – Combines forward and backward variables to compute the smoothed probabilities.
_core_viterbi(u, logdur) – Implements the Viterbi algorithm for finding the most probable state sequence given the observations.
score(X) – Computes the log-likelihood of the observation sequences under the current model.
predict(X) – Predicts the most likely hidden state sequence for a given observation sequence using the Viterbi algorithm.
fit(X[, save_iters]) – Trains the model using the Expectation-Maximization (EM) algorithm.
bic(train) – Computes the Bayesian Information Criterion (BIC) score to evaluate model performance.
fit_bic(X, states[, return_models]) – Fits multiple models with different numbers of states, evaluates them using the bic method, and selects the best one.
RUL(viterbi_states, max_samples[, equation]) – Estimates the Remaining Useful Life (RUL) for a given state history using convolution of duration probabilities.
prognostics(data[, max_samples, plot_rul, ...]) – Performs prognostics for given degradation histories, estimating RUL utilizing the RUL method and saving the results.
save_model([path]) – Saves the current model state to self.results_path/models/self.name.pkl or to path if provided.
load_model([model_name, path]) – Loads a previously saved model state from self.results_path/models/model_name.pkl or from path if provided.
- _init(X=None)
Initializes model parameters if they are not already set. For left-to-right models, sets the probability of starting in the first state to 1 (pi[0] = 1) and enforces forward transitions. For other topologies, distributes probabilities evenly among states.
- Parameters:
X (dict) – Observation dataset (optional) as a dictionary with trajectory identifiers and observation sequences made with the utils.create_data_hsmm method.
- Return type:
None
See also
himap.utils.create_data_hsmm – Generates a dataset of trajectories for the model.
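The two initialization branches described for _init can be sketched with plain lists. This is an illustration under stated assumptions, not himap's code: `init_start_and_transitions` is a hypothetical helper, "forward transitions" is taken to mean that state i can only move to state i + 1, the final state is assumed to be absorbing, and the even split is assumed to exclude self-transitions.

```python
def init_start_and_transitions(n_states, left_to_right):
    """Build pi and tmat as nested lists (hypothetical helper, not himap's code)."""
    if left_to_right:
        # Start in the first state with probability 1.
        pi = [1.0] + [0.0] * (n_states - 1)
        # ASSUMPTION: state i can only move forward to state i + 1.
        tmat = [[0.0] * n_states for _ in range(n_states)]
        for i in range(n_states - 1):
            tmat[i][i + 1] = 1.0
        # ASSUMPTION: the final state is absorbing, so every row sums to 1.
        tmat[n_states - 1][n_states - 1] = 1.0
    else:
        # Spread the starting probability evenly over all states.
        pi = [1.0 / n_states] * n_states
        # ASSUMPTION: no self-transitions; mass is split over the other states.
        tmat = [
            [0.0 if i == j else 1.0 / (n_states - 1) for j in range(n_states)]
            for i in range(n_states)
        ]
    return pi, tmat


pi_ltr, tmat_ltr = init_start_and_transitions(3, left_to_right=True)
pi_uniform, tmat_uniform = init_start_and_transitions(4, left_to_right=False)
```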
- _init_mc()
Initialize the model parameters for MC sampling (to be implemented in child class).
- _require_core()
Ensures that the HiMAP Cython extension is available.
- Return type:
None
- _check()
Validates the initialized parameters:
Ensures starting probabilities (pi) sum to 1. Checks transition matrix (tmat) shape and sums across rows. Verifies duration probabilities.
- Return type:
None
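The checks listed for _check (starting probabilities sum to 1, transition matrix shape and row sums) could look roughly like the sketch below. `check_parameters` is a hypothetical helper, and both the tolerance and the choice to raise ValueError are assumptions; the page does not say how himap reports a failed check.

```python
import math


def check_parameters(pi, tmat, n_states):
    """Raise ValueError if pi or tmat is inconsistent (hypothetical helper)."""
    if len(pi) != n_states or not math.isclose(sum(pi), 1.0, abs_tol=1e-8):
        raise ValueError("pi must have n_states entries that sum to 1")
    if len(tmat) != n_states or any(len(row) != n_states for row in tmat):
        raise ValueError("tmat must have shape (n_states, n_states)")
    for i, row in enumerate(tmat):
        if not math.isclose(sum(row), 1.0, abs_tol=1e-8):
            raise ValueError(f"row {i} of tmat must sum to 1")


# A consistent two-state parameter set passes silently.
check_parameters([0.5, 0.5], [[0.0, 1.0], [1.0, 0.0]], n_states=2)
```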
- _dur_init(*args)
Initializes the duration parameters if none are set yet (to be implemented in child class).
- _dur_check(*args)
Checks whether the properties of the duration parameters are satisfied (to be implemented in child class).
- _dur_probmat(*args)
Computes the probability of each duration per state (to be implemented in child class).
- _dur_mstep(*args)
Compute the duration parameters (to be implemented in child class).
- _emission_logl(*args)
Compute the log-likelihood of each observation under each state (to be implemented in child class).
- _emission_pre_mstep(*args)
Prepare for emission parameters re-estimation (process gamma and save output to emission_var) (to be implemented in child class).
- _emission_mstep(*args)
Computes the emission parameters (to be implemented in child class).
- _state_sample(*args)
Generates an observation sequence for a given state (to be implemented in child class).
- sample(n_samples=5, random_state=None)
Generates a sequence of observations and corresponding state sequence performing a random walk on the model (MC Sampling).
- Parameters:
n_samples (int) – Number of observations to generate.
random_state (int/None) – Seed for reproducibility.
- Returns:
ctr_sample (int) – Number of samples generated.
X (ndarray) – Generated observation sequence.
state_sequence (ndarray) – State sequence corresponding to the observations.
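A random walk on an explicit-duration model can be sketched as follows: draw a state, draw how long it lasts, emit one observation per step, then transition. This is a generic illustration, not himap's implementation. `sample_hsmm`, `durprob` and `emit` are hypothetical names, and cutting the last sojourn short once n_samples observations exist is an assumption.

```python
import random


def sample_hsmm(pi, tmat, durprob, emit, n_samples, seed=None):
    """Random walk on an explicit-duration model (illustrative sketch).

    durprob[j][d] is the probability that state j lasts d + 1 steps;
    emit(state, rng) draws one observation for that state.
    """
    rng = random.Random(seed)
    states = list(range(len(pi)))
    state = rng.choices(states, weights=pi)[0]
    observations, state_sequence = [], []
    while len(observations) < n_samples:
        durations = range(1, len(durprob[state]) + 1)
        duration = rng.choices(durations, weights=durprob[state])[0]
        for _ in range(duration):
            if len(observations) == n_samples:
                break  # ASSUMPTION: truncate the final sojourn
            observations.append(emit(state, rng))
            state_sequence.append(state)
        state = rng.choices(states, weights=tmat[state])[0]
    return len(observations), observations, state_sequence


# Deterministic toy model: state 0 lasts 2 steps, state 1 lasts 1 step.
count, X, seq = sample_hsmm(
    pi=[1.0, 0.0],
    tmat=[[0.0, 1.0], [1.0, 0.0]],
    durprob=[[0.0, 1.0], [1.0, 0.0]],
    emit=lambda state, rng: rng.gauss(10.0 * state, 1.0),
    n_samples=5,
    seed=0,
)
```

The return order (count, observations, states) follows the Returns entry above.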
- mc_dataset(num, timesteps)
Generates a dataset of a number of observations and corresponding state sequences utilizing the sample method.
- Parameters:
num (int) – Number of samples to generate.
timesteps (int) – Number of maximum timesteps for each sample.
- Returns:
obs (dict[str, List[int]]) – A dictionary with trajectory observations.
states (dict[str, List[int]]) – A dictionary with the corresponding states for each trajectory.
See also
HSMM.sample – Generates a sequence of observations and corresponding state sequence performing a random walk on the model (MC Sampling).
- _core_u_only(logframe)
Computes auxiliary matrix u for duration probabilities utilizing the ab._u_only method.
- Parameters:
logframe (ndarray) – A 2D array of log-likelihood values for each observation under each state. Shape: (n_samples, n_states).
- Returns:
u – A 3D array of intermediate values computed for each sample, state, and duration. Shape: (n_samples, n_states, n_durations).
- Return type:
ndarray
See also
himap.ab._u_only – Computes the auxiliary matrix u for duration probabilities.
- _core_forward(u, logdur)
Performs the forward step of the HSMM algorithm using duration and transition probabilities, utilizing the ab._forward method.
- Parameters:
u (ndarray) – Intermediate values computed from _core_u_only. Shape: (n_samples, n_states, n_durations).
logdur (ndarray) – Logarithm of the duration probabilities for each state. Shape: (n_states, n_durations).
- Returns:
eta (ndarray) – Smoothed probabilities for states and durations at each sample. Shape: (n_samples + 1, n_states, n_durations).
xi (ndarray) – Transition probabilities between states at each step. Shape: (n_samples + 1, n_states, n_states).
alpha (ndarray) – Forward probabilities for each state at each sample. Shape: (n_samples, n_states).
See also
himap.ab._forward – Performs the forward step of the HSMM algorithm.
- _core_backward(u, logdur)
Implements the backward algorithm for the HSMM. Computes backward probabilities and intermediate variables for scaling. Utilizes the ab._backward method.
- Parameters:
u (ndarray) – Scaled forward probabilities from _core_u_only.
logdur (ndarray) – Logarithmic duration probability matrix.
- Returns:
beta (ndarray) – Backward probabilities for each state.
betastar (ndarray) – Scaled backward probabilities.
See also
himap.ab._backward – Implements the backward algorithm for the HSMM.
- _core_smoothed(beta, betastar, eta, xi)
Combines forward and backward variables to compute the smoothed probabilities. Implemented in Cython.
- Parameters:
beta (ndarray) – Backward probabilities for each state.
betastar (ndarray) – Scaled backward probabilities.
eta (ndarray) – Transition probabilities.
xi (ndarray) – Joint probabilities of transitions.
- Returns:
gamma – Smoothed probabilities.
- Return type:
ndarray
- _core_viterbi(u, logdur)
Implements the Viterbi algorithm for finding the most probable state sequence given the observations.
- Parameters:
u (ndarray) – Scaled forward probabilities from _core_u_only.
logdur (ndarray) – Logarithmic duration probability matrix.
- Returns:
state_sequence (ndarray) – The most probable sequence of states.
state_logl (float) – Log-likelihood of the state sequence.
- score(X)
Computes the log-likelihood of the observation sequences under the current model.
- Parameters:
X (ndarray) – Observation sequences.
- Returns:
score – Total log-likelihood of the observations.
- Return type:
float
- predict(X)
Predicts the most likely hidden state sequence for a given observation sequence using the Viterbi algorithm.
- Parameters:
X (ndarray) – Observation sequences.
- Returns:
state_sequence (ndarray) – Predicted state sequence.
state_logl (float) – Log-likelihood of the predicted state sequence.
- fit(X, save_iters=False)
Trains the model using the Expectation-Maximization (EM) algorithm.
- Parameters:
X (dict) – Observation sequences following the format of the utils.create_data_hsmm method.
save_iters (bool, optional) – Whether to save the model after each iteration. Defaults to False.
- Returns:
self – The trained model.
- Return type:
object
See also
himap.utils.create_data_hsmm – Generates a dataset of trajectories for the model.
- bic(train)
Computes the Bayesian Information Criterion (BIC) score to evaluate model performance.
- Parameters:
train (dict) – Observation sequences used for training.
- Returns:
score – The BIC score for the model.
- Return type:
float
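For orientation, the textbook BIC is k·ln(n) − 2·ln(L), where k is the number of free parameters, n the number of observations and ln(L) the log-likelihood. The sketch below uses that definition; whether himap uses the same sign convention or counts parameters the same way is not stated on this page, and the candidate values are made up for illustration.

```python
import math


def bic_score(log_likelihood, n_params, n_observations):
    """Textbook BIC: k * ln(n) - 2 * ln(L); lower is better in this convention."""
    return n_params * math.log(n_observations) - 2.0 * log_likelihood


# Hypothetical candidates: n_states -> (log-likelihood, number of parameters).
candidates = {2: (-150.0, 8), 3: (-120.0, 15), 4: (-118.0, 24)}
scores = {
    n_states: bic_score(logl, k, n_observations=200)
    for n_states, (logl, k) in candidates.items()
}
best_n_states = min(scores, key=scores.get)
```

With these made-up numbers the three-state candidate wins: the fourth state buys only a small likelihood gain that does not offset the parameter penalty.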
- fit_bic(X, states, return_models=False)
Fits multiple models with different numbers of states, evaluates them using the bic method, and selects the best one.
- Parameters:
X (dict) – Observation sequences (same format as fit).
states (list[int]) – List of state counts to evaluate.
return_models (bool, optional) – Whether to return all trained models. Defaults to False.
- Returns:
self (object) – The best-performing model.
bic (list[float]) – BIC scores for each fitted model.
models (dict, optional) – All trained models, returned if return_models=True.
See also
himap.utils.create_data_hsmm – Generates a dataset of trajectories for the model.
HSMM.bic – Computes the Bayesian Information Criterion (BIC) score to evaluate model performance.
HSMM.fit – Trains the model using the Expectation-Maximization (EM) algorithm.
- RUL(viterbi_states, max_samples, equation=1)
Estimates the Remaining Useful Life (RUL) for a given state history using convolution of duration probabilities.
- Parameters:
viterbi_states (numpy.ndarray) – Sequence of Viterbi states representing the history of hidden states.
max_samples (int) – Maximum length of RUL to consider.
equation (int, optional) – Equation type for RUL estimation. Default is 1.
- Returns:
RUL (numpy.ndarray) – RUL probability distribution for each timestep.
mean_RUL (numpy.ndarray) – Mean RUL for each timestep.
UB_RUL (numpy.ndarray) – Upper bound of the RUL distribution.
LB_RUL (numpy.ndarray) – Lower bound of the RUL distribution.
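The phrase "convolution of duration probabilities" can be illustrated with a generic fact: the distribution of a sum of independent sojourn times is the convolution of their individual distributions. The sketch below shows only that idea. `convolve` and `remaining_life_pmf` are hypothetical helpers, and treating the RUL as the sum of the sojourn times of the states still to be visited is an assumption about how the equation is built.

```python
def convolve(p, q):
    """Discrete convolution of two probability mass functions given as lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            out[i + j] += p_i * q_j
    return out


def remaining_life_pmf(sojourn_pmfs):
    """PMF of the total time left, given one sojourn PMF per remaining state.

    Each PMF is indexed from 0 time steps, so pmf[k] = P(sojourn == k).
    """
    total = [1.0]  # a sum of zero terms puts all mass at 0
    for pmf in sojourn_pmfs:
        total = convolve(total, pmf)
    return total


# One state lasting 1 or 2 steps with equal odds, then one lasting exactly 1.
rul_pmf = remaining_life_pmf([[0.0, 0.5, 0.5], [0.0, 1.0]])
mean_rul = sum(k * p for k, p in enumerate(rul_pmf))
```

Here the remaining life is 2 or 3 steps with equal probability, so the mean is 2.5.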
- prognostics(data, max_samples=None, plot_rul=True, get_metrics=True, equation=1, return_results=False)
Performs prognostics for given degradation histories, estimating RUL utilizing the RUL method and saving the results.
- Parameters:
data (dict) – A dictionary where keys are trajectory IDs and values are degradation histories following the format of the utils.create_data_hsmm method.
max_samples (int, optional) – Maximum length of RUL. Defaults to 10x the maximum trajectory length.
plot_rul (bool, optional) – Whether to plot RUL results for each sample. Default is True.
get_metrics (bool, optional) – Whether to compute and save evaluation metrics. Default is True.
equation (int, optional) – Equation type for RUL estimation. Default is 1.
return_results (bool, optional) – Whether to return the results: mean_rul_per_step, pdf_ruls_all, upper_rul_per_step, lower_rul_per_step (default is False).
- Returns:
None if return_results is False
mean_rul_per_step (dict (Optional)) – A dictionary containing the mean_RUL ndarray per trajectory.
pdf_ruls_all (dict (Optional)) – A dictionary containing the RUL ndarray per trajectory.
upper_rul_per_step (dict (Optional)) – A dictionary containing the upper_rul_per_step ndarray per trajectory.
lower_rul_per_step (dict (Optional)) – A dictionary containing the lower_rul_per_step ndarray per trajectory.
Notes
Saves the following in the ‘results’ directory:
PDF RUL distributions.
Mean RUL per step.
Upper and lower RUL bounds.
Evaluation metrics (if get_metrics=True).
RUL plots (if plot_rul=True).
See also
HSMM.RUL – Estimates the Remaining Useful Life (RUL) for a given state history using convolution of duration probabilities.
himap.utils.create_data_hsmm – Generates a dataset of trajectories for the model.
- save_model(path=None)
Saves the current model state to self.results_path/models/self.name.pkl or to path if provided.
- Parameters:
path (str (optional)) – The path to save the model (overrides the default path).
- Return type:
None
- load_model(model_name=None, path=None)
Loads a previously saved model state from self.results_path/models/model_name.pkl or from path if provided.
- Parameters:
model_name (str (optional)) – Name of the model file to load (without extension). Not needed if path is provided.
path (str (optional)) – The path to load the model from (overrides the default path).
- Return type:
None
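The .pkl extension suggests that model state is serialized with Python's pickle module, although this page does not confirm it. Under that assumption, a save and load round trip looks roughly like the sketch below. The dictionary stands in for a model's state, and `save_state` and `load_state` are hypothetical helpers rather than himap functions.

```python
import os
import pickle
import tempfile


def save_state(state, path):
    """Write a picklable object to path."""
    with open(path, "wb") as handle:
        pickle.dump(state, handle)


def load_state(path):
    """Read an object back; only unpickle files from a trusted source."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Stand-in for a model's state (hypothetical values).
state = {"name": "demo", "n_states": 3, "pi": [1.0, 0.0, 0.0]}

with tempfile.TemporaryDirectory() as results_path:
    models_dir = os.path.join(results_path, "models")
    os.makedirs(models_dir)
    path = os.path.join(models_dir, state["name"] + ".pkl")
    save_state(state, path)
    restored = load_state(path)
```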
- class himap.base.GaussianHSMM(n_states=2, n_durations=5, n_iter=100, tol=0.5, left_to_right=True, obs_state_len=None, f_value=None, random_state=None, name='', results_parent_path=None, kmeans_init='k-means++', kmeans_n_init='auto')
Bases: HSMM
The GaussianHSMM class models Hidden Semi-Markov processes with Gaussian-distributed emissions. It supports explicit duration modeling, and it can handle left-to-right or arbitrary state transitions. K-means clustering is used for initialization.
- Parameters:
n_states (int) – Number of hidden states in the model. Default is 2.
n_durations (int) – Maximum duration for each state. Default is 5.
n_iter (int) – Maximum number of iterations for model fitting. Default is 100.
tol (float) – Convergence threshold for the EM algorithm. Default is 0.5.
left_to_right (bool) – If True, constrains transitions to progress in a left-to-right manner. Default is True for prognostics.
obs_state_len (int, optional) – Length of observed state (relevant in specific configurations).
f_value (float, optional) – Emission value for the final state, if applicable.
random_state (int or RandomState instance, optional) – Seed or random state for reproducibility.
name (str) – Name identifier for the model.
kmeans_init (str) – Initialization method for K-means clustering (‘k-means++’ or ‘random’). Default is ‘k-means++’.
kmeans_n_init (int or str) – Number of initializations for K-means clustering. Default is ‘auto’.
results_parent_path (str, optional) – The path to create the himap_results directory tree where the results (models, figures, dictionaries, performance metrics) are saved
Methods:
_init([X]) – Initializes model parameters based on input data X.
_init_mc() – Initializes model parameters for the Monte Carlo Sampling example.
_check() – Performs validation checks to ensure model parameters are consistent.
_dur_init() – Initializes the duration probability matrix self.dur.
_dur_check() – Validates the duration probability matrix self.dur.
_dur_probmat() – Returns the duration probability matrix self.dur.
_dur_mstep(new_dur) – Performs the M-step update for the duration probabilities.
_emission_logl(X) – Calculates the log-likelihood of the emissions given the observations.
_emission_mstep(X, emission_var[, inplace]) – Performs the M-step update for emission parameters.
_state_sample(state[, random_state]) – Generates a sample from the Gaussian distribution of a specified state.
- _init(X=None)
Initializes model parameters based on input data X.
- Parameters:
X (numpy.ndarray, optional) – Observations to initialize the model. If None, defaults to 1D Gaussian emissions.
- Return type:
None
- _init_mc()
Initializes model parameters for the Monte Carlo Sampling example.
- Return type:
None
- _check()
Performs validation checks to ensure model parameters are consistent.
- Return type:
None
- _dur_init()
Initializes the duration probability matrix self.dur.
- Return type:
None
- _dur_check()
Validates the duration probability matrix self.dur.
- Return type:
None
- _dur_probmat()
Returns the duration probability matrix self.dur (no changes for non-parametric duration distributions).
- _dur_mstep(new_dur)
Performs the M-step update for the duration probabilities (no changes for non-parametric duration distributions).
- Parameters:
new_dur (numpy.ndarray) – Updated duration probabilities.
- Return type:
None
- _emission_logl(X)
Calculates the log-likelihood of the emissions given the observations.
- Parameters:
X (numpy.ndarray) – Observations.
- Returns:
logframe – Log-likelihood of each observation under each state.
- Return type:
numpy.ndarray
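For one-dimensional Gaussian emissions, the log-likelihood of each observation under each state is the log of the normal density. The sketch below shows that calculation with plain lists. `gaussian_logframe` is a hypothetical helper limited to the univariate case, whereas the class itself may work with covariance matrices.

```python
import math


def gaussian_logframe(X, means, variances):
    """Return logframe[t][j] = log N(X[t]; means[j], variances[j]) for 1-D data."""
    logframe = []
    for x in X:
        row = []
        for mu, var in zip(means, variances):
            row.append(-0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var))
        logframe.append(row)
    return logframe


# Two observations scored under two states; the result has shape (2, 2).
logframe = gaussian_logframe([0.0, 4.0], means=[0.0, 5.0], variances=[1.0, 1.0])
```

Each observation scores highest under the state whose mean is closest to it.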
- _emission_mstep(X, emission_var, inplace=True)
Performs the M-step update for emission parameters.
- Parameters:
X (numpy.ndarray) – Observations.
emission_var (numpy.ndarray) – Responsibilities or posteriors for each observation-state pair.
inplace (bool, optional) – If True, updates parameters in-place. If False, returns updated parameters.
- Returns:
mean (numpy.ndarray, optional) – Updated means for each state (if inplace=False).
covmat (numpy.ndarray, optional) – Updated covariance matrices for each state (if inplace=False).
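The standard M-step for Gaussian emissions re-estimates each state's mean and variance as responsibility-weighted averages. The sketch below shows the univariate version with plain lists. `weighted_gaussian_update` is a hypothetical helper, and any regularization or minimum-variance floor that himap may apply is not shown.

```python
def weighted_gaussian_update(X, resp):
    """Return per-state means and variances from responsibilities resp[t][j]."""
    n_states = len(resp[0])
    means, variances = [], []
    for j in range(n_states):
        weight = sum(r[j] for r in resp)
        mean = sum(r[j] * x for r, x in zip(resp, X)) / weight
        var = sum(r[j] * (x - mean) ** 2 for r, x in zip(resp, X)) / weight
        means.append(mean)
        variances.append(var)
    return means, variances


# Hard assignments: the first two points belong to state 0, the rest to state 1.
X = [0.0, 2.0, 10.0, 12.0]
resp = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
means, variances = weighted_gaussian_update(X, resp)
```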
- _state_sample(state, random_state=None)
Generates a sample from the Gaussian distribution of a specified state.
- Parameters:
state (int) – Index of the state to sample from.
random_state (int or RandomState, optional) – Random seed or state for reproducibility.
- Returns:
sample – Sampled observation.
- Return type:
numpy.ndarray
- class himap.base.HMM(n_states=2, n_obs_symbols=30, n_iter=100, tol=0.01, left_to_right=True, name='', results_parent_path=None)
Bases: object
The HMM class models Hidden Markov processes with discrete emissions.
- Parameters:
n_states (int) – Number of hidden states in the model. Must be ≥ 2.
n_obs_symbols (int) – Number of observation symbols.
n_iter (int) – Maximum number of iterations for training. Default is 100.
tol (float) – Tolerance for convergence during training. Default is 1e-2.
left_to_right (bool) – Whether the HMM uses a left-to-right structure. Default is True for use in prognostics.
name (str) – Name of the model. Default is “hmm” if no name is provided.
results_parent_path (str, optional) – The path to create the himap_results directory tree where the results (models, figures, dictionaries, performance metrics) are saved
Methods:
_init([X]) – Initializes transition and emission matrices based on model structure (left_to_right).
_init_mc() – Initializes the model parameters for the Monte Carlo Sampling example.
fit(X[, return_all_scores, save_iters]) – Trains the HMM using the Baum-Welch algorithm.
fit_bic(X, states[, return_models]) – Fits multiple HMMs using the Bayesian Information Criterion (BIC) to select the best model.
decode(history, calc_emi, calc_tr) – Computes forward (fs) and backward (bs) probabilities for a given sequence.
sample() – Generates a sequence of observations and corresponding state sequences performing a random walk on the model.
mc_dataset(n_samples) – Generates a dataset of a number of observations and corresponding state sequences utilizing the sample method.
predict(history[, return_score]) – Predicts the most likely state sequence for a given observation sequence using the Viterbi algorithm.
estimate(history, estimatedStates[, ...]) – Estimates transition and emission matrices based on observed sequences and states.
RUL(estimatedStates, max_samples[, confidence]) – Estimates the remaining useful life of a system based on state sequence.
prognostics(data[, max_samples, plot_rul, ...]) – Performs prognostics utilizing the RUL method and evaluates model performance.
save_model([path]) – Saves the current model state to self.results_path/models/self.name.pkl or to path if provided.
load_model([model_name, path]) – Loads a previously saved model state from self.results_path/models/model_name.pkl or from path if provided.
- _init(X=None)
Initializes transition and emission matrices based on model structure (left_to_right).
- Parameters:
X (dict) – Dataset of trajectories for determining the maximum sequence length following the format of utils.create_data_hsmm. The default is None.
- Return type:
None
See also
himap.utils.create_data_hsmm – Generates a dataset of trajectories for the model.
- _init_mc()
Initializes the model parameters for the Monte Carlo Sampling example.
- Return type:
None
- fit(X, return_all_scores=False, save_iters=False)
Trains the HMM using the Baum-Welch algorithm.
- Parameters:
X (dict) – Observations organized as { “traj_<index>”: [sequence] } following the format of utils.create_data_hsmm.
return_all_scores (bool, optional) – If True, returns log-likelihood scores for all iterations, default is False.
save_iters (bool, optional) – If True, saves the model at each iteration, default is False.
- Returns:
hmm (object) – Trained HMM instance.
score_per_iter (list, optional) – Log-likelihood scores for each iteration (if return_all_scores=True).
See also
himap.utils.create_data_hsmm – Generates a dataset of trajectories for the model.
- fit_bic(X, states, return_models=False)
Fits multiple HMMs using the Bayesian Information Criterion (BIC) to select the best model.
- Parameters:
X (dict) – Observation dataset.
states (list) – List of candidate numbers of states.
return_models (bool, optional) – If True, returns all trained models and BIC scores (default is False).
- Returns:
hmm (object) – Best HMM model based on BIC.
bic (list) – BIC scores for each candidate model.
models (dict, optional) – All trained models and BIC scores (if return_models=True).
See also
HMM.fit – Fits the HMM using the Baum-Welch algorithm.
- decode(history, calc_emi, calc_tr)
Computes forward (fs) and backward (bs) probabilities for a given sequence.
- Parameters:
history (list) – Observation sequence.
calc_emi (array) – Current emission matrix.
calc_tr (array) – Current transition matrix.
- Returns:
pStates (numpy.ndarray) – Posterior probabilities for states.
pSeq (float) – Log-probability of the sequence.
fs (numpy.ndarray) – Forward probabilities.
bs (numpy.ndarray) – Backward probabilities.
s (numpy.ndarray) – Scaling factors.
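The returned quantities match those of a scaled forward-backward pass, which can be sketched for a discrete HMM as follows. This is a generic textbook version, not himap's code: `forward_backward` is a hypothetical name, and the explicit `start` argument is an assumption, since the documented signature takes only the history and the two matrices.

```python
import math


def forward_backward(history, emi, tr, start):
    """Scaled forward-backward pass for a discrete HMM (illustrative sketch).

    history: list of symbol indices; emi[j][k] = P(symbol k | state j);
    tr[i][j] = P(next state j | state i); start[j] = P(first state is j).
    """
    n, T = len(tr), len(history)
    fs = [[0.0] * n for _ in range(T)]
    bs = [[1.0] * n for _ in range(T)]
    s = [0.0] * T
    for t, obs in enumerate(history):
        for j in range(n):
            if t == 0:
                prior = start[j]
            else:
                prior = sum(fs[t - 1][i] * tr[i][j] for i in range(n))
            fs[t][j] = prior * emi[j][obs]
        s[t] = sum(fs[t])  # scaling factor keeps fs[t] normalized
        fs[t] = [f / s[t] for f in fs[t]]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            bs[t][i] = sum(
                tr[i][j] * emi[j][history[t + 1]] * bs[t + 1][j] for j in range(n)
            ) / s[t + 1]
    p_states = [[fs[t][j] * bs[t][j] for j in range(n)] for t in range(T)]
    p_seq = sum(math.log(c) for c in s)  # log-probability of the sequence
    return p_states, p_seq, fs, bs, s


tr = [[0.9, 0.1], [0.0, 1.0]]
emi = [[0.8, 0.2], [0.1, 0.9]]
start = [1.0, 0.0]
history = [0, 0, 1, 1]
p_states, p_seq, fs, bs, s = forward_backward(history, emi, tr, start)
```

With this scaling, each row of `p_states` sums to 1 and the sum of the log scaling factors gives the sequence log-probability.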
- sample()
Generates a sequence of observations and corresponding state sequences performing a random walk on the model.
- Returns:
history (list) – A list containing the generated sequence of observations, where each observation corresponds to a state in the sequence.
states (list) – A list containing the sequence of states visited during the process, where each state is represented by its index.
- mc_dataset(n_samples)
Generates a dataset of a number of observations and corresponding state sequences utilizing the sample method.
- Parameters:
n_samples (int) – Number of sequences to generate.
- Returns:
obs (dict) – Generated observation sequences.
states_all (dict) – Corresponding state sequences.
See also
HMM.sample – Generates a sequence of observations and corresponding state sequences.
- predict(history, return_score=False)
Predicts the most likely state sequence for a given observation sequence using the Viterbi algorithm.
- Parameters:
history (list) – Observation sequence.
return_score (bool, optional) – If True, returns the log-probability of the best state sequence (default is False).
- Returns:
currentState (numpy.ndarray) – Most likely state sequence.
logP (float, optional) – Log-probability of the predicted sequence (if return_score=True).
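The Viterbi algorithm named above can be sketched in log space for a discrete HMM. This is a generic version for illustration only. `viterbi` is a hypothetical standalone function, and the `emi`, `tr` and `start` arguments are assumptions, since the documented method reads the model's own parameters.

```python
import math


def viterbi(history, emi, tr, start):
    """Most likely state path and its log-probability for a discrete HMM."""

    def log(p):
        return math.log(p) if p > 0 else -math.inf

    n = len(tr)
    score = [log(start[j]) + log(emi[j][history[0]]) for j in range(n)]
    back = []  # back[t][j] = best predecessor of state j at step t + 1
    for obs in history[1:]:
        pointers, new_score = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + log(tr[i][j]))
            pointers.append(best_i)
            new_score.append(score[best_i] + log(tr[best_i][j]) + log(emi[j][obs]))
        back.append(pointers)
        score = new_score
    last = max(range(n), key=lambda j: score[j])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]


tr = [[0.9, 0.1], [0.0, 1.0]]
emi = [[0.8, 0.2], [0.1, 0.9]]
best_path, log_p = viterbi([0, 0, 1, 1], emi, tr, start=[1.0, 0.0])
```

For this toy left-to-right model the best path stays in state 0 for two steps and then moves to state 1.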
- estimate(history, estimatedStates, return_matrices=False)
Estimates transition and emission matrices based on observed sequences and states.
- Parameters:
history (list) – Observation sequence.
estimatedStates (list) – Corresponding state sequence.
return_matrices (bool, optional) – If True, returns the matrices instead of updating the model (default is False).
- Returns:
tr (numpy.ndarray, optional) – Updated transition matrix (if return_matrices=True).
emi (numpy.ndarray, optional) – Updated emission matrix (if return_matrices=True).
hmm (object) – Updated HMM instance.
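When both the observations and the states are known, transition and emission matrices can be estimated by counting and normalizing each row. The sketch below shows that generic approach. `estimate_matrices` is a hypothetical helper, and details such as pseudocounts or how himap handles unvisited states are not covered by this page.

```python
def estimate_matrices(history, states, n_states, n_symbols):
    """Row-normalized transition and emission counts (illustrative sketch)."""
    tr = [[0.0] * n_states for _ in range(n_states)]
    emi = [[0.0] * n_symbols for _ in range(n_states)]
    for prev, cur in zip(states, states[1:]):
        tr[prev][cur] += 1.0
    for state, symbol in zip(states, history):
        emi[state][symbol] += 1.0

    def normalize(rows):
        # Rows with no counts are left as zeros rather than divided by zero.
        return [
            [v / sum(row) if sum(row) > 0 else 0.0 for v in row] for row in rows
        ]

    return normalize(tr), normalize(emi)


tr_hat, emi_hat = estimate_matrices(
    history=[0, 1, 1, 1], states=[0, 0, 1, 1], n_states=2, n_symbols=2
)
```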
- RUL(estimatedStates, max_samples, confidence=0.95)
Estimates the remaining useful life of a system based on state sequence.
- Parameters:
estimatedStates (list) – Sequence of estimated states.
max_samples (int) – Maximum number of timesteps for RUL estimation.
confidence (float) – Confidence level for bounds.
- Returns:
rul_mean (list) – Mean RUL estimates.
rul_upper_bound (list) – Upper confidence bounds.
rul_lower_bound (list) – Lower confidence bounds.
rul_matrix (numpy.ndarray) – RUL probability distributions.
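Given a RUL probability distribution for one timestep, a mean and a pair of confidence bounds can be read off as shown below. This is a sketch under an assumption: the bounds are taken as the symmetric quantiles of the distribution, which this page does not confirm. `rul_summary` is a hypothetical helper.

```python
def rul_summary(pmf, confidence=0.95):
    """Mean and two-sided bounds of a RUL PMF where pmf[k] = P(RUL == k)."""
    mean = sum(k * p for k, p in enumerate(pmf))
    lower_q = (1.0 - confidence) / 2.0
    upper_q = 1.0 - lower_q
    cumulative, lower, upper = 0.0, None, None
    for k, p in enumerate(pmf):
        cumulative += p
        # ASSUMPTION: bounds are the first indices whose CDF reaches each quantile.
        if lower is None and cumulative >= lower_q:
            lower = k
        if upper is None and cumulative >= upper_q:
            upper = k
    return mean, lower, upper


rul_mean, rul_lower, rul_upper = rul_summary(
    [0.0, 0.1, 0.2, 0.4, 0.2, 0.1], confidence=0.9
)
```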
- prognostics(data, max_samples=None, plot_rul=True, get_metrics=True, return_results=False)
Performs prognostics utilizing the RUL method and evaluates model performance.
- Parameters:
data (dict) – Observation data for multiple trajectories following the format of utils.create_data_hsmm.
max_samples (int, optional) – Maximum timesteps for RUL. If None (the default), 10× the maximum sequence length is used.
plot_rul (bool, optional) – If True, saves RUL plots (default is True).
get_metrics (bool, optional) – If True, evaluates RUL predictions with metrics (default is True).
return_results (bool, optional) – Whether to return the results: mean_rul_per_step, pdf_ruls_all, upper_rul_per_step, lower_rul_per_step (default is False).
- Returns:
None if return_results is False
rul_mean_all (dict (Optional)) – A dictionary containing the rul_mean ndarray per trajectory.
pdf_ruls_all (dict (Optional)) – A dictionary containing the rul_matrix ndarray per trajectory.
rul_upper_bound_all (dict (Optional)) – A dictionary containing the rul_upper_bound ndarray per trajectory.
rul_lower_bound_all (dict (Optional)) – A dictionary containing the rul_lower_bound ndarray per trajectory.
See also
HMM.RUL – Estimates the remaining useful life of a system based on state sequence.
himap.utils.create_data_hsmm – Generates a dataset of trajectories for the model.
- save_model(path=None)
Saves the current model state to self.results_path/models/self.name.pkl or to path if provided.
- Parameters:
path (str (optional)) – The path to save the model (overrides the default path).
- Return type:
None
- load_model(model_name=None, path=None)
Loads a previously saved model state from self.results_path/models/model_name.pkl or from path if provided.
- Parameters:
model_name (str (optional)) – Name of the model file to load (without extension). Not needed if path is provided.
path (str (optional)) – The path to load the model from (overrides the default path).
- Return type:
None
himap.main module
Functions:
| run_process | Run the process for the selected model |
| himap_main | Main function for running the HMM models |
- himap.main.run_process(args)
Run the process for the selected model
- Parameters:
args (argparse.Namespace) –
Arguments for the process. Expected attributes are:
hsmm (bool): Flag to indicate if HSMM model should be used.
mc_sampling (bool): Flag to indicate if Monte Carlo sampling should be used.
bic_fit (bool): Flag to indicate if BIC fitting should be performed.
save (bool): Flag to indicate if the model should be saved.
metrics (bool): Flag to indicate if metrics should be calculated.
enable_visuals (bool): Flag to indicate if visualizations should be enabled.
num_histories (int): Number of histories for Monte Carlo sampling.
n_states (int): Number of states for the HMM/HSMM model.
- Return type:
None
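As a rough illustration, the expected attributes can be bundled into an argparse.Namespace by hand. The values below are arbitrary placeholders rather than recommended settings, and the final call is left commented out because it assumes himap is installed.

```python
import argparse

# Build a namespace carrying every attribute listed above.
# All values are illustrative placeholders.
args = argparse.Namespace(
    hsmm=True,
    mc_sampling=True,
    bic_fit=False,
    save=False,
    metrics=True,
    enable_visuals=False,
    num_histories=10,
    n_states=4,
)

# With himap installed, the namespace would be handed over like this:
# from himap.main import run_process
# run_process(args)
print(sorted(vars(args)))
```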
- himap.main.himap_main(hsmm, mc_sampling, bic_fit, save, metrics, enable_visuals, num_histories, n_states)
Main function for running the HMM models
- Parameters:
hsmm (bool) – If True use Hidden Semi-Markov Model. If False use Hidden Markov Model.
mc_sampling (bool) – If True use Monte-Carlo Sampling as case example. If False use CMAPSS data.
bic_fit (bool) – If True enable Bayesian Information Criterion fitting for Markov Models.
save (bool) – If True enable saving of the fitted models.
metrics (bool) – If True enable calculation of performance metrics for RUL prediction.
enable_visuals (bool) – If True enable generating and saving figures.
num_histories (int) – The number of generated histories via Monte Carlo Sampling. It is only used if mc_sampling is True.
n_states (int) – The number of hidden states for Markov Model.
- Return type:
None
himap.plot module
Functions:
| plot_multiple_observ | Plot multiple degradation histories from MC sampling. |
| plot_ruls | Plot RUL prediction with confidence intervals vs true RUL. |
- himap.plot.plot_multiple_observ(obs, states, num2plot)
Plot multiple degradation histories from MC sampling.
- Parameters:
obs (dict) – Dictionary containing all observations.
states (dict) – Dictionary containing all states of the corresponding observations.
num2plot (int) – Number of histories to plot.
- Return type:
None
Notes
The figure is saved at ‘/path/to/current/directory/results/figures/mc_traj.png’.
- himap.plot.plot_ruls(rul_mean, rul_upper, rul_lower, fig_path)
Plot RUL prediction with confidence intervals vs true RUL.
- Parameters:
rul_mean (list) – Mean RUL predictions.
rul_upper (list) – Upper bound of the confidence interval.
rul_lower (list) – Lower bound of the confidence interval.
fig_path (str) – Path to save the figure.
- Return type:
None
Notes
The figure is saved at ‘/path/to/current/directory/results/figures/’.
himap.utils module
Classes:
| NumpyArrayEncoder | Custom JSON encoder to handle numpy.ndarray and numpy.integer objects for serialization. |
Functions:
| str2bool | |
| create_data_hsmm | Creates a dictionary of trajectories for input into the HSMM model. |
| load_data_cmapss | Loads the C-MAPSS dataset and prepares it for input into the HSMM model. |
| log_mask_zero | Applies the log function to an array, masking zero values. |
| get_single_history_states | Returns the history states for a single trajectory. |
| get_viterbi | Applies the Viterbi algorithm to predict the most probable states for each trajectory in data using the HSMM. |
| fix_input_data | Prepares trajectory data for input into the HSMM model by appending f_value and adjusting indexing if needed. |
| get_rmse | Computes the Root Mean Square Error (RMSE) between predicted Remaining Useful Life (RUL) and true RUL. |
| get_coverage | Calculates the coverage of true RUL values within the predicted upper and lower bounds. |
| calculate_area_weighted_by_time | Calculates the area under the curve weighted by time for the given x and y values. |
| get_wsu | Computes the Weighted Spread Uncertainty (WSU) between the upper and lower bounds. |
| evaluate_test_set | Evaluates the test set by calculating RMSE, coverage, and WSU. |
| baumwelch_method | Implements the Baum-Welch algorithm for parameter estimation in Hidden Markov Models (HMM). |
| fs_calculation | Computes the forward probabilities (fs) for a given sequence using the emission and transition matrices. |
| bs_calculation | Computes the backward probabilities (bs) for a given sequence using the emission and transition matrices. |
| calculate_expected_value | Calculates the expected value of a probability mass function (PMF). |
| calculate_cdf | Calculates the cumulative distribution function (CDF) and percentile values for a given probability mass function (PMF). |
| create_folders | Create a directory structure for storing results. |
- class himap.utils.NumpyArrayEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)
Bases: JSONEncoder
Custom JSON encoder to handle numpy.ndarray and numpy.integer objects for serialization.
Constructor for JSONEncoder, with sensible defaults.
If skipkeys is false, then it is a TypeError to attempt encoding of keys that are not str, int, float or None. If skipkeys is True, such items are simply skipped.
If ensure_ascii is true, the output is guaranteed to be str objects with all incoming non-ASCII characters escaped. If ensure_ascii is false, the output can contain non-ASCII characters.
If check_circular is true, then lists, dicts, and custom encoded objects will be checked for circular references during encoding to prevent an infinite recursion (which would cause a RecursionError). Otherwise, no such check takes place.
If allow_nan is true, then NaN, Infinity, and -Infinity will be encoded as such. This behavior is not JSON specification compliant, but is consistent with most JavaScript based encoders and decoders. Otherwise, it will be a ValueError to encode such floats.
If sort_keys is true, then the output of dictionaries will be sorted by key; this is useful for regression tests to ensure that JSON serializations can be compared on a day-to-day basis.
If indent is a non-negative integer, then JSON array elements and object members will be pretty-printed with that indent level. An indent level of 0 will only insert newlines. None is the most compact representation.
If specified, separators should be an (item_separator, key_separator) tuple. The default is (', ', ': ') if indent is None and (',', ': ') otherwise. To get the most compact JSON representation, you should specify (',', ':') to eliminate whitespace.
If specified, default is a function that gets called for objects that can't otherwise be serialized. It should return a JSON encodable version of the object or raise a TypeError.
Methods:
default(obj) – Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).
- default(obj)
Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).
For example, to support arbitrary iterators, you could implement default like this:
def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return JSONEncoder.default(self, o)
- himap.utils.str2bool(v)
- himap.utils.create_data_hsmm(files, obs_state_len, f_value)
Creates a dictionary of trajectories for input into the HSMM model.
- Parameters:
files (list of str) – List of file paths to CSV files containing trajectory data.
obs_state_len (int) – The length of the observed state.
f_value (float) – A value used for fixing the input data.
- Returns:
traj – A dictionary where keys are trajectory identifiers and values are lists of cluster data.
- Return type:
dict
- himap.utils.load_data_cmapss(obs_state_len=5, f_value=21)
Loads the C-MAPSS dataset and prepares it for input into the HSMM model.
- Parameters:
obs_state_len (int, optional) – Length to be used for the failure state, by default 5
f_value (int, optional) – Failure value corresponding to the final state, by default 21
- Returns:
seqs_train (dict) – A dictionary containing the training trajectories.
seqs_test (dict) – A dictionary containing the testing trajectories.
- himap.utils.log_mask_zero(a)
Applies the log function to an array, masking zero values.
- Parameters:
a (np.ndarray) – An array of values.
- Returns:
The log-transformed array with zero values masked.
- Return type:
np.ndarray
- himap.utils.get_single_history_states(states, index, last_state)
Returns the history states for a single trajectory.
- Parameters:
states (list) – A list of lists; each inner list contains the states for one trajectory.
index (int) – The index of the trajectory.
last_state (int) – The last state of the trajectory.
- Returns:
history_states – A list of the history states for the trajectory.
- Return type:
list
- himap.utils.get_viterbi(HSMM, data)
Applies the Viterbi algorithm to predict the most probable states for each trajectory in data using the HSMM.
- Parameters:
HSMM (HSMM) – The trained Hidden Semi-Markov Model used to predict states.
data (dict[str, List[int]]) – A dictionary of trajectories where each key is a trajectory name and each value is a list of observations.
- Returns:
results – A list of lists containing the predicted states for each trajectory.
- Return type:
List[List[int]]
- himap.utils.fix_input_data(traj, f_value, obs_state_len, is_zero_indexed=True)
Prepares trajectory data for input into the HSMM model by appending f_value and adjusting indexing if needed.
- Parameters:
traj (dict[str, List[int]]) – A dictionary containing the trajectories as lists of observed states.
f_value (int) – The value to append to each trajectory.
obs_state_len (int) – The number of times to append f_value to each trajectory.
is_zero_indexed (bool, optional) – Flag indicating whether the data is zero-indexed. Default is True.
- Returns:
traj – The modified trajectory dictionary with f_value appended and indexing adjusted if necessary.
- Return type:
dict[str, List[int]]
- himap.utils.get_rmse(mean_rul_dict, true_rul_dict)
Computes the Root Mean Square Error (RMSE) between predicted Remaining Useful Life (RUL) and true RUL.
- Parameters:
mean_rul_dict (dict[str, List[float]]) – A dictionary where each key is a trajectory name and the value is the list of predicted RUL values.
true_rul_dict (dict[str, int]) – A dictionary where each key is a trajectory name and the value is the true RUL for that trajectory.
- Returns:
df_results – A DataFrame containing RMSE values for each trajectory, including the average RMSE.
- Return type:
pd.DataFrame
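The per-trajectory computation can be sketched in plain Python as follows. This is not himap's implementation: the helper name is hypothetical, it returns a dict instead of a DataFrame, and it assumes that the true RUL starts at the value in true_rul_dict and decreases by one at every step.

```python
import math


def rmse_per_trajectory(mean_rul_dict, true_rul_dict):
    """Hypothetical helper: RMSE of predicted vs. true RUL for each trajectory.

    Assumption: the true RUL starts at true_rul_dict[name] and drops by one per step.
    """
    results = {}
    for name, predicted in mean_rul_dict.items():
        true_start = true_rul_dict[name]
        squared_errors = [
            (pred - (true_start - step)) ** 2
            for step, pred in enumerate(predicted)
        ]
        results[name] = math.sqrt(sum(squared_errors) / len(squared_errors))
    # Add the mean over trajectories, mirroring the "average RMSE" entry.
    results["average"] = sum(results.values()) / len(results)
    return results


example = rmse_per_trajectory({"traj_0": [5.0, 3.0, 3.0]}, {"traj_0": 4})
print(example)
```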
- himap.utils.get_coverage(upper_bound_dict, lower_bound_dict, true_rul_dict)
Calculates the coverage of true RUL values within the predicted upper and lower bounds.
- Parameters:
upper_bound_dict (dict[str, List[float]]) – A dictionary where each key is a trajectory name and the value is the list of upper bounds for predicted RUL.
lower_bound_dict (dict[str, List[float]]) – A dictionary where each key is a trajectory name and the value is the list of lower bounds for predicted RUL.
true_rul_dict (dict[str, int]) – A dictionary where each key is a trajectory name and the value is the true RUL for that trajectory.
- Returns:
df_results – A DataFrame containing coverage values for each trajectory, including the average coverage.
- Return type:
pd.DataFrame
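One plausible reading of coverage is the fraction of time steps at which the true RUL falls inside the predicted interval. The sketch below follows that reading. The helper name is hypothetical, the result is a dict rather than a DataFrame, and the true RUL is assumed to start at the value in true_rul_dict and decrease by one per step.

```python
def coverage_per_trajectory(upper_bound_dict, lower_bound_dict, true_rul_dict):
    """Hypothetical helper: share of steps whose true RUL lies inside the bounds.

    Assumption: the true RUL starts at true_rul_dict[name] and drops by one per step.
    """
    results = {}
    for name, upper in upper_bound_dict.items():
        lower = lower_bound_dict[name]
        true_start = true_rul_dict[name]
        hits = sum(
            1
            for step, (lo, hi) in enumerate(zip(lower, upper))
            if lo <= true_start - step <= hi
        )
        results[name] = hits / len(upper)
    # Add the mean over trajectories, mirroring the "average coverage" entry.
    results["average"] = sum(results.values()) / len(results)
    return results


example = coverage_per_trajectory(
    {"traj_0": [6.0, 4.0, 1.5]},
    {"traj_0": [3.0, 3.5, 0.5]},
    {"traj_0": 4},
)
print(example)
```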
- himap.utils.calculate_area_weighted_by_time(x_values, y_values)
Calculates the area under the curve weighted by time for the given x and y values.
- Parameters:
x_values (list[int]) – A list of x values (e.g., time).
y_values (list[float]) – A list of y values (predicted values).
- Returns:
area – The area under the curve weighted by time.
- Return type:
float
- himap.utils.get_wsu(upper_bound_dict, lower_bound_dict)
Computes the Weighted Spread Uncertainty (WSU) between the upper and lower bounds.
- Parameters:
upper_bound_dict (dict[str, List[float]]) – A dictionary where each key is a trajectory name and the value is the list of upper bounds for predicted RUL.
lower_bound_dict (dict[str, List[float]]) – A dictionary where each key is a trajectory name and the value is the list of lower bounds for predicted RUL.
- Returns:
df_results – A DataFrame containing WSU values for each trajectory, including the average WSU.
- Return type:
pd.DataFrame
- himap.utils.evaluate_test_set(mean_rul_dict, upper_bound_dict, lower_bound_dict, true_rul_dict)
Evaluates the test set by calculating RMSE, coverage, and WSU.
- Parameters:
mean_rul_dict (dict[str, List[float]]) – A dictionary where each key is a trajectory name and the value is the list of predicted RUL values.
upper_bound_dict (dict[str, List[float]]) – A dictionary where each key is a trajectory name and the value is the list of upper bounds for predicted RUL.
lower_bound_dict (dict[str, List[float]]) – A dictionary where each key is a trajectory name and the value is the list of lower bounds for predicted RUL.
true_rul_dict (dict[str, int]) – A dictionary where each key is a trajectory name and the value is the true RUL for that trajectory.
- Returns:
combined_df – A DataFrame combining RMSE, coverage, and WSU for each trajectory, including the average values.
- Return type:
pd.DataFrame
- himap.utils.baumwelch_method(n_states, n_obs_symbols, logPseq, fs, bs, scale, score, history, tr, emi, calc_tr, calc_emi)
Implements the Baum-Welch algorithm for parameter estimation in Hidden Markov Models (HMM).
- Parameters:
n_states (int) – The number of hidden states in the model.
n_obs_symbols (int) – The number of observation symbols.
logPseq (float) – The log-probability of the observed sequence.
fs (np.ndarray) – The forward probabilities matrix (shape: [n_states, sequence_length]).
bs (np.ndarray) – The backward probabilities matrix (shape: [n_states, sequence_length]).
scale (np.ndarray) – The scale factors for normalization (shape: [1, sequence_length]).
score (float) – The cumulative score (log probability) to be updated.
history (List[int]) – The sequence of observed symbols (integer indices).
tr (np.ndarray) – The transition matrix (shape: [n_states, n_states]).
emi (np.ndarray) – The emission matrix (shape: [n_states, n_obs_symbols]).
calc_tr (np.ndarray) – A precomputed matrix of transition probabilities (shape: [n_states, n_states]).
calc_emi (np.ndarray) – A precomputed matrix of emission probabilities (shape: [n_states, n_obs_symbols]).
- Returns:
tr (np.ndarray) – Updated transition matrix after the algorithm has performed parameter estimation.
emi (np.ndarray) – Updated emission matrix after the algorithm has performed parameter estimation.
- himap.utils.fs_calculation(n_states, end_traj, fs, s, history, calc_emi, calc_tr)
Computes the forward probabilities (fs) for a given sequence using the emission and transition matrices.
- Parameters:
n_states (int) – The number of hidden states in the model.
end_traj (int) – The length of the observation sequence.
fs (np.ndarray) – The forward probabilities matrix (shape: [n_states, end_traj]).
s (np.ndarray) – Scaling factors to prevent underflow (shape: [1, end_traj]).
history (List[int]) – The sequence of observed symbols (integer indices).
calc_emi (np.ndarray) – A matrix of emission probabilities (shape: [n_states, n_obs_symbols]).
calc_tr (np.ndarray) – A matrix of transition probabilities (shape: [n_states, n_states]).
- Returns:
fs (np.ndarray) – The updated forward probabilities matrix.
s (np.ndarray) – The updated scaling factors.
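For orientation, a generic scaled forward recursion is sketched below using nested lists. It is not himap's implementation: the function name and the start_prob argument are assumptions introduced for the example, while the [n_states, end_traj] layout of fs follows the shapes documented above.

```python
import math


def scaled_forward(history, calc_tr, calc_emi, start_prob):
    """Generic scaled forward pass (a sketch, not himap's code).

    start_prob is a hypothetical argument holding the initial state distribution.
    Returns fs with shape [n_states][len(history)] and the scale factors s.
    """
    n_states = len(calc_tr)
    end_traj = len(history)
    fs = [[0.0] * end_traj for _ in range(n_states)]
    s = [0.0] * end_traj

    for t, symbol in enumerate(history):
        for state in range(n_states):
            if t == 0:
                prior = start_prob[state]
            else:
                # Probability mass flowing into `state` from every previous state.
                prior = sum(
                    fs[prev][t - 1] * calc_tr[prev][state]
                    for prev in range(n_states)
                )
            fs[state][t] = prior * calc_emi[state][symbol]
        # Normalise the column to sum to one; the factor guards against underflow.
        s[t] = sum(fs[state][t] for state in range(n_states))
        for state in range(n_states):
            fs[state][t] /= s[t]
    return fs, s


tr = [[0.9, 0.1], [0.0, 1.0]]
emi = [[0.8, 0.2], [0.3, 0.7]]
fs, s = scaled_forward([0, 0, 1], tr, emi, start_prob=[1.0, 0.0])
# The log-likelihood of the sequence is the sum of the log scale factors.
log_likelihood = sum(math.log(factor) for factor in s)
```

Scaling each column keeps the values in a safe numeric range for long sequences, and the sequence log-likelihood can still be recovered from the stored factors.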
- himap.utils.bs_calculation(n_states, end_traj, bs, s, history, calc_emi, calc_tr)
Computes the backward probabilities (bs) for a given sequence using the emission and transition matrices.
- Parameters:
n_states (int) – The number of hidden states in the model.
end_traj (int) – The length of the observation sequence.
bs (np.ndarray) – The backward probabilities matrix (shape: [n_states, end_traj]).
s (np.ndarray) – Scaling factors for normalization (shape: [1, end_traj]).
history (List[int]) – The sequence of observed symbols (integer indices).
calc_emi (np.ndarray) – A matrix of emission probabilities (shape: [n_states, n_obs_symbols]).
calc_tr (np.ndarray) – A matrix of transition probabilities (shape: [n_states, n_states]).
- Returns:
bs – The updated backward probabilities matrix.
- Return type:
np.ndarray
- himap.utils.calculate_expected_value(pmf_values)
Calculates the expected value of a probability mass function (PMF).
- Parameters:
pmf_values (List[float]) – A list of probabilities for each possible value.
- Returns:
expected_value – The expected value calculated from the PMF.
- Return type:
float
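As a small worked example, the expected value of a PMF is the probability-weighted sum of its support. The sketch below assumes that the i-th entry of pmf_values is the probability of the value i, starting at 0. The helper name is hypothetical.

```python
def expected_value(pmf_values):
    """Hypothetical helper: mean of a PMF whose i-th entry is P(X = i)."""
    return sum(index * prob for index, prob in enumerate(pmf_values))


# 0*0.1 + 1*0.2 + 2*0.3 + 3*0.4
mean_rul = expected_value([0.1, 0.2, 0.3, 0.4])
print(mean_rul)
```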
- himap.utils.calculate_cdf(pmf, confidence_level)
Calculates the cumulative distribution function (CDF) and percentile values for a given probability mass function (PMF).
- Parameters:
pmf (List[float]) – A list of probabilities for each possible value.
confidence_level (float) – The confidence level for calculating the percentiles (e.g., 0.95 for 95%).
- Returns:
lower_value – The index corresponding to the lower percentile.
- Return type:
int
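The relationship between a PMF, its CDF, and percentile indices can be sketched as follows. The helper is hypothetical and assumes a symmetric two-sided interval, so each tail holds (1 - confidence_level) / 2 of the probability mass. It returns both a lower and an upper index, which may differ from what himap's function returns.

```python
from itertools import accumulate


def cdf_bounds(pmf, confidence_level):
    """Hypothetical helper: indices where the CDF first reaches each tail percentile.

    Assumption: the interval is two-sided and symmetric.
    """
    cdf = list(accumulate(pmf))  # running sum of the PMF
    tail = (1.0 - confidence_level) / 2.0
    lower_value = next(i for i, c in enumerate(cdf) if c >= tail)
    upper_value = next(i for i, c in enumerate(cdf) if c >= 1.0 - tail)
    return lower_value, upper_value


lower, upper = cdf_bounds([0.05, 0.15, 0.6, 0.15, 0.05], confidence_level=0.8)
print(lower, upper)
```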
- himap.utils.create_folders(results_parent_path=None)
Create a directory structure for storing results.
This function creates a main himap_results folder in the specified results parent directory path and the subdirectories within it, including “dictionaries”, “figures”, and “models”. If the results_parent_path is not specified, the himap_results folder is created in the current working directory.
- Parameters:
results_parent_path (str (Optional)) – Defines the parent directory of the results folder where the himap_results directory tree is created.
Notes
The function does not return any values.
The created folder structure is as follows:
himap_results/
├── dictionaries/
├── figures/
└── models/
Examples
>>> create_folders()
Created folder: /results_parent_path/himap_results
Created folder: /results_parent_path/himap_results/dictionaries
Created folder: /results_parent_path/himap_results/figures
Created folder: /results_parent_path/himap_results/models
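The directory layout described above can be reproduced with the standard library alone. The stand-in below is hypothetical and is not himap's create_folders: it returns the root path for convenience, whereas the documented function returns nothing.

```python
import tempfile
from pathlib import Path


def make_results_tree(results_parent_path=None):
    """Hypothetical stand-in that builds the himap_results tree described above."""
    parent = Path(results_parent_path) if results_parent_path else Path.cwd()
    root = parent / "himap_results"
    for folder in (root, root / "dictionaries", root / "figures", root / "models"):
        # exist_ok avoids an error when the tree is already in place.
        folder.mkdir(parents=True, exist_ok=True)
    return root


# Use a throwaway parent directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    root = make_results_tree(tmp)
    created = sorted(p.name for p in root.iterdir())

print(created)
```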