API Reference

combss.linear

Best subset selection for linear regression.

Two methods are available:

  • Frank-Wolfe method (method='fw', default): Frank-Wolfe homotopy algorithm. Sparsity is controlled by k (model size): COMBSS returns selected features for each k = 1, ..., q. The lam_ridge parameter is an optional ridge regularisation on the coefficients in the inner solver.

  • Original method (method='original'): Adam optimiser with a dynamic lambda grid, as proposed in Moka et al. (2024). Sparsity is controlled by lambda: a grid of lambda values is searched, and each lambda yields a different subset. The best subset is selected by validation MSE.

class combss.linear.model[source]

COMBSS model for best subset selection in linear regression.

fit(X_train, y_train, ...)[source]

Run COMBSS to select the best subset of predictors.

Attributes (after fitting with method='fw', the Frank-Wolfe method)
-------------------------------------------------------------------
subset : ndarray or None

Indices of the best subset (0-indexed). Requires validation data.

mse : float or None

Validation MSE for the best subset. Requires validation data.

coef_ : ndarray or None

Regression coefficients (length p, zeros for unselected). Requires validation data.

subset_list : list

Subsets for k = 1, …, q (0-indexed). May be shorter than q if early stopping was triggered.

lam_ridge : float

Ridge penalty used in the inner solver.

Attributes (after fitting with method='original')
-------------------------------------------------
subset : ndarray

Indices of the best subset (0-indexed).

coef_ : ndarray

Regression coefficients (length p, zeros for unselected).

mse : float

Validation MSE for the best subset.

lambda_ : float

Optimal lambda value.

subset_list : list

Subsets across the lambda grid (0-indexed).

lambda_list : list

Lambda grid values.

fit(X_train, y_train, X_val=None, y_val=None, q=None, method='fw', Niter=25, lam_ridge=0, alpha=0.01, scale=True, verbose=True, mandatory_features=None, inner_tol=0.0001, patience=20, min_k=20, nlam=50, t_init=[], scaling=True, tau=0.5, delta_frac=1, eta=0.001, gd_patience=10, gd_maxiter=1000, gd_tol=1e-05, cg_maxiter=None, cg_tol=1e-05)[source]

Fit the COMBSS model for linear regression.

Parameters:
  • X_train (ndarray (n_train, p)) – Training design matrix.

  • y_train (ndarray (n_train,)) – Training response vector.

  • X_val (ndarray (n_val, p), optional) – Validation design matrix. Required for method='original'. Optional for method='fw'; when provided, the best subset is selected by validation MSE and coef_ is computed.

  • y_val (ndarray (n_val,), optional) – Validation response. Required for method='original'.

  • q (int, optional) – Maximum subset size. Defaults to min(n, p).

  • method (str) – 'fw' (default) for the Frank-Wolfe homotopy algorithm, or 'original' for the Adam + dynamic lambda grid method.

Frank-Wolfe method parameters (method='fw'):

  • Niter (int) – Number of homotopy iterations (default 25).

  • lam_ridge (float) – Ridge regularisation parameter for the inner solver (default 0). This is NOT the sparsity penalty lambda used in the original method.

  • alpha (float) – Frank-Wolfe step size (default 0.01).

  • scale (bool) – Column-normalise X before running (default True).

  • verbose (bool) – Print progress (default True).

  • mandatory_features (list or None) – 1-indexed features to force into every model.

  • inner_tol (float) – Inner solver convergence tolerance (default 1e-4).

  • patience (int) – Early stopping patience (default 20). When validation data is provided, stop if validation MSE has not improved for this many consecutive k values. Only active when X_val/y_val are given. Set to None to disable early stopping.

  • min_k (int) – Minimum number of k values to evaluate before early stopping can trigger (default 20). Together with patience, at least min(min_k + patience, q) values of k are evaluated. min_k is capped so that min_k + patience <= p.

Original method parameters (method='original'):

  • nlam (int) – Number of lambda values in the dynamic grid (default 50).

  • t_init (array-like) – Initial t vector (default centre of hypercube).

  • scaling (bool) – Enable feature scaling (default True).

  • tau (float) – Threshold parameter (default 0.5).

  • delta_frac (float) – n/delta in the objective function (default 1).

  • eta (float) – Truncation parameter (default 0.001).

  • gd_patience (int) – Patience for Adam termination (default 10).

  • gd_maxiter (int) – Maximum Adam iterations (default 1000).

  • gd_tol (float) – Adam convergence tolerance (default 1e-5).

  • cg_maxiter (int or None) – Conjugate gradient max iterations (default n_train).

  • cg_tol (float) – Conjugate gradient tolerance (default 1e-5).
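The relationship between subset_list, subset, mse and coef_ described above can be sketched in plain NumPy. This is not the library's code: the function name best_subset_by_val_mse_sketch is hypothetical, and the refit used here (ordinary least squares with an intercept) is an assumption made for illustration. The sketch only shows how a best subset might be chosen by validation MSE and how a length-p coefficient vector with zeros for unselected features can be assembled.

```python
import numpy as np

def best_subset_by_val_mse_sketch(X_train, y_train, X_val, y_val, subset_list):
    """Hypothetical helper: refit each candidate subset and keep the one
    with the lowest validation MSE (OLS with intercept is an assumption)."""
    p = X_train.shape[1]
    best_mse, best_subset, best_coef = np.inf, None, None
    for subset in subset_list:
        idx = np.asarray(subset, dtype=int)  # 0-indexed feature indices
        Z_train = np.column_stack([np.ones(X_train.shape[0]), X_train[:, idx]])
        Z_val = np.column_stack([np.ones(X_val.shape[0]), X_val[:, idx]])
        theta, *_ = np.linalg.lstsq(Z_train, y_train, rcond=None)
        mse = float(np.mean((y_val - Z_val @ theta) ** 2))
        if mse < best_mse:
            coef = np.zeros(p)       # zeros for unselected features
            coef[idx] = theta[1:]    # slopes only; theta[0] is the intercept
            best_mse, best_subset, best_coef = mse, idx, coef
    return best_subset, best_mse, best_coef

# Toy data: only features 0 and 2 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 5))
y = 2.0 * X[:, 0] - 3.0 * X[:, 2] + 0.1 * rng.normal(size=80)
subset, mse, coef = best_subset_by_val_mse_sketch(
    X[:50], y[:50], X[50:], y[50:], [[1], [0, 2], [3, 4]]
)
```

Without validation data there is nothing to rank the candidates by, which is consistent with subset, mse and coef_ being None for method='fw' when X_val and y_val are omitted.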

combss.logistic

Best subset selection for binary logistic regression.

Uses the Frank-Wolfe homotopy algorithm with Danskin’s envelope gradient and a warm-started sklearn L-BFGS-B inner solver.

Labels y must be binary {0, 1}.

class combss.logistic.model[source]

COMBSS model for best subset selection in binary logistic regression.

fit(X_train, y_train, ...)[source]

Run COMBSS to select the best subset of predictors.

Attributes (available after fitting)
------------------------------------
subset : ndarray or None

Indices of the best subset (0-indexed). Requires validation data.

accuracy : float or None

Validation accuracy for the best subset. Requires validation data.

coef_ : ndarray or None

Logistic regression coefficients (length p, zeros for unselected). Requires validation data.

lam_ridge : float

Ridge penalty used in the inner solver.

subset_list : list

Subsets for k = 1, …, q (0-indexed). May be shorter if early stopping was triggered.

fit(X_train, y_train, X_val=None, y_val=None, q=None, Niter=25, lam_ridge=0, alpha=0.01, scale=True, verbose=True, mandatory_features=None, inner_tol=0.0001, patience=20, min_k=20)[source]

Fit the COMBSS model for binary logistic regression.

Parameters:
  • X_train (ndarray (n, p)) – Training design matrix (no intercept column).

  • y_train (ndarray (n,)) – Binary labels {0, 1}.

  • X_val (ndarray (n_val, p), optional) – Validation design matrix. When provided, the best subset is selected by validation accuracy and coef_ is computed.

  • y_val (ndarray (n_val,), optional) – Validation labels.

  • q (int, optional) – Maximum subset size. Defaults to min(n, p).

  • Niter (int) – Number of homotopy iterations (default 25).

  • lam_ridge (float) – Ridge regularisation parameter for the inner solver (default 0).

  • alpha (float) – Frank-Wolfe step size (default 0.01).

  • scale (bool) – Column-normalise X before running (default True).

  • verbose (bool) – Print progress (default True).

  • mandatory_features (list or None) – 1-indexed features to force into every model.

  • inner_tol (float) – Inner solver convergence tolerance (default 1e-4).

  • patience (int) – Early stopping patience (default 20). Stop if validation accuracy has not improved for this many consecutive k values. Only active when X_val/y_val are given. Set to None to disable.

  • min_k (int) – Minimum k values to evaluate before early stopping can trigger (default 20). Capped so min_k + patience <= p.
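The interaction of patience and min_k can be illustrated with a small pure-Python sketch. The function name n_evaluated_sketch is hypothetical, and the exact rule is an assumption: here, non-improving steps are only counted after min_k values have been evaluated, so the earliest possible stop is at min_k + patience. The cap min_k + patience <= p is not modelled.

```python
def n_evaluated_sketch(scores, patience=20, min_k=20):
    """Hypothetical helper: number of k values evaluated before stopping.

    scores[i] is the validation accuracy for k = i + 1 (higher is better).
    Assumption: stalls are counted only once k exceeds min_k.
    """
    if patience is None:          # early stopping disabled
        return len(scores)
    best = None
    stall = 0
    for k, score in enumerate(scores, start=1):
        if best is None or score > best:
            best, stall = score, 0
        elif k > min_k:
            stall += 1
        if stall >= patience:
            return k
    return len(scores)

# Accuracy peaks at k = 2 and never improves afterwards.
toy_scores = [0.5, 0.9] + [0.8] * 10
stopped_at = n_evaluated_sketch(toy_scores, patience=3, min_k=4)
no_stop = n_evaluated_sketch(toy_scores, patience=None, min_k=4)
```

In this toy run the loop stops at k = 7 = min_k + patience, so subset_list would hold 7 entries rather than 12.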

combss.multinomial

Best subset selection for multinomial logistic regression.

Uses the Frank-Wolfe homotopy algorithm with a baseline-category multinomial model. Labels y must be in {1, ..., C}.

class combss.multinomial.model[source]

COMBSS model for best subset selection in multinomial logistic regression.

fit(X_train, y_train, ...)[source]

Run COMBSS to select the best subset of predictors.

Attributes (available after fitting)
------------------------------------
subset : ndarray or None

Indices of the best subset (0-indexed). Requires validation data.

accuracy : float or None

Validation accuracy for the best subset. Requires validation data.

coef_ : ndarray or None

Multinomial coefficients (shape (C, p), zeros for unselected). Requires validation data.

lam_ridge : float

Ridge penalty used in the inner solver.

subset_list : list

Subsets for k = 1, …, q (0-indexed). May be shorter if early stopping was triggered.

fit(X_train, y_train, X_val=None, y_val=None, q=None, C=None, Niter=25, lam_ridge=0, alpha=0.01, scale=True, verbose=True, mandatory_features=None, inner_tol=0.0001, patience=20, min_k=20)[source]

Fit the COMBSS model for multinomial logistic regression.

Parameters:
  • X_train (ndarray (n, p)) – Training design matrix (no intercept column).

  • y_train (ndarray (n,)) – Class labels in {1, …, C}.

  • X_val (ndarray (n_val, p), optional) – Validation design matrix. When provided, the best subset is selected by validation accuracy and coef_ is computed.

  • y_val (ndarray (n_val,), optional) – Validation labels.

  • q (int, optional) – Maximum subset size. Defaults to min(n, p).

  • C (int, optional) – Number of classes. If None, inferred from y_train.

  • Niter (int) – Number of homotopy iterations (default 25).

  • lam_ridge (float) – Ridge regularisation parameter for the inner solver (default 0).

  • alpha (float) – Frank-Wolfe step size (default 0.01).

  • scale (bool) – Column-normalise X before running (default True).

  • verbose (bool) – Print progress (default True).

  • mandatory_features (list or None) – 1-indexed features to force into every model.

  • inner_tol (float) – Inner solver convergence tolerance (default 1e-4).

  • patience (int) – Early stopping patience (default 20). Stop if validation accuracy has not improved for this many consecutive k values. Only active when X_val/y_val are given. Set to None to disable.

  • min_k (int) – Minimum k values to evaluate before early stopping can trigger (default 20). Capped so min_k + patience <= p.
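Two indexing conventions in this module are easy to mix up: labels run over {1, ..., C}, mandatory_features is 1-indexed, and the reported subsets are 0-indexed. The following NumPy sketch shows one way to prepare inputs under those conventions. The variable names and the toy labels are illustrative only.

```python
import numpy as np

# Arbitrary class labels mapped to integer codes in {1, ..., C}.
raw_labels = np.array(["cat", "dog", "bird", "dog"])
classes, codes = np.unique(raw_labels, return_inverse=True)
y_train = codes + 1           # np.unique yields 0-based codes, so shift by one
C = len(classes)

# Columns 0 and 3 of X (0-based) must appear in every model.
mandatory_zero_based = [0, 3]
mandatory_features = [j + 1 for j in mandatory_zero_based]   # 1-indexed

# A reported subset is 0-indexed, so compare against the 0-based list.
example_subset = np.array([0, 2, 3])
all_forced_present = set(mandatory_zero_based) <= set(example_subset.tolist())
```

Keeping both the 0-based and 1-based lists around, as here, avoids off-by-one mistakes when checking results against the forced features.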

combss.cv

Leave-one-out cross-validation for selecting the ridge penalty lam_ridge in the COMBSS Frank-Wolfe algorithm.

Note: the lambda_grid in this module contains ridge penalty values, not the sparsity penalty lambda used in the original COMBSS method.

combss.cv.select_lambda(X, y, q, C=None, lambda_grid=None, Niter=50, alpha=0.01, model_type='multinomial', inner_tol=0.0001, lambda_refit=0.0, verbose=True)[source]

Select the ridge regularisation penalty via leave-one-out cross-validation.

For each candidate ridge penalty in lambda_grid, COMBSS selects features for k = 1..q on the full data. Each selected model is then evaluated by LOOCV on the refit step.

Note: this lambda is a ridge penalty on the coefficients in the inner solver (lam_ridge in the model classes), NOT the sparsity penalty used in the original COMBSS method.

Parameters:
  • X (ndarray (n, p)) – Feature matrix (no intercept column).

  • y (ndarray (n,)) – Response / labels:
      - {1, …, C} for multinomial
      - {0, 1} for logit
      - real-valued for linear

  • q (int) – Maximum subset size.

  • C (int or None) – Number of classes (required for logit/multinomial; ignored for linear).

  • lambda_grid (array-like or None) – Candidate ridge penalty values (default: [0] + logspace(-3, 1, 10)).

  • Niter (int) – COMBSS homotopy iterations (default 50).

  • alpha (float) – Frank-Wolfe step size (default 0.01).

  • model_type (str) – 'multinomial', 'logit', or 'linear'.

  • inner_tol (float) – Inner solver tolerance (default 1e-4).

  • lambda_refit (float) – Ridge penalty for LOOCV refit (default 0).

  • verbose (bool) – Print progress (default True).

Returns:

  • best_lambda (float) – Lambda maximising mean LOOCV accuracy (classification) or minimising mean LOOCV MSE (linear).

  • best_lambda_per_k (dict) – Best lambda for each k individually.

  • results_df (DataFrame) – Columns: lambda, k, loocv_acc/loocv_mse, selected.
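The default grid described above, [0] + logspace(-3, 1, 10), can be reproduced with NumPy, and the idea behind best_lambda_per_k can be shown on a few made-up rows. The scores below are invented purely for illustration and the tuple layout is a stand-in for the results_df columns. This is a sketch, not the library's selection code.

```python
import numpy as np

# Default candidate ridge penalties: 0 followed by 10 log-spaced values in [1e-3, 10].
lambda_grid = np.concatenate(([0.0], np.logspace(-3, 1, 10)))

# Made-up (lambda, k, loocv_acc) rows standing in for results_df.
toy_rows = [
    (0.0, 1, 0.80),
    (0.001, 1, 0.75),
    (0.0, 2, 0.82),
    (0.001, 2, 0.90),
]

# For each k, keep the lambda with the highest LOOCV accuracy.
best_per_k = {}
for lam, k, acc in toy_rows:
    if k not in best_per_k or acc > best_per_k[k][1]:
        best_per_k[k] = (lam, acc)
best_lambda_per_k_sketch = {k: lam for k, (lam, acc) in best_per_k.items()}
```

For model_type='linear' the comparison would run the other way, since a lower LOOCV MSE is better.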

combss.cv.loocv_mse_linear(X_sel, y, lambda_refit=0.0)[source]

Exact leave-one-out MSE for linear regression via the hat-matrix identity.

Uses the formula: e_i^{LOO} = (y_i - y_hat_i) / (1 - h_ii) where h_ii are the diagonal entries of the hat matrix.

Cost O(n k^2) – no per-fold loop required.

Parameters:
  • X_sel (ndarray (n, k)) – Columns restricted to selected features.

  • y (ndarray (n,)) – Continuous response.

  • lambda_refit (float) – Ridge penalty on slopes (0 = OLS); intercept is never penalised.

Returns:

LOOCV mean squared error.

Return type:

float
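The hat-matrix identity can be checked numerically. The sketch below is a minimal NumPy version written for illustration, not the library's implementation: the function names are hypothetical, and it assumes an intercept column that is left unpenalised and a ridge penalty lambda_refit applied directly (unscaled) to the slopes. It compares the closed-form leave-one-out errors with an explicit per-fold refit.

```python
import numpy as np

def loocv_mse_hat_sketch(X_sel, y, lambda_refit=0.0):
    """LOO MSE via e_i / (1 - h_ii); assumes an unpenalised intercept."""
    n, k = X_sel.shape
    Z = np.column_stack([np.ones(n), X_sel])
    D = lambda_refit * np.eye(k + 1)
    D[0, 0] = 0.0                                # intercept is not penalised
    A = np.linalg.solve(Z.T @ Z + D, Z.T)        # (k+1, n), so H = Z @ A
    h = np.einsum("ij,ji->i", Z, A)              # diagonal of the hat matrix
    resid = y - Z @ (A @ y)
    return float(np.mean((resid / (1.0 - h)) ** 2))

def loocv_mse_brute_sketch(X_sel, y, lambda_refit=0.0):
    """Same quantity computed by refitting n times (for checking only)."""
    n, k = X_sel.shape
    Z = np.column_stack([np.ones(n), X_sel])
    D = lambda_refit * np.eye(k + 1)
    D[0, 0] = 0.0
    errs = []
    for i in range(n):
        keep = np.arange(n) != i
        theta = np.linalg.solve(Z[keep].T @ Z[keep] + D, Z[keep].T @ y[keep])
        errs.append((y[i] - Z[i] @ theta) ** 2)
    return float(np.mean(errs))

rng = np.random.default_rng(1)
X_toy = rng.normal(size=(30, 3))
y_toy = X_toy @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=30)
fast_ols = loocv_mse_hat_sketch(X_toy, y_toy)
slow_ols = loocv_mse_brute_sketch(X_toy, y_toy)
fast_ridge = loocv_mse_hat_sketch(X_toy, y_toy, lambda_refit=0.5)
slow_ridge = loocv_mse_brute_sketch(X_toy, y_toy, lambda_refit=0.5)
```

The two routes agree up to floating-point error for both OLS and a fixed ridge penalty, which is why no per-fold loop is needed.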

combss.cv.loocv_accuracy(X_sel, y, lambda_refit=0.0)[source]

Leave-one-out cross-validation accuracy for classification.

Parameters:
  • X_sel (ndarray (n, k)) – Columns restricted to selected features.

  • y (ndarray (n,)) – Class labels.

  • lambda_refit (float) – Ridge penalty for the refit model (0 = unpenalised).

Returns:

Fraction of correctly predicted held-out labels.

Return type:

float

combss.metrics

Performance metrics for evaluating variable selection.

combss.metrics.performance_metrics(data_X, beta_true, beta_pred)[source]

Computes the evaluation metrics for COMBSS.

Parameters:
  • data_X (array-like of shape (n_samples, n_covariates)) – The design matrix, where n_samples is the number of samples observed and n_covariates is the number of covariates measured in each sample.

  • beta_true (array-like of shape (n_covariates, 1)) – The true value of beta used to generate the data.

  • beta_pred (array-like of shape (n_covariates, 1)) – The predicted value of beta generated by COMBSS.

Returns:

  • array-like of floats, [pe, MCC, accuracy, sensitivity, specificity, f1_score, precision], where:

  • pe (float) – The model's relative prediction error: the L2 norm of the difference between the fitted values and the true predicted values, divided by the L2 norm of the true predicted values.

  • MCC (float) – The model's Matthews correlation coefficient.

  • acc (float) – The accuracy of the model: the proportion of predictors that the model correctly classifies as selected in, or rejected from, the true model. A value between 0 and 1.

  • sens (float) – The sensitivity of the model: the proportion of predictors in the true model that the model correctly includes. A value between 0 and 1.

  • spec (float) – The specificity of the model: the proportion of predictors not in the true model that the model correctly rejects. A value between 0 and 1.

  • f1 (float) – The F1 score of the model.

  • prec (float) – The precision of the model: the proportion of predictors in the predicted model that belong to the true model. A value between 0 and 1.
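As a rough illustration of these definitions, the following NumPy sketch computes the same seven quantities from beta_true and beta_pred. The function name selection_metrics_sketch is hypothetical. Treating a predictor as selected when its coefficient is non-zero and returning 0.0 when a denominator vanishes are both assumptions of this sketch, not documented behaviour.

```python
import numpy as np

def selection_metrics_sketch(data_X, beta_true, beta_pred):
    """Hypothetical re-derivation of the documented metrics."""
    X = np.asarray(data_X, dtype=float)
    b_true = np.ravel(beta_true).astype(float)
    b_pred = np.ravel(beta_pred).astype(float)

    # Relative prediction error on the fitted values.
    pe = np.linalg.norm(X @ b_pred - X @ b_true) / np.linalg.norm(X @ b_true)

    # Support recovery, treating non-zero coefficients as "selected".
    s_true, s_pred = b_true != 0, b_pred != 0
    tp = int(np.sum(s_true & s_pred))
    tn = int(np.sum(~s_true & ~s_pred))
    fp = int(np.sum(~s_true & s_pred))
    fn = int(np.sum(s_true & ~s_pred))

    def ratio(num, den):
        return num / den if den else 0.0   # assumption: 0.0 on empty denominator

    acc = ratio(tp + tn, tp + tn + fp + fn)
    sens = ratio(tp, tp + fn)
    spec = ratio(tn, tn + fp)
    prec = ratio(tp, tp + fp)
    f1 = ratio(2 * prec * sens, prec + sens)
    mcc = ratio(tp * tn - fp * fn,
                np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {"pe": float(pe), "MCC": float(mcc), "acc": acc, "sens": sens,
            "spec": spec, "f1": f1, "prec": prec}

# One true positive, one false negative, one false positive, one true negative.
toy_metrics = selection_metrics_sketch(
    np.eye(4), [1.0, 2.0, 0.0, 0.0], [1.0, 0.0, 0.5, 0.0]
)
```

With one predictor in each of the four cells, accuracy, sensitivity, specificity, precision and F1 all equal 0.5 and MCC is 0, which makes the toy case easy to verify by hand.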

combss.metrics.binary_confusion_matrix(y_true, y_pred)[source]

Compute confusion matrix for binary classification.

Args:

y_true (np.ndarray): Ground truth (0 or 1).

y_pred (np.ndarray): Predicted labels (0 or 1).

Returns:

np.ndarray: 2x2 confusion matrix [[TN, FP], [FN, TP]].
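The [[TN, FP], [FN, TP]] layout means rows index the true label and columns index the predicted label. A minimal NumPy sketch of that layout follows. The function name confusion_2x2_sketch is hypothetical, and it assumes the inputs contain only 0 and 1.

```python
import numpy as np

def confusion_2x2_sketch(y_true, y_pred):
    """Count (true, predicted) pairs into a 2x2 table [[TN, FP], [FN, TP]]."""
    y_true = np.asarray(y_true).astype(int)
    y_pred = np.asarray(y_pred).astype(int)
    cm = np.zeros((2, 2), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)   # row = true label, column = prediction
    return cm

cm_toy = confusion_2x2_sketch([0, 0, 1, 1, 1], [0, 1, 1, 0, 1])
tn, fp, fn, tp = cm_toy.ravel()
```

Unpacking with ravel() gives the four counts in the order TN, FP, FN, TP.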