Run vertex-wise linear mixed model using lme4::lmer()

This is is the main function for conducting vertex-wise linear mixed model analyses on brain surface metrics. It will first check use inputs, prepare the phenotype data(list) and run a linear mixed model at each vertex of the specified hemisphere using the single_lmm function.

The function supports analysis of both single and multiple imputed datasets. It also automatically handles cortical masking, and provides cluster-wise correction for multiple testing using FreeSurfer's Monte Carlo simulation approach.

Usage

run_vw_lmm(
  formula,
  pheno,
  subj_dir,
  outp_dir = NULL,
  hemi = c("lh", "rh"),
  folder_id = "folder_id",
  seed = 3108,
  n_cores = 1,
  chunk_size = 1000,
  FS_HOME = Sys.getenv("FREESURFER_HOME"),
  fs_template = "fsaverage",
  apply_cortical_mask = TRUE,
  use_model_template = TRUE,
  save_ss = TRUE,
  verbose = TRUE,
  fwhm = 10,
  mcz_thr = 30,
  cwp_thr = 0.025
)

Arguments

formula

A model formula object. This should specify a linear mixed model lme4 syntax. The outcome variable should be one of the supported brain surface metrics (see Details). Example: vw_thickness ~ age * sex + site + (1|participant_id).

pheno

Either a data.frame/tibble containing the "phenotype" data (i.e., already loaded in the global environment), or a string specifying the file path to phenotype data. Supported file formats: .rds, .csv, .txt, .sav (SPSS). The data should be in long format and it should contain all the variables specified in the formula plus the folder_id column.

subj_dir

Character string specifying the path to FreeSurfer data directory. Must follow the verywise directory structure (see package vignette for details).

outp_dir

Character string specifying the output directory for results. If NULL (default), creates a "verywise_results" subdirectory within subj_dir.

hemi

Character string specifying which hemisphere(s) to analyze. Options: "both" (default), "lh" (left only), "rh" (right only).

folder_id

Character string specifying the column name in pheno that contains subject directory names of the input neuroimaging data (e.g., "sub-001_ses-baseline" or "site1/sub-010_ses-F1"). These are expected to be nested inside subj_dir. Default: "folder_id".

seed

Integer specifying the random seed for reproducibility. Default: 3108.

n_cores

Integer specifying the number of CPU cores for parallel processing. Default: 1.

chunk_size

Integer specifying the number of vertices processed per chunk in parallel operations. Larger values use more memory but may be faster. Default: 1000.

FS_HOME

Character string specifying the FreeSurfer home directory. Defaults to FREESURFER_HOME environment variable.

fs_template

Character string specifying the FreeSurfer template for vertex registration. Options:

"fsaverage" (default) = 163842 vertices (highest resolution),
"fsaverage6" = 40962 vertices,
"fsaverage5" = 10242 vertices,
"fsaverage4" = 2562 vertices,
"fsaverage3" = 642 vertices

Note that lower resolutions should be only used to downsample the brain map, for faster model tuning. The final analyses should also run using fs_template = "fsaverage" to avoid (small) imprecisions in vertex registration and smoothing.

apply_cortical_mask

Logical indicating whether to exclude non-cortical vertices from analysis. Default: TRUE (recommended).

use_model_template

Logical indicating whether to pre-compile the model template for faster estimation. Default: TRUE (recommended).

save_ss

Logical indicating whether to save the super-subject matrix as an .rds file for faster future processing. Default: TRUE (recommended).

verbose

Logical indicating whether to display progress messages. Default: TRUE.

fwhm

Numeric value specifying the full-width half-maximum for smoothing kernel. Default: 10.

mcz_thr

Numeric value for Monte Carlo simulation threshold. Any of the following are accepted (equivalent values separate by /):

13 / 1.3 / 0.05,
20 / 2.0 / 0.01,
23 / 2.3 / 0.005,
30 / 3.0 / 0.001, \* default
33 / 3.3 / 0.0005,
40 / 4.0 / 0.0001.

cwp_thr

Numeric value for cluster-wise p-value threshold (on top of all corrections). Set this to 0.025 when both hemispheres are analyzed, 0.05 for single hemisphere. Default: 0.025.

Value

A list of file-backed matrices (bigstatsr::FBM objects) containing pooled coefficients, SEs, t- and p- values and residuals. Results are also automatically saved to disk in .mgh format.

Details

Supported Brain Surface Metrics: The outcome specified in formula should be a brain surface metric among:

vw_thickness - Cortical thickness
vw_area - Cortical surface area (white surface)
vw_area.pial - Cortical surface area (pial surface)
vw_curv - Mean curvature
vw_jacobian_white - Jacobian determinant (white surface)
vw_pial - Pial surface coordinates
vw_pial_lgi - Local gyrification index (pial surface)
vw_sulc - Sulcal depth
vw_volume - Gray matter volume
vw_w_g.pct - White/gray matter intensity ratio
vw_white.H - Mean curvature (white surface)
vw_white.K - Gaussian curvature (white surface)

Statistical Approach: The function uses lme4::lmer() for mixed-effects modeling, enabling analysis of longitudinal and hierarchical data. P-values are computed using the t-as-z approximation, with cluster-wise correction applied using FreeSurfer's Monte Carlo simulation approach.

Multiple Imputation: The function automatically detects and handles multiple imputed datasets (created with mice or similar packages), pooling results according to Rubin's rules.

Parallel processing: The verywise package employs a carefully designed parallelization strategy to maximize computational efficiency while avoiding the performance penalties associated with nested parallelization. Left and right cortical hemispheres are processed sequentially by default. Parallel processing of the two hemispheres (and/or different metrics / models) should be handled by the user (for example, using SLURM job arrays or similar, see vignette on parallelisation [COMING UP]) Within each hemisphere, vertices are divided into chunks of size chunk_size and processed in parallel across n_cores workers (when n_cores > 1). When multiple imputed datasets are present, these are processed sequentially within each vertex.

Note that, on some systems, implicit parallelism in low-level matrix algebra libraries (BLAS/LAPACK) can interfere with explicit parallelization. If you feel like processing is taking too long, we recommend disabling these implicit threading libraries before starting R. For example:


export OPENBLAS_NUM_THREADS=1
export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
export VECLIB_MAXIMUM_THREADS=1
export NUMEXPR_NUM_THREADS=1

Also note that using a very large number of cores (e.b. >120) may sometimes cause worker initialization or other issues (e.g. R parallel processess limits)

Output Files: Results are saved in FreeSurfer-compatible .mgh format for visualization with [verywiseWIZard](https://github.com/SereDef/verywise-wizard), FreeView or other neuroimaging software.

Note

Ensure FreeSurfer is properly installed and the FREESURFER_HOME environment variable is set.
Large datasets may require substantial memory. Consider adjusting chunk_size and n_cores based on your system specifications.
For reproducibility, always specify a seed.

Author

Serena Defina, 2024.

Examples

# Basic cortical thickness analysis
if (FALSE) { # \dontrun{
results <- run_vw_lmm(
  formula = vw_thickness ~ age + sex + site + (1|participant_id),
  pheno = "path/to/phentype/data", # or data.frame object 
  subj_dir = "/path/to/freesurfer/subjects",
  outp_dir = "/path/to/output",
  hemi = "lh",
  n_cores = 4)
} # }

Run vertex-wise linear mixed model using `lme4::lmer()`