This is is the main function for conducting vertex-wise linear mixed model analyses
on brain surface metrics. It will first check use inputs, prepare the phenotype data(list)
and run a linear mixed model at each vertex of the specified hemisphere using the
single_lmm
function.
The function supports analysis of both single and multiple imputed datasets. It also automatically handles cortical masking, and provides cluster-wise correction for multiple testing using FreeSurfer's Monte Carlo simulation approach.
Usage
run_vw_lmm(
formula,
pheno,
subj_dir,
outp_dir = NULL,
hemi = c("lh", "rh"),
folder_id = "folder_id",
seed = 3108,
n_cores = 1,
chunk_size = 1000,
FS_HOME = Sys.getenv("FREESURFER_HOME"),
fs_template = "fsaverage",
apply_cortical_mask = TRUE,
use_model_template = TRUE,
save_ss = TRUE,
verbose = TRUE,
fwhm = 10,
mcz_thr = 30,
cwp_thr = 0.025
)
Arguments
- formula
A model formula object. This should specify a linear mixed model
lme4
syntax. The outcome variable should be one of the supported brain surface metrics (see Details). Example:vw_thickness ~ age * sex + site + (1|participant_id)
.- pheno
Either a
data.frame
/tibble
containing the "phenotype" data (i.e., already loaded in the global environment), or a string specifying the file path to phenotype data. Supported file formats: .rds, .csv, .txt, .sav (SPSS). The data should be in long format and it should contain all the variables specified in theformula
plus thefolder_id
column.- subj_dir
Character string specifying the path to FreeSurfer data directory. Must follow the verywise directory structure (see package vignette for details).
- outp_dir
Character string specifying the output directory for results. If
NULL
(default), creates a "verywise_results" subdirectory withinsubj_dir
.- hemi
Character string specifying which hemisphere(s) to analyze. Options:
"both"
(default),"lh"
(left only),"rh"
(right only).- folder_id
Character string specifying the column name in
pheno
that contains subject directory names of the input neuroimaging data (e.g., "sub-001_ses-baseline" or "site1/sub-010_ses-F1"). These are expected to be nested insidesubj_dir
. Default:"folder_id"
.- seed
Integer specifying the random seed for reproducibility. Default: 3108.
- n_cores
Integer specifying the number of CPU cores for parallel processing. Default: 1.
- chunk_size
Integer specifying the number of vertices processed per chunk in parallel operations. Larger values use more memory but may be faster. Default: 1000.
- FS_HOME
Character string specifying the FreeSurfer home directory. Defaults to
FREESURFER_HOME
environment variable.- fs_template
Character string specifying the FreeSurfer template for vertex registration. Options:
"fsaverage"
(default) = 163842 vertices (highest resolution),"fsaverage6"
= 40962 vertices,"fsaverage5"
= 10242 vertices,"fsaverage4"
= 2562 vertices,"fsaverage3"
= 642 vertices
Note that lower resolutions should be only used to downsample the brain map, for faster model tuning. The final analyses should also run using
fs_template = "fsaverage"
to avoid (small) imprecisions in vertex registration and smoothing.- apply_cortical_mask
Logical indicating whether to exclude non-cortical vertices from analysis. Default:
TRUE
(recommended).- use_model_template
Logical indicating whether to pre-compile the model template for faster estimation. Default:
TRUE
(recommended).- save_ss
Logical indicating whether to save the super-subject matrix as an .rds file for faster future processing. Default:
TRUE
(recommended).- verbose
Logical indicating whether to display progress messages. Default:
TRUE
.- fwhm
Numeric value specifying the full-width half-maximum for smoothing kernel. Default: 10.
- mcz_thr
Numeric value for Monte Carlo simulation threshold. Any of the following are accepted (equivalent values separate by
/
):13 / 1.3 / 0.05,
20 / 2.0 / 0.01,
23 / 2.3 / 0.005,
30 / 3.0 / 0.001, \* default
33 / 3.3 / 0.0005,
40 / 4.0 / 0.0001.
- cwp_thr
Numeric value for cluster-wise p-value threshold (on top of all corrections). Set this to 0.025 when both hemispheres are analyzed, 0.05 for single hemisphere. Default: 0.025.
Value
A list of file-backed matrices (bigstatsr::FBM
objects) containing pooled
coefficients, SEs, t- and p- values and residuals.
Results are also automatically saved to disk in .mgh format.
Details
Supported Brain Surface Metrics:
The outcome specified in formula
should be a brain surface metric among:
vw_thickness
- Cortical thicknessvw_area
- Cortical surface area (white surface)vw_area.pial
- Cortical surface area (pial surface)vw_curv
- Mean curvaturevw_jacobian_white
- Jacobian determinant (white surface)vw_pial
- Pial surface coordinatesvw_pial_lgi
- Local gyrification index (pial surface)vw_sulc
- Sulcal depthvw_volume
- Gray matter volumevw_w_g.pct
- White/gray matter intensity ratiovw_white.H
- Mean curvature (white surface)vw_white.K
- Gaussian curvature (white surface)
Statistical Approach:
The function uses lme4::lmer()
for mixed-effects modeling, enabling
analysis of longitudinal and hierarchical data. P-values are computed using
the t-as-z approximation, with cluster-wise correction applied using FreeSurfer's
Monte Carlo simulation approach.
Multiple Imputation:
The function automatically detects and handles multiple imputed datasets
(created with mice
or similar packages), pooling results according
to Rubin's rules.
Parallel processing:
The verywise
package employs a carefully designed parallelization strategy
to maximize computational efficiency while avoiding the performance penalties
associated with nested parallelization.
Left and right cortical hemispheres are processed sequentially by default. Parallel processing
of the two hemispheres (and/or different metrics / models) should be handled by the user
(for example, using SLURM job arrays or similar, see vignette on parallelisation [COMING UP])
Within each hemisphere, vertices are divided into chunks of size chunk_size
and processed
in parallel across n_cores
workers (when n_cores > 1
).
When multiple imputed datasets are present, these are processed sequentially within each vertex.
Note that, on some systems, implicit parallelism in low-level matrix algebra libraries (BLAS/LAPACK) can interfere with explicit parallelization. If you feel like processing is taking too long, we recommend disabling these implicit threading libraries before starting R. For example:
export OPENBLAS_NUM_THREADS=1
export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
export VECLIB_MAXIMUM_THREADS=1
export NUMEXPR_NUM_THREADS=1
Also note that using a very large number of cores (e.b. >120) may sometimes cause worker initialization or other issues (e.g. R parallel processess limits)
Output Files: Results are saved in FreeSurfer-compatible .mgh format for visualization with [verywiseWIZard](https://github.com/SereDef/verywise-wizard), FreeView or other neuroimaging software.
Note
Ensure FreeSurfer is properly installed and the
FREESURFER_HOME
environment variable is set.Large datasets may require substantial memory. Consider adjusting
chunk_size
andn_cores
based on your system specifications.For reproducibility, always specify a
seed
.
See also
single_lmm
for single-vertex modeling,
vignette("03-run-vw-lmm", package = "verywise")
for detailed usage examples.
Examples
# Basic cortical thickness analysis
if (FALSE) { # \dontrun{
results <- run_vw_lmm(
formula = vw_thickness ~ age + sex + site + (1|participant_id),
pheno = "path/to/phentype/data", # or data.frame object
subj_dir = "/path/to/freesurfer/subjects",
outp_dir = "/path/to/output",
hemi = "lh",
n_cores = 4)
} # }