
Simulate a longitudinal brain surface dataset with associated phenotype data
Source: R/simulate_dataset.R
Generates a synthetic longitudinal dataset for multiple sites/cohorts, each with multiple timepoints/sessions per subject. The function produces:
- Brain surface data in FreeSurfer .mgh format, organised in a verywise folder structure (see vignettes for details).
- A matching pheno data frame with mock participant sex and age, saved as "phenotype.csv" in the path directory.
This is useful for testing pipelines or demonstrations where realistic FreeSurfer-style data and phenotypic information are required.
Usage
simulate_dataset(
path,
data_structure = list(cohort1 = list(sessions = c("01", "02", "03"), n_subjects = 10),
cohort2 = list(sessions = c("01", "02"), n_subjects = 20)),
fs_template = "fsaverage",
roi_subset = c("temporalpole", "frontalpole", "entorhinal"),
simulate_association = NULL,
location_association = NULL,
overwrite = TRUE,
seed = 3108,
verbose = TRUE,
...
)
Arguments
- path
Character string. Directory where the dataset should be created. Will be created if it does not exist.
- data_structure
Named list defining cohorts/sites. Each element is a list with:
"sessions": Character vector of session labels.
"n_subjects": Integer number of subjects.
- fs_template
Character (default = "fsaverage"). FreeSurfer template for vertex registration. This is used to determine the size of the synthetic brain surface data. Options:
"fsaverage" = 163842 vertices (highest resolution)
"fsaverage6" = 40962 vertices
"fsaverage5" = 10242 vertices
"fsaverage4" = 2562 vertices
"fsaverage3" = 642 vertices
- roi_subset
Character vector (default = c("temporalpole", "frontalpole", "entorhinal")). By default, vertex-wise data is simulated only within a small subset (~1.5%) of the total surface. All other vertex values are set to 0 so that they are excluded from analysis, which saves time during estimation. The region locations are extracted from the annotation files that are distributed with FreeSurfer and are saved internally in R/sysdata.rda.
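To get a feel for what the template choice and the ROI subset imply for data size, the sketch below tabulates the vertex counts listed under fs_template and applies the approximate 1.5% fraction quoted above. It uses base R only. The 1.5% figure is treated as a rough constant across templates, which is an assumption made for illustration; the actual ROI sizes come from the annotation data and may differ.

```r
# Vertex counts as listed in the fs_template argument description.
template_vertices <- c(
  fsaverage  = 163842L,
  fsaverage6 = 40962L,
  fsaverage5 = 10242L,
  fsaverage4 = 2562L,
  fsaverage3 = 642L
)

# Icosahedron subdivision order of each template ("fsaverage" is order 7).
ico_order <- c(
  fsaverage  = 7,
  fsaverage6 = 6,
  fsaverage5 = 5,
  fsaverage4 = 4,
  fsaverage3 = 3
)

# An icosahedron subdivided n times has 10 * 4^n + 2 vertices,
# which reproduces the counts in the table above.
expected_vertices <- 10 * 4^ico_order + 2

# Rough number of non-zero vertices if about 1.5% of the surface is simulated.
# Assumption: the same fraction holds at every resolution.
approx_roi_vertices <- round(0.015 * template_vertices)

print(data.frame(
  vertices = template_vertices,
  approx_roi_vertices = approx_roi_vertices
))
```

Coarser templates such as "fsaverage3" keep simulated files small, which can be convenient for quick pipeline tests.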
- simulate_association
Optional. If numeric, must be of length equal to the number of generated files; if character, must have the format "<beta> * <variable_name>". Associations are injected into one small region (the entorhinal cortex).
- location_association
Optional string or character vector. If specified, the association is only present within these ROIs. Vertex values outside these ROIs are simulated with no relationship to any of the predictors. The region locations are extracted from the annotation files that are distributed with FreeSurfer and are saved internally in R/sysdata.rda.
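The two accepted forms of simulate_association can be checked before calling the function. The sketch below uses base R only, and both helpers (count_observations and parse_association) are hypothetical and not part of the package. It assumes that one file is generated per subject and session, which may not match the actual file count (for example if files are written per hemisphere), and that the phenotype table has a column named age.

```r
# Default cohort layout taken from the Usage section.
data_structure <- list(
  cohort1 = list(sessions = c("01", "02", "03"), n_subjects = 10),
  cohort2 = list(sessions = c("01", "02"), n_subjects = 20)
)

# Hypothetical helper: observations per cohort, assuming every subject
# is observed at every listed session.
count_observations <- function(data_structure) {
  vapply(
    data_structure,
    function(cohort) as.integer(cohort$n_subjects * length(cohort$sessions)),
    integer(1)
  )
}

obs_per_cohort <- count_observations(data_structure)
total_obs <- sum(obs_per_cohort)

# Numeric form: one value per observation (length is an assumption, see above).
set.seed(3108)
numeric_association <- rnorm(total_obs)

# Hypothetical helper: split "<beta> * <variable_name>" into its components.
parse_association <- function(x) {
  pattern <- paste0(
    "^[[:space:]]*([-+]?[0-9]*\\.?[0-9]+)[[:space:]]*\\*",
    "[[:space:]]*([A-Za-z.][A-Za-z0-9._]*)[[:space:]]*$"
  )
  m <- regmatches(x, regexec(pattern, x))[[1]]
  if (length(m) != 3) {
    stop("Expected the format '<beta> * <variable_name>', got: ", x)
  }
  list(beta = as.numeric(m[2]), variable = m[3])
}

# Character form: "age" is an assumed phenotype column name.
assoc <- parse_association("0.05 * age")
print(total_obs)
print(assoc)
```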
- overwrite
Logical (default = TRUE). Whether to overwrite an existing phenotype file.
- seed
Integer (default = 3108). Random seed.
- verbose
Logical (default = TRUE). If TRUE, print progress messages.
- ...
Additional arguments passed to simulate_freesurfer_data.
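A minimal usage sketch follows, built only from the arguments documented above. It assumes that simulate_dataset() is exported by an installed package named verywise; the cohort names site_a and site_b and the output folder name are made up for illustration. The call is skipped when the package is not available.

```r
# Small, fast configuration: two cohorts and the coarsest template.
sim_args <- list(
  path = file.path(tempdir(), "simulated_data"),
  data_structure = list(
    site_a = list(sessions = c("01", "02"), n_subjects = 3),
    site_b = list(sessions = c("01"), n_subjects = 2)
  ),
  fs_template = "fsaverage3",
  seed = 3108,
  verbose = FALSE
)

# Assumption: the function lives in a package called "verywise".
if (requireNamespace("verywise", quietly = TRUE)) {
  do.call(verywise::simulate_dataset, sim_args)

  # The phenotype table is documented as being saved in the path directory.
  pheno <- read.csv(file.path(sim_args$path, "phenotype.csv"))
  str(pheno)
}
```

Writing to tempdir() keeps the simulated files out of the working directory, and fixing seed makes repeated runs comparable.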