
Simulate a dataset with multiple cohorts (sites) and multiple timepoints (sessions). This generates:

  • a pheno data.frame with mock participant sex and age data, saved as a "phenotype.csv" file in the path directory;

  • FreeSurfer data saved as .mgh files, organised in a verywise folder structure.

Usage

simulate_dataset(
  path,
  data_structure = list(cohort1 = list(sessions = c("01", "02", "03"), n_subjects = 10),
    cohort2 = list(sessions = c("01", "02"), n_subjects = 20)),
  simulate_association = 0.05 * pheno$age,
  overwrite = TRUE,
  seed = 31081996,
  ...
)
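A minimal sketch of a call, assuming the verywise package is installed. The temporary directory, the small two-cohort structure and the seed below are illustrative choices, not package defaults; keeping n_subjects small should keep the simulation quick. The call is skipped if verywise is not available.

```r
# Illustrative output location inside the session's temporary directory
sim_path <- file.path(tempdir(), "verywise_sim")
dir.create(sim_path, recursive = TRUE, showWarnings = FALSE)

if (requireNamespace("verywise", quietly = TRUE)) {
  verywise::simulate_dataset(
    path = sim_path,
    # Small, illustrative structure: two cohorts with few subjects each
    data_structure = list(
      cohort1 = list(sessions = c("01", "02"), n_subjects = 3),
      cohort2 = list(sessions = c("01"), n_subjects = 2)
    ),
    seed = 123
  )
  # According to this page, the phenotype file is written to `path`
  print(file.exists(file.path(sim_path, "phenotype.csv")))
}
```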

Arguments

path

Path to the directory where the dataset should be created.

data_structure

A nested list whose top-level elements define the cohorts (datasets/sites). Each cohort is itself a list with two elements: "sessions", a vector of session names/numbers; and "n_subjects", an integer giving the number of subjects.
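The sketch below builds the default data_structure from the Usage section in plain base R and checks that each cohort has the two expected elements. It then counts the subject-session combinations the structure describes, under the assumption (not stated on this page) that every subject is observed at every session of its cohort.

```r
# Default structure, copied from the Usage section
data_structure <- list(
  cohort1 = list(sessions = c("01", "02", "03"), n_subjects = 10),
  cohort2 = list(sessions = c("01", "02"), n_subjects = 20)
)

# Each cohort must provide "sessions" and "n_subjects"
stopifnot(all(vapply(
  data_structure,
  function(site) all(c("sessions", "n_subjects") %in% names(site)),
  logical(1)
)))

# Subject-session combinations per cohort
# (assumes every subject is observed at every session)
obs_per_cohort <- vapply(
  data_structure,
  function(site) site$n_subjects * length(site$sessions),
  numeric(1)
)
print(obs_per_cohort)       # cohort1 = 30, cohort2 = 40
print(sum(obs_per_cohort))  # 70
```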

simulate_association

(default = NULL) Simulate an association, specified in the form beta * variable. By default, the association is restricted to three regions: the superior temporal gyrus, the precentral gyrus and the middle temporal gyrus.
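To illustrate what an association "in the format beta * variable" means, here is a conceptual base-R sketch: a null outcome is simulated and beta * age is added to it, after which a linear model recovers a slope close to beta. The variable names, the age range and the noise level are hypothetical, and this is not how verywise implements the simulation; in particular, it does not reproduce the restriction of the effect to specific brain regions.

```r
set.seed(31081996)
n <- 5000

# Hypothetical phenotype data: age in years
pheno <- data.frame(age = runif(n, min = 8, max = 18))

# Null outcome, standing in for a brain measure with no age effect
baseline <- rnorm(n, mean = 2.5, sd = 0.5)

# Add an association of the form beta * variable
beta <- 0.05
pheno$outcome <- baseline + beta * pheno$age

# The estimated slope for age should be close to beta
fit <- lm(outcome ~ age, data = pheno)
estimated_beta <- unname(coef(fit)["age"])
print(round(estimated_beta, 3))
```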

overwrite

(default = TRUE) Whether the phenotype file should be overwritten.

seed

(default = 31081996) Seed used for randomization, so that the simulated dataset is reproducible.

...

Other arguments passed on to simulate_freesurfer_data.

Author

Serena Defina, 2024.