
This function subsets a supersubject matrix stored as a file-backed big matrix (FBM) created with build_supersubject, retaining only the rows that correspond to a given set of folder (row) IDs. It reads the row names from the associated .rownames.csv file, checks whether any requested IDs are missing, and logs any that are not found. The subsetted matrix can optionally be saved as an .rds file for future use.

Usage

subset_supersubject(
  supsubj_dir,
  supsubj_file,
  folder_ids,
  error_cutoff = 20,
  new_supsubj_dir,
  save_rds = FALSE,
  verbose = TRUE
)

Arguments

supsubj_dir

Character string giving the path to the directory that contains the supersubject files, i.e., the supersubject matrix itself (.rds file) and the associated .bk and .rownames.csv files.

supsubj_file

Character string indicating the name of the supersubject .rds file. Must follow the naming pattern "<hemi>.<measure>.<fs_template>.supersubject.rds".

folder_ids

Character vector of folder IDs to retain in the new supersubject matrix. The same IDs should also appear as a column in the phenotype dataset.

error_cutoff

Integer giving the maximum number of missing IDs allowed before the function throws an error. If some IDs are missing but their number is <= error_cutoff, a warning is issued instead. Default: 20.

new_supsubj_dir

Character string giving the path to the directory where the new supersubject files are stored (temporarily, or permanently if save_rds = TRUE). The directory is created if it does not exist.

save_rds

Logical. If TRUE, the new supersubject matrix is also saved as an .rds file inside new_supsubj_dir. Default: FALSE.

verbose

Logical. If TRUE, progress messages are printed. Default: TRUE.
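The naming pattern required by supsubj_file can be checked before any file is read. The sketch below uses a hypothetical helper, `parse_supsubj_file()`, which is not part of the package. It assumes that none of the three components contains a dot, so measure names that do (for example FreeSurfer's `area.pial`) would need a different rule.

```r
# Hypothetical helper (not part of the package): split a supersubject file
# name of the form "<hemi>.<measure>.<fs_template>.supersubject.rds" into
# its components. Assumes no component contains a dot.
parse_supsubj_file <- function(supsubj_file) {
  pattern <- "^([^.]+)\\.([^.]+)\\.([^.]+)\\.supersubject\\.rds$"
  if (!grepl(pattern, supsubj_file)) {
    stop("'", supsubj_file, "' does not match ",
         "'<hemi>.<measure>.<fs_template>.supersubject.rds'")
  }
  # regexec() + regmatches() return the full match followed by each group
  parts <- regmatches(supsubj_file, regexec(pattern, supsubj_file))[[1]]
  list(hemi = parts[2], measure = parts[3], fs_template = parts[4])
}

# Returns hemi = "lh", measure = "thickness", fs_template = "fsaverage"
parse_supsubj_file("lh.thickness.fsaverage.supersubject.rds")
```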

Value

An FBM object containing the subsetted supersubject matrix.

Details

The function performs the following steps:

1. Reads row names from the supersubject's `.csv` file.
2. Checks whether all `folder_ids` exist in the supersubject.
3. Logs missing IDs to `issues.log` in `new_supsubj_dir`.
4. If the number of missing IDs exceeds `error_cutoff`, stops with an error.
5. Creates a new FBM with only the matching rows, writing it blockwise to avoid excessive RAM usage.
6. Writes the filtered row names to `ss.rownames.csv` in `new_supsubj_dir`.
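Steps 2 to 5 can be illustrated with a minimal base-R sketch. Ordinary in-memory matrices stand in for the FBMs, so it shows only the logic and the access pattern, not the package's implementation. The function `subset_rows_sketch()` and its `log_file` and `block_size` arguments are hypothetical; the one-ID-per-line log format and returning rows in the order of `folder_ids` are assumptions.

```r
# Minimal sketch of steps 2-5 with in-memory matrices in place of FBMs.
# Names and arguments are hypothetical, not the package's API.
subset_rows_sketch <- function(src, src_rownames, folder_ids,
                               error_cutoff = 20, log_file = NULL,
                               block_size = 1000) {
  # Step 2: which requested IDs are absent from the supersubject?
  missing_ids <- setdiff(folder_ids, src_rownames)
  n_missing <- length(missing_ids)

  # Step 3: log missing IDs, one per line (log format is an assumption)
  if (n_missing > 0 && !is.null(log_file)) {
    cat(paste0(missing_ids, "\n"), file = log_file, sep = "", append = TRUE)
  }

  # Step 4: error above the cutoff, warning at or below it
  if (n_missing > error_cutoff) {
    stop(n_missing, " IDs not found (error_cutoff = ", error_cutoff, ").")
  } else if (n_missing > 0) {
    warning(n_missing, " IDs not found: ", paste(missing_ids, collapse = ", "))
  }

  # Step 5: copy matching rows one block of columns at a time. With a
  # file-backed destination only the current block would need to be in RAM;
  # here every matrix is in memory, so this only illustrates the pattern.
  rows <- match(folder_ids, src_rownames)
  rows <- rows[!is.na(rows)]  # keeps the order of folder_ids (an assumption)
  dst <- matrix(NA_real_, nrow = length(rows), ncol = ncol(src))
  col_idx <- seq_len(ncol(src))
  for (cols in split(col_idx, ceiling(col_idx / block_size))) {
    dst[, cols] <- src[rows, cols, drop = FALSE]
  }
  list(matrix = dst, rownames = src_rownames[rows])
}

# Usage: 4 subjects x 5 vertices; keep two known IDs and one unknown ID
src <- matrix(as.numeric(1:20), nrow = 4)
res <- suppressWarnings(
  subset_rows_sketch(src, c("sub-01", "sub-02", "sub-03", "sub-04"),
                     c("sub-03", "sub-01", "sub-99"), block_size = 2)
)
res$rownames  # "sub-03" "sub-01"
```

Copying by column blocks rather than all at once keeps peak memory proportional to the number of retained rows times `block_size`, which is the motivation behind step 5.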

Examples

if (FALSE) { # \dontrun{
# Subset a supersubject to the IDs in a phenotype data frame
# (assumes `pheno_data` has a `folder_id` column)
ss_subset <- subset_supersubject(
  supsubj_dir = "path/to/original/ss/",
  supsubj_file = "<hemi>.<measure>.<fs_template>.supersubject.rds",
  folder_ids = pheno_data$folder_id,
  error_cutoff = 20,
  new_supsubj_dir = "path/to/subsetted/ss/",
  save_rds = TRUE
)
} # }