sushie.cli.process_raw(rawData: List[RawData], keep_subject: List[str], pi: DataFrame, keep_ambiguous: bool, maf: float, rint: bool, no_regress: bool, mega: bool, cv: bool, cv_num: int, seed: int) → Tuple[DataFrame, CleanData, CleanData | None, List[CVData] | None]

Process raw phenotype, genotype, and covariate data across ancestries.

Parameters:
rawData: List[RawData]

Raw phenotype, genotype, and covariate data, one entry per ancestry.

keep_subject: List[str]

The list of subject IDs on which fine-mapping is performed.

pi: DataFrame

The DataFrame that contains the prior weight for each SNP being causal.

keep_ambiguous: bool

Whether to keep ambiguous SNPs.

maf: float

The minor allele frequency (MAF) threshold used to filter the genotypes.

rint: bool

Whether to apply a rank-based inverse normal transformation to each phenotype.

no_regress: bool

Whether to skip regressing the genotypes on the covariates.

mega: bool

Whether to prepare datasets for mega SuShiE.

cv: bool

Whether to prepare datasets for cross-validation.

cv_num: int

The number of folds \(X\) for \(X\)-fold cross-validation.

seed: int

The random seed for shuffling the rows of the datasets for cross-validation.
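
The filtering and normalization options among the parameters (keep_ambiguous, maf, rint) can be pictured with a minimal standard-library sketch. This is not SuShiE's implementation. The helper names is_ambiguous, minor_allele_freq, and rank_inverse_normal are hypothetical, and the sketch assumes that "ambiguous" means strand-ambiguous A/T and C/G SNPs, that genotypes are 0/1/2 allele counts, that SNPs are kept when their MAF is strictly above the threshold, and that ranks use a Blom offset of 3/8.

```python
from statistics import NormalDist
from typing import List

# Complementary base pairs. A SNP whose two alleles are complements of each
# other reads the same on both strands (assumed meaning of "ambiguous").
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_ambiguous(a1: str, a2: str) -> bool:
    """Return True for strand-ambiguous SNPs (A/T or C/G)."""
    return COMPLEMENT.get(a1.upper()) == a2.upper()


def minor_allele_freq(dosages: List[int]) -> float:
    """MAF of one SNP from 0/1/2 allele counts across diploid subjects."""
    freq = sum(dosages) / (2 * len(dosages))
    return min(freq, 1.0 - freq)


def rank_inverse_normal(values: List[float], offset: float = 3 / 8) -> List[float]:
    """Map values to standard normal quantiles by rank (ties are not averaged)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        out[idx] = NormalDist().inv_cdf((rank - offset) / (n - 2 * offset + 1))
    return out


# Toy input: (allele 1, allele 2, allele counts for four subjects).
snps = [
    ("A", "T", [0, 1, 2, 1]),  # strand-ambiguous
    ("A", "G", [0, 0, 0, 1]),  # MAF = 1/8
    ("C", "T", [1, 1, 2, 0]),  # MAF = 1/2
]
maf_threshold = 0.2
kept = [
    s for s in snps
    if not is_ambiguous(s[0], s[1]) and minor_allele_freq(s[2]) > maf_threshold
]

# The middle-ranked phenotype maps to the median of the standard normal.
transformed = rank_inverse_normal([3.0, 1.0, 2.0])
```

With keep_ambiguous set, the is_ambiguous check would simply be skipped in this sketch.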

Returns:

A tuple of
  1. SNP information (pd.DataFrame),

  2. dataset for running SuShiE (io.CleanData),

  3. dataset for mega SuShiE (Optional[io.CleanData]),

  4. dataset for cross-validation (Optional[List[io.CVData]]).

Return type:

Tuple[pd.DataFrame, io.CleanData, Optional[io.CleanData], Optional[List[io.CVData]]]
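
As a rough illustration of how seed and cv_num could jointly determine a cross-validation split, the sketch below shuffles subject rows with a seeded generator and deals them into cv_num folds. It is an assumption-laden example, not SuShiE's code: make_folds is a hypothetical helper, the fields of io.CVData are not shown on this page, and the round-robin assignment is just one reasonable way to balance fold sizes.

```python
import random
from typing import List, Tuple


def make_folds(n_subjects: int, cv_num: int, seed: int) -> List[Tuple[List[int], List[int]]]:
    """Return (train_rows, test_rows) index lists for each of cv_num folds."""
    rows = list(range(n_subjects))
    random.Random(seed).shuffle(rows)  # seeded, so the split is reproducible
    # Deal the shuffled rows round-robin so fold sizes differ by at most one.
    chunks = [rows[k::cv_num] for k in range(cv_num)]
    folds = []
    for k in range(cv_num):
        test = chunks[k]
        train = [r for j, chunk in enumerate(chunks) if j != k for r in chunk]
        folds.append((train, test))
    return folds


folds = make_folds(n_subjects=10, cv_num=5, seed=12345)
```

Each subject appears in exactly one test set, and calling make_folds again with the same seed reproduces the same split.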


Last update: Oct 27, 2024