sushie.cli.process_raw

Process raw phenotype, genotype, and covariate data across ancestries.

Parameters:

rawData: List[RawData]: Raw data for phenotypes, genotypes, and covariates across ancestries.
keep_subject: List[str]: The subject IDs on which fine-mapping is performed.
pi: DataFrame: The DataFrame that contains the prior weight for each SNP to be causal.
keep_ambiguous: bool: Indicator for whether to keep ambiguous SNPs.
maf: float: The minor allele frequency threshold used to filter the genotypes.
rint: bool: Indicator for whether to apply the rank inverse normal transformation to each phenotype.
no_regress: bool: Indicator for whether to regress genotypes on covariates.
mega: bool: Indicator for whether to prepare datasets for mega SuShiE.
cv: bool: Indicator for whether to prepare datasets for cross-validation.
cv_num: int: The number of folds \(X\) for \(X\)-fold cross-validation.
seed: int: The random seed for row-wise shuffling of the datasets for cross-validation.
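
To make keep_ambiguous, maf, and rint more concrete, the following is a minimal standalone sketch of the three underlying ideas: flagging strand-ambiguous SNPs, computing a minor allele frequency, and applying a rank inverse normal transformation. It is not SuShiE's implementation. The helper names (is_ambiguous, minor_allele_freq, rank_inverse_normal), the A/T and C/G definition of "ambiguous", the 0/1/2 dosage coding, and the Blom offset of 3/8 are all assumptions made for illustration.

```python
from statistics import NormalDist
from typing import List

# Assumption: "ambiguous" means strand-ambiguous, i.e. the two alleles are
# complements of each other (A/T or C/G).
AMBIGUOUS_PAIRS = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}


def is_ambiguous(a0: str, a1: str) -> bool:
    """Return True if a biallelic SNP is strand-ambiguous."""
    return (a0.upper(), a1.upper()) in AMBIGUOUS_PAIRS


def minor_allele_freq(dosages: List[float]) -> float:
    """Minor allele frequency of one SNP from 0/1/2 allele dosages."""
    freq = sum(dosages) / (2 * len(dosages))
    return min(freq, 1 - freq)


def rank_inverse_normal(values: List[float], c: float = 3.0 / 8.0) -> List[float]:
    """Rank-based inverse normal transform; tied values share an average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks in the run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    normal = NormalDist()
    # Map each rank to a quantile strictly inside (0, 1), then to a z-score.
    return [normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]
```

With helpers like these, a SNP would be dropped when its minor allele frequency falls below the maf threshold, or when it is ambiguous and keep_ambiguous is false. Whether the real threshold comparison is strict or inclusive is not stated in this page.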

Returns:

A tuple of

Return type:

Tuple[pd.DataFrame, io.CleanData, Optional[io.CleanData], Optional[List[io.CVData]]]
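
The optional List[io.CVData] element of the return type relates to the cv, cv_num, and seed parameters. As a rough illustration of how a seed and a fold count can determine a reproducible row-wise split, here is a small standalone sketch. The function make_folds and its index-based output are hypothetical and do not reflect the structure of io.CVData or SuShiE's actual splitting logic.

```python
import random
from typing import List, Tuple


def make_folds(n_rows: int, cv_num: int, seed: int) -> List[Tuple[List[int], List[int]]]:
    """Shuffle row indices with a fixed seed and build (train, valid) index pairs."""
    if not 2 <= cv_num <= n_rows:
        raise ValueError("cv_num must be between 2 and n_rows")
    idx = list(range(n_rows))
    # A dedicated Random instance keeps the shuffle reproducible for a given seed.
    random.Random(seed).shuffle(idx)
    # Fold k takes every cv_num-th shuffled index, so fold sizes differ by at most one.
    chunks = [idx[k::cv_num] for k in range(cv_num)]
    folds = []
    for k in range(cv_num):
        valid = chunks[k]
        train = [i for j, chunk in enumerate(chunks) if j != k for i in chunk]
        folds.append((train, valid))
    return folds


# Example: 10 subjects split into 5 folds with a fixed seed.
folds = make_folds(n_rows=10, cv_num=5, seed=12345)
```

Each row index appears in exactly one validation set, and calling make_folds again with the same seed yields the same split.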

Last update: Oct 27, 2024