- sushie.cli.process_raw(rawData: List[RawData], keep_subject: List[str], pi: DataFrame, keep_ambiguous: bool, maf: float, rint: bool, no_regress: bool, mega: bool, cv: bool, cv_num: int, seed: int) → Tuple[DataFrame, CleanData, CleanData | None, List[CVData] | None]
Process raw phenotype, genotype, and covariate data across ancestries.
- Parameters:
- rawData: List[RawData]
Raw phenotype, genotype, and covariate data across ancestries.
- keep_subject: List[str]
The list of subject IDs on which fine-mapping is performed.
- pi: DataFrame
The DataFrame that contains prior weights for each SNP to be causal.
- keep_ambiguous: bool
The indicator whether to keep ambiguous SNPs.
- maf: float
The minor allele frequency threshold to filter the genotypes.
- rint: bool
The indicator whether to apply a rank-based inverse normal transformation to each phenotype.
- no_regress: bool
The indicator whether to skip regressing the covariates out of the genotypes.
- mega: bool
The indicator whether to prepare datasets for mega SuShiE.
- cv: bool
The indicator whether to prepare datasets for cross-validation.
- cv_num: int
The number of folds \(X\) for \(X\)-fold cross-validation.
- seed: int
The random seed for row-wise shuffling of the datasets for cross-validation.
- Returns:
- A tuple of SNP information (pd.DataFrame), dataset for running SuShiE (io.CleanData), dataset for mega SuShiE (Optional[io.CleanData]), and dataset for cross-validation (Optional[List[io.CVData]]).
- Return type:
Tuple[pd.DataFrame, io.CleanData, Optional[io.CleanData], Optional[List[io.CVData]]]
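To make the keep_ambiguous, maf, rint, cv_num, and seed parameters more concrete, the following is a minimal standard-library sketch of the kind of step each one controls. Every helper name here (is_ambiguous, minor_allele_freq, rank_inverse_normal, cv_folds) is hypothetical, and the rank offset c = 0.375 is an assumption; this is not SuShiE's implementation, and process_raw may differ in detail.

```python
import random
from statistics import NormalDist
from typing import List, Sequence, Tuple

# Hypothetical helpers for illustration only; these are NOT SuShiE functions.

AMBIGUOUS_PAIRS = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}


def is_ambiguous(a0: str, a1: str) -> bool:
    """Return True for strand-ambiguous allele pairs (A/T or C/G)."""
    return (a0.upper(), a1.upper()) in AMBIGUOUS_PAIRS


def minor_allele_freq(dosages: Sequence[float]) -> float:
    """Minor allele frequency of one SNP coded as 0/1/2 allele counts."""
    freq = sum(dosages) / (2 * len(dosages))
    return min(freq, 1 - freq)


def rank_inverse_normal(values: Sequence[float], c: float = 0.375) -> List[float]:
    """Map values to standard normal quantiles by rank.

    The offset c is an assumed Blom-style constant. Ties are broken by
    input order in this sketch rather than averaged.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        out[idx] = NormalDist().inv_cdf((rank - c) / (n - 2 * c + 1))
    return out


def cv_folds(n: int, cv_num: int, seed: int) -> List[Tuple[List[int], List[int]]]:
    """Shuffle row indices with a fixed seed, then build cv_num (train, test) splits."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Fold k takes every cv_num-th shuffled index, starting at position k.
    folds = [idx[k::cv_num] for k in range(cv_num)]
    return [
        ([i for j, fold in enumerate(folds) if j != k for i in fold], folds[k])
        for k in range(cv_num)
    ]


# Example: keep SNPs that are not strand-ambiguous and pass a MAF threshold.
snps = [("A", "G", [0, 1, 2, 2]), ("A", "T", [0, 1, 1, 2]), ("C", "T", [0, 0, 0, 0])]
maf_threshold = 0.01
kept = [
    (a0, a1)
    for a0, a1, dosages in snps
    if not is_ambiguous(a0, a1) and minor_allele_freq(dosages) > maf_threshold
]
```

Because the shuffle uses its own seeded random.Random instance, calling cv_folds twice with the same seed yields the same folds, which is the reproducibility that a seed argument is meant to provide.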
Last update:
Oct 27, 2024