Users Manual

Initialize Environment

SuShiE is a command-line tool written in Python. Before installation, we recommend creating a new conda environment so that SuShiE does not affect the software versions of users’ other projects.

SuShiE uses JAX with just-in-time (JIT) compilation to achieve high-speed computation. However, JAX has known issues on Macs with the M1 chip. To work around them, users need to set up conda using miniforge and then install SuShiE using pip in the desired environment.

Installation

Users can download the latest repository and then use pip:

git clone https://github.com/mancusolab/sushie.git
cd sushie
pip install .

We currently support only Python 3.8+.

Data Preparation

Fine-mapping using individual-level data

To fine-map using individual-level data, SuShiE requires at least phenotype and genotype data; covariates are optional.

Although we highly recommend that users perform high-quality QC on their own genotype, phenotype, and covariate data, the software implements the following basic QC steps:

  1. Remove subjects with N/A values in either the phenotype or the covariate data.

  2. Remove SNPs for which all subjects have N/A values.

  3. For SNPs where only some subjects have N/A values, impute the missing values as two times the allele frequency.

  4. Only keep subjects who have data in all the genotype, phenotype, and covariate data.

  5. Only keep SNPs that are available in all the ancestries.

  6. Adjust genotype data across ancestries based on the same reference alleles. Drop non-biallelic SNPs.

  7. Remove SNPs that have minor allele frequency (MAF) less than 1% within each ancestry (users can change 1% with --maf).

  8. Users also have an option to keep ambiguous SNPs (i.e., A/T, T/A, C/G, or G/C) by specifying --keep-ambiguous (the default is NOT to keep them).

  9. For single ancestry SuSiE or Mega-SuSiE, users have the option to perform rank inverse normalization transformation on the phenotype data.

See sushie.cli.process_raw() for the source code of these QC steps.
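QC steps 2, 3, and 7 above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the code in sushie.cli.process_raw(): the function name impute_and_filter, the (subjects x SNPs) array layout, and the use of np.nan for N/A are all hypothetical choices made for the example.

```python
import numpy as np


def impute_and_filter(geno, maf=0.01):
    """Sketch of QC steps 2, 3, and 7 on one ancestry.

    geno: (n_subjects, n_snps) array of allele counts (0/1/2), np.nan = N/A.
    Returns the imputed, filtered matrix and the MAF keep-mask. The mask
    refers to the columns that remain after all-missing SNPs are dropped.
    """
    geno = np.asarray(geno, dtype=float)

    # Step 2: drop SNPs that are missing in every subject.
    all_missing = np.all(np.isnan(geno), axis=0)
    geno = geno[:, ~all_missing]

    # Step 3: allele frequency from the observed calls; filling with
    # 2 * freq is the same as filling with the observed column mean.
    freq = np.nanmean(geno, axis=0) / 2.0
    filled = np.where(np.isnan(geno), 2.0 * freq, geno)

    # Step 7: keep SNPs whose minor allele frequency is at least `maf`.
    minor = np.minimum(freq, 1.0 - freq)
    keep = minor >= maf
    return filled[:, keep], keep


nan = float("nan")
example = [
    [0, nan, 2, nan],
    [1, nan, 2, 0],
    [nan, nan, 2, 0],
    [1, nan, 2, 1],
]
filled, keep = impute_and_filter(example)
print(filled.shape, keep.tolist())
```

In the toy matrix, the second SNP is missing everywhere and is dropped first; the third SNP is monomorphic, so it fails the MAF filter.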

Fine-mapping using summary-level data (GWAS statistics)

To fine-map using summary-level data, SuShiE requires at least GWAS z statistics, sample sizes, and LD data. For LD data, users can either provide individual-level genotypes in PLINK 1.9, VCF, or BGEN format and let SuShiE compute the LD matrix, or provide a pre-computed LD matrix in tsv format.

Although we highly recommend that users perform high-quality QC on their own summary-level data, the software implements the following basic QC steps:

  1. Remove SNPs with N/A values in GWAS.

  2. Only keep SNPs that are available in all the ancestries.

  3. Adjust GWAS and genotype data across ancestries based on the same reference alleles. Drop non-biallelic SNPs.

  4. Remove SNPs (for LD computation) that have minor allele frequency (MAF) less than 1% within each ancestry (users can change 1% with --maf).

  5. Users also have an option to keep ambiguous SNPs (i.e., A/T, T/A, C/G, or G/C) by specifying --keep-ambiguous (the default is NOT to keep them).
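The ambiguous-SNP check and the allele alignment in the list above can be illustrated in a few lines. This is a sketch, not SuShiE's implementation: the helper names is_ambiguous and align_z are hypothetical, and the sign flip for swapped alleles is the conventional treatment of z statistics, assumed here rather than taken from the source code.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_ambiguous(a0, a1):
    """True for strand-ambiguous allele pairs: A/T, T/A, C/G, G/C."""
    return COMPLEMENT.get(a0.upper()) == a1.upper()


def align_z(z, a0, a1, ref_a0, ref_a1):
    """Orient a z statistic to the reference allele order.

    Returns z unchanged when the alleles match, -z when they are swapped,
    and None when the alleles cannot be matched (the SNP would be dropped).
    """
    if (a0, a1) == (ref_a0, ref_a1):
        return z
    if (a0, a1) == (ref_a1, ref_a0):
        return -z  # swapping the counted allele flips the sign
    return None


print(is_ambiguous("A", "T"), is_ambiguous("A", "G"))
print(align_z(2.0, "A", "G", "G", "A"))
```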

Testing Data

We provide example data in the ./data/ folder for testing SuShiE. The data cover three ancestries: 489 European individuals (EUR), 639 African individuals (AFR), and 481 East Asian individuals (EAS).

The genotype data are high-quality HapMap SNPs in a 1M base-pair window around a randomly chosen gene, containing 123, 129, and 113 SNPs for EUR, AFR, and EAS, respectively, in the 1000G project. We provide genotype data in plink 1, vcf, and bgen 1.3 formats.

Using ./data/make_example.py, we simulated phenotype data (2 causal QTLs, cis-SNP heritability of 0.5, and effect size correlation of 0.8) and random covariate data for each ancestry. The rsIDs of the two QTLs are rs1886340 and rs10914958. The script also outputs the all.pheno file, which row-binds the simulated phenotypes across ancestries; the all.ancestry.index file, which specifies the ancestry index when using all.pheno; the all.covar file; the ./plink/all triplet; and the keep.subject file, which specifies the subjects to be included in the inference.

For format requirements, see Parameters for detailed explanations.

Examples

SuShiE is easy to use because it has only one command, finemap. In this section, we walk through several examples of using SuShiE.

See Output Files for the detailed explanation of output files.

See Parameters for the detailed explanation of parameters.

We provide a bash script, ./misc/run_sushie.sh, that shows a more general workflow for running SuShiE.

If users still have questions, feel free to contact the developers.

Here are some examples for fine-mapping using individual-level data:

1. Two-Ancestry SuShiE

In this example, we perform two-ancestry SuShiE with covariates regressed out of both phenotypes and genotypes, while updating the prior effect size covariance matrix during the optimization.

cd ./data/
sushie finemap --pheno EUR.pheno AFR.pheno --vcf vcf/EUR.vcf vcf/AFR.vcf --covar EUR.covar AFR.covar --output ./test_result

2. \(N\)-Ancestry SuShiE

In the example below, we perform single-ancestry SuShiE, which is equivalent to the SuSiE model (see Reference).

cd ./data/
sushie finemap --pheno EUR.pheno --vcf vcf/EUR.vcf --covar EUR.covar --output ./test_result

Or three-ancestry setting:

cd ./data/
sushie finemap --pheno EUR.pheno AFR.pheno EAS.pheno --vcf vcf/EUR.vcf vcf/AFR.vcf vcf/EAS.vcf --covar EUR.covar AFR.covar EAS.covar --output ./test_result
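Because the single-, two-, and three-ancestry commands differ only in how many files follow each option, the argument list can be generated from a list of ancestry labels. The sketch below uses only the options shown in the examples above; the helper name build_finemap_cmd and the file naming pattern (taken from the ./data/ folder) are assumptions for illustration.

```python
import shlex


def build_finemap_cmd(ancestries, output="./test_result", with_covar=True):
    """Assemble a `sushie finemap` argument list for N ancestries.

    Every option receives its files in the same ancestry order,
    which is the order SuShiE expects across options.
    """
    cmd = ["sushie", "finemap"]
    cmd += ["--pheno"] + [f"{a}.pheno" for a in ancestries]
    cmd += ["--vcf"] + [f"vcf/{a}.vcf" for a in ancestries]
    if with_covar:
        cmd += ["--covar"] + [f"{a}.covar" for a in ancestries]
    cmd += ["--output", output]
    return cmd


# Print the command instead of running it, so it can be inspected first.
print(shlex.join(build_finemap_cmd(["EUR", "AFR", "EAS"])))
```

The returned list can be passed to subprocess.run from inside ./data/ once the command looks right.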

3. Can I use other formats of genotypes?

Yes! SuShiE accepts plink 1, vcf, or bgen, but not plink 2.

For plink 1, SuShiE reads in the triplet (bed, bim, and fam) by its prefix.

cd ./data/
sushie finemap --pheno EUR.pheno AFR.pheno --plink plink/EUR plink/AFR --output ./test_result

For bgen data, users need to make sure that the latter allele listed in the allele IDs is the reference allele.

cd ./data/
sushie finemap --pheno EUR.pheno AFR.pheno --bgen bgen/EUR.bgen bgen/AFR.bgen --output ./test_result

4. My data contains all the participants and I do not want to separate them

No problem! If all the subjects are in a single phenotype file, a single genotype file, and a single covariate file, users just need to use the --ancestry-index option to specify a file with the subject ID in the first column and the ancestry index in the second column. The ancestry indices have to run consecutively from 1 to the total number of ancestries.

cd ./data/
sushie finemap --pheno all.pheno --plink plink/all --ancestry-index all.ancestry.index --output ./test_result
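The requirement that ancestry indices run consecutively from 1 can be checked before calling SuShiE. The following is a small pre-flight sketch; the function check_ancestry_index is hypothetical and is not part of the SuShiE package.

```python
def check_ancestry_index(rows):
    """Validate (subject_id, ancestry_index) pairs from a two-column file.

    Returns the number of ancestries, or raises ValueError when the
    indices are not exactly 1, 2, ..., K.
    """
    indices = sorted({int(idx) for _, idx in rows})
    expected = list(range(1, len(indices) + 1))
    if indices != expected:
        raise ValueError(
            f"ancestry indices {indices} are not 1..{len(indices)}"
        )
    return len(indices)


rows = [("s1", "1"), ("s2", "2"), ("s3", "1")]
print(check_ancestry_index(rows))
```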

5. How about mega or meta SuShiE?

The software can run meta SuShiE and mega SuShiE when users add the --meta or --mega parameter.

We define the meta SuShiE as running single-ancestry SuShiE followed by meta analysis of the PIPs:

\[\text{PIP}_{\text{meta}} = 1 - \prod_{i=1}^k(1 - \text{PIP}_{\text{ancestry i}})\]

We define the mega SuShiE as running single-ancestry SuShiE on genotype and phenotype data that is row-wise stacked across ancestries.

cd ./data/
sushie finemap --pheno EUR.pheno AFR.pheno --vcf vcf/EUR.vcf vcf/AFR.vcf --meta --output ./test_result
cd ./data/
sushie finemap --pheno EUR.pheno AFR.pheno --vcf vcf/EUR.vcf vcf/AFR.vcf --mega --output ./test_result
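The meta-analysis formula for the PIPs can be evaluated directly. This sketch only illustrates the arithmetic of the formula; the function name meta_pip and the toy PIP values are made up for the example.

```python
import math


def meta_pip(pips_per_ancestry):
    """Combine per-ancestry PIPs as 1 - prod_i (1 - PIP_i) for each SNP.

    pips_per_ancestry: list of per-ancestry PIP lists in the same SNP order.
    """
    n_snps = len(pips_per_ancestry[0])
    return [
        1.0 - math.prod(1.0 - pips[j] for pips in pips_per_ancestry)
        for j in range(n_snps)
    ]


eur = [0.9, 0.1, 0.0]
afr = [0.5, 0.2, 0.0]
print(meta_pip([eur, afr]))
```

For the first SNP the result is 1 - (0.1 x 0.5) = 0.95, so the combined PIP is never smaller than the largest single-ancestry PIP.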

6. Let’s estimate heritability, run CV, and make FUSION files!

SuShiE incorporates code from limix to estimate the narrow-sense cis-heritability (\(h_g^2\)) when users specify --her.

SuShiE also provides an option (--cv) to perform \(X\)-fold cross-validation (CV; --cv-num X) on the ancestry-specific prediction weights and to compute the out-of-sample \(r^2\) between predicted and measured expression, together with its \(p\)-value.

Specifically, we randomly (--seed [YOUR SEED]) and equally divide the dataset into X portions. In turn, we treat each portion as the validation dataset and the remaining X-1 portions as the training dataset. We thus run SuShiE on the training datasets X times and predict the expression on the corresponding validation dataset each time. Last, we row-wise stack all X sets of predicted expression and compute the \(r^2\) against the row-wise stacked and matched validation data.

With these two pieces of information (\(h_g^2\) and CV), the R script ./misc/make_FUSION.R generates FUSION-format prediction weights, which can then be used in TWAS.
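The cross-validation procedure can be sketched as below. This is an illustration of the splitting and stacking logic only: ordinary least squares stands in for the SuShiE fit, and the names cross_validated_r2, fit_predict, and least_squares are hypothetical. How SuShiE itself uses the seed internally may differ.

```python
import numpy as np


def cross_validated_r2(X, y, fit_predict, n_folds=5, seed=12345):
    """Out-of-sample r^2 from n_folds-fold cross-validation.

    fit_predict(X_train, y_train, X_test) must return predictions for
    X_test; here it is a placeholder for fitting the fine-mapping model.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    folds = np.array_split(order, n_folds)  # near-equal random portions

    pred = np.empty(len(y), dtype=float)
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        # Writing predictions back to their original rows is equivalent
        # to stacking the folds and matching them to the measured values.
        pred[test] = fit_predict(X[train], y[train], X[test])

    r = np.corrcoef(pred, y)[0, 1]
    return r ** 2


def least_squares(X_train, y_train, X_test):
    """Stand-in predictor: ordinary least squares without an intercept."""
    beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    return X_test @ beta


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=50)
print(cross_validated_r2(X, y, least_squares))
```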

cd ./data/
sushie finemap --pheno EUR.pheno AFR.pheno --vcf vcf/EUR.vcf vcf/AFR.vcf --cv --her --output ./test_result
Rscript ./misc/make_FUSION.R ./test_result ~

7. I don’t want to scale my phenotype by its standard deviation

Fine-mapping inference can sometimes be sensitive to whether the phenotypes and genotypes are scaled. By default, SuShiE scales the phenotypes and genotypes by their respective standard deviations. To disable scaling, simply add --no-scale to the command.

cd ./data/
sushie finemap --pheno EUR.pheno AFR.pheno --vcf vcf/EUR.vcf vcf/AFR.vcf --no-scale --output ./test_result

8. I have my own initial values for the hyperparameters

SuShiE has three hyperparameters (Model Description): the residual variance (\(\sigma^2_e\)) prior, the QTL effect size variance (\(\sigma^2_{i,b}\)) prior, and the ancestral effect size correlation (\(\rho\)) prior. SuShiE by default initializes them as 0.001, 0.001, and 0.1 (see Parameters). If users have their own initial values, they can specify them with --resid-var, --effect-var, and --rho. The ancestry order has to match the phenotype file order.

cd ./data/
sushie finemap --pheno EUR.pheno AFR.pheno --vcf vcf/EUR.vcf vcf/AFR.vcf --resid-var 2.2 2.2 --effect-var 1.2 3.4 --rho 0.2 --output ./test_result

By default, SuShiE will update \(\sigma^2_{i,b}\) and \(\rho\) during the optimization. If users want to disable it, add --no-update to the command line.

cd ./data/
sushie finemap --pheno EUR.pheno AFR.pheno --vcf vcf/EUR.vcf vcf/AFR.vcf --resid-var 2.2 2.2 --effect-var 1.2 3.4 --rho 0.2 --no-update --output ./test_result

In addition, with --no-update, if users specify --effect-var but not --rho, --effect-var will be fixed during the optimization while --rho will be updated, and vice versa. In other words, to fix both priors, users have to specify both at the same time or specify neither of them (in the latter case, the default values of 0.001 and 0.1 listed in Parameters are fixed as the priors).
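To see how --effect-var and --rho relate to a prior effect size covariance matrix, consider the two-ancestry case. The sketch assumes the usual parameterization, in which the off-diagonal entry is rho times the product of the two standard deviations; the Model Description remains the authoritative definition, and the function name prior_effect_covariance is hypothetical.

```python
import numpy as np


def prior_effect_covariance(effect_var, rho):
    """Two-ancestry prior covariance from variances and a correlation.

    Assumed form: variances on the diagonal and
    rho * sqrt(var_1 * var_2) off the diagonal.
    """
    v1, v2 = effect_var
    off = rho * np.sqrt(v1 * v2)
    return np.array([[v1, off], [off, v2]])


# Values from the example command: --effect-var 1.2 3.4 --rho 0.2
cov = prior_effect_covariance((1.2, 3.4), 0.2)
print(cov)

# With --rho 0 the matrix is diagonal: no correlation across ancestries.
print(prior_effect_covariance((1.2, 3.4), 0.0))
```

For any rho strictly between -1 and 1 and positive variances, this matrix is positive definite.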

9. What if I assume no correlation across ancestries?

A key feature of SuShiE is that it accounts for the ancestral quantitative trait loci (QTL) effect size correlation (\(\rho\) in Model Description) in the inference, unlike other SuSiE-extended multi-ancestry fine-mapping frameworks that assume no ancestral correlation (Joint SuShiE). However, SuShiE can also make inference assuming no correlation across ancestries: specify --no-update for the effect size covariance matrix and fix rho at zero with --rho 0. With these options, the effect size variance (\(\sigma^2_{i,b}\) in Model Description) will be updated while rho will not.

cd ./data/
sushie finemap --pheno EUR.pheno AFR.pheno --vcf vcf/EUR.vcf vcf/AFR.vcf --no-update --rho 0 --output ./test_result

10. I want to improvise in post-hoc analysis

We understand that the Output Files produced by SuShiE may not serve all users’ post-hoc analyses. Therefore, we add the option to save all the inference results in a *.npy file by specifying --numpy.

The *.npy files include SNP information, prior estimators, posterior estimators, credible sets, PIPs, and sample size.

cd ./data/
sushie finemap --pheno EUR.pheno AFR.pheno --vcf vcf/EUR.vcf vcf/AFR.vcf --numpy --output ./test_result

11. I seek to use GPU or TPU to make inference faster

SuShiE uses JAX with just-in-time compilation to achieve high-speed computation. JAX can run on GPU or TPU.

cd ./data/
sushie finemap --pheno EUR.pheno AFR.pheno --vcf vcf/EUR.vcf vcf/AFR.vcf --platform gpu --output ./test_result

12. I want to use 32-bit precision

SuShiE uses 64-bit precision by default to ensure accurate inference. However, users who want 32-bit precision can specify --precision 32.

Unless it is strictly necessary, we do not recommend using 32-bit precision, as it may lead to a non-positive semi-definite effect size covariance prior or a decreasing ELBO, which ends the inference early.

cd ./data/
sushie finemap --pheno EUR.pheno AFR.pheno --vcf vcf/EUR.vcf vcf/AFR.vcf --precision 32 --output ./test_result
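A short NumPy experiment shows why 32-bit precision can make a quantity such as the ELBO appear to stall or decrease. The numbers are arbitrary and are not taken from a SuShiE run; the point is only that small increments to a large value are lost in 32-bit arithmetic.

```python
import numpy as np

big = 1234567.0   # stands in for a large objective value
small = 0.01      # stands in for a small improvement per iteration

sum32 = np.float32(big) + np.float32(small)
sum64 = np.float64(big) + np.float64(small)

# In 32-bit, neighbouring representable values near `big` are 0.125
# apart, so the increment is rounded away entirely.
print(sum32 == np.float32(big))
# In 64-bit, the increment is retained.
print(sum64 > big)
```

When increments vanish like this, rounding noise elsewhere in the computation can push the objective slightly downward, which triggers the early stop described above.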

13. I want to run fine-mapping on certain subjects

Users can use the --keep option to specify a file that contains the subject IDs. The fine-mapping inference is then performed only on the subjects listed in the file.

cd ./data/
sushie finemap --pheno EUR.pheno AFR.pheno --vcf vcf/EUR.vcf vcf/AFR.vcf --keep keep.subject --output ./test_result

14. I want to assign the prior weights for each SNP

Users can use the --pi option to specify a tsv file that contains the SNP IDs and their prior weights. The weights will be normalized to sum to 1 before inference.

cd ./data/
sushie finemap --pheno EUR.pheno AFR.pheno --vcf vcf/EUR.vcf vcf/AFR.vcf --pi prior_weights --output ./test_result
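The normalization of the prior weights can be illustrated as follows. The sketch follows the behaviour described in Parameters for --pi (positive weights, SNPs missing from the file receive the average of the supplied weights, then everything is normalized to sum to 1); the function name build_prior_weights and the SNP IDs are invented for the example, and this is not the package's own code.

```python
def build_prior_weights(snp_ids, weights):
    """Return normalized prior weights in the order of snp_ids.

    snp_ids: SNP IDs in the fine-mapping region.
    weights: dict mapping SNP ID to a positive prior weight.
    """
    known = [weights[s] for s in snp_ids if s in weights]
    if any(w <= 0 for w in known):
        raise ValueError("prior weights must be positive")

    # SNPs absent from the file get the average of the supplied weights;
    # with no supplied weights at all, every SNP gets the same value.
    fill = sum(known) / len(known) if known else 1.0
    raw = [weights.get(s, fill) for s in snp_ids]

    total = sum(raw)
    return [w / total for w in raw]


snps = ["snp_a", "snp_b", "snp_c", "snp_d"]
print(build_prior_weights(snps, {"snp_a": 3, "snp_b": 1}))
# -> [0.375, 0.125, 0.25, 0.25]
```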

Here are some examples for fine-mapping using summary-level data:

  1. I want to perform fine-mapping on summary-level data and I provide individual-level reference panels for LD.

cd ./data/
sushie finemap --summary --gwas EUR.gwas AFR.gwas --vcf vcf/EUR.vcf vcf/AFR.vcf --sample-size 489 639 --output ./test_result

  2. I want to perform fine-mapping on summary-level data and I provide a pre-computed LD matrix.

cd ./data/
sushie finemap --summary --gwas EUR.gwas AFR.gwas --ld EUR.ld AFR.ld --sample-size 489 639 --output ./test_result

  3. I want to focus only on SNPs with GWAS P values less than 5e-8 across all ancestries.

cd ./data/
sushie finemap --summary --gwas EUR.gwas AFR.gwas --vcf vcf/EUR.vcf vcf/AFR.vcf --sample-size 489 639 --gwas-sig 5e-8 --gwas-sig-type all --output ./test_result

  4. I want to focus only on SNPs between 1bp and 1Mbp on chromosome 6.

cd ./data/
sushie finemap --summary --gwas EUR.gwas AFR.gwas --vcf vcf/EUR.vcf vcf/AFR.vcf --sample-size 489 639 --chrom 6 --start 1 --end 1000000 --output ./test_result

  5. My GWAS data has different column names.

cd ./data/
sushie finemap --summary --gwas EUR.gwas AFR.gwas --vcf vcf/EUR.vcf vcf/AFR.vcf --sample-size 489 639 --gwas-header CHR SNP BP A1 A2 STAT --output ./test_result

  6. I want to add a small number to the diagonal of my LD matrix to make it positive definite.

cd ./data/
sushie finemap --summary --gwas EUR.gwas AFR.gwas --ld EUR.ld AFR.ld --sample-size 489 639 --ld-adjust 1e-3 --output ./test_result
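The difference between the two --gwas-sig-type settings can be made concrete with a few z statistics. This is a sketch of the selection rule as described in Parameters, not SuShiE's code: the helper names z_to_p and select_snps are hypothetical, and the two-sided normal P value is the conventional conversion, assumed here.

```python
import math


def z_to_p(z):
    """Two-sided P value of a z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def select_snps(z_by_ancestry, threshold=5e-8, sig_type="at-least"):
    """Indices of SNPs that pass the significance rule.

    'all': significant in every ancestry.
    'at-least': significant in at least one ancestry.
    """
    combine = all if sig_type == "all" else any
    n_snps = len(z_by_ancestry[0])
    return [
        j for j in range(n_snps)
        if combine(z_to_p(z[j]) < threshold for z in z_by_ancestry)
    ]


eur_z = [6.0, 5.0, 1.0]
afr_z = [5.8, 6.2, 0.5]
print(select_snps([eur_z, afr_z], sig_type="all"))       # -> [0]
print(select_snps([eur_z, afr_z], sig_type="at-least"))  # -> [0, 1]
```

The second SNP is significant only in AFR (|z| = 5.0 in EUR corresponds to a P value above 5e-8), so it is kept under 'at-least' but dropped under 'all'.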

Parameters

Parameter

Type

Default

Example

Notes

--summary

Boolean

False

--summary # will store as True

Indicator of whether to run fine-mapping on summary statistics. Default is False. If True, the software needs GWAS files as input, specified with --gwas, and LD information, specified with either --ld or one of --plink, --vcf, or --bgen. If False, the software needs phenotype data, specified with --pheno, and genotype data, specified with one of --plink, --vcf, or --bgen.

--pheno

String

Required, no default

--pheno EUR.pheno AFR.pheno

Phenotype data. It has to be a tsv file that contains at least two columns, where the first column is the subject ID and the second column is the continuous phenotypic value. It can be a compressed file (e.g., tsv.gz). It is okay to have additional columns, but only the first two columns will be used. No headers. Use a space to separate ancestries if there is more than one. SuShiE currently fine-maps only continuous phenotypes.

--plink

String

None

--plink plink/EUR plink/AFR

Genotype data in plink 1 format. The plink triplet (bed, bim, and fam) should be in the same folder with the same prefix. Use a space to separate ancestries if there is more than one. Keep the same ancestry order as the phenotype files. SuShiE currently does not take plink 2 format.

--vcf

String

None

--vcf vcf/EUR.vcf vcf/AFR.vcf

Genotype data in vcf format. Use a space to separate ancestries if there is more than one. Keep the same ancestry order as the phenotype files.

--bgen

String

None

--bgen bgen/EUR.bgen bgen/AFR.bgen

Genotype data in bgen 1.3 format. Use a space to separate ancestries if there is more than one. Keep the same ancestry order as the phenotype files.

--ancestry-index

String

None

--ancestry-index all.ancestry.index

Single file that contains the subject IDs and their ancestry indices. Default is None. It has to be a tsv file that contains at least two columns, where the first column is the subject ID and the second column is the ancestry index starting from 1 (e.g., 1, 2, 3, etc.). It can be a compressed file (e.g., tsv.gz). Only the first two columns will be used. No headers. If this file is specified, SuShiE assumes that the phenotypes of all ancestries are in one single file, and likewise for the genotype and covariate data. It will produce errors if multiple phenotype, genotype, or covariate files are specified.

--keep

String

None

--keep keep.subject

Single file that contains the subject IDs, across all ancestries, that are used for fine-mapping. It has to be a tsv file that contains at least one column, where the first column is the subject ID. It can be a compressed file (e.g., tsv.gz). No headers. If this file is specified, all phenotype, genotype, and covariate data will be filtered down to the subjects listed in it.

--covar

String

None

--covar EUR.covar AFR.covar

Covariates that will be accounted for in the fine-mapping. It has to be a tsv file that contains at least two columns, where the first column is the subject ID. It can be a compressed file (e.g., tsv.gz). No headers. All the columns will be used. Use a space to separate ancestries if there is more than one. Keep the same ancestry order as the phenotype files. Categorical covariates must be pre-converted into dummy variables. If a categorical covariate has n levels, make sure the dummy variables have n-1 columns.

--ld

String

None

--ld EUR.ld AFR.ld

LD files that will be used in the fine-mapping. Default is None. Keep the same ancestry order as the GWAS files. Each has to be a tsv or compressed file (e.g., tsv.gz). The header has to contain the SNP names matching the GWAS data in --gwas. The file can have fewer or more SNPs than the GWAS data, and the software will find the overlapping SNPs. Users must ensure that the LD and GWAS z statistics are computed using the same counting alleles.

--chrom

Integer

None

--chrom 6

Chromosome number to subset GWAS SNPs in the fine-mapping. Default is None. Value has to be an integer between 1 and 22.

--start

Integer

None

--start 1

Base-pair start position to subset GWAS SNPs in the fine-mapping. Default is None. Value has to be a positive integer number.

--end

Integer

None

--end 1000000

Base-pair end position to subset GWAS SNPs in the fine-mapping. Default is None. Value has to be a positive integer number.

--sample-size

Integer

None

--sample-size 489 639

GWAS sample size of each ancestry. Default is None. Values have to be positive integers. Use a space to separate ancestries if there is more than one. The order has to be the same as the GWAS data in --gwas.

--gwas-header

String

chrom snp pos a1 a2 z

--gwas-header CHR SNP BP A1 A2 Z

GWAS file header names. Default is [‘chrom’, ‘snp’, ‘pos’, ‘a1’, ‘a0’, ‘z’]. Users can specify the header names for the GWAS data in this order.

--gwas-sig

Float

None

--gwas-sig 5e-8

The significance threshold for SNPs to be included in the fine-mapping. Default is 1.0. Only SNPs with P value less than this threshold will be included. It has to be a float number between 0 and 1.

--gwas-sig-type

String

at-least

--gwas-sig-type all

How to include significant SNPs in the fine-mapping across ancestries. If it is ‘at-least’, the software will include SNPs that are significant in at least one ancestry. If it is ‘all’, the software will include SNPs that are significant in all ancestries. Default is ‘at-least’. The significance threshold is specified by --gwas-sig.

--L

Integer

10

--L 5

Pre-specified number of shared effects. A larger number may slow the inference.

--pi

String

“uniform”

--pi ./prior_weights

Prior probability for each SNP to be causal (\(\pi\) in Model Description). Default is uniform (i.e., 1/p, where p is the number of SNPs in the region) and is the same across all ancestries. Alternatively, users can specify the path to a file that contains the prior weight for each SNP. The weights have to be positive and will be normalized to sum to 1 before inference. The file has to be a tsv file that contains two columns, where the first column is the SNP ID and the second column is the prior weight. Additional columns will be ignored. SNPs that do not have a prior weight in the file will be assigned the average value of the rest. It can be a compressed file (e.g., tsv.gz). No headers.

--resid-var

Float

1e-3

--resid-var 5.18 0.2

Specify the prior for the residual variance (\(\sigma^2_e\) in Model Description) for each ancestry. Values have to be positive. Use a space to separate ancestries if there is more than one.

--effect-var

Float

1e-3

--effect-var 5.21 0.99

Specify the prior for the causal effect size variance (\(\sigma^2_{i,b}\) in Model Description) for each ancestry. Values have to be positive. Use a space to separate ancestries if there is more than one. If --no-update is specified and --rho is not, specifying this parameter fixes only effect-var as the prior through the optimization, and rho is updated. If --effect-var, --rho, and --no-update are all specified, both --effect-var and --rho are fixed as priors through the optimization. If --no-update is specified but neither --effect-var nor --rho is, both are fixed at their default prior values through the optimization.

--rho

Float

0.1

--rho 0.05

Specify the prior for the effect size correlation (\(\rho\) in Model Description) between ancestries. Default is 0.1 for each pair of ancestries. Use a space to separate values when there is more than one ancestry pair. Each rho has to be a float between -1 and 1. If there are N > 2 ancestries, X = choose(N, 2) values are required. The rho order has to be rho(1,2), …, rho(1,N), rho(2,3), …, rho(N-1,N). If --no-update is specified and --effect-var is not, specifying this parameter fixes only rho as the prior through the optimization, and effect-var is updated. If --effect-var, --rho, and --no-update are all specified, both --effect-var and --rho are fixed as priors through the optimization. If --no-update is specified but neither --effect-var nor --rho is, both are fixed at their default prior values through the optimization.

--no-scale

Boolean

False

--no-scale # will store as True

Indicator to scale the genotype and phenotype data by their standard deviations. Default is to scale. Specifying --no-scale stores True and may lead to different inference.

--no-regress

Boolean

False

--no-regress # will store as True

Indicator to regress the covariates on each SNP. Default is to regress, which may slightly slow the inference but can be more accurate. Specifying --no-regress stores True.

--no-update

Boolean

False

--no-update # will store as True

Indicator to update the effect covariance prior before running the single effect regression. Default is to update. Specifying --no-update stores True. The updating algorithm is similar to the EM algorithm or an Empirical Bayes method that computes the prior covariance conditioned on the other parameters. See the manuscript for more information.

--max-iter

Integer

500

--max-iter 300

Maximum iterations for the optimization. Larger number may slow the inference while smaller may cause different inference.

--min-tol

Float

1e-3

--min-tol 1e-4

Minimum tolerance for the convergence. Smaller number may slow the inference while larger may cause different inference.

--threshold

Float

0.95

--threshold 0.9

Specify the PIP threshold for SNPs to be included in the credible sets. It has to be a float number between 0 and 1.

--purity

Float

0.5

--purity 0.5

Specify the purity threshold for credible sets to be output. It has to be a float number between 0 and 1.

--purity_method

String

“weighted”

--purity_method max

Specify the method to compute purity across ancestries. Users choose ‘weighted’, ‘max’, or ‘min’. ‘weighted’ is the sum of the purity of each ancestry weighted by the sample size, ‘max’ is the maximum purity value across ancestries, and ‘min’ is the minimum purity value across ancestries. Default is ‘weighted’.

--ld-adjust

Float

0

--ld-adjust 1e-3

The number added to the LD diagonal to ensure positive definiteness. It has to be a float between 0 and 0.1. Default is 0.

--max-select

Integer

250

--max-select 100

The maximum number of SNPs selected to calculate the purity. Default is 250. It has to be a positive integer. A larger number can use an unnecessarily large amount of memory.

--min-snps

Integer

100

--min-snps 50

The minimum number of SNPs to fine-map. Default is 100. It has to be a positive integer. A smaller number may produce unreliable results.

--maf

Float

0.01

--maf 0.05

Threshold for minor allele frequency (MAF) to filter out SNPs for each ancestry. It has to be a float between 0 (exclusive) and 0.5 (inclusive).

--rint

Boolean

False

--rint

Indicator to perform rank inverse normalization transformation (rint) on each phenotype. Default is False (do not transform). Specifying --rint stores True. We suggest that users do this QC during data preparation.

--no-reorder

Boolean

False

--no-reorder

Indicator to re-order single effects based on the Frobenius norm of the alpha-weighted posterior mean square. Default is False (to re-order). Specifying --no-reorder stores True.

--keep-ambiguous

Boolean

False

--keep-ambiguous

Indicator to keep ambiguous SNPs (i.e., A/T, T/A, C/G, or G/C) from the genotypes. We recommend removing these SNPs if the data for each ancestry come from different studies or if users plan to use the inference results for downstream analysis with other datasets. Default is False (do not keep). Specifying --keep-ambiguous stores True.

--meta

Boolean

False

--meta # will store as True

Indicator to perform single-ancestry SuShiE followed by a meta-analysis of the results. Specifying --meta stores True and increases running time. If only one ancestry is specified in the phenotype and genotype parameters, --meta is ignored.

--mega

Boolean

False

--mega # will store as True

Indicator to perform mega SuShiE, which runs single-ancestry SuShiE on genotype and phenotype data that is row-wise stacked across ancestries. Specifying --mega stores True and increases running time. If only one ancestry is specified in the phenotype and genotype parameters, --mega is ignored.

--her

Boolean

False

--her # will store as True

Indicator to perform heritability (\(h_g^2\)) analysis using code from limix. Specifying --her stores True and increases running time.

--cv

Boolean

False

--cv # will store as True

Indicator to perform cross-validation (CV) and output CV results (adjusted r-squared and its p-value) for the downstream FUSION pipeline. Specifying --cv stores True and increases running time.

--cv-num

Integer

5

--cv-num 6

The number of folds for cross-validation. It has to be a positive integer. A larger number may lead to longer running time.

--seed

Integer

12345

--seed 4321

The seed for randomization. It is used to split the dataset in cross-validation and to randomly select SNPs in the credible sets to calculate the purity. Default is 12345. It has to be a positive integer.

--alphas

Boolean

False

--alphas # will store as True

Indicator to output all the credible set results before pruning for purity, including PIPs, \(\alpha\) (in Model Description), and whether each SNP is in a credible set, across all \(L\). Default is False. Specifying --alphas stores True and increases running time.

--numpy

Boolean

False

--numpy # will store as True

Indicator to output all the results in a *.npy file. Specifying --numpy stores True and increases running time. The *.npy file contains all the inference results, including SNP information, credible sets, PIPs, priors, posteriors, and sample size, for users’ own post-hoc analysis.

--trait

String

“Trait”

--trait GENE_ABC

Trait, tissue, or gene name of the phenotype for better indexing in post-hoc analysis.

--quiet

Boolean

False

--quiet # will store as True

Indicator to not print messages to the console. Specifying --quiet stores True.

--verbose

Boolean

False

--verbose # will store as True

Indicator to include debug information in the log. Specifying --verbose stores True.

--compress

Boolean

False

--compress # will store as True

Indicator to compress all output tsv files into ‘tsv.gz’ to save disk space. Specifying --compress stores True. This option does not compress npy files.

--platform

String choices in ["cpu", "gpu", "tpu"]

cpu

--platform gpu

Indicator for the JAX platform.

--jax-precision

Integer choices in [32, 64]

64

--jax-precision 32

Indicator for the JAX precision: 64-bit or 32-bit. Choosing 32-bit may cause an ‘elbo decreases’ warning.

--output

String

sushie_finemap

--output folder/trait_name

Prefix for output files.
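The ordering of --rho values for more than two ancestries, rho(1,2), …, rho(1,N), rho(2,3), …, rho(N-1,N), is the lexicographic order of ancestry pairs. The following sketch enumerates it; the helper name rho_pairs is hypothetical and only illustrates the ordering described in the --rho entry.

```python
from itertools import combinations


def rho_pairs(n_ancestries):
    """Ancestry pairs (1-based) in the order expected for --rho values."""
    return list(combinations(range(1, n_ancestries + 1), 2))


print(rho_pairs(3))  # -> [(1, 2), (1, 3), (2, 3)]
print(len(rho_pairs(4)))  # choose(4, 2) = 6 values are needed
```

For three ancestries passed as EUR, AFR, and EAS, the three --rho values would therefore refer to EUR-AFR, EUR-EAS, and AFR-EAS, in that order.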


Last update: Oct 27, 2024