sushie.io.read_vcf(path: str) → Tuple[DataFrame, DataFrame, Array]
Read genotype data in VCF format.

The cyvcf2 package is used to read the VCF file, and its gt_types values determine the genotype matrix. If a genotype is UNKNOWN, it is coded as NA.
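As an illustration of the coding described here, the following is a minimal sketch of how cyvcf2 genotype codes could be turned into REF allele counts. It is not the sushie implementation. It assumes the file is opened with cyvcf2's gts012=True option, so that gt_types uses 0=HOM_REF, 1=HET, 2=HOM_ALT, 3=UNKNOWN (with the cyvcf2 default, the codes for HOM_ALT and UNKNOWN are swapped). It also assumes NA is represented as NaN. The helper name ref_allele_dosage is hypothetical.

```python
import math

# cyvcf2 genotype codes under gts012=True (an assumption for this sketch;
# with the cyvcf2 default, UNKNOWN is 2 and HOM_ALT is 3).
HOM_REF, HET, HOM_ALT, UNKNOWN = 0, 1, 2, 3


def ref_allele_dosage(gt_types):
    """Map one variant's per-sample genotype codes to REF allele counts.

    Hypothetical helper: UNKNOWN genotypes become NaN (assumed to be the
    NA coding), all others become 2 minus the ALT allele count.
    """
    dosage = []
    for code in gt_types:
        if code == UNKNOWN:
            dosage.append(math.nan)
        else:
            dosage.append(float(2 - code))
    return dosage


# One variant observed in four samples.
example = ref_allele_dosage([HOM_REF, HET, HOM_ALT, UNKNOWN])
print(example)  # [2.0, 1.0, 0.0, nan]
```

Under these assumptions, a homozygous REF sample gets 2, a heterozygous sample gets 1, a homozygous ALT sample gets 0, and a missing call stays missing.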

Parameters:
path: str

Path to the VCF genotype file (full file name). Genotypes are coded as counts of the REF allele.

Returns:

A tuple of
  1. SNP information (bim; pd.DataFrame),

  2. participant information (fam; pd.DataFrame),

  3. genotype matrix (bed; Array).

Return type:

Tuple[pd.DataFrame, pd.DataFrame, Array]


Last update: Oct 27, 2024