Getting started
genoio reads VCF, PLINK1, PLINK2, and BGEN genotype files into Python
matrices. The Python API resolves sources and assembles results. Rust readers
parse records, apply filters, and build matrices.
Installation
Install the development version from this repository:
pip install git+https://github.com/mancusolab/genoio.git
For local development, use the project environment and build the Rust extension in place:
make build-dev
Quick example
import genoio
ds = genoio.pfile("data/chr22_hg38")
samples = ds.samples()
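# load_phenotype_vector, load_covariates, and association_scan are
# placeholders for your own code, not genoio functions.
y = load_phenotype_vector(samples["iid"])
C = load_covariates(samples["iid"])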
for X, variants in ds.iter_blocks(10_000, return_variants=True):
    # X has shape (samples, variants_in_this_block).
    # `y` and `C` must be aligned to the rows described by `samples`.
    association_scan(X, y, C, variants=variants)
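The alignment requirement in the comments above can be illustrated without genoio. The following is a minimal sketch in plain Python: `align_to_samples`, the sample IDs, and the phenotype values are all hypothetical, and the `sample_ids` list merely stands in for the `samples["iid"]` column.

```python
def align_to_samples(sample_ids, values_by_id):
    """Return values ordered like sample_ids; fail loudly on missing samples."""
    missing = [iid for iid in sample_ids if iid not in values_by_id]
    if missing:
        raise KeyError(f"no phenotype for samples: {missing}")
    return [values_by_id[iid] for iid in sample_ids]


# Hypothetical row order, standing in for samples["iid"].
sample_ids = ["s3", "s1", "s2"]

# Hypothetical phenotype table keyed by sample ID, in arbitrary order.
phenotype_by_id = {"s1": 0.5, "s2": -1.2, "s3": 2.0}

# y[i] now belongs to the sample in row i of the genotype matrix.
y = align_to_samples(sample_ids, phenotype_by_id)
```

Raising on a missing sample, rather than silently dropping it, keeps the length of `y` equal to the number of matrix rows.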
Four constructors resolve supported sources:
- vcf for VCF/BCF files.
- bfile for PLINK1 .bed/.bim/.fam file sets.
- pfile for PLINK2 .pgen/.pvar[.zst]/.psam file sets.
- bgen for BGEN .bgen files.
Each constructor returns a reusable Dataset with read, iter_blocks, iter_regions, samples, and variants methods.
Dense reads return NumPy arrays with shape (samples, variants). Sparse reads
return SciPy sparse matrices. Metadata is returned as Polars DataFrames.
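To make the (samples, variants) layout concrete, here is a small NumPy-only sketch of block-wise processing. It does not call genoio: `iter_column_blocks` is a hypothetical stand-in for a block iterator, and the matrix `G` is synthetic. It only illustrates that each block keeps all sample rows and carries a slice of the variant columns.

```python
import numpy as np


def iter_column_blocks(G, block_size):
    """Yield consecutive column slices of G, each with at most block_size columns."""
    for start in range(0, G.shape[1], block_size):
        yield G[:, start:start + block_size]


# Synthetic dense matrix: 3 samples (rows) by 4 variants (columns).
G = np.arange(12, dtype=float).reshape(3, 4)

blocks = list(iter_column_blocks(G, block_size=3))

# Every block has all 3 sample rows; only the variant axis is split.
shapes = [b.shape for b in blocks]

# Per-variant statistics can be computed per block and concatenated.
col_sums = np.concatenate([b.sum(axis=0) for b in blocks])
```

Because the sample axis is never split, per-sample vectors such as a phenotype can be reused unchanged across blocks.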
Next steps
Use Examples for GWAS and cis-eQTL scan sketches. Read Filtering for variant selection and region pushdown, Format support for source-specific behavior, or API Reading for matrix options, missing-data handling, sparse output, and iterator contracts.