FAQ

Can I use genoio output with JAX?

Yes. Dense reads return NumPy arrays, so convert each block with jax.numpy.asarray or jax.device_put before calling JAX code.

import genoio
import jax.numpy as jnp

# ds is an already-opened genoio dataset; scan_block is your own analysis function.
for X, variants in ds.iter_blocks(
    size=4096,
    variants=genoio.maf(max=0.05) & genoio.snp() & genoio.biallelic(),
    return_variants=True,
):
    # Convert inside the loop so only the current block, not a whole-genome
    # genotype matrix, has to sit in device memory.
    X_jax = jnp.asarray(X)
    scan_block(X_jax, variants)

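Keep the host-to-device transfer cost in mind. genoio reads data on the CPU, so every block is copied to the accelerator before a JAX kernel can use it. Choose a block size large enough to amortize that transfer, but small enough that one block fits comfortably in device memory.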
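A quick way to reason about block size is to work out how many bytes one dense block occupies. The sketch below is only arithmetic: `block_nbytes` is a hypothetical helper, the cohort size is made up, and float32 is an assumed dtype (check `X.dtype` on a real block).

```python
import numpy as np


def block_nbytes(n_samples, block_size, dtype=np.float32):
    """Bytes needed for one dense (n_samples, block_size) block of the given dtype."""
    return n_samples * block_size * np.dtype(dtype).itemsize


# Hypothetical cohort of 50,000 samples with the block size used above.
nbytes = block_nbytes(n_samples=50_000, block_size=4096)
print(f"{nbytes / 2**20:.0f} MiB per block")  # prints: 781 MiB per block
```

If that figure is close to the free memory on your device, reduce `size`; if it is tiny, a larger block will usually spend less time on transfer overhead per variant.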

Can I use genoio output with PyTorch?

Yes. Dense reads return NumPy arrays, and torch.from_numpy converts compatible CPU arrays without copying.

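import genoio
import torch

# ds is an already-opened genoio dataset; scan_block is your own analysis function.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

for X, variants in ds.iter_blocks(
    size=4096,
    variants=genoio.maf(max=0.05) & genoio.snp() & genoio.biallelic(),
    return_variants=True,
):
    # from_numpy shares CPU memory with X, and .to(device) is a no-op when
    # device is the CPU. Clone if downstream code mutates the tensor.
    X_tensor = torch.from_numpy(X).to(device)
    scan_block(X_tensor, variants)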

If you move tensors to a GPU, PyTorch will copy the data to that device. For large scans, convert and transfer one block at a time.

What about sparse matrices?

Sparse reads return SciPy sparse matrices. That is useful for methods that already accept SciPy inputs. JAX and PyTorch have their own sparse APIs, so convert deliberately if your model expects those representations.
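A hedged sketch of what "convert deliberately" can look like, using only SciPy and NumPy. The small matrix stands in for a sparse genoio read; the exact SciPy class genoio returns is not stated here, so the example converts to COO explicitly. The comments note the index layouts that `torch.sparse_coo_tensor` and `jax.experimental.sparse.BCOO` expect.

```python
import numpy as np
import scipy.sparse as sp

# Stand-in for a sparse read: 4 samples x 5 variants, mostly zeros.
dense = np.array(
    [
        [0, 1, 0, 0, 2],
        [0, 0, 0, 1, 0],
        [1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1],
    ],
    dtype=np.float32,
)
X_sparse = sp.csr_matrix(dense)

# COO exposes explicit (row, col, value) triplets regardless of the input format.
coo = X_sparse.tocoo()
values = coo.data

# PyTorch: torch.sparse_coo_tensor(indices, values, size) takes indices of shape (2, nnz).
indices_torch = np.vstack([coo.row, coo.col])

# JAX: jax.experimental.sparse.BCOO((values, indices), shape=...) takes indices of shape (nnz, 2).
indices_jax = indices_torch.T

print(indices_torch.shape, indices_jax.shape, values.shape)
```

The two frameworks want transposed index layouts, which is an easy thing to get wrong when converting by hand.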

For many association scans, dense blocks are simpler. They also keep the same shape across formats and make it easier to call NumPy, JAX, PyTorch, or compiled association kernels.

Why does genoio use filter objects instead of Python callbacks?

Filter objects describe the selection before records are read. Python only builds an expression such as genoio.region("22:20000000-21000000") & genoio.maf(max=0.05); Rust evaluates that expression while reading the file, so no Python code runs per record.

That matters for region reads. A concrete region filter can be pushed into indexed VCF/BCF and BGEN reads, so genoio can jump to the requested genomic interval instead of scanning the full file.

When should I use read, iter_blocks, or iter_regions?

Use read(...) when the requested matrix fits in memory and you want one array.

Use iter_blocks(...) for GWAS-style scans. It yields fixed-width chunks of retained variants, which is a good fit for algorithms that apply the same test to many independent variant columns.
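As a rough illustration of applying the same test to every variant column, here is a minimal NumPy sketch of a per-block scan. `scan_block`, the choice of Pearson correlation as the test, and the simulated data are assumptions for illustration and not part of genoio; in a real scan the blocks would come from `iter_blocks(...)`. The sketch assumes `X` is samples by variants with no missing values.

```python
import numpy as np


def scan_block(X, y):
    """Per-variant Pearson correlation between genotype columns and a phenotype.

    X: (n_samples, n_variants) dense genotype block (assumed orientation).
    y: (n_samples,) phenotype vector, aligned to the rows of X.
    """
    Xc = X - X.mean(axis=0)  # center each variant column
    yc = y - y.mean()        # center the phenotype
    num = Xc.T @ yc          # one dot product per variant
    den = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Monomorphic columns have zero variance; report NaN instead of dividing by 0.
    return np.divide(num, den, out=np.full(num.shape, np.nan), where=den > 0)


# Simulated stand-in for one block; real blocks would come from iter_blocks(...).
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 8)).astype(np.float64)
y = 0.5 * X[:, 3] + rng.normal(size=200)

r = scan_block(X, y)
print(r.round(2))
```

Because each column is tested independently, the block width only affects memory use and throughput, not the result for any single variant.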

Use iter_regions(...) for cis scans or other local analyses. It takes a sequence of region filters and yields one matrix per requested interval.

Are samples reordered?

No. Retained samples stay in source order. Use ds.samples() or read(return_samples=True) to align phenotype and covariate tables before a scan.
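One way to do that alignment, sketched with plain NumPy. The sample IDs and phenotype values are made up, and the list below only stands in for `ds.samples()`: that it yields ID strings is an assumption here, so check the return type in your genoio version.

```python
import numpy as np

# Stand-in for ds.samples(): retained sample IDs in source order (assumed to be strings).
samples = ["S3", "S1", "S4", "S2"]

# Hypothetical phenotype table keyed by sample ID, in arbitrary order.
phenotype = {"S1": 0.7, "S2": -1.2, "S3": 2.1, "S5": 0.3}

# Build y in the same row order as the genotype matrix; NaN marks samples
# that have genotypes but no phenotype.
y = np.array([phenotype.get(s, np.nan) for s in samples], dtype=np.float64)

# Rows to keep in both X and y before running a scan.
keep = ~np.isnan(y)
print(y.tolist())     # [2.1, 0.7, nan, -1.2]
print(keep.tolist())  # [True, True, False, True]
```

Indexing the genotype matrix and `y` with the same `keep` mask keeps rows and phenotypes in step.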

Are variants reordered?

No. Variants are returned in source order after filtering. For region iteration, each result contains the variants retained inside that region.

Which haplotype representations are supported?

VCF/BCF haplotype reads use phased hardcall GT records. PLINK2 haplotype reads support explicit phased hardcalls with dosage="hardcall" and explicit phased full dosages with dosage="dosage". BGEN haplotype reads support BGEN v1.2+ Layout 2 phased biallelic diploid probabilities with kind="haplo", dosage="dosage", returned as expected A1 dosage per haplotype row.

genoio does not convert probabilities into hardcalls. Sparse PLINK2 dosage haplotypes and sparse BGEN haplotypes are not supported, while PLINK2 explicit phased hardcall haplotypes can be read sparsely when retained calls are non-missing. If an unsupported record is retained, the read fails; metadata-only filters can skip unsupported records before decode.
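If a downstream method needs hardcalls, the conversion rule is yours to choose. The following is a minimal sketch of one possible rule, not something genoio provides: `hardcall_haplotypes` and the `max_uncertainty` threshold are hypothetical, and the input is assumed to be per-haplotype expected A1 dosages in [0, 1].

```python
import numpy as np


def hardcall_haplotypes(dosage, max_uncertainty=0.1):
    """Round per-haplotype dosages to 0/1; mark ambiguous or missing values as NaN."""
    dosage = np.asarray(dosage, dtype=np.float64)
    calls = np.rint(dosage)
    # A call is ambiguous when the dosage is too far from the nearest allele count.
    uncertain = np.abs(dosage - calls) > max_uncertainty
    calls[uncertain | np.isnan(dosage)] = np.nan
    return calls


calls = hardcall_haplotypes([0.02, 0.97, 0.5, np.nan, 0.95])
print(calls.tolist())  # [0.0, 1.0, nan, nan, 1.0]
```

A stricter or looser threshold changes how many calls become missing, so record whichever rule you apply alongside the results.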

Universal Standard

Is this a new universal standard? No. Shh.