Pv4: An open dataset of Plasmodium vivax - v.4.0

Released on 11 Feb 2022.

This page provides information about the Pv4 dataset, which contains genome variation data on 1,895 worldwide samples of Plasmodium vivax. The key publication is MalariaGEN et al, Wellcome Open Research 2022, 7:136 https://doi.org/10.12688/wellcomeopenres.17795.1.

Full details of the methods can be found in the accompanying paper. Compared with the v1 pipeline (May 2016 data release), there are two major changes: a) we now map to the PvP01 reference genome rather than PvSal1, and b) we use a pipeline based on current GATK best practices, analogous to the Pf6 pipeline.

This release contains details of the contributing partner studies; sample metadata and key sample attributes inferred from genomic data; and genomic data, including raw sequence reads. Further details and analytical results can be found in the accompanying data release paper.

These data are available open access. Publications using these data should acknowledge and cite the source of the data using the following format: “This publication uses data from the MalariaGEN Plasmodium vivax Genome Variation Project as described in ‘An open dataset of Plasmodium vivax genome variation in 1,895 worldwide samples’. MalariaGEN et al, Wellcome Open Research 2022, 7:136 https://doi.org/10.12688/wellcomeopenres.17795.1”

Data sets

Study Information

Details of the 11 contributing partner studies and 3 external studies, including descriptions, contact information and key people.

Download study information.

Sample provenance and sequencing metadata

Sample information including partner study information, location and year of collection, ENA accession numbers, and QC information for 1,895 samples from 27 countries.

Download sample provenance and sequencing metadata.

Measure of complexity of infections

Characterisation of within-host diversity (FWS) for 1,072 QC pass samples.

Download measure of complexity of infections.
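
As an illustration of how the FWS values might be used, the sketch below reads a tab-separated table and keeps the samples above a threshold. The column names `Sample` and `Fws`, the sample identifiers and values, and the 0.95 cut-off (a commonly used threshold for treating an infection as effectively single-clone) are all assumptions made for this example; the README describes the actual file layout.

```python
import csv
import io

# Toy stand-in for the downloaded file; IDs, values and column names are made up.
EXAMPLE_TSV = "Sample\tFws\nS1\t0.99\nS2\t0.62\nS3\t0.97\n"


def low_complexity_samples(handle, threshold=0.95):
    """Return sample IDs whose Fws is strictly greater than `threshold`."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row["Sample"] for row in reader if float(row["Fws"]) > threshold]


selected = low_complexity_samples(io.StringIO(EXAMPLE_TSV))
print(selected)
```

To apply this to the real file, pass an open file handle instead of the in-memory example and adjust the column names to match the README.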

Drug resistance marker genotypes

Genotypes at known markers of drug resistance for 1,895 samples, containing amino acid and copy number genotypes at 3 loci: dhfr, dhps, mdr1.

Download drug resistance marker genotypes.

Inferred resistance status classification

Classification of 1,072 QC pass samples into different types of resistance to 4 drugs or combinations of drugs: pyrimethamine, sulfadoxine, mefloquine, and sulfadoxine-pyrimethamine combination.

Download inferred resistance status classification.

Drug resistance markers to inferred resistance status

Details of the heuristics used to map genetic markers to resistance status classification.

Download drug resistance markers to inferred resistance status.

Short variant genotypes

Genotype calls on 4,571,056 SNPs and short indels in 1,895 samples from 27 countries, available both as VCF and zarr files.

These are available at ftp://ngs.sanger.ac.uk/production/malaria/Resource/30.
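
For readers new to the VCF layout, the following is a minimal, standard-library-only sketch of how per-sample genotype calls can be read from VCF text. The VCF content here is a toy example: the chromosome name, positions, alleles and sample IDs are made up and are not taken from the release. For the full dataset a dedicated VCF or zarr library is likely to be more practical; the README has tips on accessing both formats.

```python
import io

# Toy VCF text for illustration only; positions, alleles and sample IDs are made up.
EXAMPLE_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n"
    "PvP01_01_v1\t1000\t.\tA\tG\t50\tPASS\t.\tGT\t0/0\t1/1\n"
    "PvP01_01_v1\t2000\t.\tC\tT\t50\tPASS\t.\tGT\t0/1\t./.\n"
)


def count_non_ref_calls(handle):
    """Count, per sample, the calls carrying at least one non-reference allele."""
    samples, counts = [], {}
    for line in handle:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow the 9 fixed columns
            counts = {sample: 0 for sample in samples}
            continue
        gt_index = fields[8].split(":").index("GT")
        for sample, call in zip(samples, fields[9:]):
            genotype = call.split(":")[gt_index]
            alleles = genotype.replace("|", "/").split("/")
            # "0" is the reference allele and "." is a missing call.
            if any(allele not in ("0", ".") for allele in alleles):
                counts[sample] += 1
    return counts


counts = count_non_ref_calls(io.StringIO(EXAMPLE_VCF))
print(counts)
```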

Tandem duplication genotypes

Genotypes for tandem duplications discovered in four regions of the genome.

Download tandem duplication genotypes.

Genome regions

A bed file classifying genomic regions as core genome or different classes of non-core genome.

Download genome regions bed file.

Tabix index file for genome regions file.

Download genome regions index file.
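
A bed file of this kind can be used to check whether a variant position falls in the core genome. The sketch below shows one way to do that with the standard library. The chromosome name, coordinates and the labels `Core` and `Subtelomere` are hypothetical placeholders, not the classes used in the release.

```python
import io

# Toy BED text; chromosome name, coordinates and labels are made up for illustration.
EXAMPLE_BED = (
    "PvP01_01_v1\t0\t100\tSubtelomere\n"
    "PvP01_01_v1\t100\t900\tCore\n"
    "PvP01_01_v1\t900\t1000\tSubtelomere\n"
)


def load_regions(handle):
    """Parse BED lines into (chrom, start, end, label) tuples."""
    regions = []
    for line in handle:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines, comments and BED header lines
        chrom, start, end, label = line.rstrip("\n").split("\t")[:4]
        regions.append((chrom, int(start), int(end), label))
    return regions


def classify(regions, chrom, pos):
    """Return the label covering a 1-based position, or None if uncovered.

    BED intervals are 0-based and half-open, so a 1-based `pos` lies in
    an interval when start < pos <= end.
    """
    for region_chrom, start, end, label in regions:
        if region_chrom == chrom and start < pos <= end:
            return label
    return None


regions = load_regions(io.StringIO(EXAMPLE_BED))
print(classify(regions, "PvP01_01_v1", 100))
print(classify(regions, "PvP01_01_v1", 101))
```

The linear scan is fine for a small regions file; the tabix index is the better route when querying a compressed file directly.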

Release notes

11 Feb 2022

A README file describes in detail all the files included in the release, the format and interpretation of each column, and contains some tips and tricks for accessing the genotype data in VCF and zarr files.

All of the data files included in this release can be downloaded from the Wellcome Trust Sanger Institute public FTP site using a freely available FTP client.
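
As an alternative to a graphical FTP client, the directory can be listed from a script. The sketch below uses Python's standard `ftplib` with the FTP address given on this page. It assumes the server accepts anonymous logins and still serves this path, neither of which is verified here. The function names are illustrative only.

```python
from ftplib import FTP
from urllib.parse import urlparse

RELEASE_URL = "ftp://ngs.sanger.ac.uk/production/malaria/Resource/30"


def split_ftp_url(url):
    """Split an ftp:// URL into (host, directory path)."""
    parsed = urlparse(url)
    if parsed.scheme != "ftp":
        raise ValueError(f"not an FTP URL: {url}")
    return parsed.hostname, parsed.path


def list_release_files(url=RELEASE_URL, timeout=30):
    """List the names in the release directory (requires network access)."""
    host, path = split_ftp_url(url)
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()  # anonymous login; assumed to be permitted
        ftp.cwd(path)
        return ftp.nlst()


host, path = split_ftp_url(RELEASE_URL)
print(host, path)
```

Calling `list_release_files()` performs the actual network request, so it is left out of the example run above.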

Access the data on Google Cloud

Open access

Our approach to sharing data

Data package contact
