Pf8: An open dataset of Plasmodium falciparum - v.8.0

Released on 15 Apr 2025.

Parasite

This page provides information about the Pf8 dataset which contains genome variation data on 33,325 samples of Plasmodium falciparum.

Open the Pf8 app to view summary information about contributing studies, countries, and resistance profiles.

This release contains details on contributing partner studies, sample metadata and key sample attributes inferred from genomic data, and genomic data including raw sequence reads. A description of the dataset can be found here.

These data are available open access. Publications using these data should acknowledge and cite the source of the data. The key publication details will be posted here as soon as they are available.

Data sets

Study information

Details of the 99 contributing partner studies, including description, contact information and key people, can be found in the Supplementary Materials of the linked paper.

Sample provenance and sequencing metadata

Sample information including partner study information, location and year of collection, ENA accession numbers, and QC information for 33,325 samples from 34 countries.

Download sample provenance and sequencing metadata
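
As a rough illustration of working with the sample metadata, the sketch below reads a tab-delimited table and counts QC pass samples per country. The column names (Sample, Country, Year, QC pass) and the inline rows are assumptions made for this example, not the actual file layout; the README describes the real columns.

```python
import csv
import io
from collections import Counter

# Toy stand-in for the metadata file; rows and column names are hypothetical.
TOY_METADATA = (
    "Sample\tCountry\tYear\tQC pass\n"
    "S0001\tGhana\t2015\tTrue\n"
    "S0002\tGhana\t2016\tFalse\n"
    "S0003\tCambodia\t2012\tTrue\n"
    "S0004\tCambodia\t2013\tTrue\n"
)


def count_qc_pass_by_country(handle):
    """Count rows whose 'QC pass' field is 'True', grouped by 'Country'."""
    reader = csv.DictReader(handle, delimiter="\t")
    counts = Counter()
    for row in reader:
        if row["QC pass"].strip().lower() == "true":
            counts[row["Country"]] += 1
    return counts


qc_counts = count_qc_pass_by_country(io.StringIO(TOY_METADATA))
for country, n in sorted(qc_counts.items()):
    print(country, n)
```

To use this on a downloaded file, pass an open file handle in place of the io.StringIO object and adjust the column names to match the file.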

CNV calls 

Amplification calls for genes CRT, GCH1, MDR1 and PM2_PM3, and deletion calls for HRP2 and HRP3.

Download CNV calls

Tandem duplication breakpoints 

Genomic coordinates of breakpoints used for faceaway read-based calling.

Download tandem duplication breakpoints

Measure of complexity of infections

Characterisation of within-host diversity (FWS) for 24,409 QC pass samples.

Download measure of complexity of infections
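
FWS values lie between 0 and 1, with values close to 1 suggesting an infection dominated by a single genotype. The sketch below labels samples using a cutoff of 0.95, which is a commonly used rule of thumb and an assumption here, not something this page specifies. The sample identifiers and values are invented for the example.

```python
def classify_fws(fws, threshold=0.95):
    """Label a sample by its FWS value; the threshold is an assumed convention."""
    if not 0.0 <= fws <= 1.0:
        raise ValueError(f"FWS must lie in [0, 1], got {fws}")
    if fws > threshold:
        return "likely single-genotype"
    return "likely mixed"


# Hypothetical sample IDs and FWS values, for illustration only.
toy_fws = {"S0001": 0.99, "S0002": 0.62, "S0003": 0.95}
fws_labels = {sample: classify_fws(value) for sample, value in toy_fws.items()}

for sample, label in fws_labels.items():
    print(sample, label)
```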

Drug resistance marker genotypes

Genotypes at known markers of drug resistance for 33,325 samples, containing amino acid and copy number genotypes at six loci: crt, dhfr, dhps, mdr1, kelch13, plasmepsin 2-3.

Download drug resistance marker genotypes

Inferred resistance status classification

Classification of 24,409 QC pass samples into different types of resistance to 10 drugs or drug combinations, and to RDT detection: chloroquine, pyrimethamine, sulfadoxine, mefloquine, artemisinin, piperaquine, sulfadoxine-pyrimethamine for treatment of uncomplicated malaria, sulfadoxine-pyrimethamine for intermittent preventive treatment in pregnancy, artesunate-mefloquine, dihydroartemisinin-piperaquine, and hrp2 and hrp3 gene deletions.

Download inferred resistance status classification

Drug resistance markers to inferred resistance status 

Details of the heuristics used to map genetic markers to resistance status classification.

Download drug resistance markers to inferred resistance status
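
To give a sense of what a marker-to-status heuristic can look like, here is a minimal sketch for a single marker. The rule shown (K at crt codon 76 treated as wild type, T as the resistance-associated allele, anything mixed or missing as undetermined) is a simplification chosen for illustration, and the input encoding is hypothetical. The downloadable file holds the rules actually used in this release.

```python
def classify_chloroquine(crt_76_call):
    """Map a hypothetical crt codon 76 amino acid call to a status label.

    Input is assumed to be a comma-separated string of amino acids,
    with "-" or an empty string standing for a missing call.
    """
    alleles = {a.strip() for a in crt_76_call.split(",") if a.strip()}
    if not alleles or "-" in alleles:
        return "Undetermined"  # missing call
    if alleles == {"K"}:
        return "Sensitive"  # wild-type amino acid only
    if alleles == {"T"}:
        return "Resistant"  # resistance-associated amino acid only
    return "Undetermined"  # mixed or unexpected alleles


for call in ["K", "T", "K,T", "-"]:
    print(repr(call), "->", classify_chloroquine(call))
```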

Pf HaploAtlas

Analysis-ready data on haplotypes for all 4,952 core genes. 

View analysis-ready data on Pf HaploAtlas

Reference genome

The version of the 3D7 reference genome fasta file used for mapping. 

Download 3D7 reference genome file 

Annotation file

The version of the 3D7 reference annotation gff file used for genome annotations. 

Download 3D7 reference annotation file

Genetic distances

Genetic distance matrix comparing all 33,325 samples. 

Download genetic distance matrix 

Download SNP-only genetic distance matrix

Short variants genotypes 

Genotype calls on 12,493,205 SNPs and short indels in all 33,325 samples from 34 countries, available both as VCF and zarr files.

Download short variants genotypes as VCF and zarr files (Pf8.zarr.zip)
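
For readers new to the VCF layout, the sketch below pulls the GT (genotype) field out of a small uncompressed VCF snippet using only the Python standard library. The sample names, positions and calls are invented. Release files are far larger and compressed, so a dedicated VCF or zarr library is likely a better fit for real work; the README has tips on accessing the genotype data.

```python
# Toy VCF text; sample names, positions and genotype calls are invented.
TOY_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "Pf3D7_01_v3\t1000\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/0:30\t0/1:25",
    "Pf3D7_01_v3\t2000\t.\tC\tT\t60\tPASS\t.\tGT:DP\t1/1:40\t./.:0",
])


def parse_genotypes(vcf_text):
    """Return {(chrom, pos, sample): allele tuple, or None if missing}."""
    samples = []
    calls = {}
    for line in vcf_text.splitlines():
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        chrom, pos = fields[0], int(fields[1])
        gt_index = fields[8].split(":").index("GT")
        for sample, entry in zip(samples, fields[9:]):
            gt = entry.split(":")[gt_index]
            alleles = gt.replace("|", "/").split("/")
            if "." in alleles:
                calls[(chrom, pos, sample)] = None  # missing genotype
            else:
                calls[(chrom, pos, sample)] = tuple(int(a) for a in alleles)
    return calls


genotypes = parse_genotypes(TOY_VCF)
for key, value in genotypes.items():
    print(key, value)
```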

SNP-only genotypes

Genotype calls on 10,821,552 SNPs in all 33,325 samples from 34 countries, available both as VCF and zarr files.

Download SNPs as VCF and zarr files (pf8_snp_only_clean_zarr.zip)

CRAM files

Compressed sequencing data files for all 33,325 samples.

Download CRAM files

gVCF files

Genomic VCF files containing both variant and non-variant regions for all 33,325 Pf8 samples.

Download gVCF files

Release notes

A README file describes in fine detail all the files included in the release, the format and interpretation of each column, and contains some tips and tricks for accessing genotype data in VCF and zarr files.