This page describes pilot data release 5 from the Pf3k project. The release contains de novo variant discovery and genotyping across an updated sample set from the pilot phase of the project.
At the time of their release, these data were subject to the Pf3k Pilot Phase Terms of Use. In September 2016, these restrictions were lifted, and the dataset is now available as open access.
Data sets
5.0 Data
This data release comprises sample information and analysis BAMs for the 5.0 sample set, which includes 2,640 P. falciparum samples. This updated sample set comprises:
- The Pf3k pilot phase 4.0 sample set:
  - 2,375 samples from multiple sampling sites in Africa and Asia, contributed by a number of P. falciparum Community Project partner studies
  - 137 samples from Senegal that were contributed by the Broad Institute
  - 5 lab clonal samples (7G8, GB4, KH02, KE01, GN01) used for validation
- New samples:
  - 96 crosses samples comprising parents and progeny from the P. falciparum Genetic Crosses project
  - 27 mixed lab strains in varying proportions created by Dr Jason Wendler
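As a quick consistency check, the group counts listed above can be summed to confirm the stated total of 2,640 samples. This is a minimal sketch using only the numbers given on this page; the dictionary keys are descriptive labels chosen for illustration, not identifiers from the release files.

```python
# Sample counts as listed in the release description above.
pilot_4_0_groups = {
    "partner studies (Africa and Asia)": 2375,
    "Senegal (Broad Institute)": 137,
    "lab clonal validation samples": 5,
}
new_groups = {
    "genetic crosses (parents and progeny)": 96,
    "mixed lab strains": 27,
}

pilot_4_0_total = sum(pilot_4_0_groups.values())
new_samples_total = sum(new_groups.values())
release_5_0_total = pilot_4_0_total + new_samples_total

print(pilot_4_0_total, new_samples_total, release_5_0_total)
```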
Files in this release include:
- A table of sample metadata in tab-delimited and Excel file formats. This table includes the accessions for downloading the sequence reads from the European Nucleotide Archive (ENA), the sampling location, the contributing partner study ID and contact person, and mapping metadata including sequence coverage metrics.
- Analysis BAM files. These files, one per sample, contain alignments of the raw sequence reads to the 3D7v3.1 reference genome.
These data can be downloaded from the Wellcome Trust Sanger Institute public FTP site.
NOTE: Many browsers no longer support links to FTP sites. If you are experiencing difficulties, you may need to change your browser settings.
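Because the sample metadata table is provided in tab-delimited form, it can be read with standard tools. The following is a minimal sketch in Python, assuming the file has a header row. The column names (`sample`, `country`, `ena_accession`) and the example rows are hypothetical placeholders, not the actual headers or contents of the release file; check the downloaded table for the real column names.

```python
import csv
import io
from collections import Counter


def count_samples_by(handle, column):
    """Count rows per distinct value of `column` in a tab-delimited table."""
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found; available: {reader.fieldnames}")
    return Counter(row[column] for row in reader)


# In-memory stand-in for the metadata file. Columns and values are made up
# for illustration and do not come from the release.
example_table = io.StringIO(
    "sample\tcountry\tena_accession\n"
    "sample_A\tcountry_X\tACC_1\n"
    "sample_B\tcountry_X\tACC_2\n"
    "sample_C\tcountry_Y\tACC_3\n"
)

by_country = count_samples_by(example_table, "country")
print(by_country)
```

For the real table, the same function could be called on a file opened with `open(path, newline="")`, where `path` points to the downloaded tab-delimited file.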
5.1 Data
This data release contains a set of de novo genotypes for the 5.0 sample set. Genotyping of both SNP and indel variants was performed using a pipeline based on GATK best practices (http://www.broadinstitute.org/gatk/guide/best-practices). These genotypes should not be taken as a quality-controlled output of the Pf3k project and are provided for public interest and as a basis for future methods development.
For more information, see the README files on the FTP site.
Files in this release include:
- Per-chromosome VCF files (http://vcftools.sourceforge.net/specs.html) containing genotypes for all 5.0 samples at ~2M high-quality SNP and indel loci.
This data release can be downloaded from the Wellcome Trust Sanger Institute public FTP site.
NOTE: Many browsers no longer support links to FTP sites. If you are experiencing difficulties, you may need to change your browser settings.
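The VCF files store one variant per line, with eight fixed columns, a FORMAT column, and then one column per sample. As an illustration of that layout, the sketch below tallies the genotype (GT) values across samples for a single data line. It uses only the Python standard library; the function name and the example line are hypothetical and not taken from the release. If the downloaded files are gzip-compressed (an assumption, not stated here), they could be opened with `gzip.open(path, "rt")` and header lines starting with `#` skipped before calling the function.

```python
def genotype_counts(vcf_line):
    """Tally GT values across all samples for one VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    # Columns 0-7 are CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO;
    # column 8 is FORMAT; sample columns start at index 9.
    format_keys = fields[8].split(":")
    gt_index = format_keys.index("GT")
    counts = {}
    for sample_field in fields[9:]:
        gt = sample_field.split(":")[gt_index]
        counts[gt] = counts.get(gt, 0) + 1
    return counts


# Made-up example line with four samples; not real data from the release.
example_line = (
    "chrom_placeholder\t100\t.\tA\tT\t50\tPASS\t.\tGT:DP"
    "\t0/0:10\t0/1:12\t1/1:8\t./.:0"
)
counts = genotype_counts(example_line)
print(counts)
```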
Open access
Citations
To cite this release directly, please use the following format:
The Pf3K Project (2016): pilot data release 5. http://www.malariagen.net/data_package/pf3k-5/