The MalariaGEN Vector Observatory Anopheles gambiae data resource version 3.7 (Ag3.7) contains single nucleotide polymorphism (SNP) calls, copy number variant (CNV) calls and SNP haplotypes from whole-genome sequencing of mosquitoes collected in Benin (451 samples), Burkina Faso (24 samples), Cameroon (744 samples), Gabon (4 samples), Ghana (528 samples), Guinea (1 sample), and Tanzania (794 samples) from 2007 to 2021. Ag3.7 contains 2546 whole-genome sequences from An. coluzzii, An. arabiensis, An. fontenillei and An. gambiae.
Data sets
Downloads
Downloads include sequence alignments (BAM files) and variant calls (VCF files) from both alignment- and assembly-based calling methods.
All of the data files included in this release can be downloaded from the Wellcome Trust Sanger Institute public FTP site.
NOTE: Many browsers no longer support links to FTP sites. If you have difficulty opening the link, you may need to change your browser settings or use a dedicated FTP client.
Release notes
Organisation of VCF files
Two VCF files are available for each chromosome arm. One contains all discovered SNPs (e.g., ag1000g.phase1.AR2.2L.vcf.gz); the other contains only those SNPs that passed all quality filters (ag1000g.phase1.AR2.2L.PASS.vcf.gz). Most analyses should use only PASS variants, so the PASS.vcf.gz files will usually be more convenient.
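As a rough illustration of the file organisation described above, the following Python sketch builds the two file names for a chromosome arm and, for cases where only the full file is at hand, keeps just the records whose FILTER column is PASS. The helper names (vcf_filenames, iter_pass_records) are hypothetical and not part of any MalariaGEN tooling, and the naming pattern is inferred from the single 2L example given here, so it should be checked against the actual FTP listing.

```python
import gzip


def vcf_filenames(arm, prefix="ag1000g.phase1.AR2"):
    """Return (all-SNPs file, PASS-only file) names for one chromosome arm.

    The pattern is an assumption extrapolated from the 2L example in the text.
    """
    return f"{prefix}.{arm}.vcf.gz", f"{prefix}.{arm}.PASS.vcf.gz"


def iter_pass_records(path):
    """Yield the tab-split fields of each record whose FILTER column is PASS."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and column header lines
            fields = line.rstrip("\n").split("\t")
            # FILTER is the 7th fixed column in the VCF format
            if len(fields) > 6 and fields[6] == "PASS":
                yield fields
```

For example, vcf_filenames("2L") gives the two names quoted above, and iterating over iter_pass_records on the full file should give the same set of sites as the PASS file, assuming the PASS file was produced by filtering on that column alone.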
Known issues
Apply for access
Archived
Citations
To cite these data directly, please use the following citation format:
The Anopheles gambiae 1000 Genomes Consortium (2014): Ag1000G phase 1 AR2 data release. MalariaGEN. http://www.malariagen.net/data/ag1000g-phase1-AR2
Contacts
- Fffirstname Lllastname
aaa@sanger.ac.uk