Iowa State University NSF
Download Assemblies:
Maize Assembled Genomic Island (MAGI) Downloads
MAGI 4.0 Contigs and Singletons 727,781 Sequences 201.0 MB
MAGI 4.0 Contigs and Singletons SDRv3.1 repmasked 727,781 Sequences 184.0 MB
MAGI 4.0 Contigs and Singletons Cereal v3.1 repmasked 727,781 Sequences 165.8 MB
MAGI 4.0 Contigs 163,390 Sequences 78.2 MB
MAGI 4.0 Contigs SDRv3.1 repmasked 163,390 Sequences 75.7 MB
MAGI 4.0 Contigs Cereal v3.1 repmasked 163,390 Sequences 72.4 MB
Kalyanaraman A, Emrich SJ, Schnable PS, Aluru S (2006) Assembling genomes on large-scale parallel computers. Proceedings of the IEEE International Parallel and Distributed Processing Symposium, April 25-29, 2006. [Full Text PDF]
MAGI 3.1 Contigs and Singletons 214,472 Sequences 73.2 MB
MAGI 3.1 Contigs and Singletons SDRv3.1 repmasked 214,472 Sequences 70.5 MB
MAGI 3.1 Contigs and Singletons Cereal v3.1 repmasked 214,472 Sequences 68.2 MB
MAGI 3.1 Contigs 114,173 Sequences 53.3 MB
MAGI 3.1 Contigs SDRv3.1 repmasked 114,173 Sequences 51.5 MB
MAGI 3.1 Contigs Cereal v3.1 repmasked 114,173 Sequences 50.5 MB
Fu Y, Emrich SJ, Guo L, Wen T-J, Aluru S, Ashlock DA, Schnable PS (2005) Quality assessment of maize assembled genomic islands (MAGIs) and experimental verification of predicted genes. Proceedings of the National Academy of Sciences, 102(34): 12282-12287. [Full Text PDF]

Emrich SJ, Aluru S, Fu Y, Narayanan M, Guo L, Ashlock DA, Schnable PS (2004) A strategy for assembling the maize (Zea mays L.) genome. Bioinformatics, 20(2): 140-147. [Full Text PDF]
Deprecated Assemblies
MAGI 2.31 Contigs and Singletons 269,475 Sequences 76.7 MB
MAGI 2.31 Contigs and Singletons SDRv3.1 repmasked 269,475 Sequences 71.3 MB
MAGI 2.31 Contigs 94,293 Sequences 37.8 MB
MAGI 2.31 Contigs SDRv3.1 repmasked 94,293 Sequences 37.4 MB
NOTE: MAGIs 3.1 and 4.0 are both available for BLAST and download. Please remember that contig names are NOT conserved between assemblies 2.31, 3.1, and 4.0.
MAGIv4.0 Contigs Genes Predicted via FGENESH v2.6
Premature mRNA 61,428 Sequences 22.9 MB
Premature mRNA SDRv3.1 repmasked 61,428 Sequences 22.6 MB
Premature mRNA Cereal v3.1 repmasked 61,428 Sequences 22.3 MB
Premature mRNA + 300bp upstream & downstream 61,428 Sequences 29.7 MB
Premature mRNA + 300bp upstream & downstream SDRv3.1 repmasked 61,428 Sequences 29.2 MB
Premature mRNA + 300bp upstream & downstream Cereal v3.1 repmasked 61,428 Sequences 28.6 MB
mRNA 61,428 Sequences 17.3 MB
mRNA SDRv3.1 repmasked 61,428 Sequences 17.0 MB
mRNA Cereal v3.1 repmasked 61,428 Sequences 16.7 MB
mRNA + 300bp upstream & downstream 61,428 Sequences 24.1 MB
mRNA + 300bp upstream & downstream SDRv3.1 repmasked 61,428 Sequences 23.7 MB
mRNA + 300bp upstream & downstream Cereal v3.1 repmasked 61,428 Sequences 23.1 MB
Coding Regions 61,428 Sequences 10.8 MB
Coding Regions SDRv3.1 repmasked 61,428 Sequences 10.7 MB
Coding Regions Cereal v3.1 repmasked 61,428 Sequences 10.5 MB
To facilitate analyses of the maize gene space, genes were predicted on the 163,390 MAGIv4.0 contigs with FGENESH v2.6 (Softberry, Inc.) using the monocots matrix and the parameters -GC -pmrna -pexons -scip_prom -scip_term. This yielded predicted structures for 61,428 MAGIv4.0 genes, which were parsed to produce premature mRNAs, mRNAs, and ORFs. "MAGI Premature mRNAs" are genomic fragments that include predicted UTRs, exons, and introns. "MAGI Premature mRNAs + 300 bp" additionally include 300 bases upstream and downstream of the predicted transcription start and end sites. "MAGI mRNAs" include only predicted UTRs and exons. "MAGI mRNAs + 300 bp" additionally include 300 bases upstream and downstream of the predicted UTR or first/last exon regions. "MAGI ORFs" (listed above as Coding Regions) consist of only exonic coding regions (i.e., mRNAs minus UTRs). Schematics of the extracted structures are available here. Note that MAGIs can contain truncated genes. Extracted sequences were masked using the MAGI version of RepeatMasker (Emrich et al., 2004) in combination with our Statistically Defined Repeat (SDR) and Cereal Repeat databases.
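The relationship between the five extracted sequence types can be illustrated with a minimal Python sketch. It is not the parser used to build these files: the `GeneModel` container, the `extract` function, and the 0-based half-open coordinates are all hypothetical, and only forward-strand genes are handled. Clipping the flanks at the contig ends is likewise an assumption about how genes near a contig boundary might be treated.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end) on the contig


@dataclass
class GeneModel:
    """Hypothetical container for one predicted forward-strand gene."""
    transcript: Interval   # predicted transcription start to end
    exons: List[Interval]  # exons including UTR portions, sorted by start
    cds: List[Interval]    # coding portions of the exons, sorted by start


def extract(contig: str, gene: GeneModel, flank: int = 300) -> Dict[str, str]:
    """Cut the five sequence types described above out of one contig."""
    t_start, t_end = gene.transcript
    # Assumption: flanks are clipped where the contig ends.
    f_start = max(0, t_start - flank)
    f_end = min(len(contig), t_end + flank)

    # Spliced transcript: exons (with UTRs) joined, introns removed.
    mrna = "".join(contig[s:e] for s, e in gene.exons)
    return {
        "premature_mrna": contig[t_start:t_end],
        "premature_mrna_flank": contig[f_start:f_end],
        "mrna": mrna,
        "mrna_flank": contig[f_start:t_start] + mrna + contig[t_end:f_end],
        "orf": "".join(contig[s:e] for s, e in gene.cds),
    }
```

Under these assumptions, the "+ 300 bp" variants differ from their base forms only by the flanking sequence, and the ORF is always a subsequence of the mRNA.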
MAGIv3.1 Contigs Genes Predicted via FGENESH v2.6
Premature mRNA 43,707 Sequences 16.5 MB
Premature mRNA SDRv3.1 repmasked 43,707 Sequences 16.3 MB
Premature mRNA Cereal v3.1 repmasked 43,707 Sequences 16.2 MB
Premature mRNA + 300bp upstream & downstream 43,707 Sequences 21.4 MB
Premature mRNA + 300bp upstream & downstream SDRv3.1 repmasked 43,707 Sequences 21.1 MB
Premature mRNA + 300bp upstream & downstream Cereal v3.1 repmasked 43,707 Sequences 21.0 MB
mRNA 43,707 Sequences 12.6 MB
mRNA SDRv3.1 repmasked 43,707 Sequences 12.5 MB
mRNA Cereal v3.1 repmasked 43,707 Sequences 12.5 MB
mRNA + 300bp upstream & downstream 43,707 Sequences 17.6 MB
mRNA + 300bp upstream & downstream SDRv3.1 repmasked 43,707 Sequences 17.4 MB
mRNA + 300bp upstream & downstream Cereal v3.1 repmasked 43,707 Sequences 17.2 MB
Coding Regions 43,707 Sequences 8.1 MB
Coding Regions SDRv3.1 repmasked 43,707 Sequences 8.0 MB
Coding Regions Cereal v3.1 repmasked 43,707 Sequences 8.0 MB
To facilitate analyses of the maize gene space, genes were predicted on the 114,173 MAGIv3.1 contigs with FGENESH v2.6 (Softberry, Inc.) using the monocots matrix and the parameters -GC -pmrna -pexons -scip_prom -scip_term. This yielded predicted structures for 43,707 MAGIv3.1 genes, which were parsed to produce premature mRNAs, mRNAs, and ORFs. "MAGI Premature mRNAs" are genomic fragments that include predicted UTRs, exons, and introns. "MAGI Premature mRNAs + 300 bp" additionally include 300 bases upstream and downstream of the predicted transcription start and end sites. "MAGI mRNAs" include only predicted UTRs and exons. "MAGI mRNAs + 300 bp" additionally include 300 bases upstream and downstream of the predicted UTR or first/last exon regions. "MAGI ORFs" (listed above as Coding Regions) consist of only exonic coding regions (i.e., mRNAs minus UTRs). Schematics of the extracted structures are available here. Note that MAGIs can contain truncated genes. Extracted sequences were masked using the MAGI version of RepeatMasker (Emrich et al., 2004) in combination with our Statistically Defined Repeat (SDR) and Cereal Repeat databases.
Sorghum Assembled GenoMic Island (SAMI) Downloads
SAMI 2.0 Contigs and Singletons 211,645 Sequences 44.8 MB
SAMI 2.0 Contigs 81,342 Sequences 27.4 MB
SAMI 1.0 Contigs and Singletons 207,440 Sequences 53.8 MB
SAMI 1.0 Contigs 74,673 Sequences 29.3 MB
NOTE: SAMIs 1.0 and 2.0 are both available for BLAST and download. Please remember that contig names are NOT conserved between assemblies 1.0 and 2.0.
Maize Expressed Genes (MEG) Downloads
MEG_Dec03 Contigs and Singletons 80,252 Sequences 13.6 MB
MEG_Dec03 Contigs 32,136 Sequences 6.8 MB
Maize EST Contig (MEC) Downloads
MEC_P98-Mar06 Contigs and Singletons 672,881 Sequences 59.9 MB
MEC_P98-Mar06 Contigs 126,832 Sequences 18.7 MB
MEC_P95-Mar06 Contigs and Singletons 503,467 Sequences 45.6 MB
MEC_P95-Mar06 Contigs 101,999 Sequences 16.6 MB
MEC_P98-May05 Contigs and Singletons 204,059 Sequences 26.9 MB
MEC_P98-May05 Contigs 56,445 Sequences 10.5 MB
MEC_P95-May05 Contigs and Singletons 124,997 Sequences 19.3 MB
MEC_P95-May05 Contigs 46,053 Sequences 9.6 MB
418,638 Zea mays ESTs were downloaded from the dbEST division of GenBank in late January 2005, of which 11,521 sequences were removed due to contamination. In addition, ~31,000 Shoot Apical Meristem (SAM) ESTs from the University of Georgia and Iowa State University were processed, for a total of 438,223 ESTs. These ESTs were then clustered using PaCE, with an initial 30 bp exact match as the criterion. Overlaps with >= 87% identity over 80 bp were used to merge clusters. Two distinct EST assemblies, named MEC_P98-May05 and MEC_P95-May05, were subsequently generated from the same PaCE clustering result. Both were built using CAP3 with the following parameters: >= 98% (_P98) or >= 95% (_P95) identity, <= 5% overhang length, >= 60 bp clipping range, and an overlap length >= 50 bp. GeneSeqer EST alignment display cutoff: at least one exon with similarity >= 95% and overall cDNA coverage >= 80%.
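As a rough illustration of how the two assemblies differ only in their identity cutoff, the sketch below builds one CAP3 command line per assembly. The mapping of the stated parameters to the flags `-p` (overlap percent identity), `-h` (maximum overhang percent length), `-y` (clipping range), and `-o` (overlap length) follows the CAP3 usage text as best recalled and should be checked against the help output of the installed version. The function name `cap3_args` and the input file name `cluster.fasta` are hypothetical.

```python
from typing import Dict, List


def cap3_args(fasta_path: str, identity: int) -> List[str]:
    """Build a CAP3 command line for one identity cutoff (98 or 95)."""
    return [
        "cap3", fasta_path,
        "-p", str(identity),  # overlap percent identity cutoff
        "-h", "5",            # maximum overhang, percent length
        "-y", "60",           # clipping range, bp
        "-o", "50",           # overlap length cutoff, bp
    ]


# One command per assembly; both start from the same PaCE clustering result.
ASSEMBLIES: Dict[str, int] = {"MEC_P98-May05": 98, "MEC_P95-May05": 95}
commands = {
    name: cap3_args("cluster.fasta", pct) for name, pct in ASSEMBLIES.items()
}
```

Each list could be passed to `subprocess.run`. Whether CAP3 was invoked once per PaCE cluster or on a combined file is not stated here, so the sketch leaves that choice to the caller.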
NOTE: MEC Contigs (_P98 and _P95) clustered in May 2005 and March 2006 are all available for BLAST and download. Please remember that contig names are NOT conserved between each set of contigs.
454 Transcriptome Downloads
454 Shoot Apical Meristem (SAM) B73 260,736 Sequences 9.6 MB
Emrich SJ, Barbazuk WB, Li L, Schnable PS (2007) Gene discovery and annotation using LCM-454 transcriptome sequencing. Genome Research, 17(1): 69-73. (Epub: 2006 Nov 9). [Full Text PDF]
454 Shoot Apical Meristem (SAM) Mo17 287,914 Sequences 10.9 MB
Barbazuk WB, Emrich SJ, Chen HD, Schnable PS (2007) SNP discovery via 454 transcriptome sequencing. Plant Journal, 51(5): 910-918. [Full Text PDF]
Other Maize EST Contig Downloads
3' B73 EST Contigs and Singletons 20,599 Sequences 3.4 MB
3' B73 EST Contigs 2,970 Sequences 588 KB
Approximately 32,000 3' B73 EST sequences generated by the Schnable Lab (Qiu et al., 2003) were downloaded from GenBank. Only the 30,356 ESTs with polyT prefixes of >7 bp (indicative of a polyA tail on the corresponding cDNA) were assembled using CAP3 (Huang and Madan, 1999). CAP3 parameters: overlap identity >= 98%, overlap length >= 60 bp, clipping range <= 20 bp, and overhang <= 5%. PolyT prefixes were masked prior to clustering. This CAP3 analysis yielded 3,252 contigs and 16,202 singletons. GeneSeqer (Usuka et al., 2000) was used for MAGI/EST alignments, with the option for specifying a particular orientation for genomic sequences (-f). GeneSeqer EST alignment display cutoff: at least one exon with similarity >= 95% and overall cDNA coverage >= 80%.
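The polyT selection and masking step can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the prefix is taken to be an uninterrupted run of T bases at the start of the read (no tolerance for sequencing errors), and `N` is an assumed mask character, since the page does not say how the prefixes were masked.

```python
from typing import Optional


def polyt_prefix_length(seq: str) -> int:
    """Count the consecutive T bases at the start of the read."""
    return len(seq) - len(seq.lstrip("Tt"))


def prepare_est(seq: str, min_prefix: int = 8, mask_char: str = "N") -> Optional[str]:
    """Return the read with its polyT prefix masked, or None if rejected.

    A prefix of ">7 bp" is read as at least 8 bases (min_prefix=8).
    """
    n = polyt_prefix_length(seq)
    if n < min_prefix:
        return None  # no convincing polyT prefix, so exclude from assembly
    return mask_char * n + seq[n:]
```

Reads for which `prepare_est` returns `None` would be left out, which corresponds to the reduction from roughly 32,000 downloaded ESTs to the 30,356 that were assembled.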
3' Mo17 EST Contigs and Singletons 701 Sequences 96 KB
3' Mo17 EST Contigs 67 Sequences 12 KB