Iowa State University NSF
MAGI Cereal Repeat Database 3.1
A principal computational difficulty in assembling complex cereal genomes is the abundance of repetitive elements. Because randomly sampled sequences such as BAC ends should provide a nearly uniform sample of maize genomic DNA (Meyers et al., 2001), statistical analysis of these sequences can identify additions to a repeat database used to mask repeats prior to assembly. We term repetitive elements defined in this fashion "statistically defined repeats", or SDRs (Emrich et al., 2004). MAGI NR-SDRs Version 3.1 was constructed from approximately 400,000 BAC ends, which were masked with known repeats to enrich for uncharacterized elements. We plan to mine a much larger random sample of the maize genome from the Joint Genome Institute (~1 million reads) to produce version 4 of this resource.
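The general idea of finding repeats statistically in a random sample and then masking them can be illustrated with a minimal Python sketch. This is not the SDR procedure of Emrich et al. (2004); it is a simplified stand-in that assumes exact k-mer counting on one strand only, and the function names, the value of `k`, and the `min_count` threshold are all hypothetical choices made for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer across a collection of read sequences (forward strand only)."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def mask_repeats(seq, counts, k, min_count):
    """Replace with 'N' every base covered by a k-mer seen at least min_count times."""
    seq = seq.upper()
    masked = list(seq)
    for i in range(len(seq) - k + 1):
        if counts[seq[i:i + k]] >= min_count:
            # Mask the full window occupied by the over-represented k-mer.
            masked[i:i + k] = "N" * k
    return "".join(masked)


# Toy "random sample": ACGT and TTGA each occur in two reads.
reads = ["ACGTTTGA", "CCACGTGG", "TTGACCAA"]
counts = count_kmers(reads, k=4)
print(mask_repeats("CCACGTGG", counts, k=4, min_count=2))  # CCNNNNGG
```

In a uniform random sample, a k-mer's count rises with its copy number in the genome, so a count threshold gives a crude separation of repetitive from low-copy sequence. A realistic analysis would also need to handle reverse complements, sequencing errors, and divergent repeat copies, none of which this sketch attempts.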

The current non-redundant repeat database used for our assemblies includes sequences from four distinct sources: TIGR's Cereal Repeat Database Version 2.0, the Wessler database of plant transposable elements (TEs), TREP (Triticeae Repeat Sequence Database) release 7, and maize SDRs. The TIGR database contributes roughly 800 high-copy repetitive sequences, most of them obtained from GenBank. Wessler-derived and TREP repeats are, on average, moderate-copy, while SDRs tend to be lower-copy, divergent repetitive sequences that may still confound assembly and analysis.

Two collections are provided below: the non-redundant SDRs from the MAGI Cereal Repeat Database version 3.1, and a set of mainly rice repetitive sequences generously shared by Ning Jiang and Susan Wessler at the University of Georgia (Wessler Laboratory website). We hope these data prove as useful to the cereal community as they were to us during our maize (MAGI) and sorghum (SAMI) assemblies.


Current non-redundant database statistics:
Download links: