ENCODE – ENCyclopedia Of DNA Elements

Public Health Burden
After completing the full sequence of the human genome, scientists faced the challenge of understanding what that sequence means and how it contributes to health and disease. One approach NHGRI has taken to address this question is to support the Encyclopedia of DNA Elements (ENCODE) Project, whose goal is to identify the parts of the human genome sequence that are “functional,” that is, sequences that play a critical role in biological processes. Research laboratories participating in the ENCODE Project use a variety of methods to catalog the functional elements of the human genome, such as genes or regions that control the expression of genes. The resulting list of functional elements is giving scientists a new set of tools to use while investigating biological phenomena and human disease. To further help biologists understand the human genome, related projects were conducted on the genomes of well-studied model organisms.

The goal of the modENCODE Project is to create a comprehensive catalog of functional elements, freely available to the biomedical research community, in two widely used research model organisms: the roundworm C. elegans and the fruit fly D. melanogaster, which have many biological mechanisms and genes that are similar to those of humans. ARRA funding for modENCODE expanded data production and enhanced bioinformatics support for data analysis and submission to public databases.

  • As one major example, ARRA funds allowed NHGRI to make an award for a modENCODE-specific Data Analysis Center (DAC) to support and enhance integrative analyses of the different types of genomic data produced by this project. These integrated analyses resulted in a series of papers presenting a global view of the functional components of these genomes and a deeper understanding of how genomes work to drive the biology of the organisms.
  • The two main Consortium publications resulting from the modENCODE data analyses (supported by the DAC and by individual modENCODE grants) are: “Identification of Functional Elements and Regulatory Circuits by Drosophila modENCODE” (Science, December 2010) and “Integrative Analysis of the Caenorhabditis elegans Genome by the modENCODE Project” (Science, December 2010).
  • There were several additional companion papers citing support from the DAC. Based on the infrastructure created with ARRA funding, NHGRI continued support of the DAC, which has expanded the integrated analyses to combine fly, worm and human data. With this analysis infrastructure, the modENCODE Project is preparing a series of manuscripts to report on these findings.

The goal of the Mouse ENCODE Project, like that of ENCODE and modENCODE, is to create a comprehensive catalog of functional elements, this time in the mouse genome. ARRA support was specifically used to initiate an ENCODE-like project in the mouse to allow direct comparison with human data and to enhance the annotation of the human genome.

  • Three RC2 awards were made in 2009 and seeded an effort that has since been expanded to additional groups. Since its inception, and with additional funding provided by NHGRI and other NIH Institutes, the group has generated over 600 datasets that are freely available to the research community to use in their own research endeavors. The mouse ENCODE Consortium recently published a “marker” paper describing the resource. Plans are underway to publish integrated analyses of mouse and human ENCODE data.

The goal of the ENCODE Project is to create a comprehensive catalog of functional elements in the human genome. ARRA funding was used to enhance the ENCODE resource by deepening the examination of data types already under study and by expanding the kinds of data types being collected by the project, through both supplemental funding and the awarding of RC2 grants. This funding greatly expanded the catalog of functional elements and enhanced the analysis of the ENCODE data, contributing considerably to the landmark suite of papers the ENCODE Project recently published, which reported a detailed and global view of the human genome.

This unprecedented set of 30 linked papers included one main integrative paper in Nature, 5 additional papers in Nature, 18 papers in Genome Research, and 6 papers in Genome Biology. These papers, which report on analysis of over 1600 datasets, demonstrate the utility of this community resource for studying human biology and disease. More than 3700 datasets are now freely available to the biomedical community to use in their own research.

Contributing NIH Institutes & Centers

  • National Human Genome Research Institute (NHGRI)

  1. modENCODE:
    • DAC funding: Appl ID 8327885 (Kellis)
    • Bioinformatics and Data Production supplements: Appl ID 7936306 (Kellis); Appl ID 10108689 (Celniker); Appl ID 7923474 (Henikoff); Appl ID 7940283 (Karpen); Appl ID 7929796 (Lieb); Appl ID 7940279 (MacAlpine); Appl ID 7923469 (Waterston); Appl ID 7929797 (Lai); Appl ID 7929793 (Piano); Appl ID 7925013 (White)
  2. Mouse ENCODE: Appl ID 3193033 (Snyder); Appl ID 7854853 (Stamatoyannopoulos); Appl ID 7852369 (Hardison)
  3. Human ENCODE: Appl ID 7929798 (Bernstein); Appl ID 7925978 (Crawford); Appl ID 7940623 (Stamatoyannopoulos); Appl ID 7853780 (White); Appl ID 7855660 (Giddings); Appl ID 7940277 (Kent); Appl ID 7462575 (Tullius); Appl ID 8132645 (Weng); Appl ID 7941543 (Pennacchio); Appl ID 7921275 (Dekker)