The Biological Questions We Address

(Alternative Layperson's Description)

In the broadest terms, we seek to understand how information is encoded and dynamically utilized in living eukaryotic genomes. We focus specifically on those areas of the genome that regulate chromosomal functions such as transcription, DNA replication and repair, recombination, and chromosome segregation.

DNA is an elegant molecule, but it carries its information in a language of only four letters. This molecular simplicity imposes practical limits on the complexity and types of information DNA can encode. How do complex organisms overcome these limitations? Conceptually, information in living genomes can be visualized as existing in layers, with information coded more diffusely in each successive layer. The primary layer is best represented by protein-coding DNA, which is read according to the relatively inflexible universal genetic code. A second layer encodes regulatory information in millions of degenerate sequence motifs that may be recognized by "sequence-specific" DNA-binding proteins such as transcription factors. A third layer of sequence information is very diffusely encoded over hundreds of bases and guides the positioning and occupancy of nucleosomes, the basic units of DNA packaging. The final layer is composed of the nucleosomes themselves. Nucleosomes greatly extend the information-coding capacity of the genome by allowing overlapping, redundant, and even illegitimate information to be safely encoded in DNA sequences. They accomplish this by blocking regulatory proteins' access to most of the genome while dynamically allowing access to the relatively small portions that are used in a given cellular environment. We seek to characterize quantitatively how genome accessibility is regulated and how this regulation is coordinated with the underlying layers of information encoded in DNA.

Yeast, worms, and humans: A strategy for linking basic biology and medicine

The projects in my laboratory are united by the scientific goal of understanding the relationships among chromatin, transcription factor targeting, and gene expression. We use three biological systems: (1) S. cerevisiae (hereafter "yeast") to address basic molecular mechanisms; (2) C. elegans to test the importance of those mechanisms in a simple multicellular organism; and (3) human cell lines and clinical samples to directly interrogate chromatin function in human development and disease. The genomes of these organisms span more than two orders of magnitude in size (12 Mb, 100 Mb, and 3000 Mb, respectively) and a wide range of genome complexity (~50%, ~25%, and ~1.5% coding, respectively). Using these systems, with C. elegans serving as a "stepping stone" between yeast and human studies, allows us to bring concepts discovered in model systems quickly to medical relevance.

Major Projects in the Lab

1. Using yeast transcription factors to investigate regulation of protein-genome interactions

We use the genome-wide localization of yeast DNA-binding proteins as a model system to investigate in vivo DNA-binding specificity and how it is regulated under different environmental and developmental conditions. Protein localization across the genome is determined by a method commonly called "ChIP-chip" (chromatin immunoprecipitation followed by microarray, or "chip", analysis). We also measure transcription genome-wide to study the biological implications of protein-DNA interactions.
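As a rough illustration of how ChIP-chip measurements are often summarized, the sketch below computes a per-probe log2 ratio of immunoprecipitated (IP) signal to input (unenriched) signal, centers the ratios on their median, and flags probes above a threshold. The two-channel IP-versus-input design, the median centering, the two-fold (log2 = 1.0) threshold, and the probe names and intensities are all hypothetical assumptions for illustration. This is not a description of our lab's actual analysis pipeline.

```python
import math
from statistics import median

def log2_enrichment(ip, inp):
    """Per-probe log2(IP / input) ratios, median-centered so a typical probe scores ~0.

    Assumes (hypothetically) a two-channel array where each probe has an IP
    intensity and an input intensity, both positive. Real pipelines also
    correct for background and dye bias.
    """
    if len(ip) != len(inp):
        raise ValueError("IP and input must have one value per probe")
    ratios = [math.log2(i / n) for i, n in zip(ip, inp)]
    center = median(ratios)
    return [r - center for r in ratios]

def bound_probes(names, scores, threshold=1.0):
    """Return probe names whose centered log2 ratio meets the (hypothetical) threshold."""
    return [name for name, s in zip(names, scores) if s >= threshold]

# Hypothetical intensities for five probes (not real data).
probes = ["p1", "p2", "p3", "p4", "p5"]
ip = [400.0, 110.0, 95.0, 1200.0, 100.0]
inp = [100.0, 100.0, 100.0, 150.0, 100.0]

scores = log2_enrichment(ip, inp)
print(bound_probes(probes, scores))
```

Median centering assumes that most probes are not bound, which is usually reasonable for a sequence-specific factor. It would be a poor choice for a protein that coats most of the genome.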

2. Using in vitro methods to identify factors that regulate in vivo target selection by DNA-binding proteins

In collaboration with Neil Clarke's group (now at the Genome Institute of Singapore), we have developed a new method for determining the DNA-binding specificity of proteins. In DIP-chip (DNA immunoprecipitation with microarray detection), protein-DNA complexes are isolated from an in vitro mixture of purified protein and naked genomic DNA. Whole-genome DNA microarrays are used to identify the protein-bound DNA fragments, and the sequences of these fragments are used to derive binding-site descriptions. The DIP-chip protocol can also serve a rather different purpose: comparing the sites bound in vitro with the sites bound in vivo, as defined by ChIP-chip. Such comparisons will help determine how much of the specificity of in vivo interactions depends on chromatin and other factors, and how much is inherent to the protein and the DNA themselves.
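One simple way to frame the in vitro versus in vivo comparison is to ask what fraction of regions bound in vitro (DIP-chip) are also bound in vivo (ChIP-chip), and the reverse. The sketch below does this with half-open genomic intervals. The region coordinates are hypothetical, and the `Region` tuple layout is an assumption for illustration. A real comparison would also need to account for probe coverage, enrichment thresholds, and the overlap expected by chance.

```python
from typing import List, Tuple

# A region is (chromosome, start, end), half-open: start <= position < end.
Region = Tuple[str, int, int]

def overlaps(a: Region, b: Region) -> bool:
    """True if two regions share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def fraction_shared(query: List[Region], reference: List[Region]) -> float:
    """Fraction of query regions that overlap at least one reference region."""
    if not query:
        return 0.0
    hits = sum(1 for q in query if any(overlaps(q, r) for r in reference))
    return hits / len(query)

# Hypothetical bound regions (illustrative coordinates only).
in_vitro = [("chrII", 1000, 1200), ("chrII", 5000, 5300),
            ("chrIV", 800, 950), ("chrV", 100, 300)]
in_vivo = [("chrII", 1150, 1400), ("chrIV", 900, 1000)]

print(fraction_shared(in_vitro, in_vivo))  # in vitro sites also seen in vivo
print(fraction_shared(in_vivo, in_vitro))  # in vivo sites also seen in vitro
```

Under this framing, sites bound in vitro but not in vivo might point to regions made inaccessible by chromatin. Sites bound in vivo but not in vitro might depend on cofactors. The all-pairs search is quadratic, which is fine for a sketch but would be replaced by sorted-interval methods for genome-scale data.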

3. Combining biochemical and genomic methods to study genome and chromatin organization

Our group works to characterize how DNA is packaged, focusing in particular on the regulation of nucleosome dynamics. We have published evidence that nucleosomes, the basic repeating units of eukaryotic chromatin, are depleted from active regulatory elements throughout the Saccharomyces cerevisiae genome in vivo. Alterations in the global transcriptional program resulted in increased nucleosome occupancy at repressed promoters and decreased occupancy at promoters that became active. Given the conservation of sequence and function among components of both chromatin and the transcriptional machinery, nucleosome depletion at promoters may be a fundamental feature of eukaryotic transcriptional regulation. We are continuing to study the regulation of nucleosome occupancy genome-wide in yeast.

We are interested in bringing technologies and concepts we develop in model systems to the study of human biology and health. One example is FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements), a simple, low-cost method for isolating and identifying nucleosome-depleted regions of chromatin genome-wide. FAIRE was initially discovered in yeast. There, we observed that when formaldehyde-crosslinked chromatin was subjected to phenol-chloroform extraction, nucleosome-depleted sequences were recovered in the aqueous phase with much greater efficiency than coding sequences. FAIRE presumably works because covalently crosslinked protein-DNA complexes are retained at the interface between the organic and aqueous phases, whereas DNA that is not crosslinked (or trapped by crosslinks) escapes into the aqueous phase. Higher-resolution comparison of FAIRE signal with nucleosome mapping data revealed that FAIRE enriched nearly all yeast genomic regions depleted of histones in ChIP-chip experiments (for histone H3 and Myc-tagged H4). Histones are likely to dominate the crosslinking profile because of their abundant primary amines and close proximity to DNA, both of which are required for crosslinking. We have developed FAIRE as an alternative method for identifying open chromatin sites in human chromatin. In human cells, FAIRE isolates regulatory regions that overlap to a large degree with DNaseI hypersensitive regions, but it also detects a unique set of loci. Our discovery of FAIRE in yeast, and its continued development in human cells, provide the foundation for projects designed to create a human open chromatin atlas and for our proposal to profile chromatin in human cancer.

4. Establishment of Caenorhabditis elegans as a model metazoan for the study of protein-DNA interactions during development

Yeast is a fabulous system, but we are also interested in studying aspects of chromatin regulation that are required for development. For this purpose, we initiated studies of C. elegans. C. elegans is at the forefront of both large-scale genomic research and gene function discovery. It was the first animal to have a fully mapped and sequenced genome. Genomic approaches including EST projects, SAGE sequencing, an ORFeome library, extensive yeast two-hybrid data sets, microarray profiling, and genome-wide RNAi screens have provided a wealth of information about gene structure and function. The versatility of C. elegans for experimental manipulation has led to a large collection of mutant alleles and many well-known discoveries in basic biology. Worms also offer particular advantages for studying the chromatin factors that regulate meiosis and germline development, which are notoriously difficult to study in mammalian systems. Together, these features make it a realistic goal to understand how a genome sequence directs animal development. C. elegans has traditionally been exploited as a model for genetics, cell biology, and neurobiology, but the application of biochemical approaches has lagged. We therefore sought to bring ChIP-chip, which we helped develop in yeast, to this important model system. For this purpose we used the proteins of the C. elegans dosage compensation complex (DCC). The DCC binds specifically to the X chromosome and not to the autosomes. We could therefore measure the specificity and sensitivity of our assay and optimize procedures to maximize the ratio of signal (X-chromosome hybridization) to noise (autosome hybridization). Furthermore, we were able to cross-validate experiments using antibodies against distinct components of the DCC. Therefore, in addition to yielding important biological discoveries, this test case offered technical advantages that allowed protocol development and objective assessment of our methods.
This led to successful ChIPs of other factors, including the histone variant H2A.Z and the transcription factor NFI-1.
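The DCC test case turns protocol optimization into a simple quantitative comparison: mean enrichment on X-chromosome probes (signal) divided by mean enrichment on autosomal probes (noise). The sketch below scores two protocol variants this way and picks the better one. It assumes, hypothetically, that each probe carries a linear-scale IP/input enrichment value. The probe values and the variant names "A" and "B" are invented for illustration.

```python
from statistics import mean

def x_signal_to_noise(probes):
    """Mean X-chromosome enrichment divided by mean autosomal enrichment.

    `probes` is a list of (chromosome, enrichment) pairs. Enrichment is assumed
    to be a linear-scale IP/input ratio; log-scale values would call for a
    difference of means instead of a quotient.
    """
    x = [e for chrom, e in probes if chrom == "X"]
    autosomal = [e for chrom, e in probes if chrom != "X"]
    if not x or not autosomal:
        raise ValueError("need probes on both X and the autosomes")
    return mean(x) / mean(autosomal)

# Hypothetical enrichment values from two protocol variants (not real data).
protocol_a = [("X", 3.0), ("X", 5.0), ("I", 1.0), ("II", 1.5), ("III", 1.0), ("IV", 0.5)]
protocol_b = [("X", 6.0), ("X", 8.0), ("I", 1.0), ("II", 1.0), ("III", 0.5), ("IV", 1.5)]

candidates = {"A": protocol_a, "B": protocol_b}
best = max(candidates, key=lambda name: x_signal_to_noise(candidates[name]))
print(best, x_signal_to_noise(candidates[best]))
```

Because the DCC is expected on X and not on the autosomes, a higher ratio should indicate a better protocol without any prior knowledge of individual binding sites.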

We are currently funded through NHGRI's modENCODE project, as part of a large effort to identify elements encoded in DNA that control chromatin behavior in C. elegans.