MIA: Carl de Boer, Deep learning regulatory models; David Kelley, Expression prediction from DN

Models, Inference and Algorithms
September 21, 2022
Broad Institute of MIT and Harvard

Meeting: Using deep learning regulatory models and random DNA for evolutionary inference
Carl de Boer
Assistant Professor, School of Biomedical Engineering, UBC

Genetic variation in cis-regulatory sequences can alter gene regulation and is a major driver of phenotypic variation. Here, I will describe two recent advances in our understanding of the molecular evolution of cis-regulatory DNA, gleaned through gene regulatory “big data” and machine learning.

Using yeast as a model system, we recently demonstrated that random DNA (where each base is randomly selected from the four possibilities) placed in a promoter-like context had diverse gene regulatory activity. Measuring random DNA at scale enabled us to train highly accurate machine learning models that capture gene regulation. These models demonstrated that gene expression evolution is highly dynamic and enabled us to chart the past and future course of cis-regulatory evolution.

The diverse expression observed in random promoter sequences implied that regulatory activities were easy to evolve. Using a combination of experimentation and inference, we estimated how frequently gene regulatory features (e.g. transcripts and chromatin marks) occur by chance in entire chromosomes of evolutionarily naïve DNA, finding that regulatory features are both predicted and observed to be frequent in such DNA. Since gene regulatory features are expected to occur by chance in the absence of selection, many of the biochemically active sequences in genomes are unlikely to be adaptive.

Primer: Gene expression prediction from DNA sequences
David Kelley
Calico Labs

How noncoding DNA determines gene expression in different cell types is a major unsolved problem, and critical downstream applications in human genetics depend on improved solutions. I will describe improved gene expression prediction accuracy from DNA sequences through the use of a deep learning architecture, called Enformer, that is able to integrate information from long-range interactions (up to 100 kb away) in the genome. This improvement yields more accurate variant effect predictions on gene expression for both natural genetic variants and saturation mutagenesis measured by massively parallel reporter assays. Furthermore, Enformer learns to predict enhancer–promoter interactions directly from the DNA sequence, performing competitively with methods that take direct experimental data as input. We expect that these advances will enable more effective fine-mapping of human disease associations and provide a framework to interpret cis-regulatory evolution.

For more information visit:
Copyright Broad Institute, 2022. All rights reserved.