Cell Type-specific 3D Epigenomes: Interpreting Gene Regulatory Mechanisms in Brain Development

2022-03-28    Author: Yingzi Gene

This article is reprinted from FSR, a cutting-edge research platform.



Hello, everyone. This week we're sharing the paper "Cell Type-specific 3D Epigenomes in the Developing Human Cortex", published in Nature on October 14, 2020. The article makes comprehensive use of multi-omics approaches to analyze key genetic variants identified through GWAS, eQTL, or selective sweep analyses, which makes it an excellent analytical template. It is a valuable reference and will be especially interesting to readers working on 3D genomics and GWAS. Are you ready? We're about to take off!


This work was a joint effort by the research groups of Shen Yin and neurobiologist Arnold Kriegstein at the University of California, San Francisco, and statistician Hu Ming at the Cleveland Clinic.



First, let's briefly outline the development of 3D genomics technologies.




In 2009, the Hi-C technique was introduced, enabling the first genome-wide analysis of chromatin interactions and conformation, and it has since become a cornerstone of 3D genomics. As a first-generation technology, however, it had some inevitable drawbacks: it required a large number of cells, and it did not preserve nuclear integrity, which could alter chromatin conformation. In 2014, an improved version, in situ Hi-C, captured chromatin conformation within intact nuclei and substantially improved spatial resolution (see the figure below). Even so, capturing genome-wide interactions with Hi-C at high resolution (e.g., 1-5 kb) requires at least 1-3 billion reads, a sequencing depth that is out of reach for most laboratories.

Building on this, researchers around the world began modifying and extending in situ Hi-C. For instance, to capture chromatin interactions at specific loci, one can design probes against those regions, which gave rise to capture Hi-C. To capture interactions mediated by a specific protein or histone mark, one can use the corresponding antibody, which led to ChIP-based Hi-C methods (e.g., HiChIP and PLAC-seq). To capture interactions among open chromatin regions, one can borrow the principle of ATAC-seq, as in HiCAR, and so on.



(Rao, S. S. et al., Cell, 2014)
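Why is high-resolution Hi-C so read-hungry? A back-of-the-envelope count of bin pairs helps. The sketch below is an illustration only: it assumes a ~3.1 Gb genome treated as a single contiguous chromosome (a simplification), counts bin pairs separated by at most 2 Mb, and uses a hypothetical depth of 1 billion reads; the function name and all parameters are assumptions, not values from the paper.

```python
def bin_pairs_within(genome_bp: int, bin_bp: int, max_dist_bp: int) -> tuple[int, int]:
    """Count bins and bin pairs (i <= j) separated by at most max_dist_bp.

    Simplification: the genome is treated as one contiguous chromosome.
    """
    n_bins = -(-genome_bp // bin_bp)  # ceiling division
    max_offset = min(max_dist_bp // bin_bp, n_bins - 1)
    # Diagonal offset d of the contact matrix holds (n_bins - d) bin pairs.
    n_pairs = sum(n_bins - d for d in range(max_offset + 1))
    return n_bins, n_pairs


if __name__ == "__main__":
    total_reads = 1_000_000_000  # hypothetical sequencing depth
    for bin_bp in (100_000, 5_000, 1_000):
        n_bins, n_pairs = bin_pairs_within(3_100_000_000, bin_bp, 2_000_000)
        print(f"{bin_bp:>7} bp bins: {n_bins:>9,} bins, {n_pairs:>13,} pairs, "
              f"{total_reads / n_pairs:8.2f} reads/pair if every read were usable")
```

Shrinking the bins from 5 kb to 1 kb multiplies the number of bin pairs within 2 Mb by roughly 25, so even an optimistic billion usable reads leaves well under one read per pair at 1 kb.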



Having covered Hi-C, let's now turn to PLAC-seq (closely related to HiChIP), the technology used in this article. PLAC-seq is quite intricate, and a few sentences cannot cover its entire history and development; if there's enough interest, we may dedicate a separate post to it. For now, let's focus on the choice of analysis software. Because PLAC-seq captures interactions that are enriched for a specific factor, the matrix-balancing (normalization) methods commonly used for Hi-C, such as VC, KR, and ICE, are not well suited: they generally assume "equal visibility", meaning every genomic region should be captured equally often. In PLAC-seq, however, interactions are mediated by the target factor, so regions that neither bind nor interact with it are rarely captured, which contradicts the equal-visibility assumption. PLAC-seq and similar data therefore require dedicated modeling and bias correction. Several analysis tools have been developed for this purpose; four of them are listed in chronological order in the figure below:





MANGO was introduced very early, but it was originally designed to consider only "peak-to-peak" interactions, so "peak-to-none" interactions cannot be detected. Next came hichipper, which corrects for biases related to enriched peaks but can still analyze only "peak-to-peak" interactions. In 2019, two major tools were released. One is MAPS, developed by Professor Ren Bing's lab at UCSD, which uses a completely new modeling approach and, for the first time, includes the analysis of "peak-to-none" interactions. Five months later, another tool, FitHiChIP, was released by the "ay-lab", known for developing a range of software. Its distinctive feature is a correction for the distances between ChIP-seq peaks and PLAC-seq cleavage sites, which yields more precise chromatin interactions.
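The "equal visibility" assumption that makes standard Hi-C balancing a poor fit for PLAC-seq can be made concrete with a minimal sketch of ICE-style iterative correction (after Imakaev et al., 2012) in NumPy. It repeatedly rescales a symmetric matrix until every bin has the same total coverage. In the toy matrix, bin 3 is rarely captured, as a region lacking the target factor would be in PLAC-seq; ICE treats that low coverage as bias and inflates it, which is exactly the wrong move when the low coverage is real signal. The toy values and function name are illustrative assumptions; production implementations (e.g., cooler's balancing) add bin filtering and other details omitted here.

```python
import numpy as np


def ice_balance(matrix, max_iter: int = 1000, tol: float = 1e-12):
    """Iteratively scale a symmetric contact matrix so all nonzero rows sum equally.

    Returns the balanced matrix and the per-bin bias vector, such that
    balanced[i, j] == matrix[i, j] / (bias[i] * bias[j]).
    """
    balanced = np.asarray(matrix, dtype=float).copy()
    bias = np.ones(balanced.shape[0])
    for _ in range(max_iter):
        coverage = balanced.sum(axis=1)
        nonzero = coverage > 0
        # Relative coverage of each bin; empty bins are left untouched.
        step = np.ones_like(coverage)
        step[nonzero] = coverage[nonzero] / coverage[nonzero].mean()
        balanced /= np.outer(step, step)
        bias *= step
        if np.abs(step - 1).max() < tol:
            break
    return balanced, bias


if __name__ == "__main__":
    # Toy 4-bin matrix (arbitrary values); bin 3 has very low coverage.
    raw = np.array([[10, 20, 10, 2],
                    [20, 10, 15, 2],
                    [10, 15, 10, 2],
                    [2, 2, 2, 1]], dtype=float)
    balanced, bias = ice_balance(raw)
    print("raw row sums:     ", raw.sum(axis=1))
    print("balanced row sums:", balanced.sum(axis=1).round(3))
    print("bias per bin:     ", bias.round(3))
```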



(Aside 1: the terms "loop", "contact", and "interaction" can be confusing in the literature. Comparing various studies, we find that "contact" and "interaction" have broader meanings, whereas "loop" usually refers to a highly significant or precisely defined interaction; very weak or non-significant contacts may be false positives. Also, if the software you use calls its output "interactions", simply call them that. After all, usage in this area is flexible and varies with preference.)



(Aside 2: if you're interested in how these tools perform in practice, feel free to contact our editorial team. They're quite professional and may have specific test results to share with you.)



(Aside 3: it might seem a bit pedantic, but there are naming conventions for the two ends of each interaction or loop. For example, if your data are mediated by H3K4me3, the end of the loop that overlaps an H3K4me3 peak is called the "anchor bin", and the end without H3K4me3 binding is called the "target bin". So what roles do the two ends play in this case? Which one is the promoter, and which one is the potential regulatory element, such as an enhancer?)
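The "peak-to-peak" (AND) versus "peak-to-none" (XOR) distinction and the anchor/target naming are easy to make concrete. A minimal sketch, assuming loops and H3K4me3 peaks have already been mapped to integer bin indices on one chromosome; the function names and data layout are hypothetical and are not the interface of MANGO, hichipper, MAPS, or FitHiChIP. Only XOR loops get a unique orientation, because an AND loop carries a peak at both ends.

```python
from collections import Counter
from typing import Optional


def classify_loop(bin_a: int, bin_b: int, peak_bins: set[int]) -> str:
    """Label a loop by how many of its two ends overlap a peak bin.

    AND  = peak-to-peak (both ends in peaks)
    XOR  = peak-to-none (exactly one end in a peak)
    NONE = neither end in a peak
    """
    in_a, in_b = bin_a in peak_bins, bin_b in peak_bins
    if in_a and in_b:
        return "AND"
    if in_a or in_b:
        return "XOR"
    return "NONE"


def orient_loop(bin_a: int, bin_b: int, peak_bins: set[int]) -> Optional[tuple[int, int]]:
    """Return (anchor_bin, target_bin) for an XOR loop, else None."""
    in_a, in_b = bin_a in peak_bins, bin_b in peak_bins
    if in_a and not in_b:
        return bin_a, bin_b
    if in_b and not in_a:
        return bin_b, bin_a
    return None


if __name__ == "__main__":
    h3k4me3_bins = {10, 25, 40}  # hypothetical peak bins
    loops = [(10, 25), (10, 33), (18, 40), (5, 60)]
    for a, b in loops:
        print((a, b), classify_loop(a, b, h3k4me3_bins), orient_loop(a, b, h3k4me3_bins))
    print(Counter(classify_loop(a, b, h3k4me3_bins) for a, b in loops))
```

A caller restricted to peak-to-peak interactions would only ever report the AND class.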



Alright, now we're finally back to the article itself. When reading a paper, it's essential to start with the biological materials. (Otherwise, why would the first figure of CNS (Cell/Nature/Science) papers almost always show the overall experimental design?) In this article, mid-gestation human fetal brain tissue was used to isolate four distinct cell types: radial glia (RG), intermediate progenitor cells (IPCs), excitatory neurons (eNs), and interneurons (iNs). These cell types were purified by flow cytometry-based sorting using cell type-specific markers. The cell types are shown below (RG -> IPCs -> excitatory neurons represents a complete differentiation trajectory):

 

 

After sorting, the authors used H3K4me3-mediated PLAC-seq to identify chromatin interactions in the four cell types. Basic statistics for these loops are as follows:


 



From left to right, the panels show: the total number of loops and the proportions of loops with peaks at both ends (AND) or at only one end (XOR); the distribution of loop distances in the four cell types; and the distribution of the number of loops per promoter.
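The three summaries in this panel are simple to compute once the loops are in a table. A minimal sketch, assuming each loop is recorded as a hypothetical (promoter_id, anchor_midpoint_bp, target_midpoint_bp) tuple; the log10 distance bins are an illustrative choice, not the paper's binning.

```python
import math
from collections import Counter


def loop_summaries(loops):
    """Summarize loops given as (promoter_id, anchor_mid_bp, target_mid_bp)."""
    distances = [abs(target - anchor) for _, anchor, target in loops]
    # Histogram of distances in log10 bins (e.g., key 4 means 10-100 kb).
    distance_hist = Counter(int(math.log10(d)) for d in distances if d > 0)
    loops_per_promoter = Counter(pid for pid, _, _ in loops)
    # How many promoters have 1, 2, 3, ... loops.
    promoters_by_loop_count = Counter(loops_per_promoter.values())
    return distances, distance_hist, loops_per_promoter, promoters_by_loop_count


if __name__ == "__main__":
    loops = [("GENE_A", 1_000_000, 1_050_000),
             ("GENE_A", 1_000_000, 1_300_000),
             ("GENE_B", 5_000_000, 4_980_000)]
    for summary in loop_summaries(loops):
        print(summary)
```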

 





The Venn diagram shows that many loops are cell type-specific. Moreover, the vast majority of loops fall within TADs. (TADs are generally highly stable and represent larger units of gene regulation in the genome. If we liken the genome to a teaching building, TADs are the individual classrooms: students (gene clusters) in the same classroom attend the same class (receive similar transcriptional regulation), while different classrooms largely do not interfere with one another.)
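Checking whether a loop lies "within a TAD" amounts to asking whether both of its ends land in the same TAD interval. A minimal sketch, assuming sorted, non-overlapping TADs on a single chromosome given as hypothetical half-open (start_bp, end_bp) intervals; the function names are illustrative.

```python
import bisect


def tad_index(pos: int, starts: list[int], ends: list[int]) -> int:
    """Return the index of the TAD containing pos, or -1 if pos lies between TADs."""
    i = bisect.bisect_right(starts, pos) - 1
    if i >= 0 and pos < ends[i]:
        return i
    return -1


def fraction_within_tads(loops, tads) -> float:
    """Fraction of loops whose two ends fall in the same TAD.

    loops: iterable of (end1_bp, end2_bp); tads: sorted, non-overlapping
    half-open intervals (start_bp, end_bp) on the same chromosome.
    """
    starts = [s for s, _ in tads]
    ends = [e for _, e in tads]
    loops = list(loops)
    if not loops:
        return 0.0
    same = 0
    for a, b in loops:
        ia = tad_index(a, starts, ends)
        if ia != -1 and ia == tad_index(b, starts, ends):
            same += 1
    return same / len(loops)


if __name__ == "__main__":
    tads = [(0, 1_000_000), (1_000_000, 2_500_000)]
    loops = [(100_000, 400_000), (900_000, 1_200_000),
             (1_100_000, 2_000_000), (2_600_000, 2_700_000)]
    print(fraction_within_tads(loops, tads))
```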




Analysis needs a concrete starting point, so the next step begins with the loops that land on promoters. These promoters are classified by cell type specificity using a Shannon entropy-based method. (For details on how to use it, reply "Information Entropy" to our official account.) Next, looking at the expression of the genes driven by these promoters, it seems that the stronger the loop, the higher the gene expression? A quick look at the functional annotations of these gene clusters shows that they also match the distinct biological characteristics of the four cell types. This naturally raises the question: what is the relationship between 3D regulation (loops) and gene expression? Answering it requires a more detailed, quantitative approach.
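The post does not spell out the exact formula, so the sketch below uses one common entropy-based specificity score (Schug et al., 2005), and it is an assumption that the authors used this exact variant. For a feature (here, a promoter's loop strength) measured across cell types, normalize the values to proportions p_t, compute the Shannon entropy H = -sum(p_t * log2 p_t), and score specificity to cell type t as Q_t = H - log2 p_t. H is 0 bits for a feature seen in one cell type only and 2 bits for one spread evenly across four; a lower Q_t means more specific to t.

```python
import math


def entropy_specificity(values: dict[str, float]) -> tuple[float, dict[str, float]]:
    """Shannon entropy H (bits) across cell types and per-type Q = H - log2(p_t).

    Q is infinite for cell types with zero signal (the feature is absent there).
    """
    total = sum(values.values())
    if total <= 0:
        raise ValueError("need a positive total signal")
    p = {t: v / total for t, v in values.items()}
    h = -sum(pt * math.log2(pt) for pt in p.values() if pt > 0)
    q = {t: (h - math.log2(pt)) if pt > 0 else math.inf for t, pt in p.items()}
    return h, q


if __name__ == "__main__":
    # Hypothetical loop strengths of one promoter in the four sorted cell types.
    h, q = entropy_specificity({"RG": 8.0, "IPC": 1.0, "eN": 0.5, "iN": 0.5})
    print(f"H = {h:.2f} bits; most specific to {min(q, key=q.get)}")
```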





First, a correlation analysis between gene expression and loop strength across cell types revealed a moderate positive correlation, suggesting that loops do influence gene expression. However, the direct correlation between the number of loops per gene and its expression turned out to be quite low. Huh? Why is that?
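The two comparisons can be reproduced with a few lines of NumPy. The sketch below assumes hypothetical per-gene arrays of expression, summed loop strength, and loop count for one cell type, and uses Spearman rank correlation (Pearson correlation of average ranks); whether the paper used Spearman or Pearson is not stated in this post, so that choice is an assumption. In practice `scipy.stats.spearmanr` does the same job.

```python
import numpy as np


def average_ranks(values) -> np.ndarray:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    x = np.asarray(values, dtype=float)
    ranks = np.empty(x.size)
    ranks[np.argsort(x, kind="mergesort")] = np.arange(1, x.size + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(a, b) -> float:
    """Spearman correlation = Pearson correlation of the average ranks."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])


if __name__ == "__main__":
    # Hypothetical per-gene values in one cell type.
    expression = [5.0, 40.0, 12.0, 80.0, 3.0, 25.0]
    loop_strength = [1.2, 6.5, 2.0, 9.1, 0.8, 3.3]
    loop_count = [3, 1, 4, 2, 2, 5]
    print("expression vs loop strength:", round(spearman(expression, loop_strength), 2))
    print("expression vs loop count:   ", round(spearman(expression, loop_count), 2))
```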





Here, the authors provided two hypotheses:



1) Loops act as bridges between regulatory elements and genes, but the impact of regulatory elements on gene expression may be fine-tuning rather than robust and dramatic regulation;




2) The simultaneous action of multiple regulatory elements may exert nonlinear regulation over gene expression.
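Hypothesis 2 can be illustrated with a purely hypothetical toy model (not from the paper). Suppose expression saturates as the summed input of linked enhancers grows, E = Vmax * S / (K + S), where S is the sum of enhancer strengths. Then loop count is a poor predictor of expression: a single strong enhancer can outweigh many weak ones, and each additional loop adds less than the previous one. Vmax, K, and the enhancer strengths below are arbitrary.

```python
def saturating_expression(enhancer_strengths, vmax: float = 100.0, k: float = 5.0) -> float:
    """Toy model: expression saturates with the summed strength of linked enhancers."""
    s = sum(enhancer_strengths)
    return vmax * s / (k + s)


if __name__ == "__main__":
    genes = {
        "one strong loop": [20.0],
        "four weak loops": [0.5] * 4,
        "eight weak loops": [0.5] * 8,
    }
    for name, strengths in genes.items():
        print(f"{name:>16}: {len(strengths)} loops -> expression "
              f"{saturating_expression(strengths):.1f}")
```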




These two explanations deserve a bit more elaboration. Organisms are highly redundant in many respects, with multiple mechanisms acting on the same process. Take gene expression, the core of life processes, as an example: it is simultaneously influenced by DNA methylation, histone modifications, transcription factors, RNA, chromatin accessibility, and multiple regulatory elements (as well as many other potential and unknown factors). When we quantify these effects, their weights (variance components) differ across contexts, such as time points, cell types, biological processes, or even individual genes. With varying weights, the strength of any single correlation naturally varies as well.



Did I just mention chromatin accessibility and transcription factors? Yes, both are crucial for gene expression regulation. So the authors examined chromatin accessibility and transcription factor binding in the target bins of loops (potential enhancers):





These regions are highly accessible and bound by transcription factors that are important in each of the four cell types. (Want to know how to make this kind of bubble plot? Raise your hands, folks!) Looks like we're starting yet another topic. (-_-||...)



When it comes to the relationship between loops and gene expression, correlation alone isn't enough. The authors also performed a trend analysis, in which loop strength and gene expression follow the same trend in groups 1, 2, and 3; I won't go into further detail on those here. What is worth noting is why loop strength and gene expression trend in opposite directions in groups 4 and 5. Didn't we just find a positive correlation? Why this contrast?
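One simple way to formalize "same" versus "opposite" trends along the RG -> IPC -> eN trajectory is to compare the direction of change of loop strength and of expression between consecutive stages. A minimal sketch with hypothetical values; the paper's own procedure for defining groups 1 to 5 is not reproduced here, and the function name is illustrative.

```python
import numpy as np


def trend_concordance(loop_strength, expression) -> float:
    """Fraction of consecutive-stage steps where loop strength and expression move the same way.

    1.0 means fully concordant trends, 0.0 fully opposite; steps where either
    value is flat are ignored. Returns NaN if no step changes in both.
    """
    d_loop = np.sign(np.diff(np.asarray(loop_strength, dtype=float)))
    d_expr = np.sign(np.diff(np.asarray(expression, dtype=float)))
    informative = (d_loop != 0) & (d_expr != 0)
    if not informative.any():
        return float("nan")
    return float(np.mean(d_loop[informative] == d_expr[informative]))


if __name__ == "__main__":
    # Hypothetical values for RG, IPC, eN.
    print(trend_concordance([1.0, 2.0, 4.0], [10.0, 15.0, 30.0]))  # 1.0: same trend
    print(trend_concordance([4.0, 2.0, 1.0], [10.0, 15.0, 30.0]))  # 0.0: opposite trend
```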



Now, let's list all the factors and examine them closely:









The analysis in the lower-right panel shows that, in these groups, loop strength trends opposite to gene expression. Moreover, these loops overlap less with enhancers and involve some repressive transcription factors. We therefore speculate that they are likely structural or repressive loops rather than activating enhancer-promoter loops.




Next, we've reached a key point of this article! Readers familiar with 3D genomics may already know the concept of "super-enhancers" (for those who aren't, here's a review: https://www.nature.com/articles/ng.3167). Borrowing a similar idea, the authors aggregated the loop strengths of each promoter, ranked all promoters, and identified "super interactive promoters" (SIPs). So what characterizes these SIPs?


 



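From left to right: many key cell type-specific genes of the four cell types are found among the SIPs; a Venn diagram shows the details; and SIP genes are among the most highly expressed genes genome-wide.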
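The super-enhancer concept borrowed for SIPs is usually implemented with the ROSE "hockey-stick" cutoff: sort regions by signal, rescale both the rank axis and the signal axis to [0, 1], and call everything beyond the point where the curve's slope reaches 1 "super". The sketch below applies that idea to hypothetical per-promoter summed loop strengths. It is an assumption that SIP calling follows exactly this recipe, and the helper name is illustrative.

```python
import numpy as np


def hockey_stick_cutoff(signal) -> float:
    """ROSE-style cutoff: the value where the rescaled rank curve has slope 1.

    Values are sorted ascending, ranks and values are both rescaled to [0, 1],
    and the tangent point of a slope-1 line is where (y - x) is smallest.
    """
    y = np.sort(np.asarray(signal, dtype=float))
    if y.size < 2 or y[-1] == y[0]:
        raise ValueError("need at least two distinct values")
    x_scaled = np.arange(y.size) / (y.size - 1)
    y_scaled = (y - y[0]) / (y[-1] - y[0])
    return float(y[np.argmin(y_scaled - x_scaled)])


if __name__ == "__main__":
    # Hypothetical summed loop strength per promoter: most modest, a few very high.
    values = [1, 1, 2, 2, 2, 3, 3, 4, 5, 6, 30, 55]
    strengths = {f"P{i}": v for i, v in enumerate(values)}
    cutoff = hockey_stick_cutoff(list(strengths.values()))
    sips = sorted(p for p, v in strengths.items() if v > cutoff)
    print(f"cutoff = {cutoff}, SIPs = {sips}")
```

Promoters strictly above the tangent-point value are called SIPs here; whether the tangent point itself is included is a convention choice.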


Next, compared with non-SIPs, the anchor ends of SIP loops are more enriched for super-enhancers and DNA methylation valleys (DMVs), while their target ends are more enriched for super-enhancers (indicative of strong transcriptional activity). Meanwhile, the right panel uses several datasets to further confirm that SIPs are enriched for cell type-specific genes.
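Enrichment statements such as "SIPs are more enriched for super-enhancers than non-SIPs" typically rest on a 2x2 comparison. A minimal sketch computing the odds ratio and a one-sided Fisher's exact test p-value with the standard library; the counts are hypothetical, and the specific test used in the paper is not stated in this post, so this choice is an assumption.

```python
import math


def enrichment_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Odds ratio and one-sided (greater) Fisher's exact p-value for a 2x2 table.

    a: SIPs overlapping the feature      b: SIPs not overlapping
    c: non-SIPs overlapping the feature  d: non-SIPs not overlapping
    """
    if b * c > 0:
        odds_ratio = (a * d) / (b * c)
    else:
        odds_ratio = math.inf if a * d > 0 else math.nan
    n_sip, n_feature, n = a + b, a + c, a + b + c + d
    # Hypergeometric tail: probability of at least `a` overlapping SIPs by chance.
    tail = sum(
        math.comb(n_feature, k) * math.comb(n - n_feature, n_sip - k)
        for k in range(a, min(n_sip, n_feature) + 1)
    )
    return odds_ratio, tail / math.comb(n, n_sip)


if __name__ == "__main__":
    # Hypothetical counts of promoters overlapping super-enhancers.
    odds_ratio, p_value = enrichment_2x2(120, 380, 900, 14_600)
    print(f"odds ratio = {odds_ratio:.2f}, one-sided p = {p_value:.2e}")
```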


 



Lastly, let's look at how PLAC-seq data can be used to interpret GWAS loci. The authors selected seven human psychiatric disorders and used linkage disequilibrium score regression (LDSC) to quantify how strongly the genetic risk (SNP heritability) of each disorder is enriched in different genomic annotations. The results are shown in the following figure:





From left to right, the panels show loop anchor bins (promoters), loop target bins (potential enhancers), distal ATAC-seq peaks, and cell type-specific genes (dashed lines indicate no significant enrichment). Compared with the latter two annotations, loop anchors and targets clearly have much higher explanatory power for disease risk loci. A likely reason is that, unlike traditional approaches that focus solely on genes or solely on regulatory elements, loops link the two, offering a more holistic and realistic picture, which may in turn improve explanatory power.
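For reference, the regression at the heart of LDSC (Bulik-Sullivan et al., 2015) and its stratified form used for annotation-level enrichment (Finucane et al., 2015) can be written as follows. Here N is the GWAS sample size, M the number of SNPs, ℓ_j the LD score of SNP j, h²_g the SNP heritability, ℓ(j, C) the LD score of SNP j with respect to annotation C (such as "loop target bins"), τ_C the per-SNP heritability contribution of annotation C, and a a term capturing confounding such as population stratification. The enrichment of an annotation is then its share of heritability divided by its share of SNPs. How the authors defined and combined their annotations is not covered in this post.

$$
\mathbb{E}\left[\chi^2_j\right] = \frac{N h^2_g}{M}\,\ell_j + N a + 1
$$

$$
\mathbb{E}\left[\chi^2_j\right] = N \sum_{C} \tau_C\, \ell(j, C) + N a + 1,
\qquad
\text{Enrichment}(C) = \frac{h^2_g(C) / h^2_g}{M_C / M}
$$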



Due to space limitations, this overview does not cover the role of transposons in loop formation or the authors' newly developed experimental validation system based on CRISPR and SMART-Q. Interested readers can explore these topics on their own, and if you have any questions, feel free to reach out to our editorial team. (ง •̀_•́)



Finally, in the single-cell era, which has moved from scRNA-seq to scATAC-seq, the next big boom will be single-cell 3D genomics. Are you ready for it?

 

Original link: https://doi.org/10.1038/s41586-020-2825-4