Gene chips are a familiar tool: they detect genetic variation through hybridization with a set of known nucleic acid probes. With the rapid development of NGS technology over the past decade, high-quality reference genomes have been generated for hundreds of economically important species. At the same time, a vast amount of resequencing data has accumulated, and high-density SNP chips have been developed for many species on that basis.
In recent years, functional genomics research has advanced rapidly, and technologies such as Hi-C, ATAC-seq and CUT&Tag/ChIP-seq have been widely applied to economically important species. Promoters, enhancers, open chromatin regions and other regulatory elements and active regions have been extensively identified in multiple species, leading to increasingly refined functional annotation of genomes. Combined with in-depth comparative genomics and evolutionary studies, this work has identified a multitude of important functional loci that can influence gene expression and protein activity. Functional locus gene chips have emerged from these research results.
01
What is a functional locus gene chip? How does it differ from a regular marker locus gene chip?
A functional locus gene chip is a chip on which all included loci are functional mutations that can directly affect transcription levels or protein activity. Compared with regular marker locus gene chips, its loci are selected on a more scientific basis: they directly influence biological regulatory processes and carry more functional information.
On a marker locus gene chip, the effect of a marker locus is determined by the functional mutations linked to it. Because the tightness of linkage varies across generations and populations, marker effects tend to be unstable, which makes it difficult to integrate data to improve genomic breeding efficiency. Functional locus gene chips carry the functional mutations themselves and can therefore largely bypass the limitations of linkage disequilibrium. The effects of these mutation loci should be relatively stable across populations, which is highly advantageous for multi-generation and multi-population data integration and can significantly improve the efficiency of key gene discovery and genomic breeding.
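Why a marker's apparent effect shifts with linkage can be illustrated with a small sketch. It assumes a single additive QTN and uses the standard single-locus result that the regression of phenotype on a linked marker equals the QTN effect multiplied by D / (p_m(1 - p_m)), where D is the linkage disequilibrium coefficient. The function name, the populations and all numbers are hypothetical.

```python
import math

def expected_marker_effect(qtn_effect, r, p_qtn, p_marker):
    """Expected allele-substitution effect observed at a marker linked to one QTN.

    r is the signed LD correlation between marker and QTN alleles;
    p_qtn and p_marker are the allele frequencies at the two loci.
    """
    # D = r * sqrt(p_q(1-p_q) * p_m(1-p_m)), and
    # beta_marker = beta_qtn * D / (p_m(1-p_m)), which simplifies to:
    scale = math.sqrt(p_qtn * (1 - p_qtn) / (p_marker * (1 - p_marker)))
    return qtn_effect * r * scale

# Hypothetical populations in which the same marker/QTN pair shows different LD
for name, r in [("population A", 0.9), ("population B", 0.4), ("population C", -0.2)]:
    print(name, round(expected_marker_effect(1.0, r, 0.3, 0.3), 3))
```

Under these assumptions the QTN effect itself is the same in every population, while the effect seen at the marker changes in size and can even change sign as LD changes.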
Compared with marker locus gene chips, functional locus gene chips are closer to the "ideal gene chip", that is, a chip whose loci exactly match all QTN (quantitative trait nucleotide) loci of the studied phenotype. Breeding target traits are often complex and controlled by many genes of small effect, so it is difficult to design an ideal chip that captures every QTN associated with the phenotype. It is certain, however, that a QTN must be a functional mutation. In theory, therefore, functional locus gene chips include more QTN loci than marker locus gene chips and are closer to the ideal chip. Compared with whole-genome sequencing, functional locus gene chips may contain fewer QTNs, but they also carry far fewer non-functional noise loci and offer substantial savings in storage and computing costs. The relationships among the ideal gene chip, functional locus gene chip, marker locus gene chip and whole-genome sequencing are as follows (assuming the studied phenotype is controlled by 1000 QTNs):
Figure 1. Diagram for Gene Chips
02
Design Roadmap of Functional Locus Gene Chips
Figure 2. Technical Roadmap for Designing Functional Locus Gene Chips
03
Design Steps and Explanations for Functional Locus Gene Chips
1. Genome Assembly
This step mainly applies to species without a reference genome. The sequencing scheme is described here taking a diploid species as an example. If only a consensus genome assembly is required, that is, one copy of each pair of homologous chromosomes is selected as the representative for chromosome-level assembly, a combination of at least PacBio HiFi, Hi-C and WGS data is necessary. If a haplotype-resolved genome assembly is required, high-coverage WGS data from both the paternal and maternal lines are needed in addition to the data used for the consensus assembly.
2. Genome Genetic Variation Detection
If the studied species lacks a high-quality genetic variation database, or existing variation information does not adequately cover certain varieties, genomic variations must be identified from scratch. Specific method: identify the SNVs, indels and SVs of the studied species or variety from high-quality population resequencing data. Because second-generation sequencing data are less sensitive for SV detection, representative individuals can be selected for third-generation PacBio HiFi resequencing to improve the efficiency of population SV detection.
3. Functional Genome Annotation
(1) Epigenome: Use epigenome sequencing technologies such as ATAC-seq and ChIP-seq/CUT&Tag on specific tissues, or on different tissues at various developmental stages, to comprehensively and accurately identify genomic regulatory elements and transcription factor binding motifs.
(2) Evolutionarily Conserved Elements: Based on the concept of conservation, detect genomic sequences that meet different conservation thresholds (completely conserved, highly conserved and significantly conserved) across a collection of genomes from multiple species.
4. Screening of Candidate Functional Loci for the Chip
Annotate and label genome-wide genetic variations with features such as genomic context (intergenic region, intron), predicted consequence (synonymous, missense, nonsense and frameshift mutations), minor allele frequency quantile in the population, overlap with regulatory elements, evolutionary conservation and candidate functional mutation status. Combine the weights of all genomic features to calculate a total feature score for each variation. Then delineate haplotype blocks across the genome from genetic linkage information, and in each block select the variation with the highest feature score as the tag variation and candidate locus for the gene chip.
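The scoring and tag-selection logic of this step can be sketched as follows. This is a minimal illustration, not the actual pipeline: the feature names, the weights, the use of raw minor allele frequency as a small bonus term, and the variant records are all hypothetical, and haplotype blocks are assumed to have been computed beforehand.

```python
# Hypothetical feature weights; real weights would be tuned per species and trait.
FEATURE_WEIGHTS = {
    "nonsense": 5.0,
    "frameshift": 5.0,
    "missense": 3.0,
    "regulatory_element": 2.0,
    "conserved": 2.0,
    "synonymous": 0.5,
    "intron": 0.2,
    "intergenic": 0.1,
}

def feature_score(features, maf, weights=FEATURE_WEIGHTS):
    """Sum the weights of all annotated features, plus the MAF as a small bonus."""
    return sum(weights.get(f, 0.0) for f in features) + maf

def pick_tag_variants(variants):
    """Return, for each haplotype block, the ID of its highest-scoring variant."""
    best = {}  # block -> (score, variant id)
    for v in variants:
        score = feature_score(v["features"], v["maf"])
        if v["block"] not in best or score > best[v["block"]][0]:
            best[v["block"]] = (score, v["id"])
    return {block: vid for block, (_, vid) in best.items()}

# Hypothetical annotated variants, already assigned to haplotype blocks
variants = [
    {"id": "var1", "block": "chr1:block1", "features": ["intron"], "maf": 0.30},
    {"id": "var2", "block": "chr1:block1", "features": ["missense", "conserved"], "maf": 0.10},
    {"id": "var3", "block": "chr1:block2", "features": ["regulatory_element"], "maf": 0.25},
    {"id": "var4", "block": "chr1:block2", "features": ["intergenic"], "maf": 0.45},
]
tags = pick_tag_variants(variants)
print(tags)
```

Keeping one variant per block limits redundancy among tightly linked loci while still favoring the variant most likely to be functional.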
5. Design and Evaluation of Whole-genome Capture Probes
Design a genome-wide probe sequence library, taking probe length, GC content and specificity into account, and predict the capture efficiency of all probes using deep learning models.
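As a rough illustration of the simplest part of this step, the sketch below screens candidate probes by length and GC content only. The probe length of 120 nt and the GC window of 35% to 65% are assumed values chosen for the example, not parameters given in the text; specificity checks (for example, alignment against the genome) and the deep learning efficiency model are not covered.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def passes_basic_filters(seq, length=120, gc_min=0.35, gc_max=0.65):
    """Keep probes of the expected length whose GC content lies in the window.

    The default length and GC bounds are hypothetical example values.
    """
    return len(seq) == length and gc_min <= gc_content(seq) <= gc_max

# Hypothetical candidate probes
candidates = {
    "probe_balanced": "ATGC" * 30,  # 120 nt, 50% GC
    "probe_at_rich": "AT" * 60,     # 120 nt, 0% GC
    "probe_short": "ATGC" * 10,     # 40 nt
}
kept = [name for name, seq in candidates.items() if passes_basic_filters(seq)]
print(kept)
```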
6. Determination of Chip Functional Loci and Capture Probe Sequences
Design the first version of the functional locus gene chip for the species by weighing locus feature score, genome representation, probe capture efficiency, probe density and chip size.
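One simple way to trade these criteria off against each other is a greedy selection, sketched below. It is only an illustration under stated assumptions: the combined ranking (feature score multiplied by predicted capture efficiency), the 100 kb window, the per-window cap used as a stand-in for genome representation and probe density, and the candidate records are all hypothetical.

```python
def select_chip_loci(candidates, chip_size, window=100_000, max_per_window=2):
    """Greedy pick: best combined score first, with a cap per genomic window."""
    ranked = sorted(
        candidates,
        key=lambda c: c["score"] * c["capture_eff"],
        reverse=True,
    )
    per_window = {}  # (chromosome, window index) -> number of loci chosen
    chosen = []
    for c in ranked:
        key = (c["chrom"], c["pos"] // window)
        if per_window.get(key, 0) >= max_per_window:
            continue  # window already full; skip to spread loci along the genome
        per_window[key] = per_window.get(key, 0) + 1
        chosen.append(c["id"])
        if len(chosen) == chip_size:
            break
    return chosen

# Hypothetical candidate loci with feature scores and predicted capture efficiency
candidates = [
    {"id": "a", "chrom": "chr1", "pos": 1_000, "score": 5.0, "capture_eff": 0.9},
    {"id": "b", "chrom": "chr1", "pos": 2_000, "score": 4.0, "capture_eff": 0.9},
    {"id": "c", "chrom": "chr1", "pos": 3_000, "score": 3.0, "capture_eff": 1.0},
    {"id": "d", "chrom": "chr2", "pos": 500, "score": 2.0, "capture_eff": 0.5},
]
selected = select_chip_loci(candidates, chip_size=3)
print(selected)
```

In this example locus "c" is skipped even though it outscores "d", because its window on chr1 already holds two loci.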
7. Breeding Assessment and Iterative Optimization
Iteratively optimize the chip loci and probe sequences based on actual results from populations genotyped with the first version of the functional locus gene chip, including observed probe capture specificity and efficiency, locus integrity, polymorphism information content and genomic evaluation accuracy.
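Of the evaluation metrics listed above, polymorphism information content (PIC) has a standard closed form: PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², where p_i are the allele frequencies at a locus. A minimal sketch, with a hypothetical function name and example frequencies:

```python
from itertools import combinations

def pic(allele_freqs):
    """Polymorphism information content of one locus from its allele frequencies."""
    homozygosity = sum(p * p for p in allele_freqs)
    cross_term = sum(2 * (p * p) * (q * q) for p, q in combinations(allele_freqs, 2))
    return 1.0 - homozygosity - cross_term

# Hypothetical biallelic loci observed in a genotyped population
for freqs in ([0.5, 0.5], [0.9, 0.1], [1.0, 0.0]):
    print(freqs, round(pic(freqs), 4))
```

For a biallelic SNP, PIC peaks at 0.375 when both alleles have a frequency of 0.5 and falls to 0 for a monomorphic locus, so loci with very low PIC in the target population are natural candidates for replacement in the next chip version.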
04
Design Example: Pig 80K Functional Locus Gene Chip
Figure 3. Distribution of Functional Loci of Pig 80K Gene Chip in Different Types of Elements
Chip Features
01 Functional Loci: The selected loci are functional mutations with a significant impact on gene expression levels or protein activity, not merely molecular markers, which makes the chip locus design more scientific.
02 Accurate Typing: Loci are typed by precise capture sequencing with a minimum sequencing depth of 10X at each target site, resulting in more accurate typing of chip loci.
03 Flexible Upgrading: Loci can be optimized according to the genetic characteristics of pig populations and the latest findings in pig genome research, making chip locus upgrades more flexible.
04 Abundant Information: In addition to the designated 80K functional loci, the chip provides genetic variation information for approximately 300K nearby loci.
05 Efficient Breeding: Intelligent weighted genomic breeding algorithms such as KAML give more accurate breeding estimates; the HiBLUP breeding information platform makes efficient breeding possible; and rMVP supports genome-wide association analysis for more efficient identification of causal mutations.
We hope this article has given you a comprehensive understanding of functional locus gene chips, which are designed from genome resequencing and functional genomics research data. Wuhan Yingzi Gene is equipped with the BGI T7 ultra-high-throughput sequencing platform and offers a wide range of genome research services, including Hi-C, ATAC-seq, CUT&Tag, HiChIP/ChIA-PET, GRID-seq and DNA methylation. It also has proprietary genomic breeding algorithm software and a liquid-phase gene chip technology platform, enabling comprehensive support for the R&D of functional locus gene chips and providing genetic breeding experts with a complete solution covering the design (including data accumulation), preparation, detection and breeding application of functional locus gene chips.