Reading notes

ref: A multi-omics integrative network map of maize | Nature Genetics

https://doi.org/10.1038/s41588-022-01262-1

Summary

Evolution of gene function is often preceded by whole-genome duplication (WGD), tandem duplication (tandem), proximal duplication (proximal) or duplications mediated by transpositions (transposed) and duplications of dispersed origin (dispersed) (Fig. 2a)20,21. Network modules group genes that potentially participate in the same process. The shortest distance (SD; Methods) is a metric that represents the shortest path between a pair of nodes within a network. When applied to duplicated genes, a large SD suggests that the two genes are functionally different. All duplicated genes had a significantly higher probability of belonging to the same module (Fig. 2b) and had a lower SD (Fig. 2c) than non-paralogous genes. Tandem and proximal genes were most likely to be enriched in the same network modules, whereas transposed and dispersed genes were least likely. Gene pairs resulting from WGD, tandem and proximal duplications had the lowest SD values, indicating that they are the least functionally divergent, whereas those derived from transposed and dispersed duplications had the highest SD values, suggesting that they are the most functionally divergent (Fig. 2b,c).

  • Duplication events that often precede the evolution of gene function fall into five types:

    • Whole-genome duplication (WGD)

    • Tandem duplication

    • Proximal duplication

    • Duplication mediated by transposition (transposed)

    • Duplication of dispersed origin (dispersed)

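  • In a network, the shortest distance (SD) between two nodes can be computed to describe the relationship between two genes. A larger SD suggests that the two genes are more functionally divergent.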

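  • This study shows that duplicated gene pairs are significantly more likely than non-paralogous pairs to belong to the same module, and they also have shorter SDs.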

  • Tandem and proximal duplicates are the most likely to fall in the same module, whereas transposed and dispersed duplicates are the least likely.

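  • By SD, pairs from WGD, tandem and proximal duplications have the shortest distances, whereas transposed and dispersed pairs have the longest. This indicates that the latter have diverged most in function.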
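The SD described above is an unweighted shortest-path length between two nodes. Below is a minimal sketch that assumes an undirected, unweighted network stored as a Python adjacency dict. Breadth-first search gives the SD between two genes, and a module lookup gives co-membership. The gene names, edges and module labels are hypothetical toy data, and the paper's exact SD procedure is described in its Methods.

```python
from collections import deque

def build_adjacency(edges):
    """Build a symmetric adjacency dict from (u, v) edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def shortest_distance(adj, source, target):
    """Breadth-first search: number of edges on the shortest path from
    source to target, or None if the two nodes are not connected."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nbr in adj.get(node, ()):
            if nbr == target:
                return dist + 1
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None

# Hypothetical toy network: geneA / geneA_dup stand in for a paralog pair.
edges = [("geneA", "hub1"), ("geneA_dup", "hub1"),
         ("hub1", "x"), ("x", "y"), ("y", "geneZ")]
adj = build_adjacency(edges)

# Hypothetical module labels, e.g. from a separate clustering step.
modules = {"geneA": 1, "geneA_dup": 1, "hub1": 1, "x": 2, "y": 2, "geneZ": 2}

def same_module(a, b):
    """True if both genes have a module label and the labels match."""
    return modules.get(a) is not None and modules.get(a) == modules.get(b)

print(shortest_distance(adj, "geneA", "geneA_dup"))  # 2
print(shortest_distance(adj, "geneA", "geneZ"))      # 4
print(same_module("geneA", "geneA_dup"), same_module("geneA", "geneZ"))  # True False
```

In this toy graph the paralog pair shares a neighbour (SD = 2) and a module, while the unrelated gene is farther away and sits in another module. This mirrors the pattern the paper reports for duplicates versus non-paralogous pairs.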

To further explore the network divergence of duplicated gene pairs from an evolutionary perspective, we introduced two indices, the Simpson index (SAB) and the divergence index (DAB). The SAB evaluates the similarity between two nodes, while the DAB measures the network divergence of two genes, and these indices are collectively a good indicator of functional divergence (Fig. 2d). Based on SAB and DAB values, we classified all duplicated genes into five divergence patterns (Methods), with a clear gradual decrease seen in the extent of network similarity between the two genes as differentiation increased from type I to type V (Fig. 2d). These five patterns potentially represent various types of functional evolution: conservation (type I), subfunctionalization (type II), neofunctionalization (types III and IV) and nonfunctionalization/specialization (type V)22.

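  • Using the Simpson index (SAB) and the divergence index (DAB), all duplicated genes can be classified into five divergence patterns. The mapping from patterns to functional fates follows an earlier study (ref. 22):

    • Conservation (type I)

    • Subfunctionalization (type II)

    • Neofunctionalization (types III and IV)

    • Nonfunctionalization/specialization (type V)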
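Below is a minimal sketch of the similarity side of this classification. It assumes SAB takes the common overlap-coefficient form of the Simpson index on the two genes' neighbour sets: S_AB = |N(A) ∩ N(B)| / min(|N(A)|, |N(B)|). It does not implement DAB or the thresholds that separate types I to V, which are defined in the paper's Methods. The gene names and neighbour sets are hypothetical.

```python
def simpson_index(adj, a, b):
    """Overlap (Simpson) coefficient of the neighbour sets of genes a and b.

    Assumption: a and b are removed from both neighbour sets, so a direct
    a-b edge does not count as a shared partner. Returns 0.0 if either
    gene has no other neighbours.
    """
    na = set(adj.get(a, ())) - {a, b}
    nb = set(adj.get(b, ())) - {a, b}
    if not na or not nb:
        return 0.0
    return len(na & nb) / min(len(na), len(nb))

# Hypothetical neighbour sets for three duplicate pairs.
adj = {
    "dupA1": {"g1", "g2", "g3"},
    "dupA2": {"g1", "g2", "g3", "g4"},  # contains all of dupA1's partners
    "dupB1": {"g1", "g5"},
    "dupB2": {"g6", "g7", "g8"},        # shares no partners with dupB1
    "dupC1": {"g1", "g2"},
    "dupC2": {"g2", "g3", "g4", "g5"},  # shares one partner with dupC1
}
print(simpson_index(adj, "dupA1", "dupA2"))  # 1.0
print(simpson_index(adj, "dupB1", "dupB2"))  # 0.0
print(simpson_index(adj, "dupC1", "dupC2"))  # 0.5
```

The denominator is the smaller neighbour set. So if one gene's partners are a subset of the other's, the pair scores 1.0 even when the larger set has extra partners. SAB alone therefore does not capture every network difference between the two genes.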

Notably, overall the enrichment ratios gradually decreased from type I to type V compared to background, indicating that duplicated genes share a stronger network similarity than would be expected by chance (Fig. 2e). Different types of duplications, such as WGD and tandem, showed the same trend (Fig. 2f and Extended Data Fig. 5). Moreover, the proportions of different types of network divergence varied across genetic layers (Fig. 2g and Extended Data Fig. 5). Interestingly, we observed a dynamic rewiring of networks after gene duplication events from the coexpression to the cotranslation to the interactome level. Our results indicate that approximately 41% of the duplicates were consistent across different omics layers, while approximately 35% were associated with network divergence and approximately 24% were associated with network convergence (Supplementary Fig. 9). For example, the duplicated genes Zm00001d049815 and Zm00001d031611 displayed a progressive network rewiring from the coexpression to the cotranslation to the interactome level (Fig. 2h). Importantly, the proportions of duplicated genes originating at different times in the past varied dramatically from type I to V, suggesting that nonfunctionalization, neofunctionalization and specialization are more likely to be associated with older duplications, whereas functional conservation is more likely for the most recent duplications (Fig. 2i). This relationship of network divergence to duplication age became clearer from the coexpression to the cotranslation to the interactome. Together, these results indicate that our integrative network map encompasses potentially evolutionary information relevant to the dissection of gene function in an evolutionary perspective.

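  • An unconventional classification like this should be followed by a passage that analyzes its advantages and the patterns it reveals, and then by a concrete example as evidence. The main strength of this paper is its multiple omics layers, so that advantage needs to run through the whole analysis.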
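The paper reports that about 41% of duplicates are consistent across layers, about 35% diverge and about 24% converge. These notes do not record the exact criterion behind those numbers. The sketch below therefore uses a hypothetical rule that is not the paper's. It compares a pair's similarity score (for example SAB) in the coexpression layer with that in the interactome layer, and calls a change within `tol` consistent. The pair names, scores and the `tol` threshold are all illustrative assumptions.

```python
from collections import Counter

# Layer order as described in the paper.
LAYERS = ("coexpression", "cotranslation", "interactome")

def rewiring_trend(similarity_by_layer, tol=0.1):
    """Label one duplicate pair's network trend across layers.

    Hypothetical rule: compare the score in the first layer with the score
    in the last layer. A change of at most `tol` is 'consistent', a drop is
    'divergence' and a rise is 'convergence'. The middle layer is ignored.
    """
    delta = similarity_by_layer[LAYERS[-1]] - similarity_by_layer[LAYERS[0]]
    if abs(delta) <= tol:
        return "consistent"
    return "divergence" if delta < 0 else "convergence"

# Hypothetical SAB-like scores for three duplicate pairs.
pairs = {
    "pair1": {"coexpression": 0.8, "cotranslation": 0.5, "interactome": 0.1},
    "pair2": {"coexpression": 0.6, "cotranslation": 0.6, "interactome": 0.65},
    "pair3": {"coexpression": 0.2, "cotranslation": 0.4, "interactome": 0.7},
}
labels = {name: rewiring_trend(s) for name, s in pairs.items()}
counts = Counter(labels.values())
proportions = {k: v / len(pairs) for k, v in counts.items()}
print(labels)
# {'pair1': 'divergence', 'pair2': 'consistent', 'pair3': 'convergence'}
```

On real data, the `proportions` dict would hold the analogue of the 41% / 35% / 24% split. A stricter rule could also require the change to be monotonic through the cotranslation layer.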

Variable network divergence of maize subgenomes

Maize is an allotetraploid species with distinct ancient subgenomes, designated here as maize1 and maize2, which exhibit asymmetric divergence in both gene content and expression levels23. Although maize1, the dominant subgenome, contains more orthologs and highly expressed genes than maize2, past studies did not find regulating bias, probably because they only assessed this question at a single level23,24. Therefore, we dissected the regulating bias of the two subgenomes in more detail by classifying all subgenome genes into two groups: maize1 genes with retained maize2 orthologs and maize2 genes with retained maize1 orthologs. We then compared the degree (a network attribute; the number of edges of a node) of genes from each group at different omics levels. To this end, we calculated the variation in the degree of genes from the two different groups. We detected no significant differences between the subgenomes at the coexpression level. The difference increased but was still not significant at the cotranslation level. However, we observed a significant difference at the interactome level between the two subgenomes, as reflected by the degree values (Fig. 3a), which suggests the functional importance of genes and nodes25. Additionally, we quantified the overall difference of the degree of genes in a sliding window (100 genes) in the ancient maize subgenomes derived from the Sorghum genome as previously published23. We detected more dominant bins for the maize1 subgenome from the coexpression to the cotranslation to the interactome level, suggesting that differences in network connectivity of the subgenomes vary across the three network levels (Fig. 3b and Supplementary Figs. 10 and 11).

  • Plant genomes often raise questions of subgenome evolution. Using node degree to reveal differences between subgenomes is therefore quite meaningful.
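The sliding-window comparison in the subgenome paragraph can be sketched as follows. The sketch assumes each retained homoeolog pair is stored as a (maize1 degree, maize2 degree) tuple, ordered along the Sorghum reference. Each 100-pair window is labelled by whichever subgenome has the higher mean degree. The use of the mean, the `step` size and the absence of a significance test are assumptions; the paper's exact procedure is not reproduced here.

```python
from statistics import mean

def dominant_bins(pairs, window=100, step=100):
    """Count windows in which maize1 or maize2 copies have the higher mean degree.

    pairs: list of (degree_maize1, degree_maize2) tuples for retained
    homoeolog pairs, ordered along the reference genome (assumed input).
    Returns counts of 'maize1'-dominant, 'maize2'-dominant and 'tie' windows.
    """
    counts = {"maize1": 0, "maize2": 0, "tie": 0}
    for start in range(0, len(pairs) - window + 1, step):
        chunk = pairs[start:start + window]
        m1 = mean(d1 for d1, _ in chunk)
        m2 = mean(d2 for _, d2 in chunk)
        if m1 > m2:
            counts["maize1"] += 1
        elif m2 > m1:
            counts["maize2"] += 1
        else:
            counts["tie"] += 1
    return counts

# Hypothetical degrees: one maize1-biased, one maize2-biased, one balanced block.
pairs = [(5, 3)] * 100 + [(2, 4)] * 100 + [(3, 3)] * 100
print(dominant_bins(pairs))  # {'maize1': 1, 'maize2': 1, 'tie': 1}
```

Running the same function on coexpression, cotranslation and interactome degrees would let the number of maize1-dominant bins be compared across the three layers, as in Fig. 3b.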

Functional networks of important genes

The integrative network map can reconstruct the functional connections of well-known genes and uncover new regulating genes with similar functions (Fig. 4). We explored a subnetwork involving three key tillering genes, teosinte branched 1 (Tb1, Zm00001d033673)26, grassy tillers 1 (Gt1, Zm00001d028129)27 and tassels replace upper ears 1 (Tru1, Zm00001d042111)28,29,30,31,32,33,34. Loss-of-function mutations in any of these genes lead to the formation of more tillers, a characteristic of the maize ancestor teosinte. Tb1, Gt1 and Tru1 were all part of a coexpression network with ZmALOG1 (Zm00001d003057) and ZmALOG2 (Zm00001d032696), two genes of unknown function belonging to the Arabidopsis LSH1 and Oryza G1 (ALOG) transcription factor family (Fig. 4a). Notably, the loss-of-function mutations in ZmALOG1 and ZmALOG2 obtained by CRISPR- or CRISPR–Cas9-mediated gene editing led to higher tillering, similar to the tb1, gt1 and tru1 mutants (Fig. 4b and Extended Data Fig. 6a). Moreover, yeast two-hybrid assays indicated that ZmALOG1 and ZmALOG2 can interact with TB1 (Fig. 4c). These results demonstrate that the network edges between known and unknown genes in subnetworks have potential biological meaning. Therefore, our maize network map can be used to generate hypotheses about the functions of genes of interest, as well as their putative network pathways.

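  • Once the network is built, it can do more than connect known genes: it can also be used to explore the functions of unknown genes. Those functions can then be validated experimentally. For example, one can check whether loss-of-function mutants show a phenotype similar to that of known genes, or test protein interactions with yeast two-hybrid assays.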
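The tillering example suggests a simple way to shortlist candidates: look for genes connected to several known genes in the same network. The sketch below is a generic guilt-by-association heuristic, not the paper's procedure. `seed1` to `seed3` are placeholders for known genes such as Tb1, Gt1 and Tru1. `candA`, `candB` and all edges are hypothetical.

```python
def candidates_linked_to_seeds(adj, seeds, min_links=2):
    """Return non-seed genes adjacent to at least `min_links` seed genes.

    Results are (gene, number_of_seed_links) tuples sorted by link count
    (descending), then by gene name.
    """
    seeds = set(seeds)
    links = {}
    for seed in seeds:
        for nbr in adj.get(seed, ()):
            if nbr not in seeds:
                links[nbr] = links.get(nbr, 0) + 1
    hits = [(gene, n) for gene, n in links.items() if n >= min_links]
    return sorted(hits, key=lambda item: (-item[1], item[0]))

# Hypothetical coexpression neighbours of three known (seed) genes.
adj = {
    "seed1": {"candA", "candB", "other1"},
    "seed2": {"candA", "candB"},
    "seed3": {"candA", "other2"},
}
print(candidates_linked_to_seeds(adj, ["seed1", "seed2", "seed3"]))
# [('candA', 3), ('candB', 2)]
```

A shortlist like this only generates hypotheses. As in the paper, candidates still need mutant phenotypes and interaction assays before a function can be assigned.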

Predicting novel functional genes for FT

FT is an important agronomic trait involved in the adaptation of maize to a wide range of climates worldwide40,41,42. Although many key FT genes in maize have been cloned (Supplementary Table 11), much remains to be learned about the genetic control and molecular mechanisms underlying this trait. To explore the potential regulating networks, we constructed a prediction model based on the integrative map and a machine learning method using known FT genes (functionally cloned FT genes in maize and maize homologs of FT genes cloned in other species) as the training dataset (Methods, Extended Data Fig. 8 and Supplementary Figs. 13 and 14). The model showed high prediction accuracy and different layers of the interaction map had variable prediction power, with area under the curve (AUC) values ranging from 0.67 to 0.93 (Fig. 5a and Supplementary Fig. 14). We predicted that 2,651 genes are associated with FT (Supplementary Table 12), reflecting the highly complex molecular mechanism underlying FT in maize. We assessed the SD between each known validated FT gene and all predicted FT genes. The predicted FT genes had significantly lower SD values (to well-known FT genes) compared to those obtained against background genes (Fig. 5b). Compared to background genes, the predicted FT genes were significantly enriched both in the Arabidopsis homology dataset and in a genome-wide association study of single-nucleotide polymorphisms43 associated with FT (Extended Data Fig. 9a).

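  • Going further, with this much data the network can be used for prediction. Here, flowering-time (FT) genes are the prediction target. FT is a good choice because it has relatively many known true positives, and homologs of FT genes cloned in Arabidopsis can also be drawn on.
  • In addition, after prediction it is now generally expected that the model be validated experimentally, for example with knockouts, to confirm whether its predictions are correct.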
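The FT model is evaluated by AUC, which ranges from 0.67 to 0.93 across layers. Below is a minimal sketch of how such a value is computed. It derives ROC AUC from prediction scores with the Mann-Whitney rank formula, treating known FT genes as positives and background genes as negatives. It does not reproduce the paper's machine-learning model, and the labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic; tied scores get average ranks.

    labels: 1 for known (positive) genes, 0 for background genes.
    scores: model scores, higher meaning more likely to be a positive.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

# Hypothetical scores: two known FT genes, two background genes.
print(roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # 1.0
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking, and 1.0 means every positive is scored above every background gene.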

Take home message

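To tackle highly complex network structures, the authors propose a set of practical indices for dissecting how duplicated genes in the network relate to their functions. They also build a prediction model from the network structure and use it to predict, and validate, several novel flowering-time genes.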

Keywords

network