New data collection priority: focusing on genome-based bioinformation
Abstract
Genetic research used to be data-driven under the framework of well-established gene theory. With the advance of various -omics technologies, large-scale data generation has become routine. However, data analysis has unexpectedly become extremely challenging because bio-data are highly heterogeneous. Furthermore, these diverse data sets seem to be inconsistent with some key predictions of gene theory. To address this new reality, this editorial suggests a new genome theory to explain the new facts. By briefly comparing cytogenetics and gene sequencing research, it highlights the importance and key principles of chromosomally coded bioinformation in the context of evolutionary studies and disease research.
Keywords: genome chaos, genome theory, karyotype coding, missing heritability, non-clonal chromosome aberrations, system inheritance, two phases of cancer evolution
While classical cytogenetic analyses were technically challenging (e.g. extensive training and practice are required to perform different banding methods and precisely identify altered chromosomes and chromosomal fragments), the theoretical perspective motivating them was rather simple: chromosomes are the carriers of genes, and clonal chromosomal aberrations are the focus for analyses [1, 2]. Molecular cytogenetics has certainly made cytogenetic analyses much easier. Although the higher-order structure of chromosomes is still unresolved, karyotype analyses have so far worked well in practice, especially with the help of various FISH technologies, including SKY and M-FISH.
The discovery of CNVs (copy number variations) generated great excitement, and array CGH (comparative genomic hybridization) has become routine in cytogenetic labs. For years, there have been discussions and predictions on whether array CGH should replace classical karyotype analyses [1, 3]. One of the key rationales of such discussions is simple: array CGH provides much higher resolution, and in the era of molecular biology, higher molecular resolution is considered to yield higher-quality, more valuable information. For many researchers, only molecular sequencing approaches can reveal any “mechanistic” understanding, while cytogenetic profiles are merely descriptive [4]. No wonder “sequencing everything” has become a new trend for data collection in biological research.
Ironically, the surprising results of various large-scale -omics studies, including the Human Genome Project, Personal Genome Project, and Cancer Genome Project, have forcefully challenged the molecular rationale that focuses on “molecular part characterization” rather than on system behavior at the more suitable level of genomic systems.
For example, the supposedly “highly predictable” relationship between genes and phenotypes has turned out to be increasingly complicated and subject to high degrees of uncertainty. Most complex traits cannot be explained by polygenic models, as the missing heritability seems to be caused by the emergent system behavior of a closely connected yet adaptive genomic network within dynamic environments. Such emergent properties are hard to link to a limited number of dominant genomic or environmental factors, even with whole-genome scanning methodologies based on huge numbers of samples. A system is much more than the collection of all its parts.
Similarly, a more specific puzzle comes from the current Cancer Genome Project. Its goal, based on the gene mutation theory of cancer, is to identify a limited number of key, common cancer gene mutations. Presumably, by sequencing many cancer samples for each cancer type, the pattern of driver genes would emerge from the background of heterogeneity or “genomic noise.” Quite the opposite is true, however. What was detected was a high degree of heterogeneity (at multiple genomic and nongenomic, including epigenetic, levels), which reflects the stochastic nature of genomic changes in most cancers [1, 2].
While it is disappointing that the initial goal of identifying common mutation drivers was not achieved, the sequencing data did confirm the importance of the cytogenetic finding that chromosomal changes are the common drivers of cancer evolution. First, the data confirm the two phases of cancer evolution (punctuated and stepwise) in most cancer types [5, 6]; second, they endorse the importance of genome chaos in evolutionary phase transition [7-9]; third, they support the significance of heterogeneity in cancer, as previously ignored “genomic noise” actually represents evolutionary potential and can be used as a new biomarker to monitor system instability [1, 2]; and fourth, they validate the conclusion that profiles of chromosomal abnormalities have much better clinical prediction value than gene mutation data [10].
Why are cytogenetic data more powerful than individual gene mutation data? The genome theory of cancer and organismal evolution offers a clear explanation: the chromosome is not just the carrier of genes, but also the genomic organizer at a higher level of organization. More specifically, the chromosomal sets encode a new type of bio-information above genes. From the perspective of systems biology, the network of gene interaction is the key. However, what defines the gene interaction is unknown. The genome theory proposes that the physical platform of the gene interaction is provided by the karyotype coding, where the physical relationship of genes/regulatory elements along and among chromosomes can form action domains within the 3D nuclei. When the karyotype coding is changed by chromosomal numerical and/or structural changes, new action domains will form, leading to changes in the gene interaction relationship, often at the global level [11]. Those variants form the genomic basis for different types of diseases when interacting with dynamic environments [12].
Given the ultimate importance of genome-encoded information (system inheritance) and its relevance to biological processes, from evolutionary mechanisms to implications in disease formation, a new research priority needs to be established in the field of genomics and evolution, as well as molecular medicine [13]. First, a new framework embracing the multiple levels of genomics needs to be established by accepting the fact that not all levels of organization are equal. Long-ignored chromosomal-level profiling might be more useful than DNA sequencing, especially during the macro-cellular evolutionary phase. Second, efforts are needed to build technical platforms suitable for chromosomal profiling [14-19]. There is much to be done: a) systematic identification of different types of chromosomal numerical/structural changes, many of which have been considered “noise” in previous studies due to the lack of clonality; b) understanding the relationship among various subtypes of chromosomal abnormality (from aneuploidy to polyploidy, from simple translocations to chaotic genomes) and comparing their contributions to disease processes; c) acceptance of using non-clonal chromosome aberrations (NCCAs) to study genome instability and unify different molecular mechanisms; d) profiling both individual cells and population dynamics in “watch evolution in action” experiments. Third, new knowledge of genome-based genomics and evolution needs to be applied to disease studies, especially when dealing with complex issues such as genomic mosaicism, phase transitions of somatic evolution, and the interaction between genomics and environment [20-22].
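To make point c) concrete, the following is a minimal Python sketch of how an NCCA frequency could be tallied from per-cell karyotype scoring. It is an illustration only, not a method described in this editorial: the function names, the input layout (one set of aberration labels per scored cell), the example labels, and the clonality cutoff `clonal_min_cells=2` are all assumptions, and a laboratory would substitute its own scoring criteria.

```python
from collections import Counter


def classify_aberrations(cells, clonal_min_cells=2):
    """Split aberrations into clonal and non-clonal sets.

    `cells` is a list with one set of aberration labels per scored cell.
    An aberration is called clonal here if it is seen in at least
    `clonal_min_cells` cells (an assumed cutoff, not taken from the text).
    """
    counts = Counter(ab for cell in cells for ab in set(cell))
    clonal = {ab for ab, n in counts.items() if n >= clonal_min_cells}
    non_clonal = set(counts) - clonal
    return clonal, non_clonal


def ncca_frequency(cells, clonal_min_cells=2):
    """Fraction of scored cells carrying at least one non-clonal aberration."""
    if not cells:
        raise ValueError("no cells scored")
    _, non_clonal = classify_aberrations(cells, clonal_min_cells)
    hits = sum(1 for cell in cells if non_clonal & set(cell))
    return hits / len(cells)


# Hypothetical scoring of five metaphase cells (labels are illustrative).
example_cells = [
    {"t(9;22)"},
    {"t(9;22)", "del(5q)"},
    {"t(9;22)"},
    set(),
    {"+8"},
]

clonal, non_clonal = classify_aberrations(example_cells)
frequency = ncca_frequency(example_cells)
```

In this toy data set the translocation seen in three cells is treated as clonal, while the two aberrations seen in one cell each are counted as NCCAs rather than discarded as noise, so two of the five cells contribute to the NCCA frequency.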
It is thus timely to remind readers of the importance of data collection at the chromosomal level in current biomedicine. Fortunately, the genome theory has already outlined the key rationales and conceptual frameworks for genome-based genomics and evolution, built on the principles of genome-defined bioinformation [1].
No conflict of interest was recorded with respect to this article.
References