Not junk after all
From the IBM press releases page:
IBM today announced its researchers have discovered numerous DNA patterns shared by areas of the human genome that were thought to have little or no influence on its function and those areas that do. If verified experimentally, the discovery suggests a potential connection between these coding and non-coding parts of the human genome that could have a profound impact on genomic research and provide important insights on the workings of cells.
“Our goal is to apply advanced computational techniques to analyze the workings of processes and systems, in this case the function of the human genome,” said Ajay Royyuru, head of the Computational Biology Center at IBM Research. “Using these tools, we’ve been able to shed new light on parts of the DNA that were traditionally thought of as not having a specific purpose. We believe the innovative application of technology can provide further understanding in the life sciences at large.”
The findings will be published in the next PNAS issue. Here is the abstract of the publication:
Using an unsupervised pattern-discovery method, we processed the human intergenic and intronic regions and catalogued all variable-length patterns with identically conserved copies and multiplicities above what is expected by chance. Among the millions of discovered patterns, we found a subset of 127,998 patterns, termed pyknons, which have additional nonoverlapping instances in the untranslated and protein-coding regions of 30,675 transcripts from 20,059 human genes. The pyknons arrange combinatorially in the untranslated and coding regions of numerous human genes where they form mosaics. Consecutive instances of pyknons in these regions show a strong bias in their relative placement, favoring distances of ≈22 nucleotides. We also found pyknons to be enriched in a statistically significant manner in genes involved in specific processes, e.g., cell communication, transcription, regulation of transcription, signaling, transport, etc. For ≈1/3 of the pyknons, the intergenic/intronic instances of their reverse complement lie within 380,084 nonoverlapping regions, typically 60–80 nucleotides long, which are predicted to form double-stranded, energetically stable, hairpin-shaped RNA secondary structures; additionally, the pyknons subsume ≈40% of the known microRNA sequences, thus suggesting a possible link with posttranscriptional gene silencing and RNA interference. Cross-genome comparisons reveal that many of the pyknons have instances in the 3' UTRs of genes from other vertebrates and invertebrates where they are overrepresented in similar biological processes, as in the human genome. These unexpected findings suggest potential unique functional connections between the coding and noncoding parts of the human genome.
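To make the core idea in the abstract concrete, here is a toy sketch in Python. It is not the authors' method: their approach finds variable-length patterns and tests them against chance, whereas this sketch only counts fixed-length substrings and applies a plain copy-number threshold. The function names and the parameters `k` and `min_copies` are my own, chosen only for illustration.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def shared_patterns(noncoding, coding, k, min_copies):
    """Toy stand-in for the idea in the abstract (assumed simplification):
    keep substrings of length k that recur at least min_copies times in the
    non-coding sequence and also occur at least once in the coding sequence.
    """
    counts = kmer_counts(noncoding, k)
    recurrent = {p for p, n in counts.items() if n >= min_copies}
    return recurrent & set(kmer_counts(coding, k))


# Made-up sequences, far too short to mean anything biologically.
noncoding = "ACGTACGTTTACGTAAACGT"
coding = "GGACGTCC"
print(shared_patterns(noncoding, coding, k=4, min_copies=3))
print(reverse_complement("AACG"))
```

On a real genome, a fixed threshold like this would have to be replaced by a proper estimate of how often a pattern is expected to occur by chance, which is the hard part the paper addresses.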
It is interesting that 98.5% of our genome is non-coding (meaning it does not encode functional proteins), and yet science does not leave it at that but keeps exploring and asking endless questions about it, driven by simple reasoning: it is highly unlikely that millions of years of evolution would have left us with that much DNA, all of which needs repairing, doubling, dragging to different sides of the cell in the process of replication, sifting through in the process of transcription and so on, if such a vast amount of nucleic acid served no purpose. Finding a pattern in chaos is usually a first step towards discovering a function.
IBM Research | Press Resources | IBM Discovery Could Shed New Light on Workings of the Human Genome