An article published in the journal “Genome Biology” reports the results of tests conducted with SeqScreen, free and open source software developed to recognize genetic sequences found in pathogenic microorganisms. A team of researchers led by computer scientist Todd Treangen of Rice University and genomics specialist Krista Ternus of the scientific consultancy firm Signature Science, LLC developed SeqScreen to characterize short DNA sequences, often called oligonucleotides, and to improve the techniques used to recognize potentially dangerous sequences in a sample.
The recognition of pathogenic microorganisms is becoming possible thanks to the great advances made in genetic techniques and in the computer systems used to analyze DNA (or RNA in the case of RNA viruses). Despite this, accurately recognizing pathogenic sequences is still complex. For this reason, in 2017, a project to improve the situation was funded by IARPA (Intelligence Advanced Research Projects Activity), a U.S. government agency that manages high-risk, high-payoff scientific research.
Until now, the tools used in this type of research tried to identify specific bacteria or viruses. SeqScreen was instead developed to recognize, in any microorganism, small genetic sequences that can have harmful effects, such as those that encode toxins. Microorganisms can exchange genes through so-called horizontal gene transfer, which means that two bacteria can have almost identical genomes except for a few sequences that encode harmful substances. SeqScreen was developed to recognize those sequences in the bacterial variant that carries them in its genome.
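The difference between looking for a sequence of concern and identifying a species can be shown with a toy Python sketch. This is only an illustration under loud assumptions, not SeqScreen's actual pipeline: exact matching of short substrings (k-mers) stands in for the real analysis, the function names `kmers` and `screen` are hypothetical, and the DNA strings are made up and do not correspond to real genes.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def screen(query, sequences_of_concern, k=8, min_shared=3):
    """Return the names of reference sequences that share at least
    min_shared k-mers with the query sequence."""
    query_kmers = kmers(query, k)
    hits = []
    for name, ref in sequences_of_concern.items():
        if len(query_kmers & kmers(ref, k)) >= min_shared:
            hits.append(name)
    return hits


# Made-up sequences for illustration only (assumption, not real genes).
TOY_TOXIN = "ATGGCTAGCTTAGGCCATCGATCGGATTACA"
SHARED_BACKBONE = "GGGTTTCCCAAAGGGTTTCCCAAATTTGGG"

sequences_of_concern = {"toy_toxin": TOY_TOXIN}

# Two "strains" with the same backbone; one acquired the toy toxin,
# mimicking the effect of horizontal gene transfer.
harmless_strain = SHARED_BACKBONE
toxin_strain = SHARED_BACKBONE[:15] + TOY_TOXIN + SHARED_BACKBONE[15:]

print(screen(harmless_strain, sequences_of_concern))
print(screen(toxin_strain, sequences_of_concern))
```

In this sketch only the strain that carries the toy toxin is flagged, even though the two strains share the same backbone and a species-level identification would treat them as the same organism.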
It took years of development and the use of a machine learning algorithm to train SeqScreen to recognize certain genetic sequences. The image (Courtesy Balaji, A., Kille, B., Kappell, A.D. et al.) illustrates SeqScreen’s workflow (A) and the machine learning training framework (B). The software is available under the GNU GPLv3 free and open source license.
The spread of potentially harmful viruses and bacteria has grown in recent decades because of the ease of travel, even between continents. The Covid-19 pandemic dramatically showed how any delay in intervening with quarantines and other health measures can have serious consequences. New bioinformatics applications such as SeqScreen can help recognize harmful microorganisms early. Todd Treangen stated that this software will help detect new or emerging pathogens from the environment.