The tool is an algorithm called CONSERTING, short for Copy Number Segmentation by Regression Tree in Next Generation Sequencing.
St. Jude researchers created CONSERTING to improve identification of copy number alterations (CNAs) in the billions of pieces of genetic information generated by next-generation, whole-genome sequencing techniques. CNAs involve the gain or loss of DNA segments. Depending on the size of the DNA segments, the alterations can affect anywhere from a few genes to many hundreds.
In this study, researchers showed that CONSERTING identified such alterations with dramatically better accuracy and sensitivity than other techniques, including four published algorithms used to recognize CNAs in whole-genome sequencing data. The comparison involved the normal and tumor genomes from 43 children and adults with brain tumors, leukemia, melanoma and the pediatric eye tumor retinoblastoma.
“CONSERTING has helped us harness the power of next-generation, whole-genome sequencing to better understand the genetic landscape of cancer genomes and lay the foundation for the next era of cancer therapy,” said corresponding author Jinghui Zhang, Ph.D., a member of the St. Jude Department of Computational Biology. “In this study of the tumor and normal genomes of 43 patients, CONSERTING identified copy number alterations in children with 100 times greater precision and 10 times greater precision in adults.”
First author Xiang Chen, Ph.D., a St. Jude senior research scientist, added: “CONSERTING helped us identify alterations that other algorithms missed, including previously undetected chromosomal rearrangements and copy number alterations present in a small percentage of tumor cells.”
Using CONSERTING, researchers discovered genetic alterations driving pediatric leukemia, the pediatric brain tumor low-grade glioma, the adult brain tumor glioblastoma and retinoblastoma. The algorithm also helped identify genetic changes that are present in a small percentage of a tumor’s cells. The alterations may be the key to understanding why tumors sometimes return after treatment.
In addition, Zhang said CONSERTING should make it easier to track the evolution of tumors with complex genetic rearrangements, sometimes involving multiple chromosomes that swap pieces when they break and reassemble.
St. Jude has made CONSERTING available for free to researchers worldwide. The software, user manual and related data can be downloaded from http://www.stjuderesearch.org/site/lab/zhang. St. Jude researchers have also developed a cloud version of CONSERTING and related tools that can be accessed through Amazon Web Services. Instead of downloading CONSERTING, scientists can upload data for analysis.
Work on CONSERTING began in 2010 shortly after the St. Jude Children’s Research Hospital — Washington University Pediatric Cancer Genome Project was launched. The Pediatric Cancer Genome Project used next-generation, whole-genome sequencing to study some of the most aggressive and least understood childhood cancers. Early in the project researchers realized that existing analytic methods often missed duplications or deletions of DNA segments, particularly small changes that involve a handful of genes and provide insight into the origins of a patient’s cancer.
CONSERTING has now been used to analyze next-generation, whole-genome sequencing data for the Pediatric Cancer Genome Project. The project includes the normal and cancer genomes of 700 pediatric cancer patients with 21 different cancer subtypes.
CONSERTING combines a method of data analysis called regression tree, which is a machine learning algorithm, with next-generation, whole-genome sequencing. Machine learning capitalizes on advances in computing to design algorithms that repeatedly and rapidly analyze large, complex data sets and unearth unexpected insights. “This combination has provided us with a powerful tool for recognizing copy number alterations, even those present in relatively few cells or in tumor samples that include normal cells along with tumor cells,” Zhang said.
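To give a sense of how a regression tree can segment a genome, here is a minimal Python sketch. It is not the CONSERTING implementation. It assumes the input is a list of per-window copy number signals (for example, log2 tumor-to-normal coverage ratios) and recursively cuts the list wherever a cut most reduces the squared error around each segment's mean. The function name `segment` and the parameters `min_size` and `min_gain` are hypothetical choices made for illustration.

```python
from typing import List, Tuple

# (start index, end index exclusive, mean signal) for one segment
Segment = Tuple[int, int, float]


def _sse(prefix: List[float], prefix_sq: List[float], lo: int, hi: int) -> float:
    """Sum of squared errors around the mean of values[lo:hi]."""
    n = hi - lo
    s = prefix[hi] - prefix[lo]
    sq = prefix_sq[hi] - prefix_sq[lo]
    return sq - s * s / n


def segment(values: List[float], min_size: int = 3, min_gain: float = 1.0) -> List[Segment]:
    """Greedy 1-D regression tree: split recursively at the best cut point.

    min_size and min_gain are illustrative stopping rules (assumptions),
    not parameters taken from CONSERTING.
    """
    if not values:
        return []

    # Prefix sums let us evaluate the error of any interval in O(1).
    prefix, prefix_sq = [0.0], [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def split(lo: int, hi: int) -> List[Segment]:
        parent = _sse(prefix, prefix_sq, lo, hi)
        best_gain, best_cut = 0.0, None
        # Each child must keep at least min_size windows.
        for cut in range(lo + min_size, hi - min_size + 1):
            children = _sse(prefix, prefix_sq, lo, cut) + _sse(prefix, prefix_sq, cut, hi)
            gain = parent - children
            if gain > best_gain:
                best_gain, best_cut = gain, cut
        if best_cut is None or best_gain < min_gain:
            # Leaf: report the interval with its mean signal.
            mean = (prefix[hi] - prefix[lo]) / (hi - lo)
            return [(lo, hi, mean)]
        return split(lo, best_cut) + split(best_cut, hi)

    return split(0, len(values))


# Toy signal: a six-window gain flanked by copy-neutral windows.
signal = [0.0] * 12 + [1.0] * 6 + [0.0] * 4
for start, end, mean in segment(signal):
    print(start, end, mean)
```

On the toy signal, the sketch reports three segments, with the elevated middle segment standing in for a copy number gain. A greedy tree like this can miss a short alteration when no single cut reduces the error enough, which is one reason real tools add further evidence and refinement steps.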
Next-generation, whole-genome sequencing involves breaking the human genome into about 1 billion pieces that are copied and reassembled using the normal genome as a template. CONSERTING software compensates for gaps and variations in sequencing data. The sequencing data is integrated with information about the chromosomal rearrangements to find CNAs and identify their origins in the genome.
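As a rough illustration of how gaps and depth differences in sequencing data might be handled before looking for CNAs, the Python sketch below compares tumor and normal read counts window by window. It is a simplified assumption about a generic read-depth workflow, not a description of CONSERTING's actual method. The function `log2_ratios` and the threshold `min_normal` are hypothetical names introduced here.

```python
import math
from typing import List, Optional


def log2_ratios(tumor: List[int], normal: List[int],
                min_normal: int = 10) -> List[Optional[float]]:
    """Per-window log2(tumor / normal) read-depth ratio.

    Windows where the normal sample has fewer than min_normal reads are
    treated as gaps and reported as None. Windows with zero tumor reads
    are also reported as None in this sketch; a real tool would handle
    complete loss explicitly.
    """
    if len(tumor) != len(normal):
        raise ValueError("tumor and normal must have the same number of windows")

    # Use only well-covered windows to estimate the overall depth difference.
    usable = [i for i, n in enumerate(normal) if n >= min_normal]
    tumor_total = sum(tumor[i] for i in usable)
    normal_total = sum(normal[i] for i in usable)
    if tumor_total == 0 or normal_total == 0:
        return [None] * len(tumor)

    # Scale tumor counts so both samples have the same total depth.
    scale = normal_total / tumor_total

    ratios: List[Optional[float]] = []
    for t, n in zip(tumor, normal):
        if n < min_normal or t == 0:
            ratios.append(None)
        else:
            ratios.append(math.log2(t * scale / n))
    return ratios


# Toy counts: the third window has twice the tumor depth (a possible gain),
# the fourth has no tumor reads, and the fifth is a gap in the normal sample.
print(log2_ratios([100, 100, 200, 0, 50], [100, 100, 100, 100, 5]))
```

A ratio near 0 suggests no change in copy number, positive values suggest a gain and negative values suggest a loss. Scaling by total depth is a deliberate simplification: in a tumor with many alterations, the total itself is shifted, so the baseline would need a more careful estimate.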