“HiPub looks through all this text and tries to recognize what is being called genes, proteins, drugs and diseases. It extracts this information and visualizes it in a network. Especially in molecular biology or cancer biology, it’s useful to see the connections between these things in their biological context,” says Aik Choon Tan, PhD, investigator at the CU Cancer Center and associate professor at the CU School of Medicine.
Tan gives the example of a hypothetical researcher who reads a paper exploring the genes KRAS and MEK, known to influence the development of certain cancers. “The researcher wants to know if these genes have any relevance to her specialty, maybe something like p53 [another gene known to influence cancer].”
The researcher queries “p53” along with the new article, and HiPub visualizes how the researcher’s interest is connected to the genes in this new paper. If the connections seem compelling, the researcher could design experiments to test these links.
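The article does not describe HiPub's internals, so the following is only a minimal sketch of the general idea: recognize entity names in text, link the ones mentioned together, and look up what a query term connects to. The lexicon, the sentence-level co-occurrence rule, the function names and the sample text are all assumptions made for illustration, not HiPub's actual method.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical toy lexicon mapping lowercase tokens to (canonical name, type).
# A real text-mining system would use far richer recognizers than a lookup table.
LEXICON = {
    "kras": ("KRAS", "gene"),
    "mek": ("MEK", "gene"),
    "p53": ("p53", "gene"),
}

def find_entities(sentence):
    """Return the canonical names of lexicon entries mentioned in one sentence."""
    tokens = re.findall(r"[A-Za-z0-9]+", sentence.lower())
    return {LEXICON[t][0] for t in tokens if t in LEXICON}

def build_network(text):
    """Link entities that appear in the same sentence; weights count co-mentions."""
    edges = defaultdict(int)
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for a, b in combinations(sorted(find_entities(sentence)), 2):
            edges[(a, b)] += 1
    return dict(edges)

def neighbors(edges, query):
    """Return {entity: weight} for every entity directly linked to the query."""
    linked = {}
    for (a, b), weight in edges.items():
        if a == query:
            linked[b] = weight
        elif b == query:
            linked[a] = weight
    return linked

# Made-up placeholder text; it carries no biological claim.
SAMPLE_TEXT = (
    "Sentence one mentions KRAS and MEK. "
    "Sentence two mentions p53 and KRAS."
)

network = build_network(SAMPLE_TEXT)
p53_links = neighbors(network, "p53")
```

In this toy setup, `neighbors(network, "p53")` lists the entities that share a sentence with the query term, which is the kind of connection a researcher might then inspect in a network view.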
“The idea of text mining isn’t new,” Tan says. “Computer scientists have been doing it for ten or twenty years. But the real application of text mining in biomedical research is very limited. HiPub is a way to use text mining to streamline the process of knowledge discovery.”
The project is a collaboration between Tan and colleagues in the Department of Computer Science and Engineering at Korea University in Seoul, Korea, including first author Kyubum Lee and co-senior author Professor Jaewoo Kang. Korea University hosts the HiPub description page, http://hipub.korea.ac.kr/, which includes a download link and a user guide.