Scientific Literature Mining (LitMiner)
Information Analysis and Retrieval
How can scientific researchers hope to keep up with all of the latest
advances and new discoveries in their field, given that more than
40,000 scholarly articles are published every month? How can they be
sure of finding all of the relevant knowledge “hidden” in journal
articles?
Even with the advent of massive numerical and structural databases,
the scientific literature still holds the newest information, along
with the interpretation and context surrounding the data. The problem
is that researchers cannot read every article relevant to their field
and still have time to conduct research.
In response to this pressing challenge, the NRC Institute for
Information Technology (NRC-IIT), in collaboration with the NRC
Institute for Biological Sciences (NRC-IBS), the Canada Institute for
Scientific and Technical Information (CISTI), the Samuel Lunenfeld
Institute and Blueprint International, is developing a unified
collection of text and language processing tools to meet the practical
information needs of genomic and proteomic scientists.
In the short term, the goal is to save researchers time by letting
computers take over some of these tasks. In the longer term, the goal
is to support hypothesis formation in ways that are not possible with
the current organization of the literature.
The LitMiner project is currently in its first stage, which is to
integrate several existing text tools into a proof-of-concept
prototype. That prototype will make more elaborate usage scenarios
possible and should ultimately lead to more useful systems.
The research is being conducted in both text processing and
bioinformatics. Most of the tools being combined in LitMiner are
machine learning, information retrieval or text mining algorithms,
either new or based on novel modifications of existing algorithms.
While the application to the scientific literature is driving the
further development of these algorithms, the research is important in
its own right.