GeneNarrator: Mining the Literaturome
for Relations Among Genes
Jing Ding1, Daniel Berleant2,*, Jun Xu3,
Kenton Juhlin4, Eve Wurtele5, Andy Fulmer6
1Information Warehouse, Ohio State University Medical Center, 410 W. 10th Ave.,
Columbus, Ohio, 43210, USA, jing.ding@osumc.edu, (614) 293-0776, fax (614) 293-2210
2Department of Information Science, University of Arkansas at Little Rock, 2801 University Ave.,
Little Rock, Arkansas, 72204, USA, berleant@gmail.com, (501) 683-7056, fax (501) 683-7049
3Miami Valley Laboratories, The Procter and Gamble Company, 11810 East Miami River Rd., Ross, Ohio, 45061, USA, xu.j.1@pg.com
4Miami Valley Laboratories, The Procter and Gamble Company, 11810 East Miami River Rd., Ross, Ohio, 45061, USA, juhlin.kd@pg.com
5Department of Genetics, Development and Cell Biology, Iowa State University, Ames, Iowa, 50011, USA, mash@iastate.edu
6Miami Valley Laboratories, The Procter and Gamble Company, 11810 East Miami River Rd., Ross, Ohio, 45061, USA, fulmer.aw@pg.com
*Corresponding author.
Received July 07, 2009; Accepted August 23, 2009; Published August 24, 2009
Citation: Ding J, Berleant D, Xu J, Juhlin K, et al. (2009) GeneNarrator: Mining the Literaturome for Relations
Among Genes. J Proteomics Bioinform 2: 360-371. doi:10.4172/jpb.1000096
Copyright: ©2009 Ding J, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author
and source are credited.
Abstract
The rapid development of microarray and other genomic technologies now enables biologists to monitor the
expression of hundreds, even thousands of genes in a single experiment. Interpreting the biological meaning of the
expression patterns still relies largely on biologists’ domain knowledge, as well as on information collected from
the literature and various public databases. Yet individual experts’ domain knowledge is insufficient for large data
sets, and collecting and analyzing this information manually from the literature and/or public databases is tedious and
time-consuming. Computer-aided functional analysis tools are therefore highly desirable.
We describe the architecture of GeneNarrator, a text mining system for functional analysis of microarray data.
This system’s primary purpose is to test the feasibility of a more general system architecture based on a two-stage
clustering strategy that is explained in detail. Given a list of genes, GeneNarrator collects abstracts about them
from PubMed, then clusters the abstracts into functional topics in a first clustering stage. In the second clustering
stage, the genes are clustered into groups based on similarities in their distributions of occurrence across topics.
This novel two-stage architecture, the primary contribution of this project, has benefits not easily provided by one-stage
clustering.
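
The two-stage strategy described above can be illustrated with a minimal sketch. It makes several assumptions that do not come from the paper: the gene names and abstract texts are invented toy data, abstracts are represented as plain word counts, and both stages use a simple greedy threshold clustering on cosine similarity as a stand-in for whatever clustering algorithms GeneNarrator actually employs. The function names and threshold values are likewise hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def threshold_cluster(vectors, threshold):
    """Greedy stand-in clusterer (an assumption, not the paper's method).

    Each vector joins the first cluster whose seed vector is at least
    `threshold` similar; otherwise it starts a new cluster.
    Returns one integer label per input vector.
    """
    seeds, labels = [], []
    for vec in vectors:
        for idx, seed in enumerate(seeds):
            if cosine(vec, seed) >= threshold:
                labels.append(idx)
                break
        else:
            seeds.append(vec)
            labels.append(len(seeds) - 1)
    return labels


def two_stage(abstracts, topic_threshold=0.3, gene_threshold=0.8):
    """abstracts: list of (gene, abstract_text) pairs."""
    # Stage 1: cluster abstracts into topics using bag-of-words vectors.
    bags = [Counter(text.lower().split()) for _, text in abstracts]
    topics = threshold_cluster(bags, topic_threshold)

    # Build each gene's distribution of occurrence across topics.
    profiles = {}
    for (gene, _), topic in zip(abstracts, topics):
        profiles.setdefault(gene, Counter())[topic] += 1

    # Stage 2: cluster genes by similarity of their topic distributions.
    genes = sorted(profiles)
    gene_labels = threshold_cluster([profiles[g] for g in genes], gene_threshold)
    groups = {}
    for gene, label in zip(genes, gene_labels):
        groups.setdefault(label, []).append(gene)
    return topics, list(groups.values())


# Hypothetical toy input: placeholder gene names and invented abstract texts.
abstracts = [
    ("GENE_A", "apoptosis cell death caspase signaling"),
    ("GENE_B", "caspase activation apoptosis cell death"),
    ("GENE_C", "lipid metabolism fatty acid synthesis"),
    ("GENE_D", "fatty acid lipid metabolism enzyme"),
    ("GENE_A", "cell death apoptosis pathway"),
]
topics, groups = two_stage(abstracts)
print(topics)
print(groups)
```

In this sketch, genes are never compared through their raw text. They are compared only through their topic profiles, so two genes whose abstracts share no wording can still be grouped if those abstracts fall into the same topics.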