Cancer susceptibility may be controlled not only by host genes and mutated genes in cancer cells, but also by
the epistatic interactions between genes from the host and cancer genomes. We derive a novel statistical model
for cancer gene identification by integrating the gene mutation hypothesis of cancer formation into the mixture-model
framework. Within this framework, genetic interactions of DNA sequences (or haplotypes) between host
and cancer genes responsible for cancer risk are defined in terms of quantitative genetic principles. Our model
is founded on a commonly used genetic association design in which a random sample of patients is drawn from
a natural human population. Each patient is typed for single nucleotide polymorphisms (SNPs) in both normal and
cancer cells and measured for cancer susceptibility. The model is formulated within the maximum likelihood
context and implemented with the EM algorithm, allowing the estimation of both population and quantitative
genetic parameters. The model provides a general procedure for testing the distribution of haplotypes constructed
from SNPs of host and cancer genes and the linkage disequilibria of different orders among the SNPs.
The model also formulates a series of testable hypotheses about the effects of host genes, cancer genes, and
their interactions on cancer susceptibility. We carried out simulation studies to examine the statistical properties
of the model. The implications of this model for cancer gene identification are discussed.