Volume 146, Issue 6, Pages 929-931 (December 2003)


Where the rubber meets the road in pharmacogenetics: assessment of gene-environment interactions

Elizabeth R. Hauser, PhD (a, c), Andrew S. Allen, PhD (b, c)

One of the most promising consequences of the Human Genome Project, including the single nucleotide polymorphism and haplotype maps, is the prospect of understanding the relationships among the genetic variants that affect drug actions and treatment outcomes in individuals. This promise hinges on our understanding of gene-environment interactions and how to detect them. However, these are difficult studies to perform, requiring careful planning, large sample sizes, and even some luck. The study by Carlquist et al (1) in this issue of the Journal, reporting an association between the CETP Taq1B polymorphism, statin treatment, and coronary artery disease outcomes, is a good example. The study was motivated by clear biologic reasoning: several lines of evidence pointing to a possible interaction between CETP and statins led to the primary hypothesis of a differential response to statins, as measured by death or coronary artery disease outcomes.

The question remains: What do we do next? Once the statistical relationships between genes and drug response are identified, it is important to follow these investigations with an examination of biological relationships in order to fully elucidate the pathways involved. Adding genetic information to analyses of drug effects thus holds the promise not only of understanding the clinical ramifications of drug use but also of adding to our understanding of disease biology.

Studies of gene-environment interaction can open the door to new investigations, including investigations that are generally expensive and time consuming; investigations that require substantial intellectual input to make biological sense of the relationships; and investigations that require development of new model systems or expensive data sets. Thus it is incumbent on those of us doing the upfront clinical, epidemiologic, and statistical work to be as careful and efficient as possible so that we provide convincing arguments as to why our colleagues should pursue these findings.

The ideal design for studies of gene-environment interaction is one in which a large clinical sample of the proper phenotypic composition is assembled to answer the single question at hand. This is never the case in studies of gene-environment interaction. The reality is that most of these studies are performed on cohorts of patients gathered to answer a wide variety of questions, only some of which are genetic in nature. These patient cohorts are also expensive and time consuming to gather. In addition, there is an important sense of moral obligation to make full use of these cohort sample sets because that is the only “benefit” most subjects will receive from providing samples to these open-ended and nonspecific cohort collections. Thus it is essential that we use these sample sets to the fullest extent possible. At the same time this desire to squeeze all the information possible out of a data set needs to be moderated by the real costs, both financial and scientific, of concluding that there is a relationship between genetics, environment, and disease when indeed none exists.

Pharmacogenetic studies present statistical geneticists and clinical trials statisticians a unique opportunity for collaboration by allowing each discipline's unique perspective to bear on the problem at hand: establishing the existence of differential treatment responses across the genome. These differences in perspective can result in challenges to the status quo within each discipline with the resulting opportunity for stimulating discussion and advancement. A case in point is the issue of multiple testing and the resulting inflation of type I error common to genome scans. Statistical geneticists have relied on the fact that genes are biological entities that can be subject to experimental systems to understand how the statistical results relate to human disease. They count on subsequent studies to elucidate the biological plausibility of their findings. As a result, their primary orientation is toward optimizing gene discovery with lesser emphasis on controlling false discovery rates. On the other hand, clinical trials statisticians rigorously apply statistical principles, including the control of false discovery rates, in evaluating medical therapies while avoiding unnecessary potential for harm. These are fundamentally different pursuits that are joined when pharmacogenetic effects are examined. These differences in perspective translate into differences in what is perceived as an acceptable level of false discovery. Gene-environment interaction studies exist in the hazy middle ground where exploratory analysis is required to develop hypotheses to be tested yet where clinical practice guidelines could be changed based on demonstrated gene-environment interactions. And thus it remains to draw on the strengths of both points of view to develop gene-environment interaction studies that answer questions and provide insight into further investigation.

There are a number of concepts that are important in both statistical “cultures” and should continue to be important. Of course, the notions of biological plausibility of the hypotheses and results are paramount. Cause and effect chains that can be tested are the gold standard; however, even clearly replicated genetic effects, such as the relationship between the APO-E4 variant and Alzheimer disease, may not result in complete understanding of cause and effect relationships.

Power calculations play an important role in study planning and their utility is well recognized. Power, the probability of rejecting the null hypothesis when a given alternative is true, is an important consideration, because having reasonable power to detect a meaningful effect protects one from the ultimate waste of time, money, and data: a study that fails to reach a conclusion of any kind. When a study fails to detect any effect, reasonable power enables one to conclude that a given effect probably doesn't exist, for if it did the study had high probability of detecting it. Interactions are notoriously difficult to detect in any study. Gene-environment interactions are further impacted by our inability to control allele frequencies and oftentimes exposures as well. To illustrate this, we considered planning a follow-up study to confirm the results of the Carlquist et al study and conducted a power analysis to get an idea of the sample size requirements of such a study. We used the event frequencies in the various genotype/treatment categories as well as the statin treatment frequencies presented by Carlquist et al. One thing that became apparent during this exercise is that the duration of follow-up is not clearly articulated in Carlquist et al, nor is it clear that these times are similar across genotype/treatment categories. This wouldn't affect a time-to-event analysis but makes the event frequencies in the logistic regression difficult to interpret. This is something that should be considered in planning future studies. Here we assume that the follow-up time is constant across groups. As in Carlquist et al, we use tests of supra-multiplicative interaction via logistic regression and assume a dominant genetic model. We vary the frequency of the b2 allele to illustrate the sensitivity of power estimates to this parameter. (The b2 allele frequency was 41.9% in the Carlquist et al sample). We simulate 1000 data sets for each combination of these parameters.

Fortunately for Carlquist et al, the observed b2 allele frequency of 41.9% is near the allele frequency yielding the greatest power (Figure 1). Unfortunately for anyone else attempting to replicate these results, the Carlquist et al sample size is woefully underpowered for this replication task. In fact, it is not until we obtain a sample 4 times as large as that presented in Carlquist et al that we are able to obtain reasonable power over a wide spectrum of allele frequencies. This demonstrates the need for large data resources to conduct these confirmatory studies.
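The simulation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used to produce Figure 1: it assumes Hardy-Weinberg proportions for the dominant (carrier) coding, independence of genotype and treatment, no additional covariates, and a Wald test of the interaction term. For two binary predictors that test has a closed form (the log ratio of odds ratios, with variance equal to the sum of reciprocal cell counts). The function names are hypothetical, and the sample size, treatment frequency, and event probabilities are inputs the reader must supply; the Carlquist et al values are not reproduced here.

```python
import math
import random


def carrier_probability(allele_freq):
    """P(at least one copy of the allele), assuming Hardy-Weinberg proportions."""
    return 1.0 - (1.0 - allele_freq) ** 2


def interaction_z(counts):
    """Wald z statistic for the gene x treatment term of a saturated logistic model.

    counts[(g, t)] = [non-events, events] for carrier status g and treatment t.
    With two binary predictors the interaction estimate is the log ratio of
    odds ratios, and its variance is the sum of reciprocal cell counts.
    Returns None when any cell is empty (the estimate is then undefined).
    """
    cells = [c for pair in counts.values() for c in pair]
    if min(cells) == 0:
        return None
    log_odds = {k: math.log(v[1] / v[0]) for k, v in counts.items()}
    beta = (log_odds[(1, 1)] - log_odds[(1, 0)]) - (log_odds[(0, 1)] - log_odds[(0, 0)])
    se = math.sqrt(sum(1.0 / c for c in cells))
    return beta / se


def simulate_power(n, allele_freq, p_treated, event_prob,
                   n_sims=1000, z_crit=1.959964, seed=0):
    """Empirical power of the two-sided 0.05-level interaction test.

    event_prob[(g, t)] is the assumed event probability in each
    genotype/treatment category (hypothetical inputs, constant follow-up).
    Data sets with an empty cell are counted as non-rejections.
    """
    rng = random.Random(seed)
    p_carrier = carrier_probability(allele_freq)
    rejections = 0
    for _ in range(n_sims):
        counts = {(g, t): [0, 0] for g in (0, 1) for t in (0, 1)}
        for _ in range(n):
            g = int(rng.random() < p_carrier)   # dominant coding: carrier or not
            t = int(rng.random() < p_treated)   # treatment assumed independent of genotype
            y = int(rng.random() < event_prob[(g, t)])
            counts[(g, t)][y] += 1
        z = interaction_z(counts)
        if z is not None and abs(z) > z_crit:
            rejections += 1
    return rejections / n_sims
```

Calling `simulate_power` over a grid of allele frequencies, and for n, 2n, and 4n subjects, would trace curves of the kind shown in Figure 1 once the event and treatment frequencies of interest are plugged in.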



Figure 1. Empirical power of gene × treatment interaction tests. The three curves denote the power of a study with the same number of subjects as the original Carlquist et al sample (n × 1), 2 times the Carlquist et al sample size (n × 2), and 4 times the sample size (n × 4). Empirical power was computed via 1000 simulated data sets. The interaction test was performed at a 0.05 α level.


There are 2 ways to achieve sample sizes this large. The first is to assemble a single sample data set that can be used by itself in addressing the question, either collected at a single site or as part of a larger consortium. Alternatively, a meta-analytic approach can be applied. Consistency of conclusions across several independent studies is highly valued. Furthermore the tools to evaluate consistency of conclusions, such as a complete description of the study population and complete reporting of results, should be included in each publication. Meta-analysis is an important tool for comparison of related studies and a meta-analysis should be anticipated when initial results are reported. All efforts should be made to promote meta-analysis, even so far as providing primary data when possible, to increase the efficiency of all studies.
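One common way to combine interaction estimates across independent studies is fixed-effect inverse-variance pooling, with Cochran's Q as a rough check on consistency. The sketch below is an illustration only, not a method prescribed here: it assumes each study reports an interaction log odds ratio and its standard error, and the function name is hypothetical.

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Fixed-effect inverse-variance pooling of per-study estimates.

    estimates:  per-study interaction log odds ratios
    std_errors: their standard errors (all must be positive)
    Returns (pooled estimate, pooled standard error, Cochran's Q).
    """
    if len(estimates) != len(std_errors) or not estimates:
        raise ValueError("need one standard error per estimate")
    weights = [1.0 / se ** 2 for se in std_errors]   # precision weights
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    # Cochran's Q: weighted squared deviations from the pooled estimate;
    # under homogeneity it is approximately chi-square with k - 1 df.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    return pooled, pooled_se, q
```

A large Q relative to its degrees of freedom suggests that the studies disagree, which is a reason to look at differences in study populations before trusting the pooled value.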

An important concept that is unique to genetics is the sense of a genomic context for the results for individual candidate genes. One aspect of genomic context that has been widely accepted is the need to test many markers for the presence of different genetic backgrounds (ie, population substructures that may invalidate certain genetic assumptions in the analysis). There are several other subtleties specific to genetic analysis that are becoming more widely recognized. For example, it is generally very difficult to control for specific genes and allele frequencies at the outset. We have to take what exists in the sample set in terms of frequencies of alleles at specific genes. Until genotyping assays become as easy as blood pressure measurements, the element of surprise will always be operating in obtaining allele frequencies. Because humans are diploid, there exist several possibilities for describing the interaction between the 2 alleles at any given marker, possibilities that are compounded when examining gene-environment interaction. Information obtained by examining possible genetic models (eg, recessive, dominant, or additive) may be useful in developing additional studies. These models should be examined, evaluated, and reported where possible. Furthermore, relationships between variants within a gene should be taken into account. Our ability to examine linkage disequilibrium is evolving as genetic tools, such as the haplotype map and statistical tools for haplotype estimation, are improved, furthering our ability to understand the gene as an entity, rather than as discrete single nucleotide polymorphism variants.
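The genetic models mentioned above differ only in how a diploid genotype is turned into a covariate. A minimal sketch, assuming genotypes are recorded as the number of copies (0, 1, or 2) of the allele of interest, such as b2; the function names are hypothetical.

```python
def code_genotype(copies, model):
    """Code a genotype (0, 1, or 2 copies of the allele) under a genetic model."""
    if copies not in (0, 1, 2):
        raise ValueError("copies must be 0, 1, or 2")
    if model == "dominant":      # one copy is enough to show the effect
        return int(copies >= 1)
    if model == "recessive":     # both copies are required
        return int(copies == 2)
    if model == "additive":      # effect scales with the number of copies
        return copies
    raise ValueError("model must be 'dominant', 'recessive', or 'additive'")


def interaction_term(copies, treated, model):
    """Gene x treatment product term for a regression model (treated is 0 or 1)."""
    return code_genotype(copies, model) * int(treated)
```

Fitting the same data under each coding and reporting all three sets of results gives readers the information needed to plan follow-up studies.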

Finally, the assessment of gene-environment interaction has much to gain from several principles of clinical trials. Just as genomic context matters for genetic results, as discussed above, clinical context is very useful. Many investigations of gene variants done in the context of clinical trials are parts of larger studies designed to answer very specific questions about treatment efficacy. The parent study is usually prominently discussed, and the assumptions made in the parent study are used to inform the interpretation of the results from the substudy. In addition, important concepts related to survival analysis and follow-up studies should be incorporated in genetic studies, including active follow-up. Cohort sample sets often involve active follow-up of clinical information and outcomes along with well-established quality-control procedures for the follow-up, such as those implemented for the Duke Cardiovascular Diseases Databank.

These are exciting times for the study of gene-environment interactions of any kind, with pharmacogenetics leading the way. Investigations of gene-environment interaction provide an ideal opportunity for clinical trials statisticians and statistical geneticists to interact and to develop new models and methods. By working together we can bring the strengths of both points of view, along with new methods incorporating data resampling and Bayesian perspectives, to bear on gene-environment investigations. The amount of genetic information generated in clinical investigations will only increase. If we can balance our need to learn as much as possible from large but still finite data sets with careful evaluation of the consistency of results generated by combining information from disparate sources, we will be able to realize some of the promise that the genetic era holds for improving human health and understanding human biology.

References 

1. Carlquist JF, Muhlestein JB, Horne BD, et al. The cholesteryl ester transfer protein Taq1B gene polymorphism predicts clinical benefit of statin therapy in patients with significant coronary artery disease. Am Heart J 2003;146:1007–14.

a Center of Human Genetics, Duke University Medical Center, Durham, NC, USA

b Duke Clinical Research Institute, Duke University Medical Center, Durham, NC, USA

c Department of Biostatistics and Bioinformatics, Duke University Medical Center, Durham, NC, USA

Reprint requests: Elizabeth R. Hauser, PhD, PO Box 3445, Duke University Medical Center, Durham, NC 27710, USA.

PII: S0002-8703(03)00502-7

doi:10.1016/S0002-8703(03)00502-7

