Introduction
At root, every hypothesis is a claim about the relevance of particular scales. If the hypothesis is parsimonious, and if the phenomenon it attempts to explain is simple, most scales are irrelevant. The classic SIR model (Kermack and McKendrick 1927) is impressive because it captures the dynamics of many infectious diseases with just a few interactions. Hosts are well-mixed particles in three possible compartments whose contact rates vary as simple linear or nonlinear functions of their populations. The model omits processes on other temporal scales, such as feedback with the physical environment and evolution by the pathogen and the host. It also omits spatial scales, such as competition among pathogens within a host and differences in contact rates due to population structure.
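For reference, the SIR model in its common frequency-dependent form is dS/dt = -βSI/N, dI/dt = βSI/N - γI, dR/dt = γI, where β is the transmission rate and γ the recovery rate. The sketch below integrates these equations with a fixed-step fourth-order Runge-Kutta scheme. The function names and parameter values are illustrative assumptions, not estimates for influenza or any other pathogen.

```python
def sir_rhs(s, i, r, beta, gamma, n):
    """Right-hand side of the frequency-dependent SIR model."""
    infection = beta * s * i / n
    recovery = gamma * i
    return -infection, infection - recovery, recovery


def simulate_sir(s0, i0, r0, beta, gamma, days, dt=0.1):
    """Integrate the SIR equations with classical RK4.

    Returns a list of (S, I, R) tuples, one per time step, starting at t = 0.
    Because the three derivatives sum to zero, S + I + R stays constant.
    """
    n = s0 + i0 + r0
    state = (float(s0), float(i0), float(r0))
    trajectory = [state]
    for _ in range(int(round(days / dt))):
        k1 = sir_rhs(*state, beta, gamma, n)
        k2 = sir_rhs(*(x + 0.5 * dt * k for x, k in zip(state, k1)), beta, gamma, n)
        k3 = sir_rhs(*(x + 0.5 * dt * k for x, k in zip(state, k2)), beta, gamma, n)
        k4 = sir_rhs(*(x + dt * k for x, k in zip(state, k3)), beta, gamma, n)
        state = tuple(
            x + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for x, a, b, c, d in zip(state, k1, k2, k3, k4)
        )
        trajectory.append(state)
    return trajectory


# Hypothetical example: R0 = beta / gamma = 2 in a population of 1000.
trajectory = simulate_sir(999, 1, 0, beta=0.4, gamma=0.2, days=200)
```

With β = 0.4 and γ = 0.2 per day (R0 = 2), roughly 80% of hosts are eventually infected, in line with the final-size relation z = 1 - exp(-R0 z).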
These assumptions work well for particular systems [e.g., Grenfell et al. (1992), cf. Swinton et al. (1998)] and are the foundation of other popular models in ecology [e.g., Volterra (1926)]. Because of their simplicity, all their potential behaviors can be predicted analytically. However, many interesting phenomena are not well explained by these assumptions. Mean-field models can poorly approximate the dynamics of sexually transmitted diseases (Eames and Keeling 2002) and the mechanisms of disease emergence, which may depend on the effects of superspreaders (Lloyd-Smith et al. 2005). In other systems, endogenous dynamics are strongly affected by extrinsic climatic forcing (Koelle and Pascual 2004) or by interactions with other populations (LoGiudice et al. 2003; Jensen et al. 2006). In fast-evolving RNA viruses such as HIV, strong selection within individual hosts and the timing of transmission may affect the frequency of CTL escape and CXCR4 mutants in the host population (Rambaut et al. 2004).
Local interactions, biotic and abiotic extrinsic drivers, and rapid evolution are examples of spatial, organizational, or temporal scales that crucially augment or replace the null dynamical model (Pascual 2005). Often these complexities can be incorporated into standard models by adding parameters, such as modified mixing terms (Roy and Pascual 2006). If the modifications do not produce the observed behavior, the model may need to be replaced by a more accurate description of the underlying process, e.g., a contact network or agent-based simulation. One of the goals in researching these systems is to assess the importance of variability at small, local scales to the dynamics of aggregated quantities measured at large, global scales. If small-scale 'details' matter, we need to ask how much complexity we need to incorporate into large-scale models if we seek both to understand and to predict the dynamics of global quantities (Pascual 2005). I would add that these details do not have to be small, since we are not always studying global quantities. We may also ask whether patterns are shaped by extrinsic factors or dynamics; perhaps it matters that the system is open. The best hypotheses of complex systems remain parsimonious while appealing to processes occurring on other spatial, temporal, or organizational scales to describe a pattern. They are consequently challenging to evaluate, since there are so many possible processes (and thus competing hypotheses) to choose from.
I would like to use this framework to determine what does and does not regulate the diversity of a common but poorly understood disease, influenza. I define diversity as the variation in genotype and phenotype at any one time and how this variation changes over time: It encapsulates a broad set of ecological and evolutionary patterns resulting from the interaction of few or many scales. I follow the lead of recent exceptional models that have relaxed common assumptions about which scales are relevant and thereby made contributions to our understanding of flu. The following section presents biological context—that is, the mechanisms I suspect underpin broad patterns in flu diversity—for the research questions that follow.
Mechanisms and patterns in influenza
Structure and antigenicity
The genome of influenza A consists of eight RNA segments, each 850 to 2300 bases long, which code for 10 proteins (Table 1). Two of these proteins, hemagglutinin (HA) and neuraminidase (NA), are abundant on the virus’s surface, with approximately four HA for every NA. There are sixteen forms of HA and nine forms of NA. Combinations of HA and NA form subtypes, e.g., H3N2. Amino acid sequences of HA differ by up to 20% within subtypes and by 30-70% between them (Skehel and Wiley 2000).
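Percent differences like these are typically proportions of differing positions between aligned sequences (p-distances). A minimal sketch, assuming the two sequences are already aligned to equal length and that any position with a gap ('-') in either sequence is skipped. The function name and the toy sequences are hypothetical.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned, ungapped positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip alignment gaps in either sequence
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return differing / compared
```

For example, p_distance("ACDEFGHI", "ACDEFGHK") returns 0.125 (one difference in eight positions), i.e., a 12.5% amino acid difference.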
Hemagglutinin and neuraminidase are the primary determinants of antigenicity. Antibody-binding sites of some subtypes of HA and NA have been described by X-ray crystallography and by electron micrographs of monoclonal antibody escape mutants (Bizebard et al. 1995; Fleury et al. 1999; Knossow et al. 2002). These sites are grouped into antibody-binding regions, or epitopes, on the globular head of hemagglutinin (HA1). HA1 of the H3N2 subtype infecting humans has four or five epitopes, labeled A-E (Figure 1a) [epitope D may be recognized only by murine antibodies (Sato et al. 2000)]. HA1 of the H1N1 subtype has four or five recognized epitopes (Gerhard et al. 1981; Caton et al. 1982), and the N2 NA has at least two (Gulati et al. 2002). HA1 molecules of different subtypes can share epitopes (Smirnov et al. 1999).
Antibodies neutralize influenza viruses through steric inhibition of receptor binding or membrane fusion, rather than by inducing conformational change in HA. The receptor-binding site is a highly conserved pocket at the top of HA1 (in H3N2, the site falls near epitopes A and B) and shows little variation among subtypes (Skehel and Wiley 2000). Antibodies can neutralize viruses by blocking the receptor-binding site directly (Bizebard et al. 1995) or by binding to an epitope some distance away (Fleury et al. 1999) (Figure 1b); the latter mechanism can block receptor binding or interfere with membrane fusion. Antibodies to the same epitope can compete or interact synergistically in neutralization (Brown et al. 1990; Sanna et al. 2000). Antibody affinity is positively but loosely correlated with neutralization ability (Kostolansky et al. 2000). One study found that, on average, one quarter of the hemagglutinins on the viral surface had to be neutralized to prevent infection (Knossow et al. 2002); all three neutralizing antibodies examined in that study interfered with receptor binding rather than membrane fusion. Neutralization kinetics have been described as log-linear or pseudo-first-order [reviewed in Frank (2002)].
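Pseudo-first-order kinetics means that, with antibody in large excess, the infectious titer decays exponentially, V(t)/V(0) = exp(-k[Ab]t), so the logarithm of the surviving fraction falls linearly with time. A sketch of that relation, with the rate constant k and the function names as illustrative assumptions:

```python
import math


def surviving_fraction(k: float, antibody_conc: float, t: float) -> float:
    """Fraction of virions still infectious after time t.

    Assumes antibody is in large excess, so its concentration stays roughly
    constant and infectious virus decays as V(t)/V(0) = exp(-k * [Ab] * t).
    """
    return math.exp(-k * antibody_conc * t)


def time_to_fraction(k: float, antibody_conc: float, target: float) -> float:
    """Time at which the surviving fraction falls to `target` (0 < target < 1)."""
    return -math.log(target) / (k * antibody_conc)
```

Under these assumptions, doubling the antibody concentration halves the time needed to reach any given level of neutralization, which is the log-linear signature described above.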
Immunity
Hosts resist infection through humoral immunity, cellular immunity, and serum inhibitors. The contribution of each kind of immunity is a very active area of research, and contributions differ among host species.
Humoral immunity is the basis of permanent strain-specific immunity and some cross-immunity within subtypes in mammals; it may also play a role in heterosubtypic immunity. The specificity of antibodies is gauged by the ability of sera to inhibit hemagglutination by viruses of a particular strain. Most antibodies in sera of infected or vaccinated humans target HA1 (Sato et al. 2004), though some individuals also mount responses to NA (Cox and Brokstad 1999). All mammals investigated produce antibodies to NP (Deboer et al. 1990; Cox and Brokstad 1999). An antibody repertoire can be monoclonal (antisera contain antibodies to a single epitope) or polyclonal (antisera contain antibodies to multiple epitopes). Children have narrower antibody repertoires than adults: 25 of 27 children under age twelve developed antibodies to a region of epitope B of H3N2 HA1, and six showed antibodies to sites A or C. In contrast, older subjects had polyclonal responses to epitopes A, B, C, and E (Sato et al. 2004).
Cellular immunity can significantly abrogate pathology and speed viral clearance, though it appears to play a lesser role in preventing infection [Liang et al. (1994) and review in Thomas et al. (2006)]. T cells attack only cells presenting viral antigen, and thus they usually lag behind antibodies in appearance and proliferation during infection. Mice challenged with a virus containing the internal proteins of human H1N1 produce CD8+ T cells specific to all six internal proteins, though T cells specific to epitopes on PA and NP predominate and can be detected ≥570 days after initial infection. Secondary responses tend to be dominated by NP (Belz et al. 2000; Marshall et al. 2001). CD4+ T cells bolster CD8+ and B cell responses and appear to be required for T cell memory (Belz et al. 2002). Studies of humans generally cannot separate the effects of cellular immunity from those of humoral immunity and so are of little help in gauging the protection conferred by the former. They do, however, indicate that (1) there is no evidence that cellular immunity prevents infection, except possibly when exposures are simultaneous; and (2) symptoms are markedly attenuated during secondary infections. One small study observed several cases of rapid reinfection with heterologous subtypes, sometimes within days of clearance of the first subtype, and the secondary infection was no more likely to be asymptomatic than the first (Frank et al. 1983); in contrast, students at high schools experiencing concurrent epidemics of H3N2 and H1N1 were less likely to suffer multiple infections than students at schools with sequential epidemics (Sonoguchi et al. 1986). Attenuation of symptoms during secondary infections has also been demonstrated during the H2N2 and H3N2 pandemics (Sonoguchi et al. 1985). However, in both cases the HA of the pandemic strain was evolving rapidly (Matrosovich et al. 2000), and antibody-mediated cross-immunity produces similar patterns (Gill and Murphy 1977).
Comparatively little is known about the immunologic effects of serum inhibitors, which can select variants with altered receptor binding sites and may play a role in host range (Rogers et al. 1983; Matrosovich et al. 1998).
Evolution
Influenza viruses evolve by point mutations, reassortment of whole gene segments, and rarely by recombination (Hirst et al. 2004; Suarez et al. 2004).
Point mutations are the most frequent means of escaping immune surveillance, and they are also a means to modulate virulence, develop drug resistance, and adapt to new hosts or tissue types. Monoclonal antibody escape mutants arise in vitro at a frequency of about one in 10^4 to 10^6 viruses (Webster and Laver 1980). They avoid recognition through conformational changes or additional glycosylation sites that block antibody binding. Conformational changes tend to affect only the local structure of the surrounding epitope (Knossow et al. 1984). Hemagglutinin may be particularly tolerant of such changes: epitope A consists mostly of a loop extending from the rest of the molecule, and epitopes B and C are bulges (Wiley et al. 1981). While amino acids at certain positions, such as those in loops, have a dramatically greater influence than others on antibody recognition, the location of influential positions can change over time (Nakajima et al. 2005). As in other pathogens, tertiary protein structure greatly complicates predictions of the locations of B cell epitopes (Korber et al. 2006).
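To put the in vitro escape frequency in context: if each virus independently carries an escape mutation with probability f (the estimate above spans roughly f = 10^-6 to 10^-4), a population of N viruses contains at least one escape mutant with probability 1 - (1 - f)^N. A sketch of that calculation; the independence assumption and the function names are illustrative.

```python
import math


def prob_at_least_one_mutant(freq: float, n_viruses: int) -> float:
    """P(at least one escape mutant among n_viruses).

    Assumes each virus independently carries an escape mutation with
    probability `freq`. Computes 1 - (1 - f)^N via log1p/expm1 so the
    result stays accurate when f is tiny.
    """
    return -math.expm1(n_viruses * math.log1p(-freq))


def expected_mutants(freq: float, n_viruses: int) -> float:
    """Expected number of escape mutants in the population."""
    return freq * n_viruses
```

For f = 10^-6, a population of 10^6 viruses already has about a 63% chance (1 - e^-1) of containing an escape mutant; at f = 10^-4 the probability is essentially one.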
Influenza viruses can also escape immune surveillance through the addition of glycosylation sites [Asn-X-Thr/Ser, where X is any amino acid except proline or potentially aspartic acid (Gallagher et al. 1992)]. Host-derived carbohydrates attached at these sites form a glycan shield against antibodies. Glycosylation is the major mechanism of antibody escape by HIV (Wei et al. 2003) and predicts the strength of the antibody repertoire mounted by rhesus monkeys against SIV (Reitter et al. 1998). The number of potential glycosylation sites on HA1 of H3N2 has increased from two to six or seven since the subtype emerged in humans (Abe et al. 2004). While increased glycosylation decreases receptor-binding activity in vitro, it can do so without negatively affecting cell fusion. Glycosylation may thus be an especially rapid effector of antigenic change (Schulze 1997), though it is not well tolerated by all subtypes of HA (Tsuchiya et al. 2002).
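The Asn-X-Thr/Ser motif lends itself to a regular-expression scan. A minimal sketch, assuming an ungapped, single-letter protein sequence; the function name is hypothetical, and a match marks only a potential site, since not every sequon is actually glycosylated.

```python
import re


def find_sequons(protein: str, exclude_aspartate: bool = False) -> list[int]:
    """Return 0-based positions of the Asn in potential N-glycosylation sites.

    A sequon is Asn-X-Thr/Ser where X is not proline. Optionally also
    exclude aspartic acid at X, following the caveat in the text.
    """
    excluded = "PD" if exclude_aspartate else "P"
    # Zero-width lookahead so overlapping sequons (e.g. "NNST") are all found.
    pattern = re.compile(rf"(?=N[^{excluded}][ST])")
    return [m.start() for m in pattern.finditer(protein.upper())]
```

Applying such a scan to dated HA1 sequences is one way to track the rise in potential glycosylation sites described above.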
The accumulation of point mutations in response to immune pressure is traditionally called antigenic drift, and it underlies the characteristic phylogeny of HA1 in human viruses: genetic distance steadily increases from the founding strain, and no strain persists more than a few years (Figure 2a). Recently, human H3N2 strains have been shown to form antigenic clusters defined by patterns of cross-reactivity (Figure 2b), with only one cluster appearing to dominate at any time (Smith et al. 2004).
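The cross-reactivity behind these clusters is measured with hemagglutination inhibition (HI) titers. Assuming the convention described by Smith et al. (2004), titers are turned into log2 distances: for antiserum j with maximum titer b_j against any antigen, the table distance to antigen i is log2(b_j) - log2(H_ij), so one unit corresponds to a twofold drop in titer. The sketch applies that conversion to a made-up table; it ignores titers reported below the detection threshold and stops short of fitting an antigenic map. Function and strain names are hypothetical.

```python
import math


def hi_table_distances(titers: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]:
    """Convert HI titers titers[antigen][antiserum] to log2-unit table distances.

    For each antiserum j, b_j is its maximum titer against any antigen, and
    distance(i, j) = log2(b_j) - log2(H_ij).
    """
    sera = {s for row in titers.values() for s in row}
    max_titer = {
        s: max(row[s] for row in titers.values() if s in row) for s in sera
    }
    return {
        antigen: {s: math.log2(max_titer[s]) - math.log2(h) for s, h in row.items()}
        for antigen, row in titers.items()
    }


# Made-up example: each strain reacts strongly with its own antiserum.
toy_titers = {
    "strain1": {"anti1": 1280, "anti2": 80},
    "strain2": {"anti1": 160, "anti2": 640},
}
toy_distances = hi_table_distances(toy_titers)
```

In this toy table each strain sits 3 log2 units (an eightfold titer drop) from the other strain's antiserum and 0 units from its own.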
In contrast to antigenic drift, antigenic shift by reassortment was once thought to be rare and to always cause pandemics (Webster et al. 1992). There is growing evidence that reassortment is as common in humans (Lindstrom et al. 2004; Holmes et al. 2005) as in other hosts (Hatchette et al. 2004; Webby et al. 2004). It may lead to partial immune escape by shuffling combinations of HA, NA, and other antigenic determinants. One antigenic cluster (FU02) may have arisen through reassortment (Holmes et al. 2005), potentially without antigenically significant changes in HA1. Unlike introductions of H1N1, H2N2, and H3N2 into the human population, the emergence of the H1N2 subtype since 2001-2002 has not obviously increased incidence.
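Because reassortment shuffles whole segments, coinfection of a cell by two parental viruses can in principle produce 2^8 = 256 segment constellations, 254 of which are reassortants. The enumeration below makes this concrete. It assumes every constellation is viable, which packaging and compatibility constraints do not guarantee, and the helper names are hypothetical.

```python
from itertools import product

# The eight genome segments of influenza A.
SEGMENTS = ["PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS"]


def reassortant_genotypes(parent_a: str = "A", parent_b: str = "B"):
    """Yield every segment constellation possible from two coinfecting parents.

    Each genotype maps segment name -> parent of origin.
    """
    for origins in product((parent_a, parent_b), repeat=len(SEGMENTS)):
        yield dict(zip(SEGMENTS, origins))


genotypes = list(reassortant_genotypes())
# Reassortants carry segments from both parents.
reassortants = [g for g in genotypes if len(set(g.values())) == 2]
# Constellations that keep parent A's internal genes but take HA and/or NA from B.
surface_swaps = [
    g for g in reassortants
    if all(g[s] == "A" for s in SEGMENTS if s not in ("HA", "NA"))
]
```

Only three constellations swap surface antigens alone; the rest also reshuffle internal genes, which is one way reassortment could alter antigenicity beyond HA1.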
Ecology
Influenza is highly seasonal in temperate regions, with a four-month epidemic period in winter and few cases in the summer (Figure 3a) (Cox and Subbarao 2000). Incidence shows no clear periodicity in the tropics (Figure 3b) (Chow et al. 2006; Viboud et al. 2006a). Pandemic and interpandemic influenza A strains circulate globally over short time periods. H3N2 outbreaks have especially high spatial synchrony (Greene et al. 2006; Viboud et al. 2006b), and phylogenetic studies of strains circulating in France (Lavenu et al. 2006), Japan (Nakajima et al. 1991), and New York (Nelson et al., submitted) demonstrate that multiple lineages seed annual epidemics in each community. Swabs from air travelers support the hypothesis that strains are transported between hemispheres throughout the year (Sato et al. 2000).
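Whether incidence is periodic can be checked with a periodogram. The sketch below computes one with a naive discrete Fourier transform using only the standard library. The function names and the synthetic weekly series are assumptions for illustration; real surveillance data would first need attention to trends, reporting changes, and missing weeks.

```python
import cmath
import math


def periodogram(series: list[float]) -> list[tuple[float, float]]:
    """Return (period, power) pairs from a naive discrete Fourier transform.

    The series is mean-centred first so the zero frequency does not dominate.
    O(n^2), which is fine for a few hundred weekly observations.
    """
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]
    result = []
    for k in range(1, n // 2 + 1):
        coef = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centred)
        )
        result.append((n / k, abs(coef) ** 2 / n))
    return result


def dominant_period(series: list[float]) -> float:
    """Period (in sampling units) with the largest periodogram power."""
    return max(periodogram(series), key=lambda pair: pair[1])[0]


# Synthetic ten-year weekly series with a purely annual cycle.
weekly = [100 + 50 * math.cos(2 * math.pi * t / 52) for t in range(520)]
```

A temperate-like series such as this one peaks at a period of 52 weeks; a flat periodogram with no dominant peak would be more consistent with the tropical pattern.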
Incidence is usually inferred from pneumonia and influenza mortality. Until recently, few countries tracked deaths or infections by type and subtype. Observations from WHO collaborating labs reporting to the CDC suggest that seasons dominated by H3N2 have relatively low incidence of H1N1 and influenza B, and vice versa (Figure 3c) (Thompson et al. 2003; Greene et al. 2006). Estimates of annual incidence range from 10-20% in interpandemic years to 40-50% during pandemics (Cox and Subbarao 2000), though the flu community does not appear to place much confidence in any particular value. Contact with young children is a significant risk factor for infection (Gubareva et al. 2002; Viboud et al. 2004), and vaccination of young children can dramatically reduce incidence in older children and adults in contact with them (Monto et al. 1970; Hurwitz et al. 2000).
Many other species can be infected with influenza viruses, and multiple subtypes are endemic in swine, domestic poultry, and horses. Influenza is seasonal in many aquatic birds, which are its natural hosts. Transmission between wild birds and domestic poultry, wild birds and swine, domestic poultry and humans, and swine and humans occurs at least annually in many parts of the developed and developing world where the populations cohabitate (section 5, below).
Models
Most models of influenza have focused on explaining the seasonal dynamics of flu in human populations at temperate latitudes and the mechanisms of strain cycling (Pease 1987; Andreasen et al. 1997; Lin et al. 1999; Andreasen 2003; Boni et al. 2004; Dushoff et al. 2004; Lavenu et al. 2004). They assume that population immunity declines as a function of antigenic drift, which itself can occur at a constant rate or as a function of epidemic size, and that cross-immunity between serotypes is fixed. Regular and irregular oscillations in incidence, dynamical resonance, and complex strain cycling can result under these assumptions. A few studies have explored strain structure with evolution. Boni et al. (2006) predicted that strong host immunity and long epidemics would lead to the highest rates of antigenic drift. They proposed that their model could explain anecdotal reports that antigenic variants tend to arise toward the end of a season, with attack rates rising the following year. Gog et al. (2003) demonstrated that a slight increase in the duration of infectiousness can substantially increase the survival probability of a mutant strain during seasonal bottlenecks in transmission. They suggested that this mechanism may have accounted for the rapid fixation of CTL escape mutants in the 1993-1994 season.
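A generic branching-process argument (not the Gog et al. model) gives some intuition for why a small change in infectious duration can matter so much. If each infection produces a Poisson-distributed number of new infections with mean r = βD, for transmission rate β and infectious duration D, a new mutant lineage escapes extinction with probability 1 - q, where q is the smallest root of q = exp(r(q - 1)). Near r = 1, as in a seasonal trough, survival rises steeply with D. The Poisson offspring assumption, the function name, and the parameter values are illustrative.

```python
import math


def survival_probability(r: float, tol: float = 1e-12, max_iter: int = 100_000) -> float:
    """Probability that a lineage founded by one infection never goes extinct.

    Assumes a Galton-Watson branching process with Poisson(r) secondary
    cases. The extinction probability q is the smallest root of
    q = exp(r * (q - 1)); iterating from q = 0 converges to that root.
    """
    if r <= 1.0:
        return 0.0  # subcritical or critical lineages die out with certainty
    q = 0.0
    for _ in range(max_iter):
        q_next = math.exp(r * (q - 1.0))
        converged = abs(q_next - q) < tol
        q = q_next
        if converged:
            break
    return 1.0 - q
```

With a hypothetical β of 0.25 per day, lengthening the infectious period from 4 to 4.4 days moves r from 1.0 to 1.1 and lineage survival from zero to roughly 18%.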
Three models have attempted to describe broader patterns by coupling dynamics of strain evolution to population dynamics. Gog and Grenfell (2002) showed that strains evolving in one-dimensional evolutionary space form clusters if infection times are short and cross-immunity is high. They noted that the existence of clusters could yield phylogenies with short branches, but, like previous studies, they did not attempt to match their system’s endogenous dynamics to empirically observed seasonality. To explain strain evolution, subtype cycling, and seasonal dynamics in tandem, Ferguson et al. (2003) constructed an agent-based model that could generate HA trees with short side branches, annual fluctuations in incidence, and subtype replacement and coexistence. Central to their model was a strain-transcendent, generalized immunity (attributed to cellular immunity) that reduced the probability of infection in a density-dependent way; they found this immunity required a half-life of six months to restrict diversity sufficiently. They also incorporated spatial structure, something akin to original antigenic sin (exposure would boost immunity to old strains even if hosts were not infected), and a 30-year host lifetime. It is unclear how robust their results are to these assumptions.
Ferguson and coauthors showed that the interaction of ecological dynamics, immune selection, and evolutionary dynamics on comparable time scales might be critical for influenza. The third and most recent model invokes another scale of complexity, rather than generalized immunity, to explain influenza evolution and seasonality. Koelle et al. (in prep) infer from the existence of antigenic clusters that large regions of genotype space are effectively neutral, since they share the same antigenic phenotype, and that this relationship arises from the nature of the genotype-phenotype mapping and not from the infection history of the strain (Figure 4a). We propose that interpandemic influenza undergoes epochal evolution. During periods of apparent evolutionary stasis, strains diffuse through genotype space. A mutant that discovers a new phenotype (cluster) encounters fewer immune hosts and proliferates. Partial cross-immunity to the old cluster causes the old cluster to go extinct rapidly. The model predicts that influenza strains undergo a boom-and-bust cycle of genetic diversity, a prediction confirmed in real strains. The model further proposes, and indeed requires, that genotype space within clusters is almost neutral. Without weak positive selection, simulated strains explore genotype space too slowly to generate the characteristic increases in diversity (Figure 4b).
I am not aware of any ecological and evolutionary models of influenza in other host species, despite recent dramatic changes in host ecology and viral diversity.
Proposed research
One of the most important goals of understanding biological systems, and diseases in particular, is prediction, including the knowledge that a system might be too stochastic or chaotic to predict reliably. Our management strategy for influenza is still largely reactive: next season’s vaccine is based on the most antigenically divergent strains of the current one, and until recently, only the populations most at risk of dying from infection were targeted for vaccination [2006-2007 is the first season for which the CDC has recommended vaccinating children 6-59 months old (Smith et al. 2006)]. Influenza is furthermore an exciting system because of the convergence of so many possible dynamical scales: owing to their high mutation rates and short generation times, RNA viruses are especially sensitive to changes in host ecology. It is increasingly evident that the relevant ecologies include those encountered both within hosts and in host populations, manifested as adaptive immune responses (Grenfell et al. 2004). We are in a position to affect host ecology, and thus viral evolution, through vaccination, antivirals, and farming practices.
My goal is to examine linkages between viral evolution on a molecular level and immunity on molecular and population levels to identify factors involved in regulating flu diversity in humans and other species. My first three research projects investigate the roles of humoral and cellular immunity in flu evolution and strain competition. The next two projects consider the effects of other aspects of host ecologies on viral evolution. The last project explores the adaptive potential of influenza under different modes of evolution.
My specific questions are:
- 1. What temporal patterns of selection are evident in H3 HA? If the model of epochal evolution is correct, we should see episodic selection in HA corresponding to cluster transitions, and weak selection within clusters. How are selection pressures distributed across HA1? Is there evidence that positive selection is at work on other proteins, such as NA? Is the spectrum of selection potentially adaptive, by helping HA find new phenotypes?
- 2. How does antibody heterogeneity affect the dynamics of partial cross-immunity, strain structure, and epitope evolution? Diverse, epitope-specific host antibody responses mean that the effective phenotype of a strain depends on the host it is infecting. How do strains compete under such heterogeneous selection pressures? What is the effect of antibody repertoire specificity on strain and epitope evolution?
- 3. What are the conditions for subtype replacement, coexistence, and interference, and are they consistent with observations and the biology? Ferguson et al. (2003) invoked generalized immunity to explain how subtypes replace each other during pandemics and interfere during interpandemic years, though other mechanisms might generate similar patterns. Which are the most robust and biologically grounded?
- 4. How does epochal evolution interact with other aspects of host ecology to modulate flu diversity in swine? Models show that the outcome of strain competition is affected by the duration of infectiousness relative to the host lifespan. Several antigenic clusters have been known to circulate in swine concurrently. Is this diversity predicted by differences in the host’s life history and epidemiology?
- 5. How do host ecologies affect evolutionary opportunities for emergence? The fitness of a pathogen’s phenotype is determined by the number of hosts available to it, which is a function of host ecology. How do the endogenous disease dynamics and interspecific contact networks of different host species affect the probability that a virus with another receptor preference—allowing a shift in host range—will emerge?
- 6. How do epistasis and modularity affect the genetic potential of influenza? The idea that populations may be able to evolve their genetic potential, or capacity for change, has attracted attention from theoreticians. Applications of this idea have sparked debate. Is there evidence that influenza has evolved to increase its genetic (or antigenic) potential, e.g., through codon volatility, modularity, or the propensity to reassort? Is evolution of genetic potential feasible under epistasis?
1 Evidence of episodic selection on H3N2
There are two hypotheses of how influenza undergoes continual antigenic drift while its genetic diversity remains bounded over time. Ferguson, Galvani, and Bush (2003) and Tria et al. (2005) contend that density-dependent, generalized immunity is necessary to restrict diversity. Each group invokes another factor—heterogeneity in host transmission (spatial structure) and variable strain fitness, respectively—that crucially augments the effect of generalized immunity. Koelle et al. (in prep) propose that diversity is constrained episodically by the appearance of new antigenic phenotypes that competitively displace existing strains. Selective sweeps by antigenic variants have been proposed before for H3N2 (Fitch et al. 1991) and H1N1 (Ina and Gojobori 1994). The model of Koelle and coauthors suggests that the novel phenotypes correspond to the eleven clusters of H3N2 between 1968 and 1998. Cohort studies of reinfection in humans indicate that cross-immunity is as high as 95% within clusters and 60-84% between clusters (Gill and Murphy 1977).
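A back-of-the-envelope calculation shows why these cross-immunity values favor a new cluster. Suppose a fraction f of hosts has been infected with the current cluster, and that prior infection reduces susceptibility to a related strain by the cross-immunity σ. The fraction of hosts available to a strain is then (1 - f) + f(1 - σ). The sketch uses the Gill and Murphy (1977) values; f = 0.5 is a hypothetical choice, and the calculation ignores all other sources of immunity.

```python
def effective_susceptible(frac_immune: float, cross_immunity: float) -> float:
    """Fraction of hosts available to a strain.

    `frac_immune` of hosts carry immunity to a related strain, which reduces
    their susceptibility by the factor `cross_immunity`; all other hosts are
    fully susceptible.
    """
    return (1.0 - frac_immune) + frac_immune * (1.0 - cross_immunity)


frac_immune = 0.5  # hypothetical share of hosts infected by the current cluster
within = effective_susceptible(frac_immune, 0.95)  # variant in the same cluster
for sigma in (0.84, 0.60):  # range of between-cluster cross-immunity
    advantage = effective_susceptible(frac_immune, sigma) / within
    print(f"sigma={sigma}: relative supply of susceptibles = {advantage:.3f}")
```

With f = 0.5, a within-cluster variant finds 52.5% of hosts available, whereas a strain from a new cluster finds 58-70%, an advantage of roughly 10-33% in the supply of susceptibles.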
The model also proposes that, in addition to abrupt, intense sweeps, there is weak selection within clusters. A strain begins in a neutral network, i.e., a set of genotypes corresponding to the same phenotype or cluster, and diffuses through high-dimensional genotype space via mutations until it arrives at a node (sequence) belonging to an adjacent network. With some probability, this adjacent network represents a continuous (slight) or discontinuous (major) phenotypic change, with, respectively, a modest (≤5%) or substantial (≥20%) decrease in cross-immunity. Discontinuous changes curtail diversity by precipitating cluster transitions; continuous changes increase diversity by accelerating diffusion away from the cluster's founding strain. The effects of accelerated diffusion in genotype space may be visible in the phylogenetic trees of some clusters (Figure 4a). For example, in BE89 and BE92, multiple lineages persist from one year to the next, suggestive of neutral diffusion. The trees of WU95 and SY97, in contrast, generally show unidirectional growth, which echoes traditional descriptions of the HA1 tree as a whole. This pattern could be due to chance, perhaps in sampling or off-season extinction, or to positive selection on some branches.
There are several reasons why it may be useful to evaluate the strength and location of positive selection in influenza over time. The first is that there is clearly some threshold at which selection ceases to augment diversity and instead reduces it. The second is that patterns of selection, as suggested above, may allow prediction of which strains will dominate from year to year within a cluster. Before the clusters were identified, Bush et al. (1999a, 1999b) found that the number of amino acid replacements at 18 codons, most located on two epitopes of HA1, could retrospectively predict the lineages that would survive from one year to the next. They applied their technique to eleven years; interestingly, their rule performs most poorly in years in which cluster transitions probably occurred. This raises the possibility that not only the strength but also the targets of selection may differ within and between clusters. There is also evidence of possibly strong selection on NA (Venkatramani et al. 2006). Is this selection reflected in the tree of HA? Is there any temporal correlation in selection on the two proteins? Which epitopes on each protein are targeted? Recognition of these patterns could lead to better predictive models and more sophisticated vaccines.
Selection within clusters may also be a nontrivial component of influenza's evolutionary and ecological dynamics. A strictly neutral network model presents a paradox. If the supply of susceptibles diminishes as strains diffuse through genotype space, it would be possible for the virus to go extinct before finding a new phenotype. This has not yet happened for H3N2. Are the discontinuous phenotypes frequent enough to exclude this possibility, or might positive selection within networks critically accelerate diffusion, and thus increase the probability of finding significantly new phenotypes (K. Koelle, pers. comm.)?
The first chapter of my thesis will describe the strength and targets of selection in HA and NA of H3N2 over time. The null model, strict within-cluster neutrality, can be associated with three different patterns:
- 1. Simple diffusion with stochastic branch extinction should generate a distribution of pairwise genetic distances that changes characteristically over time. Specifically, the distances should increase, on average, more slowly than they would if fitness rose at other points in the network, since strains close to the founder will not be outcompeted faster than any others. Genetic distances under neutrality should also be less clustered in the graph-theoretic sense, because no strain will be significantly more successful than any other.
- 2. There should not be an excess of nonsynonymous mutations within isolates from the same cluster. An excess of nonsynonymous mutations is the hallmark of selection.
- 3. Within clusters, branches with more amino acid substitutions or a higher ratio of nonsynonymous to synonymous replacement rates should not have a higher survival probability from year to year. Under neutrality, lineage survival within clusters should be random. Branches linking clusters, however, should have on average more amino acid substitutions than other branches. This expectation is based on the simple fact that the greater the number of amino acid replacements, the more likely it is that a discontinuous phenotype will be discovered.
I do not see a simple method for statistically evaluating the first prediction, though simulation may yield insight. My research will thus focus on the latter two predictions.
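As a starting point for such simulations, the sketch below runs a strictly neutral Wright-Fisher model on binary genomes and records, each generation, the mean distance to the founder and the mean pairwise distance. It is a generic population-genetic toy, not the Koelle et al. model: it has no epidemiology, no susceptible depletion, and no phenotype map, and all names and parameter values are hypothetical.

```python
import random


def hamming(a: list[int], b: list[int]) -> int:
    """Number of sites at which two genomes differ."""
    return sum(x != y for x, y in zip(a, b))


def simulate_neutral_drift(pop_size=30, n_sites=100, mu=0.005, generations=50, seed=1):
    """Neutral Wright-Fisher model on binary genomes.

    Each generation, every offspring copies a uniformly random parent (no
    selection), then each site flips with probability mu. Returns a list of
    (mean distance to founder, mean pairwise distance), one per generation,
    starting with generation 0.
    """
    rng = random.Random(seed)
    founder = [0] * n_sites
    population = [founder[:] for _ in range(pop_size)]
    history = []
    for _ in range(generations + 1):
        to_founder = sum(hamming(g, founder) for g in population) / pop_size
        pairs = [
            hamming(population[i], population[j])
            for i in range(pop_size) for j in range(i + 1, pop_size)
        ]
        history.append((to_founder, sum(pairs) / len(pairs)))
        parents = [rng.choice(population) for _ in range(pop_size)]
        population = [
            [1 - s if rng.random() < mu else s for s in parent] for parent in parents
        ]
    return history
```

Rerunning the same loop with fitness-weighted parent choice (for example, `rng.choices(population, weights=..., k=pop_size)`) would give the non-neutral comparison that the first prediction calls for.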
Using the 253 sequences antigenically typed by Smith et al. (2004), I have so far calculated dN and dS differences and ratios for groups of codons proposed in the literature to be under positive selection. Preliminary results suggest that there is weak positive selection on at least one epitope of HA1 in every cluster and strong selection between clusters. Selection on epitopes A, B, and E is most often associated with cluster transitions (Figure 5). Epitope B has the greatest contrast in the strength of selection within clusters versus between them, suggesting it has greater antigenic potential than other epitopes. Epitopes C and D demonstrate fluctuating selection; for example, there is no positive selection on epitope D after the 1970s. The strength of selection on epitopes within clusters does not appear predictive of the epitopes that are most strongly selected between clusters.
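The dN and dS quantities can be illustrated with a simplified Nei-Gojobori-style counting estimator. The sketch assumes aligned, in-frame coding sequences and the standard genetic code, and it skips codon pairs that differ at more than one position instead of averaging over mutational pathways, so it is a teaching sketch rather than a substitute for the likelihood methods cited in this section. The `codons` argument shows how a group of codons (for example, one epitope) could be analysed on its own; any such codon list is an assumption supplied by the user.

```python
import math
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated TTT, TTC, TTA, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon):
    """Synonymous sites in a codon: for each position, the fraction of the three
    possible point changes that keep the amino acid (changes to stops count
    as nonsynonymous)."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    sites += 1.0 / 3.0
    return sites


def jukes_cantor(p):
    """Jukes-Cantor multiple-hit correction; undefined for p >= 0.75."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(seq_a, seq_b, codons=None):
    """Simplified Nei-Gojobori (dN, dS) for two aligned coding sequences.

    `codons` optionally restricts the comparison to a set of 0-based codon
    indices. Codon pairs differing at more than one position, or containing
    gaps, ambiguous bases, or stop codons, are skipped.
    """
    syn_sites = nonsyn_sites = syn_diffs = nonsyn_diffs = 0.0
    for idx in range(min(len(seq_a), len(seq_b)) // 3):
        if codons is not None and idx not in codons:
            continue
        ca = seq_a[3 * idx:3 * idx + 3].upper()
        cb = seq_b[3 * idx:3 * idx + 3].upper()
        if ca not in CODE or cb not in CODE or "*" in (CODE[ca], CODE[cb]):
            continue
        n_diff = sum(x != y for x, y in zip(ca, cb))
        if n_diff > 1:
            continue
        s = (synonymous_sites(ca) + synonymous_sites(cb)) / 2.0
        syn_sites += s
        nonsyn_sites += 3.0 - s
        if n_diff == 1:
            if CODE[ca] == CODE[cb]:
                syn_diffs += 1
            else:
                nonsyn_diffs += 1
    if syn_sites == 0 or nonsyn_sites == 0:
        return float("nan"), float("nan")
    return (jukes_cantor(nonsyn_diffs / nonsyn_sites),
            jukes_cantor(syn_diffs / syn_sites))
```

A ratio dN/dS above one within a codon group is the usual signal of positive selection on that group.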
Future analysis will refine these measures. In particular, I will consider methods for detecting selection at single sites and in quasispecies (Stewart et al. 2001); methods that estimate the physicochemical effects of substitutions (Wong et al. 2006); and methods that consider selection in sites that are structurally close (Suzuki 2004). I will also broaden the analysis by searching over the entire HA1 and available NA sequences.
This work will be complemented by an analysis of lineage survival, extending the work of Bush et al. (1999a). They report a 40% excess of amino acid substitutions on terminal branches, though the number of replacements at 18 positively selected codons was associated with lineage survival. It is telling that their rule of fast evolution, which predicts that the successful lineage has the greatest number of replacements at the 20 fastest-evolving codons (12 of which were under positive selection), performs best in the putative cluster transitions: accumulation of random changes in bulk may be more likely to lead to discontinuous transitions than selection at a few sites, perhaps because of the unpredictable effects of different residues on tertiary structure. Their work shows that it will be important to consider changes in individual codons, the locations of changes, and the number of changes on each branch.
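The fast-evolution rule amounts to ranking candidate lineages by their replacement counts at a chosen set of codons. A minimal sketch, assuming protein sequences aligned to a common ancestor (so residue index equals codon index) and comparing each lineage to that ancestor directly rather than counting changes along reconstructed tree branches. The function names and codon lists are hypothetical, not the published 18 or 20 codons.

```python
def count_replacements(ancestor: str, lineage: str, codons: list[int]) -> int:
    """Amino acid replacements relative to the ancestor at the given 0-based codons."""
    return sum(
        1 for c in codons
        if c < len(ancestor) and c < len(lineage) and ancestor[c] != lineage[c]
    )


def predict_successor(ancestor: str, lineages: dict[str, str], codons: list[int]) -> str:
    """Pick the lineage with the most replacements at `codons`.

    Ties are broken alphabetically by lineage name.
    """
    return max(
        sorted(lineages),
        key=lambda name: count_replacements(ancestor, lineages[name], codons),
    )
```

Comparing predictions made with different codon sets (for example, the positively selected codons versus the fastest-evolving ones) in years with and without cluster transitions would test the contrast noted above.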
To reevaluate their results and test my third hypothesis, I will thus build and compare trees with codon-based maximum likelihood and Bayesian substitution models, using relaxed clocks where possible (Drummond et al. 2006). More complex models become computationally tractable if trees are built of just a few clusters at a time. I will then evaluate models of positive selection using maximum likelihood and Bayesian approaches on the trees (Yang 1997; Ronquist and Huelsenbeck 2003; Kosakovsky Pond and Frost 2005; Kosakovsky Pond et al. 2005).
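Comparisons between nested codon models of selection are typically made with a likelihood ratio test, in which twice the difference in log-likelihood is compared with a chi-square distribution (for example, the site models M7 and M8 are conventionally compared with 2 degrees of freedom). The sketch below assumes the log-likelihoods have already been obtained from model-fitting software; the values in the usage line are hypothetical, and the closed-form chi-square tail used here holds only for even degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper tail of the chi-square distribution for even df, in closed form:
    P(X > x) = exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!"""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even number of degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Nested-model test, e.g. a site model without a class of sites with omega > 1 (null)
    against one that adds such a class (alternative). Returns (statistic, p-value)."""
    stat = max(0.0, 2 * (lnl_alt - lnl_null))
    return stat, chi2_sf_even_df(stat, df)


# Hypothetical log-likelihoods for a null and an alternative model.
statistic, p_value = likelihood_ratio_test(-1000.0, -990.0, df=2)
```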
In my analysis of selection over time, I will also consider where the addition of glycosylation sites might have affected strain fitness by removing epitopes from antibody pressure (for example, at site D). I will also try to distinguish sites under T cell selection from those under antibody selection, using discovered (Rimmelzwaan et al. 2004; Berkhoff et al. 2005) and predicted (Korber et al. 2006) T cell epitopes.
2 Effects of antibody heterogeneity on the dynamics of partial cross-immunity, strain structure, and epitope evolution
Models of strain competition usually assume that cross-immunity between strains is invariant: all hosts infected with one strain have the same probability of being infected with another. Cross-immunity under this assumption can yield complex dynamics determined by the intensity of competition. Gupta et al. (1998) showed that for realistic ratios of infection times and host lifespans, cross-immunity could move the system through three different dynamical regimes (Figure 6). Intense strain competition led to the dominance of one antigenically nonoverlapping set; at intermediate competition, strains cycled periodically or chaotically; and when cross-immunity was low, strain structure disappeared. Koelle et al. (in prep) also assume that the cross-immunity between strains is fixed. Strains within a cluster have almost complete cross-immunity, but their high mutation rate allows genetic and transient antigenic diversification. Slightly lower cross-immunity between clusters causes competitive exclusion by ensuring large fitness differences between serotypes. Thus, the intrasubtypic dynamics of influenza are dominated by the regime of intense competition identified by Gupta and coauthors.
It has been shown that hosts of the same species can mount different immune responses after infection with identical strains. Wang et al. (1986) found that some strains provoked more varied responses than others: Hong Kong pandemic strains induced antibodies to site A, site B, or both, whereas a strain from 1978 induced antibodies to several epitopes. In humans, the responses of children and adults differ. Nakajima et al. (2000) examined the acute-phase and convalescent sera of nine people infected with H3N2 during the 1990-1991 season (during the BE89 cluster) and found that all the young children had consistently narrower responses than adults. The sera of the three- and four-year-old children had antibodies only to site B1, whereas the sera of all the older children and the one adult had antibodies binding to sites A, B1, B2, C, and C/E. In a follow-up study, Sato et al. (2004) examined the sera of 35 people who had been infected with a strain of the SY97 cluster and found that almost all young children developed antibodies to B1 and many to A. Everyone else developed a unique polyclonal response, often reacting more strongly to epitopes other than B1.
Such observations have motivated proposals that variation in antibody repertoires could underlie the mechanism of antigenic drift in humans. These hypotheses differ in the role they assign to populations with polyclonal responses. Nakajima et al. (2000) and Sato et al. (2004) posit that drift results from serial adaptation to monoclonally responding subpopulations: antigenic variants first escape site B1 and then sites targeted by others. Although few older children or adults in their study had narrow responses to sites other than B1, Sato et al. caution that "we [can] not exclude the possibility that on a worldwide scale, individuals of this type may be numerous" (Sato et al. 2004). This pattern, they note, fits evolution within BE89 but not SY97. Cleveland et al. (1997) noted that major drift variants tend to have at least four amino acid substitutions in two epitopes (Wilson and Cox 1990). They predict the existence of four different human genetic groupings with consistent, nonoverlapping epitope biases. Viruses drift as they move from group to group, acquiring a critical amino acid change in each, and become double escape mutants. In contrast to Sato et al. (2004), they argue that polyclonal responses can select for drift mutants under particular conditions: extrapolating from responses in rabbits, they conclude that antibodies to one epitope must predominate, the titer of that antibody must be sufficiently high, and titers of other antibodies must be sufficiently low. Other models in mice and ferrets suggest that mutants can arise from polyclonal responses as long as they can escape a predominant antibody [reviewed in Nakajima et al. (2000)].
My second project will examine the ecological and evolutionary implications of heterogeneous antibody responses. Specifically, I ask:
- 1. How does variable cross-immunity between strains affect the outcome of strain competition, i.e., the possible dynamical regimes proposed by Gupta et al. (1998)?
- 2. How do viruses evolve at the epitope level in heterogeneous populations? Do the patterns predicted above emerge?
To address the first question, I will simulate an adaptation of Gupta's model. Gupta et al. (1998) define strains as having $n$ loci, each with $m$ possible alleles. In my model, each locus corresponds to an epitope, and each allele to a possible phenotype of the epitope. Cross-immunity is set by $\gamma$ ($0 \le \gamma \le 1$), the reduction in transmission probability conferred by previous infection with a strain sharing alleles. Without heterogeneous immune responses, the fraction immune to strain $i$, $z_i$, changes as
$$\frac{dz_i}{dt} = \lambda_i (1 - z_i) - \mu z_i \qquad (1)$$
where $\lambda_i$ is the force of infection of strain $i$ (the per capita rate of acquiring infection, which is linearly proportional to the number of infectious individuals) and $\mu$ is the birth and death rate. Gupta et al. then add a compartment $w_i$ representing hosts immune to at least one strain $j$ that shares alleles with $i$, including $i$ itself:
$$\frac{dw_i}{dt} = (1 - w_i) \sum_{j \sim i} \lambda_j - \mu w_i \qquad (2)$$
The expression $j \sim i$ refers to all strains $j$ sharing alleles with $i$. The fraction of individuals infectious with strain $i$, $y_i$, is then determined by
$$\frac{dy_i}{dt} = \lambda_i \left[ (1 - w_i) + (1 - \gamma)(w_i - z_i) \right] - \sigma y_i \qquad (3)$$
where $\sigma$ is the rate of loss of infectiousness. To incorporate antibody heterogeneity into this formalism, we need to track populations immune to epitopes. Strain $i$ is now defined as a set of epitopes, with each epitope defined by a phenotype $k$: $i = \{1_k, \ldots, n_k\}$, where $k \in [1, m]$ and $k \in \mathbb{Z}^+$. Let $p_n$ be the probability that an individual develops an antibody to epitope $n$. Assume all responses are on average monoclonal to one epitope, so that $\sum_n p_n = 1$. We also assume that $\gamma = 1$ if two strains share an epitope to which a host has antibody. Equation (1) does not change: all people infected with strain $i$ will mount a specific response to one of its epitopes and will not transmit $i$ in the future. But now not all hosts with immunity to strain $j$, which shares epitopes with $i$, will be immune to $i$: only the fraction of hosts infected with $j$ that mount antibodies to epitopes shared with $i$ will be immune to $i$. Let $S_{ij}$ be the set of epitopes shared by strains $i$ and $j$, $S_{ij} = i \cap j$. The probability $r_{ij}$ of developing antibodies to $i$ if infected with $j$ is
$$r_{ij} = \sum_{n \in S_{ij}} p_n \qquad (4)$$
Thus equation (2) becomes
$$\frac{dw_i}{dt} = (1 - w_i) \sum_{j \sim i} r_{ij} \lambda_j - \mu w_i \qquad (5)$$
The reduction in $w_i$, the fraction of hosts immune to infection from other strains, takes the place of $\gamma$, and the infectious class simply changes as
$$\frac{dy_i}{dt} = \lambda_i (1 - w_i) - \sigma y_i \qquad (6)$$
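The system above can be explored numerically. The sketch below assumes the standard form of Gupta et al.'s equations (1)-(3), with the force of infection taken to be $\lambda_i = \beta y_i$, and integrates them with a fixed-step fourth-order Runge-Kutta scheme. Passing the allele-sharing matrix gives Gupta et al.'s model; passing the matrix of $r_{ij}$ from (4) with $\gamma = 1$ gives the heterogeneous-response model. The toy configuration ($n = m = 2$), the parameter values, and the bias $p = (0.8, 0.2)$ are hypothetical, and the host lifespan of 50 infectious periods is far shorter than a realistic one, chosen only to keep runs short.

```python
import itertools

import numpy as np

# Hypothetical toy configuration: n = 2 epitopes (loci), m = 2 phenotypes (alleles) per epitope.
N_EPITOPES, M_PHENOTYPES = 2, 2
STRAINS = list(itertools.product(range(M_PHENOTYPES), repeat=N_EPITOPES))  # phenotype at each epitope


def sharing_matrix(strains):
    """A[i, j] = 1 if strains i and j share a phenotype at any epitope (j ~ i, including j = i)."""
    return np.array([[float(any(a == b for a, b in zip(si, sj))) for sj in strains] for si in strains])


def r_matrix(strains, p):
    """Eq. (4): r[i, j] = sum of p_n over epitopes n shared by strains i and j.
    Summing is valid only for monoclonal responses (p_n summing to 1)."""
    return np.array([[sum(p[n] for n, (a, b) in enumerate(zip(si, sj)) if a == b) for sj in strains]
                     for si in strains])


def rhs(state, W, gamma, beta, mu, sigma):
    """Eqs. (1)-(3) with a general weight matrix W on the force of infection in eq. (2)."""
    z, w, y = np.split(state, 3)
    lam = beta * y                                             # force of infection, linear in prevalence
    dz = lam * (1 - z) - mu * z                                # eq. (1)
    dw = (1 - w) * (W @ lam) - mu * w                          # eq. (2), or eq. (5) when W = r
    dy = lam * ((1 - w) + (1 - gamma) * (w - z)) - sigma * y   # eq. (3); eq. (6) when gamma = 1
    return np.concatenate([dz, dw, dy])


def simulate(W, gamma, beta=4.0, mu=0.02, sigma=1.0, t_end=200.0, dt=0.05, seed=0):
    """Fixed-step RK4 integration; returns one row of (z, w, y) per time step."""
    rng = np.random.default_rng(seed)
    n_strains = W.shape[0]
    y0 = 1e-3 * (1 + rng.random(n_strains))  # small, slightly unequal initial prevalences
    state = np.concatenate([np.zeros(n_strains), np.zeros(n_strains), y0])

    def f(s):
        return rhs(s, W, gamma, beta, mu, sigma)

    steps = int(round(t_end / dt))
    out = np.empty((steps + 1, state.size))
    out[0] = state
    for step in range(steps):
        k1 = f(state)
        k2 = f(state + dt / 2 * k1)
        k3 = f(state + dt / 2 * k2)
        k4 = f(state + dt * k3)
        state = state + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        out[step + 1] = state
    return out


A = sharing_matrix(STRAINS)
R = r_matrix(STRAINS, p=[0.8, 0.2])  # hypothetical bias: 80% of hosts respond to epitope 1
gupta_run = simulate(A, gamma=0.7)
biased_run = simulate(R, gamma=1.0)
```

Sweeping `p` from equal weights to `[1.0, 0.0]` and inspecting the long-run behavior of the `y` columns would be one way to look for the thresholds asked about below.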
Initially I will allow only one phenotype at each epitope ($m = 1$). I will explore the effects of varying $p_1$ from $p_1 = p_n = 1/n$ to $p_1 = 1$. For different values of $p_1$, what are the system's dynamics, and does $p_1$ have thresholds similar to those of $\gamma$? Can the bifurcation points be derived analytically? I will then explore the behavior of $m > 1$. It might also be interesting to relax the assumption that responses are monoclonal and to model the dynamics of two host groups analogous to young children and adults. The monoclonal assumption corresponds to $\sum_n p_n = 1$ and the fully polyclonal case to $\sum_n p_n = n$, i.e., $p_n = 1$ for every epitope (the latter corresponds to Gupta's model with $\gamma = 1$).
To address the effects of antibody heterogeneity on evolution, I will let m > 1 and define a cross-immunity parameter between different phenotypes of the same epitope. This cross-immunity value could be a linear or nonlinear function of the virus’s genotype, represented by a bit string at each epitope. This model seems most practically evaluated by simulation. The goal of this experiment is to observe the effect of antibody bias on selection pressures on individual epitopes, measured by dN and dS, and virus phylogeny. I will then compare results to my findings from the first research project to see if I have produced realistic within-cluster dynamics.
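As one possible form for the cross-immunity parameter between phenotypes of the same epitope, the sketch below makes cross-immunity a function of the Hamming distance between bit-string genotypes: linear for q = 1 and nonlinear otherwise. The power-law form and the function names are assumptions for illustration, not a commitment to a particular function.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("genotypes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def epitope_cross_immunity(a, b, q=1.0):
    """Cross-immunity between two phenotypes of one epitope: 1 for identical bit strings,
    0 when every bit differs. q = 1 gives a linear decline with distance; q > 1 a faster,
    convex decline; 0 < q < 1 a slower, concave one."""
    similarity = 1 - hamming(a, b) / len(a)
    return similarity ** q


# Usage: two 4-bit epitope genotypes differing at two positions.
linear = epitope_cross_immunity("0000", "0011")        # 0.5
convex = epitope_cross_immunity("0000", "0011", q=2)   # 0.25
```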
3 Subtype competition: Conditions for coexistence and exclusion
Among the most striking and least well understood patterns in influenza are those of subtype replacement and coexistence (Earn et al. 2002). There is strong evidence that H3N2 (or another H3 subtype) circulated in humans before 1918 (Housworth and Spoon 1971; Enserink 2006) [and possibly H2N2 before that (Masurel and Marine 1973)]. It was then replaced by H1N1. In 1957, H2N2 replaced H1N1. In 1968, H3N2 replaced H2N2. H1N1 reentered the population in 1977 and coexisted with H3N2. H1N2 appeared sporadically in the 1980s and became widespread in 2001-2002 (Guo et al. 1992a; Xu et al. 2002). H3N2 and H1N1/influenza B appear negatively correlated in incidence from season to season (Ferguson et al. 2003). The time series indicates there are major differences in attack rates between emergence events of different subtypes (Figure 1a).
Several hypotheses have been offered to explain instances of replacement or coexistence with interference. Ferguson et al. (2003) argue that short-lived, nonspecific immunity is critical for capturing the dynamics of subtype replacement during pandemics and out-of-phase oscillations when emergence is not accompanied by a pandemic. Antibody to NA of H2N2 was significantly associated with a lower probability of infection with H3N2 during its emergence in 1968 (Monto and Kendal 1973). Viboud et al. (2005) propose that different frequencies of antibody to NA were the largest factor modulating the intensity of the H3N2 pandemic in North America and Europe. Studies of heterosubtypic immunity in humans and other animals suggest no shortage of possible effectors (Table 2), e.g., pigs previously infected with H3N2 or H1N1 are partially protected from H1N2 (Van Reeth et al. 2004). In light of the results of Gupta et al. (1998), the outcome of competition might also be determined by chaotic dynamics. These dynamics will also be modulated by influenza's strong seasonality.
For my third project, I plan to use results from the analytic model and the simulation developed for my second project to compare hypotheses of subtype replacement and coexistence. It is clear from Table 2 that cross-reacting epitopes may be present on almost any protein, though each epitope might elicit a quite different immune response, depending on whether it is targeted by T cells or B cells. I would like to determine the probabilistic thresholds of host heterogeneity for dynamical regimes—including subtype replacement and coexistence—under different hypotheses:
1. How does the addition of seasonal forcing affect previous results on antibody heterogeneity? For example, for a given amount of forcing, what is the minimum p1 for exclusion between two serotypes sharing one epitope (e.g., on nucleoprotein)? To avoid unrealistically severe bottlenecks, I would include three populations, representing the northern hemisphere, the tropics, and the southern hemisphere.
The following hypotheses would be addressed with and without seasonality:
2. Asymmetric cross-immunity may cause antigenic drift in H3N2 (de St. Groth 1977). There are numerous observations of antibodies to later strains binding more strongly to earlier strains than antibodies to earlier strains do to later strains (1982; 1983; 1987). There are perhaps fewer opportunities for asymmetric cross-immunity among subtypes (glycosylation of HA is the candidate mechanism in H3N2), but there is evidence of asymmetry in the responses of swine to H3N2 and H1N1 (Heinen et al. 2001). How does asymmetric cross-immunity change thresholds of p1?
3. How would short-lived generalized immunity, as defined by Ferguson et al. (2003), affect dynamics of subtype replacement and coexistence? How does the time to subtype fixation vary between generalized immunity and antibody-mediated immunity?
4. Do results differ if the reduction in transmissibility acts through the probability of infection (similar to humoral protection) or the duration of infection (similar to cellular protection)?
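One way to set up the forcing in question 1 is sketched below: each of the three populations has a sinusoidally forced transmission rate, with the southern hemisphere peaking half a year after the northern hemisphere, and the regions are coupled through a mixing matrix. The amplitudes, peak times, unforced tropics, and mixing weights are hypothetical placeholders, and the function names are illustrative.

```python
import math

# Hypothetical regional settings: forcing amplitude and time of year (in years) of peak transmission.
REGIONS = ("north", "tropics", "south")
AMPLITUDE = {"north": 0.3, "tropics": 0.0, "south": 0.3}
PEAK = {"north": 0.0, "tropics": 0.0, "south": 0.5}
# Hypothetical mixing weights: rows sum to 1, with most contact within a region.
MIXING = {a: {b: (0.9 if a == b else 0.05) for b in REGIONS} for a in REGIONS}


def seasonal_beta(t, beta0, region):
    """Sinusoidally forced transmission rate; t is measured in years."""
    return beta0 * (1 + AMPLITUDE[region] * math.cos(2 * math.pi * (t - PEAK[region])))


def force_of_infection(t, beta0, prevalence, mixing=MIXING):
    """Force of infection in each region: the local seasonal rate times the prevalence
    averaged over source regions with weights mixing[a][b]."""
    return {
        a: seasonal_beta(t, beta0, a) * sum(mixing[a][b] * prevalence[b] for b in REGIONS)
        for a in REGIONS
    }


# Usage: equal prevalence everywhere on 1 January (t = 0).
foi = force_of_infection(0.0, 2.0, {r: 0.01 for r in REGIONS})
```

The same `force_of_infection` term could replace $\lambda_i$ in a regional copy of the model for each strain, which is where the threshold in $p_1$ would then be sought.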
I would then like to conduct a more in-depth review of the literature on influenza’s epidemiology and immunology to evaluate hypotheses in light of both their biological plausibility and dynamical robustness. I will also identify missing information that would allow determination of the mechanisms of specific replacements.
4 Other determinants of diversity: Host ecology and influenza in swine
Until recently, swine in North America circulated only one subtype of influenza, the classical swine H1N1, which evolved from human H1N1 sometime before 1933. In 1997 and 1998, H3N2 appeared in the United States and became widespread within a year [reviewed in Webby et al. (2004)]. At least two lineages emerged, one a double reassortant between human H3N2 of the SY97 cluster and classical swine H1N1, and the other a triple reassortant containing avian flu genes (Zhou et al. 2000). Since then, H3N2 in swine has acquired at least two more HA from human H3N2, and further reassortment with classical swine virus has produced at least two lineages of H1N2, which has also become widespread (Karasin et al. 2002). Reassortment between classical swine H1N1, human H3N2, and avian H1N1 had been described previously in European swine populations (Castrucci et al. 1993; Marozin et al. 2002). In the 1970s, the HK68, EN72, and VI75 clusters of human H3N2 were found circulating in Asian and Italian swine after HK68 and EN72 had disappeared from the human population (Shortridge et al. 1977; Ottis et al. 1982). Notably, the proliferation of genetic and antigenic diversity in North American swine has accompanied dramatic changes in host ecology. There are currently 100 million swine in North America; in the United States, the percentage of swine farms with ≥5000 swine increased from 18% in 1993 to 53% in 2002, and vaccination over the same time period became common (negligible in 1995, 44.1% of sows in 2000, and over half in 2003) (Wuethrich 2003). In poultry, vaccination has been associated with rapid antigenic drift away from vaccine strains (Lee et al. 2004).
Before the emergence of H3N2, several authors described antigenic drift in swine as slower than in humans, possibly due to their short life spans relative to the frequency of epidemics and the infrequency of vaccination [reviewed in Heinen et al. (2001)] or the high standing antigenic diversity (Olsen et al. 2000). It is also possible that smaller farm sizes reduced the frequency of epidemics and the spread of drift strains. The rapid spread of emerging subtypes in North America and Eurasia, and observations such as the detection of a multidrug-resistant European swine influenza virus in Hong Kong (Gregory et al. 2001), suggest that the relevant spatial scale for swine influenza ecology is closer to continents than farms. Because swine frequently exchange viruses with humans, and because they might be capable of sustaining more (or at least different) antigenic and genetic diversity than humans, it is important to have models to explain and predict these patterns.
For my fourth project, I propose to develop a null model of ecological and evolutionary dynamics of influenza in swine. This model will be functionally similar to the one developed for influenza’s epochal evolution in humans (Koelle et al. in prep) and will complement similar investigations into the dynamics of avian and equine influenza. I plan four steps in the analysis:
- 1. Assume the genotype-phenotype map and antibody responses of swine are identical to humans', and study the effects of altered pathology (e.g., duration of infectiousness) and life history on incidence, phylogeny, and antigenic diversity. Do multiple clusters circulate? An important question that will have to be addressed is whether to force seasonality; infections in U.S. swine appear to be seasonal (Olsen et al. 2000).
- 2. Add simple subcontinental spatial structure; how does drift slow and diversity increase as fragmentation increases?
- 3. Simulate vaccination by immunizing a fraction of the population each year with strains from the previous year. How are incidence and antigenic drift affected?
- 4. Assume swine have a more refined (sensitive) antibody response than humans, such that fewer mutations are neutral. Antigenic diversity should increase, but probably without commensurate increases in genetic diversity. Can this mechanism be distinguished qualitatively or quantitatively from mechanisms 1-3?
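Step 3 can be sketched as an annual pulse applied between seasons. This minimal version assumes that the vaccine contains last year's most prevalent strain and that hosts are vaccinated regardless of prior immunity, so immunity to the vaccine strain rises by coverage * (1 - current immunity); the helper and its inputs are hypothetical.

```python
def vaccinate(immune, last_year_prevalence, coverage):
    """Hypothetical annual pulse vaccination.

    immune[s]: fraction of hosts currently immune to strain s
    last_year_prevalence[s]: cumulative prevalence of strain s in the previous year
    coverage: fraction of the population vaccinated this year (0 to 1)
    Returns the updated immunity fractions and the index of the vaccine strain.
    """
    vaccine = max(range(len(last_year_prevalence)), key=lambda s: last_year_prevalence[s])
    updated = list(immune)
    # Vaccination is assumed independent of immune status, so only non-immune vaccinees gain immunity.
    updated[vaccine] = immune[vaccine] + coverage * (1 - immune[vaccine])
    return updated, vaccine


# Usage with hypothetical numbers: strain 1 dominated last year and half the herd is vaccinated.
immunity, vaccine_strain = vaccinate([0.2, 0.5], [0.1, 0.3], coverage=0.5)
```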
The researchers who developed antigenic maps of influenza in humans have submitted a paper on antigenic drift in swine (K. Koelle, pers. comm.). It will be interesting to see if their results agree with the trends in antigenic and genetic diversity described above. If they classify enough available sequences by antigenic type, it may be possible to perform selection analysis. If the authors use ferret sera to determine antigenic clusters, this analysis can help determine if the antigenic types are also recognized by swine sera. It might also shed light on whether the presumed clusters arose from selection or neutral drift, e.g., via spatial segregation.
5 Immunological, ecological, and evolutionary drivers of host range
Concern that a new strain of influenza virus may emerge from an animal reservoir and cause a pandemic in humans has motivated models of how the disease might spread in human populations (Ferguson et al. 2005; Longini et al. 2005). The mechanisms by which such a strain might move from animals to humans have generated much speculation. The traditional view of influenza held that the appearance of novel subtypes in humans, such as H2N2 in 1957 and H3N2 in 1968, was always preceded by reassortment between avian, swine, and possibly human strains in pig populations (Webster et al. 1992; Castrucci et al. 1993; Ludwig et al. 1995). Transmission of these reassortants to humans was considered rare and random, with humans occupying a peripheral niche in influenza's large community of hosts. In addition to humans and pigs, this community included wild waterfowl, which sustained all known subtypes of the virus, and also horses and domesticated birds. With few exceptions, influenza's dynamics in each species appeared independent of dynamics in other species. Outbreaks in turkeys in North America, which coincided with the fall migration of waterfowl, and serological surveys of asymptomatic pig farm workers were the only indicators of regular interspecific transmission (Halvorson et al. 1983; Karunakaran et al. 1983; Sivanandan et al. 1991; Campitelli et al. 1997; Olsen et al. 2002; Myers et al. 2006).
Influenza's recent activities in almost all host species indicate that ecological and evolutionary processes, and these processes in different hosts, cannot (or can no longer) be considered independent. For the first time since 1961, wild waterfowl have suffered severe morbidity and mortality from highly pathogenic H5N1, suggesting that they may not be in evolutionary stasis with the virus (Kida et al. 1980; cf. Ito and Kawaoka 1998; Hatchette et al. 2004). These infections resulted from H5N1 strains adapted to poultry, which worldwide over the past nine years have experienced twice as many outbreaks as in the previous four decades (Hirst et al. 2004; Shortridge 2005). These epidemics were often caused by the introduction of new subtypes by waterfowl throughout Asia, Europe, and North and South America (Guan et al. 2002). Many of these subtypes are still circulating in poultry and also mixing with swine (Ludwig et al. 1994; Peiris et al. 2001; Perez et al. 2003). In addition, strain diversity within certain subtypes is increasing, perhaps as a result of the selection pressures induced by vaccination measures prompted by the outbreaks (Horimoto et al. 1995; Lee et al. 2004; Swayne 2005). This rise in viral diversity in poultry has been mirrored in swine in North America and Asia (Zhou et al. 2000; Webby et al. 2004). Perhaps the most significant development in influenza's ecology is the establishment of a new transmission route between two host species: since 1996, influenza has been repeatedly and directly transmitted from birds to humans (Banks et al. 1998; Lin et al. 2000; Shortridge 2005). These transmissions, which have involved several avian species, at least five subtypes (involving H5, H7, and H9 HA), and hundreds to thousands of humans on different continents, demand a reexamination of almost all aspects of the virus-host interactions, and especially the ecological and evolutionary constraints on host range (Suarez 2000; Liu et al. 2003a; Enserink 2004; Palese 2004).
While no constraint holds for all subtypes all the time, the least escapable biochemical limitation in vivo and in vitro is the compatibility of the virus's receptor binding site with the host cell's sialic acid receptor. In waterfowl, marine mammals, and horses, HA binds to sialic acid receptors of the α2,3 conformation, which are found throughout birds' gastrointestinal tracts and the respiratory epithelia of horses and marine mammals. The respiratory epithelia of humans carry receptors of the α2,6 conformation. Cells in pigs and chickens have both receptor types, allowing them to be infected by viruses adapted to different ranges of hosts, and thereby permitting the generation of novel subtypes. Thus, pigs and chickens might serve as mixing vessels or intermediate hosts for influenza (Claas et al. 1994; Scholtissek et al. 1998; Matrosovich et al. 2001; Gambaryan et al. 2002a; Gambaryan et al. 2002b). Not all viruses share this constraint: strains of several subtypes have been able to infect humans while retaining their α2,3 preference, but they do not appear capable of human-to-human transmission (Matrosovich et al. 1999). Hosts are also likely to be correspondingly ambiguous in their receptor availability: pigs have more α2,6 than α2,3 receptors, and swine-adapted viruses tend to prefer α2,6; and α2,3 (but not α2,6) receptors are found in the human eye (Olofsson et al. 2005) and lower respiratory tract (Shinya et al. 2006).
Despite many uncertain details, the consistent match between the virus's receptor preferences and the receptors available in the hosts to which they are adapted suggests a framework in which the evolution of host range can be approached. Experiments have shown that most viruses cannot replicate in host tissue of dissimilar receptor type, and viruses preferring one receptor type can usually sustain some replication in any host possessing that type, even if they are adapted to other species (Kida et al. 1994; Ito et al. 1999; Ito and Kawaoka 2000; Gambaryan et al. 2002b; Lee et al. 2005). Thus, the chemistry of receptor binding creates a tradeoff between the ability to invade cells of one type and the other. Viruses can change their receptor preference through several mutational steps, including single nucleotide substitutions at residue 226, or through reassortment with a virus having the other receptor preference on its HA (Nobusawa et al. 2000). The mutational jump from α2,3 to α2,6 occurred in domesticated swine in the 1980s following infection by poultry and can easily be obtained in vitro (Rogers et al. 1983). Reassortment appears common in most hosts (Hinshaw et al. 1980; Guo et al. 1992b; Yamnikova et al. 1993; Hatchette et al. 2004; Lindstrom et al. 2004), and can occur even when viruses are unable to replicate efficiently (Kida et al. 1994).
Various ecological conditions have been claimed to enable 'optimal evolution' responsible for the recent spread of multiple subtypes and strains in multiple hosts (Webster and Hulse 2004). Small family farms, large commercial farms, live bird markets, chickens, pigs, and other species have been implicated in the recent changes in influenza’s host range (Bulaga et al. 2003; Liu et al. 2003b; Webster and Hulse 2004). So far, the effects of host ecology on the evolutionary opportunities available to influenza viruses have not been explored quantitatively.
The goal of my fifth research project is to develop approaches with which to evaluate these hypotheses more rigorously, and that can be applied to other pathogens whose host ranges are also partly constrained by cell recognition [reviewed in Baranowski et al. (2001)]. In particular, I would like to assess how adaptation is affected by host ecology, where hosts are defined by their receptor types, contacts, and epidemiologically relevant traits (Figure 7); and how reassortment changes this picture. I will initially address this question in two ways:
- 1. Analytically: Adaptive dynamics can be more applicable than R0 maximization in studying systems with fluctuating selection pressures (Dieckmann et al. 2002). Here, selection pressure (α2,3 or α2,6 receptor availability and host immunity) changes as a function of the endogenous disease dynamics and any exogenous factors (e.g., seasonality or changes in contact rates between species). I would like to model the invasion fitness of α2,3- or α2,6-adapted mutants against the resident strain as a function of the availability of different host species. The most tractable approach might involve working with the equilibrium prevalence of the disease in each host species and assuming constant rates of contact between them.
- 2. Stochastic simulation: Fluctuations in disease prevalence resulting from endogenous dynamics, the effects of small population sizes, and the conditions corresponding to emergence are probably best modeled by stochastic simulation. I have constructed an event-driven, agent-based model in which to compare the effects of these factors against predictions.
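For the analytic approach in item 1, a mutant's invasion fitness can be computed from its next-generation matrix evaluated at the resident strain's equilibrium: the mutant spreads if the dominant eigenvalue exceeds 1. The sketch below does this for two host species, taking the resident's equilibrium susceptible fractions and constant contact rates as inputs, in the spirit of the tractable approach described above. The linear tradeoff between α2,3 and α2,6 binding, the parameter names, and the example values are all assumptions.

```python
import math


def transmissibility(pref26, frac26, beta0):
    """Hypothetical linear tradeoff: per-contact transmission into a host rises with the match
    between the virus's alpha-2,6 preference (0 = pure alpha-2,3, 1 = pure alpha-2,6) and the
    host's fraction of alpha-2,6 receptors."""
    return beta0 * (pref26 * frac26 + (1 - pref26) * (1 - frac26))


def invasion_fitness(mutant_pref, frac26, contact, s_star, beta0, recovery):
    """Dominant eigenvalue of the mutant's next-generation matrix, minus 1, for two host species.

    frac26[a]: fraction of alpha-2,6 receptors in host a
    contact[a][b]: contact rate of an infected host b with host a (assumed constant)
    s_star[a]: susceptible fraction of host a at the resident strain's equilibrium
    A positive value means the mutant can invade the resident.
    """
    # K[a][b]: expected infections in host a caused by one infected host b over its infectious period.
    K = [[transmissibility(mutant_pref, frac26[a], beta0) * contact[a][b] * s_star[a] / recovery
          for b in range(2)] for a in range(2)]
    trace = K[0][0] + K[1][1]
    det = K[0][0] * K[1][1] - K[0][1] * K[1][0]
    # Largest eigenvalue of a non-negative 2x2 matrix (its discriminant is never negative).
    rho = (trace + math.sqrt(max(0.0, trace * trace - 4 * det))) / 2
    return rho - 1


# Usage: two avian-type hosts, no cross-species contact, resident at equilibrium with s* = 0.5.
hosts_26 = [0.0, 0.0]
no_mixing = [[1.0, 0.0], [0.0, 1.0]]
fitnesses = [invasion_fitness(x, hosts_26, no_mixing, [0.5, 0.5], beta0=2.0, recovery=1.0)
             for x in (0.0, 0.5, 1.0)]
```

In adaptive-dynamics terms, evaluating `invasion_fitness` for mutants slightly on either side of the resident's preference gives the local direction of selection; a nonlinear tradeoff could change whether generalists or specialists are favored.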
I would like to tie this research to other theoretical models of pathogen evolution on networks. Among these models’ findings are that networks can support higher pathogen diversity (Buckee et al. 2004) and select intermediate levels of traits (Rauch et al. 2003; van Ballegooijen and Boerlijst 2004) relative to their mean-field counterparts.
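The stochastic approach in item 2 and the network models mentioned above share a common building block, illustrated here with a minimal Gillespie (exact stochastic) SIR sketch on a contact network; it is not the agent-based model described above. It assumes an undirected network stored as an adjacency dictionary, transmission at rate `beta` along each susceptible-infected edge, and recovery at rate `gamma`; the function name is hypothetical.

```python
import random


def gillespie_sir_network(adjacency, beta, gamma, seed_node, rng=None):
    """Exact stochastic SIR simulation on an undirected contact network.

    adjacency: dict mapping each node to an iterable of its neighbors
    beta: transmission rate per susceptible-infected edge; gamma: recovery rate (> 0)
    Returns the time at which the last infection ends and the final number of recovered nodes.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive so the epidemic always ends")
    rng = rng or random.Random(0)
    state = {node: "S" for node in adjacency}
    state[seed_node] = "I"
    infected = {seed_node}
    t = 0.0
    while infected:
        # Each susceptible-infected edge transmits at rate beta; each infected node recovers at rate gamma.
        si_edges = [(i, s) for i in infected for s in adjacency[i] if state[s] == "S"]
        infection_rate = beta * len(si_edges)
        recovery_rate = gamma * len(infected)
        total = infection_rate + recovery_rate
        t += rng.expovariate(total)  # waiting time to the next event
        if rng.random() * total < infection_rate:
            _, new_case = rng.choice(si_edges)
            state[new_case] = "I"
            infected.add(new_case)
        else:
            recovered = rng.choice(sorted(infected))
            state[recovered] = "R"
            infected.discard(recovered)
    return t, sum(1 for v in state.values() if v == "R")


# Usage: a ring of 10 hosts seeded at node 0.
ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
duration, final_size = gillespie_sir_network(ring, beta=2.0, gamma=1.0, seed_node=0)
```

Repeating runs with different `random.Random` seeds and comparing the distribution of `final_size` against a well-mixed model with the same mean degree is one simple way to see how network structure changes outcomes.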
6 Genotype-phenotype maps and the evolution of evolvability
Finally, I would like to analyze a group of provocative arguments about the evolution of genetic control over phenotypic variation in the context of influenza.
One of the ways in which populations may adapt to changing environments is through genetic potential, that is, a heightened sensitivity to the effects of mutation that facilitates rapid evolution to novel states (Meyers et al. 2005). Using a toy model, Meyers and coauthors showed that populations accumulate genetic potential fastest when mutation rates are high and the exogenously defined fitnesses of two possible phenotypes alternate every generation. A related idea is captured by codon volatility, which measures the proportion of one-point neighbors of a codon that code for another amino acid (under the Hamming metric) or the change in the stereochemical properties of the amino acid (under the Miyata metric) (Plotkin and Dushoff 2003). Plotkin and Dushoff calculated that codons in HA1 epitopes are significantly more biased toward volatile codons than codons in NA and NP. They later claimed that past selection can be inferred from the codon volatility of a single sequence, which they demonstrated with Plasmodium (Plotkin et al. 2004). Their idea was countered with many examples, including genomes of HIV and other Plasmodium species, in which volatility did not reflect selection (Friedman and Hughes 2005; Pillai et al. 2005). It was suggested that correlations between volatility and selection instead result from translational biases (Stoletzki et al. 2005). Zhang (2005) showed in simulations that strong directional selection could not increase codon volatility. We expect that selection on HA1 of influenza is frequency dependent, but that with high phenotypic diversity it more closely resembles directional selection: any mutation enabling antibody escape is advantageous, and mutations accumulate over time (Nakajima et al. 2005). For the first part of this project, I would like to revisit Plotkin and Dushoff's calculation of volatility to determine whether the volatile codons might reflect a translational bias, and whether there is evidence of frequency-dependent selection on HA1 codons over short time scales.
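The Hamming-metric volatility described above can be computed directly from the standard genetic code. In this sketch, following the usual reading of Plotkin and Dushoff's (2003) definition, stop-codon neighbors are excluded and a sequence's volatility is taken as the mean over its codons; the Miyata-weighted version and the significance test against synonymous alternatives are not included, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def volatility(codon):
    """Fraction of non-stop point-mutation neighbors that encode a different amino acid."""
    aa = CODE[codon]
    different = total = 0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            neighbor_aa = CODE[codon[:pos] + base + codon[pos + 1:]]
            if neighbor_aa == "*":
                continue  # stop-codon neighbors are excluded from the denominator
            total += 1
            different += neighbor_aa != aa
    return different / total


def mean_volatility(seq):
    """Mean volatility over the codons of an in-frame coding sequence."""
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    return sum(volatility(c) for c in codons) / len(codons)


# Usage: Trp (TGG) has no synonymous neighbors; Leu (CTG) has several.
trp, leu = volatility("TGG"), volatility("CTG")
```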
Next, I would like to explore genetic potential more broadly. It would be interesting to see whether volatile codons are also the most accessible; the two properties are not identical, and random mutations should land sequences in the networks corresponding to more accessible amino acids. The theory of genetic potential has also largely ignored the consequences of higher-level interactions: Is it possible to evolve to a state of heightened sensitivity to the effects of mutation that facilitates rapid evolution to novel [phenotypes] when amino acids interact in complex, epistatic ways? Meyers and coauthors (2005) assume that traits under selection are controlled by independent coding regions. Kauffman (1995) argued that there exists a place in the NK adaptive landscape corresponding to a phase transition between excessive order and chaos, where mutations to near neighbors usually fall in the same basin of attraction; this behavior emerges where each node interacts with only two of its neighbors (K = 2). In the NK model of Koelle et al. (in prep), even the lowest amount of epistasis (K = 1) among the amino acids comprising an epitope frequently resulted in phenotype shifts after one or two mutations. In other words, under very modest assumptions of interaction, the neutral network had too many edges for sequences to accumulate genetic potential: there was too much potential everywhere. This is worth simulating more formally: what is the range of genetic potential under the NK model? Is there an appreciable range?
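As a first step toward asking what range of genetic potential the NK model allows, the sketch below builds a standard Kauffman NK landscape on binary strings and measures how much single mutations change the landscape value as K grows. Treating this mutational effect as a proxy for sensitivity to mutation is an assumption; it is not the phenotype map of Koelle et al. (in prep), and the sizes, seeds, and function names are arbitrary.

```python
import itertools
import random


def make_nk_landscape(n, k, rng):
    """Kauffman NK landscape on binary strings: the contribution of site i depends on its own
    state and on the k sites that follow it (wrapping around). Returns a fitness function."""
    tables = [
        {bits: rng.random() for bits in itertools.product((0, 1), repeat=k + 1)}
        for _ in range(n)
    ]

    def fitness(seq):
        return sum(tables[i][tuple(seq[(i + j) % n] for j in range(k + 1))] for i in range(n)) / n

    return fitness


def single_mutation_effects(n, k, samples=200, seed=0):
    """Mean and maximum |change in landscape value| over all one-point mutants
    of `samples` random sequences."""
    rng = random.Random(seed)
    fitness = make_nk_landscape(n, k, rng)
    effects = []
    for _ in range(samples):
        seq = [rng.randint(0, 1) for _ in range(n)]
        base = fitness(seq)
        for i in range(n):
            mutant = seq.copy()
            mutant[i] ^= 1  # flip one site
            effects.append(abs(fitness(mutant) - base))
    return sum(effects) / len(effects), max(effects)


# Usage: compare mutational effects across levels of epistasis on 10-site strings.
effects_by_k = {k: single_mutation_effects(10, k) for k in (0, 1, 2, 4)}
```

Because a mutation at one site alters the contributions of K + 1 sites, larger K lets single mutations move the landscape value further; discretizing the landscape value into phenotypes would be one way to turn this into a count of phenotype shifts.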
Finally, I would like to search the literature for other models of interacting amino acids that could accommodate more genetic potential. Do such landscapes have more holes? Are they based on hydrophobicity or size? If such models do not exist or have very constrained assumptions, then the concept of genetic potential either has limited application or it is not based on epistasis. Have lower K values—in other words, increasing modularity—been selected? I would like to test the compatibility of these hypotheses on genotype-phenotype landscapes and the evolution of evolvability through simulations and, if possible, mathematical analysis. Site-directed mutation experiments in different strains of H3N2 over time, including comparisons of the fitness effects of mutations in vitro and in vivo (Nakajima et al. 2003), will be useful in illuminating this problem.
Collaborators
In addition to working closely with my advisor, , I will continue collaborating with . I will take the lead on all projects described here.