The Skeptic wrote:One of the most prominent predictions of evolution is the nested hierarchical pattern of living organisms. Distributions of traits are expected to form nested hierarchies. Phylogenetic studies, at both the molecular and morphological levels, attempt to find the correct phylogeny of life. However, despite such efforts, many parts of the 'tree' remain unsettled. These inconsistencies even caused some to question whether a phylogenetic tree exists or not. The theory predicted almost perfect distributions of genetic changes. For example, if an Alu transposable element is present in the orthologous locus of humans, chimpanzees, orangutans and rhesus monkeys, even the sequence-based phylogeny of that element must be fully consistent with the phylogeny we infer from presence/absence patterns of Alu elements. Since the Alu insertion is assumed to accumulate random mutations, the genetic distance of the human-chimpanzee pair, for example, must be smaller than the genetic distances of the human-orangutan and human-macaque pairs. The same is predicted for ERVs.
Evolutionary biology assumes that an ERV in the same locus is identical by descent. If this were true, one could predict that accumulated mutational divergence must form the same tree. But this was far from the facts. In 1999, some authors tried to test these predictions. The results were far from the predictions. While some of the orthologous ERVs seemed a little bit consistent with phylogeny, many deviated from the pattern. According to some elements, we were closer to orangutans than to gorillas. Authors asserted that this inconsistencies were caused by gene conversion events. But this is not testable and repeatable. There is no way to check this. Also, incomplete lineage sorting of ancestral polymorphisms is expected to cause deviations from the pattern. In addition, deletion of ERVs, deletion of Alu elements, and resurrection of pseudogenes are found to be very frequent. If you want to use a pseudogenization event as a phylogenetic marker, the results will be very misleading. The same authors found that one ERV was present in old world monkeys, orangutans and humans, but absent from the genomes of gorillas and chimpanzees. This means that evolution does not actually predict a nested hierarchy. Without nested hierarchy, one arm of evolution becomes broken. The theory loses almost all of its testable predictions.
The Skeptic wrote:One of the most prominent predictions of evolution is the nested hierarchical pattern of living organisms. Distributions of traits are expected to form nested hierarchies. Phylogenetic studies, at both the molecular and morphological levels, attempt to find the correct phylogeny of life. However, despite such efforts, many parts of the 'tree' remain unsettled.
The Skeptic wrote:These inconsistencies
The Skeptic wrote:even caused some to question whether a phylogenetic tree exists or not.
The Skeptic wrote:The theory predicted almost perfect distributions of genetic changes. For example, if an Alu transposable element is present in the orthologous locus of humans, chimpanzees, orangutans and rhesus monkeys, even the sequence-based phylogeny of that element must be fully consistent with the phylogeny we infer from presence/absence patterns of Alu elements.
The Skeptic wrote:Since the Alu insertion is assumed to accumulate random mutations, the genetic distance of the human-chimpanzee pair, for example, must be smaller than the genetic distances of the human-orangutan and human-macaque pairs. The same is predicted for ERVs.
The Skeptic wrote:Evolutionary biology assumes that an ERV in the same locus is identical by descent.
The Skeptic wrote:If this were true, one could predict that accumulated mutational divergence must form the same tree. But this was far from the facts.
The Skeptic wrote:In 1999, some authors tried to test these predictions. The results were far from the predictions. While some of the orthologous ERVs seemed a little bit consistent with phylogeny, many deviated from the pattern.
The Skeptic wrote:According to some elements, we were closer to orangutans than to gorillas.
The Skeptic wrote:Authors asserted that this inconsistencies [sic] were caused by gene conversion events.
The Skeptic wrote:But this is not testable and repeatable.
The Skeptic wrote:There is no way to check this.
The Skeptic wrote:Also, incomplete lineage sorting of ancestral polymorphisms is expected to cause deviations from the pattern.
The Skeptic wrote:In addition, deletion of ERVs, deletion of Alu elements, and resurrection of pseudogenes are found to be very frequent. If you want to use a pseudogenization event as a phylogenetic marker, the results will be very misleading. The same authors found that one ERV was present in old world monkeys, orangutans and humans, but absent from the genomes of gorillas and chimpanzees.
The Skeptic wrote:This means that evolution does not actually predict a nested hierarchy.
The Skeptic wrote:Without nested hierarchy, one arm of evolution becomes broken. The theory loses almost all of its testable predictions.
Johnson & Coffin, 1999 wrote:ABSTRACT The genomes of modern humans are riddled with thousands of endogenous retroviruses (HERVs), the proviral remnants of ancient viral infections of the primate lineage. Most HERVs are nonfunctional, selectively neutral loci. This fact, coupled with their sheer abundance in primate genomes, makes HERVs ideal for exploitation as phylogenetic markers. Endogenous retroviruses (ERVs) provide phylogenetic information in two ways: (i) by comparison of integration site polymorphism and (ii) by orthologous comparison of evolving, proviral, nucleotide sequence. In this study, trees are constructed with the noncoding long terminal repeats (LTRs) of several ERV loci. Because the two LTRs of an ERV are identical at the time of integration but evolve independently, each ERV locus can provide two estimates of species phylogeny based on molecular evolution of the same ancestral sequence. Moreover, tree topology is highly sensitive to conversion events, allowing for easy detection of sequences involved in recombination as well as correction for such events. Although other animal species are rich in ERV sequences, the specific use of HERVs in this study allows comparison of trees to a well established phylogenetic standard, that of the Old World primates. HERVs, and by extension the ERVs of other species, constitute a unique and plentiful resource for studying the evolutionary history of the Retroviridae and their animal hosts.
Johnson & Coffin, 1999 wrote:
Retroviruses are unique among RNA viruses in their ability to integrate DNA copies of their genomes into the genome of the infected cell. On occasion, integration takes place in a germline cell, giving rise to an endogenous retrovirus (ERV), which can be inherited by the offspring of the infected host, and may eventually become fixed in the gene pool of the host population (1). The genomes of vertebrate species contain dozens to thousands of ERV sequences (2), some of which were acquired in evolutionarily recent times, whereas others derive from “ancient” times, as indicated by their identical site of integration in more than one species (1, 3, 4). Typically, ancient proviruses have sustained numerous point mutations, deletions, and insertions, rendering them incapable of expressing virus. No biologically active viruses have been associated with the ancient proviruses.
Despite their abundance in vertebrate genomes, and some other especially useful features described below, ERVs have rarely been exploited as phylogenetic markers (5–10). In a few instances integration site polymorphisms have served as a source of phylogenetic signal (6), or as markers for linkage analysis (11), but the usefulness of orthologous ERV nucleotide sequences has never been fully explored. Here we report the application of ancient human endogenous retrovirus (HERV) sequences to phylogenetic analysis on a time scale spanning recent primate evolution.
HERVs can be organized into at least a dozen distinct groups, which vary in size from one to thousands of members (1, 12). Cross-hybridization and PCR studies consistently reveal that most HERV families are also found in other primates, including apes and Old World monkeys (OWMs) (12–19). Many HERVs, including the ones used in this study, are the result of integration events that took place between 5 and 50 million years ago, as indicated by the distribution of specific proviruses at the same integration sites (or “loci”) among related species. The evolution of primates has been the subject of intense study for well over a century, providing a well established phylogenetic consensus with which to compare and evaluate the performance of ERVs as phylogenetic markers.
Johnson & Coffin, 1999 wrote:RESULTS AND DISCUSSION
Building Phylogenetic Trees from ERV LTR Sequences
Endogenous retrovirus loci provide no less than three sources of phylogenetic signal, which can be used in complementary fashion to obtain much more information than simple distance estimates of homologous sequences. First, the distribution of provirus-containing loci among taxa dates the insertion. Given the size of vertebrate genomes (>1 × 10^9 bp) and the random nature of retroviral integration (22, 23), multiple integrations (and subsequent fixation) of ERV loci at precisely the same location are highly unlikely (24). Therefore, an ERV locus shared by two or more species is descended from a single integration event and is proof that the species share a common ancestor into whose germ line the original integration took place (14). Furthermore, integrated proviruses are extremely stable: there is no mechanism for removing proviruses precisely from the genome, without leaving behind a solo LTR or deleting chromosomal DNA. The distribution of an ERV among related species also reflects the age of the provirus: older loci are found among widely divergent species, whereas younger proviruses are limited to more closely related species. In theory, the species distribution of a set of known integration sites can be used to construct phylogenetic trees in a manner similar to restriction fragment length polymorphism (RFLP) analysis.
Second, as with other sequence-based phylogenetic analyses, mutations in a provirus that have accumulated since the divergence of the species provide an estimate of the genetic distance between the species. Because, for any given provirus, it is highly unlikely that there will be selection for or against any specific sequence, it is safe to assume that the rate of accumulation of mutations approximates the rate of their occurrence, with appropriate corrections for reversion. Analysis of closely related proviruses integrated at different sites should also reveal regional differences in mutation rates.
Third, sequence divergence between the LTRs at the ends of a given provirus provides an important and unique source of phylogenetic information. The LTRs are created during reverse transcription to regenerate cis-acting elements required for integration and transcription. Because of the mechanism of reverse transcription, the two LTRs must be identical at the time of integration, even if they differed in the precursor provirus (Fig. 1A). Over time, they will diverge in sequence because of substitutions, insertions, and deletions acquired during cellular DNA replication. Although it has been noted that the divergence between the two LTRs of an ERV can serve as a molecular clock (8, 15, 18, 25), there are no reported prior attempts to utilize the LTRs of individual ERV loci as a source of phylogenetic signal.
Assuming that the LTRs of an ERV are evolving independently, at approximately the same rate, and in the absence of rearrangement events, a phylogenetic tree containing 5' and 3' LTRs derived from the same ERV locus is predicted to have a topology similar to that depicted in Fig. 1B. The most useful feature of the predicted tree is the separate clustering of the 5' and the 3' LTRs. The node joining the 5' and 3' LTR clusters must be the deepest within the ingroup, since it represents the time of integration, when the two LTRs were identical. Furthermore, both clusters are predicted to have similar branching patterns as determined by the phylogenetic history of the host species, with similar branch lengths. Thus, each tree displays two estimates of host phylogeny, both of which are derived from the evolution of an initially identical sequence (compare the 5' LTR and 3' LTR clusters in Fig. 1B). As we shall see, deviation of actual trees from this prediction provides a powerful means of testing the assumptions and detecting events other than neutral accumulation of mutations in the evolutionary history of a species.
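To make the predicted topology concrete, here is a minimal Python sketch of the check described above: does the deepest ingroup node separate the 5' LTRs from the 3' LTRs, and do the two clusters show the same branching pattern? The nested-tuple trees, the "<species>_<5 or 3>" label convention, and the function names are all assumptions made for illustration; none of them come from the paper, and the trees are toy examples rather than the published ones.

```python
# Toy trees as nested tuples; leaf labels follow a hypothetical
# "<species>_<5 or 3>" convention that is not from the paper.

def leaves(tree):
    """Return the set of leaf labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    found = set()
    for child in tree:
        found |= leaves(child)
    return found


def splits_by_ltr(ingroup):
    """True if the ingroup root separates all 5' LTRs from all 3' LTRs."""
    if isinstance(ingroup, str) or len(ingroup) != 2:
        return False
    ends = [{label.rsplit("_", 1)[1] for label in leaves(child)}
            for child in ingroup]
    return ends in ([{"5"}, {"3"}], [{"3"}, {"5"}])


def shape(tree):
    """Topology with the LTR suffix stripped, ignoring child order."""
    if isinstance(tree, str):
        return tree.rsplit("_", 1)[0]
    return frozenset(shape(child) for child in tree)


# Predicted pattern: two clusters, each mirroring host phylogeny.
predicted = ((("human_5", "chimp_5"), "gorilla_5"),
             (("human_3", "chimp_3"), "gorilla_3"))

# Pattern in which one species' two LTRs cluster with each other.
deviant = ((("human_5", "chimp_5"), ("human_3", "chimp_3")),
           ("gorilla_5", "gorilla_3"))

print(splits_by_ltr(predicted))                    # True
print(shape(predicted[0]) == shape(predicted[1]))  # True
print(splits_by_ltr(deviant))                      # False
```

A tree that fails the first check is not automatically evidence against common descent; in the framework quoted here it is a flag that the locus needs a closer look for conversion or recombination.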
Johnson & Coffin, 1999 wrote:Species Distribution of HERV Loci
Fig. 1C depicts the PCR strategy used to determine the distribution of 6 unlinked HERV proviruses among the genomes of 12 primate species. The presence of each HERV in a given species was determined by PCR amplification of both the 5' and 3' LTRs of the HERV from genomic DNA. Two genomic DNA samples from each species were screened, except for humans (12 individuals) and bonobo (1 individual). In some cases, the absence of a HERV from a species was confirmed by PCR amplification of the uninterrupted cellular target sequence (Fig. 1C). Three of the loci, HERV-KC4, HERV-KHML6.17, and RTVL-Ia, were detectable in the genomes of OWMs and hominoids, but not New World monkeys, and therefore integrated into the germ line of a common ancestor of the Old World lineages. HERV-K18, RTVL-Ha, and RTVL-Hb were found exclusively in humans, gorillas, chimpanzees, and bonobos, and thus are consistent with a gorilla/chimpanzee/human clade. None of the loci was detected in New World monkeys.
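The nested-hierarchy question raised in this thread can be phrased as a simple set test: if each insertion happened once and was never lost, the taxon sets carrying any two loci should be either nested or disjoint. The sketch below is a minimal Python illustration of that test. The taxon lists are simplified assumptions (the paper screened 12 species, which are not listed in the excerpt), and the HERV-K(C4) set follows the description elsewhere in the thread of a locus present in humans, orangutans and Old World monkeys but absent from gorillas and chimpanzees.

```python
from itertools import combinations


def conflicts(presence):
    """Return locus pairs whose taxon sets overlap without being nested."""
    bad = []
    for (a, taxa_a), (b, taxa_b) in combinations(sorted(presence.items()), 2):
        overlap = taxa_a & taxa_b
        nested = taxa_a <= taxa_b or taxa_b <= taxa_a
        if overlap and not nested:
            bad.append((a, b))
    return bad


# Simplified, illustrative taxon sets (assumptions, not the paper's data table).
african_apes = {"human", "chimpanzee", "bonobo", "gorilla"}
old_world = african_apes | {"orangutan", "gibbon", "rhesus"}

presence = {
    "HERV-K18": african_apes,
    "RTVL-Ha": african_apes,
    "RTVL-Hb": african_apes,
    "HERV-KHML6.17": old_world,
    "RTVL-Ia": old_world,
}
print(conflicts(presence))  # []

# Adding a locus that is missing from gorilla and chimpanzee breaks nesting.
presence["HERV-K(C4)"] = {"human", "orangutan", "rhesus"}
for pair in conflicts(presence):
    print(pair)
```

The test only detects a conflict; it says nothing about the cause. Secondary loss of an insertion, as the paper proposes for HERV-K(C4), produces exactly this kind of flagged pair.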
Evolution of HERV LTR Sequences
For each HERV locus, the amplified LTRs from each species were directly sequenced, and the aligned sequences were used to generate phylogenetic trees (Fig. 2). The 5' and 3' LTRs of HERV-KHML6.17 fell into two distinct clusters, in accord with prediction (Fig. 2A). Moreover, both LTR cluster topologies are consistent with established versions of primate species phylogeny (26–29). As has been the case with numerous nuclear DNA markers, there was no consensus among the HERV trees for the relationship among humans, chimpanzees, and gorillas (30). The remaining trees displayed interesting deviations from the predicted separation of the 5' and 3' LTR sequences.
Fig. 2B shows the trees for the HERV-K18 LTR sequences. Contrary to expectation, the 5' and 3' LTRs of the gorilla provirus cluster together instead of with their counterparts from the other three species (compare Fig. 1B and Fig. 2B). The gorilla LTRs are separated from the other HERV-K18 LTRs by substitutions at 11 sites (Fig. 3A). Assuming that the two LTRs of an HERV locus are evolving independently, every substitution within the ingroup should be manifest as a difference between the 5' and 3' LTRs (compare the 5' and 3' LTR patterns above the white arrows in Fig. 3A). Substitution patterns at the 11 sites in question, however, do not differ between the 5' and 3' LTRs within a species (black arrows in Fig. 3A). For example, a substitution at site 242 appears in both the 5' and 3' gorilla LTRs. Although it is possible that any one position may suffer an identical substitution in both LTRs by chance, the probability of 11 positions undergoing identical substitutions in both LTRs is exceedingly low. It is far more likely that most of the 11 substitutions occurred only once, and that the two LTRs were homogenized by gene conversion. The tree in Fig. 2B is consistent with gene conversion between both LTRs of either the gorilla provirus or the human/chimpanzee provirus.
Alternatively, the topology of the tree in Fig. 2B may indicate that the HERV-K18 provirus of gorillas and the HERV-K18 provirus of humans/chimpanzees are not true orthologues. There are at least two mechanisms to explain this possibility:
(i) The proviruses are derived from two independent integration events (xenology). This possibility would require two nearly identical viruses (differing by no more than 11 substitutions within the LTRs) to integrate into precisely the same nucleotide position in two different lineages—a highly unlikely possibility. A similarly unlikely variation on this possibility is independent integrations into very similar cellular target sequences.
(ii) In one of the two lineages, HERV-K18 was largely replaced by recombination with a separate (but nearly identical) provirus. Such recombination would have been restricted to sequences within the provirus, as the flanking cellular sequences are identical in both lineages. It should be noted that there are hundreds of HERV-K LTR sequences within the primate genome (31–34).
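The signature used above to infer conversion is checkable in principle: look for alignment columns where both LTRs of one species carry the same base and every other sequence carries something else. A minimal Python sketch follows. The ten-base sequences are invented for illustration (they are not the HERV-K18 alignment), positions are 0-based, and the function name is hypothetical.

```python
def shared_derived_sites(alignment, species):
    """Columns where both LTRs of `species` agree with each other
    but differ from every sequence of every other species."""
    five = alignment[(species, "5")]
    three = alignment[(species, "3")]
    others = [seq for (name, _), seq in alignment.items() if name != species]
    sites = []
    for i, (a, b) in enumerate(zip(five, three)):
        if a == b and all(seq[i] != a for seq in others):
            sites.append(i)
    return sites


# Invented toy alignment: column 4 separates 5' from 3' LTRs in human and
# chimpanzee, while the gorilla LTRs are identical to each other.
alignment = {
    ("human", "5"):   "ACGTACGTAC",
    ("human", "3"):   "ACGTTCGTAC",
    ("chimp", "5"):   "ACGTACGTAC",
    ("chimp", "3"):   "ACGTTCGTAC",
    ("gorilla", "5"): "AGGTACGAAC",
    ("gorilla", "3"): "AGGTACGAAC",
}

print(shared_derived_sites(alignment, "gorilla"))  # [1, 7]
print(shared_derived_sites(alignment, "human"))    # []
```

Under independent evolution of the two LTRs, each such column requires the same substitution to have occurred twice, so a long list for one species is what points toward homogenization rather than coincidence.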
The HERV-K(C4) LTR sequences (Fig. 2C) give the predicted topology; however, as noted previously (15, 16), the provirus was missing altogether from gorilla and chimpanzee DNA, in which only an unoccupied integration site was detectable. HERV-K(C4) is found in some ape and OWM species, proving that integration occurred in a common ancestor of apes and OWMs (15, 16). The provirus is located within the human C4B gene, which arose by duplication before the separation of the apes and OWMs. The absence of HERV-K(C4) from some species is most likely caused by frequent homogenization of the C4-CYP21 locus (35), resulting in conversion back to the unoccupied integration site. Both alleles of the C4 locus (with and without the HERV-K(C4) provirus) have been identified within more than one species, suggesting that such conversions have occurred multiple times during primate evolution (35).
The RTVL-Ia tree (Fig. 2D) deviates from expectation by the joining of the outgroup to the ingroup at a node that separates the 5' African green monkey sequences from all the other ingroup sequences. Inspection of the alignment reveals 10 substitutions on the RTVL-Ia tree that contribute to this unexpected branching pattern (dashed line in Fig. 2D). Seven of these sites fall within a 52-bp stretch (arrows in Fig. 3B). Within this segment, the gibbon 5' LTR is identical to the 3' LTRs (including the gibbon 3' LTR). After the gibbon lineage branched off from the other primate lineages, a portion of the 3' LTR must have been transferred to the 5' LTR by gene conversion. Because of this conversion, the most parsimonious tree identifies the gibbon 5' LTR as the progenitor of all the 3' LTRs, and incorrectly invokes parallel evolution (homoplasy) to explain the appearance of identical substitutions in the African green monkey 5' LTR and the 5' LTRs of orangutan and apes. After eliminating the hybrid gibbon 5' LTR from the analysis, the most parsimonious explanation for the sequence at these sites is shared, derived evolution (Fig. 2E). Moreover, all the new trees have the predicted topology (including those derived by neighbor-joining and maximum likelihood methods), with the root of the tree separating the 5' and 3' LTRs of the ingroup into two distinct lineages.
Johnson & Coffin, 1999 wrote:The trees in Fig. 2 F, G, and H contain proviruses of the very large RTVL-H family (36, 37). The two loci were identified by searching the genome databases for RTVL-H-related sequences and are referred to here as RTVL-Ha and RTVL-Hb. The RTVL-Ha provirus tree conforms well to the expected topology (Fig. 2F); however, the RTVL-Hb cluster (Fig. 2G) bears no resemblance to primate species phylogeny. Most of the substitutions fall on terminal branches and provide no phylogenetic signal. One interpretation is that the RTVL-Hb sequences are recombining with other RTVL-H loci, which would have the effect of homogenizing the sequences in a type of concerted evolution. RTVL-H is the largest known HERV family, containing over 1,000 members (18), which may serve as a source of sequences for recombination.
The tree in Fig. 2H is a particularly effective illustration of the principle that LTRs derived from the same provirus cluster together. This tree contains LTRs from four related proviruses, RTVL-Ha, RTVL-Hb, and the proviruses designated as outgroups, RTVL-H and RTVL-H2. All LTRs cluster exclusively with sequences from the same provirus, with a high level of bootstrap support for the nodes separating the four loci. The RTVL-Ha and RTVL-Hb clades do not differ significantly from the trees generated for the two proviruses separately in Fig. 2 F and G, respectively.
Nucleotide Substitution Patterns
Some authors have suggested that methyl-CG deamination has evolved as a specific defense against colonization of the genome by ERVs (38). Existence of such a mechanism should be manifest as a bias toward C.G -> T.A transitions within CG dinucleotides. Tracing the pattern and direction of shared derived substitutions on each of the HERV trees revealed such a bias. Table 1 shows the distribution of C.G -> T.A changes among the C/G-containing dinucleotides in the ancestral LTR sequence. In every case, the number of C.G -> T.A substitutions per C/G dinucleotide is 5- to 10-fold higher than for any of the six other dinucleotide contexts. Indeed, over 40% of the total C -> T and G -> A transitions are attributable to C/G changes, despite the fact that C/G is much less frequent than any other dinucleotide. This imbalance is consistent with methyl-C/G deamination as a mechanism for generating C.G -> T.A transitions (39); however, the issue of whether a mechanism for promoting CG deamination has evolved specifically as a defense against ERVs requires a careful comparison of the substitution patterns in ERV sequences with those of other nuclear DNA markers.
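The bookkeeping behind a statement like "over 40% of the transitions fall in CG context" can be sketched in a few lines of Python. This is a simplified assumption of how one might count, comparing a single ancestral sequence with a single derived one; the paper instead traced substitutions along each tree. The sequences and the function name are invented for illustration.

```python
def cpg_transition_counts(ancestral, derived):
    """Count C->T and G->A transitions, and how many of them sit inside
    a CG dinucleotide of the ancestral sequence.

    Returns (transitions_in_cg_context, total_transitions).
    """
    in_cg = total = 0
    for i, (a, d) in enumerate(zip(ancestral, derived)):
        if (a, d) == ("C", "T"):
            total += 1
            # C of a CG pair: the next ancestral base is G.
            if ancestral[i + 1:i + 2] == "G":
                in_cg += 1
        elif (a, d) == ("G", "A"):
            total += 1
            # G of a CG pair: the previous ancestral base is C.
            if i > 0 and ancestral[i - 1] == "C":
                in_cg += 1
    return in_cg, total


# Invented sequences: two transitions hit CG pairs, one does not.
ancestral = "ACGTCCGAGCTA"
derived   = "ATGTCCAAGTTA"
print(cpg_transition_counts(ancestral, derived))  # (2, 3)
```

To reproduce a per-dinucleotide rate such as the 5- to 10-fold figure, the counts would also have to be divided by the number of CG sites available in the ancestral sequence, which this sketch leaves out.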
Estimating the Time of Integration
The genetic distance between the 5' and 3' LTRs of an ERV reflects mutations accumulated since the time of integration and should therefore be proportional to the age of the provirus. HERV-KC4, HERV-KHML6.17, and RTVL-Ia are found in both OWMs and hominoids, which are estimated to have last shared a common ancestor over 31 million years ago. By contrast, HERV-K18, RTVL-Ha, and RTVL-Hb are found only in humans, chimpanzees, and gorillas, which are thought to have diverged around 5 million years ago (40–42). To estimate the age of each provirus the human/chimpanzee distances from each tree were used to calibrate the rate of molecular evolution at each locus (Table 2). The most recent common ancestor of humans and chimpanzees lived approximately 4.5 million years ago (40–42), so dividing the distance between the human and chimpanzee sequences (substitutions per site) by this number gives rates ranging from 2.3 to 5.0 × 10^-9 substitutions per site per year. These numbers are similar to the estimated rates of evolution for pseudogenes and noncoding regions of mammalian genes (43–45). Applying each rate to the divergence between the 5' and 3' LTRs of the same locus gives integration times consistent with estimates based on species distribution (Table 2).
A number of authors have pointed out that molecular clock calibrations are subject to a wide margin of error, and are usually based on imprecise estimates of divergence dates (46–48). The calculations in Table 2 are therefore only rough estimates of absolute time, but they are nonetheless useful for comparing the relative ages and rates of evolution of different HERV loci.
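The calibration described in this section is a two-step division, sketched below in Python. The 4.5-million-year split is the figure quoted in the text; the two distances are hypothetical inputs chosen for round numbers, not values from Table 2, and the function names are made up for the example. Because both distances are pairwise (human versus chimpanzee, 5' LTR versus 3' LTR), any factor of two for the two diverging lineages cancels when one is divided by the other.

```python
HUMAN_CHIMP_SPLIT_YEARS = 4.5e6  # calibration date quoted in the text


def substitution_rate(human_chimp_distance,
                      split_years=HUMAN_CHIMP_SPLIT_YEARS):
    """Substitutions per site per year from a human/chimpanzee distance."""
    return human_chimp_distance / split_years


def integration_age(ltr_distance, rate):
    """Years since integration implied by the 5'/3' LTR distance."""
    return ltr_distance / rate


# Hypothetical distances (substitutions per site), not values from Table 2.
rate = substitution_rate(0.0225)
age = integration_age(0.15, rate)
print(f"rate = {rate:.2e} per site per year")
print(f"age  = {age / 1e6:.1f} million years")
```

With these made-up inputs the rate comes out at about 5.0 × 10^-9, the upper end of the range quoted above, and the implied integration time is about 30 million years. As the section itself cautions, such figures are only as good as the calibration date.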
The study reported here is, to our knowledge, the first to take advantage of special properties of retroelements to provide insight into evolutionary mechanisms. The HERVs analyzed above include six unlinked loci, representing five unrelated HERV sequence families. Except where noted, these sequences gave trees that were consistent with the well established phylogeny of the old world primates, including OWMs, apes, and humans. Within this time scale genetic distances were less than 10% for all orthologous comparisons, and correction for multiple substitutions did not significantly alter branch lengths or tree topologies (data not shown). As with other nuclear DNA sequences, analyses of older phylogenetic relationships by using ERVs are likely to require such corrections.
One surprising result is the high frequency of conversion we observed. Indeed, only two of the six loci analyzed had suffered no such events in any lineage. Solo LTRs, which arise by recombinational deletion of the intervening viral genes, and which are found by the thousands in the genomes of many animal species, are further evidence for high frequency of recombination involving ERV sequences (1, 49–51). The mechanism that gives rise to such events is unlikely to be provirus-specific, but probably reflects the likelihood of conversion among any repeated, nuclear DNA sequences. Because many ERVs belong to multicopy families, it is also possible that interlocus recombination gives rise to concerted evolution among some of these loci. This latter mechanism may explain the rather confusing topology of the RTVL-Hb tree (Fig. 2G).
The use of LTR-to-LTR divergence to estimate insertion times has been reported previously (8, 15, 18, 25), but such studies invariably ignore the possibility of sequence conversion. Only one report (25) discussed the concern that sequence conversion between LTRs can result in an underestimate of insertion time, and suggested that conversion events should be detectable as deletions or alterations of the sequences flanking the LTRs. However, most of the loci analyzed in our study have clearly undergone conversion/recombination, yet none of these events resulted in loss or alteration of flanking sequences (data not shown). Phylogenetic analysis using HERV LTR sequences gives rise to trees with a predictable topology, on which is superimposed the phylogeny of the host taxa, and allows ready detection of conversion events. Once aberrant sequences are identified, they can be eliminated from an analysis, and the remaining sequences can be used to calculate insertion times, delineate substitution patterns, and decipher host phylogeny. Because ERVs are abundant within the genomes of many animal species, including (but not limited to) plants, insects, mollusks, fish, rodents, domestic pets, and livestock, the ERV approach can be applied to an endless variety of phylogenetic puzzles (1, 2).
DarthHelmet86 wrote:Oh this thread looks like it will be fun.
Abbie Smith wrote:Paleovirology seems to work the opposite of the way Creationists want. The more information we have, the further back the timeline shifts, not vice versa.
For example, we used to think HIV-1 started in humans ~1930, but after we found more ‘old’ HIV sequences, the clock got pushed back to 1902-1921.
We used to think Simian Immunodeficiency Virus emerged 1266-1685, but after we found more ‘old’ SIV sequences, the clock got pushed back 76,794 years.
After a recent finding in fish, the evolutionary history of retroviruses got pushed back… 400 million years.
An Endogenous Foamy-like Viral Element in the Coelacanth Genome
They found an endogenous retrovirus in a fishy genome. But not just any ol ERV… a foamy virus ERV. Foamy viruses are complex, and only found in land mammals, thus assumed to be relatively recent inventions of nature.
But here is a foamy virus, plopped in the middle of a fishy genome, itself about 19 million years old.
When they compared the fishy foamy ERV to mammalian foamy viruses, and lots of different kinds of exogenous and endogenous retroviruses, they found:
… the most parsimonious explanation of this phylogenetic pattern is that foamy viruses infecting land mammals originated ultimately from a prehistoric virus circulating in lobe-finned fishes.
… The common ancestor of coelacanths and tetrapods must have existed prior to the earliest known coelacanth fossil, which is 407–409 million years old.
This means that complex retroviruses are at least ~400 million years old (how old are the simple ones??? HA!). They followed us out of the water, and onto the land.
It's like every great event in our evolutionary history, we were walking hand-in-hand with our retroviruses… footprints in the sand…
Han & Worobey, 2012 wrote:Abstract
Little is known about the origin and long-term evolutionary mode of retroviruses. Retroviruses can integrate into their hosts' genomes, providing a molecular fossil record for studying their deep history. Here we report the discovery of an endogenous foamy virus-like element, which we designate ‘coelacanth endogenous foamy-like virus’ (CoeEFV), within the genome of the coelacanth (Latimeria chalumnae). Phylogenetic analyses place CoeEFV basal to all known foamy viruses, strongly suggesting an ancient ocean origin of this major retroviral lineage, which had previously been known to infect only land mammals. The discovery of CoeEFV reveals the presence of foamy-like viruses in species outside the Mammalia. We show that foamy-like viruses have likely codiverged with their vertebrate hosts for more than 407 million years and underwent an evolutionary transition from water to land with their vertebrate hosts. These findings suggest an ancient marine origin of retroviruses and have important implications in understanding foamy virus biology.
Han & Worobey, 2012 wrote:Introduction
Foamy viruses are complex retroviruses thought exclusively to infect mammalian species, including cats, cows, horses, and nonhuman primates. Although human-specific foamy viruses have not been found, humans can be naturally infected by foamy viruses of non-human primate origin [2–4]. Comparing the phylogenies of simian foamy viruses (SFVs) and Old World primates suggests they co-speciated with each other for more than 30 million years. Retroviruses can invade their hosts’ genomes in the form of endogenous retroviral elements (ERVs), providing ‘molecular fossils’ for studying the deep history of retroviruses and the long-term arms races between retroviruses and their hosts [6,7]. Although ERVs are common components of vertebrate genomes (for example, ERVs constitute around 8% of the human genome), germline invasion by foamy virus seems to be very rare [9,10]. To date, endogenous foamy virus-like elements have been discovered only within the genomes of sloths (SloEFV) and the aye-aye (PSFVaye). The discovery of SloEFV extended the co-evolutionary history between foamy viruses and their mammal hosts at least to the origin of placental mammals. However, the ultimate origin of foamy virus and other retroviruses remains elusive.
The continual increase in eukaryotic genome-scale sequence data is facilitating the discovery of additional ERVs, providing important insights into the origin and long-term evolution of this important lineage of viruses. In this study, we report the discovery and analysis of an endogenous foamy virus-like element in the genome of the coelacanth (Latimeria chalumnae), which we designate ‘coelacanth endogenous foamy-like virus’ (CoeEFV). The discovery of CoeEFV offers unique insights into the origin and evolution of foamy viruses and the retroviruses as a whole.
Discovery of foamy virus-like elements within the genome of coelacanth
We screened all available animal whole genome shotgun (WGS) sequences with the tBLASTn algorithm, using the protein sequences of representative foamy viruses as queries (Table S1), and identified several foamy virus-like insertions (Table S2 and Fig. S1) within the genome of L. chalumnae, one of only two surviving species of an ancient Devonian lineage of lobe-finned fishes that branched off near the root of all tetrapods [11–15]. There are numerous in-frame stop codons and frame-shift mutations present in these CoeEFV elements, suggesting that the CoeEFV elements might be functionally defective. Although more than 230 vertebrate genome-scale sequences are currently available, endogenous foamy virus elements have been found only in the aye-aye, sloths, and coelacanth, indicating that germline invasion of foamy virus is a rare process [9,10]. We extracted all contigs containing significant matches and reconstructed a consensus CoeEFV genomic sequence (Fig. S2). The resulting consensus genome shows recognizable and typical foamy virus characteristics (Fig. 1). Its genome has long terminal repeat (LTR) sequences at both 5' and 3' ends and encodes the three main open reading frames (ORFs), gag, pol, and env, in positions similar to those of exogenous foamy viruses (Fig. 1). Two additional putative ORFs were found at positions similar to known foamy virus accessory genes but exhibit no significant similarity (Fig. 1). Notably, we found that the Env protein is conserved among foamy viruses and the coelacanth virus-like element (Fig. 2). A Conserved Domain search identified a conserved foamy virus envelope protein domain (pfam03408) spanning most (887 of 1016 residues) of the CoeEFV Env protein, with an E-value of 1.36 × 10⁻⁶⁹ (Fig. 2). The CoeEFV Env protein shares no detectable similarity with other (non-foamy virus) retroviral Env proteins or with retroviral elements within available genomic sequences of other fishes, such as the zebrafish (Danio rerio). Hence, the Env protein provides decisive evidence that CoeEFV originated from a foamy-like virus.
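The in-frame stop codons mentioned above are the kind of defect that can be counted directly from a nucleotide sequence. What follows is a minimal sketch, not the authors' pipeline: the function name `count_inframe_stops` and the toy sequence are hypothetical, and it checks a single forward reading frame against the stop codons of the standard genetic code.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def count_inframe_stops(seq, frame=0):
    """Count stop codons in one forward reading frame (0, 1, or 2)."""
    seq = seq.upper()
    count = 0
    # Walk the sequence codon by codon, ignoring any trailing partial codon.
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            count += 1
    return count


# Toy sequence made up for illustration; it is NOT CoeEFV data.
toy = "ATGAAATAGCCCTGAGGG"
print(count_inframe_stops(toy))
```

A real screen would also have to consider the reverse strand and frame-shifting insertions or deletions, which this sketch deliberately leaves out.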
To exclude the possibility that these CoeEFV elements result from laboratory contamination, we obtained a tissue sample of L. chalumnae and succeeded in amplifying CoeEFV insertions within the genome of L. chalumnae via PCR with degenerate primers designed for conserved regions of foamy virus pol and env genes.
To establish the position of CoeEFV on the retrovirus phylogeny, conserved regions of the Pol protein sequences of CoeEFV and various representative endogenous and exogenous retroviruses were used to reconstruct a phylogenetic tree with a Bayesian approach. The phylogenetic tree shows that CoeEFV groups with the foamy viruses with strong support (posterior probability = 1.00; Figs. 3 and S3), confirming that CoeEFV is indeed an endogenous form of a close relative of extant foamy viruses. The discovery of CoeEFV establishes that a distinct lineage of exogenous foamy-like viruses existed (and may still exist) in species outside the Mammalia.
CoeEFV likely invaded the coelacanth genome more than 19 million years ago
Endogenous retroviruses are likely to undergo a gradual accumulation of neutral mutations with host genome replication after endogenization. To date the invasion of CoeEFV into the coelacanth genome, we identified two sets of sequences, each of which arose by segmental duplication because each set of sequences shares nearly identical flanking regions (Fig. S4). The two sets contain five and two sequences, respectively. Because the divergence time of the two extant coelacanth species (L. chalumnae and L. menadoensis) is uncertain, it is impossible to obtain a reliable neutral evolutionary rate for coelacanth species. Nevertheless, even using the mammalian neutral evolutionary rate as a proxy for the coelacanth rate, the invasion date was conservatively estimated at 19.3 (95% highest posterior density [HPD]: 15.3–23.6) million years ago for the dataset of five sequences. For the dataset containing two sequences, the divergence between the pair is estimated to be 4.1% and the invasion time is estimated to be approximately 9.3 million years ago. Because the CoeEFV invasion almost certainly occurred earlier than the duplication events within the host genome and because the evolutionary rate of coelacanth species is thought to be lower than that of other vertebrate species [19,20], the time of CoeEFV integration might be much more than 19 million years. Additional phylogenetic evidence (see below) suggests that its exogenous progenitors likely infected coelacanths for hundreds of millions of years prior to the event that fossilized CoeEFV within its host’s genome.
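The two-sequence estimate above can be checked with simple arithmetic. This is a sketch under an assumption the text does not spell out: that two duplicated copies, each evolving neutrally at rate r per site per year, reach a pairwise divergence d = 2rt, so t = d / (2r). The rate is not quoted in the text, so it is back-calculated here from the reported 4.1% divergence and 9.3-million-year age. The function names are illustrative. The 19.3-million-year figure comes from a Bayesian analysis of five sequences and is not reproduced by this formula.

```python
def implied_rate(divergence, age_years):
    """Rate r (substitutions/site/year) implied by d = 2 * r * t."""
    return divergence / (2.0 * age_years)


def age_from_divergence(divergence, rate):
    """Time since duplication in years, t = d / (2 * r)."""
    return divergence / (2.0 * rate)


d = 0.041   # pairwise divergence reported in the text (4.1%)
t = 9.3e6   # age reported in the text (about 9.3 million years)

r = implied_rate(d, t)  # assumption: simple two-lineage neutral model
print(f"implied rate: {r:.2e} substitutions/site/year")
print(f"age recovered: {age_from_divergence(d, r) / 1e6:.1f} million years")
```

Under this model the reported numbers imply a rate of roughly 2.2 × 10⁻⁹ substitutions per site per year. A slower coelacanth rate would push the same 4.1% divergence to an older date, which is the direction of the authors' "conservative" caveat.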
Foamy-like viruses have likely codiverged with their vertebrate hosts for at least 407 million years
To further evaluate the relationship of foamy viruses, we reconstructed phylogenetic trees based on the conserved region of Pol proteins of foamy viruses and Class III retroviruses, the conserved region of foamy virus Pol and Env protein concatenated alignment, and the conserved region of foamy virus Env protein alignment, respectively. The three phylogenies have the same topology in terms of foamy viruses (Figs. 4, S5, and S6). CoeEFV was positioned basal to the known foamy viruses (Fig. 4), suggesting a remarkably ancient ocean origin of foamy-like viruses: the most parsimonious explanation of this phylogenetic pattern is that foamy viruses infecting land mammals originated ultimately from a prehistoric virus circulating in lobe-finned fishes. The branching order of the three foamy virus phylogenies (Figs. 4, S5, and S6) is completely congruent with the known relationships of their hosts, and each node on the three virus trees is supported by a posterior probability of 1.0 (except the node leading to equine, bovine, and feline foamy viruses on the Env phylogeny, which is supported by a posterior probability of 0.94; Fig. S6). The common ancestor of coelacanths and tetrapods must have existed prior to the earliest known coelacanth fossil, which is 407–409 million years old. The completely congruent virus topology, therefore, strongly indicates that an ancestral foamy-like virus infected this ancient animal. Crucially, the foamy viral branch lengths of the three phylogenies are highly significantly correlated with host divergence times (R² = 0.7115, p = 1.10 × 10⁻⁵, Fig. 5; R² = 0.7024, p = 1.41 × 10⁻⁵, Fig. S5; and R² = 0.7429, p = 4.26 × 10⁻⁶, Fig. S6), a pattern that can reasonably be expected only if the viruses and hosts codiverged.
It is worth emphasizing that we used a consensus sequence to represent CoeEFV in these analyses, so its branch length should correspond roughly to that of the exogenous virus that integrated >19 million years ago, rather than within-host mutations since that time.
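The correlation test described above relates viral branch lengths to host divergence times. Here is a minimal sketch of how such an R² can be computed; the data points are invented placeholders (not the values behind Fig. 5), the function name `r_squared` is hypothetical, and the p-value calculation the authors report is omitted.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("zero variance in one of the inputs")
    return (sxy * sxy) / (sxx * syy)


# Placeholder data, invented for illustration only.
host_divergence_my = [10.0, 40.0, 100.0, 200.0, 400.0]   # million years
viral_branch_length = [0.05, 0.20, 0.40, 0.90, 1.60]     # substitutions/site
print(round(r_squared(host_divergence_my, viral_branch_length), 3))
```

For a simple linear regression of one variable on the other, this squared correlation equals the regression's coefficient of determination, which is why a single function suffices here.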
There are two alternative explanations for these phylogenetic patterns. One is that the exogenous progenitor of CoeEFV is not truly the sister taxon to the mammalian foamy viruses, but a more distant relative. The robust posterior probability (1.00) placing them in the same clade and the absence of evidence for viruses or virus-like elements from other species disrupting this clade argue against this view, as does the significant similarity between the Env proteins of CoeEFV and the foamy viruses (Fig. 2). Moreover, its branch length would be difficult to explain under such a scenario. If the coelacanth foamy-like virus lineage and the mammalian foamy virus lineage did not share a most recent common ancestor in their ancestral host, why is CoeEFV neither more nor less divergent from the mammalian foamy viruses than one might expect if they did?
The other alternative to the hypothesis that these viruses have co-diverged over more than 407 million years is that they somehow moved, in more recent times, from terrestrial hosts to sarcopterygian hosts that inhabited the deep sea, and that the similarity of the coelacanth virus to the mammalian viruses is due to cross-species (in fact cross-class) transmission, rather than shared history. However, as illustrated by the significant correlation between host divergence times and viral distances (Figs. 5, S5, and S6), the long branches leading to CoeEFV and the clade of mammalian foamy viruses suggest the virus had already circulated in vertebrates for an extremely long time before the origin of mammalian foamy viruses. Given that there is strong evidence that placental mammals were already being infected with foamy viruses by about 100 million years ago, the distinctness of the coelacanth virus suggests that it would have to have crossed from some other unidentified host, one whose foamy-like virus was already hundreds of millions of years divergent from the mammalian viruses. This seems highly unlikely. Although cross-species transmission of SFVs has been observed [2–5,22], foamy viruses seem to mainly follow a pattern of co-diversification with their hosts [5,9]. If one accepts that the endogenous foamy viruses within the genomes of sloths indicate more than 100 million years of host-virus co-divergence, it seems plausible that CoeEFV extends that timeline by an additional 300 million years.
Moreover, the habitat isolation of the coelacanth and terrestrial vertebrates would have provided limited opportunities for direct transfer of foamy viruses to coelacanths. Taken together, these lines of evidence strongly suggest that foamy viruses and their vertebrate hosts have codiverged for more than 407 million years, and that foamy viruses underwent a remarkable evolutionary transition from water to land simultaneously with the conquest of land by their vertebrate hosts.
Our analyses provide compelling evidence for the existence of retroviruses going back at least to the Early Devonian. This is the oldest estimate, to our knowledge, for any group of viruses, significantly older than the previous estimates for hepadnaviruses (19 million years) and large dsDNA viruses of insects (310 million years). Although highly cytopathic in tissue culture, foamy viruses do not seem to cause any recognizable disease in their natural hosts [1,25,26]. Such long-term virus-host coevolution may help explain the low pathogenicity of foamy viruses. The fact that the Env is well conserved between CoeEFV and foamy viruses is consistent with the fact that these viruses are asymptomatic and mainly co-evolve with their hosts in a relatively conflict-free relationship. It is easy to imagine that previously overlooked examples of such a non-pathogenic virus may yet be found in hosts that fill in some of the gaps in the phylogeny, namely amphibians, reptiles, and birds. It will be of interest to screen these hosts, but also various fish species, for evidence of exogenous and/or endogenous foamy-like viruses.
An ancient marine origin of retroviruses
Dating analyses provide the clearest evidence for when and where retroviruses originated. There is strong evidence that foamy viruses shared a common, exogenous retroviral ancestor more than 400 million years ago (since Env was present in both terrestrial and marine lineages). The discovery of endogenous lentiviruses demonstrates that lentiviruses, a distinct retroviral lineage that includes HIV, are also millions of years old [27–30]. Foamy viruses and lentiviruses share a distantly related ancestor (Figs. 3, S3) and the foamy virus clade alone almost certainly accounts for more than 407 million years of retroviral evolution. It follows that the origin of at least some retroviruses is older than 407 million years ago. As with the coelacanth lineage in the foamy virus clade, we found that retroviruses of fishes occupy the most basal positions within both the Class I and Class III retroviral clades (walleye dermal sarcoma virus (WDSV) and snakehead retrovirus (SnRV), respectively; blue asterisks in Figs. 3, S3). This pattern provides additional evidence of a marine origin and long-term coevolution of these major retroviral lineages. However, to be specific, the phylogenetic reconstruction in Fig. 3 reflects the history only of the Pol protein, not a comprehensive history of retroviral genomic evolution. Nevertheless, our analyses support a very ancient marine origin of retroviruses.