Molecular Evolution Glossary



Adaptation. Evolutionary changes driven by positive selection that increase fitness.

Allele. A particular genetic variant at a given locus.

Analogy. In evolutionary terms, analogy indicates similarity through convergence rather than homology.


Bootstrapping. A method for attaching confidence values to the branches of a phylogenetic tree.
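
As a rough sketch of the usual non-parametric bootstrap, the Python below resamples alignment columns with replacement and reports the fraction of replicates that support a grouping. In a real analysis each replicate would be run through the full tree-building method. Here the hypothetical `ab_are_nearest` check (are the first two sequences each other's closest match by p-distance?) stands in for that step only to keep the example short, and the alignment is made up.

```python
import random

def bootstrap_replicate(alignment, rng):
    """Build one pseudo-alignment by sampling columns with replacement.

    alignment: list of equal-length aligned sequences (strings).
    """
    n_cols = len(alignment[0])
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[c] for c in cols) for seq in alignment]

def p_distance(a, b):
    """Proportion of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def ab_are_nearest(aln):
    """Hypothetical stand-in for 'build a tree, then check whether seqs 0 and 1 form a clade'."""
    d01 = p_distance(aln[0], aln[1])
    return all(d01 < p_distance(aln[0], s) and d01 < p_distance(aln[1], s) for s in aln[2:])

def bootstrap_support(alignment, clade_test, n_reps=100, seed=1):
    """Fraction of bootstrap replicates in which clade_test returns True."""
    rng = random.Random(seed)
    hits = sum(clade_test(bootstrap_replicate(alignment, rng)) for _ in range(n_reps))
    return hits / n_reps

aln = ["ACGTACGTACGTACGTACGT",
       "ACGTACGTACGTACGTACGA",
       "TCGAACCTACGAACGTTCGT",
       "TCGAACCTACGAACGTTCCA"]
support = bootstrap_support(aln, ab_are_nearest)
print(f"bootstrap support for (seq0, seq1): {support:.0%}")
```

Support values are usually reported as percentages on the corresponding branch of the tree.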

Branch. Nodes on a phylogenetic tree are joined by branches, which represent a particular period of evolutionary time. Often (but not always), the length of the branch will indicate the amount of evolutionary change that has taken place.


Clade. A group of terminal nodes (e.g. species, genes or proteins) that share a common ancestral node, comprising all of the descendants of that ancestor.

Coding Sequence. The part of an mRNA sequence (a transcribed protein-coding gene with any introns spliced out) that encodes the protein sequence (the "cistron") and is translated using the Genetic Code.

Coding Substitution. See non-synonymous substitution.

Codon. A three-letter section of a coding sequence that encodes a single amino acid (or a STOP signal). The degenerate nature of the genetic code (64 codons encoding only 20 amino acids plus a STOP signal) means that changes at the third position of a codon often constitute a synonymous substitution.
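
The effect of degeneracy on each codon position can be checked directly. This Python sketch builds the standard genetic code (NCBI translation table 1) and counts, over all 61 sense codons, how many single-base changes at each position leave the encoded amino acid unchanged. The function name is illustrative only.

```python
from itertools import product

BASES = "TCAG"
# Standard code (NCBI table 1): amino acids for TTT, TTC, TTA, TTG, TCT, ... GGG
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AAS)}

def synonymous_changes_by_position():
    """Count synonymous single-base changes at codon positions 1, 2 and 3."""
    counts = [0, 0, 0]
    for codon, aa in CODE.items():
        if aa == "*":
            continue  # only start from sense codons
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == aa:
                    counts[pos] += 1
    return counts

print(synonymous_changes_by_position())  # [8, 0, 126]
```

Almost all synonymous changes fall at the third position; the few at the first position come from leucine and arginine, which each have six codons.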

Coevolution. The evolution of two interacting entities, where changes in one drive changes in the other. Coevolution can happen at many levels: between species, between proteins and even between residues within a protein.

Conservation. Evolutionary similarity, as opposed to divergent evolution.

Conservative Substitution. A substitution that replaces one amino acid with another of similar physicochemical properties.

Convergent Evolution. The independent acquisition of the same trait in different evolutionary events.


Deleterious. A mutation/allele that reduces the fitness of the carrier.

Distance Matrix. An all-by-all matrix of distances, derived from pairwise comparisons of OTUs, that is used for phylogenetic tree construction.
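
A minimal sketch of building a distance matrix from an alignment, using the uncorrected p-distance (proportion of differing sites) and skipping positions where either sequence has a gap. The gap handling is an assumption for illustration; real analyses usually also apply a model-based correction (e.g. Jukes-Cantor) for multiple substitutions at the same site.

```python
def p_distance(a, b, gap="-"):
    """Proportion of differing sites, skipping positions with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no comparable (ungapped) sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def distance_matrix(seqs):
    """All-by-all p-distance matrix for a dict of {name: aligned sequence}."""
    names = list(seqs)
    matrix = [[p_distance(seqs[a], seqs[b]) for b in names] for a in names]
    return names, matrix

names, matrix = distance_matrix({"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACGAACGA"})
for name, row in zip(names, matrix):
    print(name, row)
```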

Divergent Evolution. The divergence of homologues through time following a speciation or duplication event.

Duplication. An evolutionary event where genetic material is duplicated and subsequently two copies are inherited. Duplications can occur at many levels, including parts of genes/proteins (exons/domains), whole genes/operons, whole chromosomes or even whole genomes (WGD).


Evolution. In the context of molecular evolution, evolution is the change in allele frequencies within a population over time, which may ultimately lead to fixation.


Fitness. The relative success of an organism or mutation under selection compared to the “wildtype”. Fitness (and selection) is highly context-dependent, and the same phenotype may have a very different fitness in different environments.

Fixation. When an allele reaches a frequency of (effectively) 100%.


Gene. "Gene" can refer to a physical functional genetic locus, or the fundamental information unit of heredity and evolution (as in "gene pool"). Although often used synonymously with "protein-coding gene", it should be remembered that it does not always mean this.

Gene Family. A family of homologous genes that are related through gene duplication events. For multi-domain proteins, different domains may be members of different families and have distinct evolutionary histories.

Genetic Code. The three-letter code that is used for translating the coding sequence of a protein-coding gene into amino acids.
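
As an illustration, the Python below encodes the standard genetic code (NCBI translation table 1; some organisms and organelles use variant codes) and translates a coding sequence codon by codon, stopping at the first stop codon by default. The `translate` helper is a sketch written for this example, not a library function.

```python
from itertools import product

BASES = "TCAG"
# Standard code (NCBI table 1): amino acids for TTT, TTC, TTA, TTG, TCT, ... GGG
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AAS)}

def translate(cds, to_stop=True):
    """Translate a coding sequence codon by codon ('*' marks a stop codon)."""
    cds = cds.upper().replace("U", "T")  # accept mRNA (U) or DNA (T)
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):  # ignore any trailing partial codon
        aa = CODE[cds[i:i + 3]]
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)

mrna = "AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG"
print(translate(mrna))                 # MAIVMGR
print(translate(mrna, to_stop=False))  # MAIVMGR*KGAR*
```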

Genotype. The genetic makeup of an individual.


HGT. See Horizontal Gene Transfer.

HTU. See Hypothetical Taxonomic Unit.

Homologous. Two sequences that show homology (i.e. shared evolutionary ancestry).

Homology. Relationship through shared evolutionary ancestry.

Homology Search. Searching a sequence database for protein or nucleotide sequences with sequence similarity to a given query.

Homoplasy. Independent evolution of the same trait in different taxa. When mapped onto a correct phylogeny, such a trait will appear polyphyletic. Homoplasy can confound attempts to construct a phylogeny by making the affected taxa look more similar to each other than their shared ancestry warrants.

Horizontal Gene Transfer. Inheritance/incorporation of genetic material from a source other than the parent(s), e.g. via a virus or plasmid.

Hypothetical Taxonomic Unit. An internal node of a phylogenetic tree.


Indel. Genetic insertion/deletion.

Informative Site. A nucleotide or amino acid position that is able to group two or more sequences together to the exclusion of the rest. Used in maximum parsimony.
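
In practice the criterion is usually applied column by column: a site is parsimony-informative if at least two different character states each occur in at least two sequences, so that the site can favour one grouping over another. A minimal Python sketch, assuming gaps are simply ignored:

```python
from collections import Counter

def is_informative(column, gap="-"):
    """True if at least two states each occur in at least two sequences (gaps ignored)."""
    counts = Counter(c for c in column if c != gap)
    return sum(1 for n in counts.values() if n >= 2) >= 2

def informative_sites(alignment):
    """0-based indices of parsimony-informative columns in a list of aligned sequences."""
    return [i for i, column in enumerate(zip(*alignment)) if is_informative(column)]

aln = ["AAGT",
       "AAGA",
       "AGCA",
       "AGCA"]
print(informative_sites(aln))  # [1, 2]: column 0 is invariant, column 3 has a singleton T
```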

Informative Trait. A character that is able to group two or more species together to the exclusion of the rest. Used in maximum parsimony.


Locus. A physical genetic location in a genome.

Long Branch Attraction. An artefact of maximum parsimony in which homoplasy causes sequences that are very divergent from the rest of the tree (i.e. on long branches) to group together, regardless of their true relationships.

Long Branch Migration. An artefact of distance matrix phylogenetic tree methods where rapidly evolving lineages that are very divergent from the rest of the tree tend to migrate to the root of the tree (and each other).


MRCA. Most Recent Common Ancestor. The most recent shared evolutionary ancestor of a group of species or proteins. This is not (usually) a literal individual but instead refers to a population.

MSA. See Multiple Sequence Alignment.

Maximum Likelihood. A phylogenetic tree construction method that selects the best tree by maximising the likelihood, i.e. the probability of the observed sequence data given the tree and an evolutionary model.
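
The simplest case is a single branch joining two sequences. Assuming the Jukes-Cantor (JC69) model, this Python sketch computes the log-likelihood of the observed data (n sites, k of them different) for a candidate branch length d and picks the best d by a simple grid search; the answer should agree with the closed-form JC69 distance. Full maximum-likelihood phylogenetics also optimises many branch lengths at once and searches over topologies, which this sketch does not attempt.

```python
import math

def jc69_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood of two aligned sequences separated by branch length d under JC69.

    The constant log(1/4) per site (equilibrium base frequency) is omitted,
    since it does not change which d is best.
    """
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e   # probability a site shows the same base at both ends
    p_diff = 0.25 - 0.25 * e   # probability of one particular different base
    return (n_sites - n_diff) * math.log(p_same) + n_diff * math.log(p_diff)

def ml_distance(n_sites, n_diff, max_d=5.0, steps=50_000):
    """Grid search for the branch length that maximises the likelihood."""
    grid = (max_d * (k + 1) / steps for k in range(steps))  # d > 0 avoids log(0)
    return max(grid, key=lambda d: jc69_log_likelihood(d, n_sites, n_diff))

def jc69_distance(n_sites, n_diff):
    """Closed-form JC69 distance, for comparison."""
    p = n_diff / n_sites
    return -0.75 * math.log(1 - 4.0 * p / 3.0)

print(ml_distance(100, 20), jc69_distance(100, 20))  # both about 0.233
```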

Maximum Parsimony. A phylogenetic tree construction method that selects the best tree by maximising parsimony (i.e. minimising evolutionary changes).
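
Counting the minimum number of changes on a given tree is the core step of maximum parsimony. The sketch below implements Fitch's algorithm for a rooted binary tree written as nested Python tuples (a representation chosen here for brevity) and sums the changes over all sites; comparing tree lengths shows which topology is most parsimonious for a toy alignment.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one site.

    tree: a leaf name (str) or a 2-tuple of subtrees.
    states: dict mapping leaf name -> character state at this site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    (left, lc), (right, rc) = fitch(tree[0], states), fitch(tree[1], states)
    common = left & right
    if common:
        return common, lc + rc          # children agree: no extra change needed
    return left | right, lc + rc + 1    # children disagree: one change on this node

def tree_length(tree, alignment):
    """Parsimony score: minimum changes summed over every site of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
               for i in range(n_sites))

aln = {"A": "AAGT", "B": "AAGA", "C": "AGCA", "D": "AGCA"}
for tree in [(("A", "B"), ("C", "D")), (("A", "C"), ("B", "D")), (("A", "D"), ("B", "C"))]:
    print(tree, tree_length(tree, aln))
```

Here the ((A,B),(C,D)) grouping needs 3 changes against 5 for either alternative, so it is the most parsimonious tree.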

Midpoint Rooting. Rooting a phylogeny at the midpoint of the path between the two most distant OTUs, i.e. the point equidistant from both.

Missense Mutation. A non-synonymous substitution that replaces the encoded amino acid with a different amino acid (i.e. not a stop codon).

Molecular Clock. The prediction of The Neutral Theory that, if most fixed changes are the result of neutral mutations, molecular evolution will occur at a reasonably regular “clock-like” rate, determined primarily by the neutral mutation rate.
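
Under a strict clock, the distance K (substitutions per site) between two sequences accumulates along both lineages since their split, so the divergence time is T = K / (2r) for a rate r per site, per lineage, per unit time. A sketch using purely hypothetical calibration numbers; in practice K would normally be a corrected (model-based) distance.

```python
def calibrate_rate(distance, split_time):
    """Rate per site per lineage per unit time, from a pair with a known split time."""
    return distance / (2.0 * split_time)

def divergence_time(distance, rate):
    """Time since two lineages split, assuming both evolved at the same constant rate."""
    return distance / (2.0 * rate)

# Hypothetical numbers: a calibration pair 0.04 substitutions/site apart that split 20 Myr ago
rate = calibrate_rate(0.04, 20.0)   # about 0.001 substitutions/site/Myr per lineage
t = divergence_time(0.01, rate)     # about 5 Myr
print(f"estimated divergence: {t:.1f} Myr")
```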

Molecular Evolution. The study of the evolution of DNA and protein sequences.

Monophyletic. A trait (physical or genetic) that occurs within a single clade on a phylogenetic tree and thus can be explained by a single evolutionary event.
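
A trait shared by exactly a given set of taxa can be explained by a single event when those taxa form a clade, i.e. when their MRCA has no other descendants. This sketch checks that on a rooted tree written as nested Python tuples (an assumed representation): `mrca` walks down from the root for as long as a single child still contains every taxon of interest.

```python
def leaves(tree):
    """Set of terminal node names at or below this node."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))

def mrca(tree, taxa):
    """Smallest subtree containing every name in taxa (all assumed to be present)."""
    if not isinstance(tree, str):
        for child in tree:
            if taxa <= leaves(child):
                return mrca(child, taxa)
    return tree

def is_monophyletic(tree, taxa):
    """True if the MRCA of taxa has no descendants outside taxa."""
    taxa = set(taxa)
    return leaves(mrca(tree, taxa)) == taxa

tree = ((("A", "B"), "C"), ("D", "E"))
print(is_monophyletic(tree, {"A", "B"}))       # True
print(is_monophyletic(tree, {"A", "B", "C"}))  # True
print(is_monophyletic(tree, {"B", "C"}))       # False: their MRCA also includes A
```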

Multiple Sequence Alignment. The alignment of homologous DNA or protein sequences. An alignment of two sequences is referred to as a “Pairwise sequence alignment”.


Negative Selection. See purifying selection.

Neighbour-Joining. A distance-matrix based phylogenetic tree construction method that does not assume a molecular clock.
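
The sketch below follows the standard neighbour-joining recipe: at each step it joins the pair of nodes minimising the Q-criterion, assigns branch lengths to the two joined nodes, and replaces them with a new internal node, finishing when three nodes remain. It returns an unrooted tree as a Newick string; the five-OTU matrix is a small textbook-style example, not real data.

```python
def neighbour_joining(names, d):
    """Neighbour-joining on a symmetric distance matrix (list of lists), 3+ OTUs.

    Returns an unrooted tree as a Newick string with branch lengths.
    """
    nodes = list(names)
    d = [row[:] for row in d]  # work on a copy
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]  # net divergence of each node
        # Q-criterion: join the pair minimising (n - 2) * d_ij - r_i - r_j
        i, j = min(((a, b) for a in range(n) for b in range(a + 1, n)),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # branch lengths from the joined nodes to their new parent
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        parent = f"({nodes[i]}:{li:.3f},{nodes[j]}:{lj:.3f})"
        keep = [k for k in range(n) if k not in (i, j)]
        # distance from the new parent to every remaining node
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[x]] for x, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [parent]
    # three nodes left: join them at a single central node
    (x, y, z), dxy, dxz, dyz = nodes, d[0][1], d[0][2], d[1][2]
    return (f"({x}:{(dxy + dxz - dyz) / 2:.3f},"
            f"{y}:{(dxy + dyz - dxz) / 2:.3f},"
            f"{z}:{(dxz + dyz - dxy) / 2:.3f});")

names = ["a", "b", "c", "d", "e"]
D = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
print(neighbour_joining(names, D))
```

Unlike UPGMA, the two branches leading from a new node can have different lengths, which is why the method does not need a molecular clock.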

Neofunctionalisation. The process by which, following gene duplication, a new protein function evolves in one of the duplicates.

Neutral Evolution. The accumulation (fixation) of neutral mutations over time by random genetic drift.

Neutral Mutation. A mutation that does not affect fitness.

Node. A point on a phylogenetic tree representing either an extant species/protein/gene (a “terminal node”) or a speciation/duplication event (“internal node”) ancestral to all species/proteins/genes in that clade.

Non-Synonymous Substitution. A substitution in a coding sequence of DNA that affects the protein sequence encoded.

Nonsense Mediated Decay. A process by which an mRNA containing a premature stop codon may be degraded, resulting in no expression rather than expression of a truncated protein.

Nonsense Mutation. A non-synonymous substitution that replaces an amino acid with a stop codon, thereby prematurely ending translation and truncating the protein. The mutant mRNA may instead be subject to nonsense mediated decay, in which case no truncated protein is made.
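
Synonymous, missense and nonsense changes can be told apart by translating the affected codon before and after the substitution. A Python sketch using the standard genetic code; `classify_point_mutation` is a hypothetical helper that assumes `pos` is a 0-based position within an in-frame coding sequence, and it labels a change from a stop codon to a sense codon as "stop-loss".

```python
from itertools import product

BASES = "TCAG"
# Standard code (NCBI table 1): amino acids for TTT, TTC, TTA, TTG, TCT, ... GGG
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AAS)}

def classify_point_mutation(cds, pos, new_base):
    """Classify a single-base substitution at 0-based position pos of an in-frame CDS."""
    start, offset = pos - pos % 3, pos % 3
    old_codon = cds[start:start + 3]
    new_codon = old_codon[:offset] + new_base + old_codon[offset + 1:]
    old_aa, new_aa = CODE[old_codon], CODE[new_codon]
    if old_aa == new_aa:
        return "synonymous"
    if new_aa == "*":
        return "nonsense"
    if old_aa == "*":
        return "stop-loss"
    return "missense"

cds = "ATGTGGAAA"  # Met-Trp-Lys
print(classify_point_mutation(cds, 8, "G"))  # AAA -> AAG (Lys -> Lys): synonymous
print(classify_point_mutation(cds, 5, "A"))  # TGG -> TGA (Trp -> stop): nonsense
print(classify_point_mutation(cds, 0, "C"))  # ATG -> CTG (Met -> Leu): missense
```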


OTU. See operational taxonomic unit.

Operational Taxonomic Unit. A sequence or organism used as the terminal nodes of a phylogenetic tree.

Orthology. Proteins/genes that are related by speciation events. Typically the “same protein in different species”, although subsequent gene duplications can result in complex “one-to-many” or “many-to-many” orthology relationships. A type of homology.

Outgroup. An operational taxonomic unit (OTU) known (or presumed) to branch off ancestrally to all other OTUs in the phylogeny. Often used for rooting or parsimony analysis.

Outgroup Rooting. Rooting a phylogeny on the branch leading to the outgroup.


Pairwise Sequence Alignment. See MSA.

Parallel Evolution. Independent evolution in closely related species that follows the same trajectory.

Paralogy. Different members of a gene family, related by duplication events. Most easily remembered as a “different protein in the same species” (in contrast to orthology), but paralogues will also be present in different species if there have been subsequent speciation events. A type of homology.

Paraphyletic. A distinct genetic or physical trait that is shared by all the individuals in a clade barring those belonging to one or more monophyletic groups.

Parsimony. The simplest explanation for an observation. The smallest number of changes needed to explain the data.

Phenotype. The expressed product of the genotype, upon which selection acts.

Phylogenetic Tree. Graphic representation of a phylogeny. Extant species/proteins/genes and historical events (speciation and duplication) form “nodes” on the tree, which are joined by “branches”.

Phylogeny. The evolutionary relationship of species/genes/proteins.

Pleiotropy. The ability of a single gene or mutation to affect several different traits. This is particularly important when considering the role of selection in evolution – a mutation may be beneficial for one trait but have neutral or even deleterious effects on other traits.

Point Mutation. A single nucleotide substitution.

Polymorphism. The presence of at least two alleles of a particular genetic locus in the population.

Polyphyletic. A distinct genetic or physical trait that is shared by individuals in different clades and needs multiple evolutionary events (gains and/or losses of the trait) to explain the observed pattern.

Polyploidy. The presence of multiple genome copies, typically as the result of ancestral whole genome duplications (WGD). (Many domestic crops are polyploid as a result of hybridisation and artificial selection.)

Population Genetics. The study of genetic variation, selection and evolution in populations.

Positive Selection. The evolutionary force that increases the frequency of (and ultimately fixes) an allele that gives a selective advantage by increasing the fitness of the individual possessing the allele.
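
A minimal deterministic sketch, assuming a haploid population with no drift: an allele with relative fitness 1 + s (against 1 for the alternative) changes frequency each generation as p' = p(1 + s) / (1 + ps). Even a small advantage drives the allele towards fixation, only more slowly.

```python
def next_frequency(p, s):
    """One generation of haploid selection: fitness 1 + s for the allele, 1 otherwise."""
    return p * (1 + s) / (1 + p * s)

def generations_to(p0, s, target=0.99):
    """Generations for the allele to rise from p0 to at least target (requires s > 0)."""
    p, t = p0, 0
    while p < target:
        p, t = next_frequency(p, s), t + 1
    return t

for s in (0.1, 0.01):
    print(f"s = {s}: {generations_to(0.01, s)} generations from 1% to 99%")
```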

Protein-coding Gene. A region of DNA (genetic locus) that encodes for one or more protein sequences.

Purifying Selection. The evolutionary force that removes deleterious (harmful) alleles/mutations that lower the fitness of the individual possessing the allele. To be selected against, the decrease in fitness must be strong enough to overcome random genetic drift.


RGD. See Random Genetic Drift.

Radical Substitution. A non-synonymous substitution that replaces an amino acid with a different amino acid with very different physicochemical properties.
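
The conservative/radical distinction depends on how amino acids are grouped by physicochemical properties. The grouping in this Python sketch is a hypothetical, simplified one chosen for illustration; published schemes (and substitution matrices such as BLOSUM) draw the boundaries differently.

```python
# Hypothetical property groups for illustration only
GROUPS = ["AVLIM",          # aliphatic / hydrophobic
          "FWY",            # aromatic
          "STNQ",           # polar, uncharged
          "KRH",            # positively charged (basic)
          "DE",             # negatively charged (acidic)
          "G", "P", "C"]    # special cases kept on their own
GROUP_OF = {aa: i for i, group in enumerate(GROUPS) for aa in group}

def classify_substitution(old, new):
    """Label an amino acid replacement as conservative or radical under GROUPS."""
    if old == new:
        return "no change"
    return "conservative" if GROUP_OF[old] == GROUP_OF[new] else "radical"

for pair in [("L", "I"), ("K", "R"), ("D", "K"), ("G", "W")]:
    print(pair, classify_substitution(*pair))
```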

Random Genetic Drift. Random changes in allele frequencies over time due to chance differences in inheritance of different alleles. RGD is stronger in smaller populations.
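
Drift is commonly modelled with the Wright-Fisher model, in which each generation is a binomial sample of the previous one. This Python sketch simulates a neutral allele in a haploid population until it is fixed or lost. For a neutral allele the probability of eventual fixation equals its starting frequency, which the simulation should approximate; the population size, starting frequency and number of runs are arbitrary choices.

```python
import random

def wright_fisher(n, p0, rng, max_gen=100_000):
    """Neutral Wright-Fisher drift in a haploid population of n individuals.

    Returns (final_frequency, generations); final_frequency is 1.0 (fixed)
    or 0.0 (lost) unless max_gen is reached first.
    """
    count = round(n * p0)
    for gen in range(max_gen):
        if count in (0, n):
            return count / n, gen
        p = count / n
        # each of the n offspring inherits a randomly chosen parental allele
        count = sum(rng.random() < p for _ in range(n))
    return count / n, max_gen

rng = random.Random(42)
runs = [wright_fisher(50, 0.2, rng) for _ in range(1000)]
fixed = sum(freq == 1.0 for freq, _ in runs) / len(runs)
print(f"fraction of runs fixed: {fixed:.2f} (neutral expectation 0.20)")
```

Rerunning with a smaller n makes fixation or loss happen in fewer generations, reflecting the stronger drift in small populations.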

Root. The hypothetical ancestral (MRCA) node (and HTU) of a phylogenetic tree.

Rooting. Defining the ancestral point (and HTU) of a phylogenetic tree.


SNP. Single nucleotide polymorphism.

Selection. The evolutionary force that alters allele frequencies (genotypes) based on the changes in fitness (phenotypes) they confer. Selection may be natural or artificial.

Silent Substitution. See synonymous substitution.

Similarity. The observation that two things resemble each other. Without further definition regarding the nature of the similarity, this is a fairly meaningless term.

Speciation. The evolutionary process by which one ancestral species diverges into two distinct descendant species.

Subfunctionalisation. The partitioning of existing protein functions following gene duplication.

Synonymous Substitution. A point mutation in coding DNA that does not result in a change of the encoded amino acid.

Synteny. The physical co-localisation of genes on the same chromosome.


Topology. The branching order of a phylogenetic tree.


UPGMA. A simple distance-matrix based phylogenetic tree construction method that assumes a molecular clock.
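
A compact UPGMA sketch: repeatedly merge the two closest clusters, place the new node at half their distance (the molecular-clock assumption, which makes the tree ultrametric), and update distances as size-weighted averages. The five-OTU matrix is a small illustrative example, not real data.

```python
def upgma(names, d):
    """UPGMA clustering of a symmetric distance matrix (list of lists).

    Returns a rooted, ultrametric tree as a Newick string with branch lengths.
    """
    clusters = [(name, 1, 0.0) for name in names]  # (newick, size, height)
    d = [row[:] for row in d]  # work on a copy
    while len(clusters) > 1:
        n = len(clusters)
        # merge the closest pair of clusters
        i, j = min(((a, b) for a in range(n) for b in range(a + 1, n)),
                   key=lambda p: d[p[0]][p[1]])
        (t_i, s_i, h_i), (t_j, s_j, h_j) = clusters[i], clusters[j]
        h = d[i][j] / 2.0  # clock assumption: the new node sits halfway
        merged = (f"({t_i}:{h - h_i:.3f},{t_j}:{h - h_j:.3f})", s_i + s_j, h)
        keep = [k for k in range(n) if k not in (i, j)]
        # size-weighted average distance from the merged cluster to the rest
        new_row = [(s_i * d[i][k] + s_j * d[j][k]) / (s_i + s_j) for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[x]] for x, a in enumerate(keep)]
        d.append(new_row + [0.0])
        clusters = [clusters[k] for k in keep] + [merged]
    return clusters[0][0] + ";"

names = ["a", "b", "c", "d", "e"]
D = [[0, 17, 21, 31, 23],
     [17, 0, 30, 34, 21],
     [21, 30, 0, 28, 39],
     [31, 34, 28, 0, 43],
     [23, 21, 39, 43, 0]]
print(upgma(names, D))
```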


WGD. See Whole Genome Duplication.

Whole Genome Duplication. A rare evolutionary event that doubles the genetic content of an organism and gives rise to polyploidy.

© RJ Edwards 2012. Last modified 4 Jun 2012.
