The Evolutionary Genomics of Cichlid Fishes: Explosive Speciation and Adaptation in the Postgenomic Era

With more than 1,500 species, cichlid fishes provide textbook examples of recent and diverse adaptive radiations, rapid rates of speciation, and the parallel evolution of adaptive phenotypes among both recently and distantly related lineages. This extraordinary diversity has attracted considerable interest from researchers across several biological disciplines. Their broad phenotypic variation coupled with recent divergence makes cichlids an ideal model system for understanding speciation, adaptation, and phenotypic diversification. Genetic mapping, genome-wide analyses, and genome projects have flourished in the past decade and have added new insights on the question of why there are so many cichlids. These recent findings also show that the sharing of older DNA polymorphisms is extensive and suggest that lineage sorting is incomplete and that adaptive introgression played a role in the African radiation. Here, we review the results of genetic and genomic research on cichlids in the past decade and suggest avenues to further exploit the potential of the cichlid model system to provide a better understanding of the genomics of adaptation and speciation.


INTRODUCTION
Charles Darwin recognized that extinction, adaptation, and speciation are the most fundamental evolutionary outcomes. But only now, more than 150 years after the publication of On the Origin of Species, are evolutionary biologists beginning to understand (in a few cases) the genomics and genetics of adaptation and speciation (22,100,101). We are still far from understanding more generally or being able to predict common evolutionary patterns and processes at the level of the gene or the genome that explain adaptation and speciation.
Although identifying generalities of adaptation and speciation has been a major goal of evolutionary biologists since the inception of the field, until recently, this line of research was limited to a few genetic model systems studied in the laboratory using species whose ecology typically remained unknown. Recent advances in DNA-sequencing technology now enable researchers to study evolutionarily interesting lineages from a genetic and genomic standpoint. Ecological model systems can now become genetically accessible.
Cichlid fish have long attracted the attention of evolutionary biologists because of their extraordinary species richness and phenotypic adaptive diversity (Figure 1). Cichlids are famous for being extremely diverse in terms of not only body shape but also body coloration. This is well captured by the German common name for this family of fishes, Buntbarsche, or "colorful perches." Cichlids present extraordinary cases of evolutionary parallelism in ecological morphologies. Convergent phenotypic evolution is generally interpreted as a strong indication of adaptive evolution. Parallelism has been documented in the main adaptive radiations of both African (1) and Neotropical cichlids (30) and involves traits with a known adaptive function, such as hypertrophic lips (which facilitate foraging for larvae in rocky crevices), pharyngeal jaw morphology (which enables the use of novel food sources), and traits under sexual selection (e.g., body coloration) (Figure 2). Therefore, this unique biological system allows the meaningful investigation of many of the main questions in the field of adaptation genetics (see sidebar Adaptation Genetics).
Many species that differ in traits relevant for adaptation, speciation, and mate choice can be readily hybridized in the laboratory (135). This opens exciting possibilities for analyzing the genetic basis of adaptation and speciation in naturally occurring phenotypes and allows investigators to address some of the fundamental and largely unsettled questions in evolutionary genetics. However, the low level of genetic divergence also poses considerable challenges for phylogenetic reconstruction. Fortunately, genetic and genomic tools have evolved at an unprecedented pace in the past decade, and methods are now available to address both of these questions (see Section 3).

A Natural Mutagenesis Screen
Closely related cichlid radiations under a few million years of age exhibit phenotypes that are often highly diverged (Figure 2). However, because in many cases these species can still be crossed and produce fertile offspring, cichlids have been referred to as "natural mutants" (73). This term was first used (by one of us) for diverse but closely related species of zebrafish, in an article introducing the concept of comparative evolutionary developmental and molecular work within a known phylogenetic context among closely related and interbreeding species (94). The "natural mutant" analogy is meant to emphasize the power of the genetic raw material presented among phenotypically diverse and interfertile cichlid species. But it must also be emphasized that the phenotypic diversity in cichlids is fundamentally different from that of traditional laboratory mutants.
The within-species persistence and among-species fixation of polymorphisms in "natural mutant" lineages is unlikely to depend on mutation pressure; rather, it has resulted from natural and sexual selection. Teleost models have already proven useful in understanding human evolutionary history and disease, particularly with regard to pigmentation (17). However, the molecular mechanisms that underlie phenotypic variation in natural populations are likely to be more complex than those found in mutagenesis screens, which consist largely of missense or nonsense substitutions (151). Therefore, research on wild-occurring and phenotypically diverse teleost model systems such as cichlids offers a complementary route to understanding the genetics of naturally occurring variation in humans, which includes genetic disorders of moderate prevalence (125).

ADAPTATION GENETICS
Do parallel phenotypes evolve through parallel evolution at the molecular genetic level? Is there a bias toward coding or noncoding variation in adaptation? What is the role of genetic constraints in adaptive evolution? Does adaptation proceed by the gradual fixation of small-effect loci (39,109,120)? These are some of the major questions in the field of adaptation genetics, and all of them have a direct bearing on long-standing questions in evolutionary biology. The first three address the fundamental issue of how free evolution is to explore phenotypic space. The last concerns the extent of gradualism in adaptive evolution.
Recent research in quantitative genetics has shown that the underlying genetics or genomic architecture of such traits might be even more complex than was previously appreciated (54). The genetic basis of most quantitative traits investigated so far does not seem to be shared across species (9,39). Epistasis probably plays a much stronger role in the genetic architecture of complex traits than was previously thought, and this will impact the effect size in different genetic backgrounds (54). Given these findings, it is not surprising that most of the regions identified through QTL or association mapping cannot be replicated across studies. Related species can often evolve similar phenotypes using different genetic bases, and distant species can converge using the same genes, thus blurring the distinction between parallelism and convergence (4,30).
Cichlids offer a great opportunity to investigate these questions because multiple adaptive phenotypes have evolved repeatedly. Genetic mapping of the many cases of parallel adaptive traits can address whether these traits involve mutations in the same genes (or chromosomal regions), whether the traits arise through selection of new mutations or are based on standing genetic variation (30), and other fundamental questions in adaptation genetics. For instance, by comparing species pairs of different divergence times that differ in the same adaptive trait, investigators can test the hypothesis that initial adaptation is based on the recruitment of genetic variants of large effect and the subsequent fine-tuning involves variants of smaller effect (109).

Mouthbrooding:
incubation of offspring in a parent's mouth until they are capable of predator avoidance, free swimming, and foraging; the most common form is maternal mouthbrooding

Why Are There So Many Cichlids?
Cichlids figure prominently in virtually all evolutionary textbooks in chapters on adaptation, adaptive radiation, and speciation because this lineage is characterized by one of the fastest known speciation rates (99) and because, owing to their recent divergence times of less than a million to a few million years for the largest radiations (Figure 2), they are well suited for investigations of the initial buildup of reproductive isolation. Several features of cichlid morphology and biology are thought to fuel diversification, including (a) morphological features that affect foraging (e.g., the presence of pharyngeal jaws) (55), (b) morphological traits that affect mate choice and/or adaptation (e.g., nuptial coloration), (c) behavioral innovations (e.g., mouthbrooding and parental care), and (d) genetic or genomic aspects (Figure 3).
Cichlids exploit a broader range of ecological niches than do co-occurring noncichlid fishes, and this ecological diversification is accompanied by the evolution of a suite of morphological specializations (42,81). Some examples involve the evolution of body shapes that allow foraging in open-water habitats, rocky surfaces, or crevices (5,67). In particular, the cichlid lower pharyngeal jaw, with its unique sutured conformation, is seen as a key innovation that facilitates the exploitation of novel resources (55,81).
The diversity in coloration among cichlid fishes is the result of both sexual and natural selection (8,67). Striped patterns are thought to result from ecological pressures related to background matching, foraging, and communication of motivational status (127). Male nuptial coloration is more diverse and also evolves more quickly. Cichlids with mating systems that intensify female mate choice also have higher levels of diversity in male coloration, as is expected when sexual selection plays a primary role (127). Mate choice has a clear relationship with speciation (see Section 2.2), which has led many to propose that sexual selection on nuptial coloration is likely to be one of the main factors driving cichlid diversification (67,146). This is in line with more general observations regarding the variation in diversification rates (22).

Figure 3
Traits and evolutionary innovations thought to influence cichlid success. Most of the speciose cichlid lineages are sexually dimorphic, leading many to view sexual selection as a main factor in cichlid speciation. Morphologies related to diet and foraging performance are also diverse and have evolved multiple times. Morphological specializations include the presence of pharyngeal jaws, the presence of hypertrophic lips, and different body shapes. Behavioral innovations and diverse forms of parental care (biparental care, maternal and biparental mouthbrooding) are also thought to be one reason for this group's evolutionary success. It was recently hypothesized that a number of genomic features might also contribute to cichlid diversification (see Section 3.2). Photographs kindly provided by Oliver Lucanus, Erwin Schraml, Henrik Kusche, Ralf Schneider, and Ad Konings.
Panel labels: Transposon expansion; Gene duplications

The dazzling phenotypic variation in cichlid fishes is thought to have arisen in bursts of diversification (25) triggered by ecological opportunities and environmental changes such as lake level fluctuations (138). Species flocks of up to several hundred closely related species inhabit several large lakes in the East African Rift Valley and small crater lakes in Central America and Africa. The most famous of these are the radiations in Lakes Victoria, Tanganyika, and Malawi in the East African Rift Valley, in which more than 1,000 species live. It is important to note that, in these cases, not all species occur throughout an entire lake; in fact, only very few of the endemic species do. Rather, the overall species diversity of the hundreds of endemic species of these lakes is a summation of local diversity, with most of the species associated with either shallow or benthic environments and typically living in only a small part of a given lake.
Speciation is typically a local phenomenon, in which strict habitat preferences and competition combined with heterogeneous and fragmented habitats lead to barriers to gene flow, and hence to local adaptations and speciation along the long lakeshores. Apart from the cases where speciation is thought to have taken place under in situ sympatric conditions and in the presence of gene flow (6), most radiations are the result of a more complex history involving repeated bouts of allopatric speciation, typically along the lakeshores. In addition to this historical complexity of the geographical setting, it was recently suggested that the young, large East African haplochromine cichlid species flocks experienced a complex biological history of repeated colonization and introgression (59,122).

Scope and Aims of the Present Review
In the decade since the last broad review of cichlid genomics (67), research on topics such as genetic mapping, transcriptomics, and genomics in cichlid fishes has flourished. In particular, the recent completion of five cichlid genome sequences by the Cichlid Genome Consortium (D. we summarize what has been learned in cichlid genomics and genetics and point out areas that have lacked attention so far but seem like they may prove exceptionally fruitful. We also summarize some of the major developments in sequencing technologies and genetic mapping approaches that are applicable to cichlids and will allow closing the gaps, especially the linking of DNA sequence variation with relevant phenotypic variation. First, we outline the major questions that cichlid biology can help to address, with an emphasis on adaptation and speciation.

Then and Now
As is frequently pointed out, Darwin's seminal work says very little about the mechanisms of speciation per se (20). This is because Darwin adopted a nominalist species concept and viewed species as varieties that are different enough to earn a distinctive name (91). The designation of species status was more or less a matter of convenience and convention. Darwin viewed adaptation and speciation to be rather continuous processes that flow into each other. He viewed natural diversifying selection as the force that would both lead to better adaptations and also, eventually, to the origin of new species. Parenthetically, another less appreciated but perhaps more interesting explanation is that Darwin faced difficulty in explaining how natural selection could lead to speciation (139). How could selection favor the spread of the maladaptive traits that are characteristic of species boundaries (e.g., hybrid sterility and unviability)?
By defending the reality of species and establishing the still dominant biological species concept, Mayr, Dobzhansky, and others paved the way for the field of speciation genetics. The research program on the genetics of speciation became straightforward: Understanding the genetics of speciation meant understanding the genetics of isolating mechanisms (142) (see sidebar The Genetics of Speciation).
The major contribution of this research program on speciation has been to elucidate the genetics of postzygotic isolation by identifying general patterns and mechanisms such as Haldane's rule (141) and Bateson-Dobzhansky-Muller incompatibilities (142). Natural selection, as opposed to genetic drift, is viewed as the main force driving the evolution of isolation mechanisms, and most speciation is thought to occur in allopatry (22). Whether this is representative of the entire speciation spectrum has been questioned, as these studies typically focus on species pairs that diverged several million years ago (144). Importantly, loci that presently maintain divergence and isolation did not necessarily contribute to the initial divergence of species. Many loci are expected to be recruited and diversify after isolation is already established and should not be considered genes that cause speciation, i.e., speciation genes (103). Therefore, only studies that focus on populations in the process of undergoing divergence can address the initial phase of speciation (144); however, the drawback of focusing on diverging populations is that many incipient species might never complete the process of speciation (100,128).
In contrast to empirical studies, theoretical research on speciation has always focused on sympatric speciation (11,35,142). The absence of geographic barriers allows more explicit modeling of the interplay of evolutionary processes that lead to speciation. In fact, references to the geographic setting of speciation have clearly declined, and the importance of the distinctions between the geographic settings of speciation has been questioned (37,87).
Although there is consensus that selection is the main factor driving speciation, it does not necessarily follow that adaptive phenotypic evolution drives speciation, because selection does not necessarily imply adaptation to the environment (109). In what has been touted as a return to Darwin (126), the concept of ecological speciation has recently gained prominence in the speciation literature (101). This term describes the situation in which disruptive natural selection plays a direct role in the split of two lineages. In this scenario, speciation is adaptive and mating barriers other than intrinsic postzygotic reproductive isolation play a more central role. The emphasis therefore shifts from the genetics of intrinsic postzygotic reproductive isolation to that of extrinsic (ecological) postzygotic and prezygotic reproductive isolation.

THE GENETICS OF SPECIATION
Work on model organisms, especially Drosophila, has uncovered multiple loci that contribute to intrinsic postzygotic genetic isolation (22,100). Laboratory experiments have shown that postzygotic isolation can evolve rather easily under a divergent selection regime and that even sympatric speciation can be observed in the laboratory (118). Furthermore, sex chromosomes have been proposed to be disproportionately important to the evolution of reproductive isolation (112), an idea that is supported by the very existence of Haldane's rule (112,141). The Bateson-Dobzhansky-Muller incompatibility model has been extremely successful in explaining the origin of postzygotic incompatibilities in allopatry (107). A pattern that emerged from work on Drosophila is that the degree of isolation correlates with genetic distance (21,22). In other words, intrinsic postzygotic isolation is a mere consequence of isolation.

The Genetic Architecture of Speciation with Gene Flow
Theoretical models of speciation in the presence of gene flow almost invariably (but see 69) suggest that two processes, disruptive natural selection and nonrandom mating, must operate in order for two populations to diverge with gene flow, and that linkage between them facilitates speciation (11,101) (see sidebar Mechanisms of Nonrandom Mating). Thus, speciation with gene flow requires three traits: the trait under selection, mating preference, and the cue for mating preference (the trait used to choose mates, such as color) (101). Because speciation can be understood as a lack of recombination between the loci underlying these traits, the genetic and genomic architecture of the traits is of central importance in determining the stringency of the conditions under which two populations are expected to diverge (131) (Figure 4).
Several genetic and ecological factors can lead to linkage disequilibrium between these loci (101). Ecological factors could occur in the form of selection against hybrids. Genetic mechanisms involve pleiotropy, close genetic distance resulting from physical proximity, or colocalization within recombination cold spots (34). Figure 4 illustrates these possible pleiotropic relationships.
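The linkage disequilibrium mentioned above can be made concrete with a small calculation. The following is a minimal sketch, assuming two biallelic loci (for example, a hypothetical mating-cue locus and a hypothetical locus under ecological selection) and known haplotype and allele frequencies. The function name and parameters are illustrative and are not taken from the studies cited here. It computes the standard coefficient D = p_AB - p_A * p_B and the squared correlation r^2.

```python
def linkage_disequilibrium(p_ab: float, p_a: float, p_b: float) -> tuple[float, float]:
    """Return (D, r_squared) for two biallelic loci.

    p_ab: frequency of haplotypes carrying allele A at locus 1 and allele B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    All names are hypothetical and used for illustration only.
    """
    # D measures the departure of the haplotype frequency from the value
    # expected if alleles at the two loci were associated at random.
    d = p_ab - p_a * p_b
    # r^2 rescales D^2 by the allele-frequency variances at both loci.
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    r_squared = (d * d) / denom if denom > 0 else 0.0
    return d, r_squared


# Random association: no disequilibrium.
print(linkage_disequilibrium(0.25, 0.5, 0.5))  # (0.0, 0.0)
# Complete association: A always travels with B.
print(linkage_disequilibrium(0.5, 0.5, 0.5))   # (0.25, 1.0)
```

Under this sketch, pleiotropy or tight physical linkage would correspond to values of r^2 that stay near 1 across generations, whereas free recombination would erode D toward 0.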

MECHANISMS OF NONRANDOM MATING
Nonrandom mating can arise through sexual selection or assortative mating. Assortative mating is defined as mate choice based on an individual's own phenotype/genotype (155). Positive assortative mating is a bias toward mating with a like partner; negative assortative or disassortative mating is a bias toward mating with an unlike partner. Assortative mating is a powerful mechanism for the maintenance of divergence in sympatric populations (140) and is also seen as central to the building of divergence in ecological speciation (89,101). Despite affecting primarily genotypic frequencies at the loci that cause nonrandom mating, assortative mating does lead to genetic divergence in neighboring linked regions (140). The spread of such a genomic island of differentiation can have consequences for sympatric speciation, as it can lead to genetic hitchhiking (145). Darwin (23) attributed the evolution of structures that do not aid survival, and many times impair the individuals presenting them, to sexual selection. Apart from a few exceptions, females are the choosy sex. Females are by definition the sex that contributes larger gametes, and sexual selection is the necessary outcome of unequal gamete contribution (152). Anisogamy (unequal gamete size) is a consequence of natural selection acting at the individual level (10). Although the existence of more than two gamete sizes (i.e., sexes) is possible, it is evolutionarily unstable (53).
Under a classical sexual selection model, in which females choose males, there are two separate, sex-limited traits with at least two alleles each (3). Because both males and females possess the marker trait in assortative mating, it can conform to a one-allele model, the most conducive to speciation (35). This can be understood as a "mate with like" allele, and if this allele is fixed, it is essentially a one-locus model of nonrandom mating where only variation for the cue segregates. Consider, for instance, a case of two color morphs, say, gray and orange. A one-allele preference gene ("mate with like") would result in gray-gray and orange-orange morph matings regardless of the background. Assortative mating alone can allow for speciation even in the absence of a source of selection, but the conditions are highly restrictive (69).
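The gray and orange example above can be sketched numerically. This is an illustrative toy calculation, not a model from the cited literature: it assumes that a fraction `assortment` of matings follow the "mate with like" rule and the remainder occur at random. The parameter `assortment` and the function name are assumptions introduced here.

```python
def pair_frequencies(p_gray: float, assortment: float) -> dict[str, float]:
    """Expected mating-pair frequencies for two color morphs.

    p_gray:     frequency of the gray morph (orange is 1 - p_gray)
    assortment: hypothetical fraction of matings that are strictly
                "mate with like"; the rest are random with respect to morph
    """
    q_orange = 1.0 - p_gray
    random_part = 1.0 - assortment
    return {
        # Like-with-like pairs arise from assortative matings plus chance.
        "gray-gray": assortment * p_gray + random_part * p_gray ** 2,
        "orange-orange": assortment * q_orange + random_part * q_orange ** 2,
        # Mixed pairs can only arise from the random fraction of matings.
        "gray-orange": random_part * 2.0 * p_gray * q_orange,
    }


# With a fixed "mate with like" allele (assortment = 1), mixed pairs vanish
# whatever the morph frequencies are.
print(pair_frequencies(p_gray=0.3, assortment=1.0))
# With no assortment, pairs follow random-mating proportions.
print(pair_frequencies(p_gray=0.5, assortment=0.0))
```

In this sketch only the cue (morph) varies, which mirrors the one-locus reading of the one-allele model described above.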

More empirical work is needed to assess the relative importance of pleiotropy in building and maintaining reproductive isolation, and this work will be fundamental to refine the current theory of speciation.

Recombination landscape:
the variation in recombination rates across the genome, usually expressed as the ratio of genetic to physical distance (centimorgans per megabase)
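The ratio of genetic to physical distance that defines the recombination landscape can be computed per marker interval. The following is a minimal sketch, assuming markers that have been placed on both a linkage map (centimorgans) and a physical assembly (base pairs). The input format and the function name are assumptions made for illustration.

```python
def recombination_rates(markers: list[tuple[int, float]]) -> list[float]:
    """Return the recombination rate (cM/Mb) for each interval between adjacent markers.

    markers: (physical_position_bp, genetic_position_cM) pairs,
             sorted by physical position along one chromosome
    """
    rates = []
    for (bp_left, cm_left), (bp_right, cm_right) in zip(markers, markers[1:]):
        interval_mb = (bp_right - bp_left) / 1_000_000
        if interval_mb <= 0:
            raise ValueError("markers must be sorted by strictly increasing physical position")
        # Genetic distance divided by physical distance for this interval.
        rates.append((cm_right - cm_left) / interval_mb)
    return rates


# Three hypothetical markers: the first interval recombines four times
# as often per megabase as the second.
example = [(0, 0.0), (1_000_000, 2.0), (3_000_000, 3.0)]
print(recombination_rates(example))  # [2.0, 0.5]
```

Intervals with rates near zero would correspond to the recombination cold spots discussed in the text, such as regions inside inversions in heterozygotes.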

Open Questions in Speciation Research
Theoretical work has already pinpointed the conditions under which sympatric speciation is expected to occur (11,37,38). However, few if any empirical case studies satisfy all the conditions to test these predictions (101). Although much progress has been made in speciation genetics, the number of speciation genes identified is still modest (100,101,103).
The theoretical predictions laid out in the caption for Figure 4 regarding the pleiotropic relation between the alleles responsible for speciation can be tested only by forward genetic studies of mating cues and preferences in systems where sympatric diversification is occurring. It is crucial for the success of these studies that (a) the model system used is currently undergoing speciation with gene flow and (b) the trait conferring reproductive isolation is amenable to genetic analysis. Cichlids are one of the few groups that satisfy these two criteria.

Physical and Genetic Maps
Chromosomal rearrangements such as inversions affect the recombination landscape and consequently the evolution of sex chromosomes and the maintenance of coadapted genes in the face of gene flow and speciation (see sidebar Chromosome Evolution and Speciation). In cichlids, chromosome numbers are remarkably stable, and differentiated sex chromosomes are largely absent (111,117). The modal chromosome number is 2n = 44 in the African cichlids and 2n = 48 in the Neotropical sister lineage to the African cichlids (111). However, despite the stability of chromosome number, several large-scale chromosomal rearrangements have been documented. Multiple chromosomal rearrangements, including fusions and inversions, have occurred since the divergence of two haplochromine species, Haplochromis chilotes and Astatotilapia burtoni (74). Many populations segregate for various inversions within the Neotropical genus Geophagus (110). An analysis of paired-end genomic sequences detected several thousand rearrangements among the five cichlid genomes sequenced by the Cichlid Genome Consortium (S.H. Fan, manuscript in preparation). Whether any segregating inversion includes genes related to adaptation and speciation remains to be investigated.
As is the case with most teleosts (26), cichlids are characterized by the general absence of heteromorphic sex chromosomes and evolutionarily labile sex-determining mechanisms; note that the absence of morphologically differentiated sex chromosomes does not imply the absence of sex chromosomes altogether. Ser et al. (129) determined that alternative ZW and XY systems occur in some Lake Malawi cichlids. The direct observation of chromosome pairing and synaptonemal complex formation in tilapia meiosis showed the existence of a small distal region of incomplete/delayed pairing, which might indicate a sex-determining function (40). Further work on genetic mapping revealed evidence of an XY system controlled by a major gene on linkage group 1 (76).

Figure 4
The genetic architecture of sympatric speciation. Overlapping areas represent pleiotropy between traits. White roman numerals label each area, and black arabic numbers within each area indicate the minimum number of loci that underlie the three traits (mating preference, cue for nonrandom mating, and target for natural selection). The association between the three traits is most likely to be eroded by recombination when all three traits are governed by different loci (areas I, V, and VII). An obvious prediction is that extensive pleiotropy (area III) between alleles in all three loci should be more conducive to speciation (134). Most theoretical models postulate the existence of marker traits (traits that are cues for mating preference) and mating preference traits with different genetic bases (101), represented by all areas except II and III. Much theoretical work has investigated how the linkage between markers and preferences evolves and impacts isolation (130), but the possibility that marker and preference traits are pleiotropic (areas II and III) has only recently gained empirical support (43,71,132). "Magic traits" normally refer to pleiotropy between the mating cue and the ecological trait under selection that separates groups of individuals during incipient speciation (areas III and VI), but pleiotropy between mating preference and the trait under selection can also be relevant for speciation (131).

CHROMOSOME EVOLUTION AND SPECIATION
The patterns of the spread of differentiation across a genome have attracted considerable interest recently (61,62,101). Structural variations can contribute to speciation either directly or indirectly. The influence is direct when hybrids heterozygous for structural variations have reduced fitness resulting from abnormal meiotic segregation (i.e., the rearrangement is underdominant). Several factors, including sex-linkage, positional (centromeres, telomeres), and structural variations (e.g., inversions and rearrangement breakpoints), influence the recombination landscape. The genomic landscape of recombination rates (usually measured in centimorgans per megabase) has a major impact on the spread of islands of differentiation (97).
That the spread of underdominant rearrangements poses considerable theoretical difficulties has long been recognized (142). The indirect consequences of structural variations are far less controversial, and several theoretical studies have investigated the factors that can lead to the spread of inversions (16,63). Numerous empirical studies, some of which combined genetic mapping of speciation traits and population divergence in diverse systems, including flies (72,93), sticklebacks (57), monkey flowers (36), and humans (98), strongly support the idea that chromosomal inversions can lead to accelerated divergence in the presence of gene flow. A particularly clear observation is that sex chromosomes contribute disproportionately to speciation (112); these contributions are likely to reflect the distinctive properties of sex chromosomes with regard to recombination rates. It has been pointed out that although structural variations facilitate genomic divergence, they are not necessary for the buildup to reproductive isolation, particularly in the presence of strong ecological selection (34).

The Cichlid Genome Project
Five cichlid genomes were recently sequenced in an effort to investigate whether particular features of cichlid genomes can be correlated or even causally linked to their explosive rates of diversification. The project was initiated by the sequencing of an inbred specimen of tilapia, Oreochromis niloticus. The additional species were chosen to represent the major African radiations as well as riverine species that are commonly used in research on the hyperdiverse cichlids. Neolamprologus brichardi, an endemic species from Lake Tanganyika, was chosen as the representative from the tribe Lamprologini, the most species-rich lineage from Lake Tanganyika. Astatotilapia burtoni is a riverine haplochromine cichlid that has a long history of research on its behavior (45) and genomics (75,124). Metriaclima zebra and Haplochromis nyererei are the representatives of the Lake Malawi and Lake Victoria radiations, respectively (Figure 3). The tilapia genomic scaffolds were anchored to linkage groups (D. Brawand, C. Wagner, I.Y. Li, M. Malinsky, S.H. Fan, et al., manuscript in preparation). Substantial differences in many genomic functions were reported, including in gene duplication, copy-number variants, microRNA evolution, and transposable elements (TEs), all of which were found to be enriched in the haplochromine lineage.
Gene duplication is seen as a major process that allows for the evolution of phenotypic diversity-or, to quote Ohno, "natural selection merely modified while redundancy created" (95). Because the original gene function is maintained by one of the gene copies, duplicated copies can break pleiotropic constraints and are freer to evolve new functions (neofunctionalization) or specialize in particular functions (subfunctionalization) (70). Naturally, there are cases where the increase in gene expression (dosage) that is associated with a recent duplication event will be deleterious and can lead to the selective pseudogenization of one of the paralogous copies (70). Differential retention of color genes, for example, is considered one of the explanations for the diversity in coloration seen in fishes (12). Whether duplications contributed to the haplochromine cichlid radiations has been investigated recently using in silico methods (D. Brawand, C. Wagner, I.Y. Li, M. Malinsky, S.H. Fan, et al., manuscript in preparation). The haplochromines investigated so far did indeed have more duplicated genes than the ancestral riverine species they were compared to. However, the differences in copy-number variants seem to be relatively modest, and in the absence of direct functional evidence, it seems unclear whether this process was one of the major driving forces of the diversification of the cichlid radiations.
Analysis of the cichlid genome sequences also revealed that haplochromines underwent an expansion of TEs (D. Brawand, C. Wagner, I.Y. Li, M. Malinsky, S.H. Fan, et al., manuscript in preparation).
The cichlid genome project has identified many genomic processes that might contribute to cichlid diversification. However, the explanatory power of these findings is limited owing to the multitude of factors identified and the correlative nature of the data. The impacts of each of these particular mechanisms must be further investigated with functional genomic methods, which we review in the next section.

Genetic Mapping and Forward Genetic Techniques
The past decade has seen a boom of forward genetic studies in cichlids. Multiple linkage maps have been constructed (68,77,113,124), and traits such as sex, coloration, and ecomorphologies have been mapped (2,13,18,137).
The link between phenotypic and DNA sequence variation is more readily established for traits with a simple genetic basis. Examples of traits determined by a single locus in cichlids include the orange blotch phenotype (119), the Midas cichlid gold polymorphism (50), and dorsolateral stripes in Haplochromis sauvagei "rock kribensis" (F. Henning, H. Lee, P. Franchini & A. Meyer, manuscript in preparation). However, most adaptive phenotypes in the wild are quantitative and thought to have a polygenic basis. Considering the sheer number of loci expected to contribute to polygenic traits, Rockman (120) has questioned whether identifying particular nucleotide substitutions has any bearing on understanding adaptive evolution. Analyzing the complexity of the genetic architecture and the role of epistatic or pleiotropic interactions is arguably more relevant for understanding adaptive evolution than identifying individual small-effect loci, most of which are expected to be common in populations (156). Therefore, the process of combining favorable alleles through recombination and selection, and not the alleles themselves, is the "stuff" of evolution.
F2 generation: filial 2 generation, that is, the second generation obtained by intercrossing F1 offspring (filial 1 generation) of a cross of the P generation (parental generation)
Infinitesimal model: a statistical-genetic model that assumes that the genetic architecture of complex phenotypes can be treated as an infinite number of unlinked loci with small additive effects
Morphological traits such as feeding morphology follow the prevalent model of many genes of small effect. As many as 50 factors contribute to cichlid jaw morphology (1), and quantitative trait locus (QTL) mapping identified more than 40 regions in 17 linkage groups that were associated with this trait (1). As expected for a trait under selection (108), the individual QTL effects were in the same direction more frequently than would be expected by chance, indicating that natural selection played a role in the phenotypic divergence of Labeotropheus fuelleborni and Metriaclima zebra (1).
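The direction-of-effect argument above can be illustrated with a simple binomial sign test. This is only a minimal sketch: it assumes that, without directional selection, each QTL is equally likely to act in either direction, and it ignores the distribution of effect sizes, so it is a simplification of the kind of test referred to in the text. The function name and the counts in the example are hypothetical and are not taken from (1).

```python
from math import comb

def sign_test_p(n_same_direction: int, n_total: int) -> float:
    """One-sided binomial tail P(X >= n_same_direction), X ~ Binomial(n_total, 0.5).

    Assumption: under the null, each QTL effect is equally likely to point
    in either direction, independently of the other QTLs.
    """
    if not 0 <= n_same_direction <= n_total:
        raise ValueError("n_same_direction must lie between 0 and n_total")
    tail = sum(comb(n_total, k) for k in range(n_same_direction, n_total + 1))
    return tail / 2 ** n_total

# Hypothetical example: 9 of 10 detected QTLs shift the trait in the same direction.
p_value = sign_test_p(9, 10)
```

A small tail probability indicates that the observed excess of same-direction effects would be unlikely under this simplified null model.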
Another class of methods to investigate the genetic architecture of quantitative traits involves the analysis of segregation patterns of genetic crosses between species differing in the trait of interest. This design has been employed to investigate the number of genes and general genetic architecture of several coloration traits. Most of these studies suggest that the traits that are likely to have an impact on speciation and adaptation can have relatively simple genetic bases. In particular, coloration traits in Lake Malawi cichlids may be controlled by only a few loci with large effect. Barson et al. (8) found that the body coloration differences between Metriaclima zebra (blue body with dark vertical bars and fins) and Metriaclima "gold" (yellow body with dark bars) can be accounted for by as few as seven autosomal loci. The divergent color patterns of Metriaclima zebra and Metriaclima mbenji (pale blue body with no vertical bars and yellow fins) were found to be controlled by no more than four loci, and the different traits analyzed were highly correlated in the F2 generation (105). This second observation suggests that these effects are pleiotropic or that the genes underlying the different traits are linked. On the other end of visual communication, the loci controlling expression differences (expression QTLs, or eQTLs) of opsin genes were also found to be few in number (13,106).
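One classical way to turn segregation patterns into a rough count of loci is the Castle-Wright estimator, which compares the squared difference between the parental means with the segregation variance that appears in the F2 generation. The sketch below shows its simplest form. It assumes unlinked loci with equal additive effects and parental lines fixed for alternative alleles, and it is not claimed to be the exact procedure used in the studies cited above. The function name and all numbers are hypothetical.

```python
def castle_wright_loci(mean_p1: float, mean_p2: float,
                       var_f1: float, var_f2: float) -> float:
    """Simplest Castle-Wright estimate of the effective number of loci.

    n_e = (mean_P1 - mean_P2)^2 / (8 * (Var_F2 - Var_F1))

    Assumptions: equal additive effects, no linkage, no epistasis, and
    Var_F1 used as the estimate of the environmental variance.
    """
    segregation_variance = var_f2 - var_f1
    if segregation_variance <= 0:
        raise ValueError("F2 variance must exceed F1 variance")
    return (mean_p1 - mean_p2) ** 2 / (8 * segregation_variance)

# Hypothetical trait scores: parental means 10 and 2, Var(F1) = 1, Var(F2) = 3.
estimated_loci = castle_wright_loci(10.0, 2.0, 1.0, 3.0)
```

Because the assumptions are rarely met in full, estimates of this kind are usually read as lower bounds on the number of loci.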
Taken together, the results obtained in genetic mapping in cichlids thus far suggest that many traits related to the rapid radiation of cichlid fishes might have simple genetic bases, primarily from the contribution of major-effect loci. These findings contrast with the expectation from the infinitesimal model (31) and, if upheld by future studies with higher resolution, might also help explain cichlid diversity. Owing to its homogenizing effects, recombination counteracts speciation. It could therefore be expected that traits controlled by fewer loci might be more conducive to speciation. It is tempting to think that this finding could also provide evidence for the notion that larger-effect loci are more important in the initial phases of adaptation and smaller-effect loci are recruited afterward for fine-tuning (109). However, several factors, most notably experimental artifacts (e.g., the Beavis effect or lack of power), are likely to lead to the incorrect estimation of effect sizes (85,133).
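The Beavis effect mentioned above can be made concrete with a small simulation: when only significant loci are reported, their estimated effects are biased upward, and the bias is strongest in small mapping panels. The sketch below is an illustration under simplifying assumptions (two genotype classes, known unit residual variance, a single tested locus, and a fixed z threshold); the function name, panel sizes, and effect size are hypothetical.

```python
import random

def mean_significant_effect(true_effect: float, n_per_class: int,
                            reps: int = 500, z_crit: float = 1.96,
                            seed: int = 1) -> float:
    """Average |estimated effect| among replicates that pass the threshold.

    Phenotypes are N(0, 1) for one genotype class and N(true_effect, 1) for
    the other (a backcross-like design; an assumption for illustration).
    """
    rng = random.Random(seed)
    # Standard error of the difference between two class means (variance 1).
    se = (2.0 / n_per_class) ** 0.5
    kept = []
    for _ in range(reps):
        class0 = [rng.gauss(0.0, 1.0) for _ in range(n_per_class)]
        class1 = [rng.gauss(true_effect, 1.0) for _ in range(n_per_class)]
        estimate = sum(class1) / n_per_class - sum(class0) / n_per_class
        if abs(estimate) / se > z_crit:  # only "significant" QTLs are reported
            kept.append(abs(estimate))
    return sum(kept) / len(kept) if kept else float("nan")

# Same true effect (0.3 residual SD), small versus large mapping panel.
small_panel = mean_significant_effect(0.3, n_per_class=25)
large_panel = mean_significant_effect(0.3, n_per_class=500)
```

In the small panel, every replicate that passes the threshold must have an estimate well above the true value, whereas in the large panel nearly all replicates are significant and the average stays close to the true effect.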
In general, genetic mapping studies in cichlids have lacked the power to reach the gene or nucleotide level. Mapping to the gene level has been successful in one case (119,137), but no study has yet succeeded in mapping to the sequence level and then performing functional validation. If populations segregating for a trait of interest are available, then association or admixture mapping can be used to refine the candidate interval and identify causal variants (136). Following the identification of a candidate interval (using recombinant breakpoint analysis) and an expression difference in one positional candidate (using quantitative real-time polymerase chain reaction), this approach was used to map the orange blotch phenotype to a cis-regulatory element of the Pax7 gene (119).
Several recently developed methods are poised to accelerate the pace of insights provided by forward genetics. Reduced representation libraries, in particular so-called restriction-site-associated DNA sequencing (RAD-seq) (24), allow the discovery and genotyping of thousands of polymorphic loci and can be used to map traits to marker intervals. This method has been applied in a few studies in cichlids for genetic mapping (104,106,113). The number of markers obtained is clearly no longer the limiting factor for mapping resolution. In fact, the current RAD-seq-based maps are saturated and have many more markers than can be resolved (104,113). The only way to make full use of so many markers is to observe more recombination events by increasing the size of the genetic mapping panels. An additional benefit is that this will also increase the power of QTL mapping studies. Having too many markers is a rather luxurious problem that allows and actually demands the application of more stringent quality control to improve the mapping quality. RAD-seq data are known to generate many forms of genotyping errors (some systematically) and missing genotypes (149). Markers affected by such errors can be identified by several features implemented in the most commonly used linkage mapping software, particularly by measures of segregation distortion, and should be excluded (143).
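A segregation-distortion filter of the kind described above can be sketched as a chi-square goodness-of-fit test against the 1:2:1 genotype ratio expected for a codominant marker in an F2 intercross. This is a minimal illustration and not the procedure of any particular linkage mapping package; the function names, the significance threshold, and the genotype counts are hypothetical, and missing genotypes, dominant markers, and sex-linked markers are not handled.

```python
from math import exp

def f2_distortion_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P-value of a chi-square test of F2 genotype counts against 1:2:1."""
    total = n_aa + n_ab + n_bb
    if total == 0:
        raise ValueError("no genotyped individuals")
    expected = (0.25 * total, 0.5 * total, 0.25 * total)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 2 degrees of freedom the chi-square survival function is exp(-x / 2).
    return exp(-chi2 / 2.0)

def keep_marker(counts: tuple, alpha: float = 0.001) -> bool:
    """Retain a marker unless its distortion is significant at alpha (assumed cutoff)."""
    return f2_distortion_p(*counts) >= alpha

# Hypothetical genotype counts (AA, AB, BB) for two markers.
well_behaved = keep_marker((25, 50, 25))
distorted = keep_marker((50, 40, 10))
```

In practice the threshold would need to account for the number of markers tested, and genuinely distorted regions (for example, near loci under selection in the cross) may be of biological interest and should not be discarded without inspection.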
Target enrichment is a particularly attractive approach because it allows for simultaneous marker discovery, fine mapping, and finally causal polymorphism identification (27,82). Methods for functional validation have been employed with success in African cichlids (60). The CRISPR/Cas9-based method is a highly flexible targeted mutagenesis technique (56) that offers exciting prospects for the functional validation of genetic mapping results.

Transcriptomics and Expression
The detection of expression differences in relevant cichlid phenotypes using microarrays (46,116), comparative hybridization (115), or RNA sequencing (RNA-seq) is one promising experimental approach (19,49,88). In particular, the recent development of RNA-seq methodologies has greatly facilitated the identification of differentially expressed genes in nonmodel organisms (148). RNA-seq has already been applied to cichlids to investigate the molecular bases of a color polymorphism (49), ecomorphological traits (19,88), and even phenotypic plasticity (47), a particularly interesting and promising avenue of research on the causes of cichlid diversification.
The diversity of transcripts has been described in several cichlid species. This includes annotated expressed sequence tags and open reading frames from African cichlids (65,123,150) as well as the transcriptomes of Neotropical cichlids (29,32). The general pattern that emerged from these studies is of very limited coding sequence divergence of intralacustrine species assemblages. This result is not unexpected, however, because many of these species are extremely young (some species pairs are thought to have diverged less than 100,000 years ago), and selection on the exons of protein-coding genes might be restricted to a subset of genes having to do with coloration, perception of color, or fertilization (44).
These analyses provide tools for investigating the molecular patterns of trait development (ontogeny) but can be of limited use for understanding the evolution of the trait (phylogeny). Although in principle causal loci can also be identified by expression patterns, the results are difficult to interpret in the absence of genetic linkage data. Unless the genome is reduced to tens of positional candidate genes, fold change cannot be equated with causality. Furthermore, the fold change of a causal gene is not necessarily higher than that of its downstream targets. Causal loci may or may not be included in the list of differentially expressed genes for several reasons. There is an unsettled debate about whether most adaptive mutations are regulatory or coding (14,52), but clearly both play a role. Even when the molecular aspects of the phenotype are well known, predicting which will be the case is not trivial.
Sequencing-based approaches for the study of expression, such as RNA-seq, allow the simultaneous characterization of expression level and sequence comparison; in conjunction with forward genetics, these approaches provide a powerful tool for mapping causative variants (96). If the causative variant is coding, then one would expect to find a sequence polymorphism in the coding sequence of the positional candidate genes when comparing the phenotypes. If it is regulatory, then one would expect different expression levels or splicing patterns (provided that the temporal and spatial patterns of expression of the phenotype are known).
Reed et al. (114) used a clever approach that combined forward genetics (both family-based and admixture mapping) and next-generation sequencing expression analysis (tiled RNA-seq) to analyze Heliconius butterflies. The region that determines the wing color patterns was mapped to a 380-kb genomic interval containing more than 20 genes. The authors then investigated the expression levels of all the positional candidates simultaneously using a targeted capture approach and found a single gene in the interval that was differentially expressed between phenotypes. No coding sequence variation could be found, and the upstream region of optix was significantly associated with the phenotype in a natural admixture zone. Therefore, the authors concluded that a cis-regulatory polymorphism affecting optix expression determines this phenotypic difference. Now that cichlid genome sequences are becoming available, even more cost- and time-effective methods combining bulk segregant analysis with RNA-seq will be applicable for gene mapping in cichlids as well, and these promise to significantly advance our understanding of the genetic/genomic basis of phenotypic diversification, adaptation, speciation, and species differences (51,96).

Phylogenomics
The fields of phylogenetics and population genetics have converged thanks to theoretical and methodological advances (15). Conceptually, a shift from merely describing the patterns to investigating the process of divergence is notable. How divergence spreads across the genome during speciation in the face of gene flow has received considerable attention recently (33,102,145).
With the goal of classification, systematists trying to estimate species trees largely thought of gene-tree incongruences as a nuisance (48). Those more interested in understanding the process of speciation than in classifying the tips of a phylogenetic tree consider the more interesting and relevant question to be how genomes sort out in lineages. Individual gene trees might not recapitulate the species trees in which they are embedded for a variety of reasons (86), including incomplete lineage sorting, hybridization, introgression, and even mutational bias. Other processes that generate incongruent signals even within a single gene are exon shuffling, gene conversion, and intragenic recombination (when multiple haplotypes are segregating in a population). Particularly in young radiations, numerous gene trees will not conform to the estimated species tree (Figure 5). The actual phylogeny probably resembles a cloud of gene trees (86), and this complexity is not captured by the traditional dichotomous approach to phylogenetic reconstruction. The explicit modeling of factors causing incongruence might shed light on processes that are of interest to understanding the evolutionary dynamics of cichlids. Although this has been pointed out for nearly 20 years (86) and touted as a paradigm shift (28), the explicit modeling of gene and species trees has been applied only recently to research on the cichlid radiations.
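Why discordance is expected to be so common in young radiations can be seen from the standard three-taxon result under the neutral coalescent. With an internal species-tree branch of length T (in coalescent units, that is, generations divided by 2Ne for a diploid population), the gene tree matches the species tree with probability 1 - (2/3)exp(-T), and each of the two discordant topologies has probability (1/3)exp(-T). The sketch below simply evaluates these expressions; it assumes incomplete lineage sorting is the only source of discordance (no introgression or selection), and the function name and branch lengths are hypothetical.

```python
from math import exp

def three_taxon_gene_tree_probs(t_internal: float) -> dict:
    """Gene-tree topology probabilities for a three-species tree.

    t_internal: internal branch length in coalescent units
    (generations / (2 * Ne) for diploids). Assumes a neutral coalescent
    with incomplete lineage sorting as the only cause of discordance.
    """
    if t_internal < 0:
        raise ValueError("branch length must be non-negative")
    discordant_each = exp(-t_internal) / 3.0
    return {
        "concordant": 1.0 - 2.0 * discordant_each,
        "discordant_each": discordant_each,
    }

# Hypothetical internal branches: very short (rapid radiation) versus long.
rapid = three_taxon_gene_tree_probs(0.1)
slow = three_taxon_gene_tree_probs(3.0)
```

As T approaches zero, all three topologies approach a frequency of one third. With three taxa the concordant topology is never the least frequent; cases in which the most frequent gene tree disagrees with the species tree require four or more taxa.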
In the absence of assembled genomes, studies investigating the patterns of genomic divergence in cichlids have been limited to genome-wide scans for differentiation outliers (61), and the lack of QTL maps of relevant phenotypes is still a major impediment to assigning differentiation outliers to particular phenotypes. The accumulation of results from genetic mapping studies will also provide a means to assign these regions to particular phenotypes, as has been done in sticklebacks (57).
Figure 5. Gene trees in species trees. (a) Sources of discordance. A gene lineage that is concordant with the species tree is shown in green. The main factors that promote incongruences between gene trees and species trees, particularly in young radiations, are incomplete lineage sorting (blue) and introgression (dark yellow). (b) Functional phylogenomics. Genes that are causally involved in the initial phases of speciation are expected to be among the first to diverge between incipient species. By contrasting gene trees of functional categories or genes suspected to influence speciation and adaptation against the species tree (inferred through hundreds of randomly located loci, represented as a green gene cloud), it might be possible to test for a role in speciation and whether adaptation occurred via introgression.
The studies that have sought to either estimate the occurrence of or explicitly model incongruences between gene trees and species trees have all found that discordance is widespread in cichlids. The genome-wide phylogenetic reconstruction of the five cichlid genomes suggests that only approximately half of the genome is completely sorted within the haplochromine lineage (D. Brawand, C. Wagner, I.Y. Li, M. Malinsky, S.H. Fan, et al., manuscript in preparation). Comparing the observed coalescence estimates with those based on simulations can distinguish incomplete lineage sorting from hybridization, and this approach has been applied in several studies in cichlids. In the Amazonian genus Cichla, mitochondrial DNA is likely to have been introgressed (153), whereas in the Neotropical genus Satanoperca, the patterns of discordance are more likely due to incomplete lineage sorting (154).
A priori, the expectation is that, in the vast majority of cases, hybridization will cause negative effects, interrupt coadapted gene complexes, and produce inviable, less fit, or sterile offspring (90). However, hybridization is sometimes seen as a mechanism by which adaptive alleles can be exchanged in the process known as adaptive introgression (80). Joyce et al. (59) proposed that in the African radiations, riverine species act as transporters of genetic variation between the lake radiations, and this idea was subsequently confirmed in a panel of 200 single-nucleotide polymorphisms ascertained from Lake Malawi (83). The authors of the latter study even found that a large proportion of the polymorphic sites of the haplochromine cichlids of Lake Malawi are shared with species of other radiations. In particular, 8% of the single-nucleotide polymorphisms were considered differentiation outliers, and half of these are also present in other radiations. This finding led the authors to ask whether many of the cases of interlacustrine evolutionary convergence actually reflect deep molecular parallelism acting on genetic variation that is carried into the radiations by the riverine ancestors of the endemic cichlids. The importance of standing variation is also becoming more apparent as a source of repeated evolution in many cases of parallel evolution (30).
Newly developed methods using next-generation sequencing have revolutionized the field and allowed the use of numbers of loci orders of magnitude larger than can be used in Sanger sequencing-based approaches (79). A particular method that is bound to revolutionize the field is what has been called anchored phylogenomics. This method relies on hybrid capture followed by next-generation sequencing of hundreds of loci and can efficiently generate well-resolved gene trees (78,92). Although reduced representation libraries of the genome, such as RAD-seq, generate thousands of informative polymorphisms, they are not applicable to deep divergences and are not comparable across studies (i.e., the loci will not be shared across lineages) (58). Other approaches, such as amplicon or RNA sequencing, are relatively inefficient in terms of the number of loci, labor intensity, or quality of input material (79).
RAD-seq markers and mixture-tree approaches (79) have been applied to a sample of the Lake Victoria radiation and resulted in a highly resolved and well-supported phylogeny (147). These results are encouraging and show that the current methods have sufficient resolution for the cichlid radiations, but they should be met with caution. Concatenation resembles a democratic vote, and under certain conditions (likely to be met in some cichlid radiations), the most frequent gene tree is not necessarily concordant with the species tree (64,121).
Phylogenomics (based on hundreds of genes sampled throughout the genome) in cichlids is an exciting research avenue for several reasons. First, the sheer number of loci that can be obtained by this method promises to finally resolve even the shallowest divergences. Second, the factors that promote incongruences between gene and species trees, such as introgression and incomplete lineage sorting, are expected to be particularly prevalent on shallow phylogenetic timescales (64). Third, a particularly exciting prospect is what could be called functional phylogenomics: the contrast between gene trees from numerous random loci scattered throughout the genome and those of particular genes or functional categories. Could one also see the patterns of recruitment and genomic differentiation as gene-tree clouds with different coalescence times (Figure 5)? The loci identified through genetic mapping of speciation traits can be contrasted with the rest of the genome, and early sorting could offer additional evidence for a role in speciation. Functional categories (e.g., coloration genes or genes expressed in sperm or eggs) that are sorted early during diversification could indicate that they play a role in species isolation. Naturally, even if a few color genes do underlie early divergence, this probably will not be reflected by the coalescence of an entire category. Loh et al. (83) hypothesized that riverine cichlid species act as transporters of genetic variance and that several seemingly convergent phenotypes are in fact deeply parallel. This can also be tested by contrasting the gene trees of genomic regions that map to parallel phenotypes (e.g., hypertrophic lips and coloration patterns) to the genomic cloud of gene trees (57).
Many other laboratories have also undertaken in-house genome sequencing projects for other cichlid genomes, and as sequencing costs continue to plummet, even more ambitious projects can be planned.
The Drosophila melanogaster Genetic Reference Panel provides an example of such a collaborative effort (84). Many cichlid species with relevant phenotypic differences are probably comparable to or even younger than Drosophila strains. The establishment of a similar project would allow the investigation of several fundamental questions in cichlid evolutionary genetics, and would in particular allow investigators to carry out speciation research with unprecedented efficiency.

OUTLOOK: CICHLIDS IN THE POSTGENOMIC ERA AND THE MOVE TOWARDS FUNCTIONAL GENOMICS
In the past decade, considerable progress has been made in cichlid genomics and genetics, although research on cichlid diversity has so far continued to be mainly descriptive. Most studies have focused primarily on investigating the patterns of diversification and not the processes that generate them. A change in approach would be necessary to finally fulfill the cichlid model's potential to test theories rather than describe patterns. We outline some potentially fruitful research avenues below.
The cichlid genome project has identified several factors (e.g., a higher rate of gene duplication, novel microRNAs, expansion of TEs, genome-wide signatures of directional selection, and increased rates of regulatory evolution) that could potentially influence the diversification of haplochromine cichlids (D. Brawand, C. Wagner, I.Y. Li, M. Malinsky, S.H. Fan, et al., manuscript in preparation). The functional aspects and relative contributions of these genomic mechanisms can and should be investigated.
Forward genetic approaches have a relatively brief history in cichlid biology, and only recently has information on the genetic basis of relevant traits begun to accumulate. Even then, no study has focused on understanding the genetics of traits that clearly influence speciation, such as mating preference or male nuptial coloration. Comparing the genetic architectures of parallel traits mapped in recently and more anciently diverged species pairs could allow investigators to test whether genes of major effect are involved in the earlier stages of adaptation, as predicted by Orr (109). Additionally, once mapped, QTLs and genes provide powerful markers for ecological genetic experiments. In the case of the adaptive phenotypes that have been mapped, one could conceive of experiments to, for instance, measure the strength of selection on allele frequency in more realistic experimental settings, as has been done in other model systems (7,82). The generation of large panels of F 2 populations segregating for traits that influence species recognition could also allow investigators to pinpoint which of the traits are more important for mate choice. The functional phylogenomic approach we propose here could also help to provide a better understanding of the processes leading to improved adaptations and confirm whether traits (with known genetic bases) have caused or were recruited during the process of genomic differentiation.
These are exciting times for cichlid biologists. In the postgenomic era, generating genome-scale data for nonmodel organisms is not the issue, and many questions that were not previously within reach can be addressed. The newly developed genomic resources for cichlids (D. Brawand, C. Wagner, I.Y. Li, M. Malinsky, S.H. Fan, et al., manuscript in preparation) and the development of methods that facilitate linking genotypic with phenotypic variation in nonmodel organisms mean that cichlid biologists are in a unique position to make a considerable contribution to more general and long-standing questions in evolutionary biology and genetics.

FUTURE ISSUES
1. High-resolution mapping is needed for speciation traits, in particular mating preferences and cues.
2. Genetic mapping is needed for the many cases of parallel adaptive traits in order to address questions such as whether these traits involve mutations in the same genes (or chromosomal regions) and whether selection recruited new mutations or standing genetic variation (30).
3. Functional and experimental follow-up studies should be performed on the evidence of genomic correlates of diversification obtained from the cichlid genome project (e.g., a higher rate of gene duplication, novel microRNAs, expansion of transposable elements, genome-wide signatures of directional selection, and increased rates of regulatory evolution).
4. Phylogenomics with explicit modeling of incongruences between gene and species trees should be used to resolve relationships within the species flocks and address the role of introgression in cichlid adaptation and diversification.

5. A more hypothesis-driven research program should be developed that focuses on testing theoretical predictions on the genetics of speciation and conducting functional and experimental ecological genetics studies.

DISCLOSURE STATEMENT
The authors are not aware of any affiliations, memberships, funding, or financial holdings that might be perceived as affecting the objectivity of this review.