A genome-wide mapped Chlamydomonas mutant library enables high-throughput genetic studies in photosynthetic eukaryotes
Photosynthetic organisms provide energy for nearly all life on Earth, yet half of their genes remain uncharacterized. Characterization of gene functions could be greatly accelerated by new genetic resources in unicellular model organisms. Here, we generated a genome-wide, indexed library of mapped insertion mutants for the green alga Chlamydomonas reinhardtii, which includes disruptions in 83% of all nuclear genes. The abundance of individual mutants can be tracked with unique DNA barcodes, allowing the library to be screened as a pool. We demonstrated the power of this platform by performing a genome-wide screen that identified 3,109 mutants with defects in photosynthetic growth. Multiple alleles allowed identification of 44 genes required for photosynthesis, 21 of which are novel. Characterization of one of these genes, CPL3, showed it is important for photosystem I activity. The availability of the 62,389 mutants in our library will accelerate characterization of gene functions in photosynthetic eukaryotes.
Plant biology, high-throughput genetics, genomics, insertional mutagenesis, mutant library, photosynthesis, Chlamydomonas, algae, pooled screening, DNA barcodes.
- The first genome-wide indexed and mapped mutant library in a unicellular photosynthetic organism
- DNA barcodes allow quantitative pooled screens under a variety of conditions
- A genome-wide screen revealed 44 genes (21 novel) critical for photosynthetic growth
- One of the novel genes, CPL3, was found to be critical for photosystem I activity
Photosynthetic organisms are the source of our food and fuels, as well as many of our materials and drugs. However, our understanding of photosynthetic organisms at the molecular level is limited. While significant progress in understanding photosynthetic eukaryotes has been achieved using the model vascular plant Arabidopsis thaliana (Provart et al., 2016), the functions of half of their most highly conserved genes remain unknown (Karpowicz et al., 2011).
Genes involved in various biological processes may have evaded identification because of (1) inadequate or insensitive phenotyping methods, (2) the gene’s essentiality (Rubin et al., 2015), or (3) genetic/functional redundancy and compensatory mechanisms (Pickett and Meeks-Wagner, 1995). One opportunity to improve gene discovery in plants lies in advancing the use of model photosynthetic organisms that complement Arabidopsis.
The green alga Chlamydomonas reinhardtii (Chlamydomonas hereafter) offers the advantages of being a unicellular photosynthetic eukaryote that has small gene families, haploid genetics, rapid growth, full genome with a molecular linkage map, and easy propagation in liquid and on solid medium in the absence of photosynthesis (Gutman and Niyogi, 2004; Harris et al., 2009; Kathir et al., 2003). Furthermore, its amenability to robotic propagation offers opportunities for the development of high-throughput functional genomics approaches like those that have been used so successfully in yeast (Botstein and Fink, 2011). However, the utility of Chlamydomonas has been hampered by the limited availability of mutants in many genes.
To expand the plant sciences toolkit, we sought to develop a genome-wide, indexed and mapped collection of mutants in Chlamydomonas. Individual mutants in this library would allow analyses of candidate gene functions, while a unique DNA barcode in each of the mutants would enable high-throughput phenotyping of mutants in pools (Giaever et al., 2002; Winzeler et al., 1999). Because of difficulties in targeting specific Chlamydomonas genes for disruption (Baek et al., 2016; Jiang et al., 2014; Shin et al., 2016; Slaninová et al., 2008), random insertional mutagenesis remains the most efficient method for obtaining a genome-wide collection of mutants in this alga. Large collections of mutants have been generated and screened by PCR for insertions in specific genes of interest (Cheng et al., 2017; Gonzalez-Ballester et al., 2011; Pootakham et al., 2010), but this approach is not practical for isolating mutants in tens of thousands of genes. In a pilot effort, we developed methods for mapping and maintaining Chlamydomonas insertional mutants that allowed the isolation of 1,935 mutants, covering 8% of all Chlamydomonas genes (Li et al., 2016). We have since developed technologies that enable the unique barcoding of individual mutants and rapid mapping of the insertion sites in a much larger collection.
In this study, we describe the generation and initial characterization of a genome-wide, indexed, mapped, and barcoded library containing mutants for 83% of all Chlamydomonas genes. Mutants in this collection are available through the website: https://www.chlamylibrary.org/. We also established an experimental pipeline that facilitates pooled screens with this library, which we used to identify 3,109 photosynthetically deficient mutants. Multiple alleles allowed identification of 44 genes required for photosynthetic growth, of which 21 are novel. Characterization of a mutant disrupted in the novel gene CPL3 revealed that it is required for normal photosystem I activity.
We generated a genome-wide, indexed, barcoded Chlamydomonas mutant library
We developed a three-step pipeline for generating an indexed and barcoded library of insertional mutants in Chlamydomonas.
First, we generated mutants by transforming cells with barcoded DNA cassettes. Each cassette contained two unique 22 nucleotide barcodes, one at each end of the cassette (Figure 1A and S1, Table S1). Cells were transformed and the resulting mutants were arrayed on agar plates. The barcode associated with the insertion in each mutant colony was initially not known (Figure 1B, barcodes depicted as different colors).
Second, we determined the sequence of the barcodes in each colony through a combinatorial pooling approach (Figures 1C and S2; Table S2; Datasets S1-S3). We generated combinatorial pools of the individual mutants, extracted DNA from the pools, and amplified and deep-sequenced the barcodes from each pool. The combinatorial pooling patterns allowed us to determine the barcode sequences of individual colonies. This procedure was similar in concept to the approach we used in our pilot study (Li et al., 2016), but it consumed significantly less time because a simple PCR instead of a multi-step flanking sequence extraction protocol (ChlaMmeSeq (Zhang et al., 2014)) was performed on each combinatorial pool.
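The decoding logic behind combinatorial pooling can be illustrated with a minimal sketch. It assumes a simple binary design in which each colony is placed into a unique combination of pools, so that the set of pools in which a barcode is detected identifies its colony. The actual pooling scheme, the number of pools, and the handling of sequencing errors are described in Methods and are not reproduced here; the function names, plate positions, and short toy barcodes below are hypothetical (real barcodes are 22 nucleotides long).

```python
from itertools import product

def assign_codewords(colonies, n_pools):
    """Give each colony a unique, non-empty presence/absence pattern across pools.

    Hypothetical binary design for illustration only.
    """
    patterns = [p for p in product((0, 1), repeat=n_pools) if any(p)]
    if len(colonies) > len(patterns):
        raise ValueError("not enough pools to give every colony a unique pattern")
    return dict(zip(colonies, patterns))

def decode(barcode_pools, codewords, n_pools):
    """Assign each barcode to the colony whose pool pattern matches where it was seen."""
    pattern_to_colony = {pattern: colony for colony, pattern in codewords.items()}
    assignments = {}
    for barcode, pools in barcode_pools.items():
        observed = tuple(1 if i in pools else 0 for i in range(n_pools))
        # None marks a barcode whose pattern matches no colony (e.g. a sequencing artifact).
        assignments[barcode] = pattern_to_colony.get(observed)
    return assignments

# Toy example: three colonies distributed over two pools.
codewords = assign_codewords(["A1", "A2", "A3"], n_pools=2)
# Pools (0-indexed) in which each barcode was detected by sequencing.
observed = {"ACGT": {1}, "TTGA": {0}, "GGCC": {0, 1}}
result = decode(observed, codewords, n_pools=2)
```

Because n pools can distinguish up to 2^n - 1 colonies in this idealized design, the number of sequencing libraries grows only logarithmically with the number of colonies; a practical design would add redundancy to tolerate missing or spurious detections.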
Third, we determined the insertion site associated with each barcode by pooling the entire library, and then sequencing amplicons that each contained a barcode and its flanking genomic DNA (Figure 1D and Methods). Specifically, we PCR amplified the barcodes and flanking sequences from the pools, based on a modification of the LEAP-Seq method previously described (Li et al., 2016) (Figure S3; Tables S3 and S4). The flanking sequences associated with each barcode were obtained by paired-end deep sequencing (Rubin et al., 2015; Wetmore et al., 2015). The final product is an indexed library in which each colony has a known (1) flanking sequence that identifies the genomic insertion site; and (2) barcode sequence, which facilitates pooled screens in which individual mutants can be tracked by deep sequencing (Giaever et al., 2002).
This library contains 62,389 mutants with 74,923 insertions
The procedures described above yielded 127,847 mapped mutants containing a total of 149,581 mapped insertions (Dataset S4) on 569 plates of 384-colony arrays. Some mutants contained more than one insertion. We consolidated the original set of mutants to a smaller subset intended for long-term preservation, removing (1) mutants with no mapped insertions in genes, (2) mutants with insertions in genes that already have 20 or more alleles in the library, and (3) mutants that did not survive propagation (Methods). The consolidated set contains 62,389 distinct mutants on 245 plates, with a total of 74,923 insertions.
Short “junk fragments” of Chlamydomonas DNA are often found inserted between the cassette and flanking genomic DNA. The difficulty in distinguishing these junk fragments from true flanking genomic DNA can lead to inaccurate mappings (Li et al., 2016; Zhang et al., 2014). We sought to help users prioritize mutants by classifying insertions into categories that reflect our confidence in the mapping accuracy. We considered two criteria: (1) whether flanking sequences from both sides of the cassette mapped to the same genomic region; and (2) whether the LEAP-Seq reads suggest the presence of junk DNA fragments inserted next to the cassette (Figure S3; Methods).
We assigned a mapping confidence level of 1 to 19,015 insertions in which both cassette-genome junctions mapped to the same genomic region and were free of junk fragments (Figures 2A and 2B). We assigned a mapping confidence level of 2 to 5,665 insertions in which both cassette-genome junctions mapped to the same genomic region, after correcting for the presence of a junk fragment in one junction (Methods). We assigned a mapping confidence level of 3 to 36,600 insertions in which only one cassette-genome junction could be identified, and the likelihood of junk DNA insertion was determined to be low. We assigned a mapping confidence level of 4 to 13,643 insertions in which only one junction could be identified, and that junction was likely to contain a junk fragment, or the flanking sequence could not be mapped to a unique genomic location. The mapping for these insertions was adjusted to reflect the most likely correct insertion site (Methods).
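The four confidence levels can be summarized as a small decision function. This is a simplified sketch of the two criteria described above, not the pipeline used to build the library: the function name and boolean inputs are hypothetical, and cases the text does not describe (for example, junk fragments at both junctions) are not modeled.

```python
def mapping_confidence(two_junctions_agree, junk_suspected, uniquely_mapped=True):
    """Return a mapping confidence level (1 = highest, 4 = lowest).

    two_junctions_agree: both cassette-genome junctions map to the same region.
    junk_suspected: reads suggest a junk fragment next to the cassette.
    uniquely_mapped: the flanking sequence maps to a single genomic location.
    """
    if not uniquely_mapped:
        return 4
    if two_junctions_agree:
        # Level 2 applies when agreement is reached after correcting for a junk fragment.
        return 2 if junk_suspected else 1
    # Only one junction identified.
    return 4 if junk_suspected else 3

# Insertion counts per level as reported in the text.
counts = {1: 19015, 2: 5665, 3: 36600, 4: 13643}
total_insertions = sum(counts.values())
```

The per-level counts sum to 74,923, the total number of insertions in the consolidated library.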
We used PCR to estimate the fraction of correctly mapped insertions in each category, as previously described (Li et al., 2016). We tested insertions with confidence levels 1 and 2 as a single group, and 21 of the 22 (95%) randomly selected insertions were confirmed. For confidence level 3, 16 of 22 (73%) insertions were confirmed. For confidence level 4, 11 of 19 (58%) insertions were confirmed (Dataset S5).
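The confirmation rates follow directly from the counts above, and they can be combined with the number of insertions at each confidence level for a rough, back-of-envelope estimate of the fraction of correctly mapped insertions in the consolidated library. The sketch below assumes that the small PCR-tested samples are representative of their categories, which is an assumption of this illustration rather than a result reported here.

```python
# (confirmed, tested) per confidence group, as reported in the text.
validated = {"1+2": (21, 22), "3": (16, 22), "4": (11, 19)}
# Insertions per group in the consolidated library (levels 1 and 2 pooled).
n_insertions = {"1+2": 19015 + 5665, "3": 36600, "4": 13643}

rates = {group: ok / tested for group, (ok, tested) in validated.items()}
percent = {group: round(100 * rate) for group, rate in rates.items()}

# Weighted estimate across the library; a rough extrapolation, not a measured value.
expected_correct = sum(n_insertions[group] * rates[group] for group in rates)
overall = expected_correct / sum(n_insertions.values())
```

Under that assumption, roughly three quarters of the mapped insertions would be expected to be correct, with the caveat that the sample sizes (19 to 22 insertions per group) leave wide uncertainty around each rate.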
Chlamydomonas random insertions are often associated with deletions and duplications of neighboring genomic DNA (Dent et al., 2015). We sought to characterize these deletions and duplications by examining the sequences across both junctions of confidence level 1 insertions. Of these insertions, 11% have no deletions or duplications, 74% harbor genomic deletions and 15% have genomic duplications. The great majority (98%) of genomic deletions were less than 100 bp, but some were as large as 10 kb. We observed genomic duplications of up to 30 bp, with 98% shorter than 10 bp (Figure S4). Both the deletions and duplications likely resulted from non-homologous end joining repair that occurs during cassette insertion (Vu et al., 2014). Additionally, examining the 651 insertions in which a junk fragment separated two inserted cassettes allowed us to estimate the typical junk fragment length: most (73%) junk fragments were less than 300 bp, but some were as large as 1,000 bp. If larger deletions, duplications or junk fragments were present, they were not sufficiently frequent to allow us to identify them reliably.
Based on the mapping data, we also determined the number of insertions associated with each mutant in the library: 83% of the mutants harbored a single mapped insertion and 17% contained 2 or more mapped insertions (Figure 2C). Consistent with these results, Southern blotting of genomic DNA from 18 randomly chosen mutants across the different confidence categories revealed that 13 had a single band that hybridized to the AphVIII region of the transformation cassette and 5 exhibited multiple hybridizing bands (Figure S5). We note that some insertions will not be detected by each of these methods: some insertions are not mapped, and some insertions of small DNA fragments derived from the cassette will not be detected by Southern blotting. Additionally, the mutants may also contain other types of lesions not detected by either of these methods, including genomic rearrangements, point mutations, or insertion of DNA from lysed cells.
The insertions are largely randomly distributed
We examined the distribution of insertion sites throughout the genome, using the entire mutant library prior to consolidation, which avoids introducing any bias through our consolidation choices. One of the challenges we faced in this analysis was that flanking sequences associated with repetitive DNA regions could not be aligned to a unique genomic location, leading to a lower apparent density of insertions in repetitive regions. To measure this effect for each genomic region, we used a “mappability” metric, which quantifies the fraction of simulated flanking sequences from any genomic region that can be uniquely mapped to that region (Methods). A random insertion model that accounts for mappability produced a distribution of insertion sites broadly similar to the observed distribution (Figure 3A). However, we did detect some cold spots and hot spots where insertion density differed significantly from the random insertion model (Figure S6; Dataset S6; Methods). Cold spots cover 26% of the genome and on average show a 48% depletion of insertions (Methods). Hot spots cover 1.5% of the genome and contain 16% of insertions.
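The idea behind the mappability metric can be sketched on a toy sequence. The version below assumes fixed-length simulated flanking sequences, exact matching, and a single strand; the read length, alignment settings, and region definitions actually used are given in Methods, and the function name and example sequence are hypothetical.

```python
from collections import Counter

def mappability(genome, start, end, read_len):
    """Fraction of read_len-mers starting in [start, end) that occur exactly once in genome."""
    n_kmers = len(genome) - read_len + 1
    kmer_counts = Counter(genome[i:i + read_len] for i in range(n_kmers))
    # Simulated flanking sequences must fit entirely within the genome.
    kmers = [genome[i:i + read_len] for i in range(start, min(end, n_kmers))]
    if not kmers:
        return 0.0
    return sum(kmer_counts[k] == 1 for k in kmers) / len(kmers)

# Toy genome in which "ACGT" occurs twice, making the first region partly unmappable.
toy_genome = "ACGTACGTTTGCA"
repetitive = mappability(toy_genome, 0, 5, read_len=4)   # 3 of 5 reads are unique
unique = mappability(toy_genome, 5, 10, read_len=4)      # all 5 reads are unique
```

In a random insertion model, the expected number of mapped insertions in a region would then scale with the region's length multiplied by its mappability, rather than with its length alone.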
Hot spots fell into two distinct classes that differed in the local distribution of insertions (Figure S7). In one class, dozens of insertions were found within a region of 20-40 bp. In the other class, the insertions were spread over a much larger region of 200-1,000 bp. Our observations suggest that hot spots could be caused by two distinct mechanisms; however, we did not observe a correlation between specific features of the genome (e.g. exon, intron, UTR, mappability) and the occurrence of either class of hot spot.
The insertions cover 83% of Chlamydomonas genes
We analyzed the number of mutants in each Chlamydomonas gene using the consolidated mutant set. Overall, ~83% of the genes are represented by an insertion in at least one mutant, ~68% in at least two mutants and ~53% in at least three mutants (Figure 3B). Approximately 69% of genes are represented by an insertion in a 5’ UTR, exon or intron, features most likely to cause a phenotype when disrupted. Many gene sets of interest to the research community, including the genes encoding proteins that make up the GreenCut2 (many involved in photosynthesis) (Karpowicz et al., 2011), genes encoding chloroplast proteins (Terashima et al., 2011), and genes that are associated with the structure and function of the flagella are well represented (Merchant et al., 2007; Pazour et al., 2005) (Figure 3C). We note that due to incorrect mapping of some mutants (Figure 2A), the true genome coverage is slightly lower.
We sought to understand what factors contributed to the likelihood that a gene is represented in the library. As expected from the random distribution of insertions, we observe a correlation between gene size and number of insertions (Figure 3D), with small genes being underrepresented. Additionally, genes involved in essential processes appeared to be underrepresented. We identified 786 underrepresented gene ontology (GO) categories (Figure 3E; Dataset S7). As expected, GO terms corresponding to apparently essential cellular processes, such as “translation”, “helicase activity”, and “DNA replication”, are among the most depleted terms. Consistent with these trends, the CiliaCut (Merchant et al., 2007) genes have remarkably high coverage, perhaps because cilia are not essential for Chlamydomonas viability (McVittie, 1972), or because the CiliaCut is biased toward larger genes (5,805 bp average mappable length vs. 4,751 bp average mappable length for all genes).
We observed lower densities of insertions in gene features than in intergenic regions (Figure 3F). Exons and introns were least represented, possibly because insertions in these features are more likely to cause growth deficiencies or lethality. The agreement between our observations and expectations further improves our confidence in the overall insertion mapping accuracy.
We identify candidate essential genes
Identification of essential genes in bacteria, fungi, and mammals has revealed important molecular processes in these organisms (Giaever et al., 2002; Peters et al., 2016; Rubin et al., 2015; Wang et al., 2015). We sought to take advantage of the very large set of mapped mutations in the library before consolidation to identify candidate essential Chlamydomonas genes based on the absence of insertions in those genes (Methods). One caveat to this method is that, besides essentiality, genes might lack insertions for other reasons, such as low chromatin accessibility. Given our average density of insertions, we are able to detect a statistically significant (FDR-corrected p<0.05) lack of insertions for genes with a mappable length greater than 5 kb. We identified 203 candidate essential genes (Dataset S8).
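The reasoning that only sufficiently long genes can show a significant absence of insertions can be sketched as follows. The example assumes a Poisson model with a uniform insertion density per mappable base pair, followed by a Benjamini-Hochberg correction; the statistical model actually used is described in Methods, and the density value, gene names, and function names here are hypothetical.

```python
import math

def p_no_insertions(mappable_len_bp, insertions_per_bp):
    """Probability of observing zero insertions in a gene under a uniform Poisson model."""
    return math.exp(-insertions_per_bp * mappable_len_bp)

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of adjusted values.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical genes with zero insertions and their mappable lengths in bp.
genes = {"g1kb": 1000, "g5kb": 5000, "g10kb": 10000}
density = 0.001  # hypothetical insertions per mappable bp, not the library's value
pvals = [p_no_insertions(length, density) for length in genes.values()]
adj = benjamini_hochberg(pvals)
candidates = [gene for gene, q in zip(genes, adj) if q < 0.05]
```

In this toy setting the 1 kb gene cannot reach significance because a gene of that size is expected to lack insertions fairly often by chance, whereas the longer genes can; the same logic underlies the length threshold stated above.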
Many of the predicted essential genes have homologs that have been shown to be essential in other organisms. For example, Cre01.g029200 encodes a homolog of the yeast cell cycle protease separase ESP1 (Baum et al., 1988), Cre12.g521200 encodes a homolog of yeast DNA replication factor C complex subunit 1 RFC1 (Cullmann et al., 1995), and Cre09.g400553 encodes a homolog of the yeast nutrient status sensing kinase Target of Rapamycin 2 TOR2 (Kunz et al., 1993). In addition, we observed homologs of genes required for acetate assimilation or respiration, such as Acetyl-CoA synthetase/ligase (Cre07.g353450) and components of the Mitochondrial F1F0 ATP synthase (Cre15.g635850 and Cre07.g340350). These genes may be essential under the conditions of library propagation, where acetate serves as the energy source.
We also observed genes on the list with nonessential homologs in other organisms. One example is the monogalactosyldiacylglycerol (MGDG) synthase-encoding gene Cre13.g585301: its Arabidopsis homolog MGD1 is not essential (Jarvis et al., 2000). This can be explained by the presence of two other isoforms of MGDG synthases in Arabidopsis but not in Chlamydomonas (Riekhof et al., 2005). Comparison of our candidate Chlamydomonas essential genes with those of other organisms can provide insights into evolutionary differences across the tree of life.
Mutant barcodes enabled a high-throughput screen that revealed 3,109 mutants deficient in photosynthesis
While individual mutants from the library distributed before publication are already being widely used, the library also contains features that enable rapid genome-wide screens. As a demonstration of the power of this resource, we sought to use this library to increase our understanding of photosynthesis, the process that fixes 400 billion tons of CO2 into biomass per year (Field et al., 1998) and provides the oxygen we breathe.
We generated this library under conditions that do not require photosynthesis (Methods) so that photosynthetically defective strains are not lost and can be identified by their slower growth under conditions that require photosynthesis (Dent et al., 2005; Dent et al., 2015; Levine, 1960). To identify such mutants, we pooled the library before consolidation into a single culture and evaluated the ability of each mutant to grow under photoautotrophic and heterotrophic conditions. We grew the pool of mutants photoautotrophically in Tris-Phosphate (TP) medium in light and heterotrophically in Tris-Acetate-Phosphate (TAP) medium in the dark (Figure 4A). We performed two replicates of photoautotrophic growth (labeled TP-light-I and TP-light-II respectively). To quantify mutant abundance, we PCR-amplified the barcodes from the initial and final cell populations and subjected the products to deep sequencing.
We first evaluated the robustness of the method by comparing the barcode read counts obtained from two aliquots of the same initial pool (technical replicates). The read counts for 5’ barcodes in the two replicates showed a Spearman’s correlation of 0.978, with 85,690 (94%) barcodes showing a normalized read count of no more than 2-fold difference between the two replicates (Figure 4B). Comparison of the read counts between the biological replicates of TP-light samples suggests high biological reproducibility (Figure 4C).
To identify mutants with defects in photosynthesis, we compared mutant abundances in TP-light with TAP-dark (Figure 4D). Different barcodes in the same mutant yielded consistent ratios of the TP-light read count to TAP-dark read count (Figure 4E), indicating that this ratio is a robust metric. We defined mutants deficient in photosynthesis as those with a light/dark read count ratio of 0.1 or lower. Because of the larger technical noise for barcodes with lower read counts, we required a minimum of 50 normalized reads in the TAP-dark sample. We included only mutants that have at least one insertion mapped to a gene. Using these criteria, we identified 2,638 and 2,369 mutants showing a growth defect in TP-light-I and TP-light-II, respectively (Figure 4D; Datasets S9 and S10; Methods), or 3,109 showing a phenotype in either of the two replicates. Going forward, the identities of these photosynthesis-deficient mutants will aid in advancing our understanding of photosynthesis. Below, we discuss how these mutants and the genes disrupted in them can be prioritized for further study.
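The hit-calling criteria above can be expressed as a short filter. This sketch operates on per-barcode normalized read counts and applies only the ratio and minimum-read thresholds stated in the text; the normalization itself, the grouping of barcodes into mutants, and the requirement for a gene-mapped insertion are omitted, and the function name and toy counts are hypothetical.

```python
def photosynthesis_deficient(tp_light, tap_dark, max_ratio=0.1, min_dark_reads=50):
    """Return barcodes with a TP-light/TAP-dark ratio <= max_ratio.

    tp_light, tap_dark: dicts mapping barcode -> normalized read count.
    Barcodes with fewer than min_dark_reads in TAP-dark are skipped as too noisy.
    """
    hits = set()
    for barcode, dark in tap_dark.items():
        if dark < min_dark_reads:
            continue
        light = tp_light.get(barcode, 0)  # a barcode absent from TP-light counts as 0 reads
        if light / dark <= max_ratio:
            hits.add(barcode)
    return hits

# Toy normalized read counts for four hypothetical barcodes.
tp_light = {"bc1": 5, "bc2": 120, "bc3": 1, "bc4": 0}
tap_dark = {"bc1": 100, "bc2": 110, "bc3": 20, "bc4": 60}
hits = photosynthesis_deficient(tp_light, tap_dark)

# Replicate overlap implied by the reported counts (inclusion-exclusion).
in_both_replicates = 2638 + 2369 - 3109
```

Here `bc3` is excluded despite its low ratio because it falls below the 50-read minimum in TAP-dark. By inclusion-exclusion, the reported counts imply that 1,898 mutants were called in both TP-light replicates.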
We identified 44 tier I genes and 264 tier II genes in the photosynthetic growth screen
The photosynthesis-deficient mutants contained insertions in 2,599 genes. A common challenge for genetic studies in Chlamydomonas and many other organisms is that mutant phenotypes can be caused by mutations other than the identified insertions (Dent et al., 2005). Consequently, we cannot immediately conclude that the gene disrupted by the mapped insertion in a photosynthesis-deficient mutant is required for photosynthesis. After excluding alleles with insertions in the 3’ UTRs, which we observed to cause a phenotype less frequently (Figures 5B and 5C), 1,601 genes are represented in the photosynthesis-deficient mutants.
It is widely accepted that a mutation in a gene is more likely to be causative if independent alleles of a gene show the same phenotype (Krysan et al., 1999). In our library, many genes are covered by two or more alleles, providing an opportunity to identify a set of mutants for which the insertion is more likely to be the cause of the observed phenotype. We developed a statistical analysis pipeline to evaluate our confidence in the link between the genotype and phenotype. For each gene, we count the number of mutant alleles with and without a phenotype, and evaluate the likelihood of obtaining these numbers by chance given the total number of mutants in the library that exhibit the phenotype (Methods; Dataset S10). We defined two tiers of hits: 44 tier I hits (p<0.0011; FDR<0.3) show a p-value at least as low as that for a gene with two alleles showing a phenotype and zero alleles not showing a phenotype; and 264 tier II hits (p<0.058) show a p-value at least as low as that for a gene with one allele showing a phenotype and zero alleles not showing a phenotype. Genes in each tier are enriched (p<0.0001) in predicted chloroplast localization (Tardif et al., 2012) (Figure 5D). As expected, the GreenCut2 proteins, which are conserved among photosynthetic organisms but absent from non-photosynthetic organisms (Karpowicz et al., 2011; Wittkopp et al., 2016), are enriched: 34 GreenCut2 proteins are among the 308 tier I and II hits (~11%), compared to ~3% in the entire genome (p<10⁻⁹). The ability to identify such a large number of genes in a process of interest is remarkable for such a rapid experiment in a photosynthetic organism.
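The per-gene test can be illustrated with a one-sided hypergeometric tail probability, which is equivalent to a one-sided Fisher's exact test. This choice of test is an assumption made for illustration; the exact procedure is specified in Methods, and the toy numbers below are not the library's totals and are not intended to reproduce the reported p-value thresholds. The function and parameter names are hypothetical.

```python
from math import comb

def enrichment_pvalue(k_pheno, n_alleles, total_pheno, total_mutants):
    """P(X >= k_pheno) for X ~ Hypergeometric(total_mutants, total_pheno, n_alleles).

    k_pheno: alleles of the gene showing the phenotype.
    n_alleles: all alleles of the gene in the screen.
    total_pheno: mutants in the whole screen showing the phenotype.
    total_mutants: all mutants in the screen.
    """
    upper = min(n_alleles, total_pheno)
    favorable = sum(
        comb(total_pheno, x) * comb(total_mutants - total_pheno, n_alleles - x)
        for x in range(k_pheno, upper + 1)
    )
    return favorable / comb(total_mutants, n_alleles)

# Toy screen: 100 mutants, 10 of which show the phenotype.
p_two_of_two = enrichment_pvalue(2, 2, total_pheno=10, total_mutants=100)
p_one_of_one = enrichment_pvalue(1, 1, total_pheno=10, total_mutants=100)
p_one_of_five = enrichment_pvalue(1, 5, total_pheno=10, total_mutants=100)
```

The toy values show the pattern that motivates the two tiers: a gene with two alleles that both show the phenotype yields a much smaller p-value than a gene with a single allele, while alleles lacking the phenotype weaken the evidence.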
Twenty-two tier I genes were previously shown to have a role in photosynthesis
Among the 44 hits, 22 were previously shown to have a role in photosynthesis in Chlamydomonas or other organisms (Figure 5A; Table 1). Twelve of these genes were previously shown to be important for the biogenesis of protein complexes essential for the light reactions. The Arabidopsis homologs of Cre13.g578650 (CPLD10) and Cre02.g073850 (CGL54) are required for the translation or processing of the D1 subunit of photosystem II (PSII) (Link et al., 2012; Meurer et al., 1996; Schult et al., 2007; Wei et al., 2010). The Arabidopsis homologs of Cre02.g105650, Cre06.g273700, and Cre10.g430150, and the Synechocystis sp. PCC 6803 homolog of Cre06.g273700 are involved in the assembly of PSII (Cai et al., 2010; Komenda et al., 2008; Meurer et al., 1996; Peng et al., 2006). Cre12.g524300 (CGL71), its homologs in Arabidopsis and Synechocystis, and the Arabidopsis homolog of Cre01.g045902 are important for PSI assembly or stability (Heinnickel et al., 2016; Lezhneva et al., 2004; Meurer et al., 1996; Stockel et al., 2006; Wilde et al., 2001). Cre09.g389615 is important for the stability of psaC mRNA; psaC encodes a subunit of PSI (Douchi et al., 2016). Cre09.g394150 (RAA1), Cre12.g531050 (RAA3), Cre10.g440000 (OPR120), and the Arabidopsis homolog of Cre01.g027150 are required for splicing chloroplast group II introns, which are present in the transcript of psaA (encoding another PSI subunit) in Chlamydomonas (Carlotto et al., 2016; Jacobs et al., 2013; Marx et al., 2015; Perron et al., 1999; Rivier et al., 2001).
Four other genes are involved in the metabolism of cofactors or signaling molecules important for the light reactions. Specifically, the Arabidopsis homologs of Cre13.g581850 and Cre03.g188700 are required for accumulation of plastoquinone and carotenoids (Kim et al., 2015; Martinis et al., 2014). Cre16.g659050 is important for phylloquinone synthesis (Lefebvre-Legendre et al., 2007). Cre10.g423500 (HMOX1) generates biliverdin from heme and is required for chlorophyll accumulation during the transition from dark to light in Chlamydomonas (Duanmu et al., 2013).
We also captured five genes previously shown to be required for photosynthetic carbon assimilation. Cre03.g185550 (SEBP1) is a component of the Calvin-Benson-Bassham Cycle (Liu et al., 2012). Cre06.g298300 (MRL1) and the Arabidopsis homolog of Cre12.g524500 are critical for the maturation of Rubisco (Johnson et al., 2010; Klein and Houtz, 1995). In addition, Cre12.g497300 (CAS1) and Cre10.g452800 (LCIB) are involved in signaling and inorganic carbon movement in the carbon concentrating mechanism in Chlamydomonas, respectively (Jin et al., 2016; Wang et al., 2016; Wang and Spalding, 2006). Finally, a mutant was found in Cre14.g616600; the Arabidopsis homolog of the encoded protein impacts the morphology of chloroplasts and thylakoids and potentially also the light reactions and carbon assimilation (Gao et al., 2006; Landoni et al., 2013). The large number of known hits and the fact that 13 of them were previously identified in land plants (Table 1) demonstrate the power of the library for identifying factors in conserved processes.
One tier I gene is required for growth in the absence of acetate
One of the hits, Cre03.g194200, encodes the beta subunit of pyruvate dehydrogenase. Mutants in this gene require acetate to grow because they cannot generate acetyl-CoA from pyruvate but can generate acetyl-CoA from acetate. This requirement for acetate, rather than a defect in photosynthesis, likely explains why mutants in this gene showed a growth defect in TP-light (Dent et al., 2015). While identification of this class of genes is an unavoidable consequence of our screening approach, it appears that these genes are rare relative to those required for photosynthesis.
Tier I reveals twenty-one novel genes associated with photosynthetic function
Twenty-one of the tier I genes (Table 2) were previously not known to be required for photosynthesis. Like the known tier I genes, these novel genes are enriched for chloroplast-localized proteins based on PredAlgo predictions (Tardif et al., 2012) (known: 16/22, p<10⁻⁸; novel: 9/21, p=0.015) (Figure 5D). Fifteen of the novel proteins have informative functional annotations. The Arabidopsis homologs of two have roles in processes potentially related to photosynthesis. The homolog of Cre10.g448950 modulates sucrose and starch accumulation (Suzuki et al., 2015). Cre10.g448950 thus might impact photosynthesis through feedback regulation (Paul and Foyer, 2001). Disruption of the Arabidopsis homolog of another tier I gene, Cre07.g341850, suppresses the phenotype of a leaf-variegated mutant (Miura et al., 2007). Considering that this protein has a chloroplast translation initiation factor domain, it may control translation of proteins required for photoautotrophic growth of Chlamydomonas.
Three other novel proteins belong to gene families with members implicated in the control of chloroplast function. Cre01.g037800 contains a thioredoxin domain, which could allow it to transmit redox signals from photosynthetic electron transport and regulate enzyme activities (Schurmann and Buchanan, 2008). Cre07.g316050 (CDJ2) has a chloroplast DnaJ domain (Kong et al., 2014), which may assist in folding and stabilization of chloroplast proteins, including proteins directly involved in photosynthesis (Kong et al., 2014). Cre11.g467712 contains a starch-binding domain and could play a structural role in the pyrenoid starch sheath (Gibbs, 1962) or be required for starch accumulation and proper feedback regulation of photosynthesis.
Three additional novel genes encode putative metabolic enzymes. Cre02.g073900 encodes a putative carotenoid dioxygenase that may impact the abundance or composition of carotenoid antenna or photoprotective molecules (Lohr, 2009; Niyogi et al., 1997). Cre09.g396250 encodes a putative phosphatidate cytidylyltransferase potentially involved in biosynthesis of thylakoid membrane or photosystem lipids (Boudiere et al., 2014). Cre10.g429650 encodes a putative alpha/beta hydrolase broadly involved in protein and metabolite degradation (Mindrebo et al., 2016). Characterization of its substrate specificity could help elucidate its specific role in photosynthesis.
Four of the novel genes encode putative membrane transporters or receptors. Cre12.g542569 and Cre13.g574000 may transduce glutamate concentration or membrane potential signals. Cre13.g586750 has a predicted nuclear importin domain and may be involved in regulating transcription when cells are transferred from heterotrophic to photoautotrophic growth. Cre50.g761497 contains a magnesium transporter domain and a mitochondrial targeting signal. Its homolog has been shown to be essential for normal mitochondrial function in yeast and humans (Wiesenberger et al., 1992; Yamanaka et al., 2016). Characterization of this gene will potentially shed light on metabolic interactions between mitochondria and chloroplasts.
Finally, three of the novel genes encode protein kinases or phosphatases. Two kinases, encoded by Cre01.g008550 and Cre02.g111550, have a predicted cytosolic and secretory pathway localization, respectively, while the Cre03.g185200 protein phosphatase (CPL3) is predicted to be localized in chloroplasts and may act directly on proteins of the photosynthetic apparatus.
X/264 tier II genes have a known role in photosynthesis
Multiple genes in tier II encode proteins structurally associated with the photosynthesis machinery, including PSII subunits Cre12.g550850 (PSBP1), Cre16.g678851 (PSBP2), and Cre05.g243800 (PSB27); cytochrome b6f complex subunits Cre11.g467689 (PETC), Cre12.g546150 (PETM), Cre12.g537850 (CCB2); PSI subunits Cre05.g238332 (PSAD), Cre10.g420350 (PSAE), and Cre12.g486300 (PSAL); light-harvesting complex subunits Cre11.g467573 (LHCA3) and Cre16.g687900 (LHCA7); and the Calvin-Benson-Bassham Cycle enzymes Cre12.g554800 (PRK1) and Cre12.g510650 (FBP1/FBP2). The remaining genes on this list are promising candidates for new genes with roles in photosynthesis.
CPL3 regulates photosystem I in Chlamydomonas
Assigning the novel tier I and II genes to specific pathways will greatly advance our understanding of photosynthesis. We have started to characterize these genes, beginning with CPL3 (Cre03.g185200), which encodes a predicted protein phosphatase. A previously isolated acetate-requiring mutant contained an insertion that mapped adjacent to CPL3 (Dent et al., 2015). In our library, CPL3 has three alleles that exhibited a deficiency in photosynthesis (Figure 4D; Table S5). We chose to examine one allele (LMJ.RY0402.153647, referred to as cpl3 hereafter; Figure 6A) for phenotypic confirmation, complementation, and mechanistic studies.
We first tested whether the phenotype detected in the pooled screens could be reproduced in individual cultures. We observed that the growth of cpl3 was almost completely abolished under photoautotrophic conditions, compared to only a slight growth defect under heterotrophic conditions (Figure 6B), consistent with the pooled growth data.
We then confirmed and characterized the cpl3 insertion in detail. Our high-throughput LEAP-Seq data suggested that cpl3 contained an insertion of two back-to-back cassettes. Specifically, the cpl3 mutant contains two insertion junctions from 3’ ends of two cassettes in opposite orientations, within the CPL3 gene: junction 1 is confidence level 3 (no “junk” fragment), and junction 2 is confidence level 4 (with “junk” fragment, corrected) (Figure 6A). We successfully confirmed both junctions by PCR (Figure 6C). Sequencing of the product from junction 2 revealed that the end of the cassette has a 10 bp truncation and a 10 bp fragment of unknown origin inserted between the cassette and the CPL3 gene. The immediate flanking sequence of junction 2 overlaps with the flanking sequence in junction 1 by 2 bp. When we amplified across the insertion site, cpl3 yielded a product ~3 kb larger than the product from wild type (CC-4533) (Figure 6C). Based on these results, the most likely model for the insertion is that two copies of the cassette (at least one truncated) inserted together into the CPL3 gene in opposite orientations, with a 2 bp genomic duplication at the site of insertion.
To confirm CPL3’s involvement in photosynthesis, we cloned CPL3 genomic DNA and transformed it into the cpl3 mutant. Photoautotrophic growth was rescued in approximately 14% of the transformants, based on colony size (Figure S8). Three rescued transformants, comp1-3, were chosen randomly and were confirmed to contain wild-type CPL3 sequence (Figures 6B and 6C). These results demonstrate that disruption of CPL3 is the cause of the growth defect of the mutant.
A deficiency in photosynthesis can be caused by a defect in the light reactions, which convert light energy into chemical energy by coordinated operation of four major protein complexes (PSII, cytochrome b6f complex, PSI, ATP synthase), or in the Calvin-Benson-Bassham Cycle, which uses chemical energy to fix CO2 into sugars. We sought to determine which of these components is affected in the cpl3 mutant by using spectroscopic measurements of pigment composition and photosystem activities (Meurer et al., 1996). Pigment analyses revealed a lower chlorophyll a:b ratio in cpl3 than in wild type (Figure 6D), suggesting an abnormal abundance of pigment-protein complexes (Bassi et al., 1992). A lower chlorophyll a:b ratio is often associated with a defect in PSI (Heinnickel et al., 2016), which motivated us to investigate PSI function in the cpl3 mutant.
To determine PSI activity in the mutant, we measured oxidation-reduction of P700, the reaction center chlorophyll in PSI, in response to light. In wild-type cells when electron flow from PSII is chemically inhibited, illumination causes P700 to become oxidized. The cpl3 mutant showed less oxidation relative to wild type, whereas complemented strains showed wild-type behavior (Figures 6E and 6F). These data suggest that the cpl3 mutant has a defect in PSI function.
We generated a transformative resource for genetic studies of Chlamydomonas
Genetic studies of Chlamydomonas are answering fundamental biological questions in photosynthesis and beyond. Chlamydomonas research has greatly contributed to our understanding of basal body and ciliary biogenesis and function (Avasthi et al., 2014; Dutcher and O’Toole, 2016; Jarvik and Rosenbaum, 1980; Li et al., 2004; Mitchison et al., 2012; Snell et al., 2004; Tam and Lefebvre, 1993), including the discovery of intraflagellar transport (Baldari and Rosenbaum, 2010; Kozminski et al., 1993) and the mechanisms underlying human ciliopathies (Fliegauf et al., 2007; Pazour et al., 2000; Qin et al., 2001). In recent years, Chlamydomonas has also been extensively used for studies of the algal carbon concentrating mechanism (Badger et al., 1980; Brueggeman et al., 2012; Fang et al., 2012; Mackinder et al., 2016; Wang and Spalding, 2006), light signaling (Petroutsos et al., 2016), lipid metabolism (Li et al., 2012; Moellering and Benning, 2010; Wang et al., 2009; Yoon et al., 2012), dark metabolism (Yang et al., 2015), fermentation and hydrogen production (Dubini et al., 2009; Ghirardi et al., 2007), pigment metabolism (Lohr, 2009), responses to abiotic stresses (Castruita et al., 2011; Gonzalez-Ballester et al., 2010; Hemme et al., 2014; Miller et al., 2010), phototaxis (Berthold et al., 2008), mating (Umen, 2011), the cell cycle (Tulin and Cross, 2014), and evolution (Flowers et al., 2015).
This mutant library facilitates reverse genetics studies in many of these fields, which have long been hampered by the limited availability of Chlamydomonas mutants in candidate genes. Previously, the two largest collections of mapped mutants contained 1,935 (Li et al., 2016) and 439 (Dent et al., 2015) strains, covering ~8% and ~4% of Chlamydomonas genes, respectively. The present library adds 62,389 mapped mutants and increases the genome coverage to 83%. The quality of the mutants is demonstrated by our overall high success rate of PCR confirmation (Figure 2A) and the fact that many genes implicated in photosynthesis were recapitulated in our screen (Figure 5A; Table 1; Dataset SX). While this manuscript was in preparation, we made mutants from this library available to the community. In the first 15 months, we distributed more than 1,700 mutants from this library to over 180 labs worldwide.
This library brings high-throughput genetics to photosynthetic organisms
This mutant library enables high-throughput forward genetic screens in a model photosynthetic eukaryote. Traditionally, forward genetic screens in Chlamydomonas are performed as follows: mutants are generated separately for each screen and phenotyped individually (Gonzalez-Ballester et al., 2005b; Massoz et al., 2015; Tam and Lefebvre, 1993); mutation sites are then determined for the strains identified in the screen, which can take months and is only successful for a fraction of the mutants (Dent et al., 2015; Gonzalez-Ballester et al., 2005a; Meslet-Cladiere and Vallon, 2012; Tam and Lefebvre, 1993).
Our library is transformative with respect to both the speed and accuracy of mutant screens. First, mutant-specific barcodes enable pooled screens, which are higher throughput than phenotyping each mutant individually and more quantitative than colony array screens (refs). Second, mutants in the library are already mapped and the disrupted genes are known. Third, 50% of genes are covered with multiple exon, intron, or 5’ UTR alleles, which allows us to statistically evaluate the likelihood for each gene to be required for any process. We have demonstrated this last advantage by identifying 44 tier I and 264 tier II genes among the 2,599 genes covered by all photosynthesis-deficient mutants discovered.
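The third advantage, statistical evaluation of a gene from its multiple alleles, can be illustrated with a minimal sketch. The actual procedure is described in the supplemental Methods; the code below assumes, purely for illustration, a one-sided hypergeometric (Fisher-type) test of each gene's alleles against all alleles in the screen, followed by a Benjamini-Hochberg FDR adjustment. The function names and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n alleles are drawn from N total alleles,
    K of which show the phenotype (assumed test, not taken from Methods)."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical example: (alleles with phenotype, total alleles) per gene,
# against a made-up background of 300 phenotype alleles among 10,000.
genes = {"geneA": (3, 3), "geneB": (1, 4), "geneC": (0, 2)}
raw = [hypergeom_tail(k, n, 300, 10_000) for k, n in genes.values()]
fdr = dict(zip(genes, benjamini_hochberg(raw)))
```

In this sketch a gene with several independent alleles that all show the phenotype receives a much smaller adjusted p-value than a gene with a single affected allele, which is the intuition behind ranking genes by allele support.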
Beyond photosynthesis, this library enables the use of Chlamydomonas as the “green yeast” (Goodenough, 1992) to elucidate many other biological questions at an accelerated pace, including resistance to osmotic, oxidative, temperature, and nutrient stresses. The high throughput of the screens makes it possible to rapidly screen the library under a variety of conditions to identify genes with roles important for growth or viability under the different stress conditions. This strategy will also allow clustering of genes based on the fitness of strains in the library under a variety of conditions, which can aid in elucidating gene functions based on known genes in the same cluster (Hillenmeyer et al., 2008). The identification of genes with roles in acclimation to stress conditions and the placement of these genes into pathways will advance sustainable agriculture in the context of global climate change.
Our approach revealed 21 genes likely required for photosynthesis and hundreds of candidates
We identified 44 tier I genes and 264 tier II genes. Both tiers are enriched in predicted chloroplast-localized proteins (Figure 5D). Genetic studies of photosynthesis trace back to the 1950s (Kates and Jones, 1964; Levine, 1960; Sager and Zalokar, 1958), so it is remarkable that, among the 44 tier I genes we identified, 21 were not previously known to be involved in photosynthesis.
Functional annotations of the known and novel tier I genes provide insights into the outstanding questions in photosynthesis. Most of the known genes are directly involved in the transcription and translation of the proteins or the assembly of protein complexes in light reactions and carbon assimilation. Only three of the known genes are involved in signaling and regulatory aspects (Cre10.g423500, Cre12.g497300, and Cre12.g524500) (Duanmu et al., 2013; Klein and Houtz, 1995; Wang et al., 2016). In contrast, the novel genes encode a higher number of signaling and regulatory proteins, including two protein kinases (Cre01.g008550 and Cre02.g111550), one protein phosphatase (Cre03.g185200, CPL3), one thioredoxin-domain containing protein (Cre01.g037800), one nuclear importin domain-containing protein (Cre13.g586750), and two ion channels (Cre12.g542569 and Cre13.g574000) that respond to signals of membrane potential or ligand concentrations (Changeux and Christopoulos, 2016). This observation suggests a relatively poor understanding of regulatory components in photosynthesis.
CPL3 is a novel regulator of photosynthesis
There are a growing number of examples of how photosynthesis is regulated by phosphorylation, and much remains to be discovered. Chlamydomonas STT7, Arabidopsis STN7, and Arabidopsis TAP38/PPH1 regulate movement of light-harvesting complex II (LHCII) between PSII and PSI by phosphorylating or dephosphorylating LHCII (Bellafiore et al., 2005; Depege et al., 2003; Pribil et al., 2010; Shapiguzov et al., 2010). Arabidopsis STN8 and Arabidopsis PBCP are involved in a PSII repair cycle by phosphorylating or dephosphorylating PSII core subunits (Samol et al., 2012; Vainonen et al., 2005). Dozens of other proteins involved in photosynthesis are phosphorylated (Baginsky and Gruissem, 2009; Wang et al., 2014; Werth et al., 2016), but we do not know the physiological role of most of these phosphorylations, or the identities of the protein kinases and phosphatases that regulate their phosphorylation state.
CPL3 potentially fills two key gaps in our understanding. First, we know that the photosynthetic apparatus is phosphorylated on tyrosines, but we do not know any of the corresponding phosphatases. While the majority of phosphorylations in photosynthetic organisms are on serine/threonine residues, phosphorylated tyrosine residues have been observed in the PSII subunit PSBO and the PSI subunit PSAA in Chlamydomonas (Wang et al., 2014). CPL3 has 25% identity to the demonstrated protein tyrosine phosphatase Shewanella sp. PPI (Sievers et al., 2011; Tsuruta and Aizono, 2000), and shares the three key motifs “DXHG”, “GDXXDR” and “GNHE” (Tsuruta et al., 2005), strongly suggesting that it is a tyrosine phosphatase (Figure S9). Second, there is no known phosphatase acting on PSI. The PSI defects we observed in the cpl3 mutant (Figure 6) make CPL3 a strong candidate for a PSI tyrosine phosphatase.
The Arabidopsis homolog of CPL3, AtSLP1, is localized to the chloroplast, and its transcript is only expressed in the photosynthetic tissues and is correlated with photosynthesis-associated transcripts (Kutuzov and Andreeva, 2012; Uhrig and Moorhead, 2011), suggesting that CPL3’s role in photosynthesis is conserved. The example of our photosynthesis screen and characterization of CPL3 illustrates the power of our Chlamydomonas mutant library for rapidly advancing our knowledge of fundamental processes in photosynthetic eukaryotes.
Methods used in this manuscript are provided in the supplemental information.
Table 1. Tier I genes from the photosynthesis screen that had a previously known role in photosynthesis
aPrediction of protein localization by PredAlgo (Tardif et al., 2012): C = chloroplast, M = mitochondrion, SP = secretory pathway, O = other.
bThe number of exon/intron/5’UTR mutant alleles for that gene that satisfy our minimum read count requirement and showed at least 10X fewer normalized reads in the TP-light sample compared to the TAP-dark sample.
cThe number of exon/intron/5’UTR mutant alleles for that gene that satisfy our minimum read count requirement but did not satisfy the at least 10X depletion in TP-light criterion.
dThe FDR-adjusted p-value for that gene compared to all alleles for all genes (see Methods).
eArabidopsis homolog, obtained from the “best_arabidopsis_TAIR10_hit_name” field in Phytozome.
fAT3G17040.1 is required for functional PSII in Arabidopsis whereas Cre09.g389615 was shown to be involved in PSI accumulation in Chlamydomonas.
Table 2. Tier I genes from the photosynthesis screen with no previously known role in photosynthesis
aThe annotation of “fast leu-rich domain-containing” cannot be confirmed by blastp analysis in NCBI (Altschul et al., 1997).
Figure 1. We Generated a Genome-wide, Indexed and Barcoded Library of Chlamydomonas Insertion Mutants using a High-throughput Pipeline.
A. Transformation cassettes contained the HSP70-RBCS2 promoter (with an intron from RBCS2), the AphVIII gene that confers resistance to paromomycin, two transcriptional terminators (T1: PSAD terminator; T2: RPL12 terminator), and two regions with sequences unique to each cassette molecule. These two regions are termed “barcodes.” The barcodes were 22 base pairs (bp) in length. Simplified examples are shown in the figure.
B. After transformation and mutant arraying, the sequence of the barcode contained in the insertion cassette was unique to each transformant but initially unknown for each colony.
C. Barcodes were amplified from combinatorial pools of mutants, sequenced, and traced back to single colonies (Figure S2). After this step, the barcode sequence for each colony was known. For simplicity, only one side of the cassette is shown.
D. Barcodes and genomic sequences flanking the insertion cassettes were amplified from a few separate pools containing all mutants in the library. By pooled next-generation sequencing, the sequence flanking the insertion cassette was paired with the corresponding barcode (Figure S3). Because the physical location for each barcode was determined in the previous step, flanking sequences could then be assigned to single colonies.
See also Figures S1 and S2.
Figure 2. This Library Contains 62,389 Mutants with 74,923 Insertions.
A. The consolidated library is composed of insertions of four confidence levels, corresponding to different mapping scenarios. The insertion sites of a number of randomly chosen mutants in each category (confidence levels 1 and 2 combined as one category) were verified by PCR (Dataset S5), with the percentages of confirmed insertions shown in the last column.
B. The library includes 74,923 insertions, with the majority in confidence levels 3 and 1.
C. Most of the library mutants have a single mapped insertion, and <20% contain two or more mapped insertions.
See also Figures S3, S4, and S5.
Figure 3. This Library Covers 83% of Genes in the Chlamydomonas Genome.
A. The library provides consistent coverage over the majority of the genome, with some variation in insertion density. This panel compares the library insertion density over the genome (left column in each chromosome) to three simulations with insertions randomly distributed over all mappable positions in the genome (three narrow columns on the right of each chromosome). Three simulations are provided to illustrate the variability caused by random insertion positions. Areas that are white throughout all columns represent regions where insertions cannot be mapped to a unique genomic position due to highly repetitive sequence. Areas where the left column has a significantly different mutant density than the right (simulation) columns represent insertional hot spots or cold spots. See Figure S6 for more detail.
B. 83% of all Chlamydomonas genes are covered with one or more insertions in the library. 68% of genes are covered with two or more insertions, and 46% with two or more exon or intron insertions.
C. Various gene sets of interest to the community are covered to similar degrees, with minor variations.
D. As expected, the number of insertions is roughly correlated with gene length.
E. The GO categories with fewer insertions than average are likely enriched in essential genes. The depletion ratio is the number of insertions in all genes in the category compared to the number of insertions that would be expected based on the gene lengths if their insertion density matched the average insertion density of genes in the flagellar proteome, which was chosen as a likely non-essential control group. Shown are the three GO categories with lowest p-values. Full GO category data are provided in Dataset S7.
F. Insertion density per mappable genome length varies between different gene features, with the lowest density in exons.
Panels A, D-F are based on pre-consolidation library data to avoid biases introduced by consolidation. See also Figures S6 and S7.
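The depletion ratio defined in panel E can be written out as a short calculation. The sketch below is illustrative only: the function names and the example gene lengths and insertion counts are hypothetical, and it assumes each gene is represented by its length in bp and its number of insertions.

```python
def insertion_density(genes):
    """Insertions per bp over a list of (length_bp, n_insertions) tuples."""
    total_length = sum(length for length, _ in genes)
    total_insertions = sum(n for _, n in genes)
    return total_insertions / total_length


def depletion_ratio(category_genes, control_genes):
    """Observed insertions in a category divided by the number expected
    if the category matched the control group's insertion density."""
    expected = insertion_density(control_genes) * sum(
        length for length, _ in category_genes
    )
    observed = sum(n for _, n in category_genes)
    return observed / expected


# Hypothetical numbers: a control group (standing in for the flagellar
# proteome) and a GO category with fewer insertions than expected.
control = [(1000, 4), (3000, 8)]
category = [(2000, 3), (2000, 0)]
ratio = depletion_ratio(category, control)
```

A ratio well below 1 indicates that the category carries fewer insertions than its combined gene length would predict, as expected for a set enriched in essential genes.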
Figure 4. The Barcode Feature of Our Library Allows Screening Mutants in Pools.
A. Mutants in the library were pooled, with aliquots grown under heterotrophic (TAP-dark) and photoautotrophic (TP-light) conditions. After eight doublings, cells from each condition were harvested, DNA was extracted, and barcodes were PCR-amplified and sequenced. Mutants deficient in photosynthesis are expected to have a lower read count in TP-light relative to TAP-dark.
B. The barcode sequencing read counts (normalized to 100 million total reads) for each insertion were highly reproducible between technical replicates, with Spearman’s correlation of 0.978.
C. The normalized barcode sequencing read counts for each insertion were highly reproducible between biological replicates, with Spearman’s correlation of 0.982.
D. The phenotype of each insertion was determined by comparing its read count under TAP-dark and TP-light conditions: an insertion is considered to have a photosynthesis phenotype if it has at least 10x fewer normalized reads in TP-light than in TAP-dark, and has at least 50 TAP-dark reads (to avoid low-read noise). The dashed green line shows the phenotype cutoff. cpl3 alleles are highlighted in red. Triangles and squares represent exon and intron alleles respectively.
E. The TP-light/TAP-dark ratio of multiple insertions in the same mutant is consistent, with a Spearman’s correlation of 0.744, and only 4% of insertion pairs having more than a 5x difference between ratios.
We obtained data for both 5’ and 3’ barcodes. For panels B-E, 5’ barcodes are shown. See also Table S5.
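The normalization in panel B and the phenotype criterion in panel D can be sketched in a few lines. This is a minimal illustration rather than the pipeline used for the screen: the function names and example counts are hypothetical, and the sketch assumes that the 50-read minimum is applied to normalized TAP-dark counts.

```python
def normalize(counts, target=100_000_000):
    """Scale raw barcode read counts so that the sample sums to `target`."""
    total = sum(counts.values())
    return {barcode: n * target / total for barcode, n in counts.items()}


def call_phenotypes(tap_dark, tp_light, min_dark_reads=50, fold=10):
    """Map each barcode passing the read filter to True if it has at least
    `fold` times fewer normalized reads in TP-light than in TAP-dark."""
    calls = {}
    for barcode, dark in tap_dark.items():
        if dark < min_dark_reads:
            continue  # too few TAP-dark reads to call a phenotype reliably
        light = tp_light.get(barcode, 0.0)  # absent barcode counts as zero
        calls[barcode] = light * fold <= dark
    return calls


# Hypothetical normalized read counts for three barcodes.
tap_dark = {"bcA": 500.0, "bcB": 400.0, "bcC": 20.0}
tp_light = {"bcA": 30.0, "bcB": 350.0, "bcC": 0.0}
calls = call_phenotypes(tap_dark, tp_light)
```

Here bcA is called as having a photosynthesis phenotype, bcB is not, and bcC is excluded because it falls below the read-count minimum.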
Figure 5. Our Screen Recapitulates Many Genes With Known Roles in Photosynthesis and Reveals Many Novel Components That Play a Role in Photosynthesis.
A. Twenty-two tier I genes were known to have a role in plastidic processes related to photosynthesis, including PSII protein synthesis and assembly, PSI RNA splicing and stabilization, PSI protein synthesis and assembly, Calvin-Benson-Bassham (CBB) cycle, cofactor and pigment metabolism, chloroplast and thylakoid morphogenesis, and the carbon concentrating mechanism. Twenty-one tier I genes were previously not known to have a role in photosynthesis. Each yellow box represents one gene hit.
B. The TP-light/TAP-dark ratio of all the alleles for each gene in the three gene groups. Each column is a gene; each horizontal bar is an allele, color-coded by feature (only exon/intron/5’UTR insertions are shown). The plot is separated into three sections: randomly chosen non-hit control genes (flagellar proteome), hits with known roles in photosynthesis, and novel hits. Genes in each group are sorted by the average TP-light/TAP-dark ratio over all alleles, from high to low.
C. Insertion phenotypes vary depending on the gene feature: exon and intron insertions are the most likely to show strong phenotypes, while 3’UTR insertions almost never do. The plot is based on all insertions for the 44 hit genes. The exon, intron, and 3’ UTR groups each contain >100 insertions. For 5’UTR, we caution that there are only 14 insertions and the distribution may not be representative of all 5’UTR insertions.
D. Known and novel tier I genes, and tier II genes, are all enriched in predicted chloroplast-targeted proteins.
Figure 6. CPL3 is Important for Photosystem I Activity.
A. Cassette insertion sites are indicated on a model of the CPL3 gene from the Chlamydomonas v5.5 genome. In the gene model, black boxes, gray boxes and thin lines indicate exons, UTRs, and introns respectively. Two cassettes are inserted in opposite orientations, with one of them truncated on the 3′ side (indicated by a notch); the 5′ ends may be intact or truncated. The orange indicates a small insertion of a fragment of unknown origin.
B. cpl3 is deficient in growth under photoautotrophic conditions. The growth deficiency is rescued upon complementation with the wild-type CPL3 gene (comp).
C. PCR genotyping results of cpl3 and complemented lines. cpl3 contains an insertion in the CPL3 gene. The complemented lines contain both the disrupted and the wild-type copies of the CPL3 gene.
D. Measurements of chlorophyll a:b ratio of wild type and cpl3 mutant. Cells were grown in TAP medium in the dark and then grown for an additional 24 h in light. Analyses were performed on three biological replicates immediately before (0 h), 4 h after, and 24 h after exposure to light. Error bars indicate standard deviations. Asterisks indicate a statistically significant difference between the two samples (t-test, p<0.03).
E. cpl3 is defective in P700 oxidation upon illumination. Cells were dark-acclimated, exposed to light for 10 sec, followed by a saturating pulse, and then switched to dark incubation. Changes in absorbance at 705 nm were monitored; a decrease in absorbance indicates oxidation of P700. A time course is shown for each of three biological replicates for the three strains. Analyses for the 24 h time point are presented.
F. cpl3 has less maximum photo-oxidizable PSI. Maximum photo-oxidizable P700 is calculated from the change of absorbance from the dark acclimation period to the saturating pulse time point.
See also Figures S8 and S9.