Multi-omics analysis reveals the molecular response to heat stress in a “red tide” dinoflagellate

Strain and culture conditions

The axenic culture of Prorocentrum cordatum strain CCMP1329 used in this work was obtained from the Provasoli-Guillard National Center for Marine Algae and Microbiota (NCMA). P. cordatum CCMP1329 was cultivated in L1 minus Si medium, which is a modified L1 medium [83] in which synthetic ocean water was used instead of natural seawater, and Na2SiO3·9H2O omitted. Cultures were routinely maintained by transferring 15-day-old culture (10 mL) to fresh medium (90 mL) in 300-mL Erlenmeyer flasks. The flasks were kept in a climate chamber (RUMED type 3501; Rubarth Apparate GmbH, Laatzen, Germany) at 20°C in a 12:12 h light:dark cycle, with light intensity ~ 40 µmol photons m−2 s−1 without agitation. Absence of contaminating bacteria was routinely checked by plating aliquots on LB and Difco™ marine agar 2216 (MB) plates. The number of cells mL−1 after transfer fluctuated between 2.0 × 104 and 3.0 × 104.

Cell counting

Growth of P. cordatum was determined by cell counting using a BD LSR-Fortessa flow cytometer (BD Biosciences, San Jose, CA, USA). P. cordatum was identified according to its chlorophyll autofluorescence. Chlorophyll was excited with the 488-nm excitation laser and emission was detected at 695 nm. Samples (1 mL) were taken from three biological replicates during the light period and fixed with 25% glutaraldehyde (80 µL; final concentration 2% v/v) for 15 min at room temperature (RT). Samples were snap frozen in liquid nitrogen and stored at −70°C until they were analyzed. Each sample was analyzed in triplicates for 1 min.

Four independent growth curves per temperature were recorded. For each growth curve, counts were averaged from three biological replicates (each with three technical replicates). Counts were plotted against time and a generalized additive model (GAM) was fitted. The specific growth rate in the exponential growth phase (μexp per day) and the doubling rate per day (k) was calculated [84].

Extraction of genomic DNA

DNA of P. cordatum cells extracted with commercially available plant kits or by applying the common CTAB protocol was either too fragmented or contained too many contaminating compounds to be suitable for PacBio sequencing. Therefore, we performed an initial ultrasound treatment to break the cells and separate the nuclei from the debris, based on work aimed at isolating nuclei for electron microscopy [85]. This treatment was not performed for extracting genomic DNA for Illumina NovaSeq sequencing.

P. cordatum culture (100 mL) was transferred to two 50-mL Falcon tubes and centrifuged (685 g, 5 min, RT) using a Heraeus Multifuge™ X1R (Thermo Scientific, Schwerte, Germany). The supernatant was discarded, the pellets were dissolved in artificial seawater (15 mL) using an inoculation loop, and centrifuged (685 g, 5 min, RT). The supernatant was discarded, the pellets were dissolved in 30% ethanol (15 mL), and centrifuged (685 g, 5 min, RT); this step was repeated before the final pellet for each tube was dissolved in 30% ethanol (3 mL). For each tube, ultrasound using a Bandelin Probe Sonicator (Bandelin, Berlin, Germany) was applied (1 min, 5 cycles, amplitude 40%) followed by centrifugation (171 g, 3 min, RT), and the resulting pellet was suspended in 0.85% NaCl (1.5 mL) and centrifuged (10,000 g, 1 min, 4°C) using Heraeus Primo R Centrifuge (Thermo Scientific, Schwerte, Germany). Pellets from the two tubes were dissolved in high salt buffer (1 mL) in a 2-mL Eppendorf tube using inoculation loops. Proteinase K (8 µL) was added, and the mixture was incubated at 56°C (1 h), with the tubes inverted gently every 15 min. After cooling on ice (5 min), RNase A (15 µL) was added, and the mixture was incubated at 37°C (30 min, no shaking). The sample was centrifuged (10,000 g, 5 min, 4°C) and the supernatant transferred to a new tube. NaCl (5 M, 200 µL) were added, followed by thorough mixing by inverting the tube. Solution of CTAB/NaCl (200 µL) was then added, mixed well by inverting the tubes, and the mixture was incubated at 65°C (10 min). Chloroform extraction was performed using chloroform:isoamyl alcohol (24:1 v/v, 1 mL), and repeated three times until no interphase was visible. To the collected aqueous phases, an equal volume of isopropanol (pre-chilled at −20°C) was added, and the mixture was incubated overnight at −20°C. DNA was centrifuged (10,000 g, 10 min, 4°C) and the pellet was washed three times with 70% ethanol (pre-chilled at 4°C) with centrifugation (10,000 g, 10 min, 4°C). Following air drying under the clean bench, the DNA pellet was dissolved in the elution buffer (200 µL).

The DNA samples were sent to Helmholtz-Center for Infection Research (Braunschweig, Germany) for Illumina NovaSeq 6000 (pair-end 2 × 150 bp) sequencing, and to the German Collection of Microorganisms and Cell Cultures (DSMZ, Braunschweig, Germany) PacBio Sequel and Sequel II sequencing.

RNA isolation and sequencing

For RNA extraction, the sample was thawed at RT and transferred to a cryotube filled with 0.3 g acid-washed glass beads (100 µm). The cells were homogenized using the FastPrep-24 instrument (MP Biomedicals, Irvine CA, USA) at 6.0 m/s for 3 min (3 × 1 min, and 1 min on ice). Samples were centrifuged (12,000 g, 10 min, 4°C; Heraeus Primo R Centrifuge), and the supernatants were transferred to fresh tubes and incubated at RT (5 min). Next, 1-bromo-3-chlorophenol (100 μL; Sigma-Aldrich, Taufkirchen, Germany) was added, and samples were shaken vigorously for 15 s and incubated at RT (10 min). Samples were centrifuged (12,000 g, 10 min, 4°C), the aqueous phase was transferred to a new tube, to which isopropanol (0.5 mL) was added, mixed, and incubated at RT (10 min). The sample was then centrifuged (12,000 g, 10 min, 4°C), after which the supernatant was removed. The RNA pellet was washed with 75% ethanol (1 mL) with centrifugation (7500 g, 5 min, 4°C); this step was repeated. The final pellet was air-dried for 5 min, before being resuspended in RNase-free water (50 μL), and incubated at 55°C (10 min), prior to storage at −80°C.

Removal of genomic DNA was verified via PCR using total RNA as the template. The concentration of the RNA was quantified using a NanoDrop spectrophotometer (Peqlab, Erlangen, Germany) and the RNA integrity was assessed using a Bioanalyzer 2100 (Agilent, Santa Clara, USA). The average RNA concentration in the 18 samples was 297.4 ± 109.5 ng µL−1 and the average RIN value was 5.5 ± 0.78.

RNA sequencing was performed at the HZI Braunschweig with 300 cycles on a NovaSeq 6000 using pair-end 150 bp chemistry with the library kit NEBNext Ultra II directional RNA. Ribosomal RNA was depleted prior to sequencing using polyA beads.

De novo genome assembly

Genome data from P. cordatum were generated using Illumina NovaSeq 6000 and PacBio sequencing technologies, with a total data yield of 843.4 Gb (Additional File 3: Table S24). Combining these sequence reads, a hybrid genome assembly was generated using MaSuRCA v4.0.3 [86], independently with CABOG (option FLYE_ASSEMBLY = 0) and FLYE (option FLYE_ASSEMBLY = 1) as the assembly tool in the final step. Both assemblies are near identical, in which 99.94% of the scaffolds in each assembly share 99.04% identity on average. They exhibit the same level of data completeness (56.7%) based on recovery of BUSCO single-copy orthologs in alveolata_odb10 dataset (which is known to be poor in dinoflagellate data). Between these two preliminary assemblies, MaSuRCA-CABOG is more contiguous (N50 length of scaffolds = 194.50 kb) and yields better recovery of the rRNA region (Additional File 3: Table S25); this assembly was used in the subsequent refinement steps.

To refine the assembled genome, we first incorporated RNA-Seq data to further scaffold the MaSuRCA-CABOG assembly using L_RNA_scaffolder [87]. Briefly, we mapped the de novo assembled transcriptome (for transcripts ≥ 500 bp) onto the assembled genome sequences using pblat v2.5 [88]. The mapping results in psl format were used as input for L_RNA_scaffolder. This approach yielded a more contiguous genome assembly (N50 length of scaffolds = 346.97 kb) with a better recovery of BUSCO genes (57.9%; Additional File 3: Table S25).

Next, using BlobTools v1.1.1 [89], we assessed the assembled genome for potential outlier sequences based on sequence coverage, G+C content, and shared sequence similarity to known sequences in NCBI nt database (release 243; 15 April 2021). Genome scaffolds for which the sequencing coverage or G+C content is external to the range of median ± 1.5 × interquartile range are considered as potential outliers. Scaffolds that have bacterial, archaeal, or viral sequences as the top hits plus extreme sequencing coverage or extreme G+C content are considered sequences that are putatively external to the nuclear genome of Prorocentrum cordatum. In this analysis, majority (23,366; 98.2%) of the 24,295 genome scaffolds (implicating 3.89 Gb) do not have hits in the BLAST searches (20,914 scaffolds; 2.95Mbp) or have top hits in an undefined eukaryote sequence (2452 scaffolds; 0.95Mbp); this observation is expected given the lack of dinoflagellate data in the existing databases. Using this approach, we identified 1571 outlier sequences and removed them from the genome assembly. Most outlier sequences do not have shared similarity to bacterial sequences in the database. This is expected given the algal cultures from which the genomic DNA was extracted were axenic. The final genome assembly [38] has a total size of 4.15 Gb (N50 length of scaffolds = 349.2 kb; Additional File 3: Table S25).

Transcriptome assembly

RNA-Seq reads from six conditions (20_ex, 20_st, 26_ex, 26_st, 30_ex and 30_st) were processed using fastp v0.21.0 [90] using parameter –g to remove adapter sequences and poly-G artifacts known in NovaSeq 6000 data. Transcriptomes were first assembled in “de novo” mode using Trinity v2.12.0 [91] independently for each condition. All de novo assembled transcripts were combined as a single assembly, from which redundant sequences were identified and removed using CD-HIT v4.8.1 [92] (98% identity; 0.9 length-difference cutoff), yielding the final representative reference transcriptome.

To generate the genome-guided transcriptome, processed RNA-Seq reads from each condition were first mapped onto the final genome assembly (above) using HISAT2 v2.2.1 [93]. The mapping result (i.e., describing the alignments between RNA-Seq reads and genome scaffolds) was used as input for Trinity v2.12.0 in “genome-guided” mode. Using the same strategy above, individual genome-guided assemblies were combined, and redundant sequences removed, yielding the final representative genome-guided transcriptome.

Ab initio prediction of protein-coding genes

To predict protein-coding genes, we follow the customized gene prediction workflow tailored for dinoflagellate genomes following Chen et al. [94]. The description of this workflow is available at [95]. Briefly, this approach integrates evidence of transcript and protein sequences to guide predictions using multiple gene programs, after which the results were integrated to yield the final gene models [38].

First, we identified novel repetitive elements in the genome assembly using RepeatModeler v2.0.1 [96], combined these elements with existing repeat families in RepeatMasker database release 2018/10/26 as a reference, to predict and mask all repetitive sequence regions from the genome sequences using RepeatMasker v4.1.0 [97]; this yields the repeat-masked genome assembly.

To predict protein-coding genes, we first mapped the representative de novo and genome-guided transcriptome assemblies to the genome assembly using Minimap2 v2.18 [98], for which the code was modified to recognize G-C and G-A donor splice sites. The mapping information was then used to predict transcript-based genes using PASA v2.3.3 [99] for which the code was modified to account for non-canonical splice sites. The proteins coded by the transcript-based genes were searched (BLASTP, E ≤ 10−20, > 80% query cover) against a customized protein database combining RefSeq (release 98) proteins and predicted proteins from available Symbiodiniaceae genomes (Additional File 3: Table S26). The gene models were checked for transposable elements using HHblits v3.3.0 [100] and TransposonPSI [101], searching against the JAMg transposon database [102]; those genes containing transposable elements were removed from subsequent steps. Redundant sequences were removed based on similarity using CD-HIT v4.8.1 [92] (-c 0.75 -n 5). Among the remaining transcript-based gene sequences, we identified high-quality “golden genes” using the script Prepare_golden_genes_for_predictors.pl from the JAMg pipeline [102], altered to recognize alternative splice sites. These “golden genes” represent high-quality training set for ab initio gene predictors. We used them as the training set for SNAP [103] and AUGUSTUS v3.3.1 [104] for gene prediction from the repeat-masked genome assembly; the codes for AUGUSTUS was also modified to recognize alternative splice sites [105].

The repeat-masked genome was also used as the input for GeneMark-ES [106]. We also predicted genes using MAKER v2.31.10 [107], in which the code was modified to recognize GA donor splice sites. Protein-coding genes were predicted using MAKER (protein2genome mode) based on protein sequences from the Swiss-Prot database (retrieved 02 March 2020) and predicted protein sequences from other Symbiodiniaceae genomes. Finally, gene predictions from the five methods including the ab initio predictors (GeneMark-ES, AUGUSTUS, SNAP), MAKER protein-based predictions, and PASA transcript-based predictions were integrated using EvidenceModeler v1.1.1 [108] to yield the high-confident gene models; the weighting is PASA 10, MAKER protein 8, AUGUSTUS 6, SNAP 2, and GeneMark-ES 2. These gene models were subjected to three rounds of polishing during which the gene models were corrected based on transcriptome re-mapping using the annotation update utility in PASA [99].

Introner elements were identified in the intronic regions using the program Pattern locator [109]. The patterns we searched for were inverted repeats of 8–20 nucleotides and direct repeats of 3–5 nucleotides within 30 bases of each end of the introns as described in Farhat et al. [43].

Functional annotation of protein-coding genes

Annotation of gene function for P. cordatum and other representative dinoflagellate genomes was conducted based on sequence similarity to known proteins in the UniProt database (release 2021_03). Predicted protein sequences from the gene models were first searched against the manually curated protein sequences of Swiss-Prot (UniProt release 2021_03) using BLASTp v2.3.0 + (E ≤ 10−5; subject-sequence cover ≥ 70%). Sequences that have no Swiss-Prot hits were then searched against TrEMBL (UniProt release 2021_03) using the same parameters. For predicted proteins of P. cordatum, we further assessed functions based on sequence-similarity search against known protein sequences in EnzymeDetector, InterProScan, eggNOG, and Kofam.

Prediction of transit peptides

For each predicted protein, transit peptides were first predicted independently using TargetP v2.0 (-org pl), SignalP v6 (–organism eukarya –mode fast), WoLF PSORT (plant mode), Predotar, and ChloroP v1.1. Subcellular localization is determined based on the consensus from these predictions confirmed in three or more programs.

Analysis of homologous proteins

To identify homologous proteins of the predicted P. cordatum proteins, we compiled a comprehensive protein sequence database (1,554,705 sequences from 31 dinoflagellate taxa; Additional File 3: Table S27) using available genome or transcriptome data. All data from the MMETSP database [110] were downloaded from [111]. For species where there were multiple datasets for the same isolate, the protein sequences were clustered at 90% sequence identity using CD-HIT v4.8.1 [92] to reduce redundancy. Using these 31 sets of protein sequences, homologous sets were then identified based on clustering of protein sequences based on sequence identity using OrthoFinder v2.3.8 [112] at default settings.

Identification of mRNA editing sites

Putative mRNA editing events were identified from single-nucleotide variations observed in genome sequence reads versus RNA-Seq reads. An observed nucleotide variation in the RNA-Seq reads but not in genome sequence reads is considered a potential mRNA edited site. Briefly, 25% of all genome sequence reads (randomly sampled) were mapped to the final genome assembly using bwa-mem v0.7.17-r1188 [113]. Trimmed RNA reads from each sample (6 conditions × 3 replicates) were mapped to the genome assembly separately with HISAT2 v2.2.1 [93] using default parameters (–no-discordant) and a HGFM index that was built using known exons and splice sites from the predicted gene models. PCR duplicates were marked by MarkDuplicates implemented in Picard v2.23.8 [114]. For each condition, mapping of RNA-Seq reads was compared with the mapped genome sequence reads using JACUSA v2 (call-2 -F 1024 -P2 RF-FIRSTSTRAND -s -a D,Y,H:condition = 1) [115]. We follow the authors’ recommendation to assess the statistical significance of an mRNA edited site. A site is considered statistically significant if it meets all the requirements: (a) a score > 1.15; (b) coverage of genome reads > 10; (c) coverage of RNA reads from each condition > 5; (d) number of putative editing type is < 2; (e) the editing site is present in all three replicates.

Analysis of horizontal gene transfer

To identify putative horizontal gene transfer (HGT), P. cordatum proteins were searched (BLASTP, E ≤ 10−5) against a customized protein sequence database that consist of 2,773,521 proteins from 82 other eukaryotes (Additional File 3: Table S28) and 688,212 proteins from 543 single-cell assembled genomes (SAGs) of prokaryotes [49] (Additional File 3: Table S29). Excluding hits to other Prorocentrum proteins, P. cordatum proteins that have a bacterial top hit are considered results of HGT involving P. cordatum and bacteria. To support the inferences of putative HGT, we employed OrthoFinder v2.5.4 [112] to infer homologous protein sets from all the involved proteins (i.e., P. cordatum proteins, proteins from other eukaryotes and SAGs). Homologous protein sequence sets that contain P. cordatum proteins implicated in HGT were multiply aligned using MAFFT v7.453 [116] at –maxiterate 1000. Trimmed with trimAl v1.4.1 [117] using parameter -automated1, the alignments were used to infer phylogenetic trees using IQ-TREE2 [118] at -B 2000 -T AUTO.

Integrated mixOmics analysis

We conducted a multi-omics analysis in P. cordatum using the proteomic and transcriptomic data to identify a systems level heat stress response across the growth phases using the mixOmics package [119] in R. Transcriptome FPKM values were first log2 transformed prior to quality filtering with normalized proteomic data, requiring features be present across 75% of samples. Selected features represented the top 25 and 50% most variable transcripts (15,097) and proteins (259), respectively, that were present across 75% of the samples. These features were then input to mixOmics for Data Integration Analysis for Biomarker discovery using a Latent cOmponents (DIABLO) [59].

We conducted performance testing of the initial model to identify the number of latent components that contained a multi-omics signature using Mfold validation with 5 folds and 50 repeats. This suggested two latent components as the best fits for the model. We then performed final model tuning using the max distance to select diagnostic features for both component 1 (RNA-Seq: 170, proteins: 30) and component 2 (RNA-Seq: 190, proteins: 40). Ordination of all features indicate the separation of the three temperature levels for both transcriptome and proteome features. The transcriptome and proteome features selected for component 1 discriminate a common heat stress response at both 26 and 30°C (Additional File 2: Fig. S9, Additional File 3: Table S12) and for component 2, a heat stress response specific to either 26 or 30°C (Additional File 2: Fig. S10, Additional File 3: Table S12). The variates for the pathways from both components were then input to NetworkAnalyst [120] for KEGG pathway over-representation analysis to identify functional categories that were enriched in each network.

A relevance association network was created for each component using the network function within mixOmics, where values represent a robust approximation of the Pearson correlation. A heatmap displaying the features from each component was created using the pheatmap package in R with features clustered according to their Euclidean distances and scaled within rows. This revealed two main clusters within each component that were extracted using the R package dendextend according to the corresponding dendrogram. A relevance association network was then created for each subcluster as previously done for the two components.

Experimental design of multi-omics analysis

Cultures of P. cordatum CCMP1329 were maintained as described above. After 15 days of cultivation at 20°C, 10 mL was transferred to 90 mL of fresh L1-Si medium in 300-mL Erlenmeyer flasks and placed in a climate chamber set to the desired temperature (26 or 30°C) and exhibiting the same light intensity and light:dark cycle. These temperatures are already common in the Red Sea [121] or near the equator [122], and are well within the range predicted for the future oceans [37]. Replicate cultures were set up under identical conditions to allow sampling of 3 biological replicates each for transcriptome and proteome and 5 biological replicates for metabolome. For proteome analysis, up to 12 L of culture (120 flasks each with 100 mL culture) was cultivated in the same climate chamber to obtain ~ 2 g biomass (wet weight). For each growth stage (exponential, stationary) and temperature (20°C, 26°C, 30°C), a complete set of cultures were sacrificed. Cell counting and harvesting were started about 5 h after the onset of the light period. For cell counting, random samples were chosen from the climate chamber to account for slight differences in light intensity.

For transcriptome analysis, three 100 mL cultures (biological replicates) were sampled per temperature and growth phase. Each culture was centrifuged in two 50-mL Falcon tubes at (4276 g, 4°C, 5 min) in a Heraeus Multifuge™ X1R. The supernatant was decanted, and the pellet was resuspended in the remaining medium. The two pellets were combined in an Eppendorf tube (2 mL) and centrifuged (17,000 g, 4°C, 3 min) in a Heraeus Primo R centrifuge. The remaining supernatant was removed by pipetting, and the weight of the wet pellet was determined. The pellet was resuspended in 1 mL TRIzol reagent (Thermo Fisher Scientific, Waltham MA, USA), snap frozen in liquid nitrogen and stored at −70°C until further analysis.

For proteome analysis, tubes and buffers used in the following steps were pre-chilled; all steps were conducted on ice. The culture (400 mL) was filled into pre-chilled 500-mL centrifuge bottles, centrifuged (4248 g, 4°C, 30 min) using a Sorval Lynx 4000 (Thermo Fischer). The supernatant was decanted, the pellet was resuspended in a buffer (100 mL) containing Tris–HCl (100 mM, pH 7.5) and MgCl2∙6H2O (5 mM). The resuspended pellets were centrifuged (4248 g, 4°C, 30 min), and the supernatant was removed. The pellet was resuspended by gently pipetting in the same buffer (800 µL). The suspension was transferred into 2-mL Eppendorf tubes and centrifuged (17,000 g, 4°C, 5 min) using a Heraeus Primo R centrifuge. The supernatant was removed by pipetting and the weight of the wet pellet was determined. Samples were frozen in liquid nitrogen and stored at −70°C.

For metabolome analysis, 15 mL from five cultures (100 mL each; as biological replicates) from each temperature and growth phase was extracted immediately after cell counting with a filtration unit and 0.22 µm Millipore membrane filter. Samples were filtered at 500 mbar with a vacuum controller. The cells were washed three times with 4°C cold 3.5% NaCl.

The filters were immediately transferred to 5-mL Eppendorf tubes containing 100 mg of glass beads (0.7–100 µm; Kuhmichel, Ratingen, Germany), three stainless steel beads (two 5 mm3 and one 10 mm3; Kugel Winnie, Bamberg, Germany) for partially destroying the filter and to obtain cell lysis. Cold extraction fluid (2 mL, per-chilled at −20°C) was immediately added; this extraction fluid for metabolite extraction contained methanol, ethanol, and chloroform [123], and the internal standard 13C-ribitol. The pre-chilled 5-mL tubes with the filter and extraction fluid on ice were mixed for 20 s and placed back on ice until further treatment.

Analysis of transcriptome and differentially expressed genes

The transcriptome reads were mapped to the assembled reference genome using HISAT2 [93]. The reads mapped onto the exons were counted for the corresponding genes with featureCounts [50]. For differential expression analysis, only uniquely mapped reads were used to avoid ambiguity. The fragments per kilobase of transcript per million mapped fragments (FPKM) value of each gene was calculated by normalizing the fragments per million with the sum of the exon length of the corresponding gene. The ternary visualization of the gene expression pattern across three temperatures was produced with R package ggtern [124].

For analysis of differentially expressed genes (DEGs), genes with low expression were filtered out by the filterbyexpr function in edgeR [125] using default parameters. Then, the DEG analysis was performed with edgeR using the recommended glmQLF test on the raw read count per gene. Genes with a Benjamini–Hochberg corrected p-value ≤ 0.001 and an absolute log2(fold-change) ≥ 2 were considered as significantly differentially expressed.

Hierarchical clustering with the complete-linkage algorithm was used to identify gene clusters based on their expression profile across temperature conditions. The input expression values were centered log2-transformed FPKM values, i.e., log2(FPKM + 1) centered by subtracting the mean. The tree was cut into eight clusters to represent different expression profiles. Superclusters 4 and 7 in the stationary phase (each with small number of genes) showed very similar expression patterns; these were merged for downstream functional enrichment analysis.

Functional analysis of DEGs

The GO term enrichment analysis was carried out with the R package topGO [126] for all three gene ontologies, i.e., Biological Process (BP), Cellular omponent (CC), and Molecular Function (MF). REVIGO [127] was used to summarize the GO terms according to the semantic similarity for a concise visualization (Additional File 2: Fig. S13). In order to perform a KEGG pathway enrichment analysis, we assigned KEGG ortholog (KO) number for each gene using Kofam [51]. When a gene had multiple KO assignments with an e-value ≤ 10−10, we chose the one with the lowest e-value. If a gene had several KO assignments with an e-value of zero, we kept all those KO assignments for this gene. Using this annotation, the genes with KO assigned held about 50% of mapped reads. The expression profile for each KO gene was then generated by summing up the read count for all genes belonging to the same KO gene. Differentially expressed KO genes were then identified using edgeR similar to the above DEG analysis. The KEGG pathway and module enrichment analyses were performed using clusterProfiler [128] on the DEGs. For both GO term enrichment analysis and KEGG enrichment analysis, a false discovery rate (FDR) ≤ 0.05 was considered as significantly enriched.

In the central carbon metabolism pathway analysis, a paired Wilcoxon test was performed to compare the expression change of the gene members belonging to the same EC number. The alteration of a metabolite’s concentration was analyzed using Wilcoxon test on the mean concentration values of biological replicates over the three technical replicates; FDR ≤ 0.05 was regarded as significantly changed.

Proteome analysis

Cells of P. cordatum were resuspended in solubilization buffer and disrupted by bead beating (FastPrep-24 5G, MP Biomedicals) for 10 s at 6 m/s followed by 90 s on ice (in three repetitions) using 0.1 mm silica spheres. Cell debris and insoluble material were removed by ultracentrifugation (104,000 g, 1 h, 17°C) and the supernatant was again centrifuged (200,000 g, 1 h, 17). The protein content of the preparation was determined according to Bradford [129]; 50 µg was subjected to reduction, alkylation, and tryptic digestion as reported previously [130]. Generated peptides were analyzed by (a) one-dimensional (1D) reversed phase nanoLC and (b) two dimensional (2D) SCX-/RP nanoLC separation coupled to MS-detection. For 1D separation, tryptic peptides (1 mg) were decomplexed by reversed phase nanoLC separation (Ultimate 3000 nanoRSLC; Thermo Fisher Scientific, Dreieich, Germany) using a trap column setup (2 cm length, 5 µm bead size, 75 µm inner diameter; Thermo Fisher Scientific) with a 25-cm separation column (2 µm bead size, 75 µm inner diameter; Thermo Fisher Scientific), and applying a 360-min linear gradient [130]. In case of 2D separation, 4 µg was subjected to SCX fractionation (eluent A: 5 mM H3PO4, 5% v/v acetonitrile, pH 3.0; eluent B: 5 mM H3PO4, 5% v/v acetonitrile, 1 M NaCl, pH 3.0) with a linear 20 min gradient and collection of 19 fractions per sample beginning 5 min after injection. Each fraction was subsequently applied for second dimension reversed phase separation (see above) using a 60 min gradient. For both methods, the eluent was continuously ionized (captive spray ion source; Bruker Daltonik GmbH, Bremen, Germany) and ions analyzed by an ion-trap mass spectrometer (amaZon speed ETD; Bruker Daltonik GmbH) as described in Wöhlbrand et al. [131].

To prepare the membrane protein-enriched fraction, cell pellets were gently thawed, resuspended in 0.5 mL membrane lysis buffer (MLB; 100 mM Tris–HCl, 2 mM MgCl2, 10% (w/v) glycerin, 0.5 mM DTT, pH 7.5), and disrupted by bead beating (as described above). DNA was digested using DNase I and the obtained raw fraction applied on top of a continuous sucrose gradient (30–80% w/v) prior to centrifugation (37,800 g, 12 h, 4°C). Membrane containing fractions were collected and washed twice with MLB (centrifugation at 104,000 g, 1 h, 4°C). Obtained membrane pellets were resuspended in MLB (500 µL), pooled, and centrifugation repeated. The final pellet was resuspended in sodium dodecyl sulfate (SDS, 300 µL, 1.0% w/v) and incubated at 95°C (5 min) prior to centrifugation (20,000 g, 20 min, 20°C). The supernatant was snap frozen in liquid nitrogen until further analysis. Protein content was determined using the RC-DC™ Protein Assay (BioRad GmbH, Munich, Germany). A total of 10 µg protein per fraction of each sample was separated by SDS–polyacrylamide gel electrophoresis (SDS-PAGE), and gels were stained with Coomassie brilliant blue [

Comments (0)

No login
gif