GCLiPP: global crosslinking and protein purification method for constructing high-resolution occupancy maps for RNA binding proteins

Cells

Primary CD4+ and CD8+ mouse T cells were isolated from C57BL/6 J mouse peripheral lymph nodes and spleen using positive and negative selection Dynabeads, respectively, according to the manufacturer’s instructions (Invitrogen). All mice were housed and bred in specific pathogen-free conditions in the Animal Barrier Facility at the University of California, San Francisco. Animal experiments were approved by the Institutional Animal Care and Use Committee of the University of California, San Francisco. Cells were stimulated with immobilized biotinylated anti-CD3 (0.25 mg/mL, BioXcell, clone 2C11) and anti-CD28 (1 mg/mL, BioXcell, clone 37.51) bound to Corning 10-cm cell culture dishes coated with Neutravidin (Thermo) at 10 mg/mL in PBS for 3 h at 37 °C. Cells were left on stimulation for 3 days before being transferred to non-coated dishes in T cell medium [68] supplemented with recombinant human IL-2 (20 U/mL, NCI). Th2 cell cultures were also supplemented with murine IL-4 (100 U/mL) and anti-mouse IFN-γ (10 µg/mL). CD8 T cell cultures were also supplemented with 10 ng/mL recombinant murine IL-12 (10 ng/mL). For re-stimulation, cells were treated with 20 nM phorbol 12-myristate 13-acetate (PMA) and 1 µM ionomycin (Sigma-Aldrich) for 4 h before harvest.

Peripheral blood mononuclear cells (PBMCs) were isolated from anonymous donors through Ficoll-Paque Plus centrifugation gradient (Cytiva). CD4 T cells were isolated from PBMCs using EasySep Human CD4 + Isolation Kit according to the manufacturer’s protocol (StemCell Technologies). Cells were stimulated on plates coated with anti-CD3 (1 μg/ml, UCSF Monoclonal Antibody Core; clone OKT-3) and anti-CD28 (2 μg/ml, Miltenyi Biotec; clone 15E8). After 2 days of stimulation, cells were electroporated to incorporate CRISPR-Cas9 RNPs and placed back on anti-CD3- and anti-CD28-coated plates for 1 day. Cells were then rested in T cell media supplemented with recombinant human IL-2 (20 U/mL, NCI). For Th2 polarizing conditions, cultures were supplemented with human recombinant IL-4 (12.5 ng/mL, R&D Systems) and human anti-IFN- γ (10 μg/ml, Invitrogen, clone NIB42) during stimulation and only with anti-IFN- γ (5 μg/ml) during rest. Protein readout for CD5 was conducted 4 days after electroporation and 6 days for pSTAT6. T cell media consisted of RPMI-1640 supplemented with 10% fetal bovine serum (FBS) (Omega), L-glutamine, penicillin, streptomycin, sodium pyruvate, β-mercaptoethanol, and HEPES. Jurkat cells were grown in RPMI supplemented with FBS, L-glutamine, penicillin, and streptomycin.

Measurement of mRNA decay

Cells were stimulated with PMA and Ionomycin for 4 h and then additionally treated with actinomycin D (Sigma-Aldrich) at 5 µg/mL for an additional 0, 1, 2, or 4 h. After treatment, cells were lysed with Trizol LS (Life Technologies) and processed with Direct-zol™ 96-well RNA (Zymogen). RNA was quantified with an ND-1000 spectrophotometer (NanoDrop) and reverse transcribed with SuperScript III First Strand Synthesis Kit (Invitrogen).

GCLiPP and RNAseq

 ~ 100 × 106 mouse T cells cultured from 3 mice or ~ 100 × 106 Jurkat T cells were washed and resuspended in ice-cold PBS and UV irradiated with a 254-nm UV crosslinker (Stratagene) in three doses of 4000, 2000, and 2000 mJ, swirling on ice between doses. Cells were pelleted and frozen at − 80 °C. Thawed pellets were rapidly resuspended in 400 µL PXL buffer without SDS (1 × PBS with 0.5% deoxycholate, 0.5% NP-40, Protease inhibitor cocktail) supplemented with 2000 U RNasin (Promega) and 10 U DNase (Invitrogen). Pellets were incubated at 37 °C with shaking for 10 min, before pelleting of nuclei and cell debris (17,000 g for 5 min). Supernatants were biotinylated by mixing at room temperature for 30 min with 500 µL of 10 mM EZ-Link NHS- SS-Biotin (Thermo) and 100 µL of 1 M sodium bicarbonate. Supernatants were mixed with 1 mg of washed oligo-dT beads (New England Biolabs) at room temperature for 30 min and washed 3 times with magnetic separation. Oligo-dT selected RNA was eluted from beads by heating in poly-A elution buffer (New England Biolabs) at 65 °C with vigorous shaking for 10 min. An aliquot of eluted RNA was treated with proteinase K and saved for RNAseq analysis using Illumina TruSeq Stranded Total RNA Library Prep Kit according to the manufacturer’s instructions. Cells treated with actinomycin D as described above were also collected for RNAseq to generate transcriptome-wide measurements of transcript stability.

The remaining crosslinked, biotinylated mRNA-RBP complexes were captured on 250 µL of washed M-280 Streptavidin Dynabeads (Invitrogen) for 30 min at 4 °C with continuous rotation to mix. Beads were washed 3 times with PBS and resuspended in 40 µL of PBS containing 1000 U of RNase T1 (Thermo) for 1 min at room temperature. RNase activity was stopped by addition of concentrated (10% w/v) SDS to a final concentration of 1% SDS. Beads were washed successively in 1 × PXL buffer, 5 × PXL buffer, and twice in PBS. Twenty-four picomoles of 3′ radiolabeled RNA linker was ligated to RBP-bound RNA fragments by resuspending beads in 20 µL ligation buffer containing 10 U T4 RNA Ligase 1 (New England Biolabs) with 20% PEG 8000 at 37° for 3 h. Beads were washed 3 × with PBS and free 5′ RNA ends were phosphorylated with polynucleotide kinase (New England Biolabs). Beads were washed 3 × with PBS and resuspended in ligation buffer containing 10 U T4 RNA Ligase 1, 50 pmol of 5′ RNA linker, and 20% PEG 8000 and incubated at 15 °C overnight with intermittent mixing. Beads were again washed 3 times in PBS and linker-ligated RBP-binding fragments were eluted by treatment with proteinase K (Sigma-Aldrich) in 20 µL PBS with high-speed shaking at 55 °C. Beads and supernatant were mixed 1:1 with bromophenol blue formamide RNA gel loading dye (Thermo) and loaded onto a 15% TBE-Urea denaturing polyacrylamide gel (Bio-Rad). Ligated products with insert were visualized by autoradiography and compared to a control ligation (19 and 24 nt markers). Gel slices were crushed and soaked in gel diffusion buffer (0.5 M ammonium acetate; 10 mM magnesium acetate; 1 mM EDTA, pH 8.0; 0.1% SDS) at 37 °C for 30 min with high-speed shaking, ethanol precipitated, and resuspended in 20 µL of RNase-free water. Ligated RNAs were reverse transcribed with Superscript III reverse transcriptase (Invitrogen) and amplified with Q5 polymerase (New England Biolabs). PCR was monitored using a real-time PCR thermal cycler and amplification was discontinued when it ceased to amplify linearly. PCR products were run on a 10% TBE polyacrylamide gel (Bio-Rad), size selected for an amplicon with the predicted 20–50 bp insert size to exclude linker dimers, and purified from the gel (Qiagen). Cleaned up library DNA was quantified on an Agilent 2100 Bioanalyzer using the High Sensitivity DNA Kit before being sequenced. All GCLiPP and RNAseq sequencing runs were carried out on an Illumina HiSeq 2500 sequencer.

GCLiPP and RNAseq bioinformatics analysis pipeline

FastQ files were de-multiplexed and trimmed of adapters. Each experiment was performed on three technical replicates per condition (resting and stimulated) per experiment. Cloning replicates and experiments were pooled in subsequent analyses. Jurkat and mouse T cell trimmed sequence reads were aligned to the hg38 human or mm10 mouse genome assembly using bowtie2, respectively. After alignment, PCR amplification artifacts were removed by de-duplication using the 2-nt random sequence at the 5′ end of the 3′ linker using a custom script that counted only a single read containing a unique linker sequence and start and end position of alignment per sequenced sample. Peaks of GCLiPP read density were called by convolving a normal distribution against a sliding window of the observed read distribution with a custom script (utr_peak_finder.pl). A 70-nucleotide window was analyzed centered on every nucleotide within the 3′ UTR. For each window, the observed distribution of read density was compared to a normal distribution of the same magnitude as the nucleotide in the center of the window. The Pearson correlation coefficient was computed for each nucleotide and peaks were defined as local maxima of goodness of fit between observed GCLiPP read density and the normal distribution, requiring a read depth above 20% of the maximum read depth in the 3′ UTR global minimum of 10 reads. RNAseq reads were aligned using STAR Aligner (https://github.com/alexdobin/STAR) [69] to align against the mm10 genome, and gene expression data were calculated as fragments per kilobase per million reads. Source code for data visualization software Thagomizer can be found at https://github.com/sskhon-2014/Graphy.

Comparison of GCLiPP to individual eCLIP datasets

eCLIP data [32] from K562 cell line were downloaded via the ENCODE data portal (http://www.encodeproject.org/). The first replicate set of bigwig files were downloaded for each RBP deposited online at the time of analysis (December 2017) (Additional file 7, Table S7) as well as CLIPper-called peaks for the same. To facilitate comparisons with GCLiPP, we called GCLiPP peaks in the Jurkat data using CLIPper [29] after re-aligning Jurkat GCLiPP reads to hg19. Correlation analysis was performed with a custom perl script that calculated the Spearman correlation for read depth at each nucleotide in the 3′ UTR of all genes that were expressed in each dataset (as determined by CLIP read depth). ~ 5000–15,000 expressed genes were included in the correlation analysis for each RBP. For comparison to mRNP abundancy, log10 RBP mass spectrometry spectra counts of HEK293 cells were utilized from [21]. To stratify RBPs by subcellular localization, data were taken from the COMPARTMENTS database, with RBPs with a localization score of 5 in the cytosol counted as cytosolic and lower counted as non-cytosolic [33]. All custom scripts are available at https://github.com/AnselLab/GCLiPP-Manuscript-scripts.

RBP domain analysis

We called Jurkat GCLiPP peaks aligned to hg38 using CLIPper2.0 [29]. Each peak was resized to 200 bp and oriented at the original peak center. The 200 bp RNA sequence of each peak was analyzed using pf_fold method from ViennaRNA (RNAlib version 2.4.13) [30] to calculate base-pairing probability for each pair of nucleotides and presented as an average for all the identified RBP binding sites. The PTBP1 eCLIP dataset (hg38) from K562 cells was downloaded from ENCORE (GSM2424223) and processed in similar manner. The matrices in Fig. 2A and Fig. S1A are zoomed into the central 150-bp region.

We used available resting and activated Jurkat expression data [70] (GSE145453) to calculate read counts mapped to RBP domains using annotations from RBPDBv1.3 [71] as a reference. Proteomics data of RBPs expressed in human Th0 cells was obtained and identified as described [12]. RBPs that contained more than one annotated domain based on RBPDBv1.3 were considered as an individual count in each appropriate category.

Conservation of RBP binding sites

To evaluate sequence conservation across various datasets, we performed CLIPper2.0 peak calling on sequencing data obtained through XRNAX [15] and OOPS [16]. The average PhyloP conservation score, obtained from UCSC genome browser as a bigwig of PhyloP scores of conservation 100 vertebrates, was calculated across all the sites within each method. This average was then standardized to contain a mean of 0 and a standard deviation of 1. Sequencing data for XRNAX (PRJEB26441; run accession ERR2537875) and OOPS (PRJEB26736; run accession SAMEA4663545, SAMEA4663546, SAMEA4663547, SAMEA4663548) was retrieved from EMBL-EBI ENA server and mapped to hg38 before CLIPper2.0 analysis. Specifically, our analysis used XRNAX data without ribosomal depletion and OOPS data performed using 150 mJ/cm2 crosslinking condition.

Identifying differential RBP binding

We used DeepRNAreg (accompanying manuscript available upon request) to compare GCLiPP datasets from activated and resting Jurkat cells to obtain a list of genomic loci within 3′UTRs that were enriched in either condition, and assign a differential binding intensity (DBI) value to each site. This list of loci was intersected with all ENCORE eCLIP datasets for K562 cells to assign corresponding predicted RBPs for each identified binding region. For regions assigned to each RBP, we calculated the mean DBI for activated and resting Jurkat cells, and expressed the mean DBI fold change as the ratio of these means. Gene expression in activated and resting Jurkat cells was determined by calculating total read counts from Jurkat expression data [70] (GSE145453). Source code for DeepRNAreg is available at https://github.com/AnselLab/DeepRNA-Reg.

The same sets of regions differentially bound in activated or resting Jurkat cells were scored for the presence of consensus RBP recognition motifs within an 8-base pair window centered at the differential binding site. Enrichment of each binding motif within these regions was calculated against the background frequency of the same motif within the entire set of 3′ UTRs of genes bearing differentially bound regions. This analysis was performed for 119 RBPs that are represented in the oRNAment database of consensus binding sequences [34] and expressed in Jurkat cells [70].

CRISPR editing

Guide RNA sequences were selected using the Benchling online CRISPR design tool (https://benchling.com/crispr) with guides selected to target genomic regions of GCLiPP read density. Synthetic crRNAs and tracrRNA (Dharmacon) were resuspended in water or 10 mM Tris–HCl Buffer pH 7.4 (Dharmacon) at 160 µM and allowed to hybridize at 1:1 ratio for 30 m at 37 °C. For CRISPR dissection experiments, all crRNAs were mixed at an equimolar ratio before annealing to tracrRNA. This annealed gRNA complex (80 µM) was then mixed 1:1 by volume with 40 µM S. pyogenes Cas9-NLS (University of California Berkeley QB3 Macrolab) to a final concentration of 20 µM Cas9 ribonucleotide complex (RNP). The complexed gRNA:Cas9 RNP and random 200 bp ssDNA (100 pmol, IDT) were nucleofected into primary mouse T cells (24 h after stimulation) or primary human T cells (48 h after stimulation) with the P3 Primary Cell 96-well Nucleofector™ Kit or into Jurkat cells with the SE Cell Line 96-well Nucleofector™ Kit using a 4-D Nucleofector following the manufacturer’s recommendations (Lonza). Cells were pipetted into pre-warmed media and then returned to CD3/CD28 stimulation for another 2 days for primary mouse T cells or 1 day for primary T cells and then expanded for an additional 3–5 days. Jurkat cells were expanded in resting conditions for 3–10 days after electroporation.

To validate deletion, gDNA was isolated from a portion of the edited cells using QuickExtract DNA Solution (Lucigen) following the manufacturer’s protocol. Edited regions were amplified through PCR using designed primers and MyTaq 2 × Red Mix (Bioline) and ran on 2% agarose gel. DNA bands were detected and quantified using Bio-Rad ChemiDoc.

3′ UTR dissection

3′ UTR dissection was performed as described [72]. Gene edited cells were harvested into Trizol reagent (Invitrogen) and total RNA was phase separated and purified from the aqueous phase using the Direct-zol RNA miniprep kit with on-column DNase treatment (Zymogen). Genomic DNA was extracted from the remaining organic phase by vigorous mixing with back extraction buffer (4 M guanidine thiocyanate, 50 mM sodium citrate, 1 M Tris base). cDNA was prepared with oligo-dT using the SuperScript III reverse transcription kit (Invitrogen). cDNA and genomic DNA were used as a template for PCR using MyTaq 2 × Red Mix (Bioline). To equilibrate the number of target molecules and number of PCR cycles between samples, we performed semi-quantitative PCR followed by agarose gel electrophoresis to determine a PCR cycle number where genomic DNA first showed visible bands. This cycle number was then used with a titration of cDNA concentrations. A concentration that amplified equivalently was selected for analysis by deep sequencing. To quantify relative RNA/DNA ratios, cDNA and genomic DNA amplicons were purified using a QIAquick PCR purification Kit (Qiagen) and quantified on an Agilent 2100 Bioanalyzer using the High Sensitivity DNA Kit (Agilent).

Amplicons were tagmented with the Nextera XT kit (Illumina) and sequenced on an Illumina 2500 HiSeq. Reads were aligned to a custom genome consisting of the targeted PCR amplicon using STAR aligner and mutations were scored using an awk script (https://github.com/alexdobin/STAR/blob/master/extras/scripts/sjFromSAMcollapseUandM.awk). RNA/DNA read ratios were calculated for all mutations over 20 nucleotides long and less than 250 nucleotides long, and relative expression was quantified as the median normalized RNA/DNA ratio for this subset of mutations. Mutations had to have at least 10 reads in both the RNA and gDNA amplicons and mutations with an RNA/DNA ratio of greater than 10 were excluded as outliers. Effect sizes for each nucleotide of the amplicon in each experiment were computed by comparing this median normalized RNA/DNA ratio for all mutations spanning a given nucleotide to all other mutations. Combined p-values were calculated using Welch’s two sample t-test comparing all mutations spanning a given nucleotide with all other mutations.

Shared peak calling, motif analysis and icSHAPE and phylogenetic analyses

3′ UTR alignments of mouse and human were performed by downloading hg38 RefSeq 3′ UTRs from UCSC genome browser (http://genome.ucsc.edu), identifying syntenic regions of the mouse genome in mm10 with the KentUtils liftOver program (https://github.com/ucscGenomeBrowser/kent) and aligning UTRs with Clustal Omega (http://www.ebi.ac.uk/Tools/msa/clustalo/) [73]. Alignments were programmatically performed for all human 3′ UTRs with a custom perl script (get_alignment_from_fasta.pl). Biochemically shared peaks were called by the following algorithm (implemented in conserved_peak_finder.pl). This algorithm normalizes GCLiPP read density (i.e., the fraction of the maximal read depth within that 3′ UTR) at each position and calculates the correlation between mouse and human normalized signal. To favor regions with a clear local peak of GCLiPP read density, the algorithm further calculates the correlation between the observed data and a normal distribution centered at the point being examined in both the mouse and human data tracks. These three Spearman correlations were added together to calculate a numerical score, and shared peaks were defined as local maxima of these scores. To identify high-stringency peaks, peaks were only accepted if they (1) had a correlation of > 0.75 between mouse and human, (2) had a peak that had a read density of > 0.5 of the maximum read density within that 3′ UTR in one data track (mouse or human) and > 0.2 in the other, and (3) had > 10 reads at that location in both mouse and human datasets. Biological enrichment of genes with shared peaks was calculated using the Metascape [52] online interface (http://metascape.org) using the default settings, with the exception that a background set of genes was included in the analysis, specifically all genes that contain a called GCLiPP peak in both human and mouse datasets that do not contain a biochemically shared peak.

For motif calling, HOMER [46] was used in RNA mode with the “noweight” option to turn off GC correction to search for motifs of width 5, 6, or 7 nucleotides, with otherwise default parameters. The positive sequence set was the mouse and human sequences of the biochemically shared GCLiPP peaks, the negative sequence set was all other GCLiPP-called peaks from Jurkat and mouse T cells that were not shared across species. For icSHAPE, we used a published bigwig file of locally normalized icSHAPE signal intensity generated in mouse ES cell [44]. Conservation of loci in the mouse and human genomes were obtained from the UCSC genome browser as a bigwig of PhyloP scores of conservation across 60 placental mammals (mouse) and 100 vertebrates (human) (http://hgdownload.cse.ucsc.edu/goldenpath/mm10/phyloP60way/, http://hgdownload.cse.ucsc.edu/goldenpath/hg38/phyloP100way/).

Mapping SNPs within GCLiPP peaks

We intersected our list of 3′UTR RBP peaks, determined using our peak calling algorithm, with a curated list of predicted disease causal SNPs [53] to identify SNPs within predicted RBP binding regions. We limited our analysis to SNPs located in the 3′UTR of genes that contained at least 1 GCLiPP peak. Specific regions in the 3′UTR of CD5, IKZF1, and STAT6 were deleted in resting Jurkats using CRISPR-Cas9 RNPs as previously mentioned. Protein expression of the edited genes was measured by flow cytometry 3–5 days after nucleofection.

Flow cytometry

Cells were stained with Live/Dead eFluor780 (Invitrogen) and anti-human CD5 (UCHT2) or intracellularly with anti-human IKZF1 (R32-1149) using the Foxp3 Transcription Factor Staining Kit (eBioscience). For pSTAT6 expression, Jurkat cells or primary human T cells were treated with recombinant human IL-4 (12.5 ng/mL; R&D Systems) for 0, 5, 10, 15, or 30 min, immediately fixed with 1.5% PFA for 10 min and permeabilized with ice-cold methanol for 15 min before staining with pSTAT6 (A15137E) for 1 h at room temperature. Primary T cells were additionally stained with anti-human CD4 (OKT4) and anti-human CD8 (HIT8a). Cells were analyzed on LSRII and FACSAria cytometers. GraphPad Prism was used for data visualization and for Mann–Whitney two-tailed t-test.

Oligonucleotide and primer sequences

GCLiPP 3′ RNA linker: 5′-NNGUGUCUUUACACAGCUACGGCGUCG-3′

GCLiPP 5′ RNA linker: 5′-CGACCAGCAUCGACUCAGAAG-3′

GCLiPP Reverse transcription primer: 5′-CAAGCAGAAGACGGCATACGAGATNNNNNNCGCTAGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCCGACGCCGTAGCTGTGTAAA-3′ (NNNNNN is barcode for demultiplexing).

GCLiPP 3′ PCR primer: 5′-CAAGCAGAAGACGGCATACGAGAT-3′

GCLiPP 5′ PCR primer: 5′-AATGATACGGCGACCACCGAGATCTACACTGGTACTCCGACCAGCATCGACTCAGAAG-3′

Read1seq sequencing primer for GCLiPP: 5′-ACACTGGTACTCCGACCAGCATCGACTCAGAAG-3′Index sequencer primer for GCLiPP: 5′-GATCGGAAGAGCACACGTCTGAACTCCAGTCAC-3′

PIM3 (human) gRNA1: TGTGCAGGCATCGCAGATGG

PIM3 (human) gRNA2: GACTTTGTACAGTCTGCTTG

PIM3 (human) gRNA3: GTGGCTAACTTAAGGGGAGT

PIM3 (human) gRNA4: AAACAATAAATAGCCCCGGT

PIM3 (human) gRNA5: TTGAGAAAACCAAGTCCCGC

PIM3 (human) gRNA6: CAGGAGGAGACGGCCCACGC

PIM3 (human) gRNA7: TTTATGGTGTGACCCCCTGG

PIM3 (human) gRNA8: CCAAGCCCCAGGGGACAGTG

Pim3 (mouse) gRNA1: GTTCAATTCTGGGAGAGCGC

Pim3 (mouse) gRNA2 CTGGTTCAAGTATCCACCCA

Pim3 (mouse) gRNA3: CCATAAATAAGAGACCGTGG

Pim3 (mouse) gRNA4: GCTTCCTCCCGCAAACACGG

Pim3 (mouse) gRNA5: CTGGTGTGACTAAGCATCAG

Pim3 (mouse) gRNA6: TGGAGAAGGTGGTTGCTTGG

Primers

PIM3 F (human): TCCAGCAGCGAGAGCTTGTGAGGAG

PIM3 R(human): TGATCTCCAGACATCTCACTTTTGAACTG

PIM3 R2(human): TGAGATAGGTGCCTCACTGATTAAGCATTGGTGATCTCCAGACATCTCACTTTTGAACTG

Pim3 F (mouse): GCGTTCCAGAGAACTGTGACCTTCG

Pim3 R (mouse): TATGATCTTCAGACATTTCACACTTTTG

CD5 gRNA1: GGAGCCTCGGGTCTGATCAA

CD5 gRNA2: GCTCTTCCAGACTTATTATG

IKZF1 R1 gRNA1: AAGGCTGACTTGTGTTCATG

IKZF1 R1 gRNA2: GCAACAAACTGACTCTAAGA

IKZF1 R2 gRNA1: TTATCATTGCATATCAGCAA

IKZF1 R2 gRNA2: ACATAATGCTTTTGGTGCGA

STAT6 gRNA1: GGGGTTAGCATATGTCAGAG

STAT6 gRNA2: CCAAATTCCTGTTAGCCAGG

STAT6 KO gRNA1: TCATAAGAAGGCACCATGGT

STAT6 KO gRNA2: CTGGATCCTCTTCAGCACTA

Comments (0)

No login
gif