Human endometrial carcinoma HEC-1-B cells were cultured in Eagle’s minimum essential medium (MEM) supplemented with 10% fetal bovine serum (FBS) and 1% penicillin-streptomycin at 37°C in a 5% (v/v) CO2 incubator.
Human embryonic kidney HEK293T cells were cultured in Dulbecco’s modified Eagle’s medium (DMEM) supplemented with 10% FBS and 1% penicillin-streptomycin at 37°C in a 5% (v/v) CO2 incubator.
SgRNA design and plasmid construction
We designed sgRNA sequences using CRISPOR; most of them were located at DNase I hypersensitive sites. Plasmid construction was performed as previously described [18]. In brief, pGL3-U6-sgRNA-PGK-puro was linearized with BsaI (NEB) at 37°C for 1.5 h. The linearized vector backbone was resolved on a 0.8% agarose gel and purified with the Monarch DNA Gel Extraction Kit (NEB). The oligos for the inserted sgRNA targeting sequences were synthesized as complementary pairs carrying two overhangs compatible with the linearized vector. For example, we annealed two pairs of oligos for the double cutting in the MeCP2 locus (MeCP2-1-Fw: 5′-ACCGC ATACA TGGGT CCCCG GTCA-3′, Rv: 5′-AAACT GACCG GGGAC CCATG TATG-3′ for the first cut; MeCP2-2-Fw: 5′-ACCGT TGAAG TGCGA CTCAT GCTG-3′, Rv: 5′-AAACC AGCAT GAGTC GCACT TCAA-3′ for the second cut). After annealing, the duplexes were ligated into the purified vector with T4 DNA ligase (NEB). The ligation products were transformed into DH5α bacteria for amplification. All plasmids were confirmed by Sanger sequencing.
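The oligo pairs listed above follow a regular pattern: the forward oligo is a 5′-ACCG overhang followed by the 20-nt target, and the reverse oligo is a 5′-AAAC overhang followed by the reverse complement of the target. The following Python sketch reproduces that pattern. The function names are hypothetical, and the overhangs are inferred from the listed oligos rather than taken from a stated design rule.

```python
# Minimal sketch: build an annealing oligo pair for a 20-nt sgRNA target.
# Assumption: the overhangs (ACCG / AAAC) are inferred from the oligos listed
# in the text; the function names are illustrative only.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def sgrna_oligos(target: str, fw_overhang: str = "ACCG", rv_overhang: str = "AAAC"):
    """Return (forward, reverse) oligos, both written 5'->3'."""
    target = target.upper().replace(" ", "")
    forward = fw_overhang + target
    reverse = rv_overhang + reverse_complement(target)
    return forward, reverse


# Target sequences taken from the MeCP2 oligos listed in the text.
for name, target in [("MeCP2-1", "CATACATGGGTCCCCGGTCA"),
                     ("MeCP2-2", "TTGAAGTGCGACTCATGCTG")]:
    fw, rv = sgrna_oligos(target)
    print(name, fw, rv)
```

Applied to the two MeCP2 targets, the sketch returns the same forward and reverse oligos as listed in the text.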
DNA polymerase and Ku70/80 knockdown
Knockdown experiments were performed as previously described [18, 27]. Briefly, we designed two sgRNAs for each polymerase and for Ku70/80, both targeting coding regions to achieve an efficient knockdown. For example, we constructed two sgRNA plasmids targeting PolD with the oligos listed below (PD1-Fw: 5′-ACCGG TATGG GAAGT AGACC TGGG-3′, Rv: 5′-AAACC CCAGG TCTAC TTCCC ATAC-3′; PD2-Fw: 5′-ACCGT GATGA TCACG TAGGG GACG-3′, Rv: 5′-AAACC GTCCC CTACG TGATC ATCA-3′). HEC-1-B cells and HEK293T cells were plated in 6-well plates at around 30–40% confluence 1 day before transfection. When cells reached more than 80% confluence, they were co-transfected with Cas9 and the two sgRNA plasmids using Lipofectamine 3000 (Thermo Fisher) according to the manufacturer’s instructions. After 12 h of culturing with 5% FBS, the culture medium was changed back to the normal condition with 10% FBS. Culturing continued for an additional 12 to 24 h, after which cells were assayed for growth with the HPRT1 and DCK reporter systems and for target-site cleavage.
HPRT1 and DCK assay systems
We used two reporter systems, HPRT1 and DCK, to detect large resections into flanking exons induced by the CRISPR/Cas9 system with dual sgRNAs targeting intronic sites. In the HPRT1 assay system, cells with functional HPRT1 are highly sensitive to 6-thioguanine (6-TG) because they convert it into toxic thioguanosine monophosphate. By contrast, cells with deficient or non-functional HPRT1 are resistant to this lethal drug and survive. Cells without DCK, a housekeeping gene that plays an important role in DNA synthesis, cannot complete DNA synthesis and undergo apoptosis. We designed dual sgRNAs 70–100 bp away from the splicing site within intron 2 of HPRT1 and intron 4 of DCK. If Cas9 with dual sgRNAs induces large resections into the flanking exons of HPRT1 or DCK, cells will survive in the HPRT1 assay system or die in the DCK assay system.
For the HPRT1 cell growth assay, we plated HEC-1-B cells in 6-well plates at 30–40% confluence 1 day before transfection. The number of cells plated in each well was kept consistent. When cell confluence reached 80%, we transfected the cells with plasmids targeting different polymerases to obtain knockdown cell populations. Two days later, cells were transfected again with sgRNAs targeting intron 2 of HPRT1 in low-serum medium. The medium was then changed back to normal serum and the cells were cultured for one more day. Finally, we selected cells with 6-TG at a concentration of 10 µg/ml for 7 consecutive days. Cells were collected and counted on days 1, 2, 4, 6, and 7. For the DCK cell growth assay, the procedures were similar but without 6-TG, and cells were collected on days 1, 2, 3, 4, and 5.
Genomic DNA extraction
We extracted genomic DNA from transfected cells to obtain purified DNA templates for further analysis. Briefly, cells were collected in DPBS when confluence reached 70–80%. After centrifugation and removal of the supernatant, the cell pellets were resuspended in lysis buffer (200 mM NaCl, 10 mM Tris-HCl (pH 7.4), 2 mM EDTA (pH 8.0), and 0.2% (wt/vol) SDS) and incubated overnight at 37°C with shaking at 750 rpm. After centrifugation at 14,000g for 0.5 h, the genomic DNA was precipitated with 0.7 volumes of isopropyl alcohol. Finally, the pellet was washed with 80% ethanol and the DNA was dissolved in TE. The genomic DNA can be stored at −20°C for at least half a year.
Preparation of junctional amplicon libraries
Chromosomal rearrangements including fragment deletion, inversion, and duplication can be induced by Cas9 with two sgRNAs. We used high-throughput sequencing to assay the junctional repair outcomes of the different chromosomal rearrangements. Since the amplified reads within an amplicon library are roughly the same size, bias in PCR amplification efficiency should be negligible. In addition, PCR modeling-based analysis showed that amplified sequences within an amplicon library have similar amplification efficiency (Additional file 1: Fig. S13). Because of the read-length limitation, all primers were placed near the junctional site and the final amplified products were shorter than 290 bp. The experiments were performed as previously described with modifications [19]. Briefly, the primers were designed to be compatible with the Illumina sequencing platform. The PCR conditions were as follows: initial denaturation at 95°C for 3 min; 30 cycles of denaturation at 95°C for 30 s, annealing at 60°C for 15 s, and extension at 72°C for 30 s; followed by a final extension at 72°C for 3 min. The PCR products were purified with the High-Pure PCR Product Purification kit (Roche) and then sequenced on the Illumina X Ten platform.
Multiplex high-throughput sequencing
To assess the junctional repair outcomes of each chromosomal rearrangement upon perturbing DNA polymerases and Ku70/80, we constructed libraries using Illumina P5/P7 primers with unique barcodes and indexes. For cost-effective sequencing, replicates of the same experiment were given the same index and barcode but were loaded on different lanes. After library construction, we quantified libraries with the Qubit dsDNA HS assay and pooled samples from different experiments in equimolar amounts. We performed each polymerase and Ku70/80 knockdown experiment with three replicates. In total, we constructed 829 libraries for high-throughput sequencing.
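Equimolar pooling from a mass concentration requires converting ng/µl to molarity. The sketch below shows one common way to do this, assuming an average mass of 660 g/mol per base pair of double-stranded DNA. The library names, concentrations, and target amount are hypothetical values for illustration and are not from this study.

```python
# Minimal sketch: convert Qubit readings (ng/ul) to nM and compute the volume
# of each library needed for an equimolar pool.
# Assumptions: 660 g/mol per bp of dsDNA; all library values are illustrative.


def molarity_nM(conc_ng_per_ul: float, length_bp: float, da_per_bp: float = 660.0) -> float:
    """Molar concentration (nM) of a dsDNA library of a given mean length."""
    return conc_ng_per_ul * 1e6 / (da_per_bp * length_bp)


def pooling_volumes_ul(libraries: dict, fmol_each: float) -> dict:
    """Volume (ul) of each library that contains fmol_each femtomoles.

    1 nM equals 1 fmol/ul, so volume = fmol / nM.
    """
    return {name: fmol_each / molarity_nM(conc, length)
            for name, (conc, length) in libraries.items()}


# Hypothetical libraries: name -> (ng/ul, mean amplicon length in bp)
example = {"lib_A": (10.0, 250), "lib_B": (4.0, 280)}
for name, vol in pooling_volumes_ul(example, fmol_each=50.0).items():
    print(f"{name}: {vol:.2f} ul")
```

Because the amplicons here are of similar length, equal molarity is close to equal mass, but the conversion still matters when product sizes differ between experiments.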
RNA extraction and RT-PCR
We used TRIzol reagent (Invitrogen) to obtain total RNA for RT-PCR. In detail, we used 1 ml of TRIzol reagent for each well of a six-well plate at more than 80% cell confluence. After homogenization, the samples were incubated for 5 min at room temperature to complete the dissociation of nucleoprotein complexes. Then 200 µl of chloroform was added to the samples, which were shaken vigorously for 15 s. After shaking, samples were left at room temperature for 5 min, then spun at 12,000g for 15 min at 4°C. After centrifugation, RNA was precipitated with 500 µl of isopropyl alcohol. Finally, the pellets were washed twice with 75% ethanol and dissolved in RNase-free water. The RNA can be stored at −20°C for up to a year. For RT-PCR, we used HiScript III RT SuperMix (Vazyme) for reverse transcription according to the manufacturer’s instructions, followed by PCR with targeting primers. Primer sequences are listed in Additional file 3: Table S1.
Simultaneous sequencing of deletion and inversion junctions
LAM-HTGTS (linear amplification-mediated high-throughput genomic translocation sequencing) was first introduced to detect translocations [31]. We used this method with a few modifications to assay large resections at junctional sites of chromosomal rearrangements induced by CRISPR/Cas9 systems with dual sgRNAs. Briefly, HEC-1-B cells were plated in 6-well plates at around 30% confluence 1 day before transfection. When cell confluence reached 70%, we added fresh medium with 5% FBS and performed transfection with Lipofectamine 3000 (Thermo Fisher) according to the manufacturer’s instructions. The medium was changed back to normal medium 24 h later, and the cells were cultured for another day before total genomic DNA was extracted. We dissolved genomic DNA at a final concentration of 250 ng/µl for sonication. The sonication conditions were 8 cycles of 30 s ON and 90 s OFF at low intensity. After sonication, the fragmented DNA was analyzed on a 1.5% agarose gel; the ideal fragment size is 400–600 bp.
To acquire junctional repair outcomes of inversion and deletion simultaneously, we used primers targeting the left side of the bait DSB and performed linear amplification to acquire prey sequences. Briefly, we used 5 µg of sonicated DNA as input and amplified the target with Super-Fidelity DNA Polymerase (Vazyme) using 5′-biotinylated primers, which are captured efficiently by streptavidin beads and thus facilitate downstream enrichment. The linear amplification conditions were as follows: initial denaturation at 98°C for 3 min; 85 cycles of denaturation at 98°C for 30 s, annealing at 58°C for 30 s, and extension at 72°C for 90 s; followed by a final extension at 72°C for 5 min. The linear amplification products were enriched with streptavidin beads. To remove free primers, we washed the beads with BW buffer (5 mM Tris-HCl, 0.5 mM EDTA, 1 M NaCl). Finally, we resuspended the beads in ddH2O.
Because linear amplification products have heterogeneous 3′ ends, we ligated the products from the last step to annealed, partially double-stranded adaptors that carry six random nucleotides at the 3′ end of one strand. After adaptor ligation, we proceeded with on-bead PCR using Super-Fidelity DNA Polymerase (Vazyme) with P5/P7 adaptors. The PCR conditions were as follows: initial denaturation at 95°C for 5 min; 19 cycles of denaturation at 95°C for 30 s, annealing at 60°C for 30 s, and extension at 72°C for 60 s; followed by a final extension at 72°C for 5 min. The PCR products were purified with the High-Pure PCR Product Purification kit (Roche) and the library was sequenced on the Illumina X Ten platform.
Customized computer program for reads processing
Although Cas9 has been reported to have staggered cleavage activity, no alignment software has so far taken this into account. We developed an alignment program that accounts for the complexity and diversity of Cas9 cleavage activity. This program yields more precise alignments and thus eases downstream analyses.
CRISPR-related insertions and deletions are frequently runs of consecutive nucleotides. Software such as CrisprVariants [36] and AmpliconDIVider [35] map next-generation sequencing (NGS) reads with traditional aligners like BWA-MEM [44] and NovoAlign (http://www.novocraft.com). Such software often reports CRISPR-unrelated short non-consecutive insertions and deletions. To solve this problem, Labun et al. developed ampliCan [34] by removing the gap-extension penalty and modifying other scoring parameters of the Needleman-Wunsch algorithm. Thus, ampliCan tends to report consecutive long deletions and/or insertions. However, ampliCan does not require the reported indels to lie at the Cas9 cleavage sites. Clement et al. proposed a partial solution to this problem in CRISPResso2 [33] by introducing a reward or bonus at the cleavage site to incentivize indels there. Nevertheless, this does not completely solve the problem because the Cas9 cleavage may be staggered [18, 19]. In particular, it is not appropriate to reduce the diverse profiles of Cas9 endonucleolytic cleavage to a single position 3 nucleotides upstream of the PAM.
We developed a new program to solve this conundrum. It aligns each NGS input read to the junctional reference by two levels of optimization. Each NGS input read is separated into three parts before being mapped to the junctional reference. At the lower level, the program searches for the optimal alignments of the left and right parts to the junctional reference; the possibly empty middle part is the unmapped random insertion. At the upper level, the program searches for the optimal separation into the three parts. The two levels of optimization are integrated into a single dynamic program. We permit the left and right parts of each NGS input read to overlap in order to capture the overhang of the staggered Cas9 cleavage ends. The detailed mathematical design and generalization, the dynamic programming computation, and the source code (main.cpp) with its usage (Additional file 2: Notes S1-S4) are available on GitHub (https://github.com/ljw20180420/lierlib).
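The three-part separation can be illustrated with a deliberately simplified Python sketch. It is not the authors' dynamic program: it uses exact matching only, assumes that the read starts at the beginning of the upstream reference and ends at the end of the downstream reference, and all names are hypothetical. It only shows how a left part, an unmapped middle insertion, and a right part (possibly overlapping the left part) can be read off a junctional read.

```python
# Simplified illustration of splitting a junctional read into left / middle /
# right parts. Assumptions: exact matching only (no mismatches or gaps), the
# read is anchored at both ends, and all names are illustrative. The actual
# program described in the text uses two-level dynamic programming instead.


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def split_read(read: str, left_ref: str, right_ref: str) -> dict:
    """left_ref ends at the upstream cut; right_ref starts at the downstream cut."""
    left_len = common_prefix_len(read, left_ref)
    # Longest common suffix, computed as a prefix of the reversed strings.
    right_len = common_prefix_len(read[::-1], right_ref[::-1])
    # Bases that match both sides; a positive value means the parts overlap.
    overlap = max(0, left_len + right_len - len(read))
    return {
        "left_deleted": len(left_ref) - left_len,
        "right_deleted": len(right_ref) - right_len,
        "insertion": read[left_len:len(read) - right_len],  # empty if parts touch or overlap
        "overlap": overlap,
    }


print(split_read("ACGTACAAGGCCAA", "ACGTACGT", "TTGGCCAA"))
```

In this toy version the greedy longest prefix and suffix are optimal because only exact matches are scored; once mismatches and gaps are allowed, the choice of split points interacts with the alignment of each part, which is why the real program optimizes both levels jointly.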
Calling for insertions and small deletions
Insertions and small deletions were called as previously described with optimizations [18, 19]. We designed PCR primer pairs near the junctional sites (Additional file 3: Table S1) to generate amplicon libraries (PCR products no longer than 290 bp) for assaying small indels at junctions of chromosomal rearrangements. Therefore, the paired-end sequences can be merged for each read. In total, we obtained about 1.6 billion reads for this assay. After demultiplexing the raw FASTQ data by index and barcode, we trimmed the sequences with Cutadapt [45]. For each member of the amplicon library, we then merged the two paired-end sequence reads (read1 and read2) using PANDAseq [46]. We divided the junctional repair outcomes of each chromosomal rearrangement into four groups (deletions, insertions, indels, and precise ligations) and calculated their respective frequencies.
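The four-way grouping can be sketched in Python as follows. The sketch assumes that each aligned read has already been summarized as a pair (deleted bases, inserted bases) at the junction, and that "indel" denotes a junction carrying both a deletion and an insertion. Both the input representation and the function names are assumptions for illustration.

```python
# Minimal sketch: group junctional repair outcomes and compute frequencies.
# Assumptions: each read is summarized as (deleted_bp, inserted_bp); "indel"
# means both are non-zero. Names are illustrative only.
from collections import Counter

GROUPS = ("deletion", "insertion", "indel", "precise ligation")


def classify_junction(deleted_bp: int, inserted_bp: int) -> str:
    if deleted_bp == 0 and inserted_bp == 0:
        return "precise ligation"
    if deleted_bp > 0 and inserted_bp > 0:
        return "indel"
    return "deletion" if deleted_bp > 0 else "insertion"


def outcome_frequencies(events) -> dict:
    """Fraction of reads in each group; all zeros if there are no reads."""
    counts = Counter(classify_junction(d, i) for d, i in events)
    total = sum(counts.values())
    return {g: (counts[g] / total if total else 0.0) for g in GROUPS}


# Hypothetical reads: (deleted_bp, inserted_bp)
print(outcome_frequencies([(0, 0), (3, 0), (0, 2), (5, 1)]))
```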
Large resection analysis
Reads were mapped to both strands of the hg19 genome with our customized program. For reads covering large resections during DNA-fragment deletion, we required that the second segment map strictly downstream of the first segment and that both map to the forward strand. If the second segment maps to the reverse strand, the read represents a large resection during DNA-fragment inversion.
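The rule above can be written as a small Python sketch. It assumes that each read has been reduced to two mapped segments with half-open reference coordinates and a strand, and that the first segment must lie on the forward strand in both cases. These conventions and the names used are assumptions for illustration, not details of the customized program.

```python
# Minimal sketch: classify a two-segment read as deletion or inversion.
# Assumptions: half-open [start, end) coordinates on the same chromosome,
# first segment on the forward strand; names are illustrative only.
from typing import NamedTuple


class Segment(NamedTuple):
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


def classify_split_read(first: Segment, second: Segment) -> str:
    if first.strand != "+":
        return "other"
    if second.strand == "-":
        return "inversion"
    # Both forward: the second segment must begin after the first one ends.
    if second.start >= first.end:
        return "deletion"
    return "other"


print(classify_split_read(Segment(100, 150, "+"), Segment(400, 450, "+")))
print(classify_split_read(Segment(100, 150, "+"), Segment(400, 450, "-")))
```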
Mathematical estimation for MMEJ probability of small deletions
The region of length \(n\) around a Cas9 cleavage site has microhomology if and only if the length \(M\) of the longest common sequence flanking the cleavage site in the region is larger than a threshold \(L\) defined by biological experiments. Although it is not easy to obtain the explicit cumulative probability distribution of \(M\), an estimate is available [47] by transforming this microhomology problem within a region of DNA sequence into the problem of tossing a coin a number of times equal to the length of the DNA. However, each position of a DNA sequence can hold any of the four bases G, C, A, or T, whereas each coin toss yields only heads (obverse) or tails (reverse). The length \(R\) of the longest run of heads in the first \(n\) tosses of a coin is approximately \(\log_{1/p}(n)\) [48], where \(p\) is the probability of heads (\(\log_{2}(n)\) for a fair coin). More strictly, \(R/\log_{1/p}(n)\) converges to 1 almost everywhere as \(n\) tends to infinity.
Generalizing this result, Richard Arratia and Michael S. Waterman proved that \(M/\log_{1/p}(n)\) converges to 2 almost everywhere as \(n\) tends to infinity [47], where \(p=1/4\) is the probability that two random nucleotides at corresponding positions of the microhomology flanking the cleavage site are the same. For small \(n\), they estimate the probability of \(M\le L\) with an explicit upper bound (see [47]) and a lower bound \(1-p^{L-2\log_{1/p}(n)+1}\) [47].
We generated the curve of the lower bound estimations of MMEJ probabilities \(P\left(M\ge 2\right)\) with increasing deletion sizes \(n\) for the panel of Fig. 2I with a customized MATLAB script (Additional file 2: Note S5).
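As a rough numerical cross-check of such estimates, the probability \(P(M\ge 2)\) can also be approximated by simulation. The Python sketch below is not the MATLAB script referred to above. It assumes a simplified model in which the two flanks are independent, uniformly random sequences of length \(n\) and a match may occur at any pair of positions (matching with shifts); the function names, trial count, and seed are illustrative choices.

```python
# Monte Carlo sketch: probability that two random DNA sequences of length n
# share a common substring of length >= k.
# Assumptions: independent uniform bases (p = 1/4), matches allowed at any
# pair of positions; this is a simplified model, not the authors' script.
import random


def share_kmer(a: str, b: str, k: int) -> bool:
    """True if a and b have at least one common substring of length k."""
    kmers = {a[i:i + k] for i in range(len(a) - k + 1)}
    return any(b[j:j + k] in kmers for j in range(len(b) - k + 1))


def estimate_match_probability(n: int, k: int = 2, trials: int = 2000, seed: int = 0) -> float:
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = "".join(rng.choice("ACGT") for _ in range(n))
        b = "".join(rng.choice("ACGT") for _ in range(n))
        hits += share_kmer(a, b, k)
    return hits / trials


for n in (2, 5, 10, 20, 40):
    print(n, estimate_match_probability(n))
```

Under this model the estimate rises quickly with \(n\), consistent with the expectation that longer regions are more likely to contain a chance microhomology of at least 2 bp.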
Statistical analysis
All high-throughput sequencing libraries were constructed with at least two replicates. Significance tests were performed in GraphPad software using two-tailed t-tests, with one, two, three, and four asterisks indicating P-values less than 0.05, 0.01, 0.001, and 0.0001, respectively.
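The asterisk convention can be expressed as a short Python helper, shown here only to make the thresholds explicit. The function name and the "ns" label for non-significant values are assumptions and are not stated in the text.

```python
# Minimal sketch: map a P-value to the asterisk notation described above.
# Assumption: "ns" is used for P >= 0.05; the function name is illustrative.


def significance_stars(p_value: float) -> str:
    thresholds = ((0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*"))
    for threshold, stars in thresholds:
        if p_value < threshold:
            return stars
    return "ns"


for p in (0.2, 0.03, 0.004, 0.0005, 0.00002):
    print(p, significance_stars(p))
```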