1.
Szczerba, M, Wiewiórka, MS, Okoniewski, MJ, Rybiński, H. Scalable cloud-based data analysis software systems for big data from next generation sequencing. In: Japkowicz, N, Stefanowski, J, eds. Big Data Analysis: New Algorithms for a New Society. Cham, Switzerland: Springer International Publishing; 2016:263-283. doi:10.1007/978-3-319-26989-4_11.
2.
Tomczak, K, Czerwińska, P, Wiznerowicz, M. The Cancer Genome Atlas (TCGA): an immeasurable source of knowledge. Contemp Oncol (Poznan, Poland). 2015;19:A68-A77.
3.
FireBrowse. http://firebrowse.org/ (accessed December 10, 2020).
4.
Wilson, S, Fitzsimons, M, Ferguson, M, et al. Developing cancer informatics applications and tools using the NCI genomic data commons API. Cancer Research. 2017;77:e15-e18. doi:10.1158/0008-5472.CAN-17-0598.
5.
Shilo, S, Rossman, H, Segal, E. Axes of a revolution: challenges and promises of big data in healthcare. Nat Med. 2020;26:29-38.
6.
Schüssler-Fiorenza Rose, SM, Contrepois, K, Moneghetti, KJ, et al. A longitudinal big data approach for precision health. Nat Med. 2019;25:792-804.
7.
Wu, C, Zhou, F, Ren, J, Li, X, Jiang, Y, Ma, S. A selective review of multi-level omics data integration using variable selection. High Throughput. 2019;8:4.
8.
Grabowski, P, Rappsilber, J. A primer on data analytics in functional genomics: how to move from data to insight? Trends Biochem Sci. 2019;44:21-32. doi:10.1016/j.tibs.2018.10.010.
9.
Perez-Riverol, Y, Zorin, A, Dass, G, et al. Quantifying the impact of public omics data. Nat Commun. 2019;10:3512.
10.
Chen, B, Butte, A. Leveraging big data to transform target selection and drug discovery. Clin Pharmacol Ther. 2016;99:285-297. doi:10.1002/cpt.318.
11.
Wood, DE, White, JR, Georgiadis, A, et al. A machine learning approach for somatic mutation discovery. Sci Transl Med. 2018;10:eaar7939.
12.
Krumm, N, Hoffman, N. Practical cost analysis of genomic data in the cloud. Am J Clin Pathol. 2019;152:S2-S3.
13.
He, KY, Ge, D, He, MM. Big data analytics for genomic medicine. Int J Mol Sci. 2017;18:412.
14.
Langmead, B, Nellore, A. Cloud computing for genomic data analysis and collaboration. Nat Rev Genet. 2018;19:325.
15.
Halligan, BD, Geiger, JF, Vallejos, AK, Greene, AS, Twigger, SN. Low cost, scalable proteomics data analysis using Amazon’s cloud computing services and open source search algorithms. J Proteome Res. 2009;8:3148-3153.
16.
Dalman, T, Dörnemann, T, Juhnke, E, et al. Metabolic flux analysis in the cloud. Paper presented at: ESCIENCE ’10: Proceedings of the 2010 IEEE Sixth International Conference on e-Science; Brisbane, QLD, Australia. doi:10.1109/eScience.2010.20.
17.
Yahara, K, Suzuki, M, Hirabayashi, A, et al. Long-read metagenomics using PromethION uncovers oral bacteriophages and their interaction with host bacteria. Nat Commun. 2021;12:27. doi:10.1038/s41467-020-20199-9.
18.
Murigneux, V, Rai, SK, Furtado, A, et al. Comparison of long-read methods for sequencing and assembly of a plant genome. GigaScience. 2020;9:giaa146. doi:10.1093/gigascience/giaa146.
19.
Biswas, N, Chakrabarti, S. Artificial intelligence (AI)-based systems biology approaches in multi-omics data analysis of cancer. Front Oncol. 2020;10:588221. doi:10.3389/fonc.2020.588221.
20.
Taylor, RC. An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics. BMC Bioinformatics. 2010;11:S1.
21.
Vasaikar, SV, Straub, P, Wang, J, Zhang, B. LinkedOmics: analyzing multi-omics data within and across 32 cancer types. Nucleic Acids Res. 2017;46:D956-D963. doi:10.1093/nar/gkx1090.
22.
Boisvert, S, Laviolette, F, Corbeil, J. Ray: simultaneous assembly of reads from a mix of high-throughput sequencing technologies. J Comput Biol. 2010;17:1519-1533.
23.
Simpson, JT, Wong, K, Jackman, SD, Schein, JE, Jones, SJM, Birol, I. ABySS: a parallel assembler for short read sequence data. Genome Res. 2009;19:1117-1123.
24.
Meng, J, Wang, B, Wei, Y, Feng, S, Balaji, P. SWAP-Assembler: scalable and efficient genome assembly towards thousands of cores. BMC Bioinformatics. 2014;15:S2.
25.
Decap, D, Reumers, J, Herzeel, C, Costanza, P, Fostier, J. Halvade: scalable sequence analysis with MapReduce. Bioinformatics. 2015;31:2482-2488. doi:10.1093/bioinformatics/btv179.
26.
Guo, R, Zhao, Y, Zou, Q, Fang, X, Peng, S. Bioinformatics applications on Apache Spark. GigaScience. 2018;7:giy098.
27.
Štufi, M, Bačić, B, Stoimenov, L. Big data analytics and processing platform in Czech Republic Healthcare. Appl Sci. 2020;10:1705. doi:10.3390/app10051705.
28.
Langmead, B, Schatz, MC, Lin, J, Pop, M, Salzberg, SL. Searching for SNPs with cloud computing. Genome Biol. 2009;10:R134. doi:10.1186/gb-2009-10-11-r134.
29.
Langmead, B, Trapnell, C, Pop, M, Salzberg, SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:R25. doi:10.1186/gb-2009-10-3-r25.
30.
Gu, S, Fang, L, Xu, X. Using SOAPaligner for short reads alignment. Curr Protoc Bioinformatics. 2013;44:11.11.1-11.11.17. doi:10.1002/0471250953.bi1111s44.
31.
Zou, Q, Li, X-B, Jiang, W-R, Lin, Z-Y, Li, G-L, Chen, K. Survey of MapReduce frame operation in bioinformatics. Brief Bioinform. 2013;15:637-647. doi:10.1093/bib/bbs088.
32.
Pandey, RV, Schlötterer, C. DistMap: a toolkit for distributed short read mapping on a Hadoop cluster. PLoS ONE. 2013;8:e72614.
33.
Lewis, S, Csordas, A, Killcoyne, S, et al. Hydra: a scalable proteomic search engine which utilizes the Hadoop distributed computing framework. BMC Bioinformatics. 2012;13:324.
34.
McKenna, A, Hanna, M, Banks, E, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297-1303.
35.
Niemenmaa, M, Kallio, A, Schumacher, A, Klemela, P, Korpelainen, E, Heljanko, K. Hadoop-BAM: directly manipulating next generation sequencing data in the cloud. Bioinformatics. 2012;28:876-877. doi:10.1093/bioinformatics/bts054.
36.
O’Connor, BD, Merriman, B, Nelson, SF. SeqWare Query Engine: storing and searching sequence data in the cloud. BMC Bioinformatics. 2010;11:S2.
37.
Matthews, SJ, Williams, TL. MrsRF: an efficient MapReduce algorithm for analyzing large collections of evolutionary trees. BMC Bioinformatics. 2010;11:S15.
38.
Weber, N, Liou, D, Dommer, J, et al. Nephele: a cloud platform for simplified, standardized and reproducible microbiome data analysis. Bioinformatics. 2018;34:1411-1413. doi:10.1093/bioinformatics/btx617.
39.
Vouzis, PD, Sahinidis, NV. GPU-BLAST: using graphics processors to accelerate protein sequence alignment. Bioinformatics (Oxford, England). 2011;27:182-188.
40.
Liu, C-M, Wong, T, Wu, E, et al. SOAP3: ultra-fast GPU-based parallel alignment tool for short reads. Bioinformatics. 2012;28:878-879. doi:10.1093/bioinformatics/bts061.
41.
Leo, S, Santoni, F, Zanetti, G. Biodoop: bioinformatics on Hadoop. Paper presented at: 2009 International Conference on Parallel Processing Workshops; 2009:415-422; Vienna, Austria. doi:10.1109/ICPPW.2009.37.
42.
Nordberg, H, Bhatia, K, Wang, K, Wang, Z. BioPig: a Hadoop-based analytic toolkit for large-scale sequence data. Bioinformatics. 2013;29:3014-3019. doi:10.1093/bioinformatics/btt528.
43.
Schumacher, A, Pireddu, L, Niemenmaa, M, et al. SeqPig: simple and scalable scripting for large sequencing data sets in Hadoop. Bioinformatics. 2013;30:119-120. doi:10.1093/bioinformatics/btt601.
44.
Di Tommaso, P, Chatzou, M, Floden, EW, Barja, PP, Palumbo, E, Notredame, C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017;35:316-319. doi:10.1038/nbt.3820.
45.
Köster, J, Rahmann, S. Snakemake—a scalable bioinformatics workflow engine. Bioinformatics. 2012;28:2520-2522. doi:
10.1093/bioinformatics/bts480.
46.
Mölder, F, Jablonski, K, Letcher, B, et al. Sustainable data analysis with Snakemake [version 1; peer review: 1 approved, 1 approved with reservations]. F1000Res. 2021;10:33. doi:10.12688/f1000research.29032.1.
47.
Yang, A, Troup, M, Lin, P, Ho, JWK. Falco: a quick and flexible single-cell RNA-seq processing framework on the cloud. Bioinformatics. 2017;33:767-769. doi:10.1093/bioinformatics/btw732.
48.
Mell, P, Grance, T. The NIST Definition of Cloud Computing. Gaithersburg, MD: National Institute of Standards and Technology (NIST); 2011. http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf.
49.
Dai, L, Gao, X, Guo, Y, Xiao, J, Zhang, Z. Bioinformatics clouds for big data manipulation. Biol Direct. 2012;7:43; discussion 43.
50.
Stephens, ZD, Lee, SY, Faghri, F, et al. Big data: astronomical or genomical? PLoS Biol. 2015;13:e1002195. doi:10.1371/journal.pbio.1002195.
51.
Howe, KL, Achuthan, P, Allen, J, et al. Ensembl 2021. Nucleic Acids Res. 2021;49: