A review on advancements in feature selection and feature extraction for high-dimensional NGS data analysis

Abdulsalam SO, Mohammed AA, Ajao JF, Babatunde RS, Ogundokun RO, Nnodim CT, Arowolo MO (2020) Performance evaluation of ANOVA and RFE algorithms for classifying microarray dataset using SVM. In: Information Systems: 17th European, Mediterranean, and Middle Eastern Conference, EMCIS 2020, Dubai, United Arab Emirates, November 25–26, 2020, Proceedings 17, pp 480–492. Springer International Publishing

Aduviri R, Matos D, Villanueva E (2019) Feature selection algorithm recommendation for gene expression data through gradient boosting and neural network metamodels. Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018:2726–2728

Aevermann B, Zhang Y, Novotny M, Keshk M, Bakken T (2021) A machine learning method for the discovery of minimum marker gene combinations for cell type identification from single-cell RNA sequencing. Genome Res 31:1767–1780

Afrash MR, Mirbagheri E, Mashoufi M, Kazemi-Arpanahi H (2023) Optimizing prognostic factors of five-year survival in gastric cancer patients using feature selection techniques with machine learning algorithms: a comparative study. BMC Med Inform Decis Mak 23:54

Almazrua H, Alshamlan H (2022) A Comprehensive Survey of Recent Hybrid Feature Selection Methods in Cancer Microarray Gene Expression Data. IEEE Access 10:71427–71449

Almugren N, Alshamlan H (2019) A survey on hybrid feature selection methods in microarray gene expression data for cancer classification. IEEE Access 7:78533–78548

Alok AK, Saha S, Ekbal A (2017) Semi-supervised clustering for gene-expression data in multiobjective optimization framework. Int J Mach Learn Cybern 8:421–439

Alomari OA, Khader AT, Al-Betar MA, Alyasseri ZA (2018) A hybrid filter-wrapper gene selection method for cancer classification. In: 2018 2nd International Conference on Biosignal Analysis, Processing and Systems (ICBAPS), pp 113–118

Alshamlan HM, Badr GH, Alohali YA (2015a) Genetic Bee Colony (GBC) algorithm: A new gene selection method for microarray cancer classification. Comput Biol Chem 56:49–60

Alshamlan H, Badr G, Alohali Y (2015) mRMR-ABC: A Hybrid Gene Selection Algorithm for Cancer Classification Using Microarray Gene Expression Profiling. 2015

Alzubi R, Ramzan N, Alzoubi H, Amira A (2017) A Hybrid Feature Selection Method for Complex Diseases SNPs. IEEE Access 6:1292–1301. https://doi.org/10.1109/ACCESS.2017.2778268

Amid E, Warmuth MK (2019) TriMap: Large-scale dimensionality reduction using triplets. arXiv preprint arXiv:1910.00204

Anaissi A, Kennedy PJ, Goyal M, Catchpoole DR (2013) A balanced iterative random forest for gene selection from microarray data. BMC Bioinformatics 14:1–0

Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11:R106

Andrews TS, Hemberg M (2019) M3Drop: dropout-based feature selection for scRNASeq. Bioinformatics 35:2865–2867

Ang JC, Mirzal A, Haron H, Hamed HNA (2016) Supervised, unsupervised, and semi-supervised feature selection: A review on gene selection. IEEE/ACM Trans Comput Biol Bioinform 13:971–989

Ascensión AM, Ibáñez-Solé O, Inza I, Izeta A, Araúzo-Bravo MJ (2022) Triku: a feature selection method based on nearest neighbors for single-cell data. GigaScience 11:017

Baldi P, Long AD (2001) A Bayesian framework for the analysis of microarray expression data: Regularized t-test and statistical inferences of gene changes. Bioinformatics 17:509–519

Bandyopadhyay S, Bhadra T, Mitra P, Maulik U (2014) Integration of dense subgraph finding with feature clustering for unsupervised feature selection. Pattern Recognit Lett 40:104–112

Bandyopadhyay S, Mallik S (2014) A Survey and Comparative Study of Statistical Tests for Identifying Differential Expression from Microarray Data. 11:95–115

Barshan E, Ghodsi A, Azimifar Z, Jahromi MZ (2011) Supervised principal component analysis: Visualization, classification and regression on subspaces and submanifolds. Pattern Recogn 44:1357–1371

Battiti R (1994) Using mutual information for selecting features in supervised neural net learning. IEEE Trans Neural Netw 5:537–550

Bhadra T, Bandyopadhyay S (2014) Unsupervised Feature Selection using an Improved version of Differential Evolution. Expert Syst Appl 2:4042–4053

Bhadra T, Maulik U (2022) Unsupervised Feature Selection Using Iterative Shrinking and Expansion Algorithm. IEEE Trans Emerg Top Comput Intell 6:1453–1462

Bhadra T, Mallik S, Hasan N, Zhao Z (2022) Comparison of five supervised feature selection algorithms leading to top features and gene signatures from multi-omics data in cancer. BMC Bioinformatics 23:153

Bommert A, Welchowski T, Schmid M, Rahnenführer J (2022) Benchmark of filter methods for feature selection in high-dimensional gene expression survival data. Brief Bioinform 23:bbab354

Brazma A, Vilo J (2001) Gene expression data analysis. Microbes Infect 3:823–829

Cai JJ (2020) scGEAToolbox: a Matlab toolbox for single-cell RNA sequencing data analysis. 1948–1949

Cao M, Chen G, Yu J, Shi S (2020) Computational prediction and analysis of species-specific fungi phosphorylation via feature optimization strategy. Brief Bioinform 21:595–608

Chakraborty D, Maulik U (2014) Identifying Cancer Biomarkers From Microarray Data Using Feature Selection and Semisupervised Learning. IEEE J Transl Eng Health Med 1–11

Chandrasekhar T, Thangavel K, Elayaraja E, Sathishkumar EN (2013) Unsupervised gene expression data using enhanced clustering method. In: 2013 IEEE International Conference on Emerging Trends in Computing, Communication and Nanotechnology (ICECCN), pp 518–522

Chen Y, Wang Y, Chen Y, Cheng Y, Wei Y (2022) Deep autoencoder for interpretable tissue-adaptive deconvolution and cell-type-specific gene analysis. Nat Commun 13:6735

Danaee P, Ghaeini R, Hendrix DA (2017) A deep learning approach for cancer detection and relevant gene identification. In: Pacific Symposium on Biocomputing 2017, pp 219–229

Dashtban M, Balafar M (2017) Gene selection for microarray cancer classification using a new evolutionary method employing artificial intelligence concepts. Genomics 109:91–107

Degenhardt F, Seifert S, Szymczak S (2019) Evaluation of variable selection methods for random forests and omics data sets. Brief Bioinform 20:492–503

Deng T, Chen S, Zhang Y, Xu Y, Feng D, Wu H, Sun X (2023) A cofunctional grouping-based approach for non-redundant feature gene selection in unannotated single-cell RNA-seq analysis. Brief Bioinform 24:bbad042

Dittman D, Khoshgoftaar T, Wald R, Napolitano A (2012) Similarity analysis of feature ranking techniques on imbalanced DNA microarray datasets. In: 2012 IEEE International Conference on Bioinformatics and Biomedicine, pp 1–5

Djellali H, Guessoum S, Ghoualmi-Zine N, Layachi S (2017) Fast correlation based filter combined with genetic algorithm and particle swarm on feature selection. In: 2017 5th International Conference on Electrical Engineering - Boumerdes, ICEE-B. 2017:1–6

Dorrity MW, Saunders LM, Queitsch C, Fields S, Trapnell C (2020) Dimensionality reduction by UMAP to visualize physical and genetic interactions. Nat Commun 11

Feng J, Zhang J, Zhu X (2023a) Gene selection and clustering of single-cell data based on Fisher score and genetic algorithm. J Supercomput 79:7067–7093

Feng J, Zhang J, Zhu X, Wang JH (2023b) Gene selection and clustering of single-cell data based on Fisher score and genetic algorithm. J Supercomput 79:7067–7093

Ferreira AJ, Figueiredo MA (2012) Efficient feature selection filters for high-dimensional data. Pattern Recogn Lett 33:1794–1804

Gangeh MJ, Zarkoob H, Ghodsi A (2017) Fast and Scalable Feature Selection for Gene Expression Data Using Hilbert-Schmidt Independence Criterion. IEEE/ACM Trans Comput Biol Bioinform 14(1):167–181

Gisbrecht A, Schulz A, Hammer B (2015) Parametric nonlinear dimensionality reduction using kernel t-SNE. Neurocomputing 147:71–82

Gokhale M, Mohanty SK, Ojha A (2022) A stacked autoencoder based gene selection and cancer classification framework. Biomed Signal Process Control 78:103999

Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S (2020) Generative adversarial networks. Commun ACM 63:139–144

Gosset WS (2016) Gosset, William Sealy. Encyclopedia of Mathematics

Gregory W, Sarwar N, Kevrekidis G, Villar S, Dumitrascu B (2024) MarkerMap: nonlinear marker selection for single-cell studies. NPJ Syst Biol Appl. 10:17

Guo X, Jiang X, Xu J, Quan X, Wu M, Zhang H (2018) Ensemble consensus-guided unsupervised feature selection to identify Huntington’s disease-associated genes. Genes (Basel) 9

Gupta M, Gupta B (2021) A novel gene expression test method of minimizing breast cancer risk in reduced cost and time by improving SVM-RFE gene selection method combined with LASSO. J Integr Bioinform 18:139–153

Guyon I, Weston J, Barnhill S, Vapnik V (2002) Gene selection for cancer classification using support vector machines. Mach Learn 46:389–422

Ha VS, Nguyen HN (2016) C-KPCA: custom kernel PCA for cancer classification. In: International Conference on Machine Learning and Data Mining in Pattern Recognition, pp 459–467

Hambali MA, Oladele TO, Adewole KS (2020) Microarray cancer feature selection: Review, challenges and research directions. International Journal of Cognitive Computing in Engineering 78–97

He X, Cai D, Niyogi P (2005) Laplacian Score for feature selection. Adv Neural Inf Process Syst 507–514

Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with neural networks. Science 313:504–507
