Granular Syntax Processing with Multi-Task and Curriculum Learning
