From RNNs to Transformers and Beyond: a Deep Dive into Intent Detection in Goal-oriented Conversational Agents

Laranjo L, Dunn AG, Tong HL, Kocaballi AB, Chen J, Bashir R, Surian D, Gallego B, Magrabi F, Lau AYS, Coiera E. Conversational agents in healthcare: a systematic review. J Am Med Inform Assoc. 2018;25(9):1248–58. https://doi.org/10.1093/jamia/ocy072.

Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I. Attention is all you need. In: Advances in Neural Information Processing Systems 30 (NIPS 2017), December 2017. 2017; pp. 5999–6009. https://arxiv.org/abs/1706.03762. Accessed Dec 2024.

Iovine A, Narducci F, de Gemmis M, Polignano M, Basile P, Semeraro G. A comparison of services for intent and entity recognition for conversational recommender systems. In: Brusilovsky P, de Gemmis M, Felfernig A, Lops P, O’Donovan J, Semeraro G, Willemsen MC, editors. Proceedings of the 7th Joint Workshop on Interfaces and Human Decision Making for Recommender Systems co-located with the 14th ACM Conference on Recommender Systems (RecSys 2020), Online Event. CEUR-WS.org. 2020;2682:37–47. https://ceur-ws.org/Vol-2682/paper4.pdf. Accessed Dec 2024.

Bhathiya HS, Thayasivam U. Meta learning for few-shot joint intent detection and slot-filling. Proceedings of the 2020 5th International Conference on Machine Learning Technologies. 2020; pp 86–92. https://doi.org/10.1145/3409073.3409090.

Epure EV, Compagno D, Salinesi C, Deneckere R, Bajec M, Žitnik S. Process models of interrelated speech intentions from online health-related conversations. Artif Intell Med. 2018;91:23–38. https://doi.org/10.1016/j.artmed.2018.06.007.

Price PJ. Evaluation of spoken language systems: the ATIS domain. Proceedings of the Workshop on Speech and Natural Language. 1990. pp. 91–95. https://doi.org/10.3115/116580.116612.

Coucke A, Saade A, Ball A, Bluche T, Caulier A, Leroy D, Doumouro C, Gisselbrecht T, Caltagirone F, Lavril T, Primet M, Dureau J. Snips voice platform: an embedded spoken language understanding system for private-by-design voice interfaces. arXiv:1805.10190. 2018. https://arxiv.org/abs/1805.10190. Accessed Dec 2024.

Casanueva I, Vulic I, Spithourakis G, Budzianowski P. NLU++: a multi-label, slot-rich, generalisable dataset for natural language understanding in task-oriented dialogue. In: Carpuat M, de Marneffe M-C, Ruíz IVM, editors. Findings of the Association for Computational Linguistics: NAACL 2022, Seattle, WA, United States, July 10–15, 2022. Association for Computational Linguistics; 2022. pp. 1998–2013. https://doi.org/10.18653/V1/2022.FINDINGS-NAACL.154.

Hao X, Wang L, Zhu H, Guo X. Joint agricultural intent detection and slot filling based on enhanced heterogeneous attention mechanism. Comput Electron Agric. 2023;207:107756. https://doi.org/10.1016/j.compag.2023.107756.

Braun D, Hernandez-Mendez A, Matthes F, Langen M. Evaluating natural language understanding services for conversational question answering systems. In: Jokinen K, Stede M, DeVault D, Louis A, editors. Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue, Saarbrücken, Germany, August 15–17, 2017. Association for Computational Linguistics; 2017. pp. 174–185. https://doi.org/10.18653/V1/W17-5522.

Gupta S, Shah R, Mohit M, Kumar A, Lewis M. Semantic parsing for task oriented dialog using hierarchical representations. In: Riloff E, Chiang D, Hockenmaier J, Tsujii J, editors. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018. Association for Computational Linguistics; 2018. pp. 2787–2792. https://doi.org/10.18653/V1/D18-1300.

Liu X, Eshghi A, Swietojanski P, Rieser V. Benchmarking natural language understanding services for building conversational agents. In: Marchi E, Siniscalchi SM, Cumani S, Salerno VM, Li H, editors. Increasing naturalness and flexibility in spoken dialogue interaction - 10th International Workshop on Spoken Dialogue Systems, 2019, Syracuse, Sicily, Italy, 24–26 April 2019 (Vol. 714, pp. 165–183). Springer; 2019. https://doi.org/10.1007/978-981-15-9323-9_15.

Schuster S, Gupta S, Shah R, Lewis M. Cross-lingual transfer learning for multilingual task oriented dialog. In: Burstein J, Doran C, Solorio T, editors. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019, Minneapolis, MN, USA, June 2–7, 2019, Volume 1 (Long and Short Papers). Association for Computational Linguistics; 2019. pp. 3795–3805. https://doi.org/10.18653/V1/N19-1380.

Chen X, Ghoshal A, Mehdad Y, Zettlemoyer L, Gupta S. Low-resource domain adaptation for compositional task-oriented semantic parsing. In: Webber B, Cohn T, He Y, Liu Y, editors. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020, Online, November 16–20, 2020. Association for Computational Linguistics; 2020. pp. 5090–5100. https://doi.org/10.18653/V1/2020.EMNLP-MAIN.413.

Sowanski M, Janicki A. Leyzer: a dataset for multilingual virtual assistants. In: Sojka P, Kopeček I, Pala K, Horák A, editors. Text, speech, and dialogue - 23rd International Conference, 2020, Brno, Czech Republic, September 8–11, 2020, Proceedings (Vol. 12284, pp. 477–486). Springer; 2020. https://doi.org/10.1007/978-3-030-58323-1_51.

Qin L, Xu X, Che W, Liu T. AGIF: an adaptive graph-interactive framework for joint multiple intent detection and slot filling. In: Cohn T, He Y, Liu Y, editors. Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics; 2020. pp. 1807–1816. https://doi.org/10.18653/v1/2020.findings-emnlp.163.

Einolghozati A, Arora A, Lecanda LS-M, Kumar A, Gupta S. El Volumen Louder Por Favor: code-switching in task-oriented semantic parsing. In: Merlo P, Tiedemann J, Tsarfaty R, editors. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021, Online, April 19–23, 2021. Association for Computational Linguistics; 2021. pp. 1009–1021. https://doi.org/10.18653/V1/2021.EACL-MAIN.87.

Li H, Arora A, Chen S, Gupta A, Gupta S, Mehdad Y. MTOP: a comprehensive multilingual task-oriented semantic parsing benchmark. In: Merlo P, Tiedemann J, Tsarfaty R, editors. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021, Online, April 19 - 23, 2021. Association for Computational Linguistics; 2021. pp. 2950–2962. https://doi.org/10.18653/V1/2021.EACL-MAIN.257.

Van der Goot R, Sharaf I, Imankulova A, Üstün A, Stepanovic M, Ramponi A, Khairunnisa SO, Komachi M, Plank B. From masked language modeling to translation: non-English auxiliary tasks improve zero-shot spoken language understanding. In: Toutanova K, Rumshisky A, Zettlemoyer L, Hakkani-Tür D, Beltagy I, Bethard S, Cotterell R, Chakraborty T, Zhou Y, editors. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021, Online, June 6–11, 2021. Association for Computational Linguistics; 2021. pp. 2479–2497. https://doi.org/10.18653/V1/2021.NAACL-MAIN.197.

Tür G, Hakkani-Tür D, Heck LP. What is left to be understood in ATIS? In: Hakkani-Tür D, Ostendorf M, editors. 2010 IEEE Spoken Language Technology Workshop, Berkeley, California, USA, December 12–15, 2010. IEEE; 2010. pp. 19–24. https://doi.org/10.1109/SLT.2010.5700816.

Sarker IH. Deep learning: a comprehensive overview on techniques, taxonomy, applications and research directions. SN Comput Sci. 2021;2(6):1–20. https://doi.org/10.1007/s42979-021-00815-1.

Elman JL. Finding structure in time. Cogn Sci. 1990;14(2):179–211. https://doi.org/10.1207/s15516709cog1402_1.

Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–80. https://doi.org/10.1162/neco.1997.9.8.1735.

Chung J, Gulcehre C, Cho K, Bengio Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv:1412.3555. 2014; pp. 1–9. https://arxiv.org/abs/1412.3555. Accessed Dec 2024.

Xu K, Ba J, Kiros R, Cho K, Courville A, Salakhutdinov R, Zemel R, Bengio Y. Show, attend and tell: neural image caption generation with visual attention. In: Bach F, Blei D, editors. Proceedings of the 32nd International Conference on Machine Learning. PMLR; 2015;37:2048–2057. https://proceedings.mlr.press/v37/xuc15.pdf. Accessed Dec 2024.

Jbene M, Raif M, Tigani S, Chehri A, Saadane R. User sentiment analysis in conversational systems based on augmentation and attention-based BiLSTM. Procedia Comput Sci. 2022;207:4106–12. https://doi.org/10.1016/j.procs.2022.09.473.

Ravuri S, Stolcke A. Recurrent neural network and LSTM models for lexical utterance classification. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2015. 2015. https://doi.org/10.21437/interspeech.2015-42.

Jbene M, Tigani S, Saadane R, Chehri A. An LSTM-based intent detector for conversational recommender systems. 2022 IEEE 95th Vehicular Technology Conference: (VTC2022-Spring). 2022;1–5. https://doi.org/10.1109/VTC2022-Spring54318.2022.9860839.

Sutskever I, Vinyals O, Le QV. Sequence to sequence learning with neural networks. Adv Neural Inf Process Syst. 2014;4:3104–3112. https://arxiv.org/abs/1409.3215. Accessed Dec 2024.

Graves A. Generating sequences with recurrent neural networks. arXiv:1308.0850. 2013; pp. 1–43. https://arxiv.org/abs/1308.0850. Accessed Dec 2024.

Liu W, Wang Q, Zhu Y, Chen H. GRU: optimization of NPI performance. J Supercomput. 2020;76(5):3542–54. https://doi.org/10.1007/s11227-018-2634-9.

Schuster M, Paliwal KK. Bidirectional recurrent neural networks. IEEE Trans Signal Process. 1997;45(11):2673–81. https://doi.org/10.1109/78.650093.

Graves A, Mohamed AR, Hinton G. Speech recognition with deep recurrent neural networks. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2013; pp. 6645–6649. https://doi.org/10.1109/ICASSP.2013.6638947.

Zhao B, Li X, Lu X. Hierarchical recurrent neural network for video summarization. In: Proceedings of the 25th ACM International Conference on Multimedia (MM ’17). ACM; 2017. https://doi.org/10.1145/3123266.3123328.

Liu J. A hierarchical RNN-based model for learning recommendation with session intent detection. In: The 33rd International Conference on Software Engineering and Knowledge Engineering (SEKE 2021). KSI Research Inc.; 2021. pp. 451–457. https://doi.org/10.18293/seke2021-061.

Chen L, Zhang H, Xiao J, Nie L, Shao J, Liu W, Chua T. SCA-CNN: spatial and channel-wise attention in convolutional networks for image captioning. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017; pp. 6298–6306. https://doi.org/10.1109/CVPR.2017.667.

You Q, Jin H, Wang Z, Fang C, Luo J. Image captioning with semantic attention. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016; pp. 4651–4659. https://doi.org/10.1109/CVPR.2016.503.

Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. In: Bengio Y, LeCun Y, editors. 3rd International Conference on Learning Representations, 2015, San Diego, CA, USA, May 7–9, 2015, Conference Track Proceedings. 2015. https://arxiv.org/abs/1409.0473. Accessed Dec 2024.

Graves A, Wayne G, Danihelka I. Neural turing machines. CoRR. 2014;abs/1410.5401. https://arxiv.org/abs/1410.5401. Accessed Dec 2024.

Luong T, Pham H, Manning CD. Effective approaches to attention-based neural machine translation. In: Màrquez L, Callison-Burch C, Su J, editors. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics; 2015. pp. 1412–1421. https://doi.org/10.18653/v1/D15-1166.

Chen Q, Hu Q, Huang JX, He L, An W. Enhancing recurrent neural networks with positional attention for question answering. In: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2017; pp. 993–996. https://doi.org/10.1145/3077136.3080699.

Liu B, Lane I. Attention-based recurrent neural network models for joint intent detection and slot filling. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2016, September 8–12, 2016. 2016; pp. 685–689. https://doi.org/10.21437/Interspeech.2016-135.

Ying H, Zhuang F, Zhang F, Liu Y, Xu G, Xie X, Xiong H, Wu J. Sequential recommender system based on hierarchical attention network. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI 2018), July 2018. 2018; pp. 3926–3932. https://doi.org/10.24963/ijcai.2018/546.

Liu Y, Meng F, Zhang J, Zhou J, Chen Y, Xu J. CM-Net: a novel collaborative memory network for spoken language understanding. In: Inui K, Jiang J, Ng V, Wan X, editors. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics; 2019. pp. 1051–1060. https://doi.org/10.18653/v1/D19-1097.

Daha FZ, Hewavitharana S. Deep neural architecture with character embedding for semantic frame detection. 13th International Conference on Semantic Computing, 2019, Newport Beach, CA, USA, January 30 - February 1, 2019. 2019. pp. 302–307. https://doi.org/10.1109/ICOSC.2019.8665582.

Wei P, Zeng B, Liao W. Joint intent detection and slot filling with wheel-graph attention networks. J Intell Fuzzy Syst. 2022;42(3):2409–20. https://doi.org/10.3233/JIFS-211674.

Yolchuyeva S, Németh G, Gyires-Tóth B. Self-attention networks for intent detection. In: Mitkov R, Angelova G, editors. Proceedings of the International Conference on Recent Advances in Natural Language Processing, 2019, Varna, Bulgaria, September 2–4, 2019. INCOMA Ltd.; 2019. pp. 1373–1379. https://doi.org/10.26615/978-954-452-056-4_157.

Yang P, Ji D, Ai C, Li B. AISE: attending to intent and slots explicitly for better spoken language understanding. Knowl-Based Syst. 2021;211:106537. https://doi.org/10.1016/j.knosys.2020.106537.

Shen Y, Hsu Y-C, Ray A, Jin H. Enhancing the generalization for intent classification and out-of-domain detection in SLU. In: Zong C, Xia F, Li W, Navigli R, editors. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021, (Volume 1: Long Papers), Virtual Event, August 1–6, 2021. Association for Computational Linguistics; 2021. pp. 2443–2453. https://doi.org/10.18653/V1/2021.ACL-LONG.190.

Abro WA, Qi G, Aamir M, Ali Z. Joint intent detection and slot filling using weighted finite state transducer and BERT. Appl Intell. 2022;52(15):17356–70. https://doi.org/10.1007/S10489-022-03295-9.

Yoon Y, Lee J, Kim K, Park C, Kim T. BlendX: complex multi-intent detection with blended patterns. In: Calzolari N, Kan M-Y, Hoste V, Lenci A, Sakti S, Xue N, editors. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), 20–25 May 2024, Torino, Italy. ELRA and ICCL; 2024. pp. 2428–2439. https://aclanthology.org/2024.lrec-main.218. Accessed Dec 2024.

Huang Z, Liu F, Zhou P, Zou Y. Sentiment injected iteratively co-interactive network for spoken language understanding. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021), Toronto, ON, Canada, June 6–11, 2021. 2021; pp. 7488–7492. https://doi.org/10.1109/ICASSP39728.2021.9413885.

Li C, Zhou Y, Chao G, Chu D. Understanding users’ requirements precisely: a double Bi-LSTM-CRF joint model for detecting user’s intentions and slot tags. Neural Comput Appl. 2022;34(16):13639–48. https://doi.org/10.1007/S00521-022-07171-Y.

Louvan S, Magnini B. Simple is better! Lightweight data augmentation for low resource slot filling and intent classification. In: Le Nguyen M, Luong MC, Song S, editors. Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation, 2020, Hanoi, Vietnam, October 24–26, 2020. Association for Computational Linguistics; 2020. pp. 167–177. https://aclanthology.org/2020.paclic-1.20/. Accessed Dec 2024.

Qin L, Wei F, Xie T, Xu X, Che W, Liu T. GL-GIN: fast and accurate non-autoregressive model for joint multiple intent detection and slot filling. In: Zong C, Xia F, Li W, Navigli R, editors. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021, (Volume 1: Long Papers), Virtual Event, August 1–6, 2021. Association for Computational Linguistics; 2021. pp. 178–188. https://doi.org/10.18653/V1/2021.ACL-LONG.15.

Chen L, Zhou P, Zou Y. Joint multiple intent detection and slot filling via self-distillation. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2022), Virtual and Singapore, 23–27 May 2022. 2022; pp. 7612–7616. https://doi.org/10.1109/ICASSP43922.2022.9747843.

Cheng L, Yang W, Jia W. A scope sensitive and result attentive model for multi-intent spoken language understanding. In: Williams B, Chen Y, Neville J, editors. Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023, Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence, 2023, Thirteenth Symposium on Educational Advances in Artificial Intelligence, 2023, Washington, DC, USA, February 7–14, 2023. AAAI Press; 2023. pp. 12691–12699. https://doi.org/10.1609/AAAI.V37I11.26493.

Tu NA, Uyen HTT, Phuong TM, Bach NX. Joint multiple intent detection and slot filling with supervised contrastive learning and self-distillation. In: Gal K, Nowé A, Nalepa GJ, Fairstein R, Radulescu R, editors. ECAI 2023–26th European Conference on Artificial Intelligence, September 30 - October 4, 2023, Kraków, Poland - Including 12th Conference on Prestigious Applications of Intelligent Systems (PAIS 2023), Vol. 372. IOS Press; 2023. pp. 2370–2377. https://doi.org/10.3233/FAIA230538.

Yin S, Huang P, Xu Y, Huang H, Chen J. Do large language models understand multi-intent spoken language? CoRR. 2024;abs/2403.04481. https://doi.org/10.48550/ARXIV.2403.04481.

Yin S, Huang P, Xu Y. Uni-MIS: united multiple intent spoken language understanding via multi-view intent-slot interaction. In: Wooldridge MJ, Dy JG, Natarajan S, editors. Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024, Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence, 2024, Fourteenth Symposium on Educational Advances in Artificial Intelligence, 2024, February 20–27, 2024, Vancouver, Canada. AAAI Press; 2024. pp. 19395–19403. https://doi.org/10.1609/AAAI.V38I17.29910.

Qin L, Che W, Li Y, Wen H, Liu T. A stack-propagation framework with token-level intent detection for spoken language understanding. In: Inui K, Jiang J, Ng V, Wan X, editors. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics; 2019. pp. 2078–2087. https://doi.org/10.18653/v1/D19-1214.

Agarwal V, Shivnikar SD, Ghosh S, Arora H, Saini Y. LIDSNet: a lightweight on-device intent detection model using deep Siamese network. In: Wani MA, Sethi IK, Shi W, Qu G, Raicu DS, Jin R, editors. 20th International Conference on Machine Learning and Applications, 2021, Pasadena, CA, USA, December 13–16, 2021. IEEE; 2021. pp. 1112–1117. https://doi.org/10.1109/ICMLA52953.2021.00182.

Devlin J, Chang M-W, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein J, Doran C, Solorio T, editors. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics; 2019. pp. 4171–4186. https://doi.org/10.18653/v1/N19-1423.

Sanh V, Debut L, Chaumond J, Wolf T. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. CoRR. 2019;abs/1910.01108. https://arxiv.org/abs/1910.01108. Accessed Dec 2024.

Huang L, Liang S, Ye F, Gao N. A fast attention network for joint intent detection and slot filling on edge devices. IEEE Trans Artif Intell. 2024;5(2):530–40. https://doi.org/10.1109/TAI.2023.3309272.

Sun Z, Yu H, Song X, Liu R, Yang Y, Zhou D. MobileBERT: a compact task-agnostic BERT for resource-limited devices. In: Jurafsky D, Chai J, Schluter N, Tetreault JR, editors. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020, Online, July 5–10, 2020. Association for Computational Linguistics; 2020. pp 2158–2170. https://doi.org/10.18653/V1/2020.ACL-MAIN.195.

Jiao X, Yin Y, Shang L, Jiang X, Chen X, Li L, Wang F, Liu Q. TinyBERT: distilling BERT for natural language understanding. In: Cohn T, He Y, Liu Y, editors. Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics; 2020. pp. 4163–4174. https://doi.org/10.18653/v1/2020.findings-emnlp.372.
