Zero-shot prompt-based video encoder for surgical gesture recognition

Amsterdam B, Clarkson MJ, Stoyanov D (2021) Gesture recognition in robotic surgery: a review. IEEE Trans Biomed Eng 68(6):2021–2035

Article PubMed Google Scholar

Carreira J, Zisserman A (2017) Quo vadis, action recognition? A new model and the kinetics dataset. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6299–6308

Tran D, Bourdev L, Fergus R, Torresani L, Paluri M (2015) Learning spatiotemporal features with 3D convolutional networks. In: Proceedings of the IEEE international conference on computer vision, pp 4489–4497

Hara K, Kataoka H, Satoh Y (2017) Learning spatio-temporal features with 3D residual networks for action recognition. In: Proceedings of the IEEE international conference on computer vision workshops, pp 3154–3160

Zhang J, Nie Y, Lyu Y, Li H, Chang J, Yang X, Zhang JJ (2020) Symmetric dilated convolution for surgical gesture recognition. In: Medical image computing and computer assisted intervention—MICCAI 2020: 23rd international conference, Lima, Peru, October 4–8, 2020, proceedings, Part III 23, pp 409–418. Springer

Van Amsterdam B, Funke I, Edwards E, Speidel S, Collins J, Sridhar A, Kelly J, Clarkson MJ, Stoyanov D (2022) Gesture recognition in robotic surgery with multimodal attention. IEEE Trans Med Imaging 41(7):1677–1687

Article PubMed Google Scholar

Zhang J, Nie Y, Lyu Y, Yang X, Chang J, Zhang JJ (2021) SD-Net: joint surgical gesture recognition and skill assessment. Int J Comput Assist Radiol Surg 16:1675–1682

Article CAS PubMed PubMed Central Google Scholar

Goldbraikh A, Avisdris N, Pugh CM, Laufer S (2022) Bounded future MS-TCN++ for surgical gesture recognition. In: European conference on computer vision, pp 406–421. Springer

Wang W, Zheng VW, Yu H, Miao C (2019) A survey of zero-shot learning: settings, methods, and applications. ACM Trans Intell Syst Technol TIST 10(2):1–37

Google Scholar

Li M, Chen L, Duan Y, Hu Z, Feng J, Zhou J, Lu J (2022) Bridge-prompt: towards ordinal action understanding in instructional videos. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 19880–19889

Gao Y, Vedula SS, Reiley CE, Ahmidi N, Varadarajan B, Lin HC, Tao L, Zappella L, Béjar B, Yuh DD et al (2014) JHU-ISI gesture and skill assessment working set (JIGSAWS): a surgical activity dataset for human motion modeling. In: MICCAI workshop: M2cai, vol 3

Li S, Farha YA, Liu Y, Cheng M-M, Gall J (2020) MS-TCN++: multi-stage temporal convolutional network for action segmentation. IEEE Trans Pattern Anal Mach Intell 45:6647–6658

Article Google Scholar

DiPietro R, Lea C, Malpani A, Ahmidi N, Vedula SS, Lee GI, Lee MR, Hager GD (2016) Recognizing surgical activities with recurrent neural networks. In: Medical image computing and computer-assisted intervention—MICCAI 2016: 19th international conference, Athens, Greece, October 17–21, 2016, proceedings, Part I 19. Springer, pp 551–558

Tao L, Zappella L, Hager GD, Vidal R (2013) Surgical gesture segmentation and recognition. In: Medical image computing and computer-assisted intervention—MICCAI 2013: 16th international conference, Nagoya, Japan, September 22–26, 2013, proceedings, Part III 16. Springer, pp 339–346

Reiley CE, Lin HC, Varadarajan B, Vagvolgyi B, Khudanpur S, Yuh DD, Hager GD (2008) Automatic recognition of surgical motions using statistical modeling for capturing variability. In: MMVR, vol 132, pp 396–401

Funke I, Bodenstedt S, Oehme F, Bechtolsheim F, Weitz J, Speidel S (2019) Using 3D convolutional neural networks to learn spatiotemporal features for automatic surgical gesture recognition in video. In: International conference on medical image computing and computer-assisted intervention. Springer, pp 467–475

Lea C, Reiter A, Vidal R, Hager GD (2016) Segmental spatiotemporal CNNs for fine-grained action segmentation. In: Computer vision—ECCV 2016: 14th European conference, Amsterdam, The Netherlands, October 11–14, 2016, proceedings, Part III 14. Springer, pp 36–52

Zappella L, Béjar B, Hager G, Vidal R (2013) Surgical gesture classification from video and kinematic data. Med Image Anal 17(7):732–745

Long Y, Wu JY, Lu B, Jin Y, Unberath M, Liu Y-H, Heng PA, Dou Q (2021) Relational graph learning on visual and kinematics embeddings for accurate gesture recognition in robotic surgery. In: 2021 IEEE international conference on robotics and automation (ICRA). IEEE, pp 13346–13353

Qin Y, Feyzabadi S, Allan M, Burdick JW, Azizian M (2020) davincinet: Joint prediction of motion and surgical state in robot-assisted surgery. In: 2020 IEEE/RSJ international conference on intelligent robots and systems (IROS). IEEE, pp 2921–2928

Wu JY, Tamhane A, Kazanzides P, Unberath M (2021) Cross-modal self-supervised representation learning for gesture and skill recognition in robotic surgery. Int J Comput Assist Radiol Surg 16:779–787

Article PubMed Google Scholar

Tao L, Elhamifar E, Khudanpur S, Hager GD, Vidal R (2012) Sparse hidden Markov models for surgical gesture classification and skill evaluation. In: Information processing in computer-assisted interventions: third international conference, IPCAI 2012, Pisa, Italy, June 27, 2012. Proceedings 3. Springer, pp 167–177

DiPietro R, Ahmidi N, Malpani A, Waldram M, Lee GI, Lee MR, Vedula SS, Hager GD (2019) Segmenting and classifying activities in robot-assisted surgery with recurrent neural networks. Int J Comput Assist Radiol Surg 14(11):2005–2020

Article PubMed Google Scholar

Lea C, Flynn MD, Vidal R, Reiter A, Hager GD (2017) Temporal convolutional networks for action segmentation and detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 156–165

Farha YA, Gall J (2019) MS-TCN: multi-stage temporal convolutional network for action segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3575–3584

Yuan K, Holden M, Gao S, Lee W (2022) Anticipation for surgical workflow through instrument interaction and recognized signals. Med Image Anal 82:102611

Article PubMed Google Scholar

Czempiel T, Paschali M, Keicher M, Simson W, Feussner H, Kim ST, Navab N (2020) Tecno: surgical phase recognition with multi-stage temporal convolutional networks. In: Medical image computing and computer assisted intervention—MICCAI 2020: 23rd international conference, Lima, Peru, October 4–8, 2020, proceedings, part III 23. Springer, pp 343–352

Bengio Y, Courville AC, Vincent P (2012) Unsupervised feature learning and deep learning: a review and new perspectives. CoRR arXiv:1206.5538

Radford A, Kim JW, Hallacy C, Ramesh A, Goh G, Agarwal S, Sastry G, Askell A, Mishkin P, Clark J et al (2021) Learning transferable visual models from natural language supervision. In: International conference on machine learning. PMLR, pp 8748–8763

Chen T, Kornblith S, Norouzi M, Hinton G (2020) A simple framework for contrastive learning of visual representations. In: International conference on machine learning. PMLR, pp 1597–1607

Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft coco: common objects in context. In: Computer vision—ECCV 2014: 13th European conference, Zurich, Switzerland, September 6–12, 2014, proceedings, Part V 13. Springer, pp 740–755

Krishna R, Zhu Y, Groth O, Johnson J, Hata K, Kravitz J, Chen S, Kalantidis Y, Li L-J, Shamma DA et al (2017) Visual genome: connecting language and vision using crowdsourced dense image annotations. Int J Comput Vis 123:32–73

Article Google Scholar

Thomee B, Shamma DA, Friedland G, Elizalde B, Ni K, Poland D, Borth D, Li L-J (2016) Yfcc100m: The new data in multimedia research. Commun ACM 59(2):64–73

Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I et al (2019) Language models are unsupervised multitask learners. OpenAI blog 1(8):9

Google Scholar

Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S et al (2021) An image is worth \(16 \times 16\) words: transformers for image recognition at scale. In: International conference on learning representations

He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778

View original article

INTERNATIONAL JOURNAL OF COMPUTER ASSISTED RADIOLOGY AND SURGERY

Like

Share Bookmark

0 0 0 0 0 0 0

More from this channel

Zero-shot prompt-based video encoder for surgical gesture recognition

Comments (0)