{"componentChunkName":"component---src-templates-post-template-js","path":"/posts/Spatial-temporal-attention-for-video-based-assessment-of-intraoperative-surgical-skill","result":{"data":{"markdownRemark":{"id":"068b1f17-182e-5d28-b30a-e96f78f519a8","html":"<p><strong>Paper:</strong> <a href=\"https://doi.org/10.1038/s41598-024-77176-1\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">https://doi.org/10.1038/s41598-024-77176-1</a></p>\n<center>\n    <img style=\"border-radius: 0.3125em;\n    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);\" \n    src=\"/media/paper-images/surgical-skill/fig1.jpg\">\n    <br>\n    <div style=\"color:orange; border-bottom: 1px solid #d9d9d9;\n    display: inline-block;\n    color: #999;\n    padding: 2px;\">\n\tOverall architecture of the spatial-temporal attention network for surgical skill assessment.\n\t</div>\n</center>\n<h2 id=\"highlights\" style=\"position:relative;\"><a href=\"#highlights\" aria-label=\"highlights permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Highlights</h2>\n<ul>\n<li>Novel spatial-temporal attention mechanism tailored for surgical video analysis.</li>\n<li>Automated objective assessment of surgical skill from intraoperative videos.</li>\n<li>Attention visualization reveals important surgical actions and anatomical regions correlated with skill level.</li>\n<li>State-of-the-art performance on multiple surgical skill assessment benchmarks.</li>\n</ul>\n<h2 id=\"abstract\" 
style=\"position:relative;\"><a href=\"#abstract\" aria-label=\"abstract permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Abstract</h2>\n<p>Objective assessment of surgical skill is crucial for surgical training, credentialing, and quality improvement. Traditional methods rely on manual expert evaluation, which is subjective, time-consuming, and resource-intensive. We propose an automated surgical skill assessment framework based on spatial-temporal attention mechanisms applied to intraoperative videos. Our method learns to identify and focus on critical surgical actions and anatomical regions that are indicative of skill level. The spatial attention module identifies important regions in each video frame, such as surgical instruments and key anatomical structures. The temporal attention module captures the dynamics of surgical workflow and the temporal patterns that distinguish expert from novice performance. By combining these complementary attention mechanisms, our model achieves objective, consistent, and interpretable surgical skill assessment. 
Experimental results on multiple surgical datasets demonstrate that our approach achieves superior performance compared to existing methods and provides insights into the visual cues associated with surgical expertise.</p>\n<h2 id=\"method\" style=\"position:relative;\"><a href=\"#method\" aria-label=\"method permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Method</h2>\n<center>\n    <img style=\"border-radius: 0.3125em;\n    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);\" \n    src=\"/media/paper-images/surgical-skill/fig2.jpg\">\n    <br>\n    <div style=\"color:orange; border-bottom: 1px solid #d9d9d9;\n    display: inline-block;\n    color: #999;\n    padding: 2px;\">\n\tIllustration of the spatial attention mechanism identifying critical regions in surgical videos.\n\t</div>\n</center>\n<p>Our approach consists of two main components: spatial attention and temporal attention.</p>\n<p>The spatial attention module processes each video frame to identify regions that are most relevant for skill assessment. Rather than treating all regions equally, the spatial attention mechanism learns to focus on surgical instruments, target anatomy, and areas where critical actions occur. 
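</p>
<p>To make the spatial attention step concrete, here is a minimal NumPy sketch. The function names, feature dimensions, and the scoring vector are illustrative assumptions rather than the paper's implementation: per-region frame features are scored, normalized with a softmax, and used to pool an attention-weighted frame descriptor.</p>

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(frame_features, w):
    # frame_features: (num_regions, D) grid of region descriptors for one frame.
    # w: (D,) scoring vector standing in for the learnable attention map.
    scores = frame_features @ w          # one relevance score per region
    alpha = softmax(scores)              # attention weights, summing to 1
    pooled = alpha @ frame_features      # attention-weighted frame descriptor, (D,)
    return pooled, alpha

rng = np.random.default_rng(0)
feats = rng.standard_normal((49, 16))    # e.g. a 7x7 spatial grid of 16-d features
w = rng.standard_normal(16)
pooled, alpha = spatial_attention(feats, w)
print(pooled.shape, alpha.shape)         # (16,) (49,)
```

<p>Regions that score highly, such as instrument tips or target anatomy, receive larger weights and dominate the pooled descriptor.</p>
<p>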
This is implemented through a learnable attention map that weighs different spatial regions based on their importance for skill classification.</p>\n<center>\n    <img style=\"border-radius: 0.3125em;\n    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);\" \n    src=\"/media/paper-images/surgical-skill/fig3.jpg\">\n    <br>\n    <div style=\"color:orange; border-bottom: 1px solid #d9d9d9;\n    display: inline-block;\n    color: #999;\n    padding: 2px;\">\n\tTemporal attention weights across video frames showing important surgical phases.\n\t</div>\n</center>\n<p>The temporal attention module analyzes the sequence of frames to capture surgical workflow dynamics and temporal patterns. Expert surgeons exhibit smoother, more efficient movements and better adherence to optimal surgical sequences. The temporal attention mechanism learns to identify these temporal signatures of expertise by attending to key phases of the procedure and transitions between surgical actions.</p>\n<p>The spatial and temporal features are integrated through a fusion layer, and the combined representation is used for skill level prediction. 
This joint spatial-temporal modeling enables comprehensive understanding of surgical performance.</p>\n<h2 id=\"results\" style=\"position:relative;\"><a href=\"#results\" aria-label=\"results permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Results</h2>\n<p>Our framework achieves state-of-the-art performance on standard surgical skill assessment benchmarks. The spatial-temporal attention mechanism significantly outperforms methods using only spatial or only temporal features, demonstrating the importance of their combination.</p>\n<center>\n    <img style=\"border-radius: 0.3125em;\n    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);\" \n    src=\"/media/paper-images/surgical-skill/fig4.jpg\">\n    <br>\n    <div style=\"color:orange; border-bottom: 1px solid #d9d9d9;\n    display: inline-block;\n    color: #999;\n    padding: 2px;\">\n\tAttention visualizations showing regions and time points the model focuses on for skill assessment.\n\t</div>\n</center>\n<p>The attention visualizations provide interpretable insights into what the model considers important for skill assessment. Spatial attention maps highlight surgical instruments and critical anatomical structures. 
Temporal attention weights reveal that the model learns to focus on challenging phases of the procedure where skill differences are most pronounced.</p>\n<center>\n    <img style=\"border-radius: 0.3125em;\n    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);\" \n    src=\"/media/paper-images/surgical-skill/fig5.jpg\">\n    <br>\n    <div style=\"color:orange; border-bottom: 1px solid #d9d9d9;\n    display: inline-block;\n    color: #999;\n    padding: 2px;\">\n\tComparison of skill assessment performance across different methods and datasets.\n\t</div>\n</center>\n<h2 id=\"conclusion\" style=\"position:relative;\"><a href=\"#conclusion\" aria-label=\"conclusion permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Conclusion</h2>\n<p>This article is only meant as a brief introduction.</p>\n<p>We present a spatial-temporal attention framework for automated surgical skill assessment from intraoperative videos. The spatial attention module identifies critical regions in each frame, while the temporal attention module captures the dynamics of surgical workflow. By combining these complementary attention mechanisms, our model achieves accurate, objective, and interpretable surgical skill assessment. The attention visualizations provide insights into the visual and temporal cues associated with surgical expertise, which could inform surgical training curricula. 
Our approach demonstrates the potential of deep learning to provide scalable, consistent surgical skill evaluation, supporting surgical education and quality improvement initiatives.</p>","fields":{"slug":"/posts/Spatial-temporal-attention-for-video-based-assessment-of-intraoperative-surgical-skill","tagSlugs":["/tag/computer-vision/","/tag/deep-learning/","/tag/surgical-skill-assessment/","/tag/video-understanding/","/tag/attention-mechanism/","/tag/research/"]},"frontmatter":{"date":"2023-06-01T14:13:40.121Z","description":"<p>We propose a spatial-temporal attention mechanism for automated surgical skill assessment from intraoperative videos, enabling objective evaluation of surgical performance.</p> <p style=\"font-style: italic;\"><span style=\"font-weight: bold\">Bohua Wan</span>, M. Peven, G. Hager, S. Sikder, S. S. Vedula</p> <p style=\"font-style: italic;\">Scientific Reports (2024).</p>","tags":["Computer Vision","Deep Learning","Surgical Skill Assessment","Video Understanding","Attention Mechanism","Research"],"title":"Spatial-temporal attention for video-based assessment of intraoperative surgical skill","socialImage":{"publicURL":"/static/3b5003decd88871595c9c6fa4f2d2e75/fig1.jpg"}}}},"pageContext":{"slug":"/posts/Spatial-temporal-attention-for-video-based-assessment-of-intraoperative-surgical-skill"}},"staticQueryHashes":["251939775","401334301","41472230"]}