<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Bohua Wan's personal site]]></title><description><![CDATA[PhD student majoring in computer science at Johns Hopkins University.]]></description><link>https://bohua-wan.netlify.com</link><generator>GatsbyJS</generator><lastBuildDate>Sun, 21 Dec 2025 01:49:23 GMT</lastBuildDate><item><title><![CDATA[Towards Virtual Clinical Trials of Radiology AI with Conditional Generative Modeling]]></title><description><![CDATA[<p>We propose a framework for conducting virtual clinical trials of radiology AI systems using conditional generative models to synthesize realistic medical imaging scenarios for comprehensive AI evaluation.</p> <p style="font-style: italic;">B. D. Killeen*, <span style="font-weight: bold">Bohua Wan*</span>, A. V. Kulkarni, N. Drenkow, M. Oberst, P. H. Yi, M. Unberath</p> <p style="font-style: italic;">arXiv preprint arXiv:2502.09688 (2025).</p>]]></description><link>https://bohua-wan.netlify.com/posts/Towards-Virtual-Clinical-Trials-of-Radiology-AI-with-Conditional-Generative-Modeling</link><guid isPermaLink="false">https://bohua-wan.netlify.com/posts/Towards-Virtual-Clinical-Trials-of-Radiology-AI-with-Conditional-Generative-Modeling</guid><pubDate>Sat, 01 Jun 2024 14:13:40 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Paper:&lt;/strong&gt; &lt;a href=&quot;https://arxiv.org/abs/2502.09688&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;arXiv:2502.09688&lt;/a&gt;&lt;/p&gt;
&lt;center&gt;
    &lt;img style=&quot;border-radius: 0.3125em;
    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);&quot; 
    src=&quot;/media/paper-images/virtual-trials/architecture.png&quot;&gt;
    &lt;br&gt;
    &lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
    display: inline-block;
    color: #999;
    padding: 2px;&quot;&gt;
	Framework for virtual clinical trials using conditional generative modeling.
	&lt;/div&gt;
&lt;/center&gt;
&lt;h2 id=&quot;highlights&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#highlights&quot; aria-label=&quot;highlights permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Highlights&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Novel framework for conducting virtual clinical trials to evaluate radiology AI systems.&lt;/li&gt;
&lt;li&gt;Conditional generative modeling to synthesize realistic and diverse medical imaging scenarios.&lt;/li&gt;
&lt;li&gt;Systematic evaluation of AI performance across controlled variations in patient characteristics and imaging conditions.&lt;/li&gt;
&lt;li&gt;Addresses limitations of traditional clinical trials including cost, time, and limited scenario coverage.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;abstract&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#abstract&quot; aria-label=&quot;abstract permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Abstract&lt;/h2&gt;
&lt;p&gt;Rigorous evaluation of radiology AI systems is essential for safe clinical deployment, but traditional clinical trials are expensive, time-consuming, and limited in their ability to test AI performance across the full spectrum of clinical scenarios. We propose a framework for virtual clinical trials that uses conditional generative models to synthesize realistic medical images with controlled variations in patient characteristics, disease presentations, and imaging parameters. This enables systematic evaluation of AI performance across diverse scenarios that may be rare or difficult to acquire in real clinical settings. Our conditional generative models are trained to produce high-fidelity medical images conditioned on relevant clinical variables such as patient demographics, disease severity, and imaging protocol. By sampling from these models, we can create large-scale synthetic test sets that comprehensively probe AI system behavior. We demonstrate that virtual clinical trials can reveal performance variations and failure modes that may not be apparent from evaluation on standard test sets. This approach provides a scalable, cost-effective complement to traditional clinical trials, enabling more thorough pre-deployment validation of radiology AI systems.&lt;/p&gt;
&lt;h2 id=&quot;method&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#method&quot; aria-label=&quot;method permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Method&lt;/h2&gt;
&lt;center&gt;
    &lt;img style=&quot;border-radius: 0.3125em;
    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);&quot; 
    src=&quot;/media/paper-images/virtual-trials/method.png&quot;&gt;
    &lt;br&gt;
    &lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
    display: inline-block;
    color: #999;
    padding: 2px;&quot;&gt;
	Conditional generative model architecture for synthesizing medical images with controlled attributes.
	&lt;/div&gt;
&lt;/center&gt;
&lt;p&gt;Our framework consists of three main components: conditional generative modeling, virtual trial design, and comprehensive AI evaluation.&lt;/p&gt;
&lt;p&gt;The conditional generative modeling component learns to synthesize realistic medical images conditioned on clinical variables. We employ advanced generative architectures such as conditional GANs or diffusion models that can capture the complex distribution of medical images while maintaining controllability through conditioning. The conditioning variables include patient demographics (age, sex, etc.), disease characteristics (type, severity, location), and imaging parameters (scanner type, acquisition protocol).&lt;/p&gt;
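&lt;p&gt;As a rough illustration of how such conditioning can work (a toy sketch, not the architecture used in the paper; all layer sizes and variable names are placeholders), a conditional generator can simply concatenate an encoding of the clinical variables with the latent code:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    # Toy conditional generator: maps a latent vector plus an encoding of the
    # clinical conditioning variables (demographics, severity, protocol) to an image.
    def __init__(self, latent_dim=64, cond_dim=8, img_pixels=128 * 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + cond_dim, 512),
            nn.ReLU(),
            nn.Linear(512, img_pixels),
            nn.Tanh(),
        )

    def forward(self, z, cond):
        # Conditioning is injected by concatenating it with the latent code.
        return self.net(torch.cat([z, cond], dim=1))

gen = ConditionalGenerator()
z = torch.randn(4, 64)      # latent samples
cond = torch.randn(4, 8)    # encoded clinical variables for the desired scenario
print(gen(z, cond).shape)   # 4 synthetic images, flattened
&lt;/code&gt;&lt;/pre&gt;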
&lt;center&gt;
    &lt;img style=&quot;border-radius: 0.3125em;
    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);&quot; 
    src=&quot;/media/paper-images/virtual-trials/demo.png&quot;&gt;
    &lt;br&gt;
    &lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
    display: inline-block;
    color: #999;
    padding: 2px;&quot;&gt;
	Virtual trial design process showing systematic variation of clinical parameters.
	&lt;/div&gt;
&lt;/center&gt;
&lt;p&gt;The virtual trial design component systematically varies conditioning variables to create comprehensive test scenarios. This allows us to evaluate AI performance across different patient subgroups, disease presentations, and imaging conditions. The design follows principles from real clinical trial methodology but with the flexibility to test scenarios that may be impractical in real trials.&lt;/p&gt;
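&lt;p&gt;A minimal sketch of what a full-factorial trial design over conditioning variables might look like (the variables and levels below are hypothetical, not the ones used in the paper):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;
from itertools import product

# Full-factorial virtual trial design over a few hypothetical conditioning variables.
ages = [40, 60, 80]
severities = [&apos;mild&apos;, &apos;moderate&apos;, &apos;severe&apos;]
scanners = [&apos;scanner_A&apos;, &apos;scanner_B&apos;]

scenarios = [dict(age=a, severity=s, scanner=c)
             for a, s, c in product(ages, severities, scanners)]
print(len(scenarios))  # 3 * 3 * 2 = 18 scenarios to synthesize and evaluate
&lt;/code&gt;&lt;/pre&gt;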
&lt;p&gt;The comprehensive evaluation component assesses AI system performance on the synthesized test sets, analyzing not only overall accuracy but also performance stratified by clinical variables. This reveals potential biases, performance gaps in specific subpopulations, and failure modes that may not be apparent from aggregate metrics.&lt;/p&gt;
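&lt;p&gt;The stratified analysis itself is straightforward; a toy sketch (field names are illustrative only):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;
from collections import defaultdict

def stratified_accuracy(records, group_key, correct_key):
    # Accuracy of the AI system within each subgroup defined by group_key.
    hits = defaultdict(int)
    counts = defaultdict(int)
    for r in records:
        counts[r[group_key]] += 1
        hits[r[group_key]] += r[correct_key]
    return {g: hits[g] / counts[g] for g in counts}

# Toy example: performance stratified by disease severity.
records = [dict(severity=&apos;mild&apos;, correct=1), dict(severity=&apos;mild&apos;, correct=1),
           dict(severity=&apos;severe&apos;, correct=0), dict(severity=&apos;severe&apos;, correct=1)]
print(stratified_accuracy(records, &apos;severity&apos;, &apos;correct&apos;))  # mild: 1.0, severe: 0.5
&lt;/code&gt;&lt;/pre&gt;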
&lt;h2 id=&quot;results&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#results&quot; aria-label=&quot;results permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Results&lt;/h2&gt;
&lt;p&gt;Our framework successfully generates realistic medical images that are clinically plausible and diagnostically useful. Radiologist evaluation confirms that synthetic images are difficult to distinguish from real images and maintain clinical relevance.&lt;/p&gt;
&lt;center&gt;
    &lt;img style=&quot;border-radius: 0.3125em;
    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);&quot; 
    src=&quot;/media/paper-images/virtual-trials/results.png&quot;&gt;
    &lt;br&gt;
    &lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
    display: inline-block;
    color: #999;
    padding: 2px;&quot;&gt;
	Examples of synthetic medical images generated with different conditioning variables.
	&lt;/div&gt;
&lt;/center&gt;
&lt;p&gt;Virtual clinical trials reveal important insights about AI system performance. By systematically varying clinical parameters, we identify performance degradation in specific scenarios such as rare disease presentations or suboptimal imaging conditions. We also uncover biases related to patient demographics that may not be apparent from standard evaluation.&lt;/p&gt;
&lt;center&gt;
    &lt;img style=&quot;border-radius: 0.3125em;
    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);&quot; 
    src=&quot;/media/paper-images/virtual-trials/analysis.png&quot;&gt;
    &lt;br&gt;
    &lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
    display: inline-block;
    color: #999;
    padding: 2px;&quot;&gt;
	AI performance analysis across different clinical scenarios in virtual trials.
	&lt;/div&gt;
&lt;/center&gt;
&lt;p&gt;The virtual trial framework enables identification of failure modes and performance boundaries that would require prohibitively large real clinical trials to discover. This provides valuable information for improving AI systems and defining appropriate use cases for clinical deployment.&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;This article is meant only as a brief introduction.&lt;/p&gt;
&lt;p&gt;We present a framework for virtual clinical trials of radiology AI systems using conditional generative modeling. By synthesizing realistic medical images with controlled variations in clinical parameters, we enable comprehensive evaluation of AI performance across diverse scenarios. This approach addresses key limitations of traditional clinical trials including cost, duration, and limited coverage of rare or challenging cases. Virtual trials reveal performance variations and potential biases that may not be apparent from standard evaluation, providing valuable insights for AI development and deployment decisions. While not replacing real clinical trials, our framework offers a powerful complementary tool for rigorous pre-deployment validation of medical AI systems, ultimately contributing to safer and more effective clinical AI deployment.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Deep learning xerostomia prediction model with anatomy normalization and high-resolution class activation map]]></title><description><![CDATA[<p>We develop an interpretable deep learning model for xerostomia prediction using anatomy normalization and high-resolution class activation maps for improved spatial interpretability.</p> <p style="font-style: italic;"><span style="font-weight: bold">Bohua Wan</span>, T. McNutt, H. Quon, J. Lee</p> <p style="font-style: italic;">Proc. SPIE Medical Imaging 2025 (2025).</p>]]></description><link>https://bohua-wan.netlify.com/posts/Deep-learning-xerostomia-prediction-model-with-anatomy-normalization-and-high-resolution-class-activation-map</link><guid isPermaLink="false">https://bohua-wan.netlify.com/posts/Deep-learning-xerostomia-prediction-model-with-anatomy-normalization-and-high-resolution-class-activation-map</guid><pubDate>Thu, 01 Feb 2024 14:13:40 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Paper:&lt;/strong&gt; &lt;a href=&quot;https://doi.org/10.1117/12.3046796&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;https://doi.org/10.1117/12.3046796&lt;/a&gt;&lt;/p&gt;
&lt;center&gt;
    &lt;img style=&quot;border-radius: 0.3125em;
    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);&quot; 
    src=&quot;/media/paper-images/xerostomia-cam/fig1.jpg&quot;&gt;
    &lt;br&gt;
    &lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
    display: inline-block;
    color: #999;
    padding: 2px;&quot;&gt;
	Overall architecture of the xerostomia prediction model with anatomy normalization and high-resolution CAM.
	&lt;/div&gt;
&lt;/center&gt;
&lt;h2 id=&quot;highlights&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#highlights&quot; aria-label=&quot;highlights permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Highlights&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Novel anatomy normalization approach to standardize medical images for improved model generalization.&lt;/li&gt;
&lt;li&gt;High-resolution class activation mapping (CAM) for fine-grained spatial interpretability.&lt;/li&gt;
&lt;li&gt;Improved prediction accuracy through anatomically-aligned feature learning.&lt;/li&gt;
&lt;li&gt;Enhanced clinical interpretability enabling identification of critical anatomical regions contributing to xerostomia risk.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;abstract&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#abstract&quot; aria-label=&quot;abstract permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Abstract&lt;/h2&gt;
&lt;p&gt;Radiation-induced xerostomia remains a significant challenge in head and neck cancer radiotherapy. While deep learning models have shown promise in predicting treatment outcomes, their clinical adoption is limited by lack of interpretability and challenges in handling anatomical variations across patients. We propose a deep learning framework that incorporates anatomy normalization to standardize patient-specific anatomical variations and employs high-resolution class activation maps (CAM) to provide spatially-precise explanations of model predictions. The anatomy normalization module aligns anatomical structures across patients, enabling the model to learn more generalizable features. The high-resolution CAM provides fine-grained visualization of which anatomical regions contribute most to xerostomia risk, offering valuable insights for clinicians. Our approach achieves superior prediction performance while maintaining high interpretability, demonstrating the importance of combining domain knowledge with deep learning for medical outcome prediction.&lt;/p&gt;
&lt;h2 id=&quot;method&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#method&quot; aria-label=&quot;method permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Method&lt;/h2&gt;
&lt;center&gt;
    &lt;img style=&quot;border-radius: 0.3125em;
    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);&quot; 
    src=&quot;/media/paper-images/xerostomia-cam/fig2.jpg&quot;&gt;
    &lt;br&gt;
    &lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
    display: inline-block;
    color: #999;
    padding: 2px;&quot;&gt;
	Illustration of the anatomy normalization process.
	&lt;/div&gt;
&lt;/center&gt;
&lt;p&gt;Our method consists of two key innovations: anatomy normalization and high-resolution class activation mapping.&lt;/p&gt;
&lt;p&gt;The anatomy normalization module addresses the challenge of anatomical variation across patients. By aligning key anatomical structures before feature extraction, we enable the model to learn features that are robust to patient-specific anatomical differences. This normalization is performed using deformable registration guided by anatomical landmarks and segmentation masks.&lt;/p&gt;
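&lt;p&gt;As an illustration of the warping step only (assuming a displacement field has already been estimated by a registration tool; this is not the registration pipeline used in the paper):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;
import numpy as np
from scipy.ndimage import map_coordinates

def warp_to_template(image, displacement):
    # image: a 2D slice; displacement: a (2, H, W) field mapping template
    # coordinates to patient coordinates (e.g., produced by a registration tool).
    h, w = image.shape
    grid_y, grid_x = np.meshgrid(np.arange(h), np.arange(w), indexing=&apos;ij&apos;)
    coords = np.stack([grid_y + displacement[0], grid_x + displacement[1]])
    return map_coordinates(image, coords, order=1)

# Sanity check: an identity (zero) displacement leaves the image unchanged.
img = np.random.rand(64, 64)
zero_field = np.zeros((2, 64, 64))
print(np.allclose(warp_to_template(img, zero_field), img))
&lt;/code&gt;&lt;/pre&gt;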
&lt;center&gt;
    &lt;img style=&quot;border-radius: 0.3125em;
    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);&quot; 
    src=&quot;/media/paper-images/xerostomia-cam/fig3.jpg&quot;&gt;
    &lt;br&gt;
    &lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
    display: inline-block;
    color: #999;
    padding: 2px;&quot;&gt;
	High-resolution class activation maps showing regions contributing to xerostomia prediction.
	&lt;/div&gt;
&lt;/center&gt;
&lt;p&gt;The high-resolution CAM module provides detailed spatial explanations of model predictions. Unlike traditional CAM methods that produce low-resolution activation maps, our approach generates high-resolution visualizations that precisely localize anatomical regions contributing to prediction. This is achieved through a specialized upsampling strategy that preserves spatial details while maintaining semantic meaning.&lt;/p&gt;
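&lt;p&gt;For reference, a standard CAM followed by simple bilinear upsampling can be sketched as below; the paper’s specialized upsampling strategy is not reproduced here:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;
import numpy as np
from scipy.ndimage import zoom

def class_activation_map(features, class_weights, out_size):
    # features: (C, h, w) last-layer feature maps; class_weights: (C,) weights of
    # the target class in the final linear layer (standard CAM formulation).
    cam = np.tensordot(class_weights, features, axes=1)   # weighted sum, shape (h, w)
    cam = np.maximum(cam, 0)
    cam = cam / (cam.max() + 1e-8)
    scale = (out_size[0] / cam.shape[0], out_size[1] / cam.shape[1])
    return zoom(cam, scale, order=1)  # upsample to image resolution

feats = np.random.rand(32, 8, 8)
weights = np.random.rand(32)
print(class_activation_map(feats, weights, (128, 128)).shape)  # (128, 128)
&lt;/code&gt;&lt;/pre&gt;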
&lt;h2 id=&quot;results&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#results&quot; aria-label=&quot;results permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Results&lt;/h2&gt;
&lt;p&gt;Our framework demonstrates superior performance in xerostomia prediction while providing clinically meaningful interpretations. The anatomy normalization significantly improves model generalization across diverse patient populations. The high-resolution CAMs successfully identify known risk factors such as parotid gland dose distributions and reveal novel spatial patterns associated with xerostomia risk.&lt;/p&gt;
&lt;p&gt;Ablation studies confirm that both anatomy normalization and high-resolution CAM contribute to improved performance and interpretability. Clinical evaluation by radiation oncologists validates that the CAM visualizations align with clinical knowledge and provide actionable insights for treatment planning.&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;This article is meant only as a brief introduction.&lt;/p&gt;
&lt;p&gt;We present a deep learning framework for xerostomia prediction that combines anatomy normalization with high-resolution class activation mapping. The anatomy normalization module enables robust feature learning across patients with varying anatomical structures, improving model generalization. The high-resolution CAM provides fine-grained spatial interpretability, identifying specific anatomical regions contributing to xerostomia risk. Our approach achieves state-of-the-art prediction performance while maintaining clinical interpretability, demonstrating the value of incorporating medical domain knowledge into deep learning models. This work represents an important step toward clinically deployable AI systems for personalized radiation therapy planning.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Deep learning prediction of radiation-induced xerostomia with supervised contrastive pre-training and cluster-guided loss]]></title><description><![CDATA[<p>We propose a novel deep learning framework for predicting radiation-induced xerostomia using supervised contrastive pre-training and cluster-guided loss.</p> <p style="font-style: italic;"><span style="font-weight: bold">Bohua Wan</span>, T. McNutt, R. Ger, H. Quon, J. Lee</p> <p style="font-style: italic;">Proc. SPIE Medical Imaging 2024 (2024).</p>]]></description><link>https://bohua-wan.netlify.com/posts/Deep-learning-prediction-of-radiation-induced-xerostomia-with-supervised-contrastive-pre-training-and-cluster-guided-loss</link><guid isPermaLink="false">https://bohua-wan.netlify.com/posts/Deep-learning-prediction-of-radiation-induced-xerostomia-with-supervised-contrastive-pre-training-and-cluster-guided-loss</guid><pubDate>Mon, 01 Jan 2024 14:13:40 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Paper:&lt;/strong&gt; &lt;a href=&quot;https://doi.org/10.1117/12.3004498&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;https://doi.org/10.1117/12.3004498&lt;/a&gt;&lt;/p&gt;
&lt;center&gt;
    &lt;img style=&quot;border-radius: 0.3125em;
    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);&quot; 
    src=&quot;/media/paper-images/xerostomia-contrastive/architecture.jpg&quot;&gt;
    &lt;br&gt;
    &lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
    display: inline-block;
    color: #999;
    padding: 2px;&quot;&gt;
	Overall architecture of the supervised contrastive pre-training framework.
	&lt;/div&gt;
&lt;/center&gt;
&lt;h2 id=&quot;highlights&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#highlights&quot; aria-label=&quot;highlights permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Highlights&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Novel supervised contrastive pre-training strategy for radiation-induced xerostomia prediction.&lt;/li&gt;
&lt;li&gt;Cluster-guided loss function to improve model performance on imbalanced medical datasets.&lt;/li&gt;
&lt;li&gt;State-of-the-art performance in predicting radiation-induced xerostomia.&lt;/li&gt;
&lt;li&gt;Improved generalization through contrastive learning on limited medical imaging data.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;abstract&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#abstract&quot; aria-label=&quot;abstract permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Abstract&lt;/h2&gt;
&lt;p&gt;Radiation-induced xerostomia is a common side effect of head and neck cancer radiotherapy that significantly impacts patients’ quality of life. Accurate prediction of xerostomia risk before treatment could enable personalized treatment planning. We propose a deep learning framework that combines supervised contrastive pre-training with cluster-guided loss to predict radiation-induced xerostomia. The supervised contrastive learning approach learns robust feature representations from limited medical imaging data, while the cluster-guided loss addresses class imbalance issues common in medical datasets. Our method achieves superior performance compared to existing approaches, demonstrating the effectiveness of combining contrastive learning with specialized loss functions for medical outcome prediction.&lt;/p&gt;
&lt;h2 id=&quot;method&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#method&quot; aria-label=&quot;method permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Method&lt;/h2&gt;
&lt;p&gt;Our method consists of two main components: supervised contrastive pre-training and cluster-guided loss. &lt;/p&gt;
&lt;p&gt;The supervised contrastive pre-training phase learns discriminative feature representations by pulling together samples from the same class while pushing apart samples from different classes in the embedding space. This approach is particularly effective for medical imaging tasks where labeled data is limited.&lt;/p&gt;
&lt;p&gt;The cluster-guided loss addresses the class imbalance problem by incorporating cluster information into the loss function. This ensures that the model learns to distinguish between different outcome groups even when some classes have significantly fewer samples.&lt;/p&gt;
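&lt;p&gt;For concreteness, the standard supervised contrastive objective (Khosla et al., 2020) can be sketched as follows; the cluster-guided loss is specific to our paper and is not reproduced in this sketch:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;
import torch
import torch.nn.functional as F

def supcon_loss(embeddings, labels, temperature=0.1):
    # Supervised contrastive loss: pull same-label samples together and push
    # different-label samples apart in the normalized embedding space.
    z = F.normalize(embeddings, dim=1)
    sim = torch.matmul(z, z.t()) / temperature
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool)
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)).logical_and(self_mask.logical_not())
    sim = sim.masked_fill(self_mask, torch.finfo(sim.dtype).min)  # exclude self-pairs
    log_prob = F.log_softmax(sim, dim=1)
    pos_per_anchor = pos_mask.sum(dim=1).clamp(min=1)
    return -(log_prob * pos_mask).sum(dim=1).div(pos_per_anchor).mean()

emb = torch.randn(8, 16)
lbl = torch.tensor([0, 0, 1, 1, 0, 1, 0, 1])
print(supcon_loss(emb, lbl))
&lt;/code&gt;&lt;/pre&gt;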
&lt;h2 id=&quot;results&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#results&quot; aria-label=&quot;results permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Results&lt;/h2&gt;
&lt;p&gt;Our framework achieves state-of-the-art performance on xerostomia prediction tasks. The supervised contrastive pre-training significantly improves feature quality, while the cluster-guided loss effectively handles class imbalance. Ablation studies demonstrate the contribution of each component to the overall performance.&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;This article is meant only as a brief introduction.&lt;/p&gt;
&lt;p&gt;We present a novel deep learning framework for predicting radiation-induced xerostomia that combines supervised contrastive pre-training with cluster-guided loss. The supervised contrastive learning approach enables effective learning from limited medical imaging data by learning robust feature representations. The cluster-guided loss addresses class imbalance issues common in medical outcome prediction tasks. Our experimental results demonstrate that this combination significantly improves prediction accuracy compared to existing methods, providing a promising tool for personalized radiation therapy planning.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Spatial-temporal attention for video-based assessment of intraoperative surgical skill]]></title><description><![CDATA[<p>We propose a spatial-temporal attention mechanism for automated surgical skill assessment from intraoperative videos, enabling objective evaluation of surgical performance.</p> <p style="font-style: italic;"><span style="font-weight: bold">Bohua Wan</span>, M. Peven, G. Hager, S. Sikder, S. S. Vedula</p> <p style="font-style: italic;">Scientific Reports (2024).</p>]]></description><link>https://bohua-wan.netlify.com/posts/Spatial-temporal-attention-for-video-based-assessment-of-intraoperative-surgical-skill</link><guid isPermaLink="false">https://bohua-wan.netlify.com/posts/Spatial-temporal-attention-for-video-based-assessment-of-intraoperative-surgical-skill</guid><pubDate>Thu, 01 Jun 2023 14:13:40 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Paper:&lt;/strong&gt; &lt;a href=&quot;https://doi.org/10.1038/s41598-024-77176-1&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;https://doi.org/10.1038/s41598-024-77176-1&lt;/a&gt;&lt;/p&gt;
&lt;center&gt;
    &lt;img style=&quot;border-radius: 0.3125em;
    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);&quot; 
    src=&quot;/media/paper-images/surgical-skill/fig1.jpg&quot;&gt;
    &lt;br&gt;
    &lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
    display: inline-block;
    color: #999;
    padding: 2px;&quot;&gt;
	Overall architecture of the spatial-temporal attention network for surgical skill assessment.
	&lt;/div&gt;
&lt;/center&gt;
&lt;h2 id=&quot;highlights&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#highlights&quot; aria-label=&quot;highlights permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Highlights&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Novel spatial-temporal attention mechanism tailored for surgical video analysis.&lt;/li&gt;
&lt;li&gt;Automated objective assessment of surgical skill from intraoperative videos.&lt;/li&gt;
&lt;li&gt;Attention visualization reveals important surgical actions and anatomical regions correlated with skill level.&lt;/li&gt;
&lt;li&gt;State-of-the-art performance on multiple surgical skill assessment benchmarks.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;abstract&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#abstract&quot; aria-label=&quot;abstract permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Abstract&lt;/h2&gt;
&lt;p&gt;Objective assessment of surgical skill is crucial for surgical training, credentialing, and quality improvement. Traditional methods rely on manual expert evaluation, which is subjective, time-consuming, and resource-intensive. We propose an automated surgical skill assessment framework based on spatial-temporal attention mechanisms applied to intraoperative videos. Our method learns to identify and focus on critical surgical actions and anatomical regions that are indicative of skill level. The spatial attention module identifies important regions in each video frame, such as surgical instruments and key anatomical structures. The temporal attention module captures the dynamics of surgical workflow and the temporal patterns that distinguish expert from novice performance. By combining these complementary attention mechanisms, our model achieves objective, consistent, and interpretable surgical skill assessment. Experimental results on multiple surgical datasets demonstrate that our approach achieves superior performance compared to existing methods and provides insights into the visual cues associated with surgical expertise.&lt;/p&gt;
&lt;h2 id=&quot;method&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#method&quot; aria-label=&quot;method permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Method&lt;/h2&gt;
&lt;center&gt;
    &lt;img style=&quot;border-radius: 0.3125em;
    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);&quot; 
    src=&quot;/media/paper-images/surgical-skill/fig2.jpg&quot;&gt;
    &lt;br&gt;
    &lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
    display: inline-block;
    color: #999;
    padding: 2px;&quot;&gt;
	Illustration of the spatial attention mechanism identifying critical regions in surgical videos.
	&lt;/div&gt;
&lt;/center&gt;
&lt;p&gt;Our approach consists of two main components: spatial attention and temporal attention.&lt;/p&gt;
&lt;p&gt;The spatial attention module processes each video frame to identify regions that are most relevant for skill assessment. Rather than treating all regions equally, the spatial attention mechanism learns to focus on surgical instruments, target anatomy, and areas where critical actions occur. This is implemented through a learnable attention map that weighs different spatial regions based on their importance for skill classification.&lt;/p&gt;
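&lt;p&gt;A minimal sketch of such a learnable spatial attention map (a toy module, not the exact network used in the paper):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    # Learns a per-pixel importance map and pools the frame features with it.
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feats):  # feats: (B, C, H, W) per-frame feature maps
        b, c, h, w = feats.shape
        attn = torch.softmax(self.score(feats).view(b, 1, h * w), dim=2).view(b, 1, h, w)
        return (feats * attn).sum(dim=(2, 3)), attn  # (B, C) pooled feature, attention map

frame_feats = torch.randn(2, 64, 14, 14)
pooled, attn_map = SpatialAttention(64)(frame_feats)
print(pooled.shape, attn_map.shape)
&lt;/code&gt;&lt;/pre&gt;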
&lt;center&gt;
    &lt;img style=&quot;border-radius: 0.3125em;
    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);&quot; 
    src=&quot;/media/paper-images/surgical-skill/fig3.jpg&quot;&gt;
    &lt;br&gt;
    &lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
    display: inline-block;
    color: #999;
    padding: 2px;&quot;&gt;
	Temporal attention weights across video frames showing important surgical phases.
	&lt;/div&gt;
&lt;/center&gt;
&lt;p&gt;The temporal attention module analyzes the sequence of frames to capture surgical workflow dynamics and temporal patterns. Expert surgeons exhibit smoother, more efficient movements and better adherence to optimal surgical sequences. The temporal attention mechanism learns to identify these temporal signatures of expertise by attending to key phases of the procedure and transitions between surgical actions.&lt;/p&gt;
&lt;p&gt;The spatial and temporal features are integrated through a fusion layer, and the combined representation is used for skill level prediction. This joint spatial-temporal modeling enables comprehensive understanding of surgical performance.&lt;/p&gt;
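&lt;p&gt;Similarly, temporal attention pooling over per-frame features can be sketched as below (again a toy module with placeholder dimensions, not the paper’s exact architecture):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    # Scores each frame embedding and pools the clip with softmax weights.
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, frame_embs):  # frame_embs: (B, T, D) per-frame features
        weights = torch.softmax(self.score(frame_embs), dim=1)  # (B, T, 1)
        return (weights * frame_embs).sum(dim=1)                # (B, D) clip feature

clip = torch.randn(2, 30, 128)                   # 30 frames per video
video_feature = TemporalAttention(128)(clip)
skill_logits = nn.Linear(128, 3)(video_feature)  # e.g., three skill levels
print(skill_logits.shape)
&lt;/code&gt;&lt;/pre&gt;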
&lt;h2 id=&quot;results&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#results&quot; aria-label=&quot;results permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Results&lt;/h2&gt;
&lt;p&gt;Our framework achieves state-of-the-art performance on standard surgical skill assessment benchmarks. The spatial-temporal attention mechanism significantly outperforms methods using only spatial or only temporal features, demonstrating the importance of their combination.&lt;/p&gt;
&lt;center&gt;
    &lt;img style=&quot;border-radius: 0.3125em;
    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);&quot; 
    src=&quot;/media/paper-images/surgical-skill/fig4.jpg&quot;&gt;
    &lt;br&gt;
    &lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
    display: inline-block;
    color: #999;
    padding: 2px;&quot;&gt;
	Attention visualizations showing regions and time points the model focuses on for skill assessment.
	&lt;/div&gt;
&lt;/center&gt;
&lt;p&gt;The attention visualizations provide interpretable insights into what the model considers important for skill assessment. Spatial attention maps highlight surgical instruments and critical anatomical structures. Temporal attention weights reveal that the model learns to focus on challenging phases of the procedure where skill differences are most pronounced.&lt;/p&gt;
&lt;center&gt;
    &lt;img style=&quot;border-radius: 0.3125em;
    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);&quot; 
    src=&quot;/media/paper-images/surgical-skill/fig5.jpg&quot;&gt;
    &lt;br&gt;
    &lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
    display: inline-block;
    color: #999;
    padding: 2px;&quot;&gt;
	Comparison of skill assessment performance across different methods and datasets.
	&lt;/div&gt;
&lt;/center&gt;
&lt;h2 id=&quot;conclusion&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;This article is meant only as a brief introduction.&lt;/p&gt;
&lt;p&gt;We present a spatial-temporal attention framework for automated surgical skill assessment from intraoperative videos. The spatial attention module identifies critical regions in each frame, while the temporal attention module captures the dynamics of surgical workflow. By combining these complementary attention mechanisms, our model achieves accurate, objective, and interpretable surgical skill assessment. The attention visualizations provide insights into the visual and temporal cues associated with surgical expertise, which could inform surgical training curricula. Our approach demonstrates the potential of deep learning to provide scalable, consistent surgical skill evaluation, supporting surgical education and quality improvement initiatives.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Combining ADDA with Deep CORAL: Unsupervised Domain Adaptation for Image Classification]]></title><description><![CDATA[<p>We combine Adversarial Discriminative Domain Adaptation (ADDA) with Deep CORAL to allow ADDA to better utilize the pretrained initialization. Vanilla ADDA diverges drastically from the initialization, resulting in much poorer results in early epochs compared to the initialization. Sophisticated fine-tuning is required for ADDA to give satisfactory results. With our novel modifications, ADDA-CORAL can be trained much faster and yields better results.</p> <p style="font-style: italic;"><span style="font-weight: bold">Bohua Wan</span>, Cong Mu, Ruzhang Zhao, Zhuoying Li (Ordered alphabetically)</p>]]></description><link>https://bohua-wan.netlify.com/posts/combining-adda-with-deep-coral-unsupervised-domain-adaptation-for-image-classification</link><guid isPermaLink="false">https://bohua-wan.netlify.com/posts/combining-adda-with-deep-coral-unsupervised-domain-adaptation-for-image-classification</guid><pubDate>Sun, 23 May 2021 14:13:40 GMT</pubDate><content:encoded>&lt;center&gt;
    &lt;img style=&quot;border-radius: 0.3125em;
    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);&quot; 
    src=&quot;/media/paper-images/ADDA_CORAL/Network.png&quot;&gt;
    &lt;br&gt;
    &lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
    display: inline-block;
    color: #999;
    padding: 2px;&quot;&gt;
    An illustration of our proposed method combining Deep Coral and ADDA. Blue and orange arrows denote the data flows of the source and target domains, respectively. The blue encoder and classifier are pretrained and fixed.
	&lt;/div&gt;
&lt;/center&gt;
&lt;h2 id=&quot;abstract&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#abstract&quot; aria-label=&quot;abstract permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Abstract&lt;/h2&gt;
&lt;p&gt;Unsupervised domain adaptation techniques are essential for real-world image classification tasks. The domain of images, or the space of all possible images, is so enormous that models trained on any dataset will inevitably suffer from out-of-domain issues. One promising research direction is to use domain adaptation methods to adapt models trained on a source domain to a target domain. Adversarial Discriminative Domain Adaptation (ADDA) is a typical adversarial-learning-based unsupervised domain adaptation method. Though it has proved effective on simple and small datasets, it requires sophisticated training strategies and can be hard to converge at times. We propose to explicitly align the distribution of the model’s output with that of an adapted model, which also serves as the initialization for the adversarial training. In this way, the adversarial process is forced to search within a space whose results are at least as good as the initialization. Experiments on our proposed Tiny-16-Class-ImageNet show that our method is effective and efficient in terms of accuracy and training time.&lt;/p&gt;
&lt;h2 id=&quot;introduction&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#introduction&quot; aria-label=&quot;introduction permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Introduction&lt;/h2&gt;
&lt;h4 id=&quot;background&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#background&quot; aria-label=&quot;background permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Background&lt;/h4&gt;
&lt;p&gt;By generalizability, we refer to the model’s ability to perform equally well on unseen data.
The word, “domain”, in this article denotes the space of input features &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;X&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;X&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.6833em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.07847em;&quot;&gt;X&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; and the marginal distribution &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;P&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;mi&gt;X&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;P(X)&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1em;vertical-align:-0.25em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.13889em;&quot;&gt;P&lt;/span&gt;&lt;span class=&quot;mopen&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.07847em;&quot;&gt;X&lt;/span&gt;&lt;span class=&quot;mclose&quot;&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;. Specifically, for image classification tasks, the domain of training dataset is the set of all possible images and the marginal distribution in this dataset &lt;sup id=&quot;fnref-6&quot;&gt;&lt;a href=&quot;#fn-6&quot; class=&quot;footnote-ref&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;.
It is crucial for models to be generalizable in image classification tasks: the space of possible images is so large that any dataset can capture only a small fraction of it, and a model that fails to generalize is of little use.
Domain shift refers to two domains being different, which is common. For example, a model trained on images taken in daylight usually fails when applied to images taken at night. Different patterns of perturbations, such as noise imposed on images, are another source of domain shift.
To address domain shift, one promising research area is domain adaptation, which aims to adapt a model trained on a source domain to a target domain.
In this project, we investigate the unsupervised domain adaptation problem, which does not require the target domain to be labeled.&lt;/p&gt;
&lt;h4 id=&quot;related-work&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#related-work&quot; aria-label=&quot;related work permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Related Work&lt;/h4&gt;
&lt;p&gt;Numerous domain adaptation algorithms have been proposed to counteract the performance degradation caused by domain shift.
Deep Coral &lt;sup id=&quot;fnref-4&quot;&gt;&lt;a href=&quot;#fn-4&quot; class=&quot;footnote-ref&quot;&gt;4&lt;/a&gt;&lt;/sup&gt; extends the unsupervised domain adaptation method Coral to learn a nonlinear transformation that aligns the correlations of layer activations in deep neural networks.
Adversarial Discriminative Domain Adaptation (ADDA) &lt;sup id=&quot;fnref-5&quot;&gt;&lt;a href=&quot;#fn-5&quot; class=&quot;footnote-ref&quot;&gt;5&lt;/a&gt;&lt;/sup&gt; combines discriminative modeling and generative adversarial networks to learn a discriminative mapping by fooling a domain discriminator.&lt;/p&gt;
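&lt;p&gt;For reference, the CORAL loss that Deep Coral minimizes between source and target feature covariances can be written in a few lines (a standard formulation, not our project’s exact code):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;
import torch

def coral_loss(source_feats, target_feats):
    # Deep CORAL: match the second-order statistics (feature covariances)
    # of source and target activations.
    d = source_feats.size(1)
    def covariance(x):
        xm = x - x.mean(dim=0, keepdim=True)
        return torch.matmul(xm.t(), xm) / (x.size(0) - 1)
    diff = covariance(source_feats) - covariance(target_feats)
    return (diff * diff).sum() / (4.0 * d * d)

src = torch.randn(32, 64)
tgt = torch.randn(32, 64) + 0.5
print(coral_loss(src, tgt))
&lt;/code&gt;&lt;/pre&gt;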
&lt;h2 id=&quot;method&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#method&quot; aria-label=&quot;method permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Method&lt;/h2&gt;
&lt;h4 id=&quot;datasets&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#datasets&quot; aria-label=&quot;datasets permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Datasets&lt;/h4&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 960px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/cd6ef8dff440ba3750db98b965969e5e/10cbc/noises.jpg&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 45.833333333333336%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAAJABQDASIAAhEBAxEB/8QAFwABAQEBAAAAAAAAAAAAAAAABAACA//EABQBAQAAAAAAAAAAAAAAAAAAAAD/2gAMAwEAAhADEAAAAS7R0Ayo/8QAGRAAAgMBAAAAAAAAAAAAAAAAAAERMTJB/9oACAEBAAEFAuki26R//8QAFBEBAAAAAAAAAAAAAAAAAAAAEP/aAAgBAwEBPwE//8QAFBEBAAAAAAAAAAAAAAAAAAAAEP/aAAgBAgEBPwE//8QAFhABAQEAAAAAAAAAAAAAAAAAMQAg/9oACAEBAAY/AmZx/8QAHBAAAgICAwAAAAAAAAAAAAAAAAERITFhQXGB/9oACAEBAAE/IVLRbXI1F6KCMu+TB6Zzl2f/2gAMAwEAAgADAAAAEKMP/8QAFBEBAAAAAAAAAAAAAAAAAAAAEP/aAAgBAwEBPxA//8QAFBEBAAAAAAAAAAAAAAAAAAAAEP/aAAgBAgEBPxA//8QAHxAAAQQBBQEAAAAAAAAAAAAAAQARITFhEEFxobHw/9oACAEBAAE/EGbICDLXNoj32gDeRlATocK9X2cruD3Tf//Z&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;picture&gt;
          &lt;source
              srcset=&quot;/static/cd6ef8dff440ba3750db98b965969e5e/8ac56/noises.webp 240w,
/static/cd6ef8dff440ba3750db98b965969e5e/d3be9/noises.webp 480w,
/static/cd6ef8dff440ba3750db98b965969e5e/e46b2/noises.webp 960w,
/static/cd6ef8dff440ba3750db98b965969e5e/94575/noises.webp 1298w&quot;
              sizes=&quot;(max-width: 960px) 100vw, 960px&quot;
              type=&quot;image/webp&quot;
            /&gt;
          &lt;source
            srcset=&quot;/static/cd6ef8dff440ba3750db98b965969e5e/09b79/noises.jpg 240w,
/static/cd6ef8dff440ba3750db98b965969e5e/7cc5e/noises.jpg 480w,
/static/cd6ef8dff440ba3750db98b965969e5e/6a068/noises.jpg 960w,
/static/cd6ef8dff440ba3750db98b965969e5e/10cbc/noises.jpg 1298w&quot;
            sizes=&quot;(max-width: 960px) 100vw, 960px&quot;
            type=&quot;image/jpeg&quot;
          /&gt;
          &lt;img
            class=&quot;gatsby-resp-image-image&quot;
            src=&quot;/static/cd6ef8dff440ba3750db98b965969e5e/6a068/noises.jpg&quot;
            alt=&quot;Sample noise patterns in the Tiny-16-Class-ImageNet dataset&quot;
            title=&quot;Sample noise patterns in the Tiny-16-Class-ImageNet dataset&quot;
            loading=&quot;lazy&quot;
            style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
          /&gt;
        &lt;/picture&gt;
  &lt;/a&gt;
    &lt;/span&gt;
Figure 2: Sample noise patterns in the &lt;strong&gt;Tiny-16-Class-ImageNet&lt;/strong&gt; dataset.
Top row, from left to right: no noise, uniform noise,
salt-and-pepper noise. Bottom row, from left to right:
rotation, high-pass, low-pass. Image manipulations
follow the procedure in &lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;We conduct experiments on two datasets:
&lt;strong&gt;Tiny-16-Class-ImageNet&lt;/strong&gt; and &lt;strong&gt;MNIST-USPS&lt;/strong&gt;&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mrow&gt;&lt;/mrow&gt;&lt;mo separator=&quot;true&quot;&gt;,&lt;/mo&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;^,&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.4369em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.4369em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mpunct mtight&quot;&gt;,&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;. Most experiments are done on the &lt;strong&gt;Tiny-16-Class-
ImageNet&lt;/strong&gt;, which we produced ourselves following the guidelines
in &lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.
The &lt;strong&gt;Tiny-16-Class-ImageNet&lt;/strong&gt; dataset has three
subsets: a training set, a validation set, and a test set, containing 10015, 1269, and 10350 images, respectively.
All three subsets share the same 16 general classes (e.g., bear
rather than brown bear) but differ in domain.
The training and validation sets contain samples
of different sub-classes (brown bear vs. black bear), and we apply different noise patterns to generate the different domains. Sample noise patterns are illustrated in Figure 2.
The test set contains samples from every sub-class
(brown bear, black bear, etc.). We also evaluate
our proposed method on the &lt;strong&gt;MNIST-USPS&lt;/strong&gt; dataset.&lt;/p&gt;
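&lt;p&gt;To make the domain-generation step concrete, the sketch below shows how a few of the corruptions in Figure 2 could be applied to an image with NumPy. It is an illustrative sketch only: the noise strength, the salt-and-pepper probability, and the low-pass cutoff are placeholder values, not the exact parameters used to build &lt;strong&gt;Tiny-16-Class-ImageNet&lt;/strong&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Sketch of domain-generating corruptions (uniform noise, salt-and-pepper,
# low-pass filtering). Parameter values are illustrative, not the ones used
# to build Tiny-16-Class-ImageNet.
import numpy as np

def uniform_noise(img, strength=0.5):
    # img: float array in [0, 1]; add zero-mean uniform noise and clip
    noise = np.random.uniform(-strength, strength, size=img.shape)
    return np.clip(img + noise, 0.0, 1.0)

def salt_and_pepper(img, p=0.05):
    # flip a fraction p of pixels to pure black or pure white
    out = img.copy()
    mask = np.random.rand(*img.shape[:2])
    out[mask &lt; p / 2] = 0.0        # pepper
    out[mask &gt; 1 - p / 2] = 1.0    # salt
    return out

def low_pass(img, keep=0.1):
    # keep only the lowest spatial frequencies of each channel
    out = np.zeros_like(img)
    h, w = img.shape[:2]
    ch, cw = h // 2, w // 2
    for c in range(img.shape[2]):
        f = np.fft.fftshift(np.fft.fft2(img[..., c]))
        mask = np.zeros((h, w))
        mask[ch - int(keep * h):ch + int(keep * h),
             cw - int(keep * w):cw + int(keep * w)] = 1.0
        out[..., c] = np.abs(np.fft.ifft2(np.fft.ifftshift(f * mask)))
    return np.clip(out, 0.0, 1.0)
&lt;/code&gt;&lt;/pre&gt;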
&lt;h4 id=&quot;deep-coral&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#deep-coral&quot; aria-label=&quot;deep coral permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Deep Coral&lt;/h4&gt;
&lt;p&gt;We adapt the idea of Deep Coral &lt;sup id=&quot;fnref-4&quot;&gt;&lt;a href=&quot;#fn-4&quot; class=&quot;footnote-ref&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;
and simply align second-order statistics in the last layer
of the backbone network by adding a coral loss. This
method is simple yet effective, and it is easy to extend.
We replace the backbone of Deep Coral
with a ResNet-50 pretrained on ImageNet for
experiments on the &lt;strong&gt;Tiny-16-Class-ImageNet&lt;/strong&gt;. We
use the same SGD hyper-parameters as in &lt;sup id=&quot;fnref-4&quot;&gt;&lt;a href=&quot;#fn-4&quot; class=&quot;footnote-ref&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;. The
&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;λ&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\lambda&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.6944em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;λ&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; controlling the weight of the coral loss is set the
same as in &lt;sup id=&quot;fnref-4&quot;&gt;&lt;a href=&quot;#fn-4&quot; class=&quot;footnote-ref&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;, except on the &lt;strong&gt;MNIST-USPS&lt;/strong&gt; dataset,
where we set &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;λ&lt;/mi&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;mo&gt;−&lt;/mo&gt;&lt;mfrac&gt;&lt;mrow&gt;&lt;mi&gt;e&lt;/mi&gt;&lt;mi&gt;p&lt;/mi&gt;&lt;mi&gt;o&lt;/mi&gt;&lt;mi&gt;c&lt;/mi&gt;&lt;mi&gt;h&lt;/mi&gt;&lt;/mrow&gt;&lt;mrow&gt;&lt;mi&gt;n&lt;/mi&gt;&lt;mi&gt;u&lt;/mi&gt;&lt;mi&gt;m&lt;/mi&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;_&lt;/mi&gt;&lt;mi&gt;e&lt;/mi&gt;&lt;mi&gt;p&lt;/mi&gt;&lt;mi&gt;o&lt;/mi&gt;&lt;mi&gt;c&lt;/mi&gt;&lt;mi&gt;h&lt;/mi&gt;&lt;mi&gt;s&lt;/mi&gt;&lt;/mrow&gt;&lt;/mfrac&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\lambda = 1- \frac{epoch}{num\_epochs}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.6944em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;λ&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2778em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mrel&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2778em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.7278em;vertical-align:-0.0833em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mbin&quot;&gt;−&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1.4942em;vertical-align:-0.562em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mopen nulldelimiter&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mfrac&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.9322em;&quot;&gt;&lt;span style=&quot;top:-2.655em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;u&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;m&lt;/span&gt;&lt;span class=&quot;mord mtight&quot; style=&quot;margin-right:0.02778em;&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;oc&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;h&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;s&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.23em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;frac-line&quot; style=&quot;border-bottom-width:0.04em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.4461em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 
mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;oc&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;h&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.562em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose nulldelimiter&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;.&lt;/p&gt;
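&lt;p&gt;For reference, the coral loss attached to the last backbone layer is the squared Frobenius distance between the source and target feature covariances. The snippet below is a minimal PyTorch sketch of that loss following the standard Deep Coral formulation; it is not our exact training code, and the batch handling is an assumption.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Minimal sketch of the CORAL loss between source and target feature batches
# (d-dimensional activations of the last backbone layer).
import torch

def coral_loss(source_feats, target_feats):
    # source_feats: (n_s, d) tensor, target_feats: (n_t, d) tensor
    d = source_feats.size(1)

    def covariance(x):
        n = x.size(0)
        x = x - x.mean(dim=0, keepdim=True)
        return (x.t() @ x) / (n - 1)

    c_s = covariance(source_feats)
    c_t = covariance(target_feats)
    # squared Frobenius norm of the covariance difference, scaled by 1/(4 d^2)
    return ((c_s - c_t) ** 2).sum() / (4.0 * d * d)

# Total Deep Coral objective during training (lambda_ weights the coral term):
#   loss = classification_loss + lambda_ * coral_loss(f_src, f_tgt)
# On MNIST-USPS we decay the weight linearly: lambda_ = 1 - epoch / num_epochs.
&lt;/code&gt;&lt;/pre&gt;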
&lt;h4 id=&quot;adda&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#adda&quot; aria-label=&quot;adda permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;ADDA&lt;/h4&gt;
&lt;p&gt;We also adopt the idea of ADDA by first learning a discriminative representation using data
from the source domain and then learning a separate encoder
that maps the target domain to the source domain
with a domain-adversarial loss. We use ResNet-50 (excluding the last layer) as the backbone for the encoder and a &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;3&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;3&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.6444em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;3&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;-layer MLP as
the discriminator, with a hidden size of &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;1024&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;1024&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.6444em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;1024&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;. The pretrained ResNet-50 is frozen during adversarial training. Adam is
used as the optimizer with &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msub&gt;&lt;mi&gt;β&lt;/mi&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;/msub&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;mn&gt;0.5&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\beta_1=0.5&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8889em;vertical-align:-0.1944em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.05278em;&quot;&gt;β&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.3011em;&quot;&gt;&lt;span style=&quot;top:-2.55em;margin-left:-0.0528em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.15em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2778em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mrel&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2778em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.6444em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;0.5&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; and &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msub&gt;&lt;mi&gt;β&lt;/mi&gt;&lt;mn&gt;2&lt;/mn&gt;&lt;/msub&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;mn&gt;0.999&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\beta_2=0.999&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8889em;vertical-align:-0.1944em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.05278em;&quot;&gt;β&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.3011em;&quot;&gt;&lt;span style=&quot;top:-2.55em;margin-left:-0.0528em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; 
style=&quot;height:0.15em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2778em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mrel&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2778em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.6444em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;0.999&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;.
The learning rate is set to &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;0.0002&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;0.0002&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.6444em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;0.0002&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; and the batch
size is &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;32&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;32&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.6444em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;32&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;. During the adaptation stage, the target encoder
is updated every &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;4&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;4&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.6444em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;4&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; steps.&lt;/p&gt;
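&lt;p&gt;The adaptation stage can be summarized by the alternating update below. This is an illustrative PyTorch sketch rather than our exact implementation: the module and optimizer names are placeholders, the discriminator is assumed to output two logits (source vs. target), and both optimizers are assumed to be Adam with the hyper-parameters given above.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Sketch of one ADDA adaptation iteration: the discriminator learns to tell
# source features from target features, and the target encoder (updated every
# 4 discriminator steps here) learns to fool it. As in standard ADDA, the
# source encoder is kept fixed during adaptation.
import torch
import torch.nn.functional as F

def adda_step(step, src_x, tgt_x, source_encoder, target_encoder,
              discriminator, opt_d, opt_t):
    # 1) Discriminator update: source features labeled 1, target features 0.
    with torch.no_grad():
        src_feat = source_encoder(src_x)   # fixed source encoder
    tgt_feat = target_encoder(tgt_x)
    logits = discriminator(torch.cat([src_feat, tgt_feat.detach()], dim=0))
    labels = torch.cat([torch.ones(src_feat.size(0)),
                        torch.zeros(tgt_feat.size(0))]).long()
    d_loss = F.cross_entropy(logits, labels)
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # 2) Target-encoder update every 4 steps: fool the discriminator into
    #    predicting the source label for target features.
    if step % 4 == 0:
        logits = discriminator(target_encoder(tgt_x))
        fool_labels = torch.ones(tgt_x.size(0)).long()
        t_loss = F.cross_entropy(logits, fool_labels)
        opt_t.zero_grad()
        t_loss.backward()
        opt_t.step()
&lt;/code&gt;&lt;/pre&gt;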
&lt;h4 id=&quot;adda-coral&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#adda-coral&quot; aria-label=&quot;adda coral permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;ADDA-CORAL&lt;/h4&gt;
&lt;p&gt;We propose a new method
that combines Deep Coral and ADDA:
Deep Coral serves as the pretraining stage for
ADDA, and we align second-order
statistics between the classification outputs of
the fixed pretrained encoder and the ADDA-trained
target encoder on the target domain. The overall architecture is illustrated
in Fig. 1. During experiments, we find that vanilla
ADDA ruins the pretrained encoder because of a poorly
trained discriminator. To better use the initialization
of the Deep-Coral-pretrained encoder while ensuring
that the learned target encoder generates similar features
for the target and source domains, we use the coral loss
only to align the ADDA-trained encoder’s classification output with that of the fixed pretrained encoder,
and we gradually decrease the weight of the coral loss.&lt;/p&gt;
&lt;p&gt;The underlying assumption here is that the best possible solution lies close (with respect to optimization with Adam) to the already good initialization in the solution space.&lt;/p&gt;
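&lt;p&gt;A compact way to express the resulting target-encoder objective is sketched below: the usual adversarial term plus a coral term computed between the classification outputs of the fixed Deep-Coral-pretrained encoder and the ADDA-trained target encoder. The module names are placeholders, and the decay schedule for the coral weight is an assumption for illustration (we reuse the linear schedule mentioned above).&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Sketch of the ADDA-CORAL target-encoder objective: the adversarial term plus
# a decaying coral term that keeps the target encoder&apos;s classification outputs
# close, in second-order statistics, to those of the fixed pretrained encoder.
import torch
import torch.nn.functional as F

def coral(a, b):
    # squared Frobenius distance between the covariances of two output batches
    d = a.size(1)
    return ((torch.cov(a.t()) - torch.cov(b.t())) ** 2).sum() / (4.0 * d * d)

def target_encoder_loss(tgt_x, target_encoder, pretrained_encoder, classifier,
                        discriminator, epoch, num_epochs):
    tgt_feat = target_encoder(tgt_x)

    # adversarial term: fool the domain discriminator into predicting source
    adv = F.cross_entropy(discriminator(tgt_feat),
                          torch.ones(tgt_x.size(0)).long())

    # alignment term: coral loss between the classification outputs of the
    # fixed Deep-Coral-pretrained encoder and the ADDA-trained target encoder
    with torch.no_grad():
        ref_out = classifier(pretrained_encoder(tgt_x))
    cur_out = classifier(tgt_feat)

    weight = 1.0 - epoch / num_epochs   # assumed linear decay of the coral weight
    return adv + weight * coral(cur_out, ref_out)
&lt;/code&gt;&lt;/pre&gt;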
&lt;h4 id=&quot;results&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#results&quot; aria-label=&quot;results permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Results&lt;/h4&gt;
&lt;p&gt;Table 1: Results of our Deep Coral + ADDA method on &lt;strong&gt;Tiny-16-Class-ImageNet&lt;/strong&gt; and &lt;strong&gt;MNIST-USPS&lt;/strong&gt;.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th align=&quot;center&quot;&gt;&lt;div style=&quot;width:150px&quot;&gt;Setting&lt;/div&gt;&lt;/th&gt;
&lt;th align=&quot;center&quot;&gt;&lt;div style=&quot;width:100px&quot;&gt;Source&lt;/div&gt;&lt;/th&gt;
&lt;th align=&quot;center&quot;&gt;&lt;div style=&quot;width:100px&quot;&gt;Target&lt;/div&gt;&lt;/th&gt;
&lt;th align=&quot;right&quot;&gt;Acc&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;ResNet-50&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;train&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;val&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mrow&gt;&lt;/mrow&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;†&lt;/mi&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;^\dag&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;†&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;25.13&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;%&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;25.13\%&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8056em;vertical-align:-0.0556em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;25.13%&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;ADDA&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;train&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;val&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mrow&gt;&lt;/mrow&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;†&lt;/mi&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;^\dag&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;†&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;48.32&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;%&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;48.32\%&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8056em;vertical-align:-0.0556em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;48.32%&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;Deep Coral&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;train&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;val&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mrow&gt;&lt;/mrow&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;†&lt;/mi&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;^\dag&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;†&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;73.52&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;%&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;73.52\%&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8056em;vertical-align:-0.0556em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;73.52%&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;Ours&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;train&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;val&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mrow&gt;&lt;/mrow&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;†&lt;/mi&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;^\dag&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;†&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;77.69&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;%&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;77.69\%&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8056em;vertical-align:-0.0556em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;77.69%&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;LeNet&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;MNIST&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;USPS&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;25.13&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;%&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;25.13\%&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8056em;vertical-align:-0.0556em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;25.13%&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;ADDA&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;MNIST&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;USPS&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;89.40&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;%&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;89.40\%&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8056em;vertical-align:-0.0556em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;89.40%&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;Deep Coral&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;MNIST&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;USPS&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;54.30&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;%&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;54.30\%&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8056em;vertical-align:-0.0556em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;54.30%&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;Ours&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;MNIST&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;USPS&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;94.56&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;%&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;94.56\%&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8056em;vertical-align:-0.0556em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;94.56%&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mrow&gt;&lt;/mrow&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;†&lt;/mi&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;^\dag&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;†&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;: validation set with uniform noise (0.5)&lt;/p&gt;
&lt;p&gt;Table 2: Results of our Deep Coral + ADDA method on the unseen test set of &lt;strong&gt;Tiny-16-Class-ImageNet&lt;/strong&gt;.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th align=&quot;center&quot;&gt;&lt;div style=&quot;width:150px&quot;&gt;Setting&lt;/div&gt;&lt;/th&gt;
&lt;th align=&quot;center&quot;&gt;&lt;div style=&quot;width:130px&quot;&gt;Train Source&lt;/div&gt;&lt;/th&gt;
&lt;th align=&quot;center&quot;&gt;&lt;div style=&quot;width:130px&quot;&gt;Train Target&lt;/div&gt;&lt;/th&gt;
&lt;th align=&quot;center&quot;&gt;Unseen Target&lt;/th&gt;
&lt;th align=&quot;right&quot;&gt;Acc&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;ResNet-50&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;train&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;None&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;Test&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mrow&gt;&lt;/mrow&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;†&lt;/mi&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;^\dag&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;†&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;5.34&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;%&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;5.34\%&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8056em;vertical-align:-0.0556em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;5.34%&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;ResNet-50-ImageNet&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;train&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;None&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;Test&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mrow&gt;&lt;/mrow&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;†&lt;/mi&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;^\dag&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;†&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;13.14&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;%&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;13.14\%&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8056em;vertical-align:-0.0556em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;13.14%&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;Deep Coral&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;train&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;val&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mrow&gt;&lt;/mrow&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;†&lt;/mi&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;^\dag&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;†&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;Test&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mrow&gt;&lt;/mrow&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;†&lt;/mi&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;^\dag&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;†&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;38.37&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;%&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;38.37\%&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8056em;vertical-align:-0.0556em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;38.37%&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;Ours&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;train&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;val&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mrow&gt;&lt;/mrow&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;†&lt;/mi&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;^\dag&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;†&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;Test&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mrow&gt;&lt;/mrow&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;†&lt;/mi&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;^\dag&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;†&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;52.96&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;%&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;52.96\%&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8056em;vertical-align:-0.0556em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;52.96%&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mrow&gt;&lt;/mrow&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;†&lt;/mi&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;^\dag&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;†&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;: validation set with uniform noise (0.5)&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 960px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/8912e4e424e5400f62874244f293e048/eea4a/Confusion_matrix.jpg&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 37.916666666666664%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAAIABQDASIAAhEBAxEB/8QAFgABAQEAAAAAAAAAAAAAAAAAAAEF/8QAFQEBAQAAAAAAAAAAAAAAAAAAAAH/2gAMAwEAAhADEAAAAduBAv8A/8QAFRABAQAAAAAAAAAAAAAAAAAAARD/2gAIAQEAAQUCb//EABQRAQAAAAAAAAAAAAAAAAAAABD/2gAIAQMBAT8BP//EABQRAQAAAAAAAAAAAAAAAAAAABD/2gAIAQIBAT8BP//EABQQAQAAAAAAAAAAAAAAAAAAABD/2gAIAQEABj8Cf//EABcQAAMBAAAAAAAAAAAAAAAAAAABEBH/2gAIAQEAAT8hQyf/2gAMAwEAAgADAAAAEIff/8QAFhEBAQEAAAAAAAAAAAAAAAAAAAER/9oACAEDAQE/EIx//8QAFBEBAAAAAAAAAAAAAAAAAAAAEP/aAAgBAgEBPxA//8QAGhABAAIDAQAAAAAAAAAAAAAAAQAhETFxkf/aAAgBAQABPxBHYxTTnsVKR8n/2Q==&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;picture&gt;
          &lt;source
              srcset=&quot;/static/8912e4e424e5400f62874244f293e048/8ac56/Confusion_matrix.webp 240w,
/static/8912e4e424e5400f62874244f293e048/d3be9/Confusion_matrix.webp 480w,
/static/8912e4e424e5400f62874244f293e048/e46b2/Confusion_matrix.webp 960w,
/static/8912e4e424e5400f62874244f293e048/af3f0/Confusion_matrix.webp 1280w&quot;
              sizes=&quot;(max-width: 960px) 100vw, 960px&quot;
              type=&quot;image/webp&quot;
            /&gt;
          &lt;source
            srcset=&quot;/static/8912e4e424e5400f62874244f293e048/09b79/Confusion_matrix.jpg 240w,
/static/8912e4e424e5400f62874244f293e048/7cc5e/Confusion_matrix.jpg 480w,
/static/8912e4e424e5400f62874244f293e048/6a068/Confusion_matrix.jpg 960w,
/static/8912e4e424e5400f62874244f293e048/eea4a/Confusion_matrix.jpg 1280w&quot;
            sizes=&quot;(max-width: 960px) 100vw, 960px&quot;
            type=&quot;image/jpeg&quot;
          /&gt;
          &lt;img
            class=&quot;gatsby-resp-image-image&quot;
            src=&quot;/static/8912e4e424e5400f62874244f293e048/6a068/Confusion_matrix.jpg&quot;
            alt=&quot;Confusion matrix of our results on different target domains&quot;
            title=&quot;Confusion matrix of our results on different target domains&quot;
            loading=&quot;lazy&quot;
            style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
          /&gt;
        &lt;/picture&gt;
  &lt;/a&gt;
    &lt;/span&gt;
Figure 3: Classification accuracy in percent for different
domains. Model &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msub&gt;&lt;mi&gt;M&lt;/mi&gt;&lt;mn&gt;0&lt;/mn&gt;&lt;/msub&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;M_0&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8333em;vertical-align:-0.15em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.10903em;&quot;&gt;M&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.3011em;&quot;&gt;&lt;span style=&quot;top:-2.55em;margin-left:-0.109em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;0&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.15em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; is only trained on the source
domain. Models &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msub&gt;&lt;mi&gt;M&lt;/mi&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;/msub&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;M_1&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8333em;vertical-align:-0.15em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.10903em;&quot;&gt;M&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.3011em;&quot;&gt;&lt;span style=&quot;top:-2.55em;margin-left:-0.109em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.15em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; to &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msub&gt;&lt;mi&gt;M&lt;/mi&gt;&lt;mn&gt;5&lt;/mn&gt;&lt;/msub&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;M_5&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8333em;vertical-align:-0.15em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.10903em;&quot;&gt;M&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.3011em;&quot;&gt;&lt;span style=&quot;top:-2.55em;margin-left:-0.109em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;5&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.15em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; are adapted on one target
domain (in red rectangle) via ADDA. &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msub&gt;&lt;mi&gt;M&lt;/mi&gt;&lt;mn&gt;6&lt;/mn&gt;&lt;/msub&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;M_6&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8333em;vertical-align:-0.15em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.10903em;&quot;&gt;M&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.3011em;&quot;&gt;&lt;span style=&quot;top:-2.55em;margin-left:-0.109em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;6&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.15em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; to &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msub&gt;&lt;mi&gt;M&lt;/mi&gt;&lt;mn&gt;10&lt;/mn&gt;&lt;/msub&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;M_{10}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8333em;vertical-align:-0.15em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.10903em;&quot;&gt;M&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.3011em;&quot;&gt;&lt;span style=&quot;top:-2.55em;margin-left:-0.109em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;10&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.15em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; are
adapted in the same way, but via Deep Coral instead of ADDA. The best result for each
domain and method is shown in bold blue.&lt;/p&gt;
&lt;h4 id=&quot;discussion&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#discussion&quot; aria-label=&quot;discussion permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Discussion&lt;/h4&gt;
&lt;p&gt;The experimental results in Figure 3 show the improvements that ADDA and Deep Coral bring on the target domain. Deep Coral generally outperforms ADDA by a large margin, except on the High-Pass target domain. The failure on this domain is most likely due to the drastic domain shift between High-Pass and the other domains, as illustrated in Figure 2 in the &lt;em&gt;dataset&lt;/em&gt; section. Deep Coral also generalizes better to unseen domains, most likely because it barely alters the encoder, which is pretrained on ImageNet (though without any added noise).&lt;/p&gt;
&lt;p&gt;Table 1 shows the results of our proposed Deep Coral+ADDA method on Tiny-16-Class-ImageNet and MNIST-USPS. We added uniform noise (0.5) to the validation set, making the domain shift from the training set even larger and the domain adaptation task even harder. The high performance and consistent improvements of Deep Coral+ADDA over the other settings validate the effectiveness of our modifications and design choices. We also test our method on an unseen, untrained target domain and observe significantly better results, as shown in Table 2 in the appendix.&lt;/p&gt;
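&lt;p&gt;As a point of reference, below is a minimal sketch of the CORAL alignment loss that the Deep Coral branch of the combined method minimizes between source and target feature batches. The function and tensor names are illustrative assumptions, not the project’s actual code.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import torch

def coral_loss(source_feats, target_feats):
    # source_feats: (n_s, d) source-domain feature batch
    # target_feats: (n_t, d) target-domain feature batch
    d = source_feats.size(1)

    def covariance(x):
        centered = x - x.mean(dim=0, keepdim=True)
        return centered.t() @ centered / (x.size(0) - 1)

    # Frobenius distance between the two feature covariance matrices,
    # scaled by 1 / (4 d^2) as in Deep CORAL (Sun and Saenko, 2016).
    diff = covariance(source_feats) - covariance(target_feats)
    return (diff ** 2).sum() / (4 * d * d)
&lt;/code&gt;&lt;/pre&gt;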
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;Robert Geirhos, Carlos R Medina Temme, Jonas Rauber, Heiko H Schütt, Matthias Bethge, and Felix A Wichmann. Generalisation in humans and deep neural networks. arXiv preprint arXiv:1808.08750, 2018.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-4&quot;&gt;
&lt;p&gt;Baochen Sun and Kate Saenko. Deep CORAL: Correlation alignment for deep domain adaptation. In European conference on computer vision, pages 443-450. Springer, 2016.&lt;/p&gt;
&lt;a href=&quot;#fnref-4&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-5&quot;&gt;
&lt;p&gt;Eric Tzeng, Judy Hoffman, Kate Saenko, and Trevor Darrell. Adversarial discriminative domain adaptation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7167-7176, 2017.&lt;/p&gt;
&lt;a href=&quot;#fnref-5&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-6&quot;&gt;
&lt;p&gt;Mei Wang and Weihong Deng. Deep visual domain adaptation: A survey. Neurocomputing, 312:135-153, 2018.&lt;/p&gt;
&lt;a href=&quot;#fnref-6&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[Dyadic Relational Graph Convolutional Networks for Skeleton-based Human Interaction Recognition]]></title><description><![CDATA[<p>We apply Graph Convolutional Networks to skeleton-based human-human interaction recognition. We designed a Relational Adjacency Matrix (RAM) to represent dynamic relational graphs on the two actors' skeletons.</p> <p style="font-style: italic;">Liping Zhu*, <span style="font-weight: bold">Bohua Wan*</span>, Chengyang Li, Gangyi Tian, Yi Hou, Kun Yuan</p> <p style="font-style: italic;">Pattern Recognition 115 (2021): 107920.</p>]]></description><link>https://bohua-wan.netlify.com/posts/dyadic-relational-graph-convolutional-networks-for-skeleton-based-human-interaction-recognition</link><guid isPermaLink="false">https://bohua-wan.netlify.com/posts/dyadic-relational-graph-convolutional-networks-for-skeleton-based-human-interaction-recognition</guid><pubDate>Fri, 19 Feb 2021 14:13:40 GMT</pubDate><content:encoded>&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#highlights&quot;&gt;Highlights&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#abstract&quot;&gt;Abstract&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#method&quot;&gt;Method&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#results&quot;&gt;Results&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Here, I briefly introduce our work. Some of the content is taken from the accepted version of our paper. For more information, please see &lt;a href=&quot;https://www.sciencedirect.com/science/article/pii/S0031320321001072&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;our paper&lt;/a&gt;.&lt;/strong&gt; Code is available on &lt;a href=&quot;https://github.com/GlenGGG/DR-GCN&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;center&gt;
    &lt;img style=&quot;border-radius: 0.3125em;
    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);&quot; 
    src=&quot;/media/paper-images/DR-GCN/structure.jpg&quot;&gt;
    &lt;br&gt;
    &lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
    display: inline-block;
    color: #999;
    padding: 2px;&quot;&gt;
	Overall architecture of DR-GCN.
	&lt;/div&gt;
&lt;/center&gt;
&lt;h2 id=&quot;highlights&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#highlights&quot; aria-label=&quot;highlights permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Highlights&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;We are the first to construct dynamic graphs on skeleton sequences that capture discriminative relations between skeletons.&lt;/li&gt;
&lt;li&gt;A Relational Adjacency Matrix (RAM) is proposed to represent relational graphs using geometric features and relative attention.&lt;/li&gt;
&lt;li&gt;The proposed Dyadic Relational Graph Convolutional Network achieves state-of-the-art accuracy on three challenging datasets, with improvements of 6.63% on NTU-RGB+D and 5.47% on NTU-RGB+D 120 over the baseline model.&lt;/li&gt;
&lt;li&gt;Our methods also consistently help advanced models achieve higher accuracy, with gains of 1.26% on NTU-RGB+D and 2.86% on NTU-RGB+D 120.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;abstract&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#abstract&quot; aria-label=&quot;abstract permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Abstract&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;Skeleton-based human interaction recognition is a challenging task requiring all abilities to recognize spatial, temporal, and interactive features. These abilities rarely co-exist in existing methods. Graph convolutional network (GCN) based methods fail to extract interactive features. Traditional interaction recognition methods cannot effectively capture spatial features from skeletons. Toward this end, we propose a novel Dyadic Relational Graph Convolutional Network (DR-GCN) for interaction recognition. Specifically, we make four contributions: (i) we design a Relational Adjacency Matrix (RAM) that represents dynamic relational graphs. These graphs are constructed combining both geometric features and relative attention from the two skeleton sequences; (ii) we propose a Dyadic Relational Graph Convolution Block (DR-GCB) that extracts spatial-temporal interactive features; (iii) we stack the proposed DR-GCBs to build DR-GCN and integrate our methods with an advanced model. (iv) Our models achieve state-of-the-art results on SBU and significant improvements on the mutual action sub-datasets of NTU-RGB+D and NTU-RGB+D 120.&lt;/em&gt;&lt;/p&gt;
&lt;h2 id=&quot;method&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#method&quot; aria-label=&quot;method permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Method&lt;/h2&gt;
&lt;center&gt;
    &lt;img style=&quot;border-radius: 0.3125em;
    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);&quot; 
    src=&quot;/media/paper-images/DR-GCN/FIG3c.jpg&quot;&gt;
    &lt;br&gt;
    &lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
    display: inline-block;
    color: #999;
    padding: 2px;&quot;&gt;An illustration of the relational graph. Green dots in this image represent
body joints. The orange links are relational links denoting strong relations between joints of the two actors.&lt;/div&gt;
&lt;/center&gt;
&lt;p&gt;The image above shows the relational graph at a single frame, which is represented by the proposed Relational Adjacency Matrix (RAM). A RAM is generated separately for each frame of the skeleton sequence.&lt;/p&gt;
&lt;center&gt;
    &lt;img style=&quot;border-radius: 0.3125em;
    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);&quot; 
    src=&quot;/media/paper-images/DR-GCN/FIG5.jpg&quot;&gt;
    &lt;br&gt;
    &lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
    display: inline-block;
    color: #999;
    padding: 2px;&quot;&gt;
	An illustration of the Relational Adjacency Matrix (RAM) generation procedure.
	&lt;/div&gt;
&lt;/center&gt;
&lt;p&gt;The generation and utilization of the RAM are the key components of our paper. Briefly, we build the relational links, i.e., the RAM, from two components: a geometric component and a relative attention component. The geometric component is straightforward: if two joints, one from each actor, are close to each other, we consider them correlated. This simple assumption turns out to be very effective. The relative attention component is meant to capture semantic information and connect joints that are semantically similar. We compute it by first encoding each joint with spatial-temporal graph convolutional layers and then calculating the similarity between each pair of joints. Finally, we combine the two components with a network-learned parameter to obtain the RAM.&lt;/p&gt;
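&lt;p&gt;To make this concrete, here is a minimal per-frame sketch of how such a matrix could be assembled. It only illustrates the two components described above; the names, shapes, and softmax normalization are assumptions for illustration, not the exact formulation in the paper.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import torch
import torch.nn.functional as F

def relational_adjacency(joints_a, joints_b, feats_a, feats_b, alpha):
    # joints_a, joints_b: (V, 3) joint coordinates of the two actors at one frame
    # feats_a, feats_b:   (V, C) joint embeddings from spatial-temporal GCN layers
    # alpha:              network-learned scalar balancing the two components

    # Geometric component: joint pairs that are close to each other get larger weights.
    dist = torch.cdist(joints_a, joints_b)          # (V, V) pairwise distances
    geometric = F.softmax(-dist, dim=-1)

    # Relative attention component: similarity between joint embeddings.
    attention = F.softmax(feats_a @ feats_b.t(), dim=-1)

    # Combine both components with the network-learned parameter.
    return alpha * geometric + (1 - alpha) * attention
&lt;/code&gt;&lt;/pre&gt;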
&lt;center&gt;
    &lt;img style=&quot;border-radius: 0.3125em;
    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);&quot; 
    src=&quot;/media/paper-images/DR-GCN/FIG6.jpg&quot;&gt;
    &lt;br&gt;
    &lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
    display: inline-block;
    color: #999;
    padding: 2px;&quot;&gt;
	An illustration of Dyadic Relational Graph Convolution Block (DR-GCB). DR-GC refers to dyadic relational graph convolution.
	&lt;/div&gt;
&lt;/center&gt;
&lt;p&gt;With the RAM, we propose the Dyadic Relational Graph Convolution Block (DR-GCB), which applies dyadic relational graph convolution to the two skeletons to learn relational features. The DR-GCB is highly extensible and can be plugged into other networks to improve their performance.&lt;/p&gt;
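&lt;p&gt;As a rough illustration of what such a block computes, the sketch below lets each joint of one actor aggregate features from the other actor’s joints, weighted by the RAM. This is a simplified, assumed form: the class name and shapes are illustrative, and it omits the temporal modeling and the rest of the structure of the full block.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import torch
import torch.nn as nn

class DyadicRelationalGraphConv(nn.Module):
    # Simplified relational graph convolution driven by a per-frame RAM.

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.linear = nn.Linear(in_channels, out_channels)

    def forward(self, feats_a, feats_b, ram):
        # feats_a, feats_b: (V, C) joint features of the two actors
        # ram:              (V, V) relational adjacency matrix between them
        # Each joint of actor A gathers features from related joints of actor B.
        gathered = ram @ feats_b
        return torch.relu(self.linear(feats_a + gathered))
&lt;/code&gt;&lt;/pre&gt;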
&lt;h2 id=&quot;results&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#results&quot; aria-label=&quot;results permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Results&lt;/h2&gt;
&lt;p&gt;We have conducted extensive experiments. The results show that our network and methods significantly outperform other state-of-the-art methods, and they also demonstrate the extensibility of our methods. For the detailed numbers, please read &lt;a href=&quot;https://www.sciencedirect.com/science/article/pii/S0031320321001072&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;our paper&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Below we show some generated relational graphs.
&lt;img src=&quot;/54ad5b0f655fc3f3ca559d7a90c70bdc/demo1.gif&quot; alt=&quot;demo-1&quot;&gt;
&lt;img src=&quot;/13d03f15fc5253a5b2a85f535ee964c6/demo2.gif&quot; alt=&quot;demo-2&quot;&gt;
&lt;img src=&quot;/b469fd1d61965213c7295b25085039e2/demo3.gif&quot; alt=&quot;demo-3&quot;&gt;
&lt;img src=&quot;/4da7a4b137cf02fcc8c7f4b7a8689ce0/demo4.gif&quot; alt=&quot;demo-4&quot;&gt;&lt;/p&gt;
&lt;center&gt;
	&lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
	display: inline-block;
	color: #999;
	padding: 2px;&quot;&gt;
	Some demos of the generated relational graphs.
	&lt;/div&gt;
&lt;/center&gt;
&lt;h2 id=&quot;conclusion&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;This article is only meant as a brief introduction; if you are interested, please read &lt;a href=&quot;https://www.sciencedirect.com/science/article/pii/S0031320321001072&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;our paper&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Our paper presents a novel Dyadic Relational Graph Convolutional Network (DR-GCN) for skeleton-based interaction recognition. We devise a Relational Adjacency Matrix (RAM) that denotes the relational graph; it combines both the geometric features and the relative attention of the two skeletons in interaction. A Dyadic Relational Graph Convolution Block (DR-GCB) is further proposed to extract spatial-temporal interactive features with the RAM, and we stack multiple DR-GCBs to build the backbone of our network. We further propose the Two-Stream Dyadic Relational AGCN (2S-DRAGCN), which demonstrates our methods’ compatibility with ST-GCN based models. Our proposed models show superior abilities in interaction recognition: they achieve the highest accuracy on the mutual action sub-datasets of NTU-RGB+D and NTU-RGB+D 120 and on the interaction dataset SBU.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;© 2021. Content from the accepted version is made available under the CC-BY-NC-ND 4.0 license &lt;a href=&quot;http://creativecommons.org/licenses/by-nc-nd/4.0/&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;http://creativecommons.org/licenses/by-nc-nd/4.0/&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;</content:encoded></item></channel></rss>