<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Bohua Wan's personal site]]></title><description><![CDATA[PhD student majoring in computer science at Johns Hopkins University.]]></description><link>https://bohua-wan.netlify.com</link><generator>GatsbyJS</generator><lastBuildDate>Sun, 21 Dec 2025 01:49:23 GMT</lastBuildDate><item><title><![CDATA[Towards Virtual Clinical Trials of Radiology AI with Conditional Generative Modeling]]></title><description><![CDATA[<p>We propose a framework for conducting virtual clinical trials of radiology AI systems using conditional generative models to synthesize realistic medical imaging scenarios for comprehensive AI evaluation.</p> <p style="font-style: italic;">B. D. Killeen*, <span style="font-weight: bold">Bohua Wan*</span>, A. V. Kulkarni, N. Drenkow, M. Oberst, P. H. Yi, M. Unberath</p> <p style="font-style: italic;">arXiv preprint arXiv:2502.09688 (2025).</p>]]></description><link>https://bohua-wan.netlify.com/posts/Towards-Virtual-Clinical-Trials-of-Radiology-AI-with-Conditional-Generative-Modeling</link><guid isPermaLink="false">https://bohua-wan.netlify.com/posts/Towards-Virtual-Clinical-Trials-of-Radiology-AI-with-Conditional-Generative-Modeling</guid><pubDate>Sat, 01 Jun 2024 14:13:40 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Paper:&lt;/strong&gt; &lt;a href=&quot;https://arxiv.org/abs/2502.09688&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;arXiv:2502.09688&lt;/a&gt;&lt;/p&gt;
&lt;center&gt;
    &lt;img style=&quot;border-radius: 0.3125em;
    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);&quot; 
    src=&quot;/media/paper-images/virtual-trials/architecture.png&quot;&gt;
    &lt;br&gt;
    &lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
    display: inline-block;
    color: #999;
    padding: 2px;&quot;&gt;
	Framework for virtual clinical trials using conditional generative modeling.
	&lt;/div&gt;
&lt;/center&gt;
&lt;h2 id=&quot;highlights&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#highlights&quot; aria-label=&quot;highlights permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Highlights&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Novel framework for conducting virtual clinical trials to evaluate radiology AI systems.&lt;/li&gt;
&lt;li&gt;Conditional generative modeling to synthesize realistic and diverse medical imaging scenarios.&lt;/li&gt;
&lt;li&gt;Systematic evaluation of AI performance across controlled variations in patient characteristics and imaging conditions.&lt;/li&gt;
&lt;li&gt;Addresses limitations of traditional clinical trials including cost, time, and limited scenario coverage.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;abstract&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#abstract&quot; aria-label=&quot;abstract permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Abstract&lt;/h2&gt;
&lt;p&gt;Rigorous evaluation of radiology AI systems is essential for safe clinical deployment, but traditional clinical trials are expensive, time-consuming, and limited in their ability to test AI performance across the full spectrum of clinical scenarios. We propose a framework for virtual clinical trials that uses conditional generative models to synthesize realistic medical images with controlled variations in patient characteristics, disease presentations, and imaging parameters. This enables systematic evaluation of AI performance across diverse scenarios that may be rare or difficult to acquire in real clinical settings. Our conditional generative models are trained to produce high-fidelity medical images conditioned on relevant clinical variables such as patient demographics, disease severity, and imaging protocol. By sampling from these models, we can create large-scale synthetic test sets that comprehensively probe AI system behavior. We demonstrate that virtual clinical trials can reveal performance variations and failure modes that may not be apparent from evaluation on standard test sets. This approach provides a scalable, cost-effective complement to traditional clinical trials, enabling more thorough pre-deployment validation of radiology AI systems.&lt;/p&gt;
&lt;h2 id=&quot;method&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#method&quot; aria-label=&quot;method permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Method&lt;/h2&gt;
&lt;center&gt;
    &lt;img style=&quot;border-radius: 0.3125em;
    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);&quot; 
    src=&quot;/media/paper-images/virtual-trials/method.png&quot;&gt;
    &lt;br&gt;
    &lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
    display: inline-block;
    color: #999;
    padding: 2px;&quot;&gt;
	Conditional generative model architecture for synthesizing medical images with controlled attributes.
	&lt;/div&gt;
&lt;/center&gt;
&lt;p&gt;Our framework consists of three main components: conditional generative modeling, virtual trial design, and comprehensive AI evaluation.&lt;/p&gt;
&lt;p&gt;The conditional generative modeling component learns to synthesize realistic medical images conditioned on clinical variables. We employ advanced generative architectures such as conditional GANs or diffusion models that can capture the complex distribution of medical images while maintaining controllability through conditioning. The conditioning variables include patient demographics (age, sex, etc.), disease characteristics (type, severity, location), and imaging parameters (scanner type, acquisition protocol).&lt;/p&gt;
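&lt;p&gt;As a rough illustration of how such conditioning can work (a toy sketch, not the architecture used in the paper; all layer sizes and variable names are placeholders), a conditional generator can simply concatenate an encoding of the clinical variables with the latent code:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    # Toy conditional generator: maps a latent vector plus an encoding of the
    # clinical conditioning variables (demographics, severity, protocol) to an image.
    def __init__(self, latent_dim=64, cond_dim=8, img_pixels=128 * 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + cond_dim, 512),
            nn.ReLU(),
            nn.Linear(512, img_pixels),
            nn.Tanh(),
        )

    def forward(self, z, cond):
        # Conditioning is injected by concatenating it with the latent code.
        return self.net(torch.cat([z, cond], dim=1))

gen = ConditionalGenerator()
z = torch.randn(4, 64)      # latent samples
cond = torch.randn(4, 8)    # encoded clinical variables for the desired scenario
print(gen(z, cond).shape)   # 4 synthetic images, flattened
&lt;/code&gt;&lt;/pre&gt;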
&lt;center&gt;
    &lt;img style=&quot;border-radius: 0.3125em;
    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);&quot; 
    src=&quot;/media/paper-images/virtual-trials/demo.png&quot;&gt;
    &lt;br&gt;
    &lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
    display: inline-block;
    color: #999;
    padding: 2px;&quot;&gt;
	Virtual trial design process showing systematic variation of clinical parameters.
	&lt;/div&gt;
&lt;/center&gt;
&lt;p&gt;The virtual trial design component systematically varies conditioning variables to create comprehensive test scenarios. This allows us to evaluate AI performance across different patient subgroups, disease presentations, and imaging conditions. The design follows principles from real clinical trial methodology but with the flexibility to test scenarios that may be impractical in real trials.&lt;/p&gt;
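&lt;p&gt;A minimal sketch of what a full-factorial trial design over conditioning variables might look like (the variables and levels below are hypothetical, not the ones used in the paper):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;
from itertools import product

# Full-factorial virtual trial design over a few hypothetical conditioning variables.
ages = [40, 60, 80]
severities = [&apos;mild&apos;, &apos;moderate&apos;, &apos;severe&apos;]
scanners = [&apos;scanner_A&apos;, &apos;scanner_B&apos;]

scenarios = [dict(age=a, severity=s, scanner=c)
             for a, s, c in product(ages, severities, scanners)]
print(len(scenarios))  # 3 * 3 * 2 = 18 scenarios to synthesize and evaluate
&lt;/code&gt;&lt;/pre&gt;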
&lt;p&gt;The comprehensive evaluation component assesses AI system performance on the synthesized test sets, analyzing not only overall accuracy but also performance stratified by clinical variables. This reveals potential biases, performance gaps in specific subpopulations, and failure modes that may not be apparent from aggregate metrics.&lt;/p&gt;
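&lt;p&gt;The stratified analysis itself is straightforward; a toy sketch (field names are illustrative only):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;
from collections import defaultdict

def stratified_accuracy(records, group_key, correct_key):
    # Accuracy of the AI system within each subgroup defined by group_key.
    hits = defaultdict(int)
    counts = defaultdict(int)
    for r in records:
        counts[r[group_key]] += 1
        hits[r[group_key]] += r[correct_key]
    return {g: hits[g] / counts[g] for g in counts}

# Toy example: performance stratified by disease severity.
records = [dict(severity=&apos;mild&apos;, correct=1), dict(severity=&apos;mild&apos;, correct=1),
           dict(severity=&apos;severe&apos;, correct=0), dict(severity=&apos;severe&apos;, correct=1)]
print(stratified_accuracy(records, &apos;severity&apos;, &apos;correct&apos;))  # mild: 1.0, severe: 0.5
&lt;/code&gt;&lt;/pre&gt;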
&lt;h2 id=&quot;results&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#results&quot; aria-label=&quot;results permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Results&lt;/h2&gt;
&lt;p&gt;Our framework successfully generates realistic medical images that are clinically plausible and diagnostically useful. Radiologist evaluation confirms that synthetic images are difficult to distinguish from real images and maintain clinical relevance.&lt;/p&gt;
&lt;center&gt;
    &lt;img style=&quot;border-radius: 0.3125em;
    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);&quot; 
    src=&quot;/media/paper-images/virtual-trials/results.png&quot;&gt;
    &lt;br&gt;
    &lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
    display: inline-block;
    color: #999;
    padding: 2px;&quot;&gt;
	Examples of synthetic medical images generated with different conditioning variables.
	&lt;/div&gt;
&lt;/center&gt;
&lt;p&gt;Virtual clinical trials reveal important insights about AI system performance. By systematically varying clinical parameters, we identify performance degradation in specific scenarios such as rare disease presentations or suboptimal imaging conditions. We also uncover biases related to patient demographics that may not be apparent from standard evaluation.&lt;/p&gt;
&lt;center&gt;
    &lt;img style=&quot;border-radius: 0.3125em;
    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);&quot; 
    src=&quot;/media/paper-images/virtual-trials/analysis.png&quot;&gt;
    &lt;br&gt;
    &lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
    display: inline-block;
    color: #999;
    padding: 2px;&quot;&gt;
	AI performance analysis across different clinical scenarios in virtual trials.
	&lt;/div&gt;
&lt;/center&gt;
&lt;p&gt;The virtual trial framework enables identification of failure modes and performance boundaries that would require prohibitively large real clinical trials to discover. This provides valuable information for improving AI systems and defining appropriate use cases for clinical deployment.&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;This article is meant only as a brief introduction.&lt;/p&gt;
&lt;p&gt;We present a framework for virtual clinical trials of radiology AI systems using conditional generative modeling. By synthesizing realistic medical images with controlled variations in clinical parameters, we enable comprehensive evaluation of AI performance across diverse scenarios. This approach addresses key limitations of traditional clinical trials including cost, duration, and limited coverage of rare or challenging cases. Virtual trials reveal performance variations and potential biases that may not be apparent from standard evaluation, providing valuable insights for AI development and deployment decisions. While not replacing real clinical trials, our framework offers a powerful complementary tool for rigorous pre-deployment validation of medical AI systems, ultimately contributing to safer and more effective clinical AI deployment.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Deep learning xerostomia prediction model with anatomy normalization and high-resolution class activation map]]></title><description><![CDATA[<p>We develop an interpretable deep learning model for xerostomia prediction using anatomy normalization and high-resolution class activation maps for improved spatial interpretability.</p> <p style="font-style: italic;"><span style="font-weight: bold">Bohua Wan</span>, T. McNutt, H. Quon, J. Lee</p> <p style="font-style: italic;">Proc. SPIE Medical Imaging 2025 (2025).</p>]]></description><link>https://bohua-wan.netlify.com/posts/Deep-learning-xerostomia-prediction-model-with-anatomy-normalization-and-high-resolution-class-activation-map</link><guid isPermaLink="false">https://bohua-wan.netlify.com/posts/Deep-learning-xerostomia-prediction-model-with-anatomy-normalization-and-high-resolution-class-activation-map</guid><pubDate>Thu, 01 Feb 2024 14:13:40 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Paper:&lt;/strong&gt; &lt;a href=&quot;https://doi.org/10.1117/12.3046796&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;https://doi.org/10.1117/12.3046796&lt;/a&gt;&lt;/p&gt;
&lt;center&gt;
    &lt;img style=&quot;border-radius: 0.3125em;
    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);&quot; 
    src=&quot;/media/paper-images/xerostomia-cam/fig1.jpg&quot;&gt;
    &lt;br&gt;
    &lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
    display: inline-block;
    color: #999;
    padding: 2px;&quot;&gt;
	Overall architecture of the xerostomia prediction model with anatomy normalization and high-resolution CAM.
	&lt;/div&gt;
&lt;/center&gt;
&lt;h2 id=&quot;highlights&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#highlights&quot; aria-label=&quot;highlights permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Highlights&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Novel anatomy normalization approach to standardize medical images for improved model generalization.&lt;/li&gt;
&lt;li&gt;High-resolution class activation mapping (CAM) for fine-grained spatial interpretability.&lt;/li&gt;
&lt;li&gt;Improved prediction accuracy through anatomically-aligned feature learning.&lt;/li&gt;
&lt;li&gt;Enhanced clinical interpretability enabling identification of critical anatomical regions contributing to xerostomia risk.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;abstract&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#abstract&quot; aria-label=&quot;abstract permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Abstract&lt;/h2&gt;
&lt;p&gt;Radiation-induced xerostomia remains a significant challenge in head and neck cancer radiotherapy. While deep learning models have shown promise in predicting treatment outcomes, their clinical adoption is limited by lack of interpretability and challenges in handling anatomical variations across patients. We propose a deep learning framework that incorporates anatomy normalization to standardize patient-specific anatomical variations and employs high-resolution class activation maps (CAM) to provide spatially-precise explanations of model predictions. The anatomy normalization module aligns anatomical structures across patients, enabling the model to learn more generalizable features. The high-resolution CAM provides fine-grained visualization of which anatomical regions contribute most to xerostomia risk, offering valuable insights for clinicians. Our approach achieves superior prediction performance while maintaining high interpretability, demonstrating the importance of combining domain knowledge with deep learning for medical outcome prediction.&lt;/p&gt;
&lt;h2 id=&quot;method&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#method&quot; aria-label=&quot;method permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Method&lt;/h2&gt;
&lt;center&gt;
    &lt;img style=&quot;border-radius: 0.3125em;
    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);&quot; 
    src=&quot;/media/paper-images/xerostomia-cam/fig2.jpg&quot;&gt;
    &lt;br&gt;
    &lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
    display: inline-block;
    color: #999;
    padding: 2px;&quot;&gt;
	Illustration of the anatomy normalization process.
	&lt;/div&gt;
&lt;/center&gt;
&lt;p&gt;Our method consists of two key innovations: anatomy normalization and high-resolution class activation mapping.&lt;/p&gt;
&lt;p&gt;The anatomy normalization module addresses the challenge of anatomical variation across patients. By aligning key anatomical structures before feature extraction, we enable the model to learn features that are robust to patient-specific anatomical differences. This normalization is performed using deformable registration guided by anatomical landmarks and segmentation masks.&lt;/p&gt;
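&lt;p&gt;As an illustration of the warping step only (assuming a displacement field has already been estimated by a registration tool; this is not the registration pipeline used in the paper):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;
import numpy as np
from scipy.ndimage import map_coordinates

def warp_to_template(image, displacement):
    # image: a 2D slice; displacement: a (2, H, W) field mapping template
    # coordinates to patient coordinates (e.g., produced by a registration tool).
    h, w = image.shape
    grid_y, grid_x = np.meshgrid(np.arange(h), np.arange(w), indexing=&apos;ij&apos;)
    coords = np.stack([grid_y + displacement[0], grid_x + displacement[1]])
    return map_coordinates(image, coords, order=1)

# Sanity check: an identity (zero) displacement leaves the image unchanged.
img = np.random.rand(64, 64)
zero_field = np.zeros((2, 64, 64))
print(np.allclose(warp_to_template(img, zero_field), img))
&lt;/code&gt;&lt;/pre&gt;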
&lt;center&gt;
    &lt;img style=&quot;border-radius: 0.3125em;
    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);&quot; 
    src=&quot;/media/paper-images/xerostomia-cam/fig3.jpg&quot;&gt;
    &lt;br&gt;
    &lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
    display: inline-block;
    color: #999;
    padding: 2px;&quot;&gt;
	High-resolution class activation maps showing regions contributing to xerostomia prediction.
	&lt;/div&gt;
&lt;/center&gt;
&lt;p&gt;The high-resolution CAM module provides detailed spatial explanations of model predictions. Unlike traditional CAM methods that produce low-resolution activation maps, our approach generates high-resolution visualizations that precisely localize anatomical regions contributing to prediction. This is achieved through a specialized upsampling strategy that preserves spatial details while maintaining semantic meaning.&lt;/p&gt;
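&lt;p&gt;For reference, a standard CAM followed by simple bilinear upsampling can be sketched as below; the paper’s specialized upsampling strategy is not reproduced here:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;
import numpy as np
from scipy.ndimage import zoom

def class_activation_map(features, class_weights, out_size):
    # features: (C, h, w) last-layer feature maps; class_weights: (C,) weights of
    # the target class in the final linear layer (standard CAM formulation).
    cam = np.tensordot(class_weights, features, axes=1)   # weighted sum, shape (h, w)
    cam = np.maximum(cam, 0)
    cam = cam / (cam.max() + 1e-8)
    scale = (out_size[0] / cam.shape[0], out_size[1] / cam.shape[1])
    return zoom(cam, scale, order=1)  # upsample to image resolution

feats = np.random.rand(32, 8, 8)
weights = np.random.rand(32)
print(class_activation_map(feats, weights, (128, 128)).shape)  # (128, 128)
&lt;/code&gt;&lt;/pre&gt;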
&lt;h2 id=&quot;results&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#results&quot; aria-label=&quot;results permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Results&lt;/h2&gt;
&lt;p&gt;Our framework demonstrates superior performance in xerostomia prediction while providing clinically meaningful interpretations. The anatomy normalization significantly improves model generalization across diverse patient populations. The high-resolution CAMs successfully identify known risk factors such as parotid gland dose distributions and reveal novel spatial patterns associated with xerostomia risk.&lt;/p&gt;
&lt;p&gt;Ablation studies confirm that both anatomy normalization and high-resolution CAM contribute to improved performance and interpretability. Clinical evaluation by radiation oncologists validates that the CAM visualizations align with clinical knowledge and provide actionable insights for treatment planning.&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;This article is meant only as a brief introduction.&lt;/p&gt;
&lt;p&gt;We present a deep learning framework for xerostomia prediction that combines anatomy normalization with high-resolution class activation mapping. The anatomy normalization module enables robust feature learning across patients with varying anatomical structures, improving model generalization. The high-resolution CAM provides fine-grained spatial interpretability, identifying specific anatomical regions contributing to xerostomia risk. Our approach achieves state-of-the-art prediction performance while maintaining clinical interpretability, demonstrating the value of incorporating medical domain knowledge into deep learning models. This work represents an important step toward clinically deployable AI systems for personalized radiation therapy planning.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Deep learning prediction of radiation-induced xerostomia with supervised contrastive pre-training and cluster-guided loss]]></title><description><![CDATA[<p>We propose a novel deep learning framework for predicting radiation-induced xerostomia using supervised contrastive pre-training and cluster-guided loss.</p> <p style="font-style: italic;"><span style="font-weight: bold">Bohua Wan</span>, T. McNutt, R. Ger, H. Quon, J. Lee</p> <p style="font-style: italic;">Proc. SPIE Medical Imaging 2024 (2024).</p>]]></description><link>https://bohua-wan.netlify.com/posts/Deep-learning-prediction-of-radiation-induced-xerostomia-with-supervised-contrastive-pre-training-and-cluster-guided-loss</link><guid isPermaLink="false">https://bohua-wan.netlify.com/posts/Deep-learning-prediction-of-radiation-induced-xerostomia-with-supervised-contrastive-pre-training-and-cluster-guided-loss</guid><pubDate>Mon, 01 Jan 2024 14:13:40 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Paper:&lt;/strong&gt; &lt;a href=&quot;https://doi.org/10.1117/12.3004498&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;https://doi.org/10.1117/12.3004498&lt;/a&gt;&lt;/p&gt;
&lt;center&gt;
    &lt;img style=&quot;border-radius: 0.3125em;
    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);&quot; 
    src=&quot;/media/paper-images/xerostomia-contrastive/architecture.jpg&quot;&gt;
    &lt;br&gt;
    &lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
    display: inline-block;
    color: #999;
    padding: 2px;&quot;&gt;
	Overall architecture of the supervised contrastive pre-training framework.
	&lt;/div&gt;
&lt;/center&gt;
&lt;h2 id=&quot;highlights&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#highlights&quot; aria-label=&quot;highlights permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Highlights&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Novel supervised contrastive pre-training strategy for radiation-induced xerostomia prediction.&lt;/li&gt;
&lt;li&gt;Cluster-guided loss function to improve model performance on imbalanced medical datasets.&lt;/li&gt;
&lt;li&gt;State-of-the-art performance in predicting radiation-induced xerostomia.&lt;/li&gt;
&lt;li&gt;Improved generalization through contrastive learning on limited medical imaging data.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;abstract&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#abstract&quot; aria-label=&quot;abstract permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Abstract&lt;/h2&gt;
&lt;p&gt;Radiation-induced xerostomia is a common side effect of head and neck cancer radiotherapy that significantly impacts patients’ quality of life. Accurate prediction of xerostomia risk before treatment could enable personalized treatment planning. We propose a deep learning framework that combines supervised contrastive pre-training with cluster-guided loss to predict radiation-induced xerostomia. The supervised contrastive learning approach learns robust feature representations from limited medical imaging data, while the cluster-guided loss addresses class imbalance issues common in medical datasets. Our method achieves superior performance compared to existing approaches, demonstrating the effectiveness of combining contrastive learning with specialized loss functions for medical outcome prediction.&lt;/p&gt;
&lt;h2 id=&quot;method&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#method&quot; aria-label=&quot;method permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Method&lt;/h2&gt;
&lt;p&gt;Our method consists of two main components: supervised contrastive pre-training and cluster-guided loss. &lt;/p&gt;
&lt;p&gt;The supervised contrastive pre-training phase learns discriminative feature representations by pulling together samples from the same class while pushing apart samples from different classes in the embedding space. This approach is particularly effective for medical imaging tasks where labeled data is limited.&lt;/p&gt;
&lt;p&gt;The cluster-guided loss addresses the class imbalance problem by incorporating cluster information into the loss function. This ensures that the model learns to distinguish between different outcome groups even when some classes have significantly fewer samples.&lt;/p&gt;
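&lt;p&gt;For concreteness, the standard supervised contrastive objective (Khosla et al., 2020) can be sketched as follows; the cluster-guided loss is specific to our paper and is not reproduced in this sketch:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;
import torch
import torch.nn.functional as F

def supcon_loss(embeddings, labels, temperature=0.1):
    # Supervised contrastive loss: pull same-label samples together and push
    # different-label samples apart in the normalized embedding space.
    z = F.normalize(embeddings, dim=1)
    sim = torch.matmul(z, z.t()) / temperature
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool)
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)).logical_and(self_mask.logical_not())
    sim = sim.masked_fill(self_mask, torch.finfo(sim.dtype).min)  # exclude self-pairs
    log_prob = F.log_softmax(sim, dim=1)
    pos_per_anchor = pos_mask.sum(dim=1).clamp(min=1)
    return -(log_prob * pos_mask).sum(dim=1).div(pos_per_anchor).mean()

emb = torch.randn(8, 16)
lbl = torch.tensor([0, 0, 1, 1, 0, 1, 0, 1])
print(supcon_loss(emb, lbl))
&lt;/code&gt;&lt;/pre&gt;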
&lt;h2 id=&quot;results&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#results&quot; aria-label=&quot;results permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Results&lt;/h2&gt;
&lt;p&gt;Our framework achieves state-of-the-art performance on xerostomia prediction tasks. The supervised contrastive pre-training significantly improves feature quality, while the cluster-guided loss effectively handles class imbalance. Ablation studies demonstrate the contribution of each component to the overall performance.&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;This article is meant only as a brief introduction.&lt;/p&gt;
&lt;p&gt;We present a novel deep learning framework for predicting radiation-induced xerostomia that combines supervised contrastive pre-training with cluster-guided loss. The supervised contrastive learning approach enables effective learning from limited medical imaging data by learning robust feature representations. The cluster-guided loss addresses class imbalance issues common in medical outcome prediction tasks. Our experimental results demonstrate that this combination significantly improves prediction accuracy compared to existing methods, providing a promising tool for personalized radiation therapy planning.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Spatial-temporal attention for video-based assessment of intraoperative surgical skill]]></title><description><![CDATA[<p>We propose a spatial-temporal attention mechanism for automated surgical skill assessment from intraoperative videos, enabling objective evaluation of surgical performance.</p> <p style="font-style: italic;"><span style="font-weight: bold">Bohua Wan</span>, M. Peven, G. Hager, S. Sikder, S. S. Vedula</p> <p style="font-style: italic;">Scientific Reports (2024).</p>]]></description><link>https://bohua-wan.netlify.com/posts/Spatial-temporal-attention-for-video-based-assessment-of-intraoperative-surgical-skill</link><guid isPermaLink="false">https://bohua-wan.netlify.com/posts/Spatial-temporal-attention-for-video-based-assessment-of-intraoperative-surgical-skill</guid><pubDate>Thu, 01 Jun 2023 14:13:40 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Paper:&lt;/strong&gt; &lt;a href=&quot;https://doi.org/10.1038/s41598-024-77176-1&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;https://doi.org/10.1038/s41598-024-77176-1&lt;/a&gt;&lt;/p&gt;
&lt;center&gt;
    &lt;img style=&quot;border-radius: 0.3125em;
    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);&quot; 
    src=&quot;/media/paper-images/surgical-skill/fig1.jpg&quot;&gt;
    &lt;br&gt;
    &lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
    display: inline-block;
    color: #999;
    padding: 2px;&quot;&gt;
	Overall architecture of the spatial-temporal attention network for surgical skill assessment.
	&lt;/div&gt;
&lt;/center&gt;
&lt;h2 id=&quot;highlights&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#highlights&quot; aria-label=&quot;highlights permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Highlights&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Novel spatial-temporal attention mechanism tailored for surgical video analysis.&lt;/li&gt;
&lt;li&gt;Automated objective assessment of surgical skill from intraoperative videos.&lt;/li&gt;
&lt;li&gt;Attention visualization reveals important surgical actions and anatomical regions correlated with skill level.&lt;/li&gt;
&lt;li&gt;State-of-the-art performance on multiple surgical skill assessment benchmarks.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;abstract&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#abstract&quot; aria-label=&quot;abstract permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Abstract&lt;/h2&gt;
&lt;p&gt;Objective assessment of surgical skill is crucial for surgical training, credentialing, and quality improvement. Traditional methods rely on manual expert evaluation, which is subjective, time-consuming, and resource-intensive. We propose an automated surgical skill assessment framework based on spatial-temporal attention mechanisms applied to intraoperative videos. Our method learns to identify and focus on critical surgical actions and anatomical regions that are indicative of skill level. The spatial attention module identifies important regions in each video frame, such as surgical instruments and key anatomical structures. The temporal attention module captures the dynamics of surgical workflow and the temporal patterns that distinguish expert from novice performance. By combining these complementary attention mechanisms, our model achieves objective, consistent, and interpretable surgical skill assessment. Experimental results on multiple surgical datasets demonstrate that our approach achieves superior performance compared to existing methods and provides insights into the visual cues associated with surgical expertise.&lt;/p&gt;
&lt;h2 id=&quot;method&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#method&quot; aria-label=&quot;method permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Method&lt;/h2&gt;
&lt;center&gt;
    &lt;img style=&quot;border-radius: 0.3125em;
    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);&quot; 
    src=&quot;/media/paper-images/surgical-skill/fig2.jpg&quot;&gt;
    &lt;br&gt;
    &lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
    display: inline-block;
    color: #999;
    padding: 2px;&quot;&gt;
	Illustration of the spatial attention mechanism identifying critical regions in surgical videos.
	&lt;/div&gt;
&lt;/center&gt;
&lt;p&gt;Our approach consists of two main components: spatial attention and temporal attention.&lt;/p&gt;
&lt;p&gt;The spatial attention module processes each video frame to identify regions that are most relevant for skill assessment. Rather than treating all regions equally, the spatial attention mechanism learns to focus on surgical instruments, target anatomy, and areas where critical actions occur. This is implemented through a learnable attention map that weighs different spatial regions based on their importance for skill classification.&lt;/p&gt;
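&lt;p&gt;A minimal sketch of such a learnable spatial attention map (a toy module, not the exact network used in the paper):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    # Learns a per-pixel importance map and pools the frame features with it.
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feats):  # feats: (B, C, H, W) per-frame feature maps
        b, c, h, w = feats.shape
        attn = torch.softmax(self.score(feats).view(b, 1, h * w), dim=2).view(b, 1, h, w)
        return (feats * attn).sum(dim=(2, 3)), attn  # (B, C) pooled feature, attention map

frame_feats = torch.randn(2, 64, 14, 14)
pooled, attn_map = SpatialAttention(64)(frame_feats)
print(pooled.shape, attn_map.shape)
&lt;/code&gt;&lt;/pre&gt;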
&lt;center&gt;
    &lt;img style=&quot;border-radius: 0.3125em;
    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);&quot; 
    src=&quot;/media/paper-images/surgical-skill/fig3.jpg&quot;&gt;
    &lt;br&gt;
    &lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
    display: inline-block;
    color: #999;
    padding: 2px;&quot;&gt;
	Temporal attention weights across video frames showing important surgical phases.
	&lt;/div&gt;
&lt;/center&gt;
&lt;p&gt;The temporal attention module analyzes the sequence of frames to capture surgical workflow dynamics and temporal patterns. Expert surgeons exhibit smoother, more efficient movements and better adherence to optimal surgical sequences. The temporal attention mechanism learns to identify these temporal signatures of expertise by attending to key phases of the procedure and transitions between surgical actions.&lt;/p&gt;
&lt;p&gt;The spatial and temporal features are integrated through a fusion layer, and the combined representation is used for skill level prediction. This joint spatial-temporal modeling enables comprehensive understanding of surgical performance.&lt;/p&gt;
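&lt;p&gt;Similarly, temporal attention pooling over per-frame features can be sketched as below (again a toy module with placeholder dimensions, not the paper’s exact architecture):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    # Scores each frame embedding and pools the clip with softmax weights.
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, frame_embs):  # frame_embs: (B, T, D) per-frame features
        weights = torch.softmax(self.score(frame_embs), dim=1)  # (B, T, 1)
        return (weights * frame_embs).sum(dim=1)                # (B, D) clip feature

clip = torch.randn(2, 30, 128)                   # 30 frames per video
video_feature = TemporalAttention(128)(clip)
skill_logits = nn.Linear(128, 3)(video_feature)  # e.g., three skill levels
print(skill_logits.shape)
&lt;/code&gt;&lt;/pre&gt;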
&lt;h2 id=&quot;results&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#results&quot; aria-label=&quot;results permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Results&lt;/h2&gt;
&lt;p&gt;Our framework achieves state-of-the-art performance on standard surgical skill assessment benchmarks. The spatial-temporal attention mechanism significantly outperforms methods using only spatial or only temporal features, demonstrating the importance of their combination.&lt;/p&gt;
&lt;center&gt;
    &lt;img style=&quot;border-radius: 0.3125em;
    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);&quot; 
    src=&quot;/media/paper-images/surgical-skill/fig4.jpg&quot;&gt;
    &lt;br&gt;
    &lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
    display: inline-block;
    color: #999;
    padding: 2px;&quot;&gt;
	Attention visualizations showing regions and time points the model focuses on for skill assessment.
	&lt;/div&gt;
&lt;/center&gt;
&lt;p&gt;The attention visualizations provide interpretable insights into what the model considers important for skill assessment. Spatial attention maps highlight surgical instruments and critical anatomical structures. Temporal attention weights reveal that the model learns to focus on challenging phases of the procedure where skill differences are most pronounced.&lt;/p&gt;
&lt;center&gt;
    &lt;img style=&quot;border-radius: 0.3125em;
    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);&quot; 
    src=&quot;/media/paper-images/surgical-skill/fig5.jpg&quot;&gt;
    &lt;br&gt;
    &lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
    display: inline-block;
    color: #999;
    padding: 2px;&quot;&gt;
	Comparison of skill assessment performance across different methods and datasets.
	&lt;/div&gt;
&lt;/center&gt;
&lt;h2 id=&quot;conclusion&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;This article is meant only as a brief introduction.&lt;/p&gt;
&lt;p&gt;We present a spatial-temporal attention framework for automated surgical skill assessment from intraoperative videos. The spatial attention module identifies critical regions in each frame, while the temporal attention module captures the dynamics of surgical workflow. By combining these complementary attention mechanisms, our model achieves accurate, objective, and interpretable surgical skill assessment. The attention visualizations provide insights into the visual and temporal cues associated with surgical expertise, which could inform surgical training curricula. Our approach demonstrates the potential of deep learning to provide scalable, consistent surgical skill evaluation, supporting surgical education and quality improvement initiatives.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Combining ADDA with Deep CORAL: Unsupervised Domain Adaptation for Image Classification]]></title><description><![CDATA[<p>We combine Adversarial Discriminative Domain Adaptation (ADDA) with Deep CORAL to allow ADDA to better utilize the pretrained initialization. Vanilla ADDA diverges drastically from the initialization, resulting in much poorer results in early epochs compared to the initialization. Sophisticated fine-tuning is required for ADDA to give satisfactory results. With our novel modifications, ADDA-CORAL can be trained much faster and yields better results.</p> <p style="font-style: italic;"><span style="font-weight: bold">Bohua Wan</span>, Cong Mu, Ruzhang Zhao, Zhuoying Li (Ordered alphabetically)</p>]]></description><link>https://bohua-wan.netlify.com/posts/combining-adda-with-deep-coral-unsupervised-domain-adaptation-for-image-classification</link><guid isPermaLink="false">https://bohua-wan.netlify.com/posts/combining-adda-with-deep-coral-unsupervised-domain-adaptation-for-image-classification</guid><pubDate>Sun, 23 May 2021 14:13:40 GMT</pubDate><content:encoded>&lt;center&gt;
    &lt;img style=&quot;border-radius: 0.3125em;
    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);&quot; 
    src=&quot;/media/paper-images/ADDA_CORAL/Network.png&quot;&gt;
    &lt;br&gt;
    &lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
    display: inline-block;
    color: #999;
    padding: 2px;&quot;&gt;
    An illustration of our proposed method combining Deep Coral and ADDA. Blue and orange arrows denote the data flows of the source and target domains, respectively. The blue encoder and classifier are pretrained and fixed.
	&lt;/div&gt;
&lt;/center&gt;
&lt;h2 id=&quot;abstract&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#abstract&quot; aria-label=&quot;abstract permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Abstract&lt;/h2&gt;
&lt;p&gt;Unsupervised domain adaptation techniques are essential for real-world image classification tasks. The domain of images, or the space of all possible images, is so enormous that models trained on any dataset will inevitably suffer from out-of-domain issues. One promising research direction is to use domain adaptation methods to adapt models trained on a source domain to a target domain. Adversarial Discriminative Domain Adaptation (ADDA) is a typical adversarial-learning-based unsupervised domain adaptation method. Though it has proved effective on simple and small datasets, it requires sophisticated training strategies and can be hard to converge at times. We propose to explicitly align the distribution of the model’s output with that of an adapted model, which also serves as the initialization for the adversarial training. In this way, the adversarial process is forced to search within a space whose results are at least as good as the initialization. Experiments on our proposed Tiny-16-Class-ImageNet show that our method is effective and efficient in terms of accuracy and training time.&lt;/p&gt;
&lt;h2 id=&quot;introduction&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#introduction&quot; aria-label=&quot;introduction permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Introduction&lt;/h2&gt;
&lt;h4 id=&quot;background&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#background&quot; aria-label=&quot;background permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Background&lt;/h4&gt;
&lt;p&gt;By generalizability, we refer to the model’s ability to perform equally well on unseen data.
The word, “domain”, in this article denotes the space of input features &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;X&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;X&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.6833em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.07847em;&quot;&gt;X&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; and the marginal distribution &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;P&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;(&lt;/mo&gt;&lt;mi&gt;X&lt;/mi&gt;&lt;mo stretchy=&quot;false&quot;&gt;)&lt;/mo&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;P(X)&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1em;vertical-align:-0.25em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.13889em;&quot;&gt;P&lt;/span&gt;&lt;span class=&quot;mopen&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.07847em;&quot;&gt;X&lt;/span&gt;&lt;span class=&quot;mclose&quot;&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;. Specifically, for image classification tasks, the domain of training dataset is the set of all possible images and the marginal distribution in this dataset &lt;sup id=&quot;fnref-6&quot;&gt;&lt;a href=&quot;#fn-6&quot; class=&quot;footnote-ref&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;.
It is crucial for models to be generalizable in image classification tasks: the space of possible images is so large that any dataset can capture only a small fraction of it, and a model that fails to generalize is of little use.
Domain shift refers to two domains being different, which is common. For example, a model trained on images taken in daylight usually fails when applied to images taken at night. Different patterns of perturbations, such as noise imposed on images, are another source of domain shift.
To address domain shift, one promising research area is domain adaptation, which aims to adapt a model trained on a source domain to a target domain.
In this project, we investigate the unsupervised domain adaptation problem, which does not require the target domain to be labeled.&lt;/p&gt;
&lt;h4 id=&quot;related-work&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#related-work&quot; aria-label=&quot;related work permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Related Work&lt;/h4&gt;
&lt;p&gt;Numerous domain adaptation algorithms have been proposed to counteract the performance degradation caused by domain shift.
Deep Coral &lt;sup id=&quot;fnref-4&quot;&gt;&lt;a href=&quot;#fn-4&quot; class=&quot;footnote-ref&quot;&gt;4&lt;/a&gt;&lt;/sup&gt; extends the unsupervised domain adaptation method Coral to learn a nonlinear transformation that aligns the correlations of layer activations in deep neural networks.
Adversarial Discriminative Domain Adaptation (ADDA) &lt;sup id=&quot;fnref-5&quot;&gt;&lt;a href=&quot;#fn-5&quot; class=&quot;footnote-ref&quot;&gt;5&lt;/a&gt;&lt;/sup&gt; combines discriminative modeling and generative adversarial networks to learn a discriminative mapping by fooling a domain discriminator.&lt;/p&gt;
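&lt;p&gt;For reference, the CORAL loss that Deep Coral minimizes between source and target feature covariances can be written in a few lines (a standard formulation, not our project’s exact code):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;
import torch

def coral_loss(source_feats, target_feats):
    # Deep CORAL: match the second-order statistics (feature covariances)
    # of source and target activations.
    d = source_feats.size(1)
    def covariance(x):
        xm = x - x.mean(dim=0, keepdim=True)
        return torch.matmul(xm.t(), xm) / (x.size(0) - 1)
    diff = covariance(source_feats) - covariance(target_feats)
    return (diff * diff).sum() / (4.0 * d * d)

src = torch.randn(32, 64)
tgt = torch.randn(32, 64) + 0.5
print(coral_loss(src, tgt))
&lt;/code&gt;&lt;/pre&gt;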
&lt;h2 id=&quot;method&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#method&quot; aria-label=&quot;method permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Method&lt;/h2&gt;
&lt;h4 id=&quot;datasets&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#datasets&quot; aria-label=&quot;datasets permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Datasets&lt;/h4&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 960px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/cd6ef8dff440ba3750db98b965969e5e/10cbc/noises.jpg&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 45.833333333333336%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAAJABQDASIAAhEBAxEB/8QAFwABAQEBAAAAAAAAAAAAAAAABAACA//EABQBAQAAAAAAAAAAAAAAAAAAAAD/2gAMAwEAAhADEAAAAS7R0Ayo/8QAGRAAAgMBAAAAAAAAAAAAAAAAAAERMTJB/9oACAEBAAEFAuki26R//8QAFBEBAAAAAAAAAAAAAAAAAAAAEP/aAAgBAwEBPwE//8QAFBEBAAAAAAAAAAAAAAAAAAAAEP/aAAgBAgEBPwE//8QAFhABAQEAAAAAAAAAAAAAAAAAMQAg/9oACAEBAAY/AmZx/8QAHBAAAgICAwAAAAAAAAAAAAAAAAERITFhQXGB/9oACAEBAAE/IVLRbXI1F6KCMu+TB6Zzl2f/2gAMAwEAAgADAAAAEKMP/8QAFBEBAAAAAAAAAAAAAAAAAAAAEP/aAAgBAwEBPxA//8QAFBEBAAAAAAAAAAAAAAAAAAAAEP/aAAgBAgEBPxA//8QAHxAAAQQBBQEAAAAAAAAAAAAAAQARITFhEEFxobHw/9oACAEBAAE/EGbICDLXNoj32gDeRlATocK9X2cruD3Tf//Z&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;picture&gt;
          &lt;source
              srcset=&quot;/static/cd6ef8dff440ba3750db98b965969e5e/8ac56/noises.webp 240w,
/static/cd6ef8dff440ba3750db98b965969e5e/d3be9/noises.webp 480w,
/static/cd6ef8dff440ba3750db98b965969e5e/e46b2/noises.webp 960w,
/static/cd6ef8dff440ba3750db98b965969e5e/94575/noises.webp 1298w&quot;
              sizes=&quot;(max-width: 960px) 100vw, 960px&quot;
              type=&quot;image/webp&quot;
            /&gt;
          &lt;source
            srcset=&quot;/static/cd6ef8dff440ba3750db98b965969e5e/09b79/noises.jpg 240w,
/static/cd6ef8dff440ba3750db98b965969e5e/7cc5e/noises.jpg 480w,
/static/cd6ef8dff440ba3750db98b965969e5e/6a068/noises.jpg 960w,
/static/cd6ef8dff440ba3750db98b965969e5e/10cbc/noises.jpg 1298w&quot;
            sizes=&quot;(max-width: 960px) 100vw, 960px&quot;
            type=&quot;image/jpeg&quot;
          /&gt;
          &lt;img
            class=&quot;gatsby-resp-image-image&quot;
            src=&quot;/static/cd6ef8dff440ba3750db98b965969e5e/6a068/noises.jpg&quot;
            alt=&quot;Sample noise patterns in the Tiny-16-Class-ImageNet dataset&quot;
            title=&quot;Sample noise patterns in the Tiny-16-Class-ImageNet dataset&quot;
            loading=&quot;lazy&quot;
            style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
          /&gt;
        &lt;/picture&gt;
  &lt;/a&gt;
    &lt;/span&gt;
Figure 2: Sample noise patterns in the &lt;strong&gt;Tiny-16-Class-ImageNet&lt;/strong&gt; dataset.
Top row, from left to right: no noise, uniform noise,
salt-and-pepper noise. Bottom row, from left to right:
rotation, high-pass, low-pass. Image manipulations
follow the procedure in &lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;We conduct experiments on two datasets:
&lt;strong&gt;Tiny-16-Class-ImageNet&lt;/strong&gt; and &lt;strong&gt;MNIST-USPS&lt;/strong&gt;&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mrow&gt;&lt;/mrow&gt;&lt;mo separator=&quot;true&quot;&gt;,&lt;/mo&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;^,&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.4369em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.4369em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mpunct mtight&quot;&gt;,&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;. Most experiments are done on the &lt;strong&gt;Tiny-16-Class-
ImageNet&lt;/strong&gt;, which we produced ourselves following the guidelines
in &lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.
The &lt;strong&gt;Tiny-16-Class-ImageNet&lt;/strong&gt; dataset has three
subsets: a training set, a validation set, and a test set, containing 10015, 1269, and 10350 images, respectively.
All three subsets share the same 16 general classes (e.g., bear
rather than brown bear) but differ in domain.
The training and validation sets contain samples
of different sub-classes (brown bear vs. black bear), and we apply different noise patterns to generate the different domains. Sample noise patterns are illustrated in Figure 2.
The test set contains samples from every sub-class
(brown bear, black bear, etc.). We also evaluate
our proposed method on the &lt;strong&gt;MNIST-USPS&lt;/strong&gt; dataset.&lt;/p&gt;
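&lt;p&gt;To make the domain-generation step concrete, the sketch below shows how a few of the corruptions in Figure 2 could be applied to an image with NumPy. It is an illustrative sketch only: the noise strength, the salt-and-pepper probability, and the low-pass cutoff are placeholder values, not the exact parameters used to build &lt;strong&gt;Tiny-16-Class-ImageNet&lt;/strong&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Sketch of domain-generating corruptions (uniform noise, salt-and-pepper,
# low-pass filtering). Parameter values are illustrative, not the ones used
# to build Tiny-16-Class-ImageNet.
import numpy as np

def uniform_noise(img, strength=0.5):
    # img: float array in [0, 1]; add zero-mean uniform noise and clip
    noise = np.random.uniform(-strength, strength, size=img.shape)
    return np.clip(img + noise, 0.0, 1.0)

def salt_and_pepper(img, p=0.05):
    # flip a fraction p of pixels to pure black or pure white
    out = img.copy()
    mask = np.random.rand(*img.shape[:2])
    out[mask &lt; p / 2] = 0.0        # pepper
    out[mask &gt; 1 - p / 2] = 1.0    # salt
    return out

def low_pass(img, keep=0.1):
    # keep only the lowest spatial frequencies of each channel
    out = np.zeros_like(img)
    h, w = img.shape[:2]
    ch, cw = h // 2, w // 2
    for c in range(img.shape[2]):
        f = np.fft.fftshift(np.fft.fft2(img[..., c]))
        mask = np.zeros((h, w))
        mask[ch - int(keep * h):ch + int(keep * h),
             cw - int(keep * w):cw + int(keep * w)] = 1.0
        out[..., c] = np.abs(np.fft.ifft2(np.fft.ifftshift(f * mask)))
    return np.clip(out, 0.0, 1.0)
&lt;/code&gt;&lt;/pre&gt;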
&lt;h4 id=&quot;deep-coral&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#deep-coral&quot; aria-label=&quot;deep coral permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Deep Coral&lt;/h4&gt;
&lt;p&gt;We adapt the idea of Deep Coral &lt;sup id=&quot;fnref-4&quot;&gt;&lt;a href=&quot;#fn-4&quot; class=&quot;footnote-ref&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;
and simply align second-order statistics in the last layer
of the backbone network by adding a coral loss. This
method is simple yet effective, and it is easy to extend.
We replace the backbone of Deep Coral
with a ResNet-50 pretrained on ImageNet for
experiments on the &lt;strong&gt;Tiny-16-Class-ImageNet&lt;/strong&gt;. We
use the same SGD hyper-parameters as in &lt;sup id=&quot;fnref-4&quot;&gt;&lt;a href=&quot;#fn-4&quot; class=&quot;footnote-ref&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;. The
&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;λ&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\lambda&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.6944em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;λ&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; controlling the weight of the coral loss is set the
same as in &lt;sup id=&quot;fnref-4&quot;&gt;&lt;a href=&quot;#fn-4&quot; class=&quot;footnote-ref&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;, except on the &lt;strong&gt;MNIST-USPS&lt;/strong&gt; dataset,
where we set &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mi&gt;λ&lt;/mi&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;mo&gt;−&lt;/mo&gt;&lt;mfrac&gt;&lt;mrow&gt;&lt;mi&gt;e&lt;/mi&gt;&lt;mi&gt;p&lt;/mi&gt;&lt;mi&gt;o&lt;/mi&gt;&lt;mi&gt;c&lt;/mi&gt;&lt;mi&gt;h&lt;/mi&gt;&lt;/mrow&gt;&lt;mrow&gt;&lt;mi&gt;n&lt;/mi&gt;&lt;mi&gt;u&lt;/mi&gt;&lt;mi&gt;m&lt;/mi&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;_&lt;/mi&gt;&lt;mi&gt;e&lt;/mi&gt;&lt;mi&gt;p&lt;/mi&gt;&lt;mi&gt;o&lt;/mi&gt;&lt;mi&gt;c&lt;/mi&gt;&lt;mi&gt;h&lt;/mi&gt;&lt;mi&gt;s&lt;/mi&gt;&lt;/mrow&gt;&lt;/mfrac&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\lambda = 1- \frac{epoch}{num\_epochs}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.6944em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord mathnormal&quot;&gt;λ&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2778em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mrel&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2778em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.7278em;vertical-align:-0.0833em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mbin&quot;&gt;−&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2222em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:1.4942em;vertical-align:-0.562em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mopen nulldelimiter&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mfrac&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.9322em;&quot;&gt;&lt;span style=&quot;top:-2.655em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;u&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;m&lt;/span&gt;&lt;span class=&quot;mord mtight&quot; style=&quot;margin-right:0.02778em;&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;oc&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;h&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;s&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.23em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;frac-line&quot; style=&quot;border-bottom-width:0.04em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&quot;top:-3.4461em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:3em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 
mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;oc&lt;/span&gt;&lt;span class=&quot;mord mathnormal mtight&quot;&gt;h&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.562em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mclose nulldelimiter&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;.&lt;/p&gt;
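&lt;p&gt;For reference, the coral loss attached to the last backbone layer is the squared Frobenius distance between the source and target feature covariances. The snippet below is a minimal PyTorch sketch of that loss following the standard Deep Coral formulation; it is not our exact training code, and the batch handling is an assumption.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Minimal sketch of the CORAL loss between source and target feature batches
# (d-dimensional activations of the last backbone layer).
import torch

def coral_loss(source_feats, target_feats):
    # source_feats: (n_s, d) tensor, target_feats: (n_t, d) tensor
    d = source_feats.size(1)

    def covariance(x):
        n = x.size(0)
        x = x - x.mean(dim=0, keepdim=True)
        return (x.t() @ x) / (n - 1)

    c_s = covariance(source_feats)
    c_t = covariance(target_feats)
    # squared Frobenius norm of the covariance difference, scaled by 1/(4 d^2)
    return ((c_s - c_t) ** 2).sum() / (4.0 * d * d)

# Total Deep Coral objective during training (lambda_ weights the coral term):
#   loss = classification_loss + lambda_ * coral_loss(f_src, f_tgt)
# On MNIST-USPS we decay the weight linearly: lambda_ = 1 - epoch / num_epochs.
&lt;/code&gt;&lt;/pre&gt;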
&lt;h4 id=&quot;adda&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#adda&quot; aria-label=&quot;adda permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;ADDA&lt;/h4&gt;
&lt;p&gt;We also adopt the idea of ADDA by first learning a discriminative representation using data
from the source domain and then learning a separate encoder
that maps the target domain to the source domain
with a domain-adversarial loss. We use ResNet-50 (excluding the last layer) as the backbone for the encoder and a &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;3&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;3&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.6444em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;3&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;-layer MLP as
the discriminator, with a hidden size of &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;1024&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;1024&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.6444em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;1024&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;. The pretrained ResNet-50 is frozen during adversarial training. Adam is
used as the optimizer with &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msub&gt;&lt;mi&gt;β&lt;/mi&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;/msub&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;mn&gt;0.5&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\beta_1=0.5&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8889em;vertical-align:-0.1944em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.05278em;&quot;&gt;β&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.3011em;&quot;&gt;&lt;span style=&quot;top:-2.55em;margin-left:-0.0528em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.15em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2778em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mrel&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2778em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.6444em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;0.5&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; and &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msub&gt;&lt;mi&gt;β&lt;/mi&gt;&lt;mn&gt;2&lt;/mn&gt;&lt;/msub&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;mn&gt;0.999&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;\beta_2=0.999&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8889em;vertical-align:-0.1944em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.05278em;&quot;&gt;β&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.3011em;&quot;&gt;&lt;span style=&quot;top:-2.55em;margin-left:-0.0528em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; 
style=&quot;height:0.15em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2778em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mrel&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mspace&quot; style=&quot;margin-right:0.2778em;&quot;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.6444em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;0.999&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;.
The learning rate is set to &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;0.0002&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;0.0002&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.6444em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;0.0002&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; and the batch
size is &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;32&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;32&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.6444em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;32&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;. During the adaptation stage, the target encoder
is updated every &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;4&lt;/mn&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;4&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.6444em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;4&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; steps.&lt;/p&gt;
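&lt;p&gt;The adaptation stage can be summarized by the alternating update below. This is an illustrative PyTorch sketch rather than our exact implementation: the module and optimizer names are placeholders, the discriminator is assumed to output two logits (source vs. target), and both optimizers are assumed to be Adam with the hyper-parameters given above.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Sketch of one ADDA adaptation iteration: the discriminator learns to tell
# source features from target features, and the target encoder (updated every
# 4 discriminator steps here) learns to fool it. As in standard ADDA, the
# source encoder is kept fixed during adaptation.
import torch
import torch.nn.functional as F

def adda_step(step, src_x, tgt_x, source_encoder, target_encoder,
              discriminator, opt_d, opt_t):
    # 1) Discriminator update: source features labeled 1, target features 0.
    with torch.no_grad():
        src_feat = source_encoder(src_x)   # fixed source encoder
    tgt_feat = target_encoder(tgt_x)
    logits = discriminator(torch.cat([src_feat, tgt_feat.detach()], dim=0))
    labels = torch.cat([torch.ones(src_feat.size(0)),
                        torch.zeros(tgt_feat.size(0))]).long()
    d_loss = F.cross_entropy(logits, labels)
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # 2) Target-encoder update every 4 steps: fool the discriminator into
    #    predicting the source label for target features.
    if step % 4 == 0:
        logits = discriminator(target_encoder(tgt_x))
        fool_labels = torch.ones(tgt_x.size(0)).long()
        t_loss = F.cross_entropy(logits, fool_labels)
        opt_t.zero_grad()
        t_loss.backward()
        opt_t.step()
&lt;/code&gt;&lt;/pre&gt;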
&lt;h4 id=&quot;adda-coral&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#adda-coral&quot; aria-label=&quot;adda coral permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;ADDA-CORAL&lt;/h4&gt;
&lt;p&gt;We propose a new method
that combines Deep Coral and ADDA:
Deep Coral serves as the pretraining stage for
ADDA, and we align second-order
statistics between the classification outputs of
the fixed pretrained encoder and the ADDA-trained
target encoder on the target domain. The overall architecture is illustrated
in Fig. 1. During experiments, we find that vanilla
ADDA ruins the pretrained encoder because of a poorly
trained discriminator. To better use the initialization
of the Deep-Coral-pretrained encoder while ensuring
that the learned target encoder generates similar features
for the target and source domains, we use the coral loss
only to align the ADDA-trained encoder’s classification output with that of the fixed pretrained encoder,
and we gradually decrease the weight of the coral loss.&lt;/p&gt;
&lt;p&gt;The underlying assumption here is that the best possible solution lies close (with respect to optimization with Adam) to the already good initialization in the solution space.&lt;/p&gt;
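&lt;p&gt;A compact way to express the resulting target-encoder objective is sketched below: the usual adversarial term plus a coral term computed between the classification outputs of the fixed Deep-Coral-pretrained encoder and the ADDA-trained target encoder. The module names are placeholders, and the decay schedule for the coral weight is an assumption for illustration (we reuse the linear schedule mentioned above).&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Sketch of the ADDA-CORAL target-encoder objective: the adversarial term plus
# a decaying coral term that keeps the target encoder&apos;s classification outputs
# close, in second-order statistics, to those of the fixed pretrained encoder.
import torch
import torch.nn.functional as F

def coral(a, b):
    # squared Frobenius distance between the covariances of two output batches
    d = a.size(1)
    return ((torch.cov(a.t()) - torch.cov(b.t())) ** 2).sum() / (4.0 * d * d)

def target_encoder_loss(tgt_x, target_encoder, pretrained_encoder, classifier,
                        discriminator, epoch, num_epochs):
    tgt_feat = target_encoder(tgt_x)

    # adversarial term: fool the domain discriminator into predicting source
    adv = F.cross_entropy(discriminator(tgt_feat),
                          torch.ones(tgt_x.size(0)).long())

    # alignment term: coral loss between the classification outputs of the
    # fixed Deep-Coral-pretrained encoder and the ADDA-trained target encoder
    with torch.no_grad():
        ref_out = classifier(pretrained_encoder(tgt_x))
    cur_out = classifier(tgt_feat)

    weight = 1.0 - epoch / num_epochs   # assumed linear decay of the coral weight
    return adv + weight * coral(cur_out, ref_out)
&lt;/code&gt;&lt;/pre&gt;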
&lt;h4 id=&quot;results&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#results&quot; aria-label=&quot;results permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Results&lt;/h4&gt;
&lt;p&gt;Table 1: Results of our Deep Coral + ADDA method on &lt;strong&gt;Tiny-16-Class-ImageNet&lt;/strong&gt; and &lt;strong&gt;MNIST-USPS&lt;/strong&gt;.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th align=&quot;center&quot;&gt;&lt;div style=&quot;width:150px&quot;&gt;Setting&lt;/div&gt;&lt;/th&gt;
&lt;th align=&quot;center&quot;&gt;&lt;div style=&quot;width:100px&quot;&gt;Source&lt;/div&gt;&lt;/th&gt;
&lt;th align=&quot;center&quot;&gt;&lt;div style=&quot;width:100px&quot;&gt;Target&lt;/div&gt;&lt;/th&gt;
&lt;th align=&quot;right&quot;&gt;Acc&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;ResNet-50&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;train&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;val&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mrow&gt;&lt;/mrow&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;†&lt;/mi&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;^\dag&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;†&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;25.13&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;%&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;25.13\%&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8056em;vertical-align:-0.0556em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;25.13%&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;ADDA&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;train&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;val&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mrow&gt;&lt;/mrow&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;†&lt;/mi&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;^\dag&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;†&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;48.32&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;%&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;48.32\%&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8056em;vertical-align:-0.0556em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;48.32%&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;Deep Coral&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;train&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;val&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mrow&gt;&lt;/mrow&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;†&lt;/mi&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;^\dag&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;†&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;73.52&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;%&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;73.52\%&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8056em;vertical-align:-0.0556em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;73.52%&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;Ours&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;train&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;val&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mrow&gt;&lt;/mrow&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;†&lt;/mi&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;^\dag&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;†&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;77.69&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;%&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;77.69\%&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8056em;vertical-align:-0.0556em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;77.69%&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;LeNet&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;MNIST&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;USPS&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;25.13&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;%&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;25.13\%&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8056em;vertical-align:-0.0556em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;25.13%&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;ADDA&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;MNIST&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;USPS&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;89.40&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;%&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;89.40\%&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8056em;vertical-align:-0.0556em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;89.40%&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;Deep Coral&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;MNIST&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;USPS&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;54.30&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;%&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;54.30\%&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8056em;vertical-align:-0.0556em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;54.30%&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;Ours&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;MNIST&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;USPS&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;94.56&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;%&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;94.56\%&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8056em;vertical-align:-0.0556em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;94.56%&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mrow&gt;&lt;/mrow&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;†&lt;/mi&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;^\dag&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;†&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;: validation set with uniform noise (0.5)&lt;/p&gt;
&lt;p&gt;Table 2: Results of our Deep Coral + ADDA method on the unseen test set of &lt;strong&gt;Tiny-16-Class-ImageNet&lt;/strong&gt;.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th align=&quot;center&quot;&gt;&lt;div style=&quot;width:150px&quot;&gt;Setting&lt;/div&gt;&lt;/th&gt;
&lt;th align=&quot;center&quot;&gt;&lt;div style=&quot;width:130px&quot;&gt;Train Source&lt;/div&gt;&lt;/th&gt;
&lt;th align=&quot;center&quot;&gt;&lt;div style=&quot;width:130px&quot;&gt;Train Target&lt;/div&gt;&lt;/th&gt;
&lt;th align=&quot;center&quot;&gt;Unseen Target&lt;/th&gt;
&lt;th align=&quot;right&quot;&gt;Acc&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;ResNet-50&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;train&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;None&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;Test&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mrow&gt;&lt;/mrow&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;†&lt;/mi&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;^\dag&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;†&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;5.34&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;%&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;5.34\%&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8056em;vertical-align:-0.0556em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;5.34%&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;ResNet-50-ImageNet&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;train&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;None&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;Test&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mrow&gt;&lt;/mrow&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;†&lt;/mi&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;^\dag&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;†&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;13.14&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;%&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;13.14\%&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8056em;vertical-align:-0.0556em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;13.14%&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;Deep Coral&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;train&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;val&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mrow&gt;&lt;/mrow&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;†&lt;/mi&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;^\dag&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;†&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;Test&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mrow&gt;&lt;/mrow&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;†&lt;/mi&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;^\dag&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;†&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;38.37&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;%&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;38.37\%&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8056em;vertical-align:-0.0556em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;38.37%&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align=&quot;center&quot;&gt;Ours&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;train&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;val&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mrow&gt;&lt;/mrow&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;†&lt;/mi&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;^\dag&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;†&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td align=&quot;center&quot;&gt;Test&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mrow&gt;&lt;/mrow&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;†&lt;/mi&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;^\dag&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;†&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;mn&gt;52.96&lt;/mn&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;%&lt;/mi&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;52.96\%&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8056em;vertical-align:-0.0556em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;52.96%&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msup&gt;&lt;mrow&gt;&lt;/mrow&gt;&lt;mi mathvariant=&quot;normal&quot;&gt;†&lt;/mi&gt;&lt;/msup&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;^\dag&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.8491em;&quot;&gt;&lt;span style=&quot;top:-3.063em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;†&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;: validation set with uniform noise (0.5)&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 960px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/8912e4e424e5400f62874244f293e048/eea4a/Confusion_matrix.jpg&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 37.916666666666664%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAAIABQDASIAAhEBAxEB/8QAFgABAQEAAAAAAAAAAAAAAAAAAAEF/8QAFQEBAQAAAAAAAAAAAAAAAAAAAAH/2gAMAwEAAhADEAAAAduBAv8A/8QAFRABAQAAAAAAAAAAAAAAAAAAARD/2gAIAQEAAQUCb//EABQRAQAAAAAAAAAAAAAAAAAAABD/2gAIAQMBAT8BP//EABQRAQAAAAAAAAAAAAAAAAAAABD/2gAIAQIBAT8BP//EABQQAQAAAAAAAAAAAAAAAAAAABD/2gAIAQEABj8Cf//EABcQAAMBAAAAAAAAAAAAAAAAAAABEBH/2gAIAQEAAT8hQyf/2gAMAwEAAgADAAAAEIff/8QAFhEBAQEAAAAAAAAAAAAAAAAAAAER/9oACAEDAQE/EIx//8QAFBEBAAAAAAAAAAAAAAAAAAAAEP/aAAgBAgEBPxA//8QAGhABAAIDAQAAAAAAAAAAAAAAAQAhETFxkf/aAAgBAQABPxBHYxTTnsVKR8n/2Q==&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;picture&gt;
          &lt;source
              srcset=&quot;/static/8912e4e424e5400f62874244f293e048/8ac56/Confusion_matrix.webp 240w,
/static/8912e4e424e5400f62874244f293e048/d3be9/Confusion_matrix.webp 480w,
/static/8912e4e424e5400f62874244f293e048/e46b2/Confusion_matrix.webp 960w,
/static/8912e4e424e5400f62874244f293e048/af3f0/Confusion_matrix.webp 1280w&quot;
              sizes=&quot;(max-width: 960px) 100vw, 960px&quot;
              type=&quot;image/webp&quot;
            /&gt;
          &lt;source
            srcset=&quot;/static/8912e4e424e5400f62874244f293e048/09b79/Confusion_matrix.jpg 240w,
/static/8912e4e424e5400f62874244f293e048/7cc5e/Confusion_matrix.jpg 480w,
/static/8912e4e424e5400f62874244f293e048/6a068/Confusion_matrix.jpg 960w,
/static/8912e4e424e5400f62874244f293e048/eea4a/Confusion_matrix.jpg 1280w&quot;
            sizes=&quot;(max-width: 960px) 100vw, 960px&quot;
            type=&quot;image/jpeg&quot;
          /&gt;
          &lt;img
            class=&quot;gatsby-resp-image-image&quot;
            src=&quot;/static/8912e4e424e5400f62874244f293e048/6a068/Confusion_matrix.jpg&quot;
            alt=&quot;Confusion matrix of our results on different target domains&quot;
            title=&quot;Confusion matrix of our results on different target domains&quot;
            loading=&quot;lazy&quot;
            style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
          /&gt;
        &lt;/picture&gt;
  &lt;/a&gt;
    &lt;/span&gt;
Figure 3: Classification accuracy in percent for different
domains. Model &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msub&gt;&lt;mi&gt;M&lt;/mi&gt;&lt;mn&gt;0&lt;/mn&gt;&lt;/msub&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;M_0&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8333em;vertical-align:-0.15em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.10903em;&quot;&gt;M&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.3011em;&quot;&gt;&lt;span style=&quot;top:-2.55em;margin-left:-0.109em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;0&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.15em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; is only trained on the source
domain. Models &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msub&gt;&lt;mi&gt;M&lt;/mi&gt;&lt;mn&gt;1&lt;/mn&gt;&lt;/msub&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;M_1&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8333em;vertical-align:-0.15em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.10903em;&quot;&gt;M&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.3011em;&quot;&gt;&lt;span style=&quot;top:-2.55em;margin-left:-0.109em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.15em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; to &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msub&gt;&lt;mi&gt;M&lt;/mi&gt;&lt;mn&gt;5&lt;/mn&gt;&lt;/msub&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;M_5&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8333em;vertical-align:-0.15em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.10903em;&quot;&gt;M&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.3011em;&quot;&gt;&lt;span style=&quot;top:-2.55em;margin-left:-0.109em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;5&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.15em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; are adapted on one target
domain (in red rectangle) via ADDA. &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msub&gt;&lt;mi&gt;M&lt;/mi&gt;&lt;mn&gt;6&lt;/mn&gt;&lt;/msub&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;M_6&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8333em;vertical-align:-0.15em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.10903em;&quot;&gt;M&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.3011em;&quot;&gt;&lt;span style=&quot;top:-2.55em;margin-left:-0.109em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;6&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.15em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; to &lt;span class=&quot;katex&quot;&gt;&lt;span class=&quot;katex-mathml&quot;&gt;&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;semantics&gt;&lt;mrow&gt;&lt;msub&gt;&lt;mi&gt;M&lt;/mi&gt;&lt;mn&gt;10&lt;/mn&gt;&lt;/msub&gt;&lt;/mrow&gt;&lt;annotation encoding=&quot;application/x-tex&quot;&gt;M_{10}&lt;/annotation&gt;&lt;/semantics&gt;&lt;/math&gt;&lt;/span&gt;&lt;span class=&quot;katex-html&quot; aria-hidden=&quot;true&quot;&gt;&lt;span class=&quot;base&quot;&gt;&lt;span class=&quot;strut&quot; style=&quot;height:0.8333em;vertical-align:-0.15em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;mord&quot;&gt;&lt;span class=&quot;mord mathnormal&quot; style=&quot;margin-right:0.10903em;&quot;&gt;M&lt;/span&gt;&lt;span class=&quot;msupsub&quot;&gt;&lt;span class=&quot;vlist-t vlist-t2&quot;&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.3011em;&quot;&gt;&lt;span style=&quot;top:-2.55em;margin-left:-0.109em;margin-right:0.05em;&quot;&gt;&lt;span class=&quot;pstrut&quot; style=&quot;height:2.7em;&quot;&gt;&lt;/span&gt;&lt;span class=&quot;sizing reset-size6 size3 mtight&quot;&gt;&lt;span class=&quot;mord mtight&quot;&gt;10&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-s&quot;&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;vlist-r&quot;&gt;&lt;span class=&quot;vlist&quot; style=&quot;height:0.15em;&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt; are
adapted in the same way, but via Deep Coral instead of ADDA. The best result for each
domain and method is shown in bold blue.&lt;/p&gt;
&lt;h4 id=&quot;discussion&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#discussion&quot; aria-label=&quot;discussion permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Discussion&lt;/h4&gt;
&lt;p&gt;The experimental results in Figure 3 show the improvements that ADDA and Deep Coral bring on the target domain. Deep Coral generally outperforms ADDA by a large margin, except on the High-Pass target domain. The failure on this domain is most likely due to the drastic domain shift between High-Pass and the other domains, as illustrated in Figure 2 in the &lt;em&gt;dataset&lt;/em&gt; section. Deep Coral also generalizes better to unseen domains, most likely because it barely alters the encoder, which is pretrained on ImageNet (though without any added noise).&lt;/p&gt;
&lt;p&gt;Table 1 shows the results of our proposed Deep Coral+ADDA method on Tiny-16-Class-ImageNet and MNIST-USPS. We added uniform noise (0.5) to the validation set, making the domain shift from the training set even larger and the domain adaptation task even harder. The high performance and consistent improvements of Deep Coral+ADDA over the other settings validate the effectiveness of our modifications and design choices. We also test our method on an unseen, untrained target domain and observe significantly better results, as shown in Table 2 in the appendix.&lt;/p&gt;
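&lt;p&gt;As a point of reference, below is a minimal sketch of the CORAL alignment loss that the Deep Coral branch of the combined method minimizes between source and target feature batches. The function and tensor names are illustrative assumptions, not the project’s actual code.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import torch

def coral_loss(source_feats, target_feats):
    # source_feats: (n_s, d) source-domain feature batch
    # target_feats: (n_t, d) target-domain feature batch
    d = source_feats.size(1)

    def covariance(x):
        centered = x - x.mean(dim=0, keepdim=True)
        return centered.t() @ centered / (x.size(0) - 1)

    # Frobenius distance between the two feature covariance matrices,
    # scaled by 1 / (4 d^2) as in Deep CORAL (Sun and Saenko, 2016).
    diff = covariance(source_feats) - covariance(target_feats)
    return (diff ** 2).sum() / (4 * d * d)
&lt;/code&gt;&lt;/pre&gt;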
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;Robert Geirhos, Carlos R Medina Temme, Jonas Rauber, Heiko H Schütt, Matthias Bethge, and Felix A Wichmann. Generalisation in humans and deep neural networks. arXiv preprint arXiv:1808.08750, 2018.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-4&quot;&gt;
&lt;p&gt;Baochen Sun and Kate Saenko. Deep CORAL: Correlation alignment for deep domain adaptation. In European conference on computer vision, pages 443-450. Springer, 2016.&lt;/p&gt;
&lt;a href=&quot;#fnref-4&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-5&quot;&gt;
&lt;p&gt;Eric Tzeng, Judy Hoffman, Kate Saenko, and Trevor Darrell. Adversarial discriminative domain adaptation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7167-7176, 2017.&lt;/p&gt;
&lt;a href=&quot;#fnref-5&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-6&quot;&gt;
&lt;p&gt;Mei Wang and Weihong Deng. Deep visual domain adaptation: A survey. Neurocomputing, 312:135-153, 2018.&lt;/p&gt;
&lt;a href=&quot;#fnref-6&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[Dyadic Relational Graph Convolutional Networks for Skeleton-based Human Interaction Recognition]]></title><description><![CDATA[<p>We apply Graph Convolutional Networks to skeleton-based human-human interaction recognition. We designed a Relational Adjacency Matrix (RAM) to represent dynamic relational graphs on the two actors' skeletons.</p> <p style="font-style: italic;">Liping Zhu*, <span style="font-weight: bold">Bohua Wan*</span>, Chengyang Li, Gangyi Tian, Yi Hou, Kun Yuan</p> <p style="font-style: italic;">Pattern Recognition 115 (2021): 107920.</p>]]></description><link>https://bohua-wan.netlify.com/posts/dyadic-relational-graph-convolutional-networks-for-skeleton-based-human-interaction-recognition</link><guid isPermaLink="false">https://bohua-wan.netlify.com/posts/dyadic-relational-graph-convolutional-networks-for-skeleton-based-human-interaction-recognition</guid><pubDate>Fri, 19 Feb 2021 14:13:40 GMT</pubDate><content:encoded>&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#highlights&quot;&gt;Highlights&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#abstract&quot;&gt;Abstract&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#method&quot;&gt;Method&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#results&quot;&gt;Results&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Here, I briefly introduce our work. Some of the content is taken from the accepted version of our paper. For more information, please see &lt;a href=&quot;https://www.sciencedirect.com/science/article/pii/S0031320321001072&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;our paper&lt;/a&gt;.&lt;/strong&gt; Code is available on &lt;a href=&quot;https://github.com/GlenGGG/DR-GCN&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;center&gt;
    &lt;img style=&quot;border-radius: 0.3125em;
    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);&quot; 
    src=&quot;/media/paper-images/DR-GCN/structure.jpg&quot;&gt;
    &lt;br&gt;
    &lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
    display: inline-block;
    color: #999;
    padding: 2px;&quot;&gt;
	Overall architecture of DR-GCN.
	&lt;/div&gt;
&lt;/center&gt;
&lt;h2 id=&quot;highlights&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#highlights&quot; aria-label=&quot;highlights permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Highlights&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;We are the first to construct dynamic graphs on skeleton sequences that capture discriminative relations between skeletons.&lt;/li&gt;
&lt;li&gt;A Relational Adjacency Matrix (RAM) is proposed to represent relational graphs using geometric features and relative attention.&lt;/li&gt;
&lt;li&gt;The proposed Dyadic Relational Graph Convolutional Network achieves state-of-the-art accuracy on three challenging datasets, with improvements of 6.63% on NTU-RGB+D and 5.47% on NTU-RGB+D 120 over the baseline model.&lt;/li&gt;
&lt;li&gt;Our methods also consistently help advanced models achieve higher accuracy, with gains of 1.26% on NTU-RGB+D and 2.86% on NTU-RGB+D 120.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;abstract&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#abstract&quot; aria-label=&quot;abstract permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Abstract&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;Skeleton-based human interaction recognition is a challenging task requiring all abilities to recognize spatial, temporal, and interactive features. These abilities rarely co-exist in existing methods. Graph convolutional network (GCN) based methods fail to extract interactive features. Traditional interaction recognition methods cannot effectively capture spatial features from skeletons. Toward this end, we propose a novel Dyadic Relational Graph Convolutional Network (DR-GCN) for interaction recognition. Specifically, we make four contributions: (i) we design a Relational Adjacency Matrix (RAM) that represents dynamic relational graphs. These graphs are constructed combining both geometric features and relative attention from the two skeleton sequences; (ii) we propose a Dyadic Relational Graph Convolution Block (DR-GCB) that extracts spatial-temporal interactive features; (iii) we stack the proposed DR-GCBs to build DR-GCN and integrate our methods with an advanced model. (iv) Our models achieve state-of-the-art results on SBU and significant improvements on the mutual action sub-datasets of NTU-RGB+D and NTU-RGB+D 120.&lt;/em&gt;&lt;/p&gt;
&lt;h2 id=&quot;method&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#method&quot; aria-label=&quot;method permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Method&lt;/h2&gt;
&lt;center&gt;
    &lt;img style=&quot;border-radius: 0.3125em;
    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);&quot; 
    src=&quot;/media/paper-images/DR-GCN/FIG3c.jpg&quot;&gt;
    &lt;br&gt;
    &lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
    display: inline-block;
    color: #999;
    padding: 2px;&quot;&gt;An illustration of the relational graph. Green dots in this image represent
body joints. The orange links are relational links denoting strong relations between joints of the two actors.&lt;/div&gt;
&lt;/center&gt;
&lt;p&gt;The image above shows the relational graph at a single frame, which is represented by the proposed Relational Adjacency Matrix (RAM). A RAM is generated separately for each frame of the skeleton sequence.&lt;/p&gt;
&lt;center&gt;
    &lt;img style=&quot;border-radius: 0.3125em;
    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);&quot; 
    src=&quot;/media/paper-images/DR-GCN/FIG5.jpg&quot;&gt;
    &lt;br&gt;
    &lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
    display: inline-block;
    color: #999;
    padding: 2px;&quot;&gt;
	An illustration of the Relational Adjacency Matrix (RAM) generation procedure.
	&lt;/div&gt;
&lt;/center&gt;
&lt;p&gt;The generation and utilization of the RAM are the key components of our paper. Briefly, we build the relational links, i.e., the RAM, from two components: a geometric component and a relative attention component. The geometric component is straightforward: if two joints, one from each actor, are close to each other, we consider them correlated. This simple assumption turns out to be very effective. The relative attention component is meant to capture semantic information and connect joints that are semantically similar. We compute it by first encoding each joint with spatial-temporal graph convolutional layers and then calculating the similarity between each pair of joints. Finally, we combine the two components with a network-learned parameter to obtain the RAM.&lt;/p&gt;
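&lt;p&gt;To make this concrete, here is a minimal per-frame sketch of how such a matrix could be assembled. It only illustrates the two components described above; the names, shapes, and softmax normalization are assumptions for illustration, not the exact formulation in the paper.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import torch
import torch.nn.functional as F

def relational_adjacency(joints_a, joints_b, feats_a, feats_b, alpha):
    # joints_a, joints_b: (V, 3) joint coordinates of the two actors at one frame
    # feats_a, feats_b:   (V, C) joint embeddings from spatial-temporal GCN layers
    # alpha:              network-learned scalar balancing the two components

    # Geometric component: joint pairs that are close to each other get larger weights.
    dist = torch.cdist(joints_a, joints_b)          # (V, V) pairwise distances
    geometric = F.softmax(-dist, dim=-1)

    # Relative attention component: similarity between joint embeddings.
    attention = F.softmax(feats_a @ feats_b.t(), dim=-1)

    # Combine both components with the network-learned parameter.
    return alpha * geometric + (1 - alpha) * attention
&lt;/code&gt;&lt;/pre&gt;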
&lt;center&gt;
    &lt;img style=&quot;border-radius: 0.3125em;
    box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);&quot; 
    src=&quot;/media/paper-images/DR-GCN/FIG6.jpg&quot;&gt;
    &lt;br&gt;
    &lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
    display: inline-block;
    color: #999;
    padding: 2px;&quot;&gt;
	An illustration of Dyadic Relational Graph Convolution Block (DR-GCB). DR-GC refers to dyadic relational graph convolution.
	&lt;/div&gt;
&lt;/center&gt;
&lt;p&gt;With the RAM, we propose the Dyadic Relational Graph Convolution Block (DR-GCB), which applies dyadic relational graph convolution to the two skeletons to learn relational features. The DR-GCB is highly extensible and can be plugged into other networks to improve their performance.&lt;/p&gt;
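&lt;p&gt;As a rough illustration of what such a block computes, the sketch below lets each joint of one actor aggregate features from the other actor’s joints, weighted by the RAM. This is a simplified, assumed form: the class name and shapes are illustrative, and it omits the temporal modeling and the rest of the structure of the full block.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import torch
import torch.nn as nn

class DyadicRelationalGraphConv(nn.Module):
    # Simplified relational graph convolution driven by a per-frame RAM.

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.linear = nn.Linear(in_channels, out_channels)

    def forward(self, feats_a, feats_b, ram):
        # feats_a, feats_b: (V, C) joint features of the two actors
        # ram:              (V, V) relational adjacency matrix between them
        # Each joint of actor A gathers features from related joints of actor B.
        gathered = ram @ feats_b
        return torch.relu(self.linear(feats_a + gathered))
&lt;/code&gt;&lt;/pre&gt;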
&lt;h2 id=&quot;results&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#results&quot; aria-label=&quot;results permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Results&lt;/h2&gt;
&lt;p&gt;We have conducted extensive experiments. The results show that our network and methods significantly outperform other state-of-the-art methods, and they also demonstrate the extensibility of our methods. For the detailed numbers, please read &lt;a href=&quot;https://www.sciencedirect.com/science/article/pii/S0031320321001072&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;our paper&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Below we show some generated relational graphs.
&lt;img src=&quot;/54ad5b0f655fc3f3ca559d7a90c70bdc/demo1.gif&quot; alt=&quot;demo-1&quot;&gt;
&lt;img src=&quot;/13d03f15fc5253a5b2a85f535ee964c6/demo2.gif&quot; alt=&quot;demo-2&quot;&gt;
&lt;img src=&quot;/b469fd1d61965213c7295b25085039e2/demo3.gif&quot; alt=&quot;demo-3&quot;&gt;
&lt;img src=&quot;/4da7a4b137cf02fcc8c7f4b7a8689ce0/demo4.gif&quot; alt=&quot;demo-4&quot;&gt;&lt;/p&gt;
&lt;center&gt;
	&lt;div style=&quot;color:orange; border-bottom: 1px solid #d9d9d9;
	display: inline-block;
	color: #999;
	padding: 2px;&quot;&gt;
	Some demos of the generated relational graphs.
	&lt;/div&gt;
&lt;/center&gt;
&lt;h2 id=&quot;conclusion&quot; style=&quot;position:relative;&quot;&gt;&lt;a href=&quot;#conclusion&quot; aria-label=&quot;conclusion permalink&quot; class=&quot;anchor before&quot;&gt;&lt;svg aria-hidden=&quot;true&quot; focusable=&quot;false&quot; height=&quot;16&quot; version=&quot;1.1&quot; viewBox=&quot;0 0 16 16&quot; width=&quot;16&quot;&gt;&lt;path fill-rule=&quot;evenodd&quot; d=&quot;M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z&quot;&gt;&lt;/path&gt;&lt;/svg&gt;&lt;/a&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;This article is only meant as a brief introduction; if you are interested, please read &lt;a href=&quot;https://www.sciencedirect.com/science/article/pii/S0031320321001072&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;our paper&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Our paper presents a novel Dyadic Relational Graph Convolutional Network (DR-GCN) for skeleton-based interaction recognition. We devise a Relational Adjacency Matrix (RAM) that denotes the relational graph; it combines both the geometric features and the relative attention of the two skeletons in interaction. A Dyadic Relational Graph Convolution Block (DR-GCB) is further proposed to extract spatial-temporal interactive features with the RAM, and we stack multiple DR-GCBs to build the backbone of our network. We further propose the Two-Stream Dyadic Relational AGCN (2S-DRAGCN), which demonstrates our methods’ compatibility with ST-GCN based models. Our proposed models show superior abilities in interaction recognition: they achieve the highest accuracy on the mutual action sub-datasets of NTU-RGB+D and NTU-RGB+D 120 and on the interaction dataset SBU.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;© 2021. Content from the accepted version is made available under the CC-BY-NC-ND 4.0 license &lt;a href=&quot;http://creativecommons.org/licenses/by-nc-nd/4.0/&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener noreferrer&quot;&gt;http://creativecommons.org/licenses/by-nc-nd/4.0/&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;</content:encoded></item></channel></rss>