MMCD: Multi-Modal Collaborative Decision-Making for Connected Autonomy with Knowledge Distillation

Rui Liu, Zikang Wang, Peng Gao, Yu Shen, Pratap Tokekar, Ming Lin

Overview of the MMCD framework with knowledge distillation. Each connected vehicle processes its own RGB data, LiDAR data, or both locally to generate feature embeddings, which are then shared with the ego vehicle. Upon receiving the shared data, the ego vehicle fuses the multi-vehicle, multi-modality features to generate its action, such as braking, particularly in accident-prone scenarios. During the training stage, LiDAR feature embeddings serve as a teacher to guide the learning of RGB feature embeddings through cross-modal knowledge distillation.
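To make the training stage concrete, here is a minimal sketch of the cross-modal distillation step described above, assuming PyTorch. The encoder architecture, embedding size, MSE distillation term, and loss weight are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

EMBED_DIM = 128  # hypothetical embedding size


class RGBEncoder(nn.Module):
    """Illustrative student encoder: RGB image -> feature embedding."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, EMBED_DIM),
        )

    def forward(self, img):          # img: (B, 3, H, W)
        return self.net(img)         # (B, EMBED_DIM)


def distillation_loss(student_rgb_emb, teacher_lidar_emb):
    """Cross-modal KD: pull the student's RGB embedding toward the
    frozen teacher's LiDAR embedding (MSE is one common choice)."""
    return F.mse_loss(student_rgb_emb, teacher_lidar_emb.detach())


def training_step(rgb_encoder, lidar_teacher, policy_head,
                  img, lidar, action, kd_weight=1.0):
    """One training step: task loss on the driving action
    (e.g., brake / no-brake) plus the distillation term."""
    rgb_emb = rgb_encoder(img)
    with torch.no_grad():                  # teacher is pre-trained and frozen
        lidar_emb = lidar_teacher(lidar)
    logits = policy_head(rgb_emb)
    task_loss = F.cross_entropy(logits, action)
    return task_loss + kd_weight * distillation_loss(rgb_emb, lidar_emb)
```

Keeping the teacher frozen means gradients only update the student, so at test time the student can act on RGB alone, without the LiDAR branch.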

Abstract

Autonomous systems have advanced significantly, but challenges persist in accident-prone environments where robust decision-making is crucial. A single vehicle's limited sensor range and obstructed views increase the likelihood of accidents. Multi-vehicle connected systems and multi-modal approaches, leveraging RGB images and LiDAR point clouds, have emerged as promising solutions. However, existing methods often assume the availability of all data modalities and connected vehicles during both training and testing, which is impractical due to potential sensor failures or missing connected vehicles. To address these challenges, we introduce MMCD (Multi-Modal Collaborative Decision-making), a novel framework for connected autonomous driving. Our framework fuses multi-modal observations from the ego and collaborative vehicles to enhance decision-making under challenging conditions. To ensure robust performance when certain data modalities are unavailable during testing, we propose an approach based on cross-modal knowledge distillation with a teacher-student model structure. The teacher model is trained with multiple data modalities, while the student model is designed to operate effectively with reduced modalities. In experiments on connected autonomous driving with ground vehicles and with aerial-ground vehicle collaboration, our method improves driving safety by up to 20.7%, surpassing the best existing baseline in detecting potential accidents and making safe driving decisions.
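As a rough illustration of the fusion step (not the paper's exact architecture), the ego vehicle could aggregate the feature embeddings shared by connected vehicles with attention pooling before the decision head. The module names, attention mechanism, embedding size, and two-way action space below are assumptions for the sketch:

```python
import torch
import torch.nn as nn


class EgoFusion(nn.Module):
    """Illustrative fusion of embeddings shared by connected vehicles:
    the ego embedding acts as a single attention query that pools the
    collaborators' embeddings before the action head."""

    def __init__(self, dim=128, num_actions=2):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.head = nn.Linear(dim, num_actions)   # e.g., brake / no-brake

    def forward(self, ego_emb, shared_embs):
        # ego_emb: (B, dim); shared_embs: (B, N, dim) from N vehicles
        q = ego_emb.unsqueeze(1)                   # (B, 1, dim)
        fused, _ = self.attn(q, shared_embs, shared_embs)
        return self.head(fused.squeeze(1))         # action logits
```

An attention-style pooling is one natural choice here because it tolerates a variable number of connected vehicles, which matches the paper's setting of collaborators that may be missing at test time.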

Motivation

A motivating scenario for multi-modal collaborative decision-making (MMCD) in connected autonomous driving. The purple dashed line represents the exchange of vital information between vehicles, which overcomes occlusions and sensor limitations. Even when the ego vehicle's LiDAR is unavailable, MMCD remains robust by leveraging the available RGB data, enabling the ego vehicle to brake and avoid an accident.

Connected Autonomous Driving
Qualitative examples in three accident-prone scenarios: Overtaking, Left Turn, and Red Light Violation. Each example shows the bird's-eye view (BEV) alongside the camera views of the ego vehicle and two collaborative vehicles (C1 and C2).
Experimental Results
Performance comparison of MMCD against multiple baselines for connected autonomous driving with ground vehicles. Higher ADR and IR are better (↑); lower PS is better (↓). MMCD shows superior performance across all evaluated scenarios; its improvement relative to STG is reported in the final row.
Approach                            PS ↓      Overtaking        Left Turn         Red Light Violation
                                              ADR ↑    IR ↑     ADR ↑    IR ↑     ADR ↑    IR ↑
COOPERNAUT                          65.5 KB   0.8813   0.8544   0.5071   0.7696   0.5446   0.8323
STG                                 4.9 KB    0.9265   0.8336   0.6070   0.7670   0.6451   0.7846
AML                                 1.0 KB    0.8206   0.8322   0.5000   0.7600   0.4175   0.7328
Non-collaborative
Case A (RGB, RGB)                   1.0 KB    0.8179   0.8169   0.4601   0.7268   0.3125   0.6756
Case B (LiDAR, LiDAR)               65.5 KB   0.8228   0.8134   0.4059   0.7217   0.4018   0.7351
Case C (RGB + LiDAR, RGB + LiDAR)   66.5 KB   0.8654   0.8205   0.5576   0.7483   0.5250   0.7504
Collaborative
Case A (RGB, RGB)                   1.0 KB    0.8522   0.8415   0.5642   0.7804   0.5176   0.8104
Case B (LiDAR, LiDAR)               65.5 KB   0.8813   0.8544   0.5071   0.7696   0.5446   0.8323
MMCD (RGB + LiDAR, RGB + LiDAR)     66.5 KB   0.9604   0.8676   0.7329   0.8122   0.6875   0.8578
Case D (RGB + LiDAR, RGB)           1.0 KB    0.9288   0.8381   0.6632   0.7946   0.6607   0.8262
Improvement over STG                4.9×      3.7%     4.1%     20.7%    5.9%     6.6%     9.3%