Autonomous systems have advanced significantly, but challenges persist in accident-prone environments where robust decision-making is crucial. A single vehicle's limited sensor range and obstructed views increase the likelihood of accidents. Multi-vehicle connected systems and multi-modal approaches that leverage RGB images and LiDAR point clouds have emerged as promising solutions. However, existing methods often assume that all data modalities and connected vehicles are available during both training and testing, which is impractical because sensors may fail or connected vehicles may be absent. To address these challenges, we introduce MMCD (Multi-Modal Collaborative Decision-making), a novel framework for connected autonomous driving. Our framework fuses multi-modal observations from the ego and collaborating vehicles to enhance decision-making under challenging conditions. To ensure robust performance when certain data modalities are unavailable during testing, we propose an approach based on cross-modal knowledge distillation with a teacher-student model structure: the teacher model is trained with multiple data modalities, while the student model is designed to operate effectively with reduced modalities. In experiments on connected autonomous driving with ground-vehicle and aerial-ground vehicle collaboration, our method improves driving safety by up to 20.7%, surpassing the best existing baseline in detecting potential accidents and making safe driving decisions.
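As a rough illustration of the teacher-student distillation idea described above, the sketch below pairs a teacher policy that fuses RGB and LiDAR features with a student policy that receives only RGB, and trains the student to match the teacher's action distribution and fused features. All module names, feature dimensions, loss weights, and the action space here are illustrative assumptions, not the actual MMCD implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical encoders and action head; the real MMCD architecture,
# feature dimensions, and loss weights are not specified in this sketch.
class Encoder(nn.Module):
    def __init__(self, in_dim, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, feat_dim))

    def forward(self, x):
        return self.net(x)

class DrivingPolicy(nn.Module):
    """Fuses modality features and predicts a driving-action distribution."""
    def __init__(self, feat_dim=128, num_actions=3):
        super().__init__()
        self.rgb_enc = Encoder(in_dim=512, feat_dim=feat_dim)     # assumed RGB feature size
        self.lidar_enc = Encoder(in_dim=1024, feat_dim=feat_dim)  # assumed LiDAR feature size
        self.head = nn.Linear(feat_dim, num_actions)

    def forward(self, rgb, lidar=None):
        feat = self.rgb_enc(rgb)
        if lidar is not None:                 # teacher sees both modalities
            feat = feat + self.lidar_enc(lidar)
        return feat, self.head(feat)

def distillation_loss(student_logits, teacher_logits, student_feat, teacher_feat,
                      labels, alpha=0.5, beta=0.5, tau=2.0):
    """Cross-modal KD: supervised action loss plus matching the teacher's
    softened action distribution and fused features."""
    task = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(F.log_softmax(student_logits / tau, dim=-1),
                  F.softmax(teacher_logits / tau, dim=-1),
                  reduction="batchmean") * tau ** 2
    feat = F.mse_loss(student_feat, teacher_feat.detach())
    return task + alpha * kd + beta * feat

# Training step: a frozen teacher trained on RGB + LiDAR guides a student
# that only receives RGB, so the student remains usable when LiDAR is missing at test time.
teacher, student = DrivingPolicy(), DrivingPolicy()
rgb, lidar = torch.randn(8, 512), torch.randn(8, 1024)
labels = torch.randint(0, 3, (8,))
with torch.no_grad():
    t_feat, t_logits = teacher(rgb, lidar)
s_feat, s_logits = student(rgb)               # reduced-modality student
loss = distillation_loss(s_logits, t_logits, s_feat, t_feat, labels)
loss.backward()
```

Distilling through both the action logits and the fused features is one common recipe; the fusion of observations shared by collaborative vehicles is omitted here for brevity.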
A motivating scenario for multi-modal collaborative decision-making (MMCD) in connected autonomous driving. The purple dashed line represents the exchange of vital information between vehicles, overcoming occlusions and sensor limitations. MMCD remains robust even when the ego vehicle's LiDAR is unavailable, leveraging the available RGB data so the ego vehicle can brake and avoid an accident.
| Approach | PS ↓ | Overtaking ADR ↑ | Overtaking IR ↑ | Left Turn ADR ↑ | Left Turn IR ↑ | Red Light Violation ADR ↑ | Red Light Violation IR ↑ |
|---|---|---|---|---|---|---|---|
| COOPERNAUT | 65.5 KB | 0.8813 | 0.8544 | 0.5071 | 0.7696 | 0.5446 | 0.8323 |
| STG | 4.9 KB | 0.9265 | 0.8336 | 0.6070 | 0.7670 | 0.6451 | 0.7846 |
| AML | 1.0 KB | 0.8206 | 0.8322 | 0.5000 | 0.7600 | 0.4175 | 0.7328 |
| **Non-collaborative** | | | | | | | |
| Case A (RGB, RGB) | 1.0 KB | 0.8179 | 0.8169 | 0.4601 | 0.7268 | 0.3125 | 0.6756 |
| Case B (LiDAR, LiDAR) | 65.5 KB | 0.8228 | 0.8134 | 0.4059 | 0.7217 | 0.4018 | 0.7351 |
| Case C (RGB + LiDAR, RGB + LiDAR) | 66.5 KB | 0.8654 | 0.8205 | 0.5576 | 0.7483 | 0.5250 | 0.7504 |
| **Collaborative** | | | | | | | |
| Case A (RGB, RGB) | 1.0 KB | 0.8522 | 0.8415 | 0.5642 | 0.7804 | 0.5176 | 0.8104 |
| Case B (LiDAR, LiDAR) | 65.5 KB | 0.8813 | 0.8544 | 0.5071 | 0.7696 | 0.5446 | 0.8323 |
| MMCD (RGB + LiDAR, RGB + LiDAR) | 66.5 KB | 0.9604 | 0.8676 | 0.7329 | 0.8122 | 0.6875 | 0.8578 |
| Case D (RGB + LiDAR, RGB) | 1.0 KB | 0.9288 | 0.8381 | 0.6632 | 0.7946 | 0.6607 | 0.8262 |
| Improvement | 4.9× | 3.7% | 4.1% | 20.7% | 5.9% | 6.6% | 9.3% |