Handling Data Structure Issues with Machine Learning in a Connected and Autonomous Vehicle Communication System
Abstract
Connected and Autonomous Vehicles (CAVs) remain vulnerable to cyberattacks because of inherent security gaps in the Controller Area Network (CAN) protocol. We present a structured Python 3.11.13 framework that repairs structural inconsistencies in a public CAV dataset to improve the reliability of machine learning-based intrusion detection. We also assess how the volume of training data affects detection, and we compare Random Forest (RF) and Extreme Gradient Boosting (XGBoost) classifiers across four attack types: DoS, Fuzzy, RPM spoofing, and GEAR spoofing. XGBoost outperforms RF, achieving 99.2% accuracy on the DoS dataset and 100% accuracy on the Fuzzy, RPM, and GEAR datasets. The Synthetic Minority Oversampling Technique (SMOTE) further improves minority-class detection without compromising overall performance. This methodology provides a generalizable framework for anomaly detection in other connected systems, including smart grids, autonomous defense platforms, and industrial control networks.
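The abstract does not spell out which structural inconsistencies are repaired. One issue that is common in public CAN intrusion datasets, assumed here purely for illustration, is that frames with a data length code (DLC) below 8 have fewer data-byte columns, so a fixed-width reader shifts the normal/attack flag into a byte column. The sketch below realigns such rows, assuming a header-less layout of timestamp, CAN ID, DLC, then DLC data bytes, then an `R` (normal) or `T` (injected) flag. The names `repair_row`, `repair_csv`, `PAD_BYTE`, and the toy rows are hypothetical and are not the authors' code.

```python
import csv
import io
from dataclasses import dataclass

MAX_PAYLOAD = 8   # a classic CAN frame carries at most 8 data bytes
PAD_BYTE = "00"   # hypothetical fill value for unused byte slots


@dataclass
class CanFrame:
    timestamp: float
    can_id: str
    dlc: int
    data: list[str]   # always MAX_PAYLOAD hex strings after repair
    label: int        # 1 = injected ("T"), 0 = normal ("R")


def repair_row(fields: list[str]) -> CanFrame:
    """Realign one raw row whose flag position depends on its DLC."""
    # Drop the empty trailing cells a fixed-width reader may produce.
    fields = [f.strip() for f in fields if f.strip()]
    if len(fields) < 4:
        raise ValueError(f"row too short: {fields}")
    dlc = int(fields[2])
    if not 0 <= dlc <= MAX_PAYLOAD:
        raise ValueError(f"invalid DLC {dlc}")
    # Assumed layout: timestamp, ID, DLC, <dlc data bytes>, flag.
    if len(fields) != 4 + dlc:
        raise ValueError(f"row length does not match DLC {dlc}: {fields}")
    payload = fields[3:3 + dlc]
    flag = fields[3 + dlc]
    if flag not in ("R", "T"):
        raise ValueError(f"unexpected flag {flag!r}")
    # Pad short payloads so every frame has the same number of features.
    data = payload + [PAD_BYTE] * (MAX_PAYLOAD - dlc)
    return CanFrame(float(fields[0]), fields[1], dlc, data, int(flag == "T"))


def repair_csv(text: str) -> list[CanFrame]:
    """Parse header-less CSV text and repair every non-empty row."""
    return [repair_row(row) for row in csv.reader(io.StringIO(text)) if row]


# Toy rows (made up for illustration): the DLC-2 row would misplace its
# flag if it were forced into a fixed 12-column table.
sample = (
    "0.000100,0316,8,05,21,68,09,21,21,00,6f,R\n"
    "0.000350,0545,2,d8,00,T\n"
)
for frame in repair_csv(sample):
    print(frame.can_id, frame.dlc, frame.data, frame.label)
```

Padding every frame to eight byte slots gives a classifier such as RF or XGBoost a constant-width feature vector while the DLC column preserves the true payload length. Padding with a constant, marking slots as missing, or dropping short frames are alternative design choices, and the paper's framework may handle this differently.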
Authors
Pranav K. Jha
Manoj K. Jha
Quick Access
- Year Published: 2025
- Database Source: DOAJ
- DOI: 10.3390/vehicles7030073
- Access: Open Access