Novel Synthetic Dataset Generation Method with Privacy-Preserving for Intrusion Detection System
Abstrak
The expansion of Internet of Things (IoT) networks has enabled real-time data collection and automation across smart cities, healthcare, and agriculture, delivering greater convenience and efficiency; however, exposure to diverse threats has also increased. Machine learning-based Intrusion Detection Systems (IDSs) provide an effective means of defense, yet they require large volumes of data, and the use of raw IoT network data containing sensitive information introduces new privacy risks. This study proposes a novel privacy-preserving synthetic data generation model based on a tabular diffusion framework that incorporates Differential Privacy (DP). Among the three diffusion models (TabDDPM, TabSyn, and TabDiff), TabDiff with Utility-Preserving DP (UP-DP) achieved the best Synthetic Data Vault (SDV) Fidelity (0.98) and higher values on multiple statistical metrics, indicating improved utility. Furthermore, by employing the DisclosureProtection and attribute inference to infer and compare sensitive attributes on both real and synthetic datasets, we show that the proposed approach reduces privacy risk of the synthetic data. Additionally, a Membership Inference Attack (MIA) was also used for demonstration on models trained with both real and synthetic data. This approach decreases the risk of leaking patterns related to sensitive information, thereby enabling secure dataset sharing and analysis.
Topik & Kata Kunci
Penulis (5)
JaeCheol Kim
Seungun Park
Jaesik Cha
Eunyeong Son
Yunsik Son
Akses Cepat
- Tahun Terbit
- 2025
- Sumber Database
- DOAJ
- DOI
- 10.3390/app151910609
- Akses
- Open Access ✓