DOAJ Open Access 2025

Optimizing Data Pipelines for Green AI: A Comparative Analysis of Pandas, Polars, and PySpark for CO2 Emission Prediction

Youssef Mekouar, Mohammed Lahmer, Mohammed Karim

Abstract

This study evaluates the performance and energy trade-offs of three popular data processing libraries (Pandas, PySpark, and Polars) applied to GreenNav, a CO2 emission prediction pipeline for urban traffic. GreenNav is an eco-friendly navigation app designed to predict CO2 emissions and determine low-carbon routes using a hybrid CNN-LSTM model integrated into a complete pipeline for the ingestion and processing of large, heterogeneous geospatial and road data. The study quantifies the end-to-end execution time, cumulative CPU load, and peak RAM consumption for each library when applied to the GreenNav pipeline, then converts these metrics into energy consumption and CO2 equivalents. Experiments on datasets ranging from 100 MB to 8 GB show that Polars in lazy mode offers substantial gains: it reduces processing time by a factor of more than twenty, memory consumption by about two-thirds, and energy consumption by about 60%, while maintaining the predictive accuracy of the model (R² ≈ 0.91). These results indicate that careful selection of data processing libraries can reconcile high computing performance and environmental sustainability in large-scale machine learning applications.
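The abstract describes measuring execution time, CPU load, and peak memory, then converting those metrics into energy and CO2 equivalents. The sketch below illustrates one way such a conversion could look using only the Python standard library. It is not the authors' method: the helper names (`measure`, `energy_kwh`, `co2e_grams`) are hypothetical, and the power draw and grid carbon intensity are placeholder assumptions rather than values from the paper.

```python
import time
import tracemalloc

# Assumed constants for illustration only; neither value comes from the paper.
ASSUMED_CPU_POWER_WATTS = 65.0            # hypothetical average CPU power under load
ASSUMED_GRID_INTENSITY_G_PER_KWH = 400.0  # hypothetical grid carbon intensity


def measure(task):
    """Run task() and return (result, wall_seconds, cpu_seconds, peak_bytes).

    Note: tracemalloc only tracks allocations made through Python's allocator,
    so memory used natively by libraries such as Polars or Spark would need a
    process-level measurement instead.
    """
    tracemalloc.start()
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    try:
        result = task()
        wall_seconds = time.perf_counter() - wall_start
        cpu_seconds = time.process_time() - cpu_start
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, wall_seconds, cpu_seconds, peak_bytes


def energy_kwh(cpu_seconds, power_watts=ASSUMED_CPU_POWER_WATTS):
    """Convert CPU time to energy: watts * seconds = joules; 3.6e6 J per kWh."""
    return power_watts * cpu_seconds / 3_600_000.0


def co2e_grams(kwh, intensity_g_per_kwh=ASSUMED_GRID_INTENSITY_G_PER_KWH):
    """Convert energy to grams of CO2 equivalent using a grid intensity factor."""
    return kwh * intensity_g_per_kwh


if __name__ == "__main__":
    # Stand-in workload; a real comparison would run the same pipeline step
    # once per library under test.
    _, wall, cpu, peak = measure(lambda: sorted(range(200_000), reverse=True))
    kwh = energy_kwh(cpu)
    print(f"wall={wall:.4f}s cpu={cpu:.4f}s peak={peak / 1e6:.2f} MB")
    print(f"energy={kwh:.3e} kWh co2e={co2e_grams(kwh):.3e} g")
```

A constant-power model is the simplest possible choice. Measured power draw from hardware counters or a dedicated tracking tool would give more defensible figures.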

Topics & Keywords

Electronic computers. Computer science

Authors (3)

Youssef Mekouar

Mohammed Lahmer

Mohammed Karim

Citation Format

Mekouar, Y., Lahmer, M., & Karim, M. (2025). Optimizing Data Pipelines for Green AI: A Comparative Analysis of Pandas, Polars, and PySpark for CO2 Emission Prediction. https://doi.org/10.3390/computers14080319

Quick Access

View at source: doi.org/10.3390/computers14080319

Journal Information

Year Published: 2025
Source Database: DOAJ
DOI: 10.3390/computers14080319
Access: Open Access ✓
