About the Project
This project supports environmental monitoring by predicting current Air Quality Index (AQI) values and forecasting future ones for 203 cities.
It combines traditional machine learning models with time-series forecasting techniques to provide accurate, real-time pollution insights via an interactive dashboard.
Project Goals
- To predict AQI with high accuracy using pollution and weather data.
- To forecast future air quality trends using time-series analysis.
- To visualize real-time data on an interactive map and dashboard.
- To implement MLOps practices for model monitoring and retraining.
Technologies Used
- XGBoost & Random Forest: For regression tasks and AQI prediction.
- LSTM, GRU & ARIMA: For time-series forecasting of pollution levels.
- Pandas & NumPy: For preprocessing 575k+ pollution and weather records.
- Streamlit: For deploying the interactive user dashboard.
- MLOps: Pipelines for model drift detection and automated retraining.
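The drift detection item can be illustrated with a small sketch. The project does not say which drift metric it uses, so the Population Stability Index (PSI) here is an assumption, as are the function names, the synthetic PM2.5 samples, and the 0.2 threshold (a commonly quoted rule of thumb, not a project setting).

```python
import math
import random

def population_stability_index(reference, current, n_bins=10):
    """PSI between a reference sample and a current sample of one feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant feature

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            # Clamp so values outside the reference range land in the edge bins.
            idx = min(max(int((v - lo) / width), 0), n_bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    ref, cur = bin_shares(reference), bin_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))

def needs_retraining(reference, current, threshold=0.2):
    """Hypothetical trigger: flag retraining when PSI exceeds the threshold."""
    return population_stability_index(reference, current) > threshold

# Synthetic PM2.5 readings, for illustration only.
rng = random.Random(0)
train_pm25 = [rng.gauss(60, 15) for _ in range(5000)]
same_dist = [rng.gauss(60, 15) for _ in range(5000)]
shifted = [rng.gauss(90, 15) for _ in range(5000)]

print(needs_retraining(train_pm25, same_dist))
print(needs_retraining(train_pm25, shifted))
```

In a real pipeline a check like this would probably run per feature on a schedule, with the retraining job started only when several features drift together.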
Approach
Data Engineering
- Preprocessed over 5.75 lakh (575,000) pollution and weather records.
- Engineered fusion features combining pollution and weather data.
- Performed statistical analysis to identify key drivers of AQI.
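The fusion step above might look roughly like the following pandas sketch. The column names (`city`, `timestamp`, `pm25`, `wind_speed`, `humidity`), the derived features, and the sample rows are all hypothetical. The project's actual schema and feature set are not documented here.

```python
import pandas as pd

def build_fusion_features(pollution: pd.DataFrame, weather: pd.DataFrame) -> pd.DataFrame:
    """Join pollution and weather rows, then derive combined features."""
    merged = pollution.merge(weather, on=["city", "timestamp"], how="inner")
    merged = merged.sort_values(["city", "timestamp"]).reset_index(drop=True)

    # Illustrative interaction features (assumed, not the project's own).
    merged["pm25_per_wind"] = merged["pm25"] / (merged["wind_speed"] + 1.0)
    merged["pm25_x_humidity"] = merged["pm25"] * merged["humidity"]

    # Rolling mean computed within each city so values never leak across cities.
    merged["pm25_roll3"] = merged.groupby("city")["pm25"].transform(
        lambda s: s.rolling(3, min_periods=1).mean()
    )
    return merged

# Made-up sample rows for two placeholder cities.
cities = ["city_a"] * 3 + ["city_b"] * 2
stamps = pd.to_datetime(
    ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-01", "2024-01-02"]
)
pollution = pd.DataFrame(
    {"city": cities, "timestamp": stamps, "pm25": [120.0, 150.0, 90.0, 40.0, 60.0]}
)
weather = pd.DataFrame(
    {
        "city": cities,
        "timestamp": stamps,
        "wind_speed": [1.0, 0.0, 3.0, 4.0, 1.0],
        "humidity": [0.5, 0.6, 0.4, 0.7, 0.8],
    }
)

features = build_fusion_features(pollution, weather)
print(features[["city", "pm25_per_wind", "pm25_roll3"]])
```

An inner join drops timestamps that are missing from either source. A left join followed by imputation is the alternative when weather coverage is patchy.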
Modeling
- Built Random Forest and XGBoost models for current AQI prediction.
- Implemented ARIMA, GRU, and LSTM networks for forecasting future trends.
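As a rough illustration of the current-AQI regression step, here is a minimal scikit-learn sketch using a Random Forest. The data is synthetic, and the feature names, target formula, and hyperparameters are assumptions. None of it reproduces the project's dataset or its reported results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the fused pollution and weather features.
rng = np.random.default_rng(42)
n = 2000
pm25 = rng.uniform(5, 250, n)
pm10 = pm25 * 1.5 + rng.normal(0, 10, n)
wind = rng.uniform(0, 10, n)
humidity = rng.uniform(0.2, 0.9, n)

# Made-up target relationship, used only so the example has something to learn.
aqi = 0.8 * pm25 + 0.3 * pm10 - 4.0 * wind + rng.normal(0, 8, n)

X = np.column_stack([pm25, pm10, wind, humidity])
X_train, X_test, y_train, y_test = train_test_split(
    X, aqi, test_size=0.2, random_state=42
)

model = RandomForestRegressor(n_estimators=50, random_state=42)
model.fit(X_train, y_train)
pred = model.predict(X_test)

r2 = r2_score(y_test, pred)
mae = mean_absolute_error(y_test, pred)
print(f"R^2: {r2:.3f}, MAE: {mae:.2f}")
```

A random split is fine for this toy data. With real time-stamped readings, a time-ordered split would likely give a more honest estimate, because neighbouring hours are strongly correlated.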
Deployment
- Developed an interactive web dashboard using Streamlit.
- Deployed the application with real-time data integration.
Results & Insights
- Achieved 87% accuracy in AQI prediction.
- Scaled data coverage to 203 cities.
- Real-time dashboard provides actionable insights for environmental monitoring.