Hello, this is the entirety of the ad hoc code used to collect and scrape data and to train models for my SIH project.
- Data is sourced from State Load Dispatch Center (SLDC) reports and Grid India reports.
- The Grid India report data came as PDFs. These were converted to XLSX, and the XLSX files were then merged into a consistent dataset.
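Below is a minimal sketch of the XLSX merge step. It assumes pandas, with openpyxl installed so that `.xlsx` files can be read. It also assumes each converted report has a timestamp column. The column names `timestamp` and `demand_mw` are hypothetical stand-ins for whatever the real reports use. The sketch normalises headers, parses timestamps, drops rows duplicated across overlapping reports, and sorts the result chronologically:

```python
from pathlib import Path

import pandas as pd


def merge_frames(frames: list[pd.DataFrame], time_col: str = "timestamp") -> pd.DataFrame:
    """Combine per-report frames into one consistent, time-ordered dataset."""
    cleaned = []
    for df in frames:
        df = df.copy()
        # Normalise header spelling/spacing so columns line up across reports.
        df.columns = [str(c).strip().lower().replace(" ", "_") for c in df.columns]
        df[time_col] = pd.to_datetime(df[time_col])
        cleaned.append(df)
    merged = pd.concat(cleaned, ignore_index=True)
    # Reports can overlap: keep the first row seen for each timestamp.
    merged = merged.drop_duplicates(subset=time_col, keep="first")
    return merged.sort_values(time_col).reset_index(drop=True)


def merge_xlsx_reports(folder: str, time_col: str = "timestamp") -> pd.DataFrame:
    """Read every converted .xlsx report in `folder` and merge them."""
    paths = sorted(Path(folder).glob("*.xlsx"))
    # pd.read_excel needs an Excel engine such as openpyxl for .xlsx files.
    return merge_frames([pd.read_excel(p) for p in paths], time_col=time_col)
```

For example, `merge_xlsx_reports("reports/")` would merge every `.xlsx` file in a hypothetical `reports/` folder. The cleaning logic lives in a separate `merge_frames` function so it can be tested on in-memory frames, without any files.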
- Data from the SLDC (at a 5-minute frequency) was retrieved using Power Query M, the formula language behind Power Query in Microsoft Excel.
- I would like to apologise to anyone who has to look through this absolute mess of a codebase.
- As the data was complete and accurate, we shifted our focus to testing different models, cross-referencing and validating our results.
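For a 5-minute series, the claim that the data is complete can be checked mechanically. The sketch below assumes pandas and a hypothetical `timestamp` column. It lists every 5-minute slot that is missing between the first and last reading:

```python
import pandas as pd


def missing_5min_slots(df: pd.DataFrame, time_col: str = "timestamp") -> pd.DatetimeIndex:
    """Return the 5-minute timestamps absent between the first and last reading."""
    ts = pd.DatetimeIndex(pd.to_datetime(df[time_col]))
    # Every timestamp a gap-free 5-minute series should contain.
    expected = pd.date_range(ts.min(), ts.max(), freq="5min")
    # Slots on the expected grid that never appear in the data.
    return expected.difference(ts)
```

An empty result means the series has no gaps at 5-minute resolution. A non-empty result lists exactly which intervals need to be re-fetched or imputed.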
- Here are our findings:
- The ARIMA+ model offered by Vertex AI for on-demand custom training was the least accurate, yielding around 89% accuracy.
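The README does not say how "accuracy" was measured for the forecasts. One common convention is 100 minus the mean absolute percentage error (MAPE). This is an assumption, not necessarily the metric the project used. A NumPy sketch:

```python
import numpy as np


def mape_accuracy(actual, forecast) -> float:
    """Accuracy as 100 - MAPE, in percent; undefined if any actual value is zero."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    if np.any(actual == 0):
        raise ValueError("MAPE is undefined when an actual value is zero")
    # Mean of |error| / |actual|, expressed as a percentage.
    mape = np.mean(np.abs((actual - forecast) / actual)) * 100
    return 100.0 - mape
```

For example, `mape_accuracy([100, 200], [90, 220])` gives about 90, because both forecasts are off by 10%. Stating the metric alongside a figure like "89% accuracy" makes results comparable across models.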
- The Seasonal Autoregressive Integrated Moving Average (SARIMA) model was one of the best choices we could pursue. Here are the specific values we used:
- The series was stationary according to the Augmented Dickey-Fuller (ADF) test, with a p-value of 0 (this could be a methodological error).
- The ADF statistic was -23.047657363554325.
- The non-seasonal orders (p, d, q) were set to (1, 1, 1).
- The seasonal orders (P, D, Q) and the seasonal period m were set to 0, 1, 5 and 12 respectively.
- The data is clearly seasonal: demand peaks steeply in May and then tapers off slowly through December.
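The yearly shape described above can be confirmed by averaging demand per calendar month. This is a minimal pandas sketch; the `timestamp` and `demand_mw` column names are hypothetical:

```python
import pandas as pd


def monthly_profile(df: pd.DataFrame, time_col: str = "timestamp",
                    value_col: str = "demand_mw") -> pd.Series:
    """Mean demand per calendar month (index 1-12), exposing the yearly seasonal shape."""
    months = pd.to_datetime(df[time_col]).dt.month
    return df[value_col].groupby(months).mean()
```

On this dataset, `monthly_profile(df).idxmax()` should return 5 (May) if the peak described above holds. The values for June to December should then decrease.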
- Beyond the seasonality, the data has unexplained spikes and dips. These need to be accounted for at the monthly, weekly and daily levels before the model can coherently predict demand for a 5-minute interval on a specific day.
- There are periodic peaks and downturns, probably influenced by weekends.
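One way to give a model the monthly, weekly and daily context described above is to derive calendar features from each timestamp. The sketch below is one hypothetical feature set, not the project's; it assumes pandas and a `timestamp` column. It adds the month, day of week, a weekend flag, and the index of the 5-minute slot within the day:

```python
import pandas as pd


def add_calendar_features(df: pd.DataFrame, time_col: str = "timestamp") -> pd.DataFrame:
    """Return a copy of df with month, weekday, weekend and 5-minute slot columns."""
    out = df.copy()
    ts = pd.to_datetime(out[time_col])
    out["month"] = ts.dt.month                  # lets a model learn the yearly pattern
    out["day_of_week"] = ts.dt.dayofweek        # Monday = 0 ... Sunday = 6
    out["is_weekend"] = (out["day_of_week"] >= 5).astype(int)
    # 288 slots per day: hour * 12 + minute // 5 gives 0..287.
    out["slot_of_day"] = ts.dt.hour * 12 + ts.dt.minute // 5
    return out
```

These columns can be used as exogenous regressors or as inputs to a tree-based model. They make the weekend effect an explicit variable rather than something the model must infer from raw timestamps.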
- The data fluctuates significantly over each 24-hour period, showing strong intraday variation.
- Peaks are observed in the late evening and during working hours (morning to early evening).
- There is a sharp drop in demand during the night (early morning hours, between 2:00 AM and 6:00 AM).
- Demand rises gradually after 6:00 AM, holds at a stable level during the daytime and declines again after the evening.
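The daily pattern described above can be quantified by averaging demand for each hour of the day across all days. This is a minimal pandas sketch with hypothetical column names:

```python
import pandas as pd


def intraday_profile(df: pd.DataFrame, time_col: str = "timestamp",
                     value_col: str = "demand_mw") -> pd.Series:
    """Average demand for each hour of the day (index 0-23) across all days."""
    hours = pd.to_datetime(df[time_col]).dt.hour
    return df[value_col].groupby(hours).mean()
```

If the observations hold, `intraday_profile(df).idxmin()` should fall between 2 and 5, the night-time trough. The profile should then rise after hour 6 and peak in the working hours and late evening.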