Autoregressive Integrated Moving Average (ARIMA) and Long Short-Term Memory (LSTM) Network Models for Forecasting Energy Consumption

— Unlike other sources of energy, electricity cannot be stored at scale. Therefore, an estimation of Energy Consumption (EC) with good accuracy is required to manage demand and supply in the smart grid. Beyond accuracy, reliability is also demanded of the prediction model in order to optimize resource allocation. Therefore, in this study we have implemented and examined two different models: a classical statistical model, the Autoregressive Integrated Moving Average (ARIMA), and a deep learning-based model, the Long Short-Term Memory (LSTM) network. While ARIMA offered powerful statistical analysis but limited robustness, LSTM demonstrated highly accurate results, which may prevent false alarms of over-demand and of low energy consumption. Finally, we conclude by presenting the significant improvement in energy forecasting achieved by LSTM under various evaluation criteria, e.g., Mean Square Error (MSE), Root Mean Square Error (RMSE), and other normalized metrics.


I. INTRODUCTION
Electric vehicles (EVs) are a revolution in terms of reducing environmental pollution [1]. Stringent rules and guidelines for CO2 emission control have been set by countries all over the world, and this has increased the demand for electric vehicles. These government initiatives have produced a major demand to extend the electric vehicle range per charging cycle [2]. Furthermore, EV manufacturing companies have provided options for home charging, ultra-fast charging stations, as well as on-road charging capabilities, adding further convenience for consumers [3]. Even though trending technologies such as cloud computing optimize the charging infrastructure [4], meeting these charging demands requires a tremendous amount of power, which can be obtained either by building new power plants or by properly utilizing existing power plants through forecasting the energy requirements of different areas.
Different tool stacks, such as manual estimation or statistical analysis, can be used to forecast energy demand, but for the last few years AI has played an important role in energy forecasting [5], [6]. Much prior work has used such stacks, for example machine learning [7], [8], Support Vector Machines (SVM) [9], unsupervised learning models such as PCA [10], and various deep learning models, and these studies obtained good results. However, they often lack a proper dataset, relying instead on hypothetical data, and they do not compare the different learning models to establish which one gives better and more efficient results.
We used a dataset of Columbia taken from PJM Interconnection LLC, which will eventually improve demand prediction. Analysis and management of activities such as market acquisition, day-end planning, unit commitment, and economic expenditure require knowledge of future demand and load requirements [11]. Therefore, we worked with two different models, i.e., the LSTM [12] and ARIMA [13] models.
LSTM, the Long Short-Term Memory network, is a special type of Recurrent Neural Network (RNN) and a more recent development of the Artificial Neural Network (ANN) that can learn long-term dependence. LSTMs work very well on many different problems and are now widely used. They are designed to avoid the long-term dependency problem: retaining information over long periods is their default behavior, not something they struggle to learn. An LSTM can delete or add information passed to the next layer, carefully controlled by past results. ARIMA, on the other hand, stands for Auto-Regressive Integrated Moving Average. It is a model class that captures a set of standard temporal structures in time-series data. Such forecasting can also be used to determine upcoming sales in different businesses.
The manuscript is organized as follows: Section II, "Modelling", describes the dataset used in the proposed models. Section III, "Model Formulation", presents the formulation of both models and the evaluation criteria. Section IV, "Implementation", deals with the implementation of both models. The results and discussion are given in Section V, and Section VI gives concluding remarks about the proposed approach and its performance.

II. MODELLING
In this study, we have employed two models to forecast energy consumption: ARIMA and LSTM. Modeling for both is carried out in three steps: A) data description and preprocessing, B) model formulation, and C) evaluation criteria, described hereunder:

A. Data Description and Preprocessing
The data used in this study to analyze the models were accessed from PJM Interconnection LLC, a regional transmission organization of Columbia. The data consist of hourly energy consumption in Megawatts (MW). To formulate and implement the models, we split the data into two parts: one year of data for training and one month of data for testing.
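This split can be sketched as follows (a minimal illustration of our own; the variable names and the synthetic series standing in for the PJM data are assumptions, not from the paper):

```python
import numpy as np

HOURS_PER_YEAR = 365 * 24   # training window (one year of hourly data)
HOURS_PER_MONTH = 30 * 24   # testing window (one month of hourly data)

def train_test_split_hourly(series):
    """Split an hourly EC series (MW) into one year of training
    data followed by one month of testing data."""
    train = series[:HOURS_PER_YEAR]
    test = series[HOURS_PER_YEAR:HOURS_PER_YEAR + HOURS_PER_MONTH]
    return train, test

# illustrative synthetic series standing in for the real EC data
ec = np.random.default_rng(0).uniform(8000, 14000,
                                      size=HOURS_PER_YEAR + HOURS_PER_MONTH)
train, test = train_test_split_hourly(ec)
```

A chronological (rather than random) split like this preserves the temporal order that both ARIMA and LSTM rely on.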

III. MODEL FORMULATION
A. ARIMA
ARIMA is not a novel approach; it is a classical statistical analysis. Its model is based on a time series and combines Auto-Regression (AR) with the concept of a Moving Average (MA). We initially apply a test to the provided sample data to establish its eligibility before fitting it to our ARIMA model. Here we conduct this eligibility check using the Augmented Dickey-Fuller (ADF) test.

1) ADF Test
This test checks the stationarity of the provided sample data, which depends on the presence of a unit root. The test is conducted using the following hypotheses:
Null Hypothesis (H0): A unit root exists.
Alternative Hypothesis (H1): Unit root does not exist.
The ADF test statistic is defined as
$$\mathrm{ADF} = \frac{\hat{\gamma}}{\mathrm{SE}(\hat{\gamma})}, \qquad (1)$$
where $\hat{\gamma}$ is obtained from the following regression:
$$\Delta y_t = \alpha + \beta t + \gamma\, y_{t-1} + \sum_{i=1}^{p} \delta_i\, \Delta y_{t-i} + \varepsilon_t. \qquad (2)$$
Here, $\alpha$ is a constant value, $\beta$ is the coefficient of the time trend, $p$ is the number of autoregressive terms, and $\mathrm{SE}(\hat{\gamma})$ is the standard error of $\hat{\gamma}$. In our case, H0 is rejected and H1 is accepted because the test statistic computed from the sample data is lower (more negative) than the tabulated critical values. In addition, the p-value falls below the 0.05 threshold, indicating an acceptable statistical rate for hypothesis H1.
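The ADF statistic described above can be computed directly by ordinary least squares. The following self-contained sketch (our own illustration, not the authors' code; a library such as statsmodels would normally be used) regresses the differenced series on a constant, a time trend, the lagged level, and p lagged differences:

```python
import numpy as np

def adf_stat(y, p=1):
    """ADF test statistic gamma_hat / SE(gamma_hat) from the regression
       dy_t = alpha + beta*t + gamma*y_{t-1} + sum_i delta_i*dy_{t-i} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    n = len(dy) - p
    # design matrix: constant, time trend, lagged level, p lagged differences
    X = np.column_stack(
        [np.ones(n), np.arange(n), y[p:-1]]
        + [dy[p - i - 1 : p - i - 1 + n] for i in range(p)]
    )
    target = dy[p:]
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])      # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)          # coefficient covariance
    return beta[2] / np.sqrt(cov[2, 2])            # gamma_hat / SE(gamma_hat)
```

A strongly negative statistic (below the critical values) rejects the unit-root hypothesis H0, exactly the decision rule applied in the implementation.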
Furthermore, our data can now be processed without further limitation and is ready for the ARIMA model. Hence, we introduce the following parameters: Auto-Regressive Terms (p), Integrated Process (d), and Moving-Average Terms (q).

2) Auto-Regressive Terms (p)
This parameter defines the linear dependence of the variable on its own past values at time t, represented by the following (3):
$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \varepsilon_t. \qquad (3)$$
It is apparent in (3) that $y_t$ is a linear combination of the past time-series terms with coefficients $\phi_1, \phi_2, \phi_3, \ldots, \phi_p$, and $\varepsilon_t$ is a random shock at time t.

3) Integrated-Process (d)
Most time series are observed to have a trend (vertical or horizontal), which causes instability in the data. This instability may be reduced by differencing the cumulative terms: for a series with a steady trend, the difference of two successive terms becomes approximately constant, as given in (4):
$$y'_t = y_t - y_{t-1}. \qquad (4)$$
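As a quick illustration of why differencing stabilizes a trending series, first-order differencing turns a linear trend into a constant series (a toy example of our own, not from the paper):

```python
import numpy as np

# A pure linear trend: y_t = 5 + 3t. Differencing once (d = 1)
# removes the trend, leaving the constant slope.
trend = 5.0 + 3.0 * np.arange(10, dtype=float)
diffed = np.diff(trend)   # y'_t = y_t - y_{t-1}
```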

4) Moving Average Terms (q)
This is a linear combination of successive past error terms, where q defines the number of error terms included at each time interval t, as defined in (5):
$$y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q}. \qquad (5)$$
Combining (3), (4), and (5) expresses the ARIMA(p, d, q) model as follows:
$$y'_t = c + \sum_{i=1}^{p} \phi_i\, y'_{t-i} + \sum_{j=1}^{q} \theta_j\, \varepsilon_{t-j} + \varepsilon_t, \qquad (6)$$
where $c$ is a constant value.
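To make the combined model concrete, the following sketch (our own illustration; function and parameter names are assumptions) simulates an ARMA(p, q) recursion and then integrates it d times, which is exactly what an ARIMA(p, d, q) process generates:

```python
import numpy as np

def arima_simulate(phi, theta, d, n, c=0.0, seed=0):
    """Simulate n values of an ARIMA(p, d, q) process: an ARMA(p, q)
    recursion over random shocks, then d rounds of integration
    (cumulative summation, the inverse of differencing)."""
    rng = np.random.default_rng(seed)
    p, q = len(phi), len(theta)
    burn = max(p, q, 1)
    e = rng.standard_normal(n + burn)      # random shocks epsilon_t
    y = np.zeros(n + burn)
    for t in range(burn, n + burn):
        ar = sum(phi[i] * y[t - 1 - i] for i in range(p))    # AR part
        ma = sum(theta[j] * e[t - 1 - j] for j in range(q))  # MA part
        y[t] = c + ar + ma + e[t]
    x = y[burn:]
    for _ in range(d):
        x = np.cumsum(x)                    # invert the differencing d times
    return x

series = arima_simulate(phi=[0.5], theta=[0.3], d=1, n=200)
```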

B. LSTM
The Long Short-Term Memory network takes a distinctive approach, as its time-series data have both dependent and independent variables: the target attribute depends on past values of the series itself, so a feedback mechanism must be built in to control this dependence. The patterns in each time series of the provided data must follow the behavior of the EC. On a regular basis, the EC has to stay within certain (min/max) levels with only minor deviations; these deviations become very large during peak events, since consumption is affected by regular and irregular usage (weekday events, weekend activities, weather, etc.). An ANN is an advanced approach in that it can recognize and self-learn the irregular activities recorded in the given data as well as upcoming events. In this case, the EC time series correlates with its own past terms, so we introduce an RNN, which passes self-learned information between nearby nodes. Although an RNN serves our analysis well for short-range patterns, the EC series exhibits long-term dependencies whose values must be stored in memory so that patterns far from the current prediction can be correlated, giving a better forecast. Since a plain RNN cannot keep such long-range memory in its processing, we use the LSTM, which overcomes the vanishing-gradient situation and yields what is known as the LSTM-RNN network, making learning and forecasting better. Fig. 1 represents the components of a single hidden-layer node: an LSTM with Input, Forget, and Output gates along with a Cell State.
Let the input be $x_t$, the weight vectors $W = [W_f, W_i, W_C, W_o]$, the biases $b = [b_f, b_i, b_C, b_o]$, and the output $O_t$ at time t. The forget gate is used to discard information from the memory that is not necessary for forecasting; this task is processed by a sigmoid layer:
$$f_t = \sigma\!\left(W_f \cdot [h_{t-1}, x_t] + b_f\right). \qquad (7)$$
Now, the second important task is to select the information that is necessary for forecasting; this responsibility is fulfilled by the input gate layer, together with a tanh layer that proposes candidate values:
$$i_t = \sigma\!\left(W_i \cdot [h_{t-1}, x_t] + b_i\right), \qquad (8)$$
$$\tilde{C}_t = \tanh\!\left(W_C \cdot [h_{t-1}, x_t] + b_C\right). \qquad (9)$$
Now, the cell state is updated ($C_{t-1}$ to $C_t$), forgetting the things we do not want to keep in our observations and then adding the scaled candidate values $i_t * \tilde{C}_t$:
$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t. \qquad (10)$$
We decide our output values by using the sigmoid and tanh functions (the latter giving values between -1 and 1) of the cell state as follows:
$$o_t = \sigma\!\left(W_o \cdot [h_{t-1}, x_t] + b_o\right), \qquad (11)$$
$$h_t = o_t * \tanh(C_t), \qquad (12)$$
where $h_t$ is our final output from the LSTM network at time t.
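The gate equations above can be collected into a single forward step in plain NumPy (a minimal sketch of our own, not the paper's implementation; the dictionaries `W` and `b`, holding one weight matrix and bias per gate, are an assumed layout):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step. Each W[k] maps the concatenated vector
    [h_prev, x_t] to a gate; b[k] is the matching bias."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])   # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde       # cell-state update
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate
    h_t = o_t * np.tanh(c_t)                 # hidden output
    return h_t, c_t
```

Running this step over a window of past EC values, one value per time step, yields the hidden state from which the forecast is read out.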

C. Model Evaluation Criteria
Various metrics are employed to evaluate the performance of the proposed models: MSE, RMSE, Normalized Mean Square Error (NMSE), Normalized Root Mean Square Error (NRMSE), and Standard Deviation (SD). They are calculated as follows:
$$\mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n}\left(EC_t - \widehat{EC}_t\right)^2,$$
$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}},$$
where $n$ is the number of observations, $EC_t$ stands for the actual energy consumption, and $\widehat{EC}_t$ is the forecast value; NMSE and NRMSE are the corresponding errors normalized by the variability of the observed series.

IV. IMPLEMENTATION
A. ARIMA Implementation
As discussed in the previous section, applying ARIMA requires stationarity tests, so we ran the ADF test on the dataset. The tabulated critical values are (-3.431041, -2.861845, -2.566932), and the test statistic computed from the sample data (-9.514361) is clearly lower than all of them. This indicates that H0 is rejected and H1 is accepted. Furthermore, the p-value is below the 5% significance level, which again supports the alternative hypothesis (H1). As a result, stationarity is satisfied. Plotting the ACF against the PACF gives a better understanding of the ARIMA parameters, which are set at p = 2 and q = 5. The stability tests on the sample data required no transformation of the original series, so d is set to 1 and the series undergoes no further differencing beyond this single step. Finally, we train on the dataset with the ARIMA model so specified; the fitted model provides an accurate EC forecast for the upcoming period.
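Full ARIMA(2, 1, 5) estimation is normally done with a statistics library; as a self-contained illustration of the d = 1 fitting step described above, the sketch below (our own, not the authors' code) differences the series once and fits only the AR part by least squares:

```python
import numpy as np

def fit_ar_on_diff(y, p=2):
    """Difference the series once (d = 1), then estimate the AR(p)
    coefficients of the differenced series by least squares."""
    dy = np.diff(np.asarray(y, dtype=float))
    n = len(dy) - p
    X = np.column_stack(
        [np.ones(n)] + [dy[p - 1 - i : p - 1 - i + n] for i in range(p)]
    )
    coefs, *_ = np.linalg.lstsq(X, dy[p:], rcond=None)
    return coefs  # [c, phi_1, ..., phi_p]

def forecast_next(y, coefs):
    """One-step forecast: predict the next difference, then undo the
    differencing by adding it back onto the last observed level."""
    p = len(coefs) - 1
    dy = np.diff(np.asarray(y, dtype=float))
    d_hat = coefs[0] + sum(coefs[1 + i] * dy[-1 - i] for i in range(p))
    return y[-1] + d_hat
```

The MA(q) terms, omitted here for brevity, would be estimated jointly in a full library fit.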

B. LSTM Implementation
In this experiment, we examined the LSTM. We found that an input layer of size 12 along with a hidden layer of the same size is optimal for forecasting the EC. Furthermore, a dropout rate of 20 percent was applied to each hidden layer of size 12 feeding a single-node output layer. This gave an optimal prediction and forecast of EC, and the overfitting problem was kept at an acceptable level by this ANN structure. Finally, we fitted the LSTM with 30 iterations and a batch size of 12 in order to learn an optimal weight for each edge.
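The data preparation implied by an input layer of size 12 can be sketched as a sliding window over the EC series (our own illustration, not the authors' code; each window of 12 past values predicts the next value):

```python
import numpy as np

WINDOW = 12  # input layer size used in the experiment

def make_windows(series, window=WINDOW):
    """Turn an EC series into (samples, window) input rows and
    next-step targets for training a sequence model such as an LSTM."""
    series = np.asarray(series, dtype=float)
    X = np.stack([series[i : i + window]
                  for i in range(len(series) - window)])
    y = series[window:]
    return X, y
```

Each row of `X` is then fed to the network one value per time step, with `y` as the supervised target.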

V. RESULTS AND DISCUSSIONS
We compared both models in order to determine which achieves the better forecast, ARIMA or LSTM. Based on the evaluation criteria, we have seen significant improvement with the LSTM model: we recorded an NRMSE of 0.22 MW for ARIMA and 0.03 MW for LSTM. Fig. 3 presents the forecast EC values from both models.

VI. CONCLUSION
This work has shown that the rapid growth of EVs is increasing energy demands. To manage the demand and supply of energy appropriately, we need an estimated value of EC. Therefore, in this study we employed the classical statistical model ARIMA alongside the deep learning-based model LSTM. In summary, both ARIMA and LSTM produced workable EC forecasts, but LSTM is more accurate than ARIMA and reduces the number of false over-demand alarms. Furthermore, LSTM provides a robust and optimal solution when run alongside the ARIMA model. Further analysis will focus on improving the time complexity of the code that implements the deep learning methodology.

CONFLICT OF INTEREST
The authors declare that they do not have any conflict of interest.