SAS: Time Series Forecasting - ARIMA

Rajat | 1 Comment | SAS Statistics, Time Series

In this tutorial, we will cover how to perform ARIMA with SAS, along with an explanation of how it works. We hope you have gone through Part 1 of this series.

Table of Contents

1. Data Preparation Steps for ARIMA Modeling
   1. Step 1 : Check the time series
   2. Step 2 : Check the volatility of the series
   3. Step 3 : Treatment of a volatile series
   4. Step 4 : Check for non-stationarity
   5. Step 5 : Make non-stationary data stationary
   6. Step 6 : Check seasonality
   7. Step 7 : Split data into training and validation

--------------------------------------------------------------------------------

DATA PREPARATION STEPS FOR ARIMA MODELING

1. Check whether the variance changes with time (volatility). For ARIMA, the volatility should not be very high.
2. If the volatility is very high, we need to make the series non-volatile.
3. Check for stationarity - a series should be stationary before performing ARIMA.
4. If the data is non-stationary, we need to make it stationary.
5.
Check for seasonality in the data.

Data File Location

> Library - SASHELP
> Data set - AIR

STEP 1 : CHECK THE TIME SERIES

As a matter of practice, we first plot the time series and take a cursory look at it. This can be done directly in SAS using the following code:

> proc sgplot data = sashelp.air;
>   series x = date y = air;
> run;
> quit;

It would give you the following plot in the results window:

[Plot: SAS Time Series Modeling]

It is clear from the chart above that the AIR series has an increasing trend and a consistent pattern over time. The peaks occur at a constant time interval, which indicates the presence of seasonality in the series.

> This is certainly a non-stationary series, so we need to make it stationary first.

Practically, ARIMA works well for such series with a clear trend and seasonality. We first separate out and capture the trend and seasonality components of the time series, which leaves us with a stationary series. This stationary series is forecasted using ARIMA, and the final forecast then incorporates the previously captured trend and seasonality. We will understand this in more detail in the later steps.

STEP 2 : CHECK THE VOLATILITY OF THE SERIES

Volatility is the degree of variation of a time series over time. For ARIMA, the volatility should not be very high.
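Outside SAS, "high volatility" can be given a rough number rather than judged only by eye. The plain-Python sketch below is illustrative, not from the article: it uses a synthetic series standing in for sashelp.AIR (growing trend plus growing seasonal swings) and compares the spread of the first and second halves of the differenced series.

```python
# Hypothetical illustration: quantify volatility by comparing the spread
# of the first and second halves of a differenced series.
import math
import statistics

# Synthetic monthly series with a trend and growing seasonal swings,
# mimicking the fan shape of the AIR data.
series = [100 + 2 * t + (0.5 * t) * math.sin(2 * math.pi * t / 12)
          for t in range(144)]

# Remove the trend roughly by taking first differences, then compare
# the spread (standard deviation) of the two halves.
diffs = [b - a for a, b in zip(series, series[1:])]
half = len(diffs) // 2
early_sd = statistics.stdev(diffs[:half])
late_sd = statistics.stdev(diffs[half:])

print(f"early spread = {early_sd:.2f}, late spread = {late_sd:.2f}")
print("volatile" if late_sd / early_sd > 1.5 else "stable")
```

If the late spread is much larger than the early spread, the series fans out over time, which is exactly what the scatter plot in the next step shows visually.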
To check the volatility of the time series, we draw a scatter plot using the following SAS code:

> proc gplot data = sashelp.air;
>   plot date * air;
> run;
> quit;

It would give you the following plot in the results window:

[Plot: Check the volatility of the series]

The highlighted area shows the diverging (fan-shaped) pattern of the scatter plot, which indicates that the data is volatile. Ideally, for ARIMA modeling, the band should be parallel rather than fan-shaped.

STEP 3 : TREATMENT OF A VOLATILE SERIES

We need to make the series non-volatile before moving ahead, so we transform the AIR series to remove the volatility. Generally a hit-and-trial method is used to pick the transformation, but we would suggest not wasting your time: the Box-Cox transformation can recommend a suitable transformation for you.

> proc transreg data = sashelp.air;
>   model boxcox(air) = identity(date);
> run;

You get the following plot along with the lambda value, which is 0 in this case. Based on this lambda value, you can decide the transformation with the help of the table below.

[Table: Box-Cox transformation - lambda values and suggested transformations]

In our case, lambda = 0 suggests a log transformation, so we apply it. In a new dataset (Masterdata) we create a new variable (Log_AIR):

> data masterdata;
>   set sashelp.air;
>   log_air = log(air);
> run;

We can check the volatility of the transformed series again, just to be sure, using a scatter plot as described above.

STEP 4 : CHECK FOR NON-STATIONARITY

Now, on the transformed series, we check whether the series is stationary or non-stationary. For performing ARIMA, a series should be stationary; if it is non-stationary, we make it stationary (for more explanation of stationarity, read Part 1 of this series). Rather than judging stationarity visually as we did in Step 1, we now use the Augmented Dickey-Fuller unit root test.
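Looking back at the Step-3 log transform, here is a small plain-Python sketch (synthetic data and illustrative names, not the article's code) of why lambda = 0, i.e. taking logs, tames the fan shape: on a multiplicative series, logs turn a growing seasonal swing into a roughly constant one.

```python
# Hypothetical illustration of variance stabilization via a log transform.
import math
import statistics

# Synthetic multiplicative series: the level grows, and the seasonal
# swing grows in proportion to the level (fan shape).
raw = [(100 + 2 * t) * (1 + 0.3 * math.sin(2 * math.pi * t / 12))
       for t in range(144)]
logged = [math.log(v) for v in raw]

def swing(xs):
    """Spread of the last two years relative to the first two years."""
    return statistics.stdev(xs[-24:]) / statistics.stdev(xs[:24])

print(f"raw swing ratio    = {swing(raw):.2f}")     # well above 1
print(f"logged swing ratio = {swing(logged):.2f}")  # close to 1
```

A ratio far above 1 on the raw series and close to 1 after the log is the numeric counterpart of the "parallel band" we want to see in the scatter plot.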
Unit Root - Homogeneous Non-Stationary Data

Dickey-Fuller test

> The Dickey-Fuller test tests the null hypothesis that the time series has a unit root against the alternative of stationarity.

Null hypothesis : non-stationary
Alternative hypothesis : stationary

There are three forms in which the Dickey-Fuller test statistic can be computed:

1. Zero Mean - no intercept; the series is a random walk without drift.
2. Single Mean - includes an intercept; the series is a random walk with drift.
3. Trend - includes an intercept and a trend; the series is a random walk with a linear trend.

> All the above test statistics are computed from an OLS regression model.

Drawback of the ADF test: there is uncertainty about which version of the test to use, i.e. whether to include the intercept and time trend terms. Inappropriate exclusion or inclusion of these terms substantially affects the reliability of the test.

> Using prior knowledge (for instance, from a visual inspection of the series) about whether the intercept and time trend should be included is the most commonly recommended way to overcome this difficulty.

We run PROC ARIMA with the STATIONARITY=(ADF) option:

> proc arima data = masterdata;
>   identify var = log_air stationarity = (adf);
> run;
> quit;

The code above produces many outputs, part of which is used for checking stationarity:

[Output: ARIMA - Augmented Dickey-Fuller Unit Root Tests]

Important note:

> Check the tau statistic (Pr < Tau) in the ADF Unit Root Tests table. It should be less than 0.05 to conclude that the data is stationary at the 5% level of significance.

STEP 5 : MAKE NON-STATIONARY DATA STATIONARY

Having established that the series is non-stationary, we need to make it stationary. The differencing process is used for this.
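As background to the test output above, the zero-mean Dickey-Fuller regression can be sketched in plain Python. This is an illustration of the idea (regress the first difference on the lagged level with no intercept and look at the t-statistic of the slope), not PROC ARIMA's internals; all names and data are made up.

```python
# Hypothetical illustration of the zero-mean Dickey-Fuller statistic.
import math
import random

def df_tau(series):
    """Regress delta-x_t on x_{t-1} (no intercept); return beta / se(beta)."""
    y = [b - a for a, b in zip(series, series[1:])]  # delta x_t
    x = series[:-1]                                  # x_{t-1}
    sxx = sum(v * v for v in x)
    beta = sum(a * b for a, b in zip(x, y)) / sxx
    resid = [b - beta * a for a, b in zip(x, y)]
    s2 = sum(e * e for e in resid) / (len(y) - 1)
    return beta / math.sqrt(s2 / sxx)

rng = random.Random(42)

# Random walk (has a unit root): tau stays small, null is not rejected.
walk = [0.0]
for _ in range(500):
    walk.append(walk[-1] + rng.gauss(0, 1))

# Stationary AR(1) with phi = 0.5: tau is strongly negative, null rejected.
ar1 = [0.0]
for _ in range(500):
    ar1.append(0.5 * ar1[-1] + rng.gauss(0, 1))

print(f"tau(random walk) = {df_tau(walk):.2f}")
print(f"tau(AR1)         = {df_tau(ar1):.2f}")  # far below the ~ -1.95 cut-off
```

The actual test compares tau against Dickey-Fuller critical values (and the ADF version adds lagged difference terms); SAS reports the resulting p-value as Pr < Tau.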
> Differencing : transformation of the series to a new time series whose values are the differences between consecutive values.

The differencing procedure may be applied more than once, giving rise to "first differences", "second differences", and so on.

Differencing orders:

1st order : ∇x_t = x_t - x_{t-1}
2nd order : ∇²x_t = ∇x_t - ∇x_{t-1} = x_t - 2x_{t-1} + x_{t-2}

> It is unlikely that more than two orders of differencing would ever be required.

Note : if there is a physical explanation for a trend or seasonal cycle, use regression to make the series stationary.

For this we use the output of the Step 4 code itself. When we ran that code, we got an "Autocorrelation Check for White Noise" table along with the "Augmented Dickey-Fuller Unit Root Tests". Looking at the "Autocorrelation Check for White Noise", we decide the order(s) of differencing required.

[Output: Autocorrelation Check for White Noise, rendered as a heat map. The heat map was made in Excel for demonstration; SAS output is black and white only.]

The first row of the autocorrelation matrix above shows the correlation of the time series with lags 1 to 6, the second row with lags 7 to 12, and so on. The same is visible in the ACF chart of the PROC ARIMA output.

> We can see in the matrix above that the highest autocorrelation is with the 1st lag; it then decreases but rises again to a local peak at the 12th lag.

STEP 6 : CHECK SEASONALITY

The high correlation with the 1st lag indicates the presence of a trend, and that with the 12th lag indicates an annual seasonality. Hence we need to difference at lags 1 and 12. We perform the differencing and check stationarity again:

> proc arima data = masterdata;
>   identify var = log_air (1,12) stationarity = (adf);
> run;
> quit;

We have used 1 and 12 in brackets to apply a first difference followed by a lag-12 (seasonal) difference.
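The differencing behind (1,12) can be sketched in plain Python (an illustrative helper, not the article's SAS code), which also verifies the first- and second-order formulas given above.

```python
# Hypothetical illustration of differencing at a given lag.

def difference(xs, lag=1):
    """Difference a series at the given lag: y_t = x_t - x_{t-lag}."""
    return [xs[t] - xs[t - lag] for t in range(lag, len(xs))]

x = [3, 7, 12, 18, 25, 33]

d1 = difference(x)   # first differences: x_t - x_{t-1}
d2 = difference(d1)  # second differences: x_t - 2*x_{t-1} + x_{t-2}

print(d1)  # [4, 5, 6, 7, 8]
print(d2)  # [1, 1, 1, 1]

# The (1,12) specification corresponds to a first difference followed by a
# lag-12 seasonal difference:
#   seasonal = difference(difference(series), lag=12)
```

Note that a quadratic-looking trend (growing first differences) becomes constant after the second difference, which is why more than two orders are rarely needed.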
Check whether the data is stationary:

> Check the tau statistic (Pr < Tau) in the ADF Unit Root Tests table again and see whether the value is < 0.05, which says the data is stationary at the 5% level of significance.

How this differencing actually worked:

1. The first difference (1) removes the trend, but the seasonality still exists.
2. The lag-12 difference (12) removes the seasonality.

How to do it in MS Excel:

> First subtract the 1st lag from each observation and plot the result. Then, in this new series, subtract the 12th lag from each observation.

STEP 7 : SPLIT DATA INTO TRAINING AND VALIDATION

Now we can split the data into training and validation samples. We cannot use random sampling, as we do for regression models, to split the data. Instead, we use the most recent data for validation and the remaining data to train the model. We will develop the ARIMA model and forecast on the training part and check the results on the validation part.

> data training validation;
>   set masterdata;
>   if date >= '01Jan1960'd then output validation;
>   else output training;
> run;

Next step: follow Part 3 of this series to learn how to train an ARIMA model on a training dataset using SAS.

> This article was originally written by Rajat Agarwal; Deepanshu later gave the post its final touch. Rajat is an analytics professional with more than 8 years of work experience in diverse business domains. He has gained expert knowledge of Excel and SAS, loves to create innovative and imaginative dashboards with Excel, and is the founder and lead author cum editor at Ask Analytics.

Related Posts
* How to Calculate Correlation in SAS (with Examples)
* SAS : Calculate AUC of Validation Data
* SAS: Time Series Forecasting - ARIMA [Part 3]
* How to Build a Random Forest Model in SAS
* How to Build a Decision Tree in SAS

1 Response to "SAS: Time Series Forecasting - ARIMA"

1.
Anonymous - July 24, 2017 at 6:22 AM

   Despite doing everything - using MINIC - my autocorrelation is still significant. What should I do?

   Autocorrelation Check of Residuals

   To Lag   Chi-Square   DF   Pr > ChiSq   Autocorrelations
      6       19.46       4     0.0006     -0.045 -0.094  0.282 -0.178  0.079  0.279
     12       41.37      10     <.0001     -0.260  0.024  0.311 -0.199  0.005 -0.121
     18       64.20      16     <.0001     -0.275  0.212 -0.075 -0.193  0.207 -0.072
     24       92.59      22     <.0001     -0.125  0.151 -0.180 -0.168  0.272 -0.250