Markov Switching Auto Regression (MS-AR) Algorithm

Since the last update I've spent a few late nights finishing off the full code implementation of the Markov Switching Auto Regression (MS-AR) algorithm. This means not just finding the $\mu$ and $\sigma$ values for each of the states, but also fitting a regression model with a pre-specified autoregressive order to each state, so each state develops a unique relationship between periodic returns within the data.

This implementation is built on top of the Expectation Maximisation (EM) algorithm, which iteratively tunes each state's parameters until the coefficients of each state's regression function converge. As a result the algorithm is essentially built in three sections:


Initialisation

Due to the nature of the EM algorithm it is circular, so to begin we estimate some starting values that can be used to kickstart the iterative process of tuning parameters - think of this as the starter motor in a car: it gets the engine going initially. For this we just need to estimate initial values for each state's regressive function, which can be done through a standard clustering algorithm where we cluster the data into $K$ clusters, one per state. Next we perform Ordinary Least Squares (OLS) regression on each cluster individually. This is like performing Weighted Least Squares regression, but because we have used hard assignments - i.e. a value either is or isn't in a given cluster - there's no need to multiply by the probabilities of being in each state/cluster, so it's simpler to apply the OLS algorithm to each cluster individually, which has the same effect.

The formula for OLS regression is:

$$\hat{\beta} = (X^\top X)^{-1} X^\top y$$

Now that we have estimated parameters for each state's regression function, we can produce a standard deviation for each state. For this we calculate a prediction for each periodic return based on each state's regression function, producing a matrix with a shape of $N \times K$, where $N$ is the number of periodic returns in our data set and $K$ is the number of identified states. Then for every value in each state we calculate the residual of the prediction, filter the residuals based on which cluster each target value belongs to, and use all residuals belonging to target values in cluster $k$ to calculate the initial standard deviation for state $k$. This gives us the initial values we need for the EM algorithm: regression function coefficients ($\beta$ values) and a standard deviation ($\sigma$) for each state.
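
To make this concrete, here is a minimal sketch of the initialisation in Python - assuming numpy and scikit-learn's KMeans, and with helper names (`lagged_features`, `initialise`) that are my own rather than anything from the actual repository:

```python
import numpy as np
from sklearn.cluster import KMeans

def lagged_features(y, p):
    """Build the feature matrix [1, y_{t-1}, ..., y_{t-p}] and aligned targets y_t."""
    cols = [np.ones(len(y) - p)]                       # intercept column
    cols += [y[p - lag: len(y) - lag] for lag in range(1, p + 1)]
    return np.column_stack(cols), y[p:]

def initialise(y, p, K):
    """Hard-cluster the returns, then fit OLS per cluster for starting values."""
    X, target = lagged_features(y, p)
    labels = KMeans(n_clusters=K, n_init=10).fit_predict(target.reshape(-1, 1))
    betas, sigmas = [], []
    for k in range(K):
        Xk, yk = X[labels == k], target[labels == k]
        beta = np.linalg.lstsq(Xk, yk, rcond=None)[0]  # OLS: (X'X)^-1 X'y
        betas.append(beta)
        sigmas.append(np.std(yk - Xk @ beta))          # initial sigma for state k
    return np.array(betas), np.array(sigmas)
```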


Expectation Step

The aim of the expectation step is to get the probability that each value is in each state, so the output would be an $N \times K$ matrix, where $N$ is the number of individual periodic return data points minus the size of the autoregressive order (ARO) and $K$ is the number of states being used within the system.

First we generate a new regressive prediction for each state, using each state's regressive coefficients, for every data value in the time series. This is achieved by using a feature matrix, $X$, that has dimension $N \times (p+1)$, where $p$ is the autoregressive order. Each row within this matrix has the structure $[1, y_{t-1}, y_{t-2}, \ldots, y_{t-p}]$, where $y$ is the target data value - in this case periodic returns - which leaves us with a matrix of time-lagged data. Using this matrix makes it much simpler to calculate the prediction for each state based on the state's regression function coefficients, as we just take the dot product of the time-lagged data, $X$, with the vector of the state's regression function coefficients.
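
Reusing the hypothetical `lagged_features` helper from the initialisation sketch, the predictions for every state fall out of a single matrix product:

```python
# X has shape (N, p + 1); betas has shape (K, p + 1),
# so the result has one column of predictions per state.
X, target = lagged_features(y, p)
predictions = X @ betas.T   # shape (N, K)
```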

After we have these predictions for each value in every state, we calculate the residual between each state's prediction and the actual target value, $y_t$. Then, using the residuals, we take the given state's calculated $\sigma$ value along with an assumed $\mu$ of 0 and plug them into a continuous probability density function to calculate the likelihood that the given residual belongs to the given state. The formula for the probability density function (PDF) for a normal distribution is:

$$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$

The final step is to row-normalise the likelihoods across the states, scaling them to sum to 1 whilst maintaining their relative differences. This gives us the probability that each residual is in each state, rounding out the expectation step of the algorithm.
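
Putting the expectation step together under the same assumptions (scipy for the normal PDF; `e_step` is again an illustrative name of my own):

```python
from scipy.stats import norm

def e_step(X, target, betas, sigmas):
    """Return the (N, K) matrix of state membership probabilities."""
    residuals = target[:, None] - X @ betas.T                    # one column per state
    likelihoods = norm.pdf(residuals, loc=0.0, scale=sigmas)     # assumed mu of 0
    return likelihoods / likelihoods.sum(axis=1, keepdims=True)  # row-normalise
```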


Maximisation Step

This is the step where we re-tune the regression function coefficients and the weighted standard deviation based on the probabilities produced in the expectation step. First we take the probability values generated by the expectation step for each residual in every state. Then we calculate the new $\beta$ values for the given state using a Weighted Least Squares (WLS) function that takes the weights (the probability that each residual is in the given state), $W$, the time-lagged periodic returns matrix from the expectation step, $X$, and finally the target/actual value for each data point in the time series, $y$. Instead of having to iteratively tune the regression function coefficients, we just use a series of matrix multiplications to instantly calculate the optimal regression coefficients.

The function for WLS:

$$\hat{\beta} = (X^\top W X)^{-1} X^\top W y$$

$\hat{\beta}$ then represents the $\beta$ values for the regression function of the given state, which we use to calculate new residuals for every value in each state. The final step is to take these new residuals, weight them by the corresponding probabilities calculated in the expectation step, and find the standard deviation of the resulting values for each state to get the given state's weighted standard deviation. The maximisation step then returns the new $\beta$ values and standard deviation for each state.
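
A sketch of the maximisation step to round things off - note that I've used the conventional probability-weighted standard deviation here, which may differ in detail from the exact weighting used in the repository:

```python
def m_step(X, target, probs):
    """Re-estimate beta and sigma for each state from the E-step probabilities."""
    betas, sigmas = [], []
    for k in range(probs.shape[1]):
        w = probs[:, k]
        XtW = X.T * w                                  # X'W without forming W
        beta = np.linalg.solve(XtW @ X, XtW @ target)  # (X'WX)^-1 X'Wy
        resid = target - X @ beta
        betas.append(beta)
        sigmas.append(np.sqrt((w * resid ** 2).sum() / w.sum()))  # weighted std
    return np.array(betas), np.array(sigmas)
```

The three pieces then slot into the usual EM loop:

```python
betas, sigmas = initialise(y, p, K)
X, target = lagged_features(y, p)
for _ in range(100):   # or loop until the parameters stop changing
    probs = e_step(X, target, betas, sigmas)
    betas, sigmas = m_step(X, target, probs)
```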

The code for this can be found on my GitHub page within the market-trading-algorithm repository.