Algorithmic Portfolio Optimization in Python
Author :: Kevin Vecmanis
In this installment I demonstrate the code and concepts required to build a Markowitz Optimal Portfolio in Python, including the calculation of the capital market line. I build flexible functions that can optimize portfolios for Sharpe ratio, maximum return, and minimal risk.
In this article you will learn:
- How to fetch stock market data from Quandl
- How to create a portfolio simulation function
- What the Markowitz Bullet is and how to plot one
- What the optimization process is all about
- What minimization functions are
- How to create an optimization function with scipy.optimize
- What the efficient frontier is and how to plot it
- What the capital market line is and how to plot it
Table of Contents
- Introduction
- Fetching data from Quandl
- Portfolio Simulation Function
- The Markowitz Bullet
- The Optimization Process
- Minimization Functions
- The Optimization Function
- The Efficient Frontier
- The Capital Market Line
Introduction
Portfolio optimization is a mathematically intensive process that can be accomplished with a variety of optimization functions that are freely available in Python.
In Part 1 of this series, we’re going to accomplish the following:
- Build a function to fetch asset data from Quandl.
- Show how this data can be converted into return matrix and a covariance matrix.
- Show how to simulate a basket of thousand of portfolios using the same assets.
- Show how portfolio weights can be optimized for either volatility, returns, or Sharp Ratio.
- Build the Markowitz efficient frontier.
- Build the Capital market line.
- Calculatet the optimal portfolio weights based on the intersection of the capital market line with the efficient frontier.
The theory behind the capital market line and efficient frontier is outside the scope of this post, but plenty of material is available with a quick google search on the topic. Explanations of concepts will be provided throughout this post as required. I assume here that the reader has a basic familiarity with modern portfolio theory (MPT).
Fetching data from Quandl
The first function we define pulls assets from Quandl based on a list of ticker names that we provide in the variable ‘assets’.
We’re going to complete this post by optimizing portfolio weights for a basket of five assets:
TLT: Long bond ETF
GLD: Gold
SPY: S&P 500 ETF
QQQ: Nasdaq ETF
VWO: Emerging Market ETF
Quandl data comes with a bunch of different column headers - but here we will strip out only the adjusted closes of each asset by creating a mask.
['TLT_Adj_Close', 'GLD_Adj_Close', 'SPY_Adj_Close', 'QQQ_Adj_Close', 'VWO_Adj_Close']
Now our dataframe will only contain columns with the adjusted closes listed above.
By plotting the normalized adjusted closes we can see the relative performance of each asset. The ideal portfolio will benefit from assets that tend to covary in opposing ways.
Next we can calculate the daily average returns for each asset in the dataset by doing the following
TLT_Adj_Close 0.000226
GLD_Adj_Close 0.000307
SPY_Adj_Close 0.000329
QQQ_Adj_Close 0.000493
VWO_Adj_Close 0.000269
dtype: float64
To get the average annualized returns we multiple by 252 trading days
TLT_Adj_Close 0.057061
GLD_Adj_Close 0.077453
SPY_Adj_Close 0.083012
QQQ_Adj_Close 0.124264
VWO_Adj_Close 0.067879
dtype: float64
TLT_Adj_Close GLD_Adj_Close SPY_Adj_Close QQQ_Adj_Close VWO_Adj_Close
TLT_Adj_Close 0.000076 0.000012 -0.000043 -0.000042 -0.000057
GLD_Adj_Close 0.000012 0.000142 0.000005 0.000001 0.000037
SPY_Adj_Close -0.000043 0.000005 0.000141 0.000137 0.000189
QQQ_Adj_Close -0.000042 0.000001 0.000137 0.000161 0.000188
VWO_Adj_Close -0.000057 0.000037 0.000189 0.000188 0.000339
Likewise, we can get the annualized covariance matrix for these 5 assets accordingly
TLT_Adj_Close GLD_Adj_Close SPY_Adj_Close QQQ_Adj_Close VWO_Adj_Close
TLT_Adj_Close 0.019235 0.003047 -0.010901 -0.010618 -0.014447
GLD_Adj_Close 0.003047 0.035806 0.001322 0.000274 0.009242
SPY_Adj_Close -0.010901 0.001322 0.035492 0.034597 0.047544
QQQ_Adj_Close -0.010618 0.000274 0.034597 0.040468 0.047254
VWO_Adj_Close -0.014447 0.009242 0.047544 0.047254 0.085304
The following single line of code generates a random array of weights that sum to 1.0. In the portfolio, one of the assumptions is that all funds will deployed to the assets in the portfolio according to some weighting.
[0.1158917 0.40789785 0.08818814 0.12767493 0.26034738]
From these weights, we can calculate the expected weighted return of the portfolio of assets using these random weights.
0.07906374674710082
The next thing we do is calculate the portfolio variance by way of the following. Here we’re using np.dot
to take the dot product of the three arguments. Weights is transposed into a column matrix from a row matrix. It might look fancy and confusing, but without transposing the weights we would end up multiplying all variances by all weights, which isn’t what we want.
0.020002880597943383
0.14143154032231772
Quick summary of what we just did!
The previous lines of code generated the portfolio mean return and portfolio volatility for one set of randomly selected weights. In order to find an optimal solution, we need to repeat this process iteratively many thousands of times to determine what the optimal asset weights might be. We’re going to do this next.
While we’re at it, we might as wrap all of this up into a function.
The Portfolio Simulation Function
Here we’ll pass our list of assets to the portfolio_simulation
function and have it randomly generate 3000 portfolios and plot them by their volatility and return.
The colorbar shows us the sharp ratio. Note that the sharp ratio calculation here assumes the risk-free rate is 0
Elapsed Time: 16.77 seconds
The Markowitz Bullet
The resulting plot above is called the Markowitz Bullet. You might have noticed that the sprawl of dots - each representing one portfolio in the simulation - starts to form a sideways parabola. This shape lends itself extremely well to quadratic optimization functions because there is only one truly global minima and no other “false minima” that the optimization algorithm might get “stuck in”.
The Optimization Process
All optimization and minimization functions require some kind of metric to optimize on - usually this means minimizing something. This often involves tradeoffs because even though multi-variables can be considered, typically you can only minimize on score metric. In this example, we’re going to try optimizing on three seperate metrics just to get the hang of this. The metrics will be:
- Sharpe Ratio: Risk adjusted returns. This will create the portfolio with the highest return per unit of incurred risk.
- Variance (risk): Purely risk. This will create the portfolio with the lowest risk
- Pure Return: Purely return. This will create the portfolio with the highest return, regardless of risk.
To do this, let’s define functions that will generate all of these metrics for us and package them into a dictionary that we can pass to our soon-to-be created minimization functions.
Next, if we want to optimize based on the sharpe ratio we need to define a function that returns only the sharpe ratio. Since our optimization functions naturally seek to minimize, we can minimize one of two quantities: The negative of the sharpe ratio, (or 1/(1+Sharpe Ratio). Accordingly, if the sharpe ratio increases both of these quantities will decrease. We’ll choose the negative of sharpe for this example.
Minimization Functions
The next thing we need to is introduce the optimization function we’ll use, and show how to seed the initial constraints, bounds, and parameters!
The Optimization Function
The scipy.optimize function accepts several parameters in order to optimize on your desired variable. Some of these are especially important in the portfolio optimization process.
- constraints: In this case, our key constraint is that all the portfolio weights should sum to 1.0. What this means, practically, is that all of our cash should be invested in an asset or ETF.
- bounds: Bounds is going to refer to how much of our portfolio one asset can take up, from 0.0 to 1.0. 0.0 being a 0% position, and 1.0 being a 100% position (That stock or ETF is our only holding). Note that we can change this if we want so that we don’t take on too much concentration risk. Concentration risk is the loss of diversification benefits you can encouter if one stock or ETF takes up too much of your portfolio. In reality, you might want to set these bounds to (0, 0.2), which means a single stock can only take up a maximum of 20% of the portfolio.
- initializer: Initializer just sets the initial weights of the optimization algorithm so that it has a starting point. Here we’ll just set them so that each stock takes up an equal percentage of the portfolio.
Click here to see the detailed documentation for this function.
[0.2, 0.2, 0.2, 0.2, 0.2]
((0, 1), (0, 1), (0, 1), (0, 1), (0, 1))
The output we get looks like this.
fun: -0.9855053843923874
jac: array([-2.30282545e-04, -4.28661704e-04, 1.79314971e-01, 4.32737172e-04,
9.32403825e-01])
message: 'Optimization terminated successfully.'
nfev: 50
nit: 7
njev: 7
status: 0
success: True
x: array([4.56881960e-01, 1.50685149e-01, 0.00000000e+00, 3.92432892e-01,
9.63567740e-17])
Our import variable here is the last line, x. These represent the portfolio weights that produce the best sharpe ratio! But we’re missing our ticker names, so we can just do something like this to add some meaning:
[('TLT', 0.4569), ('GLD', 0.1507), ('SPY', 0.0), ('QQQ', 0.3924), ('VWO', 0.0)]
Now, by calling our portfolio_stats
function we can quantify the performance using these weights.
So what have we done here? We’ve run the optimization function by maximizing the Sharpe Ratio (minimizing the negative of the Sharpe Ratio). Accordingly, the portfolio weights that are spit out will provide us with a portfolio optimized for Sharpe.
This tells us that a portfolio of 45.69% TLT, 15.07% GLD, and 39.24% QQQ will give us the best risk adjusted returns.
We can pull out the individual performance parameters of this portfolio accordingly.
[0.08650428 0.08777656 0.9855054 ]
Optimal Portfolio Return: 8.6504
Optimal Portfolio Volatility: 8.7777
Optimal Portfolio Sharpe Ratio: 0.9855
The Efficient Frontier
The efficient frontier is defined as all the portfolios that maximize the return for a given level of volatility. There can only be one of these for each level of volatility, and when plotted forms a curve around the cluster of portfolio values.
What we do is we iterate through a series of target returns, and for each target return we find the portfolio with the minimal level of volatility. To do this, we’ll need to minimize volatility instead of the negative of the sharpe ratio. This process is exactly the same as the process for sharpe ratio, except we substitute in our minimizing function for volatility instead.
And we get a familiar output…
fun: 0.006703774377738298
jac: array([0.0133528 , 0.01338956, 0.01350171, 0.01346032, 0.02028012])
message: 'Optimization terminated successfully.'
nfev: 112
nit: 16
njev: 16
status: 0
success: True
x: array([0.52228601, 0.13092821, 0.30560919, 0.0411766 , 0. ])
[('TLT', 0.5223),
('GLD', 0.1309),
('SPY', 0.3056),
('QQQ', 0.0412),
('VWO', 0.0)]
In the code above we had the optimization algorithm optimize a portfolio such that it has the least amount of risk. The output shows the asset weighting required to minimize risk with this set of assets. Note that this is only for one portfolio. To plot an efficient frontier we need to loop through a bunch of target returns and repeat the exact same process above. We can then collect these results and plot them to see our frontier line.
Now we can plot these results!
The Capital Market Line
In the above chart we can see the efficient frontier denoted by ‘x’s’. The big red star is the portfolio optimized for Sharpe Ratio, and the Yellow star is the portfolio is optimized to minimize variance (risk).
Now what we need to do is calculate the capital market line. We can accomplish this by calculating the line that intercepts the efficient frontier tangentially.
In order to do this, we need to make a better approximation of the efficient frontier and then calculate its first derivative along the approximated curve.
[0.01 0.89868124 0.08590308]
Note that solving for the capital market line equation can be finicky and you may have to play with it to get it right. Ultimately you’re looking for the capital market line to be tangential to the efficient frontier.
From experience, I find setting the first parameter equal to the risk free rate, the second paramter to half the max portfolio volatility, and the last parameter to half the max portfolio return seems to work.
Check to see the optimization function reduces all three equations to 0…
array([ 0., -0., 0.])
Now we’ll plot the capital market line, along with our spline approximation of the frontier along with all of the simulated portfolios.
Now we can arrive at the weights of the markowitz optimal portfolio by running the optimization function again using the output from this function as our constraint.
[('TLT', 0.447), ('GLD', 0.15), ('SPY', 0.0), ('QQQ', 0.403), ('VWO', 0.0)]
By zipping together out asset list and our list of optimal weights we get a clear picture of how the optimal portfolio should be constructed.
In part two of this series we’ll tie everything together into a unified class function that allows us to analyze a portfolio of any number of assets we choose.
I hope you enjoyed this post!
Kevin Vecmanis