Suppose you have $1 million invested in a trust fund at 8% annually with a standard deviation of 30%. Assume that inflation averages 3% over the term with a standard deviation of 9%. You plan to withdraw $80,000 a year in monthly installments of $6,666.67. What are the chances that the $1 million lasts you ten years?

This problem can be modeled using simulation. The fund will lose and gain money according to a process that has both deterministic and random components. The interest rate will fluctuate in a similar fashion and impact the fund’s bottom line. The goal is to model how likely the fund is to last the desired 10 years.

I’ve seen problems like this approached by simulating random price paths in Excel, but the results almost always look terrible: each solitary trial involves periodic iterations, so the spreadsheet quickly turns into an unreadable mess. Excel also makes it incredibly hard to produce a simple histogram or kernel density estimate, which I personally think are the most effective ways to model the fund’s survival. Instead of Excel, I’ll use R to quickly visualize the behavior of the fund and compute descriptive and inferential statistics using the same (admittedly questionable) Brownian motion approach. First set the parameters:

```r
# Fund balance and rates
start.capital = 1000000
annual.mean.return = 8 / 100
annual.ret.std.dev = 30 / 100
annual.inflation = 3 / 100
annual.inf.std.dev = 9 / 100

# Withdrawals
monthly.withdrawals = 80000 / 12
n.obs = 10 * 12   # months in the 10-year term

# Number of simulations
n.sim = 100
```

It’s easiest to treat each iteration of the simulation as a sequence of 120 months and convert our annual return and inflation measures into monthly ones:

```r
# Monthly investment and inflation assumptions
monthly.mean.return = annual.mean.return / 12
monthly.ret.std.dev = annual.ret.std.dev / sqrt(12)
monthly.inflation = annual.inflation / 12
monthly.inf.std.dev = annual.inf.std.dev / sqrt(12)
```
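Note the means divide by 12 but the standard deviations divide by sqrt(12): under the independence assumption, variances (not standard deviations) add across months. A quick sanity check using the variables defined above:

```r
# 12 independent months, each with variance (annual.ret.std.dev / sqrt(12))^2,
# add back up to the annual variance annual.ret.std.dev^2
all.equal(monthly.ret.std.dev * sqrt(12), annual.ret.std.dev)
all.equal(12 * monthly.ret.std.dev^2, annual.ret.std.dev^2)
```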

With just one line of code, R can perform a wide range of simulations. Some are **parametric**, meaning their validity depends on assumptions about the shape of the underlying distribution. Other techniques, like bootstrapping and subsampling, are called **robust**, meaning distributional assumptions aren’t necessary for sound results.
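As a toy illustration of the difference (the "historical" return series below is made up, not from this article), here's how the parametric and robust versions of "simulate 120 monthly returns" might look side by side:

```r
# Hypothetical 5 years of observed monthly returns standing in for real data
set.seed(42)
hist.returns = rnorm(60, mean = 0.0067, sd = 0.0866)

# Parametric: fit a normal distribution, then draw from it (assumes normality)
param.sim = rnorm(120, mean = mean(hist.returns), sd = sd(hist.returns))

# Bootstrap: resample the observed returns directly (no distributional assumption)
boot.sim = sample(hist.returns, 120, replace = TRUE)
```

The Brownian motion approach used below is the parametric one: every simulated return comes from `rnorm`.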

Brownian motion is a process characterized by periodic returns that are normally distributed and independent of each other. To simulate it, we first generate normally distributed, independent returns and inflation rates for each of the 120 periods over which we will model the fund’s value. This completes one simulated ten-year period. Since a single simulated outcome tells us almost nothing, we repeat the process many times; the more iterations, the more accurate the model. First we generate the returns and inflation rates required for 100 outcomes:

```r
# 120 x 100 matrices: one row per month, one column per simulated path
monthly.invest.returns = matrix(0, n.obs, n.sim)
monthly.inflation.returns = matrix(0, n.obs, n.sim)

monthly.invest.returns[] = rnorm(n.obs * n.sim, mean = monthly.mean.return, sd = monthly.ret.std.dev)
monthly.inflation.returns[] = rnorm(n.obs * n.sim, mean = monthly.inflation, sd = monthly.inf.std.dev)
```

So now we’ve generated a matrix monthly.invest.returns with 120 rows, corresponding to months, and 100 columns, corresponding to iterations of the entire 120-month random process. The matrix monthly.inflation.returns is the same thing but for inflation.

The fund starts out with $1 million, and each period three adjustments are made to the net asset value (NAV): it’s multiplied by a factor of 1 plus the random return, it’s discounted by the random inflation rate, and a withdrawal is subtracted. So, to get the NAV at time x+1 (before the withdrawal), the previous NAV is multiplied by:

(1 + monthly.invest.returns[x, ] - monthly.inflation.returns[x, ])

Notice I left the column index blank when indexing the matrices because I want to calculate values for month x (row x) over *all* columns. With a simple loop over the rows, R’s vectorized arithmetic handles all 100 columns in each pass. Putting everything together, I have:

```r
# Row 1 holds the starting balance; rows 2 through 121 are months 1 through 120
nav = matrix(start.capital, n.obs + 1, n.sim)

for (j in 1:n.obs) {
  nav[j + 1, ] = nav[j, ] * (1 + monthly.invest.returns[j, ] - monthly.inflation.returns[j, ]) - monthly.withdrawals
}
```

Since negative fund values don’t make any sense, I tell R to replace them with NA. Flagging insolvent funds this way also gives me an easy way to count how many cases there are:

```r
nav[ nav < 0 ] = NA   # mark insolvent funds

p.alive = 1 - rowSums(is.na(nav)) / ncol(nav)   # fraction of funds still solvent each month
```

The vector p.alive is the percentage of funds that haven’t yet run out of money. Plotting the probability of fund survival by period gives us an idea of the time at which funds start to experience solvency troubles:
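The plotting call itself isn’t shown above; a minimal base-graphics sketch, using the objects already defined, could be:

```r
# p.alive has n.obs + 1 entries: index 1 is month 0 (start), index n.obs + 1 is month 120
plot(0:n.obs, p.alive, type = 'l', las = 1,
     xlab = 'Month', ylab = 'Proportion of funds still solvent',
     main = 'Fund Survival Over the 10-Year Term')
```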

About halfway through the 10 years the percentage of surviving funds turns downward and continues on a roughly linear decline until the end of the term, though the proportion of surviving funds remains larger than the proportion of insolvent funds throughout.

I can get a snapshot of how strongly the 100 funds finished by printing out the last row of the matrix (row n.obs + 1, since row 1 holds the starting balance):
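That print statement isn’t shown above; one way to do it, using the nav matrix as defined earlier, is:

```r
# Final balances at month 120; NA marks funds that went bust along the way
round(nav[n.obs + 1, ])
```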

If I want to know how big a typical surviving fund’s financial cushion is, I’m going to want to sort the final NAVs into groups. Aesthetically, the NA values scattered everywhere are just noise. Use the ! operator with is.na() to keep the final NAV values that aren’t NA (i.e. all the surviving funds):
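The subsetting step isn’t printed above; a sketch consistent with the later plotting code (which expects a vector called final.nav) would be:

```r
final.nav = nav[n.obs + 1, ]                  # balances at month 120, NAs included
final.nav = final.nav[ !is.na(final.nav) ]    # keep only the surviving funds
length(final.nav)                             # number of survivors
```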

So of the 100 funds, 56 survived the entire 120 months. Formally, they represent the conditional distribution of the funds given that they survived to month 120. This might be of interest in a more detailed analysis, but for now I’ll just point out one thing: the fact that only about half of the funds survived, while more than a few of the survivors hold balances well above $1 million, suggests to me that the issue stems from too much risk (variability), not insufficient return.
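For a quick numerical look at that conditional distribution, a couple of one-liners on the surviving balances do the job:

```r
summary(final.nav)                              # five-number summary plus the mean
quantile(final.nav, probs = c(0.25, 0.5, 0.75)) # quartiles of the survivors
```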

Finally, R makes it very easy to plot the density of the surviving funds, which is hard and annoying to do in Excel:

```r
# Guard against the (unlikely) case that no fund survived
if (length(final.nav) == 0) stop('no surviving funds to plot')

plot(density(final.nav, from = 0, to = max(final.nav)), las = 1, xlab = 'Final Capital',
     main = paste('Distribution of Final Balance,', 100 * p.alive[n.obs + 1], '% still solvent'))
```

Ta-da. Personally, I think a density estimate of the final fund value for surviving funds, appropriately labeled with the proportion of the total 100 funds that belong to this group, gives a clearer, more precise, and more complete summary of the simulation results than could be obtained by just simulating and plotting price paths in Excel.