RSM338: Applications of Machine Learning in Finance

Week 2: Financial Data I | January 14–15, 2026

Kevin Mott

Rotman School of Management

Motivation and Overview

Much of quantitative finance—and the ML applications we will study—centers on return prediction.

  • We model and forecast returns because they have convenient statistical properties.

However, investors ultimately care about wealth: the dollar value of their portfolio.

  • Wealth is a nonlinear function of returns, which creates a fundamental issue.
  • In general, for a nonlinear function \(f\) and random variable \(X\): \[\mathbb{E}[f(X)] \neq f(\mathbb{E}[X])\]
  • As we will see, we usually choose to model log returns: \(r_t = \ln P_{t} - \ln P_{t-1}\)
  • A standard statistical assumption is that log returns \(r_t\) are normally distributed
  • What does this imply for the distribution of wealth \(W_t\)? \[ r_t \sim \mathcal N \implies W_t \sim \boxed{\quad\quad\quad} \text{?} \]

This means that knowing the expected return does not directly tell us the expected wealth.

Today we develop the statistical framework for translating between returns and wealth: this is an application of the more general study of transformations of random variables.

Part I introduces log returns (continuously compounded returns) and shows that if log returns are normally distributed, then wealth follows a log-normal distribution. We derive the expected value of log-normal wealth and show why it differs systematically from naive forecasts.

Part II addresses estimation risk: when we estimate expected returns from historical data, how does that uncertainty propagate into wealth forecasts? We show that estimation error introduces additional upward bias.

Part I: From Log Returns to Log-Normal Wealth

Simple (Arithmetic) Returns

The simple return (or arithmetic return) from period \(t-1\) to \(t\) is:

\[R_t = \frac{P_t + d_t}{P_{t-1}} - 1\]

where:

  • \(P_t\) = price at time \(t\)
  • \(P_{t-1}\) = price at time \(t-1\)
  • \(d_t\) = dividend paid during period (if any)

Example: If \(P_{t-1} = 100\), \(P_t = 105\), and \(d_t = 2\):

\[R_t = \frac{105 + 2}{100} - 1 = 0.07 \text{ or } 7\%\]

Interpretation: the simple return is the percentage change in your wealth over the period.

The Problem with Simple Returns: Compounding and Annualization

Suppose you earn \(R_1 = 10\%\) in year 1 and \(R_2 = 10\%\) in year 2.

What is your total return over both years?

Not \(10\% + 10\% = 20\%.\) Instead:

\[1 + R_{1\to2} = (1 + R_1)(1 + R_2) = (1.10)(1.10) = 1.21\]

So \(R_{1\to2} = 21\%.\)

Simple returns compound multiplicatively, not additively.

In general, over \(T\) years:

\[1 + R_{1 \to T} = (1 + R_1)(1 + R_2) \cdots (1 + R_T) = \prod_{t=1}^{T} (1 + R_t)\]

We often want to annualize multi-year returns for comparison.

Suppose you observe a \(T\)-year cumulative return \(R_{1 \to T}\). What’s the annualized return?

You need the \(\bar{R}\) such that earning \(\bar{R}\) each year gives the same cumulative return:

\[(1 + \bar{R})^T = 1 + R_{1 \to T}\]

Solving:

\[\bar{R} = (1 + R_{1 \to T})^{1/T} - 1\]
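
A quick check of both formulas in code (a minimal sketch; the three returns are made up for illustration):

import numpy as np

# Hypothetical simple returns over three years (illustrative numbers)
R = np.array([0.10, -0.05, 0.20])
T = len(R)

# Cumulative return: product of gross returns, minus 1
R_cum = np.prod(1 + R) - 1

# Annualized return: T-th root of the gross cumulative return, minus 1
R_bar = (1 + R_cum)**(1 / T) - 1

print(f"Cumulative return: {R_cum:.4f}")    # 0.2540
print(f"Annualized return: {R_bar:.4f}")    # ~0.0784
print(f"Check: {(1 + R_bar)**T - 1:.4f}")   # recovers 0.2540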

The problem: This \((\cdot)^{1/T}\) operation is a nonlinear function of returns, and recall that \(\mathbb{E}[f(X)] \neq f(\mathbb{E}[X])\) in general.

  • The average of annualized returns \(\neq\) annualized average return
  • Variances don’t scale nicely
  • Taking roots of random variables creates bias

With log returns, annualization is just division by \(T\). Much cleaner.

Log Returns (Continuously Compounded Returns)

Given a simple return \(R\), what is the equivalent continuously compounded return \(r\)?

By definition, \(r\) is the rate such that continuous compounding gives the same growth:

\[e^r = 1 + R\]

Solving for \(r\):

\[r = \ln(1 + R)\]

This is why they’re called log returns—they’re literally the logarithm of gross returns.

For stocks (ignoring dividends), the gross return is \(1 + R_t = \frac{P_t}{P_{t-1}}\), so:

\[r_t = \ln\left(\frac{P_t}{P_{t-1}}\right) = \ln(P_t) - \ln(P_{t-1})\]

Log returns are just differences in log prices. This is extremely convenient for computation.

Example: If \(R = 7\%\), then \(r = \ln(1.07) \approx 6.77\%\).

Why does this help? Recall from Week 1 that \(\ln(ab) = \ln(a) + \ln(b)\).

Apply this to multi-period returns:

\[\begin{align*} r_{1 \to T} &= \ln(1 + R_{1 \to T}) = \ln\left[\prod_{t=1}^{T}(1+R_t)\right] \\ &= \ln(1+R_1) + \ln(1+R_2) + \cdots + \ln(1+R_T) \\ &= r_1 + r_2 + \cdots + r_T = \sum_{t=1}^{T} r_t \end{align*}\]

Log returns add over time. This is much easier to work with mathematically.

The annualized log return \(\bar{r}\) is the constant rate that, if earned every year, gives the same terminal wealth.

Derivation: Earning \(\bar{r}\) for \(T\) years means terminal wealth is:

\[W_T = W_0 \cdot e^{\bar{r}} \cdot e^{\bar{r}} \cdots e^{\bar{r}} = W_0 \cdot e^{T\bar{r}}\]

This must equal the actual terminal wealth \(W_0 \cdot e^{r_{1 \to T}}\):

\[W_0 \cdot e^{T\bar{r}} = W_0 \cdot e^{r_{1 \to T}}\]

Taking logs of both sides:

\[T\bar{r} = r_{1 \to T} \quad \implies \quad \bar{r} = \frac{r_{1 \to T}}{T} = \frac{1}{T}\sum_{t=1}^{T} r_t\]

The annualized log return is just the arithmetic mean! No roots, no messy exponents.
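
The same toy returns in code, showing additivity and mean-based annualization (a sketch with illustrative numbers):

import numpy as np

# Same illustrative returns as before
R = np.array([0.10, -0.05, 0.20])
r = np.log(1 + R)                 # convert to log returns
T = len(R)

# Additivity: sum of log returns = log of the cumulative gross return
r_cum = r.sum()
print(np.isclose(r_cum, np.log(np.prod(1 + R))))    # True

# Annualization is just the arithmetic mean
r_bar = r_cum / T
print(f"Annualized log return: {r_bar:.4f}")        # ~0.0754

# Consistent with the simple-return annualization
print(f"exp(r_bar) - 1 = {np.exp(r_bar) - 1:.4f}")  # ~0.0784, matches (1 + R_cum)^(1/T) - 1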

Converting Between Simple and Log Returns

Key relationships:

\[r_t = \ln(1 + R_t) \quad \Longleftrightarrow \quad R_t = e^{r_t} - 1\]

For small returns, \(r_t \approx R_t\) (because \(\ln(1+x) \approx x\) for small \(x\)). The difference grows for larger returns:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

# Simple returns from -80% to +100%
R = np.linspace(-0.8, 1.0, 200)

# Log returns: r = ln(1 + R)
r = np.log(1 + R)

fig, ax = plt.subplots(figsize=(5, 5))
ax.plot(R, R, 'k--', label='If r = R (45° line)')
ax.plot(R, r, label='Actual: r = ln(1 + R)')
ax.set_xlabel('Simple Return R')
ax.set_ylabel('Log Return r')
ax.axhline(0, linestyle=':', color='gray')
ax.axvline(0, linestyle=':', color='gray')
ax.xaxis.set_major_formatter(PercentFormatter(xmax=1))
ax.yaxis.set_major_formatter(PercentFormatter(xmax=1))
ax.legend()
ax.set_title('Simple vs. Log Returns')
plt.show()

Note the asymmetry: log returns treat gains and losses differently.

  • A 50% gain: \(r = \ln(1.5) \approx 40.5\%\)
  • A 50% loss: \(r = \ln(0.5) \approx -69.3\%\)

Why Computational Finance Prefers Log Returns

Summary of advantages:

  1. Additivity: Multi-period returns are just sums, and annualizing is just division by \(T\): \[r_{1 \to T} = \sum_{t=1}^{T} r_t \qquad r_{1 \to T}^\text{annual} = \frac{1}{T} r_{1 \to T}\]

  2. Statistical convenience: Sums/scalar multiples of random variables are easier to analyze than exponentiated products

  3. No lower bound: Log returns can be any real number; simple returns are bounded below by \(-100\%\)

Simple returns are what you actually earn. Log returns are what you compute with. We’ll move between them as needed.

The Key Assumption: Log Returns Are Normal

A foundational assumption in quantitative finance:

Log returns are normally distributed.

In notation (recall from Week 1):

\[r_t \sim N(\mu, \sigma^2)\]

This says: each period’s log return is a random draw from a normal distribution with mean \(\mu\) and variance \(\sigma^2\).

We usually invest to grow our wealth, so if log returns are normally distributed, what does that imply for the distribution of prices (or wealth)?

From Log Returns to Wealth

Suppose you start with wealth \(W_0\) and invest for \(T\) years.

Your terminal wealth is the product of gross returns:

\[W_T = W_0 \cdot (1 + R_1)(1 + R_2) \cdots (1 + R_T)\]

Using the relationship \(1 + R_t = e^{r_t}\):

\[W_T = W_0 \cdot e^{r_1} \cdot e^{r_2} \cdots e^{r_T} = W_0 \cdot e^{r_1 + r_2 + \cdots + r_T}\]

Taking logs of both sides:

\[\ln\left(\frac{W_T}{W_0}\right) = r_{1 \to T} = \sum_{t=1}^{T} r_t\]

Key insight: Log wealth growth is a sum of log returns. This is why the assumption about log returns matters so much.

Why \(T\) Appears Everywhere

If each year’s log return is an independent draw from \(N(\mu, \sigma^2)\):

\[r_1, r_2, \ldots, r_T \stackrel{\text{i.i.d.}}{\sim} N(\mu, \sigma^2)\]

Then the sum of \(T\) such draws is (recall from Week 1):

\[\sum_{t=1}^{T} r_t \sim N\left(\underbrace{T \cdot \mu}_{\text{mean}}, \; \underbrace{T \cdot \sigma^2}_{\text{variance}}\right)\]

Both the mean and variance scale with \(T\):

  • Mean grows linearly: expected cumulative return is \(T\mu\)
  • Variance grows linearly: uncertainty compounds over time
  • Standard deviation grows with \(\sqrt{T}\): \(\text{SD} = \sigma\sqrt{T}\)

This is why time horizon is so important in finance.
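
A quick simulation sanity check of this scaling (a sketch; the values of \(\mu\) and \(\sigma\) are illustrative):

import numpy as np

np.random.seed(0)
mu, sigma, T = 0.08, 0.20, 30   # illustrative parameters
n_sims = 100000

# Each row: one simulated T-year cumulative log return
sums = np.random.normal(mu, sigma, (n_sims, T)).sum(axis=1)

print(f"Mean: {sums.mean():.3f} (theory: T*mu = {T*mu:.3f})")
print(f"Variance: {sums.var():.3f} (theory: T*sigma^2 = {T*sigma**2:.3f})")
print(f"Std dev: {sums.std():.3f} (theory: sigma*sqrt(T) = {sigma*np.sqrt(T):.3f})")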

The Distribution of Log Wealth

Combining our results:

\[\ln(W_T) = \ln(W_0) + \sum_{t=1}^{T} r_t\]

Since \(\sum_{t=1}^{T} r_t \sim N(T\mu, T\sigma^2)\), we have:

\[\ln(W_T) \sim N\left(\ln(W_0) + T\mu, \; T\sigma^2\right)\]

Log wealth is normally distributed.

But we care about wealth itself, \(W_T\), not its logarithm.

If \(\ln(W_T)\) is normal, what distribution does \(W_T\) follow?

The Log-Normal Distribution

Definition: A random variable \(X\) is log-normally distributed if \(\ln(X)\) is normally distributed.

\[\text{If } Z \sim N(m, v), \text{ then } X = e^Z \text{ is log-normal.}\]

In our case:

  • \(\ln(W_T) \sim N(\ln(W_0) + T\mu, \; T\sigma^2)\)
  • Therefore \(W_T\) is log-normally distributed

The key implication:

\[\text{Normal log returns} \implies \text{Log-normal wealth}\]

Forecasting Future Wealth

We want to forecast expected terminal wealth: \(\mathbb{E}[W_T]\)

We know:

\[W_T = W_0 \cdot e^{\sum_{t=1}^T r_t}\]

And we know \(\mathbb{E}\left[\sum r_t\right] = T\mu\).

Tempting guess: \(\mathbb{E}[W_T] = W_0 \cdot e^{T\mu}\)?

This would require:

\[\mathbb{E}\left[e^{\sum r_t}\right] \stackrel{?}{=} e^{\mathbb{E}\left[\sum r_t\right]}\]

But in general: \(\mathbb{E}[f(X)] \neq f(\mathbb{E}[X])\)

So how wrong is our guess, and in which direction?

Normal vs. Log-Normal: A Visual Comparison

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

fig, (ax1, ax2) = plt.subplots(1, 2)

# Normal distribution
x_norm = np.linspace(-3, 3, 1000)
ax1.plot(x_norm, stats.norm.pdf(x_norm))
ax1.axvline(0, linestyle='--', label='mean = median = mode')
ax1.set_xlabel('z')
ax1.set_title('Normal: Z ~ N(0,1)')
ax1.legend()

# Log-normal distribution
x_lognorm = np.linspace(0.01, 5, 1000)
ax2.plot(x_lognorm, stats.lognorm.pdf(x_lognorm, s=1, scale=1))

# Mode, median, mean for log-normal
mode = np.exp(-1)    # exp(-sigma^2)
median = 1           # exp(mu)
mean = np.exp(0.5)   # exp(mu + sigma^2/2)

ax2.axvline(mode, linestyle='--', label=f'mode = {mode:.2f}')
ax2.axvline(median, linestyle='--', label=f'median = {median:.2f}')
ax2.axvline(mean, linestyle='--', label=f'mean = {mean:.2f}')
ax2.set_xlabel('x = exp(z)')
ax2.set_title('Log-Normal: X = exp(Z)')
ax2.legend()

plt.show()

Exponentiating a symmetric distribution creates asymmetry: Mode \(<\) Median \(<\) Mean.

The key formula: If \(Z \sim N(\mu, \sigma^2)\), then the mean of \(e^Z\) (recall: mean = expected value = \(\mathbb{E}[\cdot]\)) is:

\[\mathbb{E}[e^Z] = e^{\mu + \frac{\sigma^2}{2}}\]

The variance \(\sigma^2\) appears in the expected value! This is fundamental to log-normal distributions.
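
A Monte Carlo check of the formula (a minimal sketch; \(\mu\) and \(\sigma\) here are arbitrary illustrative values):

import numpy as np

np.random.seed(0)
mu, sigma = 0.05, 0.30   # illustrative values

Z = np.random.normal(mu, sigma, 1000000)

print(f"Monte Carlo E[e^Z]: {np.exp(Z).mean():.4f}")
print(f"Formula exp(mu + sigma^2/2): {np.exp(mu + sigma**2/2):.4f}")
print(f"Naive exp(mu): {np.exp(mu):.4f}")   # misses the variance boost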

Advanced: Where does this formula come from?

Recall from Week 1: for a continuous random variable, \(\mathbb{E}[g(X)] = \int g(x) \cdot f_X(x) \, dx\) where \(f_X\) is the PDF.

Here \(g(z) = e^z\) and \(f_Z(z) = \frac{1}{\sqrt{2\pi \sigma^2}} e^{-\frac{(z-\mu)^2}{2\sigma^2}}\) is the normal PDF, so:

\[\mathbb{E}[e^Z] = \int_{-\infty}^{\infty} e^z \cdot \frac{1}{\sqrt{2\pi \sigma^2}} e^{-\frac{(z-\mu)^2}{2\sigma^2}} \, dz\]

Combine the exponentials: the exponent is \(z - \frac{(z-\mu)^2}{2\sigma^2}\). Complete the square:

\[\begin{align*} z - \frac{(z-\mu)^2}{2\sigma^2} &= \frac{2\sigma^2 z - (z^2 - 2\mu z + \mu^2)}{2\sigma^2} \\ &= \frac{-z^2 + 2z(\mu + \sigma^2) - \mu^2}{2\sigma^2} \\ &= -\frac{(z - (\mu + \sigma^2))^2}{2\sigma^2} + \frac{(\mu + \sigma^2)^2 - \mu^2}{2\sigma^2} \\ &= -\frac{(z - (\mu + \sigma^2))^2}{2\sigma^2} + \mu + \frac{\sigma^2}{2} \end{align*}\]

So the integral becomes:

\[\mathbb{E}[e^Z] = e^{\mu + \frac{\sigma^2}{2}} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi \sigma^2}} e^{-\frac{(z - (\mu + \sigma^2))^2}{2\sigma^2}} \, dz = e^{\mu + \frac{\sigma^2}{2}} \cdot 1\]

The integral equals 1 because it’s the PDF of \(N(\mu + \sigma^2, \sigma^2)\) integrated over all \(z\).

Why Does Variance Affect the Mean?

Intuition: The exponential function \(e^z\) is convex (curves upward, second derivative positive).

By Jensen’s inequality: for a convex function \(f\) and random variable \(Z\),

\[\mathbb{E}[f(Z)] \geq f(\mathbb{E}[Z])\]

Applied to \(f(z) = e^z\):

\[e^{\mu + \frac{\sigma^2}{2}} = {\mathbb{E}[e^Z]} \geq e^{\mathbb{E}[Z]} = e^{\mu}\]

The \(\frac{\sigma^2}{2}\) term is the variance boost. The more spread out \(Z\) is (higher variance), the more the exponential “boosts” the high values relative to the low values.

Expected Wealth After \(T\) Years

Recall: our “naive” forecast for terminal wealth (ignoring variance) was \(W_0 \cdot e^{T\mu}\).

But now we know the true expected value includes the variance boost:

\[\mathbb{E}[W_T] = W_0 \cdot e^{T\mu + \frac{T\sigma^2}{2}} = W_0 \cdot e^{T\mu} \cdot e^{\frac{T\sigma^2}{2}}\]

The factor \(e^{T\sigma^2/2}\) is the variance boost:

  • It grows with time horizon \(T\): longer investments have larger boosts
  • It grows with volatility \(\sigma^2\): riskier assets have larger boosts
  • This happens because wealth is log-normally distributed with a fat right tail: Mean \(>\) Median, and the gap widens with \(T\) and \(\sigma^2\)
  • Over long horizons, expected wealth becomes much larger than what a “typical” investor will experience

Simulating 100,000 Investors

A concrete example: \(\mu = 9.41\%\), \(\sigma = 18.48\%\), \(T = 60\) years, \(W_0 = \$1\)

  • Naive forecast: \(e^{T\mu} = e^{60 \times 0.0941} \approx \$283\)
  • True expected value: \(e^{T\mu + T\sigma^2/2} = e^{5.65 + 1.02} \approx \$789\) — nearly 3× higher!
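
Both numbers are quick to verify in code:

import numpy as np

mu, sigma, T = 0.0941, 0.1848, 60
print(f"Naive forecast: ${np.exp(T*mu):.0f}")                      # ~$283
print(f"True expected value: ${np.exp(T*mu + T*sigma**2/2):.0f}")  # ~$789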

The median is the “typical” outcome. The mean is pulled up by a small number of extremely lucky outcomes—most investors will earn less than the expected value. Let’s simulate this.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

np.random.seed(42)
mu, sigma, T = 0.0941, 0.1848, 60  # Annual log return parameters
n_investors = 100000
W_0 = 1  # Starting wealth

# Step 1: Draw 60 years of log returns for each investor
# Each row = one investor, each column = one year
returns = np.random.normal(mu, sigma, (n_investors, T))

print(f"Shape: {returns.shape} (100,000 investors × 60 years)")
print(f"First investor's first 5 years: {returns[0, :5].round(3)}")
Shape: (100000, 60) (100,000 investors × 60 years)
First investor's first 5 years: [0.186 0.069 0.214 0.376 0.051]

Each investor gets 60 independent draws from \(N(\mu, \sigma^2)\).

Step 1: Individual annual returns are normally distributed (symmetric).

# The distribution of individual annual returns
all_returns = returns.flatten()

plt.hist(all_returns, bins=50, density=True, label='Simulated')

# Overlay the true normal distribution
x = np.linspace(all_returns.min(), all_returns.max(), 200)
plt.plot(x, stats.norm.pdf(x, mu, sigma), label=f'True N(μ={mu:.1%}, σ={sigma:.1%})')

plt.xlabel('Annual Log Return')
plt.ylabel('Density')
plt.legend()
plt.show()

Step 2: Log returns accumulate over time (still symmetric).

# Cumulative log returns (running sum over time)
cumulative_log_returns = np.cumsum(returns, axis=1)

# Plot sample paths for a few investors
for i in range(50):
    plt.plot(range(1, T+1), cumulative_log_returns[i], alpha=0.3)

plt.axhline(mu * T, linestyle='--', label=f'Expected: {mu*T:.2f}')
plt.xlabel('Year')
plt.ylabel('Cumulative Log Return')
plt.legend()
plt.show()

After 60 years, cumulative log return \(\sim N(T\mu, T\sigma^2)\) — still symmetric.

# Distribution of terminal cumulative log returns
terminal_log_returns = cumulative_log_returns[:, -1]

plt.hist(terminal_log_returns, bins=50, density=True, label='Simulated')

# Overlay the true normal distribution: N(T*mu, T*sigma^2)
x = np.linspace(terminal_log_returns.min(), terminal_log_returns.max(), 200)
plt.plot(x, stats.norm.pdf(x, mu*T, sigma*np.sqrt(T)),
         label=f'True N(Tμ={mu*T:.2f}, √Tσ={sigma*np.sqrt(T):.2f})')

plt.xlabel('Cumulative Log Return (60 years)')
plt.ylabel('Density')
plt.legend()
plt.show()

Step 3: Exponentiate to get wealth (now asymmetric!).

# Convert to wealth by exponentiating: W_T = W_0 * exp(cumulative log return)
wealth_paths = W_0 * np.exp(cumulative_log_returns)

# Plot sample paths
for i in range(50):
    plt.plot(range(1, T+1), wealth_paths[i], alpha=0.3)

plt.xlabel('Year')
plt.ylabel('Wealth ($)')
plt.yscale('log')  # Log scale to see all paths
plt.show()

Step 4: Terminal wealth is log-normally distributed.

# Final wealth after 60 years
terminal_wealth = wealth_paths[:, -1]

plt.hist(terminal_wealth, bins=100, density=True, label='Simulated')

# Overlay the true log-normal distribution
# If ln(W) ~ N(m, v), then W is log-normal with scale=exp(m) and s=sqrt(v)
x = np.linspace(terminal_wealth.min(), terminal_wealth.max(), 500)
plt.plot(x, stats.lognorm.pdf(x, s=sigma*np.sqrt(T), scale=np.exp(mu*T)), label='True log-normal')

plt.axvline(np.mean(terminal_wealth), linestyle='--', label=f'Mean: ${np.mean(terminal_wealth):.0f}')
plt.axvline(np.median(terminal_wealth), linestyle='--', label=f'Median: ${np.median(terminal_wealth):.0f}')
plt.xlabel('Terminal Wealth ($)')
plt.ylabel('Density (log scale)')
plt.yscale('log')
plt.legend()
plt.show()

print(f"{100*np.mean(terminal_wealth < np.mean(terminal_wealth)):.0f}% of investors earn less than the mean!")

76% of investors earn less than the mean!

The punchline: Most investors earn less than the expected value!

The Punchline: Expected Value Is Biased Upward

Summary of Part I:

  1. Log returns assumed normal \(\implies\) wealth is log-normal
  2. The expected value of a log-normal includes a variance boost: \(e^{T\sigma^2/2}\)
  3. This boost grows with time horizon \(T\) and volatility \(\sigma^2\)
  4. As a result: Mean \(>\) Median \(>\) Mode for terminal wealth

The implication:

If you use expected wealth to plan for retirement (or advise clients), you will systematically overestimate what most people will actually experience.

Next: How do we adjust for this bias? And where does the uncertainty come from?

Part II: Estimation Risk

From Known to Unknown Parameters

In Part I, we treated \(\mu\) and \(\sigma^2\) as known parameters.

Reality: We don’t know the true expected return \(\mu\). We must estimate it from historical data.

This introduces estimation risk—additional uncertainty because our parameters are estimates, not truth.

Key question: If we plug our estimate \(\hat{\mu}\) into the wealth formula, do we get an unbiased forecast of expected wealth?

Spoiler: No. There’s an additional source of upward bias beyond what we saw in Part I.

What Is an Unbiased Estimator?

Definition: An estimator \(\hat{\theta}\) is unbiased if its expected value equals the true parameter:

\[\mathbb{E}[\hat{\theta}] = \theta\]

Example: The sample mean \(\bar{r} = \frac{1}{N}\sum_{i=1}^{N} r_i\) is an unbiased estimator of \(\mu\).

Why should we care?

  • Unbiased estimators are correct on average over repeated sampling
  • They avoid systematic over- or under-prediction
  • In finance: unbiased forecasts prevent consistently overoptimistic (or pessimistic) investment expectations

Caution: An unbiased estimator of \(\mu\) does not automatically give an unbiased estimator of functions of \(\mu\) (like \(e^{T\mu}\)).

Estimating the Mean Return

Setup: We have \(N\) historical observations of log returns: \(r_1, r_2, \ldots, r_N\). (For now, assume we know the true volatility \(\sigma\).)

We estimate the true mean \(\mu\) using the sample mean:

\[\hat{\mu} = \frac{1}{N} \sum_{i=1}^{N} r_i\]

Properties of \(\hat{\mu}\):

  • Unbiased: \(\mathbb{E}[\hat{\mu}] = \mu\)
  • Standard error: \(\text{SE}(\hat{\mu}) = \frac{\sigma}{\sqrt{N}}\)
  • Distribution: \(\hat{\mu} \sim N\left(\mu, \frac{\sigma^2}{N}\right)\)

The standard error \(\frac{\sigma}{\sqrt{N}}\) tells us how uncertain we are about \(\mu\). More data (larger \(N\)) means less uncertainty.
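
These properties are easy to verify by simulation (a sketch; the parameter values are illustrative):

import numpy as np

np.random.seed(0)
mu, sigma, N = 0.08, 0.20, 100   # illustrative parameters
n_histories = 50000

# Each row: one hypothetical history of N annual log returns
histories = np.random.normal(mu, sigma, (n_histories, N))
mu_hat = histories.mean(axis=1)   # one sample mean per history

print(f"Mean of mu_hat: {mu_hat.mean():.4f} (theory: mu = {mu})")
print(f"SD of mu_hat: {mu_hat.std():.4f} (theory: sigma/sqrt(N) = {sigma/np.sqrt(N):.4f})")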

Standard Error: How Precise Is Our Estimate?

The standard error of \(\hat{\mu}\) is the standard deviation of our estimator:

\[SE(\hat{\mu}) = \frac{\sigma}{\sqrt{N}}\]

Example: With \(\sigma = 20\%\) annual volatility:

  • 25 years of data: \(\text{SE} = \frac{0.20}{\sqrt{25}} = 4\%\)
  • 50 years of data: \(\text{SE} = \frac{0.20}{\sqrt{50}} \approx 2.8\%\)
  • 100 years of data: \(\text{SE} = \frac{0.20}{\sqrt{100}} = 2\%\)

Even with 100 years of data, our estimate of \(\mu\) is only accurate to about \(\pm 2\%\).

If \(\mu \approx 8\%\), a 2% standard error means we’re quite uncertain about the true value!

The Estimated Wealth Forecast

Suppose we want to forecast expected wealth over \(T\) periods using our estimate \(\hat{\mu}\).

From Part I, we know true expected wealth is \(\mathbb{E}[W_T] = W_0 \exp\left(T\mu + \frac{T\sigma^2}{2}\right)\).

Natural approach: Plug in our estimate:

\[\widehat{W}_T = W_0 \cdot \exp\left(T\hat{\mu} + \frac{T\sigma^2}{2}\right)\]

(For simplicity, assume \(\sigma^2\) is known.)

Question: Is \(\widehat{W}_T\) an unbiased estimator of \(\mathbb{E}[W_T]\)?

In other words: Does \(\mathbb{E}[\widehat{W}_T] = \mathbb{E}[W_T]\)?

The Bias in Estimated Wealth

Answer: No! \(\widehat{W}_T\) is upward biased.

Why? Since \(\hat{\mu} \sim N\left(\mu, \frac{\sigma^2}{N}\right)\), we have \(T\hat{\mu} \sim N\left(T\mu, \frac{T^2\sigma^2}{N}\right)\).

Now \(\exp(T\hat{\mu})\) is log-normal! Using our formula from Part I:

\[\mathbb{E}[\exp(T\hat{\mu})] = \exp\left(T\mu + \frac{T^2\sigma^2}{2N}\right)\]

Putting it together:

\[\begin{align*} \mathbb{E}[\widehat{W}_T] &= W_0 \cdot \mathbb{E}[\exp(T\hat{\mu})] \cdot \exp\left(\frac{T\sigma^2}{2}\right) \\ &= \underbrace{W_0 \cdot \exp\left(T\mu + \frac{T\sigma^2}{2}\right)}_{\mathbb{E}[W_T]} \cdot \underbrace{\exp\left(\frac{T^2\sigma^2}{2N}\right)}_{\text{Bias Factor}} \end{align*}\]

Key result:

\[\boxed{\mathbb{E}[\widehat{W}_T] = \mathbb{E}[W_T] \cdot \exp\left(\frac{T^2\sigma^2}{2N}\right)}\]

The bias factor is always \(> 1\), so \(\widehat{W}_T\) systematically overestimates expected wealth.
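
We can see this bias directly by simulation: draw many hypothetical \(N\)-year histories, form the plug-in forecast from each, and compare the average forecast to true expected wealth (a sketch with illustrative parameters):

import numpy as np

np.random.seed(0)
mu, sigma, N, T, W_0 = 0.08, 0.20, 100, 30, 1.0   # illustrative parameters

# Many hypothetical N-year histories, each producing one estimate mu_hat
mu_hat = np.random.normal(mu, sigma, (100000, N)).mean(axis=1)

# Plug-in forecast from each history (sigma treated as known)
W_hat = W_0 * np.exp(T*mu_hat + T*sigma**2/2)

true_EW = W_0 * np.exp(T*mu + T*sigma**2/2)
print(f"True E[W_T]: {true_EW:.2f}")
print(f"Average plug-in forecast: {W_hat.mean():.2f}")
print(f"Empirical bias factor: {W_hat.mean()/true_EW:.3f}")
print(f"Theoretical bias factor: {np.exp(T**2 * sigma**2 / (2*N)):.3f}")   # exp(0.18) ~ 1.197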

How Large Is the Bias?

Bias Factor: \(\exp\left(\frac{T^2\sigma^2}{2N}\right)\)

Example: \(\sigma = 18.48\%\), \(N = 98\) years of historical data

  • \(T = 1\) year: \(\frac{T^2\sigma^2}{2N} = 0.0002\), bias factor 1.000 (0% bias)
  • \(T = 10\) years: \(\frac{T^2\sigma^2}{2N} = 0.017\), bias factor 1.018 (1.8% bias)
  • \(T = 30\) years: \(\frac{T^2\sigma^2}{2N} = 0.157\), bias factor 1.170 (17% bias)
  • \(T = 60\) years: \(\frac{T^2\sigma^2}{2N} = 0.627\), bias factor 1.873 (87% bias)

For a 60-year horizon, our wealth forecast is biased upward by 87%!

The bias grows with \(T^2\)—it becomes severe for long horizons.

Intuition: Why Does Estimation Error Create Upward Bias?

The same Jensen’s inequality logic from Part I applies here.

  • Our estimate \(\hat{\mu}\) is sometimes too high, sometimes too low
  • When \(\hat{\mu}\) is too high, \(\exp(T\hat{\mu})\) overshoots by a lot
  • When \(\hat{\mu}\) is too low, \(\exp(T\hat{\mu})\) undershoots by less
  • On average, the overshoots win—the mean is pulled up

The exponential function is convex: it amplifies high values more than it dampens low values.

This is exactly the same mechanism that made \(\mathbb{E}[W_T] > \text{Median}[W_T]\) in Part I.

Bottom line: Any uncertainty that enters the exponent inflates the expected value.

Correcting the Bias: An Unbiased Estimator

To get an unbiased estimator of \(\mathbb{E}[W_T]\), we subtract the bias term:

\[\boxed{\widehat{W}_T^{\text{unbiased}} = W_0 \cdot \exp\left(T\hat{\mu} + \frac{T\sigma^2}{2} - \frac{T^2\sigma^2}{2N}\right)}\]

The correction term \(\frac{T^2\sigma^2}{2N}\) removes the upward bias caused by estimation risk.

Reference: This adjustment is derived in Jacquier, Kane, and Marcus (2003), “Geometric or Arithmetic Mean: A Reconsideration,” Financial Analysts Journal.

Intuition: We’re subtracting out the “extra variance” that comes from not knowing \(\mu\) perfectly.
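
A sketch of the correction in action, reusing the simulation setup from above (illustrative parameters; \(\sigma\) treated as known):

import numpy as np

np.random.seed(0)
mu, sigma, N, T, W_0 = 0.08, 0.20, 100, 30, 1.0   # illustrative parameters

mu_hat = np.random.normal(mu, sigma, (100000, N)).mean(axis=1)

# Naive (biased) vs. bias-corrected forecasts
W_naive = W_0 * np.exp(T*mu_hat + T*sigma**2/2)
W_corrected = W_0 * np.exp(T*mu_hat + T*sigma**2/2 - T**2 * sigma**2 / (2*N))

true_EW = W_0 * np.exp(T*mu + T*sigma**2/2)
print(f"True E[W_T]: {true_EW:.2f}")
print(f"Average naive forecast: {W_naive.mean():.2f}")          # biased upward
print(f"Average corrected forecast: {W_corrected.mean():.2f}")  # close to the truth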

Two Sources of Uncertainty

Our wealth forecast faces two distinct sources of uncertainty:

  1. Return risk: Future returns are random
    • Variance contribution to \(\ln(W_T)\): \(T\sigma^2\)
    • Grows linearly with horizon \(T\)
  2. Estimation risk: We don’t know \(\mu\) exactly
    • Variance contribution to forecast: \(\frac{T^2\sigma^2}{N}\)
    • Grows with \(T^2\)—much faster!

Total variance in log wealth forecast:

\[\text{Var}[\ln(\widehat{W}_T)] = T\sigma^2 + \frac{T^2\sigma^2}{N}\]

Estimation Risk Dominates at Long Horizons

Example: \(\sigma = 20\%\), \(N = 100\) years of data

  • \(T = 1\) year: return risk \(T\sigma^2 = 0.04\), estimation risk \(\frac{T^2\sigma^2}{N} = 0.0004\) (ratio: 1%)
  • \(T = 10\) years: return risk \(0.40\), estimation risk \(0.04\) (ratio: 10%)
  • \(T = 30\) years: return risk \(1.20\), estimation risk \(0.36\) (ratio: 30%)
  • \(T = 60\) years: return risk \(2.40\), estimation risk \(1.44\) (ratio: 60%)

At a 60-year horizon, estimation risk is 60% as large as return risk!

Implication: For long-horizon investors (pension funds, endowments, retirement planning), parameter uncertainty is a first-order concern.

Why Is the Mean So Hard to Estimate?

Comparing precision of \(\hat{\mu}\) vs. \(\hat{\sigma}\):

With \(N = 100\) years of annual data:

  • Standard error of \(\hat{\mu}\): \(\frac{\sigma}{\sqrt{N}} = \frac{0.20}{\sqrt{100}} = 2\%\). True \(\mu\) is around 8%, so the SE is 25% of the parameter.

  • Standard error of \(\hat{\sigma}\): approximately \(\frac{\sigma}{\sqrt{2N}} \approx 1.4\%\). True \(\sigma\) is around 20%, so the SE is 7% of the parameter.

The mean is much harder to estimate than volatility.

Volatility is estimated from squared deviations (lots of signal each period). The mean requires averaging returns that are very noisy relative to their expected value.
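
A simulation sketch makes the contrast concrete (illustrative parameters; \(\hat{\sigma}\) is the sample standard deviation):

import numpy as np

np.random.seed(0)
mu, sigma, N = 0.08, 0.20, 100   # illustrative parameters

histories = np.random.normal(mu, sigma, (50000, N))
mu_hat = histories.mean(axis=1)
sigma_hat = histories.std(axis=1, ddof=1)

print(f"SE of mu_hat: {mu_hat.std():.4f} (theory: {sigma/np.sqrt(N):.4f})")
print(f"SE of sigma_hat: {sigma_hat.std():.4f} (theory approx: {sigma/np.sqrt(2*N):.4f})")
print(f"Relative precision: mu: {mu_hat.std()/mu:.0%}, sigma: {sigma_hat.std()/sigma:.0%}")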

Summary of Part II

Key takeaways:

  1. While \(\hat{\mu}\) is an unbiased estimator of \(\mu\), the wealth forecast \(\widehat{W}_T\) is not an unbiased estimator of \(\mathbb{E}[W_T]\)

  2. The bias factor is \(\exp\left(\frac{T^2\sigma^2}{2N}\right)\)—it grows with \(T^2\)

  3. We can correct for this bias by subtracting \(\frac{T^2\sigma^2}{2N}\) from the exponent

  4. Estimation risk grows faster than return risk as horizon increases

Practical implication: Be skeptical of long-horizon wealth projections. They compound two sources of upward bias: the log-normal effect (Part I) and estimation error (Part II).

Looking Ahead: Why This Matters for Machine Learning

What Is Machine Learning?

At its core:

\[\textbf{Machine Learning} = \textbf{Statistical Algorithms}\]

ML algorithms learn patterns from data. But “learning from data” is just another way of saying estimation.

  • Linear regression estimates coefficients \(\beta\)
  • Neural networks estimate millions of weights
  • Decision trees estimate split points
  • Clustering algorithms estimate cluster centers

Everything we do in this course involves estimation.

The Lesson from Today: Estimation Is Subtle

Today we saw that even simple estimation problems have hidden traps.

  • \(\hat{\mu}\) is unbiased for \(\mu\)
  • But \(\exp(T\hat{\mu})\) is not unbiased for \(\exp(T\mu)\)

Why? The exponential function is nonlinear. Unbiasedness doesn’t survive nonlinear transformations.

The key question to always ask:

Where is the randomness, and where is the expected value?

When expectations pass through linear functions: no problem. When expectations pass through curved functions: bias appears.

Linear vs. Nonlinear: Where the Trouble Starts

Linear functions preserve unbiasedness:

\[\mathbb{E}[a + bX] = a + b\mathbb{E}[X]\]

If \(\hat{\mu}\) is unbiased for \(\mu\), then \(a + b\hat{\mu}\) is unbiased for \(a + b\mu\).

Nonlinear functions create bias:

\[\mathbb{E}[f(X)] \neq f(\mathbb{E}[X]) \quad \text{in general}\]

Today’s example: \(f(x) = e^x\) is convex, so \(\mathbb{E}[e^X] > e^{\mathbb{E}[X]}\).

In ML, nonlinearities are everywhere:

  • Activation functions in neural networks
  • Probability transformations (logistic, softmax)
  • Loss functions (squared error, cross-entropy)

Recovering True Parameters: Not Always Simple

Today’s example: we wanted to estimate \(\mathbb{E}[W_T]\).

The naive approach (plug in \(\hat{\mu}\)) gave a biased answer.

The corrected approach required understanding:

  • The distribution of our estimator (\(\hat{\mu} \sim N(\mu, \sigma^2/N)\))
  • How that distribution transforms through the exponential
  • The exact form of the bias (\(\exp(T^2\sigma^2/2N)\))

This is a recurring theme in ML:

  • Overfitting: in-sample performance \(\neq\) out-of-sample performance
  • Regularization: intentionally biasing estimates to reduce variance
  • Cross-validation: estimating true prediction error, not training error

The Interface of Modeling and Data

Two things interact in every statistical/ML problem:

  1. The model: Our assumptions about how the world works
    • Today: log returns are i.i.d. normal
    • Later: linear relationships, neural network architectures, etc.
  2. The data: What we actually observe
    • Finite samples (we have \(N\) observations, not infinity)
    • Measurement error, missing values, selection bias

The interface is complex. Even when the model is correct, estimation from finite data introduces:

  • Variance (estimates fluctuate across samples)
  • Bias (systematic errors in certain quantities)
  • Uncertainty about which model is correct

What’s Coming Next

Next week (Financial Data II):

  • Are returns actually normal? Skewness, kurtosis, and fat tails
  • Time series structure: are returns independent?
  • Setting up predictive regressions
  • In-sample vs. out-of-sample evaluation

Later in the course:

  • Supervised learning: regression and classification
  • The bias-variance tradeoff
  • Regularization and cross-validation
  • Neural networks and deep learning

Today’s message carries through: Always think carefully about what you’re estimating, what assumptions you’re making, and where bias might creep in.

Today’s Key Results

Part I: The Log-Normal Result

  • Normal log returns \(\implies\) log-normal wealth
  • \(\mathbb{E}[W_T] = W_0 \exp(T\mu + T\sigma^2/2)\)—variance inflates expected value
  • Mean \(>\) Median \(>\) Mode: most investors earn less than expected

Part II: Estimation Compounds the Problem

  • We estimate \(\mu\) with error; this adds more upward bias
  • Bias factor: \(\exp(T^2\sigma^2/2N)\)
  • Estimation risk grows with \(T^2\), not \(T\)

Next week: We’ll question the normality assumption—are returns actually normal?