Font Selection
Change the font used in the body of these notes by choosing one from the dropdown. The new font can take a moment to download (up to 10 seconds on a slow connection), so please bear with any glitches in rendering while it loads.
This is some sample text from the course so that you can see what the new text and maths fonts you select will look like.
The full dataset we use to ‘learn’ is thus \(\mathcal{D} = \left( (\vec{x}_1, y_1), \dots, (\vec{x}_n, y_n) \right) \in (\mathcal{X} \times \mathcal{Y})^n\), where \(\vec{x}_i\) is a vector of length \(d\),
\[\vec{x}_i = (x_{i1}, \dots, x_{id})^T \in \mathcal{X}\]
We denote all observations of a single feature as \(\vec{x}_{\cdot j}\),
\[ \vec{x}_{\cdot j} = (x_{1j}, \dots, x_{nj})^T \]
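As a rough illustration of this indexing, the sketch below stores the observations as the rows of an \(n \times d\) NumPy array. The array name `X`, the sizes, and the random values are assumptions made purely for this example and are not taken from the course; note also that NumPy indexes from 0, whereas the notation above counts from 1.

```python
import numpy as np

# Hypothetical sizes: n observations, each with d features.
n, d = 5, 3
rng = np.random.default_rng(0)

# Row i of X holds one observation; column j holds one feature.
X = rng.normal(size=(n, d))

# Second observation (x_2 in the 1-based notation): a vector of length d.
x_i = X[1]

# All observations of the third feature (x_{.3}): a vector of length n.
x_col_j = X[:, 2]
```

Both slices pass through the same scalar entry, here `X[1, 2]`, which corresponds to \(x_{23}\) in the notation above.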
\[\begin{align*} \mathcal{E}(\hat{f}) &= -\mathbb{E}\left[ \sum_{j=1}^g \bbone\{Y=j\} \log \hat{p_j} \right] \\ &= -\int_{\mathcal{X} \times \mathcal{Y}} \sum_{j=1}^g \bbone\{Y=j\} \log \hat{p_j} \,d\pi_{XY} \\ &= -\int_\mathcal{X} \sum_{j=1}^g \pi_{Y|X}(Y=j \given X) \log \hat{f_j}(X) \,d\pi_{X} \\[2em] \hat{\mathcal{E}}_\mathcal{D'}(\hat{f}) &= -\frac{1}{m} \sum_{i=1}^m \sum_{j=1}^g \bbone\{y_i=j\} \log \hat{f_j}(\vec{x}_i) \end{align*}\]
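The final line of the display, the empirical estimate on a held-out set of size \(m\), might be computed along the following lines. This is only a minimal sketch: the function name `empirical_cross_entropy` and the toy probabilities are hypothetical, and classes are coded \(0, \dots, g-1\) to match NumPy indexing rather than \(1, \dots, g\) as in the formula.

```python
import numpy as np

def empirical_cross_entropy(probs, labels):
    """Average negative log-probability assigned to the observed class.

    probs  : (m, g) array, probs[i, j] plays the role of f_hat_j(x_i)
    labels : (m,) integer array with classes coded 0, ..., g-1
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    m = labels.shape[0]
    # The indicator 1{y_i = j} keeps exactly one term of the inner sum,
    # so we can pick out the predicted probability of the observed class.
    picked = probs[np.arange(m), labels]
    return -np.mean(np.log(picked))

# Toy example (made-up numbers): m = 2 observations, g = 3 classes.
probs = np.array([[0.5, 0.25, 0.25],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])
loss = empirical_cross_entropy(probs, labels)
```

Here `loss` equals \(-\tfrac{1}{2}(\log 0.5 + \log 0.8)\), since only the probability assigned to each observed class contributes to the sum.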
Returning to the expected standard error, this leads to,
\[\begin{align*} \star &= \mathbb{E}_{D_{|\mathcal{T}_r|} D_{|\mathcal{T}_e|}}\left[ \left( \widehat{\text{Err}}_\text{ho} - \bar{\mathcal{E}}_n \right)^2 \right] - \underbrace{\left( \bar{\mathcal{E}}_n - \bar{\mathcal{E}}_{|\mathcal{T}_r|} \right)^2}_{\text{sample size bias}} - \text{Var}\left( \mathcal{E}(\hat{f} \given \mathcal{D}_{\mathcal{T}_r}) \right) \end{align*}\]
Thus, the standard error is too small for a confidence interval based on it around \(\widehat{\text{Err}}_\text{ho}\) to have correct coverage for \(\bar{\mathcal{E}}_n\).