r - Understand the reason for calculating the MSE - Cross Validated


I'm following an introductory course on machine learning in R. I'm doing the following:

  1. Load the faithful dataset (standard in R).
  2. Create a model that predicts the waiting time from the eruption duration, using the first 136 rows of the data frame: `mymodel = glm(data=faithful[1:136,], waiting ~ eruptions)`

  3. Calculate the mean squared error on the second half of the data.

    mean((faithful$waiting[137:272] - predict(mymodel, faithful[137:272, ]))^2)

The way I see it, this takes the waiting times minus the predicted values (from the glm model), squares those differences and then takes their mean.
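Spelled out step by step, the calculation just described looks roughly like this in R. This is a sketch assuming the same 136/136 split as above; the intermediate names `train`, `test`, `predicted`, `errors` and `mse` are illustrative choices, not part of the course code.

```r
# Built-in Old Faithful data: 272 rows with columns `eruptions` and `waiting`
data(faithful)

# Same split as in the question: first 136 rows for fitting, the rest for testing
train <- faithful[1:136, ]
test  <- faithful[137:272, ]

# glm with its default gaussian family: waiting time as a linear function of eruptions
mymodel <- glm(waiting ~ eruptions, data = train)

# Step 1: predictions for the held-out rows
predicted <- predict(mymodel, newdata = test)

# Step 2: errors = observed minus predicted, one per test row
errors <- test$waiting - predicted

# Step 3: square them, so over- and under-predictions both count as a cost
squared <- errors^2

# Step 4: average them to get the mean squared error on the test half
mse <- mean(squared)
print(mse)
```

Note that the observed values and the predictions must refer to the same rows (here both use rows 137 to 272); otherwise R silently recycles the shorter vector.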

I don't understand why the errors are squared and averaged like this. Can someone explain it to me in plain English?

Using the mean squared error can be seen as a special case of maximum likelihood parameter estimation: one wants to choose the model such that the likelihood of the observations is maximal under the selected model. The likelihood is:

$$ {\cal L} = p(z_1|y_1) \cdot \ldots \cdot p(z_n|y_n) $$

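where p(z|y) is the probability (density) of the observed value z when the model predicts the value y. Writing the likelihood as a product assumes that the observations are independent of each other.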
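To make the likelihood concrete, here is a hedged R sketch for the model from the question, evaluated on the second half of `faithful`. It assumes (as the derivation below does) that each p(z_i|y_i) is a normal density centered at the prediction y_i with width 1; `y`, `z`, `p`, `lik` and `loglik` are illustrative names. With 136 points the plain product of densities can underflow towards 0 in floating point, which is one practical reason to work with the logarithm instead.

```r
# Built-in Old Faithful data, split as in the question
data(faithful)
train <- faithful[1:136, ]
test  <- faithful[137:272, ]
mymodel <- glm(waiting ~ eruptions, data = train)

# y_i: the model's predictions; z_i: the observed waiting times
y <- predict(mymodel, newdata = test)
z <- test$waiting

# p(z_i | y_i): normal density centred at y_i with sd = 1 (assumed width)
p <- dnorm(z, mean = y, sd = 1)

# Likelihood: the product of the individual densities (may underflow)
lik <- prod(p)

# Log-likelihood: the sum of the log densities, computed directly for stability
loglik <- sum(dnorm(z, mean = y, sd = 1, log = TRUE))

print(lik)
print(loglik)
```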

So 'fitting' the model is done by maximizing L or, equivalently (since the logarithm is monotonic), by minimizing

$$ - \log({\cal L}) $$

If you assume that the p's are normal distributions centered at the y_i, and that all of them have the same width parameter sigma (set to 1 for convenience), you get:

$$ - \log ({\cal L}) = constant - \log ( e^{-\frac{1}{2} \cdot (y_1-z_1)^2} ) - \ldots - \log ( e^{-\frac{1}{2} \cdot (y_n-z_n)^2} ) $$
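For reference, the constant comes from the normalization factor of the unit-width normal density, which is the same for every observation:

$$ p(z_i|y_i) = \frac{1}{\sqrt{2\pi}} \cdot e^{-\frac{1}{2} \cdot (y_i-z_i)^2} \quad\Rightarrow\quad constant = -n \log\left(\frac{1}{\sqrt{2\pi}}\right) = \frac{n}{2} \log(2\pi) $$

It depends only on n, not on the model's predictions y_i.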

Setting the width parameters of all the Gaussians to the same value means that all measurements have the same uncertainty (i.e. the same precision).

Since log and exp cancel each other, you end up with:

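$$ - \log ({\cal L}) = constant + \frac{1}{2} \cdot (y_1-z_1)^2 + \ldots + \frac{1}{2} \cdot (y_n-z_n)^2 $$

For the purposes of minimization, you can drop the constant and the factors of 0.5, which leaves the sum of squared errors. Dividing that sum by n gives the mean squared error and does not change where the minimum is either.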
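The equivalence can be checked numerically. The R sketch below (an illustration, not part of the original answer) fits a straight line to the first half of `faithful` twice with `optim`: once by minimizing the Gaussian negative log-likelihood with sigma = 1, and once by minimizing the mean squared error. Both should land on (numerically) the same intercept and slope as `glm` with its default gaussian family. `neg_log_lik` and `mse_loss` are hypothetical helper names.

```r
# Built-in Old Faithful data; fit on the first half as in the question
data(faithful)
train <- faithful[1:136, ]

# Negative log-likelihood of the line y = b0 + b1 * eruptions,
# assuming Gaussian errors with width sigma = 1 (as in the derivation)
neg_log_lik <- function(beta, data) {
  y <- beta[1] + beta[2] * data$eruptions  # predictions y_i
  z <- data$waiting                        # observations z_i
  -sum(dnorm(z, mean = y, sd = 1, log = TRUE))
}

# Mean squared error of the same line
mse_loss <- function(beta, data) {
  y <- beta[1] + beta[2] * data$eruptions
  mean((data$waiting - y)^2)
}

# Minimise both objectives from the same starting point
start <- c(0, 0)
ctrl  <- list(reltol = 1e-10, maxit = 500)
fit_ml  <- optim(start, neg_log_lik, data = train, method = "BFGS", control = ctrl)
fit_mse <- optim(start, mse_loss, data = train, method = "BFGS", control = ctrl)

# Reference: glm with its default gaussian family (ordinary least squares)
fit_glm <- glm(waiting ~ eruptions, data = train)

print(rbind(max_likelihood = fit_ml$par,
            min_mse        = fit_mse$par,
            glm            = unname(coef(fit_glm))))
```

The three rows should agree to within the optimizer's tolerance: the two objectives differ only by the constant, the factor 1/2 and the division by n, none of which moves the minimum.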

