I'm following an introductory course on machine learning in R. I'm doing the following:
- Load the faithful dataset (standard in R).
- Create a model that predicts waiting time from eruptions, fitted on the first 136 rows of the data frame.
mymodel = glm(data=faithful[1:136,], waiting ~ eruptions)
- Calculate the mean squared error on the second half of the data.
mean((faithful$waiting[137:272] - predict(mymodel, faithful[137:272, ]))^2)
The way I see it, this takes the waiting times minus the predicted values (using the glm model), squares the differences, and takes the mean.
I do not understand why this is done. Could someone explain it to me in plain English?
Using the mean squared error can be seen as a special case of maximum likelihood parameter estimation. One wants to choose the model such that the likelihood of the observations is maximal under the selected model. The likelihood is:
$$ {\cal L} = p(z_1|y_1) \cdot \ldots \cdot p(z_n|y_n) $$
where p(z|y) is the probability of the observed value z when the model predicts the value y.
So 'fitting' the model is done by maximizing L or, equivalently, minimizing
$$ - \log({\cal L}) $$
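Spelled out (a small intermediate step that the answer leaves implicit), the logarithm turns the product into a sum, so each observation contributes its own term:
$$ - \log({\cal L}) = - \log p(z_1|y_1) - \ldots - \log p(z_n|y_n) $$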
If you assume a normal distribution centered at y_i for the p's, and assume that all the p's have the same width parameter sigma (set to 1 for convenience), you'll get:
$$ - \log ({\cal L}) = \text{constant} - \log ( e^{-\frac{1}{2} \cdot (y_1-z_1)^2} ) - \ldots - \log ( e^{-\frac{1}{2} \cdot (y_n-z_n)^2} ) $$
Setting the width parameter of the Gaussians to the same value means that all measurements have the same uncertainty (i.e. the same precision).
The log and exp cancel each other, and you'll end up with:
$$ - \log ({\cal L}) = \text{constant} + \frac{1}{2} \cdot (y_1-z_1)^2 + \ldots + \frac{1}{2} \cdot (y_n-z_n)^2 $$
For the purposes of minimization, you can drop the constant and the factors of 0.5, and you'll end up with the mean (or summed) squared error.
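To make this concrete, here is a minimal sketch in R using the same faithful data. It minimizes the negative log-likelihood numerically and compares the result with the least-squares fit from glm. The names train, test, neg_log_lik, fit_ml and fit_glm are made up for this illustration, and the straight-line form of the prediction is an assumption matching the waiting ~ eruptions formula. With sigma = 1 the constant works out to n/2 * log(2*pi).

```r
# Training half and hold-out half of the built-in faithful dataset
train <- faithful[1:136, ]
test  <- faithful[137:272, ]

# Negative log-likelihood for Gaussian errors with sigma = 1:
# constant + 0.5 * sum((y_i - z_i)^2)
# y_i = model prediction, z_i = observed waiting time
neg_log_lik <- function(beta, data) {
  y <- beta[1] + beta[2] * data$eruptions  # assumed linear model
  z <- data$waiting
  n <- nrow(data)
  0.5 * n * log(2 * pi) + 0.5 * sum((y - z)^2)
}

# Minimize numerically; the start values c(0, 0) are an arbitrary guess
fit_ml <- optim(c(0, 0), neg_log_lik, data = train, method = "BFGS")

# Ordinary least-squares fit (glm defaults to the gaussian family)
fit_glm <- glm(waiting ~ eruptions, data = train)

# The two sets of coefficients should agree closely
print(fit_ml$par)
print(coef(fit_glm))

# Mean squared error on the hold-out half
mse <- mean((test$waiting - predict(fit_glm, newdata = test))^2)
print(mse)
```

So the squared differences in the mean(...) expression are not an arbitrary choice: they are what is left of the Gaussian log-likelihood once the constant and the factor 0.5 are dropped.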