These methods refit a model of interest to samples drawn from the training set, in order to obtain additional information about the fitted model. For example, they provide estimates of test-set prediction error, and of the standard deviation and bias of our parameter estimates.
Distinction between the Test Error and Training Error:
Test error is the average error that results from using a statistical learning method to predict the response on a new observation, one that was not used in training the method. In contrast, the training error can be easily calculated by applying the statistical learning method to the observations used in its training. But the training error rate is often quite different from the test error rate; in particular, the former can dramatically underestimate the latter.
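A small simulation makes the gap concrete. The sketch below (a hypothetical example using only numpy; the sine-plus-noise data and the polynomial degree are my own choices, not from the notes) fits a very flexible model to a small training set and compares its error on the training observations with its error on fresh data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: y = sin(x) + noise
def make_data(n):
    x = rng.uniform(0, 3, n)
    y = np.sin(x) + rng.normal(0, 0.3, n)
    return x, y

x_train, y_train = make_data(30)
x_test, y_test = make_data(1000)   # "new observations" never seen in training

# Deliberately over-flexible model: a degree-12 polynomial fit to 30 points
coefs = np.polyfit(x_train, y_train, deg=12)

def mse(x, y):
    return np.mean((np.polyval(coefs, x) - y) ** 2)

train_err = mse(x_train, y_train)
test_err = mse(x_test, y_test)
print(train_err, test_err)  # training error sits well below test error
```

The flexible fit chases the noise in the training points, so the training error is optimistically small while the error on new observations is much larger.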
Bias-Variance Trade-off:
Bias and variance together make up the prediction error: the expected test error decomposes into the squared bias of the fit plus its variance (plus irreducible noise). There is a trade-off between the two, and the test error is minimized at the flexibility where their sum is smallest. So bias and variance together determine the test error.
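The decomposition behind this statement can be written out explicitly. For a test observation $(x_0, y_0)$ with $y_0 = f(x_0) + \varepsilon$ and fitted function $\hat f$:

```latex
\mathbb{E}\!\left[(y_0 - \hat f(x_0))^2\right]
  = \operatorname{Var}\!\big(\hat f(x_0)\big)
  + \big[\operatorname{Bias}\big(\hat f(x_0)\big)\big]^2
  + \operatorname{Var}(\varepsilon)
```

As model flexibility increases, bias typically falls while variance rises; the expected test error is minimized where the sum of the first two terms is smallest, and the third term, $\operatorname{Var}(\varepsilon)$, is an irreducible floor.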
Validation-Set Approach:
Here we randomly divide the available set of samples into two parts: a training set and a validation or hold-out set.
The model is fit on the training set, and the fitted model is used to predict the response for the observations in the validation set. The resulting validation-set error provides an estimate of the test error. This is typically assessed using MSE in the case of a quantitative response, and the misclassification rate in the case of a qualitative (discrete) response.
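The procedure can be sketched in a few lines. This is a minimal illustration with numpy only, assuming a quantitative response; the linear data-generating model and the 50/50 split are my own choices for the example:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data set with a quantitative response
x = rng.uniform(0, 10, 100)
y = 2.0 * x + rng.normal(0, 1.0, 100)

# Randomly divide the observations into a training half and a hold-out half
idx = rng.permutation(len(x))
train, val = idx[:50], idx[50:]

# Fit the model (here, simple linear regression) on the training set only
coefs = np.polyfit(x[train], y[train], deg=1)

# The MSE on the validation set estimates the test error
val_mse = np.mean((np.polyval(coefs, x[val]) - y[val]) ** 2)
print(val_mse)
```

With noise of standard deviation 1 and a correctly specified model, the validation MSE should land near 1, the irreducible error.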
Drawbacks of the Validation Approach:
The validation estimate of the test error can be highly variable, depending on precisely which observations are included in the training set and which in the validation set. In addition, only a subset of the observations (those included in the training set rather than the validation set) are used to fit the model. This suggests that the validation-set error may tend to overestimate the test error for the model fit on the entire data set. Why? In general, the more data one has, the lower the error, so a model trained on only part of the data tends to perform worse than the same model trained on all of it.
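The first drawback, the variability of the estimate, is easy to see by repeating the random split. The sketch below (again a hypothetical numpy-only example; the data and cubic fit are my own choices) computes the validation-set MSE for twenty different random splits of the same data:

```python
import numpy as np

rng = np.random.default_rng(2)

# One fixed data set; only the train/validation split will change
x = rng.uniform(0, 10, 60)
y = np.sin(x) + rng.normal(0, 0.5, 60)

def validation_mse(seed, deg=3):
    """Validation-set MSE for one random half/half split."""
    r = np.random.default_rng(seed)
    idx = r.permutation(len(x))
    train, val = idx[:30], idx[30:]
    c = np.polyfit(x[train], y[train], deg)
    return np.mean((np.polyval(c, x[val]) - y[val]) ** 2)

estimates = [validation_mse(s) for s in range(20)]
print(min(estimates), max(estimates))  # the estimates vary with the split
```

The spread between the smallest and largest estimate, for the same data and the same model, is exactly the variability the notes describe; cross-validation addresses it by averaging over many splits.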