What I want to explain here is how to use K-fold cross validation with neural networks. In a neural network, we use the backpropagation algorithm to adjust the weights (parameters) repeatedly until the error (i.e. the difference between the predicted and expected output values) is small enough for us. I will refer to this set of weights simply as the model parameters.
Now, we have also heard that K-fold cross validation can help determine the best model parameters, that is, the set of model parameters that generalise (i.e. perform well) on unseen data (e.g. a held-out test set). So the question is: how can we use K-fold cross validation to do that?
For example, suppose you are given a set of tweets and the task is named entity recognition (NER). Each tweet has been tokenized, and each token has been labeled with an entity type (e.g. location, person, organization). So how do you proceed?
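To make the setup concrete, here is a minimal sketch of what such labeled data might look like in Python. The tweets, the BIO tagging scheme, and the helper name `entity_types` are all made up for illustration; your dataset may use a different format.

```python
# Hypothetical example data: each tweet is a list of (token, label) pairs.
# The BIO tag scheme (B- = beginning, I- = inside, O = outside an entity)
# is an assumption; the task description does not fix a label format.
tweets = [
    [("Flying", "O"), ("to", "O"), ("Paris", "B-LOC"), ("with", "O"), ("Anna", "B-PER")],
    [("Acme", "B-ORG"), ("Corp", "I-ORG"), ("is", "O"), ("hiring", "O")],
]


def entity_types(dataset):
    """Collect the set of entity types that appear in the labels."""
    types = set()
    for tweet in dataset:
        for _token, label in tweet:
            if label != "O":
                # "B-LOC" -> "LOC", "I-ORG" -> "ORG"
                types.add(label.split("-", 1)[1])
    return types


print(sorted(entity_types(tweets)))  # ['LOC', 'ORG', 'PER']
```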
Using a neural network, here’s how you should proceed:
1. Split the data into k folds, so that each fold has its own train, dev, and test sets.
2. For each fold, loop over epochs: train the neural network parameters on the training set, then use the dev set to decide whether to continue training. At the end of every epoch, you can also measure the model's performance on the test set.
3. Compute the final F-score using method 3 in this paper.
4. Retrain on the whole dataset using the neural network parameters that gave the best F-score.
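The fold-splitting and per-fold training steps above can be sketched roughly as follows. This is only an illustration under some assumptions: the way the dev set is picked (the chunk right after the test chunk), the `patience` early-stopping rule, and the callbacks `train_one_epoch` and `evaluate` are hypothetical choices of mine, not something prescribed by the procedure. Combining the per-fold scores into the final F-score is deliberately left out, since that depends on the method in the paper.

```python
import random


def make_folds(examples, k, seed=0):
    """Shuffle the examples and yield k (train, dev, test) splits.

    Assumption: chunk i is the test set of fold i, the next chunk
    (wrapping around) is the dev set, and the remaining k - 2 chunks
    form the training set.
    """
    if k < 3:
        raise ValueError("need k >= 3 to get separate train, dev and test sets")
    items = list(examples)
    random.Random(seed).shuffle(items)
    chunks = [items[i::k] for i in range(k)]
    for i in range(k):
        dev_idx = (i + 1) % k
        train = [x for j, chunk in enumerate(chunks)
                 if j not in (i, dev_idx) for x in chunk]
        yield train, chunks[dev_idx], chunks[i]


def run_fold(train, dev, test, train_one_epoch, evaluate,
             max_epochs=50, patience=3):
    """Train on one fold, using the dev set to decide when to stop.

    train_one_epoch(train) and evaluate(data) are hypothetical callbacks
    supplied by the caller; evaluate must return a score where higher is
    better (e.g. an F-score). Returns the test score recorded at the
    epoch with the best dev score.
    """
    best_dev = float("-inf")
    test_at_best = None
    epochs_without_gain = 0
    for _epoch in range(max_epochs):
        train_one_epoch(train)
        dev_score = evaluate(dev)
        test_score = evaluate(test)  # measured at the end of every epoch
        if dev_score > best_dev:
            best_dev, test_at_best = dev_score, test_score
            epochs_without_gain = 0
        else:
            epochs_without_gain += 1
            if epochs_without_gain >= patience:
                break  # dev score stopped improving, so stop training
    return test_at_best
```

With these two helpers, you would call `run_fold` once per split produced by `make_folds(data, k)` and collect the returned scores, one per fold.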
That’s it! You now have a model that should generalise to unseen examples.