This paper introduces and investigates a Co-teaching machine learning strategy to increase the robustness of deep neural networks to training datasets with noisy labels. The motivation for this investigation stems from the fact that noisy labels commonly occur in real-world datasets, and neural networks should be robust to such noise. The authors point out that deep neural networks are notoriously known to fit noisy labels as training epochs become large, due to the so-called memorization effect. The authors assert that deep neural networks memorize easy instances first and then gradually try to adapt to noisy instances. The proposed approach exploits this phenomenon by training two neural networks simultaneously, similar to the Co-training approach introduced in [1]. Each network is trained on a biased selection of small-loss instances chosen from each mini-batch by its peer network, which it uses to update its parameters. Unlike the existing MentorNet or Decoupling approaches, in which the error in one network is fed directly back into the same network in the next mini-batch, the Co-teaching approach leverages the fact that the two networks have different learning capabilities, and this serves to filter out errors introduced by noisy labels. Both networks were optimized with stochastic gradient methods with momentum, which are known to generalize well. The authors argue that when deep neural networks memorize clean data first, they become robust and hence attenuate errors from the subsequent noisy data. To validate the proposed approach, experiments were conducted on different noisy renditions of the popular MNIST, CIFAR-100 and CIFAR-10 datasets. The results showed that the proposed Co-teaching approach performed better than existing state-of-the-art baselines when trained under varying degrees of label noise.
Details of the Approach
The proposed Co-teaching approach trains two deep neural networks, f with parameters Wf and g with parameters Wg. During each mini-batch pass, network f selects the fraction of instances in the mini-batch with the smallest training loss; this fraction is controlled by the parameter R(T). The R(T) selected instances are then fed into network g as useful knowledge for updating its parameters, and the process is repeated with networks f and g swapped. The error flow therefore takes a crisscross path between the two networks.
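To make this crisscross update concrete, the following is a minimal PyTorch-style sketch of one mini-batch step, assuming cross-entropy losses and a remember_rate argument standing in for R(T). The helper names (coteaching_step, net_f, net_g) are hypothetical and are not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def coteaching_step(net_f, net_g, opt_f, opt_g, x, y, remember_rate):
    """One Co-teaching mini-batch: each network picks its small-loss
    instances and passes them to its peer for the parameter update."""
    # Per-sample losses for both networks (no reduction yet).
    loss_f = F.cross_entropy(net_f(x), y, reduction="none")
    loss_g = F.cross_entropy(net_g(x), y, reduction="none")

    num_remember = int(remember_rate * len(y))

    # Indices of the R(T) fraction of smallest-loss instances in each network.
    idx_f = torch.argsort(loss_f)[:num_remember]  # f's "clean" candidates
    idx_g = torch.argsort(loss_g)[:num_remember]  # g's "clean" candidates

    # Cross update: f learns from g's selection, g learns from f's selection.
    opt_f.zero_grad()
    F.cross_entropy(net_f(x[idx_g]), y[idx_g]).backward()
    opt_f.step()

    opt_g.zero_grad()
    F.cross_entropy(net_g(x[idx_f]), y[idx_f]).backward()
    opt_g.step()
```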
The authors acknowledged that for this approach to work, it was important to establish that the selected small-loss instances were indeed clean. Relying on the ability of neural networks to filter out noisy instances through their loss values at the initial stages of training, explained by the memorization effect, more instances in the mini-batch were kept at the beginning of training, and suspected noisy instances were then gradually dropped by lowering the keep rate R(T). Small-loss selection is similar to boosting and active learning, which have been shown to be sensitive to outliers. The proposed Co-teaching combats this problem by exploiting the fact that two classifiers can produce different decision hyperplanes and therefore have different abilities to filter noise when exposed to noisy data. This was the motivation behind exchanging the selected small-loss instances between the networks to update their respective parameters. Although the authors drew motivation from Co-training, they argue that the proposed approach needs only a single set of features, unlike Co-training which needs two, and that it exploits the memorization effect of deep neural networks, which Co-training does not.
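As a hedged illustration of this schedule, R(T) can be read as the fraction of small-loss instances kept: it starts at 1 (keep everything) and decays to 1 − τ after a warm-up of Tk epochs, where τ is the estimated noise rate. The linear decay and the hyperparameter values below are assumptions consistent with the description above, not a quotation of the paper.

```python
# Sketch of a keep-rate schedule R(T): keep all instances at first,
# then linearly drop down to a 1 - tau fraction after t_k warm-up epochs.
# tau is the (estimated) noise rate; t_k is an assumed hyperparameter.
def remember_rate(epoch, tau=0.45, t_k=10):
    return 1.0 - tau * min(epoch / t_k, 1.0)

# e.g. remember_rate(0) == 1.0, remember_rate(5) == 0.775,
#      remember_rate(10) and beyond == 0.55
```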
As stated earlier, the authors used three popular benchmark datasets to verify the effectiveness of their proposed model: MNIST, CIFAR-100 and CIFAR-10. The authors manually corrupted the datasets using a transition matrix Q that flipped clean labels to noisy labels. They defined two structures for Q: pair flipping, where labels are flipped within very similar classes, and symmetry flipping, where labels are flipped to the remaining classes with a constant probability. A noise rate of 0.45 was chosen for pair flipping and 0.5 for symmetry flipping. The model was also evaluated on data with a noise rate of 0.2 in order to measure its performance against low-level label noise. The performance on these datasets was compared to MentorNet, Decoupling, S-Model, Bootstrap, F-correction and a standard deep neural network trained directly on the noisy data. All of these methods were implemented with a 9-layer convolutional neural network using Leaky-ReLU activations, trained with the Adam optimizer and a learning rate of 0.001. Test accuracy and label precision were used as the performance metrics. The results on MNIST revealed that Co-teaching achieved better results under both the 45% pair-flipping and the 50% symmetry-flipping noise rates than all the other state-of-the-art methods. It also performed better than all the other models except F-correction on data with a 20% noise rate. The Co-teaching algorithm again outperformed its competitors on both CIFAR-100 and CIFAR-10 under the various noise-level conditions, except in the 20% noise-rate case where F-correction was better.
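For illustration, a minimal sketch of how a symmetry-flipping transition matrix Q could be built and applied is shown below, assuming each label keeps its class with probability 1 − noise_rate and is otherwise flipped uniformly to one of the other classes. The function names and this uniform-flipping assumption are mine, not taken from the authors' code.

```python
import numpy as np

def symmetric_noise_matrix(num_classes, noise_rate):
    # Off-diagonal entries share the noise mass equally; diagonal keeps the rest.
    q = np.full((num_classes, num_classes), noise_rate / (num_classes - 1))
    np.fill_diagonal(q, 1.0 - noise_rate)
    return q  # each row sums to 1

def corrupt_labels(labels, q, rng=np.random.default_rng(0)):
    # Draw a (possibly flipped) label for each clean label from its row of Q.
    return np.array([rng.choice(len(q), p=q[y]) for y in labels])

# Example: 10 classes (MNIST / CIFAR-10) with a 50% symmetric noise rate.
noisy = corrupt_labels(np.array([0, 1, 2, 3]), symmetric_noise_matrix(10, 0.5))
```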
The idea presented is elegant and relevant given that real data may have noisy labels. The experiments produced consistent results, supporting the reliability of the proposed approach. In the implementation of the Co-teaching approach, the quality of the labels is assumed to be unknown. The confidence in a label is therefore estimated by its small loss, and the noise rate is estimated by τ in the experiments, which in turn determines the keep rate R(T).
References
[1] A. Blum and T. Mitchell. Combining labeled and unlabeled data with co-training. In COLT, 1998.