Apply semi-supervised learning for semantic segmentation

Gábor Lant
4 min readFeb 23, 2021

One of the key strengths of deep learning is that it can leverage large amounts of data to train machine learning models. These models can be trained to find the underlying structure of the data. On the other hand, this means that large amounts of data are also a requirement for these methods to work properly. Finding enough good-quality data, however, can be a challenging problem in itself. This is common in computer vision tasks: labeling images is difficult and usually can only be done by a trained human annotator. Annotating a video frame by frame, for example, takes far longer than creating or playing back the original media. This poses new problems for researchers to solve.

Generalization

Deep learning methods have achieved remarkable performance in image processing tasks. However, training these models requires large datasets to avoid overfitting. An overfitted model memorizes the features of the training data, which prevents it from generalizing. Generalization here means the performance of the model on data it has never seen before (e.g. test data), and it can be improved through careful network design or through data augmentation.

Many strategies focus on increasing the performance of the network by modifying its architecture or training procedure. Commonly applied methods include dropout, batch normalization and transfer learning.
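To make the first of these concrete, dropout randomly zeroes activations during training so the network cannot rely on any single unit. A minimal NumPy sketch of "inverted" dropout (the function name and details are illustrative, not taken from any particular framework):

```python
import numpy as np

def dropout(x, p, rng, training=True):
    """Inverted dropout: zero each activation with probability p during
    training and rescale survivors by 1/(1-p), so the expected activation
    is unchanged and no rescaling is needed at test time."""
    if not training or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p   # keep-mask: True where the unit survives
    return x * mask / (1.0 - p)

rng = np.random.default_rng(0)
x = np.ones(1000)
y = dropout(x, 0.5, rng)
print(y.mean())   # close to 1.0 in expectation
```

Frameworks such as PyTorch and Keras provide this as a ready-made layer; the sketch only shows why the surviving activations are rescaled.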

Data augmentation, on the other hand, does not modify the model; instead it artificially increases the size of the available training data by creating new data points through some kind of data modification. The right modifications vary by task. For computer vision problems, for example, translation, cropping, noise injection and color space transformations work well.
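A few of these transformations can be sketched in NumPy (purely illustrative; real pipelines typically use a library such as torchvision or albumentations):

```python
import numpy as np

def augment(image, rng):
    """Apply a random combination of simple augmentations to an HxWxC
    image with values in [0, 1]."""
    # Horizontal flip with 50% probability
    if rng.random() < 0.5:
        image = image[:, ::-1, :]
    # Random translation by up to 4 pixels (wrap-around shift for brevity)
    dy, dx = rng.integers(-4, 5, size=2)
    image = np.roll(image, shift=(dy, dx), axis=(0, 1))
    # Gaussian noise injection
    image = image + rng.normal(0.0, 0.02, size=image.shape)
    return np.clip(image, 0.0, 1.0)

rng = np.random.default_rng(42)
img = rng.random((32, 32, 3))   # dummy 32x32 RGB image
aug = augment(img, rng)
print(aug.shape)                # (32, 32, 3)
```

For segmentation, note that geometric transforms (flip, translation) must be applied identically to the image and its label map, while photometric ones (noise, color) apply to the image only.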

Augmentation examples

One of the problems when using data augmentation is choosing the right set of policies for the given data. Some recent approaches solve this problem by applying policy search methods.

  • AutoAugment uses a reinforcement-learning-style policy search to find the optimal set of augmentations.
  • RandAugment does not apply a search method; instead it applies augmentations randomly for each batch. This method seems to reach similar performance with far less computational overhead.
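The core idea of RandAugment can be sketched as follows (the op pool and magnitude scaling here are illustrative stand-ins for the roughly 14 image ops used in the paper):

```python
import numpy as np

# Illustrative op pool; each op takes an image in [0, 1] and a magnitude.
def identity(x, m):
    return x

def hflip(x, m):
    return x[:, ::-1]

def brightness(x, m):
    return np.clip(x + 0.05 * m, 0.0, 1.0)  # magnitude scales the shift

OPS = [identity, hflip, brightness]

def rand_augment(image, n=2, magnitude=5, rng=None):
    """Apply n ops drawn uniformly at random, all sharing one global
    magnitude -- no per-dataset policy search required."""
    rng = rng or np.random.default_rng()
    for i in rng.integers(0, len(OPS), size=n):
        image = OPS[i](image, magnitude)
    return image

rng = np.random.default_rng(7)
img = rng.random((8, 8))
out = rand_augment(img, n=2, magnitude=5, rng=rng)
print(out.shape)   # (8, 8)
```

The whole search space collapses to just two hyperparameters, n and magnitude, which is where the computational savings over AutoAugment come from.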

Semi-supervised learning

This approach falls between supervised and unsupervised learning. Semi-supervised methods exploit the fact that unlabeled data is much cheaper to produce. They feed the model both labeled and unlabeled data in the hope of improving performance and generalization.

A recent study has shown that there is a lot to learn from semi-supervised methods that apply data augmentation during training.

Unsupervised Data Augmentation (UDA) has shown state-of-the-art performance when applied to computer vision tasks where labeled data is scarce. The method combines a semi-supervised model with consistency training: one branch computes the supervised loss on labeled data, while the other computes a consistency loss between predictions on augmented and non-augmented unlabeled data.
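The training objective can be sketched as a supervised term plus a weighted consistency term. The NumPy sketch below uses illustrative names, and omits refinements from the actual UDA recipe such as confidence masking and the sharpened, stop-gradient consistency target:

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(one_hot, logits):
    return -np.mean(np.sum(one_hot * np.log(softmax(logits) + 1e-12), axis=-1))

def kl_div(p, q):
    return np.mean(np.sum(p * np.log((p + 1e-12) / (q + 1e-12)), axis=-1))

def uda_loss(sup_logits, sup_labels, unl_logits, unl_aug_logits, lam=1.0):
    """Supervised cross-entropy on labeled data plus a KL consistency
    term between predictions on clean and augmented unlabeled data."""
    supervised = cross_entropy(sup_labels, sup_logits)
    consistency = kl_div(softmax(unl_logits), softmax(unl_aug_logits))
    return supervised + lam * consistency

rng = np.random.default_rng(0)
sup_logits = rng.normal(size=(4, 3))                       # labeled batch
sup_labels = np.eye(3)[rng.integers(0, 3, size=4)]         # one-hot labels
unl_logits = rng.normal(size=(8, 3))                       # clean unlabeled
unl_aug = unl_logits + rng.normal(0.0, 0.1, size=(8, 3))   # augmented view
loss = uda_loss(sup_logits, sup_labels, unl_logits, unl_aug)
print(loss > 0.0)   # True: both terms are non-negative
```

The weight `lam` controls how strongly the unlabeled consistency signal shapes training relative to the supervised signal.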

UDA model

Applying it for semantic segmentation

Semantic segmentation is a problem where a lot of training data is needed to achieve good performance. However, creating label maps for every pixel of an image is a time-consuming process.

UDA can help solve this problem by combining the strengths of semi-supervised learning and data augmentation. Using two standard semantic segmentation models, Deeplab v3 and U-Net, on the popular Cityscapes benchmark dataset, we achieved noticeable improvements over baseline models trained on labeled data only. The two models achieved 0.600 and 0.676 mIoU for the baseline training.

UDA improved the results by +0.062 and +0.002 mIoU over the respective baselines.
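For reference, mIoU averages the per-class intersection-over-union between the predicted and ground-truth label maps. A minimal NumPy version, skipping classes absent from both maps (evaluation suites such as the official Cityscapes scripts handle more edge cases, e.g. ignore labels):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:                  # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

pred = np.array([[0, 0],
                 [1, 2]])
gt   = np.array([[0, 1],
                 [1, 2]])
print(round(mean_iou(pred, gt, num_classes=3), 3))   # 0.667
```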

Some examples of the output: (top to bottom: U-Net, Deeplab, UDA U-Net, UDA Deeplab).

Input image — predicted image — ground truth

This study shows that semi-supervised learning can improve model accuracy in semantic segmentation tasks, and that there is a lot more to learn about how training models with specific augmentations improves generalization.
