What is Data Augmentation?
Data augmentation is a technique that increases the effective amount of training data by applying various transformations to the original dataset. It acts as a form of regularization, helping the model generalize rather than overfit the training set.
Data Augmentation techniques
Image Manipulation based techniques
Pixel Level Transforms: apply blur, jitter, noise, etc. to the pixel values of an image.
Spatial Level Transforms: transform the image as a whole by flipping, rotating, cropping, etc.
“PatchShuffle Regularization”, 2017: randomly shuffles the pixel (or feature) values within each $N \times N$ non-overlapping window.
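A minimal NumPy sketch of the window-shuffling idea; the window size `n` here is an assumption, and the paper also applies the same operation to intermediate feature maps:

```python
import numpy as np

def patch_shuffle(img: np.ndarray, n: int = 2) -> np.ndarray:
    """Randomly permute the values inside each non-overlapping n x n window."""
    h, w = img.shape[:2]
    out = img.copy()
    for y in range(0, h - h % n, n):
        for x in range(0, w - w % n, n):
            block = out[y:y + n, x:x + n]
            flat = block.reshape(n * n, -1)          # one row per pixel
            out[y:y + n, x:x + n] = np.random.permutation(flat).reshape(block.shape)
    return out
```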
“Random Erasing Data Augmentation”, 2017: selects a rectangular region of random size within the input image, fills it with random noise, the ImageNet mean value, 0, 255, etc., and trains the model on the erased image.
“Improved Regularization of Convolutional Neural Networks with Cutout”, 2017: Cutout fills a randomly positioned fixed-size box with zeros; a sketch covering both this and Random Erasing follows below.
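A minimal NumPy sketch of both box-erasing methods above; the default box size and the specific fill choices are assumptions, not values from the papers:

```python
import numpy as np

def erase_box(img: np.ndarray, size: int = 16, fill: str = "zero") -> np.ndarray:
    """Erase a randomly positioned size x size box.
    fill="zero" gives Cutout; fill="noise" gives one Random Erasing variant."""
    h, w = img.shape[:2]
    out = img.copy()
    # Sample the box center; clip so the box may partially exit the image.
    cy, cx = np.random.randint(h), np.random.randint(w)
    y1, y2 = max(0, cy - size // 2), min(h, cy + size // 2)
    x1, x2 = max(0, cx - size // 2), min(w, cx + size // 2)
    if fill == "zero":
        out[y1:y2, x1:x2] = 0
    else:  # random-noise fill
        out[y1:y2, x1:x2] = np.random.randint(0, 256, out[y1:y2, x1:x2].shape)
    return out
```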
“Data Augmentation by Pairing Samples for Images Classification”, 2018: two images A and B are randomly drawn from the training set, randomly cropped to 224×224, and randomly flipped horizontally. The two patches are then averaged into a single mixed image, which keeps the label of A as-is.
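A sketch of the pairing step under those assumptions; images are taken to be NumPy arrays at least 224 pixels on each side:

```python
import numpy as np

def sample_pairing(img_a: np.ndarray, img_b: np.ndarray, label_a, size: int = 224):
    """Average two randomly cropped, randomly flipped patches; keep label of A."""
    def random_patch(img):
        h, w = img.shape[:2]
        y = np.random.randint(h - size + 1)
        x = np.random.randint(w - size + 1)
        patch = img[y:y + size, x:x + size]
        if np.random.rand() < 0.5:            # random horizontal flip
            patch = patch[:, ::-1]
        return patch.astype(np.float32)
    mixed = (random_patch(img_a) + random_patch(img_b)) / 2.0
    return mixed, label_a                      # the label of A is used as-is
```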
“Improved Mixed-Example Data Augmentation”, 2018: explores eight different ways of mixing two images.
“MixUp: Beyond Empirical Risk Minimization”, 2018: linearly interpolates both the images and the labels of two samples, $\tilde{x} = \lambda x_A + (1-\lambda) x_B$ and $\tilde{y} = \lambda y_A + (1-\lambda) y_B$, with a weight $\lambda \in [0, 1]$ that is usually drawn from a Beta distribution.
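The whole method fits in a few lines of NumPy; labels are assumed to be one-hot vectors so they can be interpolated, and the default `alpha` is an assumption (small values near 0.2 are common in practice):

```python
import numpy as np

def mixup(x1, x2, y1, y2, alpha: float = 0.2):
    """Interpolate images and one-hot labels with lam ~ Beta(alpha, alpha)."""
    lam = np.random.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y
```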
“Data augmentation using random image cropping and patches for deep CNNs”, 2018: random image cropping and patching (RICAP) combines randomly cropped patches from four images into one image. Like MixUp, the four labels are mixed in proportion to each patch's area to form a soft label for training.
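A simplified NumPy sketch, assuming four square images of identical size and one-hot labels; the Beta parameter value here is an assumption:

```python
import numpy as np

def ricap(imgs, labels, size: int = 224, beta: float = 0.3):
    """Patch four size x size images into one; soft label weighted by area."""
    # Sample the boundary point that splits the canvas into four regions.
    w = int(np.round(size * np.random.beta(beta, beta)))
    h = int(np.round(size * np.random.beta(beta, beta)))
    ws = [w, size - w, w, size - w]
    hs = [h, h, size - h, size - h]
    offsets = [(0, 0), (0, w), (h, 0), (h, w)]  # TL, TR, BL, BR corners
    out = np.zeros_like(imgs[0])
    soft_label = np.zeros_like(labels[0], dtype=np.float32)
    for img, lab, (oy, ox), pw, ph in zip(imgs, labels, offsets, ws, hs):
        # Random crop of the required patch size from each source image.
        y = np.random.randint(img.shape[0] - ph + 1)
        x = np.random.randint(img.shape[1] - pw + 1)
        out[oy:oy + ph, ox:ox + pw] = img[y:y + ph, x:x + pw]
        soft_label += lab * (pw * ph) / (size * size)
    return out, soft_label
```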
“Manifold Mixup: Better Representations by Interpolating Hidden States”, 2018: performs MixUp on hidden representations (feature maps) rather than on the input images.
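A toy PyTorch sketch, assuming the network is split into a feature extractor and a classifier head; the actual method samples the mixing layer at random for each batch:

```python
import torch
import torch.nn as nn

def manifold_mixup_step(features: nn.Module, head: nn.Module,
                        x: torch.Tensor, y_onehot: torch.Tensor,
                        alpha: float = 2.0) -> torch.Tensor:
    """Mix hidden representations (not inputs) of a randomly paired batch."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    h = features(x)                         # hidden representation
    perm = torch.randperm(x.size(0))        # pair each sample with another
    h_mixed = lam * h + (1.0 - lam) * h[perm]
    y_mixed = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    logits = head(h_mixed)
    # Cross-entropy against the soft mixed targets.
    loss = -(y_mixed * logits.log_softmax(dim=-1)).sum(dim=-1).mean()
    return loss
```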
“Hide-and-Seek: A Data Augmentation Technique for Weakly-Supervised Localization and Beyond”, 2018: divides the image into a grid of patches and randomly hides a subset of the patches at every training iteration.
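A minimal NumPy sketch; the grid size and hiding probability are assumptions, and the fill value here is 0 whereas the paper recommends the training-set mean pixel value:

```python
import numpy as np

def hide_and_seek(img: np.ndarray, grid: int = 4, p_hide: float = 0.5) -> np.ndarray:
    """Split the image into a grid x grid layout; hide each cell with prob p_hide."""
    h, w = img.shape[:2]
    out = img.copy()
    gh, gw = h // grid, w // grid
    for gy in range(grid):
        for gx in range(grid):
            if np.random.rand() < p_hide:
                out[gy * gh:(gy + 1) * gh, gx * gw:(gx + 1) * gw] = 0
    return out
```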
“CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features”, 2019: combines cutting and mixing. A box is erased from image A, the patch at the same location is extracted from image B and inserted into the empty area, and the labels are mixed in proportion to the patch area.
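A NumPy sketch of CutMix with one-hot labels; unlike the paper, this simplified version keeps the box fully inside the image instead of clipping it at the borders:

```python
import numpy as np

def cutmix(img_a, img_b, y_a, y_b, alpha: float = 1.0):
    """Paste a random box from image B into image A; mix labels by area."""
    h, w = img_a.shape[:2]
    lam = np.random.beta(alpha, alpha)
    # Box side scales with sqrt(1 - lam) so its area ratio is (1 - lam).
    bh, bw = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    y = np.random.randint(h - bh + 1)
    x = np.random.randint(w - bw + 1)
    out = img_a.copy()
    out[y:y + bh, x:x + bw] = img_b[y:y + bh, x:x + bw]
    lam_adj = 1 - (bh * bw) / (h * w)       # exact area after integer rounding
    return out, lam_adj * y_a + (1 - lam_adj) * y_b
```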
“AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty”, 2019: applies several chains of augmentation operations to a single image (operations composed in series within each chain, chains combined in parallel) and then mixes the result with the original image.
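A sketch with toy stand-in operations (the real method uses a larger op set chosen to avoid overlap with test-time corruptions); a square uint8 image is assumed so the rotation op preserves shape:

```python
import numpy as np

# Toy stand-ins for the paper's augmentation operations.
OPS = [
    lambda x: np.rot90(x, 1, axes=(0, 1)),
    lambda x: x[:, ::-1],                 # horizontal flip
    lambda x: 255 - x,                    # invert
    lambda x: np.roll(x, 8, axis=1),      # small translation
]

def augmix(img: np.ndarray, width: int = 3, depth: int = 2, alpha: float = 1.0):
    """Mix `width` parallel chains of `depth` serial ops, then blend with the original."""
    img_f = img.astype(np.float32)
    ws = np.random.dirichlet([alpha] * width)   # weights of the parallel chains
    mixed = np.zeros_like(img_f)
    for k in range(width):
        chain = img
        for _ in range(depth):                  # ops composed in series
            chain = OPS[np.random.randint(len(OPS))](chain)
        mixed += ws[k] * chain.astype(np.float32)
    m = np.random.beta(alpha, alpha)            # final blend with the original
    return (m * img_f + (1 - m) * mixed).astype(np.uint8)
```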
“SmoothMix: A Simple Yet Effective Data Augmentation to Train Robust Classifiers”, 2020: CutMix suffers from a “strong edge” problem, a sharp transition along the border of the cut-and-pasted patch. SmoothMix alleviates this by blending the two images smoothly across the border region.
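A sketch of the smooth-mask idea using a Gaussian mask; the mask width `sigma_frac` and the label weighting by mean mask value are assumptions of this simplified version:

```python
import numpy as np

def smoothmix(img_a, img_b, y_a, y_b, sigma_frac: float = 0.25):
    """Blend two images with a soft Gaussian mask instead of a hard box."""
    h, w = img_a.shape[:2]
    cy, cx = np.random.randint(h), np.random.randint(w)
    yy, xx = np.mgrid[0:h, 0:w]
    sigma = sigma_frac * min(h, w)
    # Smooth mask in [0, 1]: 1 at the sampled center, decaying outward.
    mask = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma ** 2))
    mask_b = mask[..., None] if img_a.ndim == 3 else mask
    out = mask_b * img_b.astype(np.float32) + (1 - mask_b) * img_a.astype(np.float32)
    lam = mask.mean()                      # label weight = average mask value
    return out.astype(img_a.dtype), lam * y_b + (1 - lam) * y_a
```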
“PuzzleMix: Exploiting Saliency and Local Statistics for Optimal Mixup”, 2020: mixes two images while preserving the saliency information of each. This preserves the local statistics of each image, yields higher generalization performance than the earlier Mix-family methods, and makes the model more robust to adversarial attacks.
“The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization”, 2020: augments data by perturbing the weights and activations of a trained image-to-image network (e.g., an autoencoder or a super-resolution network).
Generative model based methods
“Data Augmentation in Emotion Classification Using Generative Adversarial Networks”, 2017: facial-emotion images generated by CycleGAN are added to the training data, mitigating class imbalance and improving classification performance.
“GAN-based Synthetic Medical Image Augmentation for increased CNN Performance in Liver Lesion Classification”, 2018: liver-lesion images generated by a DCGAN are added to the training data to improve classification performance.
“SinGAN: Learning a Generative Model from a Single Natural Image”, 2019: SinGAN trains a GAN on a single image and can then generate many plausible variations of it.
AutoML based methods
“AutoAugment: Learning Augmentation Policies from Data”, 2018: an RNN controller samples an augmentation policy, a child network is trained with that policy, and the resulting validation accuracy is used as the reward to update the controller via reinforcement learning (PPO).
“Population Based Augmentation: Efficient Learning of Augmentation Policy Schedules”, 2019: based on Population Based Training (PBT), a hyperparameter-optimization algorithm. The weights of well-performing models are copied (exploit) and their augmentation hyperparameters are slightly perturbed (explore).
“Fast AutoAugment”, 2019: drastically reduces search time by finding the augmentation policy with the Tree-structured Parzen Estimator (TPE), a Bayesian optimization technique, and by evaluating candidate augmentations during validation of an already-trained model rather than retraining child networks. It is faster than PBA while achieving similar accuracy.
“Faster AutoAugment: Learning Augmentation Strategies using Backpropagation”, 2019: searches even faster via gradient-based optimization, relaxing the discrete search space into a continuous one with gradient approximations that make non-differentiable image operations differentiable.
“RandAugment: Practical automated data augmentation with a reduced search space”, 2019: skips the augmentation-policy search entirely. Every time a batch is drawn, N operations are sampled uniformly from a fixed set and applied with a shared global magnitude M, leaving only the two hyperparameters N and M to tune.
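A sketch with toy stand-in operations on uint8 images (the paper's op set has about 14 geometric and color transforms); how each op interprets the magnitude `m` here is an assumption:

```python
import numpy as np

# Toy stand-ins for the paper's operations (shear, rotate, solarize, ...);
# each op interprets the shared magnitude m on its own scale.
def _translate(x, m):
    return np.roll(x, m, axis=1)                   # shift m pixels horizontally

def _solarize(x, m):
    return np.where(x > 255 - 8 * m, 255 - x, x)   # invert bright pixels

def _posterize(x, m):
    bits = m // 4 + 1
    return (x >> bits) << bits                     # drop low-order bits

OPS = [_translate, _solarize, _posterize]

def randaugment(img: np.ndarray, n: int = 2, m: int = 9) -> np.ndarray:
    """Apply n uniformly sampled ops at shared magnitude m -- no policy search."""
    out = img
    for _ in range(n):
        out = OPS[np.random.randint(len(OPS))](out, m)
    return out
```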
“UniformAugment: A Search-free Probabilistic Data Augmentation Approach”, 2020: noting that RandAugment still requires tuning its hyperparameters, UniformAugment applies augmentations probabilistically with uniformly sampled parameters, requiring no search at all.