On the Geometric and Statistical Interpretation of Data Augmentation
Abstract
Data augmentation (DA) is a common technique for training machine learning models. For example, in image classification, datasets are augmented by randomly cropping, rotating, and adding random noise to images. Another trending technique is adversarial training, where datasets are augmented with adversarial examples. Despite its empirical effectiveness, the theory behind DA is not well understood.
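The augmentations named above can be sketched as follows. This is a minimal, illustrative example using NumPy; the function name, crop size, and noise level are hypothetical choices, not taken from the thesis.

```python
import numpy as np

def augment(image, rng, crop=24, noise_std=0.05):
    """Sketch of the augmentations mentioned above: random crop,
    random 90-degree rotation, and additive Gaussian noise.
    `image` is an (H, W) float array; all parameters are illustrative."""
    h, w = image.shape
    # Randomly crop a (crop, crop) window from the image.
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    out = image[top:top + crop, left:left + crop]
    # Rotate by a random multiple of 90 degrees.
    out = np.rot90(out, k=int(rng.integers(0, 4)))
    # Add random Gaussian noise.
    return out + rng.normal(0.0, noise_std, out.shape)

rng = np.random.default_rng(0)
img = np.ones((32, 32))
aug = augment(img, rng)  # a new, perturbed 24x24 training example
```

In practice such transforms are applied on the fly during training, so each epoch sees a different perturbed copy of every example.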
In this thesis, we analyze why DA generalizes and robustifies our models, from both geometric and statistical points of view. Geometrically, we provide both upper and lower bounds on the margins created by DA, via convex geometric arguments. The upper bound on the margin is distribution-independent, while the lower bound on the margin holds for a wide range of probability distributions.
Statistically, we prove that DA helps generalization by controlling the stability of the learning algorithm, at a very small cost, provided the training data is sufficiently large. In addition, with the same sample complexity, noise robustness is guaranteed.
Subject
statistical learning theory
stability
robustness
Permanent Link
http://digital.library.wisc.edu/1793/79132
Citation
TR1858