
Regularization for Deep Learning

  • Goal: Make the algorithm generalize, i.e. perform well on data it has never been trained on
  • We don't want it to simply memorize the training data

Parameter Regularization

  • Take e.g. the squared L2 norm of the weights
  • Multiply it by a hyperparameter α and add it to the cost function: J̃(w) = J(w) + (α/2)·‖w‖²
  • Goal: Weights should be small (weight decay)
  • With α > 0, the weights are pulled toward zero on every gradient step
  • Intuition: weight components along directions in which the cost has small curvature get pulled toward zero; directions with large curvature are barely affected by the regularization (see the sketch below)
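
A minimal NumPy sketch of weight decay on a linear least-squares model; the data, α, and learning rate are illustrative assumptions, not from the notes. The L2 penalty (α/2)·‖w‖² contributes α·w to the gradient, shrinking the weights on every update.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                      # 100 samples, 5 features
true_w = np.array([1.0, 0.5, 0.0, 0.0, -2.0])
y = X @ true_w + rng.normal(scale=0.1, size=100)   # noisy linear targets

w = np.zeros(5)
alpha = 0.1   # regularization strength (hyperparameter)
lr = 0.01     # learning rate

for _ in range(1000):
    grad = X.T @ (X @ w - y) / len(y)  # gradient of the unregularized MSE cost
    w -= lr * (grad + alpha * w)       # alpha * w is the weight-decay term

print(w)  # compared to alpha = 0, all weights are shrunk toward zero
```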

  • L1-Regularization acts as a "soft threshold" that drives many weights exactly to zero (sparsity)

    • Fewer nonzero weights to store and compute (see the sketch below)
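
A sketch of the soft-threshold operator that an L1 penalty induces (its proximal step); the threshold lam is an illustrative choice.

```python
import numpy as np

def soft_threshold(w, lam):
    """Shrink each weight toward zero by lam; weights in [-lam, lam] become exactly 0."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

w = np.array([-0.8, -0.05, 0.02, 0.3, 1.5])
print(soft_threshold(w, lam=0.1))
# -> [-0.7 -0.   0.   0.2  1.4]  (small weights are zeroed out -> sparsity)
```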

Dataset Augmentation

  • Create artificial training data
  • e.g. for image classification: rotate the image slightly, zoom in / out, darken / brighten, etc.
  • The label stays the same, but we can multiply the size of the training set
  • i.e. a transformation that changes the input x, but not the output y
  • Inject noise into the inputs as a further form of augmentation
    • We could also apply noise to the weights: in the Bayesian view, weights are random variables with probability distributions, and weight noise is a way to simulate that (see the sketch after this list)
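
A minimal sketch of label-preserving augmentation plus input and weight noise, assuming grayscale images as float arrays in [0, 1]; the specific transforms and noise scales are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image):
    """Random flip, brightness change, and input noise; the label y is unchanged."""
    if rng.random() < 0.5:
        image = np.fliplr(image)                    # mirror the image
    image = image * rng.uniform(0.8, 1.2)           # darken / brighten
    image = image + rng.normal(scale=0.01, size=image.shape)  # input noise
    return np.clip(image, 0.0, 1.0)

x = rng.random((28, 28))                 # dummy grayscale "image"
batch = [augment(x) for _ in range(4)]   # one sample multiplied into several

# Noise on the weights instead of the inputs (the Bayesian flavor above):
w = rng.normal(size=10)
w_noisy = w + rng.normal(scale=0.01, size=w.shape)
```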