Deep Double Descent Summary

Deep Double Descent: Where Bigger Models and More Data Hurt (2019)

Double Descent phenomenon

Bigger models are better

  • Performance first improves, then gets worse, and then improves again with increasing model size, data size, or training time.
  • This phenomenon is called “Double Descent”

EMC (Effective Model Complexity)

  • EMC: the maximum number of samples n on which training procedure T achieves, on average, ≈ 0 training error

  • Interpolation threshold: the point where EMC(T) = n (n = number of train samples)
  • Critical interval: an interval around the interpolation threshold
    • Below and above the critical interval: as complexity ↑, performance ↑
    • Within the critical interval: performance ↓
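
The EMC definition can be made concrete in a toy setting (an illustrative sketch, not from the paper): for min-norm least squares on random Gaussian data, train error is ≈ 0 exactly while n ≤ number of features, so a scan over n recovers the interpolation threshold. The functions `train_error` and `emc`, and all constants, are assumptions for illustration.

```python
import numpy as np

def train_error(n_samples, n_features, rng):
    """Fit min-norm least squares on random data; return the train MSE."""
    X = rng.standard_normal((n_samples, n_features))
    y = rng.standard_normal(n_samples)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.mean((X @ w - y) ** 2))

def emc(n_features, eps=1e-6, n_max=40, trials=3, seed=0):
    """Largest n at which average train error is ~0 (toy stand-in for EMC)."""
    best = 0
    for n in range(1, n_max + 1):
        errs = [train_error(n, n_features, np.random.default_rng(seed + 1000 * t + n))
                for t in range(trials)]
        if np.mean(errs) <= eps:
            best = n  # this n can still be fit exactly on average
    return best
```

For this linear model, `emc(20)` should return 20: the procedure interpolates any n ≤ 20 random samples, matching EMC(T) = n at the interpolation threshold.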

Model-wise double descent

  • Double Descent intensifies when there is more label noise.

Epoch-wise double descent

  • (Left) Both the larger model and the intermediate model show double descent.
    • The larger model reaches its double descent at fewer epochs than the intermediate one.
  • (Right) As the number of epochs increases, the test error of the larger model decreases first, then increases, and then decreases again.
    • The test error of the intermediate model does not recover, so it is better to stop early.
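
The early-stopping remark can be made concrete with a standard patience rule (a generic sketch, not the paper's procedure); `val_errors` is an assumed list of per-epoch validation errors:

```python
def early_stop_epoch(val_errors, patience=3):
    """Return the index of the best epoch; training stops once `patience`
    consecutive epochs fail to improve on the best validation error."""
    best, best_epoch, waited = float("inf"), 0, 0
    for epoch, err in enumerate(val_errors):
        if err < best:
            best, best_epoch, waited = err, epoch, 0  # new best; reset patience
        else:
            waited += 1
            if waited >= patience:
                break  # no improvement for `patience` epochs: stop
    return best_epoch
```

For a curve like `[1.0, 0.8, 0.9, 1.0, 1.1, 1.2]` this stops and reports epoch 1 as best. Note the caveat implied by the figure: for a larger model showing epoch-wise double descent, a small patience could also stop training before the second descent begins.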

Sample-wise non-monotonicity

  • (Left) Double descent abates as the number of samples grows.
  • (Right) There is a regime where more samples hurt performance. But beyond 10K samples, the smaller model is better.

  • Increasing the number of samples shifts the curve down toward lower test error.
  • More samples require larger models to fit them.
  • For the intermediate model, more samples hurt performance.
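
Sample-wise non-monotonicity also appears in a minimal sketch (an illustrative assumption, not the paper's experiments): min-norm least squares with the number of parameters p fixed while the number of train samples n varies. As n grows toward p the model is forced to interpolate the label noise, so average test error peaks near n = p before improving again.

```python
import numpy as np

def avg_risk(n_train, p=30, noise=0.5, n_test=2000, trials=20):
    """Average test MSE of min-norm least squares with p fixed and n_train varying."""
    errs = []
    for s in range(trials):
        rng = np.random.default_rng(s)
        w = rng.standard_normal(p) / np.sqrt(p)                  # ground-truth weights
        X = rng.standard_normal((n_train, p))
        y = X @ w + noise * rng.standard_normal(n_train)         # noisy labels
        X_te = rng.standard_normal((n_test, p))
        w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        errs.append(float(np.mean((X_te @ w_hat - X_te @ w) ** 2)))
    return float(np.mean(errs))
```

Here going from 10 to 30 samples should *increase* test error (`avg_risk(30)` exceeds `avg_risk(10)`), while 100 samples are clearly better — a regime where more data hurt.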

Conclusion

  • In general, the peak of test error appears when models are just barely able to fit the train set.
  • The model at the interpolation threshold is effectively the only one that fits the train set, so it is the worst: label noise can easily destroy its global structure.
  • In the over-parameterized regime, by contrast, there are many models that fit the train set, and the training procedure can find one that absorbs the noise more gracefully.
  • The authors do not yet have a full explanation of why this happens.

References

  • https://arxiv.org/pdf/1912.02292.pdf
  • https://openai.com/blog/deep-double-descent/
  • https://bluediary8.tistory.com/59
박나깨

I am a student interested in Deep Learning, Computer Vision, AI, and Image Processing.