Deep Double Descent Summary
Deep Double Descent: Where Bigger Models and More Data Hurt (2019)
Double Descent phenomenon
- Bigger models are better.
- Performance first improves, then gets worse, and then improves again with increasing model size, data size, or training time.
- This phenomenon is called “double descent” (a minimal demo is sketched below).
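To make this concrete, here is a minimal sketch (my own toy example, not code from the paper): minimum-norm least squares on random ReLU features of synthetic data. Test error typically peaks when the number of features p is close to the number of training samples n, then falls again as p grows; the exact shape depends on the noise level and random seed.

```python
# Toy model-wise double descent (illustrative only, not from the paper).
# Minimum-norm least squares on random ReLU features: test MSE tends to
# peak when the feature count p is near the sample count n_train.
import numpy as np

rng = np.random.default_rng(0)
d, n_train, n_test = 20, 100, 2000
w_true = rng.normal(size=d)

def make_data(n):
    X = rng.normal(size=(n, d))
    y = X @ w_true + 0.5 * rng.normal(size=n)     # noisy linear target
    return X, y

X_tr, y_tr = make_data(n_train)
X_te, y_te = make_data(n_test)

for p in [10, 50, 90, 100, 110, 200, 500, 2000]:  # model "size" = # features
    W = rng.normal(size=(d, p)) / np.sqrt(d)      # fixed random projection
    F_tr = np.maximum(X_tr @ W, 0.0)              # ReLU random features
    F_te = np.maximum(X_te @ W, 0.0)
    beta = np.linalg.pinv(F_tr) @ y_tr            # minimum-norm least squares
    test_mse = np.mean((F_te @ beta - y_te) ** 2)
    print(f"p={p:5d}  test_mse={test_mse:10.4f}")
```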
EMC(Effective Model Complexity)
- EMC: the maximum number of samples n on which training procedure T achieves ≈ 0 training error on average (formal definition below)
- Interpolation threshold: the point where EMC(T) = n
- Critical interval: an interval around the interpolation threshold
- Below and above the critical interval: increasing complexity improves performance
- Within the critical interval: increasing complexity can hurt performance
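For reference, the paper's formal definition (arXiv:1912.02292): the EMC of training procedure T, with respect to data distribution D and tolerance ε > 0, is

```latex
% EMC as defined in the paper; Error_S is the mean training error on sample set S
\mathrm{EMC}_{\mathcal{D},\varepsilon}(\mathcal{T}) \;:=\;
  \max\left\{\, n \;\middle|\;
    \mathbb{E}_{S \sim \mathcal{D}^{n}}\!\left[\mathrm{Error}_{S}\big(\mathcal{T}(S)\big)\right] \le \varepsilon
  \,\right\}
```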
Model-wise double descent
- The double-descent peak intensifies as label noise increases (a simple noise-injection sketch follows).
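Label noise of this kind can be injected as below. This is a minimal sketch of symmetric label noise (each label replaced by a uniformly random class with some probability); the function and parameter names are my own, not the paper's code.

```python
# Minimal symmetric label-noise injection (illustrative sketch).
# With probability `noise_prob`, a label is replaced by a uniformly
# random class (which may happen to equal the original label).
import numpy as np

def add_label_noise(labels, num_classes, noise_prob, seed=0):
    rng = np.random.default_rng(seed)
    noisy = labels.copy()
    flip = rng.random(len(noisy)) < noise_prob        # which labels to corrupt
    noisy[flip] = rng.integers(0, num_classes, flip.sum())
    return noisy

y = np.arange(10) % 5
print(add_label_noise(y, num_classes=5, noise_prob=0.2))
```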
Epoch-wise double descent
- (Left) Both the large and the intermediate models exhibit double descent.
- The large model reaches its double-descent peak at an earlier epoch than the intermediate one.
- (Right) The test error of the large model decreases, then increases, and then decreases again as the number of epochs grows.
- The intermediate model's test error does not decrease again, so early stopping works better for it (a per-epoch tracking sketch follows this list).
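A minimal way to observe epoch-wise behavior is to record the test error after every epoch and remember the best checkpoint. The sketch below does this with a small scikit-learn MLP on synthetic data (my own setup, not the paper's); on a toy dataset the double-descent shape is not guaranteed, but the same per-epoch bookkeeping applies to larger experiments.

```python
# Track test error per epoch with early-stopping bookkeeping (illustrative).
# warm_start=True with max_iter=1 makes each fit() call train one more epoch.
import warnings
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

warnings.filterwarnings("ignore", category=ConvergenceWarning)

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(64,), warm_start=True, max_iter=1,
                    learning_rate_init=1e-2, random_state=0)

best_err, best_epoch = 1.0, -1
for epoch in range(1, 101):
    clf.fit(X_tr, y_tr)                       # one more epoch of training
    test_err = 1.0 - clf.score(X_te, y_te)    # test error after this epoch
    if test_err < best_err:                   # keep the best (early-stopping) point
        best_err, best_epoch = test_err, epoch
print(f"best test error {best_err:.3f} at epoch {best_epoch}")
```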
Sample-wise non-monotonicity
- (Left) The double-descent peak abates as more samples are used.
- (Right) There is a regime where more samples hurt performance; beyond roughly 10K samples, the smaller model is the better choice.
- Increasing the number of samples shifts the test-error curve downward toward lower test error.
- More samples also require larger models to fit them, shifting the interpolation peak to the right.
- For intermediate-size models, more samples can therefore hurt performance (see the sketch after this list).
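Sample-wise non-monotonicity can be illustrated with the same toy random-features setup as above: fix the model size (number of features p) and sweep the number of training samples n. Test error tends to get worse as n approaches p before improving again; again, this is my own illustration, not the paper's experiments.

```python
# Toy sample-wise non-monotonicity (illustrative only).
# Fixed model size p; minimum-norm least squares test error tends to
# peak when the sample count n is close to p, then falls again.
import numpy as np

rng = np.random.default_rng(0)
d, p, n_test = 20, 200, 2000
w_true = rng.normal(size=d)
W = rng.normal(size=(d, p)) / np.sqrt(d)        # fixed random projection

def make_split(n):
    X = rng.normal(size=(n, d))
    y = X @ w_true + 0.5 * rng.normal(size=n)   # noisy linear target
    return np.maximum(X @ W, 0.0), y            # ReLU random features

F_te, y_te = make_split(n_test)
for n in [50, 100, 150, 190, 200, 210, 300, 800]:
    F_tr, y_tr = make_split(n)
    beta = np.linalg.pinv(F_tr) @ y_tr          # minimum-norm fit
    test_mse = np.mean((F_te @ beta - y_te) ** 2)
    print(f"n={n:4d}  test_mse={test_mse:10.4f}")
```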
Conclusion
- In general, the peak of test error appears when models are just barely able to fit the train set.
- Models at the interpolation threshold are the worst, and label noise can easily destroy their global structure.
- However, in the over-parameterized regime, there are many models that fit the train set.
- However, the authors do not have a full explanation for why this happens.
References
- https://arxiv.org/pdf/1912.02292.pdf
- https://openai.com/blog/deep-double-descent/
- https://bluediary8.tistory.com/59