MixMatch : A Holistic Approach to Semi-Supervised Learning, Paper Summary - 개발 여정 - A Journey to be a Developer

1. Consistency Regularization

Supervised Learning : transforming (augmenting) an input does not change its class.
Semi-Supervised Learning : a classifier should output the same class distribution for an unlabeled example even after it has been augmented.
MixMatch uses standard data augmentation (random horizontal flips & crops).
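
As a rough illustration of the flip-and-crop augmentation mentioned above, here is a minimal sketch in plain Python. The function name `random_flip_and_crop`, the zero padding, and the `pad=2` default are assumptions made for this example; the summary does not say how the crops are taken.

```python
import random

def random_flip_and_crop(image, pad=2, rng=random):
    """Randomly flip a 2D image (list of rows) and take a random crop of the same size."""
    height, width = len(image), len(image[0])
    # Horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]
    # Zero-pad every side (an assumption), then crop back to (height, width).
    padded_width = width + 2 * pad
    padded = [[0] * padded_width for _ in range(pad)]
    padded += [[0] * pad + list(row) + [0] * pad for row in image]
    padded += [[0] * padded_width for _ in range(pad)]
    top = rng.randint(0, 2 * pad)   # randint includes both endpoints
    left = rng.randint(0, 2 * pad)
    return [row[left:left + width] for row in padded[top:top + height]]

augmented = random_flip_and_crop([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

The class of the image is the same before and after this transformation, which is the assumption that consistency regularization relies on.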

2. Entropy Minimization

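Entropy ↑ : more uncertainty; the probability distribution becomes flatter.
Entropy ↓ : the probability distribution becomes sharply peaked.
Entropy minimization therefore pushes the classifier toward confident (low-entropy) predictions on unlabeled data.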
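
To make the flat-versus-peaked contrast concrete, the sketch below computes Shannon entropy for two hand-picked distributions. The example values are made up for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

flat = [0.25, 0.25, 0.25, 0.25]    # maximally uncertain over four classes
peaked = [0.97, 0.01, 0.01, 0.01]  # confident in the first class

print(entropy(flat))    # equals log(4), the maximum for four classes
print(entropy(peaked))  # much smaller than the flat case
```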

3. MixUp

A new example is created by taking a convex combination of a pair of inputs and, separately, of their labels.
This makes the model more robust to unseen data and helps prevent overfitting.
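
The convex combination can be sketched as follows, with inputs and labels represented as plain lists. The helper name `mixup` and the toy vectors are assumptions for this example.

```python
def mixup(x1, p1, x2, p2, lam):
    """Mix two (input, label) pairs with weight lam on the first pair."""
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    p = [lam * a + (1 - lam) * b for a, b in zip(p1, p2)]
    return x, p

# Two toy examples with one-hot labels.
mixed_x, mixed_p = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], lam=0.75)
print(mixed_x, mixed_p)  # [0.75, 0.25] [0.75, 0.25]
```

The mixed label is a soft label: it is no longer one-hot, but it still sums to 1.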

MixMatch

labeled data : augmented
unlabeled data : K augmentations
Average predictions & temperature sharpening → pseudo-label
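
The averaging half of this step might look like the sketch below. The model predictions are hard-coded stand-ins here (K = 2), and `average_predictions` is a name chosen for the example; sharpening is applied to the result afterwards.

```python
def average_predictions(predictions):
    """Average K class distributions (one per augmentation) element-wise."""
    k = len(predictions)
    return [sum(column) / k for column in zip(*predictions)]

# Hypothetical model outputs for K = 2 augmentations of one unlabeled example.
preds = [[0.6, 0.4], [0.8, 0.2]]
avg = average_predictions(preds)  # roughly [0.7, 0.3]
```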

※ Sharpening

T → 0 : the output approaches a “one-hot” distribution.
T ↓ : lower-entropy distribution.
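
A minimal sketch of temperature sharpening: each probability is raised to the power 1/T and the result is renormalized. The temperature values used here are only illustrative.

```python
def sharpen(probs, temperature):
    """Raise each probability to 1/T and renormalize so the result sums to 1."""
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [v / total for v in powered]

avg = [0.6, 0.4]
print(sharpen(avg, 1.0))   # T = 1 leaves the distribution unchanged
print(sharpen(avg, 0.5))   # the larger class gains mass
print(sharpen(avg, 0.05))  # very close to one-hot
```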

Given N labeled examples and M unlabeled examples, concatenate and shuffle them all.
Then split the shuffled data back into groups of N and M. Call these W_L and W_U respectively.

MixUp N labeled data & W_L
MixUp M unlabeled data & W_U
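
The concat, shuffle, and split step can be sketched as below. The items are placeholder strings standing in for (input, label) pairs, and `shuffle_and_split` is a hypothetical helper name.

```python
import random

def shuffle_and_split(labeled, unlabeled, rng=random):
    """Concatenate both groups, shuffle, and split back into sizes N and M."""
    combined = list(labeled) + list(unlabeled)
    rng.shuffle(combined)
    n = len(labeled)
    return combined[:n], combined[n:]  # W_L (size N), W_U (size M)

labeled = ["l1", "l2"]
unlabeled = ["u1", "u2", "u3"]
w_l, w_u = shuffle_and_split(labeled, unlabeled)
```

Each labeled example is then mixed with the entry of W_L at the same position, and each unlabeled example with the matching entry of W_U, so a partner can come from either group.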

MixUp

0.5 <= λ’ <= 1
x1 and p1 always receive the larger weight.
That is, the original data (the N labeled and M unlabeled examples) contribute more to each mix than W_L and W_U do.
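
One way to get a weight in [0.5, 1] is to draw λ from a Beta(α, α) distribution, as is usual for MixUp, and then take max(λ, 1 - λ). The Beta draw and the value `alpha=0.75` are assumptions for this sketch; this summary only states the resulting range.

```python
import random

def sample_mix_weight(alpha, rng=random):
    """Draw lam ~ Beta(alpha, alpha) and fold it into [0.5, 1]."""
    lam = rng.betavariate(alpha, alpha)
    return max(lam, 1.0 - lam)

random.seed(0)
weights = [sample_mix_weight(0.75) for _ in range(1000)]
print(min(weights) >= 0.5, max(weights) <= 1.0)  # True True
```

Because the first argument of the mix always gets λ’, passing the original example first keeps the result closer to it than to its shuffled partner.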

labeled group : cross entropy between predictions and mixup label
unlabeled group : Squared L2 loss between predictions and mixup label
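
The two per-example losses can be sketched as below. Batch averaging and any normalization constants are left out, the `eps` guard against log(0) is an assumption of this example, and the prediction vectors are made-up values.

```python
import math

def cross_entropy(target, predicted, eps=1e-12):
    """Cross entropy between a (possibly soft) target and a predicted distribution."""
    return -sum(t * math.log(q + eps) for t, q in zip(target, predicted))

def squared_l2(target, predicted):
    """Squared L2 distance between a target and a predicted distribution."""
    return sum((t - q) ** 2 for t, q in zip(target, predicted))

mix_label = [0.75, 0.25]   # soft label produced by MixUp
prediction = [0.5, 0.5]    # hypothetical model output
labeled_loss = cross_entropy(mix_label, prediction)
unlabeled_loss = squared_l2(mix_label, prediction)  # 0.125
```

Unlike cross entropy, the squared L2 loss is bounded, which makes it less sensitive to badly wrong pseudo-labels.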

Loss

X’ : labeled group
U’ : unlabeled group
L_x : cross entropy loss
L_u : squared L2 loss
λ_u : weight for L_u
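
Using the symbols above, the combined objective can be written out as follows. This is a reconstruction of the form given in the MixMatch paper as best I recall it, not copied from this post; C (the number of classes), θ (the model parameters), and p_model are symbols introduced here for the example.

```latex
\begin{aligned}
L_x &= \frac{1}{|X'|} \sum_{(x,\,p) \in X'} H\bigl(p,\ p_{\text{model}}(y \mid x; \theta)\bigr) \\
L_u &= \frac{1}{C\,|U'|} \sum_{(u,\,q) \in U'} \bigl\lVert q - p_{\text{model}}(y \mid u; \theta) \bigr\rVert_2^2 \\
L   &= L_x + \lambda_u L_u
\end{aligned}
```

Here H is the cross entropy, p is the mixed label of a labeled example, and q is the mixed pseudo-label of an unlabeled example.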

Results

With 250 labeled examples, MixMatch had by far the lowest error rate among the compared models.
With 250 labeled examples, its error rate was already close to that of the supervised model.

Every component of MixMatch contributes to its performance.
The difference was especially dramatic with 250 labels.

박나깨

I am a student interested in Deep Learning, Computer Vision, AI, and Image Processing.
