Level Playing Field for Million Scale Face Recognition 논문 리뷰 - 개발 여정 - A Journey to be a Developer

본 논문의 MegaFace2 Dataset을 만드는 과정에서 라벨링을 위한 클러스터링 작업 일부를 차용할 예정.

Clustering Algorithm

Face graph 초기화
N x N binary matrix 생성. N은 number of faces, (i,j)번째 원소는 face i 와 face j 가 동일인물(Same identity)인지 표시함(= linking i and j)
average pairwise euclidean distance D를 계산
i와 j 사이의 distance < βD 이고 연결되어있지 않다면, edge 추가
connected components C 얻기
모든 c_i에 대해서, connected components의 크기가 Z보다 작다면, identity가 되기에는 너무 작다고 판단하여 그래프에서 제거함.
남아있는 connected components를 클러스터로 저장함.

Cluster Optimization

Impure Cluster Detection

Garbage 클러스터의 average pairwise distance가 높음.
조금의 노이즈를 포함한 순수 클러스터의 average pairwise distance도 높음.

→ average pairwise distance is not resistant to outliers

Computing outliers
- Median Absolute Deviation(MAD)
-MAD(X) = Median(X’) ( X’ is the vector of absolute deviations from the median)

클러스터 C의 모든 원소 c_i에 대해서 d_i가 다음조건을 만족하면 impure하다고 판단한다.

Inner-Cluster Purification

클러스터가 impure하다고 판단된 경우, noise를 제거해야함.
특정 클러스터 c_k의 distance matrix d_k를 생성.
- d_k의 (i,j) 원소는 face i와 face j 의 특징벡터 사이의 l2 distance를 말함.
- row-wise summation으로 벡터 v를 생성함. 벡터 v는 얼마나 distance에 기여를 했는지 나타냄.
벡터 v_i가 다음 조건을 만족하면 outlier라고 판단하여 제거한다.

Inner-cluster purification 후에는 다시 Impure cluster detection을 하여 여전히 impure cluster가 존재하는지 확인함.

활용방법

impurity를 판단하는 기준을 역으로 활용하여, purity한 클러스터를 찾는 방법으로 바꾼다.
각 클러스터의 average pairwise distance를 구하고, 위의 조건식에서 α보다 작은 경우 pure cluster라고 간주.
pure cluster중 average pairwise distance가 작은 순으로 소팅

박나깨

저는 Deep Learning, Computer Vision, AI, Image Processing에 관심이 있는 학생입니다.

Recent post

Deep Double Descent 요약 Improved Training of WGAN(WGAN-GP) 논문 요약 Wasserstein GAN 논문 요약 LSGAN과 DCGAN