Deep face recognition has achieved remarkable improvements due to the introduction of margin-based softmax loss,
in which the prototype stored in the last linear layer represents the center of each class.
In these methods, training samples are constrained to lie close to their positive prototypes and far from negative
prototypes by a clear margin. However, we argue that prototype learning employs only sample-to-prototype comparisons
and ignores sample-to-sample comparisons during training. The resulting low loss value gives an illusion of a perfect
feature embedding and impedes further exploration by SGD. To this end, we propose Variational Prototype Learning (VPL),
which represents every class as a distribution instead of a point in the latent space. Having identified the slow feature
drift phenomenon, we directly inject memorized features into prototypes to approximate variational prototype sampling.
The proposed VPL can simulate sample-to-sample comparisons within the classification framework, encouraging the SGD solver
to be more exploratory while boosting performance. Moreover, VPL is conceptually simple, easy to implement,
computationally efficient, and memory-efficient. We present extensive experimental results on popular benchmarks,
which demonstrate the superiority of the proposed VPL method over state-of-the-art competitors.
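
As a rough illustration of the idea of injecting memorized features into prototypes, the following minimal sketch blends each class prototype with a stored feature of that class before computing a margin-based softmax loss. It is not the reference implementation. The convex blend, the mixing weight `lam`, the ArcFace-style additive angular margin, the default values of `scale` and `margin`, and all function names are assumptions made for illustration; the abstract does not specify them.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def variational_prototypes(prototypes, memory, lam=0.15):
    """Blend each prototype with its memorized feature, if one is stored.

    `memory[j]` is a feature of class j kept from an earlier iteration,
    or None when nothing is stored. `lam` is a hypothetical mixing weight.
    """
    mixed = []
    for w, m in zip(prototypes, memory):
        w = l2_normalize(w)
        if m is None:
            mixed.append(w)  # no memorized feature: keep the plain prototype
        else:
            m = l2_normalize(m)
            blend = [(1 - lam) * a + lam * b for a, b in zip(w, m)]
            mixed.append(l2_normalize(blend))
    return mixed

def margin_softmax_loss(feature, prototypes, label, scale=16.0, margin=0.5):
    """Additive angular margin softmax loss for one sample (assumed form)."""
    f = l2_normalize(feature)
    logits = []
    for j, w in enumerate(prototypes):
        w = l2_normalize(w)
        cos = max(-1.0, min(1.0, sum(a * b for a, b in zip(f, w))))
        if j == label:
            # The margin is applied only to the positive class.
            cos = math.cos(math.acos(cos) + margin)
        logits.append(scale * cos)
    top = max(logits)  # subtract the max for numerical stability
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum - logits[label]

# Toy example: class 1 holds a memorized feature that lies nearer to class 0.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
memory = [None, [1.0, 1.0]]
feature = [0.9, 0.1]

plain = margin_softmax_loss(feature, prototypes, label=0)
mixed = margin_softmax_loss(feature, variational_prototypes(prototypes, memory), label=0)
print(plain, mixed)
```

In this toy setting the blended negative prototype moves toward the memorized sample, so the loss on the same feature is larger than with the plain prototypes. This mirrors, in a simplified way, how a sample-to-sample comparison can enter a classification loss.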