Though tremendous strides have been made in uncontrolled face detection, accurate and efficient 2D face alignment and 3D face reconstruction in the wild remain open challenges. In this paper, we present a novel single-shot, multi-level face localisation method, named RetinaFace, which unifies face box prediction, 2D facial landmark localisation and 3D vertex regression under one
common target: point regression on the image plane. To
fill the data gap, we manually annotated five facial landmarks on the WIDER FACE dataset and employed a semi-automatic annotation pipeline to generate 3D vertices for
face images from the WIDER FACE, AFLW and FDDB
datasets. Based on these extra annotations, we propose a mutually beneficial regression target for 3D face reconstruction: predicting 3D vertices projected onto the image plane, constrained by a common 3D topology. The proposed
3D face reconstruction branch can be easily incorporated,
without any optimisation difficulty, in parallel with the existing box and 2D landmark regression branches during joint
training. Extensive experimental results show that RetinaFace can simultaneously achieve stable face detection,
accurate 2D face alignment and robust 3D face reconstruction while being efficient through single-shot inference.
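To make the unified point-regression target concrete, the following is a minimal, hypothetical PyTorch-style sketch (not the authors' implementation): one shared feature map per pyramid level feeds three parallel heads that all regress 2D coordinates on the image plane (box corners, five landmarks, and projected 3D mesh vertices), trained with a simple weighted multi-task loss. The channel sizes, vertex count, loss choice and weights are assumptions for illustration only.

```python
# Illustrative sketch of parallel point-regression branches; all names and
# hyper-parameters below are assumptions, not the paper's actual settings.
import torch
import torch.nn as nn


class MultiLevelPointHead(nn.Module):
    def __init__(self, in_channels: int = 256, num_anchors: int = 2,
                 num_vertices: int = 1000):
        super().__init__()
        # Each head predicts (x, y) offsets per anchor on the image plane:
        # 4 values for the box, 10 for the five landmarks, and
        # 2 * num_vertices for the projected 3D mesh vertices.
        self.box_head = nn.Conv2d(in_channels, num_anchors * 4, kernel_size=1)
        self.lmk_head = nn.Conv2d(in_channels, num_anchors * 10, kernel_size=1)
        self.mesh_head = nn.Conv2d(in_channels, num_anchors * 2 * num_vertices,
                                   kernel_size=1)

    def forward(self, feat: torch.Tensor):
        # feat: (B, C, H, W) feature map from one pyramid level.
        return self.box_head(feat), self.lmk_head(feat), self.mesh_head(feat)


def joint_loss(pred_box, pred_lmk, pred_mesh, gt_box, gt_lmk, gt_mesh,
               w_box: float = 1.0, w_lmk: float = 1.0, w_mesh: float = 1.0):
    """Joint multi-task loss: smooth-L1 per branch, summed with placeholder
    weights, so all three targets are optimised together."""
    l1 = nn.SmoothL1Loss()
    return (w_box * l1(pred_box, gt_box)
            + w_lmk * l1(pred_lmk, gt_lmk)
            + w_mesh * l1(pred_mesh, gt_mesh))
```

Because every branch outputs plain image-plane coordinates, the 3D reconstruction branch in this sketch is just another regression head added alongside the box and landmark heads, which is one way to read the paper's claim that it can be incorporated without optimisation difficulty during joint training.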