Stacked Dense U-Nets with Dual Transformers for Robust Face Alignment


Facial landmark localisation in images captured in-the-wild is an important and challenging problem. The current state of the art revolves around certain kinds of Deep Convolutional Neural Networks (DCNNs) such as stacked U-Nets and Hourglass networks. In this work, we propose stacked dense U-Nets for this task. We design a novel scale aggregation network topology and a channel aggregation building block to improve the model's capacity without increasing computational complexity or model size. With the assistance of deformable convolutions inside the stacked dense U-Nets and a coherent loss for outside data transformation, our model becomes spatially invariant to arbitrary input face images. Extensive experiments on many in-the-wild datasets validate the robustness of the proposed method under extreme poses, exaggerated expressions and heavy occlusions. Finally, we show that accurate 3D face alignment can assist pose-invariant face recognition, where we achieve a new state-of-the-art accuracy on CFP-FP.

Demo Image




title={Stacked dense u-nets with dual transformers for robust face alignment},
author={Guo, Jia and Deng, Jiankang and Xue, Niannan and Zafeiriou, Stefanos},
journal={arXiv preprint arXiv:1812.01936},

GitHub Implementation