BlendGAN: Implicitly GAN Blending for Arbitrary Stylized Face Generation
NeurIPS 2021
Mingcong Liu    Qiang Li    Zekui Qin    Guoxin Zhang    Pengfei Wan    Wen Zheng   
Y-tech, Kuaishou Technology
[Paper]
[Code]
[Bibtex]


Abstract

Generative Adversarial Networks (GANs) have made a dramatic leap in high-fidelity image synthesis and stylized face generation. Recently, a layer-swapping mechanism has been developed to improve the stylization performance. However, this method cannot fit arbitrary styles in a single model and requires hundreds of style-consistent training images for each style. To address these issues, we propose BlendGAN for arbitrary stylized face generation by leveraging a flexible blending strategy and a generic artistic dataset. Specifically, we first train a self-supervised style encoder on the generic artistic dataset to extract the representations of arbitrary styles. In addition, a weighted blending module (WBM) is proposed to blend face and style representations implicitly and control the arbitrary stylization effect. By doing so, BlendGAN can gracefully fit arbitrary styles in a unified model while avoiding case-by-case preparation of style-consistent training images. To support this, we also present AAHQ, a novel large-scale artistic face dataset. Extensive experiments demonstrate that BlendGAN outperforms state-of-the-art methods in terms of visual quality and style diversity for both latent-guided and reference-guided stylized face synthesis.


Overview


The style encoder Estyle extracts the style latent code zs from a reference style image, while the face latent code zf is randomly sampled from a standard Gaussian distribution. Two MLPs map the face and style latent codes into their respective W spaces; the resulting codes are then combined by the weighted blending module (WBM) and fed into the generator G to synthesize natural and stylized face images. Our method uses three discriminators: the face discriminator Dface distinguishes between real and fake natural-face images, the style discriminator Dstyle distinguishes between real and fake stylized-face images, and the style latent discriminator Dstyle_latent predicts whether the stylized-face image is consistent with the style latent code zs.
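The blending step above can be illustrated with a minimal PyTorch sketch. This is not the released BlendGAN code: the per-layer learnable blending weights, the layer count, and the stand-in MLPs below are all illustrative assumptions, and the StyleGAN-style generator is omitted entirely.

```python
import torch
import torch.nn as nn

class WeightedBlendingModule(nn.Module):
    """Hypothetical sketch of the WBM: mixes the face and style W-space
    codes layer by layer with learnable per-layer blending weights."""
    def __init__(self, num_layers: int):
        super().__init__()
        # one learnable blending logit per generator layer (assumption);
        # sigmoid(0) = 0.5, so blending starts as an even mix
        self.logits = nn.Parameter(torch.zeros(num_layers))

    def forward(self, w_face: torch.Tensor, w_style: torch.Tensor) -> torch.Tensor:
        # w_face, w_style: (batch, num_layers, w_dim)
        alpha = torch.sigmoid(self.logits).view(1, -1, 1)
        return alpha * w_face + (1.0 - alpha) * w_style

batch, num_layers, w_dim = 2, 18, 512  # 18 layers as in a 1024px StyleGAN2 (assumption)
z_f = torch.randn(batch, w_dim)        # face code sampled from N(0, I)
z_s = torch.randn(batch, w_dim)        # stands in for the style encoder output Estyle(x)
mlp_face = nn.Linear(w_dim, w_dim)     # stand-in for the face mapping network
mlp_style = nn.Linear(w_dim, w_dim)    # stand-in for the style mapping network

# broadcast each W code across all generator layers, then blend
w_f = mlp_face(z_f).unsqueeze(1).expand(-1, num_layers, -1)
w_s = mlp_style(z_s).unsqueeze(1).expand(-1, num_layers, -1)
w_blend = WeightedBlendingModule(num_layers)(w_f, w_s)  # fed to generator G
print(w_blend.shape)  # (batch, num_layers, w_dim), one code per generator layer
```

Because the logits are learnable, training can push early (coarse) layers toward the face code and later (fine) layers toward the style code, which is the intuition behind layer-swapping that the WBM makes continuous and differentiable.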

Paper

M. Liu, Q. Li, Z. Qin, G. Zhang,
P. Wan, W. Zheng.

BlendGAN: Implicitly GAN Blending for Arbitrary Stylized Face Generation.
NeurIPS, 2021.

[Paper] | [Bibtex]


Acknowledgements

We sincerely thank all the reviewers for their comments. We also thank Zhenyu Guo for help in preparing the comparison to StarGANv2.