Note
You are reading the documentation for MMEditing 0.x, which will be deprecated by the end of 2022. We recommend upgrading to MMEditing 1.0 to benefit from the new features and better performance brought by OpenMMLab 2.0. See the changelog, code, and documentation of MMEditing 1.0 for more details.
Generation Models
CycleGAN (ICCV’2017)
Abstract
Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs. However, for many tasks, paired training data will not be available. We present an approach for learning to translate an image from a source domain X to a target domain Y in the absence of paired examples. Our goal is to learn a mapping G:X→Y such that the distribution of images from G(X) is indistinguishable from the distribution Y using an adversarial loss. Because this mapping is highly under-constrained, we couple it with an inverse mapping F:Y→X and introduce a cycle consistency loss to push F(G(X))≈X (and vice versa). Qualitative results are presented on several tasks where paired training data does not exist, including collection style transfer, object transfiguration, season transfer, photo enhancement, etc. Quantitative comparisons against several prior methods demonstrate the superiority of our approach.
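The cycle consistency idea in the abstract, F(G(X))≈X and vice versa, can be illustrated with a minimal pure-Python sketch. It assumes an L1 distance and uses toy list-based mappings; the names `cycle_consistency_loss`, `G`, `F_exact`, and `F_biased` are hypothetical and are not part of MMEditing.

```python
def mean_l1(a, b):
    """Mean absolute difference between two equally long vectors."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_consistency_loss(G, F, xs, ys):
    """Penalize F(G(x)) drifting from x and G(F(y)) drifting from y."""
    forward = sum(mean_l1(F(G(x)), x) for x in xs) / len(xs)
    backward = sum(mean_l1(G(F(y)), y) for y in ys) / len(ys)
    return forward + backward


# Toy stand-ins for the two generators (assumptions for illustration only).
def G(x):
    """Maps domain X to domain Y by doubling every value."""
    return [2.0 * v for v in x]


def F_exact(y):
    """Exact inverse of G."""
    return [0.5 * v for v in y]


def F_biased(y):
    """Imperfect inverse of G: adds a constant offset."""
    return [0.5 * v + 0.1 for v in y]


xs = [[1.0, 2.0], [3.0, 4.0]]  # samples from domain X
ys = [[0.5, 1.5]]              # samples from domain Y

exact_loss = cycle_consistency_loss(G, F_exact, xs, ys)
biased_loss = cycle_consistency_loss(G, F_biased, xs, ys)
print("loss with exact inverse:", exact_loss)
print("loss with biased inverse:", biased_loss)
```

When `F` exactly inverts `G` the loss vanishes; any mismatch between the two mappings makes it positive, which is the signal that couples the two generators during training.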
Results and models
We use the Fréchet Inception Distance (FID, lower is better) and the Inception Score (IS, higher is better) to evaluate the generation performance of CycleGAN.
| Method | FID | IS | Download |
|---|---|---|---|
| official facades | 123.626 | 1.638 | - |
| ours facades | 118.297 | 1.584 | model \| log |
| official facades-id0 | 119.726 | 1.697 | - |
| ours facades-id0 | 126.316 | 1.957 | model \| log |
| official summer2winter | 77.342 | 2.762 | - |
| ours summer2winter | 76.959 | 2.768 | model \| log |
| official winter2summer | 72.631 | 3.293 | - |
| ours winter2summer | 72.803 | 3.069 | model \| log |
| official summer2winter-id0 | 76.773 | 2.750 | - |
| ours summer2winter-id0 | 76.018 | 2.735 | model \| log |
| official winter2summer-id0 | 74.239 | 3.110 | - |
| ours winter2summer-id0 | 73.498 | 3.130 | model \| log |
| official horse2zebra | 62.111 | 1.375 | - |
| ours horse2zebra | 63.810 | 1.430 | model \| log |
| official horse2zebra-id0 | 77.202 | 1.584 | - |
| ours horse2zebra-id0 | 71.675 | 1.542 | model \| log |
| official zebra2horse | 138.646 | 3.186 | - |
| ours zebra2horse | 139.279 | 3.093 | model \| log |
| official zebra2horse-id0 | 137.050 | 3.047 | - |
| ours zebra2horse-id0 | 132.369 | 2.958 | model \| log |
| official average | 95.935 | 2.444 | - |
| ours average | 95.102 | 2.427 | - |
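The two "average" rows are plain arithmetic means over the ten settings in the table. A short sketch to reproduce them, assuming the (FID, IS) values are copied from the rows above in table order; the helper name `column_means` is hypothetical:

```python
# (FID, IS) pairs copied from the table rows, in table order.
official = [
    (123.626, 1.638), (119.726, 1.697), (77.342, 2.762), (72.631, 3.293),
    (76.773, 2.750), (74.239, 3.110), (62.111, 1.375), (77.202, 1.584),
    (138.646, 3.186), (137.050, 3.047),
]
ours = [
    (118.297, 1.584), (126.316, 1.957), (76.959, 2.768), (72.803, 3.069),
    (76.018, 2.735), (73.498, 3.130), (63.810, 1.430), (71.675, 1.542),
    (139.279, 3.093), (132.369, 2.958),
]


def column_means(rows):
    """Return the per-column arithmetic mean, rounded to three decimals."""
    count = len(rows)
    return tuple(round(sum(column) / count, 3) for column in zip(*rows))


official_avg = column_means(official)
ours_avg = column_means(ours)
print("official average (FID, IS):", official_avg)
print("ours average (FID, IS):", ours_avg)
```

The results match the reported averages: 95.935 / 2.444 for the official models and 95.102 / 2.427 for ours.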
Note: With a larger identity loss weight, the image-to-image translation becomes more conservative and makes fewer changes to the input. The original authors did not specify the best weight for the identity loss. Thus, in addition to the default setting, we also set the weight of the identity loss to 0 (denoted id0) to make a more comprehensive comparison.
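To make the role of the identity loss weight concrete, here is a minimal pure-Python sketch. It assumes the identity term is an L1 penalty asking each generator to leave samples that are already in its target domain unchanged. The toy generators and the weight 0.5 are arbitrary illustrative choices, not the values used in the configs.

```python
def mean_l1(a, b):
    """Mean absolute difference between two equally long vectors."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def identity_loss(G, F, xs, ys, weight):
    """Weighted identity term: G should not alter Y samples, F should not alter X samples."""
    g_term = sum(mean_l1(G(y), y) for y in ys) / len(ys)
    f_term = sum(mean_l1(F(x), x) for x in xs) / len(xs)
    return weight * (g_term + f_term)


# Hypothetical generators that change every input by a constant.
def shift_up(v):
    return [p + 1.0 for p in v]


def shift_down(v):
    return [p - 1.0 for p in v]


xs = [[0.0, 1.0], [2.0, 3.0]]  # samples from domain X
ys = [[1.0, 2.0], [3.0, 4.0]]  # samples from domain Y

weighted_penalty = identity_loss(shift_up, shift_down, xs, ys, weight=0.5)
id0_penalty = identity_loss(shift_up, shift_down, xs, ys, weight=0.0)
print("penalty with weight 0.5:", weighted_penalty)
print("penalty with weight 0 (id0):", id0_penalty)
```

With a positive weight, generators that modify in-domain samples are penalized, which pushes them toward more conservative outputs. With weight 0 (the id0 setting) the term contributes nothing.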
Citation
@inproceedings{zhu2017unpaired,
title={Unpaired image-to-image translation using cycle-consistent adversarial networks},
author={Zhu, Jun-Yan and Park, Taesung and Isola, Phillip and Efros, Alexei A},
booktitle={Proceedings of the IEEE international conference on computer vision},
pages={2223--2232},
year={2017}
}
Pix2Pix (CVPR’2017)
Abstract
We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.
Results and models
We use FID and IS metrics to evaluate the generation performance of pix2pix.
| Method | FID | IS | Download |
|---|---|---|---|
| official facades | 119.135 | 1.650 | - |
| ours facades | 127.792 | 1.745 | model \| log |
| official maps-a2b | 149.731 | 2.529 | - |
| ours maps-a2b | 118.552 | 2.689 | model \| log |
| official maps-b2a | 102.072 | 3.552 | - |
| ours maps-b2a | 92.798 | 3.473 | model \| log |
| official edges2shoes | 75.774 | 2.766 | - |
| ours edges2shoes | 85.413 | 2.747 | model \| log |
| official average | 111.678 | 2.624 | - |
| ours average | 106.139 | 2.664 | - |
Note: We strictly follow the setting described in Section 3.3 of the paper:
“At inference time, we run the generator net in exactly the same manner as during the training phase. This differs from the usual protocol in that we apply dropout at test time, and we apply batch normalization using the statistics of the test batch, rather than aggregated statistics of the training batch.”
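That is, we keep the generator in model.train() mode at inference time, so dropout stays active and batch normalization uses the statistics of the test batch. As a result, inference results may differ slightly from run to run.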
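Why training mode makes inference non-deterministic can be shown with a toy pure-Python sketch. The functions `dropout`, `batch_norm`, and `toy_generator` are hypothetical stand-ins written for this illustration; they are not MMEditing or PyTorch code, and the `training` flag only mimics the effect of switching between train and eval mode.

```python
import random
from statistics import mean, pvariance


def dropout(values, p, training, rng):
    """Inverted dropout: randomly zero entries only when training is True."""
    if not training:
        return list(values)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def batch_norm(values, running_mean, running_var, training, eps=1e-5):
    """Normalize with current-batch statistics in training mode, stored ones otherwise."""
    if training:
        mu, var = mean(values), pvariance(values)
    else:
        mu, var = running_mean, running_var
    return [(v - mu) / (var + eps) ** 0.5 for v in values]


def toy_generator(batch, training, rng):
    """Batch norm followed by dropout, standing in for a generator layer."""
    normalized = batch_norm(batch, running_mean=0.0, running_var=1.0, training=training)
    return dropout(normalized, p=0.5, training=training, rng=rng)


rng = random.Random(0)
batch = [0.2, 1.4, -0.7, 3.1, 0.9, -1.3, 2.2, 0.5]

# Training mode: a fresh dropout mask is sampled on every call.
train_a = toy_generator(batch, True, rng)
train_b = toy_generator(batch, True, rng)

# Evaluation mode: no randomness, stored statistics are used.
eval_a = toy_generator(batch, False, rng)
eval_b = toy_generator(batch, False, rng)

print("train-mode outputs identical:", train_a == train_b)
print("eval-mode outputs identical:", eval_a == eval_b)
```

In evaluation mode repeated calls return the same output, while in training mode each call samples a new dropout mask and normalizes with the statistics of the batch it is given.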
Citation
@inproceedings{isola2017image,
title={Image-to-image translation with conditional adversarial networks},
author={Isola, Phillip and Zhu, Jun-Yan and Zhou, Tinghui and Efros, Alexei A},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={1125--1134},
year={2017}
}