image2image¶

Summary¶

Number of checkpoints: 12
Number of configs: 13
Number of papers: 3
- ALGORITHM: 3

Disco Diffusion (2022)¶

Disco Diffusion

Task: Text2Image, Image2Image

Abstract¶

Disco Diffusion (DD) is a Google Colab Notebook which leverages an AI Image generating technique called CLIP-Guided Diffusion to allow you to create compelling and beautiful images from text inputs.

Created by Somnai, augmented by Gandamu, and building on the work of RiversHaveWings, nshepperd, and many others. See more details in Credits.
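In CLIP-guided diffusion, each denoising step is steered by a loss that compares the CLIP embedding of the current image with the embedding of the text prompt. The sketch below is a pure-Python port of the spherical distance loss commonly used in CLIP-guided diffusion notebooks. It is only an illustration: the three-dimensional vectors are hypothetical stand-ins for real CLIP features, and this is not the code path used by the model in this repository.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def spherical_dist_loss(x, y):
    """Half the squared angle between two embeddings (0 when they align)."""
    x, y = normalize(x), normalize(y)
    chord = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    half_chord = min(1.0, chord / 2)  # clamp against rounding error
    return 2 * math.asin(half_chord) ** 2


# Hypothetical embeddings standing in for CLIP image / text features.
image_embedding = [0.9, 0.1, 0.0]
text_embedding = [1.0, 0.0, 0.0]
print(spherical_dist_loss(image_embedding, text_embedding))
```

A smaller value means the image embedding points in nearly the same direction as the prompt embedding, which is what the guidance gradient pushes towards.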

Results and models¶

We have converted several U-Net weights and provide the matching configs. See the Tutorial for details on the different U-Nets.

Model	Dataset	Download
512x512_diffusion_uncond_finetune_008100	ImageNet	model
256x256_diffusion_uncond	ImageNet	model
portrait_generator_v001	unknown	model

Model	Download
pixelartdiffusion_expanded	Coming soon!
pixel_art_diffusion_hard_256	Coming soon!
pixel_art_diffusion_soft_256	Coming soon!
pixelartdiffusion4k	Coming soon!
watercolordiffusion_2	Coming soon!
watercolordiffusion	Coming soon!
PulpSciFiDiffusion	Coming soon!

To-do List¶

[x] Text2Image
[x] Image2Image
[x] Imagenet, portrait diffusion models
[ ] pixel art, watercolor, and sci-fi diffusion models
[ ] image prompt
[ ] video generation
[ ] faster samplers (PLMS, DPM-Solver, etc.)

We welcome community contributions to these items and to any other interesting features!

Quick Start¶

Run the following code to generate an image from a text prompt.

from mmengine import Config, MODELS
from mmengine.registry import init_default_scope
from torchvision.utils import save_image

# Make 'mmagic' the default registry scope so its modules can be resolved.
init_default_scope('mmagic')

# Build the model from its config, move it to the GPU, and switch to eval mode.
disco = MODELS.build(
    Config.fromfile('configs/disco_diffusion/disco-baseline.py').model).cuda().eval()
# Both strings under the same key guide the same image.
text_prompts = {
    0: [
        "A beautiful painting of a singular lighthouse, shining its light across a tumultuous sea of blood by greg rutkowski and thomas kinkade, Trending on artstation.",
        "yellow color scheme"
    ]
}
# Sample a 1280x768 image and take the generated samples from the output dict.
image = disco.infer(
    height=768,
    width=1280,
    text_prompts=text_prompts,
    show_progress=True,
    num_inference_steps=250,
    eta=0.8)['samples']
# Write the result to disk.
save_image(image, "image.png")

Tutorials¶

Disco Diffusion exposes many adjustable parameters, so we provide a Jupyter notebook / Colab tutorial that explains what each parameter means and shows how the results change as it is adjusted. Refer to Disco Sheet.

Credits¶

Since our adaptation of Disco Diffusion is heavily influenced by the Disco Colab, we reproduce its credits below.

Credits

Original notebook by Katherine Crowson (https://github.com/crowsonkb, https://twitter.com/RiversHaveWings). It uses either OpenAI's 256x256 unconditional ImageNet or Katherine Crowson's fine-tuned 512x512 diffusion model (https://github.com/openai/guided-diffusion), together with CLIP (https://github.com/openai/CLIP) to connect text prompts with images.

Modified by Daniel Russell (https://github.com/russelldc, https://twitter.com/danielrussruss) to include (hopefully) optimal params for quick generations in 15-100 timesteps rather than 1000, as well as more robust augmentations.

Further improvements from Dango233 and nshepperd helped improve the quality of diffusion in general, and especially so for shorter runs like this notebook aims to achieve.

Vark added code to load multiple CLIP models at once; all prompts are evaluated against each of them, which may greatly improve accuracy.

The latest zoom, pan, rotation, and keyframes features were taken from Chigozie Nri’s VQGAN Zoom Notebook (https://github.com/chigozienri, https://twitter.com/chigozienri).

The advanced DangoCutn cutout method is also from Dango233.

–

Disco:

Somnai (https://twitter.com/Somnai_dreams) added Diffusion Animation techniques, QoL improvements and various implementations of tech and techniques, mostly listed in the changelog below.

3D animation implementation added by Adam Letts (https://twitter.com/gandamu_ml) in collaboration with Somnai. Creation of disco.py and ongoing maintenance.

Turbo feature by Chris Allen (https://twitter.com/zippy731)

Improvements to ability to run on local systems, Windows support, and dependency installation by HostsServer (https://twitter.com/HostsServer)

VR Mode by Tom Mason (https://twitter.com/nin_artificial)

Horizontal and Vertical symmetry functionality by nshepperd. Symmetry transformation_steps by huemin (https://twitter.com/huemin_art). Symmetry integration into Disco Diffusion by Dmitrii Tochilkin (https://twitter.com/cut_pow).

Warp and custom model support by Alex Spirin (https://twitter.com/devdef).

Pixel Art Diffusion, Watercolor Diffusion, and Pulp SciFi Diffusion models from KaliYuga (https://twitter.com/KaliYuga_ai). Follow KaliYuga’s Twitter for the latest models and for notebooks with specialized settings.

Integration of OpenCLIP models and initiation of integration of KaliYuga models by Palmweaver / Chris Scalf (https://twitter.com/ChrisScalf11)

Integrated portrait_generator_v001 from Felipe3DArtist (https://twitter.com/Felipe3DArtist)

Citation¶

@misc{github,
  author={alembics},
  title={disco-diffusion},
  year={2022},
  url={https://github.com/alembics/disco-diffusion},
}

CycleGAN (ICCV’2017)¶

CycleGAN: Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks

Task: Image2Image

Abstract¶

Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs. However, for many tasks, paired training data will not be available. We present an approach for learning to translate an image from a source domain $X$ to a target domain $Y$ in the absence of paired examples. Our goal is to learn a mapping $G: X \rightarrow Y$ such that the distribution of images from $G(X)$ is indistinguishable from the distribution $Y$ using an adversarial loss. Because this mapping is highly under-constrained, we couple it with an inverse mapping $F: Y \rightarrow X$ and introduce a cycle consistency loss to push $F(G(X)) \approx X$ (and vice versa). Qualitative results are presented on several tasks where paired training data does not exist, including collection style transfer, object transfiguration, season transfer, photo enhancement, etc. Quantitative comparisons against several prior methods demonstrate the superiority of our approach.
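For reference, the cycle consistency loss mentioned in the abstract is written in the CycleGAN paper as an L1 reconstruction penalty in both directions. The discriminators $D_X$ and $D_Y$ and the weight $\lambda$ in the full objective below come from the paper, not from the abstract above.

```latex
\mathcal{L}_{\text{cyc}}(G, F) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\lVert F(G(x)) - x \rVert_1\big]
  + \mathbb{E}_{y \sim p_{\text{data}}(y)}\big[\lVert G(F(y)) - y \rVert_1\big]

\mathcal{L}(G, F, D_X, D_Y) =
  \mathcal{L}_{\text{GAN}}(G, D_Y, X, Y)
  + \mathcal{L}_{\text{GAN}}(F, D_X, Y, X)
  + \lambda \, \mathcal{L}_{\text{cyc}}(G, F)
```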

Results and Models¶

Results from CycleGAN trained by mmagic

We use FID and IS metrics to evaluate the generation performance of CycleGAN.¹

https://download.openmmlab.com/mmediting/cyclegan/refactor/cyclegan_lsgan_resnet_in_1x1_80k_facades_20210902_165905-5e2c0876.pth
https://download.openmmlab.com/mmediting/cyclegan/refactor/cyclegan_in_1x1_80k_facades_20210902_165905-5e2c0876.pth

Model	Dataset	FID	IS	Download
Ours	facades	124.8033	1.792	model | log²
Ours	facades-id0	125.1694	1.905	model
Ours	summer2winter	83.7177	2.771	model
Ours	summer2winter-id0	83.1418	2.720	model
Ours	winter2summer	72.8025	3.129	model
Ours	winter2summer-id0	73.5001	3.107	model
Ours	horse2zebra	64.5225	1.418	model
Ours	horse2zebra-id0	74.7770	1.542	model
Ours	zebra2horse	141.1517	3.154	model
Ours	zebra2horse-id0	134.3728	3.091	model

FID comparison with official:

Dataset	facades	facades-id0	summer2winter	summer2winter-id0	winter2summer	winter2summer-id0	horse2zebra	horse2zebra-id0	zebra2horse	zebra2horse-id0	average
official	123.626	119.726	77.342	76.773	72.631	74.239	62.111	77.202	138.646	137.050	95.935
ours	124.8033	125.1694	83.7177	83.1418	72.8025	73.5001	64.5225	74.7770	141.1571	134.3728	97.79

IS comparison with official:

Dataset	facades	facades-id0	summer2winter	summer2winter-id0	winter2summer	winter2summer-id0	horse2zebra	horse2zebra-id0	zebra2horse	zebra2horse-id0	average
official	1.638	1.697	2.762	2.750	3.293	3.110	1.375	1.584	3.186	3.047	2.444
ours	1.792	1.905	2.771	2.720	3.129	3.107	1.418	1.542	3.154	3.091	2.462
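The "average" columns of the two comparison tables above can be re-derived with a few lines of Python. The values are copied from the comparison rows; the variable names are only illustrative. The computed means should agree with the reported averages to within about 0.01, the remaining gap being down to rounding.

```python
from statistics import mean

# Per-dataset scores copied from the two comparison tables, in column order.
fid = {
    "official": [123.626, 119.726, 77.342, 76.773, 72.631,
                 74.239, 62.111, 77.202, 138.646, 137.050],
    "ours": [124.8033, 125.1694, 83.7177, 83.1418, 72.8025,
             73.5001, 64.5225, 74.7770, 141.1571, 134.3728],
}
inception_score = {
    "official": [1.638, 1.697, 2.762, 2.750, 3.293,
                 3.110, 1.375, 1.584, 3.186, 3.047],
    "ours": [1.792, 1.905, 2.771, 2.720, 3.129,
             3.107, 1.418, 1.542, 3.154, 3.091],
}

for name in ("official", "ours"):
    print(f"{name}: mean FID = {mean(fid[name]):.3f}, "
          f"mean IS = {mean(inception_score[name]):.3f}")
```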

Note:

1. With a larger identity loss, the image-to-image translation becomes more conservative and makes fewer changes. The original authors did not specify the best weight for the identity loss. Thus, in addition to the default setting, we also set the identity loss weight to 0 (denoted id0) for a more comprehensive comparison.
2. This is the training log before refactoring. Updated logs will be released soon.
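The effect of the identity loss weight can be illustrated with a toy sketch. Here the identity term penalizes a generator for changing an image that already belongs to its target domain, as described in the CycleGAN paper. The "images" are short lists of floats, the generators are simple brightness shifts, and the weights 10.0 and 5.0 are hypothetical values chosen for illustration, not the settings used in the configs.

```python
def l1(a, b):
    """Mean absolute difference between two equally sized 'images'."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_and_identity_loss(G, F, x, y, cycle_weight, identity_weight):
    """Non-adversarial part of a CycleGAN-style generator objective."""
    cycle = l1(F(G(x)), x) + l1(G(F(y)), y)
    identity = l1(G(y), y) + l1(F(x), x)
    return cycle_weight * cycle + identity_weight * identity


# Toy stand-ins for the two generators: G brightens, F darkens.
G = lambda img: [p + 0.2 for p in img]
F = lambda img: [p - 0.2 for p in img]

x = [0.1, 0.5, 0.9]  # sample from domain X
y = [0.3, 0.7, 1.1]  # sample from domain Y

default = cycle_and_identity_loss(G, F, x, y, cycle_weight=10.0, identity_weight=5.0)
id0 = cycle_and_identity_loss(G, F, x, y, cycle_weight=10.0, identity_weight=0.0)
print(default, id0)
```

With the identity weight set to 0, these generators are penalized only for failing to reconstruct the input after a round trip, so they are free to alter images more aggressively.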

Citation¶

@inproceedings{zhu2017unpaired,
  title={Unpaired image-to-image translation using cycle-consistent adversarial networks},
  author={Zhu, Jun-Yan and Park, Taesung and Isola, Phillip and Efros, Alexei A},
  booktitle={Proceedings of the IEEE international conference on computer vision},
  pages={2223--2232},
  year={2017},
  url={https://openaccess.thecvf.com/content_iccv_2017/html/Zhu_Unpaired_Image-To-Image_Translation_ICCV_2017_paper.html},
}

Pix2Pix (CVPR’2017)¶

Pix2Pix: Image-to-Image Translation with Conditional Adversarial Networks

Task: Image2Image

Abstract¶

We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Moreover, since the release of the pix2pix software associated with this paper, hundreds of Twitter users have posted their own artistic experiments using our system. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.

Results and Models¶

Results from Pix2Pix trained by mmagic

We use FID and IS metrics to evaluate the generation performance of Pix2Pix.¹

Model	Dataset	FID	IS	Download
Ours	facades	124.9773	1.620	model | log²
Ours	aerial2maps	122.5856	3.137	model
Ours	maps2aerial	88.4635	3.310	model
Ours	edges2shoes	84.3750	2.815	model

FID comparison with official:

Dataset	facades	aerial2maps	maps2aerial	edges2shoes	average
official	119.135	149.731	102.072	75.774	111.678
ours	124.9773	122.5856	88.4635	84.3750	105.1003

IS comparison with official:

Dataset	facades	aerial2maps	maps2aerial	edges2shoes	average
official	1.650	2.529	3.552	2.766	2.624
ours	1.620	3.137	3.310	2.815	2.7205
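When reading the two comparison tables above, keep in mind that a lower FID is better while a higher IS is better. The short script below, using values copied from the tables and illustrative variable names, lists the per-dataset differences and the datasets on which our models come out ahead on each metric.

```python
datasets = ["facades", "aerial2maps", "maps2aerial", "edges2shoes"]
fid_official = [119.135, 149.731, 102.072, 75.774]
fid_ours = [124.9773, 122.5856, 88.4635, 84.3750]
is_official = [1.650, 2.529, 3.552, 2.766]
is_ours = [1.620, 3.137, 3.310, 2.815]

better_fid, better_is = [], []
for name, fo, fu, io, iu in zip(datasets, fid_official, fid_ours,
                                is_official, is_ours):
    fid_delta = fu - fo  # negative means ours has the lower (better) FID
    is_delta = iu - io   # positive means ours has the higher (better) IS
    if fid_delta < 0:
        better_fid.append(name)
    if is_delta > 0:
        better_is.append(name)
    print(f"{name:12s} FID {fid_delta:+8.3f}  IS {is_delta:+6.3f}")

print("ours better on FID:", better_fid)
print("ours better on IS: ", better_is)
```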

Note:

1. We strictly follow the setting in Section 3.3 of the paper: “At inference time, we run the generator net in exactly the same manner as during the training phase. This differs from the usual protocol in that we apply dropout at test time, and we apply batch normalization using the statistics of the test batch, rather than aggregated statistics of the training batch.” That is, we run inference in model.train() mode, so the results may differ slightly from run to run.
2. This is the training log before refactoring. Updated logs will be released soon.
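The run-to-run variation mentioned in the first note comes from layers such as dropout staying stochastic in training mode. The pure-Python sketch below shows only that effect. It is a toy dropout function with hypothetical names, not the framework implementation, and it leaves out the batch normalization part of the quoted protocol.

```python
import random


def dropout(values, p, training, rng):
    """Inverted dropout: random in training mode, identity in eval mode."""
    if not training:
        return list(values)
    # Zero each value with probability p and rescale the survivors.
    return [0.0 if rng.random() < p else v / (1 - p) for v in values]


rng = random.Random(0)
features = [1.0] * 8

# Eval mode: repeated calls return the same output.
eval_a = dropout(features, 0.5, training=False, rng=rng)
eval_b = dropout(features, 0.5, training=False, rng=rng)

# Training mode: each call draws a fresh random mask.
train_a = dropout(features, 0.5, training=True, rng=rng)
train_b = dropout(features, 0.5, training=True, rng=rng)

print(eval_a == eval_b)
print(train_a)
print(train_b)
```

Because each call in training mode draws a new mask, two forward passes over the same input generally produce different activations, and therefore slightly different generated images.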

Citation¶

@inproceedings{isola2017image,
  title={Image-to-image translation with conditional adversarial networks},
  author={Isola, Phillip and Zhu, Jun-Yan and Zhou, Tinghui and Efros, Alexei A},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={1125--1134},
  year={2017},
  url={https://openaccess.thecvf.com/content_cvpr_2017/html/Isola_Image-To-Image_Translation_With_CVPR_2017_paper.html},
}