AI with ComfyUI

#AI #ComfyUI
26 May 2024

In this session, I'll share how to use AI models trained for custom photo creation with the ComfyUI software. ComfyUI is the most powerful and modular Stable Diffusion GUI and backend.

This UI lets us design and execute advanced Stable Diffusion pipelines using a graph/nodes/flowchart-based interface. Some of the things we can do with ComfyUI:

  • Text-to-image generation
  • Face swapping
  • Image-to-image generation
  • Video and animation creation
    How to install ComfyUI

    Clone the ComfyUI repository and follow the installation steps for your operating system in the repo's documentation.

    After cloning the repo, we need to install all of its dependencies:

        pip install -r requirements.txt

    Note: use Python > 3.9.

    Then run the app:

        pyenv global 3.10.13
        python main.py

    The app will run at http://127.0.0.1:8188.
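
    As a quick sanity check, we can query the server from a short Python script. This is only a sketch; it assumes the default address above and the /system_stats endpoint exposed by the ComfyUI server:

        import requests  # third-party: pip install requests

        COMFYUI_URL = "http://127.0.0.1:8188"  # default ComfyUI address

        # Ask the server for its system/device report; raises if it is not reachable.
        resp = requests.get(f"{COMFYUI_URL}/system_stats", timeout=5)
        resp.raise_for_status()
        print(resp.json())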

    Components of a Stable Diffusion Model

    Stable Diffusion isn't just one large, single model. Instead, it's made up of various components and models that collaborate to generate images from text.

    Model files are large .ckpt or .safetensors files obtained from repositories such as HuggingFace or CivitAI. These files contain the weights for three different models:

  • CLIP - a model that converts the text prompt into a compressed format the UNET model can understand
  • MODEL - the main Stable Diffusion model, also known as the UNET, which generates a compressed (latent) image
  • VAE - decodes the compressed image into a normal-looking image
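
    To see that a single checkpoint file really bundles these three models, we can list the tensor names it contains. This is a rough sketch, assuming an SD 1.x-style .safetensors checkpoint and the usual key prefixes (model.diffusion_model for the UNET, cond_stage_model for CLIP, first_stage_model for the VAE):

        from collections import Counter
        from safetensors import safe_open  # pip install safetensors

        # Example path: any checkpoint downloaded from HuggingFace or CivitAI.
        CKPT_PATH = "models/checkpoints/v1-5-pruned-emaonly.safetensors"

        with safe_open(CKPT_PATH, framework="pt", device="cpu") as f:
            prefixes = Counter(key.split(".")[0] for key in f.keys())

        # Typically shows groups such as model (the UNET lives under
        # model.diffusion_model), cond_stage_model (CLIP), and first_stage_model (VAE).
        print(prefixes)
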
    1. CheckpointLoader

    In the default ComfyUI workflow, the CheckpointLoader serves as a representation of the model files. It allows users to select a checkpoint to load and displays three different outputs: MODEL, CLIP, and VAE.

    We need to download models from https://huggingface.co/ and place them inside the models/checkpoints path in the repository. Then we can load the model with the Load Checkpoint node.
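
    In ComfyUI's API-format workflow JSON (the format used when submitting a prompt to the server over HTTP), this node can be sketched roughly as below; the node id "4" and the checkpoint filename are just example values:

        checkpoint_loader = {
            "4": {
                "class_type": "CheckpointLoaderSimple",
                "inputs": {"ckpt_name": "v1-5-pruned-emaonly.safetensors"},
            }
        }
        # Its outputs are referenced by index elsewhere in the workflow:
        # ["4", 0] = MODEL, ["4", 1] = CLIP, ["4", 2] = VAE.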

    2. CLIPTextEncode

    The CLIP model is connected to CLIPTextEncode nodes. CLIP, acting as a text encoder, converts text to a format understandable by the main MODEL.
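
    Continuing the sketch, the positive and negative prompts become two CLIPTextEncode nodes, each wired to the CLIP output of the checkpoint loader (output index 1 of node "4"); the node ids and prompt text are example values:

        text_encoders = {
            # Positive prompt
            "6": {
                "class_type": "CLIPTextEncode",
                "inputs": {"text": "a photo of a cat, high quality", "clip": ["4", 1]},
            },
            # Negative prompt
            "7": {
                "class_type": "CLIPTextEncode",
                "inputs": {"text": "blurry, low quality", "clip": ["4", 1]},
            },
        }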

    3. KSampler and Latent Image

    In Stable Diffusion, image generation involves a sampler, represented by the sampler node in ComfyUI. The sampler takes the main Stable Diffusion MODEL, positive and negative prompts encoded by CLIP, and a Latent Image as inputs. The Latent Image is an empty image since we are generating an image from text (txt2img).

    The sampler adds noise to the input latent image and denoises it using the main MODEL. Gradual denoising, guided by encoded prompts, is the process through which Stable Diffusion generates images.
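
    In the same API-format sketch, the empty latent image and the sampler could look like this (node ids, resolution, and sampler settings are example values):

        sampler_nodes = {
            "5": {
                "class_type": "EmptyLatentImage",
                "inputs": {"width": 512, "height": 512, "batch_size": 1},
            },
            "3": {
                "class_type": "KSampler",
                "inputs": {
                    "model": ["4", 0],         # MODEL from the checkpoint loader
                    "positive": ["6", 0],      # encoded positive prompt
                    "negative": ["7", 0],      # encoded negative prompt
                    "latent_image": ["5", 0],  # empty latent image to denoise
                    "seed": 42,
                    "steps": 20,
                    "cfg": 7.0,
                    "sampler_name": "euler",
                    "scheduler": "normal",
                    "denoise": 1.0,
                },
            },
        }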

    4. VAE and SaveImage Node

    The third model used in Stable Diffusion is the VAE, responsible for translating an image from latent space to pixel space. Latent space is the format understood by the main MODEL, while pixel space is the format recognizable by image viewers.

    The VAEDecode node takes the latent image from the sampler as input and outputs a regular image. This image is then saved to a PNG file using the SaveImage node. The saved images are stored in the output directory of the repository.
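
    Putting the pieces together, the last two nodes and a minimal submission to the server could look like the sketch below. It assumes the node ids from the earlier snippets and the /prompt endpoint of a locally running ComfyUI:

        import requests

        workflow = {**checkpoint_loader, **text_encoders, **sampler_nodes}
        workflow.update({
            "8": {
                "class_type": "VAEDecode",
                "inputs": {"samples": ["3", 0], "vae": ["4", 2]},
            },
            "9": {
                "class_type": "SaveImage",
                "inputs": {"images": ["8", 0], "filename_prefix": "comfyui_demo"},
            },
        })

        # Queue the workflow; the generated PNG ends up under output/ in the repo.
        resp = requests.post("http://127.0.0.1:8188/prompt", json={"prompt": workflow})
        resp.raise_for_status()
        print(resp.json())  # includes a prompt_id for tracking the job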

    I have shown several common nodes that we use to make images. If we install ComfyUI-Manager, we can also add new nodes or install custom nodes. ComfyUI-Manager is an extension designed to enhance the usability of ComfyUI. It offers management functions to install, remove, disable, and enable the various custom nodes of ComfyUI. Furthermore, this extension provides a hub feature and convenience functions to access a wide range of information within ComfyUI.
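
    A common way to install the manager is to clone the extension into the custom_nodes folder and restart ComfyUI. Here is a small sketch, assuming the ltdrdata/ComfyUI-Manager repository and that the script runs from the ComfyUI repo root:

        import subprocess
        from pathlib import Path

        custom_nodes = Path("custom_nodes")  # folder inside the ComfyUI repository

        # Clone the manager extension, then restart ComfyUI to load it.
        subprocess.run(
            ["git", "clone", "https://github.com/ltdrdata/ComfyUI-Manager.git"],
            cwd=custom_nodes,
            check=True,
        )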

    After installing it, we can see the manager UI inside ComfyUI.

    Reference documents:

  • Workflow: https://comfyworkflows.com
  • Models: https://huggingface.co/
  • Prompt: https://www.seaart.ai/