AI with ComfyUI

#AI #ComfyUI
26 May 2024

In this session, I'll share how to use AI models trained for custom photo creation with the ComfyUI software. ComfyUI is the most powerful and modular Stable Diffusion GUI and backend.

This UI lets us design and execute advanced Stable Diffusion pipelines using a graph/nodes/flowchart-based interface. Some of the things we can do with ComfyUI:

  • Text-to-image generation
  • Face swapping
  • Image-to-image generation
  • Video and animation creation
    How to install ComfyUI

    Clone the ComfyUI repository and follow the installation steps for your operating system in the repo's documentation.

    After cloning the repo, we need to install all of its dependencies:

        pip install -r requirements.txt

    Note: use Python > 3.9.

    Then run the app:

        pyenv global 3.10.13
        python main.py

    The app will run at http://127.0.0.1:8188.
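
    As a quick sanity check, we can query the server from a short Python script. This is only a sketch; it assumes the default address above and the /system_stats endpoint exposed by the ComfyUI server:

        import requests  # third-party: pip install requests

        COMFYUI_URL = "http://127.0.0.1:8188"  # default ComfyUI address

        # Ask the server for its system/device report; raises if it is not reachable.
        resp = requests.get(f"{COMFYUI_URL}/system_stats", timeout=5)
        resp.raise_for_status()
        print(resp.json())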

    Components of a Stable Diffusion Model

    Stable Diffusion isn't just one large, single model. Instead, it's made up of various components and models that collaborate to generate images from text.

    Model files are large .ckpt or .safetensors files obtained from repositories such as HuggingFace or CivitAI. These files contain the weights for three different models:

  • CLIP - a model that converts the text prompt into a compressed format the UNET model can understand
  • MODEL - the main Stable Diffusion model, also known as the UNET, which generates a compressed (latent) image
  • VAE - decodes the compressed image into a normal-looking image
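
    To see that a single checkpoint file really bundles these three models, we can list the tensor names it contains. This is a rough sketch, assuming an SD 1.x-style .safetensors checkpoint and the usual key prefixes (model.diffusion_model for the UNET, cond_stage_model for CLIP, first_stage_model for the VAE):

        from collections import Counter
        from safetensors import safe_open  # pip install safetensors

        # Example path: any checkpoint downloaded from HuggingFace or CivitAI.
        CKPT_PATH = "models/checkpoints/v1-5-pruned-emaonly.safetensors"

        with safe_open(CKPT_PATH, framework="pt", device="cpu") as f:
            prefixes = Counter(key.split(".")[0] for key in f.keys())

        # Typically shows groups such as model (the UNET lives under
        # model.diffusion_model), cond_stage_model (CLIP), and first_stage_model (VAE).
        print(prefixes)
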
    1. CheckpointLoader

    In the default ComfyUI workflow, the CheckpointLoader serves as a representation of the model files. It allows users to select a checkpoint to load and displays three different outputs: MODEL, CLIP, and VAE.

    We need to download models from https://huggingface.co/ and place them inside the models/checkpoints path in the repository. Then we can load the model with the Load Checkpoint node.
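
    In ComfyUI's API-format workflow JSON (the format used when submitting a prompt to the server over HTTP), this node can be sketched roughly as below; the node id "4" and the checkpoint filename are just example values:

        checkpoint_loader = {
            "4": {
                "class_type": "CheckpointLoaderSimple",
                "inputs": {"ckpt_name": "v1-5-pruned-emaonly.safetensors"},
            }
        }
        # Its outputs are referenced by index elsewhere in the workflow:
        # ["4", 0] = MODEL, ["4", 1] = CLIP, ["4", 2] = VAE.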

    2. CLIPTextEncode

    The CLIP model is connected to CLIPTextEncode nodes. CLIP, acting as a text encoder, converts text to a format understandable by the main MODEL.
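
    Continuing the sketch, the positive and negative prompts become two CLIPTextEncode nodes, each wired to the CLIP output of the checkpoint loader (output index 1 of node "4"); the node ids and prompt text are example values:

        text_encoders = {
            # Positive prompt
            "6": {
                "class_type": "CLIPTextEncode",
                "inputs": {"text": "a photo of a cat, high quality", "clip": ["4", 1]},
            },
            # Negative prompt
            "7": {
                "class_type": "CLIPTextEncode",
                "inputs": {"text": "blurry, low quality", "clip": ["4", 1]},
            },
        }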

    3. KSampler and Latent Image

    In Stable Diffusion, image generation involves a sampler, represented by the sampler node in ComfyUI. The sampler takes the main Stable Diffusion MODEL, positive and negative prompts encoded by CLIP, and a Latent Image as inputs. The Latent Image is an empty image since we are generating an image from text (txt2img).

    The sampler adds noise to the input latent image and denoises it using the main MODEL. Gradual denoising, guided by encoded prompts, is the process through which Stable Diffusion generates images.
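
    In the same API-format sketch, the empty latent image and the sampler could look like this (node ids, resolution, and sampler settings are example values):

        sampler_nodes = {
            "5": {
                "class_type": "EmptyLatentImage",
                "inputs": {"width": 512, "height": 512, "batch_size": 1},
            },
            "3": {
                "class_type": "KSampler",
                "inputs": {
                    "model": ["4", 0],         # MODEL from the checkpoint loader
                    "positive": ["6", 0],      # encoded positive prompt
                    "negative": ["7", 0],      # encoded negative prompt
                    "latent_image": ["5", 0],  # empty latent image to denoise
                    "seed": 42,
                    "steps": 20,
                    "cfg": 7.0,
                    "sampler_name": "euler",
                    "scheduler": "normal",
                    "denoise": 1.0,
                },
            },
        }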

    4. VAE and SaveImage Node

    The third model used in Stable Diffusion is the VAE, responsible for translating an image from latent space to pixel space. Latent space is the format understood by the main MODEL, while pixel space is the format recognizable by image viewers.

    The VAEDecode node takes the latent image from the sampler as input and outputs a regular image. This image is then saved to a PNG file using the SaveImage node. The saved images are stored in the output directory of the repository.
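
    Putting the pieces together, the last two nodes and a minimal submission to the server could look like the sketch below. It assumes the node ids from the earlier snippets and the /prompt endpoint of a locally running ComfyUI:

        import requests

        workflow = {**checkpoint_loader, **text_encoders, **sampler_nodes}
        workflow.update({
            "8": {
                "class_type": "VAEDecode",
                "inputs": {"samples": ["3", 0], "vae": ["4", 2]},
            },
            "9": {
                "class_type": "SaveImage",
                "inputs": {"images": ["8", 0], "filename_prefix": "comfyui_demo"},
            },
        })

        # Queue the workflow; the generated PNG ends up under output/ in the repo.
        resp = requests.post("http://127.0.0.1:8188/prompt", json={"prompt": workflow})
        resp.raise_for_status()
        print(resp.json())  # includes a prompt_id for tracking the job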

    I have shown several common nodes that we use to make images. If we install ComfyUI-Manager, we can also add new nodes or install custom nodes. ComfyUI-Manager is an extension designed to enhance the usability of ComfyUI. It offers management functions to install, remove, disable, and enable the various custom nodes of ComfyUI. Furthermore, this extension provides a hub feature and convenience functions to access a wide range of information within ComfyUI.
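
    A common way to install the manager is to clone the extension into the custom_nodes folder and restart ComfyUI. Here is a small sketch, assuming the ltdrdata/ComfyUI-Manager repository and that the script runs from the ComfyUI repo root:

        import subprocess
        from pathlib import Path

        custom_nodes = Path("custom_nodes")  # folder inside the ComfyUI repository

        # Clone the manager extension, then restart ComfyUI to load it.
        subprocess.run(
            ["git", "clone", "https://github.com/ltdrdata/ComfyUI-Manager.git"],
            cwd=custom_nodes,
            check=True,
        )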

    After installing it, we can see the manager UI inside ComfyUI.

    Reference documents:

  • Workflow: https://comfyworkflows.com
  • Models: https://huggingface.co/
  • Prompt: https://www.seaart.ai/