
I am using this sample Python code from Hugging Face:

    from diffusers import DiffusionPipeline
    import torch

    # load both base & refiner
    base = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
    )
    base.to("cuda")
    refiner = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-1.0",
        text_encoder_2=base.text_encoder_2,
        vae=base.vae,
        torch_dtype=torch.float16,
        use_safetensors=True,
        variant="fp16",
    )
    refiner.to("cuda")

    # Define how many steps and what fraction of steps to run on each expert (80/20)
    n_steps = 40
    high_noise_frac = 0.8

    prompt = "A majestic lion jumping from a big stone at night"

    # run both experts
    image = base(
        prompt=prompt,
        num_inference_steps=n_steps,
        denoising_end=high_noise_frac,
        output_type="latent",
    ).images
    image = refiner(
        prompt=prompt,
        num_inference_steps=n_steps,
        denoising_start=high_noise_frac,
        image=image,
    ).images[0]
    image

But every time the image is about to be generated, the Google Colab session crashes. Has anyone successfully run the base + refiner in Google Colab? I have successfully tried only the base, but I want to run both together.

James Arnold

2 Answers


Yes, I have. If you're on the free tier, there isn't enough VRAM to keep both models loaded at once.

Set base to None, then do a gc.collect() and a CUDA cache purge after creating the refiner.
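
Distilled, that pattern looks something like this (a sketch using the variable names from the question, run after the refiner has been created):

    import gc
    import torch

    # Sketch: drop the reference to the base pipeline once the refiner exists,
    # force a garbage collection, then hand the freed blocks back to the CUDA
    # allocator so the refiner has room to run.
    base = None
    gc.collect()
    torch.cuda.empty_cache()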

This is my code:

    %pip install --quiet --upgrade diffusers transformers accelerate mediapy
    use_refiner = True
    import mediapy as media
    import random
    import sys
    import torch
    
    from diffusers import DiffusionPipeline
    refiner = None
    pipe = None
    import gc
    gc.collect()
    
    torch.cuda.empty_cache()
    base_output = []
    prompt_details = []
    
    prompt = "a photo of a goth doris day, fine dining with a view to the Eiffel Tower"
    image_count = 5
    
    if pipe is None:
    
      refiner = None
      import gc
      gc.collect()
    
      pipe = DiffusionPipeline.from_pretrained(
          "stabilityai/stable-diffusion-xl-base-1.0",
          torch_dtype=torch.float16,
          use_safetensors=True,
          variant="fp16",
          ).to('cuda')
    
    
    base_output = []
    prompt_details = []
    
    for i in range(image_count):
    
      seed = random.randint(0, sys.maxsize)
    
      images = pipe(
        prompt = prompt,
        output_type = "latent" if use_refiner else "pil",
        generator = torch.Generator("cuda").manual_seed(seed),
        num_inference_steps=30
        ).images
    
      if use_refiner:
        base_output.append(images)
        prompt_details.append(f"Prompt:\t{prompt}\nSeed:\t{seed}")
      else:
        print(f"Prompt:\t{prompt}\nSeed:\t{seed}")
        media.show_images(images)
    
    if use_refiner:
    
      refiner = DiffusionPipeline.from_pretrained(
          "stabilityai/stable-diffusion-xl-refiner-1.0",
          text_encoder_2=pipe.text_encoder_2,
          vae=pipe.vae,
          torch_dtype=torch.float16,
          use_safetensors=True,
          variant="fp16",
      ).to('cuda')
    
      pipe = None
      torch.cuda.empty_cache()
    
      for i in range(image_count):
    
        if use_refiner:
          images = refiner(
              prompt = prompt,
              image = base_output[i],
              ).images
    
        print(prompt_details[i])
        media.show_images(images)

I have split the code where my cells are.

That last big cell is re-runnable, hence the awkward setting of values to None and the memory purges before anything is done, as Colab doesn't release the VRAM between runs of that cell.

Note I really ought to move the use_refiner setting to the main cell, try the 16-bit VAE from madebyollin to reduce the memory usage further, and remove the unnecessary extra gc import that's still in there :-)
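
For anyone who wants to try that VAE swap, here is a rough sketch; I'm assuming the usual madebyollin/sdxl-vae-fp16-fix repo is the one meant:

    import torch
    from diffusers import AutoencoderKL, DiffusionPipeline

    # Sketch: load the fp16-safe SDXL VAE separately and pass it to the pipeline
    # in place of the default one, which avoids upcasting during decoding.
    vae = AutoencoderKL.from_pretrained(
        "madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16
    )
    pipe = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        vae=vae,
        torch_dtype=torch.float16,
        use_safetensors=True,
        variant="fp16",
    ).to("cuda")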

Vargol

It seems I can't reply to myself, so here's a new answer.

Yes, it can be done. I've done a fair bit of work on the code I used above and now have a version that runs well within the free Colab limits: it uses 8.9 GB of VRAM, 5 GB of RAM and 38 GB of storage. That should stop the crashing.

The main improvements were using the 16-bit VAE, which saved a lot of VRAM on decoding, and not re-using the text encoder from the base pipe, which seems to free up another 3 GB of VRAM.
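
A sketch of that second change, in case it isn't obvious; the Gist may differ in the details, but the idea is to load the refiner with its own modules rather than handing it the base pipe's:

    import torch
    from diffusers import DiffusionPipeline

    # Sketch: omit the text_encoder_2=... and vae=... arguments so the refiner
    # loads its own copies; nothing then keeps the base pipeline's modules alive
    # once the base is set to None and the CUDA cache is emptied.
    refiner = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-1.0",
        torch_dtype=torch.float16,
        use_safetensors=True,
        variant="fp16",
    ).to("cuda")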

I did try CPU offloading so I could have both models in memory, even if it was split across CPU and GPU memory, but that just used up all the system RAM, so I settled for doing the base run for all images followed by the refiner run for all images.
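
For reference, CPU offloading in diffusers is normally the helper shown below (a sketch only, since this is the approach that exhausted the system RAM here); it is called instead of .to("cuda") and needs accelerate installed:

    import torch
    from diffusers import DiffusionPipeline

    # Sketch: model offloading keeps submodules in system RAM and moves each one
    # to the GPU only while it is running, trading GPU memory for system RAM.
    pipe = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
        use_safetensors=True,
        variant="fp16",
    )
    pipe.enable_model_cpu_offload()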

I've stuck it in a Gist:

https://gist.github.com/Vargol/ae56f6c1bd825523d028a5925b4b1dad

It can also generate more than one image at a time and uses the same styles as the various bots and Clipdrop; the "No Style" style is called Enhance.
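
For comparison, one generic way to get several images from a single pipeline call is the num_images_per_prompt argument; a sketch, assuming pipe and prompt are the base pipeline and prompt from my earlier answer:

    # Sketch: ask for a small batch in one call; VRAM use grows with the batch
    # size, so on the free tier it has to stay small.
    images = pipe(
        prompt=prompt,
        num_inference_steps=30,
        num_images_per_prompt=2,
    ).images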

As before, the generation part (the last cell) is re-runnable, so it doesn't go through the setup over and over if you want to generate more images.

Vargol