This weekend I learned that it is possible to generate images with diffusion models without a GPU. I didn't think this would work, but it's not only possible, it's easy and actually pretty fast! (Disclaimer: you need a good amount of RAM; I have 20GB.)
Setup
To keep this simple and portable, I used Docker to run fastsdcpu.
Dockerfile
FROM ubuntu:24.04 AS base
RUN apt-get update \
&& apt-get install -y python3 python3-venv python3-pip python3-wheel ffmpeg git wget nano \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* \
&& pip install uv --break-system-packages
FROM base AS fastsd
ARG FASTSDCPU_VERSION=v1.0.0-beta.200
RUN git clone https://github.com/rupeshs/fastsdcpu /app \
&& cd /app \
&& git checkout $FASTSDCPU_VERSION \
&& wget "https://huggingface.co/rupeshs/FastSD-Flux-GGUF/resolve/main/libstable-diffusion.so?download=true" -O libstable-diffusion.so
WORKDIR /app
SHELL [ "/bin/bash", "-c" ]
RUN echo y | bash -x ./install.sh --disable-gui
VOLUME /app/models/gguf/
VOLUME /app/lora_models/
VOLUME /app/controlnet_models/
VOLUME /root/.cache/huggingface/hub/
ENV GRADIO_SERVER_NAME=0.0.0.0
EXPOSE 7860
CMD [ "/app/start-webui.sh" ]
I used Docker Compose to map the volumes to directories on my host system. This stores the downloaded models outside of the container and makes it possible to add custom models.
docker-compose.yaml
services:
  fastsdcpu:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "7860:7860"
    volumes:
      - gguf:/app/models/gguf/
      - lora:/app/lora_models/
      - ctrl:/app/controlnet_models/
      - cache:/root/.cache/huggingface/hub/
    deploy:
      resources:
        limits:
          memory: 20g
    stdin_open: true
    tty: true
    environment:
      - GRADIO_SERVER_NAME=0.0.0.0

volumes:
  gguf:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: ./models/gguf
  lora:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: ./models/lora
  cache:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: ./models/cache
  ctrl:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: ./models/ctrl
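Because these volumes are declared as bind mounts through driver_opts, Docker generally won't create the host directories for you, so create them before the first run (the paths below simply match the device entries in my compose file):

mkdir -p models/gguf models/lora models/ctrl models/cache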
Then run sudo docker compose up --build to start the container. Once the Web UI service has started, you can access it at http://localhost:7860.
Usage
This app is designed to auto-download the selected model the first time you try to generate an image with it. You'll have to experiment with what works best for your use case. The default model, LCM -> stabilityai/sd-turbo, works pretty well for objects and scenery but does not do so well with realistic images of people. LCM-Lora -> Lykon/dreamshaper-8 is much better at people and surprisingly fast. Even with my modest Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz and no dedicated GPU, I can generate crisp, consistent images in ~30 seconds.

Of course, your generation settings will affect this. Higher resolution images or more inference steps will take longer. I found the best settings for dreamshaper to be 4-5 steps with a guidance scale of 1. I quickly generate 256x256 images to test prompts, and once I get roughly what I want, I increase the resolution and other settings gradually until I get exactly the image I'm looking for. Using the tiny auto encoder for SD option makes a significant difference in speed.
Testing
I tried using LCM-OpenVINO -> rupeshs/sd-turbo-openvino, which is built specifically for Intel setups, but I found it took longer and bogged my system down. If you have a newer Intel Arc based system it will probably work better for you.

Sadly, I was not able to get Flux1 working. I think it requires CPU instructions that my system does not have. If you have an i7 or better, this would be the ideal model to choose if you want highly creative images, especially in a fantasy setting. Also, Flux1 can generate coherent text, which Stable Diffusion notoriously fails at.
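I haven't verified exactly which instruction sets the Flux GGUF backend needs, but you can check which SIMD extensions your CPU advertises (SSE, AVX, AVX2, AVX-512 and so on) straight from /proc/cpuinfo:

# print the SIMD-related flags advertised by the first CPU core
grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep -E '^(sse|avx)' | sort -u

On my i5-7200U this shows AVX and AVX2 but nothing from the AVX-512 family, which may well be the limiting factor.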
Bonus
This app also has an API that you can hook into, but it needs to be enabled before you can access it. You can do that by adding these extra bits to the Docker config.
Add to the Dockerfile, after RUN echo y | bash -x ./install.sh --disable-gui:
RUN cat > run.sh <<EOF
#!/bin/bash
# start the API server in the background, then the web UI in the foreground
/app/start-webserver.sh &
/app/start-webui.sh
EOF
RUN chmod +x start-webserver.sh run.sh
# ... VOLUMES ...
EXPOSE 7860
EXPOSE 8000
CMD [ "/app/run.sh" ]
Add the port to docker-compose.yaml
services:
  fastsdcpu:
    ports:
      - "7860:7860"
      - "8000:8000"
Then you can access the API browser at http://localhost:8000/api/docs and do things like making POST requests to /api/generate with a JSON body like this:
{
  "diffusion_task": "text_to_image",
  "use_tiny_auto_encoder": true,
  "use_lcm_lora": true,
  "lcm_lora": {
    "base_model_id": "Lykon/dreamshaper-8",
    "lcm_lora_id": "latent-consistency/lcm-lora-sdv1-5"
  },
  "prompt": "a silly cat",
  "negative_prompt": "humans",
  "image_height": 256,
  "image_width": 256,
  "inference_steps": 1,
  "guidance_scale": 1,
  "number_of_images": 1
}
You'll get a JSON response back containing a base64-encoded JPG image that you can drop directly into a data URL. The first image you generate through the API will take a little longer while the system warms up, but after that things run pretty smoothly.
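As a rough sketch of testing this from the command line with curl, jq and base64: I'm assuming here that the request body above is saved as request.json and that the base64 image data comes back in an images array, so check http://localhost:8000/api/docs for the exact response shape of your version.

curl -s -X POST http://localhost:8000/api/generate \
  -H "Content-Type: application/json" \
  -d @request.json \
  | jq -r '.images[0]' \
  | base64 -d > silly-cat.jpg   # adjust the .images[0] path (and strip any data: prefix) if your response differs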