cross-posted from: https://lemmy.world/post/3549390
stable-diffusion.cpp
Introducing stable-diffusion.cpp, a pure C/C++ inference engine for Stable Diffusion! This is a really awesome implementation to help speed up home inference of diffusion models. Tailored for developers and AI enthusiasts, this repository offers a high-performance solution for creating and manipulating images using various quantization techniques and accelerated inference.
Key Features:
- Efficient Implementation: Utilizing plain C/C++, it operates seamlessly like llama.cpp and is built on the ggml framework.
- Multiple Precision Support: Choose between 16-bit and 32-bit float, or 4-bit, 5-bit, and 8-bit integer quantization.
- Optimized Performance: Experience memory-efficient CPU inference with AVX, AVX2, and AVX512 support for x86 architectures.
- Versatile Modes: From original txt2img to img2img modes and negative prompt handling, customize your processing needs.
- Cross-Platform Compatibility: Runs smoothly on Linux, Mac OS, and Windows.
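For anyone curious what the integer quantization mentioned above looks like in practice, here is a minimal sketch of symmetric 8-bit block quantization in C++. It only illustrates the general idea of storing one scale per small block of weights. The block size of 32, the `QuantBlock` struct, and the `quantize_q8` / `dequantize_q8` names are assumptions made for this example, not ggml's or stable-diffusion.cpp's actual code or file layout.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical block size and layout, chosen for illustration only.
constexpr std::size_t kBlockSize = 32;

struct QuantBlock {
    float scale;                             // one scale shared by the block
    std::array<std::int8_t, kBlockSize> q;   // quantized values in [-127, 127]
};

// Quantize a float buffer whose length is a multiple of kBlockSize.
std::vector<QuantBlock> quantize_q8(const std::vector<float>& x) {
    assert(x.size() % kBlockSize == 0);
    std::vector<QuantBlock> blocks(x.size() / kBlockSize);
    for (std::size_t b = 0; b < blocks.size(); ++b) {
        const float* src = x.data() + b * kBlockSize;
        // The largest magnitude in the block determines the scale.
        float amax = 0.0f;
        for (std::size_t i = 0; i < kBlockSize; ++i) {
            amax = std::max(amax, std::fabs(src[i]));
        }
        const float scale = amax / 127.0f;
        const float inv = scale > 0.0f ? 1.0f / scale : 0.0f;
        blocks[b].scale = scale;
        for (std::size_t i = 0; i < kBlockSize; ++i) {
            blocks[b].q[i] = static_cast<std::int8_t>(std::lround(src[i] * inv));
        }
    }
    return blocks;
}

// Reconstruct approximate floats from the quantized blocks.
std::vector<float> dequantize_q8(const std::vector<QuantBlock>& blocks) {
    std::vector<float> y;
    y.reserve(blocks.size() * kBlockSize);
    for (const QuantBlock& blk : blocks) {
        for (std::int8_t v : blk.q) {
            y.push_back(blk.scale * static_cast<float>(v));
        }
    }
    return y;
}
```

In this sketch each weight costs one byte plus a small share of the per-block scale, compared with four bytes at 32-bit float, and the round-trip error per value stays within about half of that block's scale.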
Getting Started
Cloning, building, and running are made simple, and detailed examples are provided for both text-to-image and image-to-image generation. With an array of options for precision and comprehensive usage guidelines, you can easily adapt the code for your specific project requirements.
git clone --recursive https://github.com/leejet/stable-diffusion.cpp
cd stable-diffusion.cpp
- If you have already cloned the repository, you can use the following command to update the repository to the latest code.
cd stable-diffusion.cpp
git pull origin master
git submodule update
More Details
- Plain C/C++ implementation based on ggml, working in the same way as llama.cpp
- 16-bit, 32-bit float support
- 4-bit, 5-bit and 8-bit integer quantization support
- Accelerated memory-efficient CPU inference
  - Only requires ~2.3GB when using txt2img with fp16 precision to generate a 512x512 image
- AVX, AVX2 and AVX512 support for x86 architectures
- Original txt2img and img2img mode
- Negative prompt
- stable-diffusion-webui style tokenizer (not all the features, only token weighting for now)
- Sampling method
  - Euler A
- Supported platforms
  - Linux
  - Mac OS
  - Windows
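The token weighting mentioned in the list above refers to the stable-diffusion-webui prompt syntax, where `(text:1.5)` asks for more emphasis on `text`. Below is a simplified C++ sketch of splitting a prompt into weighted segments. It handles only the explicit `(text:weight)` form without nesting, and `parse_weighted_prompt` is a hypothetical helper written for this example, not the parser used by stable-diffusion.cpp or the webui.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

// A piece of prompt text and the emphasis weight applied to it.
using Segment = std::pair<std::string, float>;

// Hypothetical helper: split a prompt into (text, weight) segments.
// Text outside parentheses gets weight 1.0. Groups without a parsable
// ":number" suffix are kept as text with weight 1.0 in this sketch.
std::vector<Segment> parse_weighted_prompt(const std::string& prompt) {
    std::vector<Segment> out;
    std::size_t pos = 0;
    while (pos < prompt.size()) {
        const std::size_t open = prompt.find('(', pos);
        if (open == std::string::npos) {
            out.emplace_back(prompt.substr(pos), 1.0f);  // no more groups
            break;
        }
        const std::size_t close = prompt.find(')', open);
        if (close == std::string::npos) {
            out.emplace_back(prompt.substr(pos), 1.0f);  // unbalanced: plain text
            break;
        }
        assert(close > open);  // the search for ')' started at the '('
        if (open > pos) {
            out.emplace_back(prompt.substr(pos, open - pos), 1.0f);
        }
        const std::string inner = prompt.substr(open + 1, close - open - 1);
        std::string text = inner;
        float weight = 1.0f;
        const std::size_t colon = inner.rfind(':');
        if (colon != std::string::npos) {
            const std::string num = inner.substr(colon + 1);
            char* end = nullptr;
            const float parsed = std::strtof(num.c_str(), &end);
            if (end != num.c_str()) {  // a number was actually parsed
                weight = parsed;
                text = inner.substr(0, colon);
            }
        }
        out.emplace_back(text, weight);
        pos = close + 1;
    }
    return out;
}
```

With this sketch, `parse_weighted_prompt("a (cat:1.5) on grass")` yields three segments: `"a "` at weight 1.0, `"cat"` at weight 1.5, and `" on grass"` at weight 1.0. A real implementation would then apply those weights to the token embeddings, which is outside the scope of this example.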
This is a really exciting repo. I'll be honest, I don't think I am as well versed in diffusion inference - but I do know that more efficient and effective methods for running those models are always welcome among people who frequently use diffusers, especially those who need to multi-task and maintain performance headroom.
Got a 1.42s generation on the Cpp one and 2.1s with auto1111's SD (note my torch is outdated, model was converted to fp16).
Though I'm having trouble finding the generated image.
All on the same generation settings, 5800x cpu & 3080 12gb
I'd love a 2s generation, it usually takes about 60s with my 1660ti 4gb
oh that's rough, this was on 512x512 20 steps. I usually do 768x768 50 steps for 33 seconds, gets me better quality without using upscale.
768x768 50 steps takes me several minutes
Are you sure that it's using your GPU? That seems way slower than I'd expect.
I'm using it on low, because it runs out of memory whenever I try medium or high