Better than JPEG? Stable Diffusion better image compression

Last week, Swiss software engineer Matthias Buhlmann discovered that the popular Stable Diffusion image compositing model could compress existing bitmap images with fewer visual artifacts than JPEG or WebP formats at high compression tones, although there are currently some losses.


Stable Diffusion is an AI image compositing model that typically creates images based on text descriptions (called “prompts”). The AI ​​model learned this ability by studying millions of images from the Internet. During the training process, the model makes statistical associations between images and related words, making a much smaller representation of key information for each image and storing them as “weights”, which are mathematical values ​​that represent what the AI ​​knows about pictures.

When Stable Diffusion analyzes and “compresses” images into weights, they also exist in what researchers call “latent space,” which is a way of saying that they exist as a kind of fuzzy data that can be turned into images once decoded. . With Stable Diffusion 1.4, the weights file reaches about 4GB, but represents knowledge for hundreds of millions of images.

While most use Stable Diffusion with text prompts, Buhlmann removed the text encoder and created his images through the Stable Diffusion image encoder process, which takes a low-resolution 512×512 image and converts it to a higher-resolution latent space representation precision at 64×64. At this point, the image exists in a much smaller data size than the original, but can still be scaled (decoded) to a 512×512 image with fairly good results.

In testing, Buhlmann found that images compressed with Stable Diffusion subjectively looked better at higher compression (smaller file size) than JPEG or WebP equivalents.

s candy

The example above shows a photo of a pastry shop compressed to 5,68 KB using JPEG, 5,71 KB using WebP, and 4,98 KB using constant diffusion.
The image with Stable Diffusion appears to have more detail and less obvious compression artifacts than those with other formats.

However, Buhlmann's method currently has significant limitations:

It is not good with faces or text and in some cases, it can add features to the decoded image that were not present in the original image. No one of course wants to invent the image compressor that uses details that are not present in an image.

Also, decoding requires the weights file of Stable Diffusion which reaches 4GB and needs additional decoding time.

Buhlmann's code and more technical details are at Google Colab and Toward AI. The Best Technology Site in Greecegns

every publication, directly to your inbox

Join the 2.113 registrants.
jpeg,Stable Diffusion,image compression,iguru

Written by giorgos

George still wonders what he's doing here ...

Leave a reply

Your message will not be published if:
1. Contains insulting, defamatory, racist, offensive or inappropriate comments.
2. Causes harm to minors.
3. It interferes with the privacy and individual and social rights of other users.
4. Advertises products or services or websites.
5. Contains personal information (address, phone, etc.).