Better than JPEG? Stable Diffusion better image compression

Last week, Swiss software engineer Matthias Buhlmann discovered that the popular image compositing model stable diffusion could compress existing bitmap images with fewer visual artifacts than high-tone JPEG or WebP formats compression, although at present there are some losses.

Stable Diffusion is an AI image compositing model that typically creates images based on text descriptions (called “prompts”). The AI model learned this ability by studying millions of images from the Internet. During training procedures, the model makes statistical associations between images and related words, making a much smaller representation of key information for each image and storing them as “weights,” which are mathematical values that represent what the AI knows about the images.

When Stable Diffusion analyzes and “compresses” images into weights, they also exist in what researchers call “latent space,” which is a way of saying that they exist as a kind of fuzzy data that can be turned into images once decoded. . With Stable Diffusion 1.4, the weights file reaches about 4GB, but represents knowledge for hundreds of millions of images.

While most use Stable Diffusion with text prompts, Buhlmann removed the text encoder and created his images through the Stable Diffusion image encoder process, which takes a low-resolution 512x512 image and converts it to a higher-resolution latent space representation precision at 64×64. At this point, the image exists in a lot smaller data size from the original, but it can still be scaled (decoded) to a 512×512 image with fairly good results.

In testing, Buhlmann found that images compressed with Stable Diffusion subjectively looked better at higher compression (smaller file size) than JPEG or WebP equivalents.

The example above shows a photo of a pastry shop compressed to 5,68 KB using JPEG, 5,71 KB using WebP, and 4,98 KB using constant diffusion.
The image with Stable Diffusion appears to have more detail and less obvious compression artifacts than those with other formats.

However, Buhlmann's method currently has significant limitations:

It is not good with faces or text and in some cases, it can add features to the decoded image that were not present in the original image. No one of course wants to invent the image compressor that uses details that are not present in an image.

Also, decoding requires the weights file of Stable Diffusion which reaches 4GB and needs additional decoding time.

Buhlmann's code and more technical details are at Google Colab and Toward AI.

Better than JPEG? Stable Diffusion better image compression

every publication, directly to your inbox

Written by giorgos

Leave a reply Ακύρωση απάντησης

every publication, directly to your inbox

spread the news

Leave a reply Ακύρωση απάντησης