Better than JPEG? Stable Diffusion as an image compressor

Last week, Swiss software engineer Matthias Buhlmann discovered that the popular image synthesis model Stable Diffusion can compress existing bitmap images with fewer visual artifacts than JPEG or WebP at high compression ratios, although for now the approach has some drawbacks.

Stable Diffusion is an AI image synthesis model that typically creates images from text descriptions (called “prompts”). The model learned this ability by studying millions of images from the Internet. During training, it makes statistical associations between images and related words, building a much smaller representation of the key information in each image and storing it as “weights”: mathematical values that represent what the AI knows about images.

When Stable Diffusion analyzes images and “compresses” them into weights, that information lives in what researchers call “latent space”, which is a way of saying that it exists as a kind of fuzzy data that can be turned into images once decoded. In Stable Diffusion 1.4, the weights file is about 4GB, yet it represents knowledge about hundreds of millions of images.
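A quick back-of-the-envelope calculation shows why the weights cannot simply be a stored copy of the training images. The sketch below divides the weights file size by the number of training images. The figures are assumptions: “4GB” is read as 4 × 10^9 bytes, and 100 million and 500 million images are illustrative values bracketing “hundreds of millions”, not exact training-set counts.

```python
# Back-of-the-envelope: how many bytes of weights exist per training image?
# Assumption: "4GB" is read as 4 * 10**9 bytes (decimal gigabytes).
WEIGHTS_BYTES = 4 * 10**9


def bytes_per_image(weights_bytes: int, num_images: int) -> float:
    """Average number of weight bytes available per training image."""
    return weights_bytes / num_images


# Assumed image counts bracketing "hundreds of millions" (illustrative only).
for n in (100_000_000, 500_000_000):
    print(f"{n:>11,} images -> {bytes_per_image(WEIGHTS_BYTES, n):.0f} bytes per image")
```

Even at the low end this works out to a few dozen bytes per image, far less than even a heavily compressed JPEG, which is why the weights are better described as learned statistical knowledge than as an archive of pictures.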

While most people use Stable Diffusion with text prompts, Buhlmann removed the text encoder and instead passed his images through Stable Diffusion's image encoder, which takes a low-precision 512×512 image and converts it into a higher-precision 64×64 latent space representation. At this point, the image exists at a much smaller data size than the original, but it can still be expanded (decoded) back into a 512×512 image with fairly good results.
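The size reduction from the encoding step can be estimated with simple arithmetic. This sketch assumes an 8-bit RGB input image and a latent with 4 channels per position (the layout used by the Stable Diffusion v1 image autoencoder), stored either as 32-bit floats or, hypothetically, quantized to 8 bits per value. It only counts raw bytes and says nothing about how Buhlmann actually stored his latents.

```python
# Compare the raw size of a 512x512 RGB image with a 64x64 latent.
# Assumptions: 8-bit RGB input; the latent has 4 channels per position;
# latent values are stored as float32 or (hypothetically) as 8-bit values.


def raw_image_bytes(width: int, height: int, channels: int = 3, bits: int = 8) -> int:
    """Uncompressed size of an image in bytes."""
    return width * height * channels * bits // 8


def latent_bytes(width: int, height: int, channels: int = 4, bits: int = 32) -> int:
    """Size of a latent tensor in bytes at the given bit depth per value."""
    return width * height * channels * bits // 8


pixels = raw_image_bytes(512, 512)        # 512 * 512 * 3 bytes
latent_f32 = latent_bytes(64, 64)         # 64 * 64 * 4 values * 4 bytes
latent_u8 = latent_bytes(64, 64, bits=8)  # 64 * 64 * 4 values * 1 byte

print(f"512x512 RGB image: {pixels:,} bytes")
print(f"64x64x4 float32 latent: {latent_f32:,} bytes ({pixels / latent_f32:.0f}x smaller)")
print(f"64x64x4 8-bit latent: {latent_u8:,} bytes ({pixels / latent_u8:.0f}x smaller)")
```

Each latent position summarizes an 8×8 block of pixels, but because every value is stored at higher precision, the float32 latent is only 12 times smaller than the raw image; reducing the precision of each value is what would shrink it further.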

In testing, Buhlmann found that images compressed with Stable Diffusion subjectively looked better at higher compression (smaller file size) than JPEG or WebP equivalents.

In one example, a photo of a pastry shop was compressed to 5.68 KB using JPEG, 5.71 KB using WebP, and 4.98 KB using Stable Diffusion. The Stable Diffusion version appears to retain more detail and show fewer obvious compression artifacts than the versions produced with the other formats.
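To put the three file sizes from the pastry-shop example on a common scale, the sketch below converts them into bits per pixel and a compression ratio against an uncompressed image. Two assumptions are made here: the photo is 512×512 pixels (the resolution the encoder works at; the article does not state the photo's size) and “KB” means 1,024 bytes.

```python
# Compare the three file sizes from the example in terms of bits per pixel
# and compression ratio versus an uncompressed 8-bit RGB image.
# Assumptions: the photo is 512x512 pixels and "KB" means 1,024 bytes.

WIDTH, HEIGHT = 512, 512
RAW_BYTES = WIDTH * HEIGHT * 3  # uncompressed 8-bit RGB

sizes_kb = {"JPEG": 5.68, "WebP": 5.71, "Stable Diffusion": 4.98}


def bits_per_pixel(size_kb: float) -> float:
    """Average number of bits spent on each pixel."""
    return size_kb * 1024 * 8 / (WIDTH * HEIGHT)


def compression_ratio(size_kb: float) -> float:
    """How many times smaller the file is than the raw image."""
    return RAW_BYTES / (size_kb * 1024)


for name, kb in sizes_kb.items():
    print(f"{name:>16}: {bits_per_pixel(kb):.3f} bpp, {compression_ratio(kb):.0f}:1")
```

Under these assumptions all three files spend less than 0.2 bits per pixel, and the Stable Diffusion file is roughly 12% smaller than the JPEG one (a figure that does not depend on the resolution or KB assumptions).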

However, Buhlmann's method currently has significant limitations:

It is not good with faces or text, and in some cases it can add features to the decoded image that were not present in the original. Naturally, nobody wants an image compressor that invents details that are not in the image.
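One way to catch such invented details is to compare the decoded image against the original numerically rather than only by eye; the article itself reports only subjective judgments. The sketch below computes peak signal-to-noise ratio (PSNR), a standard fidelity metric. Representing images as flat lists of 8-bit channel values is a simplification for illustration.

```python
import math


def psnr(original: list[int], decoded: list[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio between two equally sized 8-bit images.

    Images are given as flat lists of channel values. Returns infinity
    when the images are identical (zero mean squared error).
    """
    if len(original) != len(decoded):
        raise ValueError("images must have the same number of values")
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_value**2 / mse)


original = [10, 20, 30, 40]
print(psnr(original, original))            # identical images
print(psnr(original, [10, 20, 30, 140]))   # one value changed by the "decoder"
```

A single global score can still hide small, localized changes such as an altered face or letter, so a low PSNR flags large differences but a high one does not prove that nothing was invented.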

Decoding also requires the roughly 4GB Stable Diffusion weights file and takes additional time.
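The decoder-side cost can be put in perspective with rough arithmetic: if the 4GB weights have to be shipped just to decode images, how many images must be stored before the per-image savings pay for them? This sketch assumes decimal units and that every image saves as much as the pastry-shop example did against JPEG (5.68 KB versus 4.98 KB); both are simplifications.

```python
import math

# Assumptions: decimal units (1 KB = 1,000 bytes, 1 GB = 10**9 bytes) and
# that every image saves as much as the pastry-shop example did versus JPEG.
WEIGHTS_BYTES = 4 * 10**9  # Stable Diffusion weights file (~4GB)
JPEG_BYTES = 5_680         # 5.68 KB
SD_BYTES = 4_980           # 4.98 KB


def break_even_images(weights_bytes: int, saved_per_image: int) -> int:
    """Images needed before per-image savings outweigh shipping the weights."""
    return math.ceil(weights_bytes / saved_per_image)


saved = JPEG_BYTES - SD_BYTES  # bytes saved per image
print(f"Break-even after {break_even_images(WEIGHTS_BYTES, saved):,} images")
```

Under these assumptions the break-even point is several million images, before even counting the extra decoding time.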

Buhlmann's code and further technical details are available on Google Colab and Towards AI.

iGuRu.gr: The Best Technology Site in Greece

Written by giorgos