Last week, Meta announced a new AI-powered audio compression method called “EnCodec”. The new method reportedly compresses audio 10 times more than the MP3 format at 64 kbps with no loss in quality.
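As a rough back-of-the-envelope illustration (the numbers below are derived only from the cited bitrates, not taken from Meta's paper), a 10x reduction relative to 64 kbps works out to roughly 6.4 kbps:

```python
# Rough size comparison for one minute of audio (illustrative only).
MP3_BITRATE_KBPS = 64            # baseline cited in the announcement
ENCODEC_BITRATE_KBPS = 64 / 10   # ~6.4 kbps if the stream is 10x denser
SECONDS = 60

mp3_kb = MP3_BITRATE_KBPS * SECONDS / 8          # kilobits -> kilobytes
encodec_kb = ENCODEC_BITRATE_KBPS * SECONDS / 8

print(f"MP3 @ 64 kbps:       ~{mp3_kb:.0f} KB per minute")      # ~480 KB
print(f"EnCodec @ ~6.4 kbps: ~{encodec_kb:.0f} KB per minute")  # ~48 KB
```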
Meta says this technique could significantly improve speech audio quality over low-bandwidth connections, such as phone calls in underserved areas.
The same technique works for music.
Meta announced the technology on October 25 in a paper titled “High Fidelity Neural Audio Compression”, co-authored by Meta AI researchers Alexandre Defossez, Jade Copet, Gabriel Synnaeve and Yossi Adi.
Meta also published a summary of the research on its blog.
The company describes its method as a three-part system trained to compress audio to a desired target size. First, the encoder transforms the uncompressed data into a lower-frame-rate “latent space” representation.
The “quantizer” then compresses that representation down to the target size while preserving the most important information, which is later used to reconstruct the original signal. (This compressed signal is what gets sent over a network or stored on disk.) Finally, the decoder turns the compressed data back into audio in real time, using a neural network running on a single CPU.
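As a rough illustration of that three-stage structure (a minimal sketch only; the layer sizes, strides, and codebook size here are placeholder assumptions, not Meta's actual architecture), an encoder/quantizer/decoder pipeline might look like this in PyTorch:

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Downsamples a raw waveform into a lower-frame-rate latent sequence."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, dim, kernel_size=8, stride=4, padding=2), nn.ELU(),
            nn.Conv1d(dim, dim, kernel_size=8, stride=4, padding=2), nn.ELU(),
            nn.Conv1d(dim, dim, kernel_size=8, stride=4, padding=2),
        )
    def forward(self, wav):              # wav: (batch, 1, samples)
        return self.net(wav)             # latents: (batch, dim, frames)

class TinyQuantizer(nn.Module):
    """Maps each latent frame to the nearest entry of a small codebook;
    only the integer indices need to be transmitted or stored."""
    def __init__(self, dim=64, codebook_size=1024):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, dim)
    def forward(self, latents):          # latents: (batch, dim, frames)
        flat = latents.transpose(1, 2)                             # (batch, frames, dim)
        dists = torch.cdist(flat, self.codebook.weight.unsqueeze(0))
        indices = dists.argmin(dim=-1)                             # (batch, frames)
        quantized = self.codebook(indices).transpose(1, 2)         # back to (batch, dim, frames)
        return quantized, indices

class TinyDecoder(nn.Module):
    """Reconstructs a waveform from the quantized latent sequence."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose1d(dim, dim, kernel_size=8, stride=4, padding=2), nn.ELU(),
            nn.ConvTranspose1d(dim, dim, kernel_size=8, stride=4, padding=2), nn.ELU(),
            nn.ConvTranspose1d(dim, 1, kernel_size=8, stride=4, padding=2),
        )
    def forward(self, quantized):
        return self.net(quantized)

wav = torch.randn(1, 1, 16000)                  # one second of dummy 16 kHz audio
enc, quant, dec = TinyEncoder(), TinyQuantizer(), TinyDecoder()
quantized, codes = quant(enc(wav))              # `codes` is the compressed signal
reconstruction = dec(quantized)                 # (1, 1, 16000), same length as the input
```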
“The key to lossy compression is identifying changes that cannot be perceived by humans, since perfect reconstruction is impossible at low bit rates,” the researchers write.
"To get better results, we use tokens to improve the perceptual quality of the generated samples."
