In this paper, we challenge the common practice of training neural audio codecs end-to-end, instead proposing a three-stage strategy that allows us to rely on an implicit neural quantization layer for neural audio coding.
Post-training quantization
Training procedure of QINCODEC with offline quantization: First, we train a continuous compression model with spectral and adversarial losses. Next, we quantize the bottleneck latent vectors into discrete embeddings. We then finetune the decoder on the quantized representations.
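The offline quantization stage can be illustrated with a minimal sketch. The actual quantizer in the paper is an implicit neural quantization layer; as a simplified stand-in, the sketch below uses plain residual k-means over the bottleneck latents. The helper names `fit_codebooks` and `quantize` are hypothetical, not from the paper's codebase.

```python
import numpy as np

def fit_codebooks(latents, num_codebooks=2, codebook_size=16, iters=10, seed=0):
    """Fit a residual quantizer: each stage runs k-means on the residual
    left over by the previous stage. `latents` has shape (N, D)."""
    rng = np.random.default_rng(seed)
    residual = latents.copy()
    codebooks = []
    for _ in range(num_codebooks):
        # Initialize centers from random latent vectors, then run Lloyd iterations.
        centers = residual[rng.choice(len(residual), codebook_size, replace=False)]
        for _ in range(iters):
            d = ((residual[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
            assign = d.argmin(1)
            for c in range(codebook_size):
                mask = assign == c
                if mask.any():
                    centers[c] = residual[mask].mean(0)
        # Recompute assignments with the final centers before taking residuals.
        d = ((residual[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        codebooks.append(centers.copy())
        residual = residual - centers[assign]
    return codebooks

def quantize(latents, codebooks):
    """Encode latents to per-stage codebook indices; return (codes, reconstruction).
    The reconstruction is what the decoder is finetuned on in stage 3."""
    residual = latents.copy()
    recon = np.zeros_like(latents)
    codes = []
    for centers in codebooks:
        d = ((residual[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        codes.append(assign)
        recon += centers[assign]
        residual = residual - centers[assign]
    return np.stack(codes, axis=1), recon
```

Because the codebooks are fit after the compression model is trained, the encoder and the spectral/adversarial losses never see the quantizer; only the decoder is later adapted to the quantized latents.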

Experiments and results

The tables below provide audio clips for evaluating the reconstruction quality of our model in comparison to the baselines presented in the paper. Some differences between audio samples may be subtle, so we recommend using headphones for an accurate assessment.

Comparison with baselines, at 16kbps

Audio samples: Original · QinCodec · DAC · EnCodec

Comparison with baselines, at 8kbps

Audio samples: Original · QinCodec · DAC · EnCodec

Impact of finetuning

Audio samples: Original · QinCodec · QinCodec (w/o finetuning)