# LPCNet for FreeDV

Experimental version of LPCNet being developed for over-the-air Digital Voice experiments with FreeDV.

## Quickstart

```
$ git clone https://github.com/drowe67/codec2.git
$ cd codec2 && mkdir build_linux && cd build_linux && cmake ../ && make
$ cd ~
$ git clone https://github.com/drowe67/LPCNet.git
$ cd LPCNet && mkdir build_linux && cd build_linux
$ cmake -DCODEC2_BUILD_DIR=~/codec2/build_linux ..
$ make
```

Unquantised LPCNet:

```
$ cd ~/LPCNet/build_linux/src
$ sox ../../wav/wia.wav -t raw -r 16000 - | ./dump_data --c2pitch --test - - | ./test_lpcnet - - | aplay -f S16_LE -r 16000
```

LPCNet at 1733 bits/s using the direct-split quantiser:

```
sox ../../wav/wia.wav -t raw -r 16000 - | ./lpcnet_enc -s | ./lpcnet_dec -s | aplay -f S16_LE -r 16000
```

# CTests

```
$ cd ~/LPCNet/build_linux
$ ctest
```

Note: due to precision/library issues, several tests (1-3) will only pass on certain machines, such as Ubuntu 16 and 18; Ubuntu 17 is known to fail.

# Reading Further

1. [Original LPCNet Repo with more instructions and background](https://github.com/mozilla/LPCNet/)
1. [LPCNet: DSP-Boosted Neural Speech Synthesis](https://people.xiph.org/~jm/demo/lpcnet/)
1. [Sample model files](https://jmvalin.ca/misc_stuff/lpcnet_models/)

# Credits

Thanks to [Jean-Marc Valin](https://people.xiph.org/~jm/demo/lpcnet/) for making LPCNet available, and to [Richard](https://github.com/hobbes1069) for the CMake build system.

# Cross Compiling for Windows

This code has been cross-compiled to Windows using Fedora Linux 30; see the freedv-gui README.md and the build_windows.sh script.

# Speech Material for Training

Suitable training material can be obtained from the McGill University Telecommunications & Signal Processing Laboratory. Download the ISO and extract the 16k-LP7 directory; the src/concat.sh script can be used to generate a headerless file of training samples.

```
cd 16k-LP7
sh /path/to/LPCNet/src/concat.sh
```

# Quantiser Experiments

The quantiser files used for these experiments (pred_v2.tgz and split.tgz) are [here](http://rowetel.com/downloads/deep/lpcnet_quant).

## Exploring Features

Install GNU Octave (if that's your thing). Extract a feature file, fire up Octave, and mesh plot the 18 cepstrals for the first 100 frames (1 second):

```
$ ./dump_data --test speech_orig_16k.s16 speech_orig_16k_features.f32
$ cd src
$ octave --no-gui
octave:3> f=load_f32("../speech_orig_16k_features.f32",55);
nrows: 1080
octave:4> mesh(f(1:100,1:18))
```

## Uniform Quantisation

Listen to the effects of 4 dB step uniform quantisation of the cepstrals:

```
$ cat ~/Downloads/wia.wav | ./dump_data --test - - | ./quant_feat -u 4 | ./test_lpcnet - - | play -q -r 16000 -s -2 -t raw -
```

This lets us listen to the effect of quantisation error. Once we think it sounds OK, we can compute the variance (average squared quantiser error). A 4 dB step size means the error PDF is uniform over the range -2 to +2 dB. A uniform PDF has a variance of (b-a)^2/12, so (2-(-2))^2/12 = 1.33 dB^2. We can then try to design a quantiser (e.g. a multi-stage VQ) that achieves that variance.
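As a quick sanity check on that arithmetic, here is a small stand-alone C sketch (not part of LPCNet; the file name and ranges are just for illustration) that applies 4 dB step uniform quantisation to random values and measures the resulting error variance, which should come out close to 1.33 dB^2:

```
/* uquant_var.c: hypothetical sketch, not part of LPCNet.  Quantises random
   values with a 4 dB uniform step and measures the error variance, which
   should be close to step^2/12 = 1.33 dB^2.
   Build: gcc uquant_var.c -o uquant_var -lm */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const float step = 4.0f;     /* quantiser step size in dB    */
    const int   N = 1000000;     /* number of random test values */
    double sum_sq = 0.0;

    srand(0);
    for (int i = 0; i < N; i++) {
        /* random value over a wide range, e.g. -40 .. +40 dB */
        float x = 80.0f*rand()/RAND_MAX - 40.0f;
        float q = step*roundf(x/step);   /* uniform quantisation            */
        float e = x - q;                 /* error, uniform over -2 .. +2 dB */
        sum_sq += e*e;
    }
    printf("measured variance: %5.2f dB^2  expected: %5.2f dB^2\n",
           sum_sq/N, step*step/12.0);
    return 0;
}
```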
## Training a Predictive VQ

Clone and build [codec2](https://github.com/drowe67/codec2.git):

```
$ git clone https://github.com/drowe67/codec2.git
$ cd codec2 && mkdir build_linux && cd build_linux && cmake ../ && sudo make install
```

In train_pred2.sh, adjust PATH for the location of codec2-dev on your machine.

Generate 5E6 vectors using the --train option of dump_data to apply a bunch of different filters, then run the predictive VQ training script:

```
$ cd LPCNet
$ ./dump_data --train all_speech.s16 all_speech_features_5e6.f32 /dev/null
$ ./train_pred2.sh
```

## Mbest VQ search

Keeps the M best candidates after each stage:

```
cat ~/Downloads/speech_orig_16k.s16 | ./dump_data --test - - | ./quant_feat --mbest 5 -q pred2_stage1.f32,pred2_stage2.f32,pred2_stage3.f32 > /dev/null
```

In this example, the VQ error variance was reduced from 2.68 to 2.28 dB^2 (I think equivalent to 3 bits), and the number of outliers >2dB was reduced from 15% to 10%.
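To make the search concrete, here is a hypothetical C sketch of an M-best multi-stage VQ search; the dimensions, codebook layout, and function names are assumptions for illustration and not the actual quant_feat code. Instead of greedily committing to one codebook entry per stage, the M lowest-distortion partial candidates survive each stage and are expanded against the next stage's codebook, and the best overall candidate is chosen at the end:

```
/* mbest_sketch.c: hypothetical illustration of an M-best (list) search
   over a multi-stage VQ; not the actual quant_feat implementation. */
#include <string.h>

#define K      18     /* vector dimension                 */
#define NSTAGE 3      /* number of VQ stages              */
#define NVEC   2048   /* entries per stage codebook       */
#define MBEST  5      /* candidates kept after each stage */

typedef struct {
    float resid[K];        /* residual after the stages chosen so far */
    int   index[NSTAGE];   /* codebook entry chosen at each stage     */
    float dist;            /* accumulated squared error               */
} Cand;

/* cb[s] points to the stage s codebook, stored as NVEC rows of K floats */
void mbest_search(const float *cb[NSTAGE], const float target[K],
                  int index_out[NSTAGE])
{
    Cand cur[MBEST], next[MBEST];
    int ncur = 1, nnext;

    memcpy(cur[0].resid, target, sizeof cur[0].resid);
    cur[0].dist = 0.0f;

    for (int s = 0; s < NSTAGE; s++) {
        nnext = 0;
        /* expand every surviving candidate against every entry of stage s */
        for (int c = 0; c < ncur; c++) {
            for (int e = 0; e < NVEC; e++) {
                float d = cur[c].dist;
                for (int k = 0; k < K; k++) {
                    float r = cur[c].resid[k] - cb[s][e*K + k];
                    d += r*r;
                }
                /* keep only the MBEST lowest-distance candidates,
                   maintained as a small sorted list */
                if (nnext < MBEST || d < next[nnext-1].dist) {
                    int i = (nnext < MBEST) ? nnext++ : MBEST - 1;
                    while (i > 0 && next[i-1].dist > d) {
                        next[i] = next[i-1];
                        i--;
                    }
                    next[i].dist = d;
                    memcpy(next[i].index, cur[c].index, sizeof next[i].index);
                    next[i].index[s] = e;
                    for (int k = 0; k < K; k++)
                        next[i].resid[k] = cur[c].resid[k] - cb[s][e*K + k];
                }
            }
        }
        memcpy(cur, next, sizeof cur);
        ncur = nnext;
    }

    /* the first entry of the final list is the best overall candidate */
    memcpy(index_out, cur[0].index, sizeof cur[0].index);
}
```

With MBEST set to 1 this collapses to an ordinary greedy multi-stage search; keeping several candidates lets a slightly worse early-stage choice win overall, which is where the variance improvement comes from.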
## Streaming of WIA broadcast material

An interesting mix of speakers and recording conditions, some with not so great microphones, and faster speech than the training material.

Basic unquantised LPCNet model:

```
sox -r 16000 ~/Downloads/wianews-2019-01-20.s16 -t raw - trim 200 | ./dump_data --c2pitch --test - - | ./test_lpcnet - - | aplay -f S16_LE -r 16000
```

Fully quantised at (44+8)/0.03 = 1733 bits/s:

```
sox -r 16000 ~/Downloads/wianews-2019-01-20.s16 -t raw - trim 200 | ./dump_data --c2pitch --test - - | ./quant_feat -g 0.25 -o 6 -d 3 -w --mbest 5 -q pred_v2_stage1.f32,pred_v2_stage2.f32,pred_v2_stage3.f32,pred_v2_stage4.f32 | ./test_lpcnet - - | aplay -f S16_LE -r 16000
```

## Fully quantised encoder/decoder programs

The same thing as above, with the quantisation code packaged up into library functions. Between quant_enc and quant_dec are 52 bit frames every 30 ms:

```
cat ~/Downloads/speech_orig_16k.s16 | ./dump_data --c2pitch --test - - | ./quant_enc | ./quant_dec | ./test_lpcnet - - | aplay -f S16_LE -r 16000
```

The same thing with everything integrated into stand-alone encoder and decoder programs:

```
cat ~/Downloads/speech_orig_16k.s16 | ./lpcnet_enc | ./lpcnet_dec | aplay -f S16_LE -r 16000
```

The bit stream interface is 1 bit/char, as I find that convenient for my digital voice over radio experiments. The decimation rate, number of VQ stages, and a few other parameters can be set as command line options, for example a 20 ms frame rate and 3 stage VQ (2050 bits/s):

```
cat ~/Downloads/speech_orig_16k.s16 | ./lpcnet_enc -d 2 -n 3 | ./lpcnet_dec -d 2 -n 3 | aplay -f S16_LE -r 16000
```

You'll need the same set of parameters for the encoder and decoder. Useful additions would be:

1. Run time loading of .h5 NN models.
1. A --packed option to pack the quantised bits tightly, which would make the programs useful for storage applications.

## Direct Split VQ

Four stage VQ of the log magnitudes (Ly), 11 bits (2048 entries) per stage; the first 3 stages are 18 elements wide, the final stage 12 elements wide. During training this achieved a similar variance to the 4 stage predictive quantiser (measured on 12 bands). Same bit rate, but direct quantisation means it is more robust to bit errors and especially packet loss.

```
sox ~/Desktop/deep/quant/wia.wav -t raw - | ./dump_data --c2pitch --test - - | ./quant_feat -d 3 -i -p 0 --mbest 5 -q split_stage1.f32,split_stage2.f32,split_stage3.f32,split_stage4.f32 | ./test_lpcnet - - | aplay -f S16_LE -r 16000
```

Compare this to the four stage predictive VQ of cepstrals (DCT of Ly), 11 bits (2048 entries) per stage, with 18 element wide vectors. Here we quantise the predictor output.

```
sox ~/Desktop/deep/quant/wia.wav -t raw - | ./dump_data --c2pitch --test - - | ./quant_feat -d 3 -w --mbest 5 -q pred_v2_stage1.f32,pred_v2_stage2.f32,pred_v2_stage3.f32,pred_v2_stage4.f32 | ./test_lpcnet - - | aplay -f S16_LE -r 16000
```

Both are decimated by a factor of 3 (so a 30 ms parameter update rate, (44+8)/0.03 = 1733 bits/s).

# Effect of Bit Errors

Random bit errors at a 1% Bit Error Rate (BER):

Predictive:

```
sox wav/wia.wav -t raw -r 16000 - | ./lpcnet_enc | ./lpcnet_dec -b 0.01 | aplay -f S16_LE -r 16000
```

Direct-split:

```
sox wav/wia.wav -t raw -r 16000 - | ./lpcnet_enc -s | ./lpcnet_dec -s -b 0.01 | aplay -f S16_LE -r 16000
```
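For reference, random bit errors could also be injected from outside the decoder, since the interface between lpcnet_enc and lpcnet_dec is 1 bit/char. The following stand-alone C sketch is hypothetical (it is not the actual -b implementation inside lpcnet_dec) and assumes each char in the stream carries a single bit as 0/1 (the XOR also happens to work for ASCII '0'/'1'):

```
/* ber_inject.c: hypothetical sketch, not part of LPCNet.  Flips each bit of
   a 1 bit/char stream with the given probability, e.g.
   ./lpcnet_enc ... | ./ber_inject 0.01 | ./lpcnet_dec ... */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    double ber = (argc > 1) ? atof(argv[1]) : 0.01;  /* target bit error rate */
    long nbits = 0, nerrs = 0;
    int bit;

    srand(0);
    while ((bit = fgetc(stdin)) != EOF) {
        /* each char carries one bit; flip it with probability ber */
        if ((double)rand()/RAND_MAX < ber) {
            bit ^= 1;
            nerrs++;
        }
        fputc(bit, stdout);
        nbits++;
    }
    fprintf(stderr, "bits: %ld errors: %ld measured BER: %.4f\n",
            nbits, nerrs, nbits ? (double)nerrs/nbits : 0.0);
    return 0;
}
```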