Investigation of LPC Synthesis Parameters

LPC Introduction

Linear Predictive Coding is a technique used to model speech and other similar systems. It has applications in the following areas: [1]

  • Economic series modeling
  • Seismic exploration
  • Speech synthesis, coding, and recognition
  • Communications receivers

LPC is the process of predicting future samples in a sequence given a set of its N past values. Thus, for an LPC resynthesis of order N, the following equation is used:

{photo href=”http://www.flickr.com/photos/chancesend/390852670/” title=”LPC resynthesis of order N” src=”http://farm1.static.flickr.com/128/390852670_4de715eb3a_m.jpg” width=”240″ caption=”LPC resynthesis of order N”}
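For readers who cannot load the image, the textbook form of the order-N linear prediction relation is given here, on the assumption that it matches the figure. The symbols follow the text: a for the predictor coefficients and epsilon for the residual.

```latex
x[n] = \sum_{k=1}^{N} a_k \, x[n-k] + \epsilon[n]
```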

To determine the proper set of coefficients "a", we need to predict what values would produce a resynthesis closest to the original signal. This prediction can be accomplished by various means, but the minimization of the mean-square error of the prediction is a common method.
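One common way to carry out this minimization is the autocorrelation method, solved with the Levinson-Durbin recursion. The following is a minimal sketch of that approach in plain Python, not the routine used in this investigation; the function names are invented for the example, and the predictor convention assumed is x_hat[n] = sum of a[k-1] * x[n-k].

```python
def autocorrelation(x, max_lag):
    """Return r[0..max_lag], where r[k] = sum over n of x[n] * x[n - k]."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations for the predictor.

    Returns (a, err): a[k-1] multiplies x[n-k] in the prediction, and err
    is the prediction-error energy left after the final stage.
    """
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        if err <= 0.0:
            break  # silent or perfectly predictable frame: nothing left to fit
        # Reflection coefficient for stage i + 1.
        k = (r[i + 1] - sum(a[j] * r[i - j] for j in range(i))) / err
        # Update the lower-order coefficients, then append the new one.
        updated = a[:]
        for j in range(i):
            updated[j] = a[j] - k * a[i - 1 - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k
    return a, err


# A decaying exponential is predicted exactly by one coefficient, 0.9.
signal = [0.9 ** n for n in range(200)]
coeffs, error = levinson_durbin(autocorrelation(signal, 2), 2)
```

For this test signal the first coefficient comes out very close to 0.9 and the second very close to zero, since a first-order predictor already describes the signal.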

For speech coding and synthesis, a window of samples is analyzed according to the above equation. Once the linear predictor coefficients have been calculated, the residual signal (error signal) epsilon is calculated. The coefficients are then saved and the next window is analyzed. Once the entire signal has been analyzed, the LPC coefficients are saved for later resynthesis, or transmitted across a channel for remote resynthesis.

Because only coefficients are saved, the coded signal has a much lower bitrate and thus is useful for applications such as cellular phones, internet audio and voice prompting, where high bitrates are not available.

It is interesting to note that if the residual signal is added to the resynthesized signal, the original signal can be perfectly reconstructed. This is because the residual is by its very nature the difference of the input and resynthesized signals. Though at first this sounds like lossless compression, the residual has a bitrate equal to the original signal, so transmitting the residual does not result in any data compression. Usually, the residual is simply thrown out after analysis.
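The reconstruction property can be checked with a short sketch. The residual is computed by subtracting the prediction from each input sample, and the signal is rebuilt by adding the residual back to a prediction made from the already-rebuilt samples. The function names and the coefficient values are illustrative assumptions, not taken from the analysis code.

```python
def prediction_residual(x, a):
    """e[n] = x[n] - sum_k a[k] * x[n - 1 - k] (samples before n = 0 are zero)."""
    e = []
    for n in range(len(x)):
        pred = sum(a[k] * x[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        e.append(x[n] - pred)
    return e


def reconstruct(e, a):
    """Rebuild the signal: y[n] = e[n] + sum_k a[k] * y[n - 1 - k]."""
    y = []
    for n in range(len(e)):
        pred = sum(a[k] * y[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        y.append(e[n] + pred)
    return y


x = [0.3, -1.2, 0.8, 2.0, -0.5, 0.1]
a = [0.5, -0.2]                      # arbitrary example coefficients
rebuilt = reconstruct(prediction_residual(x, a), a)
max_diff = max(abs(p - q) for p, q in zip(x, rebuilt))
```

Here max_diff is zero up to floating-point rounding, whatever coefficients are chosen. The residual has one value per input sample, which is why keeping it saves nothing.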

During signal resynthesis, the speech is modeled as a periodic impulse train (glottal pulses) filtered through a vocal tract modeled by the linear prediction coefficients. Though this approximates the fundamentals of speech well, it does not model any of the complexities of the human voice, such as inharmonic frequencies and utterances that combine voiced and unvoiced sounds.
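As a rough sketch of this source-filter model, the code below drives an all-pole filter with a unit pulse train. The names pulse_train and all_pole_filter, the gain parameter, and the single example coefficient are assumptions made for illustration; a real synthesizer would take the coefficients, pitch period, and gain from the analysis stage.

```python
def pulse_train(length, period):
    """Unit impulses every `period` samples, zeros elsewhere."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]


def all_pole_filter(excitation, a, gain=1.0):
    """y[n] = gain * excitation[n] + sum_k a[k] * y[n - 1 - k]."""
    y = []
    for n, e in enumerate(excitation):
        feedback = sum(a[k] * y[n - 1 - k]
                       for k in range(len(a)) if n - 1 - k >= 0)
        y.append(gain * e + feedback)
    return y


# One coefficient and a pitch period of 4 samples: each pulse starts a
# decaying response, and the next pulse adds to what is left of the last.
voiced = all_pole_filter(pulse_train(8, 4), a=[0.5])
```

The pitch of the output is set entirely by the pulse period, while the coefficients shape each pulse's response.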

Implementation

To investigate LPC synthesis in detail, we will use a computer routine originally developed by Perry R. Cook. The code has been updated to use LibSndFile for sound file I/O, and signal statistics have been added in order to quantitatively measure resynthesis quality.

The source code has been compiled on Microsoft Visual C++ .Net, but should be compiler- and platform-independent. (Note: the source requires LibSndFile to compile.)

Effect of LPC order on resynthesis quality

As the order of the LPC analysis increases, the resynthesized signal approximates the original signal more closely. We can hear this from the following audio samples:

From these audio samples, it is clear that as order increases, the resynthesized LPC signal approximates the original file better. We can also hear the robotic nature of LPC, with its simplistic vocal tract model. However, we want to objectively measure the effect of LPC order on analysis quality.

We will look at plots of several signal statistics versus LPC order, to measure signal quality.

Average Block Error, a measure of the average mean square error of the synthesized output across all analysis blocks.

{photo href=”http://www.flickr.com/photos/chancesend/390852657/” title=”LPC Block Error Power” src=”http://farm1.static.flickr.com/163/390852657_a4ba4f1949_m.jpg” width=”240″ caption=”Average Block Error vs. LPC Order”}

Input / Residual cross-correlation, a measure of how closely related the input and residual signals are to each other.

{photo href=”http://www.flickr.com/photos/chancesend/390852841/” title=”Cross-correlation of Residual & Input” src=”http://farm1.static.flickr.com/169/390852841_69a84fe908_m.jpg” width=”240″ caption=”Cross Correlation of Residual and Input vs. LPC Order”}

Input / Output cross-correlation, a measure of how closely related the input and output signals are to each other.

{photo href=”http://www.flickr.com/photos/chancesend/390852833/” title=”Cross-correlation of Output & Input” src=”http://farm1.static.flickr.com/130/390852833_26d943c775_m.jpg” width=”240″ caption=”Cross Correlation of Output and Input vs. LPC Order”}

Mean square error of residual, a measure of how much deviation exists between the residual and input signals.

{photo href=”http://www.flickr.com/photos/chancesend/390852686/” title=”Mean Square Error of Residual” src=”http://farm1.static.flickr.com/152/390852686_5946e7ddd2_m.jpg” width=”240″ caption=”Mean Square Error of Residual vs. LPC Order”}

Mean square error of output, a measure of how much deviation exists between the output and input signals.

{photo href=”http://www.flickr.com/photos/chancesend/390852680/” title=”Mean Square Error of Output” src=”http://farm1.static.flickr.com/142/390852680_081bf1be49_m.jpg” width=”240″ caption=”Mean Square Error of Output vs. LPC Order”}

Power of residual, a measure of signal strength. Notice that residual power decreases as order increases, as more of the signal is coded in the predictor coefficients.

{photo href=”http://www.flickr.com/photos/chancesend/390852699/” title=”Power of Residual” src=”http://farm1.static.flickr.com/151/390852699_0177d7f7cf_m.jpg” width=”240″ caption=”Power of Residual vs. LPC Order”}

Power of output, a measure of signal strength.

{photo href=”http://www.flickr.com/photos/chancesend/390852693/” title=”Power of Output” src=”http://farm1.static.flickr.com/161/390852693_ce5700f19f_m.jpg” width=”240″ caption=”Power of Output vs. LPC Order”}
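The exact formulas behind these plots are not given in the text, so the following is only a sketch of how such statistics are commonly computed. In particular, it assumes that "cross-correlation" means the zero-lag correlation normalized to the range -1 to 1, and that the average block error is the mean of per-block mean square errors. All function names are invented for the example.

```python
import math


def power(x):
    """Mean of the squared samples."""
    return sum(v * v for v in x) / len(x)


def mean_square_error(x, y):
    """Mean squared difference between two equal-length signals."""
    return sum((p - q) ** 2 for p, q in zip(x, y)) / len(x)


def cross_correlation(x, y):
    """Zero-lag correlation of x and y, normalized to [-1, 1]."""
    num = sum(p * q for p, q in zip(x, y))
    den = math.sqrt(sum(p * p for p in x) * sum(q * q for q in y))
    return num / den if den else 0.0


def average_block_error(x, y, block_size):
    """Mean of the per-block mean square errors.

    A trailing partial block is ignored; at least one full block is assumed.
    """
    errors = [mean_square_error(x[i:i + block_size], y[i:i + block_size])
              for i in range(0, len(x) - block_size + 1, block_size)]
    return sum(errors) / len(errors)


original = [1.0, -1.0, 1.0, -1.0]
silence = [0.0, 0.0, 0.0, 0.0]
stats = (power(original),
         mean_square_error(original, silence),
         cross_correlation(original, original),
         average_block_error(original, silence, 2))
```

In this toy case every entry of stats is 1.0: the test signal has unit power, and comparing it against silence leaves all of that power as error.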

From the plots above, we can make several observations.

  1. The input/output correlation does not follow the general trend of the other statistics. This is to be expected, for LPC is not designed to reconstruct the time-domain signal. Thus, phase differences will lead to an output signal that is not correlated to its corresponding input.
  2. There exists a "knee" near orders 9 and 10, above which the statistics do not change much. Using an order above roughly 10 does not affect signal resynthesis enough to justify the added computing time.
  3. The LPC model will never reach perfect reconstruction, even with a very high order. The reason is related to the two points above: LPC is not designed to reconstruct a signal, but rather merely to synthesize speech using a simplistic model of the vocal tract.
  4. Upon reflection, several of these statistics are meaningless for our explorations. Particularly, Input/Output Cross-Correlation, Output Mean Square Error, and Output Power are not relevant for our discussion, as LPC is not designed to recreate the time-domain version of a signal.

We can also view the time- and frequency-domain plots of the output signals for various LPC orders. We will see how LPC resynthesis of increasing order affects the quality of signal synthesis.

{photo href=”http://www.flickr.com/photos/chancesend/390852812/” title=”LPC time-domain output animation” src=”http://farm1.static.flickr.com/160/390852812_3018863e87_o.gif” width=”240″ caption=”Animation of output in time-domain vs. LPC Order”}

A few observations can be made about this time-domain animation.

  1. The physical model of LPC speech synthesis is clearly seen in the time domain. As previously discussed, speech is modeled by a pulse train filtered through a model of the vocal tract. The vocal tract model is governed by the LPC coefficients. As the LPC order increases, the time-domain response contains an increasing number of harmonics.
  2. We can see that even for order = 1, the base frequency of the synthesized signal matches the original. This is because accuracy of the base frequency is not dependent on LPC order. Interestingly, LPC resynthesis can use frequency parameters from other sources. Through this technique, interesting robotic effects, hybrid sounds, or whispers can be created.
  3. Though the time-domain signal resembles the original more closely at higher orders, it does not exhibit some of the subtleties that exist in the original signal. These nuances are properties of the original signal that cannot be modeled through the LPC vocal tract model, and are thrown away with the residual.

{photo href=”http://www.flickr.com/photos/chancesend/390852783/” title=”LPC freq-domain output animation” src=”http://farm1.static.flickr.com/132/390852783_2c5ed371bc_m.jpg” width=”240″ caption=”Animation [not working -RA] of output in freq-domain vs. LPC Order”}

From the above plot, we notice several things:

  1. For a resynthesis of any order, the base frequency and its harmonics are the same as those of the original signal. This is the effect of the pulse train used in the resynthesis.
  2. In the frequency domain, the LPC vocal tract model attempts to match the formant curve of the original signal. We can see that higher LPC orders produce responses closer to the original signal, since more poles are available to match the original response.
  3. After roughly order = 10, the frequency response does not change all that much. Thus, higher order LPC analyses can be considered a waste of processing power and bandwidth.

Stability of LPC coefficients

Careful observers might notice that for the time-domain and frequency-domain graphs above, orders N=2 and N=3 produce odd plots. Indeed, the Cook implementation used sometimes produces unstable responses.

Looking into the source of the unstable responses, we note that using autocorrelation to minimize the mean-square of the error-signal (which the Cook implementation uses) should result in guaranteed stability [2]. However, further investigation reveals that precision and roundoff errors for coefficients near 0 or greater than 1 may cause the actual response to deviate from the required response, generating instabilities in the process.

These errors result in improper frequency and time responses for the orders in question, as seen in the above animations.
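One way to test a coefficient set for stability without finding the filter's poles is the step-down (backward Levinson) recursion: the synthesis filter is stable exactly when every reflection coefficient recovered from the predictor has magnitude below 1. The sketch below is an illustration under the assumed predictor convention x_hat[n] = sum of a[k-1] * x[n-k]; is_stable is a hypothetical helper, not part of the Cook implementation.

```python
def is_stable(a):
    """Return True if 1 / (1 - sum_k a[k-1] z^-k) has all poles inside the unit circle."""
    a = list(a)
    while a:
        k = a[-1]                    # reflection coefficient of the current order
        if abs(k) >= 1.0:
            return False             # a pole lies on or outside the unit circle
        m = len(a) - 1
        # Step down to the predictor of the next lower order.
        a = [(a[j] + k * a[m - 1 - j]) / (1.0 - k * k) for j in range(m)]
    return True


checks = {
    "decaying first order": is_stable([0.5]),
    "growing first order": is_stable([1.5]),
    "poles at 0.9 and 1.1": is_stable([2.0, -0.99]),
}
```

The last case has a final coefficient of small magnitude and still fails, because the pole at 1.1 only shows up one step further down the recursion. A check like this could be run on each frame's coefficients after quantization or rounding to catch frames that would blow up during resynthesis.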

Concluding remarks

LPC synthesis is a powerful technique for speech analysis, coding, and resynthesis. By investigating various statistics, parameters and outputs related to the technique, we can better understand the effects of LPC on an input signal.

We have shown that LPC is not designed for reconstructing audio signals, and as a result does not work well for non-speech signal coding. But to transmit intelligible audio across a low bandwidth channel, LPC is a very useful technique.

We have shown that beyond 9 or 10 coefficients, the increase in quality of the LPC reconstruction does not justify the added computation. For this reason, LPC-10 (with -10 denoting the 10 prediction coefficients, and 180 samples per analysis frame) became an industry-standard codec for low-bandwidth speech transmission. We can calculate the estimated bitrate of the coded signal as follows:

  • Original signal: 8000 samples/second = 64000 bits/second for 8-bit audio
  • Coded LPC signal, order 10: 44.44 frames/sec * (8 bits/coefficient/frame * 10 coefficients + 8 bits/frame pitch + 8 bits/frame gain) = approximately 4267 bits/second
  • The LPC-10 specification includes slight changes in the bit allocation per frame, as follows: [3]
  • Coded LPC-10 signal: 44.44 frames/sec * (42 bits/frame for coefficients + 7 bits/frame pitch + 5 bits/frame for gain) = 2400 bits/second
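The arithmetic in the list above can be reproduced with a few lines of Python. The sample rate, frame size, and per-frame bit allocations are the ones quoted in the text; the variable and function names are just for this sketch.

```python
SAMPLE_RATE = 8000   # samples/second
FRAME_SIZE = 180     # samples per analysis frame

frames_per_second = SAMPLE_RATE / FRAME_SIZE      # about 44.44


def bitrate(bits_per_frame):
    """Bits per second for a given number of bits per analysis frame."""
    return frames_per_second * bits_per_frame


original = SAMPLE_RATE * 8                        # 8-bit samples: 64000 bits/second
order_10 = bitrate(8 * 10 + 8 + 8)                # 96 bits/frame
lpc10 = bitrate(42 + 7 + 5)                       # 54 bits/frame

print(round(order_10), round(lpc10), round(100 * lpc10 / original, 2))
# prints: 4267 2400 3.75
```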

It is quite evident from the above calculations that LPC coding produces an extraordinarily low bitrate compared with sampled audio: the LPC-10 bitrate is roughly 3.75% of that of the original signal. Of course, the output will sound robotic and will be highly susceptible to other noise in the signal. But if transmitting intelligible speech is all that is needed, LPC is a wonderful tool.

References

  1. Gibson, Jerry D. Lecture on LPC analysis and coding, MAT 201A. UC Santa Barbara, 05/09/2005.
  2. Morgan, Nelson. Lecture slides on feature extraction, EECS 225D. UC Berkeley. {extlink href=”http://www.icsi.berkeley.edu/eecs225d/spr05/slides/frontend_arch.ppt” linktext=”http://www.icsi.berkeley.edu/eecs225d/spr05/slides/frontend_arch.ppt”}
  3. Robinson, Tony. Speech Vision Robots group website. {extlink href=”http://svr-www.eng.cam.ac.uk/~ajr/SA95/node87.html” linktext=”http://svr-www.eng.cam.ac.uk/~ajr/SA95/node87.html”}