Real-Time Speech Pitch Shifting on an FPGA

Frequency Shifting

The simplest way to shift the frequency content of an audio signal is by single-sideband (SSB) modulation. This process entails eliminating the negative-frequency content of a signal, modulating the remaining positive frequencies by multiplication with a complex sinusoid, and finally reconstructing the real signal. Eliminating the lower sideband prevents frequency content from crossing between sidebands during modulation; otherwise a shift could move negative-frequency components into the positive half of the spectrum, or vice versa. Multiplication with a complex sinusoid, rather than a real-valued one, avoids the mirror-image copy that standard modulation produces, because a real sinusoid shifts the spectrum up and down at the same time.
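The difference between the two kinds of modulation can be seen on a single tone. The sketch below is plain Python rather than MATLAB, and the sample rate, DFT length, and tone frequencies are hypothetical values chosen so that every component lands exactly on a DFT bin. Because the input is a pure cosine, its analytic signal is known in closed form, so no Hilbert filter is needed for this illustration.

```python
import cmath
import math

FS = 6400.0   # hypothetical sample rate (Hz)
N = 64        # DFT length, so each bin is FS / N = 100 Hz wide


def dft_magnitudes(x):
    """Magnitude of every DFT bin (direct O(N^2) DFT; fine for small N)."""
    size = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / size) for n in range(size)))
            for k in range(size)]


def occupied_positive_bins(x, threshold=1e-6):
    """Indices k with 0 < k < N/2 whose DFT magnitude is non-negligible."""
    mags = dft_magnitudes(x)
    return [k for k in range(1, len(x) // 2) if mags[k] > threshold * len(x)]


f0, fc = 1000.0, 300.0   # tone frequency and desired upward shift (Hz)
tone = [math.cos(2 * math.pi * f0 * n / FS) for n in range(N)]

# Standard modulation by a real cosine: produces BOTH f0 + fc and f0 - fc.
real_mod = [tone[n] * math.cos(2 * math.pi * fc * n / FS) for n in range(N)]

# SSB: for a pure cosine the analytic signal is exp(j*2*pi*f0*n/FS); multiply it by
# the complex sinusoid exp(j*2*pi*fc*n/FS) and keep the real part.
analytic = [cmath.exp(2j * math.pi * f0 * n / FS) for n in range(N)]
ssb = [(analytic[n] * cmath.exp(2j * math.pi * fc * n / FS)).real for n in range(N)]

print([k * FS / N for k in occupied_positive_bins(real_mod)])  # [700.0, 1300.0]
print([k * FS / N for k in occupied_positive_bins(ssb)])       # [1300.0]
```

Real modulation leaves energy at both 700 Hz and 1300 Hz; the 700 Hz component is the unwanted image. The complex version leaves only the 1300 Hz tone.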

The lower sideband of a real signal can be removed using the Hilbert transform, an all-pass filter that shifts the phase of positive frequencies by -90° and negative frequencies by +90°, and that is easily approximated in hardware or software by a finite-length FIR filter. For any signal g(t), its Hilbert transform, denoted ĝ(t), is defined as follows:

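  \hat{g}(t) = \frac{1}{\pi} \, \mathrm{p.v.} \int_{-\infty}^{\infty} \frac{g(\tau)}{t - \tau} \, d\tau    (Eq. 1)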

For a real input signal g(t), the combination g(t) + jĝ(t) contains only the upper sideband (the positive frequencies), because the Hilbert transform's phase shift makes the negative-frequency components of the two terms cancel. This combined signal is subsequently referred to as the analytic signal [1]. Appendix A derives how the Hilbert transform can be used to achieve SSB modulation, yielding the following result:
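A common way to build the finite-length FIR approximation is to truncate and window the ideal discrete-time Hilbert impulse response, h[k] = 2/(πk) for odd k and 0 for even k. The sketch below assumes a Hamming window and 31 taps (illustrative choices, not taken from the authors' design) and checks that x[n] + jx̂[n] for a test tone keeps the positive-frequency component while nearly cancelling the negative-frequency one.

```python
import cmath
import math


def hilbert_fir(num_taps):
    """Hamming-windowed FIR approximation of the ideal Hilbert transformer.

    The ideal impulse response is h[k] = 2 / (pi * k) for odd k and 0 for even k;
    it is truncated to num_taps (odd) taps centred on k = 0, then windowed.
    """
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd so the group delay is a whole number of samples")
    mid = (num_taps - 1) // 2
    taps = []
    for i in range(num_taps):
        k = i - mid
        ideal = 2.0 / (math.pi * k) if k % 2 else 0.0
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def analytic_signal(x, taps):
    """Return x[n] + j*xhat[n], delaying x by the FIR's group delay so both parts line up.

    The first len(taps) - 1 outputs (the filter start-up transient) are discarded.
    """
    mid = (len(taps) - 1) // 2
    out = []
    for n in range(len(taps) - 1, len(x)):
        xhat = sum(taps[i] * x[n - i] for i in range(len(taps)))  # FIR Hilbert branch
        out.append(complex(x[n - mid], xhat))                      # delayed direct branch
    return out


def tone_level(z, freq):
    """Average correlation of z with exp(j*2*pi*freq*n); freq in cycles per sample."""
    return abs(sum(v * cmath.exp(-2j * math.pi * freq * n) for n, v in enumerate(z))) / len(z)


f0 = 0.125                                        # test tone at fs/8 (cycles per sample)
x = [math.cos(2 * math.pi * f0 * n) for n in range(430)]
z = analytic_signal(x, hilbert_fir(31))           # 400 samples: a whole number of periods
print(tone_level(z, +f0))   # positive-frequency component, close to 1
print(tone_level(z, -f0))   # negative-frequency component, close to 0
```

The suppression is only approximate: a longer filter or a different window trades hardware resources for a flatter response, and tones close to DC or the Nyquist frequency fall in the filter's transition bands, where the cancellation degrades.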

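  y[n] = \mathrm{Re}\{ x_p[n] \, e^{j \omega_c n T_s} \} = x[n] \cos(\omega_c n T_s) - \hat{x}[n] \sin(\omega_c n T_s), \quad x_p[n] = x[n] + j \hat{x}[n]    (Eq. 2)

where y[n] is the frequency-shifted output, xp[n] is the analytic signal, x̂[n] is the Hilbert transform of x[n], ωc is the amount of frequency shift in radians per second, x[n] is the input signal, and Ts is the sampling period.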
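As a sanity check, take a single input tone x[n] = cos(ω0 n Ts), where ω0 is a test frequency introduced here for illustration, and apply the SSB form y[n] = Re{xp[n] e^{jωc n Ts}} with xp[n] = x[n] + jx̂[n]. The Hilbert transform of a cosine is the corresponding sine, so the angle-addition identity leaves a single tone at ω0 + ωc and nothing at ω0 - ωc, provided ω0 + ωc stays between zero and the Nyquist frequency:

```latex
\begin{aligned}
x[n] &= \cos(\omega_0 n T_s), \qquad \hat{x}[n] = \sin(\omega_0 n T_s), \qquad x_p[n] = e^{j\omega_0 n T_s} \\
y[n] &= \operatorname{Re}\{x_p[n]\, e^{j\omega_c n T_s}\}
      = \cos(\omega_0 n T_s)\cos(\omega_c n T_s) - \sin(\omega_0 n T_s)\sin(\omega_c n T_s) \\
     &= \cos\big((\omega_0 + \omega_c)\, n T_s\big)
\end{aligned}
```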

This relatively simple technique lends itself well to hardware implementation. Figure 1 shows a block diagram of the frequency shifter built from operations that are easily implemented with Simulink-based FPGA design tools. A filter is added before the input of the Hilbert transformer to keep frequency content near either end of the spectrum (DC and the Nyquist frequency) from aliasing when it is shifted. The filtered signal is then split between the Hilbert transform filter and a delay line that matches the filter's group delay ((N - 1)/2 samples for an N-tap linear-phase FIR), so that the two branches recombine in phase.
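The Figure 1 signal path can be sketched as a per-sample loop, which is close to how a streaming FPGA datapath processes audio. This is a hypothetical Python model, not the authors' Simulink design: the class name SSBShifter, the 31-tap Hamming-windowed Hilbert filter, the 6400 Hz sample rate, and the 800 Hz test tone are all assumptions. The band-limiting pre-filter is assumed to run upstream and is omitted. The phase accumulator wraps at 2π, as a hardware numerically controlled oscillator would.

```python
import cmath
import math
from collections import deque


class SSBShifter:
    """Sample-by-sample SSB frequency shifter modelled on Figure 1 (a sketch only).

    Hilbert branch: FIR filter with caller-supplied taps (odd length N).
    Direct branch:  delay line of (N - 1) / 2 samples, the FIR's group delay.
    Modulator:      y = x_delayed * cos(phase) - xhat * sin(phase), phase += wc * Ts.
    The band-limiting pre-filter is assumed to be applied before process().
    """

    def __init__(self, hilbert_taps, shift_hz, sample_rate_hz):
        self.taps = list(hilbert_taps)
        # history[i] holds x[n - i]; appendleft() on a full deque drops the oldest sample
        self.history = deque([0.0] * len(self.taps), maxlen=len(self.taps))
        self.mid = (len(self.taps) - 1) // 2
        self.phase = 0.0
        self.phase_step = 2 * math.pi * shift_hz / sample_rate_hz   # wc * Ts, rad/sample

    def process(self, sample):
        self.history.appendleft(sample)
        xhat = sum(h * v for h, v in zip(self.taps, self.history))  # Hilbert branch
        delayed = self.history[self.mid]                          # matched delay line
        y = delayed * math.cos(self.phase) - xhat * math.sin(self.phase)
        self.phase = (self.phase + self.phase_step) % (2 * math.pi)  # NCO-style wrap
        return y


def windowed_hilbert_taps(num_taps):
    """Hamming-windowed ideal Hilbert response 2/(pi*k) for odd k (hypothetical design)."""
    mid = (num_taps - 1) // 2
    return [(2.0 / (math.pi * (i - mid)) if (i - mid) % 2 else 0.0)
            * (0.54 - 0.46 * math.cos(2 * math.pi * i / (num_taps - 1)))
            for i in range(num_taps)]


FS = 6400.0
shifter = SSBShifter(windowed_hilbert_taps(31), shift_hz=300.0, sample_rate_hz=FS)
out = [shifter.process(math.cos(2 * math.pi * 800.0 * n / FS)) for n in range(416)]


def level(sig, freq_hz):
    """Half the amplitude of the component at freq_hz (averaged single-bin DFT)."""
    return abs(sum(v * cmath.exp(-2j * math.pi * freq_hz * n / FS)
                   for n, v in enumerate(sig))) / len(sig)


steady = out[32:]               # drop the filter start-up transient (384 samples remain)
print(level(steady, 1100.0))   # about 0.5: the wanted 800 + 300 Hz component
print(level(steady, 500.0))    # near 0: the unwanted 800 - 300 Hz image is suppressed
```

Keeping the number of Hilbert taps odd is what makes the matched delay a whole number of samples; with an even length the direct branch would need a fractional delay.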


Figure 1. Block diagram of SSB frequency shifter

To evaluate the effectiveness of SSB modulation as a pitch-shifting solution, we simulated the algorithm in MATLAB for qualitative and quantitative analysis. As an initial test, a classical music recording is shifted up in frequency by 100 Hz. Qualitatively, it is immediately evident that the algorithm dramatically changes the sound of the input: the originally warm sound of the music becomes metallic and dissonant. Although the pitch is audibly higher, the shift noticeably distorts the harmonic structure of the sound. Figure 2 shows the frequency spectrum of the signal before and after modulation.


Figure 2. Signal spectrum before and after SSB modulation

The output spectrum confirms that the input is linearly shifted along the frequency axis; that is, every frequency component is moved by the same additive constant, 100 Hz in this example. Linear frequency shifts have many applications, but human perception of sound relies on the harmonic (integer-ratio) relationship between frequency components. An additive shift does not preserve these ratios, which results in the perceived degradation of sound quality. Tests with speech input indicate that this shifting technique is less problematic for non-musical audio signals. Nonetheless, SSB modulation is clearly not an ideal pitch-shifting solution.
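A short numeric example shows why an additive shift is heard as dissonance. The partial frequencies below describe a hypothetical harmonic tone; shifting them all by 100 Hz breaks the 1:2:3:4 ratios, whereas scaling them by a common factor (the behavior expected of a pitch shifter) keeps the ratios intact.

```python
def ratios_to_lowest(freqs):
    """Ratio of each partial to the lowest one, rounded to 4 decimals."""
    base = freqs[0]
    return [round(f / base, 4) for f in freqs]


partials = [220.0, 440.0, 660.0, 880.0]   # hypothetical harmonic tone and its overtones (Hz)
shift_hz = 100.0

ssb_shifted = [f + shift_hz for f in partials]   # additive shift, as SSB modulation does
scale = ssb_shifted[0] / partials[0]             # factor that moves the fundamental equally far
pitch_scaled = [f * scale for f in partials]     # multiplicative shift, as a pitch shifter would

print(ssb_shifted)                     # [320.0, 540.0, 760.0, 980.0]
print(ratios_to_lowest(partials))      # [1.0, 2.0, 3.0, 4.0]
print(ratios_to_lowest(ssb_shifted))   # [1.0, 1.6875, 2.375, 3.0625]
print(ratios_to_lowest(pitch_scaled))  # [1.0, 2.0, 3.0, 4.0]
```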

Copyright 2006 Habib Estephan, Scott Sawyer, and Daniel Wanninger.  All rights reserved.