Real-Time Speech Pitch Shifting on an FPGA

Hardware Implementation

The software interface chosen for programming the FPGA was the MATLAB Simulink environment and Xilinx System Generator. Simulink is a real-time simulation platform that provides a comprehensive set of "blocks" that can be used to model a particular system. Each blockset contains a number of elements representing filters, converters, sources, scopes, or other devices that can be interconnected. System Generator, a high-level design utility marketed by Xilinx for high-performance DSP systems, includes additional blocksets specific to Xilinx FPGAs. These blocksets allow for very abstract system design and modeling, and can be used to automatically generate HDL netlists that can be downloaded to an FPGA. This development system greatly simplifies the design process and can drastically reduce the time required when compared to conventional HDL coding.

Another advantage is the ability of System Generator to run a hardware co-simulation. After a design has been laid out in blocks, it can be compiled to a single co-simulation block. A source generated on the computer by Simulink can be sent to the FPGA, processed, and then returned to Simulink for analysis in real time. In this way, a design can be tested with hardware in the loop, with input and output from MATLAB, to ensure that it is working [6].


Figure 8.
Simulink Schematic of a Frequency Shifter

The original SSB modulation frequency shifting algorithm was the first to be designed and laid out in Simulink with the System Generator blocks and is shown above in Figure 8. As can be seen, it is quite similar to the basic diagram of Figure 1, illustrating the advantage of highly abstract programming. Gateway blocks connect the Xilinx blocks to those provided by Simulink, converting between the floating-point representation used by MATLAB and the fixed-point format used by the FPGA. This makes it easy to attach Simulink input/output blocks and examine the operation of every part of the design. The ability to test in this fashion makes this a very attractive approach.
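As a rough illustration of what the gateway conversion involves, the Python sketch below quantizes floating-point samples to a signed fixed-point format and converts them back. The 16-bit word length, 14-bit binary point, and round-and-saturate behaviour are assumptions chosen for illustration, not the settings used in this design, and the function names are hypothetical.

```python
import numpy as np

def to_fixed(x, n_bits=16, bin_pt=14):
    """Quantize floats to signed fixed-point integers (round, then saturate).

    n_bits and bin_pt are illustrative assumptions, not the design's values.
    """
    scale = 1 << bin_pt                       # 2**bin_pt steps per unit
    lo = -(1 << (n_bits - 1))                 # most negative code
    hi = (1 << (n_bits - 1)) - 1              # most positive code
    q = np.round(np.asarray(x, dtype=np.float64) * scale)
    return np.clip(q, lo, hi).astype(np.int64)

def to_float(q, bin_pt=14):
    """Convert fixed-point integer codes back to floating point."""
    return np.asarray(q, dtype=np.float64) / (1 << bin_pt)
```

With these assumed settings, values inside the representable range are reproduced to within half a quantization step (2^-15), while values outside it are clipped to the largest or smallest code.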

The design was then compiled and downloaded as a hardware co-simulation to a Spartan3 FPGA on a Digilent development board, provided free of charge by Xilinx. This board, however, lacked the I/O required for stand-alone operation, and the Spartan3 was too small for the full project: a low-pass anti-aliasing filter had to be removed for the design to fit the FPGA. At the start of the Fall 2005 semester, Xilinx provided a new Digilent development board (the XUPV2P), equipped with microphone and speaker connectors, an audio amplifier, and an AC-97 audio codec that performs 18-bit analog-to-digital and digital-to-analog conversion at rates up to 48 kHz. The onboard FPGA is a Virtex-II Pro; with 88 multipliers and over 20,000 logic cells, it provides ample capacity for pitch shifting [7].

Implementation of Time Domain Algorithm

After analyzing the two pitch shifting algorithms, the time domain approach was selected for its relative simplicity and reasonably high quality output. Operating in the frequency domain, while certainly feasible, would nonetheless be far more complicated, and it was decided that it could be attempted as the next step after successful implementation of the time domain algorithm.

The challenge in designing this system is that the convenience of MATLAB functions is no longer available; everything must be built from digital logic elements. Additionally, the system must operate in real time with little latency. Since the method for shifting up differs from the method for shifting down, separate components are needed for the two cases.

Figure 9 shows the Simulink schematic of a working down pitch shifter, with windows of size W truncated by 25%. The real-time specification of the pitch shifter requires that the input and output operate at the same sample time. The primary component of this design is therefore the Xilinx Dual Port RAM block, which allows data to be written to and read from a memory bank simultaneously at two different rates. As the input streams into the system, each window of data is stored sequentially in memory, and each subsequent window overwrites the previous one. At the same time, the first 75% of the data points are read from memory at a slower rate, such that the read and write operations finish at the same time. In this fashion, the incoming and outgoing data are synchronized but at different data rates. The output of this block is then passed through an interpolator followed by a decimator. These cascaded elements are necessary to achieve the fractional interpolation required by the algorithm. The final output is a stream of data, shifted lower in pitch but with the same sample time as the input.
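The down-shifting procedure can be sketched offline in Python as follows. This is only a behavioural model under stated assumptions: the window size of 400 samples is arbitrary, the function name is hypothetical, and linear interpolation (np.interp) stands in for the Dual Port RAM rate change and the interpolator/decimator cascade of the hardware design. One sample beyond the kept 75% is read so that the last few output points can be interpolated.

```python
import numpy as np

def shift_down(x, window=400, keep=0.75):
    """Lower the pitch by stretching the first `keep` of each window to full length.

    Behavioural sketch only; `window` and the use of linear interpolation
    are assumptions, not details taken from the hardware design.
    """
    x = np.asarray(x, dtype=np.float64)
    n_keep = int(round(window * keep))        # samples kept per window
    n_out = len(x) // window * window         # process whole windows only
    out = np.zeros(n_out)
    step = n_keep / window                    # slower read rate (< 1)
    read_pos = np.arange(window) * step       # fractional read positions
    for start in range(0, n_out, window):
        # kept segment plus one extra sample for interpolating the tail
        seg = x[start:start + n_keep + 1]
        out[start:start + window] = np.interp(read_pos, np.arange(len(seg)), seg)
    return out
```

Because each output window has the same length as the input window, the duration is unchanged, while the content of each window is played back at 75% of its original rate.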


Figure 9.
Simulink Schematic of a Down Pitch Shifter


Figure 10.
Output of the up pitch shifter. The first waveform shows windowed excerpts from an input, padded in between with zeros. The second waveform is a shifted version, with each window overlapping its counterpart by 75%. The third is the downsampled sum, which is the final output.

The implementation of an up shifter (Figure 11) is similar to that of the down shifter but requires a few additional elements. Again, a Dual Port RAM block is used, but this time the incoming data is written at a slower rate than it is read. Once all data points in a window have been read, zeros are appended to the data stream until the next window has finished being written, and the process begins again. In this way, synchronization is maintained. The windows are now separated from each other by runs of zeros (Figure 10). The data is then sent to an addition block, where it is added to a delayed version of itself. For a window of size W and a fractional overlap of x, the delay is equal to W(1-x). Following this are an interpolation block and a decimation block, which convert the signal back to its original sample time.
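A behavioural sketch of the up-shifting steps in Python is given below. Several details are assumptions rather than facts from the design: the number of zeros appended after each window is taken to equal the delay W(1-x), so that the delayed copy exactly fills the gap; the two copies are summed without any tapering, since the weighting of the overlapped region is not described here; linear interpolation (np.interp) stands in for the interpolation and decimation blocks; and the function names and window size are hypothetical.

```python
import numpy as np

def overlap_frame(win, overlap=0.75):
    """Zero-pad one window and add a copy of it delayed by W(1 - x)."""
    win = np.asarray(win, dtype=np.float64)
    w = len(win)
    delay = int(round(w * (1.0 - overlap)))            # W(1 - x)
    padded = np.concatenate([win, np.zeros(delay)])    # window, then zeros
    delayed = np.concatenate([np.zeros(delay), win])   # same stream, delayed
    return padded + delayed                            # length W(2 - x)

def shift_up(x, window=400, overlap=0.75):
    """Raise the pitch: lengthen each window by overlap-add, then resample to W."""
    x = np.asarray(x, dtype=np.float64)
    n_out = len(x) // window * window                  # whole windows only
    out = np.zeros(n_out)
    for start in range(0, n_out, window):
        frame = overlap_frame(x[start:start + window], overlap)
        # read the longer frame at a faster rate so W samples come out
        read_pos = np.arange(window) * (len(frame) / window)
        out[start:start + window] = np.interp(read_pos, np.arange(len(frame)), frame)
    return out
```

With x = 0.75, each frame is 1.25 W samples long before resampling, so compressing it back to W samples raises the pitch by a factor of 1.25 under these assumptions while leaving the duration unchanged.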


Figure 11.
Simulink Schematic of an Up Pitch Shifter

Software/Hardware Issues

Using Simulink, working simulations of the up and down time-domain pitch shifters have been realized. Audio files stored in the .wav format can be processed and written back, and the resulting audio file is of the same duration but different pitch. Due to the limitations of Simulink, a real-time demonstration with microphone and speaker cannot be performed; however, this design is perfectly capable of operating in real time once it has been implemented on the FPGA development board.

The hardware implementation of this system has not yet been realized due to unforeseen difficulties with the software and hardware. At the beginning of the Fall 2005 semester, Villanova University issued new laptop computers on which we could run the Xilinx development software. This was not anticipated to be problematic; however, the installation of the software was plagued with errors and proved to be exceedingly time-consuming. A greater problem encountered with System Generator was its lack of support for I/O devices. Our original understanding was that the entire design could be built in Simulink using System Generator without any manual coding in VHDL. A Xilinx tutorial offered different methods for incorporating the audio I/O; however, none worked properly, even with the sample designs included with the tutorial. The development board, having recently been released, was poorly supported by the software, and the documentation offered little insight into how to operate the audio hardware. Ultimately, more time will be required to research and learn how to use the audio hardware on the board before the pitch shifter can be completed.

Copyright 2006 Habib Estephan, Scott Sawyer, and Daniel Wanninger.  All rights reserved.