Real-Time Speech Pitch Shifting on an FPGA
Hardware Implementation
The software interface chosen for programming the FPGA was the MATLAB
Simulink environment and Xilinx System Generator. Simulink is a real-time
simulation platform that provides a comprehensive set of "blocks" that can
be used to model a particular system. Each blockset contains a number of
elements representing filters, converters, sources, scopes, or other devices
that can be interconnected. System Generator, a high-level design utility
marketed by Xilinx for high-performance DSP systems, includes additional
blocksets specific to Xilinx FPGAs. These blocksets allow for very abstract
system design and modeling, and can be used to automatically generate HDL
netlists that can be downloaded to an FPGA. This development system greatly
simplifies the design process and can drastically reduce the time required
when compared to conventional HDL coding.
Another advantage is the ability of System Generator to run a hardware
co-simulation. After a design has been laid out in blocks, it can be
compiled to a single co-simulation block. A source generated on the computer
by Simulink can be sent to the FPGA, processed, and then returned to
Simulink for analysis in real-time. In this way, a design can actually be
tested with hardware in-the-loop, with input and output from MATLAB, to
ensure that it is working [6].

Figure 8. Simulink Schematic of a Frequency Shifter
The original SSB modulation frequency shifting algorithm was the first to
be designed and laid out in Simulink with the System Generator blocks and is
shown above in Figure 8. As can be seen, it is quite similar to the basic
diagram of Figure 1, illustrating the advantage of highly abstract
programming. Gateway blocks connect the Xilinx blocks to those provided by
Simulink, converting between the floating-point numerical representation
used by MATLAB and the fixed-point format used by the FPGA. This allows for
the easy attachment of input/output Simulink blocks to
examine the operation of every aspect of the design. The ability to test in
this fashion makes this a very attractive approach.
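The conversion performed at a gateway can be illustrated with a minimal
sketch in Python. It is not the System Generator implementation; the 16-bit
word with 14 fractional bits, the round-to-nearest quantization, and the
saturating overflow are all assumptions chosen for illustration, and the
function names to_fixed and from_fixed are hypothetical.

```python
import math


def to_fixed(value, n_bits=16, frac_bits=14):
    """Quantize a float to a signed fixed-point integer code.

    n_bits and frac_bits are illustrative assumptions, not values
    taken from the design described in the text.
    """
    # Scale by 2**frac_bits and round to the nearest integer code.
    scaled = math.floor(value * (1 << frac_bits) + 0.5)
    # Saturate to the range of a signed n_bits word.
    lowest = -(1 << (n_bits - 1))
    highest = (1 << (n_bits - 1)) - 1
    return max(lowest, min(highest, scaled))


def from_fixed(code, frac_bits=14):
    """Convert a fixed-point integer code back to a float."""
    return code / (1 << frac_bits)


if __name__ == "__main__":
    for x in (0.5, -0.3, 1.9999, 2.5):
        code = to_fixed(x)
        print(x, code, from_fixed(code))
```

Values outside the representable range are clipped rather than wrapped in
this sketch, and the round trip through from_fixed shows the quantization
error introduced by the finite word length.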
The design was then compiled and downloaded to a Spartan-3 FPGA as a hardware
co-simulation on a Digilent development board, provided free of charge by
Xilinx. This particular board, however, lacked the required I/O for
stand-alone operation, and the Spartan-3 was too small for the full extent of
the project. (A low-pass anti-aliasing filter had to be removed for the
design to fit the FPGA.) At the start of the Fall 2005 semester, Xilinx
provided a new Digilent development board (the XUPV2P), equipped with an
audio amplifier, microphone and speaker connectors, and an AC-97 audio codec
that performs 18-bit analog-to-digital and digital-to-analog conversion at
rates up to 48 kHz. The onboard FPGA is a Virtex-II Pro; with 88 multipliers
and over 20,000 logic cells, it provides ample capacity for pitch shifting
[7].
Implementation of Time Domain Algorithm
After analyzing the two pitch shifting algorithms, the time domain
approach was selected for its relative simplicity and reasonably high
quality output. Operating in the frequency domain, while certainly feasible,
would nonetheless be far more complicated, and it was decided that it could
be attempted as the next step after successful implementation of the time
domain algorithm.
The challenge of designing this system is that the convenience of MATLAB
functions is no longer available and that everything must be built using
digital logic elements. Additionally, the system must operate in real-time
with little temporal lag. Since the method to shift up is different from
shifting down, it is also necessary to create different components to
address both cases. Figure 9 shows the Simulink schematic of a working down
pitch shifter, with windows of size W truncated by 25%. The real-time
specification of the pitch shifter requires that the input and output
operate at the same sample time. As such, the primary component of this
design is the Xilinx Dual Port RAM block, which allows the simultaneous
reading and writing of data to and from a memory bank at two different
rates. As an input streams into the system, each window of data is stored
sequentially into memory, and each subsequent window replaces the former. At
the same time, the first 75% of the data points are read from the memory at
a slower rate such that both the read and write operations finish at the same
time. In this fashion, the incoming and outgoing memory data are perfectly
synchronized but at different data rates. The output of this block is then
passed through an interpolator followed by a decimator. These cascading
elements are necessary to achieve the fractional interpolation required by
the algorithm. The final output is a stream of data, shifted lower in pitch
but possessing the same sample time.
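The down shifter described above can be approximated offline with a short
Python sketch. It is a software stand-in, not the hardware design: a simple
linear interpolator replaces the interpolator/decimator cascade, the window
size of 8 samples is an arbitrary assumption, and the names resample_linear
and shift_down are hypothetical. Stretching 75% of a window back to full
length corresponds to a rate change of 4/3, which is assumed here to be what
the cascade realizes (interpolate by 4, decimate by 3).

```python
from fractions import Fraction


def resample_linear(samples, out_len):
    """Stretch or shrink samples to out_len points by linear interpolation."""
    n_in = len(samples)
    out = []
    for i in range(out_len):
        pos = i * n_in / out_len        # fractional read position in the input
        i0 = int(pos)
        i1 = min(i0 + 1, n_in - 1)      # hold the last sample at the edge
        frac = pos - i0
        out.append(samples[i0] * (1 - frac) + samples[i1] * frac)
    return out


def shift_down(signal, window=8, keep=Fraction(3, 4)):
    """Keep the first `keep` fraction of each window and stretch it to full length.

    Any trailing samples that do not fill a whole window are dropped.
    """
    out = []
    for start in range(0, len(signal) - window + 1, window):
        block = list(signal[start:start + window])
        kept = block[:int(window * keep)]            # first 75% of the window
        out.extend(resample_linear(kept, window))    # back to W samples
    return out


if __name__ == "__main__":
    ratio = 1 / Fraction(3, 4)
    print("interpolate by", ratio.numerator, "decimate by", ratio.denominator)
    print(shift_down(list(range(16))))
```

Each output window has the same length as its input window, so the sample
time is preserved while the content of every window is played back more
slowly, which lowers the pitch.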

Figure 9. Simulink Schematic of a Down Pitch Shifter

Figure 10. Output of the up pitch shifter. The first waveform shows
windowed excerpts from an input, padded in between with zeros. The second
waveform is a shifted version, with each window overlapping its counterpart
by 75%. The third is the downsampled sum of the two, which is the final
output.
The implementation of an up shifter (Figure 11) is similar to the down
shifter, but requires a few additional elements. Again, a Dual Port RAM
block is used, but this time the incoming data is written at a rate slower
than the rate at which it is read. Once all data points have been read,
zeros are padded at the end of the data stream until the next window has
finished being written, and the process begins again. In this way,
synchronization is maintained. The windows have now been separated from each
other with zeros inserted in between (Figure 10). The data is then sent to
an addition block where it is added to a delayed version of itself. For a
window of size W and a fractional overlap of x, the delay is equal to
W(1-x). For example, with the 75% overlap of Figure 10, the delay is W/4.
Following this is an interpolation block and decimation block which convert
the signal back to its original sample time.
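The up shifter steps can likewise be sketched offline in Python. This is an
illustration under stated assumptions rather than the hardware design: the
window size of 8 samples is arbitrary, linear interpolation stands in for
the interpolation and decimation blocks, no window taper or amplitude
scaling is applied because the text does not specify one, and the names
overlap_delay, resample_linear, and shift_up are hypothetical.

```python
def overlap_delay(window, overlap):
    """Delay in samples for a window of size W and fractional overlap x: W(1 - x)."""
    return round(window * (1 - overlap))


def resample_linear(samples, out_len):
    """Stretch or shrink samples to out_len points by linear interpolation."""
    n_in = len(samples)
    out = []
    for i in range(out_len):
        pos = i * n_in / out_len        # fractional read position in the input
        i0 = int(pos)
        i1 = min(i0 + 1, n_in - 1)      # hold the last sample at the edge
        frac = pos - i0
        out.append(samples[i0] * (1 - frac) + samples[i1] * frac)
    return out


def shift_up(signal, window=8, overlap=0.75):
    """Zero-pad each window, add a delayed copy, then shrink back to W samples.

    Any trailing samples that do not fill a whole window are dropped.
    """
    delay = overlap_delay(window, overlap)
    out = []
    for start in range(0, len(signal) - window + 1, window):
        block = list(signal[start:start + window])
        padded = block + [0.0] * delay       # window followed by zeros
        delayed = [0.0] * delay + block      # same window, delayed by W(1 - x)
        summed = [a + b for a, b in zip(padded, delayed)]
        out.extend(resample_linear(summed, window))   # W + delay samples to W
    return out


if __name__ == "__main__":
    print(overlap_delay(8, 0.75))
    print(shift_up([1.0] * 8))
```

Because the two copies are simply added, samples in the overlapping region
are doubled in this sketch; a practical version would need some form of
tapering or gain correction, which is left out here.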

Figure 11. Simulink Schematic of an Up Pitch Shifter
Software/Hardware Issues
Using Simulink, working simulations of the up and down time domain pitch
shifters have been realized. Audio files stored in the .wav format can be
processed and written back, and the resulting audio file is of the same
duration but different pitch. Due to the limitations of Simulink, a
real-time demonstration with microphone and speaker cannot be performed;
however, this design is perfectly capable of operating in real-time once it
has been implemented on the FPGA development board.
The hardware implementation of this system has not yet been realized due to
unforeseen difficulties with the software and hardware. At the beginning of
the Fall 2005 semester, Villanova University issued new laptop computers on
which we could operate the Xilinx development software. This was not
anticipated to be problematic; however, the installation of the software was
plagued with errors and proved to be exceedingly time-consuming. A greater
problem encountered with System Generator was its lack of support for I/O
devices. Our original understanding was that the entire design could be
built in Simulink using System Generator without having to do any manual
coding in VHDL. A Xilinx tutorial offered different methods for
incorporating the audio I/O; however, none worked properly, even with the
sample designs included with the tutorial. The development board, having
recently been released, was poorly supported by the software, and the
documentation offered little insight into how to operate the audio hardware.
Ultimately, more time would be required to research and learn how to use the
audio hardware on the board before the pitch shifter can be completed.
Copyright 2006 Habib Estephan, Scott Sawyer, and Daniel
Wanninger. All rights reserved.