Tutorial 3: Example Spectrometer
Introduction
In this tutorial will go over building a simple spectrometer using CASPER DSP and hardware yellow blocks for RFSoC.
This tutorial assumes that the casper-ite is familiar wth the RFDC Interface tutorial. This also assumes that the CASPER development environment is setup for RFSoC as described in the Getting Started tutorial. A brief walkthrough of example of the spectrometer design, software control of the spectrometer will be given in this tutorial. However, it is also beneficial to become learn how to interact with the FPGA using casperfpga
as demonstrated in the simulink platform introduction.
A spectrometer is an analysis filter bank that takes for its input a time domain signal and transforms it to a frequency domain representation. In digital systems, this is typically achieved by utilising the FFT (Fast Fourier Transform) algorithm. However, with a modest increase in compute, better spectral bin performance can be improved by using a Polyphase Filter Bank (PFB) based approach.
When designing a spectrometer for astronomical applications, it is important to consider the target science case. For example, pulsar timing searches typically require a spectrometer that can dump spectra on short timescales. This allows the rate of change of the spectral content to be finely observed. In contrast, a deep field HI survey will accumulate multiple spectra to increase the signal to noise ratio above a detectable threshold. It is important to note that “bigger isn’t always better” here; the higher your spectral and time resolution are, the more data your computer (and scientist on the other end) will have to deal with. For now however, we skip the target science case and rather look to more familiarize ourselves with an example spectrometer design.
Setup
This tutorial comes with a completed simulink model files for RFSoC platforms. There are different examples for configuring the RFSoC in different operational modes. Those files can be found here.
Spectrometer Basics
When designing a spectrometer there are a few main parameters of note:
Bandwidth: The width of your frequency spectrum, in Hz. This depends on the sampling rate; for complex sampled data this is equivalent to:

In contrast, for real or Nyquist sampled data the rate is half this:

as two samples are required to reconstruct a given waveform.
Frequency resolution: The frequency resolution of a spectrometer, Δf, is given by:

and is the width of each frequency bin. This parameter is a measure of how precise you can measure a frequency, or rather its frequency resolution.
Time resolution: Time resolution is simply the spectral dump rate of your instrument. We generally accumulate multiple spectra to average out noise; the more accumulations we do, the lower the time resolution. For looking at short timescale events, such as pulsar bursts, higher time resolution is necessary; conversely, if we want to look at a weak HI signal, a long accumulation time is required, so time resolution is less important.
Simulink / CASPER Toolflow
Simulink Design Overview
If you’re reading this, hopefully you’ve managed to find the model files. Open a model file of your choice and have a look around to start to get a vague idea of what’s happening. The best way to understand fully is to follow the dataflow arrows starting from the RFDC. Then read a few of the comment boxes and design annotations. Also, it is worthwhile to go through what each block is doing and make sure you know why each block is configured the way it is. To help you through, there is some “blockumentation” in the following part of the tutorial. That blockumentation will try to highlight some of the important ports and parameters to the blocks and their configuration to help guide to answers for the questions you may have.
A brief rundown of some top-level details are as follows before you get down and dirty:
There are two styles of spectrometer tutorial model files. One that configures the RFDC data output mode to output in real sample mode. The others configure the RFDC to use the digital down converter and output complex data samples from the RFDC.
The comment boxes above the
RFDC
yellow block in these model spectrometer designs comment on theRFDC
sample rate, output data type, and number of samples per clock on the output interface of theRFDC
.The all-important Xilinx token is placed to allow System Generator to be called to compile the design.
In the platform yellow block, the hardware Platform is set to the target RFSoC platform (e.g., “ZCU216:xczu49dr”) and the User IP clock rate must be specified. Recall from the RFDC Interface that the
User IP Clock Rate (MHz)
field must match the reported clock rate in theRFDC
yellow block.Input signals are digitized by the RFDC. These samples presented in parallel at the output of the
RFDC
each clock cycle. The output data type will be expressed as a simulink fixed point integers (e.g.,fix_128_0
).The parallel time samples pass through the
pfb_fir_real
andfft_wideband_real blocks
(pfb_fir
andfft
blocks when theRFDC
is configured to output complex data samples). Together these blocks implement a polyphase filter bank.When the data output is real from the
RFDC
the FFT size is selected to be 212 = 4096 points. Because the CASPER wideband FFT outputs only the positive frequency bins we will have a 211 = 2048 channel filter bank.When the data output is complex from the
RFDC
the FFT size is selected to be 211 = 2048.You may notice Xilinx delay blocks dotted all over the design. It’s common practice to add these into the design as helps the design meet timing. It consumes more resources, but eases signal timing-induced placement restrictions.
The frequency bin outputs of the PFB passed through
power
blocks. This block converts from complex-valued outputs to real-valued power.The bin power enters the vector accumulators, vacc0 and vacc1. These are
simple_bram_vacc
64-bit vector accumulators. Accumulation length is controlled by theacc_cntrl
block. The accumulation length is set in software controlled bycasperfpga
.The accumulated signal is then fed into a BRAM yellow blocks.
Continue further to familiarize yourself with the model file. Clicking around, opening configuration windows and refer to the blockumentation as needed.
RFDC
The first step to creating a frequency spectrum is to digitize the signal. This is done with an ADC (analog-to-digital converter). For RFSoC, the ADC is represented by the RFDC
(RF Data Converter) yellow block. Work through the RFDC tutorial if you not already familiar with this block.
The RFDC
converts analog inputs to digital outputs. Every clock cycle, the inputs are sampled and digitized to a 14-bit, 2’s complement binary number representation. These samples are packed and MSB aligned into 16-bit words and presented in parallel on the output interface. This means we can represent numbers from -32768 through to 32767, including the number 0. Simulink represents such numbers as a fix_32_0
data type. As an example, when the RFDC
is configured in real mode with 8 samples per clock, the output data type is fix_128_0
. For more information about the output data representation of the RFDC
refer to the RFDC product guide.
Recall from the RFDC tutorial that when the RFDC
is configured in real -> I/Q
output mode quad-tile and dual-tile RFSoCs have differing behavior. For dual-tile platforms, in both Real
and I/Q
digital output modes these platforms output all data bits on the same bus. So for example, with 4
samples per clock this results in 2
complex samples ordered {I1, Q1, I0, Q0}
. Where in each ADC word, the most recent sample is at the MSB of the word.
For dual-tile platforms in I/Q
digital output modes, the inphase and quadarature data are produced from different ports. In this mode the first digit of the signal name corresponds ot the tile index (same for quad-tiles). But, the second digit is 0
for inphase and 1
for quadrature data from the first ADC on the tile. The second ADC will then have 2
for inphase data and 3
for quadrature data. For example, with 4
sample per clock this is 4
complex samples with the two complex components coming from different ports, m00_axis_tdata
for inphase data on tile 0 ADC 0 ordered {I3, I2, I1, I0}
and m01_axis_tdata
for tile 0 ADC 1 with quadrature data ordered {Q3, Q2, Q1, Q0}
. When configured in Real
digital output mode the second digit is 0
for the first ADC and 2
for the second.
Block Configuration
The following configuration examples are for the RFDC
from the rfsoc4x2_tut_spec.slx
model file. Refer to the RFDC
block in each model file for their respective configuration.
For the RFSoC 4x2
, each tile tab is configured the same with Enable Tile
selected and clock configuration set as follows.Note that the Required AXI4-Stream Clock (MHz)
field matches the clock of the RFSoC 4x2
platform block.

Each of the ADCs are configured as shown in the following following image. The output data mode is Real
with decimation factor set to 2x
and 8
samples per cycle. This will present an interface with simulink data type fix_128_0
to the design.

When operating the RFSoC 4x2
with the complex data mode spectrometer example (rfsoc4x2_tut_spec_cx.slx
) the ADC tiles are configured as shown in the following image. The output data mode is now I/Q
still with a decimation factor of 2x
and 8
samples per clock. The Mixer Type
is set to Fine
with Real -> I/Q
mode enabled and the NCO frequency set to -0.98304
. Refer again to the RFDC tutorial and the RFDC product guide for how these parameters affect the RFDC
operation. But, the result is that with the digital down converter active the input signal is first shifted by the fine frequency mixer and then passed through to the decimation filters.

Munge blocks
The munge blocks in these designs are responsible for reordering the output data from the RFDC
. This is required because the RFDC
outputs time samples with the newest time samples in the MSB and CASPER blocks want the oldest sample in the MSB (top of CASPER blocks, e.g., port 0
). When the RFDC
is configured to output the 8
real samples this just reversing the word order as shown in the following image for the munge block configuration.

With quad-tile platforms (e.g., ZCU216) when configured to output complex samples it is the same approach. This is because the samples are interleaved real/imaginary on the same bus. This translates to changes in the munge block to adjust the bitwidths and division sizes.
With the RFSoC 4x2
and other dual-tile platforms, when configured to output complex samples we have to build the interleaved real/imaginary samples before sending them to the pfb_fir
block. To do this, the real and imaginary parts of the samples coming from the output of the RFDC
on their respective interfaces are first combined with a bus_create
block. The munge then reorders the blocks. The following figure graphical shows how the divisions are reordered and their respective index for when the output is 4 samples per clock. Here I<#>
represents the real time sample at sample index #
and the same for Q<#>
but corresponding to the imaginary sample. In these examples, I3
and Q3
correspond to the newest time sample and I0
and Q0
are the oldest.

The colors in this figure are used to track the sample input position relative to the output position. Also note that that the “munge reorder index” column places the 0-th
element at the MSB with the highest value at the LSB. The 3rd column top-to-bottom is then what is input to the Division Packing Order
field of the munge
block. The resulting vector used in as the munge block parameter is the reordered column entered top to bottom from this table. In the RFSoC 4x2
design there are 8
samples per clock. Take a look at the munge block in the rfsoc4x2_tut_spec_cx.slx
model and this example to see if you can follow how the complex samples are created.
Polyphase FIRs
There are two main blocks required for a polyphase filter bank. The first is a polyphase FIR block and the second is the FFT. Depending on the input data type DSP different blocks can be used to take save hardware resources when possible. For real valued time samples the pfb_fir_real block is used. When the data are complex the pfb_fir is used instead. A polyphase FIR works by dividing the input time signal into parallel “taps” then applies finite impulse response filters (FIR). The output of this block is still a time-domain signal. When combined with an FFT, this constitutes a polyphase filter bank. The fft_wideband_real block is used following the pfb_fir. The fft block is follows the pfb_fir. The complex valued inputs are the generalized implementation of these DSP algorithms. Review the real-valued and complex-valued RFSoC example designs to see more of the differences.
The following descriptions are for the pfb_fir_real block but the general pfb is configured similar.

INPUTS/OUTPUTS
Port |
Data Type |
Description |
---|---|---|
sync |
bool |
A sync pulse should be connected here. |
pol<#>_in<#> |
inherited |
The (real) time-domain stream(s). |
PARAMETERS
Parameter |
Description |
---|---|
Size of PFB |
This parameter is to match the FFT size we are to have.
|
Number of taps |
The number of taps in the PFB FIR filter. Each tap uses 2 real multiplier
cores and requires buffering the data streams for 2*PFBSize samples.
Generally, more taps means less inter-channel spectral leakage, but more
logic is used. There are diminishing returns after about 8 taps or so.
|
Windowing function |
Which windowing function to use (this allows trading passband ripple for
steepness of rolloff, etc). Hamming is the default and best for most
purposes.
|
Number of Simultaneous
Inputs
|
The number of parallel time samples which are presented to the core
each clock.
|
Make biplex |
0 (not making it biplex) is default. Double up the inputs to match with a
biplex FFT.
|
Input bitwidth |
The number of bits in each real and imaginary sample input to the PFB. The
ADC outputs 16-bit data.
|
Output bitwidth |
The number of bits in each real and imaginary sample output from the PFB.
This should match the bit width in the FFT that follows. 18 bits is
recommended as a minimum. Recommended values depend on the DSP
architecture for the FPGA.
|
Coefficient bitwidth |
The number of bits in each coefficient. This is usually chosen to be less
than or equal to the input bit width.
|
Use dist mem for coefficients |
Store the FIR coefficients in distributed memory (if = 1). Otherwise,
BRAMs are used to hold the coefficients. 0 (not using distributed memory)
is default.
|
Add/Mult/BRAM/Convert Latency |
These values set the number of clock cycles taken by various processes in
the filter. There’s normally no reason to change this unless you’re having
troubles with design timing.
|
Quantization Behaviour |
Specifies the rounding behaviour used at the end of each butterfly
computation to return to the number of bits specified above. Rounding is
strongly suggested to avoid artifacts.
|
Bin Width Scaling |
PFBs give enhanced control over the width of frequency channels. By
adjusting this parameter, you can scale bins to be wider (for values > 1)
or narrower (for values < 1).
|
Multiplier specification |
Specifies what type of resources are used by the various multiplications
required by the filter.
|
Fold adders into DSPs |
If this option is checked, adding operations will be combined into the
FPGAs DSP cores, which have both the multiplying and adding capabilities.
|
Adder implementation |
Adders not folded into DSPs can be implemented either using fabric
resources (i.e. registers and LUTs in slices) or using DSP cores. Here you
get to choose which is used. Choosing a behavioural implementation will
allow the compiler to choose whichever implementation it thinks is best.
|
Share coeff. between | polarisations |
Where the PFB block is simultaneously processing more than one
polarization, you can save RAM by using the same set of coefficients for
each stream. This may, however, make the timing performance of your design
worse.
|
CASPER FFTs
The FFT block you use is is the most important part of the design to understand. The cool green of the FFT blocks hide the complex and confusing FFT butterfly biplex algorithms that are under the hood. You do need to have a working knowledge of it though, so I recommend reading Chapter 8 and Chapter 12 of Smith’s free online DSP guide at (http://www.dspguide.com/). Parts of the documentation below are taken from the fft_wideband_real
block documentation. When the RFDC
outputs complex time samples the fft block needs to be used instead. Similarly, see the fft
block documentation for more information.

INPUTS/OUTPUTS
Port |
Description |
---|---|
sync |
Like many of the blocks, the FFT needs a heartbeat to keep it sync’d. |
shift |
Sets the shifting schedule through the FFT. Bit 0 specifies the behavior of stage 0, bit 1 of stage 1, and
so on. If a stage is set to shift (with bit = 1), then every sample is divided by 2 at the output of that
stage. This strategy will always prevent overflow. This is the strategy chosen in these deigns.
|
in<#> |
Input data (real for wideband fft, complex for general fft) |
out<#> |
The number of ports is the number of simultaneous bin outputs produces. For a real input signal, the outputs
FFTs spectrum’s left and right halves are mirror images (complex conjugate symmetric). The block does not
output the imaginary (negative channel) indices.
Thus, for a 4096-point FFT, 2048 channels are output. This is why there are half the number of parallel
outputs. Each of these parallel FFT outputs will produce sequential channels on every clock cycle. On the
first clock cycle (after a sync pulse, which denotes the start), frequency channel zero is on port 0,
frequency channel one is on port 1 and so forth. Each of those are now complex-valued numbers. Then on the
clock cycle this process repeats where the previous bin count left off.
|
PARAMETERS
Parameter |
Description |
---|---|
Size of FFT |
How many points the FFT will have. The number of output channels will be
half this when the wideband real fft if used as previously explained.
|
Input/output bitwidth |
The number of bits in each real and imaginary sample as they are carried
through the FFT. Each FFT stage will round numbers back down to this
number of bits after performing a butterfly computation. This has to match
what the pfb_fir is throwing out.
|
Number of simultaneous inputs |
The number of parallel time samples which are presented to the FFT core
each clock.
|
Coefficient bitwidth |
The amount of bits for each coefficient. 18 is default.
|
Unscramble output |
Some reordering is required to make sure the frequency channels are output
in canonical frequency order. If you’re absolutely desperate to save as
much RAM and logic as possible you can disable this processing, but you’ll
have to make sure you account for the scrambling of the channels in your
downstream software. For now, because our design will comfortably fit on
the FPGA, leave the unscramble option checked.
|
Overflow Behavior |
Indicates the behavior of the FFT core when the value of a sample exceeds
what can be expressed in the specified bit width. Here we’re going to use
Wrap, since Saturate will not make overflow corruption better behaved.
|
Add Latency |
Latency through adders in the FFT. |
Mult Latency |
Latency through multipliers in the FFT. |
BRAM Latency |
Latency through BRAM in the FFT. |
Convert Latency |
Latency through blocks used to reduce bit widths after twiddle and
butterfly stages.
|
Input Latency |
Here you can register your input data streams in case you run into timing
issues.
|
Latency between internal biplexes and
fft_direct blocks |
Here you can add optional register stages between the two major processing
blocks in the FFT. These can help a failing design meet timing. For this
tutorial, you should be able to compile the design with this parameter set
to 0.
|
Architecture |
|
Number of bits above which to store
stage’s coefficients in BRAM
|
Determines the threshold at which the twiddle coefficients in a stage are
stored in BRAM. Below this threshold distributed RAM is used. By changing
this, you can bias your design to use more BRAM or more logic.
|
Number of bits above which to store
stage’s delays in BRAM
|
Determines the threshold at which the twiddle coeff. in a stage are stored
in BRAM. Below this threshold distributed RAM is used.
|
Multiplier Implementation |
Determines how multipliers are implemented in the twiddle function at
each stage. Using behavioral HDL allows adders following the multiplier to
be folded into the DSP48Es. Other options choose multiplier cores which
allows quicker compile time. You can enter an array of values allowing
exact specification of how multipliers are implemented at each stage.
|
Hardcode shift schedule |
If you wish to save logic, at the expense of being able to dynamically
specify your shifting regime using the block’s “shift” input, you can
check this box. Leave it unchecked for this tutorial.
|
Use DSP48’s for adders |
The butterfly operation at each stage consists of two adders and two
subtracters that can be implemented using DSP48 units instead of logic.
Leave this unchecked.
|
Power

The power block computes the power of a complex number. The power block typically has a latency of 5 and will compute the power of its input by taking the sum of the squares of its real and imaginary components.
INPUTS/OUTPUTS
Port |
Direction |
Data Type |
Description |
---|---|---|---|
c |
IN |
2*BitWidth Fixed point |
A complex number whose higher BitWidth bits are its real part and lower
BitWidth bits are its imaginary part.
|
power |
OUT |
UFix_(2*BitWidth)_(2*BitWidth-1) | The computed power of the input complex number. |
PARAMETERS
Parameter |
Variable |
Description |
---|---|---|
Bit Width |
BitWidth |
The number of bits in its input. |
Sync Gen
CASPER DSP blocks require a sync pulse as a periodic heartbeat to keep the data path flowing. This can be either a 1 PPS pulse or something generated internal to the design. We use an internal approach in this design with the sync gen
block. For more information about configuring this block please refer to the CASPER Sync Memo for how the reorders are to be setup for this block if the FFT size were to change in your designs.
Vector Accumulator
The simple_bram_vacc
block is used in this design for vector accumulation. Vector growth is approximately 28 bits each second. As the name suggests, the simple_bram_vacc
is simpler so it is fine for this demo spectrometer. Each vector accumulator will capture a fraction of the total output bandwidth in parallel. This means that the frequency bins will need to be interleaved to reconstruct the output spectrum when read back out. With N
parallel output frequency bins we will have N
vector accumulators, each containing, 2FFT_SIZE/N bins.

PARAMETERS
Parameter |
Description |
---|---|
Vector length |
The length of the input/output vector. Set to store the fraction of spectrum as explained above. |
no. output bits |
As there is bit growth due to accumulation, we need to set this higher than the input bits.
|
Binary point (output) |
Where the binary point is placed in the accumulated output.
|
INPUTS/OUTPUTS
Port |
Description |
---|---|
new_acc |
A boolean pulse should be sent to this port to signal a new accumulation. We can’t directly use the sync
pulse, otherwise this would reset after each spectrum. The
acc_cntrl keeps track of the accumulationto send this pulse.
|
din/dout |
Data input and output. The output depends on the no. output bits parameter. |
valid |
The output of this block will only be valid when it has finished accumulating (signaled by a boolean pulse
sent to
new_acc ). This will output a boolean 1 while the vector is being output, and 0 otherwise. |
Software Registers
There are a few control registers, led GPIOs, and snapshot blocks within the design:
cnt_rst: Counter reset control. Pulse this high to reset all counters back to zero.
acc_len: Sets the accumulation length. See python script help string for usage.
sync_cnt: Sync pulse counter. Counts the number of sync pulses issued. Can be used to figure out board uptime and confirm that your design is being clocked correctly.
acc_cnt: Accumulation counter. Keeps track of how many accumulations have been done.
led0_sync: The led0_sync light flashes each time a sync pulse is generated.
led1_new_acc: This lights up led1 each time a new accumulation is triggered.
led2_acc_clip: This lights up led2 whenever clipping is detected.
If you’ve made it to here, congratulations. Take a break and then come back for part two, which explains the second part of the tutorial – actually getting the spectrometer running, and having a look at some spectra.
Configuration and Control
Hardware Configuration
Make sure the RFSoC platform board is running the proper Linux image as explained in the Getting Started tutorial and that clocks are running (e.g., ZCU216 requires clocking module board be installed). You will also need test signals at the inputs of the RFSoC.
The tutorial .slx
model files for different platforms are found here. Extending the files to a different platform not yet provided is possible following this tutorial. Open one of the example model files and run the jasper
command in the matlab command prompt to build the .fpg
and .dtbo
files (found in model projects outputs/
folder). After this completes, we can now run and configure casperfpga
to communicate with the hardware design to readout and plot output spectra!
Python
We assume here working with the RFSoC 4x2
and provided rfsoc4x2_tut_spec.py
script for reading output. But, these instructions and files can be extended to other platforms.
There are two prebuilt model files: one using the RFDC
configured to output real time samples and the other enabling the digital down converter to output complex time samples. For the real spectrometer design the RFDC is set to sample at 3932.16 MHz with a decimation rate of 2x
. The spectrometer uses the fft_wideband_real
set to a transform size of 4096. The number of output bins is only the positive frequency with a a size of 2048. This is an effective bandwidth from 0 to 983.04 MHz. The complex spectrometer design is also set to sample at 3932.16 MHz with a decimation rate of 2x
. However, in this design the fine mixer is used with the NCO set to -983.04 MHz
. The FFT is set to transform to a size of 2048. With sufficient anti-alias filtering the effective bandwidth of this design is from 0 to 1966.08 MHz.
It may be helpful to first run the python script in the two operating modes before going through the script. This can help identify what the script does by knowing before what is being presented.
To get help information from the script you can run:
python rfsoc4x2_tut_spec.py -h
The general syntax to run the script is as follows. Either an IP address or hostname can be given to connect to the board. The string real
or cx
must be provided. This is to setup how to read out and display the spectra. The -b
option is used to program the board with a specified .fpg
file. If no file is provided the prebuilt .fpg
files are used. The -a
option is used to specify which adc input to plot. The value can be 0, 1, 2
, or 3
. The default is 0
. The -l
option is used to specify the accumulation length. The -s
option can be used if you know the design has already been previously programmed and is running. If not, it will be expected to a problem.
Run in real mode using the prebuilt model. I am assuming here a hostname of rfsoc4x2
for the board:
python rfsoc4x2_tut_spec.py rfsoc4x2 real
Assuming all goes well and with a plot window should appear updating the spectrum periodically, with the spectrum accumulation count displayed in the title. An example is shown in the following figure. Here a tone at 800 MHz is present on the input of the RFSoC and appears as expected 800 MHz.

Now run the complex mode design:
python rfsoc4x2_tut_spec.py rfsoc4x2 cx
Similar to the real mode, a plot should shortly appear. As an example, the same 800 MHz tone is present as shown.

As an example of something more interesting, the following output is from the complex version of the design with wideband noise filtered to a passband from about 1280-1780 MHz and a tone present in that passband at 1520 MHz.

Conclusion
If you have followed this tutorial faithfully, you should now know:
What a spectrometer is and what the important parameters for astronomy are.
Which CASPER blocks you might want to use to make a spectrometer, and how to connect them up in Simulink.
How to connect to and control a the RFSoC spectrometer using python scripting.