Real-sampled Biplex FFT (demuxed by 2)

Block: Real-sampled Biplex FFT (with output demuxed by 2) (fft_biplex_real_2x)
Block Author: Aaron Parsons
Block Maintainer: Andrew Martens
Document Author: Aaron Parsons, Andrew Martens

Summary

Computes the real-sampled Fast Fourier Transform using the standard Hermitian conjugation trick, in which a complex core transforms two real streams. Thus, a biplex core (which can do 2 complex FFTs) can transform 4 real streams. Sharing twiddle factors and other logic allows multiples of 4 input streams to be processed simultaneously with minimal resource increases. Only positive frequencies are output (negative frequencies are the mirror images of their positive counterparts). Data is output in normal frequency order, meaning that channel 0 (corresponding to DC) is output first, followed by channel 1, on up to channel 2^(N−1) − 1 for an FFT of size 2^N. Real inputs 0 and 2 share one output port (with the data for 0 coming first, then the data for 2), likewise for inputs 1 and 3, and so on.

Please note that this documentation refers to the latest version of this block and may not be valid for older versions. Look in the history for older versions of this documentation.

Mask Parameters

Parameter

Variable

Description

Recommended Value

Number simultaneous inputs (4*?)

n_inputs

The number of inputs the FFT is to process as a multiple of 4.

Size of FFT: (2^?)

FFTSize

The number of channels computed in the complex FFT core. The number of channels output for each real stream is half of this.

Input bit width

input_bit_width

The number of bits in each real and imaginary sample as they are input to the FFT. If bit growth is not chosen, each FFT stage will round numbers back down to this number of bits after performing a butterfly computation. If bit growth is chosen, the number of bits will increase by one with every FFT stage up to the maximum specified.

18 to make optimal use of BRAMs; 25 for low FFT noise.

Input binary point

bin_pt

The position of the binary point in the input data.

Coefficient Bit Width

coeff_bit_width

The number of bits used in the real and imaginary part of the twiddle factors at each stage.

18

Asynchronous operation

async

Whether valid data is input on every clock cycle or is flagged via the en input port.

Quantization Behavior

quantization

Specifies the rounding behaviour used at the end of each twiddle and butterfly computation, either to return to the input bit width (if bit growth is not enabled) or to hold the bit width at the maximum specified (if bit growth is enabled).

Any option other than Truncate.

Overflow Behavior

overflow

Indicates the behaviour of the FFT core when the value of a sample exceeds what can be expressed in the specified bit width.

Add Latency

add_latency

Latency through adders in the FFT.

1

Mult Latency

mult_latency

Latency through multipliers in the FFT.

2

BRAM Latency

bram_latency

Latency through BRAM in the FFT.

2; 3 for designs aimed at more than 200 MHz.

Convert Latency

conv_latency

Latency through blocks used to reduce bit widths after twiddle and butterfly stages.

1; 2 for designs aimed at more than 180 MHz.

Number bits above which to store stage’s coefficients in BRAM (2^? bits)

coeffs_bit_limit

Determines the threshold at which the twiddle coefficients in a stage are stored in BRAM. Below this threshold distributed RAM is used.

8 (ensures that at least 2^8 = 256 of the 18432 bits in a BRAM are used)

Number bits above which to implement stage’s delays in BRAM (2^? bits)

delays_bit_limit

Determines the threshold at which data delays in a stage are stored in BRAM. Below this threshold distributed RAM is used.

8 (ensures that at least 2^8 = 256 of the 18432 bits in a BRAM are used)

BRAM sharing in coeff storage

coeff_sharing

Real and imaginary components of twiddle factors can be generated from the same set of coefficients, reducing BRAM use at the cost of some logic.

Store a fraction of coeff factors where useful

coeff_decimation

The full set of twiddle factors can be generated from a smaller set, reducing BRAM use at the cost of some logic.

Maximum fanout

max_fanout

The maximum fanout the twiddle factors are allowed to experience between where they are generated and where they are multiplied with the data stream. Because the coefficients are shared, large fanout can occur, which can limit the maximum achievable clock rate. Decreasing the maximum fanout allowed should improve achievable performance at the expense of some logic.

Multiplier specification (0=core, 1=embedded, 2=behavioural) (left=1st stage)

mult_spec

Array of values allowing exact specification of how multipliers are implemented at each stage. A single value indicates that all multipliers are implemented in the same way.

2 (behavioral HDL) for each stage

Bit growth instead of shifting

bit_growth

Sample values can grow at every stage of the FFT, causing overflows that degrade data quality. This can be prevented either by dividing the data by two at the output of every stage (shifting) or by widening the data path by one bit per stage (bit growth). Shifting reduces the dynamic range and potentially the data quality, whereas bit growth increases the resource requirements.

Max bits to growth to

max_bits

The maximum number of bits to increase the data path to when the bit growth option is chosen. Shifting is used for FFT stages after this.

Hardcode shift schedule

hardcode_shifts

When shifting to prevent overflow, use a fixed shifting schedule. This uses less logic and increases performance when compared to using a dynamic shift schedule.

Shift schedule

shift_schedule

When using a fixed shift schedule, use the shift schedule specified. A ‘1’ at position M in the array indicates a shift for the M’th FFT stage, a ‘0’ indicates no shift.

DSP48 adders in butterfly

dsp48_adders

The butterfly operation at each stage consists of two adders and two subtracters that can be implemented using DSP48 units instead of logic.

on (enabled) to reduce logic use.
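
The interaction between the input_bit_width, bit_growth and max_bits parameters described above can be sketched in a few lines of Python. This is an illustrative model only, not the block's implementation: the function name stage_bit_widths is hypothetical, and the sketch assumes one FFT stage per power of two in the FFT size (so the FFTSize exponent is the stage count).

```python
def stage_bit_widths(fft_size_log2, input_bit_width, bit_growth, max_bits):
    """Model the data-path width at the output of each FFT stage.

    Assumption (not confirmed by the block documentation): the FFT has
    fft_size_log2 stages, one per power of two in the FFT size.
    """
    widths = []
    width = input_bit_width
    for _ in range(fft_size_log2):
        # With bit growth enabled, each stage adds one bit until max_bits
        # is reached; later stages keep that width and shift instead.
        if bit_growth and width < max_bits:
            width += 1
        # Without bit growth, every stage rounds back to the input width.
        widths.append(width)
    return widths


# 2^5-point FFT, 18-bit input, growth capped at 20 bits.
print(stage_bit_widths(5, 18, True, 20))   # [19, 20, 20, 20, 20]
# Same FFT without bit growth: the width never changes.
print(stage_bit_widths(5, 18, False, 20))  # [18, 18, 18, 18, 18]
```

In the first case only the first two stages grow; per the max_bits description, the remaining three stages would rely on shifting to avoid overflow.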

Ports

Port

Dir

Data Type

Description

Recommended Use

sync

in

Boolean

sync indicates the last data word of a frame of input data. In asynchronous operating mode, an active sync is aligned with en being active. In synchronous operating mode, an active pulse is aligned with the clock cycle before the first valid data of a new input frame.

Ensure the sync period complies with the memo describing correct use.

shift

in

Unsigned

Sets the shifting schedule through the FFT to prevent overflow. Bit 0 specifies the behavior of stage 0, bit 1 of stage 1, and so on. If a stage is set to shift (with bit = 1), then every sample is divided by 2 at the output of that stage.

pol

in

Signed; one signal of width (Input Bit Width) per input.

The time-domain stream(s) to be channelised.

Data amplitude should not exceed 0.5 (divide data by 2 before the FFT).

en

in

Boolean

When asynchronous operation is chosen, this port indicates that valid input data is available on all input data ports.

sync_out

out

Boolean

Indicates that output data will be valid on the next clock cycle in synchronous mode, or the next time dvalid is active in asynchronous mode.

pol_out

out

Inherited

The frequency channels.

of

out

Unsigned, one bit per 4 inputs

Indicates internal arithmetic overflow. Not time-aligned with the data. The most significant bit is the flag for pol0_in, pol1_in, pol2_in and pol3_in, and so on for each further group of 4 inputs.

dvalid

out

Boolean

Indicates that valid data is available on all output data ports.
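
The bit layout of the shift port described above (bit 0 controls stage 0, bit 1 controls stage 1, and so on) can be illustrated with a small Python helper. The names pack_shift_schedule and output_scale are hypothetical and exist only for this sketch; the list is assumed to be ordered with stage 0 first.

```python
def pack_shift_schedule(schedule):
    """Pack a per-stage list of 0/1 flags into an unsigned shift word.

    schedule[0] is stage 0 and lands in bit 0, schedule[1] in bit 1, etc.
    """
    word = 0
    for stage, flag in enumerate(schedule):
        if flag not in (0, 1):
            raise ValueError("each schedule entry must be 0 or 1")
        word |= flag << stage
    return word


def output_scale(schedule):
    """Overall scaling from shifting: each shifting stage divides by 2."""
    return 2.0 ** -sum(schedule)


# Shift at stages 0, 2 and 3 of a 4-stage FFT.
schedule = [1, 0, 1, 1]
print(pack_shift_schedule(schedule))  # 13 (binary 1101)
print(output_scale(schedule))         # 0.125
```

Shifting at every stage is the most conservative choice against overflow, at the cost of dynamic range; fewer shifts preserve more precision for weak signals.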

Description

Computes the real-sampled Fast Fourier Transform using the standard Hermitian conjugation trick, in which a complex core transforms two real streams. Thus, a biplex core (which can do 2 complex FFTs) can transform 4 real streams. Sharing twiddle factors and other logic allows multiples of 4 input streams to be processed simultaneously with minimal resource increases. Only positive frequencies are output (negative frequencies are the mirror images of their positive counterparts). Data is output in normal frequency order, meaning that channel 0 (corresponding to DC) is output first, followed by channel 1, on up to channel 2^(N−1) − 1 for an FFT of size 2^N. Real inputs 0 and 2 share one output port (with the data for 0 coming first, then the data for 2), likewise for inputs 1 and 3, and so on.
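
The Hermitian conjugation trick mentioned above can be demonstrated numerically. The following is a minimal floating-point sketch in pure Python, not a model of the block's fixed-point hardware: the function names are hypothetical, and a naive O(n^2) DFT stands in for the complex FFT core. Two real streams a and b are packed as a + jb, transformed once, and separated using the conjugate symmetry of real-input spectra.

```python
import cmath


def dft(x):
    """Naive complex DFT, standing in for the complex FFT core."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def two_real_spectra(a, b):
    """Transform two equal-length real streams with one complex DFT.

    Returns only the positive-frequency channels 0 .. n/2 - 1 of each
    stream, in normal frequency order.
    """
    n = len(a)
    z = dft([complex(ar, br) for ar, br in zip(a, b)])
    spec_a, spec_b = [], []
    for k in range(n // 2):
        zk = z[k]
        zc = z[(n - k) % n].conjugate()
        # For real a and b: Z[k] = A[k] + jB[k], conj(Z[n-k]) = A[k] - jB[k].
        spec_a.append((zk + zc) / 2)
        spec_b.append((zk - zc) / 2j)
    return spec_a, spec_b


a = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, -2.0, 1.5]
b = [0.0, 1.0, 1.0, 0.0, -1.0, -1.0, 0.5, 0.25]
spec_a, spec_b = two_real_spectra(a, b)
# Each result matches the first half of a direct DFT of that stream.
print(all(abs(x - y) < 1e-9 for x, y in zip(spec_a, dft(a)[:4])))
print(all(abs(x - y) < 1e-9 for x, y in zip(spec_b, dft(b)[:4])))
```

With 8-sample inputs (N = 3), each stream yields channels 0 to 3, matching the 2^(N−1) − 1 upper channel index given in the text.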