Tutorial 4: 100GbE

Introduction

This tutorial introduces the 100GbE yellow block and demonstrates its use with the RFDC in two parts. The first part is a simple example for getting familiar with the 100GbE yellow block and testing your hardware setup. The second part streams and packetizes time samples from the RFDC, sends them over 100GbE to a server, and demonstrates a simple catcher and processing script for the packets received from the RFSoC.

Prerequisites and Common Troubleshooting

  • Make sure the CASPER development environment is set up for RFSoC as described in the Getting Started tutorial. This includes the initialization of the git submodules.
  • The use of the integrated UltraScale+ 100G CMAC requires that the no-charge license be downloaded and included with your licenses. It can be acquired here by following the instructions under the section titled “Activation Instructions”. After the license has been added, you can check its status using the Vivado License Manager. This can be accessed from within Vivado by going to Help > Manage License, or opened from the Xilinx Information Center (XIC) under Manage Installs Tab > Manage License. If the license has been successfully included, a line starting with cmac_usplus should appear in the licenses table. You may need to refresh the license cache if it has not been updated to reflect the cmac_usplus license. For more license-related issues, the managing licenses documentation from Xilinx may help here.
  • A 100GbE QSFP28 NIC installed in a server.
  • A 100GBASE-SR4 QSFP28 active optical cable (AOC) or 100GBASE-CR4 QSFP28 direct attach copper cable (DAC). Fiber is recommended as it has been tested more frequently.
  • Not every possible network hardware configuration (transceivers, cables, NICs, etc.) has been tested, so results may vary based on your chosen hardware.
  • When working with fiber, the transceivers must be coded to match the vendor of the hardware they plug into (e.g., NIC, switch). For a pluggable transceiver used with the FPGA, the “Generic” coded module should be purchased, although in some cases vendor-compatible transceivers have been reported to work with the FPGA. For example, the hardware for 100GbE between an RFSoC and a Mellanox 100GbE NIC would be: a generic-compatible 100GBASE-SR4 module, an MTP fiber cable, and a Mellanox-compatible 100GBASE-SR4 module. Other vendor transceivers and fiber should work as long as this scheme is followed.
  • When working with copper, cables longer than 3m may not work well. Check the datasheets for the NIC (and switch) for compatibility with copper and for the cable lengths they can drive.
  • A 100GbE switch is optional, but one can be used with this tutorial to switch packets when testing as part of a larger network.
  • This design is set up to send jumbo Ethernet packets, as other designs typically will be. If no packets show up, make sure the NIC (and switch, if used) have their MTU set to 9000 bytes. Otherwise packets can be filtered out by the OS and not received.
  • Configuring the RFDC was covered in the RFDC Interface tutorial. Refer back to that tutorial for any questions regarding the RFDC or what might be done to change the parameters to fit a different requirement.
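
The MTU requirement listed above can be checked from Python on a Linux host before running anything else. The following is a minimal sketch, assuming the standard Linux sysfs layout where each interface exposes its MTU at /sys/class/net/<iface>/mtu. The helper names read_mtu and mtu_ok are hypothetical and are not part of the tutorial scripts.

```python
from pathlib import Path


def read_mtu(iface, sysfs_root="/sys/class/net"):
    """Return the MTU (in bytes) that Linux reports for a network interface."""
    return int((Path(sysfs_root) / iface / "mtu").read_text().strip())


def mtu_ok(iface, required=9000, sysfs_root="/sys/class/net"):
    """True if the interface MTU is large enough for the jumbo packets."""
    return read_mtu(iface, sysfs_root) >= required


if __name__ == "__main__":
    import sys

    # Interface name is passed on the command line, e.g. enp193s0f0
    name = sys.argv[1]
    print("{}: mtu={} ok={}".format(name, read_mtu(name), mtu_ok(name)))
```

If the check fails, the MTU is usually raised with the operating system's network tools (for example `ip link set dev <iface> mtu 9000` on Linux, run with sufficient privileges).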

Simple Packet Capture and Processing with Python

Script Overview

This section assumes the RFSoC 4x2 and the provided Python scripts under the py/ directory. These scripts can be extended for use with other platforms looking to implement similar functionality.

There are two scripts that are run together in this example: a catcher and a listener. The catcher (implemented in tut_100g_catcher.py) is the packet sniffer. It opens an Ethernet interface, then captures and buffers raw Ethernet packets. Because opening a raw packet socket requires elevated (root) permissions, this script must be run as root. It may not be desirable to do everything as root, so the listener (implemented in tut_100g_listener.py) creates a simple socket for the catcher to forward the received packets to. The listener then parses the packets, does some simple packet filtering, extracts the packet payload (ADC time samples), performs a simple spectrometer operation, and plots the data. With 16 complex time samples per 512-bit word, and the packetizer sending 128 words per packet, there are 2048 time samples per packet. The FFT size chosen for this spectrometer is 1024.
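
The packet sizing quoted above can be checked with a few lines of arithmetic. This sketch uses only the numbers stated in the text. The 32 bits per complex sample follows from dividing the word width by the samples per word, and assumes the word is fully packed with no padding.

```python
WORD_BITS = 512          # width of one packetizer word
SAMPLES_PER_WORD = 16    # complex time samples per word
WORDS_PER_PACKET = 128   # words the packetizer sends per packet
FFT_SIZE = 1024          # FFT length used by the listener

# Bits occupied by one complex sample (assuming no padding in the word)
bits_per_complex_sample = WORD_BITS // SAMPLES_PER_WORD

# Time samples carried in one packet
samples_per_packet = SAMPLES_PER_WORD * WORDS_PER_PACKET

# Sample data carried in one packet, excluding all headers
data_bytes_per_packet = WORDS_PER_PACKET * WORD_BITS // 8

# Number of complete FFT frames that fit in one packet
spectra_per_packet = samples_per_packet // FFT_SIZE

print(bits_per_complex_sample, samples_per_packet,
      data_bytes_per_packet, spectra_per_packet)
```

The 8192 bytes of sample data per packet exceed the common 1500-byte Ethernet MTU, which is why the 9000-byte jumbo MTU is needed on the receiving side.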

Before going through the code, it helps to go over what the scripts do and to have run them first. Knowing the overall flow beforehand makes it easier to follow what is presented.

The listener and catcher implement a simple state machine. The listener is started first and accepts configuration parameters as its arguments. These parameters are the RFSoC platform (casperfpga server) to connect to, the ADC port to capture data on, and the number of packets to catch in a sequence. The listener then configures the RFSoC and sets up the socket for the catcher to connect to.

This same information is presented by running tut_100g_listener.py with the -h switch:

./tut_100g_listener.py -h
Usage: tut_100g_listener.py <HOSTNAME_or_IP> [options]

-h, --help            show this help message and exit
-n NUMPKT, --numpkt=NUMPKT
                      Set the number of packets captured in sequence and
                      then sent to listener. Must be power of 2. default is 2**8
-s, --skip            Skip programming and begin to plot data
-b FPGFILE, --fpg=FPGFILE
                      Specify the fpg file to load
-a ADC_CHAN_SEL, --adc=ADC_CHAN_SEL
                      adc input to select values are 0,1,2, or 3. deafult is 0

The catcher is then started by passing the name of the Ethernet interface it is to connect to. It opens that interface and connects to the listener socket. The listener posts the number of packets to capture in a sequence, and the catcher waits for the listener to report that it is “ready” to start receiving packet sequences. After a sequence of packets has been sent, the catcher waits until the listener is “ready” again before catching and sending the next sequence of packets.
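
The “ready” handshake described above can be illustrated with a small self-contained sketch. It is not the code from tut_100g_catcher.py or tut_100g_listener.py. The message formats here (a 4-byte packet count, a one-byte ready token, fixed-size dummy packets) are hypothetical, and a local socket pair with a thread stands in for the two processes.

```python
import socket
import struct
import threading

READY = b"R"    # hypothetical one-byte "ready" token
PKT_LEN = 64    # hypothetical fixed size of each forwarded packet


def recv_exact(sock, n):
    """Read exactly n bytes from a stream socket."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def catcher(sock, n_sequences):
    # The listener first posts how many packets make up one sequence
    (numpkt,) = struct.unpack("!I", recv_exact(sock, 4))
    for seq in range(n_sequences):
        # Block until the listener reports it is ready for a sequence
        if recv_exact(sock, 1) != READY:
            raise ValueError("unexpected token from listener")
        for _ in range(numpkt):
            # Dummy data stands in for a packet captured from the NIC
            sock.sendall(bytes([seq]) * PKT_LEN)


def listener(sock, numpkt, n_sequences):
    sock.sendall(struct.pack("!I", numpkt))
    sequences = []
    for _ in range(n_sequences):
        sock.sendall(READY)  # report "ready", then collect one sequence
        sequences.append([recv_exact(sock, PKT_LEN) for _ in range(numpkt)])
    return sequences


def run_demo(numpkt=4, n_sequences=3):
    a, b = socket.socketpair()
    with a, b:
        worker = threading.Thread(target=catcher, args=(b, n_sequences))
        worker.start()
        result = listener(a, numpkt, n_sequences)
        worker.join()
    return result


if __name__ == "__main__":
    print([len(seq) for seq in run_demo()])
```

Because the catcher only sends after it has seen a ready token, the listener never receives part of a new sequence while it is still processing the previous one.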

The listener unpacks the sequence of packets, parsing each packet’s IPv4/UDP header, payload packet count header, and packet payload data. The data is then transformed using an FFT and averaged over all the packets received in the sequence, and the plot of the spectra is updated. After plotting, the listener reports that it is “ready” to process the next packet sequence. This exchange repeats back and forth between the listener and the catcher until terminated with a keyboard interrupt at the catcher (which also ends the listener).
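
The header parsing step can be sketched with the standard struct module. The Ethernet, IPv4, and UDP layouts below are the standard ones. The packet count header is an assumption: this sketch treats it as an 8-byte big-endian counter at the start of the UDP payload, which may not match the tutorial design, so count_len and the byte order should be adjusted to the actual packetizer. The function name parse_frame is hypothetical.

```python
import socket
import struct
from collections import namedtuple

ETH_HDR_LEN = 14         # dst MAC (6) + src MAC (6) + EtherType (2)
ETHERTYPE_IPV4 = 0x0800
IPPROTO_UDP = 17
UDP_HDR_LEN = 8

Parsed = namedtuple("Parsed", "src_ip dst_ip src_port dst_port pkt_count data")


def parse_frame(frame, count_len=8):
    """Parse a raw Ethernet frame; return None unless it is IPv4/UDP.

    count_len is the ASSUMED size in bytes of the packet count header.
    """
    if len(frame) < ETH_HDR_LEN + 20 + UDP_HDR_LEN:
        return None
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV4:
        return None

    ip_off = ETH_HDR_LEN
    version = frame[ip_off] >> 4
    ihl = (frame[ip_off] & 0x0F) * 4      # IPv4 header length in bytes
    if version != 4 or frame[ip_off + 9] != IPPROTO_UDP:
        return None
    src_ip = socket.inet_ntoa(bytes(frame[ip_off + 12:ip_off + 16]))
    dst_ip = socket.inet_ntoa(bytes(frame[ip_off + 16:ip_off + 20]))

    udp_off = ip_off + ihl
    src_port, dst_port, udp_len, _checksum = struct.unpack_from(
        "!HHHH", frame, udp_off)
    # The UDP length field covers the UDP header plus the payload
    payload = bytes(frame[udp_off + UDP_HDR_LEN:udp_off + udp_len])

    pkt_count = int.from_bytes(payload[:count_len], "big")  # assumed layout
    return Parsed(src_ip, dst_ip, src_port, dst_port,
                  pkt_count, payload[count_len:])
```

Filtering on the returned addresses and ports is one simple way to discard unrelated traffic seen on the interface before the sample data is processed.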

Running the Scripts

With the above in mind, let’s run them!

First start the listener. If you are using the .fpg file that you created in this tutorial, use the -b flag with the path to the .fpg. Otherwise, the prebuilt .fpg is used by default. An example using the prebuilt and connecting to an RFSoC with hostname rfsoc4x2, transmitting packets from ADC port 3, and setting the number of packets received in a sequence to 256 would be:

python tut_100g_listener.py rfsoc4x2 -n 256 -a 3

After starting the listener the following output should be reported:

using prebuilt fpg file at ../prebuilt/rfsoc4x2/rfsoc4x2_tut_100g_stream_rfdc.fpg
Connecting to rfsoc4x2
Programming FPGA with ../prebuilt/rfsoc4x2/rfsoc4x2_tut_100g_stream_rfdc.fpg...
done
setting capture on adc port 3
waiting for catcher to connect

At this point the catcher can be started. In the following the Ethernet interface that is opened is enp193s0f0.

python tut_100g_catcher.py enp193s0f0

A plot should then appear and continue to update with an increasing number of packet sequence counts reported in the title of the plot and by the catcher process. The catcher process will have also reported the configuration parameters received from the listener with a prompt to end the programs with a keyboard interrupt (Ctrl-c).

With an input on the RFSoC 4x2, and if all goes well, an example of the output should look like the following. Here a tone is injected starting at 500 MHz and stepping through the spectrum up to 1750 MHz.

As in the example spectrometer, a more interesting signal could be used at the input of the RFSoC. In the following, a wideband noise source is filtered to a passband from about 1280-1780 MHz with a tone present starting in that passband at 1520 MHz. The tone is then moved around that passband and ends back at 1520 MHz.

Conclusion

High-rate data transport is a critical component of digital radio astronomy instrumentation. This tutorial has demonstrated the functionality and use of the 100GbE yellow block on RFSoC platforms. Additionally, the manipulation of the output data from the RFDC with a simple packetization scheme has been demonstrated. More common packetizer implementations will vary in complexity, but the fundamentals lie in being able to manipulate data ordering and buffering. Capturing, parsing, and processing Ethernet packets received at the NIC from the FPGA is typically where data reduction in a science backend begins. This simple example provides a starting point for working with the fundamentals of capturing and processing Ethernet packets. When lossless data streams are required, it is more typical to use optimized, dedicated high-throughput pipelines. In these frameworks, the NIC and CPU perform the packet capturing functions and the computation is typically performed on a GPU. These functions are also distributed over many threads and optimized for performance and throughput. In this tutorial we did not focus on processor performance, but rather on the fundamentals that lead in that direction.

Appendix and Reference

Memory Map and Software Programmable Interface

This section of the tutorial does not provide a complete explanation of everything that can be done with the software interface. Instead, it briefly demonstrates some of the software programmable capabilities of the 100GbE core through casperfpga, which can be explored further by the casper-ite.

When using casperfpga the memory map for the platform can be accessed using the listdev() command. This was demonstrated and explained briefly in the Platform Getting Started tutorial. A quick example is as follows:

import casperfpga
rfsoc = casperfpga.CasperFpga('<ip_address_or_hostname>')
rfsoc.upload_to_ram_and_program('/path/to/design.fpg')
rfsoc.listdev()

Here, listdev() on the rfsoc CasperFpga object returns a list of all the registers in the memory map. With a 100GbE yellow block in the design, the registers that make up that core’s memory map will be present. These can be accessed using the normal read_uint/write_int casperfpga methods.

However, when a 100GbE core is present in the design, the casperfpga object builds a 100GbE object that abstracts working with the core by providing methods that manage the lower-level reads and writes. For example, with the tutorial design programmed on the FPGA, we can get a reference to the 100GbE object and configure the IP address and MAC after startup:

# get the eth core
eth = rfsoc.gbes['onehundred_gbe']

# get core information
ip = '10.17.16.61'
mac = 0x02a202000203

# configure core
eth.configure_core(mac, ip, 60000)

If ever in doubt, however, everything can be configured directly from the memory map:

rfsoc.write_int('gmac_reg_mac_address_l_0', (mac & 0xffffffff))
rfsoc.write_int('gmac_reg_mac_address_h_0', (mac >> 32) & 0xffff)
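
The register writes above split the 48-bit MAC address into a low 32-bit word and a high 16-bit word. The following sketch shows that split on its own, together with conversion from and to the usual colon-separated notation. The helper names are hypothetical and are not part of casperfpga.

```python
def mac_from_str(text):
    """Convert 'aa:bb:cc:dd:ee:ff' (or dash-separated) to a 48-bit integer."""
    return int(text.replace(":", "").replace("-", ""), 16)


def mac_to_str(mac):
    """Convert a 48-bit integer to colon-separated hex notation."""
    return ":".join("{:02x}".format((mac >> shift) & 0xFF)
                    for shift in range(40, -8, -8))


def split_mac(mac):
    """Return (high 16 bits, low 32 bits) of a 48-bit MAC address."""
    if not 0 <= mac < (1 << 48):
        raise ValueError("MAC address must fit in 48 bits")
    return (mac >> 32) & 0xFFFF, mac & 0xFFFFFFFF


mac = mac_from_str("02:a2:02:00:02:03")
high, low = split_mac(mac)
print(hex(high), hex(low), mac_to_str(mac))
```

The high and low values correspond to what is written to the gmac_reg_mac_address_h_0 and gmac_reg_mac_address_l_0 registers, respectively.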

The 100GbE also has an ARP table that can be programmed. The following is an example for how to do this:

# a configuration dictionary with IP/MAC key-value pairs
# (IP addresses are strings, MAC addresses are integers)
c = {
  'arp': {
    '10.17.16.10': 0x0c42a1a39a06,
    # ...
    # ...
    '10.17.16.61': 0x0c42a1a3992e
  }
}

# set arp table
for ip, mac in c['arp'].items():
  print("Configuring ip {:s} with mac {:x}".format(ip, mac))
  eth.set_single_arp_entry(ip, mac)
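
Before programming an ARP table, it can be useful to check the entries on the host side. The following is a minimal sketch using the standard ipaddress module. The function validate_arp is hypothetical, and it assumes the table uses dotted-quad strings for IP addresses and integers for MAC addresses.

```python
import ipaddress


def validate_arp(table):
    """Return a checked copy of an {ip_string: mac_int} ARP table.

    Raises ValueError for a malformed IPv4 address or for a MAC
    address that does not fit in 48 bits.
    """
    checked = {}
    for ip, mac in table.items():
        # IPv4Address raises a ValueError subclass on malformed input
        addr = ipaddress.IPv4Address(ip)
        if not 0 <= mac < (1 << 48):
            raise ValueError("MAC for {} does not fit in 48 bits".format(ip))
        checked[str(addr)] = mac
    return checked


arp = validate_arp({
    '10.17.16.10': 0x0c42a1a39a06,
    '10.17.16.61': 0x0c42a1a3992e,
})
for ip, mac in arp.items():
    print("ip {:s} -> mac {:012x}".format(ip, mac))
```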