DRAM¶

Block: DRAM (`dram`)
Block Author: Pierre Yves Droz (BEE2), David George(ROACH)
Document Author: Jason Manley, Laura Spitler

Summary¶

This block interfaces to the BEE2+ROACH’s 1GB DDR2 ECC DRAM modules. Commands that are clocked-in are executed with an unknown delay, however, execution order is maintained. The underlying controller for the BEE2 and the ROACH are different and not all features are supported across both platforms (see below for details).

Parameter Variable Description
DIMM dimm Selects which physical DIMM to use (four per user FPGA).
Data Type arith_type Inform Simulink how it should interpret the stored data.
Data binary point bin_pt Inform Simulink how it should interpret the stored data - specifically, the bit position in the word where it should place the binary point.
Datapath clock rate (MHz) ip_clock Clock rate for DRAM. Default: 200MHz (400DDR).
Sample period sample_period Is significant for clocking the block. Default: 1
Simulate DRAM using ModelSim use_sim Requires the addition of the ModelSim block at the top level of the design. Used to simulate DRAM block only.
Lesser Simulation Address Width ??? If the ModelSim simulation is disabled a very basic simulation using BRAMs will be performed. This parameter selects the address width to the bram memory and cannot exceed 20 (or so) bits.
Enable bank management bank_mgt Advise leave off for BEE2. Allows multiple banks to be open at the same time. Always enabled on ROACH (setting ignored).
Use wide data bus (288 bits) wide_data Burst writes require 288 bits. If not selected, provide a 144 bit bus which needs to be supplied with data in consecutive clock cycles to form the 288 bits. 288 bit bus can make for challenging routing! Not implemented on ROACH.
Use half-burst half_burst Only store 144 bits per burst (wastes half capacity as the second 144 bits are unusable). If enabled, requires at least two clock cycles to store 144 bits. Second clock cycle’s data is forfeited. Not implemented on ROACH.
Use BRAM FIFOs (ROACH only) bram_fifos Use blockRAM FIFO’s in DRAM controller. This is required only if the application clock rate is less than the dram clock rate to avoid overflows on the read interface. By default distributed RAM will be used which exhibits better timing performance and reduces BRAM resources.
Include CPU Interface (ROACH only) use_sniffer Includes the CPU interface which allows direct DRAM access from software. Including this may introduce timing issues at very high DRAM controller frequencies.

Ports¶

Port Dir Data Type Description
rst in boolean Resets the block when pulsed high
address in UFix_32_0 A signal which accepts the address. See below for details.
data_in in 144 or 288 bit unsigned Accepts data to be saved to DRAM.
wr_be in UFix_18_0 or UFix_36_0 Selects bytes for writing (write byte enable). It is normally 18 bits wide for a 144 bit data bus, but if 288 bit data bus is selected, this becomes a 36 bit variable.
RWn in boolean Selects read or not-write. `1` for read, `0` for write.
cmd_tag in UFix_32_0 Accepts a user-defined tag for labelling entered commands. Not implemented on ROACH.
cmd_valid in boolean Clocks data into the command buffer.
rd_ack out boolean Used to acknowledge that the last `data_out` value has been read.
cmd_ack out boolean Acknowledges that the last command was accepted (when buffer is full, will not accept additional commands). ROACH: Pin HI unless an attempt to clock in a command failed
data_out out UFix_144_0 Outputs data from DRAM, 144 bits at a time. Reads are in groups of 288 bits (ie, 2 clocks).
rd_tag out UFix_32_0 Outputs the identifier for the data on `data_out` (as submitted on `cmd_tag` when the command was issued). Not implemented on ROACH.
rd_valid out boolean Indicates that the data on `data_out` is valid.

Description¶

BEE2 Specific Info¶

Core details about the BEE2 memory interface can be found at the (static) BEE2 wiki:

http://bee2.eecs.berkeley.edu/wiki/Bee2Memory.html

The 1GB storage DIMMs have 18 512Mbit chips each. They are arranged as 64Mbit x 8 (bus width) x 9 (chips per side/rank) x 2 (sides/ranks). Two ranks (sides) per module with the 9 memory ICs connected in parallel, each holding 8 bits of the data bus width (72 bits). Each IC has four banks, with 13 bits of row addressing and 10 bits for column addressing. Normally, each address would hold 64 bits + parity (8 bits), however, the BEE2 uses the parity space as additional data storage giving a capacity of 1.125 GB per DIMM module.

From Micron’s datasheet on the MT47H64M8CD-37E (as used by CASPER in its Crucial 1GB CT12872AA53E modules): The double data rate architecture is essentially a 4n-prefetch architecture, with an interface designed to transfer two data words per clock cycle at the I/O balls. A single read or write access effectively consists of a single 4n-bit-wide, one-clock-cycle data transfer at the internal DRAM core and four corresponding n-bit-wide, one-half-clock-cycle data transfers at the I/O balls.

Reads and writes must thus occur four-at-a-time. 4 x 72bits = 288 bits. Although the mapping of the logical to physical addressing is abstracted from the user, it is useful to know how the DRAM block’s address bus is derived, as it impacts performance:

Column 12 $rightarrow$ 3
Rank 13
Row 27 $rightarrow$ 14
Bank 29 $rightarrow$ 28
not used 31 $rightarrow$ 30

Each group of 8 addresses selects a 144 bit logical location (the lowest 3 bits are ignored). For example, address `0x00` through `0x7` all address the same 144 bit location. To address consecutive locations, increment the address port by eight. There are thus a total of 227 possible addresses. The block supports 2GB DIMMs (UNCONFIRMED) since 14 bits of addressing are reserved for row selection. The 1GB DIMMs using Micron 512Mb chips, however, only use 13 bits for row selection which results in 226 possible address locations. Care should be taken when addressing the 1GB DIMMS as bit 27 of the address range is not valid. However, bits 28 and 29 are mapped. Since bit 27 is ignored, it results in overlapping memory spaces.

Data bus width¶

The BEE2 uses ECC DRAM, however, the parity bits are used for data storage rather than parity storage. Thus, the data bus is 72 bits wide instead of the usual 64 bits.

The memory module has a DDR interface requiring two reads or writes per RAM clock cycle (~200MHz), thus requiring the user to provide 144 bits per clock cycle. Furthermore, as outlined above, data has to be captured in batches of 288 bits. This can be done in one of two ways: in two consecutive blocks of 144 bits, or over a single 288 bit-wide bus. This is selectable as a mask parameter. If half-burst is selected, only a 144 bit input is required. 288 bits are still written to DRAM, but the second 144 bits are not specified. Thus, half of the DRAM capacity is unusable.

ROACH Specific Info¶

The ROACH DRAM infrastructure currently doesn’t support half burst and wide data modes. Bank management is always enabled. Tag buffers are not implemented. The DRAM controller clock rate can be one of the following: 150, 200, 266, 300 or 333. If a frequency other than these is provided the default of 266 will be used. The dram controller has been known to work at 300MHz.

Interfacing Details¶

To write data into the DRAM, ‘RWn’ is held low, ‘cmd_valid’ is held high for a minimum of two FPGA clocks, and the ‘address’ port is held constant for both clock cycles. For example, to write into addresses 0x00 and 0x01, keep the address at 0x00 for both clocks. To read data out of the DRAM, hold ‘RWn’ high, keep the address constant for two FPGA clock cycles, and toggle the ‘cmd_valid’ pin every clock. Note that a new word will be available on the ‘data_out’ pin on every clock cycle. ‘rd_valid’ will frame valid output data some indeterminate number of clock cycles after the read ‘cmd_valid’ toggles. ‘cmd_ack’ is high unless an attempt to write a command into the input FIFO failed, at which point it will go low synchronously with the issuing of the failed command.

Many ROACHs have been shipped with 1 GB dual rank DIMMs by default. The current DRAM controller is not able to handle multiple ranks, so when a dual-rank DIMM is installed on the board, only half the memory is available. In order to use the full 1 GB, a single rank DIMM is needed, or in principle a dual rank 2 GB module.

Note that on the ROACH all of the oddities of the DRAM addressing specified above for the BEE2 version are taken care of for you, so you can just directly address locations 0 to (2^30 / 16) = 2^26 in the hardware.

DRAM CPU interface¶

If the block mask was set to include the CPU interface, the DRAM can be accessed by bytes through BORPH through ‘dram_memory’. The width of the CPU interface is only 128 bits (16 bytes), which results in discrepancy between hardware and CPU address. After every 64 bits, there are 8 ECC bits not visible to the CPU. For example bytes 0x00-0x07 in the DRAM are seen as 0x00-0x07 in the CPU, byte 0x08 in the DRAM is not visible to the CPU, and byte 0x09 in the DRAM is seen as byte 0x08 in the CPU.

Only 64MB of DRAM can be mapped into the ‘dram_memory’ register at any given time. You can select which 64MB segment is mapped into the ‘dram_memory’ register though the first 32-bit word of the ‘dram_controller’ register. For example, to access the first 64MB chunk of DRAM write 0x0 into this register and for the second 0x1.

The DRAM is most easily accessed using the KATCP function “read_dram”.

The second 32-bit word in the ‘dram_controller’ register indicates the DRAM controller ready flag. This value stores will be 0x1 if the controller is operational. If it is not your DRAM will not operate at all. Typical problems causing this would include using an unsupported RDIMM.

Example models¶

1) David George’s million channel ROACH spectrometer (“buf” block): rmspec.mdl

2) Laura Spitler’s simple design that reads and writes a counter into the DRAM: Dram roach rwramp.mdl

3) Jason Manley’s DRAM counter example: Dram counter test 10 1.gz

4) Tim Madden’s DRAM streaming output design (April 2015) https://github.com/argonnexraydetector/RoachFirmPy

Performance Tips¶

The performance of the DRAM block is dependent on the relative location of the addressed data and whether or not the mode (read/write) is changed. For example, consecutive column addresses can be written without delay, however, changing rows or banks incur delay penalties. See above for the address bit assignment.

To obtain optimum performance, it is recommended that the least significant bits be changed first (ie address the memory from `0x0000000` through to address `0x20000000` on the BEE2). This will increment column addresses first, followed by rank change, both of which incur little delay. Changing rows or banks can take twice as long. Further information can be found in the DRAM module’s datasheet (Micron MT47H64M8 on the BEE2).

Changing the mode(read/write) results in large delays, so it is recommended that read and writes be done in bursts into consecutive addresses. For a fabric clock speed of 200 MHz and DRAM speed of 266 MHz, a burst length of at least 32 words is recommended.

Bank management allows for three banks to be open simultaneously, reducing the overhead when switching between these banks. This feature is always enabled on ROACH, but YMMV with the BEE2 controller.