Snitch Frontend

Snitch Frontend Role

The Snitch frontend (idma_inst64_top) is tightly coupled to the Snitch RISC-V core through custom Xdma ISA extensions. DMA transfers are launched directly from the instruction stream via the accelerator bus interface. This avoids the overhead of programming a memory-mapped configuration register file and enables single-cycle transfer submission.

Snitch Integration

Xdma Instruction Set

A DMA transfer requires three steps: (1) set the source and destination addresses (DMSRC, DMDST), (2) launch the transfer with a length and config (DMCPY/DMCPYI), (3) poll for completion (DMSTAT/DMSTATI). Optional instructions set 2D parameters (DMSTR, DMREP) and AXI user fields (DMUSER).

All DMA instructions that return a value write to rd (destination register). For example, the assembly syntax is DMCPYI rd, rs1, imm: rd receives the transfer ID and rs1 provides the length.

Instruction | Operands | Description
--- | --- | ---
DMSRC | rs1 (low 32 bits), rs2 (high bits) | Set source address
DMDST | rs1 (low 32 bits), rs2 (high bits) | Set destination address
DMCPYI | rd = transfer ID, rs1 = length, imm = {channel, config} | Launch transfer (immediate config); returns transfer ID in rd
DMCPY | rd = transfer ID, rs1 = length, rs2 = {channel, config} | Launch transfer (register config); returns transfer ID in rd
DMSTATI | rd = status value, imm = {channel, status_sel} | Query status (immediate); returns status in rd
DMSTAT | rd = status value, rs2 = {channel, status_sel} | Query status (register); returns status in rd
DMSTR | rs1 = src_stride, rs2 = dst_stride | Set 2D strides
DMREP | rs1 = repetitions | Set 2D repetition count
DMUSER | rs1, rs2 | Set AXI user field. When AxiUserWidth <= 32, only rs1 is used (lower bits). When AxiUserWidth > 32, rs1 provides bits [31:0] and rs2 provides the remaining upper bits
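DMSRC, DMDST and DMUSER take a value that may be wider than 32 bits as two register operands. The C sketch below shows one way host or runtime code could split a 64-bit value into the rs1/rs2 pair described in the table. The type and function names (`xdma_operands_t`, `xdma_split_addr`, `xdma_split_user`) are hypothetical and not part of any Snitch runtime.

```c
#include <stdint.h>

/* Hypothetical operand pair for Xdma instructions that take a wide value
 * in two 32-bit registers. */
typedef struct {
    uint32_t rs1; /* bits [31:0]       */
    uint32_t rs2; /* bits above bit 31 */
} xdma_operands_t;

/* DMSRC/DMDST: rs1 = low 32 bits, rs2 = high bits of the address. */
static xdma_operands_t xdma_split_addr(uint64_t addr) {
    xdma_operands_t ops;
    ops.rs1 = (uint32_t)addr;
    ops.rs2 = (uint32_t)(addr >> 32);
    return ops;
}

/* DMUSER: only rs1 is used when AxiUserWidth <= 32; otherwise rs1 holds
 * bits [31:0] and rs2 the remaining upper bits. Zeroing rs2 in the narrow
 * case is a choice of this sketch, since the hardware does not use it. */
static xdma_operands_t xdma_split_user(uint64_t user, unsigned axi_user_width) {
    xdma_operands_t ops;
    ops.rs1 = (uint32_t)user;
    ops.rs2 = (axi_user_width <= 32u) ? 0u : (uint32_t)(user >> 32);
    return ops;
}
```

For example, the address 0x1280000000 splits into rs1 = 0x80000000 and rs2 = 0x12, matching the `DMSRC a0, a1 # src_addr = {a1, a0}` convention used in the programming sequence.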

Status select values (DMSTAT/DMSTATI):

  • 0: Completed transfer ID — compare against the ID returned by DMCPY to check if a specific transfer has finished
  • 1: Next transfer ID — the ID that will be assigned to the next submitted transfer
  • 2: Busy flag — 1 if any transfer is in-flight on this channel
  • 3: Backend FIFO full flag — 1 if the request FIFO is full; software should wait before submitting more transfers
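The status values imply simple per-channel ID bookkeeping. The host-side C model below is a sketch under stated assumptions that are not taken from the RTL: IDs start at 1, `completed_id` starts at 0, and transfers retire in issue order. All names (`dma_chan_model_t`, `chan_issue`, `chan_done`, and so on) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one channel's transfer ID bookkeeping. */
typedef struct {
    uint32_t next_id;      /* status_sel 1: ID for the next DMCPY        */
    uint32_t completed_id; /* status_sel 0: ID of the last retired one   */
} dma_chan_model_t;

static void chan_reset(dma_chan_model_t *c) {
    c->next_id = 1u;      /* assumption: first transfer gets ID 1 */
    c->completed_id = 0u; /* assumption: nothing completed yet    */
}

/* DMCPY/DMCPYI: hand out the next ID. */
static uint32_t chan_issue(dma_chan_model_t *c) {
    return c->next_id++;
}

/* Backend retires the oldest outstanding transfer (in-order assumption). */
static void chan_retire(dma_chan_model_t *c) {
    c->completed_id++;
}

/* status_sel 2: busy while some issued ID has not retired yet. */
static bool chan_busy(const dma_chan_model_t *c) {
    return c->completed_id + 1u != c->next_id;
}

/* Polling predicate "completed_id >= tid", written as a wrap-safe signed
 * difference so it still works after the 32-bit counter wraps (valid while
 * fewer than 2^31 transfers are outstanding). */
static bool chan_done(const dma_chan_model_t *c, uint32_t tid) {
    return (int32_t)(c->completed_id - tid) >= 0;
}
```

The wrap-safe comparison in `chan_done` is a design choice of this sketch; a plain `completed_id >= tid` check behaves the same until the counter wraps.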

Config field (DMCPY/DMCPYI):

  • Bit 0: Reserved
  • Bit 1: Enable 2D mode (use previously set strides/reps). If 2D mode is enabled but DMSTR/DMREP were not called since the last transfer, the previously set stride and repetition values are reused. On reset, these default to zero
  • Bits 4:2: Channel select — $clog2(NumChannels) bits wide, remaining upper bits are zero-extended. For the common single-channel case (NumChannels=1), these bits are unused and only bit 1 (2D enable) matters
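The config encoding can be captured in a few C helpers. This is a sketch assuming exactly the bit positions listed above (bit 0 kept at zero, bit 1 for 2D mode, bits 4:2 for the channel); the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define XDMA_CFG_2D_BIT     1u   /* bit 1: enable 2D mode             */
#define XDMA_CFG_CHAN_SHIFT 2u   /* bits 4:2: channel select          */
#define XDMA_CFG_CHAN_MASK  0x7u /* 3-bit field, so at most 8 channels */

/* Build the config value for DMCPYI (immediate) or DMCPY (register).
 * Returns false if the channel does not exist or does not fit in bits 4:2. */
static bool xdma_pack_cfg(unsigned channel, bool enable_2d,
                          unsigned num_channels, uint32_t *cfg) {
    if (channel >= num_channels || channel > XDMA_CFG_CHAN_MASK) {
        return false;
    }
    /* Bit 0 (reserved) is left at zero. */
    *cfg = ((uint32_t)channel << XDMA_CFG_CHAN_SHIFT) |
           ((uint32_t)(enable_2d ? 1u : 0u) << XDMA_CFG_2D_BIT);
    return true;
}

/* Decode helpers, e.g. for checking a value before issuing it. */
static unsigned xdma_cfg_channel(uint32_t cfg) {
    return (cfg >> XDMA_CFG_CHAN_SHIFT) & XDMA_CFG_CHAN_MASK;
}

static bool xdma_cfg_is_2d(uint32_t cfg) {
    return ((cfg >> XDMA_CFG_2D_BIT) & 1u) != 0u;
}
```

With a single channel, `xdma_pack_cfg(0, true, 1, &cfg)` yields 0b010, the value passed to DMCPYI in the programming sequence.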

Parameters

For most Snitch cluster integrations, NumChannels=1 and NumAxInFlight=3 are standard. Increase NumChannels only if you need several independent DMA channels that can accept and process transfers concurrently; all channels still share a single AXI manager port.

Parameter | Description
--- | ---
AxiDataWidth | AXI data bus width
AxiAddrWidth | AXI address width
AxiUserWidth | AXI user signal width (max 64 bits)
AxiIdWidth | AXI ID width
NumAxInFlight | Number of in-flight AXI transactions (default: 3)
DMAReqFifoDepth | Depth of the request FIFO between frontend and midend (default: 3)
NumChannels | Number of independent DMA channels, each with its own backend + ND midend (default: 1)
DMATracing | Enable DMA trace file generation for debugging
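For host-side tooling such as a configuration generator, the parameter set could be mirrored in a C struct carrying the documented defaults. The sketch below is hypothetical (`idma_inst64_params_t`, `IDMA_INST64_PARAMS`, `clog2` and `params_valid` are not part of iDMA). Two of its checks follow from this page: AxiUserWidth is at most 64 bits, and the channel index must fit in config bits 4:2, so $clog2(NumChannels) <= 3. The non-zero checks and the tracing default are assumptions.

```c
#include <stdbool.h>

/* Hypothetical C mirror of the idma_inst64_top parameters. */
typedef struct {
    unsigned axi_data_width;
    unsigned axi_addr_width;
    unsigned axi_user_width;     /* at most 64 bits */
    unsigned axi_id_width;
    unsigned num_ax_in_flight;   /* default 3 */
    unsigned dma_req_fifo_depth; /* default 3 */
    unsigned num_channels;       /* default 1 */
    bool     dma_tracing;        /* off here: an assumption */
} idma_inst64_params_t;

/* The widths have no documented default, so the caller supplies them. */
#define IDMA_INST64_PARAMS(data_w, addr_w, user_w, id_w) \
    { (data_w), (addr_w), (user_w), (id_w), 3u, 3u, 1u, false }

/* $clog2 equivalent: ceil(log2(n)), with clog2(1) == 0. */
static unsigned clog2(unsigned n) {
    unsigned bits = 0u;
    while (bits < 31u && (1u << bits) < n) {
        bits++;
    }
    return bits;
}

static bool params_valid(const idma_inst64_params_t *p) {
    if (p->axi_user_width > 64u) {
        return false; /* DMUSER carries at most 64 bits (rs1 + rs2) */
    }
    if (p->num_channels == 0u || clog2(p->num_channels) > 3u) {
        return false; /* channel index must fit in config bits 4:2 */
    }
    if (p->num_ax_in_flight == 0u || p->dma_req_fifo_depth == 0u) {
        return false; /* assumption: zero-depth buffering is not useful */
    }
    return true;
}
```

For example, `IDMA_INST64_PARAMS(64u, 48u, 1u, 3u)` (arbitrary example widths) gives the standard single-channel configuration and passes `params_valid`; raising `num_channels` to 9 fails the check.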

Programming Sequence

The following assembly sequence demonstrates a complete 2D DMA transfer followed by completion polling:

# 1. Set source address
DMSRC a0, a1 # src_addr = {a1, a0}
# 2. Set destination address
DMDST a2, a3 # dst_addr = {a3, a2}
# 3. Set 2D parameters (only needed for 2D transfers)
DMSTR a4, a5 # src_stride = a4, dst_stride = a5
DMREP a6 # reps = a6
# 4. Launch transfer (2D mode on channel 0)
DMCPYI t0, a7, 0b010 # t0 = transfer_id, config[1]=1 (2D mode)
# 5. Poll for completion
loop:
DMSTATI t1, 0b000 # t1 = completed_id on channel 0
bltu t1, t0, loop # Wait until completed_id >= transfer_id (IDs are unsigned)
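To make the 2D semantics concrete, the following host-side C reference model performs the copy that the sequence above requests. It assumes the usual 2D interpretation: `reps` rows of `length` bytes, where row i is read from src + i * src_stride and written to dst + i * dst_stride. It is a sketch, not the RTL; `dma_2d_model` is a hypothetical name, and the model simply copies nothing when reps == 0, which may not match the hardware.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Host-side reference model of a 2D transfer (not the RTL).
 * Register mapping in the sequence above: a7 = length,
 * a4 = src_stride, a5 = dst_stride, a6 = reps. */
static void dma_2d_model(uint8_t *dst, const uint8_t *src, size_t length,
                         size_t src_stride, size_t dst_stride, size_t reps) {
    for (size_t i = 0; i < reps; i++) {
        /* One row (one 1D burst) per repetition. */
        memcpy(dst + i * dst_stride, src + i * src_stride, length);
    }
}
```

For instance, extracting a 3-row by 2-byte tile from a row-major array with 4-byte rows into a packed buffer uses length = 2, src_stride = 4, dst_stride = 2 and reps = 3.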

Internal Architecture

The idma_inst64_top module instantiates NumChannels independent backends, each paired with an ND midend (NumDim=2, BufferDepth=3). The frontend instruction decoder fills an idma_nd_req_t struct from the instruction stream and routes it to the selected channel’s request FIFO. A per-channel transfer ID generator tracks issue and retire events. Each backend produces separate AXI read and write manager ports. The axi_rw_join module merges them into a single AXI manager port for connection to the SoC interconnect.

With NumChannels > 1, the channel is selected via the config field in DMCPY/DMCPYI (bits 4:2). Channels operate independently: one can be busy while another accepts new transfers. The AXI ports of all channels are merged via axi_rw_join, so the channels share bus bandwidth.
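The decode-and-route step can be approximated in software. The C model below is a simplified sketch: `nd_req_t` is a hypothetical stand-in for `idma_nd_req_t` (whose real fields are not listed on this page), `NUM_CHANNELS = 2` is an example value, and rejecting a request when a FIFO is full is a modeling choice rather than the hardware's back-pressure behavior. It does illustrate two documented properties: staged stride and repetition values persist across DMCPYs, and a full FIFO on one channel does not block another.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_CHANNELS   2u /* example value for NumChannels */
#define REQ_FIFO_DEPTH 3u /* DMAReqFifoDepth default       */

/* Simplified, hypothetical stand-in for idma_nd_req_t (2 dimensions). */
typedef struct {
    uint64_t src, dst, src_stride, dst_stride;
    uint32_t length, reps;
    bool     is_2d;
} nd_req_t;

typedef struct { /* per-channel request FIFO (ring buffer) */
    nd_req_t slots[REQ_FIFO_DEPTH];
    unsigned head, count;
} req_fifo_t;

/* Staged values written by DMSRC/DMDST/DMSTR/DMREP persist until overwritten. */
typedef struct {
    uint64_t   src, dst, src_stride, dst_stride;
    uint32_t   reps;
    req_fifo_t fifo[NUM_CHANNELS];
} frontend_model_t;

static void fe_dmsrc(frontend_model_t *f, uint32_t lo, uint32_t hi) {
    f->src = ((uint64_t)hi << 32) | lo;
}

static void fe_dmdst(frontend_model_t *f, uint32_t lo, uint32_t hi) {
    f->dst = ((uint64_t)hi << 32) | lo;
}

static void fe_dmstr(frontend_model_t *f, uint32_t src_stride, uint32_t dst_stride) {
    f->src_stride = src_stride;
    f->dst_stride = dst_stride;
}

static void fe_dmrep(frontend_model_t *f, uint32_t reps) {
    f->reps = reps;
}

/* DMCPY/DMCPYI: snapshot the staged values into a request and push it to the
 * channel in config bits 4:2. Returns false if that channel does not exist or
 * its FIFO is full (the condition status_sel 3 reports). */
static bool fe_dmcpy(frontend_model_t *f, uint32_t length, uint32_t cfg) {
    unsigned ch = (cfg >> 2) & 0x7u;
    if (ch >= NUM_CHANNELS || f->fifo[ch].count == REQ_FIFO_DEPTH) {
        return false;
    }
    req_fifo_t *q = &f->fifo[ch];
    nd_req_t *r = &q->slots[(q->head + q->count) % REQ_FIFO_DEPTH];
    r->src = f->src;
    r->dst = f->dst;
    r->src_stride = f->src_stride;
    r->dst_stride = f->dst_stride;
    r->length = length;
    r->reps = f->reps;
    r->is_2d = ((cfg >> 1) & 1u) != 0u;
    q->count++;
    return true;
}

/* Backend side: the channel's ND midend takes the oldest request. */
static bool fe_pop(frontend_model_t *f, unsigned ch, nd_req_t *out) {
    if (ch >= NUM_CHANNELS || f->fifo[ch].count == 0u) {
        return false;
    }
    req_fifo_t *q = &f->fifo[ch];
    *out = q->slots[q->head];
    q->head = (q->head + 1u) % REQ_FIFO_DEPTH;
    q->count--;
    return true;
}
```

In this model, issuing a second 2D DMCPY after changing only the source address reuses the strides and repetition count staged for the first one, mirroring the reuse behavior described for config bit 1.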

Source Files

  • src/frontend/inst64/idma_inst64_top.sv — Top-level module
  • src/frontend/inst64/idma_inst64_snitch_pkg.sv — Instruction encodings
  • src/frontend/inst64/idma_inst64_events.sv — Performance event counters