System Integration

Integration Overview

This guide covers the practical steps for integrating iDMA into a system-on-chip: choosing a frontend/midend/backend combination, instantiating the type macros, wiring the bus interfaces, and setting the key parameters.

Integration Steps

1. Choose Your Stack

Select a frontend, midend, and backend based on your requirements:

  • Backend: Pick the variant matching your bus protocols — see the variant matrix. If your SoC uses AXI4 throughout, rw_axi is the standard choice. Use r_obi_w_axi when reading from OBI-attached memory but writing via AXI. The Occamy variants (r_obi_rw_init_w_axi, r_axi_rw_init_rw_obi) add the INIT protocol for efficient memory zeroing. The backend page also covers legalizer splitting rules and error handling constraints.
  • Frontend: Choose based on your SoC’s control interface (see the frontend comparison).
  • Midend: Use the ND midend for 2D/3D transfers, RT midend for periodic transfers, or skip the midend entirely for 1D-only systems
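
To make the midend choice concrete, here is a conceptual sketch of what a 2D transfer amounts to: one row-sized 1D transfer repeated several times, with the address advanced by a stride on each repetition. The package name nd_sketch_pkg, the function row_addr, and the 64-bit address width are hypothetical and are not part of iDMA. The actual ND midend does this in hardware using its own request format.

```systemverilog
// Conceptual model only: names and widths are illustrative, not iDMA's.
package nd_sketch_pkg;

  typedef logic [63:0] addr_t;

  // Start address of repetition `rep` of a strided transfer:
  // every repetition advances the base address by one stride.
  function automatic addr_t row_addr(
    input addr_t       base,
    input addr_t       stride,
    input int unsigned rep
  );
    return base + stride * addr_t'(rep);
  endfunction

endpackage
```

With a base of 64'h1000 and a stride of 64'h400, repetitions 0 to 3 start at 64'h1000, 64'h1400, 64'h1800, and 64'h1C00. A 1D-only system never needs this address stepping, which is why it can omit the midend.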

2. Define Types

Use the convenience macros from typedef.svh to define all required types in one shot:

    `include "idma/typedef.svh"

    // Define address, data, and ID types
    typedef logic [AddrWidth-1:0]  addr_t;
    typedef logic [DataWidth-1:0]  data_t;
    typedef logic [IdWidth-1:0]    id_t;
    typedef logic [TFLenWidth-1:0] tf_len_t;

    // 1D request/response types (expands options_t and err_payload_t internally)
    `IDMA_TYPEDEF_FULL_REQ_T(idma_req_t, id_t, addr_t, tf_len_t)
    `IDMA_TYPEDEF_FULL_RSP_T(idma_rsp_t, addr_t)

    // ND request type (if using ND midend)
    typedef logic [RepWidth-1:0]    reps_t;
    typedef logic [StrideWidth-1:0] strides_t;
    `IDMA_TYPEDEF_FULL_ND_REQ_T(idma_nd_req_t, idma_req_t, reps_t, strides_t)

IDMA_TYPEDEF_FULL_REQ_T is a convenience wrapper that internally invokes IDMA_TYPEDEF_OPTIONS_T and IDMA_TYPEDEF_REQ_T, so the intermediate types are defined automatically. The non-FULL_ macros (IDMA_TYPEDEF_REQ_T, IDMA_TYPEDEF_OPTIONS_T) each define a single type and require you to supply the intermediate types yourself. Use the FULL_ variants unless you need custom intermediate types.
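
If you do need a custom intermediate type, the request side can be built from the two non-FULL_ macros instead. The fragment below is a sketch under assumptions. The argument order of both macros is assumed rather than taken from this guide, so check it against the typedef.svh in your checkout. The package name and the widths are hypothetical.

```systemverilog
// Sketch only: macro argument order is an assumption, verify in typedef.svh.
`include "idma/typedef.svh"

package my_idma_types_pkg; // hypothetical package name

  // Illustrative widths, not recommendations
  localparam int unsigned AddrWidth  = 32'd64;
  localparam int unsigned IdWidth    = 32'd4;
  localparam int unsigned TFLenWidth = 32'd32;

  typedef logic [AddrWidth-1:0]  addr_t;
  typedef logic [IdWidth-1:0]    id_t;
  typedef logic [TFLenWidth-1:0] tf_len_t;

  // Step 1: define the intermediate options type explicitly
  // (assumed arguments: type name, ID type)
  `IDMA_TYPEDEF_OPTIONS_T(options_t, id_t)

  // Step 2: build the 1D request type on top of it
  // (assumed arguments: type name, length type, address type, options type)
  `IDMA_TYPEDEF_REQ_T(idma_req_t, tf_len_t, addr_t, options_t)

endpackage
```

To customize, replace the first macro call with your own options_t definition before invoking IDMA_TYPEDEF_REQ_T.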

3. Instantiate

All iDMA modules use a single clock (clk_i) and synchronous active-low reset (rst_ni). The clock must be the same for all three layers (frontend, midend, backend) and the attached memory system.

Wire the three layers together. The key connections are:

  • Frontend dma_req_o / req_valid_o / req_ready_i -> Midend ND request input
  • Midend burst_req_o / burst_req_valid_o / burst_req_ready_i -> Backend request input
  • Backend idma_rsp_o / rsp_valid_o / rsp_ready_i -> back through midend to frontend

The following skeleton shows how the three layers connect. Signal types come from the macros defined in step 2 above — fe_req is idma_nd_req_t, be_req is idma_req_t:

    // Frontend -> Midend -> Backend

    // Register frontend: accepts register accesses, emits ND requests
    idma_reg64_2d #(
        .NumRegs    ( 1             ),
        .NumStreams ( 1             ),
        .reg_req_t  ( reg_req_t     ),
        .reg_rsp_t  ( reg_rsp_t     ),
        .dma_req_t  ( idma_nd_req_t )
    ) i_frontend (
        .clk_i,
        .rst_ni,
        .reg_req_i   ( dma_reg_req  ),
        .reg_rsp_o   ( dma_reg_rsp  ),
        .dma_req_o   ( fe_req       ),
        .req_valid_o ( fe_valid     ),
        .req_ready_i ( fe_ready     ),
        .dma_rsp_i   ( fe_rsp       ),
        .rsp_valid_i ( fe_rsp_valid ),
        .rsp_ready_o ( fe_rsp_ready )
    );

    // ND midend: splits each ND request into 1D burst requests
    idma_nd_midend #(
        .NumDim        ( 2             ),
        .addr_t        ( addr_t        ),
        .idma_req_t    ( idma_req_t    ),
        .idma_rsp_t    ( idma_rsp_t    ),
        .idma_nd_req_t ( idma_nd_req_t ),
        .RepWidths     ( 32'd32        )
    ) i_midend (
        .clk_i,
        .rst_ni,
        .nd_req_i          ( fe_req       ),
        .nd_req_valid_i    ( fe_valid     ),
        .nd_req_ready_o    ( fe_ready     ),
        .nd_rsp_o          ( fe_rsp       ),
        .nd_rsp_valid_o    ( fe_rsp_valid ),
        .nd_rsp_ready_i    ( fe_rsp_ready ),
        .burst_req_o       ( be_req       ),
        .burst_req_valid_o ( be_valid     ),
        .burst_req_ready_i ( be_ready     ),
        .burst_rsp_i       ( be_rsp       ),
        .burst_rsp_valid_i ( be_rsp_valid ),
        .burst_rsp_ready_o ( be_rsp_ready )
    );

    // Backend: executes 1D requests on the bus
    idma_backend_rw_axi #(
        .DataWidth  ( DataWidth  ),
        .AddrWidth  ( AddrWidth  ),
        .idma_req_t ( idma_req_t ),
        .idma_rsp_t ( idma_rsp_t ),
        .axi_req_t  ( axi_req_t  ),
        .axi_rsp_t  ( axi_rsp_t  )
    ) i_backend (
        .clk_i,
        .rst_ni,
        .idma_req_i  ( be_req       ),
        .req_valid_i ( be_valid     ),
        .req_ready_o ( be_ready     ),
        .idma_rsp_o  ( be_rsp       ),
        .rsp_valid_o ( be_rsp_valid ),
        .rsp_ready_i ( be_rsp_ready ),
        .busy_o      ( busy         )
        // AXI read/write bus ports omitted: they are variant-specific,
        // see the generated module for the full port list
    );
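
The skeleton leaves its interconnect signals undeclared. A minimal set of declarations might look like the fragment below, which belongs in the same wrapper module after the type definitions from step 2. Two points are assumptions rather than statements from this guide: that the response path uses idma_rsp_t on both sides of the midend, and that busy_o has type idma_pkg::idma_busy_t. Check both against the port lists of the modules you instantiate.

```systemverilog
// Declarations assumed by the skeleton; fragment for the wrapper module body.

// Register-bus side of the frontend
reg_req_t     dma_reg_req;
reg_rsp_t     dma_reg_rsp;

// Frontend <-> midend: ND requests down, responses back up
idma_nd_req_t fe_req;
logic         fe_valid, fe_ready;
idma_rsp_t    fe_rsp;            // assumption: response type is idma_rsp_t
logic         fe_rsp_valid, fe_rsp_ready;

// Midend <-> backend: 1D requests down, responses back up
idma_req_t    be_req;
logic         be_valid, be_ready;
idma_rsp_t    be_rsp;
logic         be_rsp_valid, be_rsp_ready;

// Backend status; assumption: busy_o is of type idma_pkg::idma_busy_t
idma_pkg::idma_busy_t busy;
```

Each valid/ready pair is driven by exactly one module on each side, so no extra glue logic is needed between the layers.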

4. Set Parameters

Recommended parameter presets:

Parameter            Minimum Area         Balanced          High Throughput
DataWidth            32                   64                256–512
BufferDepth          2                    3                 3
NumAxInFlight        2                    3                 4–8
MemSysDepth          0                    0                 8–16
CombinedShifter      1                    0                 0
RAWCouplingAvail     0                    1                 1
HardwareLegalizer    0                    1                 1
ErrorCap             NO_ERROR_HANDLING    ERROR_HANDLING    ERROR_HANDLING

Minimum Area sacrifices throughput for gate count — single shifter, no coupling, software legalization. Balanced adds hardware legalization and coupling for correct-by-default behavior. High Throughput uses deep FIFOs and wide buses to saturate memory bandwidth.
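
One way to keep these presets in one place is a small package of constants that the wrapper reads its parameter values from. The sketch below is hypothetical: the package, the struct, and the local error_cap_e enum are illustrative stand-ins, and a real integration would use the enum type provided by iDMA. Where the table gives a range, the High Throughput preset arbitrarily picks the lower end.

```systemverilog
// Hypothetical preset package; not part of iDMA.
package idma_preset_sketch_pkg;

  // Local stand-in for iDMA's error-capability enum
  typedef enum logic { NO_ERROR_HANDLING, ERROR_HANDLING } error_cap_e;

  typedef struct packed {
    int unsigned DataWidth;
    int unsigned BufferDepth;
    int unsigned NumAxInFlight;
    int unsigned MemSysDepth;
    bit          CombinedShifter;
    bit          RAWCouplingAvail;
    bit          HardwareLegalizer;
    error_cap_e  ErrorCap;
  } idma_preset_t;

  localparam idma_preset_t MinimumArea = '{
    DataWidth: 32, BufferDepth: 2, NumAxInFlight: 2, MemSysDepth: 0,
    CombinedShifter: 1'b1, RAWCouplingAvail: 1'b0, HardwareLegalizer: 1'b0,
    ErrorCap: NO_ERROR_HANDLING
  };

  localparam idma_preset_t Balanced = '{
    DataWidth: 64, BufferDepth: 3, NumAxInFlight: 3, MemSysDepth: 0,
    CombinedShifter: 1'b0, RAWCouplingAvail: 1'b1, HardwareLegalizer: 1'b1,
    ErrorCap: ERROR_HANDLING
  };

  // Lower end of each range in the table (256–512, 4–8, 8–16)
  localparam idma_preset_t HighThroughput = '{
    DataWidth: 256, BufferDepth: 3, NumAxInFlight: 4, MemSysDepth: 8,
    CombinedShifter: 1'b0, RAWCouplingAvail: 1'b1, HardwareLegalizer: 1'b1,
    ErrorCap: ERROR_HANDLING
  };

endpackage
```

A wrapper could then pass, for example, idma_preset_sketch_pkg::Balanced.DataWidth to the backend's DataWidth parameter, so switching presets is a one-line change.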

Real-World Examples

The following SoCs provide canonical integration examples spanning different bus protocols, data widths, and frontend styles.

Cheshire

Repo          pulp-platform/cheshire
Frontend      idma_reg64_{1d,2d}
Midend        ND (2D, conditional)
Backend       rw_axi
Bus           AXI4
Data Width    64-bit
Key File      hw/cheshire_idma_wrap.sv

Start here if your SoC uses a CVA6 core with AXI4. Cheshire is a CVA6-based Linux-capable SoC built around an AXI4 fabric. Its iDMA instance supports conditional 1D/2D mode, making it a good reference for register-frontend integrations with optional multi-dimensional transfers.

Croc

Repo          pulp-platform/croc
Frontend      Custom OBI register interface
Midend        ND (2D)
Backend       rw_obi
Bus           OBI
Data Width    32-bit
Key File      rtl/idma/croc_idma.sv

Start here for a minimal OBI-based integration. Croc pairs a custom OBI register frontend with the rw_obi backend, which makes it the smallest and simplest of the examples.

Snitch Cluster

Repo          pulp-platform/snitch_cluster
Frontend      inst64 (Xdma ISA)
Midend        ND
Backend       rw_axi
Bus           AXI4
Data Width    512-bit
Key File      hw/snitch_cluster/src/

Reference for ISA-coupled DMA with a wide data path. The Snitch cluster attaches a 512-bit iDMA to a dedicated core, which submits transfers through Xdma custom instructions and achieves single-cycle launch latency.

PULP Cluster

Repo          pulp-platform/pulp_cluster
Frontend      idma_reg32_2d_frontend
Midend        ND (2D)
Backend       rw_axi
Bus           AXI4
Data Width    64-bit
Key File      rtl/idma_wrap.sv

Shows multi-core DMA sharing and the mchan/iDMA selection mechanism. The PULP cluster is a multi-core architecture with tightly-coupled data memory (TCDM). It uses a register-based 2D frontend and supports conditional selection between the legacy mchan DMA and iDMA via the TARGET_MCHAN parameter.

Dependency Management

iDMA uses Bender for RTL dependency management. Add iDMA to your Bender.yml and run bender update to pull it and its transitive dependencies (axi, common_cells, register_interface, obi). Bender resolves the source file list for your build system — use bender script vsim for Questa or bender script vcs for VCS.