Midend

Midend Role

The midend sits between the frontend and the backend. It accepts N-dimensional or periodic (RT) transfer descriptors and decomposes them into a stream of 1D requests that the backend can execute.

The midend is optional. Use the ND midend when your transfers are 2D or higher (e.g., tiling a matrix or copying framebuffer rows with a stride). When every transfer is a contiguous 1D copy, connect the frontend directly to the backend: a midend would add latency and area with no benefit.

See the System Integration guide for wiring examples showing how the midend connects to the frontend and backend.

Four midend variants are available:

Variant	Module	Purpose
ND	`idma_nd_midend`	Multi-dimensional transfer decomposition
RT	`idma_rt_midend`	Event-driven periodic (real-time) transfers
MP_DIST	`idma_mp_dist_midend`	Distribute transfers across multiple backends by address
MP_SPLIT	`idma_mp_split_midend`	Split transfers at region boundaries for a single backend

ND Midend

The ND midend (idma_nd_midend) decomposes an N-dimensional transfer into a sequence of 1D transfers. After each 1D burst completes, the midend checks whether the current dimension has remaining repetitions. If so, it adds the dimension’s stride to the address and emits the next burst. When a dimension exhausts its repetitions, the next-higher dimension increments. Internally, this is implemented with cascaded counters (one per dimension) and a popcount-based selector that handles simultaneous dimension overflows.
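The counter cascade can be modeled in software. The following Python sketch is a behavioral model, not the RTL: the function name nd_decompose and its argument layout are hypothetical. It also assumes that each dimension's stride is measured between consecutive iterations of that same dimension, so every address equals the base plus the sum of counter × stride. If the hardware instead accumulates strides onto a running address, the strides of the higher dimensions would have to be adjusted accordingly.

```python
from typing import Iterator, List, Tuple


def nd_decompose(src: int, dst: int, length: int,
                 reps: List[int],
                 src_strides: List[int],
                 dst_strides: List[int]) -> Iterator[Tuple[int, int, int]]:
    """Yield (src_addr, dst_addr, length) for every 1D burst.

    Index 0 of reps/src_strides/dst_strides describes dimension 2 (the
    innermost repeated dimension), index 1 describes dimension 3, and so on.
    """
    if not (len(reps) == len(src_strides) == len(dst_strides)):
        raise ValueError("reps and strides must have one entry per dimension")
    if any(r < 1 for r in reps):
        raise ValueError("every dimension needs at least one repetition")

    counters = [0] * len(reps)  # one counter per dimension above the 1D burst
    while True:
        # Assumption: address = base + sum(counter * stride) over all dimensions.
        src_addr = src + sum(c * s for c, s in zip(counters, src_strides))
        dst_addr = dst + sum(c * s for c, s in zip(counters, dst_strides))
        yield src_addr, dst_addr, length

        # Cascaded increment: bump the lowest dimension; when it is exhausted,
        # wrap it to zero and carry into the next-higher dimension.
        for d in range(len(reps)):
            counters[d] += 1
            if counters[d] < reps[d]:
                break
            counters[d] = 0
        else:
            return  # every dimension wrapped: the ND transfer is complete


# 3D example: 2 x 2 bursts of 8 bytes each.
for burst in nd_decompose(0, 0, 8, reps=[2, 2],
                          src_strides=[16, 100], dst_strides=[8, 16]):
    print(burst)
```

In this example the second and fourth bursts advance by the dimension-2 stride, while the third burst is produced after dimension 2 wraps and dimension 3 increments.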

ND Parameters

Parameter	Description
`NumDim`	Number of dimensions. Must be >= 2 (dimension 1 is the 1D burst handled by the backend)
`RepWidths`	Per-dimension repetition counter widths in bits, given as an array with one entry per dimension. For example, with `NumDim=3` and `RepWidths = '{32, 16, 8}`, the 32-bit entry allows up to 2³² repetitions, the 16-bit entry up to 2¹⁶, and the 8-bit entry up to 2⁸.

Request Types

The ND request wraps a 1D idma_req_t with per-dimension stride/repetition descriptors:

`IDMA_TYPEDEF_D_REQ_T(idma_d_req_t, reps_t, strides_t)
// Expands to:
typedef struct packed {
    reps_t    reps;         // Number of repetitions for this dimension
    strides_t src_strides;  // Source address stride (bytes)
    strides_t dst_strides;  // Destination address stride (bytes)
} idma_d_req_t;

`IDMA_TYPEDEF_ND_REQ_T(idma_nd_req_t, idma_req_t, idma_d_req_t)
// Expands to:
typedef struct packed {
    idma_req_t                burst_req;       // Base 1D request
    idma_d_req_t [NumDim-2:0] d_req;           // Per-dimension descriptors
} idma_nd_req_t;

Worked Example: 2D Transfer

Consider a 2D transfer that copies four 64-byte rows from a source with a 128-byte row pitch into a tightly packed destination. The request parameters are:

NumDim     = 2
length     = 64           (bytes per row)
reps       = 4            (number of rows)
src_stride = 128          (source row pitch)
dst_stride = 64           (destination row pitch, tightly packed)

The ND midend emits 4 sequential 1D transfers:

Iteration	src_addr	dst_addr	length
0	`base_src + 0`	`base_dst + 0`	64
1	`base_src + 128`	`base_dst + 64`	64
2	`base_src + 256`	`base_dst + 128`	64
3	`base_src + 384`	`base_dst + 192`	64
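The table can be reproduced with a closed-form address calculation. This is a minimal Python sketch: the helper name rows_2d is invented for illustration, and the base addresses are hypothetical because the example does not fix base_src or base_dst.

```python
def rows_2d(base_src, base_dst, length, reps, src_stride, dst_stride):
    """Return one (src_addr, dst_addr, length) tuple per row of a 2D transfer."""
    return [(base_src + i * src_stride, base_dst + i * dst_stride, length)
            for i in range(reps)]


# Hypothetical base addresses, chosen only for illustration.
BASE_SRC, BASE_DST = 0x1000, 0x2000

bursts = rows_2d(BASE_SRC, BASE_DST, length=64, reps=4,
                 src_stride=128, dst_stride=64)
for i, (src, dst, n) in enumerate(bursts):
    # Print offsets relative to the bases, as in the table above.
    print(f"{i}: base_src + {src - BASE_SRC:3d}  base_dst + {dst - BASE_DST:3d}  {n}")
```

The four bursts move 256 bytes in total, while the source footprint spans 3 × 128 + 64 = 448 bytes because of the larger source row pitch.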

RT Midend

The RT midend (idma_rt_midend) supports event-driven periodic transfers such as sensor sampling at fixed intervals, display buffer refresh, or ring-buffer rotation. Each event channel triggers its pre-configured transfer when its countdown reaches zero, without CPU intervention.

The module contains NumEvents countdown counters, one per event channel, each of which launches an ND transfer on expiry. A round-robin arbiter selects among ready events, and a bypass path allows non-periodic transfers to pass through.

Example: A sensor sampling system needs to copy 256 bytes from sensor MMIO (0x4000_0000) to a ring buffer (0x8000_0000) every 1000 clock cycles. Configure event channel 0 with: src_addr = 0x4000_0000, dst_addr = 0x8000_0000, length = 256, countdown = 1000. The RT midend will autonomously re-trigger this transfer every 1000 cycles.

RT Parameters

Parameter	Description
`NumEvents`	Number of parallel event channels
`EventCntWidth`	Width of the countdown counters (period in clock cycles)
`NumOutstanding`	Maximum outstanding transfers (depth of the response routing FIFO)

Operation

Software configures each event channel with source/destination addresses, transfer length, and 2D strides/repetitions. This configuration is submitted through the module’s nd_req_i port as ND requests that carry an event channel ID. The countdown threshold and the enable signal of each channel are driven on separate input ports.
Each enabled counter decrements every clock cycle
When a counter reaches zero, its pre-configured ND request is submitted to the arbiter
A round-robin arbiter (stream_arbiter) selects among triggered events
The bypass path allows direct ND requests to be interleaved with periodic ones via a second round-robin arbiter
A response FIFO routes completions back to the correct requester (periodic or bypass)
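The countdown and arbitration steps above can be sketched as a cycle-level behavioral model in Python. This is an illustration under several assumptions, none of which are confirmed by the RTL: the names EventChannel and simulate are hypothetical, counters reload automatically after expiry, the backend accepts one request per cycle, a trigger that fires while the channel is still pending is merged into the pending one, and the bypass path and response FIFO are left out.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class EventChannel:
    period: int                                   # countdown threshold in cycles
    enabled: bool = True
    counter: int = field(default=0, init=False)   # current countdown value
    pending: bool = field(default=False, init=False)

    def __post_init__(self) -> None:
        if self.period < 1:
            raise ValueError("period must be at least one cycle")
        self.counter = self.period


def simulate(channels: List[EventChannel], cycles: int) -> List[Tuple[int, int]]:
    """Return (cycle, channel_id) for every request granted by the arbiter."""
    grants = []
    rr = 0  # round-robin pointer: the channel that is checked first
    n = len(channels)
    for cycle in range(1, cycles + 1):
        # Every enabled counter decrements once per cycle.
        for ch in channels:
            if ch.enabled:
                ch.counter -= 1
                if ch.counter == 0:
                    ch.pending = True        # request is offered to the arbiter
                    ch.counter = ch.period   # assumed automatic reload
        # Round-robin arbiter: grant at most one pending channel per cycle.
        for offset in range(n):
            idx = (rr + offset) % n
            if channels[idx].pending:
                channels[idx].pending = False
                grants.append((cycle, idx))
                rr = (idx + 1) % n
                break
    return grants


# The sensor example: one channel that fires every 1000 cycles.
print(simulate([EventChannel(period=1000)], cycles=3000))
```

With two channels of equal period, both expire in the same cycle, and the model grants them in consecutive cycles while alternating which channel is checked first.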

Multicore Midends

MP_DIST

Use MP_DIST when your SoC has multiple memory banks and you want a single transfer to be distributed across backends, each serving a contiguous address region (e.g., tightly-coupled data memory in a cluster).

The distributed midend (idma_mp_dist_midend) splits a single transfer across NumBEs backends based on address regions. Each backend owns a contiguous RegionWidth-byte slice within the range [RegionStart, RegionEnd). The following parameters control the address region mapping:

Parameter	Description
`NumBEs`	Number of backends to distribute across
`RegionWidth`	Size of each backend’s address region in bytes
`RegionStart`	Base address of the distributed region
`RegionEnd`	End address of the distributed region (exclusive)
`AddrWidth`	Address width
`PrintInfo`	Print debug info on transfers

The midend uses a stream_fork to fan out the request to all backends simultaneously. Backends whose region does not overlap the transfer receive a suppressed request (valid deasserted, ready tied high). Completion is signaled only when all involved backends have finished.
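The address-based distribution can be illustrated with a small Python model. It is a sketch under stated assumptions rather than the hardware algorithm: dist_split is a hypothetical helper, it considers a single address per transfer (the text does not say whether source or destination selects the backend), and it assumes RegionEnd equals RegionStart + NumBEs × RegionWidth.

```python
from typing import List, Optional, Tuple


def dist_split(addr: int, length: int, region_start: int,
               region_width: int, num_bes: int) -> List[Optional[Tuple[int, int]]]:
    """Return one entry per backend: (addr, length) of its share, or None.

    None models a backend whose region does not overlap the transfer and
    therefore receives a suppressed request.
    """
    # Assumption: the distributed region is exactly num_bes slices wide.
    region_end = region_start + num_bes * region_width
    if length <= 0 or addr < region_start or addr + length > region_end:
        raise ValueError("transfer must lie inside [region_start, region_end)")

    shares = []
    for be in range(num_bes):
        lo = region_start + be * region_width   # first byte owned by this backend
        hi = lo + region_width                  # one past its last byte
        start = max(addr, lo)
        end = min(addr + length, hi)
        shares.append((start, end - start) if start < end else None)
    return shares


# A 0x180-byte transfer starting at 0x10C0 across four 0x100-byte slices.
for be, share in enumerate(dist_split(0x10C0, 0x180, 0x1000, 0x100, 4)):
    print(be, None if share is None else (hex(share[0]), hex(share[1])))
```

Here backends 0 to 2 each receive a share, backend 3 is not involved, and the shares add up to the original length.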

MP_SPLIT

Use MP_SPLIT when a single transfer may span multiple address regions that require separate handling (e.g., crossing from one memory bank to another), and you want the hardware to serialize the sub-transfers automatically.

The split midend (idma_mp_split_midend) serializes a transfer that spans multiple RegionWidth boundaries into a sequence of region-aligned sub-transfers for a single backend. It uses a two-state FSM (Idle / Busy) to emit the first region-clipped transfer immediately, then iterates through remaining regions. The following parameters define the region layout:

Parameter	Description
`RegionWidth`	Size of each region in bytes
`RegionStart`	Base address of the managed region
`RegionEnd`	End address of the managed region
`AddrWidth`	Address width
`PrintInfo`	Print debug info on transfers
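The serialization into region-clipped sub-transfers can be sketched in Python as follows. The helper split_at_regions is hypothetical, and the model assumes that region boundaries lie at RegionStart plus multiples of RegionWidth, that the transfer lies inside the managed region, and that a single address is enough to describe the transfer. The first loop iteration loosely corresponds to the transfer emitted from the Idle state and the later iterations to the Busy state.

```python
from typing import List, Tuple


def split_at_regions(addr: int, length: int, region_start: int,
                     region_width: int) -> List[Tuple[int, int]]:
    """Split (addr, length) into pieces that never cross a region boundary."""
    if addr < region_start or region_width < 1:
        raise ValueError("address must lie inside the managed region")

    pieces = []
    while length > 0:
        # Bytes left before the next boundary (assumed at region_start + k * width).
        room = region_width - ((addr - region_start) % region_width)
        chunk = min(length, room)
        pieces.append((addr, chunk))
        addr += chunk
        length -= chunk
    return pieces


# A 0x180-byte transfer starting 0x40 bytes before a boundary.
for piece_addr, piece_len in split_at_regions(0x10C0, 0x180, 0x1000, 0x100):
    print(hex(piece_addr), hex(piece_len))
```

The first piece is clipped to the 0x40 bytes that remain in the starting region; every later piece starts on a region boundary.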

Source Files

src/midend/idma_nd_midend.sv — ND midend + counter submodule
src/midend/idma_rt_midend.sv — RT midend
src/midend/idma_mp_dist_midend.sv — Distributed multicore midend
src/midend/idma_mp_split_midend.sv — Split multicore midend