MSX DMA implementation (3)

July 19, 2025

Continues from part 2: Minimalistic DMA manager

To begin our minimum-logic implementation, we first define our interface: the physical signals that we'll manage, both inputs and outputs. These are mainly the bus signals (wired directly to the Z80 or, in the Tides-Rider, to the cartridge slot) and the DMA signals used by our slave device:

module dma1_top(
// msx bus
input clk,
input reset_n,
inout [15:0] a,
inout [7:0] d,
input slt_n,
input wait_n,
output mreq_n,
inout iorq_n,
inout rd_n,
inout wr_n,

// dma control
input req0_n,
output ack0_n, // AKA /ready

// bus arbitration
output busreq_n,
input busack_n);

Note that we use the _n suffix for active-low signals (signals that are asserted when low).

Also, the memory selection signal from the bus is split into two signals: /slt, used for slave access, and /mreq, used only as an output. This is a very convenient help provided by the MSX slot that we can take advantage of in the Tides-Rider. Otherwise we would need to use /mreq both as an input (for slave access) and as an output (as bus master), and provide additional logic to know whether the selected page belongs to "our slot" when we are the bus slave.

When we are the bus slave, we provide our set of registers for configuration. We provide a selection mechanism to map them at 7Fxxh and put our actual registers and bits in a submodule:

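// high address byte is 7Fh; a[15] is not decoded, so FFxxh matches too
wire sel = (a[14:8] == 'h7f);
// channel 0 owns the four registers selected by a[1:0]; a[7:4] is not decoded
wire sel_ch0 = sel & ~slt_n & ~a[3] & ~a[2];
wire [WIDTH-1:0] addr_ch0;
channel_module #(.WIDTH(WIDTH)) ch0(.clk(clk), .reset(~reset_n),
.sel(sel_ch0), .wr(~wr_n), .a(a[1:0]), .din(d),
.addr(addr_ch0), .en(en_ch0), .ack(ack_ch0));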

Note that this keeps the top module more readable and, even better, lets us replicate the submodule for each "DMA channel" when we want more than one. The module also auto-increments the pointer, which is why it receives the ack signal.
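To make the idea of a per-channel register block more concrete, here is a minimal sketch of what such a submodule could look like. It is not the actual channel_module: the name channel_sketch, the register map (three pointer bytes plus a control register with an enable bit), the WIDTH <= 24 limit and the edge detection on ack are all assumptions made for illustration.

```verilog
// Hypothetical sketch of a DMA channel register block.
// Assumed register map (not taken from the real design):
//   a = 0: pointer bits  7..0
//   a = 1: pointer bits 15..8
//   a = 2: pointer bits 23..16
//   a = 3: control, bit 0 = channel enable
module channel_sketch #(parameter WIDTH = 24) (   // assumes WIDTH <= 24
    input              clk,
    input              reset,
    input              sel,    // registers selected by the address decoder
    input              wr,     // active-high write strobe
    input  [1:0]       a,      // register index
    input  [7:0]       din,    // data from the bus
    input              ack,    // one transfer has been completed
    output [WIDTH-1:0] addr,   // current pointer
    output reg         en      // channel enabled
);
    reg [23:0] ptr;
    reg        ack_d;          // previous value of ack, for edge detection

    assign addr = ptr[WIDTH-1:0];

    always @(posedge clk) begin
        if (reset) begin
            ptr   <= 24'd0;
            en    <= 1'b0;
            ack_d <= 1'b0;
        end else begin
            ack_d <= ack;
            if (sel & wr) begin
                // CPU write to one of the four registers
                case (a)
                    2'd0: ptr[7:0]   <= din;
                    2'd1: ptr[15:8]  <= din;
                    2'd2: ptr[23:16] <= din;
                    2'd3: en         <= din[0];
                endcase
            end else if (ack & ~ack_d) begin
                // auto-increment once per acknowledged transfer
                ptr <= ptr + 24'd1;
            end
        end
    end
endmodule
```

Incrementing only on the rising edge of ack keeps the pointer correct even if ack stays high for several clocks. Whether the real module needs that depends on how the engine drives ack.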

Now that we have every signal and configuration in place, we can move on to the state machine. Actually, we build two state machines:

One that does all the steps we discussed in part 2 (engine)
Another that does the actual Z80 bus cycles (buster)

wire [15:0] dma_addr;
wire [7:0] dma_dout;
engine_module #(.WIDTH(WIDTH)) engine(.clk(clk), .reset(~reset_n),
.snoop(snoop), .din(d),
.dma_seq(1), .dma_req(mux_req), .addr_in(mux_addr), // from dma mux
.dma_ack(dma_ack), // to dma mux -> current dma periph
.addr_out(dma_addr), .dout(dma_dout), .bus_req(bus_req), // to buster
.we(dma_we), .ioe(dma_ioe), // buster mode
.bus_term(bus_term)); // from buster

Note that we feed the state machine with the current state of the DMA channel and the request, and it generates the address and the data out (for writes) that the buster uses to perform the actual bus cycles. We also provide a snoop signal to track the CPU's changes to the mapper while idle (more on this later).

buster_module buster(.clk(clk), .reset(~reset_n),
.req(bus_req), .master(bus_master), .term(bus_term),
.busreq(busreq_out), .busack(~busack_n), .buswait(~wait_n),
.we(dma_we), .ioe(dma_ioe),
.rd(bus_rd), .wr(bus_wr), .mreq(bus_mreq), .iorq(bus_iorq),
.doe(bus_doe));

The buster generates the CPU cycles for each type of bus access (memory or I/O) and watches the /wait signal to stretch the cycles if needed. When a bus access completes, it notifies the engine through bus_term so that it can continue.
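As an illustration of the cycle generation only, the following is a simplified sketch of a three-clock bus cycle with wait stretching. It is not the actual buster_module: the name cycle_sketch and its state encoding are hypothetical, bus arbitration is left out (the sketch assumes we already own the bus), strobe timing is rounded to whole clocks, and cycles are not chained back to back.

```verilog
// Hypothetical sketch of a 3-clock bus cycle generator with wait stretching.
// Assumption: bus arbitration (busreq/busack) is handled elsewhere.
module cycle_sketch (
    input  clk,
    input  reset,
    input  req,      // start one bus cycle
    input  we,       // 1 = write cycle, 0 = read cycle
    input  ioe,      // 1 = I/O cycle (iorq), 0 = memory cycle (mreq)
    input  buswait,  // active-high copy of /wait
    output mreq,
    output iorq,
    output rd,
    output wr,
    output term      // high during the last clock of the cycle
);
    localparam IDLE = 2'd0, T1 = 2'd1, T2 = 2'd2, T3 = 2'd3;
    reg [1:0] state;

    always @(posedge clk) begin
        if (reset)
            state <= IDLE;
        else
            case (state)
                IDLE: if (req) state <= T1;
                T1:   state <= T2;
                T2:   if (!buswait) state <= T3;  // repeat T2 while /wait is low
                T3:   state <= IDLE;
            endcase
    end

    wire active = (state != IDLE);

    assign mreq = active & ~ioe;
    assign iorq = active &  ioe;
    assign rd   = active & ~we;
    assign wr   = (state == T2 || state == T3) & we;  // write strobe starts one clock later
    assign term = (state == T3);
endmodule
```

With buswait held low the cycle lasts exactly three clocks (T1, T2, T3); every clock in which buswait is high during T2 adds one clock to the cycle.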

It manages the bus arbitration too. When the dma controller owns the bus, the signal bus_master is active, so we use it to actually drive our bus only when we are allowed to do so:

assign a = bus_master ? dma_addr : 16'bz;
assign d = (bus_master & bus_doe) ? dma_dout : 8'bz;
assign mreq_n = bus_master ? ~bus_mreq : 1'bz;
assign iorq_n = bus_master ? ~bus_iorq : 1'bz;
assign rd_n = bus_master ? ~bus_rd : 1'bz;
assign wr_n = bus_master ? ~bus_wr : 1'bz;

Otherwise, the bus is high-Z on our side (the pins act as inputs).

Note that the data bus is passed directly to our slave device, which is told to take the data during the read cycle that we perform as bus master. This is also known as "fly-by" DMA, because no separate "write" access to the slave device is needed. It is simpler, because there are fewer engine states (we are trying to get the functionality with minimum logic), and it is actually faster. At the MSX clock speed of 3.57 MHz, the DMA engine can transfer 1 byte every three clock cycles (ignoring start/stop overhead). That's about 1.19 million bytes per second, more than 1 MByte/sec!

On the other hand, with no wait states (as is typical in MSX), a single-byte DMA takes about 12-14 clock cycles. That means that the impact of a DMA transaction on the CPU is low (less than 4 NOPs).

[Figure: single-byte DMA transaction]

Note that the I/O transactions of the buster are still done like regular memory cycles (3 clocks, but with /iorq instead of /mreq), and that the value used to restore the mapper (CCh) is a dummy value, because no mapper settings have been written earlier in the simulation.

We'll see the state machines in detail in part 4: MSX DMA engine
