IDE DMA
2026-04-30 19:54:01 +02:00
parent a5fb6bb942
commit 3e32931002

= Adding DMA support to an IDE driver
Kamil Kowalczyk
2026-04-30
:jbake-type: post
:jbake-tags: MOP2 osdev
:jbake-status: published
:og-title: Adding DMA support to an IDE driver
In this article I'd like to show you how I added DMA support to MOP3's IDE driver!
== Terminology
1. PRD = Physical Region Descriptor
2. PRDT = Physical Region Descriptor Table
3. DMA = Direct Memory Access
== PCI initialization
We initialize the IDE driver from the PCI layer, so that is where we need to set a few bits in the
PCI command register to get DMA support.
[source,c]
----
uint16_t pci_cmd = pci_read16(pci_info.bus, pci_info.slot, pci_info.func, PCI_COMMAND);
uint16_t new_cmd = pci_cmd;
new_cmd |= (1 << PCI_CMD_IOSPACE);
new_cmd |= (1 << PCI_CMD_BUSMASTER); // <---- HERE!
new_cmd &= ~(1 << PCI_CMD_INTRDISABLE);
if (pci_cmd != new_cmd) {
  pci_write16(pci_info.bus, pci_info.slot, pci_info.func, PCI_COMMAND, new_cmd);
}
----
We need to enable bus mastering by setting bit 2 of the PCI command register. This allows the device to
read and write system memory on its own at the addresses we provide.
We also need the I/O port base that we'll use to configure our DMA transfers. BAR4 holds the base of the
Bus Master IDE registers: the primary IDE channel's registers start at `BAR4+0` and the secondary channel's at `BAR4+8`.
[source,c]
----
uint32_t bar4 = pci_read32(pci_info.bus, pci_info.slot, pci_info.func, PCI_BAR4);
uint16_t bmbase = (uint16_t)(bar4 & 0xFFFC);
bm_support = (bmbase != 0) && (bar4 & PCI_BAR_IO);
----
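The `BAR4+0` / `BAR4+8` split can be sketched as a small pure function. This is only an illustration: `bm_channel_base` and `BAR_IO_SPACE` are hypothetical names, not part of the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of a BAR is set when the BAR describes an I/O port range. */
#define BAR_IO_SPACE 0x1u

/* Hypothetical helper: returns the Bus Master IDE register base for a
 * channel (0 = primary, 1 = secondary), or 0 if BAR4 is not usable. */
static uint16_t bm_channel_base(uint32_t bar4, unsigned channel) {
    uint16_t base = (uint16_t)(bar4 & 0xFFFCu); /* strip the low flag bits */

    if (!(bar4 & BAR_IO_SPACE) || base == 0 || channel > 1)
        return 0;

    /* Each channel owns 8 bytes of registers: primary at +0, secondary at +8. */
    return (uint16_t)(base + channel * 8u);
}
```

With a BAR4 value of `0x0000C041`, the primary channel would end up at port `0xC040` and the secondary one at `0xC048`.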
Of course, we then have to pass `bmbase` and `bm_support` to our IDE driver's init function.
Now we just have to modify the driver itself to work with them.
== The driver
=== Structures
We first need to prepare some structs before we write the rest of the code.
[source,c]
.Physical Region Descriptor struct
----
// PRD
struct ide_prd_entry {
  uint32_t phys_addr;
  uint16_t size;
  uint16_t rsvd_eot;
} PACKED;
----
This is our Physical Region Descriptor (PRD) struct. Each PRD describes one chunk of a DMA transfer. `phys_addr`
tells the hardware which memory to use: on a read the controller writes the disk data there, and on a write
it fetches the data from there. `size` is the length of the chunk in bytes. A single PRD can describe at most
64KiB, and the region must not cross a 64KiB boundary, so we'll have to split our transfers into 64KiB chunks.
Bit 15 of `rsvd_eot` marks the last entry, so the hardware knows when it has reached the end of the PRD list
and stops processing.
*IMPORTANT*: `phys_addr` is a 32-bit physical address, so we MUST assert that the whole buffer lies under 4GiB.
Otherwise the address gets truncated and the hardware reads from or writes to somewhere else entirely.
We must also ensure that the physical memory behind each PRD, and the PRDT itself, is contiguous, meaning
that there are no gaps/fragmentation.
*IMPORTANT 2*: We must also note that `size = 0` actually means 64KiB.
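To make the `size = 0` rule concrete, here is a minimal sketch of how the 16-bit size field could be encoded and decoded. The helper names and the `PRD_EOT` constant are made up for this example, and `PACKED` is assumed to expand to the GCC/Clang packed attribute.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PACKED  __attribute__((packed)) /* assumption: GCC/Clang */
#define PRD_EOT 0x8000u                 /* bit 15 of rsvd_eot */

struct ide_prd_entry {
    uint32_t phys_addr;
    uint16_t size;
    uint16_t rsvd_eot;
} PACKED;

/* The controller reads descriptors as 8-byte records. */
static_assert(sizeof(struct ide_prd_entry) == 8, "PRD must be 8 bytes");

/* Hypothetical helper: 0x10000 (64KiB) does not fit in 16 bits and becomes 0. */
static uint16_t prd_encode_size(uint32_t bytes) {
    return (uint16_t)bytes;
}

/* Hypothetical helper: the hardware treats 0 as "64KiB". */
static uint32_t prd_decode_size(uint16_t size) {
    return size == 0 ? 0x10000u : size;
}

/* Hypothetical helper: true if this entry ends the table. */
static bool prd_is_last(const struct ide_prd_entry *prd) {
    return (prd->rsvd_eot & PRD_EOT) != 0;
}
```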
[source,c]
.New IDE drive struct
----
struct idedrv {
  struct device* device;
  bool lba48;
  size_t sector_count;
  size_t sector_size;
  uint16_t io, ctrl;
  uint8_t devno;
  uint8_t irq;
  struct idedrv_request* current_req;
  bool irqs_support;
  /* New fields */
  uint16_t bmbase; /* From BAR4 */
  bool bm_support; /* From PCI layer */
  struct ide_prd_entry* prdt; /* Virtual pointer to Physical Region Descriptor Table */
  uintptr_t prdt_phys; /* Physical PRDT address */
  size_t prdt_entry_count; /* Max count of PRDs */
  uintptr_t bounce_buffer_phys; /* Bounce buffer used to move data between hardware and OS */
  void* bounce_buffer;
};
----
=== Initialization
Instead of allocating every time we try to read/write, why not just pre-allocate all the needed memory?
[source,c]
.Bits of idedrv_init
----
idedrv->bm_support = init->bm_support;
if (idedrv->bm_support) {
  idedrv->prdt_phys = pmm_alloc(1);
  /* The whole PRDT page must be reachable through a 32-bit address */
  if ((uint64_t)idedrv->prdt_phys + PAGE_SIZE > 0x100000000ULL) {
    pmm_free(idedrv->prdt_phys, 1);
    free(idedrv);
    return false;
  }
  idedrv->prdt_entry_count = PAGE_SIZE / sizeof(struct ide_prd_entry);
  idedrv->prdt = (struct ide_prd_entry*)((uintptr_t)hhdm->offset + idedrv->prdt_phys);
  idedrv->bounce_buffer_phys = pmm_alloc_aligned(64, 16);
  /* Check the end of the buffer too, not just its first byte */
  if ((uint64_t)idedrv->bounce_buffer_phys + 64 * PAGE_SIZE > 0x100000000ULL) {
    pmm_free(idedrv->bounce_buffer_phys, 64);
    pmm_free(idedrv->prdt_phys, 1);
    free(idedrv);
    return false;
  }
  idedrv->bounce_buffer = (void*)((uintptr_t)hhdm->offset + idedrv->bounce_buffer_phys);
}
----
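A start-address check alone does not catch a region that begins just below 4GiB and runs past it, so it can help to test the whole range. The following is a sketch of such a check; `dma32_region_ok` and `DMA32_LIMIT` are hypothetical names and not part of the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* First physical address that a 32-bit PRD field can no longer express. */
#define DMA32_LIMIT 0x100000000ull

/* Hypothetical helper: true if [phys, phys + len) lies entirely below 4GiB. */
static bool dma32_region_ok(uint64_t phys, uint64_t len) {
    if (len == 0 || phys >= DMA32_LIMIT)
        return false;
    /* Written as a subtraction so that phys + len cannot overflow. */
    return len <= DMA32_LIMIT - phys;
}
```

For instance, a 256KiB buffer starting at `0xFFFF0000` would be rejected, even though its start address is below `0xFFFFFFFF`.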
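The PRDT page holds 4096 / 8 = 512 PRDs, and each PRD covers up to 64KiB, so the table itself could describe up to 32MiB per transfer.
The bounce buffer is the real limit, though: 64 pages of 4096 bytes is 256KiB, so a single transfer can use at most 4 PRDs.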
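The arithmetic can be written down as a small sketch. All constants are assumptions taken from the snippets above (4096-byte pages, 8-byte PRDs, a 64-page bounce buffer), and the function names are hypothetical.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE    4096u    /* assumed page size, as in 4096 / 8 */
#define SKETCH_PRD_SIZE     8u       /* sizeof(struct ide_prd_entry) */
#define SKETCH_PRD_MAX      0x10000u /* one PRD covers at most 64KiB */
#define SKETCH_BOUNCE_PAGES 64u      /* pages allocated for the bounce buffer */

/* How many bytes the one-page PRDT could describe. */
static size_t prdt_capacity_bytes(void) {
    return (size_t)(SKETCH_PAGE_SIZE / SKETCH_PRD_SIZE) * SKETCH_PRD_MAX;
}

/* How many bytes the bounce buffer can hold. */
static size_t bounce_capacity_bytes(void) {
    return (size_t)SKETCH_BOUNCE_PAGES * SKETCH_PAGE_SIZE;
}

/* A single transfer is limited by the smaller of the two. */
static size_t max_transfer_bytes(void) {
    size_t a = prdt_capacity_bytes();
    size_t b = bounce_capacity_bytes();
    return a < b ? a : b;
}
```

Under these assumptions the table could describe 32MiB, but the bounce buffer caps a single transfer at 256KiB.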
=== Reading and writing
Here I'm going to focus on reading and writing with IRQ support enabled, although there are variants of read/write
functions which handle the case where IRQs are not enabled.
First we must prepare the PRDs:
[source,c]
----
size_t rem = sector_count * idedrv->sector_size;
uint32_t phys = idedrv->bounce_buffer_phys;
size_t prd_idx = 0;
while (rem > 0 && prd_idx < idedrv->prdt_entry_count) {
  uint32_t chunk = (rem >= 0x10000) ? 0x10000 : rem;
  idedrv->prdt[prd_idx].phys_addr = phys;
  idedrv->prdt[prd_idx].size = (uint16_t)chunk; // If chunk is 64KiB, it will overflow to 0
  rem -= chunk;
  phys += chunk;
  idedrv->prdt[prd_idx].rsvd_eot = (rem == 0) ? 0x8000 : 0x0000; // nothing has remained, so mark as End Of Table
  prd_idx++;
}
----
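The same loop can be sketched as a standalone function that can be tested outside the kernel. This is not the driver's code: `prdt_fill` is a hypothetical helper that additionally refuses transfers which do not fit into the table, so the last entry always carries the End Of Table marker. It assumes `phys` is 64KiB aligned, so no chunk crosses a 64KiB boundary.

```c
#include <stddef.h>
#include <stdint.h>

struct ide_prd_entry {
    uint32_t phys_addr;
    uint16_t size;
    uint16_t rsvd_eot;
} __attribute__((packed)); /* assumption: GCC/Clang */

#define PRD_MAX_BYTES 0x10000u
#define PRD_EOT       0x8000u

/* Hypothetical helper: describes `bytes` bytes starting at `phys`.
 * Returns the number of entries used, or 0 if the transfer is empty
 * or needs more than `max_entries` descriptors. */
static size_t prdt_fill(struct ide_prd_entry *prdt, size_t max_entries,
                        uint32_t phys, size_t bytes) {
    size_t needed = (bytes + PRD_MAX_BYTES - 1) / PRD_MAX_BYTES;

    if (bytes == 0 || needed > max_entries)
        return 0;

    for (size_t i = 0; i < needed; i++) {
        size_t chunk = bytes >= PRD_MAX_BYTES ? PRD_MAX_BYTES : bytes;

        prdt[i].phys_addr = phys;
        prdt[i].size = (uint16_t)chunk; /* 64KiB is stored as 0 */
        bytes -= chunk;
        phys += (uint32_t)chunk;
        prdt[i].rsvd_eot = (bytes == 0) ? PRD_EOT : 0; /* last entry only */
    }
    return needed;
}
```

A 160KiB transfer would use three entries here: two full 64KiB chunks stored with `size = 0`, and a final 32KiB chunk with the EOT bit set.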
Then we tell the hardware where the PRDT is located in physical memory:
[source,c]
----
outl(idedrv->bmbase + IDE_DMA_REG_PRDT, (uint32_t)idedrv->prdt_phys);
----
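Next we set the transfer direction. Writing `0x08` (bit 3 of the command register) selects a read from disk, which means the controller writes to memory. For a write to disk, that bit stays clear.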
[source,c]
----
outb(idedrv->bmbase + IDE_DMA_REG_CMD, 0x08);
----
Clear the error and interrupt bits of the status register. These bits are cleared by writing a 1 to them:
[source,c]
----
uint8_t status = inb(idedrv->bmbase + IDE_DMA_REG_STATUS); // read the current status first
outb(idedrv->bmbase + IDE_DMA_REG_STATUS, status | IDE_DMA_STATUS_INTR | IDE_DMA_STATUS_ERROR);
----
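Why the status is read first and then written back with the two bits OR-ed in can be seen in a small software model of the register. This is only a sketch of the bit behaviour as described in the Bus Master IDE programming interface (the simplex bit 7 is ignored), with hypothetical names throughout.

```c
#include <stdint.h>

#define BM_STATUS_ACTIVE   0x01u /* read-only: a transfer is in progress */
#define BM_STATUS_ERROR    0x02u /* write 1 to clear */
#define BM_STATUS_INTR     0x04u /* write 1 to clear */
#define BM_STATUS_DRV0_DMA 0x20u /* read/write: drive 0 is DMA capable */
#define BM_STATUS_DRV1_DMA 0x40u /* read/write: drive 1 is DMA capable */

/* Hypothetical model: what the register holds after software writes
 * `written` while it currently holds `current`. */
static uint8_t bm_status_after_write(uint8_t current, uint8_t written) {
    const uint8_t w1c = BM_STATUS_ERROR | BM_STATUS_INTR;
    const uint8_t rw = BM_STATUS_DRV0_DMA | BM_STATUS_DRV1_DMA;

    uint8_t result = (uint8_t)(current & BM_STATUS_ACTIVE);  /* unaffected by writes */
    result |= (uint8_t)(current & w1c & (uint8_t)~written);  /* cleared where a 1 was written */
    result |= (uint8_t)(written & rw);                       /* plain read/write bits */
    return result;
}
```

In this model, writing back `status | INTR | ERROR` clears both flags and keeps the DMA-capable bits, while writing only `INTR | ERROR` would also wipe the DMA-capable bits.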
Prepare the position and sector count, and enable interrupts:
[source,c]
----
ide_prepare(idedrv, sector, sector_count, true);
----
Send the right DMA read (or write) command depending on LBA48 support, then start the DMA engine:
[source,c]
----
uint8_t cmd = idedrv->lba48 ? IDE_CMD_READ_DMA48 : IDE_CMD_READ_DMA28;
outb(idedrv->io + IDE_REG_CMD, cmd);
outb(idedrv->bmbase + IDE_DMA_REG_CMD, 0x08 | 0x01); // Start DMA engine
----
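The choice of ATA command and bus master command value for all four read/write and LBA28/LBA48 combinations can be sketched like this. The opcodes are the standard ATA ones. The macro and function names are made up for this example and do not match the driver's `IDE_CMD_*` constants.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard ATA DMA opcodes */
#define ATA_CMD_READ_DMA      0xC8u /* LBA28 */
#define ATA_CMD_READ_DMA_EXT  0x25u /* LBA48 */
#define ATA_CMD_WRITE_DMA     0xCAu /* LBA28 */
#define ATA_CMD_WRITE_DMA_EXT 0x35u /* LBA48 */

/* Bus master command register bits */
#define BM_CMD_START     0x01u /* start/stop the DMA engine */
#define BM_CMD_TO_MEMORY 0x08u /* controller writes to memory (disk read) */

/* Hypothetical helper: picks the ATA opcode for a DMA transfer. */
static uint8_t ata_dma_opcode(bool lba48, bool write) {
    if (write)
        return lba48 ? ATA_CMD_WRITE_DMA_EXT : ATA_CMD_WRITE_DMA;
    return lba48 ? ATA_CMD_READ_DMA_EXT : ATA_CMD_READ_DMA;
}

/* Hypothetical helper: value for the bus master command register. */
static uint8_t bm_cmd_value(bool write, bool start) {
    uint8_t value = write ? 0 : BM_CMD_TO_MEMORY;
    if (start)
        value |= BM_CMD_START;
    return value;
}
```

For a disk read this gives `0x08` before the ATA command is issued and `0x09` to start the engine, matching the values used above.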
Once the interrupt handler has marked the request as done, we can finally copy the received data out of the bounce buffer:
[source,c]
----
if (idedrv->bm_support)
  memcpy(buffer, idedrv->bounce_buffer, sector_count * idedrv->sector_size);
----
Of course, for writing we must first copy into the bounce buffer.
==== Interrupt handler
Inside the handler there were a few changes to be made.
Acknowledge the interrupt by reading status and clearing intr/error bits:
[source,c]
----
uint8_t bm_status = inb(idedrv->bmbase + IDE_DMA_REG_STATUS);
if (!(bm_status & IDE_DMA_STATUS_INTR))
  return;
outb(idedrv->bmbase + IDE_DMA_REG_STATUS,
     bm_status | IDE_DMA_STATUS_INTR | IDE_DMA_STATUS_ERROR);
----
And then after we're done processing the interrupt, we must stop the DMA engine:
[source,c]
----
outb(idedrv->bmbase + IDE_DMA_REG_CMD, 0x00);
atomic_store(&req->done, 1);
idedrv->current_req = NULL;
----
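The handler snippets above clear the error bit together with the interrupt bit. If one wanted to tell a failed transfer apart from a completed one, the status byte read at the top of the handler could be classified first. The following is only a sketch of that idea with hypothetical names, not something the driver is shown to do.

```c
#include <stdint.h>

#define BM_STATUS_ERROR 0x02u /* the DMA transfer hit an error */
#define BM_STATUS_INTR  0x04u /* this channel raised the interrupt */

enum bm_irq_result {
    BM_IRQ_NOT_OURS, /* interrupt bit clear: some other device fired */
    BM_IRQ_ERROR,    /* our interrupt, but the transfer failed */
    BM_IRQ_DONE      /* our interrupt, transfer completed */
};

/* Hypothetical helper: interprets the bus master status byte. */
static enum bm_irq_result bm_irq_classify(uint8_t bm_status) {
    if (!(bm_status & BM_STATUS_INTR))
        return BM_IRQ_NOT_OURS;
    if (bm_status & BM_STATUS_ERROR)
        return BM_IRQ_ERROR;
    return BM_IRQ_DONE;
}
```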
== Conclusion and testing
In conclusion, adding DMA support was fairly easy. I had put it off for a long time because
I was a bit scared to tackle it and didn't understand the subject that well, but after having
written the XHCI driver (which is all about DMA), I felt pretty confident!
After testing the driver for a bit on real hardware, I can say there is a definite performance boost!
As a benchmark I'm using `sys:/sdutil -format-fat32 -d ide0`, which now takes up to a minute on a 32GiB
drive, where previously it took 2-3 minutes.