
setup

I have a bunch of RAM on the PL (programmable logic / FPGA) side of a Zynq-7000 chip. This memory can be accessed from both the PL and the PS (processing system / CPU) side. The plan is for the CPU to load a large, GiB-scale buffer and hand it off to the PL.

Linux bursts to/from the RAM when the device tree is modified

When I modify the device tree so that Linux can see the RAM, I observe fast read/write speeds, which shows that the hardware/firmware is capable of burst reads/writes.

    memory {
        device_type = "memory";
        // The 512 MiB memory at 0x60000000
        reg = <0x0 0x40000000 0x60000000 0x20000000>;
    };
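The `reg` property is a flat list of (base, size) pairs. Assuming the parent node sets `#address-cells = <1>` and `#size-cells = <1>` (not shown in the question, but consistent with the "512 MiB memory at 0x60000000" comment), the first pair is 1 GiB at 0x0 and the second is the 512 MiB PL RAM. A minimal sketch of that decoding, with the hypothetical helper `decode_reg`:

```c
#include <stddef.h>
#include <stdint.h>

/* One memory range from a device tree "reg" property. */
struct mem_range {
    uint32_t base;
    uint32_t size;
};

/* Split a flat list of reg cells into (base, size) pairs.
 * Assumes #address-cells = <1> and #size-cells = <1>.
 * Returns the number of ranges written to out (at most max_out). */
static size_t decode_reg(const uint32_t *cells, size_t ncells,
                         struct mem_range *out, size_t max_out)
{
    size_t n = 0;
    for (size_t i = 0; i + 1 < ncells && n < max_out; i += 2) {
        out[n].base = cells[i];
        out[n].size = cells[i + 1];
        n++;
    }
    return n;
}
```

For the cells above this yields {0x0, 0x40000000} and {0x60000000, 0x20000000}; 0x20000000 bytes is 512 MiB. The reduced `reg = <0x0 0x40000000>` keeps only the first range, which is how the RAM gets hidden from Linux.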

mmap of the RAM removed from the device tree

The device tree is modified to prevent Linux from using the RAM (so that it can be used as a buffer for the PL instead):

    memory {
        device_type = "memory";
        reg = <0x0 0x40000000>;
    };
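With the RAM hidden from Linux, user space typically reaches it by mapping its physical address. The question does not show how `fd` is opened; `/dev/mem` is assumed here (the answer below suggests it). A minimal sketch with the hypothetical helpers `page_align_down` and `map_phys`. As far as I know, on 32-bit ARM kernels `/dev/mem` maps physical pages that the kernel does not manage as uncached (see `phys_mem_access_prot()` in `arch/arm/mm/mmu.c`), which would keep the CPU from merging accesses into bursts; treat that as something to verify against your kernel version.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round a physical address down to a page boundary: the offset passed
 * to mmap() must be a multiple of the page size. */
static off_t page_align_down(off_t phys, long page_size)
{
    return phys - (phys % page_size);
}

/* Map len bytes of physical memory starting at phys via /dev/mem
 * (assumed device; normally needs root). Returns a pointer to phys
 * itself, or NULL on failure. The mapping outlives close(fd); to release
 * it, munmap() the page-aligned start with length len + (phys - aligned). */
static void *map_phys(off_t phys, size_t len)
{
    long page_size = sysconf(_SC_PAGESIZE);
    off_t base = page_align_down(phys, page_size);
    size_t delta = (size_t)(phys - base);

    int fd = open("/dev/mem", O_RDWR);
    if (fd < 0)
        return NULL;

    void *p = mmap(NULL, len + delta, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, base);
    close(fd);
    if (p == MAP_FAILED)
        return NULL;
    return (uint8_t *)p + delta;
}
```

For the region in this question the call would be `map_phys(0x60000000, 0x20000000)`; the address is already page-aligned, so `delta` is 0.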

mmap is slow even after playing around with flags

I have tried several ways of setting up mmap():

    // Attempt 1: shared mapping
    int* addr_start = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, address);
    // Attempt 2: private mapping, pre-faulted with MAP_POPULATE
    int* addr_start = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_POPULATE, fd, address);

While reliable, none of them gives fast results when running an iterative write/read test:

    // words_per_page is on the order of 2^20 / 4 (1 MiB of 32-bit words);
    // waddr and raddr walk through the mmap()ed region
    case TEST_WRITE:
        for (int ii = 0; ii < words_per_page; ii++)
            *waddr++ = count++;
        break;
    case TEST_READ:
        for (int ii = 0; ii < words_per_page; ii++)
            sum += *raddr++;
        break;
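To compare the mapped region against ordinary memory, the loops above can be wrapped in a small timing harness. A sketch under these assumptions: the buffer holds 32-bit words, `bench_write`/`bench_read` are hypothetical names, and the pointer is `volatile` so the compiler cannot merge or drop the accesses (without it, the loops may be vectorized on normal RAM, which skews the comparison).

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Seconds elapsed on a monotonic clock. */
static double now_s(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

/* Write an incrementing pattern one 32-bit word at a time.
 * Returns throughput in MB/s (0 if the run was too short to time). */
static double bench_write(volatile uint32_t *buf, size_t words)
{
    uint32_t count = 0;
    double t0 = now_s();
    for (size_t i = 0; i < words; i++)
        buf[i] = count++;
    double dt = now_s() - t0;
    return dt > 0 ? words * sizeof(uint32_t) / dt / 1e6 : 0.0;
}

/* Read the buffer back and return the sum through *sum_out, so the
 * reads have an observable result. Returns throughput in MB/s. */
static double bench_read(const volatile uint32_t *buf, size_t words,
                         uint64_t *sum_out)
{
    uint64_t sum = 0;
    double t0 = now_s();
    for (size_t i = 0; i < words; i++)
        sum += buf[i];
    double dt = now_s() - t0;
    *sum_out = sum;
    return dt > 0 ? words * sizeof(uint32_t) / dt / 1e6 : 0.0;
}
```

Running both functions on the pointer returned by `mmap()` and on a `malloc()`ed buffer of the same size gives directly comparable MB/s figures.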

question

Are there any user-space ways of creating direct burst transactions to/from this memory? If not, links to relevant Linux kernel documentation would be appreciated.

edited by vermaete
asked by philn
  • What language is it? I'm assuming it's C, it seems – new Q Open Wid Nov 01 '20 at 18:47
  • When RAM is declared as system memory, that memory will be accessed with processor cache, i.e. read and writeback caching. RAM that is used for I/O and/or shared typically requires that memory region to be uncached to avoid any coherency issues. – sawdust Nov 02 '20 at 01:28
  • @zixuan Yes, I'm writing it in C for low level access – philn Nov 02 '20 at 18:12
  • @sawdust That makes sense, but is there a way to cache the memory and occasionally flush it? – philn Nov 02 '20 at 18:12
  • *"is there a way to cache the memory and occasionally flush it?"* -- From the Linux kernel, yes; see https://www.kernel.org/doc/gorman/html/understand/understand006.html#toc26. But unless there's synchronization with the other processor(s), there's the risk of race conditions and resulting coherency issues. – sawdust Nov 02 '20 at 20:41

1 Answer


You definitely need to map the region as bufferable in order to maximize transfer speeds. You might need to use a different device driver than /dev/mem.

It is easier to control transfers from the programmable logic side if you use DMA to the Zynq host memory. On Zynq, I found I needed 8 maximum-length read requests in flight at a time to maximize throughput on a link.
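The need for several requests in flight follows from Little's law: to keep a port busy, the bytes outstanding must cover bandwidth × latency. A sketch with illustrative inputs that are assumptions, not measurements: a 64-bit AXI port moves 8 bytes per cycle, an AXI3 burst is at most 16 beats (128 bytes on that port), and the read latency is taken as 120 port cycles, a figure chosen only for illustration. The helper name `bursts_in_flight` is hypothetical.

```c
#include <stdint.h>

/* Little's law: bytes in flight = bandwidth x latency.
 * Returns how many maximum-length bursts must be outstanding to sustain
 * full bandwidth, rounded up to a whole burst. */
static uint32_t bursts_in_flight(uint32_t bytes_per_cycle,
                                 uint32_t latency_cycles,
                                 uint32_t burst_bytes)
{
    uint32_t bytes_in_flight = bytes_per_cycle * latency_cycles;
    return (bytes_in_flight + burst_bytes - 1) / burst_bytes;
}
```

With those inputs, `bursts_in_flight(8, 120, 128)` is 8 (960 bytes in flight); a longer latency or a shorter burst raises the count.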

If you need cache coherence with user space, you will need to use the ACP port so that the processor's cache will snoop on the memory writes from the programmable logic (PL).

answered by Jamey Hicks