From my brief skim of the material, it should be possible to issue DMA reads or writes from an RC to your Cyclone V (EP) using the IP core you're interested in.
I've done DMA reads and writes on a Stratix V, however it was in a non-Qsys design just using the PCIe core HIP block (custom TLP encoding and decoding logic). This block just seems to be a wrapper around their PCIe HIP block that also handles the transaction layer for you.
The first step will be to get your RC to issue PCIe DMA read or writes requests. In the case of a read request, you'll want to send a memory read complete with data (CplD) request with a length greater than 1 DWORD. I would suggest dedicating an entire BAR to map the memory space you want to DMA from on the FPGA to keep your address targeting simple.
On the FPGA side, I would suggest using Signal Tap and probing the Rxm*
interface signals on the core. This way you can see the exact timing of the DMA read request that comes out of the core. My guess is that the RXMRead_<n>_o
signal will go high indicating the start of the request. At which point you'll have to decode and pass the RxmAddress_<n>_o
and RXMBurstCount_<n>_o
to some glue logic that will fetch the requested data from the FPGA's memory. Once you're ready to send back the data, assert the RXMReadDataValid_<n>_i
for each valid word being sent.
I'm guessing that the «Cyclone V Avalon-MM DMA for PCIe» core that you referenced takes care of that 'glue' logic I mentioned for you, and allows you to connect straight to a SDRAM controller on your Qsys bus. Altera doesn't usually encrypt their megafuction code, so if your system verilog is strong, it might be worth digging through their generated files and seeing if you can reuse that bit of code in some way.
As for core settings, the only thing that I saw that you need to look out for is making sure the Single DW Completer setting is turned OFF. Otherwise the core will abort any requests it receives with a length greater than 1 DWORD.
Hope that helped somewhat.