14

I need a suggestion on how to copy a block of memory efficiently, in a single operation if possible, in C++ or assembly language.

I have a pointer to a memory location and an offset. Think of the memory as a 2D array of rows and columns that I need to copy.

unwind
Abdul Khaliq

6 Answers

42

How about std::memcpy?   

Aistina
  • Yes, use memcpy, as it is usually optimal for the target architecture. On x86 architectures, optimal implementations use a few 128-bit SSE registers. – Eric Bainville Jun 03 '09 at 11:55
  • Well, I had already tried that. memcpy copies one row at a time. Consider that I have a block consisting of 5000 rows or more, in a function that is called all the time, 10000 times. – Abdul Khaliq Jun 03 '09 at 11:57
  • 2
    If all rows are contiguous in memory, you can copy all rows in a single memcpy call. If the gaps between the rows in memory are small, a single memcpy call will probably be the fastest way. If all rows are allocated separately, then a loop of memcpy will be needed. – Eric Bainville Jun 03 '09 at 12:04
  • 1
    Beware that the source and destination memory areas must not overlap. If they do overlap, you need an algorithm that performs N non-overlapping memcpy calls instead of a single operation. – David Rodríguez - dribeas Jun 03 '09 at 14:54
  • Sorry, memcpy has been deemed not safe. :P ( http://stackoverflow.com/questions/876557/microsoft-sdl-and-memcpy-deprecation ) – Sanjaya R Jun 03 '09 at 16:50
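
The distinction drawn in the comments above (one memcpy when the rows are contiguous, a loop of memcpy calls otherwise) can be sketched as follows. This is a minimal sketch, not the asker's actual code: the function name `copy_block` and the stride parameters are hypothetical, and it assumes the source and destination do not overlap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies `rows` rows of `row_bytes` bytes each.
// A stride is the byte distance between the starts of consecutive rows.
// Assumes the source and destination regions do not overlap.
void copy_block(void* dst, std::size_t dst_stride,
                const void* src, std::size_t src_stride,
                std::size_t row_bytes, std::size_t rows) {
    assert(row_bytes <= dst_stride && row_bytes <= src_stride);
    if (rows == 0 || row_bytes == 0) return;

    // No gaps between rows on either side: one memcpy covers everything.
    if (dst_stride == row_bytes && src_stride == row_bytes) {
        std::memcpy(dst, src, row_bytes * rows);
        return;
    }

    // Rows are separated by padding: copy one row per call.
    auto* d = static_cast<unsigned char*>(dst);
    const auto* s = static_cast<const unsigned char*>(src);
    for (std::size_t r = 0; r < rows; ++r) {
        std::memcpy(d + r * dst_stride, s + r * src_stride, row_bytes);
    }
}
```

For a sub-rectangle of a larger image, `src` would point at the first byte of the rectangle and `src_stride` would be the full width of the image in bytes.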
6

If you need to implement such functionality yourself, I suggest you look at Duff's Device if it has to be done efficiently.
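
For reference, a memory-to-memory variant of Duff's Device might look like the sketch below. The name `duff_copy` is hypothetical. It unrolls the byte copy eight times and uses the `switch` to jump into the middle of the loop to handle the remainder. On current compilers a library memcpy is usually at least as fast, so treat this as an illustration to benchmark rather than a recommendation.

```cpp
#include <cstddef>

// Hypothetical helper: byte copy unrolled 8x in the style of Duff's Device.
// The switch jumps into the loop body so that the first pass copies
// count % 8 bytes (or 8 when the remainder is 0); later passes copy 8 each.
void duff_copy(unsigned char* to, const unsigned char* from, std::size_t count) {
    if (count == 0) return;              // the loop below assumes count > 0
    std::size_t n = (count + 7) / 8;     // number of passes through the loop
    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}
```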

user44556
2

Reading your comments, it sounds like you might want to use parallelism. There are instructions to do this, but they only operate on registers, not memory.

This is a consequence of the computer architecture (I'm assuming x86).

You can only be accessing one memory location at a time because the computer only has one address bus. If you tried to access more than one location at a time, you would be overloading the bus and nothing would work properly.

If you can put the data you need in registers, then you can use a lot of cool processor instructions, such as MMX or SSE, to perform parallel calculations. But as for copying memory in parallel, it's not possible.

As others have said, use memcpy. It's reliable, debugged, and fast.

samoz
1

Use memmove() if the source and destination overlap. memcpy() and memmove() have usually been highly optimized already in your compiler's C library. If you do write a replacement, at least benchmark it against the C library versions to make sure you're not slowing down your code.
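
As a small illustration of the overlapping case, here is a sketch that shifts the contents of a buffer toward its start, in place. The helper name `shift_left` is hypothetical. Source and destination overlap here, so std::memmove is the appropriate call; std::memcpy would have undefined behavior.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: moves buf[shift..count) down to buf[0..count-shift).
// The two ranges overlap whenever shift < count - shift, hence memmove.
// Elements past the moved range keep their old values.
void shift_left(int* buf, std::size_t count, std::size_t shift) {
    if (shift == 0 || shift >= count) return;  // nothing to move
    std::memmove(buf, buf + shift, (count - shift) * sizeof(int));
}
```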

i have a block consisting of 5000 rows or more and in a function that is called all the time 10000 times

Also, consider changing your data structure. Perhaps instead of a 2D array, you can have a 1D array of pointers to secondary arrays (the columns). Then, instead of copying entire rows, you need only copy or move the pointers. You could also pool the column arrays in a free list so that you're not spending lots of time allocating and freeing them.
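
A minimal sketch of that idea, under the assumption that the data can be restructured this way: the `Grid` type and its members are hypothetical names, the secondary arrays are called lines because whether they hold rows or columns is a layout choice, and the free list is left out for brevity.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical container: a 1D array of pointers to separately
// allocated secondary arrays ("lines").
struct Grid {
    std::size_t line_length;
    std::vector<std::unique_ptr<int[]>> lines;

    Grid(std::size_t line_count, std::size_t length) : line_length(length) {
        lines.reserve(line_count);
        for (std::size_t i = 0; i < line_count; ++i) {
            // make_unique<int[]> value-initializes, so every element starts at 0.
            lines.push_back(std::make_unique<int[]>(length));
        }
    }

    // Exchanges line `i` with the same line of `other` by swapping two
    // pointers; no element data is copied. Both grids are assumed to use
    // the same line_length.
    void swap_line_with(Grid& other, std::size_t i) {
        std::swap(lines[i], other.lines[i]);
    }
};
```

The swap costs the same regardless of how long a line is, which is the point of the suggestion. The trade-off is that the lines are no longer contiguous, so a single memcpy over the whole grid is no longer possible.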

Adisak
0

memcpy?

Martin
0

REP MOVSD in assembly perhaps? Hard to say without more information on exactly what you're trying to copy... Or, you can reprogram the DMA controller to do it too, but it'll actually end up being slower than just using the processor. :-)

Brian Knoblauch