1

The ESP8266 is running an xtensa core and to read data from flash storage all accesses must be performed with 32bit words. To perform this I wrote the following method:

void memcpy_P(void * dst, const void * src, const unsigned int len)
{
  char       * _dst = (      char *)dst;
  const char * _src = (const char *)src;

  unsigned int aligned_len = len & ~0x3;
  while(aligned_len > 0)
  {
    *(uint32_t *)_dst = *(uint32_t *)_src;
    _dst        += 4;
    _src        += 4;
    aligned_len -= 4;
  }

  const unsigned int remainder = len & 0x3;
  if (remainder > 0)
  {
    uint32_t tmp = *(uint32_t *)_src;
    _dst[0] = (tmp & 0xFF000000) >> 24;
    if (remainder > 1)
    {
      _dst[1] = (tmp & 0x00FF0000) >> 16;
      if (remainder > 2)
        _dst[2] = (tmp & 0x0000FF00) >>  8;
    }
  }
}

Is there any changes here one could suggest to improve performance?

Note: This is platform specific and will never be used on any other platform/architecture, an assembly version that specifically targets the xtensa core would be perfectly acceptable in this instance.

EDIT

Based on feedback/review & google I have come up with the following:

void memcpy_P(void * dst, const void * src, const unsigned int len)
{
  uint32_t       * _dst = (      uint32_t *)dst;
  const uint32_t * _src = (const uint32_t *)src;
  const uint32_t * _end = _src + (len >> 2);

  while(_src != _end)
    *_dst++ = *_src++;  

  const uint32_t rem = len & 0x3;
  if (!rem)
    return;

  const uint32_t mask = 0xFFFFFFFF << ((4 - rem) << 3);
  *_dst = (*_dst & ~mask) | (*_src & mask);
}
Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
Geoffrey
  • 10,843
  • 3
  • 33
  • 46
  • 2
    This is perfectly on-topic here, but for future reference, this kind of question is more suited for [Code Review](https://codereview.stackexchange.com/) – Patrick Roberts May 23 '17 at 04:58
  • By the way, this is less related to performance and more related to type conventions: wouldn't `len`, `aligned_len` and `remainder` be better as `size_t` which guarantees a length that can address any location, rather than `unsigned int` which does not guarantee that? For example, you might need more than 32 bits to address some locations on some platforms, in which case `size_t` would need to be 64 bits. – Patrick Roberts May 23 '17 at 05:09
  • C11 has the qualifier `alignas(4)`, defined in ``. – Davislor May 23 '17 at 05:18
  • @PatrickRoberts: the use of size_t is not an issue here since this is platform specific code, it will never be built on anything other then this 32-bit platform. – Geoffrey May 23 '17 at 05:23
  • @Davislor: The alignas just ensures the memory/struct is aligned on a 32-bit boundary, it does not ensure that all reads are of 32-bits, this is critical on this architecture. In my usage of this I am copying buffers of odd lengths for TCP transmission, they are aligned on 32-bit boundaries, but may not end on the boundary. – Geoffrey May 23 '17 at 05:27
  • 1
    Note that this code violates the strict aliasing rule ; you should disable TBAA optimizations in your compiler for this function – M.M May 23 '17 at 05:30
  • 1
    In that case, you probably want to alias them to `uint32* const` variables right away, incrementing or decrementing them by 1. That will give you only `void*` pointers, which cannot be used to read memory at all, or pointers that read the correct word size. You also won’t run into aliasing issues if you don’t keep any other alias. Also, if the memory ranges shouldn’t overlap, you can declare them `restrict`. – Davislor May 23 '17 at 05:31
  • The last read of `_src` reads out of bounds of the input buffer, if the length wasn't a multiple of 4 – M.M May 23 '17 at 05:31
  • @M.M: correct, but on the platform it doesn't matter, and in actually necessary. This is bare metal with no MMU. – Geoffrey May 23 '17 at 05:33
  • 3
    Compilers may optimize code (including whole-program optimization) under the assumption that the code complies with the rules of the language, so by using this technique you're relying on your *compiler* -- not just the architecture - either accidentally or deliberately permitting an out-of-bounds read. Which may or may not be a safe assumption for your particular compiler and platform, IDK, but it's something you need to bear in mind – M.M May 23 '17 at 05:35

0 Answers0