I am working on a custom board containing a 32-bit MCU (Cortex A5) and a 16-bit wide DRAM chip (LPDDR2). The MCU has an on-chip DRAM controller which supports both DDR3 and LPDDR2, and I do have a working setup using LPDDR2.

Now, I am trying to halve the clock rate at boot time on both MCU and DRAM (they both use the same PLL) due to power restrictions, and this is where my troubles begin.

As mentioned, I do have a working setup using the full frequency (DRAM: 400MHz, MCU: 396MHz), so one would expect that halving the frequency and updating the timings according to the DRAM datasheet would yield another working setup, but it does not.

The DRAM init runs at boot time from MCU intram, as do all of the tests. The whole procedure is handled by a board-specific version of U-Boot 2015.04.

I have a collection of tests that run at MCU boot to verify DRAM integrity. One of these tests is a so-called "walking bit" test, where I take a 32-bit uint and set each bit in sequence, reading back after every write to verify.

What I found was that, when reading back, the lower 16 bits have not been touched, while the upper 16 bits seem to have been altered. After some investigation, I found the following pattern (assuming a watermark "0xaa"):

   write    ->  readback
0x8000_0000 -> 0x0000_aaaa
0x4000_0000 -> 0x0000_aaaa
0x2000_0000 -> 0x0000_aaaa
0x1000_0000 -> 0x0000_aaaa
[...]
0x0008_0000 -> 0x0000_aaaa
0x0004_0000 -> 0x0000_aaaa
0x0002_0000 -> 0x0000_aaaa
0x0001_0000 -> 0x0000_aaaa

0x0000_8000 -> 0x8000_aaaa
0x0000_4000 -> 0x4000_aaaa
0x0000_2000 -> 0x2000_aaaa
0x0000_1000 -> 0x1000_aaaa
[...]
0x0000_0008 -> 0x0008_aaaa
0x0000_0004 -> 0x0004_aaaa
0x0000_0002 -> 0x0002_aaaa
0x0000_0001 -> 0x0001_aaaa

The watermark is present, although I suspect it got there from a previous debugging session. I will address that later; my primary focus at the moment is getting the "walking bit" test to pass.

Here is a memory dump:

(gdb) x/16b addr  
0x80000000:     0x00    0x00    0x55    0x55    0x55    0x55    0x00    0x80
0x80000008:     0xaa    0xaa    0xaa    0xaa    0xaa    0xaa    0x00    0x55
(gdb) p/x *addr
$59 = 0x55550000
(gdb) set *addr = 0xaabbccdd
(gdb) p/x *addr 
$60 = 0xccdd0000
(gdb) x/16b addr
0x80000000:     0x00    0x00    0xdd    0xcc    0xbb    0xaa    0x00    0x80
0x80000008:     0xaa    0xaa    0xaa    0xaa    0xaa    0xaa    0x00    0x55
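
For what it is worth, both the table and the dump above fit a simple model in which a 32-bit store lands 16 bits (one beat on the 16-bit bus) late. Here is a minimal sketch of that model. It assumes a little-endian view of memory, uses `uint32_t` in place of U-Boot's `u32`, and the helper name `model_readback` is hypothetical; it only describes the symptom and is not a confirmed root cause.

```c
#include <stdint.h>

/*
 * Hypothetical model of the observed failure (an assumption, not a
 * confirmed root cause): the store lands one 16-bit beat late, so the
 * low half of the written word shows up in the high half of the word
 * that is read back, and the low half keeps its previous contents.
 */
static uint32_t model_readback(uint32_t written, uint16_t old_low16)
{
    return (uint32_t)(written << 16) | old_low16;
}
```

With the 0xaaaa watermark in place, `model_readback(0x00008000, 0xaaaa)` gives 0x8000aaaa and `model_readback(0x80000000, 0xaaaa)` gives 0x0000aaaa, as in the table. With the zeroed low half seen in the gdb session, `model_readback(0xaabbccdd, 0x0000)` gives 0xccdd0000.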

Can anyone tell me what might cause this type of behaviour?

Cheers

Note: I have intentionally left out MCU and DRAM specifications, as I believe that the question can be addressed only with JEDEC/DFI in mind.

Edit: Added memory dump.

Edit: Here is the source of the "walking bit" test. It runs from MCU intram against a memory area located in DRAM, and is assumed to be bug-free:

static u32 __memtest_databus(volatile u32 * const addr)
{
  /* Walking bit */

  u32 pattern = (1u << 31);
  u32 failmask = 0;

  for(; pattern; pattern >>= 1)
  {
    *addr = pattern;

    if(*addr != pattern)
      failmask |= pattern;
  }

  return failmask;
}

Edit: The PLL and VCO have been checked, and the settings are correct. The PLL is stable and the DRAM PHY does obtain a lock.

Link to DRAM Data Sheet

Tom
  • Do you meet the timing specifications for the DRAM at the lower frequency? Are you sure the PLL is stable? – David Schwartz May 10 '16 at 07:08
  • Yes, I do. The PLL is stable, and the PHY is able to obtain a lock. – Tom May 10 '16 at 07:23
  • @Lundin We have ruled out bugs, and are assuming the error(s) to be related to the DRAM configuration. For your convenience, I have added the source of the memory test in question. – Tom May 10 '16 at 11:54
  • @artlessnoise This is not a change of frequency, as the DRAM initialization happens after the clocks have been set. The DRAM PHY indicates lock and the DRAM controller on the MCU indicates that the memory controller has been initialized. – Tom May 10 '16 at 13:14
  • I'll ask the PLL question yet again. Understand that just because the PLL locks, that is not the end of it. The VCO needs to be within range; if it worked before and you are halving it, you probably want to do that by doubling the divisor, not halving the multiplier. That, or double and triple check the VCO range and where you are setting it. – old_timer May 10 '16 at 13:25
  • and then of course double check your timing. make sure the refresh rate is basically twice as fast now relative to the now half speed clock in order to get the same refresh relative to wall clock time. – old_timer May 10 '16 at 13:27
  • and always remember dram/ddr is a nightmare, allocate MONTHS in your schedule to get it working right. 10-12 weeks minimum. – old_timer May 10 '16 at 13:27
  • You must go over the parameters for both the controller (CPU side) and the DDR parameters (OCD, DLL, CAS, etc). They will be different for each frequency. Then your board may have resonance at different frequencies. Sorry, it was not clear that you are changing the 'BOOT frequency'; also your question title is not suitable. It is difficult to know if your writes succeed with your test. DDR can hold data for some time. You should zero it at the working frequency. – artless noise May 10 '16 at 13:39
  • Related: [ARM memtest](http://stackoverflow.com/questions/11640062/how-to-do-memory-test-on-arm-architecture-hardware-something-like-memtest86); the `ldmia/stmia` mechanism would be helpful. It does appear that a write is failing (or maybe a read). However, this could be the 'single beat' write only. Do you have cache enabled? The DDR cycles are much different for single beat versus a burst. Does a single 16 bit walking test pass? If your CPU is little endian, is it only the first 16 bits that are corrupt? A fixed pattern might be better to determine this, i.e. 0x12345678. – artless noise May 10 '16 at 13:49
  • @dwelch I am doubling the divisor as you say, and the VCO is in range. The refresh timing also seems to be correct. – Tom May 10 '16 at 14:11
  • all of your tests passed at the faster speed? – old_timer May 10 '16 at 15:05
  • @dwelch That is correct – Tom May 10 '16 at 15:11
  • As @artlessnoise suggested, I loaded the DRAM at a working frequency (400 MHz), then read it back at the experimental frequency (200 MHz). I was able to read back the pattern that I initially wrote, indicating that the read operation is fine. I then wrote an anti-pattern to the same region I had just read, and found that the first two bytes of each burst were unchanged. – Tom May 10 '16 at 15:16
  • Correction: Every other burst was wrong. – Tom May 10 '16 at 15:23
  • maybe covered, maybe hinted at. the arm can do certain sized bursts multiples of 64 bits. to get to the point...do you know if the dram is seeing a read-modify-write when you do a write or are your writes ideally sized, and not modified in the middle to be write only and not a read-modify write. (16 bit wide I cant imagine, but have to ask). – old_timer May 10 '16 at 18:41
  • does your controller have a back door that you can control the size of the access and not have to rely on the arm axi/amba and whatever is in the middle? – old_timer May 10 '16 at 18:42
  • Sometimes the controller is set up to work out of the box, i.e. it will boot naturally at 400MHz. If you access the DDR at all before the 200MHz activation, there can be problems (like ROM boot code directions). Please ensure that doesn't happen. As Dwelch suggests, some details on the CPU DDR controller (is it one or two AXI or AHB or custom bus?) and the DDR data sheet might be good to add to your question. I believe that the technique may be generic, not the Q/A. I think you should update the question and remove any comments that are made obsolete by the update. – artless noise May 10 '16 at 20:32
  • As @artlessnoise suggested, I have updated the question. – Tom May 12 '16 at 07:11
  • @artlessnoise The MCU DRAM controller has two AXI-controllers – Tom May 12 '16 at 07:12
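
The timing advice in the comments above (refresh relative to wall clock time, parameters differing per frequency) comes down to converting the datasheet's nanosecond values into clock cycles at the new frequency. A minimal sketch of that conversion follows. The helper names are hypothetical and not taken from U-Boot or any vendor code, and the values of 18 ns and 3900 ns mentioned afterwards are placeholders for illustration, not figures from the DRAM data sheet in the question.

```c
#include <stdint.h>

/* Minimum timings (a "not less than" constraint): round up so the
 * programmed cycle count never falls short of the required time. */
static uint32_t ns_to_cycles_min(uint32_t ns, uint32_t clk_mhz)
{
    return (ns * clk_mhz + 999u) / 1000u;
}

/* Maximum intervals such as the refresh interval (a "not more than"
 * constraint): round down so the programmed count never exceeds it. */
static uint32_t ns_to_cycles_max(uint32_t ns, uint32_t clk_mhz)
{
    return (ns * clk_mhz) / 1000u;
}
```

With the placeholder values, an 18 ns minimum becomes 8 cycles at 400 MHz and 4 cycles at 200 MHz, while a 3900 ns maximum interval becomes 1560 cycles at 400 MHz and 780 cycles at 200 MHz. Halving the clock therefore roughly halves every cycle count, and the rounding direction depends on whether the datasheet value is a minimum or a maximum.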

2 Answers

0

The bytes look like they have been shifted, not altered.

quote

(gdb) x/16b addr
0x80000000:     0x00    0x00    *0xdd    0xcc    0xbb    0xaa*    0x00    0x80
0x80000008:     0xaa    0xaa    0xaa    0xaa    0xaa    0xaa    0x00    0x55

unquote

Mrunmoy
  • could you check if `addr` still points to `0x80000000` before you execute the statement `set *addr = 0xaabbccdd` – Mrunmoy May 10 '16 at 12:17
  • It does: `(gdb) p/x addr $78 = 0x80000000` – Tom May 10 '16 at 13:09
  • I noticed the shift too. Playing around with read/write latencies, I am able to push this shift even further to the right, but not the opposite direction. – Tom May 10 '16 at 13:17
  • could you post your changes to the read/write latencies? sounds like the processor tries to read in 16 bits of 0s before the actual data are shifted out on the data bus. – Mrunmoy May 10 '16 at 13:47
0

You have one severe bug here: u32 pattern = (1 << 31);.

The integer constant 1 is of type int, which is 32 bits on your ARM system.

You left shift this signed number out of bounds and invoke undefined behavior; anything could happen. The variable pattern can get any value.

Correct code would be u32 pattern = (u32)1 << 31; or u32 pattern = 1u << 31;
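
A minimal sketch of the corrected initialisation and loop, using `uint32_t` from `<stdint.h>` as a stand-in for U-Boot's `u32`. That substitution and the helper names `top_bit` and `walk_count` are assumptions made for illustration only.

```c
#include <stdint.h>

/* Returns a word with only bit 31 set, without shifting a signed int. */
static uint32_t top_bit(void)
{
    /* The operand is unsigned, so the shift is well defined. */
    return (uint32_t)1 << 31;
}

/* Counts how many patterns the walking-bit loop visits before the
 * pattern shifts down to zero and the loop ends. */
static unsigned walk_count(void)
{
    unsigned n = 0;

    for (uint32_t pattern = top_bit(); pattern; pattern >>= 1)
        n++;

    return n;
}
```

Here `top_bit()` evaluates to 0x80000000 and `walk_count()` to 32, one iteration per bit of the word.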

Lundin
  • While this is a possible issue, I assume that the OP runs the test at the good frequency and it passes? If so, there is no functional code issue in the test but some DDR parameter/initialization issue? – artless noise May 10 '16 at 13:59
  • As @artlessnoise points out: This test has passed at other frequencies. I would assume the compiler would figure this out. Regardless, I have corrected the issue. – Tom May 10 '16 at 14:15