I'm trying to use a PyBoard v1.1 to drive Adafruit's SK6812RGBW NeoPixel LEDs using MicroPython's inline assembler.
Protocol
As can be seen in the data sheet, a single LED is driven by four 8-bit RGBW values. A high bit consists of 0.6 us of high level followed by 0.6 us of low level; a low bit is 0.3 us high followed by 0.9 us low. Every data bit of the 4-byte LED colour value therefore takes four time slots of 0.3 us each, for a total of 128 slots over 38.4 us per LED. The byte stream sent to the first LED also contains the values for all subsequent LEDs: each LED consumes its own value and passes the rest on to the next.
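To make the encoding concrete, here is a minimal sketch of how one colour value expands into 0.3 us slots. The names `encode_byte` and `encode_pixel` are made up for illustration, and sending each byte most significant bit first, in the order the channels are passed in, is an assumption to check against the data sheet.

```python
SLOT_US = 0.3      # duration of one time slot in microseconds
ONE = 0b1100       # data bit 1: 0.6 us high, then 0.6 us low
ZERO = 0b1000      # data bit 0: 0.3 us high, then 0.9 us low

def encode_byte(value):
    """Expand one 8-bit value into 4 bytes (32 slots), MSB first (assumed)."""
    out = bytearray()
    for i in range(7, -1, -2):
        first = ONE if (value >> i) & 1 else ZERO
        second = ONE if (value >> (i - 1)) & 1 else ZERO
        out.append((first << 4) | second)  # two data bits per output byte
    return bytes(out)

def encode_pixel(*channels):
    """Encode the four channel values of one LED into 16 bytes (128 slots)."""
    return b"".join(encode_byte(c) for c in channels)
```

With these definitions `encode_byte(0xFF)` is four bytes of `0xCC`, `encode_byte(0x00)` is four bytes of `0x88`, and one LED occupies 16 bytes, i.e. 128 slots or 38.4 us.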
Implementation
The protocol can be implemented quite easily using the pyboard's SPI interface. Once the data stream has been generated (effectively 16 bytes per LED) and the baudrate has been calculated (1 s / 0.3 us ≈ 3333333), one only needs to create a pyb.SPI instance and call its send method with the bytes as argument.
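A rough sketch of that SPI approach follows. The helper names `open_strip_spi` and `send_frame` are made up for illustration, the polarity and phase values are assumptions, and the peripheral may round the requested baudrate to whatever its clock prescaler supports.

```python
# One SPI bit per 0.3 us slot: 1 s / 0.3 us = 10 MHz / 3
BAUDRATE = 10000000 // 3  # 3333333

def open_strip_spi(bus):
    """Create the SPI master for one strip (only works on the board)."""
    import pyb  # imported here so the rest of the sketch also loads off-board
    return pyb.SPI(bus, pyb.SPI.MASTER, baudrate=BAUDRATE, polarity=0, phase=0)

def send_frame(spi, frame):
    """Clock out the pre-encoded frame (16 bytes per LED) in one call."""
    spi.send(frame)
```

On the board this would be used as `send_frame(open_strip_spi(1), frame)`, where `frame` is the encoded byte stream for the whole strip.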
The Task
Now to the task at hand: I want to drive three different LED strips with one PyBoard. However, there are only 2 SPI buses available. I tried bit-banging the protocol with pyb.Pin and loops, but quickly realized that wasn't going to work: the fastest toggle time I could get was 54 us, which is just a bit shy of the 0.3 us I need...
Implementation V2
After trying some optimization steps I turned to MicroPython's inline assembler. A few hours later I had managed to toggle a given pin at a breezy 23 ns, as measured with an oscilloscope. That was great and all, but I didn't need to mindlessly toggle pins; I needed to toggle pins according to a bit stream following an exact protocol. So another couple of hours later I finished the following implementation:
@micropython.asm_thumb
def send_bits_on_x9(r0):
    # r0 0th word contains the data array address
    # r0 1st word contains length of data array

    # Store the GPIOB address in r3
    movwt(r3, stm.GPIOB)
    # Store the bit mask for PB6 (the pin X9)
    movw(r4, 1 << 6)
    # Load address into r5
    ldr(r5, [r0, 0])
    # Load array length into r6
    ldr(r6, [r0, 4])
    # Jump to condition evaluation
    b(loop_entry)

    # Main loop
    label(loop)
    # Get current "bit" word
    ldr(r0, [r5, 0])
    # Shift address to next "bit" for next time
    add(r5, r5, 4)
    # Evaluating the bit and toggling accordingly
    cmp(r0, 1)
    ite(eq)
    strh(r4, [r3, stm.GPIO_BSRRL])  # Turn LED on
    strh(r4, [r3, stm.GPIO_BSRRH])  # Turn LED off

    # Delay for a bit
    movwt(r7, 8)  # 20948000 cycles is about 1s
    label(delay)
    sub(r7, r7, 1)
    cmp(r7, 0)
    bgt(delay)

    # Eval loop; using data array length as initial counter value
    sub(r6, r6, 1)
    label(loop_entry)
    cmp(r6, 0)
    bgt(loop)
B6 is the CPU name for the pin X9 that I use as data connection to the LEDs.
To run it, I embedded it in a demo Python script:
import array
import uctypes
import micropython
import stm
@micropython.asm_thumb
def send_bits_on_x9(r0):
    ...
send_buffer = array.array("i", [1, 0, 1, 1, 0, 0, 1, 0])
send_bits_on_x9(array.array("i", [uctypes.addressof(send_buffer), len(send_buffer)]))
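Since the routine reads one 32-bit word per output level, a byte stream has to be expanded before it can be passed in. A minimal sketch of such an expansion, where the helper name `bytes_to_slot_words` is made up for illustration and most-significant-bit-first order is an assumption:

```python
import array

def bytes_to_slot_words(data):
    """Expand each byte into eight words holding 0 or 1, MSB first (assumed)."""
    words = array.array("i")
    for byte in data:
        for shift in range(7, -1, -1):
            words.append((byte >> shift) & 1)
    return words
```

For example, `bytes_to_slot_words(b"\xb2")` yields the same eight values as the hand-written `send_buffer` in the demo script.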
The Problem
This worked beautifully. However, when I used it in place of the SPI streamer, the LEDs showed occasional artifacts every couple of executions. The following is an image from when I looked at it with the oscilloscope:
[Image: oscilloscope log with artifact]
As can be seen, there is a spot where for some reason it seems to stop toggling for exactly 2 value bits:
[Image: missing edges penciled in]
This happens seemingly at random, at any part of the bit stream, sometimes starting with a rising edge, sometimes with a falling edge.
Question
Now obviously my question is why this happens. It doesn't happen with SPI, though I assume that the C implementation takes care not to let anything interrupt the stream. I tried disabling the garbage collector before calling send_bits_on_x9 and re-enabling it afterwards, but that didn't help. I also changed the number of delay cycles, but that didn't change anything either.
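For reference, the garbage collector attempt amounts to something like the following sketch. The wrapper name `send_with_gc_disabled` is made up for illustration; `send` stands for send_bits_on_x9 and `control_block` for the two-word array passed to it.

```python
import gc

def send_with_gc_disabled(send, control_block):
    """Run the send routine with automatic garbage collection switched off."""
    gc.disable()
    try:
        send(control_block)
    finally:
        gc.enable()  # always restore collection, even if send raises
```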
A second thing I noticed: when the stream ended with a number of trailing zero bytes (the 80 us reset period defined by the protocol), that period seemed to execute in about a quarter of the time it was supposed to. When I changed the trailing bytes to 0xff, they retained their intended duration, and the LEDs don't seem to mind.
Now if anyone could point me to a resource other than the official inline assembler documentation or even provide some insight, I'd appreciate it. Cheers!