
So I'm working with incoming audio from Watson Text to Speech. I want to play the sound as soon as the data arrives in Python over a websocket from Node.js. This is an example of the data I'm sending over the websocket:

  <Buffer e3 f8 28 f9 fa f9 5d fb 6c fc a6 fd 12 ff b3 00 b8 02 93 04 42 06 5b 07 e4 07 af 08 18 0a 95 0b 01 0d a2 0e a4 10 d7 12 f4 12 84 12 39 13 b0 12 3b 13 ... >

So the data arrives as a hex bytestream and I try to convert it to something that Sounddevice can read/play. (See documentation: The types 'float32', 'int32', 'int16', 'int8' and 'uint8' can be used for all streams and functions.) But how can I convert this? I already tried something, but when I run my code I only hear some noise, nothing recognizable. Here you can read some parts of my code:

def onMessage(self, payload, isBinary):
    a = payload.encode('hex')
    queue.put(a)

After I receive the byte stream and convert it to hex, I try to send it to Sounddevice:

def stream_audio():
    with sd.OutputStream(channels=1, samplerate=24000, dtype='int16', callback=callback):
        sd.sleep(int(20 * 1000))


def callback(outdata, frames, time, status):
    global reststuff, i, string
    LENGTH = frames
    while len(reststuff) < LENGTH:
        a = queue.get()
        reststuff += a
    returnstring = reststuff[:LENGTH]
    reststuff = reststuff[LENGTH:]

    for char in returnstring:
        i += 1
        string += char
        if i % 2 == 0:
            print string
            outdata[:] = int(string, 16)
            string = ""
Aagje
  • Did you try to remove the `for` loop and just assign `outdata[:] = returnstring`? – Matthias Apr 09 '17 at 18:19
  • @Matthias Yes, but that didn't work. If I try that, I get this error: `ValueError: invalid literal for int() with base 10: ` – Aagje Apr 10 '17 at 07:20
  • If you mention an error message (which is good!), you should also show how you got it. In this case I can only guess that you are still using `int()`, which you shouldn't. You should try to use a `sd.RawOutputStream` and directly assign the `bytes` you already have to the output buffer. You just have to make sure to match the data type and the number of (interleaved) channels. – Matthias Apr 10 '17 at 11:25
  • @Matthias, the first error is when I tried this in my callback(without the int and forloop): `outdata[:] = returnstring` If I also change `sd.RawOutputStream` I get this error: `ValueError: right operand length must match slice length`. Any ideas? – Aagje Apr 10 '17 at 12:04
  • I think the problem now is that `frames` is the length in, well, frames and `LENGTH` looks like the length in bytes. Since you seem to be using `'int16'` and mono data, each frame has a size of 2 bytes. – Matthias Apr 10 '17 at 13:17
  • @Matthias, yes, that's why I used the for loop where I divide `returnstring` in 2 bytes and send it to the stream. – Aagje Apr 10 '17 at 14:06
  • It's very hard to know what you are trying to do since you didn't provide a [MVCE](https://stackoverflow.com/help/mcve). What is `payload`? I think the `.encode('hex')` doesn't make sense at all, you should probably get rid of it, but it's hard to tell. Do you have 16-bit values to start with? Are they stored in a `bytes` string? – Matthias Apr 10 '17 at 15:39
  • You can see the content of `payload` in my post. The buffer example is a part of what the websocket receives. The code you can see in the post is the only relevant part. The rest are imports, variables and the websocket class. I used the `.encode('hex')` because that was the only way the data was readable. I also tried without the encoding. An example if I print `a`: `0040002e0013000600fcffeeffe2ffd0ffc0ffb8ffbcffc1ffbeffbbffbcffbf` – Aagje Apr 10 '17 at 16:20
  • I can't see the type of `payload`, and I still don't know what it is. Assuming it is something buffer-like, you should try to put it (or a copy of it, if the buffer is temporary) into the queue, without converting it into something *you* can read. Then, in the callback just chop it to the right size and assign it to the output buffer, but taking into account that a frame has a size of 2 bytes. At some point, you should also handle the case when you don't have enough data to fill the output buffer. – Matthias Apr 10 '17 at 17:34
  • So I tried what you asked. I removed the `.encode('hex')` and tried it with the `for` loop (so it chops the data into pieces of 2 bytes), and without the `int`. But I still get `ValueError: right operand length must match slice length`. Thank you already for putting so much effort into this case! – Aagje Apr 11 '17 at 06:54
  • It would really help if you would provide code that others can actually run. Without that, I can only guess what's wrong. For debugging, you can try to print `len(outdata)` and `len(returnstring)`. Those two numbers should be the same. – Matthias Apr 11 '17 at 10:39
  • I totally understand that this is a hard way to find the problem. Here you can find the code. nodejs: https://pastebin.com/6LS5NEtD and python: https://pastebin.com/J6ENq5sv . The only problem is that to use Watson Text to Speech, you need to have an account and get your own credentials. – Aagje Apr 11 '17 at 11:47

1 Answer


Look at your stream of data:

e3 f8 28 f9 fa f9 5d fb 6c fc a6 fd 12 ff b3 00
b8 02 93 04 42 06 5b 07 e4 07 af 08 18 0a 95 0b
01 0d a2 0e a4 10 d7 12 f4 12 84 12 39 13 b0 12
3b 13

You can see here that in every pair of bytes, the second one starts with e/f/0/1, which means a value near zero (in two's complement). So those are your most significant bytes, which means your stream is little-endian! You should take that into account in your conversion. If I had more data I would have tested it, but this is worth a few milliseconds!
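
To check the little-endian reading, here is a small sketch that decodes the first eight bytes of the buffer from the question with the standard `struct` module. The helper name `decode_int16` is made up for illustration, and the sketch assumes the stream really is signed 16-bit PCM.

```python
import struct

# First eight bytes of the <Buffer ...> shown in the question.
raw = bytes.fromhex("e3f828f9faf95dfb")


def decode_int16(data, byteorder="<"):
    """Unpack signed 16-bit samples; '<' is little-endian, '>' is big-endian."""
    count = len(data) // 2  # ignore a trailing odd byte, if any
    return struct.unpack("%s%dh" % (byteorder, count), data[:count * 2])


little = decode_int16(raw, "<")
big = decode_int16(raw, ">")
print(little)  # (-1821, -1752, -1542, -1187)
print(big)
```

Read as little-endian, the samples change smoothly from one to the next, which is what a waveform should look like; read as big-endian, they jump around wildly. On a little-endian machine the `'int16'` dtype of Sounddevice should already match this byte order, so the raw bytes can likely be handed to a raw stream without any conversion.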

mnz