How can I parse a bytestring in Python 3?

Question

Basically, I have two bytestrings in a single line like this:

b'\xe0\xa6\xb8\xe0\xa6\x96 - \xe0\xa6\xb6\xe0\xa6\x96\n'

This is UTF-8-encoded text (Bengali) that I'm reading from an online file using urllib, and I want to compare the individual bytestrings so that I can replace the wrong ones. However, I can't find any way to parse the line so that I get \xe0\xa6\xb8\xe0\xa6\x96 and \xe0\xa6\xb6\xe0\xa6\x96 in two separate variables.

I tried converting it to a string with str(b'\xe0\xa6\xb8\xe0\xa6\x96'), and indexing the result works, but then I can't convert it back to the original bytestring.
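
For context, here is a small sketch of what that str() call actually produces, using the same sample bytes. In Python 3, str() on a bytes object returns its printable representation (escape notation and all), not the decoded characters, which is why there is no direct way back:

```python
# Sample bytes from the question
data = b'\xe0\xa6\xb8\xe0\xa6\x96'

as_repr = str(data)   # the printable repr, not the decoded characters
print(as_repr)        # b'\xe0\xa6\xb8\xe0\xa6\x96'
print(len(as_repr))   # 27: "b'", six 4-character escapes, and the closing "'"

# Encoding the repr yields its ASCII characters, not the original bytes
print(as_repr.encode() == data)   # False
```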

Is it possible?

score 4 · Accepted Answer · edited Apr 28 '22 at 13:33

I would recommend trying something like this...

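arr = b'\xe0\xa6\xb8\xe0\xa6\x96 - \xe0\xa6\xb6\xe0\xa6\x96\n'

# Decode the UTF-8 bytes to str, then split on the " - " separator
splt = arr.decode('utf-8').split(' - ')

# Encode each part back to bytes (the trailing newline stays in b_arr2)
b_arr1 = splt[0].encode('utf-8')
b_arr2 = splt[1].encode('utf-8')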

I tried it in the Python 3 interactive interpreter, and it works.
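
As a variation (not part of the original answer): bytes objects have their own split(), so the decode/encode round trip can be skipped. Splitting on an ASCII separator is safe for UTF-8 data, because every byte of a multi-byte UTF-8 sequence is 0x80 or above and can never be mistaken for an ASCII space or hyphen. A minimal sketch, assuming the same sample line:

```python
# Sample line from the question (UTF-8 bytes)
arr = b'\xe0\xa6\xb8\xe0\xa6\x96 - \xe0\xa6\xb6\xe0\xa6\x96\n'

# bytes.split() needs a bytes separator; maxsplit=1 splits only at the
# first " - ", so any later " - " would stay inside the second part
b_arr1, b_arr2 = arr.split(b' - ', 1)

print(b_arr1)  # b'\xe0\xa6\xb8\xe0\xa6\x96'
print(b_arr2)  # b'\xe0\xa6\xb6\xe0\xa6\x96\n'  (newline still attached)
```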

edited Apr 28 '22 at 13:33 by Peter Mortensen · answered Jan 10 '18 at 04:22 by Jake

Hey, that works! Thank you so much! Just one more question: how do I get rid of the newline character at the end of the second bytestring? The same `decode()`/`encode()` procedure, I hope? – srdg Jan 10 '18 at 04:45
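
One way to handle the trailing newline (a sketch, not a reply from the original thread): rstrip() exists on both str and bytes, as long as its argument has the matching type. Stripping b'\r\n' rather than just b'\n' is an assumption, in case the file uses Windows line endings:

```python
arr = b'\xe0\xa6\xb8\xe0\xa6\x96 - \xe0\xa6\xb6\xe0\xa6\x96\n'

# Option 1: strip the newline from the decoded text, then encode
splt = arr.decode('utf-8').split(' - ')
b_arr2 = splt[1].rstrip('\n').encode('utf-8')

# Option 2: strip any trailing \r or \n from the raw bytes before splitting
b_arr2_alt = arr.rstrip(b'\r\n').split(b' - ')[1]

# Both routes give the same bytes, without the newline
assert b_arr2 == b_arr2_alt == b'\xe0\xa6\xb6\xe0\xa6\x96'
```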

score -1 · Answer 2 · answered Jan 10 '18 at 04:35

I would do something like this:

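a = b'\xe0\xa6\xb8\xe0\xa6\x96 - \xe0\xa6\xb6\xe0\xa6\x96\n'

# Decode, split on the hyphen, and strip surrounding whitespace
# (including the trailing newline) from each part
parts = [part.strip() for part in a.decode('utf-8').split('-')]

first_part = parts[0].encode('utf-8')
second_part = parts[1].encode('utf-8')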
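
An alternative that tolerates any number of spaces around the hyphen while staying in bytes is a regular-expression split. This is a sketch, not the answerer's code; the extra spaces in the sample line are added purely for illustration:

```python
import re

# Sample line with extra spaces around the hyphen (illustrative only)
a = b'\xe0\xa6\xb8\xe0\xa6\x96   -  \xe0\xa6\xb6\xe0\xa6\x96\n'

# rb' *- *' matches a hyphen with any number of spaces on either side;
# the pattern and the input must both be bytes. maxsplit=1 splits only
# at the first separator; rstrip removes the trailing newline first.
first_part, second_part = re.split(rb' *- *', a.rstrip(b'\n'), maxsplit=1)

print(first_part)   # b'\xe0\xa6\xb8\xe0\xa6\x96'
print(second_part)  # b'\xe0\xa6\xb6\xe0\xa6\x96'
```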

answered Jan 10 '18 at 04:35 by Jahongir Rahmonov

What is the purpose of the strip() function? – Jake Jan 10 '18 at 04:38
@JakeStephens It strips leading and trailing whitespace, leaving only the needed characters, just in case there is more than one space before or after `-`. – Jahongir Rahmonov Jan 10 '18 at 04:41

Typically you don't want to do that when working with binary data: it's not real text, so what you consider a "space" may not be what you think. – Havenard Jan 10 '18 at 16:18
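
To illustrate the point about what counts as whitespace: once the bytes are decoded, str.strip() uses Unicode's notion of whitespace, while bytes.strip() removes only ASCII whitespace. A small sketch, with a no-break space (U+00A0) chosen purely as an example:

```python
text = '\u00a0word\u00a0'      # NO-BREAK SPACE on both sides
data = text.encode('utf-8')    # b'\xc2\xa0word\xc2\xa0'

# str.strip() treats U+00A0 as whitespace and removes it
print(repr(text.strip()))      # 'word'

# bytes.strip() only removes ASCII whitespace, so nothing changes here
print(data.strip() == data)    # True
```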
