How to parse values that appear after the same string in Python?

Question

I have an input text like this (the actual text file also contains tons of garbage characters surrounding these two strings):

(random_garbage_char_here)**value=xxx**;(random_garbage_char_here)**value=yyy**;(random_garbage_char_here)

I am trying to parse the text and store something like value1="xxx" and value2="yyy". I wrote the following Python code:

value1_start = content.find('value')
value1_end = content.find(';', value1_start)

value2_start = content.find('value')
value2_end = content.find(';', value2_start)


print "%s" %(content[value1_start:value1_end])
print "%s" %(content[value2_start:value2_end])

But it always returns:

value=xxx
value=xxx

Could anyone tell me how I can parse the text so that the output is:

value=xxx
value=yyy

Sorry, I just edited my question. The text file does not contain only that string; it also contains a lot of non-printing and garbage characters surrounding it. – weefwefwqg3 Dec 30 '16 at 07:51

Mike Müller · Answer 1 · 2016-12-30T08:08:59.257

1

For this input:

content = '(random_garbage_char_here)**value=xxx**;(random_garbage_char_here)**value=yyy**;(random_garbage_char_here)'

use a simple regex and manually strip off the first and last two characters:

import re

values = [x[2:-2] for x in re.findall(r'\*\*value=.*?\*\*', content)]
for value in values:
    print(value)

Output:

value=xxx
value=yyy

Here the assumption is that there are always two leading and two trailing * as in **value=xxx**.
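
As a variation on the slicing above, a capture group can do the stripping inside the pattern itself. This is only a minimal sketch under the same assumption (exactly two asterisks on each side), using the same sample `content`:

```python
import re

content = ('(random_garbage_char_here)**value=xxx**;'
           '(random_garbage_char_here)**value=yyy**;(random_garbage_char_here)')

# With a single capture group, re.findall returns only the group's text,
# so the surrounding ** never appears in the results.
values = re.findall(r'\*\*(value=.*?)\*\*', content)
for value in values:
    print(value)
```

This prints the same two lines as the slicing version; which form reads better is mostly a matter of taste.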

edited Dec 30 '16 at 08:08

answered Dec 30 '16 at 07:51

Mike Müller


Sorry, I just edited my question. The text file does not contain only that string; it also contains a lot of non-printing and garbage characters surrounding it. – weefwefwqg3 Dec 30 '16 at 07:54

score 1 · Accepted Answer · answered Dec 30 '16 at 07:54

Use a regex approach:

re.findall(r'\bvalue=[^;]*', s)

Or, if the key can be any sequence of one or more word characters (letters, digits, underscores) rather than the literal value:

re.findall(r'\b\w+=[^;]*', s)

Details:

\b - word boundary
value= - a literal char sequence value=
[^;]* - zero or more chars other than ;.

See the Python demo:

import re
rx = re.compile(r"\bvalue=[^;]*")
s = "$%$%&^(&value=xxx;$%^$%^$&^%^*value=yyy;%$#^%"
res = rx.findall(s)
print(res)
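
For reference, here is a sketch of what the two patterns return on the demo string, plus one assumed variation that is not part of the answer: a capture group that keeps only the text after `=`. The names `pairs`, `generic` and `values_only` are illustrative.

```python
import re

s = "$%$%&^(&value=xxx;$%^$%^$&^%^*value=yyy;%$#^%"

# Pattern from the answer: "value=" plus everything up to (not including) ';'
pairs = re.findall(r"\bvalue=[^;]*", s)
print(pairs)        # ['value=xxx', 'value=yyy']

# Generic-key variant: any run of word characters before '='
generic = re.findall(r"\b\w+=[^;]*", s)
print(generic)      # ['value=xxx', 'value=yyy']

# Assumed variation: capture only the part after '='
values_only = re.findall(r"\bvalue=([^;]*)", s)
print(values_only)  # ['xxx', 'yyy']
```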

answered Dec 30 '16 at 07:54

Wiktor Stribiżew


my text has something like this: (random_text)avalue=xxx;(random_text),value=yyy; so I had to remove \b to parse both values. If \b is there, the code only parses the second value=yyy. Btw, it works now. Thank you for your dedicated answer. – weefwefwqg3 Dec 30 '16 at 08:26
Great you could adjust the pattern as per your real data, that's why I always provide explanation of the patterns I suggest. Yes, `\b` requires a non-word char or start of string before `v`, and if you need to match all attributes that *end with* `value`, you might try `\w*value=[^;]*`. – Wiktor Stribiżew Dec 30 '16 at 08:34
Hi, could you please tell me what the regex should be if the string I want to parse ends with ;;$ (two consecutive semicolons and a dollar sign)? I tried re.compile(r"param=[^;;$]*") to get the value, but it did not work. – weefwefwqg3 Jan 02 '17 at 18:15
No, a negated character class matches one character at a time (any single char not in the set), so it cannot express a multi-character terminator. You need `r'param=(.*?);;\$'` – Wiktor Stribiżew Jan 02 '17 at 18:21
OH I see. Thank you so much for your help. – weefwefwqg3 Jan 02 '17 at 18:24
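
The two points from this comment thread can be checked with a short sketch. The sample strings `text` and `s2` are made up for illustration (they only imitate the shapes described in the comments), and the variable names are hypothetical.

```python
import re

# Shape described in the comments: one key glued to a letter, one after a comma
text = "@@avalue=xxx;##,value=yyy;"

with_boundary = re.findall(r"\bvalue=[^;]*", text)
no_boundary = re.findall(r"value=[^;]*", text)
ends_with_value = re.findall(r"\w*value=[^;]*", text)

print(with_boundary)    # ['value=yyy']  (no word boundary between 'a' and 'v')
print(no_boundary)      # ['value=xxx', 'value=yyy']
print(ends_with_value)  # ['avalue=xxx', 'value=yyy']

# Assumed sample where the value itself contains a single ';' and a '$'
s2 = "junk param=a;b$c;;$ more"

# [^;;$] is the same set as [^;$]: matching stops at the first ';' or '$'
char_class = re.findall(r"param=[^;;$]*", s2)
# Lazy match up to the literal three-character terminator ';;$'
lazy = re.findall(r"param=(.*?);;\$", s2)

print(char_class)       # ['param=a']
print(lazy)             # ['a;b$c']
```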

Christian Dean · Answer 3 · 2016-12-30T08:02:30.117

Use regex to filter the data you want from the "junk characters":

>>> import re
>>> _input = '#4@5%value=xxx#38u952035983049;3^&^*(^%$3*value=yyy#%$#^&*^%;$#%$#^'
>>> matches = re.findall(r'[a-zA-Z0-9]+=[a-zA-Z0-9]+', _input)
>>> matches
['value=xxx', 'value=yyy']
>>> for match in matches:
...     print(match)
...
value=xxx
value=yyy
>>>

Summary of the regular expression:

[a-zA-Z0-9]+: One or more alphanumeric characters
=: literal equal sign
[a-zA-Z0-9]+: One or more alphanumeric characters
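
One consequence of this pattern is worth checking against real data: because both sides of `=` accept any alphanumeric run, letters or digits that sit directly next to a key or value are captured along with it. A small sketch with made-up sample strings (`clean` and `noisy` are assumptions, not taken from the question):

```python
import re

pattern = re.compile(r'[a-zA-Z0-9]+=[a-zA-Z0-9]+')

# Junk next to the pairs is purely non-alphanumeric: pairs come out clean
clean = '#%value=xxx;^&*value=yyy#;'
clean_matches = pattern.findall(clean)
print(clean_matches)   # ['value=xxx', 'value=yyy']

# A digit glued to the key and to the value becomes part of the match
noisy = '#%7value=xxx9;^&*value=yyy#;'
noisy_matches = pattern.findall(noisy)
print(noisy_matches)   # ['7value=xxx9', 'value=yyy']
```

If the junk can be alphanumeric, anchoring on the literal key and the `;` terminator is likely the safer choice.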

score 1 · Answer 4 · answered Dec 30 '16 at 08:35

You already have good answers based on the re module. That would certainly be the simplest way.

If for any reason (performance?) you prefer to use str methods, that is indeed possible, but you must start the search for the second string past the end of the first one:

value2_start = content.find('value', value1_end)
value2_end = content.find(';', value2_start)
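
The same idea extends to any number of occurrences by looping until `find` returns -1. A minimal sketch, where the helper name `find_all_values`, its parameters, and the sample `content` are hypothetical:

```python
def find_all_values(content, key='value', terminator=';'):
    """Collect every 'key...terminator' span using only str.find."""
    results = []
    start = content.find(key)
    while start != -1:
        end = content.find(terminator, start)
        if end == -1:
            # No terminator after the last key: take the rest of the text
            end = len(content)
        results.append(content[start:end])
        # Resume searching past the end of the match just recorded
        start = content.find(key, end)
    return results


content = '$%^value=xxx;&*(value=yyy;#@!'
found = find_all_values(content)
print(found)  # ['value=xxx', 'value=yyy']
```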
