I have a string which looks something like

text = "customer: Anna Smith; payment: 123; date: 12-02-2020; customer: Jack; payment: 10.3; date: 20-03-2020"

Now I want to turn it into a list of tuples (which later I can use to create a dictionary):

[('customer', 'Anna Smith'),
 ('payment', '123'),
 ('date', '12-02-2020'),
 ('customer', 'Jack'),
 ('payment', '10.3'),
 ('date', '20-03-2020')]

I tried to use re.findall for this purpose in the following way:

re.findall(r'(\w+): (.+?);', text)

Of course, it doesn't capture the last key-value pair, because the regular expression requires a trailing semicolon. I think I need some kind of alternation here: match up to a semicolon if there is one, otherwise match up to the end of the string (\Z). How can I do this?

mechanical_meat
diplodocus

3 Answers

If you want to match the last pair as well, use this pattern: (\w+): (.+?)(?:;|$)

The only difference from your pattern is the ending: the value must now be followed by either ; or $. Note that $ is not a character but an anchor that matches at the end of the string (and, with re.MULTILINE, at the end of each line).
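
As a quick check, here is a minimal sketch that runs this pattern against the string from the question. The variable names `text` and `pairs` are only illustrative.

```python
import re

text = ("customer: Anna Smith; payment: 123; date: 12-02-2020; "
        "customer: Jack; payment: 10.3; date: 20-03-2020")

# Lazy value, terminated by either a semicolon or the end of the string.
# (?:...) is a non-capturing group, so findall still returns 2-tuples.
pairs = re.findall(r'(\w+): (.+?)(?:;|$)', text)

for key, value in pairs:
    print(key, '->', value)
```

The non-capturing group matters here: with a plain (;|$) group, re.findall would return 3-tuples that include the terminator.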

emsimpson92

Instead of matching the ;, change .+? to [^;]+ so it matches everything that isn't a ;.

re.findall(r'(\w+): ([^;]+)', text)
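
A short sketch of this approach on the string from the question, assuming values never contain a semicolon themselves. The variable names are illustrative. Because the pattern needs no terminator, it behaves the same whether or not the input ends with a semicolon.

```python
import re

text = ("customer: Anna Smith; payment: 123; date: 12-02-2020; "
        "customer: Jack; payment: 10.3; date: 20-03-2020")

# [^;]+ consumes everything up to (but not including) the next ';',
# or up to the end of the string if there is no further ';'.
pattern = re.compile(r'(\w+): ([^;]+)')

pairs = pattern.findall(text)
pairs_with_trailing = pattern.findall(text + ';')

print(pairs)
print(pairs == pairs_with_trailing)
```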
Barmar

To keep it simple, you can also use str.split instead of a regular expression.

result = []
# Each item looks like "key: value"; split only on the first ": "
for item in text.split("; "):
    key, value = item.split(": ", 1)
    result.append((key, value))

Now result contains the desired list of tuples.
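
The question mentions building a dictionary afterwards. Since the keys repeat, calling dict() on the whole list would keep only the last customer. Below is a rough sketch of one way around that, assuming every record starts with a customer key. That is a guess about the data, and `parse_pairs` and `records` are hypothetical names, not part of any library.

```python
def parse_pairs(text):
    """Split 'key: value; key: value' text into (key, value) tuples."""
    pairs = []
    for item in text.split(";"):
        # partition splits on the first ':' only and never raises
        key, sep, value = item.partition(":")
        if sep:  # skip fragments without a colon, e.g. after a trailing ';'
            pairs.append((key.strip(), value.strip()))
    return pairs


text = ("customer: Anna Smith; payment: 123; date: 12-02-2020; "
        "customer: Jack; payment: 10.3; date: 20-03-2020")

pairs = parse_pairs(text)

# Assumption: a 'customer' key marks the start of a new record.
records = []
for key, value in pairs:
    if key == "customer" or not records:
        records.append({})
    records[-1][key] = value

print(records)
```

Using strip() on both parts makes the split tolerant of missing or extra spaces around the separators, which a fixed "; " or ": " separator is not.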

San