
I have a paragraph that records a conversation between a customer and a customer service agent. How do I split the conversation into two lists (or another structure, such as a dictionary), one containing only the customer's text and the other containing only the agent's text?

Example paragraph:
Agent Name: Hello! My name is X. How can I help you today? ( 4m 46s ) Customer: My name is Y. Here is my issue ( 4m 57s ) Agent Name: Here's the solution ( 5m 40s ) Agent Name: Are you there? ( 6m 30s ) Customer: Yes I'm still here. I still don't understand... ( 6m 40s ) Agent Name: Ok. Let's try another way. ( 6m 50s ) Agent Name: Does that solve the problem? ( 7m 40s ) Agent Name: Thank you for contacting the customer service.

Expected Output:
List that only contains the agent's text: ['Agent Name: Hello! My name is X. How can I help you today? ( 4m 46s )', 'Agent Name: Are you there? ( 6m 30s )', "Agent Name: Ok. Let's try another way. ( 6m 50s )", 'Agent Name: Does that solve the problem? (7m 40s) Agent Name: Thank you for contacting the customer service.']

List that only contains the customer's text: ['Customer: My name is Y. Here is my issue ( 4m 57s )', "Customer: Yes I'm still here. I still don't understand... ( 6m 40s )"].

Thank you!

LY1

1 Answer


Given:

txt='''\
Agent Name: Hello! My name is X. How can I help you today? ( 4m 46s ) Customer: My name is Y. Here is my issue ( 4m 57s ) Agent Name: Here's the solution ( 5m 40s ) Agent Name: Are you there? ( 6m 30s ) Customer: Yes I'm still here. I still don't understand... ( 6m 40s ) Agent Name: Ok. Let's try another way. ( 6m 50s ) Agent Name: Does that solve the problem? (7m 40s) Agent Name: Thank you for contacting the customer service.'''

You can use re.findall with a lazy match and a lookahead that stops at the other speaker's label (or at the end of the string):

>>> import re
>>> s1 = 'Agent Name:'
>>> s2 = 'Customer:'
>>> re.findall(rf'({s1}.*?(?={s2}|\Z))', txt)
['Agent Name: Hello! My name is X. How can I help you today? ( 4m 46s ) ', "Agent Name: Here's the solution ( 5m 40s ) Agent Name: Are you there? ( 6m 30s ) ", "Agent Name: Ok. Let's try another way. ( 6m 50s ) Agent Name: Does that solve the problem? (7m 40s) Agent Name: Thank you for contacting the customer service."]

>>> re.findall(rf'({s2}.*?(?={s1}|\Z))', txt)
['Customer: My name is Y. Here is my issue ( 4m 57s ) ', "Customer: Yes I'm still here. I still don't understand... ( 6m 40s ) "]
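Since the question also mentions a dictionary, here is a minimal sketch of one way to get there, assuming every turn starts with one of a known set of speaker labels. The names `labels`, `turn_re` and `turns` are illustrative, and the sample text is shortened. Unlike the two calls above, this treats every label as a boundary, so consecutive messages from the same speaker become separate items.

```python
import re
from collections import defaultdict

# Shortened sample text (an assumption; substitute the real transcript).
txt = ("Agent Name: Hello! ( 4m 46s ) Customer: My issue ( 4m 57s ) "
       "Agent Name: Here's the solution ( 5m 40s ) Agent Name: Are you there?")

# Speaker labels that mark the start of a turn.
labels = ["Agent Name:", "Customer:"]
label_re = "|".join(map(re.escape, labels))

# One turn: a label, then as little text as possible, up to the next
# label (of either speaker) or the end of the string.
turn_re = re.compile(rf"({label_re})(.*?)(?={label_re}|\Z)", re.DOTALL)

grouped = defaultdict(list)
for m in turn_re.finditer(txt):
    speaker = m.group(1)
    grouped[speaker].append((speaker + m.group(2)).strip())

turns = dict(grouped)
print(turns["Customer:"])
print(turns["Agent Name:"])
```

`re.DOTALL` is only there so that a line break inside a message does not end the match early. `re.escape` guards against labels that happen to contain regex metacharacters.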
dawg
  • Thanks so much for the response! I had tried re.findall but couldn't figure out the syntax. I tried your code and found that it doesn't split the last part, where the agent sent several messages in a row without a response from the customer. For instance, the last item in the first list should be split into 3 items, each starting with "Agent Name". Do you know how to add an "or" to the re.findall pattern so that a match can end at either s1 or s2? – LY1 Jan 05 '21 at 21:38
  • Are there line endings in the string? If so, this regex will stop at the `\n` since `.*` matches any character except line endings. – dawg Jan 05 '21 at 21:38
  • Hey, I think I figured out the one I need: re.findall(rf'({s1}.*?(?={s2}|\Z|{s1}))', txt). Thanks so much for the help! I really appreciate it. – LY1 Jan 05 '21 at 21:57
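
To tie the last two comments together, here is a small sketch of the pattern from the final comment, run with and without `re.DOTALL` on text that contains a line break. The sample string is hypothetical (the original paragraph has no line breaks), and `without_flag` / `with_flag` are illustrative names.

```python
import re

s1 = 'Agent Name:'
s2 = 'Customer:'

# Hypothetical sample with a line break inside the second agent message.
txt = ("Agent Name: Here's the solution ( 5m 40s ) Agent Name: Are you\n"
       "there? ( 6m 30s ) Customer: Yes ( 6m 40s ) Agent Name: Ok.")

# Pattern from the comment: an agent turn ends at the next label of
# either speaker, or at the end of the string.
pattern = rf'({s1}.*?(?={s2}|\Z|{s1}))'

# Without DOTALL, "." cannot cross the newline, so the turn that
# contains the line break fails to match and is skipped.
without_flag = re.findall(pattern, txt)

# With DOTALL, "." also matches newlines and every agent turn is found.
with_flag = re.findall(pattern, txt, flags=re.DOTALL)

print(len(without_flag), len(with_flag))
```

If the real transcript may contain line breaks, passing `flags=re.DOTALL` (or normalising whitespace first) seems the safer choice.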