-3

I have a long file like this:

Jet pack(human, metal)
thin wire, sheet; fat tube,rod
thin girl;
fat boy;
We like to read
They like to write
End

I would like to extract all the words that follow "thin" and "fat". These words may be comma-separated or may appear alone. If both "thin" and "fat" appear on a single line, they are separated by a semicolon. My array should contain:

wire, sheet, tube,rod,girl,boy

I need an array of these words, which I will then use to extend the arguments of a function. Since the separators are mixed, how can I strip on ";" and then strip again on ","?

Cheers

user3483203
Hamad Hassan
  • Please show us what you tried and the problems you've run into. – Thierry Lathuille Apr 19 '18 at 19:38
  • Please read and follow the posting guidelines in the help documentation, as suggested when you created this account. [On topic](http://stackoverflow.com/help/on-topic) and [how to ask](http://stackoverflow.com/help/how-to-ask) apply here. StackOverflow is not a design, coding, research, or tutorial service. – Prune Apr 19 '18 at 19:54
  • @Prune, I do not know anything and I am not a developer or programmer. So I only ask when I need some help! Any problem with this? – Hamad Hassan Apr 21 '18 at 17:32
  • With -3, I do not care. I can always make a new account, you can turn it to -50. I do not give a damn. I just thank those people who will help or who are ready to help. If you are too perfect, win the famous challenges among yourselves and do not bother! – Hamad Hassan Apr 21 '18 at 17:33
  • @ThierryLathuille, I did not try anything, because I do not know where to start! – Hamad Hassan Apr 21 '18 at 17:35
  • @HamadHassan. Yes, there is a problem with this. Again, read the posting guidelines. – Prune Apr 23 '18 at 15:10

1 Answer

1

You could use a regular expression to extract the values you need, and then use re.split() to split on either commas or semicolons.

This is the regex I am using:

(?:thin|fat)(.*?)(?=thin|fat|\n)

It matches everything after thin/fat, up to (but not including) the next thin/fat or a newline. Note that the last line therefore has to end with a newline for its entry to be captured.
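
To make the two steps concrete, here is a small sketch (the shortened sample string is only for illustration) showing what re.findall() captures before the split on commas and semicolons is applied:

```python
import re

# Shortened, illustrative sample; it ends with a newline on purpose.
sample = "thin wire, sheet; fat tube,rod\nthin girl;\nfat boy;\n"

# Same pattern as above: capture what follows thin/fat, stopping before
# the next thin/fat or the end of the line.
chunks = re.findall(r'(?:thin|fat)(.*?)(?=thin|fat|\n)', sample)
print(chunks)
# [' wire, sheet; ', ' tube,rod', ' girl;', ' boy;']
```

Each captured chunk still contains the separators and surrounding spaces, which is why the split and strip steps follow.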

x = """
Jet pack(human, metal)
thin wire, sheet; fat tube,rod
thin girl;
fat boy;
We like to read
They like to write
End
"""
import re

y = [j.strip() for i in re.findall(r'(?:thin|fat)(.*?)(?=thin|fat|\n)', x) for j in re.split(r'[;,]', i) if j.strip()]
print(y)

Output:

['wire', 'sheet', 'tube', 'rod', 'girl', 'boy']

You mentioned you were having difficulty reading this from a file. Here is a working example that does so:

test.txt

Jet pack(human, metal)
thin wire, sheet; fat tube,rod
thin girl;
fat boy;
We like to read
They like to write
End

Code

import re

with open('test.txt') as f:
  y = [j.strip() for i in re.findall(r'(?:thin|fat)(.*?)(?=thin|fat|\n)', f.read()) for j in re.split(r'[;,]', i) if j.strip()]
  print(y)

Output:

['wire', 'sheet', 'tube', 'rod', 'girl', 'boy']
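
If the file does not end with a newline, the last entry (here 'boy') has nothing for the lookahead to match and is dropped. One possible variant, offered as a sketch rather than the only fix, adds `$` to the lookahead so the end of the text also counts as a stopping point. The function name extract_words and the inline sample string are hypothetical and used only for illustration:

```python
import re

# Variant of the pattern above: `$` is added to the lookahead so the last
# entry is still captured when the text does not end with a newline.
PATTERN = re.compile(r'(?:thin|fat)(.*?)(?=thin|fat|\n|$)')


def extract_words(text):
    """Return the words that follow thin/fat, split on ';' and ','."""
    words = []
    for chunk in PATTERN.findall(text):
        for part in re.split(r'[;,]', chunk):
            part = part.strip()
            if part:  # skip empty pieces left by trailing separators
                words.append(part)
    return words


# Illustrative text with NO trailing newline after the last line.
sample = "thin wire, sheet; fat tube,rod\nthin girl;\nfat boy;"
print(extract_words(sample))
# ['wire', 'sheet', 'tube', 'rod', 'girl', 'boy']
```

To use it with a file, the result of f.read() can be passed to extract_words() in place of the sample string.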

You can try out my solution to see that it works here

user3483203
  • If I just put all the text in a text file and read that file, as a more general way of doing the same thing, the last 'boy' is missing. I copied the text into the file 'trial.txt' and used the same code, reading the text into 'x' as follows: with open('trial.txt', 'r') as myfile: x=myfile.read() – Hamad Hassan Apr 23 '18 at 19:33
  • Any suggestions for this generalization? – Hamad Hassan Apr 23 '18 at 19:33
  • @HamadHassan I updated my answer showing how to read from a file. – user3483203 Apr 23 '18 at 20:46