
I followed the directions in this answer (Python: Split by 1 or more occurrences of a delimiter) to a T, but it keeps failing, so I'm wondering whether I'm missing something simple or need a different method.

I have the following .eml file:

[Screenshot of the .eml file (image not reproduced)]

My goal is to eventually parse out all the fish stocks and their corresponding weight amounts, but for a test I'm just using the following code:

import re

with open(file_path) as f:
    for line in f:
        if "Haddock" in line:
            # fish, remainder = re.split(" +", line)
            fish, remainder = line.split()  # this is the line that raises the ValueError
            print(line.lower().strip())
            print("fish:", fish)
            print("remainder:", remainder)

and it fails on the line `fish, remainder = line.split()` with the error

ValueError: too many values to unpack (expected 2)

which tells me that Python is failing because it is trying to split on too many spaces, right? Or am I misunderstanding this? I want to get two values back from this process: the name of the fish (a string containing all the text before the many spaces) and the quantity (integer from the right side of the input line).
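A quick check of what `str.split()` returns makes the error concrete: with no argument it splits on every run of whitespace, so the single spaces inside the fish name are separators too. The sample line below is an assumption reconstructed from the values quoted in the comments, since the screenshot isn't reproduced; the exact spacing is a guess.

```python
# Hypothetical sample line, reconstructed from the comments; the real .eml
# content was only shown as a screenshot.
line = "GB Haddock West          22572\n"

parts = line.split()  # splits on every run of whitespace, including single spaces
print(parts)

try:
    fish, remainder = parts  # four items cannot be unpacked into two names
except ValueError as err:
    print("ValueError:", err)
```

The commented-out `re.split(" +", line)` has the same problem, because `" +"` also matches the single spaces between `GB`, `Haddock` and `West`.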

Any help would be appreciated.

theprowler
  • You are correct. `line.split()` results in `['GB', 'Haddock', 'West', '22572']` which of course can't be unpacked into 2 names. – vallentin Apr 06 '17 at 17:17
  • Ohh ok. So is there a way to directly answer that linked user's question? Can I `split()` specifically on several spaces in a row? – theprowler Apr 06 '17 at 17:22
  • Could you give an example as to what `fish` and `remainder` would be? – vallentin Apr 06 '17 at 17:27
  • Right, I wasn't very clear about that. In past cases (emails) they would normally list a fish, weight, and price; so the first `split()` would produce a `fish`, and a `remainder`, then I would `split()` the `remainder` to produce a weight and price. In this case, I would like the `fish` to be `GB Haddock West` and the `remainder` to be `22572`. – theprowler Apr 06 '17 at 17:29

4 Answers


You can use the regular expression below to split the line:

import re

fish, remainder = re.split(r'(?<=\w)\s+(?=\d)', line.strip())

This splits the line into `['GB Haddock West', '22572']`.
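The lookbehind `(?<=\w)` and lookahead `(?=\d)` restrict the match to whitespace that sits between a word character and a digit, so the single spaces inside the name are left alone. A minimal sketch applying the pattern line by line, using as assumed input the three sample lines from another answer on this page:

```python
import re

# Matches whitespace only when preceded by a word character and followed by a digit.
SPLIT_BEFORE_NUMBER = re.compile(r"(?<=\w)\s+(?=\d)")

# Assumed input: the sample lines used in another answer, with file-style newlines.
lines = [
    "GB Haddock West          22572\n",
    "GB Cod West               7207\n",
    "GB Haddock East           3776\n",
]

parsed = []
for line in lines:
    fish, remainder = SPLIT_BEFORE_NUMBER.split(line.strip())
    parsed.append((fish, int(remainder)))  # convert the weight to an integer

print(parsed)
```

One caveat: if a stock name ever contains a word that starts with a digit, the pattern would split there as well, and the two-name unpacking would fail again.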
Deba

> I would like the `fish` to be `GB Haddock West` and the `remainder` to be `22572`

You could do something like this:

s = line.split()
fish, remainder = " ".join(s[:-1]), s[-1]

Instead of using `split()`, you could use `rindex()` to find the last space and slice the line there.

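line = line.strip()  # drop the trailing newline read from the file
at = line.rindex(" ")
fish, remainder = line[:at].rstrip(), line[at+1:]  # rstrip() removes the run of padding spaces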

Both will output:

print(fish) # GB Haddock West  
print(remainder) # 22572
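A related standard-library option (not part of the answer above) is `str.rsplit()` with `maxsplit=1`: it splits from the right at most once, which does the "find the last space" step for you. With no separator argument it treats any run of whitespace as one separator and ignores trailing whitespace such as the newline from the file. A small sketch, assuming the same sample line:

```python
# Assumed sample line with a run of spaces and a trailing newline, as read from a file.
line = "GB Haddock West          22572\n"

# sep=None: runs of whitespace count as a single separator; maxsplit=1 stops
# after the rightmost split, so the spaces inside the name are kept.
fish, remainder = line.rsplit(maxsplit=1)
weight = int(remainder)

print(repr(fish))
print(weight)
```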
vallentin

Yes, you can split on multiple spaces. However, unless the number of spaces between the columns is fixed, you'll get extra empty fields in the middle, so you still end up with more than two values. For instance:

in_stuff = [
    "GB Haddock West          22572",
    "GB Cod West               7207",
    "GB Haddock East           3776"
]

for line in in_stuff:
    print(line.split("   "))

Output:

['GB Haddock West', '', '', ' 22572']
['GB Cod West', '', '', '', '', '7207']
['GB Haddock East', '', '', '  3776']

However, a simple change will get what you want: pick off the first and last fields from this:

for line in in_stuff:
    fields = line.split("   ")
    print(fields[0], int(fields[-1]))

Output:

GB Haddock West 22572
GB Cod West 7207
GB Haddock East 3776

Will that solve your problem?
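To split on "several spaces in a row" directly, as asked in the comments, `re.split()` can match a run of two or more spaces. That keeps the single spaces in the name and produces no empty fields. This sketch assumes names use single spaces and the columns are separated by at least two; a name containing a double space would be split as well.

```python
import re

in_stuff = [
    "GB Haddock West          22572",
    "GB Cod West               7207",
    "GB Haddock East           3776",
]

rows = []
for line in in_stuff:
    # " {2,}" matches two or more consecutive spaces, so the single spaces
    # inside the stock name are not treated as separators.
    fish, remainder = re.split(r" {2,}", line.strip())
    rows.append((fish, int(remainder)))

for fish, weight in rows:
    print(fish, weight)
```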

Prune

Building upon @Vallentin's answer, but using the extended unpacking features of Python 3:

In [8]: line = "GB Haddock West 22572"

In [9]: *fish, remainder = line.split()

In [10]: print(" ".join(fish))
GB Haddock West

In [11]: print(int(remainder))
22572
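Applied to the question's stated goal (all stocks and their weights), the same unpacking can fill a dictionary. This is a sketch under the assumption that each relevant line has the name first and the weight last; a real .eml file will also contain header and body lines that don't fit that shape, so you would still need a filter such as the question's `"Haddock" in line` check before this step.

```python
# Assumed input: one stock per line, name first and weight last.
lines = [
    "GB Haddock West          22572\n",
    "GB Cod West               7207\n",
    "GB Haddock East           3776\n",
]

weights = {}
for line in lines:
    if not line.strip():
        continue  # skip blank lines
    *name_parts, weight = line.split()  # every field except the last goes into name_parts
    weights[" ".join(name_parts)] = int(weight)

print(weights)
```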
Felix