-3

I'm having some problems pulling numbers (54878, 45666, 23331, 00345) from a list of strings. I have a list of about 2700+ strings like the following:

["011 54878 20000 0.00", " 45666 134 2.75", " 23331 0 0.00", "015 00345 -11110 2.75"]

On every line the numbers are different but stay at roughly the same length; the only thing that is reliably consistent is the space between the numbers...

I'm trying to pull only the second-column numbers (54878, 45666, 23331, 00345). Is there a way to write a regex that pulls a number only after a certain amount of whitespace, then keeps reading the number until the next space?

Thank you(: !

MaxKedem

4 Answers

0

Assuming that the first x numbers that you want to skip don't have decimals, you can use something like:

^(\d+\s){x}(\d+)\s

Here, the result is captured in group #2. (Make sure you replace x with the number of columns you want to skip.)

For example, ^(\d+\s){1}(\d+)\s applied to the example you provided captures '54878' in group #2. Working example and explanation can be found here.

If decimals are allowed, the regex gets a bit more complicated:

^(\d*\.?\d*\s){1}(\d*\.?\d*)\s

Working example for this can be found here.
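As a rough sketch of how the first pattern could be applied from Python: the helper name column_after and its skip parameter are made up for illustration and are not part of the answer above. Note that, because the pattern is anchored with ^ and expects digits first, a line that begins with a space (as some lines in the question do) will not match, and the sketch returns None for it.

```python
import re

def column_after(line, skip=1):
    """Hypothetical helper: return the number that follows `skip` leading
    integer columns, or None when the line does not fit the pattern."""
    # Same shape as ^(\d+\s){x}(\d+)\s, with x filled in from `skip`.
    pattern = r"^(\d+\s){%d}(\d+)\s" % skip
    match = re.match(pattern, line)
    # The wanted value is in group 2, as described in the answer.
    return match.group(2) if match else None

print(column_after("011 54878 20000 0.00"))
print(column_after(" 45666 134 2.75"))
```

The first call should print 54878; the second prints None because the line starts with a space rather than a digit.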

ketan vijayvargiya
0

You can use cut (a Linux program) to separate the fields, like below:

cut -d " " -f2 test.txt

Here, -d " " sets a single space as the delimiter and -f2 selects field 2.

Example text file test.txt:

011 54878 20000 0.00
012 548781 20000 0.00
013 5487822 20000 0.00
014 54878333 20000 0.00
015 548784444 20000 0.00
Jaakko
0

You can use string split in Python to separate the fields.

with open("test.txt") as fid:
    for line in fid:
        # split() breaks on any run of whitespace; index 1 is the second field
        print(line.split()[1])

Resulting output:

54878
548781
5487822
54878333
548784444

Example test.txt file used:

011 54878 20000 0.00
012 548781 20000 0.00
013 5487822 20000 0.00
014 54878333 20000 0.00
015 548784444 20000 0.00
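A caveat when applying this to the list in the question: some of those strings start with a space, meaning the first column is empty, and split() discards leading whitespace, so index 1 would then return the third column instead. A minimal sketch of one workaround, assuming that only the first column can ever be blank and that the last two columns are always present (both are assumptions, not something the question confirms), is to count fields from the right. The function name second_column is hypothetical.

```python
rows = ["011 54878 20000 0.00", " 45666 134 2.75",
        " 23331 0 0.00", "015 00345 -11110 2.75"]

def second_column(row):
    """Hypothetical helper: pick the second column by counting from the right."""
    fields = row.split()
    # Assumption: only the first column may be missing, so the wanted
    # value is always the third field from the end.
    return fields[-3] if len(fields) >= 3 else None

values = [second_column(row) for row in rows]
print(values)
```

For the four sample strings this should give 54878, 45666, 23331 and 00345, returned as strings so that leading zeros are preserved.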
Jaakko
0

If you use Python, you can do:

import re

string = "011 54878 20000 0.00"
# Skip the first field, then capture the digits up to the next space
regex = r"^[^ ]* ([0-9]*) .*$"
print(re.search(regex, string).group(1))
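To run the same pattern over the whole list from the question, something like the following sketch could work. The function name extract_all is made up for illustration, and lines that do not match are simply skipped, which is an assumption about the desired behaviour. Because [^ ]* can match an empty first column, the pattern also handles the sample lines that begin with a single space.

```python
import re

rows = ["011 54878 20000 0.00", " 45666 134 2.75",
        " 23331 0 0.00", "015 00345 -11110 2.75"]

# Same pattern as in the answer, compiled once for reuse.
pattern = re.compile(r"^[^ ]* ([0-9]*) .*$")

def extract_all(lines):
    """Hypothetical helper: collect group 1 from every line that matches."""
    result = []
    for line in lines:
        match = pattern.search(line)
        if match:  # assumption: silently skip lines that do not match
            result.append(match.group(1))
    return result

print(extract_all(rows))
```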
a.costa