
I am using Python to process pcap files and write the processed values to a text file. The text file has around 8000 rows, and sometimes it contains a malformed string such as 7.70.582. In later processing I split the file into lines and convert each value on every line to a float. For the malformed values I get this error:

ValueError: invalid literal for float(): 7.70.582

In such cases I am interested only in 7.70, so I need to drop the second decimal point and everything after it. Is there a trick to extract only the part of the string before the second decimal point?

I searched for an answer, and it seems this situation has not been asked about before.

Alternatively, is there a way to skip the lines where this kind of error occurs?

Ashish Kurian

3 Answers

0

You can use str.split() and '.'.join:

s = "7654 16.317 8.651 7.70.582 17.487"
# list() makes the result a list on Python 3 as well, where map() returns an iterator
final_data = list(map(float, ['.'.join(i.split('.')[:-1]) if len(i.split('.')) > 2 else i for i in s.split()]))

Output:

[7654.0, 16.317, 8.651, 7.7, 17.487]

Regarding the single string:

s = ["7.70.582"]
final_data = list(map(float, ['.'.join(i.split('.')[:-1]) if len(i.split('.')) > 2 else i for i in s]))

Output:

[7.7]
Ajax1234
  • I like this better than my approach, but I'd suggest that indexing with `[:2]` might be better. – jedwards Nov 10 '17 at 20:38
  • Hi @Ajax1234, I will try your method first and see if it fixes my issue. I am extracting the single string using (x.split()[3]). What would your solution look like for just this string rather than the whole line? I need to keep the two digits after the first decimal point. – Ashish Kurian Nov 10 '17 at 20:48
  • @Ajax1234 : Thank you for your help, it fixed my problem. However, it is affecting the overall accuracy of my results: the other columns have 4 decimal places, and with this trick the value is always rounded to one decimal place. – Ashish Kurian Nov 11 '17 at 07:52
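
On the precision concern in the last comment: the rounding is probably only a display effect, since float("7.70") is the same number as 7.7, and fields with a single decimal point are passed through unchanged. The following is a minimal sketch to check this; the helper name truncate_extra_dots and the four-decimal sample values are made up for illustration, and it uses the `[:2]` slice suggested in the first comment.

```python
def truncate_extra_dots(field):
    """Keep the text before the second '.', if there is one (hypothetical helper)."""
    return '.'.join(field.split('.')[:2])

# Illustrative data only: the four-decimal columns are invented for this sketch.
fields = "7654 16.3174 8.6512 7.70.582 17.4871".split()
values = [float(truncate_extra_dots(f)) for f in fields]

print(values)                         # well-formed fields keep all their decimals
print(["%.2f" % v for v in values])   # fixed two-decimal display, if that is wanted
```

If a fixed number of decimals is needed in the output file, formatting at write time (as in the last line) is one option; the stored float itself does not lose the digits of the well-formed fields.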
0

I'm not a huge fan of this approach, but the simplest might be something like:

strs = [
    "7",
    "7.70",
    "7.70.582",
    "7.70.582.123"
]

def parse(s):
    s += ".."
    return float(s[:s.index(".", s.index(".")+1)])

for s in strs:
    print(s, parse(s))

A more legible approach might be something like:

def parse(s):
    if s.count('.') <= 1: return float(s)
    return float(s[:s.index(".", s.index(".")+1)])

Or, based off Ajax1234's answer:

def parse(s):
    return float('.'.join(s.split('.')[:2]))

All versions output:

7               7.0
7.70            7.7
7.70.582        7.7
7.70.582.123    7.7
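
To connect this to the original question, here is a rough sketch of applying the last parse() variant to whole lines, with an optional mode that skips malformed lines instead of repairing them. The function parse_lines, its skip_malformed flag, and the second sample line are assumptions made for illustration, not part of the original post.

```python
def parse(s):
    # Same as the split/join version above: keep at most one '.'
    return float('.'.join(s.split('.')[:2]))

def parse_lines(lines, skip_malformed=False):
    """Convert each whitespace-separated field to float (hypothetical helper).

    With skip_malformed=True, a line containing any field that float()
    rejects is dropped; otherwise fields are repaired with parse().
    """
    rows = []
    for line in lines:
        fields = line.split()
        if skip_malformed:
            try:
                rows.append([float(f) for f in fields])
            except ValueError:
                continue  # skip the whole line, as asked in the question
        else:
            rows.append([parse(f) for f in fields])
    return rows

sample = [
    "7654 16.317 8.651 7.70.582 17.487",
    "7655 16.320 8.655 7.71 17.490",   # made-up well-formed line
]
print(parse_lines(sample))
print(parse_lines(sample, skip_malformed=True))
```

Repairing keeps all 8000 or so rows, while skipping discards every row that has a bad field; which is preferable depends on how much the truncated value can be trusted.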
jedwards
0

You can use a regular expression, like this one:

https://pythex.org/?regex=%5E(%5B0-9%5D%2B%5C.%5B0-9%5D%2B).*&test_string=7.70.582&ignorecase=0&multiline=0&dotall=0&verbose=0

If your line is like '7.70.582' this regex will extract the 7.70 into the first group:

^([0-9]+\.[0-9]+).*

https://docs.python.org/2/library/re.html

import re
line = "7654 16.317 8.651 7.70.582 17.487"
val = line.split(" ")[3]
m = re.search(r'^([0-9]+\.[0-9]+).*', val)
m.group(1)

'7.70'

float(m.group(1))

7.7
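
Note that the pattern above requires a decimal point, so a field such as "7654" would not match and m would be None. A possible variation, sketched here under the assumption that fields are non-negative and never use exponent notation, makes the fractional part optional; the names NUMBER_PREFIX and leading_number are hypothetical.

```python
import re

# Digits, optionally followed by one '.' and more digits; anything after is ignored.
NUMBER_PREFIX = re.compile(r'^[0-9]+(?:\.[0-9]+)?')

def leading_number(field):
    """Return the leading number of a field as a float, or None if there is none."""
    m = NUMBER_PREFIX.match(field)
    return float(m.group(0)) if m else None

line = "7654 16.317 8.651 7.70.582 17.487"
print([leading_number(f) for f in line.split()])
```

Returning None for fields without a leading number lets the caller decide whether to skip the line or raise an error.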

mikeb