stripping the correct float value out of my string

Question

I am using Python to process pcap files and write the processed values to a text file. The text file has around 8000 rows, and sometimes it contains strings such as 7.70.582. In my further processing of the text file, I split the file into lines and extract each of the float values in every line. Then I get this error:

ValueError: invalid literal for float(): 7.70.582

In such cases I am only interested in 7.70, and I need to drop the second decimal point and everything after it. Is there any trick to extract only the part of the string before the second decimal point?

I searched for an answer, and it seems nobody has asked about this situation before.

Alternatively, is there a way to skip the lines where this kind of error occurs?
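For the "skip those lines" option, a minimal sketch is to wrap the conversion in try/except ValueError and move on. It assumes whitespace-separated values like the example row in the comments; the helper name `parse_rows`, the second sample row, and the file name `data.txt` are hypothetical.

```python
def parse_rows(lines):
    """Convert whitespace-separated lines into lists of floats.

    Blank lines, and lines containing any token that float() rejects
    (such as "7.70.582"), are skipped instead of raising ValueError.
    """
    rows = []
    for line in lines:
        if not line.strip():
            continue  # nothing to parse on a blank line
        try:
            rows.append([float(tok) for tok in line.split()])
        except ValueError:
            continue  # malformed value somewhere: drop the whole line
    return rows


# Illustrative input only; the second row is made up.
sample = [
    "7654 16.317 8.651 7.70.582 17.487",
    "7655 16.320 8.652 7.71 17.490",
]
print(parse_rows(sample))  # [[7655.0, 16.32, 8.652, 7.71, 17.49]]

# Reading a real file (the name "data.txt" is hypothetical):
# with open("data.txt") as f:
#     rows = parse_rows(f)
```

Dropping whole lines keeps the columns aligned, at the cost of losing the other, valid values on those lines.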

Some kind of findall to find the dots and then slice off the extra, or a regex pattern for any number of digits, an optional dot, and optionally more digits – jonatan, Nov 10 '17 at 20:30
This is an example row from my text file: 7654 16.317 8.651 7.70.582 17.487 – Ashish Kurian, Nov 10 '17 at 20:34
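A minimal sketch of the regex idea from the first comment, assuming every value is a non-negative decimal number (no sign or exponent). `re.match` anchors at the start of each token, so the pattern keeps the leading number and ignores a trailing ".582". The helper name `leading_float` is hypothetical.

```python
import re

# Leading number: one or more digits, then optionally a dot and more digits.
# Anything after that (such as a second ".582") is ignored.
LEADING_NUMBER = re.compile(r"\d+(?:\.\d*)?")


def leading_float(token):
    """Return the float at the start of token, or None if there is none."""
    m = LEADING_NUMBER.match(token)
    return float(m.group()) if m else None


row = "7654 16.317 8.651 7.70.582 17.487"
print([leading_float(tok) for tok in row.split()])
# [7654.0, 16.317, 8.651, 7.7, 17.487]
```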

Ajax1234 · Answer 1 · 2017-11-10T20:51:27.747

You can use str.split() and '.'.join:

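s = "7654 16.317 8.651 7.70.582 17.487"
# Drop the last dot-separated piece of tokens with two or more dots; list() is needed in Python 3.
final_data = list(map(float, ['.'.join(i.split('.')[:-1]) if len(i.split('.')) > 2 else i for i in s.split()]))
print(final_data)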

Output:

[7654.0, 16.317, 8.651, 7.7, 17.487]

For a single string:

s = ["7.70.582"]
final_data = list(map(float, ['.'.join(i.split('.')[:-1]) if len(i.split('.')) > 2 else i for i in s]))
print(final_data)

Output:

[7.7]
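One caveat with `[:-1]`: it drops only the last dot-separated piece, so a value with three or more dots (the string "7.70.582.123" here is a hypothetical example) would still be rejected by float(). Slicing with `[:2]` keeps only the first two pieces. A small comparison sketch:

```python
s = "7.70.582.123"
parts = s.split('.')          # ['7', '70', '582', '123']

print('.'.join(parts[:-1]))   # 7.70.582  (float() would still reject this)
print('.'.join(parts[:2]))    # 7.70

# [:2] also works for values with zero or one dot, so no length check is needed.
values = "7654 16.317 8.651 7.70.582 17.487".split()
result = [float('.'.join(v.split('.')[:2])) for v in values]
print(result)                 # [7654.0, 16.317, 8.651, 7.7, 17.487]
```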

edited Nov 10 '17 at 20:51

answered Nov 10 '17 at 20:34

Ajax1234

I like this better than my approach, but I'd suggest that indexing with `[:2]` might be better. – jedwards Nov 10 '17 at 20:38
Hi @Ajax1234, I will try your method first and see whether it fixes my issue. I am extracting the single string using (x.split()[3]). What would your solution look like for just this string rather than the whole line? I need everything up to the two digits after the first decimal point. – Ashish Kurian Nov 10 '17 at 20:48
@Ajax1234: Thank you for your help; it fixed my problem. However, it is affecting the overall accuracy of my results, because the other columns have 4 decimal places, and with this trick the value always gets rounded to one decimal place. – Ashish Kurian Nov 11 '17 at 07:52
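On the rounding concern in the last comment: float("7.70") is the same number as 7.7. Python just does not print trailing zeros, so the split removes the ".582" part but does not round the digits that remain. If the output file should show a fixed number of decimals to match the other columns, one option is to format the value when writing it. A sketch, assuming 4 decimals is the desired width:

```python
x = float("7.70")
print(x)                    # 7.7  (same value as 7.70; trailing zero not shown)
print(x == 7.70)            # True

# Format with a fixed number of decimals when writing the value out.
print(format(x, ".4f"))     # 7.7000
print("{:.2f}".format(x))   # 7.70
```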

jedwards · Answer 2 · 2017-11-10T20:40:36.080

I'm not a huge fan of this approach, but the simplest might be something like:

strs = [
    "7",
    "7.70",
    "7.70.582",
    "7.70.582.123"
]

def parse(s):
    # Pad with two dots so that index() always finds a second "." to cut at.
    s += ".."
    return float(s[:s.index(".", s.index(".")+1)])

for s in strs:
    print(s, parse(s))

A more legible approach might be something like:

def parse(s):
    if s.count('.') <= 1: return float(s)
    return float(s[:s.index(".", s.index(".")+1)])

Or, based on Ajax1234's answer:

def parse(s):
    return float('.'.join(s.split('.')[:2]))

All versions output:

7               7.0
7.70            7.7
7.70.582        7.7
7.70.582.123    7.7

mikeb · Answer 3 · 2017-11-10T20:40:42.363

You can use a regular expression, like this one:

https://pythex.org/?regex=%5E(%5B0-9%5D%2B%5C.%5B0-9%5D%2B).*&test_string=7.70.582&ignorecase=0&multiline=0&dotall=0&verbose=0

If your value looks like '7.70.582', this regex captures 7.70 in the first group:

^([0-9]+\.[0-9]+).*

https://docs.python.org/2/library/re.html

import re
line = "7654 16.317 8.651 7.70.582 17.487"
val = line.split(" ")[3]
m = re.search(r'^([0-9]+\.[0-9]+).*', val)
m.group(1)

'7.70'

float(m.group(1))

7.7
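The snippet above assumes the bad value is always in the fourth column and that the pattern matches; when it does not, re.search returns None and m.group(1) raises AttributeError. A sketch applying the same pattern to every token of the row, assuming that tokens without a decimal point (such as 7654) should simply be converted as-is:

```python
import re

# Same pattern as above: digits, a literal dot, digits; the rest is ignored.
pattern = re.compile(r'^([0-9]+\.[0-9]+).*')

line = "7654 16.317 8.651 7.70.582 17.487"
values = []
for tok in line.split():
    m = pattern.search(tok)
    # Tokens without a decimal point (e.g. "7654") don't match, so fall
    # back to converting the whole token.
    values.append(float(m.group(1)) if m else float(tok))

print(values)  # [7654.0, 16.317, 8.651, 7.7, 17.487]
```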
