1

I have a data which looks like

5,960464,6,65430376927058E-10,-91,7689743041992,-89,5235061645508,6,71407200919707E-05,0,00869479635485696
11,92093,5,23110578457515E-10,-92,8140640258789,-90,5685958862305,8,97218165598724E-05,0,011619072933792
17,88139,4,52214777469635E-10,-93,4465560913086,-91,2010803222656,0,000104674258706842,0,0135554304720727
23,84186,4,29985026518504E-10,-93,6654663085938,-91,4199981689453,0,00011655840052398,0,0150944397768688
29,80232,5,20411183436712E-10,-92,8365325927734,-90,591064453125,0,000128133766655091,0,0165934622941289

This data has actually 6 columns. Comma used to separate the data as well as used for floating numbers.

I need to convert above data to the following for further processing.

x,           y1,                    y2,                y3,               y4,               y5
5.960464,6.65430376927058E-10,-91.7689743041992,-89.5235061645508,6.71407200919707E-05,0.00869479635485696
11.92093,5.23110578457515E-10,-92.8140640258789,-90.5685958862305,8.97218165598724E-05,0.011619072933792
17.88139,4.52214777469635E-10,-93.4465560913086,-91.2010803222656,0.000104674258706842,0.0135554304720727
23.84186,4.29985026518504E-10,-93.6654663085938,-91.4199981689453,0.00011655840052398,0.0150944397768688
29.80232,5.20411183436712E-10,-92.8365325927734,-90.591064453125,0.000128133766655091,0.0165934622941289

I've tried to read the file in python and read data from each column which separated by comma. This case is not working, at some rows has more columns.

Can anyone helps me to find a solution ?

The code works, Thank you. But there is an error occurred on lines 432 and other lines of output.

16433.7,08114511022965E-12.-111,498962402344.-109,253494262695.0,000461181910941812.016438.96,5.80785796046257E-12,-112.359840393066,-110.114372253418,0.000461223557779535,0.0597289529135575

16886.6,00392270522813E-12.-112,21565246582.-109,970184326172.0,000463834922107529.016891.96,4.66341618448496E-12,-113.312957763672,-111.067489624023,0.00046386919345512,0.0600715656141191


17488.3,70004369566838E-12.-114,317932128906.-112,072463989258.0,00046732773455058.017493.96,5.22050812530021E-12,-112.822875976563,-110.57740020752,0.000467356179916467,0.0605231341575011

The data before and after the error lines are perfectly ok.

17476.08,6.52454250181715E-12,-111.854499816895,-109.609031677246,0.00046727035202924,0.0605120193526557
17482.04,3.88578861020505E-12,-114.105209350586,-111.859741210938,0.000467303551829094,0.0605163187628847


17499.92,5.28003710011641E-12,-112.773628234863,-110.528160095215,0.000467389661260297,0.0605274700279817
17505.88,4.37629603159924E-12,-113.588935852051,-111.343467712402,0.000467420448688495,0.0605314570334547

Please help me to correct it! Thanks in advance

Sunny
  • 13
  • 3
  • Your data is messed up. There are some cases where there's no way of knowing what comma is a decimal separator and which comma is a number separator – Sembei Norimaki Mar 21 '23 at 09:21
  • Yes, this was an output from a device. This is the only way to save the data. Is there an option to change alternative comma to decimal? This helps in my case – Sunny Mar 21 '23 at 09:24
  • Can you be sure every number will have decimal part? Can we assume there are no integers? – Jorge Luis Mar 21 '23 at 09:28

1 Answers1

3

Since it looks like you only have real numbers, you can replace every other comma:

with (open('data.txt', 'r') as fr,
      open('out.csv', 'w') as fw):
    fw.write('x,y1,y2,y3,y4,y5\n')
    for row in fr:
        l = row.split(',')
        l = ['.'.join(i) for i in zip(l[::2], l[1::2])]
        fw.write(','.join(l))

out.csv:

x,y1,y2,y3,y4,y5
5.960464,6.65430376927058E-10,-91.7689743041992,-89.5235061645508,6.71407200919707E-05,0.00869479635485696
11.92093,5.23110578457515E-10,-92.8140640258789,-90.5685958862305,8.97218165598724E-05,0.011619072933792
17.88139,4.52214777469635E-10,-93.4465560913086,-91.2010803222656,0.000104674258706842,0.0135554304720727
23.84186,4.29985026518504E-10,-93.6654663085938,-91.4199981689453,0.00011655840052398,0.0150944397768688
29.80232,5.20411183436712E-10,-92.8365325927734,-90.591064453125,0.000128133766655091,0.0165934622941289

Read it with Pandas and check:

import pandas as pd

df = pd.read_csv('out.csv')
>>> df
           x            y1         y2         y3        y4        y5
0   5.960464  6.654304e-10 -91.768974 -89.523506  0.000067  0.008695
1  11.920930  5.231106e-10 -92.814064 -90.568596  0.000090  0.011619
2  17.881390  4.522148e-10 -93.446556 -91.201080  0.000105  0.013555
3  23.841860  4.299850e-10 -93.665466 -91.419998  0.000117  0.015094
4  29.802320  5.204112e-10 -92.836533 -90.591064  0.000128  0.016593

>>> df.dtypes
x     float64
y1    float64
y2    float64
y3    float64
y4    float64
y5    float64
dtype: object
Corralien
  • 109,409
  • 8
  • 28
  • 52
  • Thank you @Corralien . The code works. The file has more than 10k 'x' values. At line 432, an error occurred. This error again rises at some other values as well. Do you know why? Thanks in advance! – Sunny Mar 21 '23 at 10:20
  • Please update your post with line 432 (and another to verify similarity). Maybe a `try/except` can solve the problem if all the problematic lines have the same format. – Corralien Mar 21 '23 at 10:31
  • This code works as long as every number has decimal positions. If you find an integer value, you will substitute the comma you don't want to substitute. – Jorge Luis Mar 21 '23 at 10:53
  • @Sunny. What is the error? Does Python raise an exception? – Corralien Mar 21 '23 at 10:59
  • How `16433.7,08114511022965E-12` should be interpreted? – Corralien Mar 21 '23 at 11:06
  • The error is happening at the points where data is integer. The line should be interpreted as ``` 16433, 7.08114511022965E-12, -111.498962402344,-109.253494262695,0.000461181910941812,0.055 ``` and the next line is ``` 16438.96,5.80785796046257E-12,-112.359840393066,-110.114372253418,0.000461223557779535,0.0597289529135575 ``` – Sunny Mar 22 '23 at 10:09
  • I think there is no (simple) solution for this problem :-(. You said "there is an error": Is python raised an exception? Do you see ",000461181910941812.016438.96,"? – Corralien Mar 22 '23 at 13:11