
I have a data file generated using C++ std::setw, e.g.:

file << std::scientific << std::setprecision(data_precision);

for (double data : a_data)
{
    file << std::setw(data_width) << data;
}

file << "\n";

Is it possible to read the data using Python's csv.reader or similar? I have tried the following:

with data as csvfile:
    fieldreader = csv.reader(csvfile)
    next(fieldreader)
    for row in fieldreader:
        values.append(float(row[0]))

This fails with an error that shows the entire first row, indicating that the whole row is stored as one entry. I have also tried a few different delimiters, e.g. \t, which didn't help.

Example output below:

#          z        phi               phi1          Massless 
 -16.0000000  0.0000000   9.9901854997e-01  1.0910677716e-19
 -16.0000000  0.0245437   9.9871759471e-01  1.6545142956e-05
 -16.0000000  0.0490874   9.9781493216e-01  3.3051500271e-05
 -16.0000000  0.0736311   9.9631097893e-01  4.9477653557e-05
 -16.0000000  0.0981748   9.9420658732e-01  6.5784269579e-05 
martineau
A. Drew
  • Can you give an example output of your C++ code? – Nullman Jan 19 '20 at 15:10
  • Added to question – A. Drew Jan 19 '20 at 15:17
  • Did you try a simple space as the delimiter? – Nullman Jan 19 '20 at 15:17
  • Yes, I get `ValueError: could not convert string to float: ` with blank afterwards, where previously I had `ValueError: could not convert string to float: ' -16.0000000 0.0000000 9.9901854997e-01 1.0910677716e-19'` – A. Drew Jan 19 '20 at 15:21
  • The data output I posted is misleading, will change – A. Drew Jan 19 '20 at 15:23
  • It's more like that with different length gaps between the columns – A. Drew Jan 19 '20 at 15:24
  • Your output is not in CSV format. CSV stands for "comma-separated values", not "a bunch of spaces-separated values". – Igor Tandetnik Jan 19 '20 at 15:28
  • Yes, but you can use csv.reader to define different delimiters – A. Drew Jan 19 '20 at 15:30
  • Well, have you defined space as a delimiter in your Python code? Though I suspect that won't help much - you are outputting three spaces in a row, which would be interpreted as three columns with empty-string values. – Igor Tandetnik Jan 19 '20 at 15:33
  • @A.Drew: Yeah, I realized the `skipinitialspace` wasn't going to work after posting the (now deleted) comment. It would only work if the fields themselves were delimited by something other than spaces. – martineau Jan 19 '20 at 15:39
  • Come use pandas :) https://pandas.pydata.org/pandas-docs/version/1.0.0/reference/api/pandas.read_csv.html#pandas.read_csv – KaiserKatze Jan 19 '20 at 16:00

2 Answers


The csvfile argument to the csv.reader initializer "can be any object which supports the iterator protocol and returns a string each time its __next__() method is called".

This means you could read the file by defining a generator function like the one shown below to preprocess the lines of the file to make them acceptable to the csv.reader:

import csv

def preprocess(file):
    for line in file:
        yield ','.join(line.split())

values = []
with open('cppfile.txt') as file:
    fieldreader = csv.reader(preprocess(file))
    next(fieldreader)
    for row in fieldreader:
        print(f'row={row}')
        values.append(float(row[0]))

print()
print(values)

Output:

row=['-16.0000000', '0.0000000', '9.9901854997e-01', '1.0910677716e-19']
row=['-16.0000000', '0.0245437', '9.9871759471e-01', '1.6545142956e-05']
row=['-16.0000000', '0.0490874', '9.9781493216e-01', '3.3051500271e-05']
row=['-16.0000000', '0.0736311', '9.9631097893e-01', '4.9477653557e-05']
row=['-16.0000000', '0.0981748', '9.9420658732e-01', '6.5784269579e-05']

[-16.0, -16.0, -16.0, -16.0, -16.0]
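
As a rough illustration of why the preprocessing step matters, the sketch below contrasts a plain single-space delimiter with the split-and-join approach. It is only a sketch under stated assumptions: the in-memory `SAMPLE` text is a trimmed copy of the question's data (two columns, two rows) standing in for the real file, and the names `naive_rows` and `table` are made up for this example.

```python
import csv
import io

# Trimmed copy of the question's sample (assumption: stands in for the real file).
SAMPLE = (
    "#          z        phi\n"
    " -16.0000000  0.0245437\n"
    " -16.0000000  0.0490874\n"
)

def preprocess(lines):
    """Collapse each run of whitespace in a line into a single comma."""
    for line in lines:
        yield ','.join(line.split())

# With a single space as the delimiter, every extra space becomes an empty field.
naive_rows = list(csv.reader(io.StringIO(SAMPLE), delimiter=' '))
print(naive_rows[1])

# Preprocessing first yields exactly one field per column.
reader = csv.reader(preprocess(io.StringIO(SAMPLE)))
next(reader)  # skip the '#' header line
table = [[float(field) for field in row] for row in reader]
print(table)
```

The empty strings in `naive_rows` are what trigger the `could not convert string to float` error mentioned in the comments, while `table` holds every column as floats rather than only the first one.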
martineau
  • This worked well, although `print(f'{row=}')` didn't work for me - thanks! – A. Drew Jan 20 '20 at 12:44
  • Sorry, using that format of an `f-string` wasn't [added](https://docs.python.org/3.8/whatsnew/3.8.html#f-strings-support-for-self-documenting-expressions-and-debugging) until Python 3.8.0 — should have known better than to use it in an answer. Fixed. – martineau Jan 20 '20 at 14:46

I would use pandas, a third-party library that provides high-performance, easy-to-use data structures and data analysis tools, to parse the generated file:

example.txt

#          z        phi               phi1          Massless 
 -16.0000000  0.0000000   9.9901854997e-01  1.0910677716e-19
 -16.0000000  0.0245437   9.9871759471e-01  1.6545142956e-05
 -16.0000000  0.0490874   9.9781493216e-01  3.3051500271e-05
 -16.0000000  0.0736311   9.9631097893e-01  4.9477653557e-05
 -16.0000000  0.0981748   9.9420658732e-01  6.5784269579e-05 

test.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import pandas as pd

if __name__ == "__main__":
    # sep=r'\s+' splits on any run of whitespace; skiprows=1 drops the '#' header line
    df = pd.read_csv("example.txt", sep=r'\s+', skiprows=1, names=["z", "phi", "phi1", "Massless"])
    print(df)

After running the following command:

python test.py

I got the following result:

      z       phi      phi1      Massless
0 -16.0  0.000000  0.999019  1.091068e-19
1 -16.0  0.024544  0.998718  1.654514e-05
2 -16.0  0.049087  0.997815  3.305150e-05
3 -16.0  0.073631  0.996311  4.947765e-05
4 -16.0  0.098175  0.994207  6.578427e-05
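
To get a plain list of floats like the `values` list in the question, a single column can be pulled out of the DataFrame. The sketch below assumes pandas is installed and uses an in-memory copy of the first two data rows (the `SAMPLE` string and the `z_values` name are made up for this example) instead of a file on disk.

```python
import io

import pandas as pd

# First two data rows of the question's sample (assumption: stands in for the file).
SAMPLE = (
    "#          z        phi               phi1          Massless\n"
    " -16.0000000  0.0000000   9.9901854997e-01  1.0910677716e-19\n"
    " -16.0000000  0.0245437   9.9871759471e-01  1.6545142956e-05\n"
)

# Same read_csv arguments as above, applied to a file-like object.
df = pd.read_csv(
    io.StringIO(SAMPLE),
    sep=r'\s+',
    skiprows=1,
    names=["z", "phi", "phi1", "Massless"],
)

# Extract one column as a plain Python list of floats.
z_values = df["z"].tolist()
print(z_values)
```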
KaiserKatze