1

I have a dataframe (size: 71363 x 7) of float values I want to extract to csv:

df.to_csv('title', sep ='\t', header=None, index=False)

However, as some cells have more or less decimals than others, the .csv file looks something like:

GRID        111163.0    100000.0    7.931   -0.061  1.798   100010.0
GRID        111164.0    100000.0    7.936   -0.057  1.8 100010.0
GRID        111165.0    100000.0    7.893   -0.025  1.781   100010.0
GRID        119347.0    100000.0    7.692   0.631   1.703   100010.0
GRID        119348.0    100000.0    7.686   0.635   1.7 100010.0
GRID        119385.0    100000.0    7.68    0.651   1.698   100010.0

Instead of something clean like :

GRID        111163.0    100000.0    7.931   -0.061  1.798   100010.0
GRID        111164.0    100000.0    7.936   -0.057  1.8     100010.0
GRID        111165.0    100000.0    7.893   -0.025  1.781   100010.0
GRID        119347.0    100000.0    7.692   0.631   1.703   100010.0
GRID        119348.0    100000.0    7.686   0.635   1.7     100010.0
GRID        119385.0    100000.0    7.68    0.651   1.698   100010.0

I tried to force a "round" to my columns :

grid_rep_coq[[1,2,3,4,5,6]] = round(grid_rep_coq[[1,2,3,4,5,6]], 3)

But any value of less than 3 decimals doesn't change and I therefore have the same issue.

I couldn't find any documentation on forcing a tab separator.

Would anyone know how to do this?

Thank you in advance!

elle.delle
  • 328
  • 3
  • 15
  • 1
    the thing you're looking for is a fixed width file, it looks like this may not be possible out the box but here is a solution https://stackoverflow.com/a/63905022/5125264 – Matt Jun 30 '21 at 14:22

1 Answers1

1

The data can be read in using a standard CSV reader with the delimiter set as a space and skipinitialspace enabled.

If you want to produce a better structured output file, you can use a little helper function to calculate the maximum width of each column and then apply that to all items of data:

import csv

def write_cols(data):
    col_spacer = "  "       # added between columns
    widths = [max(len(str(item)) for item in row) for row in zip(*data)]
    return '\n'.join(col_spacer.join(f"{col:{widths[index]}}" for index, col in enumerate(row)) for row in data)

with open('input.txt', newline='') as f_input:
    data = list(csv.reader(f_input, delimiter=' ', skipinitialspace=True))

with open('output.txt', 'w') as f_output:
    f_output.write(write_cols(data))

This would give you an output.txt file as:

GRID  111163.0  100000.0  7.931  -0.061  1.798  100010.0
GRID  111164.0  100000.0  7.936  -0.057  1.8    100010.0
GRID  111165.0  100000.0  7.893  -0.025  1.781  100010.0
GRID  119347.0  100000.0  7.692  0.631   1.703  100010.0
GRID  119348.0  100000.0  7.686  0.635   1.7    100010.0
GRID  119385.0  100000.0  7.68   0.651   1.698  100010.0

Pandas can also load this in a similar way:

import pandas as pd

df = pd.read_csv('input.txt', skipinitialspace=True, delimiter=' ', names=['GRID', 'A', 'B', 'C', 'D'], index_col=False)
print(df)

Giving:

   GRID         A         B      C      D
0  GRID  111163.0  100000.0  7.931 -0.061
1  GRID  111164.0  100000.0  7.936 -0.057
2  GRID  111165.0  100000.0  7.893 -0.025
3  GRID  119347.0  100000.0  7.692  0.631
4  GRID  119348.0  100000.0  7.686  0.635
5  GRID  119385.0  100000.0  7.680  0.651
Martin Evans
  • 45,791
  • 17
  • 81
  • 97
  • Thanks Martin :) Sadly I can't use the skipinitialspace, to begin with, as the dataframe is extracted from a complex input file full of different tables of different sizes but it's always a good tool to know for the future! – elle.delle Jul 01 '21 at 13:02