0

Let's say I have CSV data as note below; let's call this original.csv:

name,value1,value2
firstname,34326408129478932874,553
secondname_a_very_long_one,65,123987
thirdname_medium,9686509933423,33

Basically, it's either single word text (no space separation, so no need for quoting) or numbers (here integers, but could be floats with decimals or scientific 1e-5 notation) - and there is no expectation that a comma could appear somewhere (outside of being a delimiter), so no need for special handling of the quoting of comma either ...

So, to ease the strain on my eyes when I view this .csv file in a text editor, I would like to format it with fixed width - space padded (left or right padding choice made per column, and separately for header row); note that the file is still comma delimited as a data format, the fixed width is just for viewing in editor - this is how I would like it to look like, let's call this tmpfw.csv:

name                      , value1              , value2
firstname                 , 34326408129478932874, 553
secondname_a_very_long_one,                   65, 123987
thirdname_medium          ,        9686509933423, 33

So here, the heading row is all left-aligned (right-padded with spaces); columns name and value2 are also left-aligned (right-padded with spaces); and column value1 is right-aligned (right-padded with spaces). The columns are sized (in characters) according to the largest string length of the data in that column; and there is an extra space as visual delimiter after the commas.

Of course, if I want to use this data in Python properly, I'd have to "strip" it first - but I don't mind, since as I mentioned, the data is such that I don't have to worry about quoting issues; here is a Python example of how I could use tmpfw.csv - let's call it test.py:

import sys
import csv
import pprint

with open('tmpfw.csv', newline='') as csvfile:
  my_csv = csv.reader(csvfile)
  my_csv_list = list(my_csv)

my_csv_list_stripped = [list(map(str.strip, irow)) for irow in my_csv_list]

print("\nmy_csv_list:\n")
pprint.pprint( my_csv_list )
print("\nmy_csv_list_stripped:\n")
pprint.pprint( my_csv_list_stripped )

#print("\nreprint stripped as csv:\n")
#csvwriter = csv.writer(sys.stdout) # just print out to terminal
#csvwriter.writerows(my_csv_list_stripped)

This is what I get printed:

$ python3 test.py

my_csv_list:

[['name                      ', ' value1              ', ' value2'],
 ['firstname                 ', ' 34326408129478932874', ' 553'],
 ['secondname_a_very_long_one', '                   65', ' 123987'],
 ['thirdname_medium          ', '        9686509933423', ' 33']]

my_csv_list_stripped:

[['name', 'value1', 'value2'],
 ['firstname', '34326408129478932874', '553'],
 ['secondname_a_very_long_one', '65', '123987'],
 ['thirdname_medium', '9686509933423', '33']]

I can use this as a base to convert the numbers to int later - so, I can use such a fixed-width csv fine, all is good ...

So, my question is: let's say I have the original.csv - what would be the easiest way in Python to obtain a "fixed-width formatted" tmpfw.csv? Do csv or pandas or other libraries have facilities for exporting a CSV format like this?

sdbbs
  • 4,270
  • 5
  • 32
  • 87
  • 1
    Do you need to also edit the CSV while you're looking at it? If not, I wouldn't mess with the CSV format itself; instead just make it look really pretty (and different), https://csvkit.readthedocs.io/en/latest/scripts/csvlook.html to make a table. If you do need to edit it by hand, I'd still recommend not changing the structure and get a plugin like https://marketplace.visualstudio.com/items?itemName=mechatroner.rainbow-csv. – Zach Young Mar 22 '23 at 13:31
  • 1
    If you find you must edit the CSV, add the spaces **after** the comma, before the next cell of data. That way you can use the `skipinitialspace=True` option from Python's csv.reader to get it "back to normal"—no special processing necessary. – Zach Young Mar 22 '23 at 13:37
  • Also, somewhat distantly related, but for reference: https://stackoverflow.com/questions/18911984/align-column-names-with-data-in-a-csv-file-using-python – sdbbs Mar 22 '23 at 13:41

1 Answers1

1

Sure – compute what the maximum length of each column is, then .ljust() them accordingly when printing:

import csv
import io

# pretend reading csv from file
csv_data = list(csv.reader(io.StringIO("""
name,value1,value2
firstname,34326408129478932874,553
secondname_a_very_long_one,65,123987
thirdname_medium,9686509933423,33
""".strip())))

n_cols = len(csv_data[0])
col_widths = [max(len(row[i]) for row in csv_data) for i in range(n_cols)]

for row in csv_data:
    print(', '.join(val.ljust(width) for val, width in zip(row, col_widths)))

This prints out

name                      , value1              , value2
firstname                 , 34326408129478932874, 553   
secondname_a_very_long_one, 65                  , 123987
thirdname_medium          , 9686509933423       , 33    

and naturally you could open a file and print(..., file=...) instead.

AKX
  • 152,115
  • 15
  • 115
  • 172
  • Awesome, thanks @AKX - that works great! +1 for the "surprise" - I kind of thought "plain Python" solution would be a lot more complicated/verbose, but your solution is quite readable. – sdbbs Mar 22 '23 at 13:39