How to read fixed-width data?

Question

data looks like

212253820000025000.00000002500.00000000375.00111120211105202117
212456960000000750.00000000075.00000000011.25111120211102202117
212387470000010000.00000001000.00000000150.00111120211105202117

need to add separator like

21225382,0000025000.00,000002500.00,000000375.00,11112021,11052021,17
21245696,0000000750.00,000000075.00,000000011.25,11112021,11022021,17
21238747,0000010000.00,000001000.00,000000150.00,11112021,11052021,17

The CSV file length is high nearly 20000 rows are there is there any possibility to do

first 8 numbers are a first column,after that the 12 numbers are second column, after that the 11 numbers are third column, after that the 11 numbers are fourth column, after that the 8 numbers are fifth column,after that the 8 numbers are sixth column, every column need a separator — Suri.B, Apr 19 '22 at 07:27
I don't know any ready to use tools for doing this... think you need a script to read your csv, place the comma and export it ... which technology your working with? — toffler, Apr 19 '22 at 08:50
If you are working with SQL, is data stored in a server or file-based such as SQLite ? I am wondering if you could do it directly in a server-stored function with `SUBSTRING(x,y)` or similar, concatenated. Then just call that function for each row, server-side ? — MyICQ, Apr 19 '22 at 11:09
How are you exporting your CSV currently? Please [edit] your question to **show your code**, ideally a [mcve] that fully demonstrates how the problem arose. See [ask]. — dbc, Apr 19 '22 at 13:04
Also, please correct your tags to indicate the technologies you are actually using. You have tagged it [tag:qgraphicsview] which is a *a class from the Qt toolkit which provides a widget for displaying the contents of a QGraphicsScene*, [tag:csvhelper] which is a *.NET library that allows for easy reading and writing of comma-separated values (CSV) files* and [tag:read.csv] which is a *command in R to read a data file with a “comma separated values” (csv) format* These have nothing to do with each other, so please re-tag correctly. See https://stackoverflow.com/help/tagging. — dbc, Apr 19 '22 at 13:05

score 1 · Answer 1 · answered Apr 19 '22 at 18:14

This question is generally about reading "fixed width data".

If you're stuck with this data, you'll need to parse it line by line then column by column. I'll show you how to do this with Python.

First off, the columns you counted off in the comment do not match your sample output. You seemed to have omitted the last column with a count of 2 characters.

You'll need accurate column widths to perform the task. I took your sample data and counted the columns for you and got these numbers:

8, 13, 12, 12, 8, 8, 2

So, we'll read the input data line by line, and for every line we'll:

Read 8 chars and save it as a column, then 13 chars and save it as a column, then 12 chars, etc... till we've read all the specified column widths
As we move through the line we'll keep track of our position with the variables beg and end to denote where a column begins (inclusive) and where it ends (exclusive)
The end of the first column becomes the beginning of the next, and so on down the line
We'll store those columns in a list (array) that is the new row
At the end of the line we'll save the new row to a list of all the rows
Then, we'll repeat the process for the next line

Here's how this looks in Python:

import pprint

Col_widths = [8, 13, 12, 12, 8, 8, 2]

all_rows = []
with open("data.txt") as in_file:
    for line in in_file:
        row = []
        beg = 0
        for width in Col_widths:
            end = beg + width
            col = line[beg:end]
            row.append(col)
            beg = end
        all_rows.append(row)

pprint.pprint(all_rows, width=100)

all_rows is just a list of lists of text:

[['21225382', '0000025000.00', '000002500.00', '000000375.00', '11112021', '11052021', '17'],
 ['21245696', '0000000750.00', '000000075.00', '000000011.25', '11112021', '11022021', '17'],
 ['21238747', '0000010000.00', '000001000.00', '000000150.00', '11112021', '11052021', '17']]

With this approach, if you miscounted the column width or the number of columns you can easily modify the Column_widths to match your data.

From here we'll use Python's CSV module to make sure the CSV file is written correctly:

import csv

with open("data.csv", "w", newline="") as out_file:
    writer = csv.writer(out_file)
    writer.writerows(all_rows)

and my data.csv file looks like:

21225382,0000025000.00,000002500.00,000000375.00,11112021,11052021,17
21245696,0000000750.00,000000075.00,000000011.25,11112021,11022021,17
21238747,0000010000.00,000001000.00,000000150.00,11112021,11052021,17

score 0 · Answer 2 · answered Apr 19 '22 at 18:41

If you have access to the command-line tool awk, you can fix your data like the following:

substr() gives a portion of the string $0, which is the entire line
you start at char 1 then specify the width of your first column, 8
for the next substr(), you again use $0, you start at 9 (1+8 from the last substr), and give it the second column's width, 13
and repeat for each column, starting at "the start of the last column plus the last column's width"

#!/bin/sh

# Col_widths = [8, 13, 12, 12, 8, 8, 2]

awk '{print substr($0,1,8) "," substr($0,9,13) "," substr($0,22,12) "," substr($0,34,12) "," substr($0,46,8) "," substr($0,54,8) "," substr($0,62,2)}' data.txt > data.csv

How to read fixed-width data?

2 Answers2