Loading text data in Octave with specific format

Question

I have a data set that I would like to store and be able to load in Octave:

18.0   8   307.0      130.0      3504.      12.0   70  1    "chevrolet chevelle malibu"
15.0   8   350.0      165.0      3693.      11.5   70  1    "buick skylark 320"
18.0   8   318.0      150.0      3436.      11.0   70  1    "plymouth satellite"
16.0   8   304.0      150.0      3433.      12.0   70  1    "amc rebel sst"
17.0   8   302.0      140.0      3449.      10.5   70  1    "ford torino"
15.0   8   429.0      198.0      4341.      10.0   70  1    "ford galaxie 500"
14.0   8   454.0      220.0      4354.       9.0   70  1    "chevrolet impala"
14.0   8   440.0      215.0      4312.       8.5   70  1    "plymouth fury iii"
14.0   8   455.0      225.0      4425.      10.0   70  1    "pontiac catalina"
15.0   8   390.0      190.0      3850.       8.5   70  1    "amc ambassador dpl"

It does not work out of the box when I try:

data = load('auto.txt')

Is there a way to load from a text file with the given format, or do I need to convert it to something like this first:

18.0,8,307.0,130.0,3504.0,12.0,70,1
...

EDIT: I deleted the last column (the quoted names), fixed the 'half' numbers (e.g. 3504. -> 3504.0), and then used:

data = load('-ascii','autocleaned.txt');

This loaded the data into a matrix in Octave as wanted.

Since all the data is in fixed width columns (except the last strings), you should be able to read it line by line, using `fscanf` to decode the line. In effect you would be reading it with the same record specifier that you would use to write it (in C or Fortran), or with `fprintf`. — hpaulj, Aug 06 '14 at 02:22
I've figured out how to load it into `Python` using its `csv` and `numpy` modules, and then transfer it to `Octave` via a `.mat` file. The result was a `1x10 struct array` with 9 fields - 8 numeric and 1 string. I could post it as an answer if you want. — hpaulj, Aug 06 '14 at 07:25
@hpaulj Interesting to learn about more low level ways to do this. For the time being cleaning up the input file and using load was enough for me. But there are just cases where you want a bit more control. Here more specialized ways might come in handy. — user317706, Aug 06 '14 at 10:02
Digging further in the Octave docs, I see that it does have `dlmread`, `csvread`, `fileread`, `textread`, `textscan`. All except `dlmread` are interpreted and can be read with `type`. — hpaulj, Aug 10 '14 at 02:16

ShaneQful · Accepted Answer · 2014-08-05T23:10:32.003

load is usually meant for loading Octave and MATLAB binary files, but it can also load textual data like yours. You can load your data with the "-ascii" option, but you would have to reformat the file slightly first: use a consistent column separator (i.e. just a tab or a comma), write full numbers (3850.0, not 3850.), and don't include strings.

Then you can do something like this to get it to work:

DATA = load("-ascii", "auto.txt");
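The clean-up described above (one separator, full numbers, no strings) can be scripted rather than done by hand. The following is only a minimal sketch in Python; the helper names `clean_line` and `clean_file` and the output file name are made up for illustration, and it assumes that every data line has eight numeric fields followed by a double-quoted name.

```python
def clean_line(line, n_numeric=8):
    """Return the numeric fields of one line as a comma-separated row.

    Hypothetical helper: assumes the quoted name is the last field.
    """
    # Drop everything from the first double quote on (the name string).
    numeric_part = line.split('"', 1)[0]
    fields = numeric_part.split()
    if len(fields) < n_numeric:
        return None  # blank or malformed line
    # float() accepts "3504."; str() then writes it back as "3504.0".
    return ",".join(str(float(f)) for f in fields[:n_numeric])


def clean_file(src, dst):
    """Write a cleaned, purely numeric copy of src to dst."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            row = clean_line(line)
            if row is not None:
                fout.write(row + "\n")
```

Calling something like `clean_file('auto.txt', 'autocleaned.txt')` would then produce a file that matches the reformatting advice above. Note that integer columns are also written with a trailing `.0`, which does not matter once the values are in a numeric matrix.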

hpaulj · Answer 2 · 2014-08-11T01:08:07.760

If the final string field is removed from each line, the file can be read with:

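filename = 'stack25148040_1.txt';
fid = fopen(filename, 'r');
if fid < 0
  error('could not open %s', filename);
endif
# fscanf fills the result column by column, so read 8 values
# per column (one file row each) and transpose afterwards
[x, count] = fscanf(fid, '%f', [8, Inf]);
x = x';
fclose(fid);

Alternatively, the whole file could be read in as one column and then reshaped.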

I haven't figured out how to read both the numeric fields and the string field. For that I've had to fall back on Python with more general purpose file reading tools.

Here is a Python script that reads the file, creates a numpy structured array, and writes it to a .mat file that Octave can then read:

import csv
import numpy as np
from scipy.io import savemat

data = []
# newline='' is what the csv module expects for text files in Python 3
with open('stack25148040.txt', newline='') as f:
    # csv handles quoted strings with white space
    r = csv.reader(f, delimiter=' ')
    for l in r:
        # remove empty strings from the split on ' '
        fields = [x for x in l if x]
        if fields:  # skip blank lines, e.g. an empty last line
            data.append(fields)
print(data[0])
for dd in data:
    # convert 8 of the strings (per line) to float, keep the name
    dd[:] = [float(d) for d in dd[:8]] + dd[-1:]
print(data[0])
print()
# make a structured array, with numbers and a string
# (U25 instead of |S25 so Python 3 str values are stored as text)
dt = np.dtype("f8,i4,f8,f8,f8,f8,i4,i4,U25")
A = np.array([tuple(d) for d in data], dtype=dt)
print(A)
savemat('stack25148040.mat', {'A': A})

In Octave this can be read with

load stack25148040.mat
A
# A = 1x10 struct array containing the fields:
#    f0 f1 ... f8

A.f8  # string field
A(1)  # 1st row
#  scalar structure containing the fields:
#   f0 =  18
#   f1 = 8
...
#   f8 = chevrolet chevelle malibu

Newer Octave (3.8) has an importdata function. It handles the original data file without any extra arguments. It returns a structure with 2 fields:

x.data is a (10,11) matrix. x.data(:,1:8) is the desired numerical data. x.data(:,9:11) is a mix of NA and random numbers. The NA values stand in for the words at the end of the lines. x.textdata is a (24,1) cell with those words. The quoted strings could be reassembled from those words, using the NA values and the quotes to determine how many words belong to which line.

To read the numeric data it uses dlmread. Since the rest of importdata is written in Octave, it could be used as the starting point for a custom function that handles the string data properly.

dlmread ('stack25148040.txt')(:,1:8)
importdata ('stack25148040.txt').data(:,1:8)
textread ('stack25148040.txt','')(:,1:8)
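
The reassembly of the quoted names from the separate words, mentioned above for x.textdata, could follow the logic sketched below. It is written in Python to match the script earlier in this answer, and the function name `join_quoted_words` is hypothetical. It assumes the words still carry their double quotes, so that each name starts with a word beginning with `"` and ends with a word ending with `"`. The same loop could be ported to Octave as part of a custom importdata variant.

```python
def join_quoted_words(words):
    """Group a flat list of words into the quoted names they came from.

    Assumption: the last word of every name ends with a double quote.
    """
    names = []
    current = []
    for word in words:
        current.append(word)
        if word.endswith('"'):
            # Closing quote reached: join the collected words and
            # strip the surrounding quotes.
            names.append(" ".join(current).strip('"'))
            current = []
    return names


words = ['"chevrolet', 'chevelle', 'malibu"',
         '"buick', 'skylark', '320"',
         '"ford', 'torino"']
names = join_quoted_words(words)
print(names)
```

This version relies on the quotes alone; the NA entries in x.data could serve as a cross-check on the number of words per line.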

score 2 · Answer 3 · edited Jan 21 '19 at 16:36

https://octave.org/doc/v4.0.0/Simple-File-I_002fO.html

Try this:

data = importdata('Auto.data')

answered Jan 20 '19 at 18:12 by Engineering Locha