I have a CSV file with possibly missing data, and the data is both chars and numbers. What is the best way to deal with this?

Amro
Trup

2 Answers


Here is an example:

file.csv

name,age,gender
aaa,20,m
bbb,25,
ccc,,m
ddd,40,f

readMyCSV.m

fid = fopen('file.csv','rt');
C = textscan(fid, '%s%f%s', 'Delimiter',',', 'HeaderLines',1, 'EmptyValue',NaN);
fclose(fid);
[name,age,gender] = deal(C{:});

The data read:

>> [name num2cell(age) gender]
ans = 
    'aaa'    [ 20]    'm'
    'bbb'    [ 25]    '' 
    'ccc'    [NaN]    'm'
    'ddd'    [ 40]    'f'
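
A minimal sketch of one way to act on the result, assuming the three variables hold exactly the values shown in the output above (they are typed in literally here so the snippet runs on its own). Numeric gaps appear as NaN and text gaps as empty strings, so isnan and cellfun(@isempty, ...) can flag incomplete rows. The names missingAge, missingGender, complete, nameOk, ageOk and genderOk are illustrative only.

```matlab
% Values as shown in the output above (normally they come from textscan)
name   = {'aaa'; 'bbb'; 'ccc'; 'ddd'};
age    = [20; 25; NaN; 40];
gender = {'m'; ''; 'm'; 'f'};

% Numeric column: a missing field was read as NaN
missingAge = isnan(age);

% Text column: a missing field was read as an empty string
missingGender = cellfun(@isempty, gender);

% Rows where every field is present
complete = ~(missingAge | missingGender);

% Keep only the complete rows
nameOk   = name(complete);
ageOk    = age(complete);
genderOk = gender(complete);
```

Note that age is already a double column vector (it is the contents of C{2}), so it can be indexed or used in arithmetic directly without any conversion from a cell.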
Amro
  • I tried it, for some reason it only reads 1 line, and it makes a cell array out of it. How can I make it run everything, and then have a matrix with all the data? – Trup Aug 09 '11 at 15:12
  • Also, how do I get a double from a cell? – Trup Aug 09 '11 at 15:27
  • @Trup: for the sample file above it works just fine. If you have another format for the CSV, then please post it in your question. Your question was general, and the only way to answer it was with a made-up example... – Amro Aug 09 '11 at 15:54

What @Amro has suggested is the most common way to read a CSV file with missing values. In your case, since the data mixes characters and numbers, you should specify the proper format for each column. Your call should look something like this:

C = textscan(fid, '%d32 %c %d8 %d8 %d32 %f32 %f %s ','HeaderLines', 1, 'Delimiter', ',');

For more format specifiers, see: http://www.mathworks.com/help/techdoc/ref/textscan.html

BenMorel
A. K.
  • The problem with integer types such as int8 and int32 is that they cannot represent `NaN`; missing values get replaced by zeros. See this related question: http://stackoverflow.com/questions/6657963/textscan-in-matlab-read-null-value-as-nan/6658121#6658121 – Amro Aug 09 '11 at 22:01
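
The point in the comment above can be illustrated with a small sketch. It assumes the column was read as a double with '%f' and 'EmptyValue',NaN, and the values are typed in literally so the snippet is self-contained. The sentinel value -1 and the names isMissing and ageInt are hypothetical choices for illustration, not part of either answer.

```matlab
% Column as it would be read with '%f' and 'EmptyValue',NaN
age = [20; 25; NaN; 40];

% Record the gaps while NaN is still representable
isMissing = isnan(age);

% Converting to an integer type silently turns NaN into 0
ageInt = int32(age);

% Hypothetical workaround: mark missing entries with a sentinel value
ageInt(isMissing) = -1;
```

Reading numeric columns as doubles first and converting afterwards keeps the missing-value information available through the isMissing mask, whereas reading directly with an integer specifier such as '%d32' discards it.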