
I am trying to read a large .csv file with textscan and then write the result to a .dat file. The file contains 124,861 rows (including a header row) and 130 columns. The data in the file is mixed: strings, doubles, missing values, etc. The .csv data looks like this:

[example.csv: sample data not reproduced]

I use the following code:

fid = fopen('example.csv');
result = textscan(fid, ['%s', '%d', '%s', repmat('%f', [1,12]), '%f', '%f', '%f', ...
    repmat('%f', [1,103]), '%s', '%s', '%d', '%s', '%s', '%s', '%d', '%d', '%f'], ...
    'HeaderLines', 1, 'Delimiter', ',');

The code yields a result.dat file with 205,000 rows, not 124,861 rows. It seems that MATLAB arbitrarily adds extra rows. The strange thing is that these rows are populated with data that I cannot find anywhere in my original .csv file. Does anyone have any ideas why this is happening?
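For what it's worth, here is a minimal diagnostic sketch (my own addition, not part of the question) that compares how many rows textscan actually parsed against the number of physical lines in the file; result is assumed to be the cell array returned by the call above:

nRows = numel(result{1});          % rows textscan actually parsed (first output column)
fprintf('textscan returned %d data rows\n', nRows);

fid2 = fopen('example.csv');
nLines = 0;
while ischar(fgetl(fid2))          % fgetl returns -1 (not a char) at end of file
    nLines = nLines + 1;
end
fclose(fid2);
fprintf('example.csv has %d physical lines, including the header\n', nLines);

If the two numbers already disagree here, the discrepancy is in the file itself rather than in the textscan call.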

  • This is most likely happening because you have some commas inside your string fields while also using the comma as the delimiter. Each such comma produces an extra field, and since you are telling MATLAB exactly how many columns to expect, it wraps the surplus fields into an extra row instead. Try using a text editor like Notepad++ to count the number of commas in your data; if it is not 129*124861 (129 commas per row), that is your bug. – Max Oct 28 '17 at 19:52
  • Thanks! Any suggestions for a Notepad++ equivalent on a Mac? – user8846252 Oct 29 '17 at 23:38
  • I can't recommend a specific one, but pretty much any text editor will do. There are just two easy requirements: 1. it can open .csv files, and 2. it can count the occurrences of a character. – Max Oct 30 '17 at 08:42
  • You were right about counting the commas. In fact, MATLAB was doing its job correctly; the problem was in Excel. For some reason, when I open the .csv file in Excel on Windows it shows only 124,861 rows, but if I open the same file on macOS it shows 205,000 rows. I also counted the number of commas using TextEdit, and the total was consistent with 205,000 rows (the MATLAB sketch below does the same check without a text editor). Thanks for making me double-check the initial data. – user8846252 Oct 30 '17 at 16:15
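For anyone who would rather run the comma check described in the comments inside MATLAB rather than a text editor, here is a minimal sketch (assuming R2016b or newer for count; the 129-commas-per-row expectation follows from the 130 columns in the question):

txt  = fileread('example.csv');                       % whole file as one char array
rows = strsplit(txt, {'\r\n', '\n'});                 % split into physical lines
rows = rows(~cellfun(@isempty, rows));                % drop a possible trailing empty line
nCommas = cellfun(@(s) count(s, ','), rows);          % commas per line
fprintf('total commas: %d (expect %d for %d lines of 130 fields)\n', ...
        sum(nCommas), 129*numel(rows), numel(rows));
bad = find(nCommas ~= 129);                           % lines with extra or missing delimiters
fprintf('%d lines do not have exactly 129 commas\n', numel(bad));

Any line flagged in bad either contains embedded delimiters (e.g. commas inside quoted strings) or is missing fields, which is exactly what makes textscan spill data into additional rows.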

0 Answers