When reading a large CSV file, MATLAB doesn't recognize ||,|| as a proper delimiter when it is passed as an input argument to textscan. The data looks like this (simplified):

||X||,||Y||,||Z|| (header)
||1||,||2||,||4||
||4||,||4||,||3||

etc.

I use `data = textscan(fileID,formatSpec,'Delimiter',',');` to read in the data, with a format spec such as `'%f %f %f'`.

My stopgap solution has been to use 010 Editor to replace all '||' with '', making it a proper CSV file for MATLAB, but given the size of the document (6M lines with approx. 35 fields) and the frequency of new documents, this is hardly a great solution.

Does anyone know a proper way to import such a file?

Max
  • I'd say find a way to create your files as a COMMA-separated file, not a pipe-pipe-comma-pipe-pipe-separated file. That being said, you can probably read the two pipes as a string and separate them out that way, i.e. `%s %f %s` as the format spec. – Adriaan Sep 11 '15 at 13:10
  • The creation of the file is not up to me, so I can't make it a true comma separated file. – Max Sep 12 '15 at 10:45

1 Answer


You should be able to include it in the format specifier:

data = textscan(fid, '||%f||,||%f||,||%f||', 'headerlines', 1)

and then just leave out the delimiter.
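As a minimal sketch of the approach above, the following writes the question's sample rows to a temporary file and reads them back with the pipes and commas embedded as literal text in the format spec. The file name and the final concatenation into `M` are illustrative assumptions, not part of the original answer.

```matlab
% Write a tiny sample file (same layout as in the question) so the sketch is self-contained.
fname = [tempname '.csv'];                    % hypothetical temporary file
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', '||X||,||Y||,||Z||');    % header line
fprintf(fid, '%s\n', '||1||,||2||,||4||');
fprintf(fid, '%s\n', '||4||,||4||,||3||');
fclose(fid);

% The pipes and commas are literal text in the format spec, so no 'Delimiter' is passed.
fid = fopen(fname, 'r');
data = textscan(fid, '||%f||,||%f||,||%f||', 'HeaderLines', 1);
fclose(fid);
delete(fname);

% textscan returns one cell per %f specifier; concatenate them into a matrix,
% assuming every column was read to the same length.
M = [data{:}];
```

For a file with many numeric fields (the question mentions about 35), the format spec could be built rather than typed, for example with `strjoin(repmat({'||%f||'}, 1, 35), ',')`, assuming every field is numeric.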

Edit (Following on from comments)

If you are trying to read in strings, the trick is to get it to read in strings without the | character. This is done using %[^|], like this:

data = textscan(fid, '|| %[^|] ||,|| %[^|] ||,|| %[^|] ||', 'headerlines', 1)
RPM
  • Doesn't it place all numbers in one cell now? I.e. `textscan(fid,'||%f||', 'delimiter', ',')` – Adriaan Sep 11 '15 at 13:29
  • Well, yes, if you do `textscan(fid,'||%f||', 'delimiter', ',')` you get it all in one cell. But using what I did I get the results in three cells - just checked it. – RPM Sep 11 '15 at 13:34
  • Ah, do you mean he might want them all in one cell? – RPM Sep 11 '15 at 13:37
  • No, for sure not. I thought your solution did that, since you did not specify the delimiter. – Adriaan Sep 11 '15 at 13:37
  • Thank you for the response! It works perfectly with numbers, but when it tries to read a string it goes wrong. The two entries `||CFF1368176799564369A951||,||2013-05-02||` get read as `CFF1368176799564369A951||,||2013-05-02`. After this textscan stops reading. Any suggestions? I've tried messing with %[^|] in the format spec without success. – Max Sep 12 '15 at 10:52
  • If you use `%s` instead of `%f` you can read in a string. But this would then read in all your numbers as strings too. If it's unclear what the datatype is before you start I would say read in the data as a string and convert it to a number/date/whatever subsequently. – RPM Sep 13 '15 at 08:11
  • When trying to read all data as strings a similar error occurs. When specifying ||%s||, MATLAB recognizes the first two '|', but can't find the end and never delimits. So `||aabb||,||bbcc||` reads as `aabb,||,||bbcc||` into a single field, instead of the desired `aabb` `bbcc` in two fields. Is there a way to give the || higher priority as a delimiter when reading the string that you know of? Thanks anyway for all the suggestions! – Max Sep 13 '15 at 09:50
  • Reading everything as a string and using `strrep` to remove the ||'s fixed it. Still seems more complex than should be needed, but thanks for the ideas! – Max Sep 13 '15 at 15:25
  • Edited to show how you can read in strings avoiding particular characters – RPM Sep 14 '15 at 07:24
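
The workaround Max describes in the comments (read everything as strings, then strip the pipes with `strrep`) might look roughly like the sketch below. The column names, the second ID, the numeric third column, and the temporary file are made up for illustration; only the first ID and date come from the comments.

```matlab
% Write a tiny sample file so the sketch is self-contained.
fname = [tempname '.csv'];                    % hypothetical temporary file
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', '||ID||,||Date||,||Value||');   % hypothetical header
fprintf(fid, '%s\n', '||CFF1368176799564369A951||,||2013-05-02||,||4||');
fprintf(fid, '%s\n', '||CFF1368176799564369A952||,||2013-05-03||,||3||');
fclose(fid);

% Read every field as a string, splitting on the comma only.
fid = fopen(fname, 'r');
raw = textscan(fid, '%s %s %s', 'Delimiter', ',', 'HeaderLines', 1);
fclose(fid);
delete(fname);

% strrep accepts a cell array of character vectors, so each column is cleaned in one call.
ids    = strrep(raw{1}, '||', '');
dates  = strrep(raw{2}, '||', '');
values = str2double(strrep(raw{3}, '||', ''));   % convert the numeric column afterwards
```

This relies on no field containing a comma of its own; if one did, splitting on `,` alone would break that row.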