
I have a file in the following format:

**400**,**100**::400,descendsFrom,**76**::0
**400**,**119**::400,descendsFrom,**35**::0
**400**,**4**::400,descendsFrom,**45**::0
...
...

Now I need to read only the parts shown in bold. I've written the following formatspec:

formatspec = '%d,%d::%*d,%*s,%d::%*d\n';
data = textscan(fileID, formatspec);

It doesn't seem to work. Can someone tell me what's wrong? I'd also like to know how to avoid relying on a delimiter, and how to write a format that matches exactly the way my file is laid out, as in the case above.

Ander Biguri
Prateek
    Are you trying to mix code blocks with Markdown-style formatting to make text bold? Are the double asterisks actually in your file? If they're not, edit your question to remove them – code blocks are formatted exactly as is. You'll need to use words to describe which elements you want. – horchler Feb 05 '15 at 16:44

3 Answers


Your delimiter is ",", so you should first split each line on it and then clean up the fields with a regex. Here is how I would go about it:

fileID = fopen('file.csv');
D = textscan(fileID,'%s %s %s %s','Delimiter',','); % read every field as a string
fclose(fileID);

% '*' is a regex metacharacter, so it is escaped to match a literal asterisk
column1 = regexprep(D{1},'\*','')
column2 = regexprep(D{2},{'\*',':'},{'',''})
column3 = D{3}
column4 = regexprep(D{4},{'\*',':'},{'',''})

This should generate your 4 columns, which you can then combine. I believe the Delimiter can only be one symbol. A more efficient way is to run regexprep directly on the entire line, which would generate:

 test = '**400**,**4**::400,descendsFrom,**45**::0'
 test = regexprep(test,{'\*',':'},{'',''})

 >> test = 400,4400,descendsFrom,450
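Note that in the output above, stripping the colons fuses the neighbouring fields (4 and 400 become 4400), so the numbers can no longer be told apart. A minimal sketch of one way around this, assuming every number of interest is an unsigned integer, is to pull out each run of digits with regexp and convert them. The variable names parts, nums and wanted are only illustrative:

```matlab
% Sample line copied from the answer above.
test = '**400**,**4**::400,descendsFrom,**45**::0';

% Collect every run of digits as text, then convert to numbers.
% (Assumption: all numbers are unsigned integers.)
parts = regexp(test, '\d+', 'match');
nums  = str2double(parts);

% Keep the 1st, 2nd and 4th numbers, i.e. the fields the question asks for.
wanted = nums([1 2 4]);
```

This works whether or not the asterisks are really present in the file, since only the digits are matched.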
GameOfThrows
  • Thanks for answering. Yeah, I mean I can always just put all into a string and then separate, but I need to know 'how to exactly specify the format'. In this case, maybe okay there is a ',' delimiter, but what if there isn't one delimiter, and I have a file like the following: 400,100::400;descendsFrom:76::0 Then I want to know how to exactly specify the format, and why what I've written doesn't work. – Prateek Feb 05 '15 at 16:48
  • Hmmmm, I don't think you can use multiple delimiters in textscan. You could run sprintf on your string (using your formatting code) before using textscan if you want, but I don't know how to specify multiple specific delimiters in textscan... – GameOfThrows Feb 05 '15 at 16:52
  • Bless you for formatting your code. Now your posts actually look nice. – rayryeng Feb 05 '15 at 16:58
  • Okay, thanks a lot. Can you tell me your way of using sprintf? How to do this that way? – Prateek Feb 05 '15 at 16:58
  • May I redirect you to other SO threads such as http://stackoverflow.com/questions/5607597/matlab-sprintf-formatting? There are many threads on this, so you could do a bit of research. – GameOfThrows Feb 05 '15 at 17:02

EDITED

A possible problem is the %s part of the formatspec variable. Because %s matches an arbitrary string, and "," is not a delimiter by default, the whole descendsFrom,76::0 part of the line is assigned to this string. So with the formatspec '%d,%d::%d,%s,%d::%d\n' you will get the following cells from the first line:

400 100 400 'descendsFrom,76::0'

To solve this problem you have two possibilities:

formatspec = '%d,%d::%d,descendsFrom,%d::%d\n';

OR

formatspec = '%d,%d::%d,%12s,%d::%d\n';

In the first case the literal string 'descendsFrom' must appear in every row (as in your example). In the second case the string can vary, but its length must be exactly 12 characters.
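As a rough illustration of the first option, here is a sketch that applies a literal-text format to a single sample line passed to textscan as a character vector instead of a file ID. The name sampleLine is hypothetical, the line assumes the asterisks are not really in the file, and %*d is used to skip the two fields the question does not need:

```matlab
% One sample line in the question's layout (assumption: no bold markers in the file).
sampleLine = '400,100::400,descendsFrom,76::0';

% Literal text in the format (",", "::", "descendsFrom") is matched and discarded;
% %*d reads an integer and skips it, so only three fields are returned.
C = textscan(sampleLine, '%d,%d::%*d,descendsFrom,%d::%*d');

first  = C{1};   % first number on the line
second = C{2};   % second number on the line
third  = C{3};   % number following descendsFrom
```

The same format string should work with a file ID in place of sampleLine, but I have only sketched it for a single line here.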

Tibor Takács
  • * in textscan formats indicates that a field should be skipped, they're not superfluous. – nkjt Feb 05 '15 at 19:07

You can do multiple delimiters in textscan, they need to be supplied as a cell array of strings. You don't need the end of line character in the format, and you need to set 'MultipleDelimsAsOne'. Don't have MATLAB to hand but something along these lines should work:

formatspec = '%d %d %*d %*s %d %*d';
data = textscan(fileID, formatspec,'Delimiter',{',',':'},'MultipleDelimsAsOne',1);

If you want the result returned as a matrix of numbers rather than a cell array of columns, try also adding the option 'CollectOutput',1
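To make this concrete without needing the file, here is a small sketch that feeds two sample lines to textscan as text. The name sampleText is made up for the example, and the lines assume the asterisks are not actually in the file:

```matlab
% Two sample lines in the question's layout, joined with newlines.
sampleText = sprintf('400,100::400,descendsFrom,76::0\n400,119::400,descendsFrom,35::0\n');

% Split on ',' and ':' and treat the doubled '::' as a single delimiter.
% %*d and %*s skip the fields that are not wanted.
C = textscan(sampleText, '%d %d %*d %*s %d %*d', ...
    'Delimiter', {',', ':'}, 'MultipleDelimsAsOne', 1, 'CollectOutput', 1);

% With CollectOutput the three integer columns are merged into one matrix.
M = C{1};
```

Here M should hold one row per input line and one column per kept field.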

nkjt