Given below is my text inside a text file:

<DOC>
<DOCNO>annotations/01/1515.eng</DOCNO>
<TITLE>Yacare Ibera</TITLE>
<DESCRIPTION>an alligator in the water;</DESCRIPTION>
<NOTES></NOTES>
<LOCATION>Corrientes, Argentina</LOCATION>
<DATE>August 2002</DATE>
<IMAGE>images/01/1515.jpg</IMAGE>
<THUMBNAIL>thumbnails/01/1515.jpg</THUMBNAIL>
</DOC>

How can I split the words inside it and store them in a single variable, like

x = 'annotations' '1515.eng' 'Yacare' ...and so on?

peterh
user3416063
  • Have you tried anything before asking? http://stackoverflow.com/help/how-to-ask Also, regexp might help you; please look it up in MATLAB's help. – hyamanieu Mar 28 '14 at 14:56
  • Something like [this solution](http://stackoverflow.com/questions/5867093/regexp-for-html-tags-with-matlab) may help you get started at least. – Steve Osborne Mar 28 '14 at 16:34

1 Answer

There are two steps. First, extract the text between the tags. Second, split the extracted text on delimiters. I assume that the delimiters are '/' and ' ' (space), and that the file contents are loaded with the importdata function, which here returns one string per line.

Then

% load string from a file
STR = importdata('testin');

% extract string between tags
B = regexprep(STR, '<.*?>','');

% split each non-empty string by the delimiters and append the words to C
C = {};
for i = 1:numel(B)
    if ~isempty(B{i})
        C = [C strsplit(B{i}, {'/', ' '})];
    end
end
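
To check the two steps without reading a file, here is a minimal sketch that runs the same regexprep and strsplit calls on a few lines taken from the question. The cell array `lines` is a hypothetical stand-in for whatever importdata returns, and the extra filtering of empty tokens is my addition, not part of the code above.

```matlab
% Hypothetical stand-in for the cell array of lines read from the file
lines = {'<DOC>', ...
         '<DOCNO>annotations/01/1515.eng</DOCNO>', ...
         '<TITLE>Yacare Ibera</TITLE>', ...
         '</DOC>'};

% Step 1: strip the tags, keeping only the text between them
stripped = regexprep(lines, '<.*?>', '');

% Step 2: split each non-empty line on '/' and ' ' and collect the words
words = {};
for i = 1:numel(stripped)
    if ~isempty(stripped{i})
        tokens = strsplit(stripped{i}, {'/', ' '});
        tokens = tokens(~cellfun(@isempty, tokens));  % drop empty tokens (assumed wanted)
        words = [words, tokens]; %#ok<AGROW>
    end
end

disp(words)
```

Lines that contain only tags, such as `<DOC>`, become empty strings after step 1 and are skipped by the isempty check.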
ysakamoto
  • Sir, I am trying to use importdata on multiple files, but it reads just one file instead of all of them. How can I use multiple text files for this? – user3416063 Mar 31 '14 at 14:50
  • You just need to call `importdata` for each file. – ysakamoto Mar 31 '14 at 18:05
  • I have 122 text files. How am I supposed to call importdata for each file? I gave the path for each file using a for loop, but it doesn't work. – user3416063 Apr 01 '14 at 08:35
  • I mean it only reads one file. I wrote this: `sdirectory = 'C:\Users\anurag\Desktop\Animals\Annotations\';` `textfiles = dir([sdirectory '/*.eng']);` `for k = 1:length(textfiles)` `file = [sdirectory '/' textfiles(k).name];` `STR = importdata(file);` `B = regexprep(STR, '<.*?>','');` `C = [];` `for i=1:length(B)` `if ~isempty(B{i})` `C = [C strsplit(B{i}, {'/', ' '})];` `end` `end` `end` – user3416063 Apr 01 '14 at 08:54
  • Sir, I have provided the code. Now, how do I use importdata over all the files? – user3416063 Apr 01 '14 at 15:21
  • I want to split the strings in each text file so that I can check whether my query words occur in them. Example: given all the text files, find which ones contain words from the query string. – user3416063 Apr 01 '14 at 19:35
  • Please post it as another question. – ysakamoto Apr 01 '14 at 21:23
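
A note on the loop posted in the comments: it sets `C = [];` inside the `for k` loop, so each pass discards the words from the previous file and only the last file's words remain. Below is a minimal sketch of one way around that, keeping one word list per file in a cell array. It is an illustration under assumptions, not the answerer's method: the temporary folder, the two sample files, the names `allWords`, `query` and `matching`, and the swap from importdata to fileread with a regexp split are all mine.

```matlab
% Hypothetical setup: write two small .eng files into a temporary folder
folder = tempname;
mkdir(folder);
fid = fopen(fullfile(folder, 'a.eng'), 'w');
fprintf(fid, '<TITLE>Yacare Ibera</TITLE>\n<LOCATION>Corrientes, Argentina</LOCATION>\n');
fclose(fid);
fid = fopen(fullfile(folder, 'b.eng'), 'w');
fprintf(fid, '<TITLE>Blue heron</TITLE>\n');
fclose(fid);

% fullfile avoids mixing '\' and '/' when building paths
files = dir(fullfile(folder, '*.eng'));
allWords = cell(1, numel(files));            % one word list per file

for k = 1:numel(files)
    txt = fileread(fullfile(folder, files(k).name));  % whole file as one string
    txt = regexprep(txt, '<.*?>', '');                % strip the tags
    tokens = regexp(txt, '[/\s]+', 'split');          % split on '/' and whitespace
    allWords{k} = tokens(~cellfun(@isempty, tokens)); % drop empty tokens
end

% Which files contain at least one of the (hypothetical) query words?
query = {'Argentina', 'alligator'};
hits = false(1, numel(files));
for k = 1:numel(files)
    hits(k) = any(ismember(query, allWords{k}));      % case-sensitive match
end
matching = {files(hits).name};
disp(matching)

rmdir(folder, 's');                          % remove the temporary folder
```

The match is exact and case-sensitive, and punctuation stays attached to words (for example `Corrientes,`), so a real query would likely need lower-casing and punctuation stripping first.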