I have many text files containing data like this:

{'photo': {'people': {'haspeople': 0}, 'dateuploaded': '1264588417', 'originalformat': 'jpg', 'tags': {'tag': [{'machine_tag': 0, 'author': '14988396@N00', 'text': 'bokehlicious', 'raw': 'Bokehlicious', 'authorname': 'chachahavana', 'id': '1921934-4308203423-4944107'}],[{'machine_tag': 0, 'author': '14988396@N00', 'text': 'bokehlicious2', 'raw': 'Bokehlicious2', 'authorname': 'chachahavana', 'id': '1921934-4308203423-4944107'}], 'stat': 'ok'}

This was supposed to be in JSON format, but something went wrong and it was saved like this instead.

Now I want to extract specific strings from these files. For example, from this file I want the values of every 'text' field (bokehlicious, bokehlicious2, and so on) as a cell array.

I tried textscan, but the data has no regular format that it can parse, so I'd like to know how to extract the strings that follow every occurrence of 'text' in the file.

Any suggestions on how to do this? Thanks.

1 Answer

Try to extract it with regexp.

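fid = fopen('...yourpath\textFile.txt','r');
str = fread(fid,inf,'uint8=>char')';                % read the whole file as a row vector of chars
fclose(fid);                                        % release the file handle
str = strrep(str,'''','');                          % strip the single quotes
textStr = regexp(str,'(?<=text:\s*)\w*','match');   % words that follow "text:"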

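If you want the 'id' instead, for example, use regexp(str,'(?<=id:\s*)\w*','match'). Note that \w does not match hyphens, so use [\w-]* in place of \w* to capture an ID such as 1921934-4308203423-4944107 in full.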
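
As a variation on the approach above, here is a minimal sketch that uses a capture token instead of a lookbehind and matches the quotes directly rather than stripping them first. The sample string is a shortened, hypothetical excerpt modeled on the data in the question, and the variable names (pat, tokens) are illustrative only.

```matlab
% Hypothetical shortened sample modeled on the data in the question.
str = ['{''tags'': {''tag'': [{''author'': ''14988396@N00'', ''text'': ''bokehlicious'', ''raw'': ''Bokehlicious''}, ' ...
       '{''author'': ''14988396@N00'', ''text'': ''bokehlicious2'', ''raw'': ''Bokehlicious2''}]}}'];

% Pattern (as the regex engine sees it):  'text':\s*'([^']*)'
% The parentheses capture everything between the quotes after the 'text' key.
pat = '''text'':\s*''([^'']*)''';

% 'tokens' returns one 1x1 cell per match; concatenating them gives a flat cell array.
tokens  = regexp(str, pat, 'tokens');
textStr = [tokens{:}];

disp(textStr)
```

Because the pattern requires the opening quote immediately before text, it should not pick up other keys that merely end in "text", and the captured value can contain spaces or hyphens.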

NLindros
  • Thanks a lot :). It works. One minor change, though: str = str' is needed so that regexp operates on a row vector. – Sharath Chandra May 15 '14 at 07:55
  • OK :). By the way, one more question: what if I have a token like 'username: Sara Povin,'? In that case, how can I get the entire name Sara Povin that follows username? – Sharath Chandra May 15 '14 at 08:54
  • You can capture whitespace characters with \s, so try regexp(str,'(?<=username:\s*)[\w\s]*','match') – NLindros May 15 '14 at 11:14
  • Thanks a lot for your input. Regarding the whitespace, is there a way to convert it to '_' when storing the string? Some of the values are phrases and some are single words, so I want underscores in the phrases to tell them apart. Could you tell me how to do this? Thanks :) – Sharath Chandra May 29 '14 at 01:47
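
For the two follow-up questions in the comments (multi-word values, and replacing the whitespace inside them with '_'), one possible sketch is below. It assumes the quotes have already been stripped and that each value ends at the next comma; the sample string and the variable name names are hypothetical.

```matlab
% Hypothetical sample with quotes already removed; values end at a comma.
str = 'username: Sara Povin, text: bokehlicious, username: Ann Lee,';

% Capture everything after "username:" up to the next comma,
% so multi-word values stay in one piece.
tokens = regexp(str, 'username:\s*([^,]*)', 'tokens');
names  = [tokens{:}];

% Trim leading/trailing whitespace, then replace each run of
% whitespace inside a value with a single underscore.
names = regexprep(strtrim(names), '\s+', '_');

disp(names)
```

Single-word values pass through unchanged, since they contain no whitespace for regexprep to replace.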