I am using MATLAB to extract words from text files. I have several text files, and I want to extract the 'AB' part of each file (for example with textscan).

I know how to read specific lines from a text file. However, I want to apply the same code to all the text files in the folder, and the line number of the 'AB' field differs from file to file, so I would have to change it every time.

This is what all of my text files look like (sample):

PMID- 27401974
OWN - NLM
STAT- Publisher
DP - 2016 Jul 8
TI - North-seeking magnetotactic Gammaproteobacteria in the Southern Hemisphere.
LID - AEM.01545-16 [pii]
AB - Magnetotactic bacteria (MTB) comprise a phylogenetically diverse group of prokaryotes capable of orienting and navigating along magnetic field lines. Under oxic conditions, MTB in natural environments in the Northern Hemisphere generally display north-seeking (NS) polarity, swimming parallel to the Earth's magnetic field lines, while those in the Southern Hemisphere generally swim antiparallel to magnetic field lines (south-seeking (SS) polarity).
CI - Copyright (c) 2016, American Society for Microbiology. All Rights Reserved.
FAU - Leao, Pedro
AU - Leao P

Thank you in advance!

tamkrit

1 Answer

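I suppose regexp is your friend. Read the file line by line, capture the text after the "AB -" tag, and keep appending lines until the next tag begins: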

fid = fopen('/path/to/file.txt');
tline = fgetl(fid);
target = '';
found_ab = false;
while ischar(tline)
    tline = strtrim(tline); % remove leading and trailing white space
    if ~found_ab
        % look for the "AB -" tag and capture the text that follows it
        res = regexp(tline, '^AB\s*-\s*(\S.*)$', 'tokens', 'once');
        if ~isempty(res)
            target = res{1};
            found_ab = true;
        end
    else
        % we found an "AB -" line, we see if there are multiple lines here
        res = regexp(tline, '^[A-Z]+\s*-\s', 'once');
        if ~isempty(res)
            % a new tag starts here, so we reached the end of the AB lines
            break;
        end
        % there are multiple text lines for "AB - "; join them with a space
        target = [target, ' ', tline];
    end
    tline = fgetl(fid);
end
fclose(fid);
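
Since the question asks for the same extraction on every file in a folder, here is a minimal sketch of one way to do that. It is an alternative formulation, not the loop above: it reads each file in one go with fileread and locates the tag lines with regexp. The variable names (folder, abstracts) and the temporary sample file are made up for illustration; in practice folder would point at your own directory. It assumes every field starts with an upper-case tag followed by "-", and that no continuation line of the abstract looks like such a tag.

```matlab
% Demo setup: a temporary folder with one small sample file (illustrative only).
% In real use, set folder to the directory that holds your .txt files.
folder = tempname;
mkdir(folder);
fid = fopen(fullfile(folder, 'sample1.txt'), 'w');
fprintf(fid, '%s\n', ...
    'PMID- 27401974', ...
    'TI  - Example title.', ...
    'AB  - First line of the abstract', ...
    '      continues on a second line.', ...
    'CI  - Copyright notice.');
fclose(fid);

% Collect the AB field of every .txt file in the folder.
files = dir(fullfile(folder, '*.txt'));
abstracts = cell(numel(files), 1);
for k = 1:numel(files)
    txt = fileread(fullfile(folder, files(k).name));
    lines = strtrim(regexp(txt, '\r?\n', 'split'));   % one cell per line

    % Logical masks: which lines start any tag, and which start the AB tag.
    isTag = ~cellfun(@isempty, regexp(lines, '^[A-Z]+\s*-\s', 'once'));
    isAB  = ~cellfun(@isempty, regexp(lines, '^AB\s*-', 'once'));

    first = find(isAB, 1);
    if isempty(first)
        abstracts{k} = '';            % this file has no AB field
        continue;
    end

    % The abstract runs up to the line before the next tag (or to the end).
    next = find(isTag(first+1:end), 1);
    if isempty(next)
        last = numel(lines);
    else
        last = first + next - 1;
    end

    block = strjoin(lines(first:last), ' ');
    abstracts{k} = strtrim(regexprep(block, '^AB\s*-\s*', ''));
end
```

After the loop, abstracts{k} holds the abstract text of files(k).name, or an empty string when no AB field was found, so no line numbers need to be adjusted per file.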
Shai