Missing information after reading text file with textscan (Matlab)

Question

I am trying to read a comma delimited text file but there is information missing on my results.

The following is an extract of my text file:

number,datetime start,datetime arrive
4027,25/03/2016 11:20,25/03/2016 11:20
4000,25/03/2016 11:20,25/03/2016 11:20
4027,25/03/2016 11:21,25/03/2016 11:21
4000,25/03/2016 11:21,25/03/2016 11:21

My code:

file = fopen('myfile.txt');
TextCell=textscan(file,'%s');
Text=TextCell{1}; 
fclose(file);

containsStr = ~cellfun('isempty',strfind(Text,'4027')); 
FilteredText=Text(containsStr); % filters the strings that contain 4027

Results obtained:

4027,25/03/2016
4027,25/03/2016

Results expected:

4027,25/03/2016 11:20,25/03/2016 11:20
4027,25/03/2016 11:21,25/03/2016 11:21

Where is my mistake?

in that case we can use `table` and `datetime` objects. Let me post a solution — Amro, Apr 01 '16 at 09:08

Amro · Answer 1 · 2016-04-01T11:05:48.810

1

Given the following CSV file:

number,datetime start,datetime arrive
4027,25/03/2016 11:20,25/03/2016 11:20
4000,25/03/2016 11:20,25/03/2016 11:20
4027,25/03/2016 11:21,25/03/2016 11:21
4000,25/03/2016 11:21,25/03/2016 11:21

We load the data into a MATLAB table. We specify the formatting to parse the last two columns as datetime objects.

t = readtable('file.csv', 'FileType','text', 'Delimiter',',', ...
    'Format','%f %{dd/MM/yyyy HH:mm}D %{dd/MM/yyyy HH:mm}D');

The result:

>> t
t = 
    number     datetimeStart       datetimeArrive 
    ______    ________________    ________________
    4027      25/03/2016 11:20    25/03/2016 11:20
    4000      25/03/2016 11:20    25/03/2016 11:20
    4027      25/03/2016 11:21    25/03/2016 11:21
    4000      25/03/2016 11:21    25/03/2016 11:21

(you might see a warning about identifiers being fixed, this is harmless. The reason is that the names in the header line contained spaces, and those are not valid in variable names).

Finally we select the rows where the first column is equal to 4027:

>> tt = t(t.number == 4027,:)
tt = 
    number     datetimeStart       datetimeArrive 
    ______    ________________    ________________
    4027      25/03/2016 11:20    25/03/2016 11:20
    4027      25/03/2016 11:21    25/03/2016 11:21

edited Apr 01 '16 at 11:05

answered Apr 01 '16 at 09:11

Amro

123,847
25
243
454

Thanks for your answer but unfortunately I have tried your solution and I get the following error: 'Error using readtable. Unable to read the entire file. You may need to specify a different format string, delimiter, or number of header lines.' – Sarah Apr 01 '16 at 09:42
what is the format of your file exactly? Do you have a header line? You need to show us verbatim the first few lines of the **actual file** not some random extract... The above works fine with the sample you posted in your question. We can only work with what you've showed us! – Amro Apr 01 '16 at 09:51
Yes, sorry, my file has a header on the first line. The rest of the file matches the format posted in my example. I have tried the following:t = readtable(filename, 'FileType','text', 'Delimiter',',', ... 'Format','%f %{dd/MM/yyyy HH:mm}D %{dd/MM/yyyy HH:mm}D', ... 'ReadVariableNames',false,'HeaderLines',1); But I still get error. – Sarah Apr 01 '16 at 10:24
The new error is the following: "Unable to read the DATATIME data with the format 'dd/MM/yyyy HH:mm'. If the data is not a time, use %q to get string data" – Sarah Apr 01 '16 at 10:25
no, you need to set `ReadVariableNames` to `true` (don't use `HeaderLines` in this case). Again, please edit the question and show us the header line of the file along with the data! The actual thing.. – Amro Apr 01 '16 at 10:27
I have edited my post adding the headerline. I did not know the headerline could cause problems. The header did not affect me when I tried Lati's solution. – Sarah Apr 01 '16 at 10:41
Of course it is going to be a problem, you can't blindly parse a file without knowing its exact structure... Anyway I updated my answer, should work now :) – Amro Apr 01 '16 at 11:03
I am sorry, I really do not know what could it be this time but I keep getting the following error: 'Unable to read the DATATIME data with the format 'dd/MM/yyyy HH:mm'. If the data is not a time, use %q to get string data'. My file is exactly as I have posted it. With only one header on the first line. – Sarah Apr 01 '16 at 11:53
@Sarah It works with 'textscan' because you use the index results of 'strfind'. But sure 'textscan' reads your headerline(s) as well. So better to define 'HeaderLines' number while reading the file to focus on the data only – Lati Apr 01 '16 at 11:58
@Sarah It works correctly for the exact file you've shown. If your data is different or malformed, it will fail to parse the dates... This is really straightforward, you can even use the "Import Data" tool in MATLAB `uiimport('file.csv')` to do the same thing. Here is a similar example with pictures: http://www.mathworks.com/help/matlab/ref/importtool-app.html#butnqc6-1 – Amro Apr 01 '16 at 12:13
Lati's code is reading the whole file as a bunch of strings, so it makes no difference it there was a header or not, or if the dates were malformed, everything is a string, including the numbers in the first column. In my solution I tried to parse the data correctly, which allows you to work with it more comfortably like `plot(t.datetimeStart, t.number)` and you would get nicely labeled dates on the x-axis, and numeric values in the y-axis. – Amro Apr 01 '16 at 12:18
@Lati I am aware textscan reads my headerline as well. But as I after filter the data I need (just 4027 strings), it does not affect my output results. – Sarah Apr 01 '16 at 13:28
@Amro When I import the file with "Import Data" tool in Matlab, it does it correctly. However I did not want to use this tool as my file has thousands of lines and that would be too slow. Textscan is faster. I will test your solution more and see if I can find the problem. – Sarah Apr 01 '16 at 13:32
@Sarah both the import tool and `readtable` are using `textscan` underneath. Import tool can even generate code for you, click on the arrow under the green checkmark, and select "generate script" or "generate function". This can't get any easier! – Amro Apr 01 '16 at 13:39
@Amro I used to use 'Import tool' in the past, including the function it generates by clicking the 'generate function' option. But I had to stop using it as it is incredibly slow to handle large files. I attended a Matlab course and I was told there to use textscan and read everything as strings as this was going to speed up my file reading process. There must be a difference somewhere between readtable and using textscan on its own... – Sarah Apr 01 '16 at 13:47
I suggested using the import tool to help you in loading the data. If your file is large, run the import tool on a small subset of it (say the first 100 lines), and generate code. Then you can use this code to load the full file... If you look at the generated code, it is using `textscan` the same as you would have manually written... – Amro Apr 01 '16 at 13:50
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/107963/discussion-between-sarah-and-amro). – Sarah Apr 01 '16 at 13:51

score 0 · Accepted Answer · answered Mar 31 '16 at 16:05

First you have comma as delimiter, better to use in textscan. Then you should specify the data columns in your text file. Moreover I would use CollectOutput option to get the data in the same cell array. Check the following code:

file = fopen('myfile.txt');
TextCell=textscan(file,'%s %s %s %s %*[^\n]', 'Delimiter',',', 'CollectOutput',1);
Text=TextCell{1}; 
fclose(file);

containsStr = ~cellfun('isempty',strfind(Text,'4027')); 
FilteredText=Text(containsStr,:);

As a result, you will get a cell array as well.

Missing information after reading text file with textscan (Matlab)

2 Answers2