1

How do you import time series data into timetable where the timestamps are formatted strings as opposed to date serial?

For example, I have the following data in a csv file.

20191208 18:17:00,17,17,17,17
20191208 18:18:00,17.5,17,17,17.5
20191208 18:19:00,17.5,17.5,17.5,17.5

where the timestamps have the format YYYYMMDD hh:mm:ss. Headings for each column may or may not be present in the file.

Using readtimetable would give the error message

Unable to detect datetime or duration data in file

Of course, ultimately, the data can be parsed manually as plain text. At a minimum, since we are talking about well formed csv files, readtable will always recognize it (and I think readtable is able to accommodate potentially present column headings; not sure). From there, one can convert the data into timetable after string processing the timestamp column.

My question is, what is the most efficient way possible to convert a time series stored in csv with timestamp in formatted string? Efficient as simple in coding and inexpensive computationally?

Argyll
  • 8,591
  • 4
  • 25
  • 46

1 Answers1

1

Unfortunately, as you've probably noticed, readtimetable does not allow you to tell the function how you want to treat the data. This function is relatively new so perhaps it's something they have missed.

readtable on the other hand, a more mature function, allows you to do that. We can use the Format option to specify how to read data.

So we can do:

>> tt = table2timetable(readtable('file.csv','Format','%{YYYYMMdd HH:mm:ss}D %f%f%f%f'))

>> head(tt)
ans =

3×4 timetable

      Var1           Var2    Var3    Var4    Var5
_________________    ____    ____    ____    ____

20191208 18:17:00      17      17      17      17
20191208 18:18:00    17.5      17      17    17.5
20191208 18:19:00    17.5    17.5    17.5    17.5

>> class(tt.Var1)
ans =

'datetime'
Paolo
  • 21,270
  • 6
  • 38
  • 69
  • Thank you for replying. I don't think the particular formatting string works. The timestamp is not read correctly. What's the rule with the `"Format", regex` pair? – Argyll Aug 21 '20 at 22:38
  • Also I notice a 10-fold increase in time cost from using `readtable` alone to `readtable` with the above `"Format", regex' pair. Is there a way to avoid the large increase in time cost? FYI, I started my attempt with `readtable` without formatting and then use `datetime(Table.Var1,'InputFormat','yyyyMMdd HH:mm:ss')`. But this latter part is worse than 10 times longer than the initial `readtable` part. – Argyll Aug 21 '20 at 22:44
  • @Argyll There's no regex in my solution. Where are you getting that from? What does "it does not work mean"? What output are you getting? .. is this really the bottleneck in your application? have you profiled your code properly? Anyway, if you want to end up with a timetable, its either what I've suggested or you first get the date in the right format (with some preprocessing with tools like `sed`) and then use `readtimetable` afterwards. – Paolo Aug 21 '20 at 23:05
  • `20191808 18:18:00` is not `20191208 18:18:00`. By `regex` I simply mean it is a formatting string. As for whether it bottlenecks my application, I don't know yet. Currently reading 10k rows into `timetable` takes a few seconds, it is quite possibly a source of bottleneck as I build my tools, and it may mean that I cannot use `timetable`. So I'd like to know. Aside, what's `sed`? – Argyll Aug 21 '20 at 23:10
  • @Argyll My apologies, there was a typo in my code. See update.. Yes, it's not actually a regex pattern but rather the format MATLAB uses for specifiers. Fair enough, I'll try and see tomorrow if there's a better, faster solution for you. You might want to look into `datastore` https://www.mathworks.com/help/matlab/datastore.html – Paolo Aug 21 '20 at 23:22
  • Please give it some thoughts. I appreciate it. re: `datastore`. I used to have issue with `.mat` file corrupting data via write/read cycles and abandoned the idea of storing data in Matlab-native formats. I'll experiment with `datastore` in the future. – Argyll Aug 21 '20 at 23:28