
I have data of idle duration of a system as follows:

| Date | Idle Time Start | Idle Time End | Idle Duration (s) |
|---|---|---|---|
| 2017/07/11 | 10:36:21 | 10:37:28 | 67 |
| 2017/07/11 | 10:45:44 | 10:46:58 | 74 |
| ... | ... | ... | ... |

I want to check, using Python, whether the idle duration is linear or nonlinear in time. My second question: if I want to predict the idle duration at a future time, how can I convert this data into a DataFrame so that I can perform some kind of regression analysis? I have stored the idle durations and the idle start times in arrays. I have also plotted the data using the following code:

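from datetime import datetime
import matplotlib.pyplot as plt

# get_idletime_set returns the idle durations and their start-time strings
SampleOne, Times = get_idletime_set(1000)
FMT = '%Y-%m-%d %H:%M:%S'  # must match the format of the strings in Times
Dates = [datetime.strptime(t, FMT) for t in Times]

plt.plot(Dates, SampleOne)
plt.ylabel('Idle Duration')
plt.xlabel('Time')
plt.show()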

I got this graph: [figure: idle duration plotted against time]

The resulting graph is not a straight line. Does that mean that the idle duration is not linear with respect to time?

M. Paul
  • What did you try? It should be pretty straightforward to calculate the duration between timesteps and check that they don't vary by more than a given percentage. – Eric Duminil Jul 11 '17 at 09:25
  • 1. Please show your effort; that's how Stack Overflow works. 2. For DataFrames, see pandas. 3. Consider [Stat](https://stats.stackexchange.com), since this is also a statistical question and there are many tests for non-linearity. – Marvin Taschenberger Jul 11 '17 at 09:35
  • @MarvinTaschenberger I did not ask how to create a DataFrame in Python. My question is how to represent a duration of time as an index in the DataFrame. My goal is to predict the idleness of a system at a future instant. The problem is that the idle duration is given for an interval (i.e., the start and end time of idleness), not for a particular instant. – M. Paul Jul 11 '17 at 11:31

1 Answer

  1. Excuse me for making this a little more statistical. You want to check whether your model looks like idle = c * t + u or idle = g(t) + u (with some constant c, t as time, and u as a term for everything unobserved), right? First things first: your graph is not a valid time-series plot, since the dates do not seem to be ordered correctly (some lines cross back over each other, which should not happen).

  2. As suggested, this is in my opinion a statistical question about model selection. You could either run a test for a linear versus nonlinear relationship between the two variables, or fit it as a time-series model without an autoregressive term and compare candidates using information criteria. But model selection is a huge topic. An easy way would be to compute the average idle time per day and regress it on a time index as idle ~ t + t² + ln(t) in various combinations, then compare their performance and significance.

  3. I would suggest that you read up on this, or post this question on the statistics Stack Exchange.
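The steps in points 1 and 2 could be sketched roughly as follows. This is only an illustration under stated assumptions, not a definitive method: the helper names (`build_frame`, `aic_for`, `compare_models`) are made up for this example, the data is synthetic (generated with a quadratic trend purely so the script runs), and the comparison uses the Gaussian least-squares form of AIC, n·ln(RSS/n) + 2k, computed with plain NumPy rather than a statistics package.

```python
import numpy as np
import pandas as pd


def build_frame(dates, starts, durations):
    """Return a DataFrame indexed by idle-start timestamp, sorted by time."""
    stamps = pd.to_datetime(
        [f"{d} {s}" for d, s in zip(dates, starts)],
        format="%Y/%m/%d %H:%M:%S",  # matches the table in the question
    )
    df = pd.DataFrame({"idle_duration": durations}, index=stamps)
    df.index.name = "idle_start"
    return df.sort_index()  # a time-series plot needs ordered timestamps


def aic_for(X, y):
    """Least-squares fit of y on X; Gaussian AIC up to an additive constant."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    n, k = X.shape
    return n * np.log(rss / n) + 2 * k


def compare_models(daily):
    """daily: Series of mean idle duration per day. Returns {model: AIC}."""
    y = daily.to_numpy(dtype=float)
    t = np.arange(1, len(y) + 1, dtype=float)  # start at 1 so ln(t) is defined
    one = np.ones_like(t)
    designs = {
        "t": np.column_stack([one, t]),
        "t + t^2": np.column_stack([one, t, t ** 2]),
        "t + ln(t)": np.column_stack([one, t, np.log(t)]),
    }
    return {name: aic_for(X, y) for name, X in designs.items()}


# Synthetic, made-up data: 30 days, 5 idle periods per day, quadratic trend.
rng = np.random.default_rng(0)
dates, starts, durations = [], [], []
for day in range(30):
    stamp = pd.Timestamp("2017-07-11") + pd.Timedelta(days=day)
    for hour in (9, 11, 13, 15, 17):
        dates.append(stamp.strftime("%Y/%m/%d"))
        starts.append(f"{hour:02d}:00:00")
        durations.append(60 + 0.5 * day ** 2 + rng.normal(0, 5))

df = build_frame(dates, starts, durations)
daily = df["idle_duration"].resample("D").mean().dropna()
aics = compare_models(daily)
for name, value in sorted(aics.items(), key=lambda kv: kv[1]):
    print(f"{name:10s} AIC = {value:.1f}")
```

A lower AIC suggests a better trade-off between fit and complexity. If the purely linear model `t` scores clearly worse than the alternatives, that is some evidence against linearity, although a formal test and residual diagnostics would still be advisable. Indexing by the idle start time is just one possible choice for representing an interval as a single timestamp.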