
I am using Python and Pandas. I am working on a predictive maintenance project where my goal is to predict the probability that a failure will occur in a given time period, say 4-6 hours. I have preprocessed the data and reduced it to four attributes: start time, end time, duration of the event (the difference between start and end time), and the event itself, which is either a failure or not (1 = fail, 0 = not fail). Sample data is as follows:

START_TIME      END_TIME        DURATION_MINUTES    EVENT
2/15/2018 2:32  2/15/2018 2:32  0.566666667           0
2/15/2018 2:32  2/15/2018 2:33  0.916666667           0
2/15/2018 2:33  2/15/2018 2:33  0.116666667           1
2/15/2018 2:33  2/15/2018 2:35  1.283333333           0
2/15/2018 2:35  2/15/2018 2:35  0.083333333           0
2/15/2018 2:35  2/15/2018 2:35  0.166666667           0
2/15/2018 2:35  2/15/2018 2:35  0                     0
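One way to load the sample above into pandas is sketched below. It assumes the columns are separated by two or more spaces (as they appear here) and that the timestamps are month/day/year with minute precision. `SAMPLE` and `load_events` are hypothetical names. Because START_TIME and END_TIME appear to be recorded only to the minute, DURATION_MINUTES is the more precise field; the last line only checks that the two agree to within a minute.

```python
import io

import pandas as pd

# The sample rows from the question: columns are separated by 2+ spaces,
# while the date and time inside a timestamp are separated by a single space.
SAMPLE = """\
START_TIME      END_TIME        DURATION_MINUTES    EVENT
2/15/2018 2:32  2/15/2018 2:32  0.566666667           0
2/15/2018 2:32  2/15/2018 2:33  0.916666667           0
2/15/2018 2:33  2/15/2018 2:33  0.116666667           1
2/15/2018 2:33  2/15/2018 2:35  1.283333333           0
2/15/2018 2:35  2/15/2018 2:35  0.083333333           0
2/15/2018 2:35  2/15/2018 2:35  0.166666667           0
2/15/2018 2:35  2/15/2018 2:35  0                     0
"""


def load_events(text: str) -> pd.DataFrame:
    """Parse the whitespace-aligned table into typed columns (hypothetical helper)."""
    df = pd.read_csv(io.StringIO(text), sep=r"\s{2,}", engine="python")
    fmt = "%m/%d/%Y %H:%M"  # month/day/year hour:minute, e.g. 2/15/2018 2:32
    for col in ("START_TIME", "END_TIME"):
        df[col] = pd.to_datetime(df[col], format=fmt)
    df["EVENT"] = df["EVENT"].astype(int)
    return df


events = load_events(SAMPLE)
# Minute-level gap between the two timestamps vs. the recorded duration.
gap = (events["END_TIME"] - events["START_TIME"]).dt.total_seconds() / 60
print((gap - events["DURATION_MINUTES"]).abs().max())
```

For the full 120,000 rows, `pd.read_csv` on the original file with the same `format=` in `pd.to_datetime` should work the same way, assuming that file uses the same timestamp layout.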

I have about 120,000 data instances. Can anybody tell me how I can visualize the data and predict the probability that a failure (EVENT = 1) will occur in a given time frame (e.g. a 4-hour window) on any given day?
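A possible starting point for both the visualization and the 4-hour question is to bucket the events into fixed windows. This sketch assumes a window counts as a failure window if it contains at least one EVENT = 1 record, and that windows with no records at all had no failure (which may not hold if the machine simply was not logging). The helper names and the demo rows are hypothetical. It is an empirical baseline rather than a trained model: it reports the historical fraction of 4-hour windows, overall and by time of day, that contained a failure.

```python
import pandas as pd


def failure_windows(events: pd.DataFrame, freq: str = "4h") -> pd.Series:
    """Label each fixed window 1 if it contains at least one failure, else 0.

    Assumes `events` has a datetime START_TIME column and a 0/1 EVENT column.
    Windows with no records get a count of 0, i.e. they are treated as failure-free.
    """
    counts = events.set_index("START_TIME")["EVENT"].resample(freq).sum()
    return (counts > 0).astype(int)


def failure_rate_by_block(windows: pd.Series) -> pd.Series:
    """Fraction of windows with a failure, grouped by the hour each window starts."""
    return windows.groupby(windows.index.hour).mean()


# Tiny made-up example, not the question's data.
demo = pd.DataFrame(
    {
        "START_TIME": pd.to_datetime(
            ["2018-02-15 02:32", "2018-02-15 02:33", "2018-02-15 09:00", "2018-02-16 01:00"]
        ),
        "EVENT": [0, 1, 0, 1],
    }
)
windows = failure_windows(demo)
print(windows.mean())                  # overall share of 4-hour windows with a failure
print(failure_rate_by_block(windows))  # same share for the 00:00, 04:00, ... blocks
```

With the full data, `windows.plot(drawstyle="steps-post")` (requires matplotlib) gives a step plot of failure windows over time, and `freq` can be changed to e.g. `"6h"` for the longer end of the 4-6 hour horizon.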

Ben.T
  • You're saying the only data you have is a timestamp and whether or not a failure occurred at this timestamp? Shouldn't you have some other input, such as temperature, vibration, age of equipment, hours on duty, etc? Otherwise what can we possibly predict? Do you believe the failures occur at a specific time of day or during a specific season? – John Zwinck Jul 31 '18 at 23:35
  • @JohnZwinck Thank you. The whole point of the exercise is to figure out to what extent we can predict the occurrence of a failure in a given time frame. For example, one inference could be whether the machine will fail between 10am and 5pm: would I be able to predict that it will fail with about 70% confidence or more? Again, this exercise is to see what machine learning models can do with minimal data attributes, which in my case is 4 attributes with 120,000 instances. I will be collecting more attributes, as you suggested, in the future. – Pavan Kumar Shekar Aug 01 '18 at 05:09
  • To visualise the data you can use the `step` function from the `matplotlib` module, with `START_TIME` on the x-axis and `EVENT` on the y-axis. For prediction you can convert the _date_ to a _float_ (e.g. `arr.astype(np.float64)` on a NumPy array). – RobJan Aug 01 '18 at 11:24
  • @RobJan Which algorithm are you suggesting I use to predict the failure? This is predicting a failure over a time period, not taking a timestamp as input and producing a classification as output. – Pavan Kumar Shekar Aug 01 '18 at 21:59
  • One issue with this question is that it does not explain what you mean by "prediction." At least one other commenter and I assumed you were using some set of measurements to identify that conditions were ripe for failure. However, your data set is historical in nature, which implies that you might actually be asking something along the lines of: **Given the historical trend, what is the expected total duration of failure in the next day?** At this point, you've arrived at a statistical question, which is better suited to https://stats.stackexchange.com/ – Maus Jun 09 '19 at 21:09
  • Once you know how you intend to estimate the likelihood of error, you could dip back over here to ask how you might implement your selected method in python – Maus Jun 09 '19 at 21:10
  • @Maus Thank you for the response. To clarify the intent of the question: (1) I am trying to figure out to what extent we can make predictions with just time-series data and minimal features. (2) However, suppose the data is being collected from a machine and EVENT (value 1) is the outcome we are interested in. If other features are given, such as the temperature and vibration intensity during the event, which all contribute to the failure (EVENT = 1), what algorithms or approaches work best to predict the probability of a failure? – Pavan Kumar Shekar Jun 10 '19 at 22:06
  • @Maus continuing the above comment: if I want to make the prediction more meaningful and useful, what do you think I should be predicting (number of EVENTs in a given time frame, probability of an event, etc.)? TIA – Pavan Kumar Shekar Jun 10 '19 at 22:12
  • @PavanKumarShekar How did you solve your problem? Which ML model did you use? Any GitHub links? – santobedi Jun 30 '21 at 10:22
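For the 10am-5pm question and Maus's suggestion of estimating an expected quantity from the historical trend, one simple statistical baseline (a swapped-in technique, not something proposed in the thread) is to treat failures as a homogeneous Poisson process. Estimate a constant rate λ (failures per hour) from the history; then the expected number of failures in a window of h hours is λ·h, and the probability of at least one failure is 1 - exp(-λ·h). This assumes continuous monitoring between the first and last record and ignores any time-of-day or wear effects, so it is best treated as a baseline to compare real models against. The function names and the history rows are hypothetical.

```python
import math

import pandas as pd


def failure_rate_per_hour(events: pd.DataFrame) -> float:
    """Estimate lambda = failures / observed hours (homogeneous Poisson assumption).

    Assumes START_TIME/END_TIME are parsed datetimes and EVENT is 0/1, and that
    the machine was observed continuously from the first start to the last end.
    """
    n_failures = int(events["EVENT"].sum())
    span = events["END_TIME"].max() - events["START_TIME"].min()
    hours = span.total_seconds() / 3600
    if hours <= 0:
        raise ValueError("need a positive observation span")
    return n_failures / hours


def prob_at_least_one_failure(rate_per_hour: float, hours: float) -> float:
    """P(N >= 1) where N ~ Poisson(rate_per_hour * hours)."""
    return 1.0 - math.exp(-rate_per_hour * hours)


# Made-up history: 2 failures over 10 observed hours, so lambda = 0.2 per hour.
history = pd.DataFrame(
    {
        "START_TIME": pd.to_datetime(["2018-02-15 00:00", "2018-02-15 03:00", "2018-02-15 09:00"]),
        "END_TIME": pd.to_datetime(["2018-02-15 00:01", "2018-02-15 03:01", "2018-02-15 10:00"]),
        "EVENT": [1, 1, 0],
    }
)
rate = failure_rate_per_hour(history)
p_10am_to_5pm = prob_at_least_one_failure(rate, hours=7)  # a 7-hour window
print(rate * 7, p_10am_to_5pm)  # expected failures and P(at least one) in the window
```

Estimating a separate rate for each time-of-day block would relax the constant-rate assumption, at the cost of fewer failures per estimate.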

1 Answer


Neural nets and some deep learning should be the algorithmic route to go.