
I am trying to highlight anomalous regions in a Bokeh plot. The idea is to draw a single line whose anomalous samples are shown in red.

Here is a reproducible sample:

import numpy as np
import pandas as pd

n = 300
dat = pd.DataFrame()
dat['X_axis'] = np.linspace(start=0.0, stop=1000, num=n)
mean = 4
std = 1
dat['Y_axis'] = np.random.normal(loc=mean, scale=std, size=n)
# Flag roughly 10% of the samples as anomalous
dat['anom'] = np.random.choice([False, True], size=(n,), p=[0.90, 0.10])

I was able to implement this with a Box Annotation. Now I am trying to do the same thing, except that instead of shading the region, only that portion of the line should be red.


EDIT:

Following a suggestion in the comments, I plotted the two groups as separate lines. However, Bokeh interpolates between values instead of having a smooth transaction. Is there a way to drop the interpolation, or at least limit it to two adjacent values?


EDIT 2:

I was able to break it into individual segments. However, there are now gaps between data samples that need to be eliminated. Any suggestions on how to do that?

eemamedo
  • What do you mean by "smooth transaction"? Bokeh just connects the different dots that you asked it to plot. If you want them to be connected in some other way, you have to add some bogus points yourself. If you don't want the points to be connected at all, just use a scatter plot. – Eugene Pakhomov Apr 01 '20 at 21:23
  • Ah, I think I see what you meant. Along with splitting your data by normal vs anomalous points, you also have to tell Bokeh to not draw anything in between separate regions. For that, you will have to call `multi_line` multiple times instead of `line`. – Eugene Pakhomov Apr 01 '20 at 21:25
  • Thank you for the suggestion. Let's say we split the data into `data_normal` and `data_anomaly`. Are you suggesting calling `multi_line` on `data_normal` and on `data_anomaly`? Why would it be multiple? – eemamedo Apr 02 '20 at 14:24
  • Multiple - because you have multiple segments of each kind. You can either call `multi_line` 2 times, one for each kind, or call `line` M+N times, where M is the number of normal segments and N is the number of anomalous segments. – Eugene Pakhomov Apr 02 '20 at 14:51
  • To remove the gap on your plot, just make sure that there's a shared point between lines of different kinds. After all, to cover the gap, the lines must connect to the same point. Bokeh does not do that for you. – Eugene Pakhomov Apr 02 '20 at 14:52
  • I got the first one done, and the result is in Edit 2. I understand the second point, and that's essentially what I am trying to figure out: is there a function in Bokeh that will interpolate/impute those gaps with numbers between the two closest integers? – eemamedo Apr 02 '20 at 14:56
  • You don't need to impute/interpolate anything. Just add the last point from the segment N - 1 to the start of the segment N, even if the segments come from different categories. If you really want to interpolate, then it's just two arithmetic means of two pairs of values - you don't need a function in Bokeh for that. – Eugene Pakhomov Apr 02 '20 at 15:06
  • I am aware of the concept of interpolation. What you suggested can be done with `ffill` and `bfill` quite easily. However, there are some graphs in my code where this approach doesn't work as `y-axis` values change. – eemamedo Apr 02 '20 at 15:19
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/210811/discussion-between-eugene-pakhomov-and-eemamedo). – Eugene Pakhomov Apr 02 '20 at 15:37

1 Answer


You will have to split your data up and use either multiple calls to `line` or a single call to `multi_line`. It is not possible to specify different colors along different parts of a single line.
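As a rough sketch of the splitting step, assuming the `X_axis`, `Y_axis`, and `anom` columns from the question: the helper below cuts the data into runs of consecutive normal or anomalous samples and prepends the last point of the previous run to each new run, as suggested in the comments, so adjacent lines share a point and no gap appears. The names `split_runs` and `plot_runs` are hypothetical, not part of Bokeh, and the choice to color the connecting segment by the run it leads into is an assumption.

```python
import numpy as np


def split_runs(x, y, flags):
    """Split (x, y) into runs of consecutive equal flags (hypothetical helper).

    Each run after the first starts with the last point of the previous run,
    so neighbouring runs share a point and the drawn lines leave no gap.
    Returns (xs, ys, kinds) where kinds[i] is the flag of run i.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    flags = np.asarray(flags, dtype=bool)
    if len(flags) == 0:
        return [], [], []

    # Indices at which the flag differs from the previous sample start a new run.
    starts = np.flatnonzero(flags[1:] != flags[:-1]) + 1
    bounds = np.concatenate(([0], starts, [len(flags)]))

    xs, ys, kinds = [], [], []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        begin = max(lo - 1, 0)  # include the previous run's last point
        xs.append(x[begin:hi].tolist())
        ys.append(y[begin:hi].tolist())
        kinds.append(bool(flags[lo]))
    return xs, ys, kinds


def plot_runs(xs, ys, kinds):
    """Draw every run with a single multi_line call, red for anomalous runs."""
    from bokeh.plotting import figure  # imported lazily; requires Bokeh

    p = figure(title="Anomalous runs in red")
    colors = ["red" if is_anomalous else "navy" for is_anomalous in kinds]
    p.multi_line(xs=xs, ys=ys, color=colors, line_width=2)
    return p
```

With the question's DataFrame, this could be used as `show(plot_runs(*split_runs(dat['X_axis'], dat['Y_axis'], dat['anom'])))`, with `show` imported from `bokeh.plotting`. If the connecting segment should instead take the color of the run it leaves, append the next run's first point to each run rather than prepending the previous run's last point.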

bigreddot