
I would like to make a histogram using an if statement, i.e.:

u ($5 < 356 ? $9 : 1/0), i.e. plotting column 9 only when, for instance, column 5 is lower than some number.

Here is the simple script I found online (also in some Stack Overflow questions):

n=100 #number of intervals
max=7.5 #max value
min=4.5 #min value
width=(max-min)/n #interval width
#function used to map a value to the intervals
hist(x,width)=width*floor(x/width)+width/2.0
set boxwidth width*0.8

#count and plot
plot "file.txt" u (hist(($3 < 356 ? $9:1/0),width)):(1.0) smooth freq w boxes lc rgb "green" notitle

This does not work: it produces overlapping histograms that are quite hard to interpret. The problem is horizontal lines of different heights plotted on top of each other.

In comparison, the normal one:

plot "file.txt" u (hist($9,width)):(1.0) smooth freq w boxes lc rgb "green" notitle

it shows all the points, well separated. In the first case, I plot the range [0:500] to better visualize the horizontal lines.

The file is large, but I can provide an example file if needed.

Thanks to all.

1 Answer

Although it's an old question, it might still be relevant and interesting to others. Recently, I made a similar observation. When you filter your data by setting "unwanted" x-values to NaN, you introduce "interruptions" in your data; e.g. in a line plot, the lines would be interrupted. Apparently, the same happens with smooth freq, which seems to start a new histogram for each of the separated sequences. That's what you see on top of each other in your graph.

The simplest solution (apparently only for gnuplot>=5.4) is:

 set datafile missing NaN
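A minimal self-contained sketch of this fix, assuming gnuplot>=5.4. The datablock `$Data` is hypothetical test data standing in for the OP's file.txt (column 1 plays the role of the OP's $9, column 2 the filter column), and `$Hist` is likewise a made-up name. It reuses the OP's hist() function and writes the result to a table so the counts can be inspected. For the OP's file, the change would be to put `set datafile missing NaN` before the plot command, with the unwanted values set to NaN.

```gnuplot
# Sketch (gnuplot>=5.4): treat NaN as missing so that the filtered-out
# rows do not split the data into separate sequences
reset session

# hypothetical test data: column 1 = value to histogram (OP's $9)
#                         column 2 = filter column (OP's $3)
$Data << EOD
5.02 100
5.03 400
5.04 200
5.13 300
5.14 500
5.16 100
EOD

width = 0.1                                       # bin width
hist(x,width) = width*floor(x/width) + width/2.0  # OP's binning function

set datafile missing NaN    # the fix suggested above

# keep column 1 only where column 2 < 356, otherwise NaN
set table $Hist
    plot $Data u (hist(($2 < 356 ? $1 : NaN),width)):(1) smooth freq
unset table
print $Hist
```

With the fix, each bin should appear only once in `$Hist`. For a graph, drop the `set table`/`unset table` lines and add `w boxes` as in the OP's command.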

Nevertheless, there is also a solution for earlier versions (e.g. 5.0, current at the time of the OP's question): instead of setting the unwanted x-values to NaN, set the corresponding y-values to 0. You can do this because smooth freq simply sums up the y-values, rather than calculating an average like, e.g., smooth unique does.
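A minimal sketch of the zero-weight approach for gnuplot>=5.0, again with a hypothetical datablock `$Data` in place of the OP's file.txt (column 1 = values, column 2 = filter column) and a made-up output name `$Hist`. Applied to the OP's file, the plot command would become `plot "file.txt" u (hist($9,width)):($3 < 356 ? 1 : 0) smooth freq w boxes lc rgb "green" notitle`.

```gnuplot
# Sketch (gnuplot>=5.0): filter by weight instead of by NaN
reset session

# hypothetical test data: column 1 = value to histogram (OP's $9)
#                         column 2 = filter column (OP's $3)
$Data << EOD
5.02 100
5.03 400
5.04 200
5.13 300
5.14 500
5.16 100
EOD

width = 0.1                                       # bin width
hist(x,width) = width*floor(x/width) + width/2.0  # OP's binning function

# every x-value stays valid, so the data is never interrupted;
# rows failing the condition get weight 0 and add nothing to their bin
set table $Hist
    plot $Data u (hist($1,width)):($2 < 356 ? 1 : 0) smooth freq
unset table
print $Hist
```

A bin that contains only rows failing the condition still appears, with a count of 0, so it is drawn as an empty box. In gnuplot 5.0.0 to 5.0.5 the printed table may also end with the odd extra entry discussed in this answer.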

The full example below works for gnuplot>=5.0.0. However, there seems to be a bug(?!), apparently fixed in version 5.0.6: if you plot your results into a table, for some reason the last entry is strange. This entry only seems to appear when plotting to a table, not to a graph.

set table $Test
    plot $Data u (Bin($1)):(1) smooth freq 
unset table
print $Test

....
 2.3  2  i
 2.5  3  i
 2.7  1  i
 3.1  1  i
-1.9  1  u    # what's this???

Anyway, the following example creates 2 columns of random test data and filters them: values of the first column are only counted when the value in the second column is smaller than 5.0.
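Written out as formulas (a sketch in my own notation, not the original author's: x_i and y_i are columns 1 and 2 of row i, w is BinWidth, and [·] is 1 if the condition holds and 0 otherwise), the filtered histogram in the code below maps each row to a bin center and adds a weight of 1 or 0 instead of discarding the row:

```latex
\mathrm{Bin}(x) = w\left\lfloor \frac{x}{w} \right\rfloor + \frac{w}{2},
\qquad
N(b) = \sum_i \omega_i \,[\mathrm{Bin}(x_i) = b],
\qquad
\omega_i = \begin{cases} 1 & y_i < 5.0 \\ 0 & \text{otherwise} \end{cases}

% worked example with w = 0.2
\mathrm{Bin}(0.45) = 0.2\,\lfloor 2.25 \rfloor + 0.1 = 0.5,
\qquad
\mathrm{Bin}(-0.3) = 0.2\,\lfloor -1.5 \rfloor + 0.1 = -0.3
```

The unfiltered histogram is the special case \omega_i = 1 for every row.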

Code: (works with gnuplot 5.x)

### histogram with filtered data (gnuplot>=5.0.0)
reset session

# create some random test data
set print $Data
    do for [i=1:1000] {
        print sprintf("%g %g", invnorm(rand(0)), int(rand(0)*10))
    }
set print

# histogram
BinWidth = 0.2
Bin(x) = floor(x/BinWidth)*BinWidth + BinWidth*0.5

set style fill solid 0.3
set key noautotitle
set boxwidth BinWidth

myFilter(colF,valF) = column(colF)<valF ? 1 : 0

plot $Data u (Bin($1)):(1)               smooth freq w boxes lc "blue" ti "unfiltered", \
        '' u (Bin($1)):(myFilter(2,5.0)) smooth freq w boxes lc "red"  ti "filtered"
### end of code

Result:

[Image: resulting histograms, "unfiltered" in blue and "filtered" in red]

theozh