
I am a gnuplot-newbie and am stuck with the following situation. Based on this I have a gnuplot script as follows:

clear
reset
set key off
set border 3

set style fill solid 1.0 noborder

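bin_width = 0.01
set boxwidth bin_width absolute

# index of the bin that contains x
bin_number(x) = floor(x/bin_width)

# centre of that bin, used as the x coordinate for counting
rounded(x) = bin_width * ( bin_number(x) + 0.5 )

# smooth frequency sums the y values (here 1) of all points sharing the same x
plot '1000randomValuesBetween0and1.dat' using (rounded($1)):(1) smooth frequency with boxes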
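For comparison, here is a minimal Python sketch of the arithmetic the script above performs (not gnuplot's own code). `bin_number(x)` picks the bin, `rounded(x)` moves each value to its bin centre, and `smooth frequency` adds up the y-values (a constant 1) of all points that share an x, which yields one count per non-empty bin. The helper names `bin_center` and `binned_counts` are hypothetical.

```python
import math
import random
from collections import Counter


def bin_center(x, bin_width=0.01):
    """Same arithmetic as the gnuplot rounded(x): centre of the bin containing x."""
    return bin_width * (math.floor(x / bin_width) + 0.5)


def binned_counts(values, bin_width=0.01):
    """Count values per bin centre, like `using (rounded($1)):(1) smooth frequency`.

    Returns (bin centre, count) pairs sorted by bin centre; empty bins are omitted.
    """
    counts = Counter(bin_center(x, bin_width) for x in values)
    return sorted(counts.items())


if __name__ == "__main__":
    random.seed(1)
    data = [random.random() for _ in range(1000)]  # stand-in for the .dat file
    bins = binned_counts(data)
    print(len(bins), "non-empty bins,", sum(c for _, c in bins), "values")
```

The counts always add up to the number of input values; only the number of non-empty bins depends on the data.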

That was a good first step, but I would like a smooth curve through the points produced by counting the frequencies. Using 'with filledcurves' fell short of what I wanted in two ways. First, the curve is not smoothed (I would prefer something like bezier, but I could not combine it with smooth frequency). Second, the filling is done in a way I did not expect and that does not fit my needs. See this picture using 'with filledcurves'.

To give a little more context: I ultimately want to use this to generate violin plots with gnuplot without binning the data beforehand, so that I can just give my script a single-column data file and be ready to go.

EDIT: As another first step, I tried adapting the "normal" density plot from this demo, but I failed. I read in the documentation that the bandwidth should be 1/#points, which would be 0.001 in my case, so I tried this:

set border 3 front lt black linewidth 1.000 dashtype solid
set style increment default
set style data filledcurves 
set xtics border in scale 0,0 nomirror norotate  autojustify
set xtics  norangelimit 0.00000,0.5,1.0
set title "Same data - kernel density" 
set title  font ",15" norotate

plot 'random01.dat' using 1:(1) smooth kdensity bandwidth 0.001 with filledcurves above y lt 9

which results in this picture: second attempt failing with kdensity. Omitting the bandwidth or using lower or higher values did not solve the issue. The plot specifies using 1:(1) because I have just a single column: according to the documentation the first value is this column, and the second value specifies a weight, which should be 1/#points.

EDIT2: Setting the bandwidth to the ideal value, or not setting it at all, always yields the same result; changing the weight only changes the scale of the y-axis.
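EDIT2 is what a kernel density estimate should do. Assuming gnuplot's smooth kdensity adds one normalized Gaussian of width h (the bandwidth) per data point, multiplied by that point's weight (the second `using` column), a weight of 1 gives a curve whose total area is N, and a weight of 1/N gives area 1. The weight only rescales the y-axis; only the bandwidth changes the shape. The Python sketch below checks this numerically; `kdensity` and `area` are hypothetical helper names, not gnuplot functions.

```python
import math


def kdensity(x, data, bandwidth, weight=1.0):
    """Gaussian KDE at x: each point adds one normalized Gaussian scaled by `weight`."""
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * bandwidth)
    return sum(weight * norm * math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)
               for xi in data)


def area(f, lo, hi, steps=2000):
    """Trapezoidal estimate of the integral of f over [lo, hi]."""
    dx = (hi - lo) / steps
    inner = sum(f(lo + i * dx) for i in range(1, steps))
    return (0.5 * (f(lo) + f(hi)) + inner) * dx


if __name__ == "__main__":
    data = [0.2, 0.5, 0.9]
    h = 0.1
    whole = area(lambda x: kdensity(x, data, h), -1.0, 2.0)
    unit = area(lambda x: kdensity(x, data, h, 1.0 / len(data)), -1.0, 2.0)
    print(round(whole, 3), round(unit, 3))  # areas: about N and about 1
```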

My data are 1000 values in a range between 0 and 1 (created randomly for testing purposes).

Here is the new plot attempt with the corrected bandwidth:

EDIT3: Zooming out may show another aspect of the problem: the plot seems to extend outside the interval of the given values (I checked, and there are no values < 0 or > 1). Here's the graph:

zoomed out graph

Wolfone
  • You mis-read the documentation. 1/N is not the bandwidth, it is the normalized uniform weight. The plot you show looks like the bandwidth was set far too low. What is the range of values in your data? I suggest letting the program calculate the "ideal" bandwidth for you and then adjusting it afterwards if you think it is too large. The ideal value is stored in GPVAL_KDENSITY_BANDWIDTH. – Ethan Mar 22 '19 at 23:50
  • I really misread that! But unfortunately, not specifying the bandwidth or setting it to the ideal value still yields weird results. I have 1000 values randomly created between 0 and 1. Changing the weight to 0.001 (or any other value) didn't have any effect. I will update my question with the plot. – Wolfone Mar 23 '19 at 08:24
  • OK, I noticed one thing I hadn't seen before because I never zoomed out: when I zoom out, the "real" curve extends beyond 0 and 1. I double-checked my values, and there really are no values < 0 or > 1. – Wolfone Mar 23 '19 at 09:21
  • Extending beyond the end of the data is expected. The kdensity function is approximating your distribution as a summation of Gaussians, and each Gaussian by its nature extends to infinity on both sides. Your EDIT3 plot looks reasonable to me. What is the "same problem" that you see? – Ethan Mar 23 '19 at 17:12
  • Oh wow... you are completely right; that resolves it! Sorry for my slow thinking about the nature of the resulting density estimate; I really should have recognized that earlier! So I will just have to find a suitable heuristic for where to cut off my plots to beautify them a little. Thank you very much for your patience! If you include your first comment under the question in your answer, I will gladly accept it as the solution! – Wolfone Mar 24 '19 at 09:59
  • BTW, in general you can plot your data with the smooth option and save the result to a file (`set table 'smooth.dat'`), then use that file for further plotting. Have a look [here](https://stackoverflow.com/a/30443329/3569208). – Hastur Feb 21 '20 at 17:17
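Ethan's explanation can be checked numerically. Every Gaussian term is positive everywhere, so the estimate is non-zero just outside [min, max] of the data, and how far the curve visibly spreads past the data grows with the bandwidth. One simple cut-off heuristic (an assumption, not something gnuplot does automatically) is to show the curve only between the smallest and largest data value; in gnuplot, one way to do that might be to run `stats` on the column and use STATS_min and STATS_max to limit the x range (untested here). The Python helpers `kdensity` and `clipped_curve` below are hypothetical.

```python
import math


def kdensity(x, data, bandwidth, weight=1.0):
    """Gaussian KDE at x: one normalized Gaussian per data point, scaled by weight."""
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * bandwidth)
    return sum(weight * norm * math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)
               for xi in data)


def clipped_curve(data, bandwidth, samples=100):
    """Sample the normalized estimate only between the smallest and largest value."""
    lo, hi = min(data), max(data)
    step = (hi - lo) / (samples - 1)
    return [(lo + i * step, kdensity(lo + i * step, data, bandwidth, 1.0 / len(data)))
            for i in range(samples)]


if __name__ == "__main__":
    data = [0.0, 0.3, 0.7, 1.0]
    for h in (0.05, 0.1, 0.2):
        # density at x = -0.3, outside the data: positive, and larger for wider kernels
        print(h, kdensity(-0.3, data, h, 1.0 / len(data)))
```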

1 Answer


The demo 'violinplot.dem', included with the gnuplot distribution package and also available online, shows how to do what you want using the combination "smooth kdensity" and "with filledcurves" applied to unbinned data.

Online version here: violin plot demo
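A violin is essentially the density curve mirrored about a centre line. The Python sketch below builds such a closed outline from a Gaussian density estimate. It is a generic construction, not a transcription of violinplot.dem, and the names `violin_outline`, `center` and `half_width` are hypothetical. Scaling the density so that its peak equals `half_width` keeps neighbouring violins from overlapping.

```python
import math


def kdensity(x, data, bandwidth):
    """Gaussian KDE at x, normalized to unit area (weight 1/N per point)."""
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * bandwidth * len(data))
    return sum(norm * math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)


def violin_outline(data, bandwidth, center=1.0, half_width=0.4, samples=50):
    """Closed polygon of (x, y) vertices for a vertical violin at x = center.

    Values run along y; the density is drawn symmetrically left and right.
    """
    lo, hi = min(data), max(data)
    ys = [lo + i * (hi - lo) / (samples - 1) for i in range(samples)]
    dens = [kdensity(y, data, bandwidth) for y in ys]
    scale = half_width / max(dens)  # widest point of the violin = half_width
    right = [(center + scale * d, y) for y, d in zip(ys, dens)]
    left = [(center - scale * d, y) for y, d in zip(ys, dens)]
    return right + left[::-1] + [right[0]]  # up the right side, down the left, close


if __name__ == "__main__":
    data = [0.1, 0.4, 0.5, 0.9]
    for x, y in violin_outline(data, bandwidth=0.1, samples=5):
        print(f"{x:.3f} {y:.3f}")  # two columns, one vertex per line
```

Written out as two columns, such a polygon could be drawn with gnuplot's `with filledcurves closed` style.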

Notes:

You mis-read the documentation. 1/N is not the recommended bandwidth, it is the normalized uniform weight. The plot you showed initially looks like the bandwidth was set far too low. What is the range of values in your data?

I suggest letting the program calculate the "ideal" bandwidth for you and then adjusting it afterwards if you think it is too large. The ideal value is stored in GPVAL_KDENSITY_BANDWIDTH. Increasing the bandwidth will make the envelope smoother; decreasing it will emphasize local spikes.
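To see why 0.001 was far too small, compare it with the normal-reference rule of thumb for a Gaussian kernel, h = (4/(3N))^(1/5)·σ. I believe gnuplot's automatic choice is based on a Gaussian-reference rule like this one, but GPVAL_KDENSITY_BANDWIDTH reports the value it actually used. For 1000 values spread uniformly over [0, 1], σ ≈ 1/√12 ≈ 0.29, which gives h ≈ 0.077, about 77 times the 0.001 tried in the question. At 0.001 each kernel is about as narrow as the average gap between neighbouring points (1/1000), so the estimate breaks up into spikes. The Python sketch below (hypothetical helper names) counts the local maxima of the estimate for both bandwidths.

```python
import math
import random
import statistics


def rule_of_thumb_bandwidth(data):
    """Normal-reference bandwidth h = (4 / (3 N))**(1/5) * sigma."""
    n = len(data)
    return (4.0 / (3.0 * n)) ** 0.2 * statistics.stdev(data)


def kdensity(x, data, bandwidth):
    """Gaussian KDE at x, normalized to unit area."""
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * bandwidth * len(data))
    return sum(norm * math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)


def count_peaks(data, bandwidth, grid=1000):
    """Number of local maxima of the estimate on a regular grid over [0, 1]."""
    ys = [kdensity(i / (grid - 1), data, bandwidth) for i in range(grid)]
    return sum(1 for a, b, c in zip(ys, ys[1:], ys[2:]) if b > a and b > c)


if __name__ == "__main__":
    random.seed(0)
    data = [random.random() for _ in range(200)]
    h = rule_of_thumb_bandwidth(data)
    print(f"rule-of-thumb h = {h:.3f}")
    print("peaks with h = 0.001:", count_peaks(data, 0.001))
    print("peaks with rule-of-thumb h:", count_peaks(data, h))
```

Many peaks at the tiny bandwidth and only a few at the rule-of-thumb value is the spiky-versus-smooth trade-off described above.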

Ethan
  • Hello, and thank you for your answer. I came across this example a few hours ago but wasn't able to adapt it to my situation. After your answer I gave it another try, still without success. I thought I should start with the "normal" density plot in the demo, but that fails too; I will update my question to show what I tried. I would appreciate any hints! – Wolfone Mar 22 '19 at 21:26