I have a large dataset that I need to plot on a log-log scale in Gnuplot, like this:

set log xy
plot 'A_1D_l0.25_L1024_r0.dat' u 1:($2-512)

[Figure: log-log plot of my data points]

[Link: text file with the data points]

The data points are equally spaced along the x-axis, but on a logarithmic scale they become very dense toward the right of the graph, so the output file (which I eventually export to .tex) gets very large. On a linear scale I would simply use the every option to reduce the number of plotted points. Is there a similar option for a log-log scale, such that the plotted points appear equally spaced?

I am aware of a similar question that was raised a few years ago, but in my opinion its solution is unsatisfactory: the plotted points are not equally spaced along the x-axis. I think this is a fairly basic problem that deserves a cleaner solution.

2 Answers

As I understand it, you don't want to plot the actual data points; you just want to plot a line through them. But you want to keep the appearance of points rather than a line. Is that right?

  set log xy
  plot 'A_1D_l0.25_L1024_r0.dat' u 1:($2-512) with lines dashtype '.' lw 2

Amended answer

If it is important to show outliers or errors in the data set, then you must not use the every option or any other technique that simply discards or skips most of the data points. In that case I would prefer the plot with points that you show in the original question, perhaps modified to represent each point as a dot rather than a cross. I will simulate this by modifying a single point in your 500,000-point data set (first figure below). But I would also suggest that the presence of outliers is even more apparent if you plot with lines (second figure below).

Showing error bounds is another alternative for noisy data, but the options depend on what you have to work with in your data set. If you want to pursue that, please ask a separate question.

[Figure 1: the data plotted with points, with one data point modified to act as an outlier]
[Figure 2: the same data plotted with lines]

Ethan
  • Thanks, this sounds like a very smart solution. However, I forgot to mention that these data come from scientific measurements: can I use such a trick in a plot that is meant for a scientific article? Moreover, what should I do if my data come with error bars? – Davide Venturelli Jul 24 '21 at 14:09
  • I note that your original question asked how to discard the majority of the data points. That would be a far worse "sin of omission" than drawing a dotted line that passes through every point in the data set. Answer amended to show the highly visible effect of adding a single outlier point to the data (but only if you keep all the points!) – Ethan Jul 24 '21 at 16:55
  • Thanks again, this remains a good answer, but of course the scenario I have in mind is not one where I willingly omit some "outliers". If my data comes from, say, a numerical simulation, then in general I have a lot of points (equally significant and widely homogeneous) and I need to omit some of them for the sake of clarity of their presentation. – Davide Venturelli Jul 25 '21 at 10:37
If you really want to reduce the number of data points that get plotted, you might consider the following script.

s = 0.1           ### sampling interval in log scale
                  ###  (try 0.05 for more detail)

c = log10(0.01)   ### threshold used by sampler(x); initialize it
                  ### to a value smaller than log10 of any x in
                  ### the data (reset it before every replot)

sampler(x) = (x>0 && log10(x)>=c) ? (c=ceil(log10(x)/s+0.5)*s, x) : NaN

set log xy
set grid xtics
plot 'A_1D_l0.25_L1024_r0.dat' using (sampler($1)):($2-512) with points pt 7 lt 1 notitle , \
     'A_1D_l0.25_L1024_r0.dat' using 1:($2-512) with lines lt 1 notitle

This script keeps roughly one point per step of 0.1 in log10(x), so the plotted points appear approximately equally spaced along the logarithmic x-axis. It relies on the fact that gnuplot does not draw points whose x value evaluates to NaN in the using expression.
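To see which points survive the thinning, here is a rough Python sketch that mirrors the logic of sampler(x) outside gnuplot. It is only an illustration, not part of the script above: the function name log_thin, the parameter x_min, and the test range 1..100000 are assumptions made for this example.

```python
import math

def log_thin(xs, s=0.1, x_min=0.01):
    """Return the x values that the gnuplot sampler(x) above would keep.

    s     -- sampling interval in log10 units (same role as s in the script)
    x_min -- hypothetical starting value; must be smaller than every x
    """
    c = math.log10(x_min)  # current threshold in log10 units
    kept = []
    for x in xs:
        if x > 0 and math.log10(x) >= c:
            kept.append(x)
            # Move the threshold to a later multiple of s, as in
            # c = ceil(log10(x)/s + 0.5)*s
            c = math.ceil(math.log10(x) / s + 0.5) * s
        # otherwise the point is skipped (gnuplot sees NaN and draws nothing)
    return kept

xs = range(1, 100001)   # equally spaced x values, as in the question
kept = log_thin(xs)
print(len(xs), "points reduced to", len(kept))
print("first kept x values:", kept[:6])
```

Because the threshold always jumps at least 0.5*s past the last kept point, consecutive kept points are at least 0.05 apart in log10(x) for s = 0.1. That bounds how many points can be drawn per decade.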

binzo