0

I am using gnuplot combined with AWK to plot 2D bar plot from the following input data:

#Acceptor                DonorH           Donor   Frames         Frac      AvgDist       AvgAng
lig_608@O3          HIE_163@HE2     HIE_163@NE2      498       0.5304       2.8317     153.0580
lig_608@O             GLU_166@H       GLU_166@N      476       0.5069       2.8858     161.7174
lig_608@O1           HIE_41@HE2      HIE_41@NE2      450       0.4792       2.8484     158.5193
THR_26@O             lig_608@H9      lig_608@N1      399       0.4249       2.8312     149.9578
lig_608@O2             THR_26@H        THR_26@N      312       0.3323       2.9029     164.8033
lig_608@O1         ASN_142@HD21     ASN_142@ND2       14       0.0149       2.8445     158.4224
lig_608@O1         GLN_189@HE22     GLN_189@NE2        2       0.0021       2.8562     149.7421
lig_608@O1         GLN_189@HE21     GLN_189@NE2        1       0.0011       2.7285     158.4377
lig_608@O3            GLY_143@H       GLY_143@N        1       0.0011       2.7421     147.8213

My script takes the data from the third and 5th columns considering only the lines where the value from the 5th column > 0.05, producing bar graph

cat <<EOS | gnuplot > graph.png
set term pngcairo size 800,600
set xtics noenhanced
set xlabel "Fraction, %"
set ylabel "H-bond donor, residue"
set key off
set style fill solid 0.5
set boxwidth 0.9
plot "<awk 'NR == 1 || \$5 > 0.05' $file" using 0:5:xtic(3) with boxes
EOS

!EDITED: within my bash workflow the script looks like

for file in "${output}"/${target}*.log ; do
 file_name3=$(basename "$file")
 file_name2="${file_name3/.log/}"
 file_name="${file_name2/${target}_/}"
echo "vizualisation with Gnuplot!"
cat <<EOS | gnuplot > ${output}/${file_name2}.png
set title "$file_name" font "Century,22" textcolor "#b8860b"
set tics font "Helvetica,12"
#set term pngcairo size 1280,960
set term pngcairo size 800,600
set yrange [0:1]
set xtics noenhanced
set xlabel "Fraction, %"
set ylabel "H-bond donor, residue"
set key off
set style fill solid 0.5
set boxwidth 0.9
plot "<awk 'NR == 1 || \$5 > 0.05' $file" using 0:5:xtic(3) with boxes
EOS
done

enter image description here

This is the image produced from following filtered data:

HIE_163@NE2 0.5304
GLU_166@N 0.5069
HIE_41@NE2 0.4792
lig_608@N1 0.4249
THR_26@N 0.3323

I need to modify my awk searching expression integrated in the gnuplot that makes selection of the two columns from the whole data. Instead of taking the index from the third column (Donor) from each line I need to take it either from the first (#Acceptor) or form the third (#Donor) column. The index should be taken from one of these columns depending on the lig_* pattern. E.g. if the data in the (#Acceptor) column starts from lig* I need to take the value from the third column (#Donor) of the same line and visa verse (lig* pattern presents either in the 1st column or in the 3rd but not in the both..) Taking my example, the filtered data with the updated searching should become:

HIE_163@NE2 0.5304 # the first index from the third column
GLU_166@N 0.5069 # the first index from the third column
HIE_41@NE2 0.4792 # the first index from the third column
THR_26@O 0.4249 # !!!! the first index from the first column !!
THR_26@N 0.3323 # the first index from the third column
James Starlight
  • 317
  • 1
  • 6

2 Answers2

2

No need for awk, you can do it all in gnuplot (hence platform-independent). This would be my first attempt. You will filter by plotting the unwanted data at x-value of NaN, however, this will give some warnings: warning: add_tic_user: list sort error which you can ignore. But this can probably be avoided by some changes.

Edit: the original script would have failed when the first line had a value <0.05 in column 5. Here are two versions which don't have this problem. There will also be no warnings. Maybe these attempts can be further simplified.

For creating an output file simply add this to your script: (check help output)

set term pngcairo size 800,600
set output "myOutputFile.png"

...<your script>...

set output

Data: SO73961783.dat

#Acceptor                DonorH           Donor   Frames         Frac      AvgDist       AvgAng
lig_608@O3          HIE_163@HE2     HIE_163@NE2      498       0.5304       2.8317     153.0580
lig_608@O             GLU_166@H       GLU_166@N      476       0.5069       2.8858     161.7174
lig_608@O1           HIE_41@HE2      HIE_41@NE2      450       0.4792       2.8484     158.5193
THR_26@O             lig_608@H9      lig_608@N1      399       0.4249       2.8312     149.9578
lig_608@O2             THR_26@H        THR_26@N      312       0.3323       2.9029     164.8033
lig_608@O1         ASN_142@HD21     ASN_142@ND2       14       0.0149       2.8445     158.4224
lig_608@O1         GLN_189@HE22     GLN_189@NE2        2       0.0021       2.8562     149.7421
lig_608@O1         GLN_189@HE21     GLN_189@NE2        1       0.0011       2.7285     158.4377
lig_608@O3            GLY_143@H       GLY_143@N        1       0.0011       2.7421     147.8213

Script 1:

Filter your data and write it into a new table. If condition >0.05 is not met, write an empty line. Probably the easiest to understand and gives the shortest final plot command.

### conditional xtic labels
reset session
set termoption noenhanced 

FILE = "SO73961783.dat"

set xlabel "Fraction, %"
set ylabel "H-bond donor, residue"
set key off
set style fill solid 0.5
set boxwidth 0.9
set grid y
set xrange[-1:5]

set table $Filtered
    myTic(col1,col2) = strcol(col1)[1:3] eq 'lig' ? strcol(col2) : strcol(col1)
    plot FILE u ((y0=column(5))>0.05 ? sprintf("%g %s",y0,myTic(1,3)) : '') w table
unset table

plot $Filtered u 0:1:xtic(2) w boxes
### end of script

Script 2:

Without extra table, but a more complex plot command. Increase the x-position x0 if a value>0.05 is found (except for the first time) and keep the previous position and and label (i.e. overwrite it) if a value<=0.05 is found.

### conditional xtic labels
reset session
set termoption noenhanced

FILE = "SO73961783.dat"

set xlabel "Fraction, %"
set ylabel "H-bond donor, residue"
set key off
set style fill solid 0.5
set boxwidth 0.9
set grid y
set xrange[-1:5]

myTic(col1,col2) = strcol(col1)[1:3] eq 'lig' ? strcol(col2) : strcol(col1)

plot x0=c=(t0='',0) FILE u ((y0=column(5))>0.05 ? (c==0 ? (c=1,t0=myTic(1,3)) : (x0=x0+1,t0=myTic(1,3))) : (y0=NaN),x0):(y0):xtic(t0) w boxes
### end of script

Result:

enter image description here

theozh
  • 22,244
  • 5
  • 28
  • 72
  • The return value of the myTic function returns a string of column data, but should return an integer representing the column number to pass the value to xtic(). – binzo Oct 05 '22 at 23:12
  • @binzo in principle yes, but have you tried instead `myTic(col1,col2) = strcol(col1)[1:3] eq 'lig' ? col2 : col1`? Although the function is returning a column number it will not place the xtic from the corresponding column, for some reason which is not fully clear to me. The argument of `xtic()` may also contain a string. – theozh Oct 06 '22 at 07:03
  • hey many thanks for this elegant sollution! would it be possible to show a version of the script which load a file (from bash) and save it e.g. using file.png as it was shown in my first example? Actually this is my first experience with the gnuplot so .. – James Starlight Oct 06 '22 at 08:44
  • 1
    @JamesStarlight actually, I just noticed that the current script above will fail if the first line of your list has a value<0.05 in the 5th column. I will add a more robust solution, and also from input file to output file. – theozh Oct 06 '22 at 09:04
  • thank you! may you just show a one more example in the similar format which I demonstrated in my first topic beginining with cat < output.png in order that I could understand how to manipulate it within my bash workflow? – James Starlight Oct 06 '22 at 12:17
  • 1
    @JamesStarlight sorry, I am not familiar with bash. I would have to make wild guesses which I cannot test myself since (so far) I am not working under Linux. – theozh Oct 06 '22 at 12:19
  • right, thank you! I've just updated my first topic demonstrating how it works in the linux shell :-) – James Starlight Oct 06 '22 at 12:20
  • 1
    @JamesStarlight probably you can do it all in gnuplot. But I do not fully understand what your bash script is actually doing. Maybe plotting all `.log` files in a directory and create `.png` files for each of them. Then this is maybe be helpful if you want to do it in gnuplot: https://stackoverflow.com/q/67716946/7295599 – theozh Oct 06 '22 at 12:33
  • hey, finally I could integrate you first version in my bash workflow and it seems to me that it works. just one question would it be possible to add the Y values directly above the bars on the plot (e.g. in the first version) ? ;-) – James Starlight Oct 06 '22 at 13:33
  • 1
    @JamesStarlight there are a few examples here on SO. In your case something like `plot $Filtered u 0:1:xtic(2) w boxes, '' 0:1:1 w labels offset 0,1`. – theozh Oct 06 '22 at 14:52
  • gnuplot> plot $Filtered u 0:1:xtic(2) w boxes, '' 0:1:1 w labels offset 0,1 ^ line 0: unexpected or unrecognized token – James Starlight Oct 06 '22 at 15:08
  • 1
    @JamesStarlight sorry, I forgot the `using` or `u` in the second plot command: `plot $Filtered u 0:1:xtic(2) w boxes, '' u 0:1:1 w labels offset 0,1` – theozh Oct 06 '22 at 15:11
  • Thank you very much my fried! I have another question related to gnuplot visualisation of bar plots but I will create a new topic in order thet you may share there with your experiences – James Starlight Oct 06 '22 at 15:15
  • here it's, I believe the question might be very trivial for person experienced with gnuplot :-) https://stackoverflow.com/questions/73976330/gnuplot-sort-bars-on-the-bar-graphs – James Starlight Oct 06 '22 at 15:24
1

As you potentially want to do more complicated processing with awk, I would suggest an alternative way of mixing awk and gnuplot.

Gnuplot supports including inline data in its script files, so you could have awk generate the inline data while supplying the plot-configuration with bash, all done in a sub-shell. For example:

(
printf '$data << EOD\n'

awk 'NR>1 && $5>0.05 { print $1 ~ /^lig/ ? $3 : $1, $5 }' infile

cat << EOS
EOD

set term pngcairo size 1280,960 font ",20"
set output "output.png"

set xtics noenhanced
set ytics 0.02
set grid y
set key off

set style fill solid 0.5
set boxwidth 0.9

set xlabel "Fraction, %"
set ylabel "H-bond donor, residue"

plot "\$data" using 0:2:xtic(1) with boxes, "" using 0:2:2 with labels offset 0,1
EOS
)

Would produce this gnuplot script:

$data << EOD
HIE_163@NE2 0.5304
GLU_166@N 0.5069
HIE_41@NE2 0.4792
THR_26@O 0.4249
THR_26@N 0.3323
EOD

set term pngcairo size 1280,960 font ",20"
set output "output.png"

set xtics noenhanced
set ytics 0.02
set grid y
set key off

set style fill solid 0.5
set boxwidth 0.9

set xlabel "Fraction, %"
set ylabel "H-bond donor, residue"

plot "$data" using 0:2:xtic(1) with boxes, "" using 0:2:2 with labels offset 0,1

Pipe it to Gnuplot, i.e. (...) | gnuplot and get this in output.png:

Thor
  • 45,082
  • 11
  • 119
  • 130
  • could you precise if it would be possible to use a bash variable inside the gnuplot sctipt e,g to print the filename directly on the graph. For example set title "$file_name" font "Century,22" textcolor "#b8860b" print "$file_name" ;-) – James Starlight Oct 07 '22 at 15:19
  • also, this part does not work , "" using 0:(\$2+.01):2 with labels – James Starlight Oct 07 '22 at 16:00
  • 1
    @JamesStarlight: Did you try using a bash variables in the heredoc? It works as you would expect. "does not work" how? the `using with labels` and offset calculation works here. Try running it with `LC_ALL=C` infront of the gnuplot command. Anyway I updated it to use `offset` instead. – Thor Oct 10 '22 at 07:38
  • Right, thank you it works very well! I've just tried to improve the pipeline adding sort after AWK and it works very well. just a question: what is the difference between <( cat <<'EOG' and cat << EOS ? I mean only in the second case I could use bash variables directly in the gnuplot script, e.g. set title "$file_name – James Starlight Oct 10 '22 at 09:27
  • 1
    @JamesStarlight: heredocs require the same start and end, here are some more examples: https://linuxize.com/post/bash-heredoc/ – Thor Oct 10 '22 at 13:09