I wrote a short script to create a frequency distribution plot from raw data. The only thing I cannot get right is the x-axis. As you can see below, when the numbers are too long they are written in e-notation, which is difficult to read (the labels are also long enough to be cut off at the edge of the picture).

[Image: frequency distribution plot with x-axis labels in e-notation, cut off at the edge]

Normally I would use digits = X, but unfortunately this argument cannot be used with the cut function. The full code is attached. Any other advice on making the graph more readable is also warmly welcome.

##Parameters definition
num.bins = 60 #The number of bins you want to be used
w.data = 2 #The column you have the data in

##Data loading
dataset = read.csv(file.choose())

##Calculating frequency
d.min = min(dataset[,w.data])
d.max = max(dataset[,w.data])

breaks = seq(d.min, d.max, by = (d.max-d.min)/num.bins)
d.cut = cut((dataset[,w.data]), breaks, right = FALSE, digits = 6)
d.freq = table(d.cut)

##Plot
plot(d.freq, ylab = 'Frequency', las = 2)
Edgar Derby
  • So the `cut` function has a `labels` argument that controls how the intervals are labelled. You haven't said how you'd like the intervals labelled or what you have attempted thus far using the `labels` argument. – joran Feb 12 '14 at 17:50
  • Check the answers at this other thread: [R changing format of scale on y-axis](http://stackoverflow.com/questions/8918452/r-changing-format-of-scale-on-y-axis). – celiomsj Feb 12 '14 at 18:02

1 Answer

It feels odd to answer my own question, but I found the solution.

The cut function has a dig.lab argument, which is the equivalent of digits. Why two arguments with the same purpose were given different names is unclear to me.
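
To make the effect of dig.lab concrete, here is a minimal sketch comparing the default labels with dig.lab = 6. The break values and sample data are made up for illustration and are not taken from the original dataset.

```r
# Hypothetical break points and data, chosen only to trigger e-notation
breaks <- c(100000, 200000, 300000)
x <- c(123456, 234567)

# Default dig.lab = 3: breaks are formatted with 3 significant digits
default.labels <- levels(cut(x, breaks, right = FALSE))
print(default.labels)  # "[1e+05,2e+05)" "[2e+05,3e+05)"

# dig.lab = 6: enough digits to write the breaks out in full
wide.labels <- levels(cut(x, breaks, right = FALSE, dig.lab = 6))
print(wide.labels)     # "[100000,200000)" "[200000,300000)"
```

Note that cut accepts extra arguments through `...`, which is presumably why passing digits = 6 raised no error and simply had no effect.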

So, the amended code looks something like:

##Parameters definition
num.bins = 35 #The number of bins you want to be used
w.data = 2 #The column you have the data in

##Data loading
#dataset = read.csv(file.choose())

##Calculating frequency
d.min = min(dataset[,w.data])
d.max = max(dataset[,w.data])

breaks = seq(d.min, d.max, by = (d.max-d.min)/num.bins)
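#include.lowest = TRUE closes the last interval so the maximum value is not dropped as NA
d.cut = cut((dataset[,w.data]), breaks, right = FALSE, include.lowest = TRUE, dig.lab = 6)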
d.freq = table(d.cut)

##Plot
par(mar=c(4,4.5,3,1))
par(oma=c(4,2,0,0) )
plot(d.freq, ylab = 'Frequency', las = 2)
mtext(side=3, text="Frequency Distribution", line=1.2, cex=1.5)

This results in: [image of the resulting plot not preserved]

Thank you very much to @joran and @celiomsj for pointing me in the right direction.
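
As an alternative along the lines of joran's comment, the labels argument of cut can take fully custom interval labels. The sketch below is only one possible way to do it: the sample data, the helper fmt, and the "a to b" label style are all assumptions made for illustration.

```r
# Hypothetical sample data standing in for dataset[, w.data]
x <- c(100000, 150000, 250000, 300000)
breaks <- seq(min(x), max(x), length.out = 3)  # 100000, 200000, 300000

# fmt is a hypothetical helper: plain digits with thousands separators
fmt <- function(v) format(v, big.mark = ",", scientific = FALSE, trim = TRUE)

# One label per interval, built from its lower and upper break
lower <- breaks[-length(breaks)]
upper <- breaks[-1]
bin.labels <- paste0(fmt(lower), " to ", fmt(upper))

# include.lowest = TRUE keeps the maximum value in the last bin
d.cut <- cut(x, breaks, right = FALSE, include.lowest = TRUE,
             labels = bin.labels)
d.freq <- table(d.cut)
print(d.freq)
```

Custom labels give full control over the format, at the cost of losing the bracket notation that shows which end of each interval is closed.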
