Conditional Histograms Using Lattice Package, Output Plots Incorrect

Question

I'm using histogram from the lattice package to plot two histograms conditioning on a variable with two options: Male or Female.

histogram(~ raw$Housework_Tot_Min [(raw$Housework_Tot_Min != 0) & 
(raw$Housework_Tot_Min < 1000)] | raw$Gender)

Output of code: two histograms, minutes doing housework by gender

But, when I actually look at the data, these histograms are not correct. By plotting:

histogram(~ raw$Housework_Tot_Min [(raw$Housework_Tot_Min != 0) & 
(raw$Housework_Tot_Min < 1000) & (raw$Gender == "Female")])

and:

histogram(~ raw$Housework_Tot_Min [(raw$Housework_Tot_Min != 0) & 
(raw$Housework_Tot_Min < 1000) & (raw$Gender == "Male")])

I get two histograms again, but they look very different from the panels in the conditioned plot.

Does anyone have insight on why these outputs don't match? I have a bunch more binary-type panels to plot, and having to do them separately really defeats the purpose of working with the lattice package!

I apologize if this betrays a fundamental misunderstanding of an easy concept; I'm still very much a beginner at R! Many thanks for the help.

Please refer to [How do I ask a good question?](http://stackoverflow.com/help/how-to-ask). You should at least provide some sample data in order to make the code reproducible. — fdetsch, Apr 14 '16 at 09:19
Note: When plotting a subset of your data, either **1)** subset the data.frame in the `data` argument or **2)** use the `subset` argument. Subsetting in the formula is likely not the way to go about this. — BenBarnes, Apr 15 '16 at 06:06
Thanks Ben; is there any reason to use the `data` argument other than aesthetics and saving yourself from typing `raw$` all the time? Just curious! — jwint, Apr 18 '16 at 16:29

score 2 · Answer 1 · edited Apr 15 '16 at 06:09

The problem is related to differing values in panel.args.common (i.e., the arguments common to all the panel functions, see ?trellis.object). Here is some sample code to clarify my point.

library(lattice)

## paneled plot
hist1 <- histogram( ~ Sepal.Width | Species, data = iris)
hist1$panel.args.common

# $breaks
# [1] 1.904 2.228 2.552 2.876 3.200 3.524 3.848 4.172 4.496
# 
# $type
# [1] "percent"
#
# $equal.widths
# [1] TRUE
# 
# $nint
# [1] 8

## single plot    
hist2 <- histogram( ~ Sepal.Width, data = iris[iris$Species == "setosa", ])
hist2$panel.args.common

# $breaks
# [1] 2.216 2.540 2.864 3.188 3.512 3.836 4.160 4.484
# 
# $type
# [1] "percent"
# 
# $equal.widths
# [1] TRUE
# 
# $nint
# [1] 7

nint (number of histogram bins, see ?histogram) and breaks (breakpoints of the bins) are calculated across all target panels, and therefore vary between hist1 and hist2. If you want these arguments to be identical so that the two plots look similar, you just have to run the following line of code after the two plots have been created.

hist2$panel.args.common <- hist1$panel.args.common
## or vice versa, depending on the number of bins and breakpoints to use

library(gridExtra)
grid.arrange(hist1, hist2, ncol = 2)
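If the goal is only to make the bins match, another option is to fix the breakpoints up front instead of copying `panel.args.common` afterwards. The following is a minimal sketch using the same `iris` data; the names `common_breaks`, `hist_all` and `hist_setosa` are made up for illustration, and the choice of 8 bins is arbitrary.

```r
library(lattice)

# Breakpoints spanning the full range of Sepal.Width, computed once
# (8 equal-width bins; the bin count is an arbitrary choice here)
rng <- range(iris$Sepal.Width)
common_breaks <- seq(from = rng[1], to = rng[2], length.out = 9)

# Paneled plot and single-species plot, both given the same breaks
hist_all <- histogram(~ Sepal.Width | Species, data = iris,
                      breaks = common_breaks)
hist_setosa <- histogram(~ Sepal.Width, data = iris,
                         subset = Species == "setosa",
                         breaks = common_breaks)

# Both objects should now carry the same breakpoints
identical(hist_all$panel.args.common$breaks,
          hist_setosa$panel.args.common$breaks)

# In a script, call print(hist_all) or print(hist_setosa) to draw them
```

Passing `breaks` explicitly keeps the binning decision visible in the call itself, which may be easier to follow than modifying the trellis object after the fact.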

Thanks for the help. However, the issue is not that the axes and bin widths are different. The issue is that on the single sex plots, the Male data max value is c. 150 minutes. However, in the split panel, the distribution is entirely different (different max value as well). Based on the code, why are the two types of plots not outputting the same values? Thanks again! — jwint, Apr 18 '16 at 16:28
Ah, now I get your point. You should report this issue to the [R-help mailing list](https://www.r-project.org/mail.html) since it probably requires moderation from the developer side. — fdetsch, Apr 19 '16 at 09:17

score 0 · Accepted Answer · answered Apr 23 '16 at 23:52

It turns out that the issue was a mismatch in the data caused by the exclusions applied using the brackets. Instead of:

histogram(~ raw$Housework_Tot_Min [(raw$Housework_Tot_Min != 0) & 
(raw$Housework_Tot_Min < 1000)] | raw$Gender)

It should read:

histogram(~ Housework_Tot_Min [(Housework_Tot_Min != 0) & (Housework_Tot_Min < 1000)] | 
        Gender [(Housework_Tot_Min != 0) & (Housework_Tot_Min < 1000)], data = raw,
      main = "Time Observed Housework by Gender",
      xlab = "Minutes spent",
      breaks = seq(from = 0, to = 400, by = 20))

Note that the exclusions are now applied to both the housework time and gender variables, eliminating the mismatches in the data.

The correct plot has been pasted below. Thanks again to all for the guidance.

Updated Histogram
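The same alignment can also be obtained with the `subset` argument suggested in the comments on the question, which avoids repeating the filter for each variable. The sketch below is not the asker's data: the `raw` data frame here is a small made-up stand-in, and the names `keep` and `p` are hypothetical. It first shows the length mismatch that bracket subsetting of only one variable produces, then the `subset` version.

```r
library(lattice)

# Made-up stand-in for the asker's `raw` data frame (values are invented)
raw <- data.frame(
  Gender = factor(rep(c("Female", "Male"), each = 6)),
  Housework_Tot_Min = c(0, 45, 120, 300, 380, 1200,  # Female rows
                        0, 10, 30, 60, 90, 150)      # Male rows
)

# Filtering only the housework values shortens that vector, while
# Gender keeps its full length, so the two no longer line up
keep <- raw$Housework_Tot_Min != 0 & raw$Housework_Tot_Min < 1000
length(raw$Housework_Tot_Min[keep])  # 9 values after filtering
length(raw$Gender)                   # still 12 values

# `subset` is evaluated within `data`, so the same rows are dropped
# from both Housework_Tot_Min and Gender
p <- histogram(~ Housework_Tot_Min | Gender, data = raw,
               subset = Housework_Tot_Min != 0 & Housework_Tot_Min < 1000,
               main = "Time Observed Housework by Gender",
               xlab = "Minutes spent",
               breaks = seq(from = 0, to = 400, by = 20))

# In a script, call print(p) to draw the plot
```

With this toy data the Male panel tops out at 150 minutes, matching what a separate Male-only histogram would show.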
