
I have a data frame ("data") with 7 columns (2 factor, 5 numeric). The first column contains the names of 7 different countries, and the following columns hold the data I have collected for different parameters (population, GDP, etc.) characterizing each country. The last column is a factor variable that assigns each country to its continent.

The data looks like this:

structure(list(Country = structure(c(5L, 4L, 7L, 2L, 1L, 6L, 
3L), .Label = c("Brazil", "Chile", "China", "France", "Germany", 
"India", "Netherlands"), class = "factor"), GDP = c(0.46, 0.57, 
0.75, 0.56, 0.28, 0.88, 1), Population = c(0.18, 0.09, 0.54, 
0.01, 0.02, 0.17, 0.84), Birth.rate = c(87.21, 18.34, 63.91, 
14.21, 5.38, 51.19, 209.26), Income = c(43.89, 18.23, 63.91, 
12.3, 0.1, 14.61, 160.82), Savings = c(43.32, 0.11, 0, 1.91, 
5.29, 36.58, 50.38), Continent = structure(c(2L, 2L, 2L, 3L, 
3L, 1L, 1L), .Label = c("Asia", "Europe", "South America"), class = "factor")), .Names = c("Country", 
"GDP", "Population", "Birth.rate", "Income", "Savings", "Continent"
), class = "data.frame", row.names = c(NA, -7L))

I need some sort of loop that plots (e.g. as a scatter plot) every column against every other column, so that in the end each column except the first and the last (i.e. the two factor variables) has been plotted against all the other numeric columns, each pair in its own chart (not all plots in one). Preferably, all these plots would be saved to a folder on my local machine.

Also, it would be great if the x and y axes were already labeled with the names of the two columns being plotted against each other. Moreover, it would be convenient to have a label next to each point showing the respective country name. Lastly, it would be nice to color the points in three different colors according to the continent each country belongs to.

So far I only have this piece of code:

for (i in seq_along(data)) {
   # plots column i against the row index, labelled with the country names
   plot(data[, i], ylab = names(data)[i], xlab = "Country")
   text(data[, i], labels = data$Country, pos = 4, cex = .5)
}

As you can see it only plots each column against the first column ("Country") which is not what I want in the end.

Do you have any idea how I could achieve this?

zx8754
Jonathan Rhein

2 Answers


You can use pairs() directly from base R. Here dt represents your dataset.

pairs(dt)

[Image: scatterplot matrix produced by pairs(dt)]

dt <- structure(list(Country = structure(c(5L, 4L, 7L, 2L, 1L, 6L, 
3L), .Label = c("Brazil", "Chile", "China", "France", "Germany", 
"India", "Netherlands"), class = "factor"), GDP = c(0.46, 0.57, 
0.75, 0.56, 0.28, 0.88, 1), Population = c(0.18, 0.09, 0.54, 
0.01, 0.02, 0.17, 0.84), Birth.rate = c(87.21, 18.34, 63.91, 
14.21, 5.38, 51.19, 209.26), Income = c(43.89, 18.23, 63.91, 
12.3, 0.1, 14.61, 160.82), Savings = c(43.32, 0.11, 0, 1.91, 
5.29, 36.58, 50.38), Continent = structure(c(2L, 2L, 2L, 3L, 
3L, 1L, 1L), .Label = c("Asia", "Europe", "South America"), class = "factor")), .Names = c("Country",
"GDP", "Population", "Birth.rate", "Income", "Savings", "Continent"
), class = "data.frame", row.names = c(NA, -7L))
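If the full matrix is acceptable, pairs() can also cover most of the extras asked for in the question. A sketch, assuming only the five numeric columns should be shown: the points are colored by continent through col (pairs() forwards extra arguments such as col and pch to the panel function), and a custom panel writes the country names next to the points. The color palette is an arbitrary choice, and the data frame is rebuilt with data.frame() from the question's data so the snippet runs on its own.

```r
# Question data, rebuilt with data.frame() so the snippet runs on its own
dt <- data.frame(
  Country    = factor(c("Germany", "France", "Netherlands", "Chile",
                        "Brazil", "India", "China")),
  GDP        = c(0.46, 0.57, 0.75, 0.56, 0.28, 0.88, 1),
  Population = c(0.18, 0.09, 0.54, 0.01, 0.02, 0.17, 0.84),
  Birth.rate = c(87.21, 18.34, 63.91, 14.21, 5.38, 51.19, 209.26),
  Income     = c(43.89, 18.23, 63.91, 12.3, 0.1, 14.61, 160.82),
  Savings    = c(43.32, 0.11, 0, 1.91, 5.29, 36.58, 50.38),
  Continent  = factor(c("Europe", "Europe", "Europe", "South America",
                        "South America", "Asia", "Asia"))
)

num_cols <- c("GDP", "Population", "Birth.rate", "Income", "Savings")
pal      <- c("red", "darkgreen", "blue")   # arbitrary: one color per continent level

pairs(dt[num_cols],
      col = pal[as.integer(dt$Continent)], pch = 19,   # forwarded to the panel
      panel = function(x, y, ...) {
        points(x, y, ...)                                  # colored points
        text(x, y, labels = dt$Country, pos = 4, cex = 0.5) # country names
      })
```

The variable names on the diagonal double as axis labels, so no extra labelling is needed in this layout.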
Worice
  • And is there some way of getting each of the plots that pairs() generates in a separate plot window? – Jonathan Rhein Apr 19 '16 at 05:12
  • It would defeat the purpose of the function. The scatterplot matrix lets you explore the dataset for interesting correlations. Once you find one that you consider relevant, you can plot it individually. – Worice Apr 19 '16 at 07:34
  • You are welcome, we are here to help each other. If an answer actually helped you solve your problem, please give it the credit it deserves by accepting it or upvoting it. Bye! – Worice Apr 19 '16 at 14:42
  • I would really like to, but my reputation score is still too low to vote... – Jonathan Rhein Apr 19 '16 at 15:51
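The question (and the first comment) asks for each pair in its own plot rather than in a matrix. A minimal sketch of that with base graphics, assuming base plotting is acceptable: it loops over every pair of numeric columns with combn(), labels the axes with the column names, writes the country name next to each point, colors the points by continent and saves one PDF per pair. The output folder (here a subfolder of tempdir()), the color palette and the file-name pattern are illustrative choices, not taken from the posts above.

```r
# Question data, rebuilt with data.frame() so the snippet runs on its own
dt <- data.frame(
  Country    = factor(c("Germany", "France", "Netherlands", "Chile",
                        "Brazil", "India", "China")),
  GDP        = c(0.46, 0.57, 0.75, 0.56, 0.28, 0.88, 1),
  Population = c(0.18, 0.09, 0.54, 0.01, 0.02, 0.17, 0.84),
  Birth.rate = c(87.21, 18.34, 63.91, 14.21, 5.38, 51.19, 209.26),
  Income     = c(43.89, 18.23, 63.91, 12.3, 0.1, 14.61, 160.82),
  Savings    = c(43.32, 0.11, 0, 1.91, 5.29, 36.58, 50.38),
  Continent  = factor(c("Europe", "Europe", "Europe", "South America",
                        "South America", "Asia", "Asia"))
)

# Illustrative output folder; point this at a directory on your own machine
out_dir <- file.path(tempdir(), "pair_plots")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

num_cols <- names(dt)[sapply(dt, is.numeric)]  # drops Country and Continent
pal      <- c("red", "darkgreen", "blue")      # arbitrary: one color per continent level
pt_col   <- pal[as.integer(dt$Continent)]      # color of each country's point

pairs_idx <- combn(num_cols, 2)                # 2 x 10 matrix, one column per pair
for (k in seq_len(ncol(pairs_idx))) {
  xv <- pairs_idx[1, k]
  yv <- pairs_idx[2, k]
  # One file per pair; use png() with a ".png" name if you prefer bitmaps
  pdf(file.path(out_dir, paste0(xv, "_vs_", yv, ".pdf")), width = 6, height = 6)
  plot(dt[[xv]], dt[[yv]], xlab = xv, ylab = yv, col = pt_col, pch = 19)
  # Country name to the right of each point; xpd = TRUE lets labels extend past the plot region
  text(dt[[xv]], dt[[yv]], labels = dt$Country, pos = 4, cex = 0.7, xpd = TRUE)
  legend("topleft", legend = levels(dt$Continent), col = pal, pch = 19, bty = "n")
  dev.off()
}
```

combn() yields each unordered pair once (GDP vs Population, but not also Population vs GDP); if both orientations are wanted, loop over expand.grid(num_cols, num_cols) and skip rows where the two names are equal.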

I've always thought that the splom function in package 'lattice' is quite useful for this sort of exploratory analysis. This is admittedly not a great example, since it obscures the group memberships, but it shows every pairwise combination of variables together with a non-parametric (loess) regression line, in the "pairs" layout:

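library(lattice)   # splom() and the panel.* functions come from lattice

png()   # PNG device; default file name is Rplot001.png in the working directory
print(splom(~iris[1:4], groups = Species, data = iris,
            panel = function(x, y, i, j, ...) {
              panel.points(x, y, ...)   # the scatter plot itself
              panel.loess(x, y, ...)    # add a loess smoother
            }))
dev.off()   # close the device so the file is written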

[Image: splom scatterplot matrix of the four iris measurements, points with loess lines]
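Applied to the question's own data, a sketch of the same splom() approach with groups = Continent, so that the points are colored by continent and a key is drawn via auto.key. It keeps the default panel function (no loess line, which says little with only seven points); simpleTheme(pch = 19) makes the points filled in both the panels and the key. The data frame is rebuilt with data.frame() so the snippet runs on its own.

```r
library(lattice)

# Question data, rebuilt with data.frame() so the snippet runs on its own
dt <- data.frame(
  Country    = factor(c("Germany", "France", "Netherlands", "Chile",
                        "Brazil", "India", "China")),
  GDP        = c(0.46, 0.57, 0.75, 0.56, 0.28, 0.88, 1),
  Population = c(0.18, 0.09, 0.54, 0.01, 0.02, 0.17, 0.84),
  Birth.rate = c(87.21, 18.34, 63.91, 14.21, 5.38, 51.19, 209.26),
  Income     = c(43.89, 18.23, 63.91, 12.3, 0.1, 14.61, 160.82),
  Savings    = c(43.32, 0.11, 0, 1.91, 5.29, 36.58, 50.38),
  Continent  = factor(c("Europe", "Europe", "Europe", "South America",
                        "South America", "Asia", "Asia"))
)

num_cols <- c("GDP", "Population", "Birth.rate", "Income", "Savings")

# groups = Continent colors the points by continent; auto.key draws the legend
p <- splom(~ dt[num_cols], data = dt, groups = Continent,
           par.settings = simpleTheme(pch = 19),
           auto.key = list(columns = 3))
print(p)   # lattice objects are only drawn when printed
```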

IRTFM
  • You can also try the `corrgram` package (and its function of the same name), which lets you show the correlations at the same time. – Roman Apr 13 '16 at 14:52