I am working with rather large datasets (approx. 4 million rows per month, with 25 numeric attributes and 4 factor attributes). I would like to create a graph that shows, for each of the last 36 months, a boxplot of each numeric attribute per product (product is one of the 4 factor attributes).
So as an example for product A:
[ASCII sketch: one boxplot per month, drawn side by side along a time axis labelled jan '10, feb '10, mar '10, ... up to feb '13]
Since these are quite large datasets, I would like some advice on how to approach this. My idea (though I am not sure it is possible) is to:
- a) extract the data per month per product
- b) create a boxplot for that specific month (so let's say jan'10 for product A)
- c) store the boxplot summary data somewhere
- d) repeat a-c for all months until feb '13
- e) combine all the stored boxplot summary data into one
- f) plot the combined boxplot
- g) repeat a-f for all other products
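To make steps a to e concrete, here is a rough sketch in Python using only the standard library. It is an illustration of the idea, not a tested solution for the real data: the row layout (dicts with a `product` key and one numeric column called `value`), the function names, and the 1.5 * IQR whisker rule are all assumptions of mine. The summary dicts use the keys that matplotlib's `Axes.bxp` expects, so step f can draw from the stored summaries alone; `plot_product` is only defined here, not called, since it needs matplotlib installed.

```python
from collections import defaultdict
from statistics import quantiles


def box_summary(values, label):
    """Boxplot summary for one group (needs at least two values).

    Whiskers follow the common 1.5 * IQR rule (an assumption).
    """
    data = sorted(values)
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if lo_fence <= v <= hi_fence]
    return {
        "label": label,
        "q1": q1, "med": med, "q3": q3,
        "whislo": inside[0], "whishi": inside[-1],
        "fliers": [v for v in data if v < lo_fence or v > hi_fence],
    }


def summarize_month(rows, attribute, month):
    """Steps a-c: one summary per product for a single month of rows."""
    by_product = defaultdict(list)
    for row in rows:
        by_product[row["product"]].append(row[attribute])
    return {p: box_summary(v, label=month) for p, v in by_product.items()}


def combine(monthly):
    """Steps d-e: merge per-month results (given in chronological order)."""
    combined = defaultdict(list)
    for summaries in monthly:
        for product, summary in summaries.items():
            combined[product].append(summary)
    return dict(combined)


def plot_product(summaries, title):
    """Step f: draw all months of one product from the stored summaries."""
    import matplotlib.pyplot as plt  # third-party, only needed for drawing
    fig, ax = plt.subplots()
    ax.bxp(summaries, showfliers=True)
    ax.set_title(title)
    return fig


# Tiny made-up example: two months of one attribute for product A.
jan = [{"product": "A", "value": v} for v in [1, 2, 3, 4, 5, 100]]
feb = [{"product": "A", "value": v} for v in [2, 4, 6, 8, 10]]
combined = combine([
    summarize_month(jan, "value", "jan '10"),
    summarize_month(feb, "value", "feb '10"),
])
```

Only one month of raw rows has to be in memory at a time; what gets stored per month and product is a handful of numbers plus the outliers.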
So my main question is: is it possible to combine separate boxplot summaries into one and create the combined graph as sketched above from them?
Any help would be appreciated,
Thank you