Big dataset for Hadoop join?

Asked Feb 27 '13 at 19:21

Active Feb 27 '13 at 19:21

I have run some join operations, both map-side and reduce-side, on a small dataset. I am now looking for a publicly available, gigabyte-scale dataset to measure performance on a cluster. Does anyone know of a dataset suitable for a many-to-many join?

Yeameen

Here is a similar question: http://stackoverflow.com/questions/10843892/download-large-data-for-hadoop. I expect you will find something appropriate there. Good luck. – www Feb 27 '13 at 21:47
Thanks @WawrzyniecSz.! I am currently looking into those, but I have yet to find a dataset with multiple files that I can join using Hadoop. – Yeameen Feb 27 '13 at 23:47
You could add a dummy field such as rand(1, row_number/1M) to any of those datasets in a map-only job, then copy the result. That gives you two datasets with a many-to-many relation. – www Feb 28 '13 at 13:33
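
The dummy-field suggestion in the last comment could be sketched roughly as follows. This is a plain Python generator, not code from the thread: the function name `add_join_key`, the `num_keys` parameter, and the tab-separated output layout are all assumptions made for illustration. It also reads "row_number/1M" as "total row count divided by one million", which is only one possible interpretation of the comment.

```python
import random
from typing import Iterable, Iterator


def add_join_key(lines: Iterable[str], num_keys: int, seed: int = 0) -> Iterator[str]:
    """Prefix each record with a random integer key in [1, num_keys].

    Hypothetical helper: because many records share each key, joining two
    outputs of this function on the key yields a many-to-many join.
    """
    rng = random.Random(seed)  # seeded so a run is reproducible
    for line in lines:
        key = rng.randint(1, num_keys)  # inclusive on both ends
        yield f"{key}\t{line.rstrip()}"


def keys_for(total_rows: int, rows_per_key: int = 1_000_000) -> int:
    """Assumed reading of 'row_number/1M': about rows_per_key records per key."""
    return max(1, total_rows // rows_per_key)
```

In a Hadoop Streaming map-only job, the same loop would presumably read from `sys.stdin` and print each keyed record. Running it twice with different seeds (or simply copying the output, as the comment suggests) gives two inputs that share the same key space.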

0 Answers