
I want to generate around 10 GB of sample data using a Pig script, where each column has sample values and a given cardinality.

Example:

A        B         C
1   10/10/2011  abc-xyz
2   10/11/2012  assd-asd
3   10/12/2011  asd-asd
1   10/13/2013  abc-xyz
1   10/14/2011  assd-asd

Cardinality of Column A - 8
Cardinality of Column B - Year(3) , Month(36)
Cardinality of Column C - 24

Can you please help me with this? Is it possible to do this kind of transformation using Pig?

Nikita

1 Answer

That's indeed possible.

You can generate three datasets, each with one column, and count the distinct values in each, such as:

-- I assume your big dataset is named data and contains three fields: a, b, c
columnA = FOREACH data GENERATE a;
columnADistinct = DISTINCT columnA;
countA = FOREACH (GROUP columnADistinct ALL) GENERATE COUNT(columnADistinct);

Repeat the same steps for the other columns.
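For column B, the year and month cardinalities need the date split first. Here is a minimal sketch, assuming b is a chararray in MM/dd/yyyy form as in the question's example, and that "Month(36)" means distinct year-month pairs. The aliases dateParts, yr, and yrMonth are names I made up for illustration, so adjust them to your schema:

```pig
-- Assumption: b is a chararray such as 10/10/2011 (MM/dd/yyyy)
dateParts = FOREACH data GENERATE
    SUBSTRING(b, 6, 10) AS yr,                                   -- yyyy
    CONCAT(SUBSTRING(b, 6, 10), SUBSTRING(b, 0, 2)) AS yrMonth;  -- yyyyMM

-- Distinct years
years = FOREACH dateParts GENERATE yr;
yearsDistinct = DISTINCT years;
countYears = FOREACH (GROUP yearsDistinct ALL) GENERATE COUNT(yearsDistinct);

-- Distinct year-month pairs
yearMonths = FOREACH dateParts GENERATE yrMonth;
yearMonthsDistinct = DISTINCT yearMonths;
countYearMonths = FOREACH (GROUP yearMonthsDistinct ALL) GENERATE COUNT(yearMonthsDistinct);

DUMP countYears;
DUMP countYearMonths;
```

If b is loaded as a datetime rather than a chararray, the built-in GetYear and GetMonth functions would likely be a better fit than SUBSTRING.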

glefait
Thanks for your reply. I have generated the 10 GB of data using a random function and an if condition. – Nikita Jul 24 '15 at 08:01