I have two LOAD statements, A and B, and each relation has a surrogate key column. I want to match the two relations on that key and keep only the data whose key appears in both.

I tried the following code.

A = LOAD 'a/data/' using PigStorage('\t') as (SourceWebSite:chararray,PropertyID:chararray,ListedOn:chararray,ContactName:chararray,TotalViews:int,Price:chararray);
B = LOAD 'b/data/' using PigStorage('\t') as (SourceWebsite:chararray,PropertyType:chararray,IPLSNO:int,Locality:chararray,City:chararray,Price:chararray);
C = COGROUP A BY Price, B BY Price;
D = FOREACH C GENERATE FLATTEN((IsEmpty(A) ? null : A)), FLATTEN((IsEmpty(B) ? null : B));

The above statements output all of the data, not just the rows whose keys match.

mr2ert
dazzles dina

1 Answer

If I understand correctly, you want only those rows where both A and B have data for the given price, right? Then you can use a filter:

D = FILTER C BY (NOT IsEmpty(A) AND NOT IsEmpty(B));

D will contain only the groups in which both A and B have rows for the price used as the grouping key.
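Putting the pieces together, here is a minimal sketch of the whole script, reusing the LOAD statements from the question. The alias names E and F are introduced here only for illustration. Because the filter leaves one row per price group (the group key plus a bag from each side), a FLATTEN step is likely still needed to get back to plain rows. An inner JOIN on Price should give the same matched pairs in one statement, so it may be worth trying if the intermediate groups are not needed.

```pig
-- Same inputs as in the question.
A = LOAD 'a/data/' using PigStorage('\t') as (SourceWebSite:chararray,PropertyID:chararray,ListedOn:chararray,ContactName:chararray,TotalViews:int,Price:chararray);
B = LOAD 'b/data/' using PigStorage('\t') as (SourceWebsite:chararray,PropertyType:chararray,IPLSNO:int,Locality:chararray,City:chararray,Price:chararray);

-- Group both relations by Price; each output row is (group, {A rows}, {B rows}).
C = COGROUP A BY Price, B BY Price;

-- Keep only the price groups that have at least one row on both sides.
D = FILTER C BY (NOT IsEmpty(A) AND NOT IsEmpty(B));

-- Expand the bags: one output row per matching (A row, B row) pair.
E = FOREACH D GENERATE FLATTEN(A), FLATTEN(B);

-- Alternative (illustrative): an inner join keeps only prices present in both.
F = JOIN A BY Price, B BY Price;
```

Both E and F pair every A row with every B row that shares the same Price, so a price that occurs several times on each side produces several output rows.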

kecso