I was going through the patent data examples in Hadoop in Action. Could you please explain the data sets being used in detail?
The patent citation data set
This data set contains two columns, citing and cited. Does the citing column refer to the ID of the owner who submitted the patent? Does the cited column refer to the patent ID that forms the key to the second data set?
The patent description data set
There are a number of fields in this data set. To join the two data sets, is it the citing or the cited column of the first data set that matches the key in the first column (patent) of the second data set?
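
To make the question concrete, here is a rough sketch of the join I have in mind, in plain Python rather than MapReduce. The sample rows are made up for illustration and are not taken from the book's files, and the GYEAR and COUNTRY fields and the helper names (load_descriptions, join_on) are my own assumptions. It simply tries the lookup on each column in turn so the two results can be compared.

```python
import csv
import io

# Hypothetical sample rows; the values are invented for illustration only.
CITATIONS_CSV = '"CITING","CITED"\n100,10\n100,20\n200,10\n'
DESCRIPTIONS_CSV = (
    '"PATENT","GYEAR","COUNTRY"\n'
    '10,1970,"US"\n'
    '20,1975,"US"\n'
    '100,1990,"JP"\n'
)


def load_descriptions(text):
    """Index the description rows by their first column (PATENT)."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["PATENT"]: row for row in reader}


def join_on(column, citations_text, descriptions):
    """Look up the given citation column (CITING or CITED) in the
    description index and keep only the rows that find a match."""
    reader = csv.DictReader(io.StringIO(citations_text))
    joined = []
    for row in reader:
        match = descriptions.get(row[column])
        if match is not None:
            joined.append((row["CITING"], row["CITED"], match["GYEAR"]))
    return joined


descriptions = load_descriptions(DESCRIPTIONS_CSV)
by_cited = join_on("CITED", CITATIONS_CSV, descriptions)
by_citing = join_on("CITING", CITATIONS_CSV, descriptions)

print("joined on CITED: ", by_cited)
print("joined on CITING:", by_citing)
```

In this toy sample both lookups return rows, which is part of why I am unsure which column is meant to be the join key in the real data.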