I need to join two pipes with the same set of fields, i.e. ('id, 'groupName, 'name), the same way SQL UNION works. How can I do this in Twitter Scalding?
3 Answers
5
Use ++ to concatenate the pipes, then use project to get rid of the id field.
If this answer is too concise, let me know and I'll try to expand.
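A minimal sketch of the `++` approach with the fields-based API, assuming both inputs are TSV files with the three fields from the question. The job name `UnionJob` and the argument names `input1`, `input2`, and `output` are hypothetical. As written it behaves like `UNION ALL`; the commented-out `unique` call is one way to get plain `UNION` semantics.

```scala
import com.twitter.scalding._

// Hypothetical job: argument names and the TSV format are assumptions.
class UnionJob(args: Args) extends Job(args) {
  // Both pipes carry the same fields, in the same order.
  val pipe1 = Tsv(args("input1"), ('id, 'groupName, 'name)).read
  val pipe2 = Tsv(args("input2"), ('id, 'groupName, 'name)).read

  // ++ merges the two pipes and keeps duplicate rows (UNION ALL).
  (pipe1 ++ pipe2)
    // Uncomment to drop duplicate rows (UNION):
    // .unique(('id, 'groupName, 'name))
    .write(Tsv(args("output")))
}
```

No output order is guaranteed, so sort downstream if the order matters.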

samthebest
This works if you want a `union all`; if you want a plain `union`, you would have to dedupe the pipe afterwards. – Sibimon Sasidharan Apr 05 '16 at 04:18
0
To join two pipes on three fields, you first want to know which pipe operates on the smaller dataset:
largerPipe1.joinWithSmaller(('id1, 'groupName1, 'name1) -> ('id2, 'groupName2, 'name2), smallerPipe2)
Notice that the field names do not need to be the same; they only have to be listed in the same order. The result will contain only the Symbol names in largerPipe1.
Note on the comment below: the ++ concatenate operation merely appends the data from one pipe to another. This is not a join.

Davis Dulin
Right. I think ++ is more similar to what the asker wanted, as he says he wants SQL UNION, not SQL JOIN. – Simon Radford May 23 '14 at 22:29
0
def ++[U >: T](other: TypedPipe[U]): TypedPipe[U]
Merge two TypedPipes (no order is guaranteed). This is only realized when a group (or join) is performed.
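A short sketch of how this signature could be used with the typed API. The `Row` alias and the `TypedUnion` object are hypothetical names; rows are modelled as tuples so that an implicit `Ordering`, which `distinct` requires, is available.

```scala
import com.twitter.scalding.typed.TypedPipe

object TypedUnion {
  // Hypothetical row shape: (id, groupName, name).
  type Row = (Long, String, String)

  // UNION ALL: ++ merges both pipes and keeps duplicates.
  def unionAll(a: TypedPipe[Row], b: TypedPipe[Row]): TypedPipe[Row] =
    a ++ b

  // UNION: merge, then drop duplicate rows.
  def union(a: TypedPipe[Row], b: TypedPipe[Row]): TypedPipe[Row] =
    (a ++ b).distinct
}
```

Note that `distinct` introduces a grouping step, while `++` on its own does not.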

Leo Tang