I am fairly new to Scalding and am trying to write a Scalding program that takes two datasets as input:
1) book_id_title ('id, 'title): the mapping between book ID and book title; both are strings.
2) book_sim ('id1, 'id2, 'sim): the similarity between pairs of books, identified by their IDs.
The goal of the program is to replace each (id1, id2) in book_sim with the respective titles by looking them up in the book_id_title table. However, I am not able to retrieve the title. I would appreciate it if someone could help with the getTitle() function below.
My Scalding code is as follows:
    // read in the mapping between book id and title from a csv file
    val book_id_title =
      Csv(book_file, fields=book_format)
        .read
        .project('id, 'title)

    // read in the similarity data from a csv file and map the ids to the titles
    // by calling the getTitle function
    val result =
      book_sim
        .map(('id1, 'id2) -> ('title1, 'title2)) {
          pair: (String, String) => (getTitle(pair._1), getTitle(pair._2))
        }
        .write(out)

    // function that searches for the id and retrieves the title
    def getTitle(search_id: String) = {
      val btitle =
        book_id_title
          .filter('id) { id: String => id == search_id } // extract row matching the id
          .project('title) // get the title
    }
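To make the expected result concrete, here is a minimal plain-Scala sketch of the lookup I am trying to reproduce. It uses in-memory collections, not Scalding pipes, so it only illustrates the intended output. The sample rows and the names `ExpectedOutput`, `titles`, `sims`, and `withTitles` are made up for illustration.

```scala
object ExpectedOutput {
  // Hypothetical stand-in for book_id_title: id -> title
  val titles: Map[String, String] =
    Map("b1" -> "Title A", "b2" -> "Title B")

  // Hypothetical stand-in for book_sim: (id1, id2, sim)
  val sims: List[(String, String, Double)] =
    List(("b1", "b2", 0.8), ("b1", "b9", 0.3))

  // Replace each id with its title; a pair with an unknown id is dropped.
  val withTitles: List[(String, String, Double)] =
    sims.flatMap { case (id1, id2, sim) =>
      for {
        t1 <- titles.get(id1)
        t2 <- titles.get(id2)
      } yield (t1, t2, sim)
    }

  def main(args: Array[String]): Unit =
    withTitles.foreach(println)
}
```

In this sketch the second pair is dropped because "b9" has no title. Whether unmatched ids should be dropped or kept is an assumption here, not something I have decided for the real job.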
Thanks.