
I have a CSV file with flight information:

10397,ATL,GA,10135,ABE,PA,692,188
10397,ATL,GA,10135,ABE,PA,692,142
10434,AVP,PA,10135,ABE,PA,50,65
...

Columns are as follows: ORIGIN_AIRPORT_ID,ORIGIN,ORIGIN_STATE_ABR,DEST_AIRPORT_ID,DEST,DEST_STATE_ABR,DISTANCE,TIME

I want to create edge and vertex RDDs from this data (it is stored in filtflights.csv). For the edges I wrote the following:

val flighttime: RDD[Edge[Integer]] = sc.textFile("filtflights.csv").map { line =>
  val row = line.split(",")
  Edge(row(0).toInt, row(3).toInt, row(7).toInt)
}

But I am not sure about the vertices. From what I've gathered, I can create a class called Airport, for example, and do the following:

val vertices: RDD[(VertexId,Airport)] = sc.textFile("filtflights.csv").map

But I am unsure how exactly to set VertexId to the ORIGIN_AIRPORT_ID of each row. (I am assuming that every node appears as an origin at some point, so I don't need to create vertices from the DEST_AIRPORT_ID column.)
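
To make the target concrete, here is a minimal sketch of the per-line parsing that the vertex map would need. The Airport fields (code, state) and the helper name originVertex are assumptions for illustration, not part of any existing code; the sketch relies only on the fact that GraphX's VertexId is a type alias for Long. It runs on a plain Scala collection, so it can be checked without a Spark context.

```scala
// Hypothetical vertex attribute; the field names are assumptions.
case class Airport(code: String, state: String)

object VertexParsing {
  // Turn one CSV line into an (id, attribute) pair keyed by ORIGIN_AIRPORT_ID.
  // GraphX's VertexId is an alias for Long, so the id is parsed with toLong.
  def originVertex(line: String): (Long, Airport) = {
    val row = line.split(",")
    (row(0).toLong, Airport(row(1), row(2)))
  }
}

object VertexParsingDemo {
  // The three sample rows from the question.
  val sample: Seq[String] = Seq(
    "10397,ATL,GA,10135,ABE,PA,692,188",
    "10397,ATL,GA,10135,ABE,PA,692,142",
    "10434,AVP,PA,10135,ABE,PA,50,65"
  )

  // Each airport occurs once per flight, so duplicates are dropped.
  val vertices: Seq[(Long, Airport)] =
    sample.map(VertexParsing.originVertex).distinct
}
```

With Spark, the same function would presumably be passed to the map over the text file, followed by distinct(), for example sc.textFile("filtflights.csv").map(VertexParsing.originVertex).distinct(), which yields an RDD[(Long, Airport)] and therefore fits the RDD[(VertexId, Airport)] annotation. This only covers airports that appear in the origin column, in line with the assumption stated above.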

ZsoltF

0 Answers