I have a CSV file with flight information:
10397,ATL,GA,10135,ABE,PA,692,188
10397,ATL,GA,10135,ABE,PA,692,142
10434,AVP,PA,10135,ABE,PA,50,65
...
Columns are as follows: ORIGIN_AIRPORT_ID,ORIGIN,ORIGIN_STATE_ABR,DEST_AIRPORT_ID,DEST,DEST_STATE_ABR,DISTANCE,TIME
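A minimal sketch of turning one line into a typed record, assuming the file has no header row and no quoted fields, so a plain split(",") is enough. The names FlightRow and parseFlight are hypothetical and chosen for illustration. The fields follow the column order listed above.

```scala
// Hypothetical record type mirroring the eight CSV columns, in order.
final case class FlightRow(
    originId: Long,      // ORIGIN_AIRPORT_ID
    origin: String,      // ORIGIN
    originState: String, // ORIGIN_STATE_ABR
    destId: Long,        // DEST_AIRPORT_ID
    dest: String,        // DEST
    destState: String,   // DEST_STATE_ABR
    distance: Int,       // DISTANCE
    time: Int            // TIME
)

object FlightRow {
  // Parses one line; assumes exactly 8 comma-separated fields and no quoting.
  def parseFlight(line: String): FlightRow = {
    val f = line.split(",").map(_.trim)
    FlightRow(f(0).toLong, f(1), f(2), f(3).toLong, f(4), f(5), f(6).toInt, f(7).toInt)
  }
}
```

With Spark, sc.textFile("filtflights.csv").map(FlightRow.parseFlight) should give an RDD[FlightRow], from which both the edges and the vertices can be derived without re-splitting each line.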
I want to create GraphX edge and vertex RDDs from these rows.
(The data is stored in filtflights.csv.)
For the edges I wrote the following:
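import org.apache.spark.graphx.Edge
import org.apache.spark.rdd.RDD
// One edge per row: ORIGIN_AIRPORT_ID -> DEST_AIRPORT_ID, with TIME as the edge attribute
val flighttime: RDD[Edge[Int]] = sc.textFile("filtflights.csv").map { line =>
  val row = line.split(",")
  Edge(row(0).toLong, row(3).toLong, row(7).toInt)
}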
But I am not sure how to build the vertices.
From what I've gathered, I can create a class, for example Airport,
and do the following:
import org.apache.spark.graphx.VertexId
val vertices: RDD[(VertexId, Airport)] = sc.textFile("filtflights.csv").map { line => ??? }
but I am unsure how exactly to set the VertexId to the ORIGIN_AIRPORT_ID
of each row. (I am assuming that every airport eventually appears as an origin, so I don't need to create vertices from the DEST_AIRPORT_ID
column.)
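A minimal sketch of the vertex side, assuming Airport is a case class holding the airport code and state; the question names the class but not its fields, so those are a guess. Each row is keyed by ORIGIN_AIRPORT_ID (column 0), and because many flights share an origin, the resulting pairs are deduplicated. The sketch uses plain Scala collections so it runs without a cluster; AirportVertices, toVertex, toDestVertex, originVertices and allVertices are hypothetical names.

```scala
// Assumed vertex attribute; the question only names the class, not its fields.
final case class Airport(code: String, state: String)

object AirportVertices {
  // Mirrors GraphX, where VertexId is a type alias for Long.
  type VertexId = Long

  // Keys a row by ORIGIN_AIRPORT_ID (column 0), carrying ORIGIN and ORIGIN_STATE_ABR.
  def toVertex(line: String): (VertexId, Airport) = {
    val row = line.split(",")
    (row(0).toLong, Airport(row(1), row(2)))
  }

  // Same extraction for the destination columns (3, 4, 5).
  def toDestVertex(line: String): (VertexId, Airport) = {
    val row = line.split(",")
    (row(3).toLong, Airport(row(4), row(5)))
  }

  // One vertex per origin airport: many rows share an origin, so deduplicate.
  def originVertices(lines: Seq[String]): Seq[(VertexId, Airport)] =
    lines.map(toVertex).distinct

  // Drops the "every airport is an origin" assumption by taking both ends of each flight.
  def allVertices(lines: Seq[String]): Seq[(VertexId, Airport)] =
    (lines.map(toVertex) ++ lines.map(toDestVertex)).distinct
}
```

On Spark, the same per-line function should plug into the RDD API as sc.textFile("filtflights.csv").map(AirportVertices.toVertex).distinct(), which yields an RDD[(VertexId, Airport)] since GraphX's VertexId is also Long; Graph(vertices, flighttime) then builds the graph. If the origin-only assumption turns out to be wrong, Graph assigns its defaultVertexAttr argument (null unless one is passed) to any vertex that appears only in the edges.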