
I am trying to get all the values from rows into columns. I don't have an index, so I'm finding it hard to get them all into one column.

Code for getting the values:

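from pyspark.sql.types import StructField, StringType

# Assumes sqlContext already exists (e.g. in the pyspark shell)
traceFilters = sqlContext.read.format("csv").options(header='true', delimiter=',').load("/data/*.txt")

traceFilters.take(5)  # returns the first 5 rows as a list of Row objects
fields = [
    StructField("City", StringType(), False),
    StructField("Country", StringType(), False)
]

for row in traceFilters.rdd.collect():
    a = row.City  # a is rebound on every iteration
    print(a)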

This is the output I get from the code above:

New York
London
Vienna

And this is the result I want:

[ New York, London, Vienna ]

I tried using transpose and also zip, but neither works. Code that I tried:

print a.transpose()

or:

val1 = a.set_index('City').T

Any help appreciated.

Thanks

1 Answer


It looks like you are just printing each value, but what you really want is a list. This appends each value to a list, then prints it:

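from pyspark.sql.types import StructField, StringType

# Assumes sqlContext already exists (e.g. in the pyspark shell)
traceFilters = sqlContext.read.format("csv").options(header='true', delimiter=',').load("/data/*.txt")

traceFilters.take(5)
fields = [
    StructField("City", StringType(), False),
    StructField("Country", StringType(), False)
]

a = []  # collect every City value here instead of overwriting one variable
for row in traceFilters.rdd.collect():
    a.append(row.City)
print(a)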
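To see why the original loop left nothing to transpose, and how the same list can be built in one expression, here is a small sketch that runs without Spark. It assumes a namedtuple as a hypothetical stand-in for Spark's Row objects (both expose fields as attributes, like row.City), and the sample rows, including the Country values, are made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for pyspark.sql.Row: fields are readable as attributes (row.City)
Row = namedtuple("Row", ["City", "Country"])

# Hypothetical sample rows, shaped like the output of traceFilters.rdd.collect()
rows = [
    Row("New York", "USA"),
    Row("London", "UK"),
    Row("Vienna", "Austria"),
]

# The question's loop rebinds `a` on each pass, so only the last value survives,
# and it is a plain str, which has no transpose() or set_index() method
for row in rows:
    a = row.City
print(a)                        # Vienna
print(hasattr(a, "transpose"))  # False

# Collecting every value instead, written as a list comprehension
cities = [row.City for row in rows]
print(cities)                   # ['New York', 'London', 'Vienna']
```

With a real DataFrame, the same comprehension can be written as [row.City for row in traceFilters.select("City").collect()]; selecting the column first means only that column is brought back to the driver.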
Brian