sqlContext.read.format('orc').load(hdfspath)
sqlContext.read.format('parquet').load(hdfspath)

These work fine.

sqlContext.read.format('sequencefile').load(hdfspath)

But the same call does not work for the sequencefile format.

How can I read a sequence file as a dataframe in PySpark?

OneCricketeer
Tronald Dump

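Spark SQL has no built-in sequencefile data source, so use the sequenceFile method from SparkContext and convert the resulting RDD of (key, value) pairs to a DataFrame: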

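# sequenceFile returns an RDD of (key, value) tuples
rdd = sc.sequenceFile("/tmp/foo/")

# RDD.toDF() becomes available once a SQLContext/SparkSession exists;
# pass names so the columns are not just _1 and _2
df = rdd.toDF(["key", "value"])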
Yehor Krivokon