
What is the best way to read a .tsv file with a header in PySpark and store it in a Spark DataFrame?

I have tried the "spark.read.options" and "spark.read.csv" commands, but with no luck.

Thanks.

Regards, Jit

Jitu
  • Hi JKD, welcome to SO! Please read up on [asking questions](https://stackoverflow.com/help/how-to-ask) before writing your next one. Happy coding! – Diggy. May 14 '20 at 14:13
  • Does this answer your question: https://stackoverflow.com/questions/43508054/spark-sql-how-to-read-a-tsv-or-csv-file-into-dataframe-and-apply-a-custom-sche?rq=1 – Ehsan May 14 '20 at 14:19
  • @Ehsan Do we always have to create a schema (since that involves opening the file on the local machine)? Can't we use the header row as the column names? – Jitu May 14 '20 at 14:49
  • To my knowledge, I think a schema is needed; see the schema sketch after these comments. – Ehsan May 14 '20 at 14:53
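
For anyone comparing the two approaches, a minimal sketch of supplying an explicit schema is below. The column names, types, and file path are hypothetical placeholders, not taken from the question:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.appName("tsv-with-schema").getOrCreate()

# Hypothetical schema; the real column names and types depend on your file.
schema = StructType([
    StructField("col1", StringType(), True),
    StructField("col2", IntegerType(), True),
])

# header=True skips the header row; an explicit schema avoids the extra
# pass over the data that inferSchema would otherwise trigger.
df = spark.read.csv("path/to/file.tsv", sep="\t", header=True, schema=schema)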

1 Answer


Well, you can read the TSV file directly without providing an external schema, as long as a header is available:

df = spark.read.csv(path, sep=r'\t', header=True).select('col1','col2')

Since Spark is lazily evaluated, it will read only the selected columns. Hope it helps.
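
For completeness, a minimal end-to-end sketch is below. The file path, app name, and the .options() variant are illustrative assumptions, not something given in the original question:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("read-tsv").getOrCreate()

# Read a tab-separated file: header=True uses the first row as column names,
# and inferSchema=True lets Spark guess column types (at the cost of an extra pass).
df = spark.read.csv("data/sample.tsv", sep="\t", header=True, inferSchema=True)

# Equivalent form using .options(), which the question mentions:
df2 = spark.read.options(sep="\t", header=True, inferSchema=True).csv("data/sample.tsv")

df.printSchema()
df.show(5)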

Shubham Jain