Questions tagged [spark-csv]

A library for reading and writing CSV files in Apache Spark. Originally a separate Databricks package (spark-csv); CSV support has been built into Spark itself since version 2.0.

139 questions
6 votes · 2 answers

How to save CSV with all fields quoted?

The code below does not add double quotes, which is the default. I also tried adding # and a single quote using the quote option, with no success. I also used quoteMode with the ALL and NON_NUMERIC options; still no change in the…
Arvind Kandaswamy
5 votes · 2 answers

What is the difference between sqlContext.read.load and sqlContext.read.text?

I am only trying to read a text file into a PySpark RDD, and I am noticing huge differences between sqlContext.read.load and sqlContext.read.text. s3_single_file_inpath='s3a://bucket-name/file_name' indata =…
makansij
5 votes · 1 answer

Scala: Spark SQL to_date(unix_timestamp) returning NULL

Spark version: spark-2.0.1-bin-hadoop2.7, Scala: 2.11.8. I am loading a raw CSV into a DataFrame. In the CSV, although the column is supposed to be in date format, the values are written as 20161025 instead of 2016-10-25. The parameter date_format includes…
Sai Wai Maung
4 votes · 2 answers

inferSchema=true isn't working for CSV file reading in Spark Structured Streaming

I'm getting the error message java.lang.IllegalArgumentException: Schema must be specified when creating a streaming source DataFrame. If some files already exist in the directory, then depending on the file format you may be able to create a static…
Eljah
4 votes · 3 answers

Spark CSV package not able to handle \n within fields

I have a CSV file which I am trying to load using the Spark CSV package, and it does not load the data properly because a few of the fields contain \n within them, e.g. the following two rows: "XYZ", "Test Data", "TestNew\nline", "OtherData" "XYZ", "Test…
Umesh K
4 votes · 1 answer

Spark CSV 2.1 File Names

I'm trying to save a DataFrame to CSV using the new Spark 2.1 CSV option: df.select(myColumns: _*).write .mode(SaveMode.Overwrite) .option("header", "true") .option("codec",…
Avi P
4 votes · 2 answers

Spark Standalone - Last stage saveAsTextFile takes many hours using very little resources to write CSV part files

We run Spark in standalone mode with 3 nodes on a 240 GB "large" EC2 box, merging three CSV files read into DataFrames (via JavaRDDs) into output CSV part files on S3 using s3a. We can see from the Spark UI that the first stages, reading and merging to…
twiz911
4 votes · 0 answers

Spark-csv returns an empty DataFrame when passed a compressed file

I'm looking to consume some compressed csv files into DataFrames so that I can eventually query them using SparkSQL. I would normally just use sc.textFile() to consume the file and use various map() transformations to parse and transform the data…
justafisch
3 votes · 1 answer

Spark - CSV - Write Options - Quotes

Hope everyone is doing well. While going through the Spark CSV data source options, I am quite confused about the differences between the various quote-related options available. Is there any detailed documentation on the differences between them? Does…
rainingdistros
3 votes · 0 answers

Streaming from CSV files with Spark

I am trying to use Spark Streaming to collect data from CSV files located on NFS. The code I have is very simple, and so far I have been running it only in spark-shell, but even there I am running into some issues. I am running spark-shell with a…
Dan Markhasin
3 votes · 1 answer

How to define schema of streaming dataset dynamically to write to csv?

I have a streaming dataset, reading from kafka and trying to write to CSV case class Event(map: Map[String,String]) def decodeEvent(arrByte: Array[Byte]): Event = ...//some implementation val eventDataset: Dataset[Event] = spark .readStream …
3 votes · 2 answers

Spark 2.1 cannot write Vector field on CSV

I was migrating my code from Spark 2.0 to 2.1 when I stumbled into a problem related to Dataframe saving. Here's the code import org.apache.spark.sql.types._ import org.apache.spark.ml.linalg.VectorUDT val df =…
CARREAU Clément
3 votes · 1 answer

Spark CSV issue with new line (LF) character in the field of file imported using scala

I am trying to load a CSV (tab-delimited) using Spark CSV with Scala. What I observed is that if a column contains the newline character LF (\n), Spark considers it the end of the line, even though we have double quotes on both sides of the column in the…
3 votes · 2 answers

How to add a header and a column to a Spark dataframe?

I have a dataframe to which I want to add a header and a first column manually. Here is the dataframe: import org.apache.spark.sql.SparkSession val spark = SparkSession.builder.master("local").appName("my-spark-app").getOrCreate() val df =…
user3637823
3 votes · 2 answers

filter and save first X lines of a dataframe

I'm using pySpark to read and calculate statistics for a dataframe. The dataframe looks like: TRANSACTION_URL START_TIME END_TIME SIZE FLAG COL6 COL7 ... www.google.com 20170113093210 20170113093210 150 1 …
Adiel