Questions tagged [spark-csv]

A library for handling CSV files in Apache Spark.

139 questions
2
votes
1 answer

How to split the input file name and add a specific value to a Spark DataFrame column

This is how I load my CSV file into a Spark DataFrame: val sqlContext = new org.apache.spark.sql.SQLContext(sc) import sqlContext.implicits._ import org.apache.spark.{SparkConf, SparkContext} import java.sql.{Date, Timestamp} import…
user7547751
2
votes
1 answer

Error while reading very large files with the spark-csv package

We are trying to read a 3 GB file which has multiple newline characters in one of its columns, using spark-csv and the univocity 1.5.0 parser, but some rows of the file are getting split into multiple columns on the basis of the newline character. This scenario…
Rajat Mishra
  • 3,635
  • 4
  • 27
  • 41
2
votes
2 answers

Error while exporting spark sql dataframe to csv

I have referred to the following links in order to understand how to export a Spark SQL DataFrame in Python: https://github.com/databricks/spark-csv and "How to export data from Spark SQL to CSV". My code: df = sqlContext.createDataFrame(routeRDD,…
Hardik Gupta
  • 4,700
  • 9
  • 41
  • 83
2
votes
1 answer

Index out of bounds error when doing dataframe union on bzip2 csv data

The problem is pretty weird. If I work with the uncompressed file, there is no issue. But if I work with the compressed bz2 file, I get an index out of bounds error. From what I've read, it is apparently the spark-csv parser that doesn't detect the end of…
flipper2gv
  • 41
  • 3
2
votes
1 answer

Dynamically loading the com.databricks:spark-csv Spark package into my application

I need to load the spark-csv package dynamically into my application. Using spark-submit it works: spark-submit --class "DataLoaderApp" --master yarn --deploy-mode client --packages com.databricks:spark-csv_2.11:1.4.0 …
Mahdi
  • 787
  • 1
  • 8
  • 33
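A minimal sketch of the programmatic alternative to the `--packages` flag in the question above, assuming Spark 1.x: the `spark.jars.packages` configuration key takes the same Maven coordinates, but it must be set before the SparkContext is created.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// "spark.jars.packages" is the configuration counterpart of
// spark-submit's --packages flag; set it before the context exists.
val conf = new SparkConf()
  .setAppName("DataLoaderApp") // app name taken from the question
  .set("spark.jars.packages", "com.databricks:spark-csv_2.11:1.4.0")
val sc = new SparkContext(conf)
```

Dependencies are resolved from Maven Central at startup, so the driver needs network access.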
2
votes
0 answers

Spark: spark-csv partitioning and parallelism in subsequent DataFrames

I'm wondering how to enforce usage of subsequent, more appropriately partitioned DataFrames in Spark when importing source data with spark-csv. Summary: spark-csv doesn't seem to support explicit partitioning on import like sc.textFile() does.…
chucknelson
  • 2,328
  • 3
  • 24
  • 31
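Since spark-csv exposes no minPartitions argument the way sc.textFile() does, one common workaround (a sketch assuming Spark 1.x with spark-csv; the path and the partition count of 48 are illustrative) is an explicit repartition right after the load:

```scala
// sqlContext is the spark-shell's pre-built SQLContext (Spark 1.x).
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .load("data.csv")  // illustrative path
  .repartition(48)   // illustrative partition count

// repartition triggers a shuffle; subsequent transformations then run
// with the requested parallelism instead of the input-split count.
```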
2
votes
1 answer

Spark saving df as csv throws error

I am using pyspark and have a dataframe loaded. When I try to save it as a CSV file, I get the error below. I initialize spark like this: ./pyspark --master local[4] --executor-memory 14g --driver-memory 14g --conf…
skunkwerk
  • 2,920
  • 2
  • 37
  • 55
2
votes
0 answers

Python-Spark IllegalArgumentException when loading CSV into a DataFrame with DateType using spark-csv_2.10-1.3.0

I am trying to load a CSV file into a DataFrame using spark-csv_2.10-1.3.0: sqlContext.read.format('com.databricks.spark.csv') .options(header='true',dateFormat='dd/MM/YYYY hh:mm') .load('test.csv',schema =…
Bo Wan
  • 35
  • 5
2
votes
4 answers

Decimal data type not storing the values correctly in both Spark and Hive

I am having a problem storing the decimal data type, and I am not sure if it is a bug or I am doing something wrong. The data in the file looks like this: Column1 column2 column3 steve 100 100.23 ronald 500 20.369 maria 600 …
newSparkbabie
  • 73
  • 1
  • 1
  • 9
1
vote
2 answers

DateType column read as StringType from CSV file even when appropriate schema provided

I am trying to use PySpark to read a CSV file containing a DateType field in the format "dd/MM/yyyy". I have specified the field as DateType() in the schema definition and also provided the option "dateFormat" in the DataFrame CSV reader. However, the output…
Monami Sen
  • 119
  • 1
  • 1
  • 12
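A minimal sketch of the combination the question describes, assuming Spark 2.x's built-in CSV reader (written in Scala rather than PySpark; the column names are illustrative). A common pitfall with this option: in SimpleDateFormat patterns, lowercase yyyy is the calendar year while uppercase YYYY is the ISO week year, which silently misparses dates near year boundaries.

```scala
import org.apache.spark.sql.types.{DateType, StringType, StructField, StructType}

// Explicit schema: without it, inferSchema leaves date columns as strings.
val schema = StructType(Seq(
  StructField("name", StringType, nullable = true),  // illustrative column
  StructField("dob",  DateType,   nullable = true))) // illustrative column

val df = spark.read
  .option("header", "true")
  .option("dateFormat", "dd/MM/yyyy") // lowercase yyyy, not YYYY
  .schema(schema)
  .csv("test.csv")                    // path taken from the question
```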
1
vote
1 answer

Why am I getting "CSVHeaderChecker:69 - CSV header does not conform to the schema"?

When reading the CSV data I'm getting a warning like that, and no data is picked up into the DataFrame batches. The schema is exactly as it exists in the CSV. What could be the reason for the warning and the wrong behavior?
Eljah
  • 4,188
  • 4
  • 41
  • 85
1
vote
2 answers

Spark CSV reader: garbled Japanese text and handling multiline records

In my Spark job (Spark 2.4.1), I am reading CSV files on S3. These files contain Japanese characters. Also they can have the ^M character (U+000D), so I need to parse them as multiline. First I used the following code to read the CSV files: implicit class…
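A minimal sketch of the two reader options involved, assuming Spark 2.4 (the bucket path and Shift_JIS encoding are illustrative; whether a non-UTF-8 encoding combines with multiLine depends on the Spark version):

```scala
val df = spark.read
  .option("header", "true")
  .option("encoding", "Shift_JIS") // illustrative source encoding
  .option("multiLine", "true")     // keep embedded CR/LF inside quoted fields
  .csv("s3://bucket/path/")        // illustrative path
```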
1
vote
0 answers

Spark CSV: Parse files delimited by the character æ (hex E6)

I have large data files delimited by the character æ (hex E6). My code snippet for parsing the file is as follows, but it seems the parser does not split values properly (I use Spark 2.4.1): implicit class DataFrameReadImplicits (dataFrameReader:…
Ashika Umanga Umagiliya
  • 8,988
  • 28
  • 102
  • 185
1
vote
3 answers

CSV format is not loading in spark-shell

Using Spark 1.6, I tried the following code: val diamonds = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("/got_own/com_sep_fil.csv") which caused the error: error: not found: value spark
abdul sattar
  • 11
  • 1
  • 2
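The error above is a version mismatch: the spark session object only exists in Spark 2.0+, and the built-in "csv" format arrived with it. On Spark 1.6 the spark-shell provides sqlContext instead, and CSV support comes from the external spark-csv package (started with spark-shell --packages com.databricks:spark-csv_2.10:1.5.0; coordinates illustrative). A sketch under those assumptions:

```scala
// Spark 1.6: sqlContext is pre-built in the spark-shell; there is no "spark".
val diamonds = sqlContext.read
  .format("com.databricks.spark.csv") // external package, not built-in "csv"
  .option("header", "true")
  .option("inferSchema", "true")
  .load("/got_own/com_sep_fil.csv")   // path taken from the question
```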
1
vote
1 answer

Spark - handle blank values in CSV file

Let's say I've got a simple pipe-delimited file with missing values: A|B||D I read that into a DataFrame: val foo = spark.read.format("csv").option("delimiter","|").load("/path/to/my/file.txt") The missing third column, instead of being a null…
Andrew
  • 8,445
  • 3
  • 28
  • 46
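For the blank-field case above, a hedged sketch assuming Spark 2.x: how empty fields come back varies by version, and the nullValue option (plus emptyValue on Spark 2.4+) controls the mapping between empty strings and nulls.

```scala
val foo = spark.read
  .format("csv")
  .option("delimiter", "|")
  .option("nullValue", "")  // map empty fields back to SQL nulls
  .load("/path/to/my/file.txt")
```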