Questions tagged [spark-csv]

A library for reading and writing CSV files in Apache Spark; equivalent functionality has been built into Spark itself since version 2.0.

139 questions
3
votes
1 answer

Using Spark to merge data in sorted order into CSV files

I have a data set like this:

    name  time   val
    ----  -----  ---
    fred  04:00  111
    greg  03:00  123
    fred  01:00  411
    fred  05:00  921
    fred  11:00  157
    greg  12:00  333

And CSV files in some folder, one for each unique name from the data set: fred.csv, greg.csv. The…
Greg Clinton
  • 365
  • 1
  • 7
  • 18
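A minimal sketch of one way to approach this with the DataFrame API, assuming Spark 2.x; the paths and column names mirror the question but are otherwise illustrative:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("merge-sorted").getOrCreate()
    val df = spark.read.option("header", "true").csv("/data/input.csv")

    df.repartition(df("name"))                // group each name's rows together
      .sortWithinPartitions("name", "time")   // keep rows sorted within each partition
      .write
      .partitionBy("name")                    // one output folder per unique name
      .option("header", "true")
      .mode("append")                         // append to any existing per-name data
      .csv("/data/out")

Note that Spark writes a folder per name (name=fred/, name=greg/) rather than a bare fred.csv, so a rename or merge step would still be needed to match the question's layout exactly.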
3
votes
1 answer

Programmatically generate the schema AND the data for a dataframe in Apache Spark

I would like to dynamically generate a DataFrame containing a header record for a report, creating the DataFrame from the value of the string below: val headerDescs : String = "Name,Age,Location" val headerSchema =…
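A minimal sketch of one way to do this, assuming Spark 2.x; headerDescs is taken from the question, and the single "data" row simply repeats the column names to act as a header record:

    import org.apache.spark.sql.{Row, SparkSession}
    import org.apache.spark.sql.types.{StringType, StructField, StructType}

    val spark = SparkSession.builder().appName("header-df").getOrCreate()

    val headerDescs: String = "Name,Age,Location"

    // One StructField per comma-separated column name.
    val headerSchema = StructType(
      headerDescs.split(",").map(StructField(_, StringType, nullable = true)))

    // The data: a single row repeating the column names.
    val headerRow = Row.fromSeq(headerDescs.split(",").toSeq)
    val headerDf = spark.createDataFrame(
      spark.sparkContext.parallelize(Seq(headerRow)), headerSchema)

    headerDf.show()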
3
votes
2 answers

How to create a custom org.apache.spark.sql.types.StructType schema object from a JSON file programmatically

I have to create a custom org.apache.spark.sql.types.StructType schema object from the info in a JSON file. The JSON file can be anything, so I have parameterized it within a property file. This is how the property file looks: //path to the schema…
aironman
  • 837
  • 5
  • 26
  • 55
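If the JSON file holds a schema serialized with StructType#json, it can be round-tripped directly; a minimal sketch, assuming that serialization format (the path is illustrative and would come from the property file):

    import scala.io.Source
    import org.apache.spark.sql.types.{DataType, StructType}

    // Load the serialized schema and rebuild the StructType.
    val schemaJson = Source.fromFile("/conf/schema.json").mkString
    val schema = DataType.fromJson(schemaJson).asInstanceOf[StructType]

    // It can then be applied when reading:
    // val df = spark.read.schema(schema).csv("/data/input.csv")

An arbitrary hand-written JSON layout would instead need custom parsing into StructField objects.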
3
votes
2 answers

Adding spark-csv package in PyCharm IDE

I have successfully loaded the spark-csv library in Python standalone mode through $ --packages com.databricks:spark-csv_2.10:1.4.0. While running the above command, it creates two folders (jars and cache) at this…
mahima
  • 1,875
  • 1
  • 11
  • 15
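On recent Spark versions the same dependency can also be resolved programmatically instead of on the command line; a minimal sketch, assuming the session is created inside the IDE (on older 1.x builds the --packages flag on the launcher is the reliable route, which matches what the question is doing):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("spark-csv-in-ide")
      .master("local[*]")
      // Resolved from Maven at session start and cached under ~/.ivy2
      .config("spark.jars.packages", "com.databricks:spark-csv_2.10:1.4.0")
      .getOrCreate()

    val df = spark.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .load("/data/input.csv")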
3
votes
0 answers

Saving a dataframe using the spark-csv package throws exceptions and crashes (PySpark)

I am running a script on Spark 1.5.2 in standalone mode (using 8 cores), and at the end of the script I attempt to serialize a very large dataframe to disk using the spark-csv package. The code snippet that throws the exception is: numfileparts =…
Magnus
  • 371
  • 1
  • 14
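A minimal sketch of the kind of write the snippet describes, shown with the Spark 2 API (on 1.5 the same options go through sqlContext with format("com.databricks.spark.csv")); numfileparts mirrors the variable named in the question, the rest is illustrative:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("big-write").getOrCreate()
    val df = spark.read.option("header", "true").csv("/data/input.csv")

    val numfileparts = 32                  // more, smaller parts eases memory pressure
    df.repartition(numfileparts)
      .write
      .format("com.databricks.spark.csv")  // the spark-csv package's data source
      .option("header", "true")
      .save("/data/out")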
3
votes
1 answer

How to avoid Spark NumberFormatException: null

I have a general question derived from a specific exception I have encountered. I'm querying data on Dataproc using Spark 1.6. I need to get 1 day of data (~10000 files) from 2 logs and then do some transformations. However, my data may (or may…
Zahiro Mor
  • 1,708
  • 1
  • 16
  • 30
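One common way to sidestep this is to read the suspect columns as strings and cast afterwards, since a failed cast yields SQL NULL instead of throwing; a minimal sketch, assuming the Spark 2 reader (paths and column names are illustrative):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types.LongType

    val spark = SparkSession.builder().appName("null-safe").getOrCreate()
    val raw = spark.read.option("header", "true").csv("/logs/2016-08-01/")

    val parsed = raw
      .withColumn("bytes", raw("bytes").cast(LongType))  // "null" and junk become NULL
      .na.fill(Map("bytes" -> 0L))                       // then fill a default if wanted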
3
votes
2 answers

Characters get corrupted if spark.executor.memory is not set properly when importing CSV to a DataFrame

UPDATE: Please hold on to this question. I found this might be a problem with Spark 1.5 itself, as I am not using the official build of Spark. I'll keep updating this question. Thank you! I noticed a strange bug recently when using spark-csv to…
DarkZero
  • 2,259
  • 3
  • 25
  • 36
2
votes
1 answer

How to split an array structure to CSV in PySpark

Here are example data and a schema:

    mySchema = StructType([
        StructField('firstname', StringType()),
        StructField('lastname', StringType()),
        StructField('langages', ArrayType(StructType([
            StructField('lang1', StringType()),
            …
Fabrice
  • 355
  • 4
  • 9
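CSV cannot hold arrays or structs, so the usual fix is to explode the array and flatten the struct fields before writing; a minimal sketch mirroring the question's shape in Scala (names and data are illustrative, and the tuple fields _1/_2 stand in for lang1/lang2):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, explode}

    val spark = SparkSession.builder().appName("flatten").getOrCreate()
    import spark.implicits._

    val df = Seq(
      ("John", "Doe", Seq(("fr", "5"), ("en", "3")))
    ).toDF("firstname", "lastname", "langages")

    val flat = df
      .select(col("firstname"), col("lastname"),
              explode(col("langages")).as("lang"))   // one row per array element
      .select(col("firstname"), col("lastname"),
              col("lang._1").as("lang1"),            // flatten struct fields
              col("lang._2").as("lang2"))

    flat.write.option("header", "true").csv("/data/out")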
2
votes
1 answer

Reading a file in Spark with newlines (\n) in fields, escaped with a backslash (\) and not quoted

I have an input file with the following structure:

    col1, col2, col3
    line1filed1,line1filed2.1\
    line1filed2.2, line1filed3
    line2filed1,line2filed2.1\
    line2filed2.2, line2filed3
    line3filed1,…
Kishore Indraganti
  • 1,296
  • 3
  • 17
  • 34
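Since the reader has no option for backslash-escaped unquoted newlines, one workaround is to stitch the records back together before parsing; a minimal sketch, assuming Spark 2.2+ and files small enough for wholeTextFiles to hold in memory:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("escaped-newlines").getOrCreate()
    import spark.implicits._

    val stitched = spark.sparkContext
      .wholeTextFiles("/data/input.csv")                          // (path, content) pairs
      .map { case (_, content) => content.replace("\\\n", "") }   // undo backslash-newline
      .flatMap(_.split("\n"))                                     // back to logical records

    val df = spark.read.csv(stitched.toDS())   // csv(Dataset[String]) exists since 2.2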
2
votes
2 answers

Parsing micro/nanosecond timestamps in the spark-csv DataFrame reader: inconsistent results

I'm trying to read a CSV file that has timestamps down to nanoseconds. Sample content of file TestTimestamp.csv (Spark 2.4.0, Scala 2.11.11):

    101,2019-SEP-23 11.42.35.456789123 AM

Tried to…
ValaravausBlack
  • 691
  • 5
  • 12
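Spark's TimestampType only stores microseconds, so under 2.4 a common workaround is to trim the fraction before parsing; a minimal sketch, assuming the two-column sample above (the regex keeps millisecond precision, which the 2.4-era SimpleDateFormat patterns handle reliably; keep the raw string column if the full nanoseconds matter):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, regexp_replace, to_timestamp}

    val spark = SparkSession.builder().appName("nanos").getOrCreate()

    val raw = spark.read.csv("TestTimestamp.csv").toDF("id", "ts_raw")

    val parsed = raw.withColumn("ts",
      to_timestamp(
        regexp_replace(col("ts_raw"), "(\\.\\d{3})\\d+", "$1"),  // trim to millis
        "yyyy-MMM-dd hh.mm.ss.SSS a"))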
2
votes
2 answers

Spark - loading many small CSV files takes very long

Description: At my workplace we have a large amount of data that needs processing. It concerns a rapidly growing number of instances (currently ~3000) which all have a few megabytes' worth of data stored in gzipped CSV files on S3. I have set up a…
Jeroen Bos
  • 87
  • 9
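Most of the cost in this shape of job is usually per-read planning, S3 listing, and schema inference rather than the data itself, so one mitigation is a single read over every path with an explicit schema; a minimal sketch (bucket layout and schema are illustrative):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

    val spark = SparkSession.builder().appName("many-small-csv").getOrCreate()

    val paths = (1 to 3000).map(i => s"s3a://bucket/instances/$i/data.csv.gz")

    val schema = StructType(Seq(       // explicit schema: no inference pass
      StructField("ts", StringType),   // per gzipped file
      StructField("value", DoubleType)))

    val df = spark.read
      .option("header", "true")
      .schema(schema)
      .csv(paths: _*)                  // one job instead of ~3000 separate reads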
2
votes
1 answer

Spark 2.4 CSV load issue with the option "nullValue"

We were using Spark 2.3 before; now we're on 2.4: Spark version 2.4.0, Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_212). We had a piece of code running in production that converted CSV files to Parquet format. One of the options…
KK2486
  • 353
  • 2
  • 3
  • 13
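A minimal sketch of the kind of load involved; one behaviour change worth knowing is that Spark 2.4 distinguishes the nullValue token from empty strings via the separate emptyValue option (the paths and token are illustrative):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("null-options").getOrCreate()

    val df = spark.read
      .option("header", "true")
      .option("nullValue", "null")   // this exact token becomes SQL NULL
      .option("emptyValue", "")      // empty fields stay empty strings (2.4+)
      .csv("/data/input.csv")

    df.write.mode("overwrite").parquet("/data/out")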
2
votes
1 answer

Strange behavior in the Spark 2 CSV parser when the multiLine option is enabled

When creating a DataFrame from a CSV file with the multiLine option enabled, some file columns are parsed incorrectly. Below is the code execution; I'll point out the strange behavior as the code goes. First, I load the file into two variables:…
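With multiLine enabled, each file is parsed as a whole and the quote/escape settings matter far more, so making them explicit is a sensible first step when columns shift; a minimal sketch with the Spark 2 defaults spelled out (the path is illustrative):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("multiline").getOrCreate()

    val df = spark.read
      .option("header", "true")
      .option("multiLine", "true")   // quoted fields may span lines
      .option("quote", "\"")         // default quote character
      .option("escape", "\\")        // default escape character
      .csv("/data/input.csv")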
2
votes
1 answer

Prevent delimiter collision while reading CSV in Spark

I'm trying to create an RDD from a CSV dataset. The problem is that I have a location column with a structure like (11112,222222) that I don't use, so when I use the map function with split(",") it results in two columns. Here is my code: …
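A plain split(",") breaks on values like (11112,222222), while the DataFrame CSV reader tracks quoting; a minimal sketch, assuming the location field is quoted in the file (path and column names are illustrative):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("delimiters").getOrCreate()

    val df = spark.read
      .option("header", "true")
      .option("quote", "\"")   // commas inside quoted fields are not delimiters
      .csv("/data/input.csv")
      .drop("location")        // the unused column from the question

    val rdd = df.rdd           // back to an RDD if one is really needed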
2
votes
3 answers

How to write data as a single (normal) CSV file in Spark?

I am trying to save a data frame as a CSV file on my local drive. But when I do so, a folder is generated and partition files are written within it. Is there any suggestion to overcome this? My requirement: to get a normal CSV file with…
Ramkumar
  • 444
  • 1
  • 7
  • 22
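The folder-of-part-files layout is how Spark writes by design; the usual workaround is to coalesce to one partition and then rename the single part file, as in the minimal sketch below (paths are illustrative, and funnelling everything through one partition only suits modest data sizes):

    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("single-csv").getOrCreate()
    val df = spark.read.option("header", "true").csv("/data/input.csv")

    df.coalesce(1)                       // one partition => one part file
      .write
      .option("header", "true")
      .mode("overwrite")
      .csv("/tmp/out_dir")

    // Rename the lone part file to a normal CSV name.
    val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
    val part = fs.globStatus(new Path("/tmp/out_dir/part-*"))(0).getPath
    fs.rename(part, new Path("/tmp/out.csv"))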