Questions tagged [spark-csv]

A Databricks library for reading and writing CSV files as Spark DataFrames; its functionality has been built into Apache Spark itself since version 2.0.


139 questions
1 vote · 2 answers

Drop column(s) in spark csv data frame

I have a DataFrame whose fields I concatenate together. The concatenation produces another DataFrame, and I finally write its output to a CSV file partitioned on two of its columns. One of those columns is present in the first DataFrame… (a sketch of one approach follows below)
user7547751
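A minimal sketch of one way to do this, assuming placeholder paths and column names (unwanted_col, part_col1, part_col2, all_fields are inventions for illustration, not the question's actual schema):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.concat_ws

val spark = SparkSession.builder().appName("drop-then-write").getOrCreate()

// Hypothetical input path; the question's real schema is not shown.
val df = spark.read.option("header", "true").csv("/path/to/input.csv")

// Concatenate every field into one column, as described in the question.
val concatenated = df.withColumn("all_fields", concat_ws("|", df.columns.map(df(_)): _*))

// Drop the column that is no longer needed, then write partitioned by two columns.
concatenated
  .drop("unwanted_col")                   // placeholder for the column to remove
  .write
  .option("header", "true")
  .partitionBy("part_col1", "part_col2")  // placeholders for the two partition columns
  .csv("/path/to/output")
```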
1 vote · 0 answers

Spark CSV Handle Corrupt GZip Files

I have a Spark 2.0 Java application that uses Spark's CSV reading utilities to read a CSV file into a DataFrame. The problem is that sometimes 1 out of 100 input files may be invalid (corrupt gzip), which causes the whole job to fail… (a hedged workaround is sketched below)
Nathan Case · 655 · 1 · 6 · 15
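A hedged workaround sketch: from Spark 2.1 onward the file sources honour spark.sql.files.ignoreCorruptFiles, which skips files that throw I/O errors instead of failing the job; whether that catches a truncated gzip stream depends on where the decompression error surfaces, so treat this as something to test rather than a guaranteed fix. Paths are placeholders:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("skip-corrupt-gzip").getOrCreate()

// Ask Spark's file-based sources to skip files that raise I/O errors while being read.
// (Available from Spark 2.1; a Spark 2.0 job would need an upgrade or a pre-validation pass.)
spark.conf.set("spark.sql.files.ignoreCorruptFiles", "true")

val df = spark.read
  .option("header", "true")
  .option("mode", "DROPMALFORMED")   // drops malformed rows, but does not help with a corrupt archive
  .csv("/data/input/*.csv.gz")       // placeholder glob over the ~100 input files

df.count()
```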
1 vote · 1 answer

Loading nested csv files from S3 with Spark

I have hundreds of gzipped CSV files in S3 that I am trying to load. The directory structure resembles the following: bucket -- level1 ---- level2.1 -------- level3.1 ------------ many files -------- level3.2 ------------ many files ----… (a glob-path sketch follows below)
Nathan Case · 655 · 1 · 6 · 15
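A minimal sketch, assuming an s3a filesystem is configured and the gzipped files sit three directory levels below the bucket (the bucket name and depth are placeholders). Hadoop glob patterns let a single read call cover every leaf directory:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("nested-s3-csv").getOrCreate()

// One * per directory level; *.gz matches the gzipped CSVs in each leaf directory.
// Spark decompresses .gz input transparently (each gzip file becomes a single, unsplittable partition).
val df = spark.read
  .option("header", "true")
  .csv("s3a://my-bucket/level1/*/*/*.gz")   // placeholder bucket and nesting depth

df.printSchema()
df.count()
```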
1 vote · 1 answer

Parquet schema and Spark

I am trying to convert CSV files to Parquet and I am using Spark to accomplish this. SparkSession spark = SparkSession.builder().appName(appName).config("spark.master", master).getOrCreate(); Dataset logFile =… (a sketch of the conversion follows below)
changepicture · 466 · 1 · 4 · 10
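A minimal sketch of the CSV-to-Parquet round trip, written in Scala rather than the question's Java and with placeholder paths:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("csv-to-parquet").getOrCreate()

// Read the CSV; inferSchema samples the data to pick column types,
// and those types are what end up encoded in the Parquet schema.
val logFile = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/path/to/input.csv")

logFile.printSchema()   // inspect the schema Parquet will receive

// The DataFrame's schema becomes the Parquet schema on write.
logFile.write.mode("overwrite").parquet("/path/to/output.parquet")
```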
1 vote · 2 answers

Not able to read text file from local file path - Spark CSV reader

We are using the Spark CSV reader to read a CSV file into a DataFrame, and we are running the job with yarn-client; it works fine in local mode. We submit the Spark job from an edge node. But when we place the file on a local file path instead… (a hedged sketch of the usual fix follows below)
Shankar · 8,529 · 26 · 90 · 159
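A hedged sketch of the usual fix: use an explicit file:// URI and make sure the file exists at that same path on every executor node, since on yarn-client the read happens on the executors rather than only on the edge node. The path is a placeholder; copying the file to HDFS is often the simpler route:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("local-file-on-yarn").getOrCreate()

// file:// forces the local filesystem; without it the path is resolved against
// the default filesystem (usually HDFS) when running on a cluster.
val df = spark.read
  .option("header", "true")
  .csv("file:///home/user/data/input.csv")   // must exist at this path on all executor nodes

df.show(5)
```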
1 vote · 1 answer

NumberFormatException when I try to create a parquet file with a custom schema and BigDecimal types

I need to create a Parquet file from CSV files using a customized JSON schema file, like this one: {"type" : "struct", "fields" : [ {"name" : "tenor_bank", "type" : "string", "nullable" : false}, {"name" : "tenor_frtb", "type" : "string",… (a sketch of declaring decimal columns follows below)
aironman · 837 · 5 · 26 · 55
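A minimal sketch of declaring the decimal columns programmatically with DecimalType(precision, scale) instead of leaving them as strings; the field names, precision, and scale are placeholders, not the question's actual layout:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder().appName("csv-decimal-to-parquet").getOrCreate()

// Columns read as DecimalType are backed by java.math.BigDecimal; a
// NumberFormatException at read time usually points at a value that does not
// match the declared type (e.g. an empty field or a stray thousands separator).
val schema = StructType(Seq(
  StructField("tenor_bank", StringType, nullable = false),
  StructField("tenor_frtb", StringType, nullable = true),
  StructField("amount",     DecimalType(18, 4), nullable = true)   // placeholder decimal column
))

val df = spark.read
  .option("header", "true")
  .schema(schema)
  .csv("/path/to/input.csv")

df.write.parquet("/path/to/output.parquet")
```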
1 vote · 2 answers

Would spark dataframe read from external source on every action?

In the Spark shell I use the code below to read from a CSV file: val df = spark.read.format("org.apache.spark.csv").option("header", "true").option("mode", "DROPMALFORMED").csv("/opt/person.csv") (spark here is the SparkSession) df.show() Assuming… (a caching sketch follows below)
Andy Dufresne · 6,022 · 7 · 63 · 113
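A short sketch of the usual answer: a DataFrame re-evaluates its lineage on every action, so without caching each action scans the CSV again; cache() keeps the parsed rows around after the first action materialises them.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("cache-demo").getOrCreate()

val df = spark.read
  .option("header", "true")
  .option("mode", "DROPMALFORMED")
  .csv("/opt/person.csv")

// Mark the DataFrame for caching; this is lazy and does not read the file yet.
df.cache()

df.show()     // first action: reads the CSV and populates the cache
df.count()    // later actions are served from the cached data, not the file
```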
1 vote · 1 answer

Spark CSV Escape Not Working

I am using spark-core version 2.0.1 with Scala 2.11. I have simple code to read a CSV file which contains \ escapes: val myDA = spark.read.option("quote", null).schema(mySchema).csv(filePath) As per the documentation, \ is the default escape for… (a hedged sketch of spelling the options out follows below)
JNish · 145 · 2 · 10
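A hedged sketch that spells out both the quote and the escape character instead of nulling the quote option; this is an assumption about the intent of the original code, not a confirmed fix, and mySchema and the path are placeholders:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder().appName("csv-escape").getOrCreate()

// Placeholder schema standing in for the question's mySchema.
val mySchema = StructType(Seq(
  StructField("col1", StringType, nullable = true),
  StructField("col2", StringType, nullable = true)
))

// Declare the quote and escape characters explicitly: with these options a
// backslash in the data escapes a quote character inside a quoted field.
val myDA = spark.read
  .option("quote", "\"")
  .option("escape", "\\")
  .schema(mySchema)
  .csv("/path/to/file.csv")

myDA.show(5)
```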
1 vote · 1 answer

How to provide parserLib and inferSchema options together for spark-csv

sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").option("parserLib", "UNIVOCITY").option("escape", "\"").load("file.csv") When I create a DataFrame using the above code I get the following…
nirali.gandhi · 221 · 1 · 11
1 vote · 0 answers

Databricks CSV write after applying a UDF - Spark 2.0.0, Scala 2.11.8

I have a standalone instance of: Hadoop 2.7.3, Scala 2.11.8, Spark 2.0.0, SBT 0.13.11. Everything is built locally. The code is developed in IntelliJ and I run it by clicking debug. Everything works fine unless I try to use a UDF: def testGeolocation… (a sketch of the UDF-plus-CSV-write pattern follows below)
MPękalski · 6,873 · 4 · 26 · 36
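A minimal sketch of the UDF-plus-write pattern on Spark 2.0, where the CSV writer is built in and the separate Databricks package is not needed; testGeolocation's real signature is not shown in the excerpt, so the one below (and the column names) are placeholders:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

val spark = SparkSession.builder().appName("udf-then-csv").getOrCreate()

// Placeholder UDF standing in for testGeolocation; the function it wraps must be
// serializable (avoid capturing non-serializable state from the enclosing class).
val testGeolocation = udf((lat: Double, lon: Double) => s"$lat,$lon")

val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/path/to/input.csv")

val withGeo = df.withColumn("geo", testGeolocation(col("lat"), col("lon")))   // placeholder columns

withGeo.write.option("header", "true").csv("/path/to/output")
```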
1 vote · 2 answers

How to convert column type from str to date when the str is of format dd/mm/yyyy?

I have a large table in SQL that I imported from a large CSV file. A column is recognized as a str when it contains date information in dd/mm/yyyy format. I tried select TO_DATE('12/31/2015') as date, but that does not work because the TO_DATE function needs… (a conversion sketch follows below)
Semihcan Doken · 776 · 3 · 10 · 23
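A hedged conversion sketch with a placeholder column name; note the pattern is dd/MM/yyyy, since lowercase mm means minutes. On Spark 2.2+ the two-argument to_date(col, "dd/MM/yyyy") does the same in one call:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, to_date, unix_timestamp}

val spark = SparkSession.builder().appName("str-to-date").getOrCreate()

val df = spark.read.option("header", "true").csv("/path/to/table.csv")

// Parse the dd/MM/yyyy string with an explicit pattern, then narrow it to a date.
val withDate = df.withColumn(
  "event_date",                                                        // placeholder output column
  to_date(unix_timestamp(col("date_str"), "dd/MM/yyyy").cast("timestamp"))
)

withDate.printSchema()
```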
1 vote · 0 answers

What's the best way to read a multiline input format into one record in Spark?

Below is what the input file (CSV) looks like: Carrier_create_date,Message,REF_SHEET_CREATEDATE,7/1/2008 Carrier_create_time,Message,REF_SHEET_CREATETIME,8:53:57 Carrier_campaign,Analog,REF_SHEET_CAMPAIGN,25 Carrier_run_no,Analog,REF_SHEET_RUNNO,7 Below… (a pivot-based sketch follows below)
Gangadhar Kadam · 536 · 1 · 4 · 15
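One hedged approach sketch: read the four comma-separated fields as ordinary columns, then pivot the attribute-name column so each name becomes a column of a single wide row. The column positions (_c0.._c3) and the one-record-per-file assumption are guesses from the excerpt:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{first, lit}

val spark = SparkSession.builder().appName("keyvalue-to-record").getOrCreate()

// Each input line looks like: attribute_name,type,reference_name,value
val raw = spark.read.csv("/path/to/input.csv")   // no header, so columns arrive as _c0.._c3

// Collapse the lines into one wide record: attribute names (_c0) become columns,
// their values (_c3) become the cells of that single row.
val record = raw
  .withColumn("grp", lit(1))   // constant grouping key, since the whole file is one record
  .groupBy("grp")
  .pivot("_c0")
  .agg(first("_c3"))
  .drop("grp")

record.show(false)
```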
1 vote · 2 answers

spark-csv falls apart with SparkR & RStudio

I've tried several permutations of the suggestions in "How to load csv file into SparkR on RStudio?", but I am only able to get the in-memory-to-Spark solution to…
Chris · 1,219 · 2 · 11 · 21
1 vote · 2 answers

Using Spark SQL and spark-csv with Spark JobServer

I am trying to JAR a simple Scala application which makes use of spark-csv and Spark SQL to create a DataFrame from a CSV file stored in HDFS, and then just run a simple query to return the max and min of a specific column in the CSV file. I am getting an error… (the query part is sketched below)
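A minimal sketch of the max/min part in plain Scala, leaving the JobServer packaging aside; the HDFS path and column name are placeholders:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{max, min}

val spark = SparkSession.builder().appName("csv-min-max").getOrCreate()

val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("hdfs:///data/input.csv")            // placeholder HDFS path

// Max and min of one column in a single aggregation pass.
df.agg(max("price"), min("price")).show()   // "price" is a placeholder column name
```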
1 vote · 2 answers

PySpark: How to compare two dataframes

I have two DataFrames which I've loaded from two CSV files. Examples: old +--------+---------+----------+ |HOTEL ID|GB |US | +--------+---------+----------+ | 80341| 0.78| 0.7| | 255836| 0.6| 0.6| | 245281| … (a diff sketch follows below)
Rafael · 572 · 5 · 9
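A hedged sketch of one way to diff the two frames, written in Scala to match the rest of this listing; in PySpark the equivalent call is subtract (or exceptAll on newer versions). Paths are placeholders and both files are assumed to share the same columns:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("compare-frames").getOrCreate()

val oldDf = spark.read.option("header", "true").csv("/path/to/old.csv")
val newDf = spark.read.option("header", "true").csv("/path/to/new.csv")

// Rows that appear in one file but not the other; except is a set difference,
// so both DataFrames must have the same column layout.
val added   = newDf.except(oldDf)
val removed = oldDf.except(newDf)

added.show()
removed.show()
```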