The code below does not add the double quotes, which is the default. I also tried adding # and a single quote using the quote option, with no success. I also used quoteMode with the ALL and NON_NUMERIC options; still no change in the…
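As a point of reference for what the quoteMode values are supposed to produce, here is a minimal sketch using Python's stdlib `csv` module (not Spark; the data is made up), whose `QUOTE_ALL` and `QUOTE_NONNUMERIC` constants mirror the ALL and NON_NUMERIC modes:

```python
import csv
import io

rows = [["id", "name"], [1, "Alice"]]

def dump(quoting):
    """Render the sample rows as CSV text under the given quoting mode."""
    buf = io.StringIO()
    csv.writer(buf, quoting=quoting).writerows(rows)
    return buf.getvalue()

print(dump(csv.QUOTE_ALL))         # every field is wrapped in double quotes
print(dump(csv.QUOTE_NONNUMERIC))  # only non-numeric fields are quoted
```

If Spark's output shows no quotes at all under either mode, the option is likely not reaching the writer (e.g. a typo in the option name), since these modes differ visibly on even trivial data.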
I am only trying to read a text file into a PySpark RDD, and I am noticing huge differences between sqlContext.read.load and sqlContext.read.text.
s3_single_file_inpath='s3a://bucket-name/file_name'
indata =…
Spark Version: spark-2.0.1-bin-hadoop2.7
Scala: 2.11.8
I am loading a raw CSV into a DataFrame. In the CSV, although the column is supposed to be in date format, the values are written as 20161025 instead of 2016-10-25. The parameter date_format includes…
I'm getting the error message
java.lang.IllegalArgumentException: Schema must be specified when creating a streaming source DataFrame. If some files already exist in the directory, then depending on the file format you may be able to create a static…
I have a CSV file which I am trying to load using the Spark CSV package, and it does not load the data properly because a few of the fields have \n within them, e.g. the following two rows:
"XYZ", "Test Data", "TestNew\nline", "OtherData"
"XYZ", "Test…
I'm trying to save a DataFrame to CSV using the new Spark 2.1 CSV option:
df.select(myColumns: _*).write
.mode(SaveMode.Overwrite)
.option("header", "true")
.option("codec",…
We run Spark in standalone mode with 3 nodes on a 240 GB "large" EC2 box. The job reads three CSV files into DataFrames, converts them to JavaRDDs, merges them, and writes the output as CSV part files to S3 using s3a.
We can see from the Spark UI that the first stages, reading and merging, to…
I'm looking to consume some compressed csv files into DataFrames so that I can eventually query them using SparkSQL.
I would normally just use sc.textFile() to consume the file and use various map() transformations to parse and transform the data…
Hope everyone is doing well.
While going through the Spark CSV datasource options, I am quite confused about the difference between the various quote-related options available.
Is there a detailed comparison of the differences between them?
Does…
I am trying to use Spark Streaming to collect data from CSV files located on NFS.
The code I have is very simple, and so far I have been running it only in spark-shell, but even there I am running into some issues.
I am running spark-shell with a…
I have a streaming Dataset, reading from Kafka, and I am trying to write it to CSV.
case class Event(map: Map[String,String])
def decodeEvent(arrByte: Array[Byte]): Event = ...//some implementation
val eventDataset: Dataset[Event] = spark
.readStream
…
I was migrating my code from Spark 2.0 to 2.1 when I stumbled on a problem related to DataFrame saving.
Here's the code
import org.apache.spark.sql.types._
import org.apache.spark.ml.linalg.VectorUDT
val df =…
I am trying to load a CSV (tab-delimited) file using Spark CSV in Scala.
What I observed is that if a column contains a newline character LF (\n),
Spark considers it the end of the line, even though we have double quotes on both sides of the column in the…
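For comparison, a quote-aware parser keeps the LF inside the quoted column even with a tab delimiter; here is a stdlib Python sketch with made-up values (not Spark, whose line-based CSV reader splits on the LF unless the multiLine option, available in newer versions, is enabled):

```python
import csv
import io

# hypothetical tab-delimited record with an LF inside a quoted column
data = 'XYZ\t"first\nsecond"\tend\n'

rows = list(csv.reader(io.StringIO(data), delimiter='\t'))
print(rows)  # a single record; the LF survives inside the second column
```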
I have a DataFrame to which I want to manually add a header and a first column. Here is the DataFrame:
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder.master("local").appName("my-spark-app").getOrCreate()
val df =…
I'm using pySpark to read and calculate statistics for a dataframe.
The dataframe looks like:
TRANSACTION_URL START_TIME END_TIME SIZE FLAG COL6 COL7 ...
www.google.com 20170113093210 20170113093210 150 1 …