I want to upload a DataFrame to a server as a CSV file with gzip compression, without saving it to disk.
It is easy to build a gzip-compressed CSV file using the spark-csv library:
df.write
  .format("com.databricks.spark.csv")
  .option("header",…
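For data small enough to collect to the driver, one way to avoid the disk entirely is to render the CSV and gzip it in memory with plain Java streams. A rough sketch; toGzippedCsvBytes is a made-up helper, and the mkString rendering does no CSV quoting or escaping:

import java.io.ByteArrayOutputStream
import java.util.zip.GZIPOutputStream

import org.apache.spark.sql.DataFrame

def toGzippedCsvBytes(df: DataFrame): Array[Byte] = {
  val header = df.columns.mkString(",")
  val rows   = df.collect().map(_.mkString(","))  // naive: no quoting or escaping
  val csv    = (header +: rows).mkString("\n")

  // Gzip the CSV text entirely in memory.
  val buffer = new ByteArrayOutputStream()
  val gzip   = new GZIPOutputStream(buffer)
  gzip.write(csv.getBytes("UTF-8"))
  gzip.close()
  buffer.toByteArray  // hand these bytes to whatever HTTP client does the upload
}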
For example, if I have something like
"I like, cookies"
I do NOT want Spark's read csv method to split between "like" and "cookies". I want it to be parsed as "I like, cookies". I thought this was standard CSV practice, but this is being…
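Spark's CSV reader does respect double quotes by default, so "I like, cookies" should stay one field. A sketch making the relevant options explicit, in case the file escapes quotes RFC-4180 style (the path is a placeholder):

val df = spark.read
  .option("header", "true")
  .option("quote", "\"")   // the default quote character, stated explicitly
  .option("escape", "\"")  // treat "" inside a quoted field as a literal quote
  .csv("data.csv")         // hypothetical path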
I am facing an issue while reading a CSV file using Spark with the multiline option set to true. Is there any criterion for when we should set multiline to true or false?
I am using Windows 10, Scala 2.11.11, and Spark 2.2.0.
The dataset that I am using to test…
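The usual criterion: set multiLine to true only when a quoted field can span more than one physical line; otherwise leave it at the default false, since multiLine stops Spark from splitting the file for parallel reads. A sketch, with a placeholder path:

// multiLine handles records like:  1,"first line
//                                   second line",3
val df = spark.read
  .option("header", "true")
  .option("multiLine", "true")  // needed only when quoted fields contain embedded newlines
  .csv("input.csv")             // hypothetical path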
I have a CSV file where the last column is inside parentheses and its values are separated by commas. The number of values in the last column is variable. When I read it into a DataFrame with some column names as follows, I get Exception in thread…
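Since the CSV parser has no notion of parentheses, one workaround is to read each line whole and carve out the tail with a regex. A sketch assuming, purely for illustration, two fixed columns before the parenthesised part:

import org.apache.spark.sql.functions.{col, regexp_extract}

// Assumes lines shaped like:  a,b,(x,y,z)
val pattern = "^([^,]*),([^,]*),\\((.*)\\)$"  // assumed: two fixed columns, then the tail
val lines   = spark.read.text("input.csv")    // hypothetical path

val df = lines.select(
  regexp_extract(col("value"), pattern, 1).as("c1"),
  regexp_extract(col("value"), pattern, 2).as("c2"),
  regexp_extract(col("value"), pattern, 3).as("rest")  // commas inside () stay intact
)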
I have this CSV file, which contains descriptions of several cities:
Cities_information_extract.csv
I can parse this file just fine using Python's pandas.read_csv or R's read.csv. Both return 693 rows and 25 columns.
I am trying,…
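pandas.read_csv and R's read.csv tolerate embedded newlines and doubled quotes inside quoted fields by default, while Spark has to be told; a combination like this often closes the gap:

val cities = spark.read
  .option("header", "true")
  .option("multiLine", "true")  // pandas and R tolerate newlines inside quoted fields
  .option("quote", "\"")
  .option("escape", "\"")       // RFC 4180 style "" escaping, which read_csv assumes
  .csv("Cities_information_extract.csv")

cities.count()  // should come back as 693 if the options match the file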
When I read the CSV file with spark-csv and inferSchema=true, I am able to get the count of the DataFrame (df.count).
But after I removed the spaces in the column names, created a new schema, and created a new DataFrame with the help of the first…
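One way to sidestep a hand-written second schema is to derive the cleaned names from the inferred one, which keeps the inferred types intact; a sketch:

// Rename every column in place, replacing whitespace with underscores.
val cleaned = df.columns.foldLeft(df) { (acc, name) =>
  acc.withColumnRenamed(name, name.replaceAll("\\s", "_"))
}
cleaned.count()  // should match the original df.count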
I have messages in a non-standard Kafka format, so the code looks like the following:
val df: Dataset[String] = spark
  .readStream
  .format("kafka")
  .option("subscribe", topic)
  .options(kafkaParams)
  .load()
  .select($"value".as[Array[Byte]])
…
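With a non-standard payload, the decoding has to happen in a map over the raw bytes. A sketch that picks up right after the select, with rawBytes standing in for that Dataset[Array[Byte]]; the \u0001 delimiter and the Record shape are illustrative assumptions (spark.implicits._ must be in scope for the encoder):

case class Record(id: String, payload: String)  // hypothetical target shape

val records = rawBytes.map { bytes =>
  // Decode the message text, then split on the assumed field delimiter.
  val fields = new String(bytes, "UTF-8").split('\u0001')
  Record(fields(0), fields(1))
}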
I'm trying to write a DataFrame to a *.csv file on HDFS using Databricks' spark-csv_2.10 dependency. The dependency seems to work fine, as I'm able to read a .csv file into a DataFrame. But when I perform a write, I get the following error. The…
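Without the full error it is hard to diagnose, but a minimal write like the following is known to work with spark-csv 1.x; if this fails too, the cause is likely environmental, e.g. the _2.10 artifact on a Scala 2.11 cluster, rather than the call itself (the path is a placeholder):

df.write
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .save("hdfs:///tmp/out_csv")  // hypothetical HDFS path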
I am trying to write a Spark DataFrame with an array-of-strings column to a CSV file. I followed the instructions provided on the site here.
But my column also contains nulls. How can I handle the nulls and write the DataFrame to a file?
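The CSV writer cannot serialise an ArrayType column directly, so one approach is to flatten it to a delimited string first, guarding against null arrays. A sketch, with "tags" as a stand-in for the array column and a placeholder output path:

import org.apache.spark.sql.functions.{col, concat_ws, lit, when}

// Join array elements with "|"; null arrays become an empty string instead of failing.
val flattened = df.withColumn(
  "tags",
  when(col("tags").isNotNull, concat_ws("|", col("tags"))).otherwise(lit(""))
)
flattened.write.option("header", "true").csv("out")  // hypothetical path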
I have a CSV file:
1577,true,false,false,false,true
I tried to load the CSV file with a custom schema:
val customSchema = StructType(Array(
  StructField("id", StringType, nullable = false),
  StructField("flag1", BooleanType, nullable =…
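Judging from the sample row (one id plus five booleans), a completed version of the schema might look like this; the flag2–flag5 names are guesses, and Spark parses the unquoted true/false tokens as booleans once the types say so:

import org.apache.spark.sql.types._

val customSchema = StructType(Array(
  StructField("id",    StringType,  nullable = false),
  StructField("flag1", BooleanType, nullable = true),
  StructField("flag2", BooleanType, nullable = true),  // assumed name
  StructField("flag3", BooleanType, nullable = true),  // assumed name
  StructField("flag4", BooleanType, nullable = true),  // assumed name
  StructField("flag5", BooleanType, nullable = true)   // assumed name
))

val df = spark.read
  .schema(customSchema)
  .csv("flags.csv")  // hypothetical path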
I am writing a data frame using the spark-csv library on Spark 1.6. I was wondering if there is a way to specify the newline character; usually, I think, it is \n.
If not, is there a good solution for changing the newline character?…
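I know of no line-separator option in spark-csv on 1.6. One workaround for Windows-style output: append "\r" to each rendered record, so the "\n" that saveAsTextFile always adds turns every terminator into "\r\n" (naive rendering, no quoting; path is a placeholder):

// Render rows as CSV text manually, then force CRLF terminators.
df.rdd.map(_.mkString(",") + "\r").saveAsTextFile("out_crlf")

// On Spark 3.0+ the CSV writer takes a (single-character) lineSep option instead:
// df.write.option("lineSep", "\r").csv("out")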
I am trying to merge all the Spark output part files in a directory into a single file, in Scala.
Here is my code:
import org.apache.spark.sql.functions.input_file_name
import org.apache.spark.sql.functions.regexp_extract
def merge(srcPath:…
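One common way to finish a merge() like this on Hadoop 2.x is FileUtil.copyMerge, which concatenates every part file under the source directory into a single destination file; a sketch:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

def merge(srcPath: String, dstPath: String): Unit = {
  val conf = new Configuration()
  val fs   = FileSystem.get(conf)
  // Concatenate all part files under srcPath into the single file dstPath.
  FileUtil.copyMerge(fs, new Path(srcPath), fs, new Path(dstPath),
    false,  // keep the source part files
    conf, null)
}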
I am doing a join on two data frames holding 280 GB and 1 GB of data respectively.
My actual Spark job, which computes the join, is fast, but the shuffle read and write take a very long time, and that makes the overall Spark job very slow.
I am using m3.2xlarge 10…
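With only 1 GB on one side, a broadcast (map-side) join can remove the shuffle of the 280 GB table entirely, provided the executors have room to hold the broadcast (you may also need to raise spark.sql.autoBroadcastJoinThreshold). A sketch, with bigDf/smallDf and "key" as placeholders:

import org.apache.spark.sql.functions.broadcast

// Ship the small table to every executor instead of shuffling the big one.
val joined = bigDf.join(broadcast(smallDf), Seq("key"))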
I have to count the records per partition in a Spark data frame and then write the output to an XML file.
Here is my data frame:
dfMainOutputFinalWithoutNull.coalesce(1).write.partitionBy("DataPartition","StatementTypeCode")
…
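Since the written partitions are keyed by those two columns, a groupBy on the same columns gives the per-partition record counts, which can then go out through the Databricks spark-xml package; a sketch with placeholder tag names and output path:

// One row per (DataPartition, StatementTypeCode) pair, with its record count.
val counts = dfMainOutputFinalWithoutNull
  .groupBy("DataPartition", "StatementTypeCode")
  .count()

counts.coalesce(1).write
  .format("com.databricks.spark.xml")
  .option("rootTag", "counts")    // hypothetical tag names
  .option("rowTag", "partition")
  .save("counts_xml")             // hypothetical path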