Questions tagged [pyspark-schema]

68 questions
0 votes · 0 answers

Is there a way in Spark SQL to use the mergeSchema option for a Parquet file?

I have a parquet table for which I get an error: FileReadException: Error while reading file dbfs:/mnt/gold/catalog.parquet/part-00120-tid-1146522170304013652-7e167102-3a27-46d7-b674-901496f37d84-353-1-c000.snappy.parquet. Parquet column cannot be…
DejanS · 96 · 9
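
A likely direction, sketched rather than confirmed: Parquet schema merging can be enabled per read via the DataFrame reader, or session-wide so it also applies to Spark SQL. The table root path below is taken from the error message.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # DataFrame API: merge the schemas of all part files at read time
    df = (spark.read
          .option("mergeSchema", "true")
          .parquet("dbfs:/mnt/gold/catalog.parquet"))

    # Spark SQL equivalent: the session config applies to SQL reads as well
    spark.sql("SET spark.sql.parquet.mergeSchema=true")
    spark.sql("SELECT * FROM parquet.`dbfs:/mnt/gold/catalog.parquet`").show()
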
0 votes · 0 answers

Convert some specific columns that have 0 and 1 values in Kafka messages to False and True in PySpark

Requirement: We are consuming messages from Kafka using PySpark. In these JSON messages, some keys have values such as 0 and 1. The requirement is to convert these 0s and 1s to False and True while…
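
One plausible approach, assuming the JSON has already been parsed into a DataFrame (the column names here are hypothetical): casting an integer column to boolean maps 0 to False and non-zero to True.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(0, 1), (1, 0)], ["is_active", "is_deleted"])

    # cast() turns 0/1 integers into False/True booleans
    for c in ["is_active", "is_deleted"]:
        df = df.withColumn(c, col(c).cast("boolean"))

    df.show()
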
0 votes · 2 answers

Read a nested JSON string and explode it into multiple columns in PySpark

I want to parse a JSON request and create multiple columns out of it in PySpark as follows: { "ID": "abc123", "device": "mobile", "Ads": [ { "placement": "topright", "Adlist": [ { "name": "ad1", …
Gingerbread · 1,938 · 8 · 22 · 36
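
A sketch of the usual from_json + explode pattern, based on the structure shown above; the fields past "name" are truncated in the question, so the schema here is deliberately partial.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, explode, col
    from pyspark.sql.types import StructType, StructField, StringType, ArrayType

    spark = SparkSession.builder.getOrCreate()

    schema = StructType([
        StructField("ID", StringType()),
        StructField("device", StringType()),
        StructField("Ads", ArrayType(StructType([
            StructField("placement", StringType()),
            StructField("Adlist", ArrayType(StructType([
                StructField("name", StringType()),
            ]))),
        ]))),
    ])

    raw = spark.createDataFrame(
        [('{"ID":"abc123","device":"mobile","Ads":[{"placement":"topright",'
          '"Adlist":[{"name":"ad1"}]}]}',)], ["json"])

    # parse the string, then explode each nested array one level at a time
    flat = (raw.select(from_json("json", schema).alias("r"))
            .select("r.ID", "r.device", explode("r.Ads").alias("ad"))
            .select("ID", "device", col("ad.placement"),
                    explode("ad.Adlist").alias("item"))
            .select("ID", "device", "placement", col("item.name")))
    flat.show()
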
0 votes · 1 answer

PySpark: Compare column values across different dataframes

We are planning to do the following: compare two dataframes, add values into the first dataframe based on the comparison, and then group by to get the combined data. We are using PySpark dataframes and the following are our dataframes. Dataframe1: | Manager |…
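
The question's tables are truncated, so this is only a sketch of the common pattern it describes (join on the shared key, derive a value from the comparison, then group); column names other than Manager are assumptions.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import when, col, sum as sum_

    spark = SparkSession.builder.getOrCreate()
    df1 = spark.createDataFrame([("Alice", 10), ("Bob", 20)], ["Manager", "value"])
    df2 = spark.createDataFrame([("Alice", 10), ("Bob", 99)], ["Manager", "value"])

    # join on the key, flag matching rows, then aggregate per Manager
    combined = (df1.join(df2.withColumnRenamed("value", "value2"), "Manager")
                .withColumn("matched",
                            when(col("value") == col("value2"), 1).otherwise(0))
                .groupBy("Manager")
                .agg(sum_("matched").alias("matches")))
    combined.show()
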
0 votes · 0 answers

How to resolve org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow?

While trying to read a file using PySpark I'm getting this error: org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 60459493. To avoid this, increase spark.kryoserializer.buffer.max value. Here is…
JG1 · 1 · 2
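
The error message names its own fix; a minimal sketch of applying it when building the session (512m is illustrative; it just needs to exceed the ~60 MB the error reports):

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .config("spark.serializer",
                     "org.apache.spark.serializer.KryoSerializer")
             .config("spark.kryoserializer.buffer.max", "512m")  # default is 64m
             .getOrCreate())
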
0 votes · 1 answer

Commas within a field in a file using PySpark

My data file contains column values that include commas: teledyne.com', 'Teledyne Technologies is a leading provider of sophisticated electronic components, instruments & communications products, including defense electronics, data acquisition &…
romi · 5 · 2
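
A hedged sketch, assuming the comma-bearing fields are wrapped in quotes as the sample suggests (the data shown appears to use single quotes); the file path is hypothetical.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    df = (spark.read
          .option("header", "true")
          .option("quote", "'")   # quote character wrapping fields with commas
          .option("escape", "'")
          .csv("/path/to/data.csv"))
    df.show()
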
0 votes · 1 answer

Weird behaviour in a PySpark dataframe

I have the following PySpark dataframe that contains two fields, ID and QUARTER: pandas_df = pd.DataFrame({"ID":[1, 2, 3,4, 5, 3,5,6,3,7,2,6,8,9,1,7,5,1,10],"QUARTER":[1, 1, 1, 1, 1,2,2,2,3,3,3,3,3,4,4,5,5,5,5]}) spark_df =…
Abdessamad139 · 325 · 4 · 16
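
The excerpt cuts off mid-assignment; completing the setup it shows is an assumption, but the usual construction would be:

    import pandas as pd
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    pandas_df = pd.DataFrame({
        "ID": [1, 2, 3, 4, 5, 3, 5, 6, 3, 7, 2, 6, 8, 9, 1, 7, 5, 1, 10],
        "QUARTER": [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 5, 5, 5, 5]})
    spark_df = spark.createDataFrame(pandas_df)   # hypothetical completion
    spark_df.show()
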
0 votes · 0 answers

How does PySpark allow columns with special characters?

The dataframe df_problematic in PySpark has the following columns: +------------+-----------+------------+ |sepal@length|sepal.width|petal_length| +------------+-----------+------------+ | 5.1| 3.5| 1.4| | 4.9| …
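
A short sketch of both usual answers: backticks let you reference names containing dots or other special characters, and renaming sidesteps the problem entirely.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.getOrCreate()
    df_problematic = spark.createDataFrame(
        [(5.1, 3.5, 1.4), (4.9, 3.0, 1.4)],
        ["sepal@length", "sepal.width", "petal_length"])

    # backticks are required for "sepal.width": without them Spark parses
    # the dot as struct-field access
    df_problematic.select(col("`sepal@length`"), col("`sepal.width`")).show()

    # or normalize the names once up front
    clean = df_problematic.toDF(*[c.replace("@", "_").replace(".", "_")
                                  for c in df_problematic.columns])
    clean.show()
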
0 votes · 0 answers

PySpark stream from Kafka topic with Avro format returns null dataframe

I have a topic in Avro format and I want to read it as a stream in PySpark but the output is null. My data is like this: { "ID": 559, "DueDate": 1676362642000, "Number": 1, "__deleted": "false" } and the schema in the Schema Registry…
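
A hedged sketch of the most common cause: messages produced through the Confluent Schema Registry carry a 5-byte header (magic byte plus schema id) that Spark's from_avro does not expect, so decoding yields nulls until the header is skipped. The topic name and bootstrap servers are placeholders, and the schema string is reconstructed from the sample record.

    # requires the spark-avro and spark-sql-kafka packages on the classpath
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import expr
    from pyspark.sql.avro.functions import from_avro

    spark = SparkSession.builder.getOrCreate()

    avro_schema = """{"type":"record","name":"rec","fields":[
      {"name":"ID","type":"long"},
      {"name":"DueDate","type":"long"},
      {"name":"Number","type":"int"},
      {"name":"__deleted","type":"string"}]}"""

    stream = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "localhost:9092")
              .option("subscribe", "my_topic")
              .load()
              # skip the 5-byte Confluent header before decoding
              .select(from_avro(expr("substring(value, 6, length(value) - 5)"),
                                avro_schema).alias("data"))
              .select("data.*"))
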
0 votes · 1 answer

Spark incorrectly interprets data type from CSV as Double when a string ends with 'd'

There is a CSV with a column ID (format: 8 digits & "D" at the end). When reading the CSV with .option("inferSchema", "true"), it returns the data type as double and trimmed the…
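
A minimal sketch of the usual fix: skip inferSchema and declare the ID column as a string, since the inference step reads a trailing "D" as a double-literal suffix. The file path is hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType

    spark = SparkSession.builder.getOrCreate()

    schema = StructType([StructField("ID", StringType())])
    df = (spark.read
          .option("header", "true")
          .schema(schema)   # explicit schema: "12345678D" stays a string
          .csv("/path/to/file.csv"))
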
0 votes · 1 answer

AttributeError: 'DataFrameWriter' object has no attribute 'schema'

I would like to write a Spark DataFrame with a fixed schema. I am trying this: from pyspark.sql.types import StructType, IntegerType, DateType, DoubleType, StructField my_schema = StructType([ StructField("seg_gs_eur_am", DoubleType()), …
Enrique Benito Casado · 1,914 · 1 · 20 · 40
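
.schema() exists on DataFrameReader, not on DataFrameWriter, which is why the call fails. A sketch of one way to get the intended effect: make the DataFrame conform to the schema first, then write it (the output path is hypothetical).

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col
    from pyspark.sql.types import StructType, StructField, DoubleType

    spark = SparkSession.builder.getOrCreate()

    my_schema = StructType([StructField("seg_gs_eur_am", DoubleType())])
    df = spark.createDataFrame([("1.5",)], ["seg_gs_eur_am"])

    # cast every column to the type declared in my_schema...
    conformed = df.select(*[col(f.name).cast(f.dataType)
                            for f in my_schema.fields])

    # ...then write; the writer inherits the DataFrame's schema
    conformed.write.mode("overwrite").parquet("/tmp/out")
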
0 votes · 1 answer

Read Excel files in a directory using PySpark

Hi, I am trying to read Excel files in a directory using PySpark but I am getting a FileNotFound error. env_path='dbfs:/mnt' raw='dev/raw/work1' path=env_path+raw file_path=path+'/' objects = dbutils.fs.ls(file_path) for file_name in objects: if…
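
Two hedged observations: env_path + raw concatenates to 'dbfs:/mntdev/raw/work1' (no slash between the parts), which alone would explain a file-not-found error; and Spark has no built-in Excel reader, so a library such as com.crealytics:spark-excel must be attached to the cluster. A sketch under those assumptions:

    # Databricks-style sketch; dbutils is only available on Databricks
    env_path = "dbfs:/mnt"
    raw = "/dev/raw/work1"        # note the leading slash
    file_path = env_path + raw + "/"

    for f in dbutils.fs.ls(file_path):
        if f.name.endswith(".xlsx"):
            df = (spark.read
                  .format("com.crealytics.spark.excel")  # spark-excel library
                  .option("header", "true")
                  .load(f.path))
            df.show()
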
0 votes · 1 answer

Read multiple CSVs with different headers into one single dataframe

I have a few CSV files where some files might have some matching columns and some have altogether different columns. For example, file 1 has the following columns: ['circuitId', 'circuitRef', 'name', 'location', 'country', 'lat', 'lng', 'alt',…
Ankit Tyagi · 175 · 2 · 17
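
A minimal sketch using unionByName with allowMissingColumns (available since Spark 3.1), which fills columns absent from one file with nulls; the paths are hypothetical.

    from functools import reduce
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    paths = ["/data/file1.csv", "/data/file2.csv", "/data/file3.csv"]
    dfs = [spark.read.option("header", "true").csv(p) for p in paths]

    # columns missing on either side are filled with nulls
    merged = reduce(lambda a, b: a.unionByName(b, allowMissingColumns=True), dfs)
    merged.show()
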
0 votes · 0 answers

I have a sample JSON where a key's data type is given as its value in string format, which I want to read and save to a PySpark dataframe

Below is a piece of a sample JSON schema. I want my PySpark dataframe to read netWorthOfTheCompany as a column and float as its data type. But currently when I read the JSON schema and save it in a dataframe & print(df.dtypes) it prints as string as it…
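
A hedged sketch of one way to read it: load the type-name mapping with plain json, translate names like "float" into Spark types, and build a StructType from the result. Only the one field from the question is shown; the rest of the mapping is an assumption.

    import json
    from pyspark.sql import SparkSession
    from pyspark.sql.types import (StructType, StructField, FloatType,
                                   StringType, IntegerType)

    spark = SparkSession.builder.getOrCreate()

    # hypothetical mapping from type names in the JSON to Spark types
    type_map = {"float": FloatType(), "string": StringType(),
                "int": IntegerType()}

    schema_json = '{"netWorthOfTheCompany": "float"}'
    fields = [StructField(name, type_map[type_name])
              for name, type_name in json.loads(schema_json).items()]
    schema = StructType(fields)

    df = spark.createDataFrame([], schema)
    print(df.dtypes)   # [('netWorthOfTheCompany', 'float')]
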
0 votes · 0 answers

PySpark: Distinct records from the string column considering Null values in groupby

I have a dataframe like the following: rdd = sc.parallelize([(22,'fl1.variant,fl2.variant,fl3.control','xxx','yyy','zzz'),(22,'fl1.variant, fl2.neither,fl3.control','xxx','yyy','NULL'), (22,'fl1.variant,…
shaa · 17 · 6
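
A hedged sketch of one reading of the question: split the comma-separated flag string, explode it, map the literal string 'NULL' to a real null, and collect the distinct flags per group. Column names are assumptions since the question shows only raw tuples.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import split, explode, trim, col, when, collect_set

    spark = SparkSession.builder.getOrCreate()

    df = spark.createDataFrame(
        [(22, 'fl1.variant,fl2.variant,fl3.control', 'xxx', 'yyy', 'zzz'),
         (22, 'fl1.variant, fl2.neither,fl3.control', 'xxx', 'yyy', 'NULL')],
        ["id", "flags", "c1", "c2", "c3"])

    result = (df
              .withColumn("c3", when(col("c3") == "NULL", None)
                          .otherwise(col("c3")))                  # real nulls
              .withColumn("flag", explode(split("flags", ",")))   # one row per flag
              .withColumn("flag", trim(col("flag")))              # strip stray spaces
              .groupBy("id")
              .agg(collect_set("flag").alias("distinct_flags")))
    result.show(truncate=False)
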