Questions tagged [pyspark-schema]
68 questions
0 votes · 2 answers
Spark Merge schema, correcting datatypes (timestamp, string)
I was reading a Spark DF with the options below:
testDF = spark.read.format("parquet").option("header", "true") \
.option("mergeSchema", "true").option("inferSchema", "true").load("folderPath/*/*")
However, this fails because one of the col…

OneWorld · 952
0 votes · 0 answers
Error selecting an array type column from dataframe in PySpark
I am getting an error "Column does not exist" when selecting an array of structs type column from a dataframe. This column is actually present in the dataframe and contains data. I can select it by its index. How can I select it by its name?
Data…

bda · 372
0 votes · 1 answer
Convert SQL dataframe into nested Json format in pyspark
I have SQL output created from a parquet file, and I want to convert this SQL df into the below-mentioned format (structType/structField) using pyspark (not…

Pooja · 165
0 votes · 1 answer
How to rename a column in PySpark when it arrives under a different name in some files?
I have to rename a column whenever its name contains "address".
For example, in the first file I receive the columns as ADDRESS1, ADDRESS2, ADDRESS3;
for the next file I receive the column names as T_ADDRESS1, T_ADDRESS2,…
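One way to make the rename stable across files is a pure mapping function driven by a regular expression; all names here are illustrative:

```python
import re

def normalize_address_columns(columns):
    """Map every column whose name contains ADDRESS<n> (with any prefix,
    e.g. T_ADDRESS1) to the canonical ADDRESS<n>. Pure function, so the
    same mapping works file after file."""
    mapping = {}
    for c in columns:
        m = re.search(r"ADDRESS\d+", c, flags=re.IGNORECASE)
        mapping[c] = m.group(0).upper() if m else c
    return mapping

# Applying it to a DataFrame would then be:
# for old, new in normalize_address_columns(df.columns).items():
#     df = df.withColumnRenamed(old, new)
```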

DataWorld · 53
0 votes · 1 answer
How to convert decimal and alphanumeric values to integer type in PySpark?
Example, salesorgcode column (actual → required):
actual   required
6001.0   6001
9001.0   9001
7002.0   7002
A001     A001
T001     T001
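The usual trick is a regular expression that strips a trailing ".0" from numeric codes and leaves alphanumeric codes alone. A sketch in plain Python; the same pattern carries over to Spark via F.regexp_replace:

```python
import re

def to_sales_org_code(value: str) -> str:
    """Strip a trailing '.0' decimal part from numeric codes (6001.0 -> 6001)
    while leaving alphanumeric codes (A001) untouched."""
    return re.sub(r"\.0+$", "", value)

# In PySpark the equivalent column expression would be:
# F.regexp_replace("salesorgcode", r"\.0+$", "")
```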

KIRAN KUMAR · 7
0 votes · 1 answer
Using regular expression in pyspark to replace part of the key inside a column containing maps?
I am stuck on this problem.
I have a pyspark dataframe that looks as…

user8178045 · 1
0 votes · 0 answers
PySpark read in multiple files CSV or TSV
I'm trying to load all the files in a folder. They have the same schema, but sometimes a different delimiter (i.e. usually CSV, but occasionally tab-separated).
Is there a way to pass in two delimiters?
Being specific I don't want a two character…
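Spark's CSV reader takes exactly one sep per read, so one workable pattern is to sniff each file's delimiter first and read the files in two batches. A sketch of the detection step (stdlib only):

```python
import csv

def detect_delimiter(first_line: str) -> str:
    """Guess whether a file is comma- or tab-separated from its first line."""
    return csv.Sniffer().sniff(first_line, delimiters=",\t").delimiter

# One workable pattern: bucket the files by detected delimiter, read each
# bucket with spark.read.option("sep", d).csv(paths), then unionByName.
```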

WellyGus · 1
0 votes · 1 answer
Pyspark Edit Schema (json column)
I have the following dataframe, and the schema looks like this:
root
|-- nro_ot: decimal(12,0) (nullable = true)
|-- json_bcg: string (nullable = true)
The column "json_bcg" is just a string and I need to edit the schema to explore the…

King Blood · 23
0 votes · 1 answer
Reading JSON using Pyspark returns data frame full of nulls
I have the following JSON structure in a file that I want to read using PySpark:
[{'id': '34556',
'InsuranceProvider': 'sdcsdf',
'Type': {'Client': {'PaidIn': {'Insuranceid': '442211',
'Insurancedesc': 'sdfsdf vdsfs',
'purchaseditems':…
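This structure uses Python-style single quotes, which is not valid JSON, so Spark's reader yields nulls (or a _corrupt_record column). One repair path is converting the records before handing them to Spark; note also that a top-level array spanning multiple lines needs .option("multiLine", True). A stdlib sketch:

```python
import ast
import json

def python_literal_to_json(text: str) -> str:
    """Convert a Python-repr style record (single quotes) into valid JSON,
    which is what Spark's JSON reader otherwise turns into nulls."""
    return json.dumps(ast.literal_eval(text))

# After rewriting the file(s), the hypothetical read would be:
# spark.read.option("multiLine", True).json("fixed_path")
```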

Aman Mishra · 65
0 votes · 2 answers
Loading table in Databricks job converts all columns to lowercase
I have a SQL view stored in Databricks as a table and all of the columns are capitalised. When I load the table in a Databricks job using spark.table(<>), all of the columns are converted to lowercase which causes my code to crash.…
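If the casing cannot be fixed at the source (Spark SQL itself is case-insensitive by default, governed by spark.sql.caseSensitive), a defensive option is renaming the loaded columns back to the casing the code expects. A sketch with hypothetical names:

```python
def restore_casing(actual_columns, expected_columns):
    """Map lowercased column names back to the capitalised names the rest
    of the code expects (case-insensitive match)."""
    expected = {c.lower(): c for c in expected_columns}
    return {c: expected.get(c.lower(), c) for c in actual_columns}

# Applying the mapping to the loaded table would then be:
# for old, new in restore_casing(df.columns, ["CustomerId", "Amount"]).items():
#     df = df.withColumnRenamed(old, new)
```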

stosxri · 51
0 votes · 1 answer
JSON Formatting in PySpark
I have a json stored as string in the below format
{
'aaa':'',
'bbb':'',
'ccc':{
'ccc':[{dict of values}] //list of dictionaries
}
'ddd':'',
'eee':{
'eee':[{dict of values},{dict of values},{dict of values}] //list of…

Amaravathi Satya · 13
0 votes · 1 answer
Unable to create a new column from a list using the Spark concat method?
I have the below data frame, in which I am trying to create a new column by concatenating names from a list.
df =
+------+-----------+------+---+-----+
|  name| department| state| id| hash|
+------+-----------+------+---+-----+
| James|     Sales1|  null …

Adhi cloud · 39
0 votes · 0 answers
Nesting dataframe using pyspark
I am new to PySpark. I am trying to have multiple country entries in a single row, but I don't know the exact number of country fields I will get. So I want a row holding multiple pairs of country name and country capital according to the…

Shivam · 1
0 votes · 0 answers
PySpark scoped temporary view
I am using PySpark SQL to create temporary views from dataframes and to make data processing with them.
I created a Python service where users can hit some APIs, passing a dataframe and the SQL query to be applied to it to make the…

Samuele Ceroni · 1
0 votes · 1 answer
How to flatten nested struct using PySpark?
How to flatten nested struct using PySpark?
Link to dataset
https://drive.google.com/file/d/1-xOpd2B7MDgS1t4ekfipHSerIm6JMz9e/view?usp=sharing
Thanks in advance.

Raunak · 13