Questions tagged [pyspark-schema]

68 questions
0
votes
2 answers

Spark Merge schema, correcting datatypes (timestamp, string)

I was reading a Spark DF with the options below:
testDF = spark.read.format("parquet").option("header", "true") \
    .option("mergeSchema", "true").option("inferSchema", "true") \
    .load("folderPath/*/*")
However, this fails because one of the col…
OneWorld
  • 952
  • 2
  • 8
  • 21
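When mergeSchema fails on a type conflict (commonly timestamp vs. string across partitions), one usual workaround is to read the conflicting folders separately, cast the offending column to a common type, and union the results. This is a sketch, not the accepted answer; the paths and the column name `event_ts` are assumptions for illustration.

```python
# Sketch: resolve a mergeSchema type conflict by casting to a common type.
def common_type(t1: str, t2: str) -> str:
    """Pick a type both conflicting columns can be safely cast to."""
    if t1 == t2:
        return t1
    # string can represent either side's values losslessly
    return "string"

# Hypothetical Spark usage (paths and column name are assumptions):
# df_old = spark.read.parquet("folderPath/2022/*")
# df_new = spark.read.parquet("folderPath/2023/*")
# t = common_type(dict(df_old.dtypes)["event_ts"], dict(df_new.dtypes)["event_ts"])
# merged = (df_old.withColumn("event_ts", df_old["event_ts"].cast(t))
#           .unionByName(df_new.withColumn("event_ts", df_new["event_ts"].cast(t)),
#                        allowMissingColumns=True))
```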
0
votes
0 answers

Error selecting an array type column from dataframe in PySpark

I am getting an error "Column does not exist" when selecting an array of structs type column from a dataframe. This column is actually present in the dataframe and contains data. I can select it by its index. How can I select it by its name? Data…
bda
  • 372
  • 1
  • 7
  • 22
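A column that is visible in `df.columns` but "does not exist" by name (yet works by index) often carries hidden whitespace, or contains dots that Spark parses as struct access. A minimal sketch, assuming that cause; the name `purchased.items` is illustrative only:

```python
# Sketch: recover the literal column name, ignoring stray surrounding spaces.
def resolve_column(columns, wanted):
    """Return the actual column name matching `wanted`, or None."""
    for c in columns:
        if c == wanted or c.strip() == wanted:
            return c
    return None

# Hypothetical Spark usage: backtick-quote the literal name so dots and
# spaces are taken verbatim rather than parsed:
# real = resolve_column(df.columns, "purchased.items")
# df.select(f"`{real}`")
```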
0
votes
1 answer

Convert SQL dataframe into nested Json format in pyspark

I have SQL output that I am creating from a parquet file. I want to convert this SQL df into the below-mentioned format (StructType/StructField) using PySpark (not…
Pooja
  • 165
  • 4
  • 14
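Nesting flat columns into struct fields is typically done with `struct()` before serializing to JSON. Below is a pure-Python sketch of the reshaping, with the Spark version in comments; the column names (`id`, `city`, `zip`, `addr`) are assumptions, not from the question:

```python
# Sketch: group flat keys into nested dicts, e.g. {'addr': ['city', 'zip']}.
def nest_row(row: dict, groups: dict) -> dict:
    out = {k: v for k, v in row.items()
           if all(k not in cols for cols in groups.values())}
    for name, cols in groups.items():
        out[name] = {c: row[c] for c in cols}
    return out

# Hypothetical Spark version:
# from pyspark.sql import functions as F
# nested = df.select("id", F.struct("city", "zip").alias("addr"))
# nested.toJSON().first()  # one nested JSON document per row
```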
0
votes
1 answer

How to rename a column in PySpark when it comes with a different name in some files?

I have to rename a column whenever its name contains "address". For example, in the first file I receive the columns ADDRESS1, ADDRESS2, ADDRESS3; in the next file I receive the column names T_ADDRESS1, T_ADDRESS2,…
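One way to normalize such names is to compute a rename map from `df.columns` and apply it with `withColumnRenamed`. A sketch, assuming the variants always end in `ADDRESS<n>`:

```python
import re

# Sketch: map any *ADDRESS<n> variant (e.g. T_ADDRESS1) back to ADDRESS<n>.
def normalize_address_cols(columns):
    mapping = {}
    for c in columns:
        m = re.search(r"ADDRESS(\d+)$", c, re.IGNORECASE)
        if m:
            new = f"ADDRESS{m.group(1)}"
            if c != new:           # skip columns already in canonical form
                mapping[c] = new
    return mapping

# Hypothetical Spark usage:
# for old, new in normalize_address_cols(df.columns).items():
#     df = df.withColumnRenamed(old, new)
```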
0
votes
1 answer

Is there any solution for converting decimal and alphanumeric values to integer type in PySpark?

Ex (column salesorgcode), actual → required:
6001.0 → 6001
9001.0 → 9001
7002.0 → 7002
A001 → A001
T001 → T001
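Because codes like "A001" would fail a true integer cast, one approach is to keep the column as string and only strip the trailing ".0" from numeric values. A sketch of that rule, with a hypothetical Spark equivalent in comments:

```python
# Sketch: normalize "6001.0" -> "6001", pass "A001" through unchanged.
def clean_code(value: str) -> str:
    try:
        f = float(value)
        if f.is_integer():
            return str(int(f))
    except ValueError:
        pass  # not numeric: keep as-is (e.g. "A001")
    return value

# Hypothetical Spark version using regexp_replace to strip a trailing ".0":
# from pyspark.sql import functions as F
# df = df.withColumn("salesorgcode",
#                    F.regexp_replace("salesorgcode", r"\.0$", ""))
```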
0
votes
1 answer

Using regular expression in pyspark to replace part of the key inside a column containing maps?

I am stuck on this problem. I have a PySpark dataframe that looks as…
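For rewriting part of every key in a MapType column, Spark 3.x offers `transform_keys` combined with `regexp_replace`. Below is the pure-Python core of the idea plus a hypothetical Spark call; the column name `props` and the `prefix_` pattern are assumptions:

```python
import re

# Sketch: apply a regex substitution to every key of a map.
def rewrite_keys(m: dict, pattern: str, repl: str) -> dict:
    return {re.sub(pattern, repl, k): v for k, v in m.items()}

# Hypothetical Spark 3.x version:
# from pyspark.sql import functions as F
# df = df.withColumn("props",
#     F.transform_keys("props",
#                      lambda k, v: F.regexp_replace(k, r"^prefix_", "")))
```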
0
votes
0 answers

PySpark read in multiple files CSV or TSV

I'm trying to load all the files in a folder. They have the same schema, but sometimes a different delimiter (i.e. usually CSV, but occasionally tab-separated). Is there a way to pass in two delimiters? To be specific, I don't want a two-character…
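Spark's CSV reader accepts a single `sep`, so one workable pattern is to detect the delimiter per file, read each file (or group) with its own separator, and union the results. A sketch with a deliberately naive detector; `file_paths` is a hypothetical list:

```python
# Sketch: whichever of tab/comma appears more often in the first line wins.
def detect_delimiter(first_line: str) -> str:
    return "\t" if first_line.count("\t") > first_line.count(",") else ","

# Hypothetical Spark usage:
# parts = []
# for path in file_paths:
#     sep = detect_delimiter(open(path).readline())
#     parts.append(spark.read.option("sep", sep)
#                            .option("header", "true").csv(path))
# df = parts[0]
# for p in parts[1:]:
#     df = df.unionByName(p)
```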
0
votes
1 answer

Pyspark Edit Schema (json column)

I have the following dataframe, and the schema looks like this:
root
 |-- nro_ot: decimal(12,0) (nullable = true)
 |-- json_bcg: string (nullable = true)
The column "json_bcg" is just a string and I need to edit the schema to explore the…
King Blood
  • 23
  • 5
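A JSON-in-a-string column is usually decoded with `from_json` plus an explicit schema, which adds a real struct column rather than editing the schema in place. Pure-Python core of the decode below; the field name `status` in the Spark comment is an assumption:

```python
import json

# Sketch: the decode step that from_json performs per row.
def decode(json_str: str) -> dict:
    return json.loads(json_str)

# Hypothetical Spark version:
# from pyspark.sql import functions as F, types as T
# schema = T.StructType([T.StructField("status", T.StringType())])
# df = df.withColumn("bcg", F.from_json("json_bcg", schema))
# df.select("nro_ot", "bcg.status")
```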
0
votes
1 answer

Reading JSON using Pyspark returns data frame full of nulls

I have the following JSON structure in a file that I want to read using PySpark: [{'id': '34556', 'InsuranceProvider': 'sdcsdf', 'Type': {'Client': {'PaidIn': {'Insuranceid': '442211', 'Insurancedesc': 'sdfsdf vdsfs', 'purchaseditems':…
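Two common causes of an all-null result here: the file uses single quotes, which is not valid JSON, so every record fails to parse; and a pretty-printed JSON array needs the `multiLine` option. A sketch illustrating the first cause:

```python
import json

# Sketch: strict JSON parsers reject single-quoted records like the one shown.
record = "{'id': '34556'}"   # single quotes, as in the question
try:
    json.loads(record)
    parsed = True
except json.JSONDecodeError:
    parsed = False           # this branch is taken: not valid JSON

# Hypothetical Spark fix for a pretty-printed JSON array file (assumes the
# file has been corrected to use double quotes):
# df = spark.read.option("multiLine", "true").json("path/to/file.json")
```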
0
votes
2 answers

Loading table in Databricks job converts all columns to lowercase

I have a SQL view stored in Databricks as a table and all of the columns are capitalised. When I load the table in a Databricks job using spark.table(<>), all of the columns are converted to lowercase which causes my code to crash.…
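The metastore commonly stores table column names lowercased. If the original capitalization is known ahead of time, one workaround (a sketch, not a root-cause fix) is to rename the columns back after loading; `CANONICAL_COLS` and the view name are hypothetical:

```python
# Sketch: map lowercased names back to their canonical capitalization.
def restore_case(lower_cols, canonical):
    by_lower = {c.lower(): c for c in canonical}
    return [by_lower.get(c, c) for c in lower_cols]

# Hypothetical Spark usage:
# df = spark.table("my_view")
# df = df.toDF(*restore_case(df.columns, CANONICAL_COLS))
```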
0
votes
1 answer

JSON Formatting in PySpark

I have a json stored as string in the below format:
{
 'aaa': '',
 'bbb': '',
 'ccc': { 'ccc': [{dict of values}] },  //list of dictionaries
 'ddd': '',
 'eee': { 'eee': [{dict of values},{dict of values},{dict of values}] }  //list of…
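Since the string uses single quotes, `json.loads` (and Spark's JSON parsing) will reject it; for trusted input, `ast.literal_eval` can read the Python-style literal, after which it can be re-serialized as strict JSON. A sketch, with the payload shape assumed from the excerpt:

```python
import ast
import json

# Sketch: normalize a single-quoted, Python-style literal into strict JSON.
text = "{'aaa': '', 'ccc': {'ccc': [{'k': 1}]}}"   # shape assumed from excerpt
data = ast.literal_eval(text)        # tolerates single quotes
strict = json.dumps(data)            # now valid JSON

# Hypothetical Spark decode of the normalized string column:
# from pyspark.sql import functions as F
# df = df.withColumn("parsed", F.from_json("payload", schema))
```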
0
votes
1 answer

Unable to create a new column from a list using spark concat method?

I have the below data frame, in which I am trying to create a new column by concatenating name from a list:
df =
| name | department | state | id | hash
------+------------+-------+----+-----
James | Sales1     | null …
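A likely trap with `concat()` is that it returns null if ANY input is null, which matters here since `state` is null; `concat_ws` skips nulls instead. Pure-Python core of the `concat_ws` semantics, with the Spark call in comments:

```python
# Sketch: join non-null parts with a separator, as concat_ws does.
def concat_ws(sep, *parts):
    return sep.join(p for p in parts if p is not None)

# Hypothetical Spark version (column names taken from the excerpt):
# from pyspark.sql import functions as F
# df = df.withColumn("hash", F.concat_ws("-", "name", "department", "state"))
```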
0
votes
0 answers

Nesting dataframe using pyspark

I am new to PySpark. I am trying to have multiple countries' data in a single row. I don't know the exact number of country fields I will get, so I want to have a row with multiple pairs of country name and country capital according to the…
Shivam
  • 1
  • 1
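A variable number of countries per row is usually modeled as an array of structs built with `collect_list(struct(...))` rather than a fixed set of columns. Pure-Python equivalent of the grouping; the grouping key `region` and column names in the Spark comment are assumptions:

```python
from collections import defaultdict

# Sketch: group (key, name, capital) tuples into one list per key.
def group_countries(rows):
    out = defaultdict(list)
    for key, name, capital in rows:
        out[key].append({"name": name, "capital": capital})
    return dict(out)

# Hypothetical Spark version:
# from pyspark.sql import functions as F
# df.groupBy("region").agg(
#     F.collect_list(F.struct("country_name", "country_capital"))
#      .alias("countries"))
```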
0
votes
0 answers

PySpark scoped temporary view

I am using PySpark SQL to create temporary views from dataframes and to do data processing with them. I created a Python service where a user can hit some APIs, passing the dataframe and the SQL query to be applied to it, to make the…
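`createOrReplaceTempView` is already scoped to a SparkSession, so per-request isolation usually means one session per request and/or collision-free view names that are dropped afterwards. A sketch of the naming part; the view-name scheme is an assumption:

```python
import uuid

# Sketch: generate a collision-free temp view name for one API request.
def scoped_view_name(base: str) -> str:
    return f"{base}_{uuid.uuid4().hex}"

# Hypothetical Spark usage:
# name = scoped_view_name("user_df")
# df.createOrReplaceTempView(name)
# result = spark.sql(query.replace("{view}", name))
# spark.catalog.dropTempView(name)
```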
0
votes
1 answer

How to flatten nested struct using PySpark?

How to flatten nested struct using PySpark? Link to dataset https://drive.google.com/file/d/1-xOpd2B7MDgS1t4ekfipHSerIm6JMz9e/view?usp=sharing Thanks in advance.
Raunak
  • 13
  • 4
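The usual approach walks the schema recursively and selects leaf fields under dotted paths, aliasing `parent.child` to `parent_child`. Below is the pure-Python version of that flattening on a nested dict, with the Spark recursion described in comments:

```python
# Sketch: flatten nested dicts, joining key paths with underscores.
def flatten(d, prefix=""):
    out = {}
    for k, v in d.items():
        key = f"{prefix}{k}"
        if isinstance(v, dict):
            out.update(flatten(v, key + "_"))
        else:
            out[key] = v
    return out

# Hypothetical Spark version: recurse over df.schema; for every StructType
# field, select col("parent.child").alias("parent_child"), and repeat until
# no struct columns remain.
```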