Questions tagged [pyspark-schema]

68 questions
0
votes
2 answers

Spark Merge schema, correcting datatypes (timestamp, string)

I was reading a Spark DF with the options below:
testDF = spark.read.format("parquet").option("header", "true") \
    .option("mergeSchema", "true").option("inferSchema", "true") \
    .load("folderPath/*/*")
However, this fails because one of the col…
OneWorld
  • 952
  • 2
  • 8
  • 21
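When mergeSchema fails on a type conflict (commonly timestamp vs. string across partitions), one usual workaround is to read the conflicting folders separately, cast the offending column to a common type, and union the results. This is a sketch, not the accepted answer; the paths and the column name `event_ts` are assumptions for illustration.

```python
# Sketch: resolve a mergeSchema type conflict by casting to a common type.
def common_type(t1: str, t2: str) -> str:
    """Pick a type both conflicting columns can be safely cast to."""
    if t1 == t2:
        return t1
    # string can represent either side's values losslessly
    return "string"

# Hypothetical Spark usage (paths and column name are assumptions):
# df_old = spark.read.parquet("folderPath/2022/*")
# df_new = spark.read.parquet("folderPath/2023/*")
# t = common_type(dict(df_old.dtypes)["event_ts"], dict(df_new.dtypes)["event_ts"])
# merged = (df_old.withColumn("event_ts", df_old["event_ts"].cast(t))
#           .unionByName(df_new.withColumn("event_ts", df_new["event_ts"].cast(t)),
#                        allowMissingColumns=True))
```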
0
votes
0 answers

Error selecting an array type column from dataframe in PySpark

I am getting an error "Column does not exist" when selecting an array of structs type column from a dataframe. This column is actually present in the dataframe and contains data. I can select it by its index. How can I select it by its name? Data…
bda
  • 372
  • 1
  • 7
  • 22
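A column that is visible in `df.columns` but "does not exist" by name (yet works by index) often carries hidden whitespace, or contains dots that Spark parses as struct access. A minimal sketch, assuming that cause; the name `purchased.items` is illustrative only:

```python
# Sketch: recover the literal column name, ignoring stray surrounding spaces.
def resolve_column(columns, wanted):
    """Return the actual column name matching `wanted`, or None."""
    for c in columns:
        if c == wanted or c.strip() == wanted:
            return c
    return None

# Hypothetical Spark usage: backtick-quote the literal name so dots and
# spaces are taken verbatim rather than parsed:
# real = resolve_column(df.columns, "purchased.items")
# df.select(f"`{real}`")
```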
0
votes
1 answer

Convert SQL dataframe into nested Json format in pyspark

I have SQL output that I am creating from a parquet file. I want to convert this SQL df into the below-mentioned format (StructType/StructField) using PySpark (not…
Pooja
  • 165
  • 4
  • 14
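Nesting flat columns into struct fields is typically done with `struct()` before serializing to JSON. Below is a pure-Python sketch of the reshaping, with the Spark version in comments; the column names (`id`, `city`, `zip`, `addr`) are assumptions, not from the question:

```python
# Sketch: group flat keys into nested dicts, e.g. {'addr': ['city', 'zip']}.
def nest_row(row: dict, groups: dict) -> dict:
    out = {k: v for k, v in row.items()
           if all(k not in cols for cols in groups.values())}
    for name, cols in groups.items():
        out[name] = {c: row[c] for c in cols}
    return out

# Hypothetical Spark version:
# from pyspark.sql import functions as F
# nested = df.select("id", F.struct("city", "zip").alias("addr"))
# nested.toJSON().first()  # one nested JSON document per row
```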
0
votes
1 answer

How to rename a column in PySpark when it comes with a different name in some files?

I have to rename a column whenever its name contains "address". For example, in the first file I receive the columns ADDRESS1, ADDRESS2, ADDRESS3; in the next file I receive the column names T_ADDRESS1, T_ADDRESS2,…
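One way to normalize such names is to compute a rename map from `df.columns` and apply it with `withColumnRenamed`. A sketch, assuming the variants always end in `ADDRESS<n>`:

```python
import re

# Sketch: map any *ADDRESS<n> variant (e.g. T_ADDRESS1) back to ADDRESS<n>.
def normalize_address_cols(columns):
    mapping = {}
    for c in columns:
        m = re.search(r"ADDRESS(\d+)$", c, re.IGNORECASE)
        if m:
            new = f"ADDRESS{m.group(1)}"
            if c != new:           # skip columns already in canonical form
                mapping[c] = new
    return mapping

# Hypothetical Spark usage:
# for old, new in normalize_address_cols(df.columns).items():
#     df = df.withColumnRenamed(old, new)
```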
0
votes
1 answer

Is there any solution for converting decimal and alphanumeric values to integer type in PySpark?

Ex (column salesorgcode), actual → required:
6001.0 → 6001
9001.0 → 9001
7002.0 → 7002
A001 → A001
T001 → T001
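Because codes like "A001" would fail a true integer cast, one approach is to keep the column as string and only strip the trailing ".0" from numeric values. A sketch of that rule, with a hypothetical Spark equivalent in comments:

```python
# Sketch: normalize "6001.0" -> "6001", pass "A001" through unchanged.
def clean_code(value: str) -> str:
    try:
        f = float(value)
        if f.is_integer():
            return str(int(f))
    except ValueError:
        pass  # not numeric: keep as-is (e.g. "A001")
    return value

# Hypothetical Spark version using regexp_replace to strip a trailing ".0":
# from pyspark.sql import functions as F
# df = df.withColumn("salesorgcode",
#                    F.regexp_replace("salesorgcode", r"\.0$", ""))
```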
0
votes
1 answer

Using regular expression in pyspark to replace part of the key inside a column containing maps?

I am stuck on this problem. I have a PySpark dataframe that looks as…
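For rewriting part of every key in a MapType column, Spark 3.x offers `transform_keys` combined with `regexp_replace`. Below is the pure-Python core of the idea plus a hypothetical Spark call; the column name `props` and the `prefix_` pattern are assumptions:

```python
import re

# Sketch: apply a regex substitution to every key of a map.
def rewrite_keys(m: dict, pattern: str, repl: str) -> dict:
    return {re.sub(pattern, repl, k): v for k, v in m.items()}

# Hypothetical Spark 3.x version:
# from pyspark.sql import functions as F
# df = df.withColumn("props",
#     F.transform_keys("props",
#                      lambda k, v: F.regexp_replace(k, r"^prefix_", "")))
```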
0
votes
0 answers

PySpark read in multiple files CSV or TSV

I'm trying to load all the files in a folder. They have the same schema, but sometimes a different delimiter (i.e. usually CSV, but occasionally tab-separated). Is there a way to pass in two delimiters? To be specific, I don't want a two-character…
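Spark's CSV reader accepts a single `sep`, so one workable pattern is to detect the delimiter per file, read each file (or group) with its own separator, and union the results. A sketch with a deliberately naive detector; `file_paths` is a hypothetical list:

```python
# Sketch: whichever of tab/comma appears more often in the first line wins.
def detect_delimiter(first_line: str) -> str:
    return "\t" if first_line.count("\t") > first_line.count(",") else ","

# Hypothetical Spark usage:
# parts = []
# for path in file_paths:
#     sep = detect_delimiter(open(path).readline())
#     parts.append(spark.read.option("sep", sep)
#                            .option("header", "true").csv(path))
# df = parts[0]
# for p in parts[1:]:
#     df = df.unionByName(p)
```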
0
votes
1 answer

Pyspark Edit Schema (json column)

I have the following dataframe, and the schema looks like this:
root
 |-- nro_ot: decimal(12,0) (nullable = true)
 |-- json_bcg: string (nullable = true)
The column "json_bcg" is just a string and I need to edit the schema to explore the…
King Blood
  • 23
  • 5
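A JSON-in-a-string column is usually decoded with `from_json` plus an explicit schema, which adds a real struct column rather than editing the schema in place. Pure-Python core of the decode below; the field name `status` in the Spark comment is an assumption:

```python
import json

# Sketch: the decode step that from_json performs per row.
def decode(json_str: str) -> dict:
    return json.loads(json_str)

# Hypothetical Spark version:
# from pyspark.sql import functions as F, types as T
# schema = T.StructType([T.StructField("status", T.StringType())])
# df = df.withColumn("bcg", F.from_json("json_bcg", schema))
# df.select("nro_ot", "bcg.status")
```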
0
votes
1 answer

Reading JSON using Pyspark returns data frame full of nulls

I have the following JSON structure in a file that I want to read using PySpark: [{'id': '34556', 'InsuranceProvider': 'sdcsdf', 'Type': {'Client': {'PaidIn': {'Insuranceid': '442211', 'Insurancedesc': 'sdfsdf vdsfs', 'purchaseditems':…
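Two common causes of an all-null result here: the file uses single quotes, which is not valid JSON, so every record fails to parse; and a pretty-printed JSON array needs the `multiLine` option. A sketch illustrating the first cause:

```python
import json

# Sketch: strict JSON parsers reject single-quoted records like the one shown.
record = "{'id': '34556'}"   # single quotes, as in the question
try:
    json.loads(record)
    parsed = True
except json.JSONDecodeError:
    parsed = False           # this branch is taken: not valid JSON

# Hypothetical Spark fix for a pretty-printed JSON array file (assumes the
# file has been corrected to use double quotes):
# df = spark.read.option("multiLine", "true").json("path/to/file.json")
```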
0
votes
2 answers

Loading table in Databricks job converts all columns to lowercase

I have a SQL view stored in Databricks as a table and all of the columns are capitalised. When I load the table in a Databricks job using spark.table(<>), all of the columns are converted to lowercase which causes my code to crash.…
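The metastore commonly stores table column names lowercased. If the original capitalization is known ahead of time, one workaround (a sketch, not a root-cause fix) is to rename the columns back after loading; `CANONICAL_COLS` and the view name are hypothetical:

```python
# Sketch: map lowercased names back to their canonical capitalization.
def restore_case(lower_cols, canonical):
    by_lower = {c.lower(): c for c in canonical}
    return [by_lower.get(c, c) for c in lower_cols]

# Hypothetical Spark usage:
# df = spark.table("my_view")
# df = df.toDF(*restore_case(df.columns, CANONICAL_COLS))
```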
0
votes
1 answer

JSON Formatting in PySpark

I have a json stored as string in the below format:
{
 'aaa': '',
 'bbb': '',
 'ccc': { 'ccc': [{dict of values}] },  //list of dictionaries
 'ddd': '',
 'eee': { 'eee': [{dict of values},{dict of values},{dict of values}] }  //list of…
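Since the string uses single quotes, `json.loads` (and Spark's JSON parsing) will reject it; for trusted input, `ast.literal_eval` can read the Python-style literal, after which it can be re-serialized as strict JSON. A sketch, with the payload shape assumed from the excerpt:

```python
import ast
import json

# Sketch: normalize a single-quoted, Python-style literal into strict JSON.
text = "{'aaa': '', 'ccc': {'ccc': [{'k': 1}]}}"   # shape assumed from excerpt
data = ast.literal_eval(text)        # tolerates single quotes
strict = json.dumps(data)            # now valid JSON

# Hypothetical Spark decode of the normalized string column:
# from pyspark.sql import functions as F
# df = df.withColumn("parsed", F.from_json("payload", schema))
```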
0
votes
1 answer

Unable to create a new column from a list using spark concat method?

I have the below data frame, in which I am trying to create a new column by concatenating name from a list:
df =
| name | department | state | id | hash
------+------------+-------+----+-----
James | Sales1     | null …
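A likely trap with `concat()` is that it returns null if ANY input is null, which matters here since `state` is null; `concat_ws` skips nulls instead. Pure-Python core of the `concat_ws` semantics, with the Spark call in comments:

```python
# Sketch: join non-null parts with a separator, as concat_ws does.
def concat_ws(sep, *parts):
    return sep.join(p for p in parts if p is not None)

# Hypothetical Spark version (column names taken from the excerpt):
# from pyspark.sql import functions as F
# df = df.withColumn("hash", F.concat_ws("-", "name", "department", "state"))
```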
0
votes
0 answers

Nesting dataframe using pyspark

I am new to PySpark. I am trying to have multiple countries' data in a single row. I don't know the exact number of country fields I will get, so I want to have a row with multiple pairs of country name and country capital according to the…
Shivam
  • 1
  • 1
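A variable number of countries per row is usually modeled as an array of structs built with `collect_list(struct(...))` rather than a fixed set of columns. Pure-Python equivalent of the grouping; the grouping key `region` and column names in the Spark comment are assumptions:

```python
from collections import defaultdict

# Sketch: group (key, name, capital) tuples into one list per key.
def group_countries(rows):
    out = defaultdict(list)
    for key, name, capital in rows:
        out[key].append({"name": name, "capital": capital})
    return dict(out)

# Hypothetical Spark version:
# from pyspark.sql import functions as F
# df.groupBy("region").agg(
#     F.collect_list(F.struct("country_name", "country_capital"))
#      .alias("countries"))
```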
0
votes
0 answers

PySpark scoped temporary view

I am using PySpark SQL to create temporary views from dataframes and to do data processing with them. I created a Python service where a user can hit some APIs, passing the dataframe and the SQL query to be applied to it, to make the…
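`createOrReplaceTempView` is already scoped to a SparkSession, so per-request isolation usually means one session per request and/or collision-free view names that are dropped afterwards. A sketch of the naming part; the view-name scheme is an assumption:

```python
import uuid

# Sketch: generate a collision-free temp view name for one API request.
def scoped_view_name(base: str) -> str:
    return f"{base}_{uuid.uuid4().hex}"

# Hypothetical Spark usage:
# name = scoped_view_name("user_df")
# df.createOrReplaceTempView(name)
# result = spark.sql(query.replace("{view}", name))
# spark.catalog.dropTempView(name)
```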
0
votes
1 answer

How to flatten nested struct using PySpark?

How to flatten nested struct using PySpark? Link to dataset https://drive.google.com/file/d/1-xOpd2B7MDgS1t4ekfipHSerIm6JMz9e/view?usp=sharing Thanks in advance.
Raunak
  • 13
  • 4
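The usual approach walks the schema recursively and selects leaf fields under dotted paths, aliasing `parent.child` to `parent_child`. Below is the pure-Python version of that flattening on a nested dict, with the Spark recursion described in comments:

```python
# Sketch: flatten nested dicts, joining key paths with underscores.
def flatten(d, prefix=""):
    out = {}
    for k, v in d.items():
        key = f"{prefix}{k}"
        if isinstance(v, dict):
            out.update(flatten(v, key + "_"))
        else:
            out[key] = v
    return out

# Hypothetical Spark version: recurse over df.schema; for every StructType
# field, select col("parent.child").alias("parent_child"), and repeat until
# no struct columns remain.
```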