Questions tagged [pyspark-schema]

68 questions
0
votes
1 answer

parse pyspark dataframe column of varying keys into new column for one key's values

I have an input PySpark dataframe df. The dataframe df has a column "field1" whose values are dictionaries. The dictionaries do not all have the same keys. I would like to parse the "b" key into a new field "newcol". To further…
user3476463
0
votes
2 answers

how to sequentially iterate rows in Pyspark Dataframe

I have a Spark DataFrame like this:
+-------+------+-----+---------------+
|Account|nature|value|           time|
+-------+------+-----+---------------+
|      a|     1|   50|10:05:37:293084|
|      a|     1|   50|10:06:46:806510|
|      a| …
M_Gh
0
votes
2 answers

Write PySpark dataframe to BigQuery "Numeric" datatype

For simplicity, I have a table in BigQuery with one field of type "Numeric". When I try to write a PySpark dataframe with one column to BigQuery, it keeps raising a NullPointerException. I tried converting the PySpark column into int, float, string,…
0
votes
1 answer

pyspark json to dataframe schema

I have a tricky JSON that I would like to load into a dataframe, and I need help defining a schema for it: { "1-john": { "children": ["jack", "jane", "jim"] }, "2-chris": { "children": ["bill", "will"] …
RData
0
votes
2 answers

PySpark: fill empty strings with '0' if the data type is BIGINT/DOUBLE/Integer

I am trying to fill empty strings with '0' if the column data type is BIGINT/DOUBLE/Integer in a dataframe using PySpark: data = [("James","","Smith","36","M",3000,"1.2"), ("Michael","Rose"," ","40","M",4000,"2.0"), …
K Soumya
0
votes
2 answers

I want to obtain the max value of a column depending on two other columns and, for the fourth column, the most repeated value

I've got this dataframe df1 = spark.createDataFrame([ ('c', 'd', 3.0, 4), ('c', 'd', 7.3, 8), ('c', 'd', 7.3, 2), ('c', 'd', 7.3, 8), ('e', 'f', 6.0, 3), ('e', 'f', 6.0, 8), ('e', 'f', 6.0, 3), ('c', 'j', 4.2, 3), …
-1
votes
1 answer

How to write a schema for the nested JSON below in PySpark

How do I write a schema for the JSON below: "place_results": { "title": "W2A Architects", "place_id": "ChIJ4SUGuHw5xIkRAl0856nZrBM", "data_id": "0x89c4397cb80625e1:0x13acd9a9e73c5d02", "data_cid": "1417747306467056898", …
Xi12
-1
votes
2 answers

compare two dataframes and display the data that are different

I have two dataframes and I want to compare the values of two columns and display those that are different, for example: compare this Table…