Questions tagged [pyspark-schema]
68 questions
0
votes
1 answer
Parse a PySpark dataframe column with varying keys into a new column for one key's values
I have an input PySpark dataframe df. The dataframe df has a column "field1" whose values are dictionaries. The dictionaries do not all have the same keys. I would like to parse the "b" key into a new field "newcol". To further…

user3476463
- 3,967
- 22
- 57
- 117
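Since the question text is truncated, here is a minimal sketch of one common approach, assuming "field1" is a MapType column (the column and key names come from the question; the sample rows are made up):

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import MapType, StringType, StructField, StructType

spark = SparkSession.builder.getOrCreate()

# Toy rows: the dictionaries in "field1" do not all share the same keys.
schema = StructType([StructField("field1", MapType(StringType(), StringType()))])
df = spark.createDataFrame([({"a": "1", "b": "2"},), ({"a": "3"},)], schema)

# getItem("b") returns NULL for rows where the "b" key is absent.
df = df.withColumn("newcol", F.col("field1").getItem("b"))
df.show(truncate=False)

If "field1" were instead a JSON string column, F.get_json_object(F.col("field1"), "$.b") would be the analogous move.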
0
votes
2 answers
How to sequentially iterate rows in a PySpark DataFrame
I have a Spark DataFrame like this:
+-------+------+-----+---------------+
|Account|nature|value| time|
+-------+------+-----+---------------+
| a| 1| 50|10:05:37:293084|
| a| 1| 50|10:06:46:806510|
| a| …

M_Gh
- 1,046
- 4
- 17
- 43
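The excerpt cuts off before the actual goal, but "sequential" row processing in Spark is usually expressed with a window ordered by time rather than a driver-side loop. A minimal sketch, assuming the Account/nature/value/time columns from the sample (lag and the prev_value name are illustrative):

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [("a", 1, 50, "10:05:37:293084"),
     ("a", 1, 50, "10:06:46:806510")],
    ["Account", "nature", "value", "time"],
)

# Order rows per account by the time string (fixed-width, so lexical order works)
# and look at the previous row's value instead of iterating on the driver.
w = Window.partitionBy("Account").orderBy("time")
df = df.withColumn("prev_value", F.lag("value").over(w))
df.show(truncate=False)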
0
votes
2 answers
Write PySpark dataframe to BigQuery "Numeric" datatype
For simplicity, I have a table in BigQuery with one field of type "Numeric". When I try to write a PySpark dataframe with one column to BigQuery, it keeps raising a NullPointerException. I tried converting the PySpark column to int, float, string,…

Malina Dale
- 153
- 1
- 3
- 8
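BigQuery's NUMERIC type is a fixed-point decimal (precision 38, scale 9), so one hedge against type-mapping trouble is to cast the PySpark column to a matching DecimalType before writing with the spark-bigquery connector. A sketch with hypothetical table and bucket names:

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import DecimalType

spark = SparkSession.builder.getOrCreate()

# Hypothetical single-column dataframe destined for a BigQuery NUMERIC field.
df = spark.createDataFrame([(1.2,), (3.4,)], ["amount"])

# Cast to a decimal that matches NUMERIC so the connector has an unambiguous mapping.
df = df.withColumn("amount", F.col("amount").cast(DecimalType(38, 9)))

# Hypothetical table and bucket names; the spark-bigquery connector jar must be available.
(df.write
   .format("bigquery")
   .option("table", "my_project.my_dataset.my_table")
   .option("temporaryGcsBucket", "my-temp-bucket")
   .mode("append")
   .save())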
0
votes
1 answer
PySpark JSON to dataframe schema
I have a tricky JSON that I would like to load into a dataframe, and I need assistance with how to define a schema:
{
  "1-john": {
    "children": ["jack", "jane", "jim"]
  },
  "2-chris": {
    "children": ["bill", "will"]
…

RData
- 959
- 1
- 13
- 33
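Because the top-level keys ("1-john", "2-chris") are data rather than fixed field names, a MapType schema tends to fit better than a struct with hard-coded fields. A sketch under that assumption (the person/info/children aliases are illustrative):

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import ArrayType, MapType, StringType, StructField, StructType

spark = SparkSession.builder.getOrCreate()

raw = '{"1-john": {"children": ["jack", "jane", "jim"]}, "2-chris": {"children": ["bill", "will"]}}'

# Treat the varying top-level keys as map keys; each value is a struct with "children".
schema = MapType(
    StringType(),
    StructType([StructField("children", ArrayType(StringType()))]),
)

df = (spark.createDataFrame([(raw,)], ["json"])
      .select(F.explode(F.from_json("json", schema)).alias("person", "info"))
      .select("person", F.col("info.children").alias("children")))
df.show(truncate=False)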
0
votes
2 answers
PySpark - Fill empty strings with a '0' if the column data type is BIGINT/DOUBLE/Integer
I am trying to fill empty strings with a '0' if the column data type is BIGINT/DOUBLE/Integer in a dataframe using PySpark
data = [("James","","Smith","36","M",3000,"1.2"),
("Michael","Rose"," ","40","M",4000,"2.0"),
…

K Soumya
- 71
- 6
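A column that is already BIGINT/DOUBLE cannot hold an empty string, so one reading of the question is that the data arrives as strings and should end up numeric. A sketch under that assumption, with hypothetical column names and a hypothetical target-type mapping:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

data = [("James", "", "Smith", "36", "M", 3000, "1.2"),
        ("Michael", "Rose", " ", "40", "M", 4000, "2.0")]
cols = ["firstname", "middlename", "lastname", "age", "gender", "salary", "rate"]
df = spark.createDataFrame(data, cols)

# Assumed mapping of columns to the numeric types they should end up as.
target_types = {"age": "bigint", "salary": "bigint", "rate": "double"}

# Blank or whitespace-only values become "0" before casting to the numeric type.
for c, t in target_types.items():
    df = df.withColumn(
        c,
        F.when(F.trim(F.col(c).cast("string")) == "", "0")
         .otherwise(F.col(c))
         .cast(t),
    )
df.printSchema()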
0
votes
2 answers
I want to obtain the max value of a column depending on two other columns, and for the fourth column the value of the most repeated number
I've got this dataframe
df1 = spark.createDataFrame([
('c', 'd', 3.0, 4),
('c', 'd', 7.3, 8),
('c', 'd', 7.3, 2),
('c', 'd', 7.3, 8),
('e', 'f', 6.0, 3),
('e', 'f', 6.0, 8),
('e', 'f', 6.0, 3),
('c', 'j', 4.2, 3),
…

sunny
- 11
- 5
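The excerpt is cut off, but the title describes a group-by with two aggregates: the max of the third column and the mode (most repeated value) of the fourth, per pair of the first two columns. A sketch with assumed column names col1..col4 and arbitrary tie-breaking for the mode:

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

df1 = spark.createDataFrame([
    ('c', 'd', 3.0, 4), ('c', 'd', 7.3, 8), ('c', 'd', 7.3, 2), ('c', 'd', 7.3, 8),
    ('e', 'f', 6.0, 3), ('e', 'f', 6.0, 8), ('e', 'f', 6.0, 3), ('c', 'j', 4.2, 3),
], ["col1", "col2", "col3", "col4"])

# Most frequent col4 value per (col1, col2): count occurrences, keep the top row.
counts = df1.groupBy("col1", "col2", "col4").count()
w = Window.partitionBy("col1", "col2").orderBy(F.desc("count"))
modes = (counts.withColumn("rn", F.row_number().over(w))
               .where(F.col("rn") == 1)
               .select("col1", "col2", "col4"))

# Max of col3 per (col1, col2), then join the mode back on.
result = (df1.groupBy("col1", "col2").agg(F.max("col3").alias("max_col3"))
             .join(modes, ["col1", "col2"]))
result.show()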
-1
votes
1 answer
How to write a schema for the nested JSON below in PySpark
How to write a schema for the JSON below:
"place_results": {
"title": "W2A Architects",
"place_id": "ChIJ4SUGuHw5xIkRAl0856nZrBM",
"data_id": "0x89c4397cb80625e1:0x13acd9a9e73c5d02",
"data_cid": "1417747306467056898",
…

Xi12
- 939
- 2
- 14
- 27
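A sketch of a schema covering just the fields visible in the snippet; the truncated portion of the JSON would need further fields, and everything shown here is read as a string:

from pyspark.sql.types import StringType, StructField, StructType

schema = StructType([
    StructField("place_results", StructType([
        StructField("title", StringType()),
        StructField("place_id", StringType()),
        StructField("data_id", StringType()),
        StructField("data_cid", StringType()),
    ])),
])

# Hypothetical input path:
# df = spark.read.schema(schema).json("path/to/file.json")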
-1
votes
2 answers
Compare two dataframes and display the data that differ
I have two dataframes and I want to compare the values of two columns and display those that are different. For example, compare this table…

sunny
- 11
- 5
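The question is truncated, but a common pattern is to join the two frames on a key and keep only the rows where the compared column differs. A sketch with made-up frames, an assumed key column "id", and an assumed value column "amount":

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical frames sharing a key column "id" and a value column "amount".
df_a = spark.createDataFrame([(1, 10), (2, 20), (3, 30)], ["id", "amount"])
df_b = spark.createDataFrame([(1, 10), (2, 25), (3, 35)], ["id", "amount"])

# Join on the key and keep only rows where the compared column differs.
diff = (df_a.alias("a")
        .join(df_b.alias("b"), "id")
        .where(F.col("a.amount") != F.col("b.amount"))
        .select("id",
                F.col("a.amount").alias("amount_a"),
                F.col("b.amount").alias("amount_b")))
diff.show()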