Highest Voted 'pyspark-schema' Questions

0 votes, 1 answer

Parse a PySpark DataFrame column of dictionaries with varying keys into a new column for one key's values

I have an input PySpark DataFrame df. The DataFrame has a column "field1" whose values are dictionaries. The dictionaries do not all have the same keys. I would like to parse the "b" key into a new field "newcol". To further…

json pyspark pyspark-schema

asked May 11 '22 at 16:44

user3476463


0 votes, 2 answers

How to iterate sequentially over rows in a PySpark DataFrame

I have a Spark DataFrame like this:
+-------+------+-----+---------------+
|Account|nature|value|           time|
+-------+------+-----+---------------+
|      a|     1|   50|10:05:37:293084|
|      a|     1|   50|10:06:46:806510|
|      a| …

pyspark apache-spark-sql pyspark-schema

asked May 03 '22 at 18:57

M_Gh


0 votes, 2 answers

Write PySpark dataframe to BigQuery "Numeric" datatype

For simplicity, I have a table in BigQuery with one field of type "Numeric". When I try to write a single-column PySpark DataFrame to BigQuery, it keeps raising a NullPointerException. I tried converting the PySpark column to int, float, string,…

google-cloud-platform pyspark google-bigquery apache-spark-sql pyspark-schema

asked Apr 28 '22 at 04:51

Malina Dale


0 votes, 1 answer

PySpark JSON to DataFrame schema

I have a tricky JSON document that I would like to load into a DataFrame, and I need help defining a schema for it: { "1-john": { "children": ["jack", "jane", "jim"] }, "2-chris": { "children": ["bill", "will"] …

json dataframe pyspark pyspark-schema

asked Apr 27 '22 at 02:40

RData


0 votes, 2 answers

PySpark: fill empty strings with '0' if the data type is BIGINT/DOUBLE/INTEGER

I am trying to fill empty strings with '0' if the column data type is BIGINT/DOUBLE/INTEGER in a DataFrame using PySpark: data = [("James","","Smith","36","M",3000,"1.2"), ("Michael","Rose"," ","40","M",4000,"2.0"), …

pyspark pyspark-schema

asked Apr 25 '22 at 06:07

K Soumya


0 votes, 2 answers

How to obtain the max value of a column depending on two other columns and, for the fourth column, the most frequent value

I've got this dataframe df1 = spark.createDataFrame([ ('c', 'd', 3.0, 4), ('c', 'd', 7.3, 8), ('c', 'd', 7.3, 2), ('c', 'd', 7.3, 8), ('e', 'f', 6.0, 3), ('e', 'f', 6.0, 8), ('e', 'f', 6.0, 3), ('c', 'j', 4.2, 3), …

pyspark apache-spark-sql pyspark-pandas pyspark-schema

asked Apr 19 '22 at 13:09

sunny


-1 votes, 1 answer

How to write a schema for the nested JSON below in PySpark

How do I write a schema for the JSON below: "place_results": { "title": "W2A Architects", "place_id": "ChIJ4SUGuHw5xIkRAl0856nZrBM", "data_id": "0x89c4397cb80625e1:0x13acd9a9e73c5d02", "data_cid": "1417747306467056898", …

apache-spark pyspark apache-spark-sql pyspark-schema

asked Nov 07 '22 at 18:00

Xi12


-1 votes, 2 answers

Compare two DataFrames and display the data that differ

I have two DataFrames and I want to compare the values of two columns and display those that are different, for example: compare this Table…

dataframe pyspark apache-spark-sql pyspark-pandas pyspark-schema

asked Apr 15 '22 at 07:14

sunny

