I have this dataframe path_df:
path_df.show()
+---------------+-------------+----+
|FromComponentID|ToComponentID|Cost|
+---------------+-------------+----+
| 160| 163|27.0|
| 160| 183|27.0|
| 161| 162|22.0|
| 161| 170|31.0|
| 162| 161|22.0|
| 162| 167|24.0|
| 163| 160|27.0|
| 163| 164|27.0|
| 164| 163|27.0|
| 164| 165|35.0|
| 165| 164|35.0|
| 165| 166|33.0|
| 166| 165|33.0|
| 166| 167|31.0|
| 167| 162|24.0|
| 167| 166|31.0|
| 167| 168|27.0|
| 168| 167|27.0|
| 168| 169|23.0|
| 169| 168|23.0|
+---------------+-------------+----+
only showing top 20 rows
From this, I want to build a dictionary, as follows:
{FromComponentID:{ToComponentID:Cost}}
For my current data, it would be:
{160 : {163 : 27,
183 : 27},
161 : {162 : 22,
170 : 31},
162 : {161 : 22,
167 : 24},
...
167 : {162 : 24,
166 : 31,
168 : 27},
168 : {167 : 27,
169 : 23},
169 : {168 : 23}
}
Can I do this using only PySpark, and if so, how? Or would it be better to extract the data and process it directly in Python?
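For reference, the plain-Python route could look roughly like the sketch below. It assumes the rows have already been brought to the driver (for example with path_df.collect()) and that the data is small enough to fit in memory there. The `rows` list and the `build_nested_dict` helper are hypothetical names used only for illustration, and the sample values are the first few rows of the table above.

```python
from collections import defaultdict

# Hypothetical stand-in for the collected rows; with a real DataFrame this
# could be something like: rows = [tuple(r) for r in path_df.collect()]
rows = [
    (160, 163, 27.0),
    (160, 183, 27.0),
    (161, 162, 22.0),
    (161, 170, 31.0),
]


def build_nested_dict(rows):
    """Group (from_id, to_id, cost) triples into {from_id: {to_id: cost}}."""
    nested = defaultdict(dict)
    for from_id, to_id, cost in rows:
        # A later row with the same (from_id, to_id) pair overwrites the earlier one.
        nested[from_id][to_id] = cost
    return dict(nested)


result = build_nested_dict(rows)
print(result)
```

This does the grouping on the driver rather than in Spark, so it only makes sense if the full edge list is reasonably small.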