I am very new to informatics. I am trying to make a scatter plot from a PySpark DataFrame I got from Kaggle. I managed to do a basic exploratory data analysis: I got the basic statistics (mean, median, skewness, kurtosis) and the histograms for the columns. But when I tried to make a scatter plot of two columns, such as df_min_max_scaled['pmbar_norm'] and df_min_max_scaled['TdegC_norm'], I got stuck.
type(df_min_max_scaled) and df_min_max_scaled.printSchema() Outputs
My main goal is to make the scatter plot without converting the DataFrame to pandas, but tips like this one have not been useful for me.
I tried the following code, which raises the ValueError shown in the traceback after it:
import plotly.express as px
df = df_min_max_scaled
fig = px.scatter(df, x = 'pmbar_norm', y = 'TdegC_norm')
fig.show()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_16132\2057007735.py in ?()
1 import plotly.express as px
2 df = df_min_max_scaled
----> 3 fig = px.scatter(df, x = 'pmbar_norm', y = 'TdegC_norm')
4 fig.show()
~\.conda\envs\learning3\lib\site-packages\plotly\express\_chart_types.py in ?(data_frame, x, y, color, symbol, size, hover_name, hover_data, custom_data, text, facet_row, facet_col, facet_col_wrap, facet_row_spacing, facet_col_spacing, error_x, error_x_minus, error_y, error_y_minus, animation_frame, animation_group, category_orders, labels, orientation, color_discrete_sequence, color_discrete_map, color_continuous_scale, range_color, color_continuous_midpoint, symbol_sequence, symbol_map, opacity, size_max, marginal_x, marginal_y, trendline, trendline_options, trendline_color_override, trendline_scope, log_x, log_y, range_x, range_y, render_mode, title, template, width, height)
62 """
63 In a scatter plot, each row of `data_frame` is represented by a symbol
64 mark in 2D space.
65 """
---> 66 return make_figure(args=locals(), constructor=go.Scatter)
~\.conda\envs\learning3\lib\site-packages\plotly\express\_core.py in ?(args, constructor, trace_patch, layout_patch)
1986 trace_patch = trace_patch or {}
1987 layout_patch = layout_patch or {}
1988 apply_default_cascade(args)
1989
-> 1990 args = build_dataframe(args, constructor)
1991 if constructor in [go.Treemap, go.Sunburst, go.Icicle] and args["path"] is not None:
1992 args = process_dataframe_hierarchy(args)
1993 if constructor in [go.Pie]:
~\.conda\envs\learning3\lib\site-packages\plotly\express\_core.py in ?(args, constructor)
1302
1303 # Cast data_frame argument to DataFrame (it could be a numpy array, dict etc.)
1304 df_provided = args["data_frame"] is not None
1305 if df_provided and not isinstance(args["data_frame"], pd.DataFrame):
-> 1306 args["data_frame"] = pd.DataFrame(args["data_frame"])
1307 df_input = args["data_frame"]
1308
1309 # now we handle special cases like wide-mode or x-xor-y specification
~\.conda\envs\learning3\lib\site-packages\pandas\core\frame.py in ?(self, data, index, columns, dtype, copy)
813 )
814 # For data is scalar
815 else:
816 if index is None or columns is None:
--> 817 raise ValueError("DataFrame constructor not properly called!")
818
819 index = ensure_index(index)
820 columns = ensure_index(columns)
ValueError: DataFrame constructor not properly called!
According to the forums and the documentation, this should be enough, but I am stuck. Could someone give me a hand?
Thanks in advance
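Note on the traceback: it suggests the cause. Plotly Express passes any non-pandas data_frame argument to pd.DataFrame(...), and that constructor does not know how to read a Spark DataFrame. Below is a minimal sketch of one possible workaround, not a definitive solution. It avoids calling toPandas(), but it still has to bring the plotted points to the driver: it selects only the two columns, optionally samples them, collects the rows, and hands plain Python lists to px.scatter. The helper names columns_from_rows and scatter_from_spark, and the fraction and seed parameters, are my own assumptions for illustration and are not part of PySpark or Plotly.

```python
def columns_from_rows(rows, x_col, y_col):
    """Split an iterable of row-like objects into two plain Python lists.

    Works with anything that supports row[name], e.g. pyspark.sql.Row or dict.
    """
    xs, ys = [], []
    for row in rows:
        xs.append(row[x_col])
        ys.append(row[y_col])
    return xs, ys


def scatter_from_spark(sdf, x_col, y_col, fraction=None, seed=42):
    """Hypothetical helper: scatter two columns of a Spark DataFrame.

    Assumption: the selected (and optionally sampled) rows fit in driver memory.
    """
    # Imported lazily so columns_from_rows can be used without Plotly installed.
    import plotly.express as px

    # Keep only the two columns needed for the plot.
    subset = sdf.select(x_col, y_col)
    if fraction is not None:
        # Sample without replacement to limit how many rows reach the driver.
        subset = subset.sample(fraction=fraction, seed=seed)

    # collect() returns a list of Row objects on the driver.
    rows = subset.collect()
    xs, ys = columns_from_rows(rows, x_col, y_col)

    # Pass list-like x and y directly instead of a data_frame argument.
    fig = px.scatter(x=xs, y=ys)
    fig.update_layout(xaxis_title=x_col, yaxis_title=y_col)
    return fig
```

With the DataFrame from the question, this would be called as fig = scatter_from_spark(df_min_max_scaled, 'pmbar_norm', 'TdegC_norm', fraction=0.1) followed by fig.show(). Plotly may still build a pandas object internally from the lists, so this only avoids the explicit conversion of the whole Spark DataFrame.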