
I am very new to informatics. I am trying to make a scatter plot from a PySpark DataFrame I got from Kaggle. I managed to do a basic exploratory data analysis: I got the basic statistics (mean, median, skewness, kurtosis) and the histograms for the columns. But when I tried to make a scatter plot of two columns, such as df_min_max_scaled['pmbar_norm'] and df_min_max_scaled['TdegC_norm'], I got stuck.

[Image: outputs of type(df_min_max_scaled) and df_min_max_scaled.printSchema()]

My main goal is to make the scatter plot without converting the DataFrame to pandas, but tips like this one have not been useful to me.

I tried to use

import plotly.express as px
df = df_min_max_scaled
fig = px.scatter(df, x = 'pmbar_norm', y = 'TdegC_norm')
fig.show()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_16132\2057007735.py in ?()
      1 import plotly.express as px
      2 df = df_min_max_scaled
----> 3 fig = px.scatter(df, x = 'pmbar_norm', y = 'TdegC_norm')
      4 fig.show()

~\.conda\envs\learning3\lib\site-packages\plotly\express\_chart_types.py in ?(data_frame, x, y, color, symbol, size, hover_name, hover_data, custom_data, text, facet_row, facet_col, facet_col_wrap, facet_row_spacing, facet_col_spacing, error_x, error_x_minus, error_y, error_y_minus, animation_frame, animation_group, category_orders, labels, orientation, color_discrete_sequence, color_discrete_map, color_continuous_scale, range_color, color_continuous_midpoint, symbol_sequence, symbol_map, opacity, size_max, marginal_x, marginal_y, trendline, trendline_options, trendline_color_override, trendline_scope, log_x, log_y, range_x, range_y, render_mode, title, template, width, height)
     62     """
     63     In a scatter plot, each row of `data_frame` is represented by a symbol
     64     mark in 2D space.
     65     """
---> 66     return make_figure(args=locals(), constructor=go.Scatter)

~\.conda\envs\learning3\lib\site-packages\plotly\express\_core.py in ?(args, constructor, trace_patch, layout_patch)
   1986     trace_patch = trace_patch or {}
   1987     layout_patch = layout_patch or {}
   1988     apply_default_cascade(args)
   1989 
-> 1990     args = build_dataframe(args, constructor)
   1991     if constructor in [go.Treemap, go.Sunburst, go.Icicle] and args["path"] is not None:
   1992         args = process_dataframe_hierarchy(args)
   1993     if constructor in [go.Pie]:

~\.conda\envs\learning3\lib\site-packages\plotly\express\_core.py in ?(args, constructor)
   1302 
   1303     # Cast data_frame argument to DataFrame (it could be a numpy array, dict etc.)
   1304     df_provided = args["data_frame"] is not None
   1305     if df_provided and not isinstance(args["data_frame"], pd.DataFrame):
-> 1306         args["data_frame"] = pd.DataFrame(args["data_frame"])
   1307     df_input = args["data_frame"]
   1308 
   1309     # now we handle special cases like wide-mode or x-xor-y specification

~\.conda\envs\learning3\lib\site-packages\pandas\core\frame.py in ?(self, data, index, columns, dtype, copy)
    813                 )
    814         # For data is scalar
    815         else:
    816             if index is None or columns is None:
--> 817                 raise ValueError("DataFrame constructor not properly called!")
    818 
    819             index = ensure_index(index)
    820             columns = ensure_index(columns)

ValueError: DataFrame constructor not properly called!
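
Reading the traceback, it looks like plotly express calls `pd.DataFrame(args["data_frame"])` on anything that is not already a pandas DataFrame, and the pandas constructor does not seem to know what to do with a PySpark DataFrame. One workaround I am considering, as a rough sketch only: select just the two columns, collect the rows to the driver, and hand plain Python lists to `px.scatter` instead of the DataFrame. The helper names `rows_to_xy` and `scatter_from_spark` are hypothetical (my own, not from any library), and I am assuming the data is small enough to collect; for a large dataset, sampling or limiting first would probably be needed. This avoids calling `toPandas()` on the Spark DataFrame, although plotly may still use pandas internally.

```python
def rows_to_xy(rows, x_col, y_col):
    """Split an iterable of row-like objects into two plain Python lists.

    Works with anything that supports row["name"], such as
    pyspark.sql.Row objects or ordinary dicts.
    """
    xs, ys = [], []
    for row in rows:
        xs.append(row[x_col])
        ys.append(row[y_col])
    return xs, ys


def scatter_from_spark(spark_df, x_col, y_col):
    """Hypothetical helper: build a plotly scatter from two Spark columns."""
    # Imported here so rows_to_xy can be used without plotly installed.
    import plotly.express as px

    # collect() brings every selected row to the driver;
    # sample or limit the DataFrame first if it is large.
    rows = spark_df.select(x_col, y_col).collect()
    xs, ys = rows_to_xy(rows, x_col, y_col)

    # With list inputs plotly names the axes "x" and "y";
    # labels maps them back to the column names.
    return px.scatter(x=xs, y=ys, labels={"x": x_col, "y": y_col})


# Intended usage with the DataFrame from this question:
# fig = scatter_from_spark(df_min_max_scaled, "pmbar_norm", "TdegC_norm")
# fig.show()
```

I am not sure whether this is the recommended way, so corrections are welcome.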

According to the forums and the documentation this should be enough, but I am stuck. Could someone give me a hand?

Thanks in advance
