7

I'm trying to display a PySpark dataframe as an HTML table in a Jupyter Notebook, but all methods seem to be failing.

Using this method displays a text-formatted table:

import pandas
df.toPandas()

Using this method displays the HTML table as a string:

df.toPandas().to_html()

This prints the raw HTML more readably, but it still doesn't render as a table:

print(df.toPandas().to_html())

And all of these

from IPython.display import display, HTML

HTML(df.toPandas().to_html())
print(HTML(df.toPandas().to_html()))
display(HTML(df.toPandas().to_html()))

simply print this object description:

<IPython.core.display.HTML object>

Any other ideas I can try?

nxl4

3 Answers

3

I ran into this issue using PySpark kernels within JupyterLab notebooks on AWS EMR clusters. I found that the sparkmagic command %%display solved the issue. For instance, my Jupyter cell would look like this:

%%display
some_spark_df

It's also worth pointing out that this raised an error if there were empty lines between the %%display and the variable.

However, I'm not sure how to do the same with a pandas dataframe. That still returns the object description when using the PySpark kernel (as opposed to a pure Python3 kernel).

mkirzon
1

df.toPandas() does render the dataframe as an HTML object, but my assumption is that you are looking for something else, or are trying to get rid of the ellipses (...).

You can configure pandas beforehand to get rid of those. This is what I use to remove truncation at the column, row, and field levels:

import pandas as pd
# None disables column-width truncation (-1 is deprecated in newer pandas)
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)

You can also use the IPython method from the question. Here is a small helper function that I use:

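from IPython.display import HTML

def printDf(sprkDF, records):
    # Limit on the Spark side first so only `records` rows are collected to the driver
    return HTML(sprkDF.limit(records).toPandas().to_html())

# Call it as the last expression in a cell so the notebook can render the result:
# printDf(df, 10)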

Hope this helps.
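
As background on why print(HTML(...)) shows only an object description: as far as I understand it, Jupyter's rich display looks for a `_repr_html_` method on the value a cell returns (or that is passed to display), while print only uses the plain-text form of the object. Here is a minimal sketch of that mechanism using only the standard library. The `HtmlTable` class and its sample data are hypothetical stand-ins for IPython's HTML object wrapping to_html() output; they are not part of IPython or pandas.

```python
from html import escape


class HtmlTable:
    """Hypothetical stand-in for an object with a rich HTML representation."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [list(r) for r in rows]

    def _repr_html_(self):
        # A Jupyter frontend calls this hook when the object is the last
        # expression in a cell or is passed to IPython.display.display().
        head = "".join(f"<th>{escape(str(c))}</th>" for c in self.columns)
        body = "".join(
            "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>"
            for row in self.rows
        )
        return f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"


table = HtmlTable(["id", "name"], [[1, "a<b"], [2, "c"]])

# print() and str() only see the default text form, which looks like
# "<...HtmlTable object at 0x...>" and contains no markup.
text_form = str(table)

# The rich display path uses the HTML markup instead.
html_form = table._repr_html_()
```

If the kernel or frontend does not pass the rich output through (which seems to be what happens with the PySpark kernel mentioned in the other answer), only the text form comes back, and that would explain the object description you are seeing.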

0

Maybe what you are looking for is something like this. It outputs the df in a table format:

import pandas
df.toPandas().to_html(index=False,col_space="40px", classes=('table', 'table-striped'))
nonoDa
  • This still simply prints `` for me – mkirzon Apr 20 '21 at 23:53
  • try doing: `import ipywidgets as widgets` `import pandas` `out = widgets.HTML("")` `out.value =df.toPandas().to_html(index=False,col_space="40px", classes=('table', 'table-striped'))` Works like this for me, let me know – nonoDa Apr 21 '21 at 14:43