I have a Python Flask app that recently switched from using Pandas to Polars for some dataframe handling. The pertinent code is shown here:

import polars as pd  # assumption: Polars is imported under the old Pandas alias, as the pd.Config calls below imply

# results[1] is computed elsewhere in the app (the "To Be" frequency of the user's text)
data = { 'Text': ['Virginia Woolf, Mrs. Dalloway', 'College website corpus', 'Presidential inaugural speeches', 'Federalist Papers', 'British Novels', 'YOUR TEXT'],
         'To Be Frequency': [28.3, 16.7, 31.8, 39.8, 31.4, results[1]] }
df = pd.from_dict(data)

# Previous Pandas version:
# textresult = (df.sort_values(by=['To Be Frequency'], ascending=False)).style

# See https://pola-rs.github.io/polars/py-polars/html/reference/config.html for complete list of Polars.Config settings
pd.Config.set_tbl_hide_column_data_types(True)
pd.Config.set_tbl_hide_dataframe_shape(True)
pd.Config.set_fmt_str_lengths(40)
pd.Config.set_tbl_width_chars(200)

textresult = df.sort( 'To Be Frequency' )._repr_html_( )  # convert the result to HTML because a simple string won't do

The textresult output looks like this:

[Image: current textresult dataframe]

I'm looking for any way to remove the double quotes that surround my textresult output. Suggestions?

I've tried every conceivable pd.Config value as well as different approaches to declaration of the data dictionary. I've also searched for any CSS that might easily let me "hide" the quotes. Nothing has worked thus far.

I'm expecting the table you see in the posted image, but with no quotes around the "Text" values.

---Update--- Based on comments received, I found that a simple print(df) returns the frame in my console without quotes around the Text values. But I need HTML output suitable for rendering in a Flask template, so it seems the root of my problem lies only in the ._repr_html_( ) representation?

---Update--- Moments ago I added an enhancement request to the Polars issue queue. In conjunction with that issue I wrote a test and tried to modify pertinent functions in py-polars/polars/_html.py but could not craft an elegant solution; I just don't know enough about @HTMLFormatter and related features yet. So, my simple fix was to modify my code to look like this:

    htmlresult = df.sort( 'To Be Frequency' )._repr_html_( )  # convert the result to HTML because a simple string won't do
    textresult = htmlresult.replace("&quot;", "")  # the HTML output escapes each double quote as the &quot; entity

It's a kludge but a quick solution to my very simple use of Polars.
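
The replacement target is the entity `&quot;` rather than a literal quote character because the cell text is HTML-escaped before it lands in the table. A minimal sketch of that step using only the standard library's `html.escape`, on the assumption that Polars escapes cell text in an equivalent way; `formatted`, `cell`, and `cleaned` are illustrative names, not Polars internals:

```python
import html

# What the formatter hands back for a string cell: the value wrapped in double quotes.
formatted = '"College website corpus"'

# Escaping for HTML turns each double quote into the entity &quot;.
cell = html.escape(formatted)
print(cell)

# Removing the entity strips the wrapping quotes from the rendered cell.
cleaned = cell.replace("&quot;", "")
print(cleaned)
```

Note that a blanket replace also removes any double quotes that genuinely belong to the data, which is acceptable here only because none of the Text values contain quotes.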

  • Reproducible example please? – ignoring_gravity Jan 23 '23 at 16:50
  • It looks like the quotes come from [`.get_fmt()`](https://github.com/pola-rs/polars/blob/master/py-polars/polars/_html.py#L115) which dispatches to [the rust function.](https://github.com/pola-rs/polars/blob/master/py-polars/src/series.rs#L374) I could be mistaken - but it doesn't look like it's currently possible to customize this behaviour. You may have to open a feature request / discuss it on the issues tracker. – jqurious Jan 23 '23 at 18:24
  • My apologies, again. Since this is a Flask app, and the quotes issue seems to be within the `_repr_html_()` output only, I'm finding it difficult to craft a suitably brief example. – SummittDweller Jan 23 '23 at 18:33
  • Thanks @jqurious. I started down the path of creating an issue this morning, but the sage advice there suggests posting questions here first. I'll introduce an issue as you suggest and may even have my daughter try to do some work on it. Take care. – SummittDweller Jan 23 '23 at 18:39

1 Answer

You want strip:

In [37]: df = pl.DataFrame({'Text': ['"College website corpus"']})

In [38]: df['Text'][0]
Out[38]: '"College website corpus"'

In [39]: df['Text'].str.strip('"')[0]
Out[39]: 'College website corpus'
ignoring_gravity
  • Thanks for the quick answer, but I'm sorry to say I was unable to make it work. However I did learn from it. – SummittDweller Jan 23 '23 at 18:25
  • In what sense did it not work? – ignoring_gravity Jan 23 '23 at 18:25
  • I'm unable to change the contents of the dataframe using this syntax and I see some posts that suggest syntax like `df['Text'][0] = "Something"` would work in Pandas but not in Polars, and that seems true in my code. What I learned is that a simple `print(df)` returns the frame without quotes around Text values, but I need HTML output suitable for rendering in a Flask template so it seems the root of my problem is in the `._repr_html_( )` representation only? – SummittDweller Jan 23 '23 at 18:31
  • The issue is not that their data contains the surrounding quotes but that `._s.get_fmt()` gets called from `._repr_html_()`, which adds them, e.g. `pl.Series(['College Website corpus'])._s.get_fmt(0, 128)` – jqurious Jan 23 '23 at 18:49
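
Since the quotes are added during HTML formatting rather than stored in the data, one way to sidestep them is to build the table markup directly from the row values. A minimal sketch, assuming the column names and rows have already been pulled out of the frame (for example with `df.columns` and `df.rows()`); `rows_to_html` is a hypothetical helper, not part of Polars:

```python
import html

def rows_to_html(columns, rows):
    """Render column names and row tuples as a plain HTML table."""
    head = "".join(f"<th>{html.escape(str(c))}</th>" for c in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(v))}</td>" for v in row) + "</tr>"
        for row in rows
    )
    return f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"

# Sample values taken from the question, sorted by frequency as df.sort() would do.
columns = ["Text", "To Be Frequency"]
rows = sorted(
    [("Virginia Woolf, Mrs. Dalloway", 28.3), ("College website corpus", 16.7)],
    key=lambda row: row[1],
)
table_html = rows_to_html(columns, rows)
print(table_html)
```

Because the values are escaped individually and never wrapped in quotes, no post-processing of the HTML is needed. In a Flask template the resulting string would still have to be marked as safe (for instance `{{ table_html|safe }}`) so that Jinja2 does not escape the markup a second time.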