
I'm new to Scattertext and have written code that should produce an interactive HTML visualisation.

import spacy
import pandas as pd
import scattertext as st


twitterData = pd.read_csv("stock_data.csv")
print(twitterData.dtypes)  # show each column's type (e.g. int64 vs object)

nlp = spacy.load("en_core_web_sm")
corpus = st.CorpusFromPandas(
    twitterData, category_col="Sentiment", text_col="Text", nlp=nlp
).build()

sent = st.produce_scattertext_explorer(
    corpus,
    category="1",
    category_name="Positive",
    not_category_name="Negative",
    width_in_pixels=1000,
)

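# produce_scattertext_explorer returns the page as an HTML string;
# the with-block makes sure the file is closed after writing
with open("StockMarketSentiment.html", "wb") as f:
    f.write(sent.encode("utf-8"))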

However, the code, which follows a template I found online, throws an AssertionError, and as I'm new to software development I'm struggling to understand where I'm going wrong.

Traceback (most recent call last):
  File "/Users/lukeashton/PycharmProjects/Project/venv/Visualiser.py", line 15, in <module>
    sent = st.produce_scattertext_explorer(corpus,
  File "/Users/lukeashton/PycharmProjects/Project/venv/lib/python3.8/site-packages/scattertext/__init__.py", line 594, in produce_scattertext_explorer
    scatter_chart_data = scatter_chart_explorer.to_dict(
  File "/Users/lukeashton/PycharmProjects/Project/venv/lib/python3.8/site-packages/scattertext/ScatterChartExplorer.py", line 115, in to_dict
    json_data = ScatterChart.to_dict(self,
  File "/Users/lukeashton/PycharmProjects/Project/venv/lib/python3.8/site-packages/scattertext/ScatterChart.py", line 276, in to_dict
    assert category in all_categories
AssertionError
Process finished with exit code 1

I appreciate it may be hard to offer advice with such limited info, but the code and error details are above if anyone can spot anything!

Asked by Luke Ashton; edited by Laurent.

  • Presumably `category="1"` is the problem. It seems to expect a value other than `"1"` there. (Perhaps `"Sentiment"`?) – 0x5453 Jul 06 '21 at 17:51
  • @0x5453 Hmm, I see what you mean from the error message. However, "Sentiment" throws the same error :/ I will try a few other possible values for that argument. – Luke Ashton Jul 07 '21 at 08:22
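Rather than trying values one at a time, you can check up front which labels the column actually contains. The helper below, `check_category`, is hypothetical (it is not part of scattertext); it simply mirrors the idea behind `assert category in all_categories` from the traceback but reports what is available. The small DataFrame is a made-up stand-in for `stock_data.csv`, assuming integer labels `1` and `-1`. Note that `"Sentiment"` is the column *name*, not one of its values, so it fails the same check.

```python
import pandas as pd


def check_category(df: pd.DataFrame, category_col: str, category) -> None:
    """Raise a readable error if `category` is not a value of `category_col`.

    Hypothetical helper: it mimics scattertext's `category in all_categories`
    assertion, but lists the labels (and their types) that actually exist.
    """
    labels = df[category_col].unique().tolist()  # distinct values, as Python objects
    if category not in labels:
        types = sorted({type(v).__name__ for v in labels})
        raise ValueError(
            f"{category!r} is not a value of column {category_col!r}; "
            f"available labels are {labels!r} (types: {types})"
        )


# Hypothetical rows standing in for stock_data.csv (integer sentiment labels)
twitterData = pd.DataFrame(
    {"Text": ["stocks rally", "market slides"], "Sentiment": [1, -1]}
)

for candidate in ("1", "Sentiment"):
    try:
        check_category(twitterData, "Sentiment", candidate)
    except ValueError as err:
        print(err)
```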

2 Answers

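Make sure at least one of the values in the `Sentiment` column of your data frame is the exact string `"1"`. The failing line in the traceback, `assert category in all_categories`, checks that the `category` you pass matches one of the corpus's category labels exactly. Those labels come from the values in the category column, so an integer `1` does not match the string `"1"`.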

  • The values in the column were ints, so I converted them all to strings and voilà! Thanks a lot for the pointer. – Luke Ashton Jul 13 '21 at 16:56
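A minimal pandas-only sketch of why the conversion works. The rows are hypothetical stand-ins for `stock_data.csv` (assumed labels `1` and `-1`, stored as integers the way `read_csv` would parse them); the sketch shows that the string `"1"` is not among integer labels, and that `astype(str)` changes that.

```python
import pandas as pd

# Hypothetical rows standing in for stock_data.csv; Sentiment holds integers
twitterData = pd.DataFrame(
    {
        "Text": ["stocks rally on earnings", "market slides", "buying the dip"],
        "Sentiment": [1, -1, 1],
    }
)

labels_before = twitterData["Sentiment"].unique().tolist()
print(labels_before, "1" in labels_before)  # [1, -1] False

# Convert the labels to strings so that category="1" can match them
twitterData["Sentiment"] = twitterData["Sentiment"].astype(str)

labels_after = twitterData["Sentiment"].unique().tolist()
print(labels_after, "1" in labels_after)  # ['1', '-1'] True
```

Converting at load time should have the same effect: `pd.read_csv("stock_data.csv", dtype={"Sentiment": str})`. Either way, the conversion has to happen before `st.CorpusFromPandas(...)` is built, because the corpus is created from the column as it is at that moment.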

Slightly different use case from the package, but I solved it this way:

import scattertext as st

# Assumes `corpus` and the DataFrame `df` (with a 'parse' column) were built earlier
html = st.produce_scattertext_explorer(
    corpus,
    category='dt',
    category_name='the thing you want to compare',
    not_category_name='other data',
    width_in_pixels=1000,
    height_in_pixels=500,
    minimum_term_frequency=2,
    metadata=df['parse'],
)

Here `category='dt'` is a dummy variable, made into its own column, for whatever feature I want to pivot on in the DataFrame. I haven't had any luck using an actual value from within a column this way: for me, the value has to be a column in its own right to work without an assertion error.
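The answer doesn't show how the dummy column was built. One possible reading, and only an assumption, is to derive a dedicated column of string labels from the feature, so that the value later passed as `category` is guaranteed to be one of them. The column names `has_feature` and `compare_group` are hypothetical.

```python
import pandas as pd

# Hypothetical data: `has_feature` stands in for whatever feature you pivot on
df = pd.DataFrame(
    {
        "text": ["first doc", "second doc", "third doc"],
        "has_feature": [True, False, True],
    }
)

# Map the feature onto explicit string labels in a column of its own;
# 'dt' is the label that would later be passed as category='dt'
df["compare_group"] = df["has_feature"].map({True: "dt", False: "other"})

print(df["compare_group"].tolist())  # ['dt', 'other', 'dt']
```

Under this reading, the corpus would be built with `category_col="compare_group"`, and `category='dt'` would then match a label exactly.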