I am putting together a text analysis script in Python using pyLDAvis, and I am trying to turn one of the outputs into something easier to read. The function that returns the top 5 words for each of 4 topics gives a list that looks like:
[(0, '0.008*"de" + 0.007*"sas" + 0.004*"la" + 0.003*"et" + 0.003*"see"'),
(1,
'0.009*"sas" + 0.004*"de" + 0.003*"les" + 0.003*"recovery" + 0.003*"data"'),
(2,
'0.007*"sas" + 0.006*"data" + 0.005*"de" + 0.004*"recovery" + 0.004*"raid"'),
(3,
'0.019*"sas" + 0.009*"expensive" + 0.008*"disgustingly" + 0.008*"cool." + 0.008*"houses"')]
Ideally, I want to turn this into a dataframe where the first row contains the top word of each topic along with its score, the second row contains the second word of each topic, and so on, with the columns alternating between word and score, i.e.:
r1col1 is 'de', r1col2 is 0.008, r1col3 is 'sas', r1col4 is 0.009, etc.
Is there a way to extract the contents of the list and separate the values given the format it is in?
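One possible approach, as a minimal sketch: each topic string follows a regular `score*"word"` pattern joined by ` + `, so a regular expression can pull out the pairs, and pandas can assemble them into alternating word/score columns. The variable name `topics` and the column names (`topic0_word`, `topic0_score`, ...) are assumptions made for illustration, and the list is hard-coded here from the output shown above rather than taken from a live model.

```python
import re

import pandas as pd

# The list from the question, hard-coded for illustration.
topics = [
    (0, '0.008*"de" + 0.007*"sas" + 0.004*"la" + 0.003*"et" + 0.003*"see"'),
    (1, '0.009*"sas" + 0.004*"de" + 0.003*"les" + 0.003*"recovery" + 0.003*"data"'),
    (2, '0.007*"sas" + 0.006*"data" + 0.005*"de" + 0.004*"recovery" + 0.004*"raid"'),
    (3, '0.019*"sas" + 0.009*"expensive" + 0.008*"disgustingly" + 0.008*"cool." + 0.008*"houses"'),
]

# Capture the numeric score and the quoted word from each score*"word" term.
term_pattern = re.compile(r'([\d.]+)\*"([^"]*)"')

columns = {}
for topic_id, text in topics:
    pairs = term_pattern.findall(text)  # list of (score, word) strings
    # Hypothetical column names: one word column and one score column per topic.
    columns[f"topic{topic_id}_word"] = [word for _, word in pairs]
    columns[f"topic{topic_id}_score"] = [float(score) for score, _ in pairs]

# Each row is one rank (row 0 = top word of every topic).
df = pd.DataFrame(columns)
print(df)
```

This assumes every topic has the same number of terms; if the counts differ, the columns would have unequal lengths and `pd.DataFrame` would raise an error, so the shorter lists would need padding first.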