I'm using the Python tool in Alteryx to do some NLP with spaCy. I want to extract noun chunks from a set of comments, but I can't tell which comment each chunk came from, and I'll be processing hundreds or even thousands of comments.
For example, I have these 2 comments:
| Row | Comment                       | Id# |
|-----|-------------------------------|-----|
| 1   | The dog is brown. He is soft. | 245 |
| 2   | The black cat is soft.        | 763 |
Current output:
| 1     | 2   |
|-------|-----|
| dog   |     |
| he    |     |
| black | cat |
Desired output:
| 1    | 2     | 3   |
|------|-------|-----|
| row1 | dog   |     |
| row1 | he    |     |
| row2 | black | cat |
After importing spacy, dframcy and pandas in the Alteryx Python tool, this is my code, starting from the .csv import:
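df = Alteryx.read("#1")  # incoming .csv data as a pandas DataFrame
Txt = df[["Comment"]]  # keep only the comment column
String = Txt.to_string()  # render the whole column as one string
dframcy = DframCy(nlp)  # wrap the spaCy pipeline (nlp) in DframCy
Txt = dframcy.nlp(String)  # parse the string as a single spaCy Doc
output1 = pd.DataFrame(data=Txt.noun_chunks)  # one row per chunk, one column per token
Alteryx.write(output1, 1)  # send the result to output anchor 1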
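To illustrate why the row association is lost in the code above, here is a minimal sketch using the two example comments. The DataFrame is built by hand as a stand-in for `Alteryx.read("#1")`, which is an assumption made only for illustration. `to_string()` renders the whole frame, including the header and index labels, as one block of text, so anything parsed from it is a single document with no row boundaries.

```python
import pandas as pd

# Hand-built stand-in for the frame returned by Alteryx.read("#1") (assumption).
df = pd.DataFrame({
    "Comment": ["The dog is brown. He is soft.", "The black cat is soft."],
    "Id#": [245, 763],
})

# to_string() flattens header, index labels and every row into one str,
# so the parser downstream receives one text, not one text per row.
as_text = df[["Comment"]].to_string()

print(type(as_text).__name__)
print(as_text)
```

Because both comments end up in the same string, there is nothing left in the parsed result that identifies row 1 versus row 2.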
I know Alteryx very well, but I'm very new to NLP and spaCy and have only beginner-level knowledge of Python.
For those who are familiar with Python/spaCy but not Alteryx: I know I have to import the file as a data frame, convert it to a string for NLP, and then convert the result back to a data frame for output from the Python tool. That last step is the tough part for me, because most solutions I've found don't convert anything back to a data frame or append row data. I need the row number (or Id number) because I want to join other data in the .csv to those rows.
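One possible way to keep the Id with each noun chunk, sketched under stated assumptions, is to parse each comment separately instead of converting the whole column to one string. The helper name `noun_chunks_by_row` and the output column name `noun_chunk` are hypothetical, and `nlp` is assumed to be any callable that returns an object with a `noun_chunks` attribute, such as a loaded spaCy pipeline. This is not the only layout possible: it produces one row per chunk rather than one column per token.

```python
import pandas as pd


def noun_chunks_by_row(df, nlp, text_col="Comment", id_col="Id#"):
    """Return one output row per noun chunk, tagged with the source row's id.

    Hypothetical helper: `nlp` is assumed to behave like a spaCy pipeline,
    i.e. nlp(text) returns a Doc whose .noun_chunks yields spans with .text.
    """
    records = []
    # Parse each comment on its own so every chunk can be tied back to its row.
    for row_id, text in zip(df[id_col], df[text_col]):
        doc = nlp(str(text))
        for chunk in doc.noun_chunks:
            records.append({id_col: row_id, "noun_chunk": chunk.text})
    # Fix the column order even if no chunks were found.
    return pd.DataFrame(records, columns=[id_col, "noun_chunk"])


# Possible use inside the Alteryx Python tool (assumptions, not run here):
# import spacy
# from ayx import Alteryx
# nlp = spacy.load("en_core_web_sm")  # assumed model; use whichever is installed
# df = Alteryx.read("#1")
# Alteryx.write(noun_chunks_by_row(df, nlp), 1)
```

Since the Id column is carried through to the output, the result can be joined back to the other .csv fields on that column, either with a Join tool in Alteryx or with `DataFrame.merge` in pandas.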