I've been trying to work with a dataset that uses | as the delimiter and \n as the line terminator.
a | b | c
c | e | f
I have been trying to split each row with rec[0].split('|') and then apply nltk.FreqDist(rec).
Here's my source code:
import nltk
import csv
from nltk.util import ngrams

with open('CG_Attribute.csv', 'r') as f:
    for row in f:
        splitSet = row.split('|')
        for rec in splitSet:
            # token = nltk.word_tokenize(rec)
            result = nltk.FreqDist(rec)
            print(result)
The output that I am getting is as follows:
<FreqDist with 14 samples and 22 outcomes>
<FreqDist with 8 samples and 9 outcomes>
<FreqDist with 1 samples and 1 outcomes>
<FreqDist with 26 samples and 44 outcomes>
<FreqDist with 6 samples and 8 outcomes>
What I am expecting is:
[('a',1),('b',1),('c',2),('e',1),('f',1)]
Can anyone point out where I am screwing up? Any suggestions would help :)
PS - I even tried the csv module, but had no luck.
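A likely cause, given the code above: rec is a single string, so nltk.FreqDist(rec) iterates over its characters, and a fresh distribution is built and printed for every field instead of one for the whole file. Below is a minimal sketch of counting whole fields across all rows. It makes some assumptions: the inline sample string stands in for CG_Attribute.csv, fields are padded with spaces as in the sample rows, and collections.Counter is used so the snippet runs without NLTK (nltk.FreqDist is a Counter subclass, so it should be a drop-in replacement here).

```python
import csv
import io
from collections import Counter

# Assumption: this string stands in for the contents of CG_Attribute.csv,
# using the two sample rows shown in the question.
sample = "a | b | c\nc | e | f\n"

# One counter for the whole file, not one per field.
counts = Counter()

# csv.reader accepts any iterable of lines; delimiter='|' splits on the pipe.
for row in csv.reader(io.StringIO(sample), delimiter='|'):
    # Strip the padding spaces around each field and skip empty fields,
    # then count each field as a single token.
    counts.update(field.strip() for field in row if field.strip())

print(sorted(counts.items()))
```

With a real file, the io.StringIO(sample) part would be replaced by the file object from open('CG_Attribute.csv', newline='').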