I want to find and separate nouns and groups of nouns from a JSON file using NLTK. This is the content of the JSON file:

[
  {
    "id": 18009,
    "ingredients": [
      "baking powder",
      "eggs",
      "all-purpose flour",
      "raisins",
      "milk",
      "white sugar"
    ]
  },
  {
    "id": 28583,
    "ingredients": [
      "sugar",
      "egg yolks",
      "corn starch",
      "cream of tartar",
      "bananas",
      "vanilla wafers",
      "milk",
      "vanilla extract",
      "toasted pecans",
      "egg whites",
      "light rum"
    ]
  },

I want to find the words tagged NN, NNS, NNP, and NNPS.

DjaouadNM

1 Answer

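import json
import nltk
from nltk import word_tokenize

# Assumption: the JSON from the question is saved as "data.json" (hypothetical file name).
# A one-time nltk.download(...) of the tokenizer and tagger data may also be needed.
with open("data.json") as f:
    data = json.load(f)

for a in data:
    for b in a["ingredients"]:
        text = word_tokenize(b)   # split the ingredient string into tokens
        res = nltk.pos_tag(text)  # tag each token with its part of speech
        # keep only the (word, tag) pairs whose tag is a noun tag
        res = [t for t in res if t[1] in ["NN", "NNS", "NNP", "NNPS"]]
        print(res)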

#output:
#[('powder', 'NN')]
#[('eggs', 'NNS')]
#[('flour', 'NN')]
#[('raisins', 'NNS')]
#[('milk', 'NN')]
#[('sugar', 'NN')]
#[('sugar', 'NN')]
#[('egg', 'NN'), ('yolks', 'NNS')]
#[('corn', 'NN'), ('starch', 'NN')]
# ...
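
The code above prints each noun on its own. To get groups of nouns, one option is to join runs of adjacent noun tokens. This is only a minimal sketch, not part of the original answer: `noun_groups` is a hypothetical helper name, and the input is written by hand in the shape that `nltk.pos_tag` returns, so the tags shown are illustrative.

```python
from itertools import groupby

NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def noun_groups(tagged):
    """Join each run of adjacent noun tokens into one phrase."""
    groups = []
    # groupby splits the list into runs of noun / non-noun tokens
    for is_noun, run in groupby(tagged, key=lambda t: t[1] in NOUN_TAGS):
        if is_noun:
            groups.append(" ".join(word for word, _ in run))
    return groups


# Hand-written (word, tag) pairs; real tags come from nltk.pos_tag
print(noun_groups([("egg", "NN"), ("yolks", "NNS")]))
print(noun_groups([("cream", "NN"), ("of", "IN"), ("tartar", "NN")]))
```

A non-noun token such as "of" ends the current group, so "cream of tartar" would yield two separate groups under this approach.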

DBaker
  • I am glad that I could help. (please don't forget to upvote and accept the answer :) ) – DBaker Sep 06 '19 at 15:33
  • I got an error when I ran the code on the JSON file: KeyError ingredian – Shah Wajahat Sep 06 '19 at 19:47
  • Make sure your JSON file has the right keys ("ingredian" shouldn't be a key; "ingredients" should be). – DBaker Sep 07 '19 at 08:09
  • You should open a new question on this website about checking JSON files – DBaker Sep 09 '19 at 12:27
  • Can you tell me how I can separate the remaining parts of speech? – Shah Wajahat Sep 12 '19 at 09:30
  • I am not sure what you mean. Perhaps you are asking about this: s = [('powder', 'NN')]; print(s[0][1])  # 'NN'. But comments are not the right place for writing code. You should open a new question if you have a new question. – DBaker Sep 12 '19 at 10:15
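
For the two follow-ups in the comments, here is a minimal sketch under stated assumptions, not something the answerer posted. `ingredients_of` and `split_nouns` are hypothetical helper names; the first avoids the KeyError by returning an empty list when a record has no "ingredients" key, and the second splits tagged tokens into nouns and everything else. The tagged input is written by hand in the shape `nltk.pos_tag` returns.

```python
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def ingredients_of(record):
    """Return the record's ingredient list, or [] if the key is missing."""
    return record.get("ingredients", [])


def split_nouns(tagged):
    """Partition (word, tag) pairs into nouns and all other parts of speech."""
    nouns = [t for t in tagged if t[1] in NOUN_TAGS]
    others = [t for t in tagged if t[1] not in NOUN_TAGS]
    return nouns, others


# Hand-written (word, tag) pairs; real tags come from nltk.pos_tag
tagged = [("cream", "NN"), ("of", "IN"), ("tartar", "NN")]
nouns, others = split_nouns(tagged)
print(nouns)
print(others)

# A record with a misspelled key no longer raises KeyError
print(ingredients_of({"id": 1, "ingredian": ["milk"]}))
```

Silently skipping a record hides the misspelled key, so fixing the JSON file itself, as suggested in the comments, is still the better fix.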