
I have a list of words in a text file, with each word on its own line. I have also imported nested JSON data into a pandas DataFrame.

The JSON data looks something like this:

[  
   {  
      "year":"2019",
      "category":"chemistry",
      "laureates":[  
         {  
            "id":"976",
            "motivation":"\"for the development of lithium-ion batteries\"",
            "share":"3"
         },
         {  
            "id":"977",
            "motivation":"\"for the development of lithium-ion batteries\"",
            "share":"3"
         }
      ]
   },
   {  
      "year":"2019",
      "category":"economics",
      "laureates":[  
         {  
            "id":"982",
            "firstname":"Abhijit",
            "surname":"Banerjee",
            "motivation":"\"for their experimental approach to alleviating global poverty\"",
            "share":"3"
         },

I need to use the words in the text file to work out how often they occur in the JSON data for each of the categories (such as chemistry). I am then asked to plot several of those frequencies (the 1st most frequent word, then the 10th, 20th, 30th, 40th and 50th) using Matplotlib, for each of the subjects.

I am very confused as I'm not sure about the best way to go about this.

Zoe
  • To clarify: You want a graph that has the words on the x-axis, and numbers on the y-axis? Do you want a bar graph, line graph, etc? – PythonNerd Dec 03 '19 at 14:30
  • It is a plot graph, with the multiple frequencies (1st most frequent word, 10th, 20th, 30th, 40th, 50th) on the x-axis; the y-axis should just have tick marks. – Zoe Dec 03 '19 at 14:43

1 Answer


You can use sacremoses, a Python port of the Moses tokenizer, for tokenization and normalization. This gives you a list of words. Then you just need to count the occurrences of each word and plot the result. For quick plotting I recommend seaborn. A word cloud would also be neat.
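As a rough illustration of the counting and plotting steps, here is a minimal sketch. It assumes the JSON has already been loaded into a list of dicts shaped like the sample in the question, and that the text file has been read into a list of words. To stay dependency-free it swaps in a plain regex tokenizer instead of sacremoses, and it plots with Matplotlib (as the question asks) rather than seaborn. The function names and the `RANKS` constant are made up for this example.

```python
import re
from collections import Counter, defaultdict

# Ranks the question asks for: 1st, 10th, 20th, ... most frequent word.
RANKS = (1, 10, 20, 30, 40, 50)


def count_words_by_category(prizes, vocabulary):
    """Count vocabulary words in the laureates' motivations, per category."""
    vocab = {word.lower() for word in vocabulary}
    counts = defaultdict(Counter)
    for prize in prizes:
        # .get() guards against entries without a "laureates" list (assumption).
        for laureate in prize.get("laureates", []):
            text = laureate.get("motivation", "").lower()
            # Simple stand-in tokenizer: runs of ASCII letters only.
            tokens = re.findall(r"[a-z]+", text)
            counts[prize["category"]].update(t for t in tokens if t in vocab)
    return counts


def frequencies_at_ranks(counter, ranks=RANKS):
    """Map each requested rank to its (word, count); skip ranks past the end."""
    ranked = counter.most_common()
    return {rank: ranked[rank - 1] for rank in ranks if rank <= len(ranked)}


def plot_rank_frequencies(counts, ranks=RANKS):
    """Draw one line per category: x is the rank, y is that word's count."""
    # Imported here so the counting helpers work without Matplotlib installed.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for category, counter in counts.items():
        picked = frequencies_at_ranks(counter, ranks)
        xs = list(picked)
        ys = [count for _word, count in picked.values()]
        ax.plot(xs, ys, marker="o", label=category)
    ax.set_xticks(list(ranks))
    ax.set_xlabel("Rank of word (1 = most frequent)")
    ax.set_ylabel("Occurrences")
    ax.legend()
    return fig
```

To use it, read the word list with something like `[line.strip() for line in open(path)]`, load the JSON with `json.load`, pass both to `count_words_by_category`, then hand the result to `plot_rank_frequencies` and call `plt.show()` or `fig.savefig(...)`. If you want one figure per subject instead of one line per subject, call the plotting function once per category with a single-entry dict.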

Piotr Rarus