I've been playing around with Google's entity analyser, and it looks really good!

But I've been bashing my head against one thing for a while: I'm trying to replicate the image below (seen on Google's Natural Language API page).

[Image: screenshot from Google's Natural Language API page]

This is the format of the entity data I get back from a request.

There's no order to the data, only occurrences, so looping through each word and checking it against the entities seems really slow, and since each word occurs multiple times, it might get a little complicated.

[
  {
  "mentions": [
    {
      "text": { "content": "group", "beginOffset": -1 },
      "type": "COMMON",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "group", "beginOffset": -1 },
      "type": "COMMON",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "group", "beginOffset": -1 },
      "type": "COMMON",
      "sentiment": { "magnitude": 0.30000001192092896, "score":0.30000001192092896 }
    },
    {
      "text": { "content": "group", "beginOffset": -1 },
      "type": "COMMON",
      "sentiment": { "magnitude": 0.30000001192092896, "score":-0.30000001192092896 }
    },
    {
      "text": { "content": "group", "beginOffset": -1 },
      "type": "COMMON",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "group", "beginOffset": -1 },
      "type": "COMMON",
      "sentiment": { "magnitude": 0, "score": 0 }
    } 
  ],
  "metadata": {},
  "name": "group",
  "type": "ORGANIZATION",
  "salience": 0.34768930077552795,
  "sentiment": { "magnitude": 1.100000023841858, "score": 0 }
},
{
  "mentions": [
    {
      "text": { "content": "Commonwealth", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "Commonwealth", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "Commonwealth", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "Commonwealth", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "Commonwealth", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "Commonwealth", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "Commonwealth", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "Commonwealth", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "Commonwealth", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "Commonwealth", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0.10000000149011612, "score":-0.10000000149011612 }
    },
    {
      "text": { "content": "Commonwealth", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "Commonwealth", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0.20000000298023224, "score": -0.20000000298023224 }
    },
    {
      "text": { "content": "Commonwealth", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "Commonwealth", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "Commonwealth", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "Commonwealth", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "Commonwealth", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "Commonwealth", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "Commonwealth", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "Commonwealth of Nations", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0, "score": 0 }
    },
    {
      "text": { "content": "Commonwealth\r\nOne", "beginOffset": -1 },
      "type": "PROPER",
      "sentiment": { "magnitude": 0, "score": 0 }
    }
  ],
  "metadata": {
    "mid": "/m/0j7v_",
    "wikipedia_url": "https://en.wikipedia.org/wiki/Commonwealth_of_Nations"
  },
  "name": "Commonwealth of Nations",
  "type": "LOCATION",
  "salience": 0.28001657128334045,
  "sentiment": { "magnitude": 1.7000000476837158, "score": 0 }
},
  ...
  ]

Is there an easy way of doing this that I've completely missed? Thanks for any insight/ideas.

Ollie


1 Answer

I believe beginOffset is what you need:

beginOffset indicates the (zero-based) character offset within the given text where the sentence begins. Note that this offset is calculated using the passed encodingType.

It should work if you specify the EncodingType in the request.
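As a rough sketch of what that could look like over REST: the snippet below only builds the JSON request body, it does not send it. The helper name build_request_body is made up for illustration, and the endpoint constant is my assumption about the v1 URL. I picked UTF32 because Python indexes strings by code point; from JavaScript, UTF16 would be the matching choice.

```python
import json

# Assumed v1 REST endpoint for entity sentiment analysis (shown for context only;
# nothing in this sketch performs a network call).
ENDPOINT = "https://language.googleapis.com/v1/documents:analyzeEntitySentiment"


def build_request_body(text: str, encoding_type: str = "UTF32") -> str:
    """Return a JSON request body that asks the API to fill in beginOffset.

    build_request_body is a hypothetical helper, not part of any Google library.
    """
    body = {
        "document": {"type": "PLAIN_TEXT", "content": text},
        # Without this field the API reports beginOffset as -1.
        "encodingType": encoding_type,
    }
    return json.dumps(body)
```

You would then POST that body to the endpoint with your usual authentication; the mentions in the response should carry real offsets instead of -1.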

If EncodingType is not specified, encoding-dependent information (such as beginOffset) will be set to -1, which is why every mention in your response has a beginOffset of -1.
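Once the offsets are real, there is no need to loop over every word. Here is a minimal sketch, assuming the response has the shape shown in the question and that the offsets are code-point based (UTF32). It flattens all mentions into spans, sorts them by position, and cuts the text into labelled pieces that you can then render however you like. The function names mention_spans and annotate are my own, not part of the API.

```python
from typing import List, Optional, Tuple


def mention_spans(entities: List[dict]) -> List[Tuple[int, int, str]]:
    """Flatten entities into (start, end, entity_type) spans sorted by start."""
    spans = []
    for entity in entities:
        for mention in entity["mentions"]:
            start = mention["text"]["beginOffset"]
            if start < 0:
                continue  # offset missing: encodingType was not set in the request
            end = start + len(mention["text"]["content"])
            spans.append((start, end, entity["type"]))
    return sorted(spans)


def annotate(text: str, entities: List[dict]) -> List[Tuple[str, Optional[str]]]:
    """Split text into (segment, entity_type or None) pieces in document order."""
    pieces = []
    cursor = 0
    for start, end, entity_type in mention_spans(entities):
        if start < cursor:
            continue  # skip a mention that overlaps one already emitted
        if start > cursor:
            pieces.append((text[cursor:start], None))
        pieces.append((text[start:end], entity_type))
        cursor = end
    if cursor < len(text):
        pieces.append((text[cursor:], None))
    return pieces


# Made-up sample data in the same shape as the response in the question.
sample_text = "The group met the Commonwealth."
sample_entities = [
    {"type": "LOCATION",
     "mentions": [{"text": {"content": "Commonwealth", "beginOffset": 18}}]},
    {"type": "ORGANIZATION",
     "mentions": [{"text": {"content": "group", "beginOffset": 4}}]},
]
pieces = annotate(sample_text, sample_entities)
```

For the sample above, pieces comes out as plain text, "group" tagged ORGANIZATION, plain text, "Commonwealth" tagged LOCATION, then the final full stop. Each tagged piece can be wrapped in whatever markup you use for the coloured labels.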

Xiaoxia Lin