
I need to build a query against an index of around 50k terrain polygons (stored as geo_shape polygons in Elasticsearch): given a point, it should return every polygon that contains that point.

I managed to do it using percolate queries (example below) but I read somewhere that percolate queries don't scale well.

Is there a more efficient way to achieve this behavior?

Example using percolate:

Demo polygons

PUT geo_demo
{
  "mappings": {
    "properties": {
      "thepoly": {
        "type": "percolator"
      },
      "thepoint": {
        "type": "geo_point"
      }
    }
  }
}

#region 1 (red)
POST /geo_demo/_doc/1
{
  "thepoly": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "geo_polygon": {
          "thepoint": {
            "points": [
              "-23.573978,-46.664806",
              "-23.583978,-46.664806",
              "-23.583978,-46.658806",
              "-23.573978,-46.658806",
              "-23.573978,-46.664806"
            ]
          }
        }
      }
    }
  }
}

#region 2 (green)
POST /geo_demo/_doc/2
{
  "thepoly": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "geo_polygon": {
          "thepoint": {
            "points": [
              "-23.579978,-46.664806",
              "-23.583978,-46.664806",
              "-23.583978,-46.652806",
              "-23.579978,-46.652806",
              "-23.579978,-46.664806"
            ]
          }
        }
      }
    }
  }
}

#should match doc/1 only
GET /geo_demo/_search
{
  "query": {
    "percolate": {
      "field": "thepoly",
      "document": {
        "thepoint": "-23.577007,-46.661811"
      }
    }
  }
}

#should match both doc/1 and doc/2
GET /geo_demo/_search
{
  "query": {
    "percolate": {
      "field": "thepoly",
      "document": {
        "thepoint": "-23.582002,-46.661811"
      }
    }
  }
}

#should match doc/2 only
GET /geo_demo/_search
{
  "query": {
    "percolate": {
      "field": "thepoly",
      "document": {
        "thepoint": "-23.582041,-46.655717"
      }
    }
  }
}

#should match none
GET /geo_demo/_search
{
  "query": {
    "percolate": {
      "field": "thepoly",
      "document": {
        "thepoint": "-23.576771,-46.655674"
      }
    }
  }
}
Blamoo

1 Answer


You almost don't need Elasticsearch for this, unless you have a strong reason.

For 50k polygons, you can easily hold them all on the heap, or decompose each polygon into a list of geohashes.

You can keep an in-heap map with the geohash as the key and the polygon IDs as the value.

As each point comes in, you first compute its geohash, then use Map#get to check whether that geohash is in the map and which polygons contain the point.
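
A minimal Python sketch of this lookup, under some assumptions that are not part of the answer itself: the names `geohash_encode`, `GeohashPolygonIndex` and `point_in_polygon` are hypothetical, each polygon is covered by the geohash cells overlapping its bounding box (a coarse cover, not an exact decomposition), and an exact ray-casting test is run on the candidates because a cell can be only partially inside a polygon. It ignores holes and polygons that cross the antimeridian.

```python
import math
from collections import defaultdict

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=7):
    """Standard geohash: interleave longitude/latitude bisection bits."""
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    chars, bits, bit_count, use_lon = [], 0, 0, True
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        bits = (bits << 1) | int(bit)
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def cell_size(precision):
    """Return (height, width) of a geohash cell in degrees."""
    total_bits = 5 * precision
    lon_bits, lat_bits = (total_bits + 1) // 2, total_bits // 2
    return 180.0 / (1 << lat_bits), 360.0 / (1 << lon_bits)


def point_in_polygon(lat, lon, ring):
    """Ray casting over a ring of (lat, lon) vertices."""
    inside = False
    for i in range(len(ring)):
        lat1, lon1 = ring[i]
        lat2, lon2 = ring[(i + 1) % len(ring)]
        if (lat1 > lat) != (lat2 > lat):
            crossing = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < crossing:
                inside = not inside
    return inside


class GeohashPolygonIndex:
    def __init__(self, precision=7):
        self.precision = precision
        self.cells = defaultdict(set)  # geohash -> ids of candidate polygons
        self.polygons = {}             # id -> ring

    def add(self, poly_id, ring):
        self.polygons[poly_id] = ring
        height, width = cell_size(self.precision)
        lats = [p[0] for p in ring]
        lons = [p[1] for p in ring]
        # Assumption: cover the bounding box, then filter exactly at query time.
        for iy in range(math.floor((min(lats) + 90) / height),
                        math.floor((max(lats) + 90) / height) + 1):
            for ix in range(math.floor((min(lons) + 180) / width),
                            math.floor((max(lons) + 180) / width) + 1):
                center = (-90 + (iy + 0.5) * height, -180 + (ix + 0.5) * width)
                self.cells[geohash_encode(*center, self.precision)].add(poly_id)

    def query(self, lat, lon):
        candidates = self.cells.get(geohash_encode(lat, lon, self.precision), ())
        return sorted(i for i in candidates
                      if point_in_polygon(lat, lon, self.polygons[i]))


# The two demo rectangles from the question, as (lat, lon) rings.
index = GeohashPolygonIndex()
index.add(1, [(-23.573978, -46.664806), (-23.583978, -46.664806),
              (-23.583978, -46.658806), (-23.573978, -46.658806)])
index.add(2, [(-23.579978, -46.664806), (-23.583978, -46.664806),
              (-23.583978, -46.652806), (-23.579978, -46.652806)])

print(index.query(-23.577007, -46.661811))  # [1]
print(index.query(-23.582002, -46.661811))  # [1, 2]
print(index.query(-23.582041, -46.655717))  # [2]
print(index.query(-23.576771, -46.655674))  # []
```

The precision is a trade-off: longer geohashes mean more map entries per polygon but fewer false candidates to filter. At precision 7 a cell is roughly 0.0014 degrees on each side, so each demo rectangle is covered by a few dozen cells.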

fast tooth
  • Those polygons are associated with other fields that can be combined for complex filtering. Also, I have a lot of legacy code that I'm gonna reuse for filtering. That's my "strong reason" haha – Blamoo Nov 15 '19 at 22:50
  • Then you probably want to perf test it. The polygon percolator can be slow in old versions of ES, but the latest ES should be fine. It's hard to talk about optimization without concrete perf goals. – fast tooth Nov 18 '19 at 15:16
  • The best I can think of is to add a pre-check phase like the one I described, which should rule out the majority of input docs, and then match the rest with the ES polygon percolator as you described. – fast tooth Nov 18 '19 at 15:17