3

I am trying to store hotel room availability in elasticsearch. And then I need to search rooms those are available from a date till another date. I have come up with two ways to store data for availability, and they are as follows:

Here availability dictionary store all dates and value of each date key is true of false, representing its available on that day or not.

{
  "_id": "khg2uo47tyhgjwebu7624787",
  "room_type": "garden view",
  "hotel_name": "Cool hotel",
  "hotel_id": "jytu64r982u0299023",
  "room_metadata1": 233,
  "room_color": "black",
  "availability": {
    "2016-07-01": true,
    "2016-07-02": true,
    "2016-07-03": false,
    "2016-07-04": true,
    "2016-07-05": true,
    "2016-07-06": null,
    "2016-07-07": true,
    "2016-07-08": true,
    ----
    ----
    for 365 days 
  }

}

Here availability array only stores those dates when room is available

{
  "_id": "khg2uo47tyhgjwebu7624787",
  "room_type": "garden view",
  "hotel_name": "Cool hotel",
  "hotel_id": "jytu64r982u0299023",
  "room_metadata1": 535,
  "room_color": "black",
  "availability": ["2016-07-01", "2016-07-02", "2016-07-04", "2016-07-05", "2016-07-07", "2016-07-08"] ---for 365 days
  }
}

I want to search all rooms, those are available from from_date till to_date and that should look into availability dictionary or array.And my date range may span up to 365 days

How to store these availability data, so that I can perform the above search easily? And I could not find any way to search through range of dates, so any suggestion?

Please note, items in availability may not be kept sorted. And I may have more than 100 million records to search through.

Community
  • 1
  • 1
JVK
  • 3,782
  • 8
  • 43
  • 67

2 Answers2

3

One way to model this would be with parent/child documents. Room documents would be parent documents and availability documents would be their child documents. For each room, there would be one availability document per date the room is available. Then, at query time, we can query for parent rooms which have one availability child document for each date in the searched interval (even disjoint ones).

Note that you'll need to make sure that as soon as a room is booked, you remove the corresponding child documents for each booked date.

Let's try this out. First create the index:

PUT /rooms
{
  "mappings": {
    "room": {
      "properties": {
        "room_num": {
          "type": "integer"
        }
      }
    },
    "availability": {
      "_parent": {
        "type": "room"
      },
      "properties": {
        "date": {
          "type": "date",
          "format": "date"
        },
        "available": {
          "type": "boolean"
        }
      }
    }
  }
}

Then add some data

POST /rooms/_bulk
{"_index": { "_type": "room", "_id": 233}}
{"room_num": 233}
{"_index": { "_type": "availability", "_id": "20160701", "_parent": 233}}
{"date": "2016-07-01"}
{"_index": { "_type": "availability", "_id": "20160702", "_parent": 233}}
{"date": "2016-07-02"}
{"_index": { "_type": "availability", "_id": "20160704", "_parent": 233}}
{"date": "2016-07-04"}
{"_index": { "_type": "availability", "_id": "20160705", "_parent": 233}}
{"date": "2016-07-05"}
{"_index": { "_type": "availability", "_id": "20160707", "_parent": 233}}
{"date": "2016-07-07"}
{"_index": { "_type": "availability", "_id": "20160708", "_parent": 233}}
{"date": "2016-07-08"}

Finally, we can start querying. First, let's say we want to find a room that is available on 2016-07-01:

POST /rooms/room/_search
{
  "query": {
    "has_child": {
      "type": "availability",
      "query": {
        "term": {
          "date": "2016-07-01"
        }
      }
    }
  }
}
=> result: room 233

Then, let's try searching for a room available from 2016-07-01 to 2016-07-03

POST /rooms/room/_search
{
  "query": {
    "bool": {
      "minimum_should_match": 3,
      "should": [
        {
          "has_child": {
            "type": "availability",
            "query": {
              "term": {
                "date": "2016-07-01"
              }
            }
          }
        },
        {
          "has_child": {
            "type": "availability",
            "query": {
              "term": {
                "date": "2016-07-02"
              }
            }
          }
        },
        {
          "has_child": {
            "type": "availability",
            "query": {
              "term": {
                "date": "2016-07-03"
              }
            }
          }
        }
      ]
    }
  }
}
=> Result: No rooms

However, searching for a room available from 2016-07-01 to 2016-07-02 does yield room 233

POST /rooms/room/_search
{
  "query": {
    "bool": {
      "minimum_should_match": 2,
      "should": [
        {
          "has_child": {
            "type": "availability",
            "query": {
              "term": {
                "date": "2016-07-01"
              }
            }
          }
        },
        {
          "has_child": {
            "type": "availability",
            "query": {
              "term": {
                "date": "2016-07-02"
              }
            }
          }
        }
      ]
    }
  }
}
=> Result: Room 233

We can also search for disjoint intervals, say from 2016-07-01 to 2016-07-02 + from 2016-07-04 to 2016-07-05

POST /rooms/room/_search
{
  "query": {
    "bool": {
      "minimum_should_match": 4,
      "should": [
        {
          "has_child": {
            "type": "availability",
            "query": {
              "term": {
                "date": "2016-07-01"
              }
            }
          }
        },
        {
          "has_child": {
            "type": "availability",
            "query": {
              "term": {
                "date": "2016-07-02"
              }
            }
          }
        },
        {
          "has_child": {
            "type": "availability",
            "query": {
              "term": {
                "date": "2016-07-04"
              }
            }
          }
        },
        {
          "has_child": {
            "type": "availability",
            "query": {
              "term": {
                "date": "2016-07-05"
              }
            }
          }
        }
      ]
    }
  }
}
=> Result: Room 233

And so on... The key point is to add one has_child query per date you need to check availability for and set minimum_should_match to the number of dates you're checking.

UPDATE

Another option would be to use a script filter, but with 100 million documents, I'm not certain it would scale that well.

In this scenario you can keep your original design (preferably the second one, because with the first one, you'll create too many unnecessary fields in your mapping) and the query would look like this:

POST /rooms/room/_search
{
  "query": {
    "bool": {
      "filter": {
        "script": {
          "script": {
            "inline": "def dates = doc.availability.sort(false); from = Date.parse('yyyy-MM-dd', from); to = Date.parse('yyyy-MM-dd', to); def days = to - from; def fromIndex = doc.availability.values.indexOf(from.time); def toIndex = doc.availability.values.indexOf(to.time); return days == (toIndex - fromIndex)",
            "params": {
              "from": "2016-07-01",
              "to": "2016-07-04"
            }
          }
        }
      }
    }
  }
}
Val
  • 207,596
  • 13
  • 358
  • 360
  • Thanks @Val. So according to your solution, if I have to check a range from 2016-01-01 till 2016-12-01, that means in search query I will have almost 330 `has_child` dates for every dates. Do you think that's a good idea? – JVK Jun 15 '16 at 18:34
  • Do you think you'll often have someone booking a room for 330 days ? ;-) – Val Jun 15 '16 at 21:01
  • Yes, there are some hotels, they provide extended stay. For example "commuters" who rent rooms for a year – JVK Jun 15 '16 at 21:57
  • I also read that parent-child relationship has limitation https://www.elastic.co/guide/en/elasticsearch/guide/current/parent-child.html#parent-child the parent document and all of its children must live on the same shard. question-1) So if I have many shards, then will it scale? Question-2) Also my original question, if my search is for a range of 3 months (that is quite common for my app), then it would be 90 `has_child`. So, do you see it is still an elegant solution? – JVK Jun 16 '16 at 02:13
  • 1) Parent and child documents live on the same shard for technical and practical reasons, it won't really be a problem. 2) I would have liked to see this requirement in your original question if it's a common requirement for your app ;-) With your initial design (either the first or the second), the only option that I know of would be by using scripting, but if you have 100 million documents this would not be really performant in my opinion. – Val Jun 16 '16 at 04:05
  • I sincerely appreciate your helping hand. I will try these options and will get back to you if I will have any question on your suggested solutions. I am going to upvote your answer. – JVK Jun 16 '16 at 05:15
  • No problem, always glad to help out! – Val Jun 16 '16 at 05:37
  • it just came to my mind. What if I store availability dates as large string. And let's assume my date array is sorted always. E.g. `["2016-06-12", "2016-06-18", "2016-06-19"......]` to just as a long string in this format YYMMDD contacting each other. so then this array will become `16062160618160619`.... (it could be maximum 6x365 = 2190 characters long) and then my search criteria will also be sorted and for 60 days availability search, it will be a 60x6=360 long string that I will do string comparison and if it is `in` that long string then it match. – JVK Jun 16 '16 at 05:53
  • What do you say? above idea is good? and could be much performant than any other solutions we have discussed so far? – JVK Jun 16 '16 at 05:54
  • If you're ok to reindex the full room document every time a single day or a few days are booked, then that might work. Also how do you handle days that have passed already? Are you going to remove them from that string? The other issue is searching within that long string, you're probably willing to use wildcard queries, but it's not a good idea. An alternative is to keep a single string but keep spaces to separate each date and then let that field be analyzed and each date will be a single token. – Val Jun 16 '16 at 05:57
  • I will be updating room availability ONLY once in every 24hours (if it has changed) I will have a service, that will send the list of rooms those availability has only changed - may be 100,000 rooms only changed. Even if I store dates in any other way, I will still need to reindex in every 24 hours. Regarding date already passed - it is not an issue, because search query will never include older days in search query. How bad could be my re-indexing? – JVK Jun 16 '16 at 06:04
  • I had to change my data storage design to get it work with better performance. What I did is this http://stackoverflow.com/questions/37866833/elasticsearch-date-range-condition-should-match-exactly-one-item-from-date-ran and it looks much better. The only thing I stuck there is how to make sure that availability object count is exactly 1? Any help will be highly appreciated. – JVK Jun 16 '16 at 23:13
0

i am new and just learning ES. What are the disadvantages of this setup/mapping ?

ciao..remco

Remco
  • 681
  • 1
  • 6
  • 20
  • See here for an updated solution: https://discuss.elastic.co/t/search-for-available-dates/127317/2?u=val – Val Apr 09 '18 at 12:06