
We're writing a Django app that lets users send private messages to each other as well as messages to a group. We want to implement per-user search, so that each user can search and view only the messages they are allowed to see.

How do we offer a search experience that's customized to each user? Some messages are part of threads sent to thousands of users as part of a group, others are private messages sent between two users, and still others are "pending" messages that are held for moderation.

Do we hard-code the filters that determine whether a user can view a message into each query we send to ElasticSearch? Or, if a message goes to a group with 1000 members, do we add 1000 identical documents to ElasticSearch, with the recipient being the only thing that changes?

Update

So here's an individual message in its serialized form:

{
            "snippet": "Hi All,Though Marylan...", // Friendly snippet, this will be needed in the result
            "thread_id": 28719, // Unique ID for this thread
            "thread_title": "Great Thread Title Here", // Title for the thread, will be used to display in search results
            "sent_at": "2015-03-19 07:28:15.092030-05:00", // Datetime the message was originr
            "text": "Clean Message Test Here", // Text to be queryable
            "pending": false, // If pending, this should only appear in the search results of the sender
            "id": 30580, // Unique ID for this message across the entire
            "sender": {
                "sender_is_staff": false, // If the sender is a staff member or not (Filterable)
                "sender": "Anna M.", // Friendly name (we'll need this to display on the result page)
                "sender_guid": "23234304-eeee-bbbb-1234-bfb19d56ad68" // Guid of sender (necessary to display a link to the user's profile in the result)
            },
            "recipient": {
                  "name": "", // Not filled in for group messages
                  "recipient_guid": "" // Not filled in for group messages
            },
            "type": "group", // Values for this can be 'direct' or 'group'
            "group_id": 43 // This could be null
}

A user should be able to search:

  1. All the messages that they're the "sender" of
  2. All messages where their GUID is in the "recipient" area (and the "type" is "direct")
  3. All the messages sent to the group IDs they're a member of that are not pending (they could be a member of 100 groups though, so it could be [10,14,15,18,25,44,50,60,75,80,81,82,83,...])

In SQL, that would be roughly: SELECT * FROM messages WHERE text LIKE '%query here%' AND (sender_guid = 'my-guid' OR (type = 'direct' AND recipient_guid = 'my-guid') OR (group_id IN (10, 14, 15, 18, 25, 44, 50, 60, 75, 80, 81, 82, 83, ...) AND pending = FALSE))
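For comparison, here is a rough sketch of how the three visibility rules listed above could be attached to every search as a filter, so that each message is indexed only once. It assumes a recent Elasticsearch version where a `bool` query accepts a `filter` clause (older 1.x releases expressed the same idea with a `filtered` query), and it assumes the GUID and `type` fields are indexed as exact, unanalyzed values. The function name `build_message_search` is made up for illustration; the field names come from the serialized message above.

```python
def build_message_search(query_text, user_guid, group_ids):
    """Build a search body limited to messages this user may see.

    Hypothetical helper: only the field names come from the serialized
    message shown in the question.
    """
    visible_to_user = {
        "bool": {
            "should": [
                # Rule 1: the user sent the message (pending or not).
                {"term": {"sender.sender_guid": user_guid}},
                # Rule 2: a direct message addressed to the user.
                {"bool": {"filter": [
                    {"term": {"type": "direct"}},
                    {"term": {"recipient.recipient_guid": user_guid}},
                ]}},
                # Rule 3: a non-pending message in one of the user's groups.
                # All group IDs travel in a single terms clause.
                {"bool": {"filter": [
                    {"terms": {"group_id": list(group_ids)}},
                    {"term": {"pending": False}},
                ]}},
            ],
            "minimum_should_match": 1,
        }
    }
    return {
        "query": {
            "bool": {
                # Scored full-text match on the message text.
                "must": [{"match": {"text": query_text}}],
                # Unscored access restriction.
                "filter": [visible_to_user],
            }
        }
    }


body = build_message_search("query here", "my-guid", [10, 14, 15, 18, 25])
```

The point of the sketch is that a membership list of a few hundred group IDs becomes one `terms` clause rather than hundreds of separate OR branches. Recent versions cap the length of that list through the `index.max_terms_count` index setting.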

NickCatal

1 Answer


I hope I'm understanding your problem correctly.

So you have a messaging system where there are 3 types of messages (group, 2-users, moderated). Your goal is to allow your users to search through all messages, with the option to apply filters on type, user, date, etc.

Take advantage of the scalable nature of ElasticSearch for storing your searchable data. First, consider the servers your ES nodes run on. Do they have enough resources (memory, CPU, network, disk speed) for your traffic and for the size and number of your documents? Once you've settled on the server specs, you can add more nodes as needed to distribute data and processing.

Next, create your message document structure. I imagine your mapping may look something like this:

"message": {
    "properties": {
        "id": {
            "type": "long"
        },
        "type": {
            "type": "string"
        },
        "body": {
            "type": "string"
        },
        "from_user": {
            "type": "object",
            "properties": {
                "id": {
                    "type": "integer"
                },
                "name": {
                    "type": "string"
                }
            }
        },
        "to_user": {
            "type": "object",
            "properties": {
                "id": {
                    "type": "integer"
                },
                "name": {
                    "type": "string"
                }
            }
        },
        "group": {
            "type": "object",
            "properties": {
                "id": {
                    "type": "integer"
                },
                "name": {
                    "type": "string"
                }
            }
        },
        "added_on": {
            "type": "date"
        },
        "updated_on": {
            "type": "date"
        },
        "status_id": {
            "type": "short"
        }
    }
}

You may want to create custom analyzers for the "body" and "name" fields to customize your search results to fit your expectations. Then it's just a matter of writing queries and using filters/sorts to allow users to search globally or from/to specific users or groups.
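As a rough illustration of what a custom analyzer might look like, here is a sketch of index settings written as a Python dict. The analyzer name `message_text` and the choice of filters are assumptions made for the example, not a recommendation; `standard`, `lowercase` and `asciifolding` are built-in Elasticsearch components.

```python
def message_index_settings():
    """Return index settings defining one custom analyzer.

    Hypothetical example: the analyzer name and filter choice are
    placeholders to adapt to your own search expectations.
    """
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "message_text": {
                        "type": "custom",
                        # Split on word boundaries.
                        "tokenizer": "standard",
                        # Lowercase tokens, then fold accents (é -> e).
                        "filter": ["lowercase", "asciifolding"],
                    }
                }
            }
        }
    }


settings = message_index_settings()
```

A field would then opt in by adding `"analyzer": "message_text"` to its entry in the mapping.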

After that, you just need to set up a bridge between your database and your ES index for syncing your messages for search. Sync frequency depends on how quickly you want messages to be made available for search.
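One way to sketch that bridge is to turn each serialized message into a bulk-index action. The helper names and the index name `messages` are made up for the example. The dict shape (`_index`, `_id`, `_source`) is the one accepted by `elasticsearch.helpers.bulk` in the official Python client, which is assumed here but not called.

```python
def to_bulk_action(message, index_name="messages"):
    """Convert one serialized message dict into a bulk index action.

    Using the message's own "id" as the document ID means re-syncing the
    same message overwrites the existing document instead of duplicating it.
    """
    return {
        "_op_type": "index",
        "_index": index_name,
        "_id": message["id"],
        "_source": message,
    }


def bulk_actions(messages, index_name="messages"):
    """Yield one action per message, e.g. for the messages changed since the last sync."""
    for message in messages:
        yield to_bulk_action(message, index_name)


actions = list(bulk_actions([
    {"id": 30580, "text": "Clean Message Test Here", "pending": False},
]))
```

With the Python client installed, the generator could be handed to `elasticsearch.helpers.bulk(client, bulk_actions(batch))`, either from a periodic job or right after a message is saved, depending on how fresh the search results need to be.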

Well, I truly hope I understood your question correctly. Otherwise, OK...

  • I just added an example serialized message as well as what a SQL query would look like. So what you're saying is that I should make sure that I have sufficient resources allocated, describe my serialized structure in ElasticSearch, create custom analyzers for the "text" field, then insert messages as they come in. How would I go about making sure users can only see what they're assigned? Should I just attach that filtering to each query my code sends to ElasticSearch? If a member is part of 500 groups, wouldn't that add 500 extra "OR"s to the search request and can ElasticSearch handle that? – NickCatal Mar 20 '15 at 05:02
  • It looks like your main difficulty is finding a way to let a user search through all messages belonging to groups that he's a part of, including messages where he is neither sender nor recipient. I'm not sure what ES's limit on query size is, but you can always send a multi-search request (multiple queries in one request). For example, if you see that ES only allows you to filter by 250 group ids in 1 query, then use multi-search to send, in one shot, 2 queries, each with its own set of 250 group id filters. – headspiderthingy Mar 21 '15 at 01:23
  • It'd be ideal if there was some way in ES to cross-reference docs from different index types in a single query, but I don't think this is possible (I'm still on v1.2.2, so maybe the newer versions support this?). – headspiderthingy Mar 21 '15 at 01:25
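The multi-search idea from the comments could be sketched like this: split the group IDs into batches and emit one header/body pair per batch, which is the alternating layout the multi-search API expects. The batch size of 250 is the hypothetical limit mentioned in the comment, not a real Elasticsearch limit, and the function names and index name are invented for the example. It also assumes a version where a `bool` query accepts a `filter` clause.

```python
def chunked(ids, size):
    """Yield consecutive slices of ids, each at most size long."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def build_msearch_lines(query_text, group_ids, index_name="messages", batch_size=250):
    """Build alternating header/body dicts for one multi-search request."""
    lines = []
    for batch in chunked(list(group_ids), batch_size):
        # Header line: which index this sub-search targets.
        lines.append({"index": index_name})
        # Body line: text match restricted to this batch of groups.
        lines.append({
            "query": {
                "bool": {
                    "must": [{"match": {"text": query_text}}],
                    "filter": [
                        {"terms": {"group_id": batch}},
                        {"term": {"pending": False}},
                    ],
                }
            }
        })
    return lines


lines = build_msearch_lines("query here", range(500))
```

Each sub-search returns its own hit list, so the caller still has to merge and re-sort the responses before showing a single result page.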