1

I have two entities in my application: tasks and users. Each task is assigned to a user. I may have tens of thousands of tasks per application instance but only tens or hundreds of users per application instance.

I’d like to sort my tasks by the assigned user’s name.

However, the assigned user’s name may change over time. If I were to denormalize this data then whenever a user changes their name I’d need to update hundreds or thousands of tasks which I imagine could be inefficient.

The ElasticSearch join field type feels like a good fit here since I have many children but few parents. But I can’t find a way to reference a parent’s field from a child. (Or better yet index children based on a parent field’s value.)

Alternatively if ElasticSearch has an efficient bulk update API that: 1) doesn’t require me to load all a user’s tasks into application code, 2) can update all denormalized account names atomically in one refresh then I feel that would be a viable solution for my use case.

(This application may actually run on AWS OpenSearch.)

Calebmer
  • 2,972
  • 6
  • 29
  • 36

1 Answers1

0

You could:

  1. Denormalization with Bulk Update. i.e. storing the assigned user's name directly in each task document. When a user's name changes Bulk Update will make it ez to change them all.

OR.

  1. Instead of using ElasticSearch's join field, you can use a parent-child relationship.

i.e. separate index for users and tasks, with tasks being the child of the user.

{
  "mappings": {
    "properties": {
      "user_id": {
        "type": "keyword"
      },
      "user_name": {
        "type": "text"
      },
      // Other user-related fields...
    }
  }
}

{
  "mappings": {
    "properties": {
      "task_id": {
        "type": "keyword"
      },
      "task_name": {
        "type": "text"
      },
      "user_id": {
        "type": "keyword",
        "copy_to": "parent_id"  // Copy the value to a field that will be used as the parent ID
      },
      // Other task-related fields...
    }
  }
}

POST tasks/_doc/task1
{
  "task_id": "task1",
  "task_name": "Task 1",
  "user_id": "user123",  // User ID of the user this task is assigned to
  // Other task-related fields...
}
GET tasks/_search
{
  "query": {
    "has_parent": {
      "parent_type": "users",
      "query": {
        "term": {
          "user_id": "user123"  
        }
      },
      "inner_hits": {},  
      "sort": [
        {
          "users.<FIELD_IN_PARENT>": {
            "order": "asc"  
          }
        }
      ]
    }
  }
}

Andrew Arrow
  • 4,248
  • 9
  • 53
  • 80
  • The link you posted doesn’t work but it looks like the same link as mine. Could you include more specific examples? I can’t find out how to use a parent/child relationship to sort from that link. And bulk update would still require me to search all task documents and load them in my application code (which I’d like to avoid if possible). – Calebmer Jul 19 '23 at 14:16
  • sure I removed dup link and added example – Andrew Arrow Jul 19 '23 at 14:20
  • This allows me to filter by a specific parent (in this case user123). What I want is to sort tasks across multiple parents based on a field in the parent. – Calebmer Jul 20 '23 at 12:29
  • ok I updated to "has_parent" query in combination with the "inner_hits" and "sort" options. – Andrew Arrow Jul 20 '23 at 13:24