I have a database with about 800k objects and 13 shards, set up so that the data can be accessed quickly. To drive the sharding, I added a `shards` field to each object and assigned it a letter: `shards: 'a'` for the first object, `shards: 'b'` for the second, and so on. The letters are spread evenly, so about 50k objects have `shards: 'a'`, about 50k have `shards: 'b'`, etc.

I sharded the collection with a hashed shard key on that field, using `sh.shardCollection("test.testCollection", { "shards": "hashed" })`, and expected the objects to be spread as evenly as possible across the 13 shards. Instead, the data went to only two of the 13 shards, and unevenly: roughly 72% to one and 28% to the other.

How can I get the data distributed evenly across all 13 shards? The shard list and the per-shard distribution output follow.
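For reference, the labeling scheme described above can be sketched in plain JavaScript. This is only an illustration: the helper name `assignLetters` is hypothetical, and the 16-letter alphabet is an assumption inferred from 800k objects at about 50k objects per letter.

```javascript
// Hypothetical helper: assign shard letters round-robin ('a' for the first
// object, 'b' for the second, and so on), then count objects per letter.
// Assumption: 16 letters, inferred from 800k objects at ~50k per letter.
const LETTERS = "abcdefghijklmnop".split("");

function assignLetters(objectCount, letters = LETTERS) {
  const counts = new Map(letters.map((letter) => [letter, 0]));
  for (let i = 0; i < objectCount; i++) {
    const letter = letters[i % letters.length];
    counts.set(letter, counts.get(letter) + 1);
  }
  return counts;
}

const counts = assignLetters(800000);
console.log("distinct shard key values:", counts.size);
console.log("objects labeled 'a':", counts.get("a"));
```

Under these assumptions the `shards` field takes only as many distinct values as there are letters, which is the cardinality the hashed shard key has to work with.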

[
  {
    _id: 'a',
    host: 'a/127.0.0.1:21000,127.0.0.1:21001,127.0.0.1:21002',
    state: 1,
    topologyTime: Timestamp({ t: 1675107083, i: 3 })
  },
  {
    _id: 'b',
    host: 'b/127.0.0.1:22000,127.0.0.1:22001,127.0.0.1:22002',
    state: 1,
    topologyTime: Timestamp({ t: 1675107100, i: 5 })
  },
  {
    _id: 'c',
    host: 'c/127.0.0.1:23000,127.0.0.1:23001,127.0.0.1:23002',
    state: 1,
    draining: true
  },
  {
    _id: 'd',
    host: 'd/127.0.0.1:23010,127.0.0.1:23011,127.0.0.1:23012',
    state: 1,
    topologyTime: Timestamp({ t: 1676821653, i: 5 })
  },
  {
    _id: 'e',
    host: 'e/127.0.0.1:23020,127.0.0.1:23021,127.0.0.1:23022',
    state: 1,
    topologyTime: Timestamp({ t: 1676821663, i: 5 })
  },
  {
    _id: 'f',
    host: 'f/127.0.0.1:23030,127.0.0.1:23031,127.0.0.1:23032',
    state: 1,
    topologyTime: Timestamp({ t: 1676821668, i: 1 })
  },
  {
    _id: 'g',
    host: 'g/127.0.0.1:23040,127.0.0.1:23041,127.0.0.1:23042',
    state: 1,
    topologyTime: Timestamp({ t: 1676821673, i: 5 })
  },
  {
    _id: 'h',
    host: 'h/127.0.0.1:23050,127.0.0.1:23051,127.0.0.1:23052',
    state: 1,
    topologyTime: Timestamp({ t: 1676821678, i: 5 })
  },
  {
    _id: 'j',
    host: 'j/127.0.0.1:23060,127.0.0.1:23061,127.0.0.1:23062',
    state: 1,
    topologyTime: Timestamp({ t: 1676821685, i: 5 })
  },
  {
    _id: 'k',
    host: 'k/127.0.0.1:23070,127.0.0.1:23071,127.0.0.1:23072',
    state: 1,
    topologyTime: Timestamp({ t: 1676821689, i: 5 })
  },
  {
    _id: 'l',
    host: 'l/127.0.0.1:23080,127.0.0.1:23081,127.0.0.1:23082',
    state: 1,
    topologyTime: Timestamp({ t: 1676821694, i: 5 })
  },
  {
    _id: 'm',
    host: 'm/127.0.0.1:23090,127.0.0.1:23091,127.0.0.1:23092',
    state: 1,
    topologyTime: Timestamp({ t: 1676821698, i: 5 })
  },
  {
    _id: 'n',
    host: 'n/127.0.0.1:24000,127.0.0.1:24001,127.0.0.1:24002',
    state: 1,
    topologyTime: Timestamp({ t: 1676821708, i: 4 })
  }
]
Shard a at a/127.0.0.1:21000,127.0.0.1:21001,127.0.0.1:21002
{
  data: '125.57MiB',
  docs: 227420,
  chunks: 1,
  'estimated data per chunk': '125.57MiB',
  'estimated docs per chunk': 227420
}
Shard k at k/127.0.0.1:23070,127.0.0.1:23071,127.0.0.1:23072
{
  data: '326.31MiB',
  docs: 576209,
  chunks: 1,
  'estimated data per chunk': '326.31MiB',
  'estimated docs per chunk': 576209
}
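
The 72% / 28% split mentioned above can be checked from the document counts in this output. A minimal sketch, with the counts copied by hand from the two shard entries and `shares` as a hypothetical helper name:

```javascript
// Document counts copied from the distribution output above (shards a and k).
const docsPerShard = { a: 227420, k: 576209 };

// Hypothetical helper: total document count and each shard's share of it.
function shares(docs) {
  const total = Object.values(docs).reduce((sum, n) => sum + n, 0);
  const percent = {};
  for (const [shard, n] of Object.entries(docs)) {
    percent[shard] = ((100 * n) / total).toFixed(1) + "%";
  }
  return { total, percent };
}

const distribution = shares(docsPerShard);
console.log(distribution.total);   // total documents across both shards
console.log(distribution.percent); // share held by each shard
```

Each shard also reports `chunks: 1`, so the whole collection currently sits in two chunks.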

Object sample:

{
  "_id": {
    "$oid": "63dd7324289226c918818c55"
  },
  "Title": "",
  "Product": {
    "web1": {
      "Harry Potter and the Chamber of Secrets: 2/7 (Harry Potter 2)": {
        "Price": 15,
        "Url": "https://www.amazon.com/Harry-Potter-Chamber-Secrets-Book/dp/B017V4IPPO/ref=sr_1_2?crid=GCT8C7Z3Q4SE&keywords=Harry+Potter+and+the+Chamber+of+Secrets&qid=1676836656&sprefix=harry+potter+and+the+chamber+of+secrets%2Caps%2C230&sr=8-2",
        "Time": {
          "$date": {
            "$numberLong": "1676669514749"
          }
        }
      }
    }
  },
  "Category": [
    "Book",
    "Fantasy"
  ],
  "Time": {
    "$date": {
      "$numberLong": "1676669514749"
    }
  },
  "shards": "h"
}

What do I need to change so that the data is distributed evenly among my shards?

BayGold
  • I think this question is better for dba.stackexchange.com as it is about database administration rather than programming – user20042973 Feb 19 '23 at 19:15
  • I removed the images and wrote it as a code block, but it didn't seem very nice. @ray – BayGold Feb 19 '23 at 19:20
  • @BayGold You can use triple backticks ``` to surround the code segments. See [formatting helps](https://stackoverflow.com/editing-help) for more details. – ray Feb 19 '23 at 19:23
  • You created 39 shard services on a single host. Do you really think this will increase the performance? A smart index would be better. Did you read and consider [Read Operations to Sharded Clusters](https://www.mongodb.com/docs/manual/core/distributed-queries/#read-operations-to-sharded-clusters)? If not, then your queries will be even slower. – Wernfried Domscheit Feb 19 '23 at 19:58
  • Can you share a sample document? What is the shard key? In MongoDB 6.0 the default chunk size is 128 MiB, so for an even distribution over all shards you would need at least 1.6 GB of data. – Wernfried Domscheit Feb 19 '23 at 20:01
  • I am using a Ryzen 9 5950X processor and 96GB of RAM. When I searched for book titles using the $text search, I experienced significant delays while using three shard servers. Therefore, I want to increase the number of shard servers and distribute the data. There are approximately 500 $text queries per second to the database. I have added a sample object to the question. @WernfriedDomscheit – BayGold Feb 19 '23 at 20:05
  • Have a look at [Choose a Shard Key](https://www.mongodb.com/docs/manual/core/sharding-choose-a-shard-key/). I think your key does not fulfill any of these requirements. Did you create a [text index](https://www.mongodb.com/docs/manual/core/index-text/)? – Wernfried Domscheit Feb 19 '23 at 20:08
  • Of course, I created one for the titles. Do you think the error is caused by the shard key? @WernfriedDomscheit – BayGold Feb 19 '23 at 20:15
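
The chunk-size estimate from the comments above can be worked through with the figures already on this page. This is a rough sketch that reads the estimate as one full 128 MiB chunk per shard, which is an assumption about what the commenter meant; the data sizes are copied from the distribution output in the question.

```javascript
// Figures taken from the question and comments above.
const CHUNK_SIZE_MIB = 128;        // default chunk size cited for MongoDB 6.0
const SHARD_COUNT = 13;            // shards in the cluster
const dataMiB = 125.57 + 326.31;   // data on shard a plus data on shard k

// Assumption: "even distribution" read as one full chunk on every shard.
const minDataMiB = CHUNK_SIZE_MIB * SHARD_COUNT;
const minDataGiB = minDataMiB / 1024;

// How many full-size chunks the current data would fill.
const fullChunks = dataMiB / CHUNK_SIZE_MIB;

console.log(`data needed for one full chunk per shard: ${minDataMiB} MiB (${minDataGiB} GiB)`);
console.log(`current data: ${dataMiB.toFixed(2)} MiB, about ${fullChunks.toFixed(1)} full chunks`);
```

On this reading, the collection holds roughly a quarter of the data needed to fill one default-size chunk on each of the 13 shards.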

0 Answers