Ubuntu

Bucket resharding blocks write I/O

Asked by Aaron Sawyer on 2019-04-12

Ceph version 12.2.8 (Luminous)

Hi,
We are new to Ceph, and the most recent issue I have run into, seemingly at random, is resharding causing a complete pause in writes to the cluster. This cluster has 6 data nodes, 3 monitors (co-located on 3 of the data nodes) and 3 RGW nodes, for a total of 9 physical nodes.

When we run our load tests, which are intensive, throughput is stable at 6.2 GBps (5.4 GBps writes / 900 MBps reads). The next morning, we saw that there had been an hour of disruption in the early hours. The errors are very clear and correlate exactly with the disruption:

2019-04-12 09:26:40.649588 7f53f8e51700 0 NOTICE: resharding operation on bucket index detected, blocking
2019-04-12 09:26:40.660673 7f53ce5fc700 0 NOTICE: resharding operation on bucket index detected, blocking
2019-04-12 09:26:41.581099 7f53fae55700 0 NOTICE: resharding operation on bucket index detected, blocking
2019-04-12 09:26:46.088665 7f53f2644700 0 NOTICE: resharding operation on bucket index detected, blocking
2019-04-12 09:26:46.160134 7f53f7e4f700 0 NOTICE: resharding operation on bucket index detected, blocking
2019-04-12 09:27:25.659575 7f53f8e51700 0 block_while_resharding ERROR: bucket is still resharding, please retry
2019-04-12 09:27:25.671357 7f53ce5fc700 0 block_while_resharding ERROR: bucket is still resharding, please retry
2019-04-12 09:27:25.671641 7f53f8e51700 0 NOTICE: resharding operation on bucket index detected, blocking
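To line these messages up against the disruption window, a small log filter can help. The following is a minimal Python sketch, assuming the RGW log lines follow the `<date> <time> <thread id> <level> <message>` layout shown above; `reshard_events_per_minute` and `SAMPLE` are hypothetical names, and the sample lines are copied from the excerpt.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout of the RGW log lines above:
# "<date> <time> <thread id> <level> <message>"
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<thread>[0-9a-f]+) (?P<level>\d+) (?P<msg>.*)$"
)


def reshard_events_per_minute(lines):
    """Count resharding-related messages per minute (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        # Skip lines that don't parse or don't mention resharding.
        if not m or "resharding" not in m.group("msg"):
            continue
        ts = datetime.strptime(
            f"{m.group('date')} {m.group('time')}", "%Y-%m-%d %H:%M:%S.%f"
        )
        counts[ts.strftime("%Y-%m-%d %H:%M")] += 1
    return counts


# A few lines copied from the log excerpt above.
SAMPLE = [
    "2019-04-12 09:26:40.649588 7f53f8e51700 0 NOTICE: resharding operation on bucket index detected, blocking",
    "2019-04-12 09:26:40.660673 7f53ce5fc700 0 NOTICE: resharding operation on bucket index detected, blocking",
    "2019-04-12 09:27:25.659575 7f53f8e51700 0 block_while_resharding ERROR: bucket is still resharding, please retry",
    "2019-04-12 09:27:25.671641 7f53f8e51700 0 NOTICE: resharding operation on bucket index detected, blocking",
]

for minute, n in sorted(reshard_events_per_minute(SAMPLE).items()):
    print(minute, n)
```

Run against a full RGW log, the per-minute counts should show whether the blocking messages span the whole hour of disruption or only part of it.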

I also see mentions of a threshold that I'm unfamiliar with:

2019-04-12 10:36:44.158795 7f53dae15700 0 check_bucket_shards: resharding needed: stats.num_objects=100123 shard max_objects=100000
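The log line reads like a simple threshold comparison. As a minimal sketch, assuming dynamic resharding is triggered when a bucket's object count exceeds its shard count multiplied by the `rgw_max_objs_per_shard` option (100000 by default, which matches the `shard max_objects=100000` value logged above), the check could be written as follows. `needs_reshard` is a hypothetical name and this is not RGW's actual code; treating an unsharded bucket as one shard is also an assumption.

```python
# Default value of the RGW option rgw_max_objs_per_shard.
DEFAULT_MAX_OBJS_PER_SHARD = 100_000


def needs_reshard(num_objects: int, num_shards: int,
                  max_objs_per_shard: int = DEFAULT_MAX_OBJS_PER_SHARD) -> bool:
    """Hypothetical restatement of the dynamic resharding trigger."""
    # Assumption: an unsharded bucket (num_shards == 0) counts as one shard.
    shards = max(num_shards, 1)
    max_objects = shards * max_objs_per_shard
    return num_objects > max_objects


# Values from the check_bucket_shards line above, assuming a single shard.
print(needs_reshard(100_123, 1))  # True
```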

The limit check output is as follows:

[
    {
        "user_id": "svc.cb1local",
        "buckets": [
            {
                "bucket": "tnbuffer",
                "tenant": "",
                "num_objects": 8391416,
                "num_shards": 167,
                "objects_per_shard": 50248,
                "fill_status": "OK"
            },
            {
                "bucket": "tnprivate",
                "tenant": "",
                "num_objects": 185442,
                "num_shards": 2,
                "objects_per_shard": 92721,
                "fill_status": "OK"
            }
        ]
    }
]
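Applying the same assumed rule to this output shows how close each bucket is to the next reshard. This is a sketch that assumes a per-shard limit of 100000 objects (the `rgw_max_objs_per_shard` default); `shard_headroom` is a hypothetical helper, not part of `radosgw-admin`, and the JSON is the output above, condensed.

```python
import json

# Assumed per-shard limit (rgw_max_objs_per_shard default).
MAX_OBJS_PER_SHARD = 100_000

LIMIT_CHECK_JSON = """
[{"user_id": "svc.cb1local",
  "buckets": [
    {"bucket": "tnbuffer", "tenant": "", "num_objects": 8391416,
     "num_shards": 167, "objects_per_shard": 50248, "fill_status": "OK"},
    {"bucket": "tnprivate", "tenant": "", "num_objects": 185442,
     "num_shards": 2, "objects_per_shard": 92721, "fill_status": "OK"}]}]
"""


def shard_headroom(report, max_objs_per_shard=MAX_OBJS_PER_SHARD):
    """Return {bucket: (fill_percent, objects_until_threshold)}."""
    result = {}
    for user in report:
        for b in user["buckets"]:
            shards = max(b["num_shards"], 1)
            capacity = shards * max_objs_per_shard
            fill = 100.0 * b["num_objects"] / capacity
            result[b["bucket"]] = (round(fill, 1), capacity - b["num_objects"])
    return result


for name, (fill, left) in shard_headroom(json.loads(LIMIT_CHECK_JSON)).items():
    print(f"{name}: {fill}% of shard capacity, {left} objects before the threshold")
```

Under that assumption, tnbuffer sits at about 50.2% of its shard capacity, while tnprivate is at about 92.7%, roughly 14,558 objects from crossing the threshold.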

If someone could point me to a link or explain what resharding is and how to handle it, that would be most helpful. Apologies if this has been asked before; I didn't find anything when searching for "shard" or "reshard".

Thanks,
Aaron