404 errors after rebalancing the object ring

Asked by Velychkovsky

I have a SAIO cluster with 30 HDDs, each device with weight=6000.
Then I expanded the cluster by adding 45 new HDDs of the same size and set weight=300 on each (to keep the load from replication and rebalancing from affecting the cluster too much).

Then I waited one day and saw in my logs that the replication process was very slow.

-- 162/161696 (0.10%) partitions replicated in 21300.66s (0.01/sec, 5899h remaining) --

I have buckets with millions of small objects. I then decided to increase the weight of the new HDDs to 6000 and rebalance the ring. After that the load on the new disks went up, but I ran into another problem: some objects were no longer accessible from storage, and debugging showed that the new object.ring.gz file pointed to the wrong locations for those objects.

Some debugging of the object location:
---------
swift-get-nodes /etc/swift/object.ring.gz AUTH_d26a7f8b5e0f477ab2b206fcef8e7f9d thumbs_com 1/2/3/6/2/12362419_320x180.jpg

Account AUTH_d26a7f8b5e0f477ab2b206fcef8e7f9d Container thumbs_com Object 1/2/3/6/2/12362419_320x180.jpg

Partition 63316 Hash f7545a8af6daa9b54eb711e72edc1996

Server:Port Device 10.0.0.2:6000 21
Server:Port Device 10.0.0.3:6000 18
Server:Port Device 10.1.0.1:6000 11
Server:Port Device 10.0.0.3:6000 4 [Handoff]
Server:Port Device 10.1.0.1:6000 23 [Handoff]
Server:Port Device 10.0.0.2:6000 15 [Handoff]

curl -g -I -XHEAD "http://10.0.0.2:6000/21/63316/AUTH_d26a7f8b5e0f477ab2b206fcef8e7f9d/thumbs_com/1/2/3/6/2/12362419_320x180.jpg"
curl -g -I -XHEAD "http://10.0.0.3:6000/18/63316/AUTH_d26a7f8b5e0f477ab2b206fcef8e7f9d/thumbs_com/1/2/3/6/2/12362419_320x180.jpg"
curl -g -I -XHEAD "http://10.1.0.1:6000/11/63316/AUTH_d26a7f8b5e0f477ab2b206fcef8e7f9d/thumbs_com/1/2/3/6/2/12362419_320x180.jpg"
curl -g -I -XHEAD "http://10.0.0.3:6000/4/63316/AUTH_d26a7f8b5e0f477ab2b206fcef8e7f9d/thumbs_com/1/2/3/6/2/12362419_320x180.jpg" # [Handoff]
curl -g -I -XHEAD "http://10.1.0.1:6000/23/63316/AUTH_d26a7f8b5e0f477ab2b206fcef8e7f9d/thumbs_com/1/2/3/6/2/12362419_320x180.jpg" # [Handoff]
curl -g -I -XHEAD "http://10.0.0.2:6000/15/63316/AUTH_d26a7f8b5e0f477ab2b206fcef8e7f9d/thumbs_com/1/2/3/6/2/12362419_320x180.jpg" # [Handoff]

Use your own device location of servers: such as "export DEVICE=/srv/node"
ssh 10.0.0.2 "ls -lah ${DEVICE:-/srv/node}/21/objects/63316/996/f7545a8af6daa9b54eb711e72edc1996"
ssh 10.0.0.3 "ls -lah ${DEVICE:-/srv/node}/18/objects/63316/996/f7545a8af6daa9b54eb711e72edc1996"
ssh 10.1.0.1 "ls -lah ${DEVICE:-/srv/node}/11/objects/63316/996/f7545a8af6daa9b54eb711e72edc1996"
ssh 10.0.0.3 "ls -lah ${DEVICE:-/srv/node}/4/objects/63316/996/f7545a8af6daa9b54eb711e72edc1996" # [Handoff]
ssh 10.1.0.1 "ls -lah ${DEVICE:-/srv/node}/23/objects/63316/996/f7545a8af6daa9b54eb711e72edc1996" # [Handoff]
ssh 10.0.0.2 "ls -lah ${DEVICE:-/srv/node}/15/objects/63316/996/f7545a8af6daa9b54eb711e72edc1996" # [Handoff]
-------------
I checked and found this object at another location:

10.0.0.2: /4/objects/63316/996/f7545a8af6daa9b54eb711e72edc1996

So it looks like the object exists, but the ring points to the wrong path for it and I'm getting 404 Not Found :(
----
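For reference, a hedged sketch of how the on-disk copy could be double-checked with swift-object-info, which prints the object's account/container/object path and the ring locations it should be at. The device root matches the ${DEVICE:-/srv/node} convention above, and the .data filename is a placeholder for whatever timestamped file actually sits in that hash directory:

ssh 10.0.0.2 "swift-object-info ${DEVICE:-/srv/node}/4/objects/63316/996/f7545a8af6daa9b54eb711e72edc1996/<timestamp>.data"
# replace <timestamp>.data with the real .data file found in that directory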

Then I restored the previous object.ring.gz file and the paths to the objects were correct again.

So my question: do I need to wait until the replication process is finished (5899h!) and only then rebalance my ring with the new device weights?

Question information

Language: English
Status: Expired
For: OpenStack Object Storage (swift)
Assignee: No assignee
Revision history for this message
John Dickinson (notmyname) said :
#1

There's a lot to unpack here. Without more info about your cluster, it's pretty hard to make specific recommendations, but I can stick to some general things. Please also check out https://ask.openstack.org/en/questions/ and feel free to ask Swift developers questions in #openstack-swift on freenode IRC.

* In your particular case, it sounds like you are adding a huge percentage of capacity all at once. Generally, it's better to add in smaller increments so that the data movement associated with ring rebalancing won't impact normal cluster operation. The ring-builder tool will only rebalance as many ring partitions as is safe at one time. This means it will normally take multiple rebalance steps to add a significant amount of new capacity (see the sketch after this list).

* Yes, it's important to wait for a replication cycle to complete before doing another rebalance. This will prevent service disruption caused by the ring being updated faster than data is moved in the cluster (ie the 404s you were seeing). The min_part_hours setting on the ring should be matched to your replication cycle time to give you some protection from changing things too quickly. (ie you should monitor replication and update min_part_hours accordingly)

* Unfortunately, replication can be quite slow when dealing with small files. Some things you can do to help speed it up are using a separate replication network, increasing rsync max_connections settings, increasing replication concurrency, and using a servers-per-port object server configuration.

* Monitoring replication cycle time is very important, as well as monitoring handoff partitions (eg with swift-recon or swift-dispersion-report).
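
For illustration, a rough sketch of what that incremental approach could look like with swift-ring-builder. The builder file name, the device id d45, the intermediate weights, and the 24-hour value are all examples rather than recommendations; the point is small weight steps, with one rebalance per completed replication cycle:

# Don't let the builder move a partition again before a full
# replication pass has had a chance to finish (example: 24 hours).
swift-ring-builder object.builder set_min_part_hours 24

# Raise the weight of a new device in steps instead of one jump,
# e.g. 300 -> 1500 -> 3000 -> 6000. "d45" is an example device id;
# use your own search values.
swift-ring-builder object.builder set_weight d45 1500
swift-ring-builder object.builder rebalance
# Push the new object.ring.gz to all nodes, wait for a full
# replication cycle to complete, then take the next step:
swift-ring-builder object.builder set_weight d45 3000
swift-ring-builder object.builder rebalance
# ...and so on until the new devices reach their target weight.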

I don't envy the situation you're in, but it's absolutely possible to move forward and get from where you are today to a healthy Swift cluster with all the extra capacity you're trying to add.
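
For the monitoring point above, two commands that could be used (hedged: swift-dispersion-report assumes /etc/swift/dispersion.conf is configured and swift-dispersion-populate has been run once):

# Replication pass durations and last-completed times across the object tier:
swift-recon object -r

# How many replicas of the dispersion objects are where the ring says they should be:
swift-dispersion-report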

Revision history for this message
Velychkovsky (ahvizl) said :
#2

Thanks a lot for the answer.
So my situation is that while I'm filling the cluster there are many PUT requests, but I still have enough I/O capacity left to increase the replication speed, and that's the direction I need to move in.

I would like replication to run faster; I tried setting more replication threads, but it didn't help much.
I have the impression that Swift is 'afraid' of overloading my HDDs even though I have enough spare IOPS capacity.
I will be very grateful if you can help me find the parameters that can increase the replication speed.
I have 3 storage nodes, each with 25 x 6TB HDDs, and a replica factor of 3.

This is my object-server.conf
---------------
[DEFAULT]
bind_port = 6000
user = swift
swift_dir = /etc/swift
devices = /mnt/swift
mount_check = True
log_level = ERROR
conn_timeout = 5
container_update_timeout = 5
node_timeout = 5
max_clients = 4096

[pipeline:main]
pipeline = healthcheck recon object-server

[app:object-server]
use = egg:swift#object
replication_concurrency = 800
replication_one_per_device = True
replication_lock_timeout = 30

[filter:healthcheck]
use = egg:swift#healthcheck

[filter:recon]
use = egg:swift#recon
recon_cache_path = /var/cache/swift
recon_lock_path = /var/lock

[object-replicator]
concurrency = 800
run_pause = 5
interval = 5
log_level = DEBUG
stats_interval = 10
rsync_io_timeout = 60

[object-reconstructor]

[object-updater]
concurrency = 200
interval = 20
slowdown = 0.008
log_level = DEBUG

[object-auditor]
interval = 300

[filter:xprofile]
use = egg:swift#xprofile

Revision history for this message
Velychkovsky (ahvizl) said :
#3

UPD:
I'm using a separate replication network with a 10G link.
rsync max connections and replication concurrency are both set to 800,
but I don't use the servers-per-port object server configuration.

I think I need a more aggressive replication mode :)
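
In case it helps, a hedged sketch of the servers-per-port and per-device rsync module setup mentioned above. The option names come from the stock Swift object-server and rsync configs, but the numbers and the sdb1 device name are only examples, and servers_per_port additionally requires the ring to assign a unique port to each device:

# object-server.conf, [DEFAULT]: a pool of workers per device port,
# so one slow disk cannot stall requests for the other disks.
servers_per_port = 4

# object-server.conf, [object-replicator]: rsync to a per-device module
# so "max connections" is enforced per disk rather than per node.
rsync_module = {replication_ip}::object_{device}

# /etc/rsyncd.conf: one module per disk (sdb1 is an example name);
# the module path is the devices root from object-server.conf.
[object_sdb1]
max connections = 4
path = /mnt/swift
read only = false
lock file = /var/lock/object_sdb1.lock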

Revision history for this message
Velychkovsky (ahvizl) said :
#4

Can anybody help with this problem? Is this an unresolved issue in Swift? :(

Revision history for this message
Launchpad Janitor (janitor) said :
#5

This question was expired because it remained in the 'Open' state without activity for the last 15 days.