Failover principle of Swift

Asked by Fatih Güçlü Akkaya

Hi,

During my tests I came across the following behaviour:

My environment consists of 1 proxy node and 6 storage nodes (account/container/object). I shut down three of the storage nodes and tried some GET requests for certain objects. I found that the Swift proxy kept trying to reach the nodes that were down and therefore received connection timeouts. I guess I did not configure my ring properly for failover. What am I missing in my configuration? You can see my configuration below:

Ring configuration: partition power 18, replica count 3.

account-server.conf

[DEFAULT]
bind_ip = 0.0.0.0
workers = 8

[pipeline:main]
pipeline = account-server

[app:account-server]
use = egg:swift#account

[account-replicator]
run_pause=900

[account-auditor]

[account-reaper]

container-server.conf

[DEFAULT]
bind_ip = 0.0.0.0
workers = 8

[pipeline:main]
pipeline = container-server

[app:container-server]
use = egg:swift#container

[container-replicator]
run_pause=900

[container-updater]

[container-auditor]

object-server.conf

[DEFAULT]
bind_ip = 0.0.0.0
workers = 8

[pipeline:main]
pipeline = object-server

[app:object-server]
use = egg:swift#object

[object-replicator]
run_pause=900
ring_check_interval=900

[object-updater]

[object-auditor]

Question information

Language: English
Status: Solved
For: OpenStack Object Storage (swift)
Assignee: No assignee
Solved by: Fatih Güçlü Akkaya

This question was reopened

Fatih Güçlü Akkaya (gucluakkaya) said (#2):

Sorry for the previous comment; I accidentally pressed the Solved button. Here are my account, container and object rings.

Account ring (port 6002):
id zone ip address port name weight partitions balance meta
 0    1 ip1        6002 sdb1 100.00     131072    0.00
 1    2 ip2        6002 sdb1 100.00     131072    0.00
 2    3 ip3        6002 sdb1 100.00     131072    0.00
 3    4 ip4        6002 sdb1 100.00     131072    0.00
 4    5 ip5        6002 sdb1 100.00     131072    0.00
 5    6 ip6        6002 sdb1 100.00     131072    0.00

Container ring (port 6001):
id zone ip address port name weight partitions balance meta
 0    1 ip1        6001 sdb1 100.00     131072    0.00
 1    2 ip2        6001 sdb1 100.00     131072    0.00
 2    3 ip3        6001 sdb1 100.00     131072    0.00
 3    4 ip4        6001 sdb1 100.00     131072    0.00
 4    5 ip5        6001 sdb1 100.00     131072    0.00
 5    6 ip6        6001 sdb1 100.00     131072    0.00

Object ring (port 6000):
id zone ip address port name weight partitions balance meta
 0    1 ip1        6000 sdb1 100.00     131072    0.00
 1    2 ip2        6000 sdb1 100.00     131072    0.00
 2    3 ip3        6000 sdb1 100.00     131072    0.00
 3    4 ip4        6000 sdb1 100.00     131072    0.00
 4    5 ip5        6000 sdb1 100.00     131072    0.00
 5    6 ip6        6000 sdb1 100.00     131072    0.00

clayg (clay-gerrard) said (#3):

If you only stop two nodes, do you still have problems?

It seems like, on a three-replica system with 50% of the cluster offline, there are going to be some objects that only exist on the downed nodes?
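As a rough back-of-the-envelope estimate (assuming each partition's three replicas land in three distinct zones chosen roughly uniformly from your six single-device zones):

from math import comb

replicas = 3   # replica count from the ring
zones = 6      # one device per zone in this cluster
down = 3       # storage nodes stopped during the test

# Fraction of partitions whose replicas all sit on the downed nodes.
all_replicas_down = comb(down, replicas) / comb(zones, replicas)
print(all_replicas_down)  # 0.05 -> roughly 1 in 20 partitions

Any object in one of those partitions can only be read again once at least one of its nodes comes back.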

-clayg

Fatih Güçlü Akkaya (gucluakkaya) said (#4):

Thank you for your answer. It seems our application that inserts and retrieves containers had a problem of its own. You are right that with 50% of the cluster offline some objects cannot be retrieved. After more tests I verified that failover works properly: if one node is down, Swift will look for another node.
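
For anyone checking the same thing, here is a rough sketch of how to see which primary (and handoff) nodes a given object maps to, using Swift's Python ring API. The ring path and the account/container/object names below are just placeholders; the swift-get-nodes command-line tool gives the same information.

from swift.common.ring import Ring

# Placeholder path and names -- substitute your own ring file and object.
ring = Ring('/etc/swift/object.ring.gz')
part, nodes = ring.get_nodes('AUTH_test', 'mycontainer', 'myobject')

print('partition:', part)
for node in nodes:
    # Primary replicas: the proxy contacts these first.
    print('primary:', node['ip'], node['port'], node['device'])

# Handoff nodes the proxy can fall back to when primaries are unreachable.
for node in ring.get_more_nodes(part):
    print('handoff:', node['ip'], node['port'], node['device'])
    break  # just the first handoff, for illustration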

Sorry for the inconvenience.