Write per-zone (datacentre)
Hi,
I'm evaluating Swift and have a question about the Ring Builder and zones.
Say I have 2 datacentres and I want object PUTs always to be made synchronously to both of the 2 DCs, but I have potentially many devices in any one DC. I could define the 2 DCs as separate zones, where (let's say for simplicity) zone 1 has one device and zone 2 has two devices. I could then declare a replica count of 3, thus forcing a majority of the writes to span the 2 zones. So I create a simple ring (partition power 4, i.e. 16 partitions, for simplicity) as follows:
swift-ring-builder /tmp/ring create 4 3 1
swift-ring-builder /tmp/ring add z1-<server1>:<port>/<device> <weight>
swift-ring-builder /tmp/ring add z2-<server2>:<port>/<device> <weight>
swift-ring-builder /tmp/ring add z2-<server3>:<port>/<device> <weight>
swift-ring-builder /tmp/ring rebalance
The abbreviated output of unpickling the ring:
array('H', [1, 2, 2, 0, 2, 0, 0, 0, 0, 2, 1, 2, 2, 1, 1, 1]),
array('H', [0, 0, 0, 1, 0, 2, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0]),
array('H', [2, 1, 1, 2, 1, 1, 1, 2, 2, 1, 2, 1, 1, 2, 2, 2])
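For reference, something along these lines will dump those three arrays (this is just a sketch: the ring file path written by rebalance, the plain gzipped-pickle format, and the internal _replica2part2dev_id attribute name are assumptions tied to this version of Swift and may differ in other releases):

import gzip
import pickle

# Sketch only: rebalance writes the ring next to the builder file; here that
# would be /tmp/ring.ring.gz. Unpickling the RingData object requires Swift
# to be importable on the Python path.
with gzip.open('/tmp/ring.ring.gz', 'rb') as fp:
    ring_data = pickle.load(fp)

# One array per replica; each array maps partition number -> device id, so
# column N across the three arrays gives the devices holding partition N.
for replica_row in ring_data._replica2part2dev_id:
    print(replica_row)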
This seems surprising as the documentation says "For a given partition number, each replica’s device will not be in the same zone as any other replica’s device." But here each partition has a replica at both of the devices in zone 2.
Does Swift do anything in this case to ensure that the quorum of 2 nodes used during a write actually resides in different zones? For example, the single device in zone 1 could be unavailable, which would leave a quorum of 2 nodes available in zone 2, so would the write be done only there?
Many thanks,
Ben
Question information
- Language: English
- Status: Solved
- Assignee: No assignee
- Solved by: Samuel Merritt
- Solved: 2012-12-13
- Last query: 2012-12-13
- Last reply: 2012-12-12
Samuel Merritt said: #1
Well, you've got 2 zones and 3 replicas, so you pretty much have to have more than one replica within a zone. That's just the pigeonhole principle at work. The docs should probably say something like "For a given partition number, each replica’s device will not be in the same zone as any other replica’s device *as long as there are at least as many zones as replicas*."
The quorum of write nodes is not distributed in any particular way, so it's true that a write could go entirely to zone 2 if the device in zone 1 is down. Once the fault in zone 1 is repaired, replication will ensure that a copy of the object makes its way to zone 1.
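To make the arithmetic concrete, the write quorum the proxy looks for is just a simple majority of the replica count; a rough sketch of that rule (not the literal function from the Swift source), with nothing in it caring about zones:

def write_quorum(replica_count):
    # A simple majority of the replicas: with 3 replicas, 2 successful
    # backend PUTs are enough for the proxy to report success.
    return replica_count // 2 + 1

# With the zone 1 device down, both of those 2 successful writes can land
# on the two zone 2 devices.
assert write_quorum(3) == 2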
Ben Rowland (ben-rowland) said: #2
Thanks Samuel Merritt, that solved my question.