About swift architecture

Asked by HU Zhong

The multi-server Swift installation documentation says the proxy node is on an external network, while the storage nodes are not accessible from outside the cluster.

The question is: when uploading objects to the cluster, how is the actual data transferred?
Is the data path user (client) -> proxy node -> storage node? (Regardless of the request itself, just the data here.)

If so, when there are a lot of read/write requests, the proxy (network) could become a bottleneck, although we can add more proxy nodes.

Question information

Language: English
Status: Solved
For: OpenStack Object Storage (swift)
Assignee: No assignee
Solved by: John Dickinson
Salvatore Piccolo (spiccolo) said :
#1

And there is another question: how do you correctly configure two or more proxy nodes? Remember that you can have only one auth node... and that is another bottleneck when there is a hardware failure...

John Dickinson (notmyname) said :
#2

You are correct that the data flow is through the proxy nodes. At Rackspace, we have the proxy nodes connected to the public network via 10G connections and the storage nodes (internal network) connected via 1G. We have many more storage nodes than proxy nodes, and we aren't network limited by the proxies. Note that each external request to the proxy servers turns into 3 requests on the storage node network (three replicated writes that are attempted concurrently).

The way to configure two or more proxies is to have one external VIP that load balances across all of the proxy nodes.

The data flow is (client) -> (load balancer) -> proxy -> (3x storage node).
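As a rough sketch of that VIP, assuming HAProxy as the load balancer (the VIP address, proxy IPs, and hostnames below are placeholders; 8080 is the proxy server's usual bind_port, and /healthcheck assumes the healthcheck middleware is enabled), the config fragment could look like this:

# /etc/haproxy/haproxy.cfg (fragment)
frontend swift_api
    bind 203.0.113.10:8080              # external VIP that clients talk to
    default_backend swift_proxies

backend swift_proxies
    balance roundrobin
    option httpchk GET /healthcheck     # Swift proxy healthcheck middleware
    server proxy01 10.0.0.11:8080 check
    server proxy02 10.0.0.12:8080 check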

Salvatore Piccolo (spiccolo) said :
#3

Hi John, when you have more than one proxy and you add a storage node, you have to add it on each proxy with these commands:

swift-ring-builder account.builder add z<ZONE>-<STORAGE_LOCAL_NET_IP>:6002/<DEVICE> 100
swift-ring-builder container.builder add z<ZONE>-<STORAGE_LOCAL_NET_IP_1>:6001/<DEVICE> 100
swift-ring-builder object.builder add z<ZONE>-<STORAGE_LOCAL_NET_IP_1>:6000/<DEVICE> 100

Is that right?

And how do I manage account.ring.gz, container.ring.gz, and object.ring.gz? Which ones do I have to copy to the storage nodes?

Chuck Thier (cthier) said :
#4

Hi Salvatore,

While the requests do stream through the proxy, the proxies are not really a bottleneck, as they scale horizontally like the rest of the system. Some of your questions about using more than one proxy node are addressed in this question: https://answers.launchpad.net/swift/+question/134585.

The built-in auth for Swift is meant for development only, not for a production system.

For the ring, we recommend administering the ring on one server and then distributing the ring files out to all of the nodes (proxy and storage).
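As a rough sketch of that workflow (the zone, IP, device, and host names below are placeholders), the rings are built and rebalanced on one admin box and the resulting .ring.gz files are copied to every node:

# On the admin box: add the new device to all three builders
swift-ring-builder account.builder add z2-10.0.0.21:6002/sdb1 100
swift-ring-builder container.builder add z2-10.0.0.21:6001/sdb1 100
swift-ring-builder object.builder add z2-10.0.0.21:6000/sdb1 100

# Rebalance so partitions are reassigned to the new device
swift-ring-builder account.builder rebalance
swift-ring-builder container.builder rebalance
swift-ring-builder object.builder rebalance

# Copy the resulting ring files to every proxy and storage node
for host in proxy01 proxy02 storage01 storage02; do
    scp account.ring.gz container.ring.gz object.ring.gz $host:/etc/swift/
done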

Since this question has come up more than once now, I'm going to create a bug report to add more clarification on how to add more proxies.

HU Zhong (hz02ruc) said :
#5

Hi John,

I thought each external request would write data to only one storage node, and then this storage node would rsync the data to the other two storage nodes afterwards. Some distributed file systems handle it this way. Could you please tell me whether there is any benefit to the concurrent writes (from the proxy to 3 storage nodes)?

Best John Dickinson (notmyname) said :
#6

Swift requires that 2 storage nodes successfully write the object before returning success (in the default config where there are 3 replicas). The third copy will be attempted, but if it fails, replication will handle creating it on the third object server. The second copy is essential, because a single hardware failure immediately after a successful write to a single object server may cause the data to be permanently lost.

To increase user performance (reduce write latency), the three writes to storage nodes happen concurrently. This means that the internal network ports on the proxy servers have an effective throughput of one-third their listed rate. In Rackspace's case, this means that if a user were able to get a sustained 3Gbps to the proxy servers (the load balancers, actually), this would be their network cap. In reality, since each object write is written to one spindle, the actual limiting factor is hard drive spindle speed.
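As a rough illustration of that write path (a simplified sketch, not Swift's actual code; the node names and the put_to_node stub are placeholders for real HTTP PUTs to the object servers), the proxy writes to all three replicas concurrently and reports success once two acknowledge:

# Simplified sketch of a quorum write: PUT to all three replicas
# concurrently and report success once two of them acknowledge.
import concurrent.futures
import random

REPLICAS = 3
QUORUM = 2  # with 3 replicas, 2 successful writes are required

def put_to_node(node, data):
    """Placeholder for an HTTP PUT to one object server (simulated here)."""
    return random.random() > 0.1  # pretend most writes succeed

def put_object(nodes, data):
    successes = 0
    with concurrent.futures.ThreadPoolExecutor(max_workers=REPLICAS) as pool:
        futures = [pool.submit(put_to_node, n, data) for n in nodes[:REPLICAS]]
        for done in concurrent.futures.as_completed(futures):
            try:
                if done.result():
                    successes += 1
            except Exception:
                pass  # a failed replica write; replication repairs it later
            if successes >= QUORUM:
                # The client can be told "success" now; the third write is
                # still attempted, and replication fixes it if it fails.
                return True
    return False

print(put_object(["node-a", "node-b", "node-c"], b"object payload"))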

If you were attempting to maximize large write speeds to Swift (and cost is no factor), first get really fast disk IO, then fast networks to the storage nodes, then fast public access to the proxy servers (ensuring that the public speed is at least 3x the storage network). Finally, start increasing processors and memory on the proxy and storage nodes. For Rackspace, these optimizations would do very little for the average use case, and provide almost no benefit even for the exceptional use case. As always, your use case could be different, so your mileage may vary.

HU Zhong (hz02ruc) said :
#7

Thanks John Dickinson, that solved my question.