Preferred Swift partition size and count per Storage Node

Asked by Fatih Güçlü Akkaya on 2012-10-22

Hi all,

We plan to build an environment with OpenStack Swift as our storage. We need to plan our deployment model and allocate resources accordingly. Currently we follow the sample from the documentation and deploy 1 proxy node and 5 storage nodes.

While building the rings we use 18 for the partition power and 3 for the replication factor. Every storage node has a disk allocated for Swift storage, with one primary partition of 100 GB.

Since I do not know exactly how replication works, I cannot foresee what kind of impact the partition count, partition size, and replication factor have on the storage nodes.

What can you recommend about partition size and replication factor if I intend to work with 5 storage nodes, one disk each, in a read- and write-intensive environment?

Samuel Merritt (torgomatic) said : #1

The general recommendation is to have at least 100 partitions per disk in your system. Given a part_power of N, you have 2^N partitions in your system.

The tricky bit is that you can't change your part_power once the ring is created, so you have to get it right the first time. Imagine the maximum number of disks your Swift cluster will ever have (be optimistic). Then, multiply by 100, take the base-2 logarithm, and round up to the next integer. That's the part_power you want to use.
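That rule of thumb can be sketched in a few lines of Python (a minimal illustration; `recommended_part_power` is a hypothetical helper for this thread, not part of Swift itself):

```python
import math

def recommended_part_power(max_disks, parts_per_disk=100):
    """Smallest part_power giving at least parts_per_disk
    partitions per disk at the cluster's maximum size:
    round log2(max_disks * 100) up to the next integer."""
    return math.ceil(math.log2(max_disks * parts_per_disk))

# A cluster that might someday grow to 600 disks:
print(recommended_part_power(600))  # 16  (2^16 = 65536 >= 600 * 100)

# The 6-disk cluster from this thread would only have needed:
print(recommended_part_power(6))    # 10  (2^10 = 1024 >= 600)
```

Remember this is a one-shot decision: since part_power cannot be changed later, `max_disks` should be the optimistic future size, not the current one.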

If you've got way more than 100 partitions per disk, your replication passes take longer. Each partition gets its own directory, and they all have to be scanned.

If you've got way fewer than 100 partitions per disk, your data distribution will be more uneven than you want. You'll have to keep more free space on your disks in order to avoid their getting full.

Also, please note that a Swift partition has absolutely nothing to do with a disk partition. A Swift partition is a subset of the range of MD5; a disk partition is a subset of the addressable sectors on that disk.
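To make that distinction concrete, here is a rough sketch of how an object path maps to a Swift partition, assuming the scheme Swift's ring uses (MD5 the path, keep the top part_power bits); the per-cluster hash suffix handling is simplified here:

```python
import hashlib
import struct

PART_POWER = 18
PART_SHIFT = 32 - PART_POWER  # keep only the top part_power bits

def partition_for(path, hash_suffix=b''):
    # MD5 the object path (real clusters mix in a secret suffix),
    # read the first 4 bytes as a big-endian unsigned int, and
    # shift so only the top PART_POWER bits remain.
    digest = hashlib.md5(path.encode() + hash_suffix).digest()
    return struct.unpack('>I', digest[:4])[0] >> PART_SHIFT

part = partition_for('/AUTH_test/photos/cat.jpg')
assert 0 <= part < 2 ** PART_POWER
```

So a "partition" here is just a bucket of the MD5 hash space; no disk-partitioning tool is ever involved.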

As for replicas, it depends on how much your data is worth. General advice: pick 3. There are vague plans for adjusting the replica count on the fly, but there's no code for it yet.

Fatih Güçlü Akkaya said : #2

Thanks for your answer. Just to confirm: my cluster currently has a total of 6 disks, and since I chose a partition power of 18, 2^18 gives 262144 partitions in total. From your explanation, does that mean the Swift partition count per disk is 262144/6 = 43690, or did I misunderstand your explanation of Swift partitions?

Samuel Merritt (torgomatic) said : #3

That's pretty much correct.

Actually, it'll be 3*43690 partitions per disk since each partition has 3 replicas.
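For concreteness, the arithmetic in this exchange works out like this (plain Python, just restating the numbers above):

```python
part_power = 18
replicas = 3
disks = 6

total_partitions = 2 ** part_power               # 262144 partitions in the ring
replica_partitions = total_partitions * replicas  # 786432 partition replicas
per_disk = replica_partitions // disks            # partition directories per disk

print(per_disk)  # 131072 -- far above the ~100-per-disk rule of thumb
```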

Fatih Güçlü Akkaya said : #4

So this value (3*43690) is well above 100. From this I understand that the replication process will take much longer, which may affect performance. To overcome this I need to increase my disk count. Which is the better practice: increasing the disk count without adding more storage nodes, or scaling out the whole cluster with storage nodes that have one disk each? I should also add that our storage nodes are virtual machines, meaning there is actually one physical disk, which is virtually partitioned and mounted to each VM. Is the calculation for Swift partitions still valid for virtualized disks?

Samuel Merritt (torgomatic) said : #5

The calculation for partition counts is legitimate for any disks, virtualized or not.

However, if you're really running 5 VMs on 1 host with 1 physical disk, then there's no point in having 3 replicas of your data. A failure of the one disk will destroy all 3 replicas of everything. If that's your setup, I wouldn't worry too much about performance, as it will be bad no matter what you do.

However, the partition directories only exist when there's something in them (an account, container, or object). If your cluster is lightly populated, then your replication won't be bogged down by tons of empty directories. It's only when your cluster has lots of objects in it that your replication may be slowed down by this.
