How does object replication/synchronization work?

Asked by Peter Peng

The Swift documentation describes object replication as follows:
--------------------------------------------------------------------------------------
Object replication uses a scheme in which a hash of the contents for each suffix directory is saved to a per-partition hashes file. The hash for a suffix directory is invalidated when the contents of that suffix directory are modified.
The object replication process reads in these hash files, calculating any invalidated hashes. It then transmits the hashes to each remote server that should hold the partition, and only suffix directories with differing hashes on the remote server are rsynced. After pushing files to the remote server, the replication process notifies it to recalculate hashes for the rsynced suffix directories.
--------------------------------------------------------------------------------------
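
To make the quoted scheme concrete, here is a minimal sketch of the comparison step. It is not Swift's implementation: the function names, the dictionary layout, the sample file names, and the use of SHA-256 over file names are all assumptions made for illustration. It only shows the idea that suffix directories whose hashes differ are the ones that get synced.

```python
import hashlib


def suffix_hash(file_names):
    """Hash the sorted file names of one suffix directory.

    Simplified stand-in for "a hash of the contents"; the hash
    function and its input are assumptions, not Swift's choice.
    """
    digest = hashlib.sha256()
    for name in sorted(file_names):
        digest.update(name.encode("utf-8"))
    return digest.hexdigest()


def partition_hashes(partition):
    """Build the per-partition mapping of suffix -> hash."""
    return {suffix: suffix_hash(files) for suffix, files in partition.items()}


def suffixes_to_sync(local_hashes, remote_hashes):
    """Return suffixes whose hash is missing or different on the remote."""
    return sorted(
        suffix
        for suffix, value in local_hashes.items()
        if remote_hashes.get(suffix) != value
    )


# Hypothetical partition contents on two servers.
local = {"a83": ["1326340000.00000.data"], "f01": ["1326341111.00000.data"]}
remote = {"a83": ["1326340000.00000.data"], "f01": ["1326330000.00000.data"]}

stale = suffixes_to_sync(partition_hashes(local), partition_hashes(remote))
print(stale)  # ['f01']
```

Only the `f01` suffix differs here, so in this sketch it is the only one that would be pushed to the remote server.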

As I understand it, the object is modified on one storage node and then synchronized to the other replicas (for example, with Swift's default configuration of 3 replicas, the object is replicated from this node to the other two storage nodes), and this replication is asynchronous.

However, I have also seen posts in this forum saying that a write to an object is sent in parallel from the proxy server to the three replicas (storage nodes), which is clearly a different mechanism from the one above. (A further question: if there are 5 or 10 replicas, will writes to the object still be sent to all 5 or 10 nodes in parallel?)

I am now confused about how object replication is implemented. Could someone explain which description is correct? Thanks.

Question information

Language:
English
Status:
Answered
For:
OpenStack Object Storage (swift)
Assignee:
No assignee
John Dickinson (notmyname) said :
John Dickinson (notmyname) said :
#1

Replication is used to keep 3 good copies. It doesn't have anything to do with how the copies are initially written. The proxy writes all three replicas concurrently. After the write is finished, success is returned to the client and the proxy is out of the picture. However, if bit rot, file system corruption, or hardware failure causes a replica to be lost, the replication process ensures that a good copy exists in 3 locations.

So both things you described are accurate, but you are describing two separate processes in Swift.

--John

On Jan 11, 2012, at 8:25 PM, Peter Peng wrote:

> Question #184443 on OpenStack Object Storage (swift) changed:
> https://answers.launchpad.net/swift/+question/184443
>
> --
> You received this question notification because you are a member of
> Swift Core, which is an answer contact for OpenStack Object Storage
> (swift).

Peter Peng (peter-peng) said :
#2

Thank you John. Your explanation addressed my question.
You mentioned: "However, if bit rot, file system corruption, or hardware failure causes a replica to be lost, the replication process ensures that a good copy exists in 3 locations." I have one more question about the normal case:
When an object is modified normally (a client changes the object's data), is the change sent in parallel from the proxy server to the three replicas, or is it first written to one node and later replicated to the other two?
Thanks a lot.

John Dickinson (notmyname) said :
#3

The proxy attempts to write to all three replicas, but it only requires a successful response from two of them to send success back to the client. In the event that the third copy cannot be written, Swift relies on replication to ensure that the third copy is created.
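
As a rough illustration of that write path (not Swift's proxy code), the sketch below sends a write to three simulated nodes concurrently and reports success once at least two of them accept it. The node dictionaries and the `write_replica` and `proxy_put` helpers are hypothetical names invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def write_replica(node, data):
    """Hypothetical storage-node write; returns True on success."""
    if node["healthy"]:
        node["store"] = data
        return True
    return False


def proxy_put(nodes, data, required=2):
    """Write to every node concurrently; succeed if enough writes land."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        results = list(pool.map(lambda node: write_replica(node, data), nodes))
    return sum(results) >= required


# Three simulated replicas, one of which is unavailable.
nodes = [
    {"name": "node1", "healthy": True, "store": None},
    {"name": "node2", "healthy": True, "store": None},
    {"name": "node3", "healthy": False, "store": None},
]

ok = proxy_put(nodes, b"hello")
missing = [node["name"] for node in nodes if node["store"] is None]
print(ok, missing)  # True ['node3']
```

The client sees success even though `node3` has no copy yet; in the model described above, creating that missing copy is left to the replication process.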

--John

On Jan 11, 2012, at 8:40 PM, Peter Peng wrote:


Peter Peng (peter-peng) said :
#4

Thanks, John. If I understand correctly, when a client creates a new object, the object is written to the three replicas in parallel; when a client changes an existing object, the changes are also written to the three replicas in parallel, right?
P.S. If there are more than 3 replicas (say 10), will the proxy server still consider the write successful as long as two or more replicas return success? Thanks again.

John Dickinson (notmyname) said :
#5

Swift actually looks for a majority, so 10 replicas would need 6 good writes. Other than that, everything you said is correct.
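
The majority rule can be written as a one-line calculation. This is a sketch of the arithmetic described in this thread; the helper name `quorum_size` is used here only for illustration, and the exact rule in any given Swift release may differ.

```python
def quorum_size(replica_count):
    """Smallest number of successful writes that forms a strict majority."""
    return replica_count // 2 + 1


for replicas in (3, 5, 10):
    print(replicas, quorum_size(replicas))
# 3 2
# 5 3
# 10 6
```

With 3 replicas this gives the "two of three" figure from the earlier reply, and with 10 replicas it gives 6.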

--John

On Jan 12, 2012, at 12:01 AM, Peter Peng wrote:


Peter Peng (peter-peng) said :
#6

Really appreciate the help John. Thanks.
Can I ask another question, about the SQLite databases for accounts and containers? I saw some posts in this forum saying that an object read request does not access the container DB and that only a write request updates it. Is this correct? If so, why doesn't a read request need to access the DB?
My understanding is that the physical location of an object (that is, which storage node and which path) is stored in the DB, so to handle an object read or write request the proxy server needs to know that location and therefore has to access the DB. Is my understanding wrong? If it is, how does the proxy server know where the object is physically stored? Thanks.
