partial failure of PUT or DELETE

Asked by ted on 2013-06-26

As I understand it, a PUT or DELETE works like this: the proxy sends the object PUT to three object servers, and as long as at least 2 of them respond with 2xx, the PUT is considered a success. If one fails, replication will repair that copy when it gets to it.
If two or more servers fail, the request is reported as a failure.
But if one server succeeds, it has probably already written the new object to its disk, and the replicator may then copy this new object to all the other nodes. In the code I haven't seen any transaction or rollback logic to prevent this. Could anybody tell me where this case is handled?
Thank you!
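
To make the rule described in this question concrete, here is a minimal sketch of the quorum decision. It is not the code from swift/proxy/controllers/obj.py; the names quorum_size and put_outcome and the simple-majority rule are assumptions chosen for illustration.

```python
def quorum_size(replica_count):
    # Assumed rule for this sketch: a simple majority (2 of 3, 3 of 5).
    return replica_count // 2 + 1


def put_outcome(backend_statuses):
    """Return the client-facing result for a list of backend HTTP statuses."""
    successes = sum(1 for status in backend_statuses if 200 <= status < 300)
    if successes >= quorum_size(len(backend_statuses)):
        return "success"
    return "failure"


# One backend failing still counts as a success; replication repairs it later.
print(put_outcome([201, 201, 503]))
# Two backends failing is reported as a failure, even though one copy exists.
print(put_outcome([201, 503, 503]))
```

The second call is the case being asked about: the client sees a failure, yet one server has stored the object.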

Question information

Language: English
Status: Answered
For: OpenStack Object Storage (swift)
Assignee: No assignee
Last query: 2013-07-17
Last reply: 2013-07-17

Samuel Merritt (torgomatic) said : #1

The relevant code is in swift/proxy/controllers/obj.py, and the PUT method starts around line 750.

Note that the only way your scenario will occur is if the connection count drops below quorum right at the very end of the file, so that one backend gets the last bytes of the request but the other two don't. If the connection count drops below quorum in the middle of an upload, it's aborted immediately.
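
The difference between the two cases can be sketched with a small simulation. This is only an illustration under stated assumptions, not the proxy's actual upload loop: simulate_put and its drop_at parameter (backend index mapped to the chunk at which that connection drops) are hypothetical, and an aborted upload is assumed to leave no complete copy anywhere.

```python
def simulate_put(num_chunks, drop_at, replicas=3):
    """Return (client_result, set of backends holding the complete object)."""
    quorum = replicas // 2 + 1  # assumed simple-majority quorum
    alive = set(range(replicas))
    for chunk in range(num_chunks):
        # A connection that drops while this chunk is sent never receives it.
        alive -= {b for b in alive if drop_at.get(b) == chunk}
        # Below quorum before the final chunk: abort the upload immediately.
        if len(alive) < quorum and chunk < num_chunks - 1:
            return "aborted mid-upload", set()
    # Every backend still connected has received every chunk.
    result = "success" if len(alive) >= quorum else "failure"
    return result, alive


# Two connections drop in the middle: aborted, nothing complete is stored.
print(simulate_put(4, {1: 1, 2: 1}))
# Two connections drop on the very last chunk: failure, but backend 0 has it.
print(simulate_put(4, {1: 3, 2: 3}))
```

Only the second call reproduces the scenario from the question, where the client gets a failure while one backend holds the whole object.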

ted (lizhe-ted) said : #2

Thank you for your answer, Samuel.
As you mentioned, the connection count can drop below quorum right at the very end of the file. What happens then? Does the problem occur?
And what about DELETE? It seems that the object server first deletes the file and then reports a success status. If a partial failure occurs, what happens?

ted (lizhe-ted) said : #3

I mean, is there a way to handle the case where the connection count drops below quorum right at the very end of the file?

ted (lizhe-ted) said : #4

The upload uses greenthreads, so it's very likely that a connection in good condition will finish first. If a partial failure occurs at that point, does swift roll back the operation? Thank you.

Samuel Merritt (torgomatic) said : #5

To the best of my knowledge, if you manage to hit that exact case, then you'll get a failure response but the object will be successfully created in 1 spot. There's no rollback.

ted (lizhe-ted) said : #6

So it's possible for a client to create or delete something in only 1 place and, because quorum was not reached, get a failure response. And the operation will actually be completed in the end by the rsync job?

Samuel Merritt (torgomatic) said : #7

Assuming the 1 disk with the object doesn't fail, that's correct.
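
A toy model of why the lone copy eventually wins, assuming that replication propagates whichever version carries the newest timestamp and that a DELETE is stored as a timestamped marker. The replicate function and the dict-based "nodes" are hypothetical stand-ins for illustration, not Swift's replicator.

```python
def replicate(nodes):
    """One pass: each node pushes records that a peer lacks or holds older."""
    for source in nodes:
        for target in nodes:
            if target is source:
                continue
            for name, (timestamp, data) in source.items():
                if name not in target or target[name][0] < timestamp:
                    target[name] = (timestamp, data)


# A "failed" PUT that landed on one node only: (timestamp, data) per object.
put_nodes = [{"photo": (2.0, b"new")}, {"photo": (1.0, b"old")}, {}]
replicate(put_nodes)
print(put_nodes)

# A "failed" DELETE that landed on one node only; None marks the deletion.
delete_nodes = [{"doc": (3.0, None)}, {"doc": (1.0, b"v1")}, {"doc": (1.0, b"v1")}]
replicate(delete_nodes)
print(delete_nodes)
```

If the one node holding the newest record were lost before the pass ran, the older state would remain, which is the caveat in the answer above.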

ted (lizhe-ted) said : #8

Thanks for your answer, Samuel.
Now I understand the mechanism. But can you tell me the reason for it?
Are there any benefits to doing it this way?
It may be very confusing from the client side.

ted (lizhe-ted) said : #9

By the way, I want to ask why swift doesn't provide a client-side library. It seems that the proxy server's burden is too high: it has to handle the request from the user, parse it, send it to the storage servers, receive their responses, and finally send a response to the client. In my opinion, the proxy server may become the central point of the whole system. It should only tell the client the location of the file, and the client should then send the request to the storage server directly. Why doesn't swift work like this?
