Understanding a Timeout

Asked by Steve

I've been testing files of various sizes, and while doing so I am tailing /var/log/syslog

For files around 100MB (haven't gone larger yet), i see the following line maybe 25% of the time:
Jun 13 10:22:59 proxy-01 proxy-server ERROR with Object server 192.168.X.X:6000/sdb1 re: Trying to get final status of PUT to /v1/AUTH_d2e67314-2a18-45e9-bde1-7b2b3d6da553/test3/100MB: Timeout (10s)

The PUT always succeeds, so I'm wondering how I would know to care about this log entry. If it works, why log an error? (Though, maybe it really is an error, and I should adjust the node_timeout value in the proxy conf.)

Thoughts?

Question information

Language:
English Edit question
Status:
Solved
For:
OpenStack Object Storage (swift) Edit question
Assignee:
No assignee Edit question
Solved by:
gholt
Solved:
Last query:
Last reply:
Revision history for this message
Best gholt (gholt) said :
#1

Hmm. Well, the proxy sends an object PUT to three object servers and as long as at least 2 of the servers respond with 2xx, the PUT is considered a success. If one fails, replication will repair that copy when it gets to it. If it's taking longer than 10s to finalize a 100MB file, that seems worrisome. What OS and version are you using? I believe the object servers only sync at every 512MB and end of file. Perhaps setting that lower for your setup would help? The object server has a "mb_per_sync" config option.

Revision history for this message
Steve (lardcanoe) said :
#2

Running in a VM network that is somewhat heavily loaded. Great and awful for testing...

The way you describe it, that log seems more like a warning at the scope of the operation.

I'll play around with that setting. Thanks!

Revision history for this message
Steve (lardcanoe) said :
#3

Thanks gholt, that solved my question.

Revision history for this message
Steven Huynh (stbven) said :
#4

Does anyone know if swift-proxy waits for all 3 PUTS to come back before it reports back with a return code for the client?

I'm seeing cases where one of the object server takes longer than 10 seconds and timeouts, but the overall return code is still a 201. However because the response doesn't come back until after 10 seconds, the client has already timed out (5 seconds on their side) and considers the PUT an error.

Revision history for this message
Samuel Merritt (torgomatic) said :
#5

It waits until it's got a quorum of PUT responses, then waits for a configurable-but-short span of time (default 0.5 seconds, config parameter post_quorum_timeout) for the remainder of the responses to come back, then sends a response to the client.

That quorum bit is subtle, so be careful. If the proxy sends 3 requests and gets 201 then 201, that's a quorum, and it'll wait the half-second for stragglers and then send a response to the client. Same for (200, 201). However, if it gets 201 then 500, then there's no quorum there. It's a quorum of response code class (2xx, 3xx, 4xx, 5xx), not a quorum of total responses, nor a quorum of individual status codes.

Also, 5 seconds seems like a short timeout on the client side. I'd bump that up substantially if I were you.

Revision history for this message
Steven Huynh (stbven) said :
#6

Thanks Samuel, that was very helpful.

Another question, am I supposed to see 3 separate PUT responses to each PUT call in the proxy-server logs? I only seem to see one PUT reply that times out after 10 seconds which happens 2x as the client retires once after the 5 second timeout. If not, do I need to look on the object nodes for the individual responses for each PUT? Based on your reply, I'm trying to figure out why the client is timing out even though the call does return a 201 and the object exists in Swift.

Sep 17 18:49:59 proxy-server ERROR with Object server 10.97.6.12:6000/D001 re: Trying to get final status of PUT to /v1/AUTH_ca58bbe67308444d979d8aca80f1234/testContainer/test: Timeout (10s) (txn: tx0af76705988e498baebae11079f19e7e) (client_ip: 10.97.0.12)
Sep 17 18:49:59 proxy-server 10.97.0.12 10.97.0.12 17/Sep/2014/18/49/59 PUT /v1/AUTH_ca58bbe67308444d979d8aca80f51234/testContainer/test HTTP/1.0 201 - jclouds/1.7.3%20java/1.7.0_55 ba85079e9f3d4acba482d39b21679222 35072 - - tx0af76705988e498baebae11079f19e7e - 10.0662 -

Sep 17 18:50:05 proxy-server ERROR with Object server 10.97.6.12:6000/D001 re: Trying to get final status of PUT to /v1/AUTH_ca58bbe67308444d979d8aca80f51234/testContainer/test: Timeout (10s) (txn: txb880a1a50cb0448cbaf3391ba53d8b70) (client_ip: 10.97.0.10)
Sep 17 18:50:05 proxy-server 10.97.0.10 10.97.0.10 17/Sep/2014/18/50/05 PUT /v1/AUTH_ca58bbe67308444d979d8aca80f51234/testContainer/test HTTP/1.0 201 - jclouds/1.7.3%20java/1.7.0_55 ba85079e9f3d4acba482d39b21679222 35072 - - txb880a1a50cb0448cbaf3391ba53d8b70 - 10.0493 -