Write failures after a memory allocation failure

Asked by Huabin Zheng on 2012-06-02

Hi all,
   I use memcached 1.4.13 and libmemcached 1.0.4 on CentOS 5.5 (64-bit). After a SET operation fails because the server is temporarily out of memory, every subsequent retry gets a write failure error, and the client never recovers on its own. My application code looks like this:

    retry = 0;
    do {
      /* expiration: 18000 s; flags: application-defined */
      g_rc = memcached_set(g_client, path, path_len, src, (size_t)file_info.st_size, 18000, flags);
      if (g_rc != MEMCACHED_SUCCESS) {
        printf("Set_Failed: %s, errorcode: %d, retry: %d\n", path, (int)g_rc, retry);
      } else {
        printf("Set_Success: %s\n", path);
        break;
      }
      Wait(10000); // application helper: pause 10000 ms before the next attempt
    } while (++retry < 3);

    output:

    Set_Failed: /data/_lcsO_683a0000c7da125d.jpg, errorcode: 17, retry: 0
    Set_Failed: /data/_lcsO_683a0000c7da125d.jpg, errorcode: 5, retry: 1
    Set_Failed: /data/_lcsO_683a0000c7da125d.jpg, errorcode: 5, retry: 2
    Set_Failed: /data/PANI_412b00002d21125c.jpg, errorcode: 5, retry: 0
    Set_Failed: /data/PANI_412b00002d21125c.jpg, errorcode: 5, retry: 1

    Error codes:
      17: MEMCACHED_MEMORY_ALLOCATION_FAILURE
       5: MEMCACHED_WRITE_FAILURE

So after one allocation failure, the application is stuck getting write failures and never recovers.

Any hints?

Question information

Language: English
Status: Open
For: libmemcached
Assignee: No assignee
Last query: 2012-06-02
sinny (sinnydono) said (#1):

Experience shows that the API return codes are not always consistent. You could patch the library to get access to its internal error record lists (memcached_st::error_messages and memcached_server_instance_st::error_messages) and print the entire lists together with the return code.

Chances are you will see something like "SERVER IS MARKED DEAD AND IS DISABLED UNTIL RETRY". We had to patch the library to disable this logic (marking servers as dead) completely, because it prevented reconnecting to the backend after a backend restart. Misleading return codes are still a huge problem, though.
