Name resolution in local domain slow.

Asked by Roger James

I am trying to understand the behaviour of name resolution for the .local domain and why it is sometimes slow.

If I use the getent utility (getent hosts myth.local) to resolve a name in the local domain with the standard settings in nsswitch.conf i.e.

hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4

I get a significant delay in resolution. A network trace shows the following:-

No. Time Source Destination Protocol Length Info
      1 0.000000 192.168.10.2 192.168.10.1 DNS 70 Standard query AAAA myth.local
      2 0.046538 212.104.130.9 192.168.10.2 DNS 145 Standard query response, No such name
      3 1.303024 00:50:7f:62:ad:20 ff:ff:ff:ff:ff:ff ARP 60 Who has 192.168.10.253? Tell 192.168.10.1
      4 5.005085 192.168.10.2 192.168.10.1 DNS 70 Standard query AAAA myth.local
      5 5.050288 212.104.130.65 192.168.10.2 DNS 145 Standard query response, No such name
      6 10.005865 192.168.10.2 192.168.10.1 DNS 70 Standard query AAAA myth.local
      7 10.006893 90:e6:ba:2e:2f:46 00:50:7f:62:ad:20 ARP 42 Who has 192.168.10.1? Tell 192.168.10.2
      8 10.007025 00:50:7f:62:ad:20 90:e6:ba:2e:2f:46 ARP 60 192.168.10.1 is at 00:50:7f:62:ad:20
      9 10.051570 212.104.130.9 192.168.10.2 DNS 145 Standard query response, No such name
     10 15.010985 192.168.10.2 192.168.10.1 DNS 70 Standard query AAAA myth.local
     11 15.056055 212.104.130.65 192.168.10.2 DNS 145 Standard query response, No such name
     12 20.116871 fe80::92e6:baff:fe2e:2f46 ff02::fb MDNS 90 Standard query A myth.local, "QM" question
     13 20.117314 192.168.10.2 224.0.0.251 MDNS 70 Standard query A myth.local, "QM" question
     14 20.117486 192.168.10.3 224.0.0.251 MDNS 80 Standard query response A, cache flush 192.168.10.3

Why is DNS being queried before mDNS? Surely that's not the behaviour specified in nsswitch.conf. Has anyone got any ideas?

If I change the nsswitch.conf line to be:-

hosts: files mdns4_minimal

Then the behaviour is as expected and the query returns quickly with the network trace just showing the mdns lookup and no dns lookups.

This has got me very confused. My apologies if this is not quite the right forum for this question.

Roger

Question information

Language: English
Status: Solved
For: Ubuntu nss-mdns
Assignee: No assignee
Solved by: Roger James
actionparsnip (andrew-woodhead666) said :
#1

What is the output of:

cat /etc/resolv.conf

Thanks

Roger James (rogerjames99) said :
#2

resolv.conf contains

# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 127.0.0.1

However, I now believe this behaviour is specific to the getent utility, which makes it somewhat useless as a tool for debugging nsswitch problems!

This little test program resolves the name as expected.

#include <stdio.h>
#include <stdlib.h>
#include <netdb.h>

int main(void)
{
    struct hostent *hostentry;

    /* Resolve the name through the nsswitch "hosts" line. */
    hostentry = gethostbyname("myth.local");

    if (NULL != hostentry)
    {
        /* Print the canonical name and the four bytes of the first (IPv4) address. */
        printf("ghbn returned %s %d %d %d %d\n", hostentry->h_name,
                                            (unsigned char)hostentry->h_addr_list[0][0],
                                            (unsigned char)hostentry->h_addr_list[0][1],
                                            (unsigned char)hostentry->h_addr_list[0][2],
                                            (unsigned char)hostentry->h_addr_list[0][3]);
    }
    else
        printf("ghbn failed\n");
    return 0;
}

This all started when I looked at a slow telnet login problem on another server. It looked like that was down to slow reverse IP lookups of the local IP addresses (192.168.10.x). I guess the getent thing has led me down a rat hole!

Roger

Roger James (rogerjames99) said :
#3

For information.

I eventually tracked the login delay down to ck-get-x11-server-pid which is part of ConsoleKit and is called at the end of the login process.

ck-get-x11-server-pid tries to resolve and connect to whatever is in the current DISPLAY environment variable. If you are calling host2 from host1 by doing something like telnet host2.local, then the DISPLAY environment variable for the login environment on host2 will contain something like host1:0. ck-get-x11-server-pid will then try to resolve the name host1. The ether trace shows an IPv6 query and response, followed by a 5 second delay, followed by a repeated IPv6 query and response, followed by an IPv4 query and response, at which point ck-get-x11-server-pid exits and allows the login to complete. I am not sure from this trace whether the program is using nsswitch-aware name resolution routines or calling DNS directly.

No. Time Source Destination Protocol Length Info
      1 0.000000 192.168.10.2 192.168.10.1 DNS 64 Standard query AAAA myth
      2 0.046743 212.104.130.65 192.168.10.2 DNS 139 Standard query response, No such name
      3 4.237776 00:50:7f:62:ad:20 ff:ff:ff:ff:ff:ff ARP 60 Who has 192.168.10.101? Tell 192.168.10.1
      4 4.237811 00:50:7f:62:ad:20 ff:ff:ff:ff:ff:ff ARP 60 Who has 192.168.10.105? Tell 192.168.10.1
      5 5.005114 192.168.10.2 192.168.10.1 DNS 64 Standard query AAAA myth
      6 5.049498 212.104.130.9 192.168.10.2 DNS 139 Standard query response, No such name
      7 10.010326 192.168.10.2 192.168.10.1 DNS 64 Standard query A myth
      8 10.055472 192.168.10.1 192.168.10.2 DNS 139 Standard query response, No such name

If anyone has any idea what is happening here, I am curious to find out.

The test can easily be repeated by setting the DISPLAY environment variable in a console shell to something like dummy:0 and then running /usr/lib/ConsoleKit/ck-get-x11-server-pid and seeing what happens. I repeated the test on a Debian squeeze system and it looks like ConsoleKit is even more broken there. ck-get-x11-server-pid sends out an IPv6 query followed by an IPv4 query but closes the IPv4 socket (or maybe it is the IPv6 one, I cannot remember) before it gets the reply, resulting in an ICMP port unreachable message, as well as going through the five second delay and repeat routine.

Anyway this is going on the back burner again now!