This is the third and final article in a series covering important aspects of the DNS protocol for troubleshooting application performance issues.
Many applications perform name resolution through DNS queries.
As explained in the first article, this provides much more flexibility than working with fixed IP addresses. The drawback of adding this step to an end-to-end application chain is that it can severely degrade performance when it does not work properly.
The result of a DNS query can fall into one of the following categories:
- The DNS request is successful;
- The DNS request has been received by the DNS server but has not been processed properly, and the client gets an error message;
- The client does not get any answer from the DNS server.
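As a minimal sketch of these three categories, the Python function below classifies one observed DNS exchange from its response code, where `None` stands for "no response was captured". The names `DnsOutcome` and `classify_outcome` are hypothetical and do not come from PerformanceVision.

```python
from enum import Enum
from typing import Optional


class DnsOutcome(Enum):
    SUCCESS = "success"         # response received with response code 0 (NoError)
    ERROR_RESPONSE = "error"    # response received, but with a non-zero response code
    NO_ANSWER = "no answer"     # no response observed at all


def classify_outcome(rcode: Optional[int]) -> DnsOutcome:
    """Map the observed response code (None if no response was seen) to a category."""
    if rcode is None:
        return DnsOutcome.NO_ANSWER
    if rcode == 0:
        return DnsOutcome.SUCCESS
    return DnsOutcome.ERROR_RESPONSE


print(classify_outcome(3).value)  # error
```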
When troubleshooting performance issues, it’s important to quickly assess DNS query processes:
- Are DNS processes successful? Always? For every user and every requested name?
- Are successful DNS processes fast enough compared to the baselines?
- Which DNS processes fail, and why? No answer at all? A specific error?
Successful but non-optimal DNS processes
Let’s take a simple example. In enterprise environments, each client’s TCP/IP stack is normally configured to use a local DNS server. In complex multi-site architectures, it is not uncommon to find clients using remote DNS servers. Instead of staying on the local network, each DNS request then crosses the WAN (Wide Area Network), which adds latency.
Clients bypassing the local DNS server in favor of external DNS services are also common in production environments. This can affect overall performance, as the client no longer benefits from features of the local DNS server such as caching.
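To spot clients that use remote or external DNS servers, one could compare each DNS server address against the client's site subnet and the corporate address space. The sketch below assumes, purely for illustration, that the local site is 172.16.0.0/16 and that the corporate network uses the RFC 1918 private ranges; both are hypothetical values to adapt to your own addressing plan.

```python
import ipaddress

# Hypothetical addressing plan: replace with your own subnets.
LOCAL_SITE = ipaddress.ip_network("172.16.0.0/16")
CORPORATE = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def classify_dns_server(server_ip: str, local_site=LOCAL_SITE, corporate=CORPORATE) -> str:
    """Return 'local', 'remote' (internal but off-site, i.e. across the WAN) or 'external'."""
    addr = ipaddress.ip_address(server_ip)
    if addr in local_site:
        return "local"
    if any(addr in net for net in corporate):
        return "remote"
    return "external"


print(classify_dns_server("8.8.8.8"))  # external
```

The local check runs first because the local site here is a subset of the corporate ranges.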
With PerformanceVision, you can quickly check global DNS service performance by looking at the DNS performance dashboard:
In this case, a DNS performance problem is visible at 14:04.
By clicking on this peak, you can then see which client was impacted as well as the DNS resolution processes that were involved:
In this case, the client was 172.16.8.58, requesting the name “baXXXXXes.com” from the local DNS server 172.16.1.12. This request was successful (Response Code: “NoError”), but the whole DNS resolution process took 15.7 seconds! By looking at the corresponding network-related KPIs (by clicking on the “L3” icon on the left), you can quickly correlate network performance with this particular DNS request to determine whether or not the network was to blame.
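A resolution time such as the 15.7 seconds above is the delay between a query and its matching response. A minimal sketch, assuming a list of captured packets with hypothetical fields, pairs each query with its response on (client, server, transaction ID, query name) and keeps the resolutions above an assumed 100 ms baseline. This pairing key is an illustrative choice, not a description of how PerformanceVision works.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DnsPacket:
    ts: float                    # capture timestamp, in seconds
    client: str
    server: str
    txid: int                    # 16-bit DNS transaction ID
    qname: str
    is_response: bool
    rcode: Optional[int] = None  # only set on responses


def resolution_times(packets: List[DnsPacket]) -> List[Tuple[str, str, str, float]]:
    """Pair each query with its response and return (client, server, qname, seconds)."""
    first_query_ts = {}
    results = []
    for p in sorted(packets, key=lambda p: p.ts):
        key = (p.client, p.server, p.txid, p.qname)
        if not p.is_response:
            # setdefault keeps the first query if the client retransmits with the same ID
            first_query_ts.setdefault(key, p.ts)
        elif key in first_query_ts:
            results.append((p.client, p.server, p.qname, p.ts - first_query_ts.pop(key)))
    return results


def slower_than_baseline(packets: List[DnsPacket], baseline_s: float = 0.1):
    """Keep only resolutions whose duration exceeds the (assumed) baseline."""
    return [r for r in resolution_times(packets) if r[3] > baseline_s]
```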
As mentioned before, it’s also important to check whether DNS servers are used properly (no external DNS servers, and local DNS servers performing well).
The “Top DNS Servers” view provides this information concisely:
As you can see here, the DNS server 172.16.1.12 seems to have performance problems: its response time is unstable (note the large deviation bar) and its average DNS processing time is 2.5 seconds. Furthermore, 4 of the 37 DNS requests were not successful. This is worth investigating.
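The figures in such a view (request count, failures, average time, spread) can be approximated from a list of transactions. This sketch assumes each transaction is a hypothetical `(server_ip, duration_s, rcode)` tuple, with `None` in both fields when no answer came back, and uses the population standard deviation as the spread; it is not PerformanceVision's own computation.

```python
import statistics
from collections import defaultdict


def server_summary(transactions):
    """Aggregate per-server request count, failures, mean and spread of durations.

    Durations are taken only from answered requests; a failure is any request
    whose response code is not 0, including requests with no answer (None).
    """
    by_server = defaultdict(list)
    for server, duration, rcode in transactions:
        by_server[server].append((duration, rcode))

    summary = {}
    for server, rows in by_server.items():
        durations = [d for d, rc in rows if rc is not None]
        summary[server] = {
            "requests": len(rows),
            "failures": sum(1 for _, rc in rows if rc != 0),
            "mean_s": statistics.mean(durations) if durations else None,
            "stdev_s": statistics.pstdev(durations) if durations else None,
        }
    return summary
```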
Unsuccessful DNS process: error response codes
In a DNS response packet, the response code is carried in a dedicated field of the header flags, as shown in the Wireshark trace below:
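In the raw packet, the response code (RCODE) is the lowest 4 bits of the 16-bit flags word that follows the transaction ID in the 12-byte DNS header (RFC 1035, section 4.1.1). A minimal parser using only the standard library could look like this; it ignores the extended RCODE bits that EDNS(0) carries in the OPT record.

```python
import struct


def parse_header(message: bytes) -> dict:
    """Decode the fixed 12-byte DNS header (all fields big-endian, 16 bits each)."""
    if len(message) < 12:
        raise ValueError("a DNS message starts with a 12-byte header")
    txid, flags, qdcount, ancount, nscount, arcount = struct.unpack("!6H", message[:12])
    return {
        "id": txid,
        "is_response": bool(flags >> 15),  # QR bit: 1 for responses
        "opcode": (flags >> 11) & 0x0F,
        "rcode": flags & 0x0F,             # response code: low 4 bits of the flags
        "qdcount": qdcount,
        "ancount": ancount,
        "nscount": nscount,
        "arcount": arcount,
    }


# Flags 0x8183: response, recursion desired and available, RCODE 3 (name error)
header = b"\x12\x34\x81\x83\x00\x01\x00\x00\x00\x01\x00\x00"
print(parse_header(header)["rcode"])  # 3
```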
DNS supports many different error codes.
PerformanceVision identifies them and lets you filter on them when you want to focus on particular DNS responses:
From a performance perspective, the main response codes we are interested in are the following:
- Code 0: No error;
- Code 1: Format error – query cannot be interpreted;
- Code 2: Server failure – the server could not process the query;
- Code 3: Name error – domain name does not exist;
- Code 4: Not implemented – DNS query type not supported by the DNS server;
- Code 5: Refused – name server refuses due to policy.
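For scripting against exported data, these codes can be mapped to the short mnemonics used in the IANA DNS RCODE registry. A minimal sketch; the helper names `describe_rcode` and `is_error` are hypothetical.

```python
RCODE_NAMES = {
    0: "NoError",   # no error
    1: "FormErr",   # format error
    2: "ServFail",  # server failure
    3: "NXDomain",  # name error: domain name does not exist
    4: "NotImp",    # not implemented
    5: "Refused",   # refused by policy
}


def describe_rcode(rcode: int) -> str:
    """Return the mnemonic for a response code, or a generic label for other codes."""
    return RCODE_NAMES.get(rcode, f"Other ({rcode})")


def is_error(rcode: int) -> bool:
    """Any non-zero response code signals a DNS issue."""
    return rcode != 0


print(describe_rcode(3))  # NXDomain
```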
If everything goes well, the response code should be “0”, and the whole DNS request/response process should take no more than a few milliseconds, depending on your IT infrastructure.
All the other codes listed above indicate DNS issues.
So, the first step in troubleshooting DNS-related problems is to get an overview of all response codes and their corresponding performance.
With PerformanceVision, this is done by visualizing the DNS Requests Overview dashboard:
Referring to the first article in this series, you can see on the first line of this dashboard that 411 DNS requests for IPv4 addresses (Request Type “A”) did not succeed.
In this case, the problem is that the requested name does not exist (Response Code = 3).
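Outside the dashboard, a similar overview can be sketched by counting responses per (request type, response code) pair and listing the non-zero codes first. This assumes a hypothetical input of `(qtype, rcode)` tuples extracted from captured responses.

```python
from collections import Counter


def rcode_overview(responses):
    """Count responses per (qtype, rcode) and list error pairs, most frequent first."""
    counts = Counter((qtype, rcode) for qtype, rcode in responses)
    errors = [(key, n) for key, n in counts.most_common() if key[1] != 0]
    return counts, errors
```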
The next logical step is to check which FQDNs were requested:
As you can see, the FQDN “ntp.labo.securactive.lan” does not exist.
You can also see all clients that have requested this FQDN in order to further troubleshoot and solve the problem.
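The same drill-down can be sketched in a few lines, assuming a hypothetical list of `(client_ip, qname, rcode)` records: group the names that returned code 3 and collect the clients that asked for them. Names are lowercased and stripped of a trailing dot, since DNS names are case-insensitive and may appear in absolute form.

```python
from collections import defaultdict


def nxdomain_report(responses):
    """Return {qname: set of client IPs} for responses with code 3 (name error)."""
    report = defaultdict(set)
    for client, qname, rcode in responses:
        if rcode == 3:
            report[qname.rstrip(".").lower()].add(client)
    return dict(report)
```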
DNS queries without response
DNS queries that never receive an answer should be clearly identified for further analysis. They can be a sign of different issues, such as:
- Bad TCP/IP stack configuration on endpoints, pointing them to a non-existent or decommissioned DNS server;
- DNS server overload;
- Network-related problems.
With PerformanceVision, you can easily identify such DNS requests by looking at the DNS Requests Overview dashboard:
We can see from this dashboard that 5 requests for IPv4 addresses did not get any answer.
Finding out which DNS servers these requests were sent to is easy with PerformanceVision: just click on the “+” sign on the left:
All these failed requests were sent to the same DNS server, 172.16.1.12.
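Detecting queries without a response requires a timeout: a query only counts as unanswered once enough time has passed without a matching response. A minimal sketch, assuming a 5-second timeout (a common resolver default, but an assumption here) and hypothetical `(ts, client, server, txid)` query records, counts unanswered queries per DNS server.

```python
from collections import Counter


def unanswered_by_server(queries, answered_keys, now, timeout_s=5.0):
    """Count, per DNS server, the queries still unanswered after timeout_s seconds.

    queries: iterable of (ts, client, server, txid) tuples
    answered_keys: set of (client, server, txid) tuples for which a response was seen
    now: current capture time, in the same unit as ts
    """
    counts = Counter()
    for ts, client, server, txid in queries:
        if (client, server, txid) not in answered_keys and now - ts >= timeout_s:
            counts[server] += 1
    return counts
```

Queries younger than the timeout are left out rather than reported, to avoid flagging responses that are merely still in flight.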
Perhaps this server is down. How can we check this?
Again, this is simple: filter on this DNS server's IP address and request all DNS processes that occurred during the chosen timeframe:
As you can see, this DNS server seems to be up and running, since it answered other requests correctly.
The troubleshooting process can then continue by looking at network-related KPIs, and so on.
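The same check can be sketched as follows, assuming hypothetical `(ts, server_ip, answered)` records: count the answered and unanswered requests sent to one server within the chosen timeframe. A server that answers some requests is reachable, but this alone does not rule out overload or packet loss on the network path.

```python
def server_activity(transactions, server, start, end):
    """Count answered/unanswered requests to `server` with start <= ts < end."""
    answered = unanswered = 0
    for ts, srv, was_answered in transactions:
        if srv != server or not (start <= ts < end):
            continue
        if was_answered:
            answered += 1
        else:
            unanswered += 1
    # Any answer in the window shows the server is at least partly responding.
    return {"answered": answered, "unanswered": unanswered, "responding": answered > 0}
```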
DNS is an important process in today’s business applications.
Bad or non-optimal name resolution processes can have a dramatic impact on overall application performance.
PerformanceVision can decode DNS transactions at layer 7, providing you with the ability to quickly analyze and troubleshoot complex DNS-related performance issues.