UDP Service Avalanche: Testing and Analysis

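As we know, the receive buffer of a UDP socket has a limited size, and its maximum can be looked up (on Linux, through the net.core.rmem_max sysctl). Take the server side as an example. If the server socket's receive buffer is full, new requests from the client cannot be handled in time and are dropped. Even if the buffer is not full, some requests may already be queued in it, and new requests from the client naturally have to queue behind them. Quite possibly, before the server gets around to calling recvfrom and running its logic on such a request, the request has already timed out: the client could not wait any longer.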

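A backlog in the receive buffer (even one that is not full) forces new requests to queue. If the client has strict latency requirements, it is quite possible that every request times out. The server is still toiling away at the queued requests, yet its externally visible service capacity is zero: every new request times out while waiting in line, and the server keeps processing requests that have already timed out, doing useless work. This is an avalanche.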

Let's simulate it.

The server-side UDP code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2)
    {
        fprintf(stderr, "usage: %s <port>\n", argv[0]);
        return 1;
    }

    struct sockaddr_in srvAddr;
    memset(&srvAddr, 0, sizeof(srvAddr));
    srvAddr.sin_family = AF_INET;
    srvAddr.sin_addr.s_addr = htonl(INADDR_ANY);
    srvAddr.sin_port = htons(atoi(argv[1]));

    int iSock = socket(AF_INET, SOCK_DGRAM, 0);  // udp
    if (iSock < 0 || bind(iSock, (struct sockaddr *)&srvAddr, sizeof(srvAddr)) < 0)
    {
        perror("socket/bind");
        return 1;
    }

    while (1)
    {
        struct sockaddr_in cliAddr;
        socklen_t cliAddrLen = sizeof(cliAddr);
        char szBuf[40960] = {0};
        ssize_t n = recvfrom(iSock, szBuf, sizeof(szBuf) - 1, 0, (struct sockaddr *)&cliAddr, &cliAddrLen);
        if (n < 0)
        {
            continue;  // receive error: skip this round
        }

        // keep only the sequence number in front of '=' as the reply
        char *p = strchr(szBuf, '=');
        if (p != NULL)
        {
            *p = 0;
        }

        usleep(500 * 1000);  // sleep 500 ms; this stands in for the business logic

        size_t len = strlen(szBuf);
        sendto(iSock, szBuf, len + 1, 0, (struct sockaddr *)&cliAddr, cliAddrLen);
    }

    close(iSock);
    return 0;
}

Here is the code of client 1:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2)
    {
        fprintf(stderr, "usage: %s <port>\n", argv[0]);
        return 1;
    }

    struct sockaddr_in srvAddr;
    memset(&srvAddr, 0, sizeof(srvAddr));
    srvAddr.sin_family = AF_INET;
    srvAddr.sin_addr.s_addr = inet_addr("172.17.0.15");  // server IP in the test environment
    srvAddr.sin_port = htons(atoi(argv[1]));

    int iSock = socket(AF_INET, SOCK_DGRAM, 0);  // udp
    if (iSock < 0)
    {
        perror("socket");
        return 1;
    }
    unsigned int i = 0;

    // payload: 924 'a' characters, so each request is roughly 1 KB
    char szTmp[1024] = {0};
    memset(szTmp, 'a', sizeof(szTmp) - 100);

    // the client tolerates at most 5 s of waiting for each reply
    struct timeval tv;
    tv.tv_sec = 5;
    tv.tv_usec = 0;
    setsockopt(iSock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

    while (1)
    {
        char szBuf[1024] = {0};
        snprintf(szBuf, sizeof(szBuf), "%u=%s", ++i, szTmp);

        sendto(iSock, szBuf, strlen(szBuf) + 1, 0, (struct sockaddr *)&srvAddr, sizeof(srvAddr));

        // returns -1 if no reply arrives within the 5 s timeout
        ssize_t iRet = recvfrom(iSock, szBuf, sizeof(szBuf) - 1, 0, NULL, NULL);
        if (iRet <= 0)
        {
            printf("time out, or error\n");
        }
        else
        {
            szBuf[iRet] = 0;  // make sure the reply is NUL-terminated
            printf("%s\n", szBuf);
        }

        getchar();  // wait for Enter before sending the next request
    }

    close(iSock);
    return 0;
}

Here is the code of client 2:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2)
    {
        fprintf(stderr, "usage: %s <port>\n", argv[0]);
        return 1;
    }

    struct sockaddr_in srvAddr;
    memset(&srvAddr, 0, sizeof(srvAddr));
    srvAddr.sin_family = AF_INET;
    srvAddr.sin_addr.s_addr = inet_addr("172.17.0.15");  // server IP in the test environment
    srvAddr.sin_port = htons(atoi(argv[1]));

    int iSock = socket(AF_INET, SOCK_DGRAM, 0);  // udp
    if (iSock < 0)
    {
        perror("socket");
        return 1;
    }
    unsigned int i = 0;

    // payload: 924 'a' characters, so each request is roughly 1 KB
    char szTmp[1024] = {0};
    memset(szTmp, 'a', sizeof(szTmp) - 100);

    while (1)
    {
        char szBuf[1024] = {0};
        snprintf(szBuf, sizeof(szBuf), "%u=%s", ++i, szTmp);
        // send nonstop, never reading replies, to fill up the server socket's receive buffer
        sendto(iSock, szBuf, strlen(szBuf) + 1, 0, (struct sockaddr *)&srvAddr, sizeof(srvAddr));
    }

    close(iSock);  // not reached: client 2 is stopped with Ctrl+C
    return 0;
}

Test 1:

Start the server, then start client 1 (pressing Enter each time to send the next request). Client 1 gets the server's reply promptly every time:

ubuntu@VM-0-15-ubuntu:~/taoge/client$ ./a.out 8888
1

2

3

4

5

6

Everything works normally. Now shut down the server and client 1.

Test 2:

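Start the server, then start client 2, which sends packets to the server nonstop. Because the server's business logic takes 500 ms per request while client 2 sends as fast as it can inside its while loop, the server socket's receive buffer keeps piling up. Ideally we would start client 1 while the backlog is still building to see how it behaves, but client 2 is so aggressive that the buffer fills up almost immediately, leaving no time to start client 1.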

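The server socket's receive buffer fills up very quickly, as shown below. Port 8888 is the server; the second entry, *:34237, is most likely client 2's ephemeral port, where the server's replies pile up because client 2 never calls recvfrom: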

ubuntu@VM-0-15-ubuntu:~/taoge$ netstat -au
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
udp   214272      0 *:8888                  *:*                                
udp     5376      0 *:34237                 *:*                                
ubuntu@VM-0-15-ubuntu:~/taoge$ netstat -au
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
udp   214272      0 *:8888                  *:*                                
udp     6912      0 *:34237                 *:*                                
ubuntu@VM-0-15-ubuntu:~/taoge$

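At this point client 1 has not been started yet. Now stop client 2 so that it no longer floods the server. The server's recvfrom then gradually takes data out of the buffer, and the contents of the server socket's receive buffer shrink step by step (moving away from the full state). Let's watch the process: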

ubuntu@VM-0-15-ubuntu:~/taoge$ netstat -au
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
udp   205056      0 *:8888                  *:*                                
ubuntu@VM-0-15-ubuntu:~/taoge$ 
ubuntu@VM-0-15-ubuntu:~/taoge$ netstat -au
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
udp   191232      0 *:8888                  *:*                                
ubuntu@VM-0-15-ubuntu:~/taoge$ 
ubuntu@VM-0-15-ubuntu:~/taoge$ netstat -au
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
udp   168192      0 *:8888                  *:*

As you can see, the amount of data queued in the buffer keeps shrinking.

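Now start client 1 and send requests. Although the server's receive buffer is draining, it still holds data, so client 1's requests still have to wait in the queue; if the wait exceeds the 5 s client 1 is willing to tolerate, the request times out. Let's see: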

ubuntu@VM-0-15-ubuntu:~/taoge/client$ ./a.out 8888
time out, or error
time out, or error
time out, or error
time out, or error
1
2
3

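As you can see, the first few requests all timed out. Once the backlog in the server socket's receive buffer drains (gradually dropping to zero), replies start coming back to client 1 again. Note, however, that client 1 does not check which request a reply belongs to, and the server echoes back each request's sequence number: the 1, 2, 3 printed after the timeouts are therefore the late replies to client 1's first requests, which arrived after those requests had already timed out.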

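In the case above, queueing caused the avalanche. Note, however, that queueing does not necessarily lead to an avalanche: whether one happens depends on how many requests are queued and how fast the server processes them. Conversely, don't assume an avalanche can only happen when the buffer is full.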
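With the numbers from this test, the condition can be made concrete. As a rough estimate that ignores network delay and assumes the request makes it into the buffer at all: if $N$ requests are queued ahead of a new one and each takes $t_s = 0.5\,\mathrm{s}$ to process, the new request's reply goes out after $(N+1)\,t_s$, while client 1 gives up after $T = 5\,\mathrm{s}$. The reply is in time only if

$$
(N+1)\,t_s \le T \iff N \le \frac{T}{t_s} - 1 = \frac{5}{0.5} - 1 = 9
$$

So once ten or more requests are queued ahead, every new request from client 1 times out, which can happen well before the buffer is full.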

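That is how an avalanche unfolds. How do we prevent it? Both the client and the server have to do their part. We discussed avalanche prevention in an earlier article, so we won't repeat it here.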

Finally, let me just say: I really hate avalanches.

Biegral Blog
