I have an HTTP resource that is 3 GB in size.

I have some code like the following.

import urllib2

# url points to an HTTP resource that is 3 GB in size.
res = urllib2.urlopen(url, timeout=10)
# Read the body in 1 KB chunks until the server has nothing more to send.
data = res.read(1024)
while data:
    data = res.read(1024)

In VMware Workstation 11 or below, it works fine. But in VMware Workstation 12, it gives me this error:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 384, in read
    data = self._sock.recv(left)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 612, in read
    s = self.fp.read(amt)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 384, in read
    data = self._sock.recv(left)
socket.timeout: timed out

If I use Safari to download the resource in VMware Workstation 12, it works fine. And if the resource is smaller than some size, such as 10K, it also works fine.

OneCricketeer
jia hilegass
  • Have you tried removing `timeout = 10` and seeing if the requested url is processed within the default timeout period? – Eduard Aug 05 '16 at 02:42
  • @EduardDaduya. If I remove the timeout, the thread gets stuck forever. – jia hilegass Aug 05 '16 at 03:00
  • My apologies, I misunderstood the problem at hand, you are retrieving the data via urlopen successfully, yet reading the retrieved data raises the said exception. Something I have not yet encountered. I believe this explanation will help you with your current problem http://stackoverflow.com/a/26765074/1809168 – Eduard Aug 05 '16 at 03:04
  • @EduardDaduya. It doesn't help me, because I send only one request to the server and just wait for the data from the server. My server works fine, because if I use the Qt network module or just Safari, I can download it. So maybe there is some problem between Python's urlopen and VMware Workstation 12. – jia hilegass Aug 05 '16 at 03:51
  • I'd like to think of it as more of a socket related problem as the error is pointing out, rather than `urlopen` because the operation `urllib2.urlopen` is successfully executed and data is retrieved. – Eduard Aug 05 '16 at 03:55

1 Answer

They fixed it in VMware Fusion 8.5.7! See https://communities.vmware.com/thread/544049

I can't really provide you an answer right now and what I have to say is a bit longer than a comment, but I'm experiencing a similar issue in VMware Fusion Pro 8.5 on macOS 10.12 with Python's urllib2. It has nothing to do with urllib2 itself.

I started receiving this issue randomly during transfer sessions and, after some Wireshark debugging, determined that it was due to the TCP window reaching 0 on the receiver. For some reason, it never updates again.

If you don't know what a TCP window is, it's basically the amount of free space in the receive buffer on one end of a TCP connection, as advertised to the sender. That window should expand and contract as a flow control mechanism during normal transfer, but what shouldn't happen is the window getting stuck at 0.

The reason your sessions work for transfers less than 10k is that the default TCP window is usually about 8k. With anything less than that, you won't even fill up the receive buffer. With any more, you're basically hoping you process the data faster than you receive it.
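To make the buffer-filling behavior concrete, here is a rough Python 3 sketch (not part of the original test programs; the names `send_until_stalled`, `drain`, and `demo` are made up for illustration). It assumes a loopback TCP connection is available. The receiver deliberately does not read at first, so its buffer fills and the sender stops making progress; once the receiver drains its buffer, a healthy stack reopens the window and the sender continues. The exact byte counts depend on the OS and its buffer settings.

```python
import select
import socket


def send_until_stalled(sender, chunk=b"\xab" * 1024, wait=0.2):
    """Send chunks until the socket stays unwritable for `wait` seconds.

    Returns the number of bytes the kernel accepted before the stall.
    """
    sender.setblocking(False)
    total = 0
    while True:
        # Writable means there is still room in the local send buffer.
        _, writable, _ = select.select([], [sender], [], wait)
        if not writable:
            return total
        try:
            total += sender.send(chunk)
        except BlockingIOError:
            continue


def drain(receiver, wait=0.2):
    """Read until no data arrives for `wait` seconds; return the byte count."""
    total = 0
    while True:
        readable, _, _ = select.select([receiver], [], [], wait)
        if not readable:
            return total
        data = receiver.recv(65536)
        if not data:
            return total
        total += len(data)


def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    receiver = socket.create_connection(listener.getsockname())
    sender, _ = listener.accept()
    try:
        # The receiver is not reading yet, so its buffer fills up and
        # the sender eventually cannot queue any more data.
        stalled_at = send_until_stalled(sender)
        # Draining the receiver frees buffer space and reopens the window.
        drained = drain(receiver)
        resumed = send_until_stalled(sender)
        return stalled_at, drained, resumed
    finally:
        sender.close()
        receiver.close()
        listener.close()


if __name__ == "__main__":
    stalled_at, drained, resumed = demo()
    print("sender stalled after %d bytes" % stalled_at)
    print("receiver drained %d bytes" % drained)
    print("sender accepted %d more bytes after the drain" % resumed)
```

On a working network stack the third number is greater than zero. The bug described here looks like the case where the sender never resumes even though the receiver has emptied its buffer.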

To reproduce the issue on my local machine, here are two [terribly] written C programs you can compile with cc client.c -o client and cc server.c -o server. Run the client in the VM and the server on your local machine.

server.c:

/* server.c */
/* A simple server in the internet domain using TCP
   The port number is passed as an argument */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

void error(const char *msg)
{
    perror(msg);
    exit(1);
}

int main(int argc, char *argv[])
{
    int sockfd, newsockfd, portno;
    socklen_t clilen;
    char buffer[1024];
    struct sockaddr_in serv_addr, cli_addr;
    int n, total;
    if (argc < 2) {
        fprintf(stderr,"ERROR, no port provided\n");
        exit(1);
    }
    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0)
       error("ERROR opening socket");
    bzero((char *) &serv_addr, sizeof(serv_addr));
    portno = atoi(argv[1]);
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = INADDR_ANY;
    serv_addr.sin_port = htons(portno);
    if (bind(sockfd, (struct sockaddr *) &serv_addr,
            sizeof(serv_addr)) < 0)
        error("ERROR on binding");
    listen(sockfd,5);
    clilen = sizeof(cli_addr);
    newsockfd = accept(sockfd,
              (struct sockaddr *) &cli_addr,
              &clilen);
    if (newsockfd < 0)
        error("ERROR on accept");
    memset(buffer, 0xAB, sizeof(buffer));
    total = 0;
    for (;;) {
        n = write(newsockfd, buffer, sizeof(buffer));
        if (n < 0)
            error("ERROR writing to socket");
        /* error() exits, so n is non-negative here */
        total += n;
        printf("wrote %d / %d\n", n, total);
    }
    close(newsockfd);
    close(sockfd);
    return 0;
}

client.c:

/* client.c */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>

void error(const char *msg)
{
    perror(msg);
    exit(1);
}

int main(int argc, char *argv[])
{
    fd_set set;
    int sockfd, portno, n, total, rv;
    struct sockaddr_in serv_addr;
    struct hostent *server;
    struct timeval timeout;

    char buffer[256];
    if (argc < 3) {
       fprintf(stderr,"usage %s hostname port\n", argv[0]);
       exit(0);
    }
    portno = atoi(argv[2]);
    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0)
        error("ERROR opening socket");
    server = gethostbyname(argv[1]);
    if (server == NULL) {
        fprintf(stderr,"ERROR, no such host\n");
        exit(0);
    }
    bzero((char *) &serv_addr, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    bcopy((char *)server->h_addr,
         (char *)&serv_addr.sin_addr.s_addr,
         server->h_length);
    serv_addr.sin_port = htons(portno);
    if (connect(sockfd,(struct sockaddr *) &serv_addr,sizeof(serv_addr)) < 0)
        error("ERROR connecting");
    bzero(buffer, 256);

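    sleep(1);

    total = 0;
    for (;;) {
        /* select() may modify both the fd set and the timeout,
           so reset them on every iteration */
        FD_ZERO(&set);
        FD_SET(sockfd, &set);
        timeout.tv_sec = 1;
        timeout.tv_usec = 0;

        rv = select(sockfd + 1, &set, NULL, NULL, &timeout);
        if (rv == -1) {
            perror("select");
            break;
        } else if (rv == 0) {
            /* no data for a full second: the transfer has stalled */
            printf("timeout\n");
            break;
        } else {
            n = read(sockfd, buffer, sizeof(buffer));
            if (n < 0)
                error("ERROR reading from socket");
            if (n == 0)
                break; /* server closed the connection */
            total += n;
            printf("read %d / %d\n", n, total);
        }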
    }
    close(sockfd);
    return 0;
}

These programs are both taken directly from http://www.linuxhowtos.org/C_C++/socket.htm with modifications to report additional stats and to force a stall.
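If compiling C inside the VM is inconvenient, the client's read loop can be approximated in Python 3. This is only a sketch of the same idea (wait for data with `select`, give up after a quiet period, report the running total); the function name `read_until_stall` and its parameters are made up for illustration and are not from the original programs.

```python
import select
import socket
import sys


def read_until_stall(sock, timeout=1.0, bufsize=256):
    """Read from sock until it closes or stays silent for `timeout` seconds.

    Returns (total_bytes, reason), where reason is "timeout" or "closed".
    """
    total = 0
    while True:
        # Wait up to `timeout` seconds for the socket to become readable.
        readable, _, _ = select.select([sock], [], [], timeout)
        if not readable:
            return total, "timeout"
        data = sock.recv(bufsize)
        if not data:
            # An empty read means the peer closed the connection.
            return total, "closed"
        total += len(data)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python client.py hostname port
    conn = socket.create_connection((sys.argv[1], int(sys.argv[2])))
    try:
        total, reason = read_until_stall(conn)
    finally:
        conn.close()
    print("%s after %d bytes" % (reason, total))
```

Pointed at the server above, which writes forever, a healthy connection should keep reading indefinitely, so a "timeout" result suggests the stream stalled.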

Here is a screenshot from Wireshark demonstrating the TCP Window reducing to 0 and sticking:

[Image: TCP Zero Window in Wireshark]

My current theory is that there is some kind of bug in the network stack on the VMware side on the client, but it's difficult to tell. So far I've tried three different virtual network interfaces (e1000, e1000e, vlance) and had the same issue with each of them.

I'm going to try various vmx options to reduce the likelihood that the issue occurs, but this is obviously a killer for a stable system, and my use case (virtualized Jenkins slaves for CI) simply won't tolerate this kind of bug.

I'll report back if I'm able to learn anything new.

EDIT: I posted a bug report on the VMware Community board: https://communities.vmware.com/message/2648727

EDIT again: They fixed it in VMware Fusion 8.5.7! See the same link as above.

vmrob