I set up an MLflow tracking server behind nginx, using proxy_pass together with nginx's simple HTTP auth. After an experiment has been running for a while, the MLflow client raises the exception below, and I have no idea how to fix it.

Here is the error message:

Traceback (most recent call last):
  File "/usr/local/opt/python/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/connection.py", line 159, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw)
  File "/usr/local/opt/python/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/util/connection.py", line 80, in create_connection
    raise err
  File "/usr/local/opt/python/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/util/connection.py", line 70, in create_connection
    sock.connect(sa)
TimeoutError: [Errno 60] Operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/opt/python/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/local/opt/python/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 354, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/local/opt/python/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1239, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/local/opt/python/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1285, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/local/opt/python/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1234, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/local/opt/python/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1026, in _send_output
    self.send(msg)
  File "/usr/local/opt/python/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 964, in send
    self.connect()
  File "/usr/local/opt/python/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/connection.py", line 181, in connect
    conn = self._new_conn()
  File "/usr/local/opt/python/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/connection.py", line 168, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x1280a8438>: Failed to establish a new connection: [Errno 60] Operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/opt/python/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/usr/local/opt/python/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 638, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/local/opt/python/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/util/retry.py", line 398, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host=<host_ip>, port=<port>): Max retries exceeded with url: /api/2.0/mlflow/experiments/get-by-name?experiment_name=<exp_name> (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x1280a8438>: Failed to establish a new connection: [Errno 60] Operation timed out',))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "tmp_experiment_entry.py", line 4, in <module>
    mlflow.set_experiment(<exp_name>)
  File "/usr/local/opt/python/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/mlflow/tracking/fluent.py", line 47, in set_experiment
    experiment = client.get_experiment_by_name(experiment_name)
  File "/usr/local/opt/python/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/mlflow/tracking/client.py", line 151, in get_experiment_by_name
    return self._tracking_client.get_experiment_by_name(name)
  File "/usr/local/opt/python/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/mlflow/tracking/_tracking_service/client.py", line 114, in get_experiment_by_name
    return self.store.get_experiment_by_name(name)
  File "/usr/local/opt/python/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/mlflow/store/tracking/rest_store.py", line 219, in get_experiment_by_name
    response_proto = self._call_endpoint(GetExperimentByName, req_body)
  File "/usr/local/opt/python/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/mlflow/store/tracking/rest_store.py", line 32, in _call_endpoint
    return call_endpoint(self.get_host_creds(), endpoint, method, json_body, response_proto)
  File "/usr/local/opt/python/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/mlflow/utils/rest_utils.py", line 133, in call_endpoint
    host_creds=host_creds, endpoint=endpoint, method=method, params=json_body)
  File "/usr/local/opt/python/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/mlflow/utils/rest_utils.py", line 70, in http_request
    url=url, headers=headers, verify=verify, **kwargs)
  File "/usr/local/opt/python/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/mlflow/utils/rest_utils.py", line 51, in request_with_ratelimit_retries
    response = requests.request(**kwargs)
  File "/usr/local/opt/python/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/opt/python/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/opt/python/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/opt/python/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host=<host_ip>, port=<port>): Max retries exceeded with url: /api/2.0/mlflow/experiments/get-by-name?experiment_name=<exp_name> (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x1280a8438>: Failed to establish a new connection: [Errno 60] Operation timed out',))
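
The innermost frame fails in sock.connect with a timeout, so the connection never opened and no HTTP request reached nginx. A minimal sketch of a TCP probe that could be run from the client machine to check this is below. The helper name can_connect and the 5-second default are my own choices for illustration; only the standard library is used.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures.
        return False
```

Calling can_connect with the tracking server's host and port while the error is occurring would show whether the machine can reach nginx at all. If it returns False, the problem is probably below the HTTP layer (network, firewall, or the server being unreachable) rather than in the auth or proxy_pass settings.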

In the client, I log to MLflow in the following way; log_params and log_metrics are called inside the main function:

with mlflow.start_run():
    main(params)
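
If the timeouts turn out to be transient, one possible workaround is to retry the failing call with a growing delay. The sketch below is only an illustration: call_with_retries, its parameters, and its defaults are hypothetical names of mine, not part of MLflow or requests.

```python
import time


def call_with_retries(func, attempts=5, base_delay=1.0, retry_on=(OSError,)):
    """Call func(); on a listed exception, wait and retry with exponential backoff.

    The default retry_on=(OSError,) is an assumption chosen for this sketch.
    """
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error unchanged
            # Delay doubles each time: base_delay, 2*base_delay, 4*base_delay, ...
            time.sleep(base_delay * 2 ** attempt)
```

With this helper, the call from the traceback could be wrapped as call_with_retries(lambda: mlflow.set_experiment(exp_name), retry_on=(requests.exceptions.ConnectionError,)), where exp_name stands for the experiment name. It is probably safer to wrap individual tracking calls than the whole with block, since retrying the block would run main again.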
Jie-Han Chen
  • I see this error sometimes and it seems to be related to the network I am using. Experiments running on the server are logged without error. Maybe it is just weak wifi on the client side? – zlyde Aug 12 '20 at 08:23
  • @zlyde Yes, I think it was caused by a network connection problem. Thank you for confirming. – Jie-Han Chen Aug 13 '20 at 01:05
  • @Jie-HanChen I faced the same issue. It was mainly because of parallel processes using the network connection. But zlyde's point is also valid. – TeJas Feb 17 '21 at 11:50
