I'm working on time series charts for 300+ clients. It's beneficial for us to pull each client separately, since the combined data is huge and, in some cases, a client's data is resampled or manipulated in a slightly different way.

My problem is that the function I loop through to get each client's data opens three new threads but never closes them (I'm assuming the connection stays open) once the request completes and the function returns the data.

Once I have the results of a client, I'd like to close that connection. I just can't figure out how to do that and haven't been able to find anything in my searches.

import pandas as pd
import pysolr

# tier, mode, start_period, end_period and fl_list are module-level globals
def solr_data_pull(submitterId):
    zookeeper = pysolr.ZooKeeper('ndhhadr1dnp11,ndhhadr1dnp12,ndhhadr1dnp13:2181/solr')
    solr = pysolr.SolrCloud(zookeeper, collection='tran_timings', timeout=60)

    query = ('SubmitterId:' + str(submitterId) + ' AND Tier:' + tier + ' AND Mode:' + mode + ' '
             'AND Timestamp:[' + str(start_period) + ' TO ' + str(end_period) + ']')

    # q and fl take plain strings; wrapping them in lists is unnecessary
    results = solr.search(q=query, fl=fl_list, rows=50000)

    return pd.DataFrame(list(results))
GeorgeLPerkins
  • Any reason why you can't at least keep the ZK connection alive between each `solr_data_pull` call, to avoid the connection overhead each time? – MatsLindh Jun 23 '17 at 08:37

1 Answer

PySolr uses the Session object from requests as its underlying HTTP library (which in turn uses urllib3's connection pooling), so calling solr.get_session().close() should close all connections and drain the pool:

def close(self):
    """Closes all adapters and as such the session"""

(SolrCloud is an extension of Solr, which has the get_session() method.)
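
A minimal sketch of how that could look in the question's own function (reusing its names; the try/finally placement is my assumption about where cleanup fits, not anything pysolr-specific beyond get_session()):

    def solr_data_pull(submitterId):
        zookeeper = pysolr.ZooKeeper('ndhhadr1dnp11,ndhhadr1dnp12,ndhhadr1dnp13:2181/solr')
        solr = pysolr.SolrCloud(zookeeper, collection='tran_timings', timeout=60)
        try:
            results = solr.search(q=query, fl=fl_list, rows=50000)
            # materialize the docs before the session is closed
            return pd.DataFrame(list(results))
        finally:
            # closes all adapters on the requests Session, draining
            # urllib3's connection pool
            solr.get_session().close()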

For disconnecting from Zookeeper - which you probably shouldn't do if it's a long-running session, as it'll have to set up watches etc. again - you can use the .zk attribute on the pysolr.ZooKeeper instance you created; zk is a KazooClient:

stop()
Gracefully stop this Zookeeper session.

close()
Free any resources held by the client.

This method should be called on a stopped client before 
it is discarded. Not doing so may result in filehandles 
being leaked.
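
For example, a short sketch using the zookeeper variable from the question (assuming .zk is the underlying KazooClient, per the Kazoo docs quoted above):

    # stop the ZooKeeper session first, then free its resources
    zookeeper.zk.stop()
    zookeeper.zk.close()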
MatsLindh
  • Thanks for the reply. It may help me in the future, but it's moot now. We decided to grab all the data in one pull and add an extra function that splits the clients, does individual processing, and then recombines them into a master dataframe. – GeorgeLPerkins Jun 23 '17 at 13:09
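
That alternative might look roughly like this (all names here are hypothetical, and the per-client logic depends on how each client's data is resampled):

    def process_client(client_df):
        # client-specific resampling/manipulation would go here
        return client_df

    # big_df holds the single combined pull; 'SubmitterId' identifies clients
    master_df = pd.concat(
        process_client(group) for _, group in big_df.groupby('SubmitterId')
    )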