I have a very long-running Python script that synchronises data across different systems. It does a lot of data retrieval, data transformation, and HTTP requests, and parts of it are multithreaded.

The script sometimes produces SIGBUS / SIGILL errors and I have no clue how to handle them properly.
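For locating where such a signal is raised, the standard-library `faulthandler` module may help: `faulthandler.enable()` installs handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL that dump the Python traceback of the running threads before the process dies. It does not recover from the error, and it shows Python frames only, not the C code underneath. A minimal sketch follows; the helper name `enable_fault_logging` and the log path are hypothetical, not part of the original script.

```python
import faulthandler


def enable_fault_logging(path):
    """Dump the tracebacks of all threads to `path` on a fatal signal."""
    # Keep the returned file object open for the life of the process:
    # faulthandler stores only its file descriptor.
    log_file = open(path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


if __name__ == "__main__":
    # Hypothetical log path; call this once, before any threads start.
    fault_log = enable_fault_logging("sync_faults.log")
    # ... start the synchronisation run here ...
```

After a crash, the log should show which Python line each thread was executing, which narrows down the library call involved.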

The program processes roughly 500 items in a thread pool. Each item is a dictionary, built like this:

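from multiprocessing import pool

def processing(item):
    # Filter the shared DataFrame down to the rows for this item's city.
    reduced_df = item['streets']
    reduced_df = reduced_df[reduced_df['city'] == item['city_country']['city']].copy()
    # ... do stuff with reduced_df

# preped_streets is the main DataFrame; every item holds a reference to it.
items = [{
    'city_country': comb,
    'language': language,
    'streets': preped_streets
} for comb in city_country_combinations for language in ['en','de',...]]

# Use a different name for the pool instance so it does not shadow the module.
with pool.ThreadPool(processes=32) as thread_pool:
    thread_pool.map(processing, items)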

I have never encountered SIGBUS or SIGILL before. After some reading, my guess is that something this severe is related to the threading: could one thread be accessing something that another thread has already destroyed?
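One way to test the threading hypothesis would be to run the same work without worker threads and see whether the signals still occur. The sketch below assumes a hypothetical helper, `run_items`, that is not part of the original script; it runs everything in the calling thread when `threads <= 1` and otherwise uses a `ThreadPool` as in the question.

```python
from multiprocessing.pool import ThreadPool


def run_items(func, items, threads=32):
    """Apply func to every item, serially when threads <= 1."""
    if threads <= 1:
        # No worker threads at all: everything runs in the calling thread.
        return [func(item) for item in items]
    with ThreadPool(processes=threads) as workers:
        return workers.map(func, items)
```

Calling `run_items(processing, items, threads=1)` for a full run, and then again with a larger thread count, would show whether the crashes depend on concurrency. A crash that also happens serially would point away from a race between threads.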

Anton Menshov
Fabian Bosler
  • These errors are bugs in the Python interpreter or the libraries, not something in your code. – Barmar Apr 19 '19 at 17:31
  • How big are the list of languages and `city_country_combinations`? This could be due to running out of memory, although I'd expect Python to report that more sensibly. – Barmar Apr 19 '19 at 17:33
  • `language` is a two-letter string, `comb` is a two-element dictionary `{city: x, country: y}`, and `preped_streets` is a large DataFrame with 5000 rows and 10 columns. But I think it is passed by reference, no? – Fabian Bosler Apr 19 '19 at 22:06
  • I meant how many elements are there in each. The length of `items` will be `len(city_country_combinations) * len(['en','de',...])` – Barmar Apr 19 '19 at 22:09
  • Ohhh, well in total it’s roughly 500 items, 100 city_country_comb and 5 languages – Fabian Bosler Apr 20 '19 at 06:34
  • That shouldn't be a problem at all. I was worried that it was millions. – Barmar Apr 21 '19 at 04:18
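On the pass-by-reference point raised in the comments: storing an object in a dictionary stores a reference to it, not a copy, so all of the item dictionaries point at the same object. The sketch below checks this and the item count with stand-in data. The list used for `preped_streets` and the placeholder combination and language values are assumptions for illustration, not the real data.

```python
# Stand-in for the large DataFrame; any Python object behaves the same way here.
preped_streets = [{"city": f"city{i}", "street": f"street{i}"} for i in range(5000)]

# Hypothetical placeholder values: 100 combinations and 5 language codes.
city_country_combinations = [
    {"city": f"city{i}", "country": f"country{i}"} for i in range(100)
]
languages = [f"lang{i}" for i in range(5)]

items = [{
    "city_country": comb,
    "language": language,
    "streets": preped_streets,
} for comb in city_country_combinations for language in languages]

print(len(items))  # 500
# Every dictionary refers to the very same object, so nothing was copied.
print(all(item["streets"] is preped_streets for item in items))  # True
```

Building `items` therefore costs almost no extra memory. In the question's `processing` function, only the filtered rows are copied per item, via `.copy()`.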

0 Answers