Is there a way to not use pickling when using the Python multiprocessing module?

Question

I am having a very hard time figuring this out. I'm trying to create a real-time satellite tracker using the Skyfield Python module. It reads in TLE data and then gives a latitude and longitude position relative to the Earth. Skyfield creates satrec objects, which cannot be pickled (I even tried using dill). I am using a for loop to iterate over all the satellites, but this is very slow, so I want to speed it up using multiprocessing with the Pool method. As noted above, this does not work because multiprocessing uses pickle. Is there any way around this, or does anyone have suggestions for other ways to use multiprocessing to speed up this for loop?

from skyfield.api import load, wgs84, EarthSatellite
import numpy as np
import pandas as pd
import time
import os 
from pyspark import SparkContext
from multiprocessing import Pool
import dill

data = pd.read_json('tempSatelliteData.json')
print(data.head())

newData = data.filter(['tle_line0', 'tle_line1', 'tle_line2'])
newData.to_csv('test.txt', sep='\n', index=False)

stations_file = 'test.txt'
satellites = load.tle_file(stations_file)
ts = load.timescale()
t = ts.now()

#print(satellites)
#data = pd.DataFrame(data=satellites)
#data = data.to_numpy()

def normal_for():
    # this for loop takes 9 seconds to complete: TOO SLOW
    ty = time.time()
    for satellite in satellites:
        geocentric = satellite.at(t)
        lat,lon = wgs84.latlon_of(geocentric)
        print('Latitude:', lat)
        print('Longitude:', lon)
    print(np.round(time.time()-ty,3),'sec')

def sat_lat_lon(satellite):
    geocentric = satellite.at(t)
    lat,lon = wgs84.latlon_of(geocentric)
    return lat, lon

p = Pool()
result = p.map(sat_lat_lon, satellites)
p.close()
p.join()

What are `type(geocentric)`, `type(lat)`, and `type(lon)`? Would it be possible to post an example `stations_file` so those helping can reproduce this easily? — Charchit Agarwal, Jul 26 '22 at 07:52
I am unfamiliar with the satellite positioning software you are using, and I don't know whether it takes 9 s because it is doing I/O to acquire the satellite data or because it is doing maths. If I/O is slowing it down, you can use threads rather than multiprocessing. Threads do not require pickle because the data is already in the same process. Just trying to help. — Mark Setchell, Jul 26 '22 at 09:03
One common way to work around pickling issues is to send data which can be pickled, to create data which cannot be pickled. In this case, you could store the satellite data in multiple text files, and send the text files to the child processes. They can then read and create the satellite objects themselves, without needing to send the data across processes. — Charchit Agarwal, Jul 26 '22 at 13:46
@Charchit I'm not really sure what the types of geocentric, lat, and lon are; they were just copied from the Skyfield documentation to predict the latitude and longitude of the satellite. I'm not sure how I would send data which can be pickled to create data that cannot be pickled, since the for loop requires the satellite object to compute the latitude and longitude, and that is what cannot be pickled. — Koss24, Jul 27 '22 at 02:28
@MarkSetchell It is calling a mathematical model (SGP4) to predict the future latitude and longitude. — Koss24, Jul 27 '22 at 02:29
@Koss24, push this line `satellites = load.tle_file(stations_file)` inside the for loop. You will need to separate the satellite data into multiple files, and pass the file name to the for loop for each process. This will probably be the quickest, least efficient fix for this. — Charchit Agarwal, Jul 27 '22 at 11:02
@Koss24 Can you provide a small sample of what's in 'tempSatelliteData.json'? — RootTwo, Jul 28 '22 at 05:56

score 2 · Answer 1 · answered Jul 26 '22 at 13:11

I'm the author of dill, multiprocess, ppft, and pathos. You can try multiprocess, which uses dill instead of pickle, but if the objects are not serializable by dill, that won't work. Alternatives are multiprocess.dummy, which uses threading and so doesn't require object serialization the way multiprocess does (as suggested in the comments). There's also pathos.pools.ParallelPool (or just use the underlying ppft), which converts objects to source code to ship them across processes. There are a few other codes that provide parallel maps, but most of them require serialization of some sort. If none of the above works, you might have to work harder to make the objects serializable. For example, you could register serialization functions for the objects, which tell dill how to pickle them. dill also has serialization variants in dill.settings that enable you to try different serializations that might work. Sometimes, just changing the code construction or import locations can make an object serializable.
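As a minimal sketch of the threading route, the following uses only the standard library's multiprocessing.dummy (multiprocess is a fork of multiprocessing, so multiprocess.dummy should look much the same). FakeSatellite and locate are hypothetical stand-ins, not Skyfield API; the lock attribute is there only to make the object unpicklable. Threads share memory, so nothing is serialized, but they only give a speedup if the underlying computation releases the GIL, which is not verified here for Skyfield.

```python
import threading
from multiprocessing.dummy import Pool as ThreadPool  # thread-backed Pool


class FakeSatellite:
    """Hypothetical stand-in for an object that cannot be pickled."""

    def __init__(self, name, offset):
        self.name = name
        self.offset = offset
        self._lock = threading.Lock()  # lock objects cannot be pickled

    def position(self, t):
        # Placeholder arithmetic instead of a real orbit propagation.
        return (self.offset + t) % 360.0


def locate(sat):
    # Runs in a worker thread; the satellite object is shared, not copied.
    return sat.name, sat.position(10.0)


satellites = [FakeSatellite(f"SAT-{i}", 100.0 * i) for i in range(5)]

with ThreadPool(4) as pool:
    results = pool.map(locate, satellites)

print(results)
```

The map call has the same shape as with a process pool, so switching between the two is a one-line change if the objects later become serializable.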

If it's the speed of the serialization, and not the ability to serialize the objects, then you might try mpi4py (or pyina to get an MPI map). MPI is intended a bit more for heavy lifting (expensive code). However, if it's the serialization and shipping of large serialized objects that is slowing you down, then using threading or adding a custom serializer is probably your best bet.
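One way to add a custom serializer is the standard library's copyreg module, sketched below under the assumption that the object can be rebuilt from a few picklable constructor arguments. FakeSatrec and reduce_fake_satrec are hypothetical names for illustration; dill has its own registration mechanism, which is not shown here.

```python
import copyreg
import pickle
import threading


class FakeSatrec:
    """Hypothetical unpicklable object built from two text lines."""

    def __init__(self, line1, line2):
        self.line1 = line1
        self.line2 = line2
        self._lock = threading.Lock()  # makes default pickling fail


def reduce_fake_satrec(obj):
    # Tell pickle to rebuild the object by calling the constructor
    # with the original (picklable) arguments.
    return FakeSatrec, (obj.line1, obj.line2)


# Register the reducer so pickle uses it for every FakeSatrec instance.
copyreg.pickle(FakeSatrec, reduce_fake_satrec)

original = FakeSatrec("line-one", "line-two")
clone = pickle.loads(pickle.dumps(original))

print(clone.line1, clone.line2)
```

The reducer ships only the constructor arguments, so the receiving side pays the cost of rebuilding the object rather than the cost of transferring its internal state.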

score 0 · Answer 2 · answered Jul 28 '22 at 22:27

An EarthSatellite object can be created directly by passing the TLE lines and a timescale to the constructor. So use Pool.map() or similar to pass the TLE lines and timescale to the processes, and let them create the satrec objects themselves.

You can probably get the TLE lines directly from the JSON data and skip the steps of read_json and to_csv. But you didn't provide a sample JSON file.

I don't have any sample data, so this is untested:

from skyfield.api import load, wgs84, EarthSatellite
import pandas as pd
from multiprocessing import Pool

ts = load.timescale()
t = ts.now()

# load the .json data and convert it to a list of tuples,
# each holding the tle data for one satellite plus ts and t
data = pd.read_json('tempSatelliteData.json')
tle_data = [(row.tle_line0, row.tle_line1, row.tle_line2, ts, t)
            for row in data.itertuples()]
    
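def sat_lat_lon(line0, line1, line2, ts, t):
    # build the satellite inside the worker process, so the
    # unpicklable satrec object never crosses a process boundary
    satellite = EarthSatellite(line1, line2, line0, ts)
    geocentric = satellite.at(t)
    lat,lon = wgs84.latlon_of(geocentric)
    # the catalog number lives on the underlying satrec (satellite.model)
    return satellite.model.satnum, lat, lon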

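# the guard is needed where worker processes are started with "spawn"
if __name__ == '__main__':
    # starmap blocks until all results are in; the with block
    # then shuts the pool down
    with Pool() as p:
        result = p.starmap(sat_lat_lon, tle_data)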
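The same idea (send picklable inputs, build the object in the worker) can be sketched with only the standard library, which also shows a Pool initializer that sets shared state once per worker instead of shipping it with every task. FakeSatellite, init_worker, locate, and run_parallel are hypothetical names, and the arithmetic is a placeholder for a real orbit propagation; this is an illustration of the pattern, not Skyfield code.

```python
import threading
from multiprocessing import Pool


class FakeSatellite:
    """Hypothetical stand-in for an unpicklable satellite object."""

    def __init__(self, line1, line2, name):
        self.name = name
        self.seed = len(line1) + len(line2)
        self._lock = threading.Lock()  # makes instances unpicklable

    def longitude_at(self, t):
        # Placeholder arithmetic instead of a real propagation.
        return (self.seed * 10.0 + t) % 360.0


_epoch = None  # per-worker state, set once by init_worker


def init_worker(epoch):
    # Runs once in each worker process and stores the shared time value.
    global _epoch
    _epoch = epoch


def locate(record):
    # record is a plain tuple of strings, so it pickles without trouble.
    name, line1, line2 = record
    sat = FakeSatellite(line1, line2, name)  # built inside the worker
    return name, sat.longitude_at(_epoch)


def run_parallel(records, epoch, processes=2):
    # chunksize batches many small tasks into fewer inter-process messages.
    with Pool(processes, initializer=init_worker, initargs=(epoch,)) as pool:
        return pool.map(locate, records, chunksize=64)


if __name__ == "__main__":
    records = [(f"SAT-{i}", "a" * i, "b" * i) for i in range(1, 5)]
    print(run_parallel(records, epoch=5.0))
```

Only strings and plain numbers are returned to the parent here. If the real results turn out to be awkward to pickle, returning plain floats from the worker is one option.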
