I have many classified lidar point cloud files that I want to convert to GeoTIFF raster files. For that I wrote a function that creates the JSON pipeline PDAL requires for the conversion and then executes that pipeline:
import glob
import json

import pdal

tiles = glob.glob("*.las")

def select_points_and_raster(file, class_nr, resolution):
    filename_out = file.split('.')[0] + '_' + str(class_nr) + '.tif'
    config = json.dumps([
        file,  # reader stage, inferred from the .las extension
        {'type': 'filters.range', 'limits': classification[class_nr]},
        {'resolution': resolution, 'radius': resolution * 1.414,
         'gdaldriver': 'GTiff',
         'output_type': ['mean'],
         'filename': filename_out}  # writer stage, inferred from the .tif extension
    ])
    pipeline = pdal.Pipeline(config)
    pipeline.execute()
    return filename_out
for tile in tiles:
    print(f'do file {tile}')
    filename_out = select_points_and_raster(tile, class_nr, resolution)
    print(f'finished and wrote {filename_out}')
where classification is a dictionary that maps ground/buildings/vegetation to the corresponding class numbers, so I don't have to remember the numbers.
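For reference, a minimal sketch of what classification might look like; the exact entries are my own guess, assuming name keys, PDAL's filters.range limit syntax, and the standard ASPRS class codes (2 = ground, 5 = high vegetation, 6 = building):

# hypothetical sketch of the classification lookup; the limit strings
# use PDAL's filters.range syntax with the standard ASPRS class codes
classification = {
    'ground': 'Classification[2:2]',      # ASPRS code 2
    'vegetation': 'Classification[5:5]',  # ASPRS code 5 (high vegetation)
    'buildings': 'Classification[6:6]',   # ASPRS code 6
}
class_nr = 'ground'  # hypothetical choice; passed through to filters.range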
This works fine serially, iterating over each file in tiles. However, as I have many files, I would like to use multiple cores. How do I split the task to make use of at least the four cores in my machine? I have tried the following:
from multiprocess import Pool
ncores = 2
pool = Pool(processes=ncores)
pool.starmap(select_points_and_raster,
             [([file for file in tiles], classification[class_nr], resolution)])
pool.close()
pool.join()
but that does not work: I get an AttributeError: 'list' object has no attribute 'split'. But I'm not passing a list, or am I? And is this generally the way to go about parallelizing this?
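From the error I suspect that starmap wants one argument tuple per file, whereas I am passing a single tuple whose first element is the whole list. A sketch of what I imagine the call should look like, using the same function and variables as above (this pattern is my guess, not tested):

from multiprocess import Pool

if __name__ == '__main__':
    # build one (file, class_nr, resolution) tuple per input file,
    # so each worker call receives a single filename
    args = [(tile, class_nr, resolution) for tile in tiles]
    with Pool(processes=4) as pool:
        outputs = pool.starmap(select_points_and_raster, args)
    print(outputs)  # filenames written by the workers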