I have many classified lidar point cloud files that I want to convert to GeoTIFF raster files. For that I wrote a function that creates the JSON pipeline PDAL requires for the conversion and then executes that pipeline:
import glob
import json

import pdal

tiles = glob.glob("*.las")

def select_points_and_raster(file, class_nr, resolution):
    filename_out = file.split('.')[0] + '_' + str(class_nr) + '.tif'
    config = json.dumps([
        file,  # reader stage, inferred from the .las extension
        {'type': 'filters.range', 'limits': classification[class_nr]},
        {'resolution': resolution, 'radius': resolution * 1.414,
         'gdaldriver': 'GTiff',
         'output_type': ['mean'],
         'filename': filename_out}  # writer stage, inferred from the .tif extension
    ])
    pipeline = pdal.Pipeline(config)
    pipeline.execute()
    return filename_out
for tile in tiles:
    print(f'do file {tile}')
    filename_out = select_points_and_raster(tile, class_nr, resolution)
    print(f'finished and wrote {filename_out}')
where classification is a dictionary that maps ground/buildings/vegetation to the corresponding class numbers, so I don't have to remember the numbers.
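For reference, a minimal sketch of what classification might look like; the exact entries are my own guess, assuming name keys, PDAL's filters.range limit syntax, and the standard ASPRS class codes (2 = ground, 5 = high vegetation, 6 = building):

# hypothetical sketch of the classification lookup; the limit strings
# use PDAL's filters.range syntax with the standard ASPRS class codes
classification = {
    'ground': 'Classification[2:2]',      # ASPRS code 2
    'vegetation': 'Classification[5:5]',  # ASPRS code 5 (high vegetation)
    'buildings': 'Classification[6:6]',   # ASPRS code 6
}
class_nr = 'ground'  # hypothetical choice; passed through to filters.range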
This works fine serially, iterating over each file in tiles. However, as I have many files, I would like to use multiple cores. How do I split the task to make use of at least the four cores in my machine? I have tried the following:
from multiprocess import Pool
ncores = 2
pool = Pool(processes=ncores)
pool.starmap(select_points_and_raster,
             [([file for file in tiles], classification[class_nr], resolution)])
pool.close()
pool.join()
but that does not work: I get an AttributeError: 'list' object has no attribute 'split'. But I'm not passing a list, or am I? And is this generally the way to go about parallelizing this?
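From the error I suspect that starmap wants one argument tuple per file, whereas I am passing a single tuple whose first element is the whole list. A sketch of what I imagine the call should look like, using the same function and variables as above (this pattern is my guess, not tested):

from multiprocess import Pool

if __name__ == '__main__':
    # build one (file, class_nr, resolution) tuple per input file,
    # so each worker call receives a single filename
    args = [(tile, class_nr, resolution) for tile in tiles]
    with Pool(processes=4) as pool:
        outputs = pool.starmap(select_points_and_raster, args)
    print(outputs)  # filenames written by the workers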