
I need to take a denser-than-needed list of lidar survey points (longitude, latitude, and elevation) used for terrain definition and decimate it based on a 2-dimensional grid. The idea is to end up with points based on an NxN (e.g. 1 meter x 1 meter) grid using the longitude, latitude (x,y) values, thereby eliminating the surplus points. The goal is to determine the elevation at each point in the grid after the decimation, not to use elevation as part of the decimation rule itself.

An actual or precisely structured grid is not necessary and is not the goal here; I only use the grid terminology to best approximate what I envision as the remaining cloud of points after reducing it, in such a way that we always have a point within a certain radius (e.g. 1 meter). There may be a better term than grid.

I would like to either code/script this myself in a scripting or programming language, if I can start from a decimation algorithm, or use a command line tool from an existing project that can do this, runs on Ubuntu, and can be called from our application as a system call. The approach should not require GUI-based software or tools; it needs to be part of an automated set of steps.

The data currently exists in a tab-separated values file, but I could load it into a SQLite database file if a database/SQL query driven algorithm would be better/faster. The ideal scripting language would be Ruby or Python, but it can be any, really; and if C/C++/C# libraries for this already exist, we could wrap those for our needs.
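To illustrate the kind of grid-bucket decimation I have in mind, here is a rough, untested Python sketch (the file name, column order, and grid size are just placeholders):

import csv

GRID = 1.0  # desired spacing, in whatever units x and y are in (placeholder)

# Keep the first point seen in each grid cell; the cell key is the
# (x, y) pair snapped down to the grid spacing.
cells = {}
with open("points.tsv") as f:
    for x, y, z in csv.reader(f, delimiter="\t"):
        key = (int(float(x) // GRID), int(float(y) // GRID))
        if key not in cells:
            cells[key] = (float(x), float(y), float(z))

decimated = list(cells.values())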

Ideas?

Update, clarifying the use of the result of this decimated list: given a user's location (known by latitude and longitude), what is the closest point in the list, and in turn its elevation? We can do this now, of course, but we have more data than necessary, so we just want to relax the density of the data so that we can find the closest point within a tolerance distance (e.g. 1 meter) using a decimated list rather than the full list. The latitude, longitude values in the list are in decimal GPS (e.g. 38.68616190027656, -121.11013105991036).

Streamline
  • Can you share the data? And to understand you correctly: after the construction of the grid, which point do you want to keep for each grid cell? – David de la Iglesia Apr 08 '17 at 17:34
  • @DaviddelaIglesia, I can try to create some sample data. A precise grid is not the goal, just a point in the list/cloud closest to what an actual grid might be based on latitude and longitude. I just noticed your answer and will review it also in a minute. – Streamline Apr 08 '17 at 19:07
  • I'll update my answer to meet your update – David de la Iglesia Apr 08 '17 at 22:03
  • As far as I know your answer may be what I need already, I just haven't had time to test it yet. It may not be until Monday before I can. I had updated my post to add more clarity to why I am doing this. – Streamline Apr 09 '17 at 04:02
  • I think that my new answer should be a more efficient way to find new users' altitudes – David de la Iglesia Apr 10 '17 at 15:11
  • @DaviddelaIglesia, The goal for this post is not to find the user's location. The goal is to produce a decimated list only. I only added the extra information about the fact that we will be using this decimated list to find a user's location downstream of this task because I thought it would help clarify. The list decimation happens on the server side and the client side will use that decimated list to find user's location at separate times. Does your new answer still provide the result of a new decimated list from the original more dense list regardless of the user location portion? – Streamline Apr 10 '17 at 17:18
  • @DaviddelaIglesia, If I can separate your updated answer into 2 main steps that can be taken at different times and know that the first step reduces the data footprint and the second step is reusable later to find the user location, then I think it will still be on track. I will start digging into this later today to test. Thanks. – Streamline Apr 10 '17 at 17:42
  • Split the answer into two parts, with two options for part 2: query for new users' location – David de la Iglesia Apr 10 '17 at 17:48
  • Awesome. I look forward to testing. – Streamline Apr 10 '17 at 17:53

2 Answers


PART 1: decimated version

  • Load data

Load the data from the tabular file (change sep according to the separator you are using):

# installed as dependency
import pandas as pd  
# https://github.com/daavoo/pyntcloud
from pyntcloud import PyntCloud  

dense = PyntCloud(pd.read_csv("example.tsv",
                     sep='\t',
                     names=["x","y","z"]))

This is how the example I created looks:

[image: the example point cloud]

  • Build VoxelGrid

Assuming that the latitude and longitude in your file are in meters, you can generate a grid as follows:

grid_id = dense.add_structure("voxelgrid",
                              sizes=[1, 1, None],
                              bb_cuboid=False)
voxelgrid = dense.voxelgrids[grid_id]

This voxelgrid has a size of 1 along the x (latitude) and y (longitude) dimensions.
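As noted in the question, the coordinates are actually decimal GPS degrees, not meters. One way to make 1-meter cell sizes meaningful (my addition, not part of pyntcloud) is to project the coordinates to approximate local meters right after loading, before building the voxelgrid, e.g. with an equirectangular approximation around the mean latitude:

import numpy as np

R = 6371000.0  # mean Earth radius in meters

# Assuming the file's columns were loaded as x=longitude, y=latitude.
lon = dense.points["x"].values
lat = dense.points["y"].values
lat0 = np.radians(lat.mean())

# Equirectangular projection: adequate over a small survey area.
dense.points["x"] = R * np.radians(lon) * np.cos(lat0)
dense.points["y"] = R * np.radians(lat)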

  • Build decimated version

    decimated = dense.get_sample("voxelgrid_centroids", voxelgrid=grid_id)

decimated is a numpy (N,3) array. You can store it for later use in a SQL database, etc.
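Since the question mentions SQLite as an option, here is a minimal sketch of persisting the decimated array (the database and table names are just examples):

import sqlite3

conn = sqlite3.connect("terrain.db")
conn.execute("CREATE TABLE IF NOT EXISTS points (x REAL, y REAL, z REAL)")
conn.executemany("INSERT INTO points VALUES (?, ?, ?)", decimated.tolist())
conn.commit()
conn.close()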


PART 2: Query

Option A: query voxelgrid

  • Get mean altitudes for each grid cell

You can now get a vector with the mean z (altitude) value for each cell in the grid:

z_mean = voxelgrid.get_feature_vector(mode="z_mean")
  • Query the grid with users' locations:

    import numpy as np

    users_location = np.random.rand(100000, 2)

Add a column of zeros because query requires 3D points (this doesn't affect the results):

users_location = np.c_[ users_location, np.zeros(users_location.shape[0]) ]

Get which cell each user is in:

users_cell = voxelgrid.query(users_location)

And finally, get the altitude corresponding to each user:

users_altitude = z_mean[users_cell]

Option B: Use decimated version for query

  • Build a KDTree of decimated:

    from scipy.spatial import cKDTree

    kdt = cKDTree(decimated)

  • Query the KDTree with user locations:

    users_location = np.random.rand(100000, 2)
    users_location = np.c_[users_location, np.zeros(users_location.shape[0])]

    distances, indices = kdt.query(users_location, k=1, n_jobs=-1)
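The indices returned by cKDTree.query index into decimated, so looking up the nearest elevations, plus a check against the question's 1-meter tolerance, could look like this (note that recent SciPy releases rename n_jobs to workers):

    users_altitude = decimated[indices, 2]

    # Optionally mark users with no decimated point within the tolerance radius.
    tolerance = 1.0  # same units as the point coordinates
    users_altitude[distances > tolerance] = np.nan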


As an extra, you can save and load the voxelgrid with pickle:

import pickle

pickle.dump(voxelgrid, open("voxelgrid.pkl", "wb"))

voxelgrid = pickle.load(open("voxelgrid.pkl", "rb"))
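Similarly (my addition, not from the answer), the decimated array itself can be stored and reloaded with numpy:

np.save("decimated.npy", decimated)

decimated = np.load("decimated.npy")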
David de la Iglesia
  • thanks for this answer. At first glance, and at least conceptually from the method names, the coding appears to be on track with what I am looking for. I will need to test and review closer to provide more feedback. I can say that in the video you embedded it is hard for me to see any resemblance of a decimated grid-like set of data in the cloud, but it may just be the quality/size of the video. If the latitude, longitude values in the list are in decimal GPS format (i.e. 38.68616190027656, -121.11013105991036), does that change how this will work? – Streamline Apr 08 '17 at 19:18
  • @Streamline I think that I have a better way to do this, but I haven't pushed it to github yet. If you don't mind waiting, I'll push it soon. – David de la Iglesia Apr 10 '17 at 10:46
  • Does this updated answer allow me to split the task of producing the decimated list at one point in time and then later attempting the user location lookup? (See my comment on the main question). This portion of the overall set of tasks is to reduce and eliminate unneeded data. – Streamline Apr 10 '17 at 17:36
  • It's all about the voxelgrid. To decimate the point cloud you get the voxelgrid centroids as the new decimated version. To query for user location you can either use the method I show in the answer or use any neighbor search against the decimated version. – David de la Iglesia Apr 10 '17 at 17:41
  • Ok. Sounds good. I will start testing later today and return with feedback. – Streamline Apr 10 '17 at 17:44
  • Hi David, I don't think I got to the testing I planned for that day back in April. I am returning to this project now however and plan to start testing soon. Before I start is there a recommended python version for this based on the dependencies? Are the pandas and PyntCloud modules the only dependencies needed to perform steps 1, 2a and 2b? Thanks. – Streamline Oct 06 '17 at 16:13
  • I think I've got that I need python3 and the dependencies nailed down after reviewing http://pyntcloud.readthedocs.io/en/latest/installation.html. I've given it a first try, and at the line `voxelgrid = dense.voxelgrids[grid_id]` from step 1 above, in the python console, I am getting `AttributeError: 'PyntCloud' object has no attribute 'voxelgrids'`. If I type `dense` I do see `1 voxelgrids` as a result. Suggestions? Has the library changed since April such that I should modify the steps? – Streamline Oct 06 '17 at 22:02
  • I also have the same problem; 'PyntCloud' object has no attribute 'voxelgrids'. Any ideas how to proceed? – GStav Nov 23 '17 at 15:04

If you have the point cloud as a text file (.xyz), a simple and fast solution is to take a random sample of its lines using shuf.

10 million points in an xyz file equals 10 million lines of text. You can run:

shuf -n 5000000 input.xyz -o out.xyz

This decimates the file to half its original size.
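Note that random sampling thins the cloud uniformly on average but, unlike a grid, does not guarantee a point within any given radius. If shuf is unavailable, a rough Python equivalent (filenames are placeholders):

import random

with open("input.xyz") as f:
    lines = f.readlines()

# Keep a random half of the lines, like shuf -n above.
with open("out.xyz", "w") as f:
    f.writelines(random.sample(lines, len(lines) // 2))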

A39-A20