Read formatted data from part of a file fast (Gmsh mesh format)

Question

I maintain a little Python package that converts between different formats used for mesh representation à la

Those files can grow pretty big, so when reading them with Python it's important to do it efficiently.

One of the most used formats is msh from Gmsh. Unfortunately, its data layout is arguably not the best. An example file:

$MeshFormat
2.2 0 8
$EndMeshFormat
$Nodes
8
1 -0.5 -0.5 -0.5
2  0.5 -0.5 -0.5
3 -0.5  0.5 -0.5
4  0.5  0.5 -0.5
5 -0.5 -0.5  0.5
6  0.5 -0.5  0.5
7 -0.5  0.5  0.5
8  0.5  0.5  0.5
$EndNodes
$Elements
2
1 4 2 1 11 1 2 3 5
2 4 2 1 11 2 5 6 8
$EndElements

For the $Nodes:

The first number (8) is the number of nodes to follow.

In each node line, the first number is the index (not actually needed by still part of the format, ugh), then follow the three spatial coordinates.

So far, I haven't come up with anything better than islices in a for loop, which is pretty slow.

# The first line is the number of nodes
line = next(islice(f, 1))
num_nodes = int(line)
#
points = numpy.empty((num_nodes, 3))
for k, line in enumerate(islice(f, num_nodes)):
    points[k, :] = numpy.array(line.split(), dtype=float)[1:]
    line = next(islice(f, 1))
assert line.strip() == '$EndNodes'

For the $Elements:

The first number (2) is the number of elements to follow.

In each element line, the first number is the index, then follows an enum for the element type (4 is for tetrahedra). Then follows the number of integer tags for this element (2 in each case here, namely 1 and 11). Corresponding to the element type, the last few entries in this row correspond to $Node indices that form the element – in the case of a tetrahedron, the last four entries.

Since the number of tags can vary from element to element (i.e., line to line), just like the element type and the number of node indices, each line may have a different number of integers.

For both $Nodes and $Elements, any help for reading this data quickly is appreciated.

score 6 · Accepted Answer · answered Jan 13 '17 at 19:25

Here's a somewhat weird implementation based on NumPy:

f = open('foo.msh')
f.readline() # '$MeshFormat\n'
f.readline() # '2.2 0 8\n'
f.readline() # '$EndMeshFormat\n'
f.readline() # '$Nodes\n'
n_nodes = int(f.readline()) # '8\n'
nodes = numpy.fromfile(f,count=n_nodes*4, sep=" ").reshape((n_nodes,4))
# array([[ 1. , -0.5, -0.5, -0.5],
#   [ 2. ,  0.5, -0.5, -0.5],
#   [ 3. , -0.5,  0.5, -0.5],
#   [ 4. ,  0.5,  0.5, -0.5],
#   [ 5. , -0.5, -0.5,  0.5],
#   [ 6. ,  0.5, -0.5,  0.5],
#   [ 7. , -0.5,  0.5,  0.5],
#   [ 8. ,  0.5,  0.5,  0.5]])
f.readline() # '$EndNodes\n'
f.readline() # '$Elements\n'
n_elems = int(f.readline()) # '2\n'
elems = numpy.fromfile(f,sep=" ")[:-1] # $EndElements read as -1
# This array must be reshaped based on the element type(s)
# array([  1.,   4.,   2.,   1.,  11.,   1.,   2.,   3.,   5.,   2.,   4.,
#    2.,   1.,  11.,   2.,   5.,   6.,   8.])

Mitch · Answer 2 · 2019-05-09T22:34:34.713

5

Why not use the gmsh python API in the Gmsh SDK? For example, using the file explore.py (located in the SDK tarball, gmsh-<\version>-Linux64/share/doc/gmsh/demos/api/explore.py) to read your example, (which I named test.msh).

$ python explore.py test.msh

outputs:

Info    : No current model available: creating one
Info    : Reading 'test.msh'...
Info    : 8 vertices
Info    : 2 elements
Info    : Done reading 'test.msh'
6 mesh nodes and 2 mesh elements on entity (3, 11) Discrete volume
 - Element type: Tetrahedron 4, order 1
   with 4 nodes in param coord:  [0. 0. 0. 1. 0. 0. 0. 1. 0. 0. 0. 1.]

The nodes and elements are stored as numpy arrays.

edited May 09 '19 at 22:34

answered Sep 21 '18 at 17:15

Mitch

167
1
7

Hey I installed gmsh (pip install gmsh), but i don't see an explore.py, can you @Mitch update your post from the import statement and call ?? – SBFRF May 08 '19 at 21:51
1

@SBFRF updated my answer. The file is in the source code tarball, not the python package available through pip – Mitch May 09 '19 at 22:36

Read formatted data from part of a file fast (Gmsh mesh format)

2 Answers2