
I have a problem relating to ParaView, Anaconda and Python 3.

Simply put, I want to open a .vtu file in ParaView, export its data to a .csv file and reorder it. The problem is that when I run the script with pvpython, it doesn't recognize pandas; when I run it with python3, it doesn't recognize paraview. I need pandas to do that specific reordering because some of the numbers are in scientific notation with a capital E.

Here is my code:

from paraview import simple
import csv
import pandas

reader = simple.OpenDataFile("flow3.vtu")
writer = simple.CreateWriter("data0.csv", reader)
writer.FieldAssociation = "Points"
writer.UpdatePipeline()

with open('data0.csv') as csvfile:
    rdr = csv.reader(csvfile)
    # pandas has to be used here to read the scientific notation
    b = sorted(rdr, key=lambda x: x[16], reverse=False)
    c = sorted(b, key=lambda x: x[15], reverse=False)

with open('data0.csv', 'w') as csvout:
    wrtr = csv.writer(csvout)
    wrtr.writerows(c)

Thanks very much.

Vivek Kalyanarangan

2 Answers


It seems to be an environment problem.

With Anaconda or Miniconda, you should create a specific virtual environment for your project. By default, an environment named "base" is created.

Here is how you could resolve your problem.

Choose a name for your virtual environment. Say "csvenv".

Then:

# Create the environment named "csvenv"
conda create --name csvenv

# Activate the environment
conda activate csvenv

# In this environment, install paraview and pandas
conda install -c conda-forge paraview
conda install -c conda-forge pandas

Then, from this environment, run your code.

See https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html for details about virtual environments with conda.
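Once both paraview.simple and pandas import from the same interpreter, the reordering itself is straightforward with pandas. A minimal sketch (assuming the data0.csv produced by the question's writer, and that the sort keys are the columns at positions 15 and 16, as in the question's lambdas):

import pandas as pd

# Read the CSV exported by ParaView; pandas parses "1.5E+02"-style numbers
df = pd.read_csv("data0.csv")

# Sort numerically by the column at position 15 first, then position 16
key_cols = [df.columns[15], df.columns[16]]
df = df.sort_values(by=key_cols, ascending=True)

df.to_csv("data0.csv", index=False)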

Rivers

ParaView does not include pandas. One solution would be to compile your own distribution of ParaView from the source code and include pandas, but that would be the difficult way.

There have been some discussions about including it; however, I do not know whether it is going to be done: add pandas in pvpython

An alternative to what @Rivers proposes is to stay in ParaView (pvpython) and to convert your data to a numpy array. Then you can sort your data and/or export it to a (*.csv) file; a short sketch of that idea is given just below, followed by two export helpers. The advantage of such a solution is that you can stay in ParaView and build macros (with buttons in the ribbon) automating the tasks you regularly perform.
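A minimal sketch of that workflow in pvpython (hedged: the array names "key_a" and "key_b" below are placeholders for whichever point-data arrays correspond to the question's sort columns, and all arrays are assumed to be scalar):

import numpy as np
import paraview.simple as ps
from paraview import servermanager as sm
from vtk.numpy_interface import dataset_adapter as dsa

# Read the .vtu and fetch the dataset to the client side
reader = ps.OpenDataFile("flow3.vtu")
data = sm.Fetch(reader)

# Wrap it so the point-data arrays behave like numpy arrays
wrapped = dsa.WrapDataObject(data)
names = wrapped.PointData.keys()

# "key_b" is the primary sort key, "key_a" the secondary one
# (np.lexsort treats the last key as the primary key)
key_a = np.asarray(wrapped.PointData["key_a"])
key_b = np.asarray(wrapped.PointData["key_b"])
order = np.lexsort((key_a, key_b))

# Reorder every (scalar) point-data array and write them to a CSV file
cols = [np.asarray(wrapped.PointData[n])[order] for n in names]
np.savetxt("data_sorted.csv", np.column_stack(cols), delimiter=",",
           header=",".join(names), comments="")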

First solution (fastest I know)

import numpy as np
import paraview.simple as ps
from vtk.numpy_interface import dataset_adapter as dsa


def export_vtk_table_to_csv_v1(vtk_table, vtk_table_name, save_path):

    """
    This function exports a vtk table to a (*.csv) file

    Parameters
    ----------
    vtk_table: vtkTable
        vtk table containing the data to export
    vtk_table_name: str
        name used for the output file
    save_path: str
        path of the folder where the (.csv) file is saved
        (expected to end with a path separator)

    Returns
    -------
    None
    """

    # Get the number of columns and rows
    nb_cols = vtk_table.GetNumberOfColumns()
    nb_rows = vtk_table.GetNumberOfRows()

    # Build a numpy string array that will be exported later on
    # (+1 row to hold the column names)
    arr = np.zeros((nb_rows + 1, nb_cols), dtype='U255')

    # Store the column names in the first row, then fill each column
    for col_index in range(0, nb_cols):
        col_name = vtk_table.GetColumnName(col_index)
        arr[0, col_index] = col_name

        for row_index in range(0, nb_rows):
            arr[row_index + 1, col_index] = \
                vtk_table.GetValue(row_index, col_index)

    np.savetxt(save_path + vtk_table_name + '.csv', arr,
               delimiter=";", fmt="%s")
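Hypothetical usage (the table name and output folder are placeholders; vtk_table is assumed to be a vtkTable instance you already hold):

# Write the table to /tmp/flow3_points.csv
export_vtk_table_to_csv_v1(vtk_table, "flow3_points", "/tmp/")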


Second approach (slower)


def export_vtk_table_to_csv_v2(vtk_table, vtk_table_name, save_path):

    """
    This function exports a vtk table to a (*.csv) file

    Parameters
    ----------
    vtk_table: vtkTable
        vtk table containing the data to export
    vtk_table_name: str
        name used for the output file
    save_path: str
        path of the folder where the (.csv) file is saved
        (expected to end with a path separator)

    Returns
    -------
    None
    """

    # Wrap the table so its columns behave like numpy arrays
    nTable = dsa.WrapDataObject(vtk_table)
    columns = nTable.RowData.keys()
    nb_rows = vtk_table.GetNumberOfRows()

    # Collect the table row by row (plain indexing works on the wrapped columns)
    rows = []
    for x in range(nb_rows):
        row = [nTable.RowData[col][x] for col in columns]
        rows.append(row)

    arr = lists_to_structured_np_array(columns, rows, 'U255')

    np.savetxt(save_path + vtk_table_name + '.csv', arr,
               delimiter=";", fmt="%s")

def lists_to_structured_np_array(headers_list, data_lists, dtype_list):
    """
    This function gathers several lists of data into a numpy structured array.
    Each sub-list corresponds to a row of the array. The list of headers and
    of dtypes is also required.

    Parameters
    ----------
    headers_list : list
        list of column names (used as the field names of the structured array)
    data_lists: list
        list of lists, each sub-list containing the data of one row
    dtype_list: list
        list containing the dtypes to apply (or a single dtype used for
        every column)

    Returns
    -------
    numpy.ndarray
        structured array
    """

    # If dtype_list is a single dtype, turn it into a list with the same
    # length as headers_list
    if type(dtype_list) != list:
        dtype_list = [dtype_list] * len(headers_list)
    # Combine the dtype_list and headers_list into a list of tuples
    dtype = [tuple([x, y]) for x, y in zip(headers_list, dtype_list)]
    # Convert the data list to a list of tuples (one tuple per row)
    data = [tuple(x) for x in data_lists]
    # Create the numpy structured array
    structuredArr = np.array(data, dtype=dtype)

    return structuredArr
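A small standalone example of lists_to_structured_np_array (the data below is made up purely for illustration):

headers = ["x", "y"]
rows = [[1.0, 2.0], [3.0, 4.0]]
arr = lists_to_structured_np_array(headers, rows, 'U255')
print(arr["x"])   # -> ['1.0' '3.0']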

Antoine Collet