
I'm working on a project where I need to import table data from a Parquet file into the Memgraph graph database. My data looks something like this:

+-----------+-------------+---------+------------+--------+
| FirstName | LastName    | Country | Occupation | Salary |
+-----------+-------------+---------+------------+--------+
| John      | Doe         | USA     | Engineer   | 70000  |
| Jane      | Smith       | UK      | Doctor     | 80000  |
| Max       | Johnson     | Canada  | Teacher    | 60000  |
| Emily     | Davis       | Germany | Scientist  | 90000  |
| Luke      | Rodriguez   | France  | Artist     | 50000  |
+-----------+-------------+---------+------------+--------+

I know that I could convert this to CSV and then use the LOAD CSV Cypher clause, but that is inconvenient. What can I do instead?

Moraltox
1 Answer

Memgraph supports the Parquet file format via the PyArrow package. To import data from a Parquet file into Memgraph, you can use the GQLAlchemy library.

Once you have GQLAlchemy installed (pip install gqlalchemy), you can use the ParquetLocalFileSystemImporter class to import data from a Parquet file. Here's an example:

from gqlalchemy import Memgraph
from gqlalchemy.transformations.importing.loaders import ParquetLocalFileSystemImporter

# Define your data configuration object (parsed_yaml)
# ...

# Create an importer object
importer = ParquetLocalFileSystemImporter(
    path="path/to/your/parquet/file",
    data_configuration=parsed_yaml,
    memgraph=Memgraph()
)

# Import the data; translate() reads the Parquet files and builds the
# graph (pass drop_database=True to clear Memgraph first)
importer.translate(drop_database=False)
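The parsed_yaml object above is a parsed data-configuration document that tells the importer how to map table columns onto nodes and relationships. A minimal sketch for your single table, assuming it is named people — the configuration keys follow the GQLAlchemy table-to-graph guide, but treat the exact values here as illustrative, not authoritative:

```python
import yaml  # PyYAML

# Hypothetical configuration: every row of the "people" table becomes a
# node labeled PERSON, with an index on FirstName and no relations.
configuration = """
indices:
  people:
    - FirstName

name_mappings:
  people:
    label: PERSON

one_to_many_relations:
  people: []
"""

parsed_yaml = yaml.safe_load(configuration)
print(parsed_yaml["name_mappings"]["people"]["label"])  # prints PERSON
```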

You can find more details at https://memgraph.com/docs/gqlalchemy/how-to-guides/table-to-graph-importer.

Taja Jan