1

As I was working and seeing through the apache age viewer. This question regarding a data analysis project has come to my mind like how to use the apache age viewer to import this data in CSV or JSON format. What's the best method so that I can start analyzing it.

I myself researched for the sources so I can get an accurate answer but unfortunately coudnt find any good explanation. Some are recommending Gremlin and Cyoher to load it while some are saying to converrt into TinkerPop first.

I hope to get a thorough explanation on how to import the data using either of the method. Looking forward to it.

4 Answers4

1

Yes, you can import data from CSV files into Postgres using Apache AGE. The function load_labels_from_file is used to load vertices from the CSV files. Sample syntax:

load_labels_from_file('<graph name>','<label name>','<file path>')

For Example: Create label country and load vertices from csv file.

SELECT create_graph('agload_test_graph');
SELECT create_vlabel('agload_test_graph', 'Country');
SELECT load_labels_from_file('agload_test_graph', 'Country','age_load/countries.csv');

For more details you can follow this: Importing graph from file

But don't forget to preprocess your files so that the columns and headings are in the correct format.

1

For making labels in the Age, you can use the following command.

load_labels_from_file('<graph name>', 
                  '<label name>',
                  '<file path>')

The fourth parameter is optional and only used if we are not giving the ID in the csv file.

load_labels_from_file('<graph name>', 
                  '<label name>',
                  '<file path>', 
                  false)

Format of CSV File for labels:

ID: It shall be the first column of the file and all values shall be a positive integer. This is an optional field when id_field_exists is false. However, it should be present when id_field_exists is not set to false.

Properties: All other columns contains the properties for the nodes. The header row shall contain the name of the property

For adding edges, the following function is used.

oad_edges_from_file('<graph name>',
                '<label name>',
                '<file path>');

Format of the CSV File for edges is as follows:

start_id: node id of the node from where the edge is stated. This id shall be present in the nodes.csv file.

start_vertex_type: class of the node.

end_id: end id of the node at which the edge shall be terminated.

end_vertex_type: Class of the node.

properties: properties of the edge. the header shall contain the property name.

For a detailed explanation and example, you can visit this link.

CSV LINK

0

If you want to import from CSV file, you can refer to this. Make sure to preprocess your file, so that the columns and headings have same format.

You can also use "COPY" statement to achieve this. It can be done as:

COPY [YOUR_TABLE_NAME] FROM [PATH_TO_CSV-FILE] WITH (FORMAT csv);

The documentation for COPY is here. There are other ways as well like using TinkerPop & Cypher, each with own benefits,

Huzaifa
  • 484
  • 4
  • 8
0

To import CSV files to AGE, these files must initially be formatted in the following way depending on what they will store in the graph:

# Nodes

id,property1,property2,...,propertyN
123,content1,content2,...,contentN
124,content1,content2,...,contentN

# Edges

start_id,start_vertex_type,end_id,end_vertex_type, (properties goes here, just like above)
123,LabelStart,124,LabelEnd

There is also three CSV files located at regress/age_load/data that shows how to store the data for cities, countries, and their relationships with one another. These files are named cities.csv, countries.csv, and edges.csv.

Now, with a postgres instance running, create the graph and the labels for the nodes and edges stored in the CSV files.

LOAD 'age';
SET search_path TO ag_catalog;

SELECT create_graph('agload_test_graph');
SELECT create_vlabel('agload_test_graph','Country');
SELECT create_vlabel('agload_test_graph','City');
SELECT create_elabel('agload_test_graph','has_city');

After these labels have been created, load the CSV files to AGE.

-- The arguments for the functions are: <graph_name>, <label>, <file_path>
SELECT load_labels_from_file('agload_test_graph','Country','age_load/data/countries.csv');
SELECT load_labels_from_file('agload_test_graph','City', 'age_load/data/cities.csv');
SELECT load_edges_from_file('agload_test_graph','has_city','age_load/data/edges.csv');

Then you can check if everything was created correctly.

SELECT COUNT(*) FROM agload_test_graph."Country";
SELECT COUNT(*) FROM agload_test_graph."City";
SELECT COUNT(*) FROM agload_test_graph."has_city";

SELECT COUNT(*) FROM cypher('agload_test_graph', $$MATCH(n) RETURN n$$) as (n agtype);
SELECT COUNT(*) FROM cypher('agload_test_graph', $$MATCH (a)-[e]->(b) RETURN e$$) as (n agtype);
Matheus Farias
  • 716
  • 1
  • 10