0

I am a Neo4j and data analytics noob. I am looking for a programmatic way to format data that I collect from Active Directory so that it is ready to be imported into Neo4j. Right now I am using Power BI and DAX Studio to clean the data into the shape I need, but that is not efficient and still requires a lot of manual intervention. I am also dipping my toe into OpenRefine for this, but I want to see what the experts think.

My ultimate vision is to take a raw file, upload it to a web front end, have some black magic process format the data the way I need it, and then load it into a fresh Neo4j backend for analysis. Once the data is in the backend, I'm good to go, and I already have a collector process that I can take into environments to gather the raw information. It is just the journey from point A to B that I need help with. Any help is appreciated. Thanks!

POSH Geek

2 Answers

1

On the topic of data cleaning: when I import data from .csv files, I often use the apoc.map.clean function to remove empty values: http://neo4j-contrib.github.io/neo4j-apoc-procedures/3.5/utilities/map-functions/

Also, while parsing big CSV files, I often remove keys that I don't need:

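// Read only the first row to inspect the result
LOAD CSV WITH HEADERS FROM 'file:///segment_data.csv' AS line FIELDTERMINATOR ','
WITH line LIMIT 1
// Drop every column whose name does not contain 'cust_'
WITH apoc.map.removeKeys(line, [i IN keys(line) WHERE NOT i CONTAINS 'cust_']) AS custKeys
// Remove entries whose value is empty or 'NA'
RETURN apoc.map.clean(custKeys, [], ["", "NA"]) AS output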
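If you would rather do the same cleanup before the file ever reaches Neo4j, here is a minimal Python sketch of the idea, using only the standard library. The function names (clean_row, clean_csv), the sample columns, and the 'cust_' filter are assumptions made up for illustration. It only approximates what the two APOC calls above do, so treat it as a starting point rather than an exact equivalent.

```python
import csv
import io


def clean_row(row, key_substring="cust_", drop_values=("", "NA")):
    """Keep only columns whose name contains key_substring,
    then drop entries whose value is missing or listed in drop_values."""
    return {
        key: value
        for key, value in row.items()
        if key_substring in key
        and value is not None
        and value not in drop_values
    }


def clean_csv(text, **kwargs):
    """Parse CSV text with a header row and clean every record."""
    reader = csv.DictReader(io.StringIO(text))
    return [clean_row(row, **kwargs) for row in reader]


# Hypothetical sample data; a real export would have different columns.
sample = "cust_id,cust_name,region\n1,NA,EU\n2,Alice,\n"
cleaned = clean_csv(sample)
print(cleaned)
```

Each cleaned dictionary could then be written back out as CSV for LOAD CSV, or passed as a parameter to a Cypher statement through a driver.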
Naor Levi
Paul Are
0

I would use Kettle. It has connectors to read from a big range of data sources and write to Neo4j.

https://medium.com/neo4j/getting-started-with-kettle-and-neo4j-32ff15b991f9

https://github.com/neo4j-examples/kettle-plugin-examples

Nathan Smith