I have data in the format below, where the first column holds the product nodes and all the following columns hold properties of those products. I want to apply a content-based filtering algorithm using cosine similarity in Neo4j. For that, I believe I need to define the f-columns (f1, f2, and so on) as properties of each product node, read those properties as a vector, and then compute the cosine similarity between products. I am having trouble with two things: 1. How to define these columns as properties in one go (there could be more than 100 columns). 2. How to read all the property values as a vector so that I can apply cosine similarity.

Product f1 f2 f3 f4 f5
P1 0 1 0 1 1
P2 1 0 1 1 0
P3 1 1 1 1 1
P4 0 0 0 1 0

Amar jaiswal
  • check this https://neo4j.com/graphgist/a7c915c8-a3d6-43b9-8127-1836fecc6e2f – Tomaž Bratanič Mar 02 '17 at 12:42
  • I have seen this already, but I am having trouble working with the properties of a node. How do I take all the property values of a node as a vector that can be fed into the cosine similarity formula? – Amar jaiswal Mar 02 '17 at 15:06

1 Answer

You can use LOAD CSV to input your data.

For example, this query will read in your data file and output for each input line (ignoring the header line) a name string and a props collection:

LOAD CSV FROM 'file:///data.csv' AS line FIELDTERMINATOR ' '
WITH line SKIP 1
RETURN HEAD(line) AS name, [p IN TAIL(line) | TOFLOAT(p)] AS props

Even though your data has a header line, the above query skips over it, as it is not needed. In fact, we don't want to use the WITH HEADERS option of LOAD CSV, since that would convert each data line into a map, whereas it is more convenient for our current purposes to get each data line as a collection of values.

The above query assumes that all the columns are space-separated, that the first column will always contain a name string, and that all other columns contain the numeric values that should be put into the same collection (named props).

If you replace RETURN with WITH, you can append additional clauses to the query that make use of the name and props values.
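As a rough sketch of what those additional clauses might look like, the query below stores each props list as a single list property on a product node. The Product label and the name and props property names are illustrative assumptions, not something the question specifies. Keeping all the feature values in one list property avoids having to define 100+ separate properties.

```cypher
// Sketch only: assumes the same space-separated file:///data.csv as above.
LOAD CSV FROM 'file:///data.csv' AS line FIELDTERMINATOR ' '
WITH line SKIP 1
WITH head(line) AS name, [p IN tail(line) | toFloat(p)] AS props
// The :Product label and the property names are assumptions made for this example.
MERGE (prod:Product {name: name})
SET prod.props = props
```

With the vectors stored that way, cosine similarity between every pair of products could then be computed in plain Cypher, again assuming that all props lists have the same length:

```cypher
// Compare each pair of products once (a.name < b.name skips self-pairs and duplicates).
MATCH (a:Product), (b:Product)
WHERE a.name < b.name
WITH a, b,
     // Dot product of the two vectors, summed index by index.
     reduce(acc = 0.0, i IN range(0, size(a.props) - 1) | acc + a.props[i] * b.props[i]) AS dotProduct,
     // Euclidean norm of each vector.
     sqrt(reduce(acc = 0.0, x IN a.props | acc + x * x)) AS normA,
     sqrt(reduce(acc = 0.0, x IN b.props | acc + x * x)) AS normB
// Skip all-zero vectors, for which cosine similarity is undefined.
WHERE normA > 0 AND normB > 0
RETURN a.name AS product1, b.name AS product2,
       dotProduct / (normA * normB) AS cosine
ORDER BY cosine DESC
```

For the sample data, P1 = (0, 1, 0, 1, 1) and P3 = (1, 1, 1, 1, 1) have a dot product of 3 and norms of √3 and √5, so their cosine similarity is 3/√15, or about 0.775.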

cybersam