
I am new to Neo4j and wanted to see how fast it is. To test it, I created a table in MySQL and equivalent nodes in Neo4j, with these properties (fields):

id    random_number    time_stamp

I wrote a program to generate bulk data and inserted about 150 million rows (nodes in Neo4j). Write speed was almost the same in both.

Then I tested a select query in both databases: I wanted to get the rows (nodes) with a random_id of 255454 (we know there are more than 30 rows with this random_id).

Neo4j:

match (t:testLabel {random_id: 255454}) RETURN t LIMIT 50;

MySQL:

SELECT * 
FROM  test 
WHERE  random_id=255454 LIMIT 50

Neo4j took ~47 seconds and MySQL took ~25 seconds to return results.

On disk, the Neo4j database grew to ~35 GB, while the MySQL database grew to ~5.2 GB.

Neither database had an index on the table or its properties.

Hardware: CPU: Core i7-4770 | RAM: 12 GB | Disk: SSD


This is a simple test: both databases were very simple with basic structures, and before testing I thought Neo4j would be faster than MySQL. Since I really like Neo4j, I want to find a solution and keep using it.

Based on this simple test, Neo4j does not seem suitable for large, scalable projects. Are there ways to make it dramatically faster? The test was very simple, and every database should handle it well regardless of data modeling.

And what about the size on disk?

I also found another comparison question by Jörg Baach that you may want to see.

Mohammad Kermani
  • What, you thought some aliens wrote some superb code that MySQL developers didn't think about? If you like Neo4J, just use it. It doesn't have to be faster than MySQL to prove it's useful to you. – N.B. May 22 '16 at 21:05
  • @N.B. I really like the way NEO4J is. But I also need performance for my problem! – Mohammad Kermani May 22 '16 at 21:48
  • Do you really? You mean that among so many people who use it and think it's a useful system, you're one of those rare guys who have some mysterious software and you simply can't use it because an older and better program beats it in a silly synthetic test? :) – N.B. May 22 '16 at 22:36
  • @N.B. I'm not saying it is not useful or anything like that; as you can see in my question, I'm looking for a way to make it faster. I like using it, and I'd like to make it faster. – Mohammad Kermani May 22 '16 at 22:51
  • You're comparing apples to oranges. And asking how to performance-tune a database engine is off-topic for StackOverflow. And your question's basic issue is the query time of a single node with no indexing (an unrealistic scenario, leaving off indexes). – David Makogon May 23 '16 at 03:36
  • Comparing x and y won't make anything faster. Just use the software and research how to make it faster. Comparing it to MySQL won't yield any results except that one is quicker than the other in certain conditions. You can apply this to any software out there, from operating systems to games. – N.B. May 23 '16 at 16:41
  • @N.B. Thanks for your comment, but I was migrating from MySQL to something better. Neo4j is certainly very good, but it was not good at selecting a single node, and that is really important: for example, imagine I want to select one user among many users; when I have many relationships or nodes, that has to be fast. Neo4j is really good, but for other reasons and other scenarios. – Mohammad Kermani May 23 '16 at 18:21
  • I understand you completely, but funny thing is - MySQL is actually pretty freakin' fast when you know what you're doing and when a relational database is the right tool for the job, and funny as it may sound - I've met a lot of people during my career that moved away from MySQL to other solutions, only to be disappointed. I'm not saying Neo4j is slow, but from my experience based on 15 years of work with MySQL - it's far, far from slow. – N.B. May 23 '16 at 20:02
  • What queries will _you_ be giving to the dataset? If your example is realistic, then I suggest indexing `random_id`; you will discover that MySQL performs it in less than 1 second. Then tune/tweak/index/whatever Neo4j to see if it can also perform that fast. – Rick James May 29 '16 at 21:28

2 Answers


Comparing relational databases and graph databases is a huge task.

I think a much more helpful test would be to check query performance across multiple tables with several joins and foreign keys. Compare that to Neo4j and you may find much better performance than MySQL.
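
To make the join-versus-graph point concrete, here is a toy Python sketch (a hypothetical friendship table, not taken from the question) contrasting a relational self-join with a graph-style adjacency lookup, roughly the idea Neo4j calls index-free adjacency. It is only an illustration under those assumptions, not a model of how either engine actually executes queries.

```python
# Toy model only: contrasts a relational self-join with graph-style adjacency.
# Neither function reflects how MySQL or Neo4j actually execute queries.
from collections import defaultdict

# Hypothetical "friendship" edge table: (person_id, friend_id) rows.
EDGES = [(1, 2), (1, 3), (2, 4), (3, 4), (3, 5), (4, 6)]


def friends_of_friends_join(edges, start):
    """Relational style: join the edge table with itself.

    Without an index on person_id, each hop scans every row of the table.
    """
    first_hop = {f for (p, f) in edges if p == start}            # full scan 1
    second_hop = {f2 for (p2, f2) in edges if p2 in first_hop}   # full scan 2
    return second_hop - first_hop - {start}


def build_adjacency(edges):
    """Graph style: each node keeps a direct list of its neighbours."""
    adj = defaultdict(list)
    for p, f in edges:
        adj[p].append(f)
    return adj


def friends_of_friends_graph(adj, start):
    """Each hop only touches the neighbours of the nodes already reached."""
    first_hop = set(adj.get(start, []))
    second_hop = {f2 for f in first_hop for f2 in adj.get(f, [])}
    return second_hop - first_hop - {start}


adj = build_adjacency(EDGES)
print(sorted(friends_of_friends_join(EDGES, 1)))   # [4, 5]
print(sorted(friends_of_friends_graph(adj, 1)))    # [4, 5]
```

In the unindexed join each hop scans the whole edge table, while the adjacency version only touches the neighbours of the nodes it has already reached. An index on the join column narrows that gap on the relational side, which is one reason a fair benchmark should include indexes in both databases.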

Do this: with your test model, set up 4-5 possible use cases: a couple of things a DBA will be doing, a couple of things users will be doing, etc. Determine how many people will be doing each of them, and how often.

Choose simple tasks and complex tasks, and compare MySQL performance to Neo4j. You will find that each database outperforms the other in different situations.

Weigh your priorities. How important is it to you to have great performance when matching 50 nodes with a certain property? How important is it that users (dozens? millions?) have a fast, secure way to create extensive, complex networks of relationships? Once you determine what is important to you, refer to the performance tests and decide which database better fits your needs.

If you are going to be performing basic queries, you should probably use a relational database such as MySQL. Neo4j is great for complex schemas and queries, not only from a performance perspective but also for readability.

Neo4j stores data in a very different way, hence the difference in disk usage.
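
As a rough illustration of that difference, the question's own figures can be turned into bytes per record. This Python sketch assumes decimal gigabytes and exactly 150 million records, and treats each reported size as the whole database on disk, so anything else stored there (such as transaction logs) is counted too.

```python
# Rough per-record disk usage from the figures reported in the question.
# Assumptions: decimal gigabytes (1 GB = 10**9 bytes), exactly 150 million records,
# and each total covers everything on disk, not just the record data.
RECORDS = 150_000_000
NEO4J_BYTES = 35_000_000_000   # ~35 GB reported for Neo4j
MYSQL_BYTES = 5_200_000_000    # ~5.2 GB reported for MySQL

neo4j_per_record = NEO4J_BYTES / RECORDS
mysql_per_record = MYSQL_BYTES / RECORDS

print(f"Neo4j: ~{neo4j_per_record:.0f} bytes per node")   # Neo4j: ~233 bytes per node
print(f"MySQL: ~{mysql_per_record:.0f} bytes per row")    # MySQL: ~35 bytes per row
print(f"Ratio: ~{NEO4J_BYTES / MYSQL_BYTES:.1f}x")        # Ratio: ~6.7x
```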

Cypher is centered around the graph patterns that are core to your use-cases and represents them visually as part of its query syntax.

This article is really insightful; it shows the transition from relational to graph databases.

EoinS
  • I think you are right, but what about when we need to select a single node and it takes a long time? For example, imagine these are comments in the database of my social network and I want to choose one by its id, or it is a post and I want to choose one. As I understand it so far, for many relationships Neo4j is thankfully much faster than relational databases, but I am asking about single nodes. – Mohammad Kermani May 23 '16 at 07:56
  • If you need to select data based on a single criterion, you need to optimize with conditions, indexes, relationships, etc. Matching a large heap based on a single arbitrary property is going to take a while in Neo4j; SQL will be faster. SQL outperforms in this particular case. – EoinS May 23 '16 at 15:40
  • Think of a comparison between a Nissan GTR and a Formula 1 car. The GTR is awesome, an amazing machine, but if you race the two against one another in a variety of conditions, the F1 car is going to blow the Nissan away. In your test case you are comparing the ability to parallel park. The GTR ticks the box here, but you are not fully using the features, power and technology of either machine. I would encourage you to put the pedal to the metal a little bit. – EoinS May 23 '16 at 15:50
  1. Did you create an index on testLabel and property random_id?
  2. You're seeing rather high disk usage because transaction logs are kept for 7 days by default; there's a config option to change this (in Neo4j 3.x, `dbms.tx_log.rotation.retention_policy`).
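
To illustrate point 1, here is a minimal Python sketch of an unindexed lookup versus an indexed one, using a plain dict as a stand-in for a real index (an assumption for illustration; MySQL and Neo4j use their own on-disk index structures). The record layout and sizes are hypothetical. In the real databases the equivalent step would be something like `CREATE INDEX ON :testLabel(random_id)` in Neo4j 3.x Cypher, or `CREATE INDEX idx_random_id ON test (random_id)` in MySQL, where `idx_random_id` is just an example name.

```python
# Toy model: why an unindexed lookup is slow in any database, and what an index changes.
# The dict below stands in for a real index; it is not how MySQL or Neo4j store one.
import random
from collections import defaultdict

random.seed(0)
# Hypothetical records shaped like the question's (id, random_id); 100k instead of 150M.
ROWS = [(i, random.randint(0, 9_999)) for i in range(100_000)]


def scan_lookup(rows, target, limit=50):
    """No index: examine rows one by one until `limit` matches are found or rows run out."""
    matches, examined = [], 0
    for row in rows:
        examined += 1
        if row[1] == target:
            matches.append(row)
            if len(matches) == limit:
                break
    return matches, examined


def build_index(rows):
    """One pass over the data, grouping row positions by random_id."""
    index = defaultdict(list)
    for pos, row in enumerate(rows):
        index[row[1]].append(pos)
    return index


def indexed_lookup(rows, index, target, limit=50):
    """With an index: jump straight to the matching rows."""
    positions = index.get(target, [])[:limit]
    return [rows[p] for p in positions], len(positions)


index = build_index(ROWS)
target = ROWS[0][1]
scan_hits, scanned = scan_lookup(ROWS, target)
idx_hits, touched = indexed_lookup(ROWS, index, target)
print(scan_hits == idx_hits, scanned, touched)
```

If the value matches fewer rows than the `LIMIT` (the question only says more than 30), an unindexed scan cannot stop early: it has to read every record to be sure no more matches exist, which is consistent with both databases taking tens of seconds.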

In general, just looking up a single node is not a reasonable performance test for a graph database. You should run a query that follows a few connections to see the difference.

Stefan Armbruster
  • He said neither MySQL nor Neo4j had an index on the table or properties (fields), so both should have the same speed, or Neo4j should be faster. And an index would make Neo4j even heavier than this! – Nasser Ghiasi May 22 '16 at 21:47
  • I agree with @NasserGhiasi, and as I said before, I like Neo4j and it is simple to use. But we also need to use it for single nodes; for example, we want to select one user and see his information. As for the transaction logs, they were temporary and limited to a small size, not a big one. The Neo4j database was ~35 GB, about 6.7 times larger than MySQL. I am not a MySQL fan; I just cannot understand why Neo4j is so heavy and why it is not fast in the same situation! – Mohammad Kermani May 22 '16 at 21:58
  • Neo4j does much more complex relationship data modeling than SQL. Comparing tools without indexes is unfair, since it essentially leaves out the functionality of those tools. – EoinS May 22 '16 at 23:11
  • So you agree with me that Neo4j is slower than MySQL under the same conditions, without indexing, for fetching one row (node)? – Mohammad Kermani May 31 '16 at 12:06
  • Since your use case is not at all graph-oriented (you're just looking up a node without indexes), another technology optimized for that kind of operation might perform better. However, concluding "neo4j is slower than mysql" from this is wrong and misleading. Just do a couple of joins on larger tables... just like a Ferrari is probably not the right car to win an off-road competition. – Stefan Armbruster May 31 '16 at 15:17