I'm going to explain this using the ArangoDB 2.8 Traversals.
We Create these collections to match your shema using arangosh:
db._create("countries")
db.countries.save({_key:"Germany", name: "Germany"})
db.countries.save({_key:"France", name: "France"})
db.countries.ensureHashIndex("name")
db._create("cities")
db.cities.save({_key: "Munich"})
db.cities.save({_key: "Toulouse")
db._create("company")
db.company.save({_key: "Siemens"})
db.company.save({_key: "Airbus"})
db._create("employees")
db.employees.save({lname: "Kraxlhuber", cname: "Xaver", _key: "user1"})
db.employees.save({lname: "Heilmann", cname: "Vroni", _key: "user2"})
db.employees.save({lname: "Leroy", cname: "Marcel", _key: "user3"})
db._createEdgeCollection("CityInCountry")
db._createEdgeCollection("CompanyIsInCity")
db._createEdgeCollection("WorksAtCompany")
db.CityInCountry.save("cities/Munich", "countries/Germany", {label: "beautiful South near the mountains"})
db.CityInCountry.save("cities/Toulouse", "countries/France", {label: "crowded city at the mediteranian Sea"})
db.CompanyIsInCity.save("company/Siemens", "cities/Munich", {label: "darfs ebbes gscheits sein? Oder..."})
db.CompanyIsInCity.save("company/Airbus", "cities/Toulouse", {label: "Big planes Ltd."})
db.WorksAtCompany.save("employees/user1", "company/Siemens", {employeeOfMonth: true})
db.WorksAtCompany.save("employees/user2", "company/Siemens", {veryDiligent: true})
db.WorksAtCompany.save("employees/user3", "company/Eurocopter", {veryDiligent: true})
In AQL we would write this query the other way around.
We start with the constant time FILTER
on the indexed attribute name
and start our traversals from there on.
Therefor we filter for the country "Germany":
db._explain("FOR country IN countries FILTER country.name == 'Germany' RETURN country ")
Query string:
FOR country IN countries FILTER country.name == 'Germany' RETURN country
Execution plan:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
6 IndexNode 1 - FOR country IN countries /* hash index scan */
5 ReturnNode 1 - RETURN country
Indexes used:
By Type Collection Unique Sparse Selectivity Fields Ranges
6 hash countries false false 66.67 % [ `name` ] country.`name` == "Germany"
Optimization rules applied:
Id RuleName
1 use-indexes
2 remove-filter-covered-by-index
Now that we have our well filtered start node, we do a graph traversal in reverse direction. Since we know that Employees
are exactly 3 steps away from the start Vertex, and we're not interested in the path, we only return the 3rd layer:
db._query("FOR country IN countries FILTER country.name == 'Germany' FOR v IN 3 INBOUND country CityInCountry, CompanyIsInCity, WorksAtCompany RETURN v")
[
{
"cname" : "Xaver",
"lname" : "Kraxlhuber",
"_id" : "employees/user1",
"_rev" : "1286703864570",
"_key" : "user1"
},
{
"cname" : "Vroni",
"lname" : "Heilmann",
"_id" : "employees/user2",
"_rev" : "1286729095930",
"_key" : "user2"
}
]
Some words about this queries performance:
We locate Germany using a hash index is constant time -> O(1)
Based on that we want to traverse m many paths where m is the number of employees in Germany; Each of them can be traversed in constant time.
-> O(m) at this step.
Return the result in constant time -> O(1)
All combined we need O(m) where we expect m to be less than n (the number of employees) as used in your SQL-Traversal.