Comment by alexchantavy

Neo4j has a GraphRAG book that I've found very helpful: https://neo4j.com/essential-graphrag/

It depends on the shape of your data. In my domain (cloud security), there are a great many entities, and it's very valuable to map out how they relate to each other.

For example, we often want to answer a question like: “Which publicly exposed EC2 instances are used by IAM roles that have administrative privileges in my AWS account?”

To answer the question, you need to:

1. Join EC2 instances to security groups to IP rules to IP ranges to find network exposure paths to the open internet.
2. Join the instances to their instance profiles, to their roles.
3. Join the IAM roles to their role policies to determine which have admin policies.
4. Chain all of those joins together, possibly with recursive queries if there are indirect relationships (e.g., role assumption chains).

That’s a lot of joins, and the SQL query would get both heavy and hard to maintain.
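To make the join chain concrete, here is a rough sketch of what the relational version might look like, using Python's built-in sqlite3 so it runs anywhere. Every table and column name below is made up for illustration (this is not a real schema from any tool), "admin" is simplified to an Allow statement on resource "*", and role assumption chains are left out entirely since they would need a recursive CTE on top of this.

```python
import sqlite3

# Hypothetical relational schema; all names are illustrative assumptions.
SCHEMA = """
CREATE TABLE ec2_instance (id TEXT PRIMARY KEY);
CREATE TABLE instance_security_group (instance_id TEXT, group_id TEXT);
CREATE TABLE ip_rule (id TEXT PRIMARY KEY, group_id TEXT, rule_action TEXT);
CREATE TABLE ip_range (rule_id TEXT, cidr TEXT);
CREATE TABLE instance_profile (id TEXT PRIMARY KEY, instance_id TEXT);
CREATE TABLE instance_profile_role (profile_id TEXT, role_id TEXT);
CREATE TABLE iam_role (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE role_policy (id TEXT PRIMARY KEY, role_id TEXT);
CREATE TABLE policy_statement (policy_id TEXT, effect TEXT, resource TEXT);
"""

# Eight joins for the direct case alone.
QUERY = """
SELECT DISTINCT i.id AS instance_id, r.name AS role_name
FROM ec2_instance i
-- network exposure path: instance -> security group -> rule -> range
JOIN instance_security_group isg ON isg.instance_id = i.id
JOIN ip_rule perm ON perm.group_id = isg.group_id AND perm.rule_action = 'Allow'
JOIN ip_range rng ON rng.rule_id = perm.id AND rng.cidr = '0.0.0.0/0'
-- identity path: instance -> instance profile -> role
JOIN instance_profile prof ON prof.instance_id = i.id
JOIN instance_profile_role ipr ON ipr.profile_id = prof.id
JOIN iam_role r ON r.id = ipr.role_id
-- privilege path: role -> policy -> statement
JOIN role_policy p ON p.role_id = r.id
JOIN policy_statement stmt ON stmt.policy_id = p.id
    AND stmt.effect = 'Allow' AND stmt.resource = '*'
ORDER BY i.id
"""


def build_demo_db():
    """Create an in-memory database with three made-up instances."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    rows = {
        "ec2_instance": [("i-public-admin",), ("i-private-admin",), ("i-public-readonly",)],
        "instance_security_group": [
            ("i-public-admin", "sg-open"),
            ("i-private-admin", "sg-closed"),
            ("i-public-readonly", "sg-open"),
        ],
        "ip_rule": [("rule-open", "sg-open", "Allow"), ("rule-internal", "sg-closed", "Allow")],
        "ip_range": [("rule-open", "0.0.0.0/0"), ("rule-internal", "10.0.0.0/8")],
        "instance_profile": [
            ("prof-a", "i-public-admin"),
            ("prof-b", "i-private-admin"),
            ("prof-c", "i-public-readonly"),
        ],
        "instance_profile_role": [
            ("prof-a", "role-admin"),
            ("prof-b", "role-admin"),
            ("prof-c", "role-ro"),
        ],
        "iam_role": [("role-admin", "admin"), ("role-ro", "readonly")],
        "role_policy": [("pol-admin", "role-admin"), ("pol-ro", "role-ro")],
        "policy_statement": [
            ("pol-admin", "Allow", "*"),
            ("pol-ro", "Allow", "arn:aws:s3:::logs/*"),
        ],
    }
    for table, values in rows.items():
        placeholders = ", ".join("?" for _ in values[0])
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", values)
    return conn


def find_exposed_admin_instances(conn):
    """Return (instance_id, role_name) pairs for internet-exposed admin instances."""
    return conn.execute(QUERY).fetchall()
```

Calling find_exposed_admin_instances(build_demo_db()) on this toy data returns only the instance that is both open to 0.0.0.0/0 and attached to the admin role. The other two instances each fail one leg of the join chain.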

In a graph database, the query looks something like this:

  match (i:EC2Instance)--(sg:EC2SecurityGroup)--(perm:IPPermissionInbound{action:"Allow"})--(rng:IPRange{id:"0.0.0.0/0"})
  match (i)--(r:AWSRole)--(p:AWSPolicy)--(stmt:AWSPolicyStatement{effect:"Allow", resource:"*"})
  return i.id as instance_id, r.name as role_name

To answer the question of which internet-exposed compute instances can act as admins in our environment, we needed to traverse multiple object types, but the shape of the answer is pretty simple: just a list of ids and names.

Graph databases have quirks and add complexity of their own. If your domain isn't this edge-heavy, you're probably better off with Postgres, but for our use case it's been worth the trade-off imo.
