Behind the scenes of creating the world's biggest graph database

How we decided to try to build the biggest graph database that has ever existed. By Chris Gioran.

From the article: “Using smaller instances did the trick, and the next day we had the full contingent of 1129 shards up and running. The latency-measuring demo app was almost ready, so we decided to take some measurements to see where we stood.”

This article describes how the Neo4j team built a database with:

  • 1128 forum shards, 1 person shard, and 3 Fabric processors (see the configuration sketch after this list)
  • Each forum shard contains 900 million relationships and 182 million nodes; the person shard contains 3 billion people and 16 billion relationships between them
  • Overall, the full dataset is 280 TB in size and contains 1 trillion relationships
  • The whole effort took 3 weeks from inception to final results
  • Running at full scale costs about $400/h
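
For a sense of how such a topology is wired together: in Neo4j 4.x, shards are registered as Fabric graphs in the proxy's neo4j.conf. A minimal sketch follows; the Fabric database name, graph names, and host names here are illustrative assumptions, not the team's actual configuration.

    # neo4j.conf on a Fabric proxy (illustrative names and hosts)
    fabric.database.name=fabric

    # Register the person shard as graph 0
    fabric.graph.0.name=persons
    fabric.graph.0.uri=neo4j://person-shard.internal:7687

    # Register the forum shards as graphs 1..1128
    fabric.graph.1.name=forum1
    fabric.graph.1.uri=neo4j://forum-shard-1.internal:7687
    # ... repeated for the remaining 1127 forum shards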

When the authors introduced Neo4j Fabric, they also created a proof-of-concept benchmark, presented at FOSDEM 2020. It showed that, for a 1 TB database, throughput and latency improve linearly with the number of shards the database is distributed across: more shards, more performance. Follow the link to the full article to learn more about this fascinating database. Excellent!
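
To illustrate why performance scales with shard count: a Fabric query fans its inner subquery out to every shard in parallel and aggregates the partial results on the proxy. A minimal sketch, assuming a Fabric database named `fabric` as configured above and a `Message` label on the forum shards (both assumptions for illustration):

    // Run the inner MATCH on every registered shard in parallel,
    // then combine the per-shard counts on the Fabric proxy.
    UNWIND fabric.graphIds() AS gid
    CALL {
      USE fabric.graph(gid)
      MATCH (m:Message)
      RETURN count(m) AS cnt
    }
    RETURN sum(cnt) AS totalMessages

Because each shard evaluates the subquery independently, adding shards divides the per-shard work, which is the linear-scaling behavior the benchmark measured.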

[Read More]

Tags database queues search performance cloud devops aws streaming