Building a distributed time-series database on PostgreSQL


TimescaleDB, a time-series database built on PostgreSQL, has been production-ready for over two years, with millions of downloads and production deployments worldwide. Its authors have now publicly shared their design, plans, and benchmarks for the distributed version of TimescaleDB. By Mike Freedman and Erik Nordström.

PostgreSQL is the fastest-growing database right now, growing faster than MongoDB, Redis, MySQL, and others. PostgreSQL itself has also matured and broadened in capabilities, thanks to a core group of maintainers and a growing community.

Our new distributed architecture, which a dedicated team has been hard at work developing since last year, is motivated by a new vision: scaling to over 10 million metrics a second, storing petabytes of data, and processing queries even faster via better parallelization. Essentially, a system that can grow with you and your time-series workloads.

The article then covers a broad range of topics related to time-series databases, including:

  • Chunking, not sharding
  • Benchmarks
  • Five objectives of database scaling
  • Designing for Scale
  • Introducing Distributed Hypertables
  • Design Principles
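
As a rough illustration of the "chunking" idea in the list above, here is a minimal sketch of how a row might be mapped to a chunk by time interval and by a hash of a partitioning key. This is not code from the article or from TimescaleDB: the one-day interval, the four hash partitions, the `device_id` key, and the `chunk_for` helper are all assumptions made up for the example.

```python
from datetime import datetime, timedelta, timezone
import zlib

# All values below are hypothetical, chosen only for illustration.
CHUNK_INTERVAL = timedelta(days=1)   # width of each chunk along the time axis
SPACE_PARTITIONS = 4                 # number of hash partitions along the key axis
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def chunk_for(ts: datetime, device_id: str) -> tuple:
    """Return (time_slot, space_slot) identifying the chunk that holds a row."""
    # Rows whose timestamps fall in the same interval share a time slot.
    time_slot = (ts - EPOCH) // CHUNK_INTERVAL
    # A stable hash of the key spreads devices across the space partitions.
    space_slot = zlib.crc32(device_id.encode("utf-8")) % SPACE_PARTITIONS
    return time_slot, space_slot


morning = chunk_for(datetime(2020, 1, 1, 3, tzinfo=timezone.utc), "dev-1")
evening = chunk_for(datetime(2020, 1, 1, 22, tzinfo=timezone.utc), "dev-1")
next_day = chunk_for(datetime(2020, 1, 2, 1, tzinfo=timezone.utc), "dev-1")
```

In this sketch, two readings from the same device on the same day land in the same chunk, while a reading from the following day lands in a new chunk with the same space slot. In a distributed setup, chunks identified this way could then be placed on different nodes.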

TimescaleDB doesn’t overcome the CAP Theorem. We do talk about how TimescaleDB achieves “high availability”, using the term as it is commonly used in the database industry to mean replicated instances that perform prompt and automated recovery from failure. This is different from the formal “Big A” Availability of the CAP Theorem, and TimescaleDB today sacrifices Availability for Consistency under failure conditions. Good read!


Tags database software-architecture distributed web-performance