Benchmarking time series workloads on Apache Kudu using TSBS


Since its open-source introduction in 2015, Apache Kudu has billed itself as storage for fast analytics on fast data. This general mission encompasses many different workloads, but one of the fastest-growing use cases is time-series analytics. By Todd Lipcon.

In this blog post, we’ll evaluate Kudu against three other storage systems using the Time Series Benchmark Suite (TSBS), an open-source collection of data and query generation tools representing an IT operations time-series workload.

The article then covers:

  • Kudu-TSDB architecture
  • Benchmarking target systems
  • Benchmark hardware
  • Benchmark setup
  • Results: Data loading performance
  • Results: Light queries, 8 client threads
  • Results: Light queries, 16 client threads
  • Performance on heavy queries

Although Apache Kudu is a general-purpose store, its focus on fast analytics for fast data makes it a great fit for time series workloads. Beyond the quantitative benchmark results, it’s important to understand the qualitative differences between the stores. In particular, Kudu and ClickHouse share the trait of being general-purpose stores, whereas VictoriaMetrics and InfluxDB are limited to time series applications. In practical terms, this means that Kudu and ClickHouse allow your time series data to be analyzed alongside other relational data in your warehouse, and to be analyzed using alternative tools such as Apache Spark, Apache Impala, Apache Flink, or Python Pandas. Good read!
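To make the point about general-purpose stores a little more concrete, here is a minimal sketch of what analyzing time series data alongside other relational data can look like. It uses Python's built-in sqlite3 module purely as a stand-in for a SQL-capable store; it does not use Kudu's or ClickHouse's APIs, and the table names, column names, and sample values (cpu_metrics, hosts, usage_user, datacenter) are hypothetical and chosen only for illustration.

```python
import sqlite3

# Assumption: sqlite3 stands in for a general-purpose SQL store.
# This is not Kudu's API; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cpu_metrics (host TEXT, ts INTEGER, usage_user REAL);
    CREATE TABLE hosts (host TEXT PRIMARY KEY, datacenter TEXT);
""")

# Time series side: one row per host per timestamp (seconds).
conn.executemany(
    "INSERT INTO cpu_metrics VALUES (?, ?, ?)",
    [
        ("host_0", 0, 10.0),
        ("host_0", 60, 30.0),
        ("host_1", 0, 50.0),
        ("host_1", 60, 70.0),
        ("host_2", 0, 20.0),
    ],
)

# Relational side: inventory data that a metrics-only store would not hold.
conn.executemany(
    "INSERT INTO hosts VALUES (?, ?)",
    [("host_0", "dc-east"), ("host_1", "dc-west"), ("host_2", "dc-east")],
)


def avg_usage_by_datacenter(connection):
    """Join metrics with host inventory and average CPU usage per datacenter."""
    rows = connection.execute("""
        SELECT h.datacenter, AVG(m.usage_user)
        FROM cpu_metrics AS m
        JOIN hosts AS h ON h.host = m.host
        GROUP BY h.datacenter
        ORDER BY h.datacenter
    """).fetchall()
    return dict(rows)


result = avg_usage_by_datacenter(conn)
print(result)
```

The join is the part a time-series-only store typically cannot express directly: the metrics table and the inventory table live in the same system, so one query can combine them.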


Tags analytics big-data data-science performance devops