Why I recommend my clients NOT use KSQL and Kafka Streams

An article by Jesse Anderson. He recommends that his clients not use Kafka Streams because it lacks checkpointing. Kafka Streams also lacks a true shuffle sort and only approximates one. KSQL sits on top of Kafka Streams, so it inherits all of these problems and adds some of its own.

Kafka isn’t a database. It is a great messaging system, but saying it is a database is a gross overstatement. Saying Kafka is a database comes with so many caveats I don’t have time to address all of them in this post. Unless you’ve really studied and understand Kafka, you won’t be able to understand these differences.

Checkpointing is fundamental to operating distributed systems.
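
To make the checkpointing idea concrete, here is a minimal sketch in plain Python. It is a generic illustration only, not the API of Kafka Streams, KSQL, or any other framework, and it is not taken from the article: the function names (`save_checkpoint`, `load_checkpoint`, `run`), the JSON file format, and the `crash_at` parameter used to simulate a failure are all hypothetical. The point it tries to show is that a restarted job resumes from the last saved state and offset instead of rebuilding its state by replaying everything.

```python
import json
import os
import tempfile


def save_checkpoint(path, offset, counts):
    """Atomically persist the next offset to read and the state built so far."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"offset": offset, "counts": counts}, f)
    # Rename over the old checkpoint so a reader never sees a partial file.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return (offset, counts); start from scratch if no checkpoint exists."""
    if not os.path.exists(path):
        return 0, {}
    with open(path) as f:
        data = json.load(f)
    return data["offset"], data["counts"]


def run(events, path, every=2, crash_at=None):
    """Count events per key, writing a checkpoint every `every` events.

    `crash_at` (hypothetical, for illustration) raises just before the event
    at that offset is processed, simulating a node failure.
    Returns (counts, number of events processed in this run).
    """
    offset, counts = load_checkpoint(path)
    processed = 0
    while offset < len(events):
        if crash_at is not None and offset == crash_at:
            raise RuntimeError("simulated crash")
        key = events[offset]
        counts[key] = counts.get(key, 0) + 1
        offset += 1
        processed += 1
        if offset % every == 0:
            save_checkpoint(path, offset, counts)
    save_checkpoint(path, offset, counts)
    return counts, processed
```

If the first run crashes partway through, a second call to `run` with the same checkpoint path only reprocesses the events after the last checkpoint. Without the checkpoint, the only way to recover the counts would be to replay every event from the beginning, which is the cost the article is concerned about when state is large.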

The article then delves into:

  • Explaining checkpointing
  • Shuffle Sort
  • Kafka Streams and KSQL
  • What Should You Do?

Loads of useful information for anybody considering data streaming for their project. Nice one!

Tags: streaming, software-architecture, apache, distributed