How we use Apache Kafka and the Confluent Platform

Jendrik Poloczek from TokenAnalyst published this article about their experience building the core infrastructure to integrate, clean, and analyze blockchain data.

Apache Kafka® is the central data hub of TokenAnalyst: they use Kafka to ingest blockchain data. The Confluent Platform is a stream data platform that enables you to organize and manage data from many different sources with one reliable, high-performance system.

A public ledger could potentially serve not only as a publicly accessible ledger for money or asset transactions but also as a ledger of interactions on a shared decentralized data infrastructure.

The blockchain as a data structure is, in essence, a giant, shared, immutable log, which lends itself perfectly to event sourcing and (replayed) stream processing. The required trust comes from transparency, and transparency is realized by surfacing and decoding the data that is stored on the blockchain.
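
As a rough illustration of that replay property (this is not code from the article), the sketch below uses the plain Kafka consumer API in Scala to read an ingested block topic from the earliest offset. The broker address, topic name, and String serialization are assumptions.

```scala
import java.time.Duration
import java.util.Properties
import scala.jdk.CollectionConverters._

import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}
import org.apache.kafka.common.serialization.StringDeserializer

object ReplayBlocks extends App {
  val props = new Properties()
  props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")  // assumed broker
  props.put(ConsumerConfig.GROUP_ID_CONFIG, "block-replay")
  // With no committed offsets, start from the very beginning of the log,
  // i.e. replay the whole chain history that has been ingested so far.
  props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")
  props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getName)
  props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getName)

  val consumer = new KafkaConsumer[String, String](props)
  consumer.subscribe(List("eth-blocks").asJava)                         // assumed topic name

  while (true) {
    // Each record is one ingested block; re-running this consumer with a fresh
    // group id replays the stream in order per partition.
    for (record <- consumer.poll(Duration.ofSeconds(1)).asScala)
      println(s"offset ${record.offset}: ${record.value.take(60)}...")
  }
}
```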

In the article, you will learn about:

  • Why on-chain data matters
  • The cluster of Ethereum nodes and the Ethereum-to-Kafka bridge
  • The block confirmer based on Kafka Streams (a rough sketch follows this list)
  • The API and software development kit (SDK)
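
The article walks through their block confirmer in detail; the sketch below is not their code, only an illustration of the underlying idea using the Kafka Streams Processor API: buffer incoming blocks in a state store and forward a block downstream only once enough newer blocks have been built on top of it, so that short chain reorganizations never reach consumers. The topic names, the block-number keys, and the confirmation depth are all assumptions.

```scala
import java.util.Properties

import org.apache.kafka.common.serialization.Serdes
import org.apache.kafka.streams.{KafkaStreams, StreamsConfig, Topology}
import org.apache.kafka.streams.processor.api.{Processor, ProcessorContext, ProcessorSupplier, Record}
import org.apache.kafka.streams.state.{KeyValueStore, Stores}

object BlockConfirmer extends App {
  val Confirmations = 12L  // assumed confirmation depth, not a value from the article

  // Blocks that have been seen but are not yet deep enough in the chain.
  val pendingStore = Stores.keyValueStoreBuilder(
    Stores.persistentKeyValueStore("pending-blocks"), Serdes.Long(), Serdes.String())

  class ConfirmProcessor extends Processor[java.lang.Long, String, java.lang.Long, String] {
    private var ctx: ProcessorContext[java.lang.Long, String] = _
    private var pending: KeyValueStore[java.lang.Long, String] = _

    override def init(context: ProcessorContext[java.lang.Long, String]): Unit = {
      ctx = context
      pending = context.getStateStore("pending-blocks")
    }

    // Key = block number, value = raw block payload (assumed message layout).
    override def process(record: Record[java.lang.Long, String]): Unit = {
      pending.put(record.key, record.value)
      val head = record.key.longValue
      if (head >= Confirmations) {
        // Everything at least `Confirmations` blocks behind the current head is
        // unlikely to be re-orged away, so forward it downstream and drop it here.
        val confirmed = pending.range(0L, head - Confirmations)
        try {
          while (confirmed.hasNext) {
            val kv = confirmed.next()
            ctx.forward(new Record(kv.key, kv.value, record.timestamp))
            pending.delete(kv.key)
          }
        } finally confirmed.close()
      }
    }
  }

  val supplier = new ProcessorSupplier[java.lang.Long, String, java.lang.Long, String] {
    override def get(): Processor[java.lang.Long, String, java.lang.Long, String] = new ConfirmProcessor
  }

  val topology = new Topology()
    .addSource("blocks", Serdes.Long().deserializer(), Serdes.String().deserializer(), "eth-blocks-by-number")
    .addProcessor("confirmer", supplier, "blocks")
    .addStateStore(pendingStore, "confirmer")
    .addSink("confirmed", "eth-blocks-confirmed",
      Serdes.Long().serializer(), Serdes.String().serializer(), "confirmer")

  val props = new Properties()
  props.put(StreamsConfig.APPLICATION_ID_CONFIG, "block-confirmer")
  props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
  new KafkaStreams(topology, props).start()
}
```

The state store is what makes this viable as a streaming job: pending blocks survive restarts via the store's changelog topic, so the confirmer can pick up exactly where it left off.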

You will also find out how they use templates written in Terraform, which allow them to easily deploy and bootstrap nodes across different AWS regions around the world, together with the Geth and Parity clients.

To bridge the gap between different Ethereum clients and Kafka, they developed an in-house solution named Ethsync, written in Scala. Good read!
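
The internals of Ethsync are not shown here, but conceptually such a bridge boils down to reading blocks from a node's JSON-RPC interface and producing them to a Kafka topic. Below is a hypothetical, heavily simplified sketch in Scala; the endpoint, topic name, polling loop, and producer settings are assumptions rather than details of Ethsync.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}
import java.util.Properties

import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

object EthToKafkaBridge extends App {
  val http = HttpClient.newHttpClient()

  val props = new Properties()
  props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")      // assumed broker
  props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
  props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
  props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true")               // no duplicates on retry
  val producer = new KafkaProducer[String, String](props)

  // Minimal JSON-RPC call against a local Geth or Parity node (assumed endpoint).
  def rpc(body: String): String = {
    val request = HttpRequest.newBuilder(URI.create("http://localhost:8545"))
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(body))
      .build()
    http.send(request, HttpResponse.BodyHandlers.ofString()).body()
  }

  while (true) {
    // Fetch the latest block with full transactions and forward the raw JSON;
    // decoding and confirmation are left to downstream stream processors.
    val block = rpc(
      """{"jsonrpc":"2.0","id":1,"method":"eth_getBlockByNumber","params":["latest",true]}""")
    producer.send(new ProducerRecord[String, String]("eth-blocks", block))
    Thread.sleep(1000)  // naive polling; a real bridge would track head changes instead
  }
}
```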

[Read More]

Tags: blockchain apache apis data-science scala