Change data capture with Debezium: A simple how-to
Posted on May 19, 2020, Level intermediate Resource Length long
Eric Deandrea wrote this piece about one question that always comes up as organizations move towards being cloud-native, twelve-factor, and stateless: How do you get an organization's data to these new applications?
Building an adaptive, multi-tenant stream bus with Kafka and Golang
Posted on February 20, 2020, Level intermediate Resource Length medium
Back in the 2000s, SOAP/WSDL with an ESB (Enterprise Service Bus) was the dominant server-side architecture for many companies. Since the 2010s, microservices and service mesh technologies have grown rapidly and become the de facto industry standard. By Xinyu Liu.
Exploring an Apache Kafka to Pub/Sub migration: Major considerations
Posted on January 28, 2020, Level intermediate Resource Length medium
In many cases, Google's Pub/Sub messaging and event distribution service can successfully replace Apache Kafka, with lower maintenance and operational costs, and better integration with other Google Cloud services. By Leonid Yankulin.
Why I recommend my clients NOT use KSQL and Kafka Streams
Posted on October 24, 2019, Level beginner Resource Length medium
An article by Jesse Anderson. He recommends his clients not use Kafka Streams because it lacks checkpointing. Kafka Streams also lacks a true shuffle sort and only approximates one. KSQL sits on top of Kafka Streams, so it inherits all of these problems and then some.
Using graph processing for Kafka Stream visualizations
Posted on September 9, 2019, Level intermediate Resource Length long
An article by David Allen focused on graph processing for Kafka Stream visualizations. Apache Kafka® is great when you need to deal with streams, allowing you to conveniently look at streams as tables. Stream processing engines like KSQL furthermore give you the ability to manipulate all of this fluently.
Create your first AWS Lambda using Rust
Posted on December 6, 2018, Level intermediate Resource Length short
A blog post by Konstantin Kostov about how he created a serverless function in the Rust programming language and deployed it to AWS. It is an example AWS Lambda function tasked with checking that a provided serial number is correct and unique (not already part of an existing dataset).
Parsing logs 230x faster with Rust
Posted on November 10, 2018, Level intermediate Resource Length medium
A blog post by Andre Arko about dealing with logs for the very busy web application behind RubyGems.org. A single day of request logs was usually around 500 gigabytes on disk. The team tried a few hosted logging products, but at that volume those products could typically only offer retention measured in hours. The only thing the team could think of to do with the full log firehose was to run it through gzip -9 and then drop it in AWS S3.
JVM Profiler: open source tool for tracing distributed JVM applications at scale
Posted on October 14, 2018, Level advanced Resource Length long
Bo Yang, Nan Zhu, Felix Cheung, and Xu Ning from the Uber Engineering team published a blog post about JVM Profiler. Data is at the heart of the strategic decision-making process at Uber. Right-sizing the resources allocated to Spark applications and optimizing the operational efficiency of Uber's data infrastructure requires fine-grained insights about these systems, namely their resource usage patterns.
Apache Kafka is not for event sourcing
Posted on February 1, 2018, Level beginner Resource Length medium
An article by Jesper Hammarbäck in which he argues that Kafka is not the best tool for event sourcing. Kafka is a great tool for delivering messages between producers and consumers, and the optional topic durability allows you to store your messages permanently. Forever if you'd like.
Sherlock: Near real time search indexing for commerce site
Posted on December 30, 2017, Level beginner Resource Length long
Prasanna Ranganathan from Flipkart published an article about building a world-class e-commerce discovery experience through search. The dynamic nature of e-commerce poses unique challenges. Stock units, availability, pricing, catalog data, etc. can all change at a very high rate, and the system needs to keep up with the latest data lest the customer be disappointed.
Apache Kafka exactly-once processing explained
Posted on August 15, 2017, Level intermediate Resource Length medium
A blog post by Adam Warski explaining real-time processing with Apache Kafka and what its new major feature, exactly-once semantics, really means.