Tag: Apache
-
Real-time data linkage via Linked Data Event Streams
Posted on April 12, 2023, Level intermediate Resource Length long
Interchanging data in real time across domains and applications is challenging because of data format incompatibility, latency and outdated datasets, quality issues, and a lack of metadata and context. A Linked Data Event Stream (LDES) is a new data publishing approach that allows you to publish any dataset as a collection of immutable objects. The focus of an LDES is to allow clients to replicate the history of a dataset and efficiently synchronize with its latest changes. By towardsai.net.
Tags data-science streaming performance how-to big-data apache
-
Deploy Apache Flink cluster on Kubernetes
Posted on March 11, 2023, Level intermediate Resource Length medium
When it comes to deploying Apache Flink on Kubernetes, you can do it in two modes: as a session cluster or as a job cluster. A session cluster is a long-running standalone cluster that can run multiple jobs, while a job cluster deploys a dedicated cluster for each job. By Elvis David.
Tags apache devops cloud data-science big-data
-
How to orchestrate an ETL Data Pipeline with Apache Airflow
Posted on March 10, 2023, Level intermediate Resource Length medium
Data orchestration involves using different tools and technologies together to extract, transform, and load (ETL) data from multiple sources into a central repository. By Aviator Ifeanyichukwu.
Tags apache database nosql data-science python big-data
-
Using Apache Kafka to process 1 trillion inter-service messages
Posted on January 27, 2023, Level intermediate Resource Length long
Cloudflare has been using Kafka in production since 2014. We have come a long way since then, and currently run 14 distinct Kafka clusters, across multiple data centers, with roughly 330 nodes. Between them, over a trillion messages have been processed over the last eight years. By Matt Boyle.
Tags event-driven apache apis app-development database
-
Postgres: Better message queue than Kafka?
Posted on October 13, 2022, Level intermediate Resource Length long
The author explains why they made the unconventional decision to build their logging system on top of Postgres, what worked well, what didn't, and how they did it. By Pete Hunt.
Tags apache sql app-development database messaging
-
Uber freight carrier metrics with near-real-time analytics
Posted on September 7, 2022, Level beginner Resource Length long
Uber Freight has been around since 2016 and is dedicated to providing a platform that seamlessly connects shippers with carriers. We're simplifying the lives of trucking companies by giving carriers a platform to browse all available shipment opportunities with upfront pricing and book with the tap of a button, and by making the fulfillment process more scalable and efficient. By Ujwala Tulshigiri, Yeqing Lu, Ting Chen, Branden Colen.
Tags data-science apache event-driven messaging distributed devops
-
How to visualize your Apache Kafka data the easy way with stream lineage
Posted on July 24, 2022, Level intermediate Resource Length long
Understanding how data flows and is transformed across the different layers of an organization's application and data stack is one of the most challenging governance problems companies are facing today. Who is producing data? By David Araujo and Julia Peng.
Tags app-development messaging apis apache streaming
-
Building a Data Lake on Google Cloud Platform
Posted on March 17, 2022, Level beginner Resource Length medium
When creating a platform, it's critical to have clearly defined customers and products that will benefit from it rather than building in a vacuum. By Javier Turegano, Director of Software Engineering at Slack.
Tags cloud analytics big-data data-science gcp apache
-
Building Real-Time ETL Pipelines with Apache Kafka
Posted on February 17, 2022, Level beginner Resource Length short
Whether you're a data engineer, a data scientist, a software developer, or someone else working in software and data, it's very likely that you have implemented an ETL pipeline before. By Stefan Sprenger.
Tags apache database queues messaging big-data
-
Streaming analytics with Apache Pulsar and Spark structured streaming
Posted on February 12, 2022, Level beginner Resource Length long
Apache Pulsar is a promising new toolkit for distributed messaging and streaming. In this piece we combine two of our favorite pieces of tech: Apache Pulsar and Apache Spark. By Daniel Ciocîrlan.
Tags queues messaging big-data apache cio cloud analytics
-
Comparing the best web servers: Caddy, Apache, and Nginx
Posted on October 23, 2021, Level intermediate Resource Length medium
A web server is a piece of software that accepts a network request from a user agent, typically a web browser, and returns either the appropriate response for the request or an error message. The two dominant HTTP servers today are Apache and Nginx. However, a new player in the space, Caddy Web Server, is gaining traction for its ease of use. By Ayooluwa Isaiah.
Tags servers devops microservices app-development apache nginx
-
Processing time-series data with Redis and Apache Kafka
Posted on June 22, 2021, Level beginner Resource Length medium
Learn how to analyze time-series data with RedisTimeSeries and Apache Kafka in this practical walkthrough. RedisTimeSeries is a Redis module that brings a native time-series data structure to Redis. By Abhishek Gupta.
Tags app-development apache microservices nosql event-driven messaging