Welcome to a curated list of handpicked free online resources related to IT, cloud, Big Data, programming languages, and DevOps. Fresh news and a community-maintained list of links, updated daily.

Multi-dimensional approach helps you proactively prepare for failures: Application layer

Categories

Tags software-architecture devops microservices performance app-development queues

Resiliency of applications surpasses everything else in building customer trust. Because of this, it cannot be an afterthought. Instead of simply reacting to a failure, why not be proactive? By Piyali Kamra, Aish Gopalan, Isael Pimentel, and Aditi Sharma.

Relationship between loose coupling and availability

Source: https://aws.amazon.com/blogs/architecture/a-multi-dimensional-approach-helps-you-proactively-prepare-for-failures-part-1-application-layer/

As your system expands, you’ll likely encounter issues that can hinder your ability to scale, like security and cost. So, it’s necessary to think about the correct architectural patterns beforehand to minimize your chances of enduring a failure without a recovery plan.

The interesting parts of the article:

  • Example use case
  • Pattern 1: Microservices
  • Pattern 2: Saga pattern
  • Pattern 3: Event-driven architecture
  • Pattern 4: Cache pattern
  • Improving application resiliency with bounded contexts

Distributed systems can use all of these patterns, which helps improve resiliency at the application layer. Applying these patterns, along with the infrastructure and operations improvements, will provide frameworks for resilient applications. Nice one!

[Read More]

Visualize Microservice dependencies in a team context

Categories

Tags devops microservices teams app-development

The moment we introduce strong coupling between our services, we lose the potential advantages of a microservice architecture. This article addresses the challenge by introducing the concept of a Change Coupling analysis. Change Coupling is a behavioral code analysis technique that uncovers logical dependencies across services and team boundaries. Let’s see it in action. By Adam Tornhill.

Change Coupling means that two (or more) modules repeatedly change together over time.
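To make the definition concrete, here is a rough sketch that counts how often two modules appear in the same commit. The scoring (shared commits divided by commits touching either module) is an assumption chosen for illustration and is not necessarily the measure CodeScene uses. The `history` data is made up.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set


def change_coupling(commits: Iterable[Set[str]]) -> Dict[FrozenSet[str], float]:
    """For each pair of modules that ever changed together, return the share
    of commits touching either module that touched both."""
    together: Counter = Counter()
    touched: Counter = Counter()
    for modules in commits:
        for module in modules:
            touched[module] += 1
        for a, b in combinations(sorted(modules), 2):
            together[frozenset((a, b))] += 1

    result: Dict[FrozenSet[str], float] = {}
    for pair, shared in together.items():
        a, b = tuple(pair)
        either = touched[a] + touched[b] - shared
        result[pair] = shared / either
    return result


# Hypothetical commit history: each set lists the services changed in one commit.
history = [
    {"orders", "billing"},
    {"orders", "billing"},
    {"orders"},
    {"catalog"},
]
coupling = change_coupling(history)
```

Mapping each module to its owning team would then let you keep only the pairs that cross a team boundary.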

Source: https://codescene.com/blog/visualize-microservice-dependencies-in-team-context/

The main concepts discussed:

  • Prioritize dependencies that cross team boundaries
  • Introducing change coupling: uncover logical dependencies
  • Automated change coupling discovery
  • Adding the team dimension

Microservices is a high-discipline architecture where loose dependencies are key. Failing that, we’ll face a situation where it gets hard for a team to operate in an autonomous way. Just like we expect alerts from our production environment, we really should monitor the key architectural properties so that we can act upon dependencies early. Nice one!

[Read More]

Introduction to Apache Spark and its Datasets

Categories

Tags big-data data-science database miscellaneous

In this article, we will introduce you to the big data ecosystem and the role of Apache Spark in big data. We will also cover the distributed database system, the backbone of big data. In today’s world, data is the fuel. Almost every electronic device collects data that is used for business purposes. By Abhishek Jaiswal.

The article also discusses Resilient Distributed Datasets (RDDs), transformations, and actions:

  • What is Apache Spark?
  • Apache Spark Architecture
  • Spark RDDs cannot be modified, only replaced
  • Spark RDDs are lazily evaluated, which helps preserve data integrity and guards against data corruption
  • Spark supports distributed SQL built on top of RDDs
  • Spark supports various machine learning models, including CNN as well as NLP models

Spark Architecture contains a driver node, context reader, and node manager. Spark works in a distributed manner, the same as Hadoop, but unlike Hadoop, it uses in-memory computation instead of disk. Good read!
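The two RDD properties listed above, immutability and lazy evaluation, can be shown with a toy model in plain Python. This is not the Spark API. `LazyDataset` is a hypothetical stand-in in which transformations (`map`, `filter`) only record work and return a new object, and an action (`collect`) triggers the computation.

```python
from typing import Callable, Iterable, List


class LazyDataset:
    """Toy stand-in for an RDD: never mutated, and evaluated only on demand."""

    def __init__(self, source: Callable[[], Iterable[int]]):
        self._source = source  # invoked only when an action runs

    def map(self, fn: Callable[[int], int]) -> "LazyDataset":
        # Transformation: returns a new dataset, leaves this one untouched.
        return LazyDataset(lambda: (fn(x) for x in self._source()))

    def filter(self, pred: Callable[[int], bool]) -> "LazyDataset":
        return LazyDataset(lambda: (x for x in self._source() if pred(x)))

    def collect(self) -> List[int]:
        # Action: this is the point where the recorded work executes.
        return list(self._source())


calls: List[int] = []


def tracked_double(x: int) -> int:
    calls.append(x)  # lets us observe when the function really runs
    return x * 2


base = LazyDataset(lambda: range(5))
doubled = base.map(tracked_double)        # nothing has been computed yet
large = doubled.filter(lambda x: x >= 4)  # still nothing computed
```

Calling `large.collect()` is what finally invokes `tracked_double`, while `base` still yields its original values afterwards.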

[Read More]

Expanding the CAP tradeoff frontier at scale

Categories

Tags database big-data data-science performance devops

Distributed systems must balance their needs for high availability and low latency with consistency guarantees; providing a mostly hit happy path for requests enables these systems to push the boundaries of this tradeoff. By Audrey Cheng.

At scale, there are three particular concerns a distributed system must address when providing a mostly hit happy path:

  • Performance isolation by ensuring that clients do not affect each other
  • Hotspot tolerance so that hot keys do not affect availability
  • Bounding the worst case, since tail latency can slow overall user interaction

The article gives examples of how three systems tackle these issues.

Given challenging, real-world workloads, any performance degradation resulting from enforcing stronger consistency guarantees for one client should not affect others. Insulating performance effects is also crucial to protecting against misbehaving users.
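One common way to insulate clients from each other is to give every client its own request budget. The sketch below uses a per-client token bucket. This is an assumed technique chosen for illustration, not necessarily what the systems in the article do, and the class name and parameters are hypothetical. Time is passed in explicitly so the behaviour is deterministic.

```python
from collections import defaultdict
from typing import Dict


class PerClientLimiter:
    """Token bucket per client: one noisy client cannot spend another's budget."""

    def __init__(self, capacity: int, refill_per_second: float):
        self.capacity = capacity
        self.refill = refill_per_second
        self._tokens: Dict[str, float] = defaultdict(lambda: float(capacity))
        self._last: Dict[str, float] = {}

    def allow(self, client: str, now: float) -> bool:
        last = self._last.get(client, now)
        # Refill according to elapsed time, capped at the bucket capacity.
        tokens = min(self.capacity, self._tokens[client] + (now - last) * self.refill)
        self._last[client] = now
        if tokens >= 1:
            self._tokens[client] = tokens - 1
            return True
        self._tokens[client] = tokens
        return False


limiter = PerClientLimiter(capacity=2, refill_per_second=1.0)
noisy = [limiter.allow("noisy", now=0.0) for _ in range(4)]
quiet = limiter.allow("quiet", now=0.0)
```

Here the "noisy" client exhausts its own bucket after two requests, while the "quiet" client is still admitted at the same instant.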

Ultimately, finding the right balance between availability / latency and consistency depends on application needs. To push the limits of what can be achieved in this tradeoff, distributed systems should identify and leverage local information to provide a mostly hit happy path for requests. This approach can be especially impactful at global scale. Nice one!

[Read More]

Benchmarking time series workloads on Apache Kudu using TSBS

Categories

Tags analytics big-data data-science performance devops

Since the open-source introduction of Apache Kudu in 2015, it has billed itself as storage for fast analytics on fast data. This general mission encompasses many different workloads, but one of the fastest-growing use cases is that of time-series analytics. By Todd Lipcon.

In this blog post, we’ll evaluate Kudu against three other storage systems using the Time Series Benchmark Suite (TSBS), an open-source collection of data and query generation tools representing an IT operations time-series workload.

The article then covers:

  • Kudu-TSDB architecture
  • Benchmarking target systems
  • Benchmark hardware
  • Benchmark setup
  • Results: Data loading performance
  • Results: Light queries, 8 client threads
  • Results: Light queries, 16 client threads
  • Performance on heavy queries

Although Apache Kudu is a general-purpose store, its focus on fast analytics for fast data makes it a great fit for time series workloads. In addition to the quantitative differences summarized above, it’s important to understand qualitative differences between the stores. In particular, Kudu and ClickHouse share the trait of being general-purpose stores, whereas VictoriaMetrics and InfluxDB are limited to time series applications. In practical terms, this means that Kudu and ClickHouse allow your time series data to be analyzed alongside other relational data in your warehouse, and to be analyzed using alternative tools such as Apache Spark, Apache Impala, Apache Flink, or Python Pandas. Good read!

[Read More]

Cloud platform teams are everywhere — here's why

Categories

Tags cloud management infosec teams devops

In HashiCorp’s new State of Cloud Strategy survey, 86% of respondents said they rely on cloud platform teams — for a wide variety of very good reasons. Organizations with complex business requirements have long sought ways to simplify operations and boost the productivity of their software development teams. It appears business and IT leaders have found an answer: adopt and empower centralized cloud platform teams. By Jared Ruckle.

Main article points:

  • What’s powering the rise of platform teams?
  • Different cloud service providers have different APIs
  • Skills shortages
  • Cultural transformation is siloed and uneven
  • Governance becomes difficult to manage
  • Business leaders trust platform teams – and expect a lot from them
  • Cloud platform teams require continued investment

In our experience working with many of the world’s largest brands, platform teams typically include engineers who provision, run, and manage cloud infrastructure and other shared services. These teams create and operate highly automated platforms available on-demand across the organization. Developers can access the platform capabilities via self-service processes, making it easy to quickly create new environments and new service instances.

But platforms are never finished, just shipped. There’s always more to be done. The Forrester Consulting study (Unlocking Multicloud’s Operational Potential) suggests platform teams are critical “to mitigate people- and process-themed challenges like skills shortages (41%) and siloed teams (35%).” Good read!

[Read More]

How to build an organizational culture that is 'cybersecurity ready'

Categories

Tags cio management infosec teams frameworks

Cyber threats are some of the biggest challenges organizations face, but cybersecurity failure is still seen as a critical short-term risk. By Candid Wüest, Nisha Almoula, Roman Hagen @weforum.org.

Cyber risk is one of the main challenges that organizations face today. The World Economic Forum’s Global Risks Report 2022 highlights how cyber threats have intensified through digital transformation and growing digital dependency.

The article then walks you through:

  • 80% of firms have suffered a cybersecurity breach
  • Boards should prioritize cyber risks in planning
  • Strategic involvement is vital to secure assets and services
  • Cross-functional coordination can strengthen response capabilities
  • Collaboration is key to being ‘cybersecurity ready’

Most executives and board members are aware of key global cyber threats and recognize cybersecurity risk as an enterprise-wide risk, but not everyone understands the impact of these cyber risks and their economic drivers.

Cybersecurity must be a core strategic priority, and ownership and accountability for cybersecurity risk management activities must be adopted both within and outside the CISO organization. Nice one!

[Read More]

Steps to emulate k8s Pod Network

Categories

Tags cloud cio kubernetes containers devops gcp

Networking is the spine of Kubernetes, but it can be challenging to understand exactly how it is expected to work. There are 4 distinct networking problems to address. By Harinderjit Singh.

There are multiple ways to meet the requirements laid out by Kubernetes for pod networking. We can mainly differentiate between them by whether the pod network address space is part of the node pool’s subnet or is separate from it. We will try to emulate the latter.
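The second layout, a pod address space that sits outside the node pool's subnet, can be sketched with Python's standard `ipaddress` module. The address ranges, node names, and the `pod_cidrs_per_node` helper are hypothetical values picked for illustration. They are not taken from the article.

```python
import ipaddress
from typing import Dict, List

# Hypothetical address plan: nodes and pods live in different ranges.
node_subnet = ipaddress.ip_network("10.0.0.0/24")
pod_network = ipaddress.ip_network("10.244.0.0/16")


def pod_cidrs_per_node(
    pod_net: ipaddress.IPv4Network, node_names: List[str], prefix: int = 24
) -> Dict[str, ipaddress.IPv4Network]:
    """Hand each node its own slice of the pod network, in order."""
    blocks = pod_net.subnets(new_prefix=prefix)
    return dict(zip(node_names, blocks))


allocation = pod_cidrs_per_node(pod_network, ["node-a", "node-b"])

# A pod IP tells you which node's slice it belongs to.
pod_ip = ipaddress.ip_address("10.244.1.7")
owner = next(name for name, cidr in allocation.items() if pod_ip in cidr)
```

Because each node owns a distinct slice, routing a pod IP to the right node reduces to a route per slice, which is the kind of setup an emulation with network namespaces has to reproduce.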

The article then explains:

  • Pod network
  • Test Configuration
  • Emulation of pod network
  • Testing the connectivity

Linux namespaces (particularly network namespaces) make it easy to implement these requirements. The kubelet assigns a network namespace to a pod as soon as it is scheduled, which means one network namespace per pod. Good read!

[Read More]

Hidden gems of Google BigQuery

Categories

Tags golang app-development database miscellaneous gcp

BigQuery is amazing. It is one of my favorite tools within Google Cloud. Luckily, it looks like Google feels the same and, to the joy of BigQuery fans, keeps adding new features there. By Artem Nikulchenko.

Let’s say you push some data into BigQuery, and then another system wants to run a scheduled job to process the newly arrived data. For example, a system might pull data from BigQuery into another store, or it might need to run hourly reports based on the data. In each of those cases, you would prefer to avoid processing the same records multiple times. As a result, you need a way to know which records are already processed and which were added after the processing took place.
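One general way to frame this problem is a high-water mark: remember the largest insertion timestamp already processed and select only rows above it on the next run. The sketch below shows the idea on plain Python dictionaries. It does not call BigQuery, and the `inserted_at` field and `take_new_rows` helper are hypothetical names, not the mechanism the article describes.

```python
from typing import Dict, List, Tuple

Row = Dict[str, int]


def take_new_rows(rows: List[Row], watermark: int) -> Tuple[List[Row], int]:
    """Return rows inserted after `watermark` and the watermark for the next run."""
    new_rows = [r for r in rows if r["inserted_at"] > watermark]
    # If nothing new arrived, the watermark stays where it was.
    new_watermark = max((r["inserted_at"] for r in new_rows), default=watermark)
    return new_rows, new_watermark


table: List[Row] = [
    {"id": 1, "inserted_at": 100},
    {"id": 2, "inserted_at": 110},
]

first_batch, mark = take_new_rows(table, watermark=0)
table.append({"id": 3, "inserted_at": 120})
second_batch, mark = take_new_rows(table, watermark=mark)
third_batch, mark = take_new_rows(table, watermark=mark)
```

A caveat with this approach is that a row arriving late with a timestamp at or below the current watermark would be skipped, so the timestamp needs to reflect when the row actually landed in the table.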

No matter how long you have worked with BigQuery, there is always something new to discover. The author shares the following four things:

  • AUTO column
  • Multi-statement transactions
  • Clustering
  • Indexes

As you may guess from the name, an index is designed for point lookups, but not over any field. Currently, indexes can be used to easily find unique data elements that are buried in unstructured text or semi-structured JSON data. Indexes are used only when a SEARCH query is executed. Good read!

[Read More]

Shaving 40% off Google's B-Tree implementation with Go Generics

Categories

Tags golang app-development performance programming

There are many reasons to be excited about generics in Go. In this article, I’m going to show how, using Go generics, ScyllaDB achieved a 40% performance gain in an already well-optimized package, the Google B-Tree implementation. By Michal Matczuk.

The work covered in this article was part of ScyllaDB’s long-standing partnership with the Computer Science Department at the University of Warsaw. We’ve worked on a number of projects together: integrating Parquet, an async userspace filesystem, a Kafka client for Seastar, a system for linear algebra in ScyllaDB and a design for a new Rust driver.

The article then describes and explains well:

  • Making faster B-Trees with Generics
  • The additional allocation
  • Why is it faster?

By shifting from an implementation using interfaces to one using generics, we were able to significantly improve performance, minimize garbage collection time, and minimize CPU and other resource utilization, such as heap size. Particularly with heap size, we were able to reduce HeapObjects by 99.53%. You will find all the code and performance testing described in the article as well. Very interesting!

[Read More]