Welcome to a curated list of handpicked free online resources related to IT, cloud, Big Data, programming languages, and DevOps. Fresh news and a community-maintained list of links, updated daily. Like what you see? [ Join our newsletter ]

An introduction to the Azure DevOps toolset

Tags azure cloud cicd devops kubernetes servers

In this post, we start to delve into DevOps toolsets, specifically Microsoft Azure DevOps Services. This is the third in a series of blog posts about DevOps. By Ron Callahan.

Just some of the benefits of a DevOps toolset include the standardization and automation of development processes, improved collaboration within and among the teams, consolidated code repositories, work-item tracking, automated testing, and release pipelines.

Tools (or a single person) will not automatically transform an IT department into a DevOps shop, but a department that has adopted the culture and organization needed to follow DevOps practices WILL benefit from a suite of DevOps tools as it matures.

The article then deals with:

  • Azure Boards
  • Azure Pipelines
  • Azure Repositories
  • Azure Test Plans
  • Azure Artifacts

Faster development cycles let businesses push out new features sooner, making them more agile in responding to their competition and to new requests from customers. Furthermore, a tightly integrated CI/CD platform means less downtime – and less downtime equals more revenue! Good read!
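To make the “release pipelines” piece a little more concrete, here is a minimal, hypothetical azure-pipelines.yml; it is not taken from the article, and the build steps are assumptions for a Node.js project:

```yaml
# Minimal illustrative Azure Pipelines definition (not from the article).
# Runs on every push to main; a real pipeline would add packaging and
# release stages after the build.
trigger:
  - main

pool:
  vmImage: "ubuntu-latest"   # Microsoft-hosted build agent

steps:
  - script: npm ci
    displayName: "Install dependencies"
  - script: npm test
    displayName: "Run tests"
```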

[Read More]

Cloud egress charges: How to prevent these creeping costs

Tags cio management cloud cicd

One of the advantages of using the cloud is the ability to scale rapidly. On-demand scalability can eliminate the need to overbuy capacity that is only required for peak times. By factioninc.com.

Data egress is when data leaves a network and goes to an external location. If you’re using the cloud, data egress occurs whenever your applications write data out to your network or whenever you repatriate data back to your on-premises environment. While cloud providers usually do not charge to transfer data into the cloud (“ingress”), they do charge for data egress in most situations. A related charge is the data transfer fee, which may be assessed when moving data between regions or availability zones within the same cloud provider.
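As a back-of-the-envelope illustration of how these fees add up, here is a sketch with hypothetical per-GB rates; real prices vary by provider, region, tier, and volume discounts:

```typescript
// Hypothetical rates for illustration only.
const EGRESS_PER_GB = 0.09;        // assumed $/GB out to the internet
const CROSS_REGION_PER_GB = 0.02;  // assumed $/GB between regions

const monthlyEgressGB = 10_000;      // 10 TB served to external users
const monthlyReplicationGB = 2_000;  // 2 TB replicated across regions

const bill =
  monthlyEgressGB * EGRESS_PER_GB +
  monthlyReplicationGB * CROSS_REGION_PER_GB;

console.log(`Estimated transfer charges: $${bill.toFixed(2)}/month`); // $940.00/month
```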

The article focuses on:

  • What is data egress?
  • Egress fees can hinder innovation
  • How are egress fees calculated?
  • Best practices to reduce or eliminate egress fees

Data egress fees can vary considerably, and each cloud has its own egress fee structure. Use clouds with higher egress fees only for the workloads that require the capabilities of that specific cloud. When services are comparable, always choose the less expensive option. While this is an older article, it is a good read!

[Read More]

Building Real-Time ETL Pipelines with Apache Kafka

Tags apache database queues messaging big-data

Whether you’re a data engineer, a data scientist, a software developer, or someone else working in the field of software and data - it’s very likely that you have implemented an ETL pipeline before. By Stefan Sprenger.

ETL stands for Extract, Transform, and Load. These three steps are applied to move data from one datastore to another. First, data are extracted from a data source. Second, data are transformed in preparation for the data sink. Third, data are loaded into the data sink. Examples are moving data from a transactional database system to a data warehouse or syncing cloud storage with an API.
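As a schematic illustration of the three steps (the types, record shape, and source/sink here are hypothetical stand-ins, not from the article):

```typescript
// A minimal, schematic ETL run: extract from a source, transform each
// record, load into a sink. In practice the source might be a
// transactional database and the sink a data warehouse.
type OrderRow = { id: number; amountCents: number; createdAt: string };

async function extract(): Promise<OrderRow[]> {
  // e.g. SELECT ... FROM orders WHERE created_at > :lastRun
  return [{ id: 1, amountCents: 1999, createdAt: "2024-01-01T00:00:00Z" }];
}

function transform(row: OrderRow) {
  // Reshape for the sink: cents -> decimal amount, timestamp -> day.
  return { orderId: row.id, amount: row.amountCents / 100, day: row.createdAt.slice(0, 10) };
}

async function load(records: object[]): Promise<void> {
  // e.g. bulk INSERT into the warehouse's fact table
  console.log(`loading ${records.length} records`);
}

async function runEtl(): Promise<void> {
  const rows = await extract();        // 1. Extract
  const records = rows.map(transform); // 2. Transform
  await load(records);                 // 3. Load
}

runEtl();
```

In a batch pipeline this function would run on a schedule; the real-time variant the article describes replaces the scheduled extract with a continuous stream of change events.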

The article’s content is split into:

  • What are real-time ETL pipelines?
  • What are the benefits of real-time ETL pipelines?
  • How to implement real-time ETL with Apache Kafka

The open-source community provides most of the essentials for getting up and running. You can use open-source Kafka Connect connectors, like Debezium, for integrating Kafka with external systems, implement transformations in Kafka Streams, or even implement operations spanning multiple rows, such as joins or aggregations, with Kafka. Good read!
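The article’s tooling (Kafka Connect, Kafka Streams) lives on the JVM; purely as a hedged sketch of the same consume-transform-produce shape in another ecosystem, here is the idea with the kafkajs client. Broker address, topic names, and message shape are assumptions:

```typescript
import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "etl-demo", brokers: ["localhost:9092"] });
const consumer = kafka.consumer({ groupId: "etl-demo-group" });
const producer = kafka.producer();

async function run(): Promise<void> {
  await Promise.all([consumer.connect(), producer.connect()]);
  await consumer.subscribe({ topic: "orders.raw", fromBeginning: false });

  await consumer.run({
    eachMessage: async ({ message }) => {
      // Transform each change event as it arrives...
      const order = JSON.parse(message.value!.toString());
      const enriched = { ...order, processedAt: new Date().toISOString() };
      // ...and load it into the downstream topic in real time.
      await producer.send({
        topic: "orders.enriched",
        messages: [{ key: message.key, value: JSON.stringify(enriched) }],
      });
    },
  });
}

run().catch(console.error);
```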

[Read More]

Plain English description of monads without Haskell code

Tags programming software-architecture learning

Monads are notorious in the programming world for their use in the Haskell programming language and for being hard to grasp. There’s even a joke that writing a “monad tutorial” is a rite of passage for new Haskellers, and the exercise has been described as pointless. By Chris Done.

One of the Haskell designers in the ’90s came up with a class/interface that worked for all of these. As he was into category theory, he named it “monad”. The types also sort of match the theory if you squint hard enough.

You can reuse your intuition from the commonplace chaining of things in other popular languages:

  • Async chains (JS)
  • Parser combinator chains (Rust, JS)
  • Optional or erroneous value chains (TypeScript, Rust)
  • Continuation passing style (you can do this in Lisp and JS)
  • Cartesian products/SQL (C#’s LINQ)

Monad is the name of the class for “and_then”, defined in a sensible way, with some laws for how it should behave predictably, and then a bunch of library code works on anything that implements “and_then”. Apart from F# or Haskell (or descendants), no other language embraces the abstraction with syntax, so it’s hard to find a good explanation without them. It’s like explaining Lisp macros without using a Lisp: the explanation tends to be awkward and unconvincing.
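Keeping with the article’s spirit of avoiding Haskell, here is a hedged sketch of “and_then” for optional values in TypeScript (one of the languages the list above points to); the names are illustrative, not from the article:

```typescript
// A tiny Option type with and_then (the monadic "bind" for Option).
type Option<T> = { kind: "some"; value: T } | { kind: "none" };

const some = <T>(value: T): Option<T> => ({ kind: "some", value });
const none = <T>(): Option<T> => ({ kind: "none" });

// and_then: if there is a value, feed it to the next step;
// otherwise short-circuit the rest of the chain.
function andThen<T, U>(opt: Option<T>, f: (t: T) => Option<U>): Option<U> {
  return opt.kind === "some" ? f(opt.value) : none();
}

// Each step can fail; the chain stops at the first "none".
const parseNum = (s: string): Option<number> => {
  const n = Number(s);
  return Number.isNaN(n) ? none() : some(n);
};
const reciprocal = (n: number): Option<number> => (n === 0 ? none() : some(1 / n));

console.log(andThen(andThen(some("4"), parseNum), reciprocal));
// -> { kind: "some", value: 0.25 }
console.log(andThen(andThen(some("zero?"), parseNum), reciprocal));
// -> { kind: "none" }
```

Swap Option for Promise, Result, a parser, or a list, keep the same “and_then” shape, and you have the abstraction the class names.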

Haskellers don’t like to throw exceptions or use mutation, and their functions can’t return early, etc. Suddenly, Monad and its syntactic sugar look pretty attractive to them. Nice one!

[Read More]

How to create your own Google Chrome extension

Tags browsers javascript web-development app-development

If you are a Google Chrome user, you’ve probably used some extensions in the browser. Have you ever wondered how to build one yourself? In this article, I will show you how you can create a Chrome extension from scratch. By Sampurna Chapagain.

The article will then help you understand the following:

  • What is a Chrome Extension?
  • What will our Chrome Extension Look Like?
  • How To Create a Chrome Extension
  • Creating a manifest.json file

A Chrome extension is a program installed in the Chrome browser that enhances the browser’s functionality. You can build one easily using web technologies like HTML, CSS, and JavaScript. As we discussed earlier, building a Chrome extension is similar to building any web application. The only difference is that a Chrome extension requires a manifest.json file where we keep all the configuration. Good read!
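For a feel of what that configuration looks like, here is a minimal, hypothetical Manifest V3 file; the extension name, file names, and icon are illustrative, and the article’s own example may differ:

```json
{
  "manifest_version": 3,
  "name": "Hello Extension",
  "description": "A minimal example extension.",
  "version": "1.0",
  "action": {
    "default_popup": "popup.html"
  },
  "icons": {
    "48": "icon48.png"
  }
}
```

Point Chrome’s “Load unpacked” button (on chrome://extensions with Developer mode enabled) at the folder containing this file plus popup.html, and the extension appears in the toolbar.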

[Read More]

The File System Access API with Origin Private File System

Tags browsers javascript web-development cio

It is very common for an application to interact with local files. For example, a general workflow is opening a file, making some changes, and saving the file. By Sihui Liu.

For web apps, this might be hard to implement. It is possible to simulate the file operations using the IndexedDB API, an HTML input element with the file type, an HTML anchor element with the download attribute, etc., but that would require a good understanding of these standards and careful design for a good user experience. Also, the performance may not be satisfactory for frequent operations and large files.

The article then describes:

  • Origin Private File System
  • Persistence
  • Browser Support
  • API
  • Examples

The API is currently unavailable for Safari windows in Private Browsing mode. Where it is available, its storage lifetime is the same as that of other persistent storage types like IndexedDB and LocalStorage. The storage policy will conform to the Storage Standard. Safari users can view and delete file system storage for a site via Preferences on macOS or Settings on iOS.
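As a hedged sketch of the basic flow (the file name and contents are illustrative; some engines expose writes only through sync access handles in workers, so support for createWritable varies):

```typescript
// Create, write, and read back a file in the Origin Private File System.
// Must run in a secure context in a browser that supports the API.
async function demoOpfs(): Promise<void> {
  // The origin-private root directory; invisible to the user's file system.
  const root = await navigator.storage.getDirectory();

  // Create (or open) a file inside it.
  const handle = await root.getFileHandle("draft.txt", { create: true });

  // Write via a writable stream.
  const writable = await handle.createWritable();
  await writable.write("Hello from the Origin Private File System!");
  await writable.close();

  // Read it back as a regular File object.
  const file = await handle.getFile();
  console.log(await file.text());
}

demoOpfs().catch(console.error);
```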

[Read More]

OPC UA, MQTT, and Apache Kafka - The Trinity of data streaming in IoT

Tags queues messaging cloud analytics

In the IoT world, MQTT and OPC UA have established themselves as open and platform-independent standards for data exchange in Industrial IoT and Industry 4.0 use cases. Data Streaming with Apache Kafka is the data hub for integrating and processing massive volumes of data at any scale in real-time. By Kai Waehner.

Machine data must be transformed and made available across the enterprise as soon as it is generated to extract the most value from the data. As a result, operations can avoid critical failures and increase the effectiveness of their overall plant.

[Figure: decision tree for evaluating IoT protocols. Source: https://www.kai-waehner.de/blog/2022/02/11/opc-ua-mqtt-apache-kafka-the-trinity-of-data-streaming-in-industrial-iot/]

The article then describes:

  • Kappa architecture for a real-time IoT data hub
  • When to use Kafka vs. MQTT and OPC UA?
  • Meeting the challenges of Industry 4.0 through data streaming and data mesh
  • Separation of concerns in the OT/IT world with domain-driven design and true decoupling
  • How to choose between OPC UA and MQTT with Kafka?
  • Decision tree for evaluating IoT protocols
  • Integration between MQTT / OPC UA and Kafka
  • BMW case study: Manufacturing 4.0 with smart factory and cloud

… and much more. An event-driven data streaming platform is elastic and highly available. It represents an opportunity to significantly increase production facilities’ overall asset effectiveness. With its data processing and integration capabilities, data streaming complements machine connectivity via MQTT, OPC UA, and HTTP, among others. This allows streams of sensor data to be transported throughout the plant and to the cloud in near real time. Nice one!
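To make the MQTT-to-Kafka integration pattern concrete, here is a hedged sketch using the mqtt and kafkajs Node clients; broker URLs and topic names are assumptions, and production setups typically use a dedicated connector or gateway instead:

```typescript
import mqtt from "mqtt";
import { Kafka } from "kafkajs";

const mqttClient = mqtt.connect("mqtt://localhost:1883");
const kafka = new Kafka({ clientId: "iot-bridge", brokers: ["localhost:9092"] });
const producer = kafka.producer();

async function run(): Promise<void> {
  await producer.connect();

  mqttClient.on("connect", () => {
    // Subscribe to all sensor readings published by plant devices.
    mqttClient.subscribe("plant/+/sensors/#");
  });

  mqttClient.on("message", async (topic, payload) => {
    // Forward each reading into Kafka for durable, scalable processing.
    await producer.send({
      topic: "iot.sensor.readings",
      messages: [{ key: topic, value: payload }],
    });
  });
}

run().catch(console.error);
```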

[Read More]

Streaming analytics with Apache Pulsar and Spark structured streaming

Tags queues messaging big-data apache cio cloud analytics

Apache Pulsar is a promising new toolkit for distributed messaging and streaming. In this piece we combine two of our favorite pieces of tech: Apache Pulsar and Apache Spark. By Daniel Ciocîrlan.

Apache Pulsar excels at storing event streams and performing lightweight stream computing tasks. It’s a great fit for long-term storage of data and can also be used to store results for downstream applications.

Stream processing is an important requirement in modern data infrastructures. Companies now aim to leverage the power of streaming and real-time analytics to deliver results to their users faster, enhancing the user experience and driving business value. Typically, streaming data pipelines require a streaming storage layer like Apache Pulsar or Apache Kafka; to perform more sophisticated stream processing tasks, they also need a stream compute engine like Apache Flink or Spark Structured Streaming.
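The article’s processing side uses Spark Structured Streaming via the Pulsar Spark Connector; as a hedged sketch of just the Pulsar storage layer, here is a publish/consume round trip with the pulsar-client Node.js library. The service URL, topic, and event shape are assumptions:

```typescript
import Pulsar from "pulsar-client";

async function main(): Promise<void> {
  const client = new Pulsar.Client({ serviceUrl: "pulsar://localhost:6650" });

  // Publish one user-engagement event.
  const producer = await client.createProducer({ topic: "user-engagement" });
  await producer.send({
    data: Buffer.from(JSON.stringify({ userId: 42, event: "click" })),
  });

  // Consume it from a shared subscription.
  const consumer = await client.subscribe({
    topic: "user-engagement",
    subscription: "analytics",
    subscriptionType: "Shared",
  });
  const msg = await consumer.receive();
  console.log(msg.getData().toString());
  consumer.acknowledge(msg);

  await producer.close();
  await consumer.close();
  await client.close();
}

main().catch(console.error);
```

In the pipeline the article describes, a stream compute engine would sit on the consuming side to run the aggregations the broker alone cannot.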

The article’s main points are:

  • The role of Apache Pulsar in streaming data pipelines
  • Example use case: Real-time user engagement
  • Using the Apache Pulsar/Spark Connector

In this article we discussed the role of Apache Pulsar as the backbone of a modern data infrastructure, the streaming use cases Pulsar can support, and how you can use it along with Spark Structured Streaming to implement more advanced stream processing use cases by leveraging the Pulsar Spark Connector. We also reviewed a real-world use case, demonstrated a sample streaming data pipeline, and examined the role of Apache Pulsar and Spark Structured Streaming within the pipeline. Good read!

[Read More]

Right hybrid cloud strategy enables agility at scale

Tags big-data agile cio cloud ibm

In today’s world, there’s a common thread connecting almost every organization, of every size, across all industries and regions: uncertainty. Change—often disruptive—is happening faster. For the organizations trying to navigate it, the need for business agility—the ability to adapt rapidly and effectively—has never been more important. By @IBM.

The article then dives right in:

  • Need for agility and threat of complexity
  • Why hybrid’s time is now
  • How IBM’s open hybrid cloud strategy stands apart
  • Unlocking value through hybrid cloud
  • Why hybrid cloud matters to you
  • Open hybrid cloud solutions in action

Maybe you’ve already recognized the looming challenges—in terms of orchestration, inflexibility, and security—and you’ve taken the first steps toward either doing it yourself (DIY) or going with a provider. Perhaps the most compelling reason to resist the DIY path is the sheer amount of resources an enterprise needs to commit to building and sustaining a homegrown hybrid cloud platform. Talent—in the form of engineers experienced in open-source development—is the main gating factor. Nice one!

[Read More]

Six steps for leading successful data science teams

Tags big-data analytics cio data-science

An increasing number of organizations are bringing data scientists on board as executives and managers recognize the potential of data science and artificial intelligence to boost performance. But hiring talented data scientists is one thing; harnessing their capabilities for the benefit of the organization is another. By Rama Ramakrishnan.

The article’s main parts are:

  • Point data science teams toward the right problem
  • Decide on a clear evaluation metric up front
  • Create a common-sense baseline first
  • Manage data science projects more like research than like engineering
  • Check for truth and consequences
  • Log everything, and retrain periodically

It is important to subject results to intense scrutiny to make sure the benefits are real and there are no unintended negative consequences. The most basic check is making sure the results are calculated on data that was not used to build the models. Data science models, like software in general, tend to require a great deal of ongoing effort because of the need for maintenance and upgrades. They carry an additional layer of effort and complexity because of their extraordinary dependence on data and the resulting need for retraining. Nice one!
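As a toy illustration of two of these practices, a common-sense baseline and evaluation on held-out data, here is a hedged sketch; all numbers and names are made up:

```typescript
// Daily sales figures (illustrative data).
const sales = [12, 15, 14, 16, 18, 17, 19, 21, 20, 22];

// Hold out the last 30% -- never evaluate on data used to fit the model.
const split = Math.floor(sales.length * 0.7);
const train = sales.slice(0, split);
const test = sales.slice(split);

// Common-sense baseline: always predict the training mean.
const mean = train.reduce((a, b) => a + b, 0) / train.length;

// Score with the metric agreed on up front (here: mean absolute error).
const mae = test.reduce((sum, y) => sum + Math.abs(y - mean), 0) / test.length;
console.log(`Baseline MAE on held-out data: ${mae.toFixed(2)}`);

// Any proposed model must beat this number to justify its complexity.
```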

[Read More]