Welcome to a curated list of handpicked free online resources related to IT, cloud, Big Data, programming languages, and DevOps. Fresh news and a community-maintained list of links, updated daily. Like what you see? [ Join our newsletter ]

How to perform K-means clustering with Python in Scikit?

Categories

Tags python data-science analytics big-data

While deep learning algorithms are today’s fashionable class of machine learning algorithms, there is more out there. Clustering is a type of machine learning in which you do not feed the model a training set, but instead derive characteristics from the dataset at run-time in order to structure the dataset in a different way. It is part of the class of unsupervised machine learning algorithms. By Christian Versloot.

k-means clustering is a method (…) that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (cluster centers or cluster centroid), serving as a prototype of the cluster.

Wikipedia

The article dives into:

  • What is K-means clustering?
  • Introducing K-means clustering
  • The K-means clustering algorithm
  • Inertia / Within-cluster sum-of-squares criterion
  • On convergence of K-means clustering
  • The drawbacks of K-means clustering – when is it a bad choice?
  • Implementing K-means clustering with Python and Scikit-learn
  • Generating convex and isotropic clusters
  • Applying the K-means clustering algorithm
  • Full model code
  • Results

In this blog post, the author looks at K-means clustering with Python and Scikit-learn. You will also get a good explanation of the Python code, links to further reading and resources, together with some video talks explaining the concepts and science behind the article. Nice one!
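As a minimal illustration (not the article's full code; the parameter values here are arbitrary), the workflow of generating convex, isotropic clusters and then applying K-means looks roughly like this in Scikit-learn:

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Generate convex, isotropic clusters as toy data.
X, y = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

# Fit K-means with k=3; inertia_ is the within-cluster sum-of-squares criterion.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print(kmeans.cluster_centers_.shape)  # one centroid per cluster
print(round(kmeans.inertia_, 2))      # lower inertia means tighter clusters
```

Note that `inertia_` only measures compactness, which ties directly into the article's discussion of when K-means is a bad choice (non-convex or anisotropic data).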

[Read More]

Functional error handling with Express.js and DDD | Enterprise Node.js + TypeScript

Categories

Tags nodejs javascript frontend web-development

How to expressively represent (database, validation and unexpected) errors as domain concepts using functional programming concepts and how to hook those errors up to our Express.js base controller. By Khalil Stemmler, a developer advocate at Apollo GraphQL.

In most programming projects, there’s confusion as to how and where errors should be handled. Do I throw an error and let the client figure out how to handle it? Do I return null?

When we throw errors, we disrupt the flow of the program and make it trickier for someone to walk through the code, since exceptions share similarities with the sometimes criticized GOTO command.

And when we return null, we’re breaking the design principle that “a method should return a single type”. Not adhering to this can lead to misuse of our methods from clients.

The article then deals with:

  • Why expressing errors explicitly is important to domain modeling
  • How to expressively represent errors using types
  • How to and why to organize all errors by Use Cases
  • How to elegantly connect errors to Express.js API responses
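As a language-agnostic sketch of the idea (in Python rather than the article's TypeScript, and all names here are hypothetical), a Result type makes errors part of the return type instead of thrown exceptions or `null`:

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

T = TypeVar("T")
E = TypeVar("E")

@dataclass
class Ok(Generic[T]):
    value: T

@dataclass
class Err(Generic[E]):
    error: E

# A method now returns a single type that covers both outcomes.
Result = Union[Ok[T], Err[E]]

def find_user(user_id: int) -> "Result[dict, str]":
    # The domain error is returned, not thrown, so callers must handle it.
    if user_id <= 0:
        return Err("UserNotFoundError")
    return Ok({"id": user_id, "name": "Alice"})

result = find_user(-1)
if isinstance(result, Ok):
    print("200 OK:", result.value)
else:
    print("404:", result.error)  # map the domain error to an HTTP response
```

The base-controller hookup the article describes then becomes a single place that translates each `Err` variant into the right HTTP status code.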

This is a long article with a detailed explanation of the code and design, and you also get access to a GitHub repository. Great read!

[Read More]

5 Useful jq commands to parse JSON on the CLI

Categories

Tags json big-data data-science programming software

JSON has become the de facto standard data representation for the web. It’s lightweight, human-readable (in theory) and supported by all major languages and platforms. However, working on the CLI with JSON is still hard using traditional CLI tooling. By Fabian Keller.

Luckily, there is jq, a command-line JSON processor. jq offers a broad range of operations to transform and manipulate JSON-based data structures from the command line. Looking at the documentation, however, reveals an overwhelmingly large number of options, functions and things you can do with jq. This blog post shows 5 useful jq commands that you really need to know.

The article then describes:

  • The GitHub events API as example
  • Exploring the data structure
    • Extract a specific element from an array
    • Extract a specific key for all elements
    • Filter by specific value
    • Extracting all JSON paths
    • Deep text search
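As a rough illustration of the first few operations listed above (the sample data is invented, and the article's actual jq filters may differ), here is what they correspond to in plain Python, with approximate jq equivalents in the comments:

```python
import json

# Invented stand-in for the GitHub events API response used in the article.
events = json.loads("""[
  {"id": "1", "type": "PushEvent",  "actor": {"login": "alice"}},
  {"id": "2", "type": "WatchEvent", "actor": {"login": "bob"}}
]""")

# Extract a specific element from an array (jq: '.[0]')
first = events[0]

# Extract a specific key for all elements (jq: '.[].type')
types = [e["type"] for e in events]

# Filter by a specific value (jq: '.[] | select(.type == "PushEvent")')
pushes = [e for e in events if e["type"] == "PushEvent"]

print(types)
print(pushes[0]["actor"]["login"])
```

The point of jq is that each of these is a one-liner in a pipeline, with no script file needed.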

This blog post has shown some basic and advanced jq commands to explore and thrive with JSON on the command line. jq is very powerful and a reliable companion as soon as there is the first bit of JSON data on the CLI. Exciting!

[Read More]

JAMstack step by step tutorial to create a website with just clicks and no code at all for free

Categories

Tags nodejs javascript web-development

This JAMstack tutorial will show you how to create a JAMstack website with just clicks, no code and for $0. It will detail how to set up a JAMstack website step by step with 30+ screenshots and 2000+ words. It will involve using a git-based CMS service to edit your content easily. Let’s get started. By Geshan Manandhar.

If JAMstack is new to you, have a look at the author's previous post detailing what JAMstack is and some of its technical aspects. The prerequisites for this tutorial are:

  • You must have a working email address (a no-brainer, but still good to be explicit)
  • Knowledge of markdown would be beneficial
  • Knowledge of a static site generator like Hugo would help
  • Previous know-how of using any Content Management System (CMS) like Drupal or Wordpress would be great

You will need to register for the 4 (or fewer) online services to get your JAMstack website up and running. The good news is that all of them have a free plan, so your website will have a recurring running cost of exactly $0, hurray!

  • Github - To host the code; you probably already have a Github account :)
  • Netlify - CDN to host the website; it will be fast since it serves mainly static files
  • Forestry - Git-based Content Management System (CMS) service to edit the JAMstack website content; your content changes will be reflected on the website within a couple of minutes. Still fast for a JAMstack website
  • Stackbit - Service to manage the above 3 and glue them all together to bring your JAMstack website to life

This is just scratching the surface, now you can show your running website to people but do remember to optimize it well before going live. Good read!

[Read More]

Serverless and Knative: Knative Serving

Categories

Tags kubernetes devops app-development software-architecture containers

In this article the author covers Knative Serving, which is responsible for deploying and running containers, as well as networking and auto-scaling. Auto-scaling allows scale-to-zero and is probably the main reason why Knative is referred to as a serverless platform. By haralduebele.

Knative uses new terminology for its resources. They are:

  • Service: Responsible for managing the life cycle of an application/workload. Creates and owns the other Knative objects Route and Configuration.
  • Route: Maps a network endpoint to one or multiple Revisions. Allows Traffic Management.
  • Configuration: Desired state of the workload. Creates and maintains Revisions.
  • Revision: Specific version of a code deployment. Revisions are immutable. Revisions can be scaled up and down. Rules can be applied to the Route to direct traffic to specific Revisions.
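For orientation, a minimal Knative Service manifest looks like the sketch below (using Knative's public hello-world sample image); deploying such a Service is what creates the Configuration, an initial Revision, and a Route:

```yaml
# Deploying this Service creates a Configuration, an initial Revision,
# and a Route that exposes the Revision at a network endpoint.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
spec:
  template:
    spec:
      containers:
        - image: gcr.io/knative-samples/helloworld-go
          env:
            - name: TARGET
              value: "World"
```

Each change to the template produces a new immutable Revision, which the Route can then split traffic across.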

Knative terminology and resources

Source: https://knative.dev/

Knative workloads: In contrast to general-purpose containers (Kubernetes), stateless, request-triggered (i.e. on-demand), autoscaled containers have the following properties:

  • Little or no long-term runtime state (especially in cases where code might be scaled to zero in the absence of request traffic)
  • Logging and monitoring aggregation (telemetry) is important for understanding and debugging the system, as containers might be created or deleted at any time in response to autoscaling
  • Multitenancy is highly desirable to allow cost sharing for bursty applications on relatively stable underlying hardware resources

To learn more, read the article in full. You will also get access to a code repository with example code. Excellent for anybody migrating from Kubernetes to Knative.

[Read More]

Build your first data warehouse with Airflow on GCP

Categories

Tags google cloud gcp big-data cio data-science

What are the steps in building a data warehouse? What cloud technology should you use? How to use Airflow to orchestrate your pipeline? By Tuan Nguyen.

In this project, we will build a data warehouse on Google Cloud Platform that will help answer common business questions as well as power dashboards. You will experience first-hand how to build a DAG to achieve a common data engineering task: extract data from sources, load it into a data sink, and transform and model the data for business consumption.

The article is split into:

  • Why Google Cloud Platform?
    • Cost
    • Ease of use
  • Business objective
  • The dataset
  • Data modeling
  • Architecture
  • Set up the infrastructure
  • Data pipeline
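As a plain-Python sketch (illustrative only; the article uses real Airflow operators and GCP services, and the names below are hypothetical), the extract, load and transform tasks that such a DAG orchestrates look roughly like this:

```python
def extract(source_name):
    """Pull raw rows from a source system (here: an in-memory stand-in)."""
    return [{"order_id": 1, "amount": 30}, {"order_id": 2, "amount": 70}]

def load(rows, sink):
    """Load raw rows into the data sink (e.g. a staging table)."""
    sink["staging_orders"] = rows
    return sink

def transform(sink):
    """Model the data for business consumption (e.g. a daily revenue mart)."""
    total = sum(row["amount"] for row in sink["staging_orders"])
    sink["mart_daily_revenue"] = total
    return sink

# In Airflow, each function would be a task and the call chain a DAG:
# extract >> load >> transform
warehouse = transform(load(extract("orders_api"), {}))
print(warehouse["mart_daily_revenue"])
```

Airflow's value is scheduling, retrying and monitoring exactly this kind of dependency chain, rather than running it as one script.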

… and much more. The author walks through the many steps of designing and deploying a data warehouse in GCP using Airflow as an orchestrator. You will also get the source code, which you can reference in the GitHub repo. Excellent for anybody in data science!

[Read More]

How does public key encryption work? | Public key cryptography and SSL

Categories

Tags infosec web-development cloud devops

Public key encryption, also known as asymmetric encryption, uses two separate keys instead of one shared one: a public key and a private key. Public key encryption is an important technology for Internet security. By cloudflare.com.

Public key encryption, or public key cryptography, is a method of encrypting data with two different keys and making one of the keys, the public key, available for anyone to use. The other key is known as the private key. Data encrypted with the public key can only be decrypted with the private key, and data encrypted with the private key can only be decrypted with the public key. Public key encryption is also known as asymmetric encryption. It is widely used, especially for TLS/SSL, which makes HTTPS possible.

Public key encryption is extremely useful for establishing secure communications over the Internet (via HTTPS). A website’s SSL/TLS certificate, which is shared publicly, contains the public key, and the private key is installed on the origin server – it’s “owned” by the website.

Public key encryption

Source: @cloudflare https://www.cloudflare.com/learning/ssl/how-does-public-key-encryption-work/ Instead of one key, two keys go with this lock: Key No. 1 can only turn to the left; Key No. 2 can only turn to the right.

Public key cryptography can seem complex to the uninitiated; fortunately, a writer named Panayotis Vryonis came up with an analogy based on a trunk with a lock that two people, Bob and Alice, use to ship documents back and forth. To learn more, read this interesting article in full. Great read!
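The two-key lock idea can also be shown with textbook RSA numbers (a toy example with tiny, insecure parameters, purely to illustrate how the keys invert each other):

```python
# Classic textbook RSA parameters: p=61, q=53, n=p*q, e*d = 1 mod phi(n).
n, e, d = 3233, 17, 2753   # public key: (n, e); private key: (n, d)

message = 65
ciphertext = pow(message, e, n)        # encrypt with the public key
plaintext = pow(ciphertext, d, n)      # only the private key decrypts it
print(plaintext == message)

signature = pow(message, d, n)         # sign with the private key
print(pow(signature, e, n) == message) # anyone verifies with the public key
```

Real TLS uses key sizes of 2048+ bits and padding schemes; the tiny numbers here only demonstrate the symmetry between the two keys.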

[Read More]

Conversational AI updates that help you build sophisticated and personalized experiences

Categories

Tags big-data robotics iot azure machine-learning

Now, more than ever, developers need to respond to the rapidly increasing demand from customers for support and accurate information - meeting them where they are – any time of the day and on an expanding range of platforms and devices. Azure AI has met unprecedented demand, underpinning over 1500 Covid-19 related bots via the Microsoft Health Bot service alone, in addition to the over 1.25 billion messages per month already handled by Azure Bot Service. By GaryPrettyMsft.

Bot Framework Composer is a new open source, visual authoring canvas for developers to design and build conversational experiences. Composer focuses the bot creation process more on conversation design and less on the scaffolding required to begin building awesome bots. Composer easily brings together the common components required to build bots such as the ability to define Language Understanding models, integrate with QnA Maker and build sophisticated composition of bot replies using Language Generation.

Composer also supports building Bot Framework Skills (bots that can perform a set of tasks for another bot), allowing for re-usability and componentising bot solutions as their complexity and surface area increase. Skills built with Composer can be consumed by other bots built with Composer or with the Bot Framework SDK, as well as from Power Virtual Agents.

Azure Cognitive Services brings AI within reach of every developer—without requiring machine-learning expertise. At Build 2020, we made several announcements related to new features and improvements across the Cognitive Services used within the Conversational AI eco-system. Nice one!

[Read More]

Scaling up with Elixir

Categories

Tags elixir programming functional-programming erlang

It’s exciting when your website or platform gets a spike in traffic – that is, unless it fails. A disappointing web experience can quickly encourage new users to look elsewhere for an alternative that is faster and more responsive to their needs. By Nathan Long.

One important factor is how your code handles concurrency – that is, its ability to juggle multiple tasks at once (read more about concurrency here). And as many companies have shown, for concurrent programming, you can’t beat the Erlang virtual machine.

For example, you may have heard that 90% of all internet traffic goes through Erlang-controlled nodes, with Cisco alone shipping 2 million devices a year that use Erlang. Or that WhatsApp scaled to serve 900 million users with an Erlang service written by just 50 engineers.

The article deals with:

  • Concurrency-Oriented Programming
  • Non-blocking IO
  • Non-blocking computation
  • Error isolation: if an error occurs in one process, it can fail or be restarted without other processes being affected at all
  • Superpowers applied
  • Simpler stacks
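The error-isolation point can be illustrated outside Elixir as well; a rough Python analogue (illustrative only, and not how the BEAM actually works) shows one failing task leaving its siblings unaffected:

```python
from concurrent.futures import ThreadPoolExecutor

def work(x):
    # One task crashes; the others keep running independently.
    if x == 2:
        raise ValueError("task 2 crashed")
    return x * 10

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(work, x) for x in (1, 2, 3)]

results = []
for f in futures:
    try:
        results.append(f.result())
    except ValueError as err:
        results.append(f"isolated failure: {err}")

print(results)
```

On the Erlang VM this isolation is built into lightweight processes and supervisors, so no try/except plumbing is needed at the call site.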

Phoenix (web framework) has a tiny memory footprint and can happily run on a Raspberry Pi Zero as the interface of a Nerves project. To be clear, Elixir is no silver bullet. Writing scalable software always requires thought and care. You will also get plenty of links to further reading. Good read!

[Read More]

The nature of machine learning projects

Categories

Tags machine-learning big-data data-science

Michael Ohlsson's article about building a data-driven product. Building a data-driven product differs in many ways from how one would create a more conventional software product. A machine learning system is still a software system, but the process used to develop the system is different.

These differences are very important for all the stakeholders to understand, and having a common view on this is key for a project to be successful. With this post the author briefly explains the machine learning process and why it calls for a different approach and mindset. To adopt this mindset, the most important thing to understand is that developing a machine learning system is much more like a scientific process than a traditional software development process. However, the whole solution still requires a lot of software engineering practice.

In traditional programming, you write down all the rules that the program needs in order to perform a specific task and produce the desired result. The program takes some data as input, processes it as stated by the rules, and will hopefully, in the end, return the correct result. In contrast, a machine learning system is trained rather than programmed explicitly. The input to such a system is not just the data but also the expected result for that data, and the output is then a set of rules (also called a model in machine learning vocabulary).
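A toy contrast (illustrative only, not from the article) of explicit rules versus rules derived from data plus expected results:

```python
# Traditional programming: the rule is written down explicitly.
def is_spam_rule(text):
    return "free money" in text

# Machine learning: data and expected results go in, rules (a model) come out.
# The "training" below is a deliberately minimal stand-in: it learns which
# words appear only in the spam examples.
examples = [
    ("claim free money now", 1),
    ("meeting at noon", 0),
    ("free money inside", 1),
    ("lunch tomorrow", 0),
]

def train(labeled):
    spam = set().union(*(set(t.split()) for t, y in labeled if y == 1))
    ham = set().union(*(set(t.split()) for t, y in labeled if y == 0))
    return spam - ham  # the learned "rules": words unique to spam

model = train(examples)

def is_spam_learned(text, model):
    return any(word in model for word in text.split())

print(is_spam_learned("free money for you", model))
print(is_spam_learned("see you at lunch", model))
```

The learned rules are only as good as the examples, which is exactly why the scientific, experiment-driven mindset the author describes matters.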

Machine learning is not like any other technology and to boil down all this to its core components we could consider a few important rules:

  • Create a common ground of understanding, this will ensure the right mindset
  • State early how progress should be measured
  • Communicate clearly how different machine learning concepts work
  • Acknowledge and consider the inherited uncertainty, it is part of the process

You will also get resources for further reading. Great read!

[Read More]