// Trying out Containerized Applications on Apache Hadoop YARN 3.1 / codeisgo.com

Shane Kumpf, Vinod Kumar Vavilapalli and Saumitra Buragohain from Hortonworks wrote a series of articles about Hadoop. This is the fifth post in the series, and in it they explore running Docker containers on YARN for faster time to market and faster time to insights for data-intensive workloads at scale.

The Apache Hadoop YARN community has been hard at work building the pieces needed to run a diverse set of data-intensive workloads at unprecedented scale. Combined with containerization, the rich scheduling and multi-tenancy features already present in YARN open up many more use cases.

Hadoop has three core components, plus ZooKeeper if you want to enable high availability:

Hadoop Distributed File System (HDFS)
MapReduce
Yet Another Resource Negotiator (YARN)

YARN (Yet Another Resource Negotiator) is the framework responsible for assigning computational resources for application execution.

In the article you will find instructions on:

Setting up the YARN ResourceManager and scheduler
Setting up YARN containerization on the NodeManager
Setting up the YARN NodeManager (yarn-site.xml)
Setting up the YARN container-executor binary (container-executor.cfg)
Running a containerized distributed shell
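The Python sketch below loosely illustrates the NodeManager and distributed-shell steps. It does two things. It renders the yarn-site.xml properties that switch the NodeManager to the LinuxContainerExecutor and allow the Docker runtime. It also assembles the argv for a distributed-shell run that requests Docker through per-container environment variables. The property names and the YARN_CONTAINER_RUNTIME_* variables follow the Apache Hadoop 3.1 Docker documentation as we understand it. Several parts are assumptions for illustration only: the property values, the jar file name, the image `library/centos:7`, and the helpers `render_yarn_site` and `dshell_command`. Check them against the documentation for your release.

```python
import xml.etree.ElementTree as ET

# NodeManager properties enabling the Docker runtime next to the default one.
# Names follow the Hadoop 3.1 Docker docs; the values are example assumptions.
DOCKER_NM_PROPERTIES = {
    "yarn.nodemanager.container-executor.class":
        "org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor",
    "yarn.nodemanager.linux-container-executor.group": "hadoop",
    "yarn.nodemanager.runtime.linux.allowed-runtimes": "default,docker",
    "yarn.nodemanager.runtime.linux.docker.allowed-container-networks":
        "host,none,bridge",
    "yarn.nodemanager.runtime.linux.docker.default-container-network": "host",
}


def render_yarn_site(properties):
    """Hypothetical helper: render a <configuration> block for yarn-site.xml."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def dshell_command(dshell_jar, image, shell_command="sleep 120", num_containers=2):
    """Hypothetical helper: argv for a containerized distributed shell run.

    The two -shell_env variables ask YARN to launch each container with the
    Docker runtime using the given image.
    """
    return [
        "yarn", "jar", dshell_jar,
        "-jar", dshell_jar,
        "-shell_command", shell_command,
        "-shell_env", "YARN_CONTAINER_RUNTIME_TYPE=docker",
        "-shell_env", f"YARN_CONTAINER_RUNTIME_DOCKER_IMAGE={image}",
        "-num_containers", str(num_containers),
    ]


if __name__ == "__main__":
    print(render_yarn_site(DOCKER_NM_PROPERTIES))
    # On a cluster node this argv could be handed to subprocess.run().
    print(" ".join(dshell_command(
        "hadoop-yarn-applications-distributedshell.jar", "library/centos:7")))
```

Keeping the command as an argv list rather than a single string avoids shell-quoting problems when it is passed to `subprocess.run()`.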

A detailed tutorial that gets you started with Dockerized Hadoop, with security considerations covered as well. Good read!
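Much of the security policy for Docker on YARN sits in container-executor.cfg, the configuration file read by the setuid container-executor binary. The Python sketch below renders that file with a [docker] section that disables privileged containers, limits networks and capabilities, and restricts mounts. It also includes a tiny check that flags privileged containers. The key names follow the Hadoop 3.1 Docker documentation as we recall it. Several parts are assumptions to adapt: the values, including the registry name, mount paths and capability list, and the helpers `render_container_executor_cfg` and `security_warnings`.

```python
# Top-level container-executor.cfg keys; values are example assumptions.
GLOBAL_SETTINGS = {
    "yarn.nodemanager.linux-container-executor.group": "hadoop",
    "banned.users": "hdfs,yarn,mapred,bin",
    "min.user.id": "1000",
}

# [docker] section keys per the Hadoop 3.1 docs; values are example assumptions.
DOCKER_SETTINGS = {
    "module.enabled": "true",
    "docker.binary": "/usr/bin/docker",
    "docker.privileged-containers.enabled": "false",
    "docker.allowed.capabilities": (
        "CHOWN,DAC_OVERRIDE,FSETID,FOWNER,MKNOD,NET_RAW,SETGID,SETUID,"
        "SETFCAP,SETPCAP,NET_BIND_SERVICE,SYS_CHROOT,KILL,AUDIT_WRITE"
    ),
    "docker.allowed.networks": "bridge,host,none",
    "docker.allowed.ro-mounts": "/sys/fs/cgroup",
    "docker.allowed.rw-mounts": "/var/hadoop/yarn/local-dir,/var/hadoop/yarn/log-dir",
    "docker.trusted.registries": "library",
}


def render_container_executor_cfg(global_settings, docker_settings):
    """Hypothetical helper: top-level key=value lines, then an indented [docker] section."""
    lines = [f"{key}={value}" for key, value in global_settings.items()]
    lines += ["", "[docker]"]
    lines += [f"  {key}={value}" for key, value in docker_settings.items()]
    return "\n".join(lines) + "\n"


def security_warnings(docker_settings):
    """Hypothetical helper: flag settings that weaken container isolation."""
    warnings = []
    if docker_settings.get("docker.privileged-containers.enabled", "false") == "true":
        warnings.append("privileged containers are enabled")
    return warnings


if __name__ == "__main__":
    print(render_container_executor_cfg(GLOBAL_SETTINGS, DOCKER_SETTINGS), end="")
    for warning in security_warnings(DOCKER_SETTINGS):
        print("WARNING:", warning)
```

Treat the rendered text as a starting point only. The real file must be owned by root and not writable by other users, because container-executor refuses insecure permissions.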

