Trying out Containerized Applications on Apache Hadoop YARN 3.1


Shane Kumpf, Vinod Kumar Vavilapalli, and Saumitra Buragohain from Hortonworks wrote a series of articles about Hadoop. This is the fifth post in the series; in it, they explore running Docker containers on YARN for faster time to market and faster time to insights for data-intensive workloads at scale.

The Apache Hadoop YARN community has been hard at work enabling the building blocks for running a diverse set of data-intensive workloads at unprecedented scale. The rich scheduling and multi-tenancy features already present in YARN, combined with containerization, open up many more use cases.

Hadoop has three core components, plus ZooKeeper if you want to enable high availability:

  • Hadoop Distributed File System (HDFS)
  • MapReduce
  • Yet Another Resource Negotiator (YARN)

YARN (Yet Another Resource Negotiator) is the framework responsible for assigning computational resources for application execution.

In the article you will find instructions on how to go about:

  • Setting up the YARN ResourceManager and scheduler
  • Setting up YARN containerization on the NodeManager
  • Setting up the YARN NodeManager (yarn-site.xml)
  • Setting up the YARN container-executor binary (container-executor.cfg)
  • Running a containerized distributed shell
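
To give a feel for the NodeManager step, here is a minimal sketch of the kind of yarn-site.xml entries involved in enabling the Docker runtime. It is not taken from the original article. The property names follow the Apache Hadoop 3.1 documentation for launching Docker containers on YARN; the values are illustrative assumptions and should be checked against your own cluster and security requirements.

```xml
<!-- Sketch of a yarn-site.xml fragment; values are assumptions, not the article's settings -->
<configuration>
  <!-- The Docker runtime requires the Linux container executor -->
  <property>
    <name>yarn.nodemanager.container-executor.class</name>
    <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
  </property>
  <!-- Allow both the default (process-based) runtime and Docker -->
  <property>
    <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
    <value>default,docker</value>
  </property>
  <!-- Keep privileged containers disabled unless there is a clear need -->
  <property>
    <name>yarn.nodemanager.runtime.linux.docker.privileged-containers.allowed</name>
    <value>false</value>
  </property>
</configuration>
```

The container-executor.cfg file needs matching Docker settings as well; the original article and the Hadoop documentation cover those in detail.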

A detailed tutorial that gets you started with Dockerized Hadoop. Security considerations are also covered. Good read!


Tags big-data data-science database