// Introduction to Apache Spark and its Datasets / codeisgo.com

In this article, we will introduce you to the big data ecosystem and the role Apache Spark plays in it. We will also cover distributed database systems, the backbone of big data. In today's world, data is fuel: almost every electronic device collects data that is used for business purposes. By Abhishek Jaiswal.

The article also discusses Resilient Distributed Datasets (RDDs) and their transformations and actions:

What is Apache Spark?
Apache Spark Architecture
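Spark RDDs are immutable: they can't be modified, only replaced by new RDDs derived through transformations
Spark RDDs are lazily evaluated: transformations are only recorded until an action runs, and this recorded lineage lets Spark rebuild lost data
Spark supports distributed SQL (Spark SQL), which is built on top of RDDs
Spark supports various machine learning workloads (through MLlib and external libraries), including CNNs and NLP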
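The immutability and lazy-evaluation points can be illustrated with a small, single-machine Python sketch. `ToyRDD` is a hypothetical class written only for illustration and is not part of Spark. Transformations (`map`, `filter`) return a new object that records its lineage, and only actions (`collect`, `count`) trigger any work. Real Spark RDDs are also partitioned across a cluster and use lineage to recompute lost partitions. This sketch models only the bookkeeping.

```python
from typing import Callable, Iterator, List, Optional


class ToyRDD:
    """Toy, single-machine stand-in for an RDD (hypothetical; not Spark's API)."""

    def __init__(self, source: Optional[List] = None,
                 parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[Iterator], Iterator]] = None,
                 name: str = "source") -> None:
        # Store the data as a tuple so the source cannot be mutated in place.
        self._source = tuple(source) if source is not None else None
        self._parent = parent  # lineage: which dataset this one derives from
        self._fn = fn          # lineage: how to derive it from the parent
        self.name = name

    # Transformations: return a NEW ToyRDD and compute nothing yet.
    def map(self, f: Callable) -> "ToyRDD":
        return ToyRDD(parent=self, fn=lambda it: (f(x) for x in it), name="map")

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(parent=self, fn=lambda it: (x for x in it if pred(x)), name="filter")

    def lineage(self) -> List[str]:
        # Walk back through the parents to list the recorded steps.
        if self._parent is None:
            return [self.name]
        return self._parent.lineage() + [self.name]

    def _compute(self) -> Iterator:
        # Replay the lineage from the source. This runs only when an action asks.
        if self._parent is None:
            return iter(self._source)
        return self._fn(self._parent._compute())

    # Actions: trigger the actual computation.
    def collect(self) -> List:
        return list(self._compute())

    def count(self) -> int:
        return sum(1 for _ in self._compute())


calls: List[int] = []


def square(x: int) -> int:
    calls.append(x)  # record each time the function actually runs
    return x * x


base = ToyRDD([1, 2, 3, 4, 5])
evens_squared = base.filter(lambda x: x % 2 == 0).map(square)
print(calls)                    # [] : only lineage has been recorded so far
print(evens_squared.lineage())  # ['source', 'filter', 'map']
print(evens_squared.collect())  # [4, 16]
print(calls)                    # [2, 4] : square ran only for the filtered items
print(base.collect())           # [1, 2, 3, 4, 5] : the parent is unchanged
```

In PySpark, the same chain would use the real RDD API: `sc.parallelize([1, 2, 3, 4, 5]).filter(...).map(...).collect()`.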

The Spark architecture consists of a driver node (which holds the SparkContext), a cluster manager, and worker nodes. Like Hadoop, Spark works in a distributed manner. Unlike Hadoop MapReduce, however, it keeps intermediate data in memory instead of writing it to disk. Good read!
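The driver/worker split and the in-memory point can be sketched without a cluster. In this example, a thread pool stands in for the cluster manager and its executors. The names `split_into_partitions`, `executor_task`, and `driver` are hypothetical and chosen for illustration; they are not Spark APIs. The driver partitions the input, each worker computes a partial result, and the driver combines the partial results in memory rather than writing them to disk between steps.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_into_partitions(data: List[int], num_partitions: int) -> List[List[int]]:
    """Driver-side step (hypothetical helper): split the input into roughly equal chunks."""
    if not data:
        return []
    size = -(-len(data) // num_partitions)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def executor_task(partition: List[int]) -> int:
    """Work done by one simulated executor: the sum of squares of its partition."""
    return sum(x * x for x in partition)


def driver(data: List[int], num_workers: int = 3) -> int:
    partitions = split_into_partitions(data, num_workers)
    # The thread pool plays the role of the cluster manager handing tasks to executors.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_results = list(pool.map(executor_task, partitions))
    # Partial results return to the driver as in-memory values; nothing touches disk.
    return sum(partial_results)


if __name__ == "__main__":
    print(driver(list(range(1, 11))))  # 385
```

Real Spark also schedules tasks across machines, tracks lineage for fault tolerance, and can spill data to disk when memory runs short. The sketch ignores all of that.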

