Before explaining the differences, we briefly define YARN and ZooKeeper for readers who are unfamiliar with them, and explain why each is used when working with Hadoop.
YARN (Yet Another Resource Negotiator) is the cluster resource management and job scheduling layer of the Hadoop 2 architecture. Some major features of YARN:
- A global ResourceManager tracks the resources available across the cluster and allocates them among competing applications.
- A NodeManager on each worker node launches containers and reports their resource usage back to the ResourceManager.
- Each application gets its own ApplicationMaster, which negotiates containers from the ResourceManager and monitors that application's tasks.
- Because resource management is separated from the processing model, frameworks other than MapReduce (for example Spark or Tez) can share the same cluster.
ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications.
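To make the "configuration and naming service" idea more concrete, here is a minimal in-memory sketch of a ZooKeeper-style namespace. It is a toy model only, not the real ZooKeeper client API: the class `ToyZNodeStore`, its methods, and the paths `/config` and `/active-rm` are all hypothetical names chosen for illustration. It shows two behaviours ZooKeeper is known for: versioned writes, and ephemeral nodes that disappear when the session that created them ends.

```python
class ToyZNodeStore:
    """In-memory stand-in for a ZooKeeper-style namespace (hypothetical, not the real API)."""

    def __init__(self):
        # path -> (data, version, owner session id or None for persistent nodes)
        self._nodes = {"/": (b"", 0, None)}

    def create(self, path, data=b"", ephemeral_owner=None):
        # A node can only be created under an existing parent.
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self._nodes:
            raise KeyError(f"parent {parent} does not exist")
        if path in self._nodes:
            raise ValueError(f"{path} already exists")
        self._nodes[path] = (data, 0, ephemeral_owner)

    def exists(self, path):
        return path in self._nodes

    def get(self, path):
        data, version, _ = self._nodes[path]
        return data, version

    def set(self, path, data, expected_version):
        # Conditional write: rejected if someone else updated the node first.
        _, version, owner = self._nodes[path]
        if version != expected_version:
            raise ValueError("version mismatch")
        self._nodes[path] = (data, version + 1, owner)

    def close_session(self, session_id):
        # Ephemeral nodes vanish when the session that created them ends.
        owned = [p for p, (_, _, o) in self._nodes.items() if o == session_id]
        for path in owned:
            del self._nodes[path]


store = ToyZNodeStore()
store.create("/config", b"replication=3")                        # shared configuration
store.create("/active-rm", b"rm1", ephemeral_owner="session-rm1")  # liveness marker

data, version = store.get("/config")
store.set("/config", b"replication=2", expected_version=version)

store.close_session("session-rm1")  # simulates the owning daemon going away
```

In a real deployment the store is replicated across the ZooKeeper ensemble and clients are notified of changes through watches; the sketch leaves both out to keep the core read/write model visible.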
Many people ask what the difference is between YARN and ZooKeeper when working in Hadoop 2. Let's discuss:
- YARN manages a cluster of nodes from the perspective of resource allocation, coordination, and scheduling. ZooKeeper is a cluster of its own, typically with 3 or 5 nodes, and does not manage any cluster outside itself. Superficially it behaves like a small database: it accepts reads and writes and keeps them consistent.
- YARN is the next-generation MapReduce framework (MRv2), and its primary job is to accept jobs and run them on the Hadoop cluster, so it primarily farms out and manages the cluster workload. ZooKeeper provides a distributed configuration service, a synchronization service, and a naming registry for distributed systems. Many daemons (including YARN) use it to coordinate their peers in a multi-node setup for high availability.
- With YARN, the JobTracker of Hadoop 1 has been split into a global ResourceManager and a per-application ApplicationMaster. This separates the JobTracker's two major tasks: resource management and job scheduling/monitoring.
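The division of labour on the YARN side can be sketched in a few lines. This is a simplified toy model, not the YARN API: `ToyNode`, `ToyResourceManager`, `ToyApplicationMaster`, and the first-fit placement rule are all assumptions made for illustration (real YARN uses pluggable schedulers and considers more than memory). The point is only to show the cluster-wide resource bookkeeping living in one component and the per-application task tracking living in another.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    name: str
    free_mb: int


@dataclass
class ToyResourceManager:
    """Cluster-wide view: knows every node and hands out containers."""
    nodes: list
    allocations: dict = field(default_factory=dict)  # app_id -> [(node name, mb)]

    def allocate(self, app_id, mb):
        # First-fit placement (an assumption; not how real YARN schedulers decide).
        for node in self.nodes:
            if node.free_mb >= mb:
                node.free_mb -= mb
                self.allocations.setdefault(app_id, []).append((node.name, mb))
                return node.name
        return None  # no node has enough free memory


class ToyApplicationMaster:
    """Per-application view: requests containers and tracks its own tasks."""

    def __init__(self, app_id, rm):
        self.app_id = app_id
        self.rm = rm
        self.running = []

    def run_tasks(self, count, mb_per_task):
        for i in range(count):
            node = self.rm.allocate(self.app_id, mb_per_task)
            if node is None:
                break  # cluster is full; remaining tasks would have to wait
            self.running.append((f"task-{i}", node))
        return self.running


rm = ToyResourceManager(nodes=[ToyNode("node1", 4096), ToyNode("node2", 2048)])
am = ToyApplicationMaster("app-1", rm)
placed = am.run_tasks(count=4, mb_per_task=2048)
print(placed)
```

With 6144 MB of total capacity and 2048 MB per task, only three of the four requested tasks can be placed. Note that nothing in this model stores shared configuration or elects a leader; that is the kind of coordination YARN delegates to ZooKeeper.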