Storm Components

August 13, 2016 by S4

Filed under Hadoop, Strom

Last modified August 13, 2016

Storm Components

  • Topology : As on Hadoop, you run “Map-Reduce jobs”, on Storm, you will run ‘Topologies’. Key difference between both is : MapReduce job eventually finished, whereas a topology runs forever(until you kill it
  • Nimbus :  master node runs a daemon called “Nimbus” that is similar to Hadoop’s “JobTracker”. Nimbus is responsible for distributing code around the cluster, assigning tasks to machines, and monitoring for failures.
  • Supervisor :  Each worker node runs a daemon called the “Supervisor”. The supervisor listens for work assigned to its machine and starts and stops worker processes as necessary based on what Nimbus has assigned to it. Each worker process executes a subset of a topology; a running topology consists of many worker processes spread across many machines.
  • Stream : A stream is an unbounded sequence of tuples. Storm processes transforming a stream into a new stream in a distributed and reliable way. For example, transforming  tweets stream into a stream of trending topics.
  • Spout: It’s a source of streams. It reads tuple from external source and emits those as stream in the topology.
  • Bolt : It consumes input streams, does some processing, and possibly emits new streams. Complex stream transformations, like computing a stream of trending topics from a stream of tweets, require multiple steps and thus multiple bolts. Bolts can do anything from run functions, filter tuples, do streaming aggregations, do streaming joins, talk to databases etc.

 

Zookeeper cluster works as coordinator between Nimbus and supervisors. Nimbus and supervisor are stateless and fail-fast. All states are kept in Zookeeper or local disk.

Leave a Comment