Hadoop Administrator Interview Questions

July 11, 2016 by S4

Filed under Hadoop

Last modified December 1, 2016

  1. Can you describe your current role and responsibilities, or your day-to-day activities?
  2. Please describe the YARN architecture.
  3. What is NameNode heap memory, and how do you configure it?
  4. How do you install a Hadoop cluster? Please describe it in detail: which services and components do you install during a Hadoop installation?
  5. How do you enable the repository during installation, and what details do you provide there?
  6. What is the Metastore, and how do you connect to it?
  7. If the Hive Metastore service is down, what will be the impact on the Hadoop cluster?
  8. Do we install the Hive service on every node in the Hadoop cluster?
  9. What is Beeline?
  10. What is HiveServer2?
  11. How do you connect to Hive through Beeline?
  12. What is a Thrift client?
  13. What are the JobTracker and the ResourceManager?
  14. What is the ZooKeeper service used for, and why do we need it?
  15. How do you troubleshoot a NameNode that is down in Hadoop version 1 and in Hadoop version 2?
  16. How do you troubleshoot services that are down in a Hadoop cluster?
  17. How do you troubleshoot a slow-running job?
  18. What are the benefits of using YARN?
  19. Is it possible to run MRv1 and MRv2 on a single cluster?
  20. What is the FIFO scheduler?
  21. What is the Capacity Scheduler?
  22. What is the difference between the FIFO scheduler and the Capacity Scheduler?
  23. How do you execute a job on a cluster using the FIFO scheduler?
  24. How do you identify a long-running job in a large, busy cluster?
  25. How do you kill a Hadoop job if the cluster is configured with the Capacity Scheduler?
  26. What is a Kerberos realm, and how do you define it?
  27. How do you define and create a Kerberos principal?
  28. How do you add a new user to a Hadoop cluster?
  29. How do you grant a user permissions on a particular directory in a Hadoop cluster?
  30. How do we decide the heap memory limit for Hadoop?
  31. How do we decide the heap memory limit for the NameNode?
  32. How do you increase the NameNode heap memory?
  33. What is the Standby NameNode, and what is a high-availability Hadoop cluster?
  34. How do you resolve a connectivity issue between the Active NameNode and the Standby NameNode? What will be the impact on the Hadoop cluster, and will the Standby NameNode try to become active?
  35. A few DataNodes are running slowly. What will be the impact on the jobs running on those DataNodes, and on overall cluster performance?
  36. What is the difference between a dead node and a blacklisted node, and how does a node become blacklisted?
  37. How does the NameNode decide which node is dead?
  38. What is speculative execution, and what does it do?
  39. How do you schedule jobs in a Hadoop cluster?
  40. Which version of MapReduce are you using?
  41. What is the difference between MapReduce version 1 and MapReduce version 2?
  42. How do you identify a long-running job, and how do you troubleshoot it?
  43. How do you kill a job?
  44. How do you add a service or install a component in an existing Hadoop cluster?
  45. How do you restart the NameNode?
  46. How do you add or remove a DataNode in a Hadoop cluster? What are the steps, and which files do you edit?
  47. What is Hive, and what work have you done on Hive?
  48. What is Oozie, and how do you use it?
  49. What schedulers are available in Hadoop?
  50. When you submit a Spark job in Hadoop 2.x, how does Spark interact with YARN, and how are resources negotiated for Spark in YARN?
  51. What is SparkContext, and what is it used for?
  52. Why can a Spark job run only on Hadoop 2.x and not on 1.x?
  53. What is the default YARN scheduler?
  54. How do jobs get scheduled in YARN, and which component is responsible for it? How do containers do resource allocation in YARN?
  55. If you submit a Spark job to a Hadoop cluster, how do the containers do the resource negotiation for the Spark job?
  56. How do you troubleshoot a DataNode that is down, and which log files do you check?
  57. How do you increase the storage capacity of a Hadoop cluster?
  58. What happens after adding a new DataNode to a Hadoop cluster?
  59. What is the balancer, and how do you schedule it?
  60. You try to log in to a machine in your cluster and get a timeout exception. What could be the issue, and what steps would you take to resolve it?
  61. How do you start a process in Linux?
  62. In which cases is speculative execution not beneficial?
  63. When we run a MapReduce job, which processes are involved on the mapper side before the data goes to the reducer?
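
For the questions on NameNode heap memory: a commonly quoted rule of thumb (not an official formula, and it varies with version and workload) is roughly 1 GB of NameNode heap per million HDFS blocks. The heap itself is set with the JVM -Xmx flag in the NameNode options in hadoop-env.sh (HADOOP_NAMENODE_OPTS in Hadoop 2, HDFS_NAMENODE_OPTS in Hadoop 3). The sketch below only does the arithmetic; the 1 GB per million figure and the 1.5x headroom factor are assumptions to tune, not Hadoop defaults.

```python
import math


def estimate_namenode_heap_gb(num_blocks: int,
                              gb_per_million_blocks: float = 1.0,
                              headroom: float = 1.5) -> int:
    """Rough NameNode heap estimate in whole GB.

    gb_per_million_blocks: rule-of-thumb assumption, not a Hadoop setting.
    headroom: hypothetical safety factor for growth and GC overhead.
    """
    if num_blocks < 0:
        raise ValueError("num_blocks must be non-negative")
    raw_gb = num_blocks / 1_000_000 * gb_per_million_blocks
    # Round up and never go below 1 GB.
    return max(1, math.ceil(raw_gb * headroom))


if __name__ == "__main__":
    heap = estimate_namenode_heap_gb(20_000_000)
    # The result would then be passed to the NameNode JVM as -Xmx<N>g.
    print(f"-Xmx{heap}g")
```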
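
For the FIFO versus Capacity Scheduler questions: the FIFO scheduler runs applications from a single queue in submission order, while the Capacity Scheduler divides the cluster among queues, each guaranteed a percentage of its parent's resources through yarn.scheduler.capacity.root.<queue>.capacity in capacity-scheduler.xml, with sibling capacities summing to 100. The sketch below computes each top-level queue's guaranteed memory from those percentages. The queue names and cluster size are made up, and it ignores elasticity (maximum-capacity), user limits and vcores.

```python
from typing import Dict


def guaranteed_memory_mb(cluster_memory_mb: int,
                         queue_capacity_pct: Dict[str, float]) -> Dict[str, int]:
    """Guaranteed memory per top-level queue under the Capacity Scheduler.

    queue_capacity_pct maps queue name -> yarn.scheduler.capacity.root.<queue>.capacity.
    """
    total = sum(queue_capacity_pct.values())
    if abs(total - 100.0) > 1e-9:
        raise ValueError(f"sibling queue capacities must sum to 100, got {total}")
    return {queue: int(cluster_memory_mb * pct / 100)
            for queue, pct in queue_capacity_pct.items()}


if __name__ == "__main__":
    # Hypothetical 100 GB cluster split across three queues.
    print(guaranteed_memory_mb(102_400, {"default": 50, "etl": 30, "adhoc": 20}))
```

Under FIFO there is no such split: a large job submitted first can occupy the whole cluster while later jobs wait behind it.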
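
For the Kerberos questions: a principal has the form primary/instance@REALM, where the instance part is optional (user principals usually omit it) and the realm is conventionally the uppercase DNS domain, for example EXAMPLE.COM. Realms are listed in krb5.conf, and with MIT Kerberos a principal is created with the addprinc command of kadmin. The parser below only illustrates that naming format; it ignores escaped characters, and the host and realm in the example are hypothetical.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str             # user or service name, e.g. "alice" or "nn"
    instance: Optional[str]  # usually a host FQDN for service principals
    realm: str               # e.g. "EXAMPLE.COM"


def parse_principal(name: str) -> Principal:
    """Split a Kerberos principal of the form primary[/instance]@REALM."""
    if name.count("@") != 1:
        raise ValueError(f"expected exactly one '@' in {name!r}")
    left, realm = name.split("@")
    primary, sep, instance = left.partition("/")
    if not primary or not realm or (sep and not instance):
        raise ValueError(f"malformed principal {name!r}")
    return Principal(primary, instance if sep else None, realm)


if __name__ == "__main__":
    print(parse_principal("nn/master1.example.com@EXAMPLE.COM"))
    print(parse_principal("alice@EXAMPLE.COM"))
```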
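
For the questions on finding and killing long-running jobs on YARN: "yarn application -list -appStates RUNNING" lists running applications, "yarn application -kill <application ID>" kills one, and the ResourceManager REST API at /ws/v1/cluster/apps?states=RUNNING returns the list as JSON, with each application's elapsedTime in milliseconds. The sketch below filters such a response after it has been fetched; the sample JSON is made up and trimmed to the fields used, and the two-hour cutoff is arbitrary.

```python
import json
from typing import List, Tuple

MS_PER_HOUR = 3_600_000


def long_running_apps(rm_response: str, min_hours: float) -> List[Tuple[str, str, float]]:
    """Return (application id, user, hours running) for RUNNING apps older than min_hours.

    rm_response is the JSON body from the ResourceManager's /ws/v1/cluster/apps endpoint.
    """
    data = json.loads(rm_response)
    # "apps" may be null when no applications match, so guard against None.
    apps = (data.get("apps") or {}).get("app", [])
    slow = [
        (app["id"], app["user"], app["elapsedTime"] / MS_PER_HOUR)
        for app in apps
        if app.get("state") == "RUNNING" and app["elapsedTime"] > min_hours * MS_PER_HOUR
    ]
    # Longest-running first, so the worst offender is at the top.
    return sorted(slow, key=lambda row: row[2], reverse=True)


# Hypothetical, trimmed ResourceManager response.
SAMPLE = """{"apps": {"app": [
  {"id": "application_1469000000000_0001", "user": "etl", "state": "RUNNING", "elapsedTime": 18000000},
  {"id": "application_1469000000000_0002", "user": "adhoc", "state": "RUNNING", "elapsedTime": 600000}
]}}"""

if __name__ == "__main__":
    for app_id, user, hours in long_running_apps(SAMPLE, min_hours=2):
        print(f"{app_id} ({user}) running {hours:.1f} h; kill with: yarn application -kill {app_id}")
```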
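
For the question on how the NameNode decides that a node is dead: DataNodes send a heartbeat every dfs.heartbeat.interval seconds (3 by default), and the NameNode marks a DataNode dead once no heartbeat has arrived for 2 * dfs.namenode.heartbeat.recheck-interval + 10 * dfs.heartbeat.interval. With the defaults (a 300,000 ms recheck interval) that is 630 seconds, or 10.5 minutes. The helper below just reproduces that arithmetic; the defaults should be checked against the hdfs-default.xml of the Hadoop version in use.

```python
def dead_node_timeout_ms(recheck_interval_ms: int = 300_000,
                         heartbeat_interval_s: int = 3) -> int:
    """Heartbeat-expiry interval after which a DataNode is considered dead.

    recheck_interval_ms  -> dfs.namenode.heartbeat.recheck-interval (milliseconds)
    heartbeat_interval_s -> dfs.heartbeat.interval (seconds)
    """
    return 2 * recheck_interval_ms + 10 * 1000 * heartbeat_interval_s


if __name__ == "__main__":
    timeout = dead_node_timeout_ms()
    print(f"{timeout} ms = {timeout / 60_000} minutes")  # 630000 ms = 10.5 minutes
```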
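
For the balancer question: the HDFS balancer ("hdfs balancer -threshold <percent>", default threshold 10) moves blocks until each DataNode's disk utilization is within the threshold of the cluster-wide average. Because it is a command-line tool, it can be scheduled, for example, with cron during off-peak hours, or run after adding new DataNodes. The simplified sketch below only classifies nodes against that band; the real balancer also plans block moves and respects bandwidth limits such as dfs.datanode.balance.bandwidthPerSec. Node names and sizes are hypothetical.

```python
from typing import Dict


def classify_datanodes(used_bytes: Dict[str, int],
                       capacity_bytes: Dict[str, int],
                       threshold_pct: float = 10.0) -> Dict[str, str]:
    """Label each DataNode 'over', 'under' or 'balanced' relative to the cluster average."""
    cluster_avg = 100.0 * sum(used_bytes.values()) / sum(capacity_bytes.values())
    labels = {}
    for node, capacity in capacity_bytes.items():
        utilization = 100.0 * used_bytes[node] / capacity
        if utilization > cluster_avg + threshold_pct:
            labels[node] = "over"      # candidate source of block moves
        elif utilization < cluster_avg - threshold_pct:
            labels[node] = "under"     # candidate target, e.g. a newly added DataNode
        else:
            labels[node] = "balanced"
    return labels


if __name__ == "__main__":
    TB = 10**12
    used = {"dn1": 90 * TB, "dn2": 50 * TB, "dn3": 1 * TB}
    capacity = {"dn1": 100 * TB, "dn2": 100 * TB, "dn3": 100 * TB}
    print(classify_datanodes(used, capacity))
```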
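
For the question on the mapper side of a MapReduce job, the flow is roughly: the mapper emits key/value pairs, each pair is assigned a reducer partition (by default with HashPartitioner, which computes (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks), pairs are collected in an in-memory buffer, sorted by partition and key when the buffer spills to disk, optionally passed through a combiner, and finally the spill files are merged into one partitioned, sorted output for the reducers to fetch. The single-process word-count sketch below mimics only the partition, sort and combine steps; it uses zlib.crc32 as a stand-in for Java's hashCode and skips buffering, spilling and merging.

```python
import zlib
from collections import defaultdict
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, List, Tuple


def partition_for(key: str, num_reducers: int) -> int:
    # HashPartitioner's formula, with crc32 standing in for key.hashCode().
    return (zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF) % num_reducers


def map_side(lines: Iterable[str], num_reducers: int) -> Dict[int, List[Tuple[str, int]]]:
    """Toy word-count map side: map -> partition -> sort -> combine."""
    # Map: emit (word, 1) for every word.
    pairs = [(word, 1) for line in lines for word in line.split()]
    # Partition + sort: order by (partition, key), as a spill does.
    pairs.sort(key=lambda kv: (partition_for(kv[0], num_reducers), kv[0]))
    # Combine: equal keys are now adjacent, so sum their counts locally.
    output: Dict[int, List[Tuple[str, int]]] = defaultdict(list)
    for word, group in groupby(pairs, key=itemgetter(0)):
        output[partition_for(word, num_reducers)].append((word, sum(c for _, c in group)))
    return dict(output)


if __name__ == "__main__":
    for reducer, records in sorted(map_side(["a b a", "b c"], num_reducers=2).items()):
        print(reducer, records)
```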


There are more questions to come; please keep visiting for updates.
