Category: Hadoop

Storm Components

August 13, 2016 by S4

Filed under Hadoop, Storm

Last modified August 13, 2016

Topology: Just as you run “Map-Reduce jobs” on Hadoop, you run “topologies” on Storm. The key difference is that a MapReduce job eventually finishes, whereas a topology runs forever (until you kill it). Nimbus: The master node runs a daemon called “Nimbus” that is similar to Hadoop’s “JobTracker”. Nimbus is responsible for distributing code around the cluster, assigning tasks to machines, and monitoring for …

Installing a Storm Cluster

August 13, 2016 by S4

Filed under Hadoop, Storm

Last modified August 13, 2016

The prerequisites for setting up the cluster are a Linux operating system, Java 6, and Python. Installation steps: The following steps are needed to get a Storm cluster up and running. Set up a ZooKeeper cluster: ZooKeeper is used as a coordinator in a Storm cluster. You can refer here to see the installation steps for ZooKeeper. Install native dependencies on …

Interview Questions of Hive

August 12, 2016 by Arti Khedkar

Filed under Hadoop, Hive

Last modified August 12, 2016

What is Hive? Where is Hive best suited? What are the different types of tables available in Hive? Is Hive suitable for OLTP systems? Why? Can a table be renamed in Hive? Can we change the data type of a column in a Hive table? …

Hadoop Administrator Interview Questions

July 11, 2016 by S4

Filed under Hadoop

Last modified December 1, 2016

Can you describe your current roles and responsibilities, or your day-to-day activities? Please describe the YARN architecture. What is NameNode heap memory, and how can we configure it? How do you install a Hadoop cluster? Please describe in detail which services and components you install during Hadoop installation. How do you enable repository during installation …

Hadoop Distributed File System (HDFS) for Big Data

July 11, 2016 by S4

Filed under Hadoop

Last modified July 15, 2016

1.1. Introduction: Hadoop provides a distributed filesystem and a framework for the analysis and transformation of very large data sets using the MapReduce [DG04] paradigm. While the interface to HDFS is patterned after the Unix filesystem, faithfulness to standards was sacrificed in favor of improved performance for the applications at hand. An important characteristic of Hadoop is …

Cloudera Hadoop cluster demo VM

July 11, 2016 by S4

Filed under Hadoop

Last modified July 15, 2016

This document will help Java developers get started with running a Hadoop cluster VM. To follow it, you need basic theoretical knowledge of Hadoop, HDFS, and MapReduce jobs. Some prior knowledge of basic Linux commands is also advisable. It is possible to try sample MapReduce jobs on your Windows PC without any …

Analyzing Apache logs with Pig

June 27, 2016 by S4

Filed under Hadoop, Pig

Last modified June 27, 2016

Analyzing log files, churning through them, and extracting meaningful information is a potential use case in Hadoop. We don’t have to resort to MapReduce programming for these analyses; instead, we can use tools like Pig and Hive. I’ll just give you a starting point for the analysis part. Let us consider Pig for Apache log …

Implementing basic SQL Update statement in Hive

June 27, 2016 by S4

Filed under Hadoop, Hive

Last modified June 27, 2016

Hive is not meant for point-to-point queries, so SQL update functionality is rarely required in Hive. That is probably why Hive doesn’t have update functionality for rows, or rather for individual columns in a row. There may be cases where you find a much more suitable use case in Hive, but the same can’t be …

Joins with plain Map Reduce or Multiple Inputs

June 27, 2016 by S4

Filed under Hadoop, Hive

Last modified June 27, 2016

As a MapReduce developer, I’d never recommend writing joins of data sets using custom MapReduce code. Hadoop already offers very intelligent and powerful tools, such as Hive and Pig, that can easily join huge data sets with your choice of join (inner, outer, etc.). But if such a scenario arises where you …

Optimizing Joins in Hive / Sorting Java Heap Issues with Hive Joins

June 27, 2016 by S4

Filed under Hadoop, Hive

Last modified June 27, 2016

In Hadoop we tend to use Hive extensively, since it is an SQL-like language and makes it easier to frame our jobs over stored structured data. (Pig is great too, but it takes a little time to get comfortable with Pig Latin.) But as beginners we often …