What is Hadoop?

Hadoop is a framework for distributed computing and storage. For a long time, companies stored information on servers in big databases. If you have row- and column-based data, this works really well. However, in the last few years, some companies have wanted to store far more data than that.

Let’s imagine a small eCommerce company early in its life. The only things it stores are user information and purchase/order history. If it has 1 million users and every user makes 5 purchases a year, the user data is only 1 million records and the order database is 5 million records (and grows by only 5 million records a year).

Some time later, the company decides to store every product a user adds to the shopping cart but never purchases, so it starts storing aborted purchase information, which is 10 million records a year. Then it decides to store every product every user looks at. Every user might look at 10 products before each completed or aborted purchase, so that is 150 million records per year. Then it decides to store every single click a user makes. It isn’t very long before the company wants to store billions or trillions of records every year.
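To make the arithmetic in this example concrete, here is a small Python sketch. The figures come straight from the paragraphs above; the variable and function names are made up for illustration, and the 150 million figure assumes the 10 product views apply to both completed and aborted purchases.

```python
# Figures from the example above; all names here are illustrative.
USERS = 1_000_000
PURCHASES_PER_USER_PER_YEAR = 5
ABORTED_PURCHASES_PER_YEAR = 10_000_000
PRODUCT_VIEWS_PER_PURCHASE = 10  # views before each completed or aborted purchase


def yearly_records():
    """Return the number of new records per year for each kind of data."""
    completed = USERS * PURCHASES_PER_USER_PER_YEAR
    aborted = ABORTED_PURCHASES_PER_YEAR
    # Every completed or aborted purchase is preceded by some product views.
    views = (completed + aborted) * PRODUCT_VIEWS_PER_PURCHASE
    return {
        "completed_orders": completed,
        "aborted_purchases": aborted,
        "product_views": views,
    }


if __name__ == "__main__":
    for name, count in yearly_records().items():
        print(f"{name}: {count:,}")
```

Each new kind of data multiplies the record count, which is why click-level tracking pushes the totals into the billions so quickly.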
This is where traditional storage structures typically stop working. A single database with billions of records is going to be so large that, even if it works, it’s going to be slow.

Hadoop was built to overcome this situation: it processes large volumes of data using distributed storage and distributed processing.

Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs.
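The idea of splitting data across machines and processing each piece independently can be sketched in a few lines of Python. This is only a toy illustration and not Hadoop's API: threads on one machine stand in for cluster nodes, and the function names and sample click records are invented for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_nodes):
    """Split records into num_nodes chunks, standing in for distributed storage."""
    return [records[i::num_nodes] for i in range(num_nodes)]


def count_clicks(chunk):
    """Work done independently on one 'node': count clicks per product."""
    return Counter(product for _user, product in chunk)


def distributed_count(records, num_nodes=3):
    """Process every chunk in parallel, then merge the partial results."""
    chunks = partition(records, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(count_clicks, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


if __name__ == "__main__":
    # Hypothetical (user, product) click records.
    clicks = [
        ("u1", "shoes"), ("u2", "hat"), ("u1", "hat"),
        ("u3", "shoes"), ("u2", "shoes"),
    ]
    print(distributed_count(clicks))
```

Because each chunk is counted on its own, adding more nodes lets the same job handle more records without any single machine having to hold all of them.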

Hadoop was created by Doug Cutting and Mike Cafarella in 2005. Cutting, who joined Yahoo! in 2006, named it after his son’s toy elephant. It was originally developed to support distributed processing for the Nutch search engine project.
