Apache Hadoop Overview: Scalable Open Source Software
Hadoop: The Software and the Community
If you are involved in large-system computing, you have likely heard of Apache Hadoop. For those new to Hadoop: this open source software project is a platform for parallel computation. The Apache Hadoop project web page describes it this way:
"The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model."
This page is an overview to give you a taste of the Hadoop project, its history and its current popularity. It also includes pointers to good resources for understanding and using Hadoop, and to ways of getting involved in the Apache Hadoop community through discussion groups and the Hadoop mailing lists.
Logo from the Apache Hadoop Project
Go to a HUG!
For meet-ups in your area, here is a link to many Hadoop User Groups internationally:
Hadoop User Groups
Hadoop is an Apache Software Foundation Project
Hadoop is one of the most popular of the large open source software projects under the umbrella of the Apache Software Foundation and its open source license. For more information, go to Apache Software Foundation
Hadoop and Scalability
A system can achieve reliable scalability by design by using a cluster of computers rather than relying only on increasing the size of one computer. This approach has cost advantages in terms of both money and time. Hadoop is used by a wide range of major companies and is of considerable interest in the computing community.
Why a Yellow Elephant?
Doug Cutting and the Hadoop Logo
This picture of Doug Cutting is used by permission. I did not ask the elephant what he thought, but he looks pretty happy.
The yellow elephant of Hadoop is reflected in the logo of the related Apache project named Mahout. Mahout is an Indian term for someone who drives an elephant. In the case of the Apache Mahout project logo, it's a yellow elephant on which the mahout rides. For more information about Apache Mahout, see the review of the book Mahout in Action at Best Book on Mahout.
Scalability
A key to success in large data projects
Scalability refers to the ability of a system to keep performing efficiently and effectively as load grows, with capacity that increases linearly as new resources, such as additional hardware or additional time, are added. Systems whose capacity increases more slowly than the resources added are said not to scale. Systems that do not scale will eventually fail under increasing load.
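To make the idea of linear scaling concrete, here is a minimal sketch in Python. It assumes you have measured throughput (for example, records processed per second) at several cluster sizes; the node counts and throughput figures below are hypothetical illustration values, not Hadoop benchmarks. The function divides the speedup over the smallest cluster by the growth in node count, so an efficiency that stays near 1.0 means capacity grows linearly with resources, while a falling value is the signature of a system that does not scale.

```python
from typing import List


def scaling_efficiency(nodes: List[int], throughput: List[float]) -> List[float]:
    """Return (speedup over the first measurement) / (growth in node count).

    A value of 1.0 means perfectly linear scaling for that cluster size.
    """
    base_nodes, base_tp = nodes[0], throughput[0]
    result = []
    for n, tp in zip(nodes, throughput):
        speedup = tp / base_tp          # how much more work per second
        resource_growth = n / base_nodes  # how many more machines
        result.append(speedup / resource_growth)
    return result


# Hypothetical measurements for 1, 2, 4 and 8 nodes.
nodes = [1, 2, 4, 8]
linear = [100.0, 200.0, 400.0, 800.0]     # capacity doubles with each doubling of nodes
sublinear = [100.0, 180.0, 300.0, 400.0]  # capacity lags behind added nodes

print(scaling_efficiency(nodes, linear))     # [1.0, 1.0, 1.0, 1.0]
print(scaling_efficiency(nodes, sublinear))  # [1.0, 0.9, 0.75, 0.5]
```

In the sublinear case, eight machines deliver only half the work you would expect from eight times the hardware, and each further addition buys even less.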
With the widespread occurrence of systems having very large and growing data, such as internet sites, the need for scalable software is great. The open source Hadoop project provides reliable, scalable software for distributed computing.
Hadoop and MapReduce
Hadoop relies on the idea that large-scale computing can be distributed across a cluster of servers. A key point in the development of this idea was the publication of a paper by Google researchers in 2004. It presented map-reduce algorithms that make this type of distributed computing possible. The idea of map-reduce inspired the developers of Nutch, a sub-project of Apache Lucene, to produce the Hadoop framework to solve some scaling problems in Nutch. Yahoo commissioned a team of programmers to work on Hadoop, contributing the results back to Apache, and several other companies did likewise. Now a large and growing number of companies use Hadoop-based approaches for large projects.
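The map-reduce idea can be illustrated with the classic word-count example. This is a minimal single-process sketch in plain Python, not Hadoop's own Java API: the function names `map_phase`, `shuffle` and `reduce_phase` are hypothetical, chosen only to label the three stages. On a real cluster, Hadoop would run many mappers and reducers in parallel on different machines and perform the shuffle across the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    # Emit a (word, 1) pair for every word in one input split.
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all intermediate values by key, the step the framework
    # performs between the map and reduce phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Combine all counts emitted for one word.
    return key, sum(values)


def word_count(documents: List[str]) -> Dict[str, int]:
    intermediate = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


splits = ["the elephant rides", "the mahout rides the elephant"]
print(word_count(splits))  # {'the': 3, 'elephant': 2, 'rides': 2, 'mahout': 1}
```

Each `map_phase` call depends only on its own split, and each `reduce_phase` call depends only on the values for one key. That independence is what allows both phases to be spread across a cluster of servers.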
At the Start: A Moment in Hadoop History
Doug Cutting started Hadoop, and it has grown into an internationally known and widely used open source software framework. I ran across an entry from 13 March 2006 on Doug's old blog, Free Search, that marks an early point in the development of Hadoop. Doug wrote the following: "We've split the distributed computing parts of Nutch into a new project named Hadoop. This includes a filesystem modelled after GFS and a distributed computing system modelled after Google's MapReduce. So far a few folks are using Hadoop on tens of machines, and we're testing it on clusters with hundreds of machines. Next stop, thousands!"
By the way, this March 2006 blog posting had one comment... I recall an evening in November 2007 when Ted Dunning and I hosted a local Bay Area Hadoop users get-together at a bar in Palo Alto. We invited about 20 people; about 30 showed up. Doug was there. It's amazing today to think of this early point in the project when you consider how many developers are using Hadoop now. The Hadoop User Group meetings in the San Francisco area now draw hundreds of attendees to each monthly chapter meeting, and there are local meetings all over the world. Hadoop Summit 2010 in Santa Clara, CA had over 1000 attendees. Hadoop Summit 2011, held on 29 June 2011 in Santa Clara, drew over 1600 people.
Photo of a modern moment in Hadoop: Ted Dunning and Doug Cutting discussing Hadoop during a break at the Berlin BuzzWords 2011 conference. Image © E. Friedman, used with permission of Ted and Doug.
Hadoop Summit 2012
Apache Hadoop Summit 2012 takes place 13-14 June at the Santa Clara Convention Center in Santa Clara, California.
Hadoop Summit 2011
Over 1600 people attended Hadoop Summit 2011. MapR Technologies' presentation grabbed audience attention with a video.
Hadoop Committers - Who Are They?
The people who make the project
Hadoop is a top-level Apache Software Foundation project, and as such it involves a large community. To date, the Hadoop committers, who develop, expand and update the software, number over 40.
Go to this link on the Apache site to see the list of current Hadoop project committers:
Apache Hadoop Committers
How to Participate in Hadoop
Want to join the Hadoop community? Go to a local Hadoop User Group (HUG) meetup in your area, or join in online in the discussion groups. Visit the main site for the Apache Software Foundation's open source Hadoop project at
Hadoop Home
You will find news updates, a technical description of Hadoop and a list of the committers who develop it as well as links to all aspects of the Hadoop community.
If you want to get involved, go to the mailing lists and discussion groups. There are several choices depending on whether you want to join in the users group, the developers discussion or project level discussions. The link for the various mailing lists and discussions is at
Hadoop Mailing Lists
You can participate by looking for a local Hadoop User Group meetup. HUGs are now international, and new ones are forming in many areas. Search online to find a local HUG, or check this link to find one in your area:
List of Active HUGs
For example, in the California Bay Area, the Hadoop User Group Monthly Meetup (HUG) is the 3rd Wednesday of each month. For more information or to sign up for a meet-up, click this link:
Bay Area Hadoop Users Group.
Follow me on Twitter for announcements about specific Hadoop-related events worldwide: @Ellen_Friedman
Books on Hadoop
There are several choices for a guide to using Hadoop. All of these are conveniently available from Amazon.
Hadoop at QCon in San Francisco November 2011
"Hadoop for the Enterprise Architect Panel" was presented on Friday 18 November 2011 at QCon in San Francisco. Panelists/presenters included Amr Awadallah, Guy Bayes, Ron Bodkin, Ted Dunning, Sanjay Radia and Peter Sirota.
Hadoop Panel at QCon: "Hadoop for the Enterprise Architect"
Hadoop at Berlin BuzzWords 2012
Berlin Buzzwords 2012 takes place 4th-5th June. Here's the link: Berlin Buzzwords 2012. Apache projects were one of the topics of discussion at the Berlin BuzzWords 2011 conference in Germany last June, and Hadoop was a major focus, particularly of the two keynote addresses.
Participants worked hard during the two-day conference, and many also joined in at several hack-a-thons held at local companies and at the Technical University in conjunction with the BuzzWords conference.
Participants also had the opportunity to enjoy Berlin, both the tourist sights and the high-tech happenings.
1st Keynote Speaker at Berlin BuzzWords 2011
Doug Cutting talks about Hadoop, Avro and other projects
Doug Cutting was elected chairman of the board of the Apache Software Foundation in September 2010. He contributes to several Apache projects. Doug is currently at a company called Cloudera.
2nd Keynote Speaker at Berlin BuzzWords 2011
Ted Dunning talks about the future of Hadoop
Ted Dunning gave the keynote address on the second day of the Berlin BuzzWords 2011 conference on 7 June 2011. He talked mainly about the future of Hadoop and the changes the community faces as the world of computing embraces Hadoop. Ted Dunning is a member of the Apache Software Foundation, a committer for the Mahout project and active in the Hadoop community. Ted is currently at a company called MapR.
Read more from Ted Dunning's blog at
Surprise and Coincidence: Musings from the Long Tail.
Useful Books on Related Topics
If you have an interest in Hadoop, you may also find these books useful. Both of these titles come highly rated at Amazon.
Book Review: Mahout in Action
If you'd like to know more about Mahout, the related Apache open source software project, read this review of a new how-to book titled Mahout in Action. The book is published by the technical publisher Manning.
Mahout in Action
eBook and print version from Manning
To order Mahout in Action now, get just the eBook or the eBook with the print version; the eBook has audio/video enhancements. Both are available as of 4 October 2011. To get a limited-time 37% discount on all formats of Mahout in Action, go to the publisher Manning and use the discount code
mahout37
at the following link: Manning's Mahout in Action
Mahout in Action from Amazon
Pre-order print version now
If you want just the print version for slightly less, pre-order it now from Amazon, and they will ship your copy when it becomes available on 28 October 2011.
by efriedman
I am co-author of a book about another Apache project, titled Mahout in Action. By training, I am a biochemist/molecular biologist.