Apache Mahout: Scalable Data Mining


Big Data and the Mahout Project

The goal of the open source Apache Mahout project is to provide scalable machine learning libraries. It is distributed free under the Apache license, which makes it business friendly. Mahout is generally implemented on a core of Apache Hadoop, but this is not a requirement. It uses the power of MapReduce for scalable learning.
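To make the map-reduce idea a little more concrete, here is a minimal single-machine sketch in plain Python. It is not Mahout code and it does not use Hadoop; the function names and the toy data are made up for illustration. It counts how often two items are liked by the same user, the kind of counting a recommender could build on, by splitting the work into a map step, a grouping step and a reduce step.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(user_items):
    """Emit ((item_a, item_b), 1) for every pair of items one user liked."""
    for pair in combinations(sorted(set(user_items)), 2):
        yield pair, 1


def shuffle(mapped):
    """Group emitted values by key, standing in for what a framework does
    between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the partial counts for one item pair."""
    return key, sum(values)


def cooccurrence_counts(records):
    """Run map, shuffle and reduce in sequence on one machine."""
    mapped = (kv for items in records for kv in map_phase(items))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Hypothetical toy data: each list holds the items one user liked.
toy_records = [["a", "b", "c"], ["a", "b"], ["b", "c"]]
counts = cooccurrence_counts(toy_records)
print(counts)
```

Because each user's record is mapped on its own and each key is reduced on its own, both steps could in principle be spread over many machines, which is the point of the approach.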

This blue rider on a yellow elephant is the Official Logo of the Apache Mahout project. For more information, visit the official website for Mahout:

Apache Mahout Official Site

Mahout News: A New Venture from Sean Owen

Sean Owen, Mahout committer and co-author of Mahout in Action, is taking a new step by commercializing Mahout in a venture called Myrrix. Sean announced the pre-launch and described the project on the Mahout mailing list. Here's the link to Sean's Myrrix site:

Myrrix Pre-launches

Mahout committer and Apache Foundation member Ted Dunning opened a discussion in support of the business-friendly nature of Apache projects and this new Mahout-related venture. See what's being said at this link:

"The Apache Way"

More thoughts from Ted on his blog at:
Musings from the Long Tail

Video and Slides from Mahout Meet-Up

The first meet-up of Mahout users took place in Redwood City, near San Francisco, on Tuesday 29 November 2011. About 35 people attended. The presenters included Grant Ingersoll (Lucid Imagination), Mahout co-founder and committer, and Ted Dunning (MapR), Mahout core committer and co-author of the book Mahout in Action. Grant suggests the term "Mahout User Meet-up" or "MUM" for these groups.

New slides and video presentations from the first Mahout Meet-Up are available online. Grant talks about using clustering and classification for data mining with email. Ted talks about using random projections to improve performance in machine learning with Mahout while maintaining quality.
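For readers who have not met random projections before, here is a small sketch of the general idea in plain Python. It is a generic Gaussian random projection, not the method from Ted's talk and not Mahout's API; the function names, the sizes (1000 features reduced to 100) and the random test vectors are all assumptions chosen for illustration. The point it shows is that distances between vectors stay roughly the same after the data is squeezed into far fewer dimensions.

```python
import math
import random


def random_projection_matrix(n_features, n_components, seed=0):
    """Build an n_components x n_features matrix of Gaussian entries,
    scaled so that vector lengths are preserved on average."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n_components)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_features)]
            for _ in range(n_components)]


def project(matrix, vector):
    """Multiply the projection matrix by one input vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def distance(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical data: two random 1000-dimensional vectors.
rng = random.Random(1)
x = [rng.random() for _ in range(1000)]
y = [rng.random() for _ in range(1000)]

R = random_projection_matrix(1000, 100)
original = distance(x, y)
reduced = distance(project(R, x), project(R, y))
print(original, reduced)
```

The two printed distances should be close to each other, even though the projected vectors are ten times shorter. Smaller vectors mean less work for whatever learning step comes next.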

See the video presentations and slides from Grant Ingersoll and Ted Dunning, free at this link:
Slides and Video Mahout Meet-up November 2011 San Francisco

For more from the presenters, visit their blogs:

Grant's blog Grant's Grunts: Lucene Edition

Ted's blog Surprise and Coincidence: Musings from the Long Tail

Look for more Mahout Meet-Ups in the future, or follow me on Twitter for announcements: @Ellen_Friedman

Discount on Mahout Book

The how-to book on Mahout, Mahout in Action, was just published in October 2011 by Manning. Three of the four authors are Mahout core committers.

Get a 37% DISCOUNT on all formats of the book: Go to the Manning site and use THIS DISCOUNT CODE in the coupon box: mahout37

Manning's Mahout in Action

Early Edition Taming Text Available Online

Mahout co-founder Grant Ingersoll is one of the authors of a new book, Taming Text, being published by Manning. Pre-buy the book and get early access online (MEAP) at the Manning website:

Taming Text MEAP

Why the Name Mahout?

Mahout is a word of Indian (Hindi) origin for a person who drives elephants. This name was picked for the software project because much of Mahout's data mining is built on another Apache Foundation project, Hadoop. The logo for Hadoop is a yellow elephant, hence the inspiration for the name Mahout: software driving the "yellow elephant".

The image of a mahout with a young elephant is from Wikipedia, taken by Alexander Klink and distributed under a Creative Commons 3.0 license:
Alexander Klink Mahout image

Who are the People Who Make Up Mahout?

Mahout, like other Apache Foundation open source software projects, is developed by a group of volunteer committers. The resulting software is available for free under the business-friendly Apache license.

Mahout is also a community of people who help the project through discussions online (see the mailing lists), through meetings and presentations and by using Mahout in their own projects and providing feedback.

For a list of Mahout committers and contact information, click this link:

Mahout Committers

Photo is Robin Anil, Mahout Committer and co-author of Mahout in Action

More Mahout Core Committers

Another Mahout core committer who is active in the Mahout community is Sebastian Schelter. Currently he is trying various experiments using Mahout in a variety of settings, including testing Mahout with a billion-sample data set. Now that's BIG data.

Photo: Sebastian Schelter, © S. Schelter 2011

Follow Sebastian Schelter on Twitter: @sscdotopen

Get Involved with Mahout

If you are interested in using Mahout software, why not get involved with the Mahout community? It's free to join in discussions by subscribing to the Mahout mailing list for users or the one for developers on the official Apache Mahout site:

Mahout Mailing Lists

Another way to get involved is to attend presentations in your area. For example, ACM local chapters have presentations on Mahout and scalable data mining. Pictured here are Mahout core committer and Apache member Ted Dunning on the right with Mahout user Eric Andrejko (left) who were presenters on a data mining panel at the ACM meeting in San Jose, CA on 15 October 2011. Photo by Martin Stein, used with permission.

How to Use Mahout

Mahout is useful for data mining applications in recommendation, clustering and classification in projects with huge amounts of data. Huge here means going beyond hundreds of thousands of examples to millions of examples. Above 10 million examples in a data set, Mahout is really the single solution that meets the need.

For more details about how to use Mahout, you have several options. Visit the Official Apache Mahout Site and join the mailing lists to be part of the Mahout discussions. This approach is for people already fairly familiar with the types of algorithms used by Mahout. Or try the tutorials on the official Mahout site at this link:

Mahout tutorials

Another option is to buy the new book about how to use Mahout, titled Mahout in Action, published by Manning.

UPDATE: The book is finished. The final version of Mahout in Action, in print and eBook formats, has been available since 5 October 2011. Go to the publisher's site and use code mahout37 to get a 37% discount on either format of the book:

Link for the how-to book on Mahout: Mahout in Action

This book provides in depth instruction for Mahout, with many do-it-yourself examples using real data.

Learn More About Mahout

Committers Robin Anil and Ted Dunning presented a Hands On Mahout workshop at the OSCON 2011 conference in July in Portland, Oregon. You can see their slides from the presentation at this link:
Hands On Mahout OSCON 2011 Slides

Photo: Robin Anil & Ted Dunning at OSCON 2011, image © E. Friedman 2011

For more from Robin Anil see Robin's blog:
Robin's Blog

For more from Ted Dunning see Ted's blog:
Surprise and Coincidence

Excellent article on Mahout from co-founder Grant Ingersoll:
Scalable Machine Learning for Everyone

Book Review: Mahout in Action

Mahout in Action is an in depth how-to book on using Mahout. Three of the four authors are Mahout committers.

Grant Ingersoll, core committer and co-founder of Mahout, wrote this review of the book Mahout in Action:

Mahout in Action Review

For a biased review of the Mahout in Action book by one of the book's authors, plus more information about the authors, click the following link.

Video Presentation by Isabel Drost

Isabel Drost, co-founder of the Mahout project and a committer, is seen here in Berlin in June 2011 (image © E. Friedman 2011, all rights reserved). You can purchase a video presentation she did at Strata in March 2011 from O'Reilly at this link:

Scaling Data Analysis With Apache Mahout by Isabel Drost.

For more from Isabel Drost, see her blog: Isabel Drost

Mahout logo has an elephant

Working with Mahout makes you see elephants in some of the most unlikely places....

Image: Elephant from a tangerine, © E. Friedman 2011


Updates

For a partial list of Mahout talks go to this link on the Mahout site:

Mahout Talks

Free video of Mahout presentation by Ted Dunning at LA-Hadoop Users Group Meet-up in March 2011:
March 2011 LA-Meetup Ted Dunning

Slides from a Berlin Buzzwords June 2011 presentation:
Frank Scholten presentation on Mahout

For more updates on Mahout activities and presentations, follow me on Twitter for announcements:

@Ellen_Friedman

Mahout at ApacheCon 2011 in Vancouver

ApacheCon 2011 took place 7-11 November in Vancouver, B.C. There were a number of Mahout-related presentations. Isabel Drost, committer and co-founder of Mahout, presented two talks described at this link:

Isabel Drost /Mahout at Apache Con

Grant Ingersoll, committer and co-founder of Mahout, presented a one-day training workshop for newcomers to Mahout:

Grant Ingersoll's Mahout Training Workshop for Newcomers

Your comments and questions are welcome

  • WriterJanis Feb 27, 2012 @ 3:25 am
    Good luck with this project!
  • Tipi Jan 21, 2012 @ 3:27 pm
    Returning for a little review...
  • sukkran Aug 23, 2011 @ 6:40 am
    thanks for sharing some valuable information about mahout and its tutorials. nicely done
  • Tipi Aug 18, 2011 @ 8:42 pm
    I love that you tell how the logo of a yellow elephant with a blue rider came about.


by efriedman



Lucene in Action, Second Edition: Covers Apache Lucene 3.0


Lucene is another useful Apache Foundation project, and Lucene in Action has gotten excellent ratings from reviewers.
