What is Apache Mahout?
A mahout is a person who drives an elephant as its master. The name Mahout comes from the project's close association with Apache Hadoop, which uses an elephant as its logo. Hadoop is an open-source framework from Apache that stores and processes big data in a distributed environment across clusters of computers using simple programming models.
Apache Mahout is an open-source project that is primarily used for creating scalable machine learning algorithms. Mahout implements popular machine learning techniques such as recommendation, classification, and clustering.
Features of Mahout:
The basic features of Apache Mahout are listed below.
- Mahout's algorithms are written on top of Hadoop, so they work well in a distributed environment. Mahout uses the Apache Hadoop library to scale effectively in the cloud.
- It offers the coder a ready-to-use framework for doing data mining tasks on large volumes of data.
- It lets applications analyze large data sets effectively and quickly.
- It includes several MapReduce-enabled clustering implementations such as k-means, Canopy, Dirichlet, fuzzy k-means, and Mean-Shift.
- It supports Distributed Naive Bayes and Complementary Naive Bayes classification implementations.
- Mahout comes with distributed fitness function capabilities for evolutionary programming.
- It includes matrix and vector libraries.
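To make the clustering feature concrete, here is a minimal single-machine sketch of the k-means idea in plain Python. It is not Mahout's implementation or API: the names `kmeans` and `squared_distance`, the parameters, and the sample points are all assumptions made for illustration. Mahout's MapReduce-enabled versions spread this kind of work across a cluster.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points, k, iterations=20, seed=0):
    """Group 2-D points into k clusters (illustrative sketch, not Mahout code)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # nothing moved, so we have converged
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical sample data: two visibly separate groups of points.
sample_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
found_centroids, found_clusters = kmeans(sample_points, k=2)
print(found_centroids)
```

The assignment and update steps each touch every point independently, which is why this style of algorithm maps naturally onto a distributed framework.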
Applications of Mahout
- Companies such as Adobe, Foursquare, Twitter, Facebook, LinkedIn, and Yahoo use Mahout internally.
- Foursquare helps you find places, food, and entertainment available in a particular area. It uses Mahout's recommender engine.
- Mahout is used by Twitter for user interest modelling.
- Yahoo uses Mahout for pattern mining.
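As a rough illustration of what a recommender engine does, the sketch below suggests places to a user based on what similar users have visited. It is a toy example in plain Python, not Mahout's recommender API and not how any of the companies above actually use it. The function `recommend`, the `visits` data, and the overlap-count similarity are all assumptions chosen for brevity.

```python
from collections import Counter


def recommend(visits, user, top_n=3):
    """Suggest unseen places, weighted by how much other users overlap with `user`.

    visits: dict mapping a user name to the set of places that user has visited.
    """
    seen = visits[user]
    scores = Counter()
    for other, places in visits.items():
        if other == user:
            continue
        overlap = len(seen & places)  # similarity = number of shared places
        if overlap == 0:
            continue  # ignore users with nothing in common
        for place in places - seen:
            scores[place] += overlap
    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda place: (-scores[place], place))
    return ranked[:top_n]


# Hypothetical visit history.
visits = {
    "ana": {"cafe", "museum"},
    "ben": {"cafe", "museum", "park"},
    "cy": {"cafe", "cinema"},
    "dee": {"gym"},
}
print(recommend(visits, "ana"))
```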
Classification in Mahout:
What is Classification?
Classification is a machine learning technique that uses known data to determine how new data should be classified into a set of existing categories. For example:
- The iTunes application uses Mahout's classification to prepare playlists.
- Mail service providers such as Yahoo! and Gmail use this technique to decide whether a new mail should be classified as spam. The categorization algorithm trains itself by analyzing which mails users habitually mark as spam. On that basis, the classifier decides whether a future mail should be delivered to your inbox or to the spam folder.
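The train-then-classify loop described for spam filtering can be sketched with a small Naive Bayes classifier, the family of classifiers listed among Mahout's features. This is a plain-Python illustration under stated assumptions, not Mahout's distributed implementation. The functions `train` and `classify`, the labels, and the sample messages are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(messages):
    """Count words per label from (text, label) pairs that users already marked."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the most probable label for a new message."""
    word_counts, label_counts, vocabulary = model
    total_messages = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        # Start from the log prior: how common this label was in training.
        score = math.log(label_counts[label] / total_messages)
        words_in_label = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Add-one (Laplace) smoothing avoids a zero probability.
            likelihood = (word_counts[label][word] + 1) / (words_in_label + len(vocabulary))
            score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical training data standing in for mails users have already sorted.
training = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "inbox"),
    ("lunch on monday with the team", "inbox"),
]
model = train(training)
print(classify("claim your free prize", model))
```

Working in log space keeps the product of many small probabilities from underflowing, which matters once messages contain more than a handful of words.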
Applications of Classification
- Credit card fraud detection - Classification is used to predict credit card fraud. Using historical information about previous fraud cases, the classifier predicts which future transactions are likely to be fraudulent.
- Spam e-mail - Based on the characteristics of previous spam mails, the classifier determines whether a newly received e-mail should be sent to the spam folder.