RapidMiner 6 has been released

November 24, 2013 at 10:02 PM | Buddy James

The highly anticipated RapidMiner 6, named RapidMiner Studio, has been released. As I'm sure you can tell from my blog, I'm a huge fan of RapidMiner. It makes data mining tasks extremely easy for anyone to learn. This is made possible by the awesome user interface design, which is based on a WYSIWYG editor: you drag and drop process operators in the design view and connect them through their input and output ports, which provides a simple and effective way to model the flow of data through a process. While these are the primary reasons that I love RapidMiner, they are nothing new. So let's take a look at the new features added to this awesome application.

New logo for a new brand

The first thing that I noticed was the changes made to the RapidMiner website. The website has been completely redesigned. The changes include a new logo as well as a new presentation of the application, which looks more like a product advertisement than an open source application. This is because there are now several different versions of the application, including the open source, free community edition.

Here is a list of the different versions from the RapidMiner website.

STARTER: Free; 1 GB memory; CSV and Excel; no databases; community support; unlimited
PERSONAL: $999/yr; 4 GB memory; common types; open source databases; community support; 1 year
PROFESSIONAL (featured): $2999/yr; 8 GB memory; common types; all database systems; community support; 1 year
ENTERPRISE: price on request; unlimited main memory; common types + SPSS, SAS, HDFS; all database systems; enterprise support; 1 year

New features

Process templates

The new version of RapidMiner Studio contains project templates that provide processes geared toward specific data mining problems. The templates and new tutorials look to be great additions to this already stellar package. I plan to download a trial of the professional version and write a full review. Stay tuned!

Posted in: Analytics | BI | Data mining | RapidMiner | Review


Text mining: How to mine e-mail data from an IMAP account using RapidMiner

July 14, 2013 at 5:39 PM | Buddy James

In this article, I will show you how to use RapidMiner, one of the best open source data mining solutions on the internet, to read email data from an IMAP or POP3 account for storage and processing. I'll cover the basic text mining operators, such as Transform Case, Filter Stopwords, Stem, and Tokenize.

Here is a picture of the main process.

This process allows me to mine email messages and write the information that I'm interested in to a SQL Server database.

The operator of interest at the beginning of the process is the Process Documents from Mail Store operator. This operator allows you to specify host details and login credentials in order to bring email messages into RapidMiner.

The Process Documents operator contains a sub process, where you can add operators that help with munging your data. Simply double click on the blue square boxes on the Process Documents operator to enter the sub process. Here is a screen shot of the processing operators that I use inside of the Process Documents operator.

The mail store operator has a host property (for example, webmail.yourdomain.com), username and password properties to authenticate with the mail server, and a protocol drop-down list. You can specify that you only want to read unread emails by ticking the "only unseen" checkbox, and you can mark emails as read by ticking the "mark seen" checkbox. Here is an example of the properties for the Process Documents operator.

When you run your process, it may take a while, depending on how many emails you plan to mine. The results that I'm writing to a database are as follows:

#1 The example set from the Select Attributes operator

I used the Select Attributes operator because, for some reason, there were two received columns in the result set, which gave me trouble when writing to the database. So I used Select Attributes to select all attributes except for the extra received attribute.

#2 The example set from the WordList to Data operator

As you can see, this is an incredible source of data. The data also offers classification modeling opportunities (I'm working on an article on detecting spam using RapidMiner; check back soon). Thank you for reading.
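For readers who want to see the same idea outside of the RapidMiner designer, here is a minimal Python sketch of what the process above does conceptually. It is not how RapidMiner implements its operators. The function names (fetch_bodies, process_document, word_list), the tiny stopword list, and the crude suffix-stripping stemmer are all my own illustrative assumptions, and so is the order of the steps (tokenize, lower-case, filter stopwords, stem). Only the standard library modules imaplib, email, re, and collections are used.

```python
import email
import imaplib
import re
from collections import Counter
from email import policy

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def fetch_bodies(host, username, password, only_unseen=True, mark_seen=False):
    """Return the plain-text bodies of the messages in INBOX over IMAP (SSL)."""
    conn = imaplib.IMAP4_SSL(host)
    try:
        conn.login(username, password)
        # A read-only mailbox leaves the \Seen flag untouched when fetching.
        conn.select("INBOX", readonly=not mark_seen)
        _, data = conn.search(None, "UNSEEN" if only_unseen else "ALL")
        bodies = []
        for num in data[0].split():
            _, parts = conn.fetch(num, "(RFC822)")
            msg = email.message_from_bytes(parts[0][1], policy=policy.default)
            body = msg.get_body(preferencelist=("plain",))
            if body is not None:
                bodies.append(body.get_content())
        return bodies
    finally:
        conn.logout()


def tokenize(text):
    """Split text into alphabetic tokens."""
    return re.findall(r"[A-Za-z]+", text)


def stem(token):
    """Strip one common English suffix. A crude stand-in, not a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def process_document(text):
    """Tokenize, lower-case, drop stopwords, then stem."""
    tokens = [t.lower() for t in tokenize(text)]
    return [stem(t) for t in tokens if t not in STOPWORDS]


def word_list(documents):
    """Count processed tokens across all documents (word -> occurrences)."""
    counts = Counter()
    for doc in documents:
        counts.update(process_document(doc))
    return counts
```

With a real account you would call something like word_list(fetch_bodies("webmail.yourdomain.com", "username", "password")); the host and credentials here are placeholders. The stemmer is deliberately simple, so expect rough stems such as "schedul" for "scheduled".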

Posted in: Data mining | Text mining | Analytics | RapidMiner | Tutorial | SQL Server
