Data Frames and Transactions
Transactions are a very useful data structure for data mining. They provide a way to mine itemsets or rules from datasets.
In R, the arules package expects the data to be in transactions form. If the data is only available as a data.frame, it can be coerced to transactions with as(). This example uses the “AdultUCI” data frame available in the arules package, which originates from the “Census Income” database. Note that AdultUCI contains numeric columns such as age; depending on the arules version, these are either discretized automatically with a warning or must be converted to factors before the coercion succeeds. These data can be coerced to transactions using the following commands:
library("arules")
data("AdultUCI")
Adult <- as(AdultUCI, "transactions")
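To make the coercion rules more concrete, here is a minimal sketch on a toy data frame. The data frame `shoppers` and its column names are hypothetical and not part of the original example. It assumes that factor columns become items labelled variable=level, that logical columns become a single item named after the column, and that numeric columns are best binned first, here with base R's cut().

```r
library(arules)

# Hypothetical toy data; the column names are illustrative only.
shoppers <- data.frame(
  age    = c(23, 41, 67, 35),
  region = factor(c("north", "south", "north", "east")),
  member = c(TRUE, FALSE, TRUE, TRUE)
)

# Bin the numeric column into a factor so every column is a factor or logical.
shoppers$age <- cut(shoppers$age,
                    breaks = c(0, 30, 50, Inf),
                    labels = c("young", "middle", "senior"))

# Coerce the prepared data frame to transactions and look at the result.
shoppers_trans <- as(shoppers, "transactions")
inspect(shoppers_trans)
itemLabels(shoppers_trans)
```

Choosing the bins by hand, rather than relying on any automatic discretization, keeps the resulting item labels readable when they later appear in rules.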
Transaction data stored in a file can be in either a normalized (single) form or a flat file (basket) form. When the file is in basket form, each record represents one transaction, and the items in that transaction are listed in the record, separated by a delimiter. When the file is in ‘single’ form, each record represents one single item together with the id of the transaction it belongs to. The following snippet of code shows the read.transactions() function reading a small file in basket form.
my_data <- paste("1,2", "1", "2,3", sep = "\n")
write(my_data, file = "my_basket")
trans <- read.transactions("my_basket", format = "basket", sep = ",")
inspect(trans)
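For comparison, the ‘single’ form can be read in a similar way. The sketch below is an illustration with made-up item names and a temporary file; it assumes the first column holds the transaction id and the second column holds the item, which is what cols = c(1, 2) tells read.transactions().

```r
library(arules)

# Hypothetical data: one "transaction id,item" pair per line.
single_data <- paste("1,bread", "1,milk", "2,bread", "3,milk", "3,eggs",
                     sep = "\n")

# Write the records to a temporary file so the example leaves nothing behind.
single_file <- tempfile()
writeLines(single_data, single_file)

# cols = c(1, 2): column 1 is the transaction id, column 2 is the item.
trans_single <- read.transactions(single_file, format = "single",
                                  sep = ",", cols = c(1, 2))
inspect(trans_single)
```

The single form is usually the natural choice when the data comes from a normalized database table, since each row there already pairs an order id with one product.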
Once the data has been coerced to transactions, it is ready for mining itemsets or rules. Association rule learning in R operates on these transactions objects. A very popular algorithm for association rules is the Apriori algorithm. I have discussed approaches to using association rule learning and the Apriori algorithm.
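As a rough illustration of the mining step, the sketch below runs apriori() on the Groceries transactions dataset that ships with arules. The dataset choice and the support and confidence thresholds are assumptions made for this example only and should be tuned for real data.

```r
library(arules)

# Groceries is a transactions dataset bundled with the arules package.
data("Groceries")

# Mine association rules; the thresholds here are illustrative, not recommendations.
rules <- apriori(Groceries,
                 parameter = list(support = 0.01, confidence = 0.5))

# Show the three rules with the highest lift.
inspect(head(sort(rules, by = "lift"), 3))
```

Lowering the support threshold generally produces more rules at the cost of longer run time, so it is often worth starting high and reducing it gradually.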