Market Basket Analysis - Part 3 of 3 - A Shiny Product Recommender with Improved Collaborative Filtering
My objective for this piece of work is to carry out a Market Basket Analysis as an end-to-end data science project. I have split the output into three parts, of which this is the third and last, organised as follows:
In the first chapter, I will source, explore and format a complex dataset suitable for modelling with recommendation algorithms.
For the second part, I will apply various machine learning algorithms for Product Recommendation and select the best performing model. This will be done with the support of the recommenderlab package.
In the third and final instalment, I will implement the selected model in a Shiny Web Application.
Introduction
In the course of the research I carried out for this project, I came across several pieces of work, all of which provided great inspiration and insight for my Shiny Application. A special mention goes to Jekaterina Novikova, PhD, for her Movie Recommender System, and Prof. Michael Hahsler, who created a Joke Recommender. They used the MovieLens and the Jester datasets respectively, both of which come as default with recommenderlab.
However, both authors employed recommenderlab as the engine for their Shiny App, which famously requires a long time to calculate predictions on large datasets. The solution adopted by Jekaterina to make calculations more manageable was to take a sample of the dataset to reduce the size of the rating matrix. However, this may prove detrimental to prediction accuracy.
Fortunately, I came across this brilliant Kaggle kernel by Philipp Spachtholz, who not only carried out a salient analysis on a non-Kaggle dataset but, crucially for me, also built a Shiny-based Book Recommender system using much faster collaborative filtering code.
Philipp drew his inspiration from this blog post by SmartCat Consulting, which describes how to use an Improved Collaborative Filtering code and all the associated functions contained in the companion GitHub repository. In particular, the repository includes the similarity_measures.R functions for calculating similarity matrices, and the cf_algorithm.R file with the collaborative filtering algorithm and prediction function.
In this final part of the project, I will describe how I used these functions in my Shiny implementation.
Loading the Packages
library(tidyverse)
library(knitr)
library(Matrix)
library(recommenderlab)
The Data
In this section I am using the retail dataset from Part 2 to prepare the data files needed for the Shiny App. Note that in Part 2 I carried out the extra formatting step of removing those orders that contained the same item more than once.
glimpse(retail)
## Observations: 517,354
## Variables: 10
## $ InvoiceNo <dbl> 536365, 536365, 536365, 536365, 536365, 536365, 53...
## $ StockCode <chr> "85123A", "71053", "84406B", "84029G", "84029E", "...
## $ Description <fct> WHITE HANGING HEART T-LIGHT HOLDER, WHITE METAL LA...
## $ Quantity <dbl> 6, 6, 8, 6, 6, 2, 6, 6, 6, 32, 6, 6, 8, 6, 6, 3, 2...
## $ InvoiceDate <dttm> 2010-12-01 08:26:00, 2010-12-01 08:26:00, 2010-12...
## $ UnitPrice <dbl> 2.55, 3.39, 2.75, 3.39, 3.39, 7.65, 4.25, 1.85, 1....
## $ CustomerID <dbl> 17850, 17850, 17850, 17850, 17850, 17850, 17850, 1...
## $ Country <fct> United Kingdom, United Kingdom, United Kingdom, Un...
## $ Date <date> 2010-12-01, 2010-12-01, 2010-12-01, 2010-12-01, 2...
## $ Time <fct> 08:26:00, 08:26:00, 08:26:00, 08:26:00, 08:26:00, ...
For the app deployment I need to create 2 data files: past_orders_matrix and item_list.
past_orders_matrix is a user-item sparse matrix containing the history of past orders. This is needed in the Shiny server.R file for all the calculations.
past_orders_matrix <-
retail %>%
# Select only needed variables
select(InvoiceNo, Description) %>%
# Add a column of 1s
mutate(value = 1) %>%
# Spread into user-item format
spread(Description, value, fill = 0) %>%
select(-InvoiceNo) %>%
# Convert to matrix
as.matrix() %>%
# Convert to class "dgCMatrix"
as("dgCMatrix")
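As an aside, tidyr has since superseded spread() with pivot_wider(). A sketch of the same transformation in the newer idiom (assuming tidyr >= 1.1.0, where values_fill accepts a scalar) would be:

```r
# Same user-item sparse matrix, using pivot_wider() instead of spread()
past_orders_matrix <- retail %>%
  select(InvoiceNo, Description) %>%
  mutate(value = 1) %>%
  pivot_wider(names_from = Description, values_from = value,
              values_fill = 0) %>%
  select(-InvoiceNo) %>%
  as.matrix() %>%
  as("dgCMatrix")
```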
I save the file for use in the app.
saveRDS(past_orders_matrix,
file = "past_orders_matrix.rds")
item_list is a list of all the products available to purchase. This will feed into the Shiny ui.R file to make the products list available for selection.
# Creating a unique items list
item_list <- retail %>%
select(Description) %>%
unique()
I save the list for use in the app.
saveRDS(item_list,
file = "item_list.rds")
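To illustrate how item_list can feed the product selection in ui.R, here is a minimal sketch. The widget and output IDs (customer_basket, recommendations) are hypothetical placeholders, not the actual IDs used in the deployed app:

```r
# ui.R - minimal sketch (widget IDs are hypothetical)
library(shiny)

item_list <- readRDS("item_list.rds")

fluidPage(
  titlePanel("Product Recommender"),
  selectizeInput("customer_basket", "Select your products:",
                 choices  = item_list$Description,
                 multiple = TRUE),
  actionButton("submit", "Complete Your Purchase"),
  tableOutput("recommendations")
)
```

selectizeInput renders a searchable multi-select, which scales reasonably well to a product list of this size.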
Improved Collaborative Filtering
To show how the Improved Collaborative Filtering works, I am fitting the best performing model found in Part 2, the item-based CF, on the same made-up order. I am also doing the same using recommenderlab to compare the performance of the two approaches.
First, I re-create the made-up order using the same randomly selected products.
customer_order <- c("GREEN REGENCY TEACUP AND SAUCER",
"SET OF 3 BUTTERFLY COOKIE CUTTERS",
"JAM MAKING SET WITH JARS",
"SET OF TEA COFFEE SUGAR TINS PANTRY",
"SET OF 4 PANTRY JELLY MOULDS")
Next, I put new_order into a user-item matrix format.
# put in a matrix format
new_order <- item_list %>%
# Add a 'value' column with 1's for customer order items
mutate(value = as.numeric(Description %in% customer_order)) %>%
# Spread into sparse matrix format
spread(key = Description, value = value) %>%
# Change to a matrix
as.matrix() %>%
# Convert to class "dgCMatrix"
as("dgCMatrix")
Then, I add the new_order to the past_orders_matrix as its first entry.
# Bind the new order to the past orders matrix and transpose,
# so that items sit on the rows as required by the prediction function
all_orders_dgc <- t(rbind(new_order, past_orders_matrix))
Now, I need to set a number of parameters required by the Improved CF to work.
# Set range of items to calculate predictions for - here I select them all
items_to_predict <- 1:nrow(all_orders_dgc)
# Set current user to 1, which corresponds to new_order
users <- c(1)
# Set prediction indices
prediction_indices <- as.matrix(expand.grid(items_to_predict, users = users))
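To see what prediction_indices holds, here is a toy example with just 3 items and a single user: expand.grid() pairs every item index with the user index, producing one (item, user) coordinate per prediction to compute.

```r
# Toy example: 3 items, user 1
as.matrix(expand.grid(1:3, users = 1))
#      Var1 users
# [1,]    1     1
# [2,]    2     1
# [3,]    3     1
```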
I load the algorithm implementations and similarity calculations.
# Load algorithm implementations and similarity calculations
source("cf_algorithm.R")
source("similarity_measures.R")
And finally I can fit the item-based CF model with the Improved CF and check the runtime.
start <- Sys.time()
recomm <- predict_cf(all_orders_dgc, prediction_indices,
"ibcf", FALSE, cal_cos, 3, FALSE, 4000, 2000)
end <- Sys.time()
cat('runtime', end - start)
## runtime 0.630003
Wow! That was lightning fast!
Let’s now run the item-based CF model with recommenderlab and compare performances.
# Convert `all_orders_dgc` to class "realRatingMatrix"
# signature(from = "dgTMatrix", to = "realRatingMatrix")
all_orders_brm <- as(all_orders_dgc, "realRatingMatrix")
# Run the IBCF model on recommenderlab
start <- Sys.time()
recomm <- Recommender(all_orders_brm,
method = "IBCF",
param = list(k = 5))
end <- Sys.time()
cat('runtime', end - start)
## runtime 12.75939
The speed gain is around 20 times and is consistent with what Philipp Spachtholz witnessed in his work. This is rather promising for the Shiny App!
Deploying the App
For the App deployment I went for a Proof of Concept approach. My focus has been on speed of execution and on getting right all the calculations shown in this article to power the server
side of the App. This reflects on what is currently a minimalistic User Interface, which simply features product selection and a Complete Your Purchase action button.
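To give an idea of how the pieces above fit together on the server side, here is a simplified sketch. The input and output IDs (customer_basket, recommendations) are hypothetical, and the shape of the object returned by predict_cf() is assumed to be an items-by-users score matrix, following the SmartCat repository; the deployed app contains more wiring than shown here.

```r
# server.R - simplified sketch (input/output IDs are hypothetical)
library(shiny)
library(Matrix)

source("cf_algorithm.R")         # predict_cf()
source("similarity_measures.R")  # cal_cos() and friends

past_orders_matrix <- readRDS("past_orders_matrix.rds")
item_list          <- readRDS("item_list.rds")

function(input, output) {
  output$recommendations <- renderTable({
    req(input$customer_basket)
    # Encode the customer's selection as a 1-row binary user vector,
    # aligned with the columns of past_orders_matrix
    new_order <- Matrix(
      as.numeric(colnames(past_orders_matrix) %in% input$customer_basket),
      nrow = 1, sparse = TRUE)
    # Bind to past orders and transpose (items on rows), as in the article
    all_orders <- t(rbind(new_order, past_orders_matrix))
    # Predict scores for all items for user 1 (the new order)
    indices <- as.matrix(expand.grid(1:nrow(all_orders), users = 1))
    recomm  <- predict_cf(all_orders, indices,
                          "ibcf", FALSE, cal_cos, 3, FALSE, 4000, 2000)
    # Surface the top-scoring suggestions (assumed score layout)
    scores <- recomm[, 1]
    top <- head(order(scores, decreasing = TRUE), 10)
    data.frame(Suggestion = rownames(all_orders)[top])
  })
}
```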
Here’s a link to the Product Recommender for your perusal.
I will continue to work on the User Interface to add features that improve customer experience and make it more of a final product.
Here are a few ideas I’m toying with at the moment:
- Add Product Price and ability to select Product Quantity
- Add Product Image - with 4,000 items to choose from, this is a mini project in its own right!
- Ability to select item from list using first letter of Product Name
- Enhance the UI visuals with shinyjs and/or html
- Research how to implement a Hybrid approach, given that not all combinations of products currently return a suggestion
Comments
I have to admit that I’ve genuinely enjoyed working on this Market Basket Analysis project. Recommendation systems are a fascinating field of research with real-world applications and I feel that I’ve just scratched the surface.
I have also really appreciated learning the ropes of Shiny App development, which turned out to be more straightforward than I initially thought: reactivity is a key concept to get your head around, and it forces you to think of the User Interface and Server as two sides of the same coin.
The main consideration for me is that the potential is massive: even small companies with an online presence can benefit from implementing the most basic of recommendation systems. With only a few lines of code, one can improve customer experience, promote customer loyalty and boost sales.
Code Repository
The full R code can be found on my GitHub profile.
References
- For Recommenderlab Package see: https://cran.r-project.org/package=recommenderlab
- For Recommenderlab Package Vignette see: https://cran.r-project.org/web/packages/recommenderlab/vignettes/recommenderlab.pdf
- For the SmartCat Improved Collaborative Filter see: https://www.smartcat.io/blog/2017/improved-r-implementation-of-collaborative-filtering/