Cross-sell and Up-sell Blue Paper

May 2021 | Research Papers

Prepared by: 

Al-Ahmadgaid B. Asaad
Jonathan Ray Abat
John Paolo Maulion

Introduction

In an article published by The Economist, entitled “The world’s most valuable resource is no longer oil, but data”, the author emphasizes the oil of the digital era. It may sound ambitious, but this seems to be the case. In fact, Intel’s CEO supports this view in an article published by Fortune entitled “Intel CEO Says Data is the New Oil”. The CEO, Brian Krzanich, stated the following: “Oil changed the world in the 1900s. It drove cars, it drove the whole chemical industry. Data, I look at it as the new oil. It’s going to change most industries across the board”. It may not be apparent to some, but the truth is that tech giants like Google and Facebook have been hoarding data through the free platforms that most users take advantage of. In return, these tech giants get more and more data which they can use to create and test different algorithms tailored towards advertising. For example, Facebook knows well what to advertise on your feed, thanks to algorithms trained to target customers for a given ad. Having such algorithms encourages advertisers to spend on these platforms, since their ads can reach the users they need to target.

Another example of using data to target customers is Amazon. While Google and Facebook hoard data such as mails, likes, posts, chats, etc., Amazon hoards the transactional data of its customers. In an article published by Fortune entitled “Amazon’s recommendation secret”, the author stated: “Amazon calls this homegrown math (recommendation system) ‘item-to-item collaborative filtering,’ and it’s used this algorithm to heavily customize the browsing experience for returning customers. A gadget enthusiast may find Amazon web pages heavy on device suggestions, while a new mother could see those same pages offering up baby products”. This homegrown math is indeed popular and is well suited to transactional data.

Judging by the success of these companies (Google, Facebook and Amazon), hoarding data without making use of it is a missed opportunity. In this paper, we detail the application of Amazon-style collaborative filtering to USSC’s recommendation system. The discussion is organized as follows: we start with the different models/algorithms considered in the related literature, then the benchmarks of these algorithms, the technical details of the methodology, the integration and deployment of the product, and finally the results.

Related Literature

A classic approach to recommendation systems is based on Market Basket Analysis (MBA), which is done using Association Rules of the form antecedent => consequent. The idea is that each transaction of the customer is considered a basket of goods. Inside this basket are different products, for example: soap, diaper, bread, peanut butter, etc. Having this type of dataset, one can come up with a rule across transactions, for example, bread => peanut butter. If this rule has a strong association, then one can recommend peanut butter whenever the customer buys bread. MBA is a simple, classic approach, but it is only applicable to packaged goods and does not carry over well to e-commerce transactions.

E-commerce data are best modeled by Collaborative Filtering (CF). The original approach to CF is based on the user-oriented neighbourhood model, which tries to estimate unknown ratings based on the recorded ratings of like-minded users. Analogous to this approach is the item-oriented neighbourhood model proposed by [6] and [15], which became popular due to better scalability and improved accuracy. The said approach estimates a rating using the known ratings of the same user on similar items [1]. An alternative to neighbourhood-model-based CF is Latent Factor models, which try to estimate latent features that explain the observed ratings. Examples of these models are Neural Networks [16] and Latent Dirichlet Allocation [17]. The model used for USSC’s recommendation engine is likewise based on learning latent features, which define the factors of the user-item matrix. The model is a CF based on Matrix Factorization.

Finally, graph-based models are another alternative useful for recommendation systems, in particular user-item graph models trained via the Personalized Page Rank (PPR) algorithm. PPR is a special case of the Page Rank algorithm whereby the jumps are biased towards a set of starting vertices. The set of starting vertices can be a particular item or a particular user. Considering items as the starting vertices, running PPR on a graph whose nodes consist of users and items gives us scores for all nodes. The score is the number of passes made by the agent in PPR. Hence, the top user nodes in terms of number of passes become the candidate users most likely to buy the given item in the starting vertices. PPR is a good model for a recommendation system; in fact, it is used by Spotify for recommending songs, according to a Cornell University blog post. However, implicit CF is known to be simple and is used by successful companies like Amazon. Further, PPR was not used because the team that initially handled the project proposed the Apache Spark library, anticipating big transactional data, and the said library only offers implicit CF for recommendation systems. Hence, the decision was to use implicit CF, and this has been the case up to the current proposal, which now uses a Python library. A complete review of recommendation system algorithms is detailed in [3] and [4].

Industry and Academic Benchmarks

In a study conducted by [3], the following algorithms were benchmarked:

The authors conclude the following:

Generally speaking, Matrix-Factorization-based methods perform best in terms of prediction accuracy. Nevertheless, in some special cases, several other methods have a distinct advantage over Matrix-Factorization methods.

A Matrix-Factorization-based algorithm is at the heart of the Collaborative Filtering used for USSC’s Cross-sell and Up-sell recommendation system. The complete results of [3] for the models used in the study (see Table 1) are detailed here.

Objectives

The objective of the project is to build a tool, which we call the Recommendation Engine, for USSC’s transactional data. Specifically, to provide them with a user-friendly interface based on web technologies, powered by an algorithm running behind it, which we refer to as the NeuralMechanics’ (NM) Recommendation Engine. This algorithm runs on remote computers, which we refer to as the USSC’s servers, that respond to users’ requests from the interface.

Technical Methodology

Among the popular techniques used in recommender systems is Collaborative Filtering (CF). Generally, CF matches a user’s unique taste or preference based on the historical transactions and activities of many users. For example, CF systems identify pairs of items that tend to be rated similarly, or like-minded users with similar purchasing histories, in order to deduce unknown relationships between users and items [1]. CF recommender systems can be developed along two tracks depending on the available datasets. A widely used approach is explicit CF, which depends on users giving ratings or scores to a product. A good example of explicit CF is the movie recommendation by Netflix: the company utilizes the star ratings of movies provided by users to recommend titles of similar taste and interest. On the other hand, implicit CF uses customer activities, such as purchase and browsing history, to make a recommendation.

For this project, implicit feedback CF is used since the dataset provided by the client is the purchase history of the customers, which indirectly reflects opinion through observed user behavior [2].

Math

Consider index u for users and index i for items, and let r_ui be the input data of user u for item i. The input data r can indicate the number of times u purchased i; for example, if customer u bought an item i twice, then r_ui = 2. As another example, for a TV show recommender system, if user u watched 70% of show i, then r_ui = 0.7.
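As a quick illustration, r_ui can be computed from a raw transaction log using pandas; the table below and its column names are hypothetical, not USSC's actual schema:

```python
import pandas as pd

# Hypothetical transaction log: one row per purchase event.
# Column names (user_id, item_id) are illustrative only.
transactions = pd.DataFrame({
    "user_id": ["U1", "U1", "U2", "U2", "U2", "U3"],
    "item_id": ["remit", "eload", "remit", "remit", "bills", "eload"],
})

# r_ui = number of times user u purchased item i,
# arranged as a user-by-item matrix with zeros for no interaction.
r = (transactions
     .groupby(["user_id", "item_id"])
     .size()
     .unstack(fill_value=0))

print(r.loc["U2", "remit"])  # U2 bought "remit" twice, so r_ui = 2
```

The resulting matrix is the R matrix that the factorization below tries to explain.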

For the succeeding discussion, let p_ui be a binary variable indicating the preference of user u for item i, defined below:

    p_ui = 1 if r_ui > 0,  and  p_ui = 0 otherwise.        (1)

Items with no user interaction therefore have zero preference. However, preference alone is not enough to capture the interaction, since there are many reasons why certain products were not availed: the user may be unaware of the existence of the item, or unable to consume it due to price or limited availability. In order to capture these varying confidence levels, let c_ui be the confidence, measured as follows:

    c_ui = 1 + α r_ui,        (2)

where α is a rate that scales the confidence with the observed interactions ([1] reports that α = 40 works well).

The confidence levels weight the squared differences in the following loss function:

    min_{x, y}  Σ_{u,i} c_ui (p_ui - x_u^T y_i)^2  +  λ ( Σ_u ||x_u||^2 + Σ_i ||y_i||^2 )        (3)

From the above equation, minimizing the loss function is equivalent to finding the vectors x_u and y_i whose product approximates p_ui. The loss function is optimized across users and items. The vectors x_u and y_i are initialized randomly with a user-defined number of features, and are tuned by minimizing the difference p_ui - x_u^T y_i. The second term in Equation (3) is the regularization used to avoid overfitting during training. Figure 1 illustrates the goal of minimizing the loss function, where the rows labeled U1, U2, …, Un are the users and the columns labeled I1, I2, …, Im are the items. The yellow cells in the R matrix represent the interactions between users and items, for example, the number of items bought. Further, the columns labeled f1, f2, …, fn are the features initialized to random values, which are tuned during model training to approximate R. Both the U and V matrices are shaded since the values in their cells are nonzero. For further details, please refer to [1].
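To make the optimization concrete, the following is a minimal numpy sketch of the alternating least squares (ALS) updates described in [1] for the loss in Equation (3), run on a toy interaction matrix. This is an illustration only, not the production implementation; the toy values and hyperparameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy interaction matrix R (users x items): purchase counts r_ui.
R = np.array([[2, 0, 1, 0],
              [0, 3, 0, 1],
              [1, 0, 2, 0]], dtype=float)

P = (R > 0).astype(float)      # preference p_ui, Equation (1)
alpha, lam, f = 40.0, 0.1, 2   # confidence rate, regularization, latent features
C = 1.0 + alpha * R            # confidence c_ui, Equation (2)

n_users, n_items = R.shape
X = 0.1 * rng.standard_normal((n_users, f))  # rows x_u of the U matrix
Y = 0.1 * rng.standard_normal((n_items, f))  # rows y_i of the V matrix

def solve(fixed, Cmat, Pmat, lam):
    """Confidence-weighted least-squares update for one side, other side fixed."""
    out = np.empty((Cmat.shape[0], fixed.shape[1]))
    for n in range(Cmat.shape[0]):
        Cn = np.diag(Cmat[n])
        A = fixed.T @ Cn @ fixed + lam * np.eye(fixed.shape[1])
        b = fixed.T @ Cn @ Pmat[n]
        out[n] = np.linalg.solve(A, b)
    return out

for _ in range(15):                 # alternate until X @ Y.T approximates P
    X = solve(Y, C, P, lam)         # update user factors, item factors fixed
    Y = solve(X, C.T, P.T, lam)     # update item factors, user factors fixed

S = X @ Y.T                         # predicted preference scores
print(np.round(S, 2))
```

Each pass solves a regularized, confidence-weighted least-squares problem per user (holding the item factors fixed) and then per item (holding the user factors fixed), so observed entries with high confidence dominate the fit.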

The process flow for model training is detailed in Figure 2. The flow starts with the client’s Single Customer View (SCV) data, which contains the necessary columns such as the user ID and item ID. These data are then partitioned into training and testing datasets, ideally with a 70-30 or 80-20 split. The model is initialized in terms of the number of features in the U and V matrices (see Figure 1) and their corresponding initial cell values. The model is trained by minimizing the loss function given in Equation (3). The idea is that the learned or optimized cell values in both U and V must lead their matrix product S to best approximate R, where R is the original user-item matrix. After training, the model is validated on the testing dataset. Finally, we evaluate the model using metrics such as Recall and Precision at k. If the model performs well on the testing dataset, then it is deployed to production; otherwise, we retrain it.
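The partitioning step above can be sketched as a simple holdout split; the dataset, column names, and split ratio below are illustrative:

```python
import pandas as pd

# Hypothetical SCV extract with the necessary columns.
scv = pd.DataFrame({
    "user_id": range(10),
    "item_id": ["remit", "eload"] * 5,
})

# 80-20 holdout split of the SCV transactions (illustrative).
train = scv.sample(frac=0.8, random_state=42)
test = scv.drop(train.index)

print(len(train), len(test))  # 8 2
```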


Model Evaluation

There are several ways to evaluate the model. One approach is to use Recall and Precision at k. The idea is to come up with a confusion matrix, which is done by binarizing the scores, both actual and predicted. Table 2 illustrates the typical confusion matrix.

From this confusion matrix, the precision and recall are computed as follows:

    Precision = TP / (TP + FP),    Recall = TP / (TP + FN),

where TP, FP and FN are the true positives, false positives and false negatives in Table 2. At k, the precision is taken over the k recommended items (so TP + FP = k) and the recall over all relevant items.

As mentioned above, the confusion matrix is constructed from binary scores. Therefore, the scores in the predicted user-item matrix must be binarized as well. The first task is to convert the actual scores to binary scores, done by assigning a threshold that defines Actual: True and Actual: False. To contextualize the confusion matrix for evaluating recommendation scores, let us redefine the row names as follows: let Relevant Products be Actual: True, and Irrelevant Products be Actual: False. Therefore, given a threshold t, any score greater than or equal to t marks a relevant product, and any score below it an irrelevant product. Further, let Recommended Products be Predicted: True and Unrecommended Products be Predicted: False in the confusion matrix above. The threshold for these columns is defined by the top k: any product inside the top k is a Recommended Product, and any product outside the top k is an Unrecommended Product. The confusion matrix then takes the updated row and column names.
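These definitions can be sketched as a small helper function; it is a hypothetical illustration with toy scores, not the project's evaluation code:

```python
import numpy as np

def precision_recall_at_k(actual, predicted, k=5, t=2.0):
    """Precision and recall at k for one user.

    actual    : true interaction scores per item
    predicted : model scores per item
    t         : relevance threshold (scores >= t are Relevant Products)
    k         : items inside the top k are Recommended Products
    """
    relevant = set(np.where(np.asarray(actual) >= t)[0])
    top_k = set(np.argsort(predicted)[::-1][:k])
    tp = len(relevant & top_k)                 # relevant and recommended
    precision = tp / k
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall

# Toy example: 6 items, threshold t = 2 as for the USSC data.
actual = [3, 0, 2, 1, 0, 5]        # items 0, 2 and 5 are relevant
scores = [0.9, 0.8, 0.1, 0.7, 0.2, 0.95]
p, r = precision_recall_at_k(actual, scores, k=3, t=2)
print(p, r)   # top 3 = {5, 0, 1}; two of them relevant, so p = 2/3, r = 2/3
```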

For the USSC data, the threshold for Relevant Products is 2, since that is the average transaction count across all products. Using the confusion matrix in Table 3, one can now compute the metrics Recall and Precision at k. The performance of the model used on the USSC data is listed in Table 5.

USSC Recommendation Engine

In this section, we present the integration of the recommendation engine into USSC’s frontend system (the system used by USSC Front Liner Agents (FLAs) to process customer transactions). The discussion proceeds with the journey of a customer in the USSC store, then the Process Flow, and next the Backend and Frontend of the application, among others.

USSC Store Routine

The usual routine in the store, as observed from personal experience, is as follows: for remittance services, each customer fills out some forms upon entering the store and waits for their turn. For bills payment, Eload and other transactions that need no forms, the customer can proceed to the line and wait for their turn. As they are called one by one, each customer presents their transaction to the FLA, and as the FLA processes it, the customer simply stands in front of the FLA’s tall desk and waits for the receipt. This takes about 5 minutes at most, depending on the transaction. While the customer stands at the tall desk, on some occasions the FLA will try to offer other products, such as Eload. Upon completing the transaction, the FLA issues the receipt to the customer.

Process Flow

Given this understanding of how the store routine works, we now present the process flow of the customer journey and the FLA’s interaction with the recommendation engine. The high-level summary of the process is detailed in Figure 3, and we narrate it as follows: the journey starts with a regular customer, someone with records already in the database. The said customer presents her Panalo Kard (PK) to the FLA. The FLA then enters the PK number into the textfield of the software (see Figure 5), which contains two main panels: the first lists the historical purchases of the customer, while the second lists the recommended products for the customer. The software is powered by NeuralMechanics’ (NM) Recommendation Engine, which runs on the USSC server (see the Communication section). After entering the PK number, the software returns the results on the two panels: the historical purchases and the top 5 recommended products, in order. Given these lists, and knowing that the customer usually stands in front of the FLA’s tall desk waiting for the receipt, the FLA can take the opportunity to recommend products to the customer while entering the transaction. The FLA recommends the first product in the order; if the customer declines, the FLA tries the second product, and so on. Alternatively, the FLA can simply enumerate the products listed and let the customer decide which to avail.

Finally, the new transaction of the regular customer is added to the USSC Single Customer View (SCV) data, which will be used for maintenance services. The maintenance services may include retraining of the model on the new dataset.

User Flow from Frontliner’s Perspective

The user flow of the application from the FLA’s perspective is illustrated in Figure 4. The flow starts with the customer’s PK number. The first task of the FLA, prior to entering the said number, is to make sure the application is in reset mode, that is, the two panels (List of Purchased Products and List of Recommended Products) are empty and all buttons are deactivated (grayed out). The application will not proceed with a new PK number unless it is in reset mode. This might sound like a lot of preparation, but it simply avoids recommending products to a new customer using the recommended products of the previous customer. The FLA can then enter the PK number into the textfield and press the Enter button. By doing this, the frontend of the application makes a request to the backend using the PK number as the input data. The request is processed by NM’s Recommendation Engine running on the USSC server, and the response is sent back to the client/frontend in .json format. The results are then parsed and displayed on the two panels: the List of Products Purchased by the customer and the List of Recommended Products. With these lists displayed, the FLA can take the opportunity to make recommendations to the customer. If the customer buys one of the recommended products, the FLA must tag this using the corresponding Dropdown Menu next to the recommended product. The Dropdown Menu contains two options: Not Interested (default) and Bought. Finally, at the end of the transaction, the FLA must submit the new data: the list of recommended products and the corresponding tags. The new data are sent to the USSC server, specifically to its database. This new data serves future improvements of the model, as well as monitoring. Therefore, whether or not the customer bought the recommended products, the FLA must still submit the new data.

After finishing the transaction of the current customer, the FLA must again reset the application by clicking the Reset button, in preparation for the next customer. The whole process is repeated for every customer.
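For illustration only, the .json response that the frontend parses could take a shape like the following; the field names, helper function, and product names are hypothetical, not USSC's actual API contract:

```python
import json

def build_response(pk_number, purchase_history, scored_items, k=5):
    """Assemble a JSON payload for the two frontend panels.

    scored_items: dict mapping product name -> recommendation score.
    All field names here are illustrative assumptions.
    """
    top_k = sorted(scored_items, key=scored_items.get, reverse=True)[:k]
    return json.dumps({
        "pk_number": pk_number,
        "purchased_products": purchase_history,
        "recommended_products": top_k,   # ordered, best score first
    })

resp = build_response(
    "PK-0001",
    ["remittance", "bills"],
    {"eload": 0.91, "insurance": 0.42, "remittance": 0.10,
     "pera_padala": 0.66, "bills": 0.05, "loan": 0.58},
)
print(resp)
```

The ordered `recommended_products` list is what lets the FLA offer the first product, then the second on decline, and so on.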

Backend

The backend of the software is powered by NeuralMechanics’ (NM) Recommendation Engine. Deployment of the recommendation engine is done via application containers with an exposed REST API for frontend connectivity. 

Deploying in a Docker container ensures security, fault tolerance and scalability when the engine is deployed across multiple sites. The container will carry the necessary libraries, including Python itself. For reference, see the official Docker documentation.

The Docker container will be deployed on USSC’s server, where all requests from the frontend are processed. This architecture will help the IT teams maintain the application, such as when updating the model or adding functionality needed for new features in the frontend. The following are the requirements:

  1. Docker or Python Virtual Environment
  2. Python 3.6 with the following libraries:
    • Implicit
    • Klein
    • Pandas
    • Thread

The library used for the backend algorithm is the Implicit package by benfred, hosted on GitHub at: https://github.com/benfred/implicit

The package uses Cython, which helps speed up the model training. Having this dependency, however, requires installing gcc via Homebrew and updating Xcode on macOS. The macOS 10.14.1 update introduced conflicts with the Cython dependencies, though the team managed to get it to work on macOS 10.14.0 and macOS 10.14.2. This is a heads-up to anticipate.

Frontend

The frontend of the application is a web-based interface accessible via a link under the domain of the client’s web address, or any link provided by the client. The frontend will have the features shown in Figure 5 below. Needless to say, the application needs some security, say a login window, before the FLA can access the app, and this interface must be provided by the client.

Note: The frontend design in Figure 5 is NM’s proposal to the client; it is not final yet, as it will be discussed further with USSC once they approve the rollout.

Communication

The frontend communication is done via asynchronous requests, while the backend communication is done via multi-threaded asynchronous requests. This approach can handle a large volume of traffic, so long as the server can cope with it.

Note: The architecture of the communication is NM’s proposal to the client; it is not final yet, as it will be discussed further with USSC once they approve the rollout.

Product Deployment

In order to deploy the product, the IT teams on both ends need to deploy the app, both the frontend and the backend, onto USSC’s server. As mentioned above, the frontend is accessed via a link that the FLA can open in their web browser. It should be simple and straightforward to use: with the web app fully loaded in the browser, the FLA simply enters the PK number of a regular customer, which then returns the list of products purchased by the customer and the corresponding recommended products. The overall process flow is depicted in Figure 3 and the frontend interface is shown in Figure 5.

Maintenance Services

Maintenance services can cover the following: updating the model, adding new functionalities the client needs, or integrating the app with other apps they have. Model updating is optional but highly recommended. This follows from the fact that the model only learned from the training dataset, so any new transactions over some period of time are useful for tuning the recommendations. The model retraining can be done every one to two months.

New Customer

The current recommendation engine handles regular customers only; that is, the model can only make recommendations for regular customers included in the training dataset. This is the nature of the algorithm behind it, Collaborative Filtering based on Matrix Factorization (see the Methodology above). So, in order to handle new customers, the app will recommend products frequently bought by regular customers over the last six months, or simply the top products availed by regular customers. Of course, there are ways to improve this general recommendation for new customers, but these are better tackled as future improvements.
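This fallback for new customers can be sketched with pandas as follows; the transaction log, column names, and dates are hypothetical:

```python
import pandas as pd

# Hypothetical SCV transaction log; column names are illustrative.
scv = pd.DataFrame({
    "item_id": ["remit", "eload", "remit", "bills", "remit", "eload"],
    "txn_date": pd.to_datetime([
        "2018-01-15", "2018-02-02", "2018-03-10",
        "2018-05-20", "2018-06-01", "2018-06-15"]),
})

def top_products(df, as_of, months=6, k=5):
    """Most frequently bought products over the trailing window."""
    cutoff = as_of - pd.DateOffset(months=months)
    window = df[df["txn_date"] >= cutoff]
    return window["item_id"].value_counts().head(k).index.tolist()

fallback = top_products(scv, as_of=pd.Timestamp("2018-06-30"))
print(fallback)   # "remit" leads with 3 purchases in the window
```

The same list can be served to every new customer until enough of their own transactions accumulate for the factorization model.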

Timeline

The following timeline covers the development of the backend system of the application. The frontend and the integration of the system into the client’s system follow a separate timeline.

Model Performance

Table 5 details the mean precision at k and the mean recall at k of the model. The goal is to have both high precision and high recall. From Table 5, k must be set to 5 to achieve this objective.

Results

The initial results of the model address the following tasks:

  1. For June 2018 (recent) with transaction frequencies higher than the average transaction frequency for the entire duration (January to June).
  2. For June 2018 (recent) only regardless of the transaction frequencies.
  3. For January-May 2018 with transaction frequencies higher than the average transaction frequency for the entire duration (January to June).

The results are encoded in this Google Sheet.

Next Steps

The next steps for the project are the productization of the backend, which includes packaging the libraries and the APIs; the frontend development; and the integration of the application into the client’s system. Model evaluation is also recommended and is needed for assessing the performance of the model in practice.

Github Repository

The code is hosted on GitHub under foxmulder2017/csus; below is the link:

https://github.com/foxumulder2017/csus

Maintainer: Al-Ahmadgaid Asaad

References

  1. Hu, Y., Koren, Y. and Volinsky, C. (2008). Collaborative Filtering for Implicit Feedback Datasets. In Proc. of the Eighth IEEE International Conference on Data Mining, pp. 263-272.
  2. Oard, D. W. and Kim, J. (1998). Implicit Feedback for Recommender Systems. In Proc. of the 5th DELOS Workshop on Filtering and Collaborative Filtering, pp. 31-36.
  3. Lee, J., Sun, M. and Lebanon, G. (2012). A Comparative Study of Collaborative Filtering Algorithms. CoRR, abs/1205.3193.
  4. Su, X. and Khoshgoftaar, T. M. (2009). A Survey of Collaborative Filtering Techniques. Advances in Artificial Intelligence, 2009.
  5. Breese, J., Heckerman, D. and Kadie, C. (1998). Empirical Analysis of Predictive Algorithms for Collaborative Filtering. In Proc. of Uncertainty in Artificial Intelligence.
  6. Sarwar, B., Karypis, G., Konstan, J. and Riedl, J. (2001). Item-based Collaborative Filtering Recommendation Algorithms. In Proc. of the International Conference on World Wide Web.
  7. Paterek, A. (2007). Improving Regularized Singular Value Decomposition for Collaborative Filtering. Statistics, 2007:2-5.
  8. Lee, D. and Seung, H. (2001). Algorithms for Non-negative Matrix Factorization. In Advances in Neural Information Processing Systems. MIT Press.
  9. Salakhutdinov, R. and Mnih, A. (2008). Probabilistic Matrix Factorization. In Advances in Neural Information Processing Systems.
  10. Salakhutdinov, R. and Mnih, A. (2008). Bayesian Probabilistic Matrix Factorization Using Markov Chain Monte Carlo. In Proc. of the International Conference on Machine Learning.
  11. Lawrence, N. D. and Urtasun, R. (2009). Non-linear Matrix Factorization with Gaussian Processes. In Proc. of the 26th Annual International Conference on Machine Learning.
  12. Lemire, D. and Maclachlan, A. (2005). Slope One Predictors for Online Rating-based Collaborative Filtering. In Proc. of the 2005 SIAM International Conference on Data Mining, pp. 471-480.
  13. Yu, K., Zhu, S., Lafferty, J. and Gong, Y. (2009). Fast Nonparametric Matrix Factorization for Large-scale Collaborative Filtering. In Proc. of the International ACM SIGIR Conference on Research and Development in Information Retrieval.
  14. Sun, M., Lebanon, G. and Kidwell, P. (2011). Estimating Probabilities in Recommendation Systems. In Proc. of the International Conference on Artificial Intelligence and Statistics.
  15. Linden, G., Smith, B. and York, J. (2003). Amazon.com Recommendations: Item-to-item Collaborative Filtering. IEEE Internet Computing, 7, pp. 76-80.
  16. Salakhutdinov, R., Mnih, A. and Hinton, G. (2007). Restricted Boltzmann Machines for Collaborative Filtering. In Proc. of the 24th Annual International Conference on Machine Learning, pp. 791-798.
  17. Blei, D., Ng, A. and Jordan, M. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3, pp. 993-1022.