For travel e-business companies, recommender systems are paramount. There is an increasing need to take all available user information into account to tailor the best product proposition. One such signal is the content that the user actually sees: the visuals of the product.
When it comes to hotels, some people can be more attracted by pictures of the room, the building or even the nearby beach.
In this talk, Pierre describes how he improved the recommender system of an e-business vacation retailer using the content of images. He explains how to leverage an open dataset and pre-trained deep learning models to derive user taste information. This transfer learning approach enables companies to use state-of-the-art machine learning methods without having deep learning expertise.
This talk is composed of three major parts: the iterative creation of a recommender engine, the labeling of images, and the post-processing of images.
After introducing the main topic, labeling images to improve recommendation engine performance, Pierre starts with a discussion of recommendation engines. He briefly describes the "classical" recommender systems (collaborative filtering, content-based filtering) and their advantages and limitations. He then describes the re-ranking approach we used to combine different engines into one. Re-ranking is a method (used by Google, for example) that takes the different rankings as features and optimizes a certain loss. In our case, we combine our different recommendations through a logistic regression that predicts the probability of purchase for each (user, sale) tuple. This version of the engine led to +7% revenue per customer and is now running in production.
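The re-ranking step can be illustrated with a minimal sketch. It is not the production engine: the three base-engine scores, the synthetic purchase data, and the helper names `fit_logistic` and `purchase_probability` are all assumptions made for illustration, and the logistic regression is a small hand-rolled gradient-descent version rather than any particular library.

```python
import numpy as np

def fit_logistic(X, y, lr=0.5, n_iter=2000):
    """Fit a logistic regression by batch gradient descent; returns (weights, bias)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted purchase probability
        grad = p - y                             # gradient of the log-loss w.r.t. the logit
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def purchase_probability(X, w, b):
    """Score (user, sale) pairs with the fitted model."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

# Synthetic training data (assumption): one row per (user, sale) pair,
# one column per hypothetical base engine score.
rng = np.random.default_rng(0)
n = 500
cf_score = rng.random(n)        # collaborative filtering engine
content_score = rng.random(n)   # content-based engine
popularity = rng.random(n)      # popularity baseline
X = np.column_stack([cf_score, content_score, popularity])

# Synthetic purchase labels driven mostly by the first two engines.
logits = 3.0 * cf_score + 1.5 * content_score - 2.5
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(float)

w, b = fit_logistic(X, y)

# Re-rank three candidate sales for one user by predicted purchase probability.
candidates = np.array([
    [0.9, 0.8, 0.1],
    [0.1, 0.2, 0.9],
    [0.5, 0.5, 0.5],
])
probs = purchase_probability(candidates, w, b)
ranking = np.argsort(-probs)  # candidate indices, best first
print(ranking, probs.round(3))
```

The learned weights indicate how much each base engine contributes to the final ordering, which is what makes it easy to plug an additional engine (such as an image-based one) into the same model as one more feature column.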
He continues by explaining why we wanted to use image information. It seemed that sales with certain images were performing better than others. If we had labels on all images, we could use them in a content-based recommender system (itself used in the re-ranking engine). He then describes how to label our images using pre-trained models, transfer learning, and external APIs. He also shows how easy it is to steal these APIs.
The final part deals with the post-processing of the images. Since most pre-trained models only output one class prediction, we need to reshape these into broad themes that can be used in our engine. We use non-negative matrix factorization for this purpose and show that we obtain very interpretable results. He concludes by visually comparing the different engines.
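To make the theme extraction concrete, here is a minimal sketch of non-negative matrix factorization on an image-by-label score matrix. The labels, the scores, and the `nmf` helper are invented for illustration, and the standard multiplicative update rules for the Frobenius loss are used here as an assumption, since the talk does not say which solver was chosen.

```python
import numpy as np

def nmf(V, n_themes, n_iter=500, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (images x labels) into W (images x themes)
    and H (themes x labels) using multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], n_themes)) + eps
    H = rng.random((n_themes, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical label scores for four images (rows) over six labels (columns).
labels = ["beach", "sea", "palm tree", "bedroom", "bed", "lamp"]
V = np.array([
    [0.9, 0.8, 0.6, 0.0, 0.0, 0.0],
    [0.7, 0.9, 0.5, 0.0, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.9, 0.8, 0.4],
    [0.1, 0.0, 0.0, 0.8, 0.9, 0.5],
])

W, H = nmf(V, n_themes=2)

# Each row of H is a theme; its largest entries are the labels that define it.
themes = []
for k, row in enumerate(H):
    top = [labels[j] for j in np.argsort(-row)[:3]]
    themes.append(top)
    print(f"theme {k}: {top}")
```

Each row of `W` then gives an image's weight on every theme, and these weights are the kind of dense, interpretable features a content-based engine can consume.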
The key takeaways (more information in the pitch part) are these: