Background
Our client is one of Canada’s most well-established and decorated news outlets. It has received numerous journalism awards and reaches millions of readers with print and digital content across all news categories.
In the early to mid-2010s, our client began shifting its focus toward its digital platform. With a significant weekly readership and a rapid transition to digital content, the client first built a data pipeline to collect and store the millions of rows of clickstream data its users generate daily. To put that clickstream data to work in improving the online user experience, the Beam Data team was then tasked with developing recommender system models that give users more personalized article recommendations.
Problem Statement
Our client aims to utilize a recommender system in order to:
- Increase user website engagement through the recommendation of more relevant articles
- Grow their current userbase and retain subscribed users long-term
Because our client serves millions of users daily, big data tools were needed to process the raw data and generate user-specific recommendations in a timely manner.
Methodology
To meet the technical requirements for recommender system development, as well as other emerging data needs, the client has built a mature data pipeline on cloud platforms: AWS to store user clickstream data and Databricks to process the raw data. With these data tools in place, the Beam Data team was able to:
- Process the raw user clickstream data with Python & Spark to develop an array of recommender models. These models utilized traditional methods like content-based filtering and collaborative filtering, as well as more advanced deep learning techniques with BERT.
- Generate user article recommendations and write the recommendations back to a NoSQL database.
- Automate article recommendation generation through Databricks' built-in job scheduler.
- A/B test the article recommendations generated by our models against the current champion model.
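To make the collaborative filtering approach mentioned above more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the production models: the actual work ran on Spark, and the function names, the (user, article) click format, and the choice of cosine similarity over implicit clicks are all hypothetical choices made for this example.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def item_similarities(clicks):
    """Build item-to-item cosine similarities from (user_id, article_id) clicks.

    Each click is treated as an implicit "read" signal; repeat clicks by the
    same user on the same article are counted once. (Assumed data format.)
    """
    articles_by_user = defaultdict(set)
    for user_id, article_id in clicks:
        articles_by_user[user_id].add(article_id)

    readers = defaultdict(int)   # article -> number of distinct readers
    co_reads = defaultdict(int)  # (article_a, article_b) -> users who read both
    for articles in articles_by_user.values():
        for article in articles:
            readers[article] += 1
        for a, b in combinations(sorted(articles), 2):
            co_reads[(a, b)] += 1

    sims = defaultdict(dict)
    for (a, b), both in co_reads.items():
        score = both / sqrt(readers[a] * readers[b])
        sims[a][b] = score
        sims[b][a] = score
    return articles_by_user, sims


def recommend(user_id, articles_by_user, sims, k=3):
    """Rank unread articles by summed similarity to the user's read articles."""
    seen = articles_by_user.get(user_id, set())
    scores = defaultdict(float)
    for read_article in seen:
        for candidate, score in sims.get(read_article, {}).items():
            if candidate not in seen:
                scores[candidate] += score
    # Sort by score (highest first), breaking ties by article ID.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [article for article, _ in ranked[:k]]


# Toy clickstream, invented purely for illustration.
sample_clicks = [
    ("u1", "politics-1"), ("u1", "politics-2"),
    ("u2", "politics-1"), ("u2", "politics-2"), ("u2", "sports-1"),
    ("u3", "politics-2"), ("u3", "sports-1"),
    ("u4", "sports-1"), ("u4", "sports-2"),
]
sample_users, sample_sims = item_similarities(sample_clicks)
print(recommend("u1", sample_users, sample_sims))
```

At the scale described here, the same co-occurrence counting would more plausibly be expressed as distributed Spark joins or replaced by a matrix factorization method, but the ranking idea is the same.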
Architecture
This architecture shows how data collected from our client’s website is stored and fed into Databricks for model development. The recommendations generated by our models are then written back to a NoSQL database and displayed on the website via an API.
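The write-back and serving steps of this flow can be sketched roughly as follows. This is a hedged illustration only: the in-memory store is a stand-in for the NoSQL database, and the key format, record layout, function names, and the fallback list for users without stored recommendations are all assumptions made for the example rather than details of the client's system.

```python
import json


class InMemoryStore:
    """Hypothetical stand-in for the NoSQL database (a simple key-value store)."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        # Serialize to JSON to mimic storing a document by key.
        self._items[key] = json.dumps(value)

    def get(self, key):
        raw = self._items.get(key)
        return None if raw is None else json.loads(raw)


def write_recommendations(store, recs_by_user):
    """Batch step: write each user's ranked article IDs to the store."""
    for user_id, article_ids in recs_by_user.items():
        store.put(f"recs:{user_id}", {"articles": list(article_ids)})


def get_recommendations(store, user_id, fallback):
    """Serving step: what an API handler might do when the website asks for
    a user's recommendations. Falls back to a default list (an assumption)
    when nothing is stored for the user."""
    record = store.get(f"recs:{user_id}")
    if record is None:
        return list(fallback)
    return record["articles"]


store = InMemoryStore()
write_recommendations(store, {"u1": ["article-17", "article-42"]})
print(get_recommendations(store, "u1", fallback=["most-read-today"]))
```

Keying records by user ID keeps the serving path to a single lookup, which is one reason precomputed recommendations pair naturally with a key-value or document store.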
Conclusion
Over the course of this project, the Beam Data team developed several recommender models using collected user clickstream data and article metadata, with the goal of generating more personalized article recommendations and increasing user engagement. Because these models are run several times a day to update each user’s recommendations, subsequent projects will focus on further optimizing them to maximize performance while minimizing cost.