We just introduced (in part 1) the different models we use to better personalize the notifications and content we offer to our users. Now we’ll share our experience building our very first recommendation engine.

Collaborative filtering – predicting whether a user will click on a notification

When thinking about how to apply these recommendation methods in our use case, we identified two important factors –

  1. What will make a user click on a notification and open it
  2. Which content items will seem valuable to a user

The main insight that led us to separate these two factors was the user’s experience. At first, the user only sees a notification. Of course, the notification indicates in some way what the content will be about (beware of clickbait!), but it’s hard to convey the full value of a content item in the short sentence of a notification.

With that in mind, we started developing a new custom recommendation model – Collaborative filtering based on user open rate (whether or not a user clicked on a notification). This model’s purpose was to find similarities between users that opened the same content items. In order to analyze the performance of this recommendation model, we continued delivering content via our randomized model to a control group of users. That way we can compare how our new model impacts user engagement vs. our older, random model.
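To make the idea concrete, here is a minimal sketch of user-based collaborative filtering over notification opens. The matrix values, user count, and item count are all illustrative – the post doesn’t describe the actual implementation – but the mechanics (binary open matrix, cosine similarity between users, similarity-weighted scoring of unseen items) follow the standard technique:

```python
import numpy as np

# Hypothetical open matrix: rows = users, columns = content items,
# 1 = the user opened the notification for that item, 0 = did not.
opens = np.array([
    [1, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
])

def cosine_similarity(matrix):
    """Pairwise cosine similarity between the rows of `matrix`."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    normalized = matrix / np.clip(norms, 1e-12, None)
    return normalized @ normalized.T

def predict_open_scores(opens, user):
    """Score every item for `user` by the similarity-weighted opens of other users."""
    sim = cosine_similarity(opens)
    sim[user, user] = 0.0               # ignore self-similarity
    weights = sim[user]
    denom = weights.sum() or 1.0        # avoid division by zero
    return (weights @ opens) / denom

scores = predict_open_scores(opens, user=1)
# Recommend only items the user has not opened yet, highest score first.
unseen = np.where(opens[1] == 0)[0]
ranking = unseen[np.argsort(-scores[unseen])]
```

In this toy example user 1 is most similar to user 0, so item 3 (which user 0 opened) outranks item 1. A production system would also need to handle users with no history at all – exactly the cold-start problem discussed below.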

One key insight from the first iteration of our new model was that the faster we got it to users, the more we learned. Even though there are many ways to test a recommendation model before going to production, the earlier we got our model into production, the more valuable our learning was about its real impact on users. And the faster we learned, the better we understood what the next steps should be.

[Graph: open rate (y-axis, 0 to 1) vs. date (x-axis)]

After the first ~2 months we already started seeing promising results – the new model produced a much higher open rate, as can be seen in the graph above!

Great success! We found that we can target content items to users more effectively by finding similarities between users, meaning our recommendations became more personalized. At this point we deduced users’ preferences only from whether or not they clicked on a notification.

Top rank recommendation

Although we’d seen these good results, we still had some issues. A well-known issue with collaborative filtering is the ‘cold start’ problem. Since collaborative filtering tries to find similarities across users’ preferences, it requires some past knowledge about what a user likes and dislikes in order to produce an accurate recommendation.

In order to address this issue quickly, we decided to deploy another recommendation model to production called ‘Top Rank’, which prioritizes our most popular content items (measured by notification open rate) to be sent first. This lets us easily recommend content to users from day one of registration. But it also means these recommendations aren’t personalized – the model recommends the same content items to all users.
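A ‘Top Rank’ model like this reduces to ranking content by open rate. The sketch below uses an invented delivery log and an illustrative `min_sends` threshold (items sent only a handful of times have unreliable rates); the post doesn’t specify these details:

```python
from collections import Counter

# Hypothetical delivery log: (content_id, opened) pairs.
deliveries = [
    ("a", True), ("a", True), ("a", False),
    ("b", True), ("b", False), ("b", False), ("b", False),
    ("c", True), ("c", True),
]

def top_rank(deliveries, min_sends=2):
    """Rank content by open rate = opens / sends, skipping rarely-sent items."""
    sends, opens = Counter(), Counter()
    for content_id, opened in deliveries:
        sends[content_id] += 1
        if opened:
            opens[content_id] += 1
    rates = {cid: opens[cid] / n for cid, n in sends.items() if n >= min_sends}
    return sorted(rates, key=rates.get, reverse=True)

ranking = top_rank(deliveries)
```

Every user gets the same `ranking` – which is exactly why this model works from day one but offers no personalization.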

Another benefit of the ‘Top Rank’ model is that it gives us another control group to compare our collaborative model against. And the ‘Top Rank’ model gives us better results for users during their first period in our service.

Another metric we started monitoring was the diversity of our recommendations (making sure we do not recommend the same content items again and again). Check out the article from the previous part (The Netflix Recommender System) on Netflix’s recommendation engine for an interesting read on this point. Not surprisingly, the ‘Top Rank’ model does not score well on diversity.
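There are many ways to quantify diversity; one simple proxy (an assumption on our part, not the metric from the post) is the share of distinct items across all recommendation slots served:

```python
def recommendation_diversity(recommendation_lists):
    """Distinct items recommended, relative to total recommendation slots used.
    1.0 means every slot showed a different item; values near 0 mean the
    same few items are being repeated again and again."""
    all_items = [item for recs in recommendation_lists for item in recs]
    if not all_items:
        return 0.0
    return len(set(all_items)) / len(all_items)

# 'Top Rank' sends everyone the same list -> low diversity.
top_rank_diversity = recommendation_diversity([["a", "b"], ["a", "b"], ["a", "b"]])
# A personalized model can vary per user -> higher diversity.
personalized_diversity = recommendation_diversity([["a", "b"], ["c", "d"], ["e", "f"]])
```

Richer formulations (e.g. intra-list distance between item feature vectors) also capture how *different* the recommended items are from one another, not just how many distinct ones appear.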

Expanding our recommendations – calculating user value on a given content item

Our first recommendation model mainly used information about our users’ actions, such as whether a user opened a notification or not. But we wanted to better understand how a user related to the content itself – did they enjoy it? Did they find it valuable?

To do that we first needed to define how we measure our content’s value in the eyes of our users. But value is subjective, with no right or wrong answer. So for us to measure value we needed to truly understand our product and, more specifically, what we considered to be a valuable experience for our users.

We defined certain functions that capture the value we feel the user receives, such as how many times they reopen the content item, how long they spend reading it, whether they scroll to the end, whether they click on other links within it, and more. We also added a Like/Dislike button, which provides us with direct user feedback. To be sure the functions we chose correlated well with the value we wanted to measure, we collaborated with other teammates (including content writers, developers, product managers, and designers) for their input. Of course, this was only the first iteration of defining value via these functions, and it will change and develop over time. We’ve called the overall output of these measurements the ‘Engagement Score’.
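As a rough sketch, such a score might combine the signals mentioned above into a single number in [0, 1]. The signals come from the post; the weights, caps, and clamping below are purely illustrative assumptions, not the actual formula:

```python
def engagement_score(reopens, read_seconds, scrolled_to_end, link_clicks, liked):
    """Combine engagement signals into one score in [0, 1].
    `liked` is True/False for an explicit Like/Dislike, or None if the
    user gave no feedback. All weights and caps are illustrative."""
    score = 0.0
    score += 0.2 * min(reopens, 3) / 3            # reopening the item
    score += 0.3 * min(read_seconds, 120) / 120   # time spent reading
    score += 0.2 * (1.0 if scrolled_to_end else 0.0)
    score += 0.1 * min(link_clicks, 2) / 2        # clicks on links inside the item
    if liked is not None:                         # explicit feedback dominates
        score += 0.2 if liked else -0.2
    return max(0.0, min(1.0, score))
```

Capping each raw signal keeps one extreme behavior (say, a hundred reopens) from drowning out the others, and the clamp keeps a Dislike from pushing the score below zero.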

Using this ‘Engagement Score’ we created our second recommendation model. It was also based on collaborative filtering, but instead of only using users’ actions, like whether or not a user clicked on a notification, it used the engagement score. Our goal was to produce data that helped us learn about our users’ personalities and preferences so that we could determine how to better engage with them and deliver content they truly enjoy.
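Structurally, this is the same collaborative-filtering pipeline with the binary open matrix swapped for a graded score matrix. A minimal sketch, with entirely illustrative values (0 meaning the user hasn’t consumed the item yet):

```python
import numpy as np

# Hypothetical engagement-score matrix: rows = users, columns = items;
# each nonzero entry is that user's engagement score for the item.
scores = np.array([
    [0.9, 0.0, 0.7, 0.4],
    [0.8, 0.0, 0.6, 0.0],
    [0.0, 0.5, 0.0, 0.3],
])

def predict_scores(scores, user):
    """Similarity-weighted average of other users' engagement scores."""
    norms = np.linalg.norm(scores, axis=1, keepdims=True)
    unit = scores / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    sim[user, user] = 0.0               # ignore self-similarity
    denom = sim[user].sum() or 1.0
    return (sim[user] @ scores) / denom

predicted = predict_scores(scores, user=1)
```

The difference from the open-rate model is what the matrix encodes: a click says “the notification worked”, while the engagement score says “the content itself was valuable”, so the neighborhoods it finds reflect taste rather than curiosity.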

Expanding our recommendations – content based recommendations

At the same time, we also started working on another recommendation model, a content-based one. The main goals of this model were –

  • To provide recommendations to new users
  • To provide more diverse recommendations
  • Create a foundation upon which we can create future hybrid recommendation models

This model takes into account many more parameters than our previous models, such as our content items’ metadata (length, topic, images, and more) and data about the user (device type, screen size, how long the user has been in the service, reactions to other notifications if that data already exists, and more). We’ll explain this model in more depth in future posts – for now, that’s enough data for you to process 🙂
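The core of a content-based model is turning item metadata into feature vectors and ranking by similarity to what the user already liked. The features below (a topic one-hot plus normalized length) are a deliberately tiny illustration – the post mentions far richer signals like images and device context:

```python
import numpy as np

TOPICS = ["health", "finance", "sports"]   # illustrative topic vocabulary

def featurize(topic, length_words, max_len=2000):
    """Encode an item as [topic one-hot..., normalized length]."""
    vec = [1.0 if topic == t else 0.0 for t in TOPICS]
    vec.append(min(length_words, max_len) / max_len)
    return np.array(vec)

items = {
    "a": featurize("health", 500),
    "b": featurize("health", 1500),
    "c": featurize("sports", 400),
}

def content_based_recs(liked_item, items, k=2):
    """Rank the remaining items by cosine similarity to an item the user liked."""
    target = items[liked_item]
    def cos(v):
        return float(v @ target / (np.linalg.norm(v) * np.linalg.norm(target)))
    candidates = [(cid, cos(vec)) for cid, vec in items.items() if cid != liked_item]
    return [cid for cid, _ in sorted(candidates, key=lambda x: -x[1])][:k]
```

Because similarity depends only on item metadata, this model can score content for a brand-new user the moment they show a single preference – which is why it helps with both cold start and diversity.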

Future work

Currently, we randomly assign one of our recommendation models to each user. One of our next goals will be to better understand how to integrate all the different models into the same system in order to provide the best recommendation for each of our users.
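One common way to implement such per-user assignment is deterministic hash bucketing, so a user always lands in the same experiment arm across sessions. The model names and hashing scheme below are our own illustration, not the authors’ setup:

```python
import hashlib

MODELS = [
    "collaborative_opens",        # CF on notification opens
    "collaborative_engagement",   # CF on engagement scores
    "top_rank",                   # popularity baseline
    "content_based",              # metadata similarity
    "random_control",             # control group
]

def assign_model(user_id, models=MODELS):
    """Deterministically map a user id to one model, so the same user
    always sees the same model for the duration of the experiment."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return models[int(digest, 16) % len(models)]
```

Hashing (rather than storing a random draw) keeps the assignment stable and stateless, and changing the salt or model list re-randomizes the buckets for the next experiment.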

Another interesting goal is to provide more information to content writers about our users’ needs, interests, and preferences to help them determine which content items to write about next.
In addition, we want to start personalizing other parts of our service, as we discussed before. We’ve already started conducting a small experiment on how often to send notifications to different user types, which we’ll discuss in more depth in future blog posts. We’ll also elaborate on the technicalities of both our recommendation models and the architecture needed to support this growing system.

Special thanks to Buffy (in the image above) for inspiring us while writing this blog 🙂

Writers of this blog: Amir Pupko and Bar Levy