Abstract

In the last couple of years, short videos have become the new darling of the digital mediascape. After the internet boom in India, new influencers are emerging daily. We all have our favorite creators and can spend hours watching their content. For a platform like ours, we needed a user-creator affinity recommendation model that recommends creator stories to users based on the affinity (likeability) factor, where a consumer's (user's) likeability for a creator is defined by interactions such as Follow, Profile Visit, Like, Comment, and Share.

Overview

Affinity means a natural liking for and understanding of someone or something. Affinity changes with time and with a user's interest niche. Our goal is to capture user-creator affinity strength, which also captures a user's temporal interest niche, i.e., what type of stories a consumer (user) prefers more.

Business Goals

- Improve the stories recommendation algorithm so that users' session time increases.
- Improve user niche discovery for content.
- Improve visibility of long-tail creators and content discovery based on the consumer's likeability factor.

Expected Outcome

Recommend a list of story ids of creators for whom user-creator affinity is high.

NOTE: A creator is also a user on the platform. Hence, I will refer to users who watch a creator's video as consumers.

Figure: Interaction between Creator-Consumer on Roposo

Out of these different interactions between consumer and creator, we decided to pick profile visit as the stronger signal for mapping out similarity between creators.

Approach

High-Level Approach

We divided the problem into 2 parts. First, for each consumer (user), we find the top K creators for which the True Affinity score is high, using the Multi-Criteria Decision Making (MCDM) TOPSIS technique; affinity is defined by like, follow, profile visit, comment, loop_count, perc_seen (percentage of video seen based on video duration), etc. Second, after finding these True Top K creators with high affinity based on MCDM, we expand them to similar creators.
We take the embeddings of these creators (computed using the node2vec embedding methodology), find their nearest neighbors, and recommend those similar creators.

*In Summary: First, we find the true high-affinity creators for a consumer based on MCDM. Then we find creators similar to those high-affinity creators.*

Implementation Details (TLDR)

Refer to the Approach figure above and its steps.

**Step 1: Finding the True Top Affinity Creators for a Consumer (User) from Interactions**

This is a multi-criteria decision-making (MCDM) or multi-criteria decision analysis (MCDA) problem, as we wanted to rank all creators with whom the consumer (user) interacted in the last 30 days.

Figure: Consumer-Creator Interaction Data

We use the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS), an MCDM algorithm, to rank creators in order of affinity. TOPSIS is based on the concept that the chosen alternative should have the shortest geometric distance from the ideal solution and the longest geometric distance from the worst solution.

Scikit-Criteria: Link

One can check out my blogs for a detailed understanding of MCDM: Ranking of entities with Multi-Criteria Decision Making Methods (MCDM) - Part One | Ranking and Selection of the best with Multi-Criteria Decision Making (MCDM) - Part Two

Sum up: After Step 1, for every consumer we have a set of creators, ranked by affinity factors, with whom the consumer has interacted in the last 30 days.

**Step 2 & Step 3: Creator Graph, Profile Visits to Embedding**

We constructed a creator-creator graph based on consumers' profile visits. Two creators are connected when a consumer's profile visits to both of them co-occurred on a particular day. Edge weights are defined by co-occurrence strength (the number of times profile visits by a consumer co-occurred).

We computed the creator embeddings using Node2vec+ (Paper Link), which uses the word2vec skip-gram model.

Node2vec Params: How to set p and q?
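To make the roles of p and q concrete, here is a minimal Python sketch of the second-order biased random walk from the original node2vec formulation. It is not the Node2vec+ variant we used, which, as we understand it, further adapts this bias to edge weights. From the current node, each neighbor's edge weight is multiplied by 1/p when stepping back to the previous node, by 1 when the neighbor is also adjacent to the previous node, and by 1/q otherwise. So q < 1 pushes the walk outward (DFS-like), while q > 1 keeps it local (BFS-like). The toy graph `TOY_GRAPH` and the functions `transition_probs` and `biased_walk` are hypothetical illustrations.

```python
import random
from typing import Dict, List, Optional

# Weighted, undirected graph as an adjacency dict: graph[u][v] = edge weight.
# Hypothetical toy data; in our setting weights would be co-occurrence counts.
Graph = Dict[str, Dict[str, float]]

TOY_GRAPH: Graph = {
    "A": {"B": 3.0},
    "B": {"A": 3.0, "C": 1.0, "D": 2.0},
    "C": {"B": 1.0, "D": 1.0},
    "D": {"B": 2.0, "C": 1.0, "E": 1.0},
    "E": {"D": 1.0},
}


def transition_probs(graph: Graph, prev: Optional[str], cur: str,
                     p: float, q: float) -> Dict[str, float]:
    """Original node2vec transition probabilities out of `cur`, given that the
    walk arrived from `prev` (None on the first step of a walk)."""
    weights: Dict[str, float] = {}
    for nxt, w in graph[cur].items():
        if prev is None:
            alpha = 1.0          # first step: plain weighted random walk
        elif nxt == prev:
            alpha = 1.0 / p      # distance 0 from prev: step back
        elif nxt in graph[prev]:
            alpha = 1.0          # distance 1 from prev: stay in the neighbourhood
        else:
            alpha = 1.0 / q      # distance 2 from prev: move outward
        weights[nxt] = alpha * w
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}


def biased_walk(graph: Graph, start: str, length: int, p: float, q: float,
                rng: random.Random) -> List[str]:
    """Sample one walk of up to `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        if not graph[cur]:
            break                # isolated node: nowhere to go
        prev = walk[-2] if len(walk) > 1 else None
        probs = transition_probs(graph, prev, cur, p, q)
        nodes = list(probs)
        walk.append(rng.choices(nodes, weights=[probs[n] for n in nodes])[0])
    return walk
```

For example, a walk that arrives at B from A with p = 1 and q = 0.5 moves next to A, C, or D with probabilities 1/3, 2/9, and 4/9; with q = 2 these become 2/3, 1/9, and 2/9, so the walk stays close to where it came from. The sampled walks are then treated as sentences for the skip-gram model.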
Figure: The top and bottom panels correspond to the embeddings generated using q = 0.5 and q = 2. In the top panel, nodes that fall into the same local network neighborhood (i.e., homophily) are colored the same. In the bottom panel, structurally equivalent nodes are colored the same.

With q = 0.5 and p = 1, node2vec discovers clusters/communities of characters that frequently interact with each other, since the edges between nodes are based on co-appearances.

Sum up: After Step 2 & Step 3, we have creator embeddings computed from the creator-creator graph built on the co-occurrence of profile visits.

**Step 4: Recommending Top Creators**

Now we have the true consumer (user)-creator affinity ranking from MCDM, and we have embeddings of all (active) creators on our platform. We pick the top 5 True Affinity Creators ranked by the MCDM technique and retrieve their nearest neighbours to get the top 100 high-affinity creators.

Why were the top 5 True Affinity Creators picked as query vectors? Why not pick only the top 1, or create a mean vector of the top 5 creators and show creators similar to that query vector in embedding space?

The idea of picking the top 5 creators is inspired by the Pinterest research paper PinnerSage. A user cannot be represented by one particular "interest" embedding. Even with movies, people generally show interest in multiple genres like horror, action, sci-fi, comedy, etc. To capture a user's multiple interests, we pick the top 5 creators from the ranked set.

For fast vector similarity search over the creator embeddings, we used ScaNN, an Approximate Nearest Neighbour (ANN) algorithm.

Approach Summary (Recap)

We use MCDM to rank creators for each consumer using the interaction features (affinity-defining features). This ranked set of creators is the True Affinity Creators for a consumer (user). Next, we create a creator-creator graph based on the co-occurrence of profile visits in a consumer's session.
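The graph-construction step in this recap can be sketched in a few lines of Python. This is a minimal illustration, not our production pipeline. It assumes a hypothetical input of `(consumer_id, day, creator_id)` profile-visit events and treats each consumer-day as one session. It also assumes that a creator pair gains one unit of weight for every consumer-day in which both profiles were visited, so repeat visits within the same consumer-day are not counted again. The function name `build_cooccurrence_graph` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

# (consumer_id, day, creator_id): one profile-visit event (hypothetical schema)
Visit = Tuple[str, str, str]


def build_cooccurrence_graph(visits: Iterable[Visit]) -> Dict[str, Dict[str, int]]:
    """Weighted, undirected creator-creator graph as an adjacency dict."""
    # 1. Collect the distinct creators each consumer visited on each day.
    sessions: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for consumer, day, creator in visits:
        sessions[(consumer, day)].add(creator)

    # 2. Every pair of creators co-visited in one consumer-day adds 1 to the
    #    weight of their undirected edge (stored in both directions).
    graph: Dict[str, Dict[str, int]] = defaultdict(dict)
    for creators in sessions.values():
        for a, b in combinations(sorted(creators), 2):
            graph[a][b] = graph[a].get(b, 0) + 1
            graph[b][a] = graph[b].get(a, 0) + 1
    return dict(graph)
```

For example, if consumer u1 visits creators A and B on day d1, consumer u2 visits A, B, and C on d1, and u1 visits only C on d2, the result has an A-B edge of weight 2 and A-C and B-C edges of weight 1; the lone visit on d2 adds no edge. Creators that never co-occur with another creator do not appear in the graph.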
On this graph, we apply the random-walk algorithm Node2Vec+ with chosen breadth-first search and depth-first search parameters (p and q). This gives us a vector representation for each creator. Finally, we pick the top 5 creators from the True Affinity set (the ranked set based on MCDM) for a consumer and use the creator embeddings to find the top 100 most similar creators from the entire creator pool. For fast vector similarity search, we use the ScaNN algorithm.

Stories Recommendation

Our expected outcome is a list of story ids. Hence, from the top 100 affinity creators for each consumer (from the approach above), we pick the latest unwatched story of each creator and add it to the consumer's recommendation pool, with stories ranked by the creator's similarity score with respect to the user's true creator affinity.

Conclusion

This approach of TOPSIS MCDM and Node2Vec+ not only ranks creators for a consumer but also helped us find similarities between creators of the same niche using a profile-visit co-occurrence graph.

Reference

- PinnerSage: Multi-Modal User Embedding Framework for Recommendations at Pinterest
- Billion Scale Recommendation at Taobao
- DeepWalk: Online Learning of Social Representations
- Ranking of entities with Multi-Criteria Decision Making Methods (MCDM) - Part One
- Ranking and Selection of the best with Multi-Criteria Decision Making (MCDM) - Part Two

I hope you learned something new from this post. If you liked it, hit ❤️, subscribe, and share this blog with others. Want to discuss it further? Connect with me here.

This newsletter is now read by more than 4,500 subscribers. If you are building an AI or data product or service, you are invited to become a sponsor of one of the future newsletter issues. Feel free to reach out to shauryauppal97@gmail.com for more details on sponsorships.
I am nominated for the HackerNoon 2022 Noonies. Vote for me: https://www.noonies.tech/2022/programming/2022-hackernoon-contributor-of-the-year-data

Book a 1:1 meeting here: https://topmate.io/shaurya

I am open to consults; you can reach out to me on LinkedIn: https://www.linkedin.com/in/shaurya-uppal/

Data Science Book Recommendations:
[1] The Book of Why
[2] Naked Statistics