Content Recommender System | Naveen

Motivation

Generic recommendation engines optimize for engagement. They surface popular items, trending content, or straightforward purchase history — and they work reasonably well until you have a diverse enough user base that “popular” stops meaning “relevant to this specific person.”

The goal of this project was to build a recommender that understood why customers were similar, not just what they had bought. By modeling psychographic characteristics alongside behavioral signals, the system could identify meaningful similarity even between customers with very different purchase histories.

Collaborative Filtering

The foundation is a standard collaborative filtering approach: users are represented as vectors in preference space, and similarity is computed between those vectors to generate recommendations.

The user preference matrix is built from:

Explicit signals: ratings, saves, explicit preference inputs
Implicit signals: time-on-item, return visits, scroll depth
Purchase history: product category affinities derived from order history

Each user ends up as a dense vector in a high-dimensional space where each dimension corresponds to a preference axis.
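As a rough illustration of how the three signal groups above could be turned into one row of the preference matrix, here is a minimal sketch. The record layout, the field names (`ratings`, `dwell_seconds`, `orders`), the category list, and the scaling choices are all assumptions made for the example, not the project's actual schema.

```python
import numpy as np

# Hypothetical categories and per-user signal records (illustrative only).
CATEGORIES = ["books", "audio", "games"]

users = {
    "u1": {"ratings": {"books": 4.5, "audio": 3.0},
           "dwell_seconds": {"books": 120.0, "games": 15.0},
           "orders": {"books": 3, "audio": 1}},
    "u2": {"ratings": {"games": 5.0},
           "dwell_seconds": {"games": 300.0},
           "orders": {"games": 2}},
}

def build_vector(record: dict, categories: list) -> np.ndarray:
    """Concatenate explicit, implicit, and purchase features for one user."""
    # Explicit signal: ratings rescaled from a 0-5 scale to 0-1 (missing = 0).
    explicit = [record["ratings"].get(c, 0.0) / 5.0 for c in categories]
    # Implicit signal: log-damped time-on-item so long sessions do not dominate.
    implicit = [float(np.log1p(record["dwell_seconds"].get(c, 0.0))) for c in categories]
    # Purchase history: share of orders per category (category affinity).
    total_orders = sum(record["orders"].values()) or 1
    purchase = [record["orders"].get(c, 0) / total_orders for c in categories]
    return np.array(explicit + implicit + purchase, dtype=float)

user_ids = sorted(users)
preference_matrix = np.vstack([build_vector(users[u], CATEGORIES) for u in user_ids])
```

Each row of `preference_matrix` is one user and each column is one preference axis, so in this toy setup the matrix has two rows and nine columns.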

Psychographic Model

Psychographic modeling adds a layer on top of behavioral data. Rather than relying solely on what users have done, it attempts to capture why they do it — the underlying motivations and values that predict future preferences.

The model uses a set of psychographic dimensions drawn from survey data and from proxies inferred from behavior:

Price sensitivity — inferred from discount engagement rates
Quality orientation — inferred from brand and category selections
Novelty seeking — inferred from new-product engagement velocity
Social influence — inferred from engagement with trending or shared content

These dimensions are combined with behavioral signals into a unified preference vector per user.
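One plausible way to do the combination is sketched below. It assumes each block is standardized per column before concatenation, because Euclidean distance is sensitive to scale and an unscaled feature would otherwise dominate. The function names and the `psych_weight` parameter are hypothetical and only show where a tuning knob could sit.

```python
import numpy as np

def standardize(matrix: np.ndarray) -> np.ndarray:
    """Z-score each column so every dimension contributes on a comparable scale."""
    mean = matrix.mean(axis=0)
    std = matrix.std(axis=0)
    std[std == 0] = 1.0  # constant columns: avoid division by zero
    return (matrix - mean) / std

def combine(behavioral: np.ndarray, psychographic: np.ndarray,
            psych_weight: float = 1.0) -> np.ndarray:
    """Concatenate standardized behavioral and psychographic blocks per user."""
    return np.hstack([standardize(behavioral),
                      psych_weight * standardize(psychographic)])

# Toy data: 3 users, 3 behavioral features, 4 psychographic dimensions
# (price sensitivity, quality orientation, novelty seeking, social influence).
behavioral = np.array([[0.9, 0.1, 3.0],
                       [0.2, 0.8, 1.0],
                       [0.5, 0.5, 2.0]])
psychographic = np.array([[0.7, 0.2, 0.9, 0.1],
                          [0.1, 0.9, 0.3, 0.6],
                          [0.4, 0.5, 0.5, 0.5]])

unified = combine(behavioral, psychographic, psych_weight=0.5)
```

A weight below 1 shrinks the psychographic block's influence on distance, and a weight above 1 amplifies it.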

Distance Calculation

Similarity between users is computed using Euclidean distance in the combined preference space:

import numpy as np

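def user_similarity(vec_a: np.ndarray, vec_b: np.ndarray) -> float:
    """Map the Euclidean distance between two preference vectors to a similarity score."""
    distance = np.linalg.norm(vec_a - vec_b)
    # Convert distance to a similarity score in (0, 1]: identical vectors score 1,
    # and the score approaches 0 as the distance grows.
    return float(1 / (1 + distance))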

The top-N most similar users to a given customer are retrieved, and items they engaged with (but the target customer has not yet seen) are scored by weighted overlap across the similar-user set.
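The retrieval and scoring step can be sketched as follows. This is a minimal illustration, not the deployed implementation. It reads "weighted overlap" as the sum of similarity scores of the neighbors who engaged with an item, which is an assumption. The names `recommend`, `engaged`, `top_n`, and `k` are hypothetical.

```python
import numpy as np

def similarity_to_all(vectors: np.ndarray, target_idx: int) -> np.ndarray:
    """Similarity of the target user to every user, using 1 / (1 + distance)."""
    distances = np.linalg.norm(vectors - vectors[target_idx], axis=1)
    sims = 1.0 / (1.0 + distances)
    sims[target_idx] = -np.inf  # never pick the target as its own neighbor
    return sims

def recommend(vectors: np.ndarray, engaged: list, target_idx: int,
              top_n: int = 2, k: int = 3) -> list:
    """Score unseen items by summed similarity of the top-N neighbors who engaged."""
    sims = similarity_to_all(vectors, target_idx)
    neighbors = np.argsort(sims)[::-1][:top_n]
    scores = {}
    for n in neighbors:
        n = int(n)
        # Only items the target has not engaged with are candidates.
        for item in engaged[n] - engaged[target_idx]:
            scores[item] = scores.get(item, 0.0) + float(sims[n])
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]

# Toy data: user 3 is far away and should not influence user 0.
vectors = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.2], [5.0, 5.0]])
engaged = [{"a"}, {"a", "b", "c"}, {"b", "d"}, {"z"}]

recommendations = recommend(vectors, engaged, target_idx=0, top_n=2)
```

In this toy example, item `b` ranks first because both nearest neighbors engaged with it, while `z` never appears because its only fan falls outside the top-N set.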

Results

After deploying to a subset of users in a controlled rollout:

Recommendation click-through rate increased by 23% over the baseline collaborative filter
Average session length increased by 11%
The psychographic dimensions were most predictive for users with sparse purchase histories — exactly the population where standard collaborative filtering performs worst

The biggest lesson: psychographic signals are valuable precisely because they carry information when behavioral data is absent. New users with no history still have preferences, and the onboarding questionnaire can surface them, giving the system something to work with from day one.
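To make the cold-start case concrete, here is a minimal sketch of neighbor lookup for a brand-new user. It assumes the questionnaire yields one value per psychographic dimension and that distance is computed over those dimensions only, since no behavioral features exist yet. The function name and the toy values are hypothetical.

```python
import numpy as np

def cold_start_neighbors(new_user_psych: np.ndarray,
                         existing_psych: np.ndarray,
                         top_n: int = 2) -> list:
    """Nearest existing users measured on psychographic dimensions alone."""
    distances = np.linalg.norm(existing_psych - new_user_psych, axis=1)
    return np.argsort(distances)[:top_n].tolist()

# Columns: price sensitivity, quality orientation, novelty seeking, social influence.
existing_psych = np.array([[0.9, 0.1, 0.2, 0.3],
                           [0.2, 0.8, 0.7, 0.6],
                           [0.8, 0.2, 0.3, 0.2]])
new_user = np.array([0.88, 0.12, 0.2, 0.3])  # from the onboarding questionnaire

neighbors = cold_start_neighbors(new_user, existing_psych)
```

Once the new user accumulates behavioral data, the lookup could switch to the full combined vector.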