[{"content":" TL;DR\nPart 3 of a 4-part series. Part 1 covered MF. Part 2 covered FM and XGBoost. This post traces the deep-learning revolution in RecSys from 2016 to 2023 — Neural Collaborative Filtering, Wide \u0026amp; Deep, DeepFM, Deep Interest Network (DIN), DLRM, and AdaTT. Audience: ML engineers and researchers building large-scale RecSys for e-commerce, streaming, and ads. Each section covers the architecture, the gap it filled, and where it wins/loses in production. Context: Why This Post Matters, Who It’s For, and What You’ll Learn Welcome to Part 3 of our four-part series on evaluating recommendation systems (RecSys)! In the previous installments, we laid the groundwork: Part 1 introduced foundational techniques like collaborative filtering (CF) and Matrix Factorization (MF), which excelled at capturing user-item interactions but assumed linearity, missing complex patterns. Part 2 explored Factorization Machines (FM) and XGBoost, which tackled sparse data and non-linear ranking but fell short on higher-order interactions and sequential behaviors. By 2016, these limitations spurred a seismic shift toward deep neural networks (DNNs), which transformed RecSys by learning intricate feature interactions, automating feature engineering, and addressing diverse tasks like sequential recommendations and multi-task optimization. This post traces that evolution from 2016 to 2023, diving into Neural Collaborative Filtering (NCF), Wide \u0026amp; Deep Learning, DeepFM, Deep Interest Network (DIN), Deep Learning Recommendation Model (DLRM), and Adaptive Task-to-Task Fusion (AdaTT). It’s tailored for data scientists, ML engineers, and tech professionals—particularly those designing large-scale RecSys in domains like e-commerce, streaming, and advertising—who need a deep, technical understanding of these advancements.\nRecap: Where We Left Off In Part 2, we saw how FM extended MF by modeling pairwise feature interactions, making it a powerhouse for sparse settings like click-through rate (CTR) prediction. Its prediction function, ( \\hat{y}(\\mathbf{x}) = w_0 + \\sum_{i=1}^n w_i x_i + \\sum_{i=1}^n \\sum_{j=i+1}^n \\langle v_i, v_j \\rangle x_i x_j ), captured second-order relationships efficiently but couldn’t handle higher-order interactions or non-linear patterns beyond its linear assumptions. XGBoost, meanwhile, leveraged tree ensembles to rank items based on non-linear feature combinations, shining in tasks like top-N recommendations. Yet, it struggled with high-dimensional sparse data and required extensive manual feature engineering, limiting its scalability. These gaps—missing deep non-linearities, higher-order interactions, and sequential modeling—paved the way for DNNs, which, starting in 2016, redefined RecSys by learning complex patterns directly from raw data.\nThe Big Picture: The Deep Learning Revolution in RecSys Picture a recommendation system as a guide helping you navigate a vast library. In Part 2, our guide used simple rules: FM paired clues like your reading history with book traits, while XGBoost ranked options by studying everyone’s preferences. But what if your interests shift over time (say, from mysteries to sci-fi), or the guide needs to predict both what you’ll read and whether you’ll buy it? These earlier methods faltered. DNNs emerged as a smarter guide, capable of deciphering intricate patterns, tracking sequential behaviors, and juggling multiple goals. 
From 2016’s Wide & Deep to 2023’s AdaTT, this era saw RecSys evolve to handle complex user behaviors with unprecedented accuracy, shaping modern systems at companies like Google, Alibaba, and Facebook.\nDeep Dive: The Evolution of DNNs in RecSys\nLet’s explore this journey, starting with Neural Collaborative Filtering, which kicked off the DNN era by rethinking how we model user-item interactions.\nNeural Collaborative Filtering (NCF, 2017)\nTraditional MF, a staple from Part 1, predicts user-item interactions via a dot product: \\( \\hat{r}_{ui} = p_u^T q_i \\), where \\( p_u \\) and \\( q_i \\) are latent vectors for user \\( u \\) and item \\( i \\). This worked well for explicit ratings but assumed linearity, missing non-linear patterns in implicit feedback like clicks or views. In 2017, He et al. proposed Neural Collaborative Filtering (NCF) to overcome this, replacing the dot product with a neural network to capture complex, non-linear relationships. The motivation was clear: real-world preferences aren’t linear—liking sci-fi movies doesn’t linearly predict liking sci-fi books—and DNNs, fresh from successes in vision and NLP, offered a way to model these nuances.\nNCF’s architecture comes in three flavors. First, the inputs are simple: a one-hot encoded user ID \\( \\mathbf{u} \\) and item ID \\( \\mathbf{i} \\), mapped to dense embeddings \\( \\mathbf{p}_u \\) and \\( \\mathbf{q}_i \\in \\mathbb{R}^{32} \\) via lookup tables. The Generalized Matrix Factorization (GMF) variant mimics MF but with a neural twist: it computes an element-wise product \\( \\mathbf{p}_u \\odot \\mathbf{q}_i \\), feeds it through a linear layer with weights \\( \\mathbf{w} \\), and applies a sigmoid activation to output a probability: \\( \\hat{y}_{ui} = \\sigma(\\mathbf{w}^T (\\mathbf{p}_u \\odot \\mathbf{q}_i)) \\). This retains MF’s linear interaction but learns the weighting neurally. The Multi-Layer Perceptron (MLP) variant takes a different tack, concatenating the embeddings into \\( [\\mathbf{p}_u, \\mathbf{q}_i] \\) and passing them through three fully connected layers (e.g., 256, 128, 64 neurons) with ReLU activations: \\( \\mathbf{z}_1 = \\text{ReLU}(\\mathbf{W}_1 [\\mathbf{p}_u, \\mathbf{q}_i] + \\mathbf{b}_1) \\), followed by more layers, ending in a prediction layer. This captures non-linear interactions unavailable to MF. Finally, Neural Matrix Factorization (NeuMF) combines both, concatenating GMF’s and MLP’s penultimate outputs and applying a final linear layer: \\( \\hat{y}_{ui} = \\sigma(\\mathbf{w}^T [\\mathbf{z}_{\\text{GMF}}, \\mathbf{z}_{\\text{MLP}}]) \\). This hybrid leverages both linear and non-linear modeling.\nFor implicit feedback (e.g., clicks), NCF uses binary cross-entropy as its loss: \\( L = -\\sum_{(u,i) \\in D} [y_{ui} \\log(\\hat{y}_{ui}) + (1-y_{ui}) \\log(1-\\hat{y}_{ui})] \\), where \\( y_{ui} = 1 \\) for observed interactions and 0 otherwise. Since unobserved pairs vastly outnumber observed ones, negative sampling (e.g., 4 negatives per positive) keeps training feasible. The optimizer is Adam, with a learning rate of 0.001, balancing speed and stability. On the MovieLens 1M dataset, NeuMF achieved a Hit Ratio@10 of 0.71, beating MF’s 0.67 by 6%, thanks to its ability to model non-linear patterns. Compared to the MLP variant alone (0.69), NeuMF’s fusion of GMF’s linearity and MLP’s depth proved superior. The special change—swapping a dot product for a neural network—unlocked this flexibility, though NCF ignores auxiliary features like user demographics and can’t model sequential behaviors.
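To make the three-flavor design concrete, here is a minimal PyTorch sketch of NeuMF as described above. The 32-dimensional embeddings and 256/128/64 MLP widths are the ones quoted in the text; the table sizes and training objects are illustrative assumptions, and, as in the paper, each path gets its own embedding tables.
```python
import torch
import torch.nn as nn

class NeuMF(nn.Module):
    """GMF path (element-wise product) fused with an MLP path over [p_u, q_i]."""
    def __init__(self, n_users, n_items, dim=32, mlp_dims=(256, 128, 64)):
        super().__init__()
        self.p_gmf, self.q_gmf = nn.Embedding(n_users, dim), nn.Embedding(n_items, dim)
        self.p_mlp, self.q_mlp = nn.Embedding(n_users, dim), nn.Embedding(n_items, dim)
        layers, in_dim = [], 2 * dim
        for out_dim in mlp_dims:
            layers += [nn.Linear(in_dim, out_dim), nn.ReLU()]
            in_dim = out_dim
        self.mlp = nn.Sequential(*layers)
        self.out = nn.Linear(dim + mlp_dims[-1], 1)  # final layer sees [z_GMF, z_MLP]

    def forward(self, user, item):
        z_gmf = self.p_gmf(user) * self.q_gmf(item)  # element-wise product, as in GMF
        z_mlp = self.mlp(torch.cat([self.p_mlp(user), self.q_mlp(item)], dim=-1))
        return torch.sigmoid(self.out(torch.cat([z_gmf, z_mlp], dim=-1))).squeeze(-1)

model = NeuMF(n_users=6040, n_items=3706)  # MovieLens-1M-sized tables (illustrative)
loss_fn = nn.BCELoss()                     # paired with sampled negatives during training
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```
Training then loops over observed pairs plus sampled negatives, exactly as the binary cross-entropy loss above prescribes.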
Wide & Deep Learning (2016)\nNCF’s focus on user-item pairs left out contextual features and struggled with generalization in sparse, diverse settings. Enter Wide & Deep Learning, proposed by Cheng et al. at Google in 2016 and designed for app recommendations on Google Play. The problem was twofold: linear models like logistic regression memorized specific patterns (e.g., “user installed app A”) but couldn’t generalize to unseen data, while DNNs generalized well but missed rare, critical interactions. Wide & Deep combined a linear “wide” model for memorization with a DNN “deep” model for generalization, aiming to balance both.\nThe architecture starts with inputs: sparse features (e.g., user ID, app ID) mapped to embeddings (e.g., 32 dimensions) and dense features (e.g., user age) fed in raw. The wide component is a linear model: \\( y_{\\text{wide}} = \\mathbf{w}^T \\mathbf{x} + b \\), where \\( \\mathbf{x} \\) includes raw features and hand-crafted cross-features (e.g., “user installed app A AND app B”), capturing low-order interactions. Designing these cross-features required domain expertise, a key modification over pure DNNs. The deep component is an MLP with three hidden layers (1024, 512, 256 neurons) using ReLU: \\( \\mathbf{z}_1 = \\text{ReLU}(\\mathbf{W}_1 \\mathbf{e} + \\mathbf{b}_1) \\), where \\( \\mathbf{e} \\) concatenates embeddings and dense inputs, learning higher-order interactions. The outputs merge via a weighted sum: \\( \\hat{y} = \\sigma(w_{\\text{wide}} \\, y_{\\text{wide}} + \\mathbf{w}_{\\text{deep}}^T \\mathbf{z}_{\\text{deep}} + b) \\), yielding a click probability.\nThe loss is logistic (binary cross-entropy), optimized differently per component: FTRL with L1 regularization for the wide part (encouraging sparsity) and AdaGrad for the deep part (adapting to dense gradients). On Google Play, Wide & Deep boosted app installations by 3.9% over a wide-only model and 1% over a deep-only model, proving the hybrid’s value. Unlike NCF, it leverages auxiliary features, but the manual engineering of cross-features limits scalability, and it doesn’t address sequential data or higher-order interactions beyond the wide part.
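A compact PyTorch sketch of the two paths and the fused logit may help. Layer widths follow the text; the feature layout is an illustrative assumption, and the paper’s per-component optimizers (FTRL for wide, AdaGrad for deep) are left out of the sketch.
```python
import torch
import torch.nn as nn

class WideAndDeep(nn.Module):
    """Wide linear path over raw + crossed features, deep MLP path over embeddings."""
    def __init__(self, n_wide, embed_specs, n_dense, hidden=(1024, 512, 256)):
        super().__init__()
        self.wide = nn.Linear(n_wide, 1)  # memorization: raw + hand-crafted crosses
        self.embeds = nn.ModuleList(nn.Embedding(n, d) for n, d in embed_specs)
        in_dim = sum(d for _, d in embed_specs) + n_dense
        layers = []
        for h in hidden:
            layers += [nn.Linear(in_dim, h), nn.ReLU()]
            in_dim = h
        layers.append(nn.Linear(in_dim, 1))  # generalization path
        self.deep = nn.Sequential(*layers)

    def forward(self, x_wide, x_sparse, x_dense):
        # x_wide: (B, n_wide) multi-hot floats; x_sparse: (B, n_fields) ids; x_dense: (B, n_dense)
        emb = torch.cat([e(x_sparse[:, i]) for i, e in enumerate(self.embeds)], dim=-1)
        logit = self.wide(x_wide) + self.deep(torch.cat([emb, x_dense], dim=-1))
        return torch.sigmoid(logit).squeeze(-1)
```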
DeepFM (2017)\nWide & Deep’s reliance on manual feature engineering was a bottleneck, especially for large-scale systems with thousands of features. In 2017, Guo et al. introduced DeepFM, targeting CTR prediction in online advertising (e.g., the Criteo dataset), by combining Factorization Machines (FM) with a DNN to automate feature interactions. FM’s strength was modeling pairwise interactions efficiently, but it missed higher-order patterns; DeepFM extended it to capture both low- and high-order interactions without human intervention.\nDeepFM’s inputs are sparse features (e.g., user ID, ad ID) mapped to embeddings \\( \\mathbf{v}_i \\in \\mathbb{R}^{10} \\). The FM component computes: \\( y_{\\text{FM}} = w_0 + \\sum_{i=1}^n w_i x_i + \\sum_{i=1}^n \\sum_{j=i+1}^n \\langle \\mathbf{v}_i, \\mathbf{v}_j \\rangle x_i x_j \\), capturing second-order interactions via dot products. The deep component, an MLP with three 200-neuron layers and ReLU, takes the same embeddings: \\( \\mathbf{z}_1 = \\text{ReLU}(\\mathbf{W}_1 \\mathbf{v} + \\mathbf{b}_1) \\), learning higher-order interactions. Sharing embeddings between the FM and the DNN ensures consistency and efficiency—a key design choice. The final output combines both: \\( \\hat{y} = \\sigma(y_{\\text{FM}} + \\mathbf{w}_{\\text{deep}}^T \\mathbf{z}_{\\text{deep}}) \\).\nThe loss is binary cross-entropy, optimized with Adam (learning rate 0.001). On Criteo, DeepFM hit an AUC of 0.801, edging out Wide & Deep (0.799) and FM (0.785), as it automated feature engineering while retaining FM’s strengths. This modification—replacing Wide & Deep’s manual cross-features with FM—made it scalable, though it still overlooks sequential user behaviors critical for dynamic settings like e-commerce.
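The shared-embedding design fits in a short PyTorch sketch. The k=10 embeddings and 200-neuron layers follow the text; field sizes are illustrative, and the pairwise term uses Rendle’s O(Fk) identity, 0.5·((Σv)² − Σv²), rather than the explicit double sum.
```python
import torch
import torch.nn as nn

class DeepFM(nn.Module):
    """FM (first- plus second-order) and a DNN that share one embedding table."""
    def __init__(self, field_sizes, k=10, hidden=(200, 200, 200)):
        super().__init__()
        self.w0 = nn.Parameter(torch.zeros(1))
        self.w = nn.ModuleList(nn.Embedding(n, 1) for n in field_sizes)  # linear weights
        self.v = nn.ModuleList(nn.Embedding(n, k) for n in field_sizes)  # shared embeddings
        in_dim = len(field_sizes) * k
        layers = []
        for h in hidden:
            layers += [nn.Linear(in_dim, h), nn.ReLU()]
            in_dim = h
        layers.append(nn.Linear(in_dim, 1))
        self.deep = nn.Sequential(*layers)

    def forward(self, x):
        # x: (B, n_fields) of categorical ids, one active id per field
        first = self.w0 + torch.cat([w(x[:, i]) for i, w in enumerate(self.w)], dim=-1).sum(-1)
        emb = torch.stack([v(x[:, i]) for i, v in enumerate(self.v)], dim=1)  # (B, F, k)
        s = emb.sum(dim=1)                                    # sum of field embeddings
        second = 0.5 * (s * s - (emb * emb).sum(dim=1)).sum(dim=-1)
        deep = self.deep(emb.flatten(1)).squeeze(-1)          # same embeddings, reused
        return torch.sigmoid(first + second + deep)
```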
Deep Interest Network (DIN, 2017)\nDeepFM’s static modeling couldn’t capture how user interests evolve, say, during an e-commerce browsing session. Zhou et al. at Alibaba introduced the Deep Interest Network (DIN) in 2017 to address this, using an attention mechanism to weigh historical behaviors by their relevance to a candidate item. Proposed for ad recommendations, DIN recognized that not all past interactions (e.g., clicked items) equally inform the next click, necessitating a dynamic approach.\nDIN’s inputs include a user’s behavior sequence \\( S_u = \\{v_1, v_2, \\ldots, v_T\\} \\) (e.g., clicked items), a candidate ad \\( v_a \\), and context features, all mapped to embeddings. The core innovation is the attention mechanism: for each historical item \\( v_i \\), it computes a weight \\( \\alpha_i = f(\\mathbf{v}_i, \\mathbf{v}_a) \\) using a small MLP: \\( f(\\mathbf{v}_i, \\mathbf{v}_a) = \\text{ReLU}(\\mathbf{W} [\\mathbf{v}_i, \\mathbf{v}_a, \\mathbf{v}_i \\odot \\mathbf{v}_a] + \\mathbf{b}) \\). This weights items by relevance, forming a user interest vector \\( \\mathbf{s}_u = \\sum_{i=1}^T \\alpha_i \\mathbf{v}_i \\). This vector, the candidate embedding, and the context embeddings feed into an MLP with three layers (200, 80, 2 neurons) and ReLU, ending in a sigmoid output: \\( \\hat{y} = \\sigma(\\mathbf{w}^T \\mathbf{z}_{\\text{deep}}) \\).\nThe loss is binary cross-entropy, optimized with Adam (learning rate 0.001). On Alibaba’s dataset, DIN’s AUC of 0.82 beat DeepFM’s 0.80 by two points, highlighting attention’s power in sequential modeling. Unlike DeepFM, DIN adapts to temporal dynamics, but it’s tailored to single-task CTR prediction, not multi-objective scenarios.
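The attention unit is the piece worth sketching. A minimal version follows; the hidden width and masking scheme are illustrative assumptions, and note that the paper deliberately relaxes softmax normalization to preserve interest intensity, whereas this sketch normalizes for simplicity.
```python
import torch
import torch.nn as nn

class DINAttention(nn.Module):
    """Weights a user's behavior sequence by relevance to the candidate item."""
    def __init__(self, dim, hidden=36):
        super().__init__()
        # Scores [v_i, v_a, v_i * v_a] with a small MLP, one scalar per history item
        self.mlp = nn.Sequential(nn.Linear(3 * dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, history, candidate, mask):
        # history: (B, T, d); candidate: (B, d); mask: (B, T), True where a real item sits
        cand = candidate.unsqueeze(1).expand_as(history)
        scores = self.mlp(torch.cat([history, cand, history * cand], dim=-1)).squeeze(-1)
        scores = scores.masked_fill(~mask, float("-inf"))  # assumes >= 1 real item per row
        alpha = torch.softmax(scores, dim=1)
        return (alpha.unsqueeze(-1) * history).sum(dim=1)  # user interest vector s_u
```
The resulting \\( \\mathbf{s}_u \\) is concatenated with the candidate and context embeddings before the 200-80-2 MLP.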
Deep Learning Recommendation Model (DLRM, 2019)\nDIN’s single-tower design wasn’t built for the massive scale and diverse features of systems like Facebook’s ad platform. In 2019, Naumov et al. proposed DLRM, a multi-tower architecture for CTR prediction, explicitly modeling pairwise interactions for scalability and interpretability. The need arose from handling billions of sparse features (e.g., ad IDs) alongside dense ones (e.g., user stats), where implicit interaction modeling slowed training.\nDLRM’s inputs split into dense features (e.g., user age) and sparse features (e.g., user ID), the latter mapped to embeddings. The dense tower is an MLP with three layers (512, 256, 128 neurons) and ReLU, processing continuous inputs. The sparse tower computes pairwise dot products between embeddings: \\( z_{ij} = \\mathbf{v}_i^T \\mathbf{v}_j \\), forming an interaction vector. These outputs concatenate with the dense tower’s result, feeding a top MLP (128, 1 neurons) with ReLU and a sigmoid: \\( \\hat{y} = \\sigma(\\mathbf{w}^T \\mathbf{z}) \\).\nThe loss is binary cross-entropy, optimized with Adam or SGD. On Criteo, DLRM’s AUC of 0.802 slightly topped DeepFM’s 0.801, with better scalability from its parallel towers—a key change over single-tower designs. However, it focuses on single-task CTR, not multi-task or sequential needs.
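The distinctive piece is the explicit interaction op, which treats the bottom-MLP output as one more embedding and takes all pairwise dot products. A minimal sketch (shapes are illustrative):
```python
import torch

def dlrm_interact(dense_out: torch.Tensor, embs: torch.Tensor) -> torch.Tensor:
    """dense_out: (B, d) bottom-MLP output; embs: (B, F, d) sparse-feature embeddings."""
    z = torch.cat([dense_out.unsqueeze(1), embs], dim=1)  # (B, F+1, d)
    dots = torch.bmm(z, z.transpose(1, 2))                # all pairwise dot products z_ij
    i, j = torch.triu_indices(z.size(1), z.size(1), offset=1)
    pairs = dots[:, i, j]                                 # keep each unordered pair once
    return torch.cat([dense_out, pairs], dim=-1)          # input to the top MLP
```
Because the interaction is a plain matrix product per example, the dense and embedding towers can run in parallel up to this single synchronization point, which is where the scalability claim comes from.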
Adaptive Task-to-Task Fusion (AdaTT, 2023)\nDLRM’s single-task focus couldn’t handle multi-objective RecSys, like predicting CTR and conversion rate (CVR) together. Multi-Task Learning (MTL) emerged to share representations across tasks, but task conflicts often hurt performance. AdaTT, introduced by Li et al. at Meta in 2023, tackled this with an adaptive task-to-task fusion network that dynamically balances task interactions.\nAdaTT’s inputs—shared features (e.g., user ID, item ID)—map to embeddings feeding a shared bottom MLP: \\( \\mathbf{z}_{\\text{shared}} = \\text{ReLU}(\\mathbf{W} \\mathbf{e} + \\mathbf{b}) \\). Task-specific towers (e.g., CTR, CVR) process this into \\( \\mathbf{z}_t = \\text{MLP}_t(\\mathbf{z}_{\\text{shared}}) \\). The innovation is an attention-based fusion: \\( \\mathbf{z}_t' = \\sum_{s \\neq t} \\alpha_{ts} \\mathbf{z}_s \\), where \\( \\alpha_{ts} \\) weights contributions from other tasks, computed via a task-to-task attention MLP. Outputs are per-task sigmoids: \\( \\hat{y}_t = \\sigma(\\mathbf{w}_t^T \\mathbf{z}_t') \\).\nThe loss is a weighted sum: \\( L = \\sum_t \\lambda_t L_t \\) (binary cross-entropy per task), optimized with Adam. On a large industrial dataset, AdaTT lifted CTR AUC by 1.5% and CVR AUC by 2% over single-task models, thanks to its adaptive fusion—a leap over static MTL. Its complexity, though, demands careful tuning.
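A deliberately simplified fusion sketch in the spirit of the equations above; the real AdaTT module fuses per-task expert units with both learned linear combinations and self/cross fusion units, so treat this gate-per-task version as illustrative only.
```python
import torch
import torch.nn as nn

class TaskFusion(nn.Module):
    """Each task adaptively re-weights all task-tower outputs (simplified, AdaTT-style)."""
    def __init__(self, n_tasks, dim):
        super().__init__()
        self.gates = nn.ModuleList(nn.Linear(dim, n_tasks) for _ in range(n_tasks))

    def forward(self, task_feats):
        # task_feats: list of n_tasks tensors z_t, each (B, dim)
        stacked = torch.stack(task_feats, dim=1)                  # (B, T, dim)
        fused = []
        for t, gate in enumerate(self.gates):
            alpha = torch.softmax(gate(task_feats[t]), dim=-1)    # fusion weights alpha_ts
            fused.append((alpha.unsqueeze(-1) * stacked).sum(1))  # z_t' = sum_s alpha_ts z_s
        return fused  # one fused representation per task head
```
Each fused \\( \\mathbf{z}_t' \\) then feeds its own sigmoid head, and the per-task binary cross-entropy losses are combined as \\( L = \\sum_t \\lambda_t L_t \\).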
Conclusion: What’s Next\nFrom NCF’s non-linear leap to AdaTT’s multi-task finesse, DNNs have reshaped RecSys, each model solving a prior limitation: NCF broke linearity, Wide & Deep merged memorization and generalization, DeepFM automated feature engineering, DIN added sequence awareness, DLRM scaled up, and AdaTT tackled multiple goals. Part 4 will explore graph neural networks and transformers, pushing RecSys further into complex, real-time domains.\nReferences\nHe, X., et al. (2017). Neural Collaborative Filtering. arXiv:1708.05031.\nCheng, H.-T., et al. (2016). Wide & Deep Learning for Recommender Systems. arXiv:1606.07792.\nGuo, H., et al. (2017). DeepFM: A Factorization-Machine Based Neural Network for CTR Prediction. arXiv:1703.04247.\nZhou, G., et al. (2017). Deep Interest Network for Click-Through Rate Prediction. arXiv:1706.06978.\nNaumov, M., et al. (2019). Deep Learning Recommendation Model for Personalization and Recommendation Systems. arXiv:1906.00091.\nLi, D., et al. (2023). AdaTT: Adaptive Task-to-Task Fusion Network for Multitask Learning in Recommendations. arXiv:2304.04959. ","permalink":"https://pwaabdullah.github.io/posts/the-evaluation-of-recsys-part-3/","summary":"Part 3 of the series: how DNNs transformed RecSys from 2016 onward — NCF, Wide & Deep, DeepFM, DIN, DLRM, and AdaTT. Architectures, intuition, and where each shines.","title":"The Evaluation of RecSys — Part 3: The Deep Learning Era (NCF, Wide & Deep, DeepFM, DIN, DLRM, AdaTT)"},{"content":" TL;DR\nPart 2 of a 4-part series. Start at Part 1 → if you missed it. Factorization Machines (FM) generalize matrix factorization to arbitrary categorical/numerical features — making them a workhorse for CTR prediction on sparse data. XGBoost brought robust non-linear ranking via gradient-boosted trees, dominating top-N recommendation tasks in the mid-2010s. Both hit hard limits on higher-order interactions and sequential behavior — setting up the deep-learning wave in Part 3 →.\nWelcome to Part 2 of the RecSys series! In Part 1, we traced the evolution of RecSys from content-based filtering (CBF) to collaborative filtering (CF), and finally to Matrix Factorization (MF), which introduced latent factor models to tackle sparsity and scalability. However, MF’s linear assumptions and struggles with implicit feedback (e.g., clicks, views) set the stage for more advanced techniques. In this post, we dive into two pivotal methods from the 2010–2015 era: Factorization Machines (FM) and Gradient Boosted Trees (XGBoost).\nIn Part 2, you’ll learn how FM generalizes MF to handle diverse data types, how XGBoost leverages decision trees for ranking, and the strengths and limitations of each.\n1. Recap\nIn Part 1, we explored the foundational stages of RecSys:\nCBF recommended based on features (e.g., movie genres) but struggled with diversity.\nCF leveraged user-item interactions, introducing neighborhood methods and latent factor models.\nMF modeled users and items in a latent space, predicting ratings as \\(\\hat{r}_{ui} = p_u^T q_i\\). However, MF assumed linear interactions and worked best with explicit feedback (ratings), failing to capture implicit signals like clicks or views.\nThese limitations prompted the 2010–2015 era, where machine learning techniques like FM and XGBoost emerged to handle more complex patterns.\n2. In Layman Terms\nImagine you’re shopping in a store for a jacket. In Part 1, MF was like a salesman who suggested jackets based on your ratings, guessing your taste with simple categories like “likes warm jackets” or “prefers casual style.” It worked well when you rated items, but what if you never rate anything? Or what if the salesman only knows you clicked on a jacket or viewed its picture? MF struggles here because it’s too rigid. Enter Factorization Machines (FM) and XGBoost—two smarter assistants who arrived around 2010 to fix this.\nFM: It’s like a smart salesman who looks at everything—what you clicked, the weather, and your profile (e.g., you’re a runner)—mixing these clues to suggest a waterproof running jacket if it’s rainy. It’s flexible and can handle all kinds of hints, not just ratings.\nXGBoost: XGBoost is like a super-smart friend who learns from everyone’s shopping habits to suggest the perfect jacket. It builds a decision flowchart (actually a tree): “If you like bright colors, and it’s winter, and you often buy on weekends, then try this red parka.” It improves its suggestions step by step.\nThese assistants are more flexible than MF, handling messy data and complex patterns, but they have limits, which we’ll explore as we move toward deep learning in Part 3.\n3. Prerequisites\nThe dot product combines two vectors to measure similarity—think of it as a handshake between features (e.g., user preferences and item traits). A loss function measures prediction errors (e.g., squared error: \\((y - \\hat{y})^2\\)), regularization prevents overfitting, and optimization (e.g., gradient descent) minimizes the loss. One-hot encoding transforms raw data (e.g., user IDs, item categories) into usable inputs. From Part 1, recall that MF models ratings as \\(\\hat{r}_{ui} = p_u^T q_i\\), where \\(p_u\\) and \\(q_i\\) are latent vectors, but struggles with implicit feedback. For more on these topics, check out Linear Algebra Basics or Intro to Machine Learning.\n4. Deep Dive\n4.1 Factorization Machines (FM)\nFM, introduced by Steffen Rendle in 2010, generalizes Matrix Factorization to model interactions between any features, not just users and items. It excels in sparse, high-dimensional settings like CTR prediction in online advertising, where data includes implicit feedback (clicks, views) and diverse features (user demographics, ad categories, context). FM’s ability to capture pairwise feature interactions without manual engineering made it a cornerstone for RecSys.\nHow It Works:\nFM models a prediction (e.g., click probability) as a combination of linear and pairwise feature interactions. For a feature vector \\(\\mathbf{x} \\in \\mathbb{R}^n\\) (where \\(n\\) is the number of features), the prediction is:\n$$ \\hat{y}(\\mathbf{x}) = w_0 + \\sum_{i=1}^n w_i x_i + \\sum_{i=1}^n \\sum_{j=i+1}^n \\langle v_i, v_j \\rangle x_i x_j $$\n\\(w_0\\): Global bias. \\(w_i\\): Weight for feature \\(x_i\\). \\(\\langle v_i, v_j \\rangle = v_i^T v_j\\): Dot product of latent vectors \\(v_i, v_j \\in \\mathbb{R}^k\\), modeling the interaction between features \\(x_i\\) and \\(x_j\\). \\(k\\): Number of latent factors (typically 10–100). This captures both linear effects (\\(w_i x_i\\)) and pairwise interactions (\\(\\langle v_i, v_j \\rangle x_i x_j\\)). For example, in CTR prediction, \\(x_i\\) might indicate the user, \\(x_j\\) the ad, and their interaction reflects compatibility.\nConnection to MF:\nIf \\(\\mathbf{x}\\) encodes only user \\(u\\) and item \\(i\\) (e.g., \\(x_u = 1\\), \\(x_i = 1\\), all others 0), FM reduces to MF: $$ \\hat{y}(\\mathbf{x}) = w_0 + w_u + w_i + \\langle v_u, v_i \\rangle $$ Here, \\(\\langle v_u, v_i \\rangle\\) mirrors MF’s \\(p_u^T q_i\\), but FM’s generality allows modeling additional features like user age or ad category.\nLoss Function:\nFM supports regression (rating prediction) or classification (click prediction). For regression:\n$$ L = \\sum_{(\\mathbf{x}, y) \\in D} (y - \\hat{y}(\\mathbf{x}))^2 + \\lambda (\\| \\mathbf{w} \\|_2^2 + \\| V \\|_F^2) $$\nFor classification (CTR):\n$$ L = \\sum_{(\\mathbf{x}, y) \\in D} \\log(1 + \\exp(-y \\hat{y}(\\mathbf{x}))) + \\lambda (\\| \\mathbf{w} \\|_2^2 + \\| V \\|_F^2) $$\n\\(D\\): Training data. \\(y\\): Target (e.g., 1 for click, -1 for no click). \\(\\lambda\\): Regularization strength to prevent overfitting. \\(V\\): Matrix of latent vectors \\(v_i\\).\nOptimization:\nRendle (2010) proposes three methods:\nStochastic Gradient Descent (SGD): Updates parameters incrementally for each sample, ideal for large datasets. Alternating Least Squares (ALS): Optimizes one parameter at a time, better for batch processing. Markov Chain Monte Carlo (MCMC): A Bayesian approach, offering uncertainty estimates but slower.\nSGD is often preferred for scalability, with updates like: $$ w_i \\leftarrow w_i - \\eta \\frac{\\partial L}{\\partial w_i}, \\quad v_i \\leftarrow v_i - \\eta \\frac{\\partial L}{\\partial v_i} $$\n\\(\\eta\\): Learning rate.\nInput and Output:\nInput: Sparse feature vector \\(\\mathbf{x}\\) (e.g., one-hot encoded user ID, item ID, context). Output: Predicted score (e.g., click probability, rating).\nReal-World Example:\nAt Meta Ads, FM might model user-ad interactions by combining user demographics (e.g., age, location), ad features (e.g., category, keyword), and context (e.g., device type), predicting the likelihood of a click to optimize ad placement.\nTakeaways:\nCaptures pairwise feature interactions. Scales well in sparse, high-dimensional data. Excels in CTR prediction and implicit feedback tasks.\nFeatures: Handles both explicit and implicit feedback, scales to high-dimensional sparse data, captures feature interactions without manual engineering.\nLimitations: Limited to pairwise interactions, misses higher-order patterns (e.g., user-item-context triplets), and assumes linear combinations, which may not capture deep non-linearities.\nLed to: Deep learning models like DeepFM (2017), which combine FM with neural networks to learn higher-order interactions.
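Rendle’s key computational trick is that the pairwise term collapses to \\( \\frac{1}{2} \\sum_{f=1}^k \\big[ (\\sum_i v_{i,f} x_i)^2 - \\sum_i v_{i,f}^2 x_i^2 \\big] \\), which costs O(kn) instead of O(kn²). A minimal NumPy scoring sketch (parameter names and toy sizes are illustrative):
```python
import numpy as np

def fm_predict(x, w0, w, V):
    """FM score for one feature vector x (n,); w0 bias, w (n,) linear, V (n, k) factors."""
    linear = w0 + w @ x
    s = V.T @ x                    # (k,) per-factor weighted sums
    s2 = (V * V).T @ (x * x)       # (k,) per-factor sums of squares
    return linear + 0.5 * float(np.sum(s * s - s2))

rng = np.random.default_rng(0)
n, k = 8, 4                        # toy feature and factor counts
x = np.zeros(n); x[[1, 5]] = 1.0   # sparse one-hot input: a (user, item) pair
print(fm_predict(x, 0.1, rng.normal(size=n), rng.normal(size=(n, k))))
```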
4.2 Gradient Boosted Trees (XGBoost)\nXGBoost, introduced by Chen & Guestrin in 2016, leverages an ensemble of decision trees for ranking tasks in RecSys, excelling in search (e.g., Bing) and online advertising. It addresses MF’s and FM’s limitations in capturing non-linear patterns, using second-order optimization for efficiency and scalability.\nHow It Works:\nXGBoost builds a sequence of decision trees, each correcting the errors of the previous ones. For RecSys, it’s often used in learning-to-rank tasks (e.g., ranking search results or videos). Features include user behavior (clicks, watch time), item metadata (category, tags), and context (time, device). The prediction is:\n$$ \\hat{y}_i = \\sum_{t=1}^T f_t(\\mathbf{x}_i) $$\n\\(T\\): Number of trees. \\(f_t\\): Output of the \\(t\\)-th tree for input \\(\\mathbf{x}_i\\).\nLoss Function:\nXGBoost optimizes a regularized objective:\n$$ L = \\sum_{i=1}^N l(y_i, \\hat{y}_i) + \\sum_{t=1}^T \\Omega(f_t) $$\n\\(l(y_i, \\hat{y}_i)\\): Loss, e.g., squared loss for regression, or pairwise ranking loss (e.g., LambdaRank). \\(\\Omega(f_t) = \\gamma T_t + \\frac{1}{2} \\lambda \\| \\mathbf{w}_t \\|_2^2\\): Regularization, where \\(T_t\\) is the number of leaves in tree \\(t\\) and \\(\\mathbf{w}_t\\) are its leaf weights.\nFor ranking, it uses a pairwise loss:\n$$ L = \\sum_{(i,j) \\in P} \\log(1 + \\exp(-(\\hat{y}_i - \\hat{y}_j))) $$\nwhere \\(P\\) is the set of relevant-irrelevant pairs. XGBoost uses a second-order approximation:\n$$ L \\approx \\sum_{i=1}^N \\left[ l(y_i, \\hat{y}_i^{(t-1)}) + g_i f_t(\\mathbf{x}_i) + \\frac{1}{2} h_i f_t(\\mathbf{x}_i)^2 \\right] + \\Omega(f_t) $$\n\\(g_i = \\frac{\\partial l}{\\partial \\hat{y}_i^{(t-1)}}\\), \\(h_i = \\frac{\\partial^2 l}{\\partial (\\hat{y}_i^{(t-1)})^2}\\). This enables efficient tree construction, with features like column sampling and parallel processing for scalability.\nInput and Output:\nInput: Feature vectors (numerical or categorical, e.g., user watch time, item category). Output: Ranking scores for items.\nReal-World Example:\nAt Bing, XGBoost ranks search results by modeling features like query relevance, user click history, and page quality, ensuring the most relevant results appear at the top.\nKey Takeaways (XGBoost in 3 Points):\nCaptures non-linear patterns via tree ensembles. Robust to missing data and interpretable (feature importance). Excels in ranking tasks like search and ads.\nFeatures: Captures non-linear patterns, handles mixed feature types via tree splits, robust to missing data, interpretable (feature importance scores).\nLimitations: Struggles with extremely high-dimensional sparse data (e.g., one-hot encoded user/item IDs), computationally expensive for large datasets, requires careful feature engineering.\nLed to: Neural Collaborative Filtering (NCF) and other deep learning methods that automatically learn feature representations.
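A minimal learning-to-rank sketch with the xgboost Python package on toy data; the group sizes tell the pairwise objective which rows belong to the same query.
```python
import numpy as np
import xgboost as xgb

X = np.random.rand(5, 4)          # candidate-item features for two queries (toy data)
y = np.array([2, 1, 0, 1, 0])     # graded relevance labels
dtrain = xgb.DMatrix(X, label=y)
dtrain.set_group([3, 2])          # rows 0-2 belong to query 1, rows 3-4 to query 2

params = {
    "objective": "rank:pairwise", # pairwise ranking loss, as in the text
    "eta": 0.1,
    "max_depth": 4,
    "eval_metric": "ndcg@2",
}
model = xgb.train(params, dtrain, num_boost_round=50)
scores = model.predict(dtrain)    # higher score ranks higher within its query
```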
6. Conclusion\nPart 2 has taken us from MF to FM’s flexible feature interactions and XGBoost’s non-linear ranking power. FM excels in CTR prediction by modeling sparse, implicit data, while XGBoost dominates ranking tasks with its ability to capture complex patterns. However, both methods hit limits—FM’s pairwise focus and XGBoost’s reliance on feature engineering couldn’t keep up with the complexity of modern RecSys. In Part 3, we’ll explore how deep learning overcomes these limitations, tackling unstructured data like images and text with models like Neural Collaborative Filtering and DeepFM, which leverage neural networks for higher-order interactions and automated feature learning.\n7. References\nRendle, S. (2010). Factorization Machines. 2010 IEEE International Conference on Data Mining (ICDM), 995–1000. https://www.ismll.uni-hildesheim.de/pub/pdfs/Rendle2010FM.pdf\nChen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. arXiv:1603.02754. https://arxiv.org/pdf/1603.02754\nHe, X., Zhang, H., Kan, M.-Y., & Chua, T.-S. (2016). Fast Matrix Factorization for Online Recommendation with Implicit Feedback. Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’16), 549–558. ","permalink":"https://pwaabdullah.github.io/posts/the-evaluation-of-recsys-part-2/","summary":"Part 2 of the series: how Factorization Machines generalized MF to arbitrary features, how XGBoost handled non-linear ranking, and the limitations that pushed the field toward deep neural networks.","title":"The Evaluation of RecSys — Part 2: Factorization Machines and XGBoost"},{"content":" TL;DR\nThis is Part 1 of a 4-part deep-dive on recommendation systems. We cover content-based filtering, collaborative filtering (user-based & item-based), and matrix factorization — with the math, loss functions, and where each breaks down. By the end you’ll know why MF dominated 2010-era RecSys but couldn’t handle sparse, non-linear, sequential patterns. Part 2 → picks up with Factorization Machines and XGBoost.\nRecommendation systems (RecSys) play a critical role in modern AI-driven applications. From e-commerce to social media, search engines, and online advertising, personalized recommendations significantly impact user experience and business revenue. This blog series is intended for both beginners and experienced ML practitioners who want to understand the evolution of recommendation systems in a structured manner.\nI’ll discuss early techniques briefly and deep-dive into the latest innovations. For each technique, I’ll break down key concepts, their loss functions (how they learn), inputs/outputs, features, and limitations—why they weren’t enough, and how the next breakthrough fixed the flaws. But first things first:\n1. What is a Recommendation System?\nA Recommendation System (RecSys) is an AI-driven system designed to predict and suggest relevant items (products, content, services) to users based on their past interactions, preferences, and contextual signals.\nWhy is it important?\nIt reduces information overload and boosts user engagement by personalizing content delivery, making your online experience smoother while helping companies rake in revenue.\nUse Cases\nDomain | Examples | Application\nE-commerce | Amazon, eBay, Alibaba | Product recommendations\nStreaming | Netflix, YouTube, Spotify | Content recommendations\nSocial Media | FB, Insta, TikTok, Twitter | Feed ranking, friends/pages suggestion\nSearch Engines | Google, Bing, Baidu | Personalized search results\nOnline Advertising | Google Ads, Meta Ads, TikTok Ads | Personalized ad ranking\nHealthcare | Clinical Decision Support | Personalized treatment recommendations\nFinance | Stock Market, PayPal, Banking | Personalized financial insights, fraud detection\nGeolocation | Uber, Doordash, Airbnb | Personalized ride, restaurant, rental suggestions\n2. Evolution Overview: From Legacy to State-of-the-Art\n2000–2010: Classical Techniques\nHeuristic Methods: Rule-based (e.g., Amazon’s “Customers who bought this also bought”). Collaborative Filtering (CF): User- & item-based CF using similarity metrics (cosine, Pearson). Content-based Filtering: TF-IDF, cosine similarity, word embeddings. Matrix Factorization (MF): SVD, ALS, NMF for latent factor modeling (Netflix Prize, 2006).\n2010–2015: ML-Based\nFactorization Machines (FM): Generalization of MF (used in CTR prediction). Gradient Boosted Trees (GBDT): XGBoost, LightGBM for ranking models (widely used in ads & search). Hybrid Models: CF + ML.\n2015–2020: Deep Learning-Based RecSys\nDeepFM: FM + DNN for learning feature interactions. Neural Collaborative Filtering (NCF): Replacing MF with deep networks. Multi-task Learning (MTL): Multi-objective optimization for RecSys (e.g., ads ranking). Graph Neural Networks (GNNs): PinSage (Pinterest), GAT-based RecSys. Sequential RecSys (RNN, Transformer): GRU4Rec, BERT4Rec, SASRec for session-based recommendations.\n2020–Present: GenAI-Powered RecSys\nLLMs for RecSys: ChatGPT, GPT-4, PaLM, Gemini for conversational RecSys. Retrieval-Augmented Generation (RAG): Using search + generation for recommendations. Diffusion Models: Generating recommendations using probabilistic diffusion. Multimodal RecSys: Combining text, image, video, and audio (e.g., TikTok). Reinforcement Learning (RL) in RecSys: Deep Q-Networks (DQN), PPO, Bandits (news feed ranking).\n3. In Layman Terms (for non-ML backgrounds)\nImagine you’re at a buffet with endless dishes, but you only have time to pick a few. A recommendation system is like a smart friend who knows your taste. There are three classic ways it works:\nContent-Based Filtering (The “What You Like” Friend):\nThis friend looks at what you’ve eaten before—like spicy tacos—and suggests more spicy stuff, like chili soup. It builds a “you” profile (loves spicy!) and a “food” profile (this dish is spicy!) to match them up.\nExample: You watch a sci-fi movie on Netflix, so it suggests Star Wars next because it’s similar. Easy, right?\nCollaborative Filtering (The “Crowd Wisdom” Friend):\nThis friend doesn’t care what the food is—they watch other people. If you and your buddy both liked pizza, and your buddy also loved sushi, they’ll suggest sushi to you.\nExample: On Amazon, you buy a phone case, and it suggests a charger because others who bought cases also grabbed chargers.\nMatrix Factorization (The “Secret Code” Friend):\nNow imagine your friend cracking a secret code. They don’t just look at what you ate or what others did—they figure out why you liked it.\nExample: You rate action movies high on Netflix. It figures you like “fast pacing” and “hero vibes,” so it suggests Mad Max even if you’ve never seen it before.\nEach method has strengths, but they stumble too—new users or items can confuse them, or they miss the bigger picture. That’s why we keep inventing better friends!\n4. Technical Deep Dive\nContent-Based Filtering\nKey Idea: Match users to items based on their features.\nHow It Works: Build a profile for users (e.g., “likes sci-fi”) and items (e.g., “sci-fi movie”) using text features like TF-IDF or embeddings (word2vec). Compute similarity (cosine) between them.\nInput: User history (watched Star Trek), item metadata (movie tags).\nOutput: Ranked list of similar items (Star Wars).\nLoss Function: Minimize ranking error (e.g., cosine distance) or maximize relevance (e.g., precision@k).\nLimitations: Only recommends items similar to past preferences, leading to filter bubbles. Cannot discover diverse recommendations (e.g., a user who likes sci-fi never gets comedy).\nFig 1: Content-based filtering [1]
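A minimal content-based sketch with scikit-learn: build TF-IDF profiles for items, represent the user by the text of items they liked, and rank by cosine similarity. The toy metadata below is illustrative.
```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

items = [
    "space opera sci-fi adventure",
    "romantic comedy set in paris",
    "gritty sci-fi thriller about AI",
]
user_history = ["space sci-fi adventure about AI"]  # text of items the user liked

vec = TfidfVectorizer()
item_matrix = vec.fit_transform(items)              # item profiles
user_profile = vec.transform(user_history)          # user profile in the same space

scores = cosine_similarity(user_profile, item_matrix).ravel()
print(scores.argsort()[::-1])                       # most similar items first
```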
Collaborative Filtering (CF)\nKey Idea: Use group behavior to predict individual tastes. It assumes that users with similar preferences will like similar items.\nNeighborhood methods:\nUser-Based: Find users like you (similar ratings), borrow their likes. Math: Cosine similarity between user rating vectors.\nItem-Based: Find items like what you rated (similar users liked them). Math: Cosine similarity between item rating vectors.\nLatent factor models:\nRepresent users and items in a lower-dimensional latent space, driven by hidden factors (e.g., genres, tastes). This leads to Matrix Factorization (MF).\nInput: User-item rating matrix (sparse!). Output: Top-N items you might rate highly. Loss Function: Minimize prediction error (e.g., RMSE).\nLimitations: Cold-start problem. Also, if a movie or product is not popular, it will never get recommended.\nFig 2: The user-oriented neighborhood method. [2]
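A tiny NumPy sketch of user-based neighborhood CF: score an unrated item for a user as the cosine-similarity-weighted average of other users’ ratings. The 3x4 rating matrix is illustrative.
```python
import numpy as np

R = np.array([          # rows = users, columns = items, 0 = unrated
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
], dtype=float)

def cosine_sim(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return (a @ b) / denom if denom else 0.0

def predict_user_based(R, u, i):
    """Similarity-weighted average of other users' ratings for item i."""
    sims = np.array([cosine_sim(R[u], R[v]) if v != u else 0.0 for v in range(len(R))])
    weights = sims * (R[:, i] > 0)          # only neighbors who actually rated item i
    return (weights @ R[:, i]) / weights.sum() if weights.sum() else 0.0

print(predict_user_based(R, u=0, i=2))      # estimate user 0's rating for item 2
```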
Matrix Factorization (MF)\nKey Idea: Decode ratings into hidden “factors” (tastes) for users and items.\nHow It Works: Picture a giant grid with users as rows and items as columns, ratings in cells. Most cells are blank (no ratings). MF fills in the blanks by estimating latent traits.\nMath: Break the matrix into two:\nUser factors \\( p_u \\). Item factors \\( q_i \\). Predicted rating \\( \\hat{r}_{ui} = p_u^T q_i \\).\nLoss Function (summing over observed ratings \\( (u,i) \\in \\mathcal{K} \\) only):\n$$ \\text{Loss} = \\sum_{(u,i) \\in \\mathcal{K}} (r_{ui} - q_i^T p_u)^2 + \\lambda (\\|q_i\\|^2 + \\|p_u\\|^2) $$\nOptimizer: SGD or ALS.\nLimitations: Assumes linear interactions, works only for explicit feedback.\nFig 3: A simplified version of Matrix Factorization [3]\nWhy Not Just SVD?\nSVD is related to MF and captures latent structure, but:\nSparsity: SVD requires a full matrix. Scalability: Expensive for large datasets. Overfitting: Without constraints, it fits noise.\nRegularization:\nHelps address sparsity and overfitting. Regularized MF works only on observed ratings and avoids costly imputation.
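A minimal SGD trainer for the loss above, looping over observed ratings only (no imputation); sizes and hyperparameters are illustrative.
```python
import numpy as np

def train_mf(ratings, n_users, n_items, k=16, lr=0.01, reg=0.05, epochs=50):
    """SGD on observed (user, item, rating) triples for regularized MF."""
    rng = np.random.default_rng(0)
    P = rng.normal(scale=0.1, size=(n_users, k))   # user factors p_u
    Q = rng.normal(scale=0.1, size=(n_items, k))   # item factors q_i
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]
            pu = P[u].copy()                       # update Q with the pre-step p_u
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * pu - reg * Q[i])
    return P, Q

ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (2, 2, 1.0)]
P, Q = train_mf(ratings, n_users=3, n_items=3)
print(P[0] @ Q[2])   # predicted rating for user 0, item 2
```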
Conclusion\nThe progression from content-based filtering (Fig 1) to collaborative filtering (Fig 2), and finally to MF (Fig 3), reflects a shift toward leveraging latent structures in sparse data. Neighborhood methods provided an initial collaborative approach, but latent factor models, inspired by SVD, offered scalability and accuracy. SVD’s limitations spurred regularized MF, focusing on observed ratings, with SGD and ALS optimizing this process for real-world systems.\nReferences\n1. StrataScratch\n2. Koren, Yehuda, et al. “Matrix Factorization Techniques for Recommender Systems.” Computer 42.8 (2009): 30–37. ","permalink":"https://pwaabdullah.github.io/posts/the-evaluation-of-recsys-part-1/","summary":"Part 1 of a 4-part series tracing how RecSys evolved from content-based filtering through collaborative filtering to matrix factorization — and where each technique falls short, setting up the next breakthrough.","title":"The Evaluation of RecSys — Part 1: From Content-Based Filtering to Matrix Factorization"},{"content":" Photo taken at Meta All Hands 2024, Hacker Square, Meta HQ, Menlo Park, CA\nShort\nAbdullah Al Mamun, PhD is a distinguished AI researcher and industry expert in Recommender Systems (RecSys) and Generative AI (GenAI). He is currently a Sr. ML Engineer at Atlassian on the Central AI team, where he builds and improves SMART Answer generation for Jira/Confluence Search using RL-based fine-tuned LLMs and multi-agent AI architecture. Previously, he was a Member of Technical Staff at Aisera, architecting end-to-end multi-agentic AI systems for enterprise IT/HR — driving ~$3.4M estimated ARR, fine-tuning LLaMA-3 on AWS to save $4M in CAPEX vs. GPT-4, and enhancing the RAG pipeline for a ~93% real-time semantic-search improvement. Before that, he was a Machine Learning Engineer at Meta, fine-tuning LLaMA 3 for large-scale ad creation (13% CTR ↑, 6% CVR ↑, ~$497M iRev) and leading AutoCA audience clustering for personalized ads ranking (~$109M iRev). Academically, he holds a PhD in Computer Science from Florida International University, focused on interpretable applied ML for early cancer detection and drug recommendation. His journey spans research, academia, and production-level AI systems—with deep expertise in LLMs, Transformers, RAG, RL fine-tuning, and scalable ML systems.\nLong\nAbdullah Al Mamun, PhD, has spent over a decade at the forefront of ML research and applied AI, with a mission to bridge academic excellence and industry-scale impact. His journey spans multiple countries, top-tier institutions, and FAANG, culminating in his current role at Atlassian shaping the future of enterprise agentic AI for Jira and Confluence Search. He began his academic career in Bangladesh, earning a BS in CSE from Dhaka University of Engineering and Technology (DUET), where his early interest in machine learning and natural language processing (NLP) took root. Pursuing deeper expertise, he moved to KSA, earning a Master’s in Computer Engineering from King Fahd University of Petroleum & Minerals (KFUPM). During this period, he developed an LSTM-based sentiment analysis system that achieved 98% accuracy on customer feedback data. In 2017, he joined Qatar University as a machine learning researcher and went on to win the 2nd GCC Robotics Challenge, a milestone that recognized his innovation in AI and robotics. Later that year, Abdullah moved to the United States to pursue a PhD in CS at Florida International University (FIU). His research focused on interpretable deep learning for early-stage cancer detection and drug recommendation. His work resulted in multiple publications and travel fellowships to premier conferences such as ACM BCB and IEEE BIBM, reflecting both academic rigor and translational impact. In 2022, Abdullah transitioned to industry, joining Meta (formerly Facebook) as a Machine Learning Engineer. At Meta, he worked on large-scale ads ranking, personalization, and generative ad creation using cutting-edge techniques like MTML and Transformer-based sequence-learning models. He collaborated with Meta AI to integrate LLMs into ad-creation workflows — fine-tuning LLaMA 3 (SFT, RLHF, KV-Cache, 4D parallelism, quantization, distillation) and delivering 13% CTR ↑, 6% CVR ↑, and ~$497M in incremental revenue. He also led AutoCA audience clustering for ads ranking via targeting relaxation, contributing another 0.1% iRev improvement (~$109M). In 2024, Abdullah joined Aisera as a Member of Technical Staff, where he led the design and deployment of multi-agentic AI systems for enterprise IT and HR automation — driving ~$3.4M estimated ARR. He architected a scalable, reusable onboarding agent and spearheaded the migration from commercial APIs (e.g., GPT-4) to in-house fine-tuned LLaMA-3 models on AWS, saving $4M in CAPEX. He also enhanced the RAG pipeline, improving real-time semantic-search performance by ~93%. Most recently, Abdullah joined Atlassian as a Sr. ML Engineer on the Central AI team, where he builds and improves SMART Answer generation for Jira and Confluence Search. His work centers on RL-based fine-tuned LLMs and multi-agent AI architecture, driving the next generation of enterprise search experiences for millions of users across thousands of organizations. Throughout his career, Abdullah has developed deep expertise in RecSys, LLMs, Transformers, RAG, vector databases, PyTorch, Hugging Face, LangChain, and inference optimization. He holds certifications from Google Cloud and the University of Illinois Urbana-Champaign, and continues to contribute actively to the field through open-source work, research, and real-world deployment of AI systems. Now based in Fremont, California, he remains focused on building AI that is not only intelligent—but scalable, reliable, and transformative for enterprises and society alike. Outside of work, Abdullah leads an active lifestyle with a love for badminton, skiing, hiking, and mountain biking. He is also passionate about nature photography and has a deep appreciation for the outdoors. A frequent traveler, he has visited 16+ countries so far and continues to explore new cultures and landscapes to fuel his curiosity and creativity. ","permalink":"https://pwaabdullah.github.io/about/","summary":"about","title":""}
,{"content":" A curated set of peer-reviewed work spanning deep learning, evolutionary machine learning, and applied AI across cybersecurity and computational biology. Full list and live citation counts on Google Scholar. 
2025\nGenetic programming for enhanced detection of Advanced Persistent Threats through feature construction. Abdullah Al Mamun, Harith Al-Sahaf, Ian Welch, Seyit Camtepe. Computers & Security, vol. 149, 104185 · Elsevier · Q1, IF ~5.6. Uses genetic programming to automatically construct features that catch advanced persistent threats — outperforming hand-engineered baselines on benchmark intrusion datasets.\n2024\nDetection of Advanced Persistent Threat: A genetic programming approach. Abdullah Al Mamun, Harith Al-Sahaf, Ian Welch, Masood Mansoori, Seyit Camtepe. Applied Soft Computing, vol. 167, 112447 · Elsevier · Q1, IF ~8.7. Evolutionary ML for cybersecurity — proposes a GP framework for APT detection that generalizes across attack families with interpretable detection rules.\n2021\nMulti-run concrete autoencoder to identify prognostic lncRNAs for 12 cancers. Abdullah Al Mamun, Raihanul Bari Tanvir, Masrur Sobhan, Kalai Mathee, Giri Narasimhan, Gregory E. Holt, Ananda Mohan Mondal. International Journal of Molecular Sciences, 22(21), 11919 · MDPI · Q1. Deep-learning approach (concrete autoencoder) for biomarker discovery across 12 cancer types — finds prognostic long non-coding RNAs that survive across independent runs, giving clinicians a stable feature set.\n2020\nDeep learning to discover genomic signatures for racial disparity in lung cancer. Masrur Sobhan, Abdullah Al Mamun, Raihanul Bari Tanvir, Maria J. Alfonso, Pia Valle, Ananda Mohan Mondal. IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2020). Deep learning surfaces genomic signals that differ between racial groups in lung cancer — a societal-impact application of ML to a healthcare-equity problem.\n2019\nLong non-coding RNA based cancer classification using deep neural networks. Abdullah Al Mamun, Ananda Mohan Mondal. Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics (ACM-BCB 2019). End-to-end deep neural network for multi-cancer classification using lncRNA expression — outperforms classical ML baselines on the TCGA pan-cancer dataset.\nFor the complete list, conference proceedings, and live citation counts, see Google Scholar. ","permalink":"https://pwaabdullah.github.io/publications/","summary":"Selected peer-reviewed publications on deep learning, evolutionary ML, and applied AI.","title":"Publications"},{"content":"","permalink":"https://pwaabdullah.github.io/resume/","summary":"8+ years of applied AI/ML at Atlassian, Aisera, and Meta. LLM fine-tuning, multi-agent systems, RecSys, ads ranking.","title":"Resume"}]