For now, here is a “guide + reading list” for Sequence-of-Events (SoE) user embeddings in ad-tech, with enough context that you can use it as a starting spec inside your team.
1. What “user embeddings from sequences of events” really means
Traditional ad-tech user features: hand-crafted aggregates over the event log (counts, CTRs, recency buckets, category affinities) plus profile attributes.
Sequence-of-Events (SoE) approach:
- For each user you keep a time-ordered list of events: impressions, clicks, conversions, searches, page views, app actions…
- Each event carries IDs (ad, item, campaign, publisher), context (device, geo, placement), time, etc.
- You feed this sequence into a sequence model (RNN / Transformer).
- The model outputs one or more user vectors = user embeddings.
- These embeddings are then used for:
  - Candidate retrieval (ANN, similarity graph).
  - Ranking (CTR/CVR models).
  - Lookalikes / clustering / personalization in other products.
In other words: instead of hand-crafting summaries of the log, you let a sequence model learn how to compress the log into a vector.
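For intuition, here is a minimal PyTorch sketch of the idea. Names like `SoEUserEncoder` are illustrative, not from any of the papers below, and each event is reduced to a single ID; a production model would embed many ID and context fields per event.

```python
import torch
import torch.nn as nn

class SoEUserEncoder(nn.Module):
    """Toy SoE encoder: one ID per event; real systems embed many fields."""
    def __init__(self, n_events: int, dim: int = 64, n_layers: int = 2):
        super().__init__()
        self.event_emb = nn.Embedding(n_events, dim, padding_idx=0)
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=4, dim_feedforward=4 * dim, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, event_ids: torch.Tensor) -> torch.Tensor:
        # event_ids: (batch, seq_len); 0 marks padding.
        pad = event_ids == 0
        h = self.encoder(self.event_emb(event_ids), src_key_padding_mask=pad)
        # Mean-pool the non-padded positions into a single user embedding.
        valid = (~pad).unsqueeze(-1).float()
        return (h * valid).sum(1) / valid.sum(1).clamp(min=1.0)

user_vecs = SoEUserEncoder(n_events=10_000)(torch.randint(1, 10_000, (8, 120)))
print(user_vecs.shape)  # torch.Size([8, 64])
```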
2. Does it scale to hundreds of millions of users?
Short answer: yes. Several companies are doing almost exactly what you describe at “hundreds of millions – billions of users per day” scale. The trick is how you architect it.
2.1 Meta: ALURE – async user embeddings for ads
Paper: Async Learned User Embeddings for Ads Delivery Optimization (ALURE).(arXiv)
What they do:
- Learn user embeddings from sequence-based, multimodal user activities using a Transformer-like model.(arXiv)
- Do this asynchronously for billions of users per day.
- Build a user similarity graph from these embeddings and use it to retrieve ad candidates, combined with realtime signals in the main ads system.(arXiv)
Why this matters for you:
- This is almost exactly “user embeddings based on SoE events for ads,” proven at Meta scale.
- They explicitly decouple heavy sequence modeling into an offline/nearline pipeline; serving only uses precomputed embeddings + realtime features.
2.2 Alibaba / Taobao: long sequential user behavior for CTR
Core paper: Practice on Long Sequential User Behavior Modeling for CTR Prediction (MIMN + UIC).(arXiv)
What they say:
- Long user behavior sequences carry strong signal for CTR, but running a sequence model over them inside the ad request is too slow and memory-hungry.
- They introduce MIMN (a memory-network encoder) plus UIC (User Interest Center), a separate service that incrementally updates each user’s interest representation as new events arrive, decoupled from the serving path.
Why this matters:
- UIC is effectively a user embedding service built from sequences.
- It’s deployed in Alibaba’s display ads system and handles sequences up to thousands of events per user.(arXiv)
Follow-up work:
- SIM (Search-based User Interest Modeling) and ETA-Net improve how they search and attend over lifelong user histories (tens of thousands of behaviors) while staying within latency budgets.(arXiv)
Takeaway: they solved scaling issues by:
- Separating interest modeling from CTR serving (UIC).
- Using retrieval + efficient attention for very long histories rather than running a giant Transformer over everything at serve time.
2.3 Pinterest: TransAct – realtime + batch user sequence modeling
Paper: TransAct: Transformer-based Realtime User Action Model for Recommendation at Pinterest.(arXiv)
Key design:
- A realtime Transformer (TransAct) that encodes recent user actions.
- Combined with batch-generated user embeddings that summarize long-term preferences.(arXiv)
- Deployed to multiple large surfaces (Homefeed, Related Pins, Notifications, Search).(arXiv)
- Public PyTorch repo: pinterest/transformer_user_action.(GitHub)
Why it matters:
This is very close to what you’d do if you plug SoE user embeddings into an existing ad ranking stack.
2.4 Kuaishou: TWIN – lifelong user sequences
Paper: TWIN: TWo-stage Interest Network for Lifelong User Behavior Modeling in CTR Prediction at Kuaishou (KDD 2023).(arXiv)
Highlights:
- Targets lifelong histories (behaviors over months/years; sequences of length 10^4–10^5).(arXiv)
- Uses two stages:
  - A General Search Unit (GSU) over the long history.
  - An Exact Search Unit (ESU) with attention over a small relevant subset.
- Ensures consistent relevance metrics between GSU and ESU so the retrieval stage doesn’t filter out behaviors the attention stage cares about.(arXiv)
Takeaway: if you eventually want very long histories, you probably need a two-stage design like TWIN/SIM, not a single giant sequence model.
2.5 Tencent: AETN – general-purpose user embeddings from app usage
Paper: General-Purpose User Embeddings based on Mobile App Usage (Tencent, KDD 2020).(arXiv)
They:
- Model sequences of app events: install, uninstall, retention, etc. (heterogeneous events).(arXiv)
- Use an AutoEncoder-coupled Transformer Network (AETN) to learn general-purpose user embeddings.(arXiv)
- Deploy these embeddings in multiple downstream applications (ads, recommendations, etc.) at Tencent scale.(arXiv)
Takeaway: SoE embeddings can be shared across many tasks and teams, not just one CTR model.
2.6 Survey confirmation
Survey: “A Survey on User Behavior Modeling in Recommender Systems” (IJCAI 2023).(arXiv)
- Defines categories like Long-Sequence UBM and User-Behavior Retrieval-based methods.
- Explicitly discusses industrial systems like MIMN/UIC, SIM, etc. as examples of long-sequence user modeling at scale.(IJCAI)
This gives you a good overview of where SoE user embeddings fit in the broader recommender literature.
3. What all these systems have in common
If you strip away the details, the successful large-scale systems share a few core ideas.
3.1 Heavy sequence modeling is not in the hot path
Instead of recomputing a big Transformer for every ad request:
- Meta ALURE:
  - Runs a Transformer-like model offline/nearline on user histories.
  - Produces embeddings asynchronously for billions of users per day.
  - Those embeddings are used later in retrieval + ranking.(arXiv)
- Alibaba MIMN/UIC:
  - UIC is a separate module that stores user interest vectors produced from long sequences.
  - The main CTR model queries UIC; it doesn’t redo long-sequence modeling at request time.(arXiv)
Pattern:
Build a user embedding service that is updated offline/nearline, then reuse its outputs everywhere.
For you: this is how you get SoE richness without blowing up latency.
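A sketch of that pattern, with a plain dict standing in for a real low-latency KV store (Redis, an internal feature store, etc.) and an encoder like the one sketched in §1:

```python
import torch

embedding_store: dict[str, torch.Tensor] = {}  # stand-in for a real KV store

def refresh_embeddings(encoder, histories: dict[str, torch.Tensor]) -> None:
    """Offline/nearline job: recompute embeddings for recently active users."""
    encoder.eval()
    with torch.no_grad():
        for user_id, event_ids in histories.items():
            embedding_store[user_id] = encoder(event_ids.unsqueeze(0)).squeeze(0)

def get_user_embedding(user_id: str, dim: int = 64) -> torch.Tensor:
    """Hot path: a pure lookup; no sequence model runs per ad request."""
    return embedding_store.get(user_id, torch.zeros(dim))  # cold-start fallback
```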
3.2 Long histories are managed with windows or two-stage retrieval
Naïve idea: “just feed all 10,000 events into a big Transformer.”
Reality: too slow and too costly at ad serving QPS.
What people actually do:
- Use a recent window (e.g., last 100–300 events) for the main encoder.
- For lifelong histories, use two-stage methods:
  - SIM and TWIN: fast search over the full history, then attention over a small subset.(arXiv)
This keeps complexity manageable while still benefiting from long-term behavior.
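The shape of a SIM/TWIN-style two-stage computation, heavily simplified: a cheap dot-product search (GSU-like stage) prunes the lifelong history to top-k events, and attention (ESU-like stage) runs only over the survivors. This illustrates the pattern, not either paper’s exact method.

```python
import torch
import torch.nn.functional as F

def two_stage_interest(history: torch.Tensor,  # (L, d) lifelong event embeddings
                       target: torch.Tensor,   # (d,)  target ad/item embedding
                       k: int = 64) -> torch.Tensor:
    # Stage 1 (GSU-like): cheap relevance score per event, keep the top-k.
    scores = history @ target                                   # (L,)
    top = history[scores.topk(min(k, history.size(0))).indices]  # (k, d)
    # Stage 2 (ESU-like): softmax attention of the target over the subset.
    attn = F.softmax(top @ target, dim=0)                       # (k,)
    return attn @ top                                           # (d,) interest vector

vec = two_stage_interest(torch.randn(50_000, 64), torch.randn(64))
```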
3.3 Hybrid offline (long-term) + realtime (short-term)
TransAct is the cleanest example:
- Batch user embeddings (long-term) + realtime Transformer features (short-term).(arXiv)
ALURE also combines async user embeddings with realtime user activity when retrieving ads.(arXiv)
Pattern:
- Offline part: SoE encoder over a long window, updated every X minutes/hours.
- Realtime part: a small model or features over the last few events in the current session.
For your scale, I would assume this hybrid structure from the beginning.
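A sketch of how the two parts might meet in the ranker; the module name and feature shapes are assumptions, not TransAct’s actual architecture:

```python
import torch
import torch.nn as nn

class HybridRanker(nn.Module):
    """pCTR head over: batch user embedding + realtime session summary + ad."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.session_gru = nn.GRU(dim, dim, batch_first=True)  # realtime part
        self.head = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, 1)
        )

    def forward(self, batch_user_emb, session_events, ad_emb):
        # batch_user_emb: (B, d) from the embedding service (offline part);
        # session_events: (B, s, d) last few events; ad_emb: (B, d) candidate.
        _, h = self.session_gru(session_events)            # h: (1, B, d)
        x = torch.cat([batch_user_emb, h.squeeze(0), ad_emb], dim=-1)
        return torch.sigmoid(self.head(x)).squeeze(-1)     # (B,) pCTR
```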
3.4 Specialized infra for big embedding tables and sequences
Two libraries you’ll see referenced:
- TorchRec (Meta): sharded embedding tables and distributed training for large-scale recsys.
- Transformers4Rec (NVIDIA Merlin): sequential and session-based recommendation with Transformers.
At hundreds of millions of users, the bottleneck is often the embedding infrastructure, not the sequence model itself. Using one of these stacks (or building something similar) is strongly recommended.
4. A practical blueprint for your ad-tech use case
Below is a simplified but realistic step-by-step plan.
4.1 Step 1 – Start small and narrow
Pick:
- 1–2 high-impact ad surfaces (e.g., feed ads on web + app).
- 1 main objective (CTR or CVR).
Build a dataset:
- For each user, the time-ordered last N events (e.g., 100–300), each with its IDs, context fields, and timestamp.
- Labels (click/no-click per impression) computed only from events strictly before the impression, to avoid leakage.
This gives you a clean SoE representation to experiment with; a small construction sketch follows.
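A small pandas sketch of the construction, assuming hypothetical log columns (user_id, ts, event_type, item_id); adapt to your actual schema:

```python
import pandas as pd

log = pd.DataFrame({
    "user_id":    ["u1", "u2", "u1", "u1"],
    "ts":         [3, 1, 1, 2],
    "event_type": ["click", "imp", "imp", "imp"],
    "item_id":    [7, 9, 7, 8],
})

sequences = (
    log.sort_values(["user_id", "ts"])
       .groupby("user_id").tail(300)      # recent-window truncation per user
       .groupby("user_id")[["ts", "event_type", "item_id"]]
       .apply(lambda g: g.to_dict("records"))
)
print(sequences["u1"])  # time-ordered recent events for user u1
```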
4.2 Step 2 – Train a modest SoE encoder
Choose a simple model first:
- A GRU, or a small (2–4 layer) Transformer encoder, over the last 100–300 events.
Train it to:
- Predict click/no-click (or next event) given the history up to time t.
From this model, define the user embedding as:
- The final hidden state, or
- An attention-pooled summary over the sequence.
Run offline comparisons:
- Old model (aggregated features) vs new model (aggregated + user embedding).
- Look at AUC / log-loss / NDCG improvements.
Goal: prove value offline and debug data/leakage issues.
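A sketch of that objective, reusing the SoEUserEncoder from the §1 sketch; where the ad embeddings come from (a small ad tower, a pretrained table) is an assumption left open here:

```python
import torch
import torch.nn.functional as F

encoder = SoEUserEncoder(n_events=10_000)          # from the §1 sketch
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

def train_step(event_ids, ad_emb, clicked):
    # event_ids: (B, L) history up to time t; ad_emb: (B, d); clicked: (B,)
    user_emb = encoder(event_ids)                  # (B, d)
    logits = (user_emb * ad_emb).sum(-1)           # dot-product score per pair
    loss = F.binary_cross_entropy_with_logits(logits, clicked.float())
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```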
4.3 Step 3 – Turn it into a user embedding service
Once you have a good encoder:
- Run it in an offline/nearline pipeline that recomputes embeddings for recently active users every X minutes/hours (the §3.1 pattern).
- Write the results to a low-latency key-value store keyed by user ID.
Then update your online stack:
- On each ad request:
  - Look up the user embedding.
  - Feed it, together with existing features, into your CTR/CVR ranker.
- Optionally:
  - Start using it as a query vector in an ANN index to retrieve candidate ads/items (see the faiss sketch below).
At this point, you have a real SoE-based user representation in production.
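If you go the retrieval route, here is a sketch with faiss; the candidate ad embeddings are random placeholders, and at scale you would swap the exact IndexFlatIP for an IVF/HNSW index.

```python
import faiss
import numpy as np

dim = 64
ad_embs = np.random.rand(100_000, dim).astype("float32")  # candidate vectors
faiss.normalize_L2(ad_embs)          # unit norm -> inner product = cosine

index = faiss.IndexFlatIP(dim)       # exact search; fine for a first test
index.add(ad_embs)

user = np.random.rand(1, dim).astype("float32")  # looked-up user embedding
faiss.normalize_L2(user)
scores, ids = index.search(user, 50)  # top-50 candidate ads for this user
```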
4.4 Step 4 – Add a small realtime head
When the basics are stable:
- Add a small realtime component (e.g., a lightweight GRU or attention layer) over the last few events in the current session.
- Feed its output into the ranker alongside the batch user embedding, as in the §3.3 sketch.
This is effectively a simplified TransAct-style hybrid design.(arXiv)
4.5 Step 5 – Only then think about full lifelong histories
Once:
- The SoE user embedding service works,
- The hybrid ranker is stable and shows lift,
you can consider:
- Extending sequence length,
- Introducing two-stage retrieval (SIM/TWIN style) for lifelong histories.
This is where TWIN and SIM/ETA become relevant.(arXiv)
I would not start there for a first deployment.
5. Curated reading list and repos (short, opinionated)
If you want a “minimum set” of things to read and show colleagues:
5.1 Directly relevant industrial papers
- Meta – ALURE: async user embeddings from sequence-based activities for billions of users per day, powering a user similarity graph for ad retrieval.(arXiv)
- Alibaba – MIMN + UIC (“Practice on Long Sequential User Behavior Modeling for CTR Prediction”): introduces MIMN and UIC, a separate interest-center service that handles long sequences efficiently.(arXiv)
- Pinterest – TransAct (“Transformer-based Realtime User Action Model for Recommendation”): hybrid of batch user embeddings + a realtime Transformer, deployed to Homefeed and other surfaces.(arXiv)
- Kuaishou – TWIN (“TWo-stage Interest Network for Lifelong User Behavior Modeling in CTR Prediction”): two-stage interest retrieval + attention for very long histories.(arXiv)
- Tencent – AETN (“General-Purpose User Embeddings based on Mobile App Usage”): AutoEncoder + Transformer over app-usage sequences, used across multiple downstream tasks.(arXiv)
- Survey – He et al. 2023 (“A Survey on User Behavior Modeling in Recommender Systems”): overview of conventional vs. long-sequence vs. retrieval-based user behavior models, including industrial systems.(arXiv)
5.2 Libraries and repos to look at
- Transformers4Rec (NVIDIA Merlin) – GitHub: library for sequential & session-based recommendation with Transformers, integrated with NVTabular and Triton for full pipelines.(arXiv)
- TorchRec (Meta) – GitHub + docs: PyTorch library for large-scale recsys with sharded embedding tables and distributed training.(GitHub)
- pinterest/transformer_user_action – TransAct code: an example of a production-style Transformer user action model.(GitHub)
These give you concrete templates for how to structure and run SoE models.
6. Very short summary
- Yes, SoE user embeddings do scale: Meta (ALURE), Alibaba (MIMN/UIC, SIM, ETA), Pinterest (TransAct), Kuaishou (TWIN), Tencent (AETN) all run SoE-style user modeling at “hundreds of millions / billions of users per day” scale.(arXiv)
- The key patterns they share:
  - Heavy sequence modeling is async/offline in a user embedding service.
  - Long histories are handled with windows or two-stage retrieval + attention, not one huge model.
  - They use hybrid offline (long-term) + realtime (short-term) representations.
  - They rely on specialized infra like TorchRec and Transformers4Rec for big embedding tables and sequence modeling.(arXiv)
- A sensible path for you:
  - Start with one surface + last 100–200 events + a small Transformer/GRU.
  - Turn it into a user embedding service and plug it into existing CTR/ranking.
  - Add a small realtime head.
  - Only then explore full lifelong histories and two-stage architectures.