The Real Twitter Files: The Algorithm
Plus Product Career Hacks
Welcome to the more than 2,000 smart new people joining us this week.
In Today’s Newsletter:
Tech Corner: Unpacking Twitter’s Algorithm
AI Corner: The Pause Makes No Sense
Product Corner: PM Career Hacks
Paid Corner: A Real Product Requirements Document (PRD) Example
Unpacking Twitter’s Algorithm
The big news in tech this week was Twitter open-sourcing its algorithm.
It’s an important event to analyze because algorithms rule our lives. Everything from Google and YouTube, to Instagram and Facebook, to LinkedIn and TikTok, has an algorithm underneath it.
Twitter is the first of these companies to upload its algorithm to GitHub. So for any serious follower of the tech industry, it’s worth peeking under the hood.
Heads Up: If you just want the “so what does this mean for me to grow on Twitter?” and you haven’t seen my viral Twitter thread, you may want to go there first. This piece goes a layer deeper, giving you the overall framework of how all that information comes together.
How does the algorithm work?
Surprisingly, the algorithm is not as “AI driven” as you might imagine. There are three basic steps to the algorithm.
Step 1: Data aggregation
Step 2: Feature formation
Step 3: Mixing
Here’s what that looks like in a diagram:
Let’s break down each step.
Step 1: Data Aggregation
The data aggregation phase is relatively simple. The algorithm collects data about three key areas: your followers, your tweet, and you.
The data about your followers is pretty simple. It looks at who follows you.
The data about your tweet is a little more complex. It’s a linear ranking parameter that weighs several inputs about the tweet. Here is what this looks like. Conveniently, the Scala code is pretty readable, even for non-programmers:
These linear ranking parameters show that the algorithm’s biggest positive boost comes from “favCountParams.” Sadly, this isn’t defined in the GitHub repository. The current thinking is that favCountParams = Likes + Bookmarks. If your post earns likes or bookmarks, each gives it a 30x boost.
The next highest boost comes from Retweets, which give you a 20x boost. After that, you get a 2x boost for having a video or image. Surprisingly, a reply only gets you a 1x boost.
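To make the weighting concrete, here is a minimal Python sketch of a linear tweet score using only the boost values quoted above. The function name, argument names, and the way the media boost is applied (as a multiplier on the weighted sum) are assumptions made for illustration, not Twitter's actual code.

```python
# Weights as quoted in the text above. Everything else here is a
# hypothetical illustration, not the code in Twitter's repository.
FAV_WEIGHT = 30.0      # likes + bookmarks ("favCountParams", per current thinking)
RETWEET_WEIGHT = 20.0  # retweets
REPLY_WEIGHT = 1.0     # replies
MEDIA_BOOST = 2.0      # tweet has an image or video


def linear_tweet_score(favs: int, retweets: int, replies: int, has_media: bool) -> float:
    """Weighted sum of engagement counts, doubled when media is attached.

    Assumption: the media boost multiplies the weighted sum. The text only
    says media earns a "2x boost" and does not say how it is combined.
    """
    score = (
        FAV_WEIGHT * favs
        + RETWEET_WEIGHT * retweets
        + REPLY_WEIGHT * replies
    )
    if has_media:
        score *= MEDIA_BOOST
    return score


# Ten likes outweigh two hundred replies under these weights.
print(linear_tweet_score(favs=10, retweets=0, replies=0, has_media=False))   # 300.0
print(linear_tweet_score(favs=0, retweets=0, replies=200, has_media=False))  # 200.0
```

The takeaway matches the prose: under these weights, a handful of likes or retweets moves the score far more than a long reply chain does.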
What’s not shown in the above code is that there are also “deboosts” for your tweet. These are negative modifiers that lower your linear ranking parameters. The algorithm doesn’t like it if you post links or write in an unrecognized language.
The final data point is about you. This is data about how many times you have been blocked, muted, reported for abuse or spam, or unfollowed over a rolling 50-day period. Here it is in the code:
Step 2: Feature Formation
After data aggregation comes feature formation. The algorithm turns all that data into four key feature buckets. Let’s go through each one by one.
SimClusters is the cluster where your tweet belongs. Twitter algorithmically groups tweets and people together into clusters. This is probably the most “AI” aspect of the algorithm as it exists today. Here’s what that looks like for users:
TwHIN is an algorithmic representation that estimates a user’s probability of doing the target downstream tasks. It creates a vectorized representation for a tweet for a specific user representing the probability the user will read that tweet, like it, retweet it, and reply to it.
RealGraph takes information about the tweet, the tweet writer, and the potential tweet receiver to create a weighted graph to estimate the probability of any interaction.
Trust & Safety reads content to see if it violates Twitter’s rules. This is all about removing things like child pornography, hate speech, and misinformation. Surprisingly, it also dings information about “the Ukraine Crisis”:
Under Elon Musk, the algorithm has gotten much more aggressive about censorship. As Yoel Roth, former head of Trust & Safety, said:
Mr. Musk empowered my team to move more aggressively to remove hate speech across the platform — censoring more content, not less.
Step 3: Mixing
After feature formation comes the final phase: Mixing. The algorithm groups all of the features into three candidate sources: In Network (RealGraph and Trust & Safety), Embedding Space (SimClusters and TwHIN), and Social Graph (Follower Graph, Engagements). These feed into the Heavy Ranker, which is where all of the action happens.
Here is what it looks like:
The Heavy Ranker takes predictions of whether a user will read the tweet and stay for two minutes. This explains why threads and long tweets do so well on the platform. It also takes predictions of whether the user will reply, and whether the author will reply to the reply. That’s actually the highest final weighting! It gets you a 75x boost.
Basically, what the Heavy Ranker is trying to do is rank every single tweet that could be shown to you by the probability that it will lead to positive actions for Twitter.
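One way to picture this ranking step is as a weighted sum of predicted probabilities, sorted from highest to lowest. The Python sketch below assumes that shape. Only the 75x weight comes from the text above; the other weights, the prediction names, and the example probabilities are hypothetical placeholders.

```python
# Only the 75.0 weight is taken from the text. The other weights and all
# key names are hypothetical placeholders for illustration.
WEIGHTS = {
    "reply_engaged_by_author": 75.0,  # user replies and the author replies back
    "reply": 1.0,                     # placeholder weight
    "dwell_two_minutes": 1.0,         # placeholder weight
}


def heavy_ranker_score(predictions: dict, weights: dict = WEIGHTS) -> float:
    """Weighted sum of predicted engagement probabilities for one tweet."""
    return sum(w * predictions.get(name, 0.0) for name, w in weights.items())


def rank_candidates(candidates: list, weights: dict = WEIGHTS) -> list:
    """Sort candidate tweets from highest to lowest score."""
    return sorted(
        candidates,
        key=lambda c: heavy_ranker_score(c["predictions"], weights),
        reverse=True,
    )


candidates = [
    {"id": "long_thread", "predictions": {"dwell_two_minutes": 0.6, "reply": 0.05}},
    {"id": "conversation", "predictions": {"reply_engaged_by_author": 0.02, "reply": 0.1}},
]
for tweet in rank_candidates(candidates):
    print(tweet["id"], round(heavy_ranker_score(tweet["predictions"]), 2))
```

Even with a small predicted probability, the conversation-style tweet ranks first here, because the author-reply weight dominates the sum.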
After the Heavy Ranker scores are calculated, several heuristics and filters are applied. The algorithm tries to create a curated feed by balancing content and diversifying authors. It doesn’t want you to just stay in the echo chamber of your cluster. It also uses social proof (highly viral things) and feedback fatigue (giving you a variety of outcomes, not just tweets you will give feedback on) to make tweaks.
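The text does not say how author diversity is enforced, so the Python sketch below assumes the simplest possible rule: a cap on how many tweets from one author survive in the ranked list. The function name, the cap value, and the tweet fields are all hypothetical.

```python
from collections import Counter


def apply_author_diversity(ranked_tweets: list, max_per_author: int = 2) -> list:
    """Keep ranking order, but drop tweets once an author hits the cap.

    Assumption: a hard per-author cap. This is an illustrative stand-in,
    not the heuristic Twitter actually uses.
    """
    seen = Counter()
    kept = []
    for tweet in ranked_tweets:
        author = tweet["author"]
        if seen[author] < max_per_author:
            kept.append(tweet)
            seen[author] += 1
    return kept


ranked = [
    {"id": 1, "author": "alice"},
    {"id": 2, "author": "alice"},
    {"id": 3, "author": "alice"},
    {"id": 4, "author": "bob"},
]
print([t["id"] for t in apply_author_diversity(ranked)])  # [1, 2, 4]
```

The filter runs after scoring, so it only removes tweets from the ranked list and never reorders the ones that remain.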
The final step in the construction of your “For You” feed is mixing together the tweets that have been ranked by all the above steps with ads and who the algorithm thinks you should follow. This results in what you see.
Unfortunately, not all of the algorithm is posted on GitHub. A lot of the files online are READMEs summarizing the actual code. One former Twitter engineer I talked to said “about 80%” of the actual code in production is not on GitHub.
Nevertheless, given the importance of algorithms in our lives, we can and must try to read the tea leaves. I hope today’s post helped.
The “Pause” Makes No Sense
The other big news in tech this week was more than 2,800 tech luminaries publishing an open letter asking that we pause development of AI algorithms.
It’s just not feasible
Co-signed by everyone from Elon Musk to Yuval Harari, the letter identifies valid problems but suggests a completely infeasible solution: “We call on all AI labs to immediately pause for at least 6 months the training of AI systems more powerful than GPT-4.”
It’s simply not possible to stop all the AI labs in the world. Company boards have a fiduciary obligation to their shareholders. Some are going to decide against the moratorium. And if even one does, all the rest fall behind.
This results in a game-theoretic outcome where some will continue development, if only for fear that others might. It might all happen quietly, but it will happen nonetheless.
This is especially certain if you consider the case of state actors. Is there any realistic chance Russia or North Korea would consider pausing development in this space? They too can see its power.
It will increase misinformation
“Should we let machines flood our information channels with propaganda and untruth?”
The concern of those in the letter is supposedly misinformation and propaganda. However, state actors are the largest perpetrators of such propaganda.
Pausing development in certain countries really just kneecaps them while less-aspirational actors inevitably continue development. If anything, the proposed pause just puts us behind in the fight against disinformation.
It kneecaps all the progress we’re making in generative AI
Generative AI is already presenting immense applications for programmers, writers, accountants, and hundreds of other fields. As I’ve covered in prior issues of this newsletter, it’s an internet- or mobile-scale revolution in technology.
Pausing progress kneecaps all that positive momentum. Given the very real positive productivity improvements we’re seeing from generative AI, we cannot afford to do that.
Summary: The goal of responsible AI is true and noble. But this proposed solution is not the way.
Product Career “Hacks”
The best career “hacks” I’ve learned in 16 years of product management:
1. Ask for feedback more often than feels comfortable: The fastest way to grow is to have a short action-learn-grow cycle. Asking for feedback more often shortens the cycle. View the job as a set of skills. A 5% gain 12 times a year compounds to roughly 1.8x in a skill over the year.
2. Ask for feedback really well: Generically asking, “do you have feedback for me?” yields few quality improvement areas. Instead, retro projects with the team. Intuit where product could have improved. In 1:1s ask specifics, like, “how can we make planning better next half?”
3. Prune your feedback inputs over time: 10 sessions with 1 person who gives great feedback are far more valuable than 10 sessions with 10 people who give okay feedback. And 90% of people only give okay feedback, because they want to appear friendly and accepting.
4. Have really clear goals: To understand and filter feedback appropriately, you need to be working towards something specific. Write your goals down and revisit progress regularly. If you do not have clear goals, spend time clarifying them.
5. Regularly discuss your goals with your manager: Most people could afford to manage their careers more proactively, especially with their managers. Leverage them as an ally, not a critic. It’s cognitive bias: managers promote the people they are working hardest to get promoted.
6. Practice working backwards with your manager: Ask, “what skills and track record do I lack to make the next level?” Get specific. Then track progress to make sure you are closing the gap. If progress is stalled, leverage your manager to unblock you.
7. Ship regular upgrades of yourself: Between your feedback and your goals, you should have plenty of fodder to identify areas to improve. Like the iPhone, make those incremental upgrades every year. Every few years, make a really big change, e.g., going from IC to manager.
8. Leverage alternative resources to grow in unique ways: There’s product Twitter, LinkedIn, Reddit, meetups, colleagues, friends from college… Pick which work for you. Then regularly approach them with a growth mindset. Progress stops once you think you’ve mastered something.
9. Be your own biggest fan: No one is paying as much attention to how you spend your time as you. You have to study yourself like a sports fan. You also have to pick yourself up and be your biggest advocate. No one can replace you in these tasks.
10. Be your own coach, critic, and parent: Choose the right frame at the right time. Coach: getting the best out of yourself. Critic: identifying where you need to grow. Parent: creating the conditions for success.
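The compounding claim in hack 1 (5% gains, 12 times a year) is easy to check. This short Python snippet does the arithmetic; the function name is just an illustrative choice.

```python
def compounded_growth(gain_per_cycle: float, cycles: int) -> float:
    """Multiplier after repeating the same proportional gain `cycles` times."""
    return (1 + gain_per_cycle) ** cycles


yearly_multiplier = compounded_growth(0.05, 12)
print(f"{yearly_multiplier:.2f}x")  # 1.80x
```

Because the gains compound rather than add, twelve 5% improvements yield about 1.8x, not the 1.6x you would get from simply adding them.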
Don’t just let your career “happen” to you. Manage it.
Real PRD Example from Aakash
Perhaps the most fundamental skill in product development is writing great product requirements documents (PRDs). While there are innumerable templates and guides to writing a PRD online, the problem is that the best PRDs don’t follow a template. They’re built organically to meet the needs of that specific feature.
For paid subscribers, I’ve put together a real PRD example. This will give you a flavor for a PRD that I as a product leader would give a 10/10. It:
Gets to the lowest level of detail and presents a clear phasing plan
Justifies the impact and purpose of the feature
Clearly outlines the hypothesis and success criteria
So, if you want to get a real example of a PRD, here you go.