Why I'm Teaching TikTokers About Existential Risk
Notes from an epistemically scrupulous fellow
I am running a fellowship with Aella. It’s called plzdontkillus (consider applying[1]). The idea spawned, as I understand it, because Aella was doing some consulting work for a more typical AI x-risk communication org, and observing some of their taste and presentation as they prepared for some project (I don’t remember if it was a talk or a launch party or something). She had a feeling which is a classic signal that you have an advantage at X and should start doing X: the feeling that everyone else who is trying to do X is being retarded. So, she decided to launch a fellowship.
I have always been particularly scrupulous about public communication about AI x-risk. There being a good chance that AI will cause the end of everything I care about (not to mention the astronomical opportunity cost) has been a foundational piece of my worldview for over a decade, since about 2011. Another even more ancient foundational part of my worldview is that the point of your ethics is to provide guardrails for errors that you cannot see as errors ahead of time, and that in fact, are particularly likely to seem like very good ideas if you do not have any ethical guardrails. I feel this way in particular about reasoning and public argument.
An example of a cognitive desideratum which constitutes this part of my ethics is that I want to reason and act according to principles such that I would endorse people who were (or are) on the wrong side of history—say Nazis, Bolsheviks, or American slave owners—reasoning or acting in accordance with those principles. So for instance, if I am choosing between two principles, either 1) ignore everybody who disagrees with us, or 2) engage in good-faith conversation with good-faith interlocutors who disagree with us, even when I am very confident in my views, this meta principle will upweight the second of these.
As such, it is not much of a surprise that I think we (ie, people who are concerned with AI x-risk and want to raise public awareness about it), who are essentially promoters of a kind of mass moral panic,[2] have a special obligation to be unusually reasonable in our public communications. Moral panic promoters have a pretty bad track record.[3] On top of that, we are somewhat of a youth movement, some of us (including myself) with quite lofty ambitions. Youth movements with lofty ambitions also have a pretty terrible track record. And so, I take it upon myself to hold myself and those who share my ideology to extremely high standards, in part because I hope that the hypothetical Brangus-like beings in alternative worlds where they end up on the wrong side of history also hold themselves to such standards.
So, if these are the pinnacles of my worldview, it might seem surprising that I am working on plzdontkillus. It is a short-form video fellowship with the explicit goal of causing there to be more communication about AI x-risk that reaches vastly more people. We are on purpose trying to recruit outside of the standard distribution of people who publicly communicate about AI x-risk (eg, people who write phrases like “the standard distribution”) and trying to get younger, more creative, more female, more humanities, more viral-potential people to make less boring content about AI x-risk that goes viral.
There are a lot of reasons that this means that the communication that comes out of plzdontkillus will be produced with lower epistemic standards on average than the communication that comes from the more in-group x-risk communication orgs. For one, my in-group just does actually have extremely unusually high epistemic scrupulosity. It was mostly founded as a fan club for a several thousand page tome on how to become epistemically virtuous. For two, trying to reach more people means that you have to lower the epistemic standards you place on your communication. You get about five words, as they say.
So yes, people for instance prefer listening to arguments made by attractive people. This is mostly[4] absurd, but attention is scarce these days, and if you want to get attention, you are going to have to make people want to click somehow. The fact that they are more likely to buy an argument from an attractive person is irredeemably dumb (though unsurprising, we are monkeys after all) but the fact that they are more motivated to watch an attractive person make an argument is at least less dumb, and plausibly perfectly reasonable. I think a lot of similar things decompose this way. Insofar as you are using features of your presentation to make your arguments more convincing, that seems sad, and I think we should avoid it, but insofar as the features of your presentation mostly cause people to engage with your argument at all, that seems mostly fine to me.
And so, is this it? Have I finally decided to betray my principles on the altar of expected utility maximization? I also wondered this, particularly over the weekend of April 10 when at least one and possibly two acts of political violence were committed against Sam Altman and his family. Let me be absolutely clear that I fucking hate this. I condemn it on moral grounds for a thousand reasons. (You can read here for an account of why one should also condemn it on practical grounds.) The first of these instances of violence appears to have been committed by someone who was at least a partial host to a meme-plex quite similar to the one I aim to promote. This happened a few days before we were going to launch plzdontkillus, and so gave me serious pause. I was already hesitant for reasons similar to what I have laid out above, and this nearly put me over the edge. I told Aella that I would have to go off and conduct a crisis of faith, after which I might decide not to work on the project and try to convince her not to work on it either. I did take eight hours of time to consider whether this was actually a good idea, and decided that it was worth running a first program and seeing how things go. I am writing this in large part so that I can communicate what I thought about over those eight hours, what considerations seemed salient, how I landed where I did, etc.
An unusual thing that stands out to me as having a large effect on my reasoning was looking at our website. I mean… look at it:
When I end up in a scrupulosity knot over these sorts of considerations, looking at the website consistently puts me at ease. In particular, the infamous cow question:
Now, why in the world would looking at this website… of all the websites you could imagine for a project like this… put me at ease? And why the fuck would including the cow question as a mandatory part of the application make me feel better about the program? (The effect of the cow question was probably overall larger.)
Well, you see, I am an edge-lord. I am a principled edge-lord in some respects, perhaps even a scrupulous edge-lord in some respects, but an edge-lord for sure. So is Aella. There are many ways in which we have some cognitive similarities which are really quite rare and which I share with approximately no one who I have ever met. We are also deeply different in several ways, but a particular way that we are similar is that, in some way that is a bit hard to articulate, our edge-lord tendencies and our dispositions towards epistemic and general principled-ness derive from the same latent variable, or something like that. They mutually reinforce one another as well. My ability to, for instance, find a piece of food in a dumpster, evaluate it on smell, sight, a small taste, etc, and then conclude that it is fine to eat, and thereby eat it with no internal sense of shame or disgust or what have you, is closely related to my ability to form judgments about complicated matters and act on them wisely. I claim, so is Aella’s. I am a bit uncertain about the exact mechanism, but I think it’s some sort of willingness to act on your own judgment and cut through to the question at hand, seeking to d-separate your view from the views of experts, witnesses, and the prevailing wisdom. This cognitive strategy of course also runs into trouble on occasion, but there’s something good in it that I appreciate and want to grow in the world.
And when I saw that this was the website we were launching, I also saw that it unapologetically contained these load-bearing bits of ourselves in it. This is not a short-form video content bootcamp run by AI safety advocates; this is a short-form video content bootcamp run by Brangus and Aella, and I feel much better about that. Brangus and Aella have their flaws of course, but you cannot (wisely) fault them for being insufficiently principled in their public communication. Insofar as this is a program run by them, which it is, I expect the net effect on the public’s sanity to be positive, and I expect them to stop it if they conclude it is not, and I trust them to tell the difference about as much as I’d trust anyone. Relatedly, Aella and I are people who would almost certainly switch to advocating for accelerationism if we were ever convinced that civilization had become too cautious about ML, and that makes me feel much better about us mentoring people to communicate about this topic than many other possible alternatives.
(There’s a related way that I think we benefit from things like: including the cow question as a required part of the form, Aella being a public sex worker, me publicly horny posting and tweeting about my drug use, etc, which is roughly that I am quite sure we will never end up taking funding from Anthropic, and we’ll be able to signal quite strongly that we are generally not up for sale, but that’s an argument for a different blogpost.)
A different thought that came up for me a lot during my 8 hour crisis of faith was what I have now dubbed the general anti galaxy brained principle. It was in part inspired by Autumn’s tweet here:
I think the version of this that I endorse is something like: when forming judgments about what to do, if you have a simple, non-galaxy-brained argument for implementing a particular strategy, and you also have a galaxy-brained argument for why you should not implement that strategy, the galaxy-brained argument has to be unusually strong to overcome the force of the simple argument. In other words: galaxy-brained-ness is a reason to discount the force of an argument. A galaxy-brained counterargument can beat out a galaxy-brained argument, but for a galaxy-brained counterargument to beat out a non-galaxy-brained argument, it has to be truly airtight, and the non-galaxy-brained argument has to be relatively weak.
The argument I have in favor of running plzdontkillus runs something like this: The world is racing to ASI in a dumb and wildly irresponsible way.[5] The risks involved are literally astronomical and the costs are laid upon literally every living being (as well as those that do not yet but might exist). All else equal, it is better for more people to know about this than for fewer people to know about this.
There are plenty of counterarguments to this that I kinda buy. For one, bringing down the average reasonableness of expressed arguments for something in exchange for increasing the total number of expressed arguments is often a bad move, because people who are initially skeptical will tend to judge a take by the average reasonableness of the arguments in its favor that they have encountered, not by the reasonableness of the most reasonable argument they can find or construct. This is rational, since attention is limited and everybody wants you to join their cause, but it nonetheless means that decreasing average reasonableness can be a mistake. This argument makes perfect sense, but also, galaxy-brained!
This is not a reason to stop people from informing the public that unaccountable private entities are gambling with all of their lives, as well as the entirety of everything that might happen in all of everywhere, forever! If that is in fact happening, as I in fact think it is, it sure does seem like something that people should know about. Stopping them from learning about it because I am worried that the person teaching them about it is insufficiently scope sensitive or something and might subtly make them worse off in some way is actually kind of petty.
In the end what actually convinced me is modeling the overall tradeoff explicitly. We can imagine that you can somehow measure the epistemic virtue of a piece of content, or the effects that content has in the heads of its consumers, in some kind of unit, and that you can measure the extent to which a consumer of that content has been reached, which is something like, how much they now contribute to the total political will for doing something about ML labs racing to AGI. This suggests a certain two dimensional space, and you can imagine a frontier on that space which represents the possible pieces of content one could make which are as simultaneously “reaching” and epistemically virtuous as possible.
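For a concrete picture of that frontier, here is a minimal Python sketch. It assumes reach sits on the horizontal axis and epistemic virtue on the vertical axis, so "top left" means very virtuous but barely seen and "bottom right" means widely seen but epistemically poor. The `Content` type, the `pareto_frontier` function, and every score below are made up for illustration; none of them are real measurements.

```python
from typing import NamedTuple


class Content(NamedTuple):
    name: str      # hypothetical label for a piece of content
    reach: float   # made-up score: how much it adds to political will
    virtue: float  # made-up score: epistemic virtue of the content


def pareto_frontier(items: list[Content]) -> list[Content]:
    """Return the items that no other item beats on both axes at once."""
    frontier = []
    for a in items:
        dominated = any(
            b.reach >= a.reach
            and b.virtue >= a.virtue
            and (b.reach > a.reach or b.virtue > a.virtue)
            for b in items
        )
        if not dominated:
            frontier.append(a)
    # Sort left to right along the reach axis.
    return sorted(frontier, key=lambda c: c.reach)


# Purely illustrative candidates with invented scores.
candidates = [
    Content("long technical essay", 1.0, 9.0),
    Content("careful explainer video", 5.0, 6.0),
    Content("sloppy explainer video", 4.0, 4.0),
    Content("viral short", 9.0, 3.0),
    Content("rage bait", 8.0, 1.0),
]

frontier = pareto_frontier(candidates)
for item in frontier:
    print(f"{item.name}: reach={item.reach}, virtue={item.virtue}")
```

Anything off the frontier (here, the sloppy explainer and the rage bait) can be improved on one axis without giving up anything on the other. The open question is only which point on the frontier to aim for.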
There is room for disagreement about where exactly on this frontier is best, but we should all be able to agree that it isn’t the top left or the bottom right. (There are other people producing content that is solidly somewhere in the middle, like Rob Miles and the excellent new YouTube channel AI in Context, and in fact both creators will do some mentoring at plzdontkillus. But their content still seems to me like it mostly reaches a similar demographic to LessWrong, or If Anyone Builds It, Everyone Dies: mostly male, STEM-ish, late 20s to early 30s, etc.)

A reasonable concern here runs something like: by selecting on qualities like charisma, potential to go viral, etc, we are thereby selecting on ability to cause people to feel things. Content goes viral when it causes people to feel things, particularly anger or moral outrage. By putting people into that feedback loop on purpose, we are pretty much guaranteeing that we end up producing more soldier mindset than scout mindset communicators, and that might well backfire. Why not a blogging fellowship instead?
And my response to that is that yes, we are trying to optimize for reach, and to some extent that does mean that we will end up with marginally more soldier mindset-like creators than we would otherwise, but it is not as if the optimal amount of selection for scout-mindset is one hundred percent of all of the bits of selection pressure you have available. We are selecting for scout mindset, and also selecting for reach, and there is a tradeoff there. We might get that tradeoff wrong. I am sure that we will in one direction or the other, but it is a tradeoff, and there is a genuine optimization problem here. The answer is very unlikely to turn out to be that we should put all of our selection on getting as close to the top left as possible. Disagreement about where on the frontier one should aim for can be reasonable, but that requires making arguments for a specific point, which is different from an argument for moving up and to the left in full generality.
So where should I draw the line? I think I draw two different kinds of lines in different places.
The first is on misleading content. To be clear, intentionally misleading your audience, even if you think it is for the best on consequentialist grounds, falls well below what I am willing to tolerate, but there are other kinds of cases. The first kind of case is where someone is accidentally misleading their audience because they themselves are misinformed on some easy to resolve matter of public fact. This is easy enough to deal with. Inform them of their mistake and they won’t make it again.
A related case is one where a public communicator is misleading the public, and somewhat negligently so. They have the ability to find and understand the information that would have led to them correcting themselves before they communicated, and it wouldn’t have been that costly to do so, but they have failed to do so nonetheless. In this case, my plan is to let them know that I thought their failure to seek the appropriate information was negligent, explain how they could have easily done so, and then wait and see if they keep doing it. I’m maybe up for explaining a few more times depending on the case, but if they keep doing it, I’ll eventually renounce them.
A more difficult kind of case is when someone is accidentally misleading their audience, but it’s on a more complicated matter. Maybe they have a particular view of how ML works that is somewhat misleading, but it’s complicated to explain exactly how it is misleading; maybe they have a take on what rights the public should have to regulate technologies in general which I deeply disagree with; maybe they are promoting ideological intolerance towards good faith interlocutors or Machiavellian rhetoric. These cases come in degrees, from cases where I am certain that they’re wrong, even though it’s not easy to prove it to them, to cases where I am not even sure what exactly is off, but I can somehow tell that people are becoming dumber as a result of consuming their content.
These are the hardest cases to deal with because even though I am up for talking with them and arguing and trying to find common ground and such, there’s a decent chance that the disagreement will be mostly intractable. If so, I probably want to form judgments about how to relate to such public communicators on a case by case basis, but I do think this is where most of the risk comes from, and there are cases here where I will overall switch to seeing someone’s continued public communication as net negative even though our disagreement is intractable and I cannot precisely articulate my criticisms. Nonetheless, it’s fairly difficult to make policy around this sort of thing, and I’d prefer to err in the more inclusive rather than less inclusive direction.
The other line I draw is at incitements to violence. I pretty much have zero tolerance here, professionally or personally.
These are my personal statements about how I will relate to public communicators. I do not speak for Aella or plzdontkillus when I say the above, although I am approximately half of that organization, and we will include clauses in our conduct policy that prohibit producing misleading content or incitements to violence.
So, we are running the first cohort of plzdontkillus in July. Above are some of the considerations that led me to staging an intentional crisis of faith around whether I should work on it, and led me to decide during that crisis of faith that we should at least run the first cohort and see how it goes. I am mostly optimistic that we can figure something out here and that we will make it work, but I am definitely also open to the possibility that this program turns out to be not worth running a second time. If you have suggestions, considerations, arguments, feedback, etc, please do reach out, and if you’re interested in trying to post a short-form video every day in July, please apply!
[1] If you follow my substack and you’re interested, I really think you should apply! Yes, even you, even if you are not a woman, or not charismatic, or whatever. Apply! We tried to make it painless, and we are genuinely interested in all types.
[2] You might think that I would not consider myself a moral panic promoter since I am quite confident that AI x-risk is high and that the public should be concerned about it, but that is a way of thinking that would not have helped promoters of other moral panics, say the Dungeons and Dragons moral panic of the early 80s, realize their mistake. As such, I prefer to use “moral panic” non-factively, so that I can put myself in the same reference class even when I am quite confident.
[3] This is arguable, but it does seem to me that they mostly have screwed things up pretty badly and tend to focus on causes that are not very important, and sometimes actively backfire. Climate change activists advocating for shutting down nuclear power plants is a particularly vivid example of this.
[4] In fact, probably attractive people are slightly more likely to make good arguments, since all good traits correlate for some reason, but you should be able to mostly d-separate your evaluation of the argument from how attractive the arguer is by, y’know, inspecting the argument.
[5] I’m not going to argue for this here. I might argue for it or something like it in a future post.






I really want to apply to this but my graduation is first week of july. Is it still possible to come if i can't make it in person the first 3-4 days of July but still make a video for each of those days (remotely), then show up for the rest?
Should i still apply given this constraint? And if so, should i be honest in the "Can you stay the full month?" question (possibly one that eliminates applicants if not a Yes answer)
Glad the crisis of faith led to the elucidation of your.. braining. Reading this helped with my nerves. I finished the written part of the application when the link came out on Aella’s feed, but am procrastinating the video component since I am neither viral material, nor of “high epistemic standards”. But looks like cute female communicator may just be enough to get my foot into the door. WINS.