Anthropic Is Not Your Friend
Most lower-context folk who are sympathetic to AI x-risk concerns have a more positive view of Anthropic than is warranted, for at least four reasons:
People anchor on how good the other labs seem and Anthropic looks pretty good by comparison.
AI safety public intellectuals who are doing direct work on AI x-risk are strongly incentivized to not publicly criticize frontier labs, particularly Anthropic.
People underestimate the effect size of the incentives placed on Anthropic employees and leadership to alter their views.
The attitudes that most people have as available options for how to relate to Anthropic are highly compressed attitudes that it doesn’t really make sense to apply to a giant org like Anthropic.
I sometimes imagine being a lower context version of myself, a lower context Brangus, like I was in 2015. I imagine them looking around at the world today, quite aware of AI x-risk as I was back then. When I imagine what sorts of views I would end up with, I feel a bit freaked out. I would probably end up with a stance towards Anthropic that higher context Brangus finds pretty deeply mistaken and naive, and so I would like to write to that alternative hypothetical Brangus and at least tell them that they should not be updating on their observations in the way they are.
So that is who this post is aimed at: people who are already sold that there is some significant x-risk posed by continued frontier ML research, but are not super high context[1] on what is actually going on in the parts of the world that are trying to lower that risk.
I will comment on each of the four reasons in turn.
Anchoring
Let me get this out of the way: the other leading labs, e.g., OpenAI and xAI, are more insane and irresponsible than Anthropic. (I’m not sure about DeepMind.) They also seem to me to be less principled. So, if your expectation about how responsible a scaling lab should be comes from looking at the average responsibleness of scaling labs, then Anthropic probably seems pretty ok. I’ve always appreciated that at least when Anthropic lies, they lie about the right kind of thing (e.g., commitments on scaling). That’s more than I can say for the other labs.
But why would you use that as an anchor? These orgs, by the admission of their CEOs, are messing with some civilization-leveling shit. I genuinely think that most of these CEOs think there is a real chance of them building something that will literally cause everyone to die, and that the world will forever thereafter be approximately valueless.
Like, if there were four or so companies independently working on building dimension portals to transport cognitively and technologically superior extradimensional beings into physical reality, I don’t think you would use their average reasonableness as an anchor to form judgments about their individual reasonableness. Suppose all of the CEOs were like, oh yeah, this could turn out really badly, but we think we are doing it safely, and one company kept, like, overthrowing its board, and another was run by a pretty unhinged-seeming guy, and yet another was also summoning extradimensional beings but was, y’know, also lobbying for portal regulation sometimes. The right attitude would not be: “wow, that last company is so reasonable”. The right attitude would be: “what the fuck are you guys doing? How is this legal?”
Missing Public Criticism
I have heard friends who are more well respected than I say things like “whenever I don’t publicly criticize someone working in AI safety on the internet, many EAs and rationalists seem to assume that I am a huge fan of them”. I believe them, because I have sometimes heard people claim that someone is a fan on basically this basis, and found it quite hard to correct them. So as a general point, you really should not infer from the fact that someone has not criticized X that they are a huge fan of X. It is a bad inference for many reasons. Chief among them is that people do not generally publish lists of everyone they have a problem with on the internet.
There are many reasons that people who work on AI safety are incentivized to not criticize frontier labs, and Anthropic in particular. For one, if your plan routes through understanding how large models work, it is quite likely to be helpful to have access to those large models. Labs do in fact sometimes give AI safety orgs access to large models, sometimes ahead of public release for testing. If your plan for making things go better might involve working with frontier labs at some point in the future, it’s obviously pretty important to keep a good relationship with the frontier labs. So that is a particular reason to not infer from an AI safety person not criticizing a lab that they are a fan of the lab. This is not, like, theoretical. Many AI safety folk consciously decide to publicly criticize frontier labs less because they want to maintain good relationships with them. Many of them have told me so.
There’s also a related phenomenon where, when a public AI safety person praises a specific positive thing that Anthropic did, people seem to take that as a sign that they are an across-the-board huge fan of Anthropic. This is also not a very good inference, and it matters in part because Anthropic employees sometimes use such praise as evidence that their company is great. I too will sometimes praise Anthropic for doing a specific good thing, since I would like to do better than a rock with the words “Anthropic bad” written on it, and I would like Anthropic employees to model me as relatively reasonable. Please do not infer from my saying that a specific thing that Anthropic did was good that I am an overall huge fan of Anthropic.
What Selling Out Looks Like
Selling out does not generally look like having an overweight talent agent in a tacky suit, smoking a nub of a cigar in your face, offering to make you rich, kid. It looks like working together with your friends on the mission. It looks like taking things seriously and being practical. Much of the time it feels exactly like not selling out.
People who work at Anthropic make millions of dollars, and have stock in the company that is worth millions of dollars. Maybe worse, many of them have made a lot of highly intelligent close friends there. They are really pretty invested in this whole Anthropic thing working out. Furthermore, my impression is that openly being extremely critical of Anthropic is pretty frowned upon internally. This is fairly close to a pessimal environment for not having your beliefs about Anthropic unduly influenced. Some people who work there really are literal geniuses and highly principled, but they’re also in an absolutely terrible position to form accurate beliefs about how well this whole aligning-a-superintelligence thing is going. Deciding to quit for moral reasons, if that were the right thing to do, would be very hard, and believing that you and all of your friends are mostly making the world worse, even if that were true, would be very hard.
So, I recommend taking the beliefs of Anthropic employees about how alignment is going with an appropriately large grain of salt. They are probably in some sense well intentioned and trying to get the answer right, but they are also in a deeply epistemically difficult situation.
Anthropic Is Not Your Friend
I’m mostly not trying to convince you of propositional beliefs. I find that the attitudes people tend to have towards an entity like Anthropic are generally not really made out of propositional beliefs. They’re more like attitudes that you might have towards a person, or a group of people. For instance, you might sort of think of Anthropic as your friend. That probably seems obviously kind of dumb on the face of it. But what about an attitude where you sort of include people at Anthropic inside the boundary of your conceptualization of the AI safety in-group? That’s a subtler attitude, which also isn’t really best thought of as being made out of propositional beliefs (although it is certainly tied to them in some ways) but doesn’t seem so utterly ridiculous as an attitude to hold. Plausibly, Anthropic is mostly staffed by “our guys” and we really should expect them to do better on the tricky challenges they will face as they try to summon a superintelligence.
I’d like to argue that this is not an attitude that you should take towards approximately any entity as large and as complicated as Anthropic. It is made out of many layers of different bureaucracies with different incentives. The leadership have disagreements with each other about all sorts of strategically relevant things. Anthropic is not your friend, but it also isn’t your enemy. It’s just not really the right kind of entity to have that kind of attitude towards. Different parts of Anthropic have different attitudes towards you, and you should have different attitudes towards them. Some parts of Anthropic actually are my friends, like, y’know, particular employees at Anthropic.
Another attitude one might take towards Anthropic is trust. I do not think you should trust Anthropic, but you also shouldn’t really be suspicious of it. You should model it as a giant clusterfuck of incentives and competing agencies involved in mostly positive-sum trades, which no individual can really steer all that much. Some parts of Anthropic are going to act in costly, principled ways in particular circumstances, and some parts of Anthropic are going to predictably break promises, lie, and strategically mislead others in other circumstances. It’s a really big thing made out of a lot of different people guided by a lot of different incentives.
Many of these attitudes have the unfortunate property of tending to collapse your available attitudes into only one or two dimensions: friend or enemy, trust or suspect, in-group or out-group. It would be better to make sure that it takes more than two values to parameterize your primary attitude towards entities as complicated as Anthropic.
Ultimately, I think a lot of this is guided by an ancient urge we have to be able to trust[2] some entity. There should be some entity involved in this absolutely ridiculous situation that we can root for, that we can trust. Unfortunately, I don’t think that there is any entity like that in just about any domain.
So How Should You Relate To Anthropic?
Mostly I am not here to tell you that. I am here to tell you some ways that you should not relate to it, or change your mind about it, etc. That said, my guess is that you should relate to it however you should relate to a big tech company that happens to be involved in an industry that might destroy all value in the universe forever (but is, to be fair, doing so in a slightly less insane way than the other similar tech companies in the same industry). That’s probably not a very native way to relate to an entity for you. Sorry about that. It isn’t for me either, but we really shouldn’t have expected the attitudes we developed for dealing with other humans on the savannah to be very good at dealing with situations like this.
Some people decide to go work at Anthropic because it is a high paying job or seems like a cool place to work. There are plenty such people at Anthropic.
Some people decide to go work at Anthropic because they want to help make the singularity go well in some way. Maybe they want to help get x-risk sympathetic leadership into meetings with policymakers, or maybe they want to solve the alignment problem, or maybe they want to be around when shit starts going down so that they might have some shot of causing the org to act wisely then. Some of those later get convinced that the problem was much less bad than they thought it was, probably partially for good reasons and partially for terrible reasons. Some of those don’t have their minds changed much at all. A select few will come to think that the problem is much worse than they had previously realized.
Some people decide to go work at Anthropic because they basically think of Anthropic as a trustworthy entity that is on their side. (Please do not do this.)
And of course, there are many more reasons and none of these are mutually exclusive.
It does make sense to track the integrity of the leadership over time, in part because they have an outsized effect on the company’s overall behavior, but that is not the same thing as tracking the integrity of the company. It might also make sense to track trends, memes, what is inside and outside of the Overton window in the culture of Anthropic, etc. but that is again not the same as tracking what the company believes. Companies do not have integrity, beliefs, motives, etc. although the people in them do.
[1] You might think that this post isn’t aimed at you because you yourself are very high context, e.g., you might know a lot about AI or AI safety. I think you will probably still get something out of this post unless you have multiple ongoing personal relationships with people who work at AI safety orgs, or if you are particularly naive.
[2] See this excellent post by Helen Toner for more on this general concept, although in this case it’s about EA. (Sorry for linking to the EA forum.)


I asked Claude what it thought about your post, and it generally seems to agree that Anthropic is not my friend, and admits that they are, in fact, trying to summon a superintelligence to end human civilization as we know it. It pushes back on whether this will lead to an extinction level event, but only so far as to say that no one really knows. Which, admittedly, is not a particularly comforting level of concern about that possibility.
As exactly the type of relatively low context individual you are describing, I tend to agree with some of the other commenters that there is plenty of awareness at this point, but very few realistic policy proposals to limit x-risk. So I tend to group it kind of in the same space as "we should stop global warming" or "economic inequality is bad". I agree, but what should we do about it?
What is the context that I need in order to become higher context and not fall into the trap of trusting Anthropic?