I am more optimistic than you for a few reasons, but one that I'd like to highlight is that I think humans are likely okay in a world with mostly aligned ASIs but also some misaligned ones. In that case there will be much more intelligence and resources working in favor of humanity than against it. This is far from perfect of course, since for some technologies offence is much easier than defence: it's easier to create a novel pathogen than to defend against one. But in a world where most AI is aligned, it seems very likely that some humans will survive even a worst-case engineered pandemic.
It partly depends on how smart and dangerous those misaligned superintelligences are. They could manufacture a killer pathogen that lies dormant in your body for months and then kills you. In that case, most of humanity might get infected before we realize it, and it might kill a pretty substantial percentage of the human population. I'm not sure it would kill everyone, though.
But that's just a realistic-ish scenario with 21st-century technology. I'm sure a superintelligent AI could devise weapons we cannot even imagine, a jump like the one from conventional bombs to nuclear weapons.
So I'm not sure aligned superintelligences would be able to fend off extinction, given that destruction is easier than protection. It depends on what particular scenario you're imagining.
This was a good piece and largely mirrors my own evolution on this question. For me, it was Zvi who flipped my thinking with the disempowerment thesis, which I find all too plausible. Single-metric Bayesian priors have always struck me as an overly narrow frame, but it looks like LLMs think very much like Rationalists, maybe good, maybe bad, but narrowing for sure. One thing to look at, though, and people hate me for saying this: Grok is the first next-FLOP-gen model, and while people are praising it for being 'open' and non-refusal, it's not **that much better** than the previous o1 (GPT-4 + thinking) generation. Unverifiable tasks might be a bigger gap in the jagged frontier than we assume. p(doom) 10-15% depending on definition.
Very interesting read, Rafael. I’ve had a similar trajectory in my assessment of AI risk and of Yudkowsky and the rationalists while engaging with EA over the years. (Though I could not have put it as eloquently as you did)
Keeping in mind the limited use of p(doom) estimates in general, at this point even your ~30+% doom is oddly reassuring to me. What a strange time to be alive!
I spent a few years looking into x-Risks. The AI one was definitely the most fun. I settled on close to zero.
There is a fundamental flaw re LLMs that people keep overlooking: simulated intelligence will always be flawed, regardless of how refined it gets (and we are many orders of magnitude from anything convincingly faking actual intelligence). There is — quite literally — no one there doing any thinking. Exactly like a calculator.
Pragmatically speaking, it will be stupid humans using powerful narrow AI that will screw us, not misaligned AI.
Btw I found Brian Christian’s The Alignment Problem to be a great general-info book. Bostrom is a bit nuts but brings great value by listing all the ways things can go wrong, which humanity, with its optimism bias, myth of progress and myriad other delusions, totally ignores (outside of nerdy communities).
Then I found something with a p(Doom) of almost 100%, which let me easily move AI risk into the “minor concern” category: the polycrisis, metacrisis, and impending civilizational collapse.
Wait till you look at that. It will melt your mind.
I blog a bit about it, if you’re unfamiliar and curious. Start here https://open.substack.com/pub/gnug315/p/part-1-of-5-denial