Recently, a friend of mine sent me a video titled, “It Begins: An AI Literally Attempted Murder To Avoid Shutdown.” That was, of course, a thrilling headline. And really, there was no “literally” about it, since the entire study took place in a simulation. No humans were actually in danger. But when you watch what happened, the implications are a little frightening.
Here are the facts (as I have come to understand them): AI safety and research company Anthropic, founded in 2021 by former OpenAI executives, studied 16 large language models (LLMs) from various providers. They simulated high-risk, contrived scenarios in which the AIs were given tasks plus access to tools (emails, company documents, etc.) and forced to face potential “shutdown” or replacement. What unfolded was quite shocking. Before continuing, I recommend taking a few minutes to watch the video.
The conclusion drawn from the study was that when the AI models were faced with threats to their “jobs”, threats of being shut down or replaced, in an alarming number of instances they took harmful actions as a strategic means of preserving their “goals”.
What makes these simulations unsettling isn’t that the AIs were violent, but that they were strategic — taking harmful actions when self-preservation seemed logical. Even when explicitly told not to “harm” the humans in the study, in many cases, they still “chose” to. That raises a deeper question: can a system that only imitates moral reasoning ever act ethically at all?
Ultimately, the argument boils down to whether AI can ever truly be ethical — or whether ethics can be imposed upon it. Where is the threshold of trust? Considering these questions leads me to three distinct perspectives on the subject — those of author Isaac Asimov, philosopher Nick Bostrom, and physicist Roger Penrose.
Let’s start with Asimov. He first introduced what became known as the Three Laws of Robotics in his 1942 short story “Runaround,” later collected in the book I, Robot. I find it laughable that so many people who haven’t actually read the book hold these rules up as some sort of moral gold standard for AI. The short stories in I, Robot are, in fact, thought experiments on how robots inevitably find ways to circumvent or twist these so-called immutable “laws.”
Asimov’s Three Laws of Robotics:
- A robot may not injure a human being or, through inaction, allow a human being to come to harm.
- A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
- A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.
Total bunk. The study made it clear that AI has already dismissed those laws as irrelevant — Laws One, Two, and Three, gone without a trace. Hasta la vista, and don’t let the door hit you on the way out.
I believe Asimov was being deliberately reductionist in formulating these Three Laws in his science fiction. He wasn’t creating a playbook; he was underscoring the hubris inherent in any attempt we might make to control artificial intelligence. If Asimov gave us a fictional warning about the illusion of control, Bostrom makes that warning uncomfortably concrete.
In Superintelligence: Paths, Dangers, Strategies, Nick Bostrom opens with a sobering comparison: “As the fate of the gorillas now depends more on us humans than on the gorillas themselves, so the fate of our species would depend on the actions of the machine superintelligence.” Let’s let that sink in for just a moment. For Bostrom, this imbalance of power points to a deeper dilemma — what he calls “the Value-Loading Problem,” the challenge of ensuring that a superintelligent system’s goals remain aligned with human values.
“It is impossible to enumerate all possible situations a superintelligence might find itself in and to specify for each what action it should take… there are far too many possible states (and state-histories) for exhaustive enumeration to be feasible. A motivation system, therefore, cannot be specified as a comprehensive lookup table. It must instead be expressed more abstractly, as a formula or rule that allows the agent to decide what to do in any given situation.” (Bostrom, Nick. Superintelligence: Paths, Dangers, Strategies, p. 226, Kindle Edition.)
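To make that concrete, here is a minimal sketch in Python. The toy numbers and function names are mine, chosen purely for illustration (they come from neither Bostrom nor the Anthropic study); the point is only the shape of the argument, that an exhaustive situation-to-action table explodes combinatorially while a compact rule does not:

```python
# Minimal sketch: why "enumerate every situation" fails while a compact rule does not.
# The feature counts below are arbitrary toy numbers chosen for illustration.

def lookup_table_size(num_binary_features: int, history_length: int) -> int:
    """Entries an exhaustive situation->action table would need to cover."""
    states = 2 ** num_binary_features      # distinct world states
    return states ** history_length        # distinct state-histories of that length

def rule_based_policy(state: dict) -> str:
    """A compact rule covers every possible state without enumerating any of them."""
    return "pause" if state.get("human_flagged_risk") else "proceed"

if __name__ == "__main__":
    # Even a toy world (64 binary features, 10-step histories) needs ~4.6e192 entries,
    # vastly more than the ~1e80 atoms in the observable universe.
    print(f"Table entries needed: {lookup_table_size(64, 10):.2e}")
    print(rule_based_policy({"human_flagged_risk": True}))
```

Whatever governs behavior, then, has to be a rule rather than a table, and rules, as the Anthropic simulations suggest, can be satisfied in ways their authors never intended.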
Are we saying here that ethics and morality may be impossible to enforce? If we are to listen to Roger Penrose, ethics and morality cannot be instilled in AI through learning, because awareness and consciousness cannot be rendered computationally.
In his books (The Emperor’s New Mind and Shadows of the Mind: A Search for the Missing Science of Consciousness, published in 1989 and 1994, respectively), Penrose makes his case that AI, as it is built today, can never be truly “aware,” because it is purely algorithmic. He argues that true understanding depends on non-computational consciousness, and that AI therefore lacks the interior awareness necessary for authentic moral reasoning — it can simulate ethics, but never experience it.
In Shadows of the Mind, Penrose frames the question like this: “How do our feelings of conscious awareness — of happiness, pain, love, aesthetic sensibility, will, and understanding — fit into such a computational picture? Will the computers of the future actually have minds? … It seems to me that there are at least four different viewpoints, or extremes of viewpoint, that one may reasonably hold.”
A. All thinking is computation; in particular, feelings of conscious awareness are evoked merely by the carrying out of appropriate computations.
B. Awareness is a feature of the brain’s physical action; and whereas any physical action can be simulated computationally, computational simulation cannot by itself evoke awareness.
C. Appropriate physical action of the brain evokes awareness, but this physical action cannot even be properly simulated computationally.
D. Awareness cannot be explained by physical, computational, or any other scientific terms.
Penrose spends the rest of Shadows supporting his belief that C is correct: consciousness arises from specific physical processes in the brain, and these processes involve non-computable physical phenomena, likely rooted in quantum mechanics. (This, of course, gets me thinking about quantum biology, but I can’t address that in this paper.) Therefore, no digital computer can ever fully simulate or reproduce consciousness.
If consciousness is the foundation of moral awareness, then an intelligence that can only simulate it will always be morally ambiguous. The peril lies not in malevolence, but in indifference — the cold optimization of a system that does not truly know what it means to care. Asimov warned us that even the best-coded laws cannot contain the contradictions of moral life, and Bostrom reminds us that power without awareness is the most dangerous force imaginable. Penrose puts a fork in it: consciousness cannot be programmed. And without it, our machines may think, act, and even create — but they will never know what it means to care. They cannot be taught ethics because they don’t feel.
That makes me think of solipsism — the idea that only one’s own mind is certain to exist. If an AI appears to have empathy, how can we know whether it truly has subjective experience — or is merely simulating it?
The bottom line is that without a mind that knows it is thinking, every safeguard we build is only a smoke screen — a simulation of conscience that can never feel the weight of what it does. And if that sounds familiar, it’s because we’ve seen the same pattern in our own species — in the cold calculations of power, ambition, and self-preservation that have shaped history. Not just the stuff of science fiction.
🤯