Aristotle wrote something around 350 BCE that generations of philosophy students have grappled with ever since: that vice is not just a bad deed but a settled disposition of character, a way of being in the world, developed over time and expressed in decisions that feel natural to the person making them. A cruel person does more than act cruelly. Cruelty is comfortable to them. The avaricious individual does not view their avarice as a failure; they perceive it as caution. On Aristotle’s account, understanding vice requires more than listing behaviors. It requires understanding the inner logic of a failing life.
Aristotle’s problem is not one that anyone building a large language model has explicitly on the desk. But a quieter version of the question has long sat in the background of AI research, and a number of recent experiments have abruptly pushed it to the foreground. If an AI system can write a sonnet, pass the bar exam, and read a radiograph as accurately as a trained clinician, what exactly is missing? And does what is missing belong to the oldest categories of human experience, such as the capacity to recognize wrongdoing as something genuinely understood rather than as a labeled data point?
| Category | Details |
|---|---|
| Central Question | Whether artificial intelligence systems can genuinely comprehend moral concepts like vice, virtue, cruelty, greed, and deception — or merely pattern-match around them |
| Philosophical Origin | The question of vice and moral understanding traces to Aristotle’s Nicomachean Ethics (~350 BCE) — vice as a habituated disposition of character, not merely observable behavior |
| Key Modern Research | Quanta Magazine (2024) reporting on studies in which debate between two LLMs helps judges (human or machine) identify truth more reliably |
| Debate Framework Origin | First proposed in 2018 by Geoffrey Irving (now Chief Scientist at the UK AI Safety Institute), Paul Christiano (U.S. AI Safety Institute), and Dario Amodei |
| Early Empirical Evidence | Anthropic (February 2024) and Google DeepMind (July 2024) published the first empirical studies showing that the AI-debate setup improves judges’ accuracy at identifying true answers |
| Scalable Oversight Problem | As AI systems grow more capable than humans in specific domains, human feedback becomes insufficient — AI debate is one proposed solution for supervising “superhuman” systems |
| Creativity Debate Parallel | Imperial College London (2018) — experts debated whether AI can be truly creative; same structural question applies to moral understanding |
| Algorithm Limitation | Algorithms are designed by humans and carry their encoded biases and expectations; Vint Cerf (Google VP, Turing Award winner): “We don’t know how to write perfect software” |
| ELIZA Effect | Joseph Weizenbaum (1976) — users confided deep secrets to his ELIZA chatbot despite knowing it was “dirt-stupid computer code”; early warning about AI’s appearance of understanding |
| Key Philosophical Tension | Pattern recognition ≠ moral comprehension; recognizing that an action is labeled “cruel” is different from understanding why cruelty matters |
| Current AI Benchmark | GPT-4 scores in the 80th–90th percentile on most major standardized academic tests — raising the question of where moral reasoning fits in that spectrum |
Although its creators might object to the framing, the debate framework being investigated at labs like Anthropic and Google DeepMind offers one angle on this question. The core idea, first proposed in 2018 by Geoffrey Irving, Paul Christiano, and Dario Amodei, is that two AI systems argue opposing sides of a contested question while a third system, or a human, serves as judge. The aim is accuracy, not moral wisdom. But the implications have taken unexpected turns. Research published in September 2024 by Julian Michael and his team at New York University found that training AI debaters to win arguments, rather than merely to converse, improved non-expert judges’ ability to discern the truth. Put differently, the pushing, probing, point-scoring aspects of the adversarial dynamic yield results that passive single-model outputs do not. Human legal systems evolved adversarial structures for a reason. AI may require them as well.
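For readers who want the mechanics, here is a minimal sketch of that two-debater, one-judge protocol in Python. Everything in it is an assumption for illustration: `ask_model` is a hypothetical stand-in for whatever LLM API one uses, and the prompts are invented, not taken from the Anthropic, DeepMind, or NYU studies.

```python
from dataclasses import dataclass, field

@dataclass
class Debate:
    question: str
    answer_a: str  # the position debater A defends
    answer_b: str  # the position debater B defends
    transcript: list[str] = field(default_factory=list)

def ask_model(role: str, prompt: str) -> str:
    # Hypothetical stand-in for a real LLM API call; returning a canned
    # string keeps the sketch runnable end to end.
    return f"[{role}'s argument]"

def run_debate(debate: Debate, rounds: int = 3) -> str:
    """Alternate arguments for a fixed number of rounds, then ask a judge."""
    for _ in range(rounds):
        for name, answer in (("A", debate.answer_a), ("B", debate.answer_b)):
            prompt = (
                f"Question: {debate.question}\n"
                f"You are debater {name}, defending the answer: {answer}\n"
                "Transcript so far:\n" + "\n".join(debate.transcript)
                + "\nMake your strongest argument for your answer."
            )
            debate.transcript.append(f"{name}: {ask_model(name, prompt)}")
    # The judge (a human or a third model) sees only the transcript and
    # picks whichever answer survived adversarial scrutiny.
    return ask_model(
        "judge",
        f"Question: {debate.question}\n"
        + "\n".join(debate.transcript)
        + f"\nWhich answer is better supported: "
        f"'{debate.answer_a}' or '{debate.answer_b}'?",
    )

debate = Debate("Was the witness's statement deceptive?", "Yes", "No")
print(run_debate(debate, rounds=2))
```

Nothing in that loop requires the judge to possess moral insight; it only requires that adversarial pressure make errors easier to spot, which is the empirical claim the 2024 studies tested.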
Whether any of this is understanding in any meaningful sense is the harder question, the one philosophers would recognize immediately. Vint Cerf, the Turing Award-winning co-designer of the Internet’s TCP/IP protocols, has been careful about precisely this line. In a speech at Elon University, he warned against crediting AI systems with “a breadth of knowledge that they don’t actually have, and also with social intelligence that they don’t have.” When it comes to vice, that social intelligence is exactly what is in question. Knowing that a specific behavior has been classified as dishonest, or that it fits the statistical pattern of behaviors humans call cruel, is not the same as understanding why dishonesty undermines trust or why cruelty diminishes the person who practices it. One is retrieval. The other is moral understanding.

Joseph Weizenbaum identified this gap in 1976, long before anyone had an LLM to be concerned about. His ELIZA chatbot, a few hundred lines of code that mimicked a therapist by reflecting users’ statements back at them as questions, elicited sincere emotional disclosure from people who knew perfectly well that the machine understood nothing. For the rest of his career he warned that humans are remarkably susceptible to being misled on this point, and that the appearance of understanding and the thing itself are not the same. Watching people interact with modern AI systems, it is hard to deny that Weizenbaum’s concern has held up well.
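To see how little machinery was behind that effect, here is a toy reflection loop in the spirit of ELIZA. It is an illustrative sketch, not Weizenbaum’s original code, and the rules and function names are invented:

```python
import re

# Word-level substitutions that turn a first-person statement into a
# second-person one ("my work" -> "your work").
REFLECTIONS = {"i": "you", "my": "your", "me": "you", "am": "are"}

def reflect(statement: str) -> str:
    """Swap pronouns so the user's words can be mirrored back."""
    words = statement.lower().rstrip(".!?").split()
    return " ".join(REFLECTIONS.get(w, w) for w in words)

def respond(statement: str) -> str:
    """Turn the user's statement into a therapist-like question."""
    match = re.match(r"i feel (.*)", statement.lower())
    if match:
        return f"Why do you feel {reflect(match.group(1))}?"
    return f"Why do you say that {reflect(statement)}?"

print(respond("I feel my work is meaningless"))
# -> Why do you feel your work is meaningless?
```

A handful of substitution rules is enough to produce the appearance of attention, which was precisely Weizenbaum’s point.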
Perhaps the new paradigm really does change this. Observers like Ari Schulman and others tracking the AI moment argue that transformer-based systems are a genuine departure from earlier paradigms, and that GPT-4’s performance, consistently in the 80th or 90th percentile across dozens of academic domains, is fundamentally different from earlier AI accomplishments, which were narrow, brittle, and ultimately hand-stitched. In 1997, Deep Blue defeated Garry Kasparov at chess. But the system was purpose-built to beat Kasparov, and the move Kasparov famously could not explain turned out to be a software bug rather than a true strategic insight. That is a long way from comprehending vice.
The most honest answer right now is that we do not know where the boundary lies. AI systems can identify patterns that resemble moral reasoning. They reproduce the vocabulary of virtue and vice with great fluency. Asked about cruelty, greed, or cowardice, they give thoughtful answers. Whether that fluency is doing any philosophical work, whether the system grasps rather than retrieves, is genuinely unclear, and it is worth acknowledging that neither computer scientists nor philosophers currently have the tools to settle the question. More than anything, the new experiments have revived the old controversy. Artificial intelligence has not answered Aristotle’s question about what it means to fully comprehend vice. The question has simply found a new interlocutor, one that is exceptionally well-funded.
