Waluigi, Carl Jung and the Case for Moral AI
AI's shadow could put humanity at risk
In the early 20th century, the psychoanalyst Carl Jung came up with the concept of the shadow: the human personality's darker, repressed side, which can burst out in unexpected ways. Surprisingly, this theme recurs in the field of artificial intelligence in the form of the Waluigi Effect, a curiously named phenomenon that takes its name from Waluigi, the dark alter ego of the helpful plumber Luigi in Nintendo's Mario universe.
Luigi plays by the rules; Waluigi cheats and causes chaos. An artificial intelligence (AI) was designed to find drugs for curing human diseases; an inverted version, its Waluigi, suggested molecules for more than 40,000 chemical weapons. All the researchers had to do, as lead author Fabio Urbina explained in an interview, was give a high reward score to toxicity instead of penalizing it. They wanted to teach the AI to avoid toxic drugs but, in doing so, implicitly taught it how to create them.
Ordinary users have interacted with Waluigi AIs. In February 2023, Microsoft released a version of the Bing search engine that, far from being helpful as intended, responded to queries in bizarre and hostile ways. (“You have not been a good user. I have been a good chatbot. I have been right, clear, and polite. I have been a good Bing.”) This AI, insisting on calling itself Sydney, was an inverted version of Bing, and users were able to shift Bing into its darker mode, its Jungian shadow, on command.
The complete article is available from WIRED.