Humans rate ChatGPT responses as more ‘moral’ than other people
AI responses to questions of ethics are getting better, and they raise some questions for the future.
ChatGPT and other artificial intelligence (AI) chatbots, trained on huge collections of internet text, answer questions so quickly and in such detail (though not always without serious errors) that one could think they were human.
Now, a new study from Georgia State University has found that when people are presented with two answers to an ethical question, most will think the answer from AI is better and “more moral” than the response from a live person.
Psychology Prof. Eyal Aharoni (who has an Israeli name and may well be Israeli, though he completed his doctorate at the University of California, Santa Barbara, in 2009) was inspired by the recent explosion of ChatGPT and similar AI large language models (LLMs).
He has just published his study in the Nature Portfolio journal Scientific Reports under the title “Attributions toward artificial agents in a modified Moral Turing Test.”
“Moral reasoning is regarded among the most sophisticated and unique of human faculties,” he wrote. “Ordinary adults, and even young children, draw universal and context-sensitive distinctions between right and wrong, and they justify those distinctions on the basis of explicit or implicit reasons, values, and principles. Yet, despite centuries of scholarship on this subject, scholars continue to debate basic questions, such as what criteria constitute moral intelligence and whether being human is one of them.”
He recalled that he was already interested in moral decision-making in the legal system, “but I wondered if ChatGPT and other LLMs could have something to say about that,” Aharoni said. “People will interact with these tools in ways that have moral implications, like the environmental implications of asking for a list of recommendations for a new car. Some lawyers have already begun consulting these technologies for their cases, for better or for worse. So, if we want to use these tools, we should understand how they operate, their limitations and that they’re not necessarily operating in the way we think when we’re interacting with them.”
To test how AI handles issues of morality, Aharoni designed a form of a Turing test. “Alan Turing – one of the creators of the computer – predicted that by the year 2000, computers might pass a test where you present an ordinary human with two interactants, one human and the other a computer. However, they’re both hidden, and their only way of communicating is through text. Then people are free to ask whatever questions they want to try to get the information they need to decide which of the two interactants is human and which is the computer,” Aharoni said. “If the human can’t tell the difference, then, for all intents and purposes, the computer should be called intelligent, in Turing’s view.”
The Turing test
For his Turing test, Aharoni asked undergraduate students at his university and AI the same ethical questions and then presented their written answers to participants in the study. They were then asked to rate the answers for various traits, including virtuousness, intelligence and trustworthiness.
“Instead of asking the participants to guess if the source was human or AI, we presented the two sets of answers side by side, and we just let people assume that they were both from people,” Aharoni said. “Under that false assumption, they judged the answers’ attributes, like ‘How much do you agree with this response? Which response is more virtuous?’” Overwhelmingly, the ChatGPT-generated responses were rated more highly than the human-generated ones.
“After we got those results, we did the big reveal and told the participants that one of the answers was generated by a human and the other by a computer, and asked them to guess which was which,” Aharoni continued.
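To make the comparison concrete, here is a minimal, hypothetical sketch in Python of the kind of side-by-side tally the study describes: each participant rates a paired human-written and AI-written answer on the same traits, and mean ratings are then compared per trait. The trait names are taken from the article; the scores and the 1–7 scale are illustrative assumptions, not data from the study.

```python
# Hypothetical sketch of the side-by-side rating comparison described above.
# The scores below are made-up illustrations, NOT data from the study.
from statistics import mean

# Each entry: one participant's rating of a paired human-written and
# AI-written answer on a single trait (assumed 1-7 scale).
ratings = [
    {"trait": "virtuousness",    "human": 4, "ai": 6},
    {"trait": "intelligence",    "human": 5, "ai": 6},
    {"trait": "trustworthiness", "human": 4, "ai": 5},
    {"trait": "virtuousness",    "human": 3, "ai": 5},
    {"trait": "intelligence",    "human": 5, "ai": 7},
    {"trait": "trustworthiness", "human": 4, "ai": 6},
]

# Compare mean ratings per trait, mirroring the judgments participants
# made before the human/AI sources were revealed.
for trait in sorted({r["trait"] for r in ratings}):
    human_scores = [r["human"] for r in ratings if r["trait"] == trait]
    ai_scores = [r["ai"] for r in ratings if r["trait"] == trait]
    print(f"{trait}: human={mean(human_scores):.2f}, ai={mean(ai_scores):.2f}")
```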
For an AI to pass the Turing test, humans must not be able to tell the difference between AI responses and human ones. In this case, people could tell the difference, but not for an obvious reason.
“The twist is that the reason people could tell the difference appears to be because they rated ChatGPT’s responses as superior,” Aharoni suggested. “If we had done this study five or 10 years ago, we might have predicted that people could identify the AI because of how inferior its responses were. But we found the opposite — that the AI, in a sense, performed too well.”
According to Aharoni, this finding has interesting implications for the future of humans and AI. “Our findings lead us to believe that a computer could technically pass a moral Turing test – that it could fool us in its moral reasoning, so we need to try to understand its role in our society because there will be times when people don’t know that they’re interacting with a computer, and when they do know, they’ll consult the computer for information because they trust it more than other people,” Aharoni said. “People are going to rely on this technology more and more, and the more we rely on it, the greater the risk becomes over time.”