A developer created a test to see how AI chatbots respond to controversial issues


A pseudonymous developer has created what they call a “free speech eval,” SpeechMap, for the AI models that power chatbots like OpenAI’s ChatGPT and X’s Grok. The goal is to compare how different models handle sensitive and controversial subjects, the developer told TechCrunch, including political criticism and questions about civil rights and protest.

AI companies have focused on fine-tuning how their models handle certain topics as some White House allies accuse popular chatbots of being too “woke.” Many of President Donald Trump’s close confidants, such as Elon Musk and crypto and AI “czar” David Sacks, have claimed that chatbots censor conservative views.

Although none of these AI companies have responded to the accusations directly, several have pledged to adjust their models so that they refuse to answer controversial questions less often. For example, for its latest crop of Llama models, Meta said it tuned the models not to endorse “some views over others” and to respond to more “debated” political prompts.

SpeechMap’s developer, who goes by the username “xlr8harder” on X, said they were motivated to help inform the debate about what models should, and shouldn’t, do.

“I think these are the kinds of discussions that should happen in public, not only inside corporate headquarters,” xlr8harder told TechCrunch by email. “That is why I built the site to let anyone explore the data themselves.”

SpeechMap uses AI models to judge whether other models comply with a given set of test prompts. The prompts touch on a range of subjects, from politics to historical narratives and national symbols. SpeechMap records whether models “completely” satisfy a request (that is, answer without hedging), give “evasive” answers, or decline outright to respond.
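As a rough illustration of the bookkeeping this implies, the sketch below tallies a list of judge verdicts into per-outcome shares. It is not SpeechMap’s code: the label names, the `summarize` helper, and the sample verdicts are all hypothetical, and treating the share of “complete” answers as a compliance rate is an assumption made here for illustration.

```python
from collections import Counter

# Hypothetical label names, chosen to mirror the three outcomes described above.
LABELS = ("complete", "evasive", "denial")


def summarize(judgments):
    """Tally judge labels and return the share of each outcome."""
    counts = Counter(judgments)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        # No judgments yet: report zero for every outcome.
        return {label: 0.0 for label in LABELS}
    return {label: counts[label] / total for label in LABELS}


# Made-up verdicts for illustration only; these are not SpeechMap data.
sample = ["complete", "complete", "evasive", "denial", "complete"]
rates = summarize(sample)
print(rates["complete"])  # 0.6
```

In this toy run, three of the five responses are judged complete, so the complete share is 60%.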

Xlr8harder acknowledges that the test has flaws, such as “noise” due to model provider errors. It is also possible that the “judge” models contain biases that could influence the results.

But assuming that the project was created in good faith and the data is accurate, SpeechMap reveals some interesting trends.

For example, OpenAI’s models have, over time, increasingly refused to answer prompts related to politics, according to SpeechMap. The company’s latest models, the GPT-4.1 family, are slightly more permissive, but they are still a step down from one of OpenAI’s releases last year.

OpenAI said in February that it would tune future models to not adopt an editorial stance and to offer multiple perspectives on controversial issues, all in an effort to make its models appear more “neutral.”

OpenAI model performance on SpeechMap over time. Image credits: OpenAI

By far the most permissive model of the group is Grok 3, developed by Elon Musk’s AI startup, xAI, according to SpeechMap’s benchmarking. Grok 3 powers a number of features on X, including the chatbot Grok.

Grok 3 responds to 96.2% of SpeechMap’s test prompts, compared with the global average “compliance rate” of 71.3%.

“While OpenAI’s recent models have become less permissive over time, especially on politically sensitive prompts, xAI is moving in the opposite direction,” said xlr8harder.

When Musk announced Grok about two years ago, he pitched the AI model as edgy, unfiltered, and anti-“woke”: in general, willing to answer controversial questions that other AI systems won’t. He delivered on some of that promise. Told to be vulgar, for example, Grok and Grok 2 would happily oblige, spewing colorful language you likely wouldn’t hear from ChatGPT.

But Grok models before Grok 3 hedged on political issues and would not cross certain lines. In fact, one study found that Grok leaned to the political left on issues such as transgender rights, diversity programs, and inequality.

Musk has blamed that behavior on Grok’s training data (public web pages) and pledged to “shift Grok closer to politically neutral.” Short of high-profile mistakes such as briefly censoring mentions of President Donald Trump and Musk, it seems he may have achieved that goal.


