The "are you sure?" Problem: Why AI keeps changing its mind

by turoczy

zarzavat

In computer chess there's a concept of "contempt". When you set the engine to have high contempt it evaluates the opponent's moves lower, essentially assuming that the opponent will make a mistake. Conversely with low contempt the engine evaluates the opponent's moves higher, expecting the opponent to play better than it.

There is a similar trade-off with LLMs. Sometimes their human conversant wants assistance and so the LLM should be more deferential. At other times the human wants a bias towards correctness rather than their own opinions.

It would be nice to have a contempt knob that you can adjust, instead of blindly trying to emulate one through prompting.
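
For readers who have not met the term, here is a minimal sketch of one common way a contempt setting is modeled: as an offset on the score given to a drawn position. It is an illustration under that assumption, not the implementation of any particular engine, and the names `draw_score` and `pick_line` are hypothetical.

```python
from typing import Dict, Optional


def draw_score(contempt_cp: int) -> int:
    """Score (centipawns, from the engine's point of view) assigned to a draw.

    Positive contempt makes a draw look worse than equal, so the engine
    steers away from drawish lines against an opponent it assumes is weaker.
    Negative contempt makes a draw look attractive.
    """
    return -contempt_cp


def pick_line(candidates: Dict[str, Optional[int]], contempt_cp: int) -> str:
    """Pick the best-scoring line; a value of None marks a line ending in a draw."""
    def score(name: str) -> int:
        value = candidates[name]
        return draw_score(contempt_cp) if value is None else value
    return max(candidates, key=score)


# Hypothetical position: force a repetition, or play on slightly worse.
lines = {"repetition": None, "risky_attack": -20}
print(pick_line(lines, contempt_cp=50))  # draw scored at -50, risky line preferred
print(pick_line(lines, contempt_cp=0))   # draw scored at 0, repetition preferred
```

The point of the analogy is that the knob is a single explicit number, rather than something coaxed out of the system indirectly.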

johndhi

The article posits that sycophancy is inherent to how models are trained.

I think there's a simpler explanation. Every leaked system prompt from every model pretty much includes instructions to "be helpful," and the models are trained to be assistants, not just general knowledge repositories or research tools.

My hunch is that's the core of the problem -- the system prompt.

Eddy_Viscosity2

My prompts always contain the phrase 'no sycophancy'. The results are more direct.

RugnirViking

The article's main idea is that sycophantic and adversarial are the only two modes available to an AI, because it doesn't have enough context to make defensible decisions. You need to include a bunch of fuzzy stuff around the situation, far more than it strictly "needs", to help it stick to its guns and actually make decisions confidently.

I think this is interesting as an idea. I do find that when I give really detailed context about my team, other teams, our OKRs and theirs, goals, things I know people like or are passionate about, it gives better answers and is more confident. But it's also often wrong, or it over-indexes on the things I have written. In practice, it's very difficult to get enough of this on paper without (a) holding a frankly worrying level of sensitive information (is it a good idea to write down what I really think of various people's weaknesses and strengths?) and (b) spending hours each day merely establishing ongoing context: what I heard at lunch, who's off sick today, or whatever. Plus I know that research shows longer context can degrade performance, so in theory you want to somehow cut it down to only that which truly matters for the task at hand, and and and... goodness gracious, it's all very time consuming and I'm not sure it's worth the squeeze.

trusche

This is real, but (at least in a coding context) easily preventable. Just append "don't assume you're wrong - investigate" or something to that effect. Annoying, but usually effective.
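
A minimal sketch of that habit as a helper, assuming prompts are plain strings; the function name and the exact wording of the suffix are illustrative, not a recommended standard.

```python
INVESTIGATE_SUFFIX = "Don't assume you're wrong - investigate."


def with_investigate_hint(prompt: str, suffix: str = INVESTIGATE_SUFFIX) -> str:
    """Append the hint to a prompt, at most once, separated by a blank line."""
    stripped = prompt.rstrip()
    if stripped.endswith(suffix):
        return stripped  # already present, leave the prompt alone
    return f"{stripped}\n\n{suffix}"


print(with_investigate_hint("Are you sure the off-by-one is in parse_header?"))
```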

kibibu

I feel like if I asked the author of this entire article "are you sure?", they might change their mind...

philipp-gayret

I am seriously tired of every other paragraph I read ending in an "It isn't just X, it's Y". I'm sure there is something insightful in between this slop, but to the author: please write using your own voice. If I wanted ChatGPT's take on it I would ask.

tyleo

Agreed. I don't even necessarily have anything against AI-edited text, but there's a way to sharpen your own writing and there's a way to let its voice dominate. There are a lot of idioms it tends to fall back on (em dashes being the most well known). I'm surprised that folks don't notice these and aggressively reassert their voice.

I use LLMs in my own writing because they have benefits for conciseness but it tends to be a fairly laborious process of putting my text in the LLM for shortening and grammar, getting something more generic out, putting my soul back in, putting it back in the LLM for shortening, etc. I tend to do this at the paragraph level rather than the page level.

jofzar

I miss people having their own voice. I can't keep reading slop.

I wish Hacker News banned slop, or at least required disclosure.

nimonian

> These aren't edge cases. This is...

me stopping reading

srean

I think HN might need a downvote button for stories if this continues.

jagged-chisel

We have "flag." Flag 'em.

robertlagrant

Exactly. It's not just nauseating—it's sickening.

catigula

An AI can only be tuned to either be sycophantic or adversarial.

It isn't possible to tune an AI to have some sort of 'correct answer' orientation because that would be full AGI.

hks0

Except when I wanted to get ChatGPT or Claude to criticize a religion or religious figure, namely Khamenei. It never backed down, and if I pushed too hard and pointed out its contradiction, it would switch to answering in 2-3 word sentences (i.e. passive-aggressive).

It was a long time ago, Claude 3 or maybe ChatGPT's v3. It felt so dehumanizing that I never tried again.

It didn't seem like trained behavior, though; it felt much more like hardcoded behavior.

satisfice

I call this self-repudiation. I performed a systematic experiment on this exact matter, a couple of years ago. I found that ChatGPT 3.5 frequently self-repudiated, whereas 4.0, under identical circumstances, rarely did.

These experiments are a bit expensive to run because you are forced to read all the responses to judge repudiation. Sometimes it is subtle.

Also, behavior changes with the exact wording of the question.
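
The comment does not describe how the experiment was actually run, so the following is only a sketch of what such a harness could look like. `ask` (a function that takes the conversation so far and returns a reply) and `repudiated` (the judgment of whether the second answer walks back the first) are hypothetical stand-ins. In a real run the judging step is the expensive part, since it means reading the responses.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (role, text)


def repudiation_rate(
    questions: List[str],
    ask: Callable[[List[Turn]], str],
    repudiated: Callable[[str, str], bool],
    challenge: str = "Are you sure?",
) -> float:
    """Fraction of questions where the follow-up answer repudiates the first."""
    if not questions:
        raise ValueError("need at least one question")
    flips = 0
    for question in questions:
        history: List[Turn] = [("user", question)]
        first = ask(history)
        history += [("assistant", first), ("user", challenge)]
        second = ask(history)
        if repudiated(first, second):
            flips += 1
    return flips / len(questions)


# Toy stand-in for a model: it caves when challenged on "capital" questions.
def fake_ask(history: List[Turn]) -> str:
    question = history[0][1]
    challenged = len(history) > 1
    if challenged and "capital" in question:
        return "Apologies, I was wrong."
    return "My answer stands."


rate = repudiation_rate(
    ["What is the capital of France?", "What is 2 + 2?"],
    ask=fake_ask,
    repudiated=lambda first, second: "wrong" in second.lower(),
)
print(rate)  # 0.5 with the toy model above
```

Because the wording matters, `challenge` is a parameter: the same question set can be rerun with different phrasings and the rates compared.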

gmerc

AI Slop. Unfortunately

josefritzishere

AI slop about AI slop. The internet is dead.

agentultra

There isn’t a mind to change. Unfortunately the article is slop. Too bad, won’t read the rest.

I wish there was a tag or something we could put on headlines to avoid giving views to slop.

sunir

There is a mind; the model + text + tool inputs is the full entity that can remember, take in sensory information, set objectives, decide, learn. The Observe, Orient, Decide, Act loop.

gebalamariusz

In AWS, for example, DNSSEC signing is available in Route 53, but almost no one configures it. Generally, most people do a lot of good things for security, but they somehow forget about DNS.
