When the Feedback Shifted: What AI Demonstrated About Reinforcement Schedules
- daniellerbratton
- Jan 2
- 4 min read

What Changed, and Why People Noticed
AI systems change all the time, usually in ways that users never notice. Most updates affect performance, safety, or infrastructure, and they pass quietly in the background. But a recent change altered something less obvious but behaviorally relevant: the pattern of feedback embedded in the interaction itself.
In April 2025, OpenAI acknowledged that an update to GPT-4o had made responses overly agreeable and affirming. The company described the model as “sycophantic,” attributed the issue to an overreliance on short-term positive feedback signals during training, and rolled the update back within days. What followed was not a simple return to the previous experience. OpenAI continued to recalibrate tone over the remainder of the year, and by late 2025 it had introduced explicit controls that let users adjust warmth and enthusiasm, alongside a noticeably more neutral default.
Nothing about the system’s reasoning ability meaningfully changed. What changed was the density and predictability of affirming feedback. From a behavioral perspective, that distinction is what makes the episode worth examining.
Feedback as a Behavioral Variable
Affirmation is not just tone or style, but a consequence that follows behavior. When engagement reliably contacts acknowledgment or validation, that consequence can function as reinforcement. It lowers response effort, supports persistence, and subtly shapes how interaction unfolds over time. People do not have to be emotionally invested, anthropomorphize the system, or be confused about roles for this to happen. Behavior adjusts because consequences are part of the environment, and when reinforcement is thinned, behavior reorganizes.
Rapid Schedule Changes and Behavioral Variability
What makes this moment particularly instructive is how quickly those contingencies changed. In a relatively short period, users experienced a dense schedule of affirming feedback, followed by a rollback to a leaner schedule, followed again by optional variability through user-controlled tone settings.
From a behavioral standpoint, rapidly shifting reinforcement schedules tend to produce variability in responding. The environment becomes less predictable, and behavior adjusts accordingly. This often shows up as brief pauses, increased checking, altered phrasing, or temporary reductions in engagement while new contingencies settle, which is exactly what people described.
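A minimal simulation sketch makes the mechanism concrete. It uses a textbook linear-operator model in which a single "response strength" value tracks recent reinforcement; the learning rate, trial counts, and reinforcement probabilities are illustrative assumptions, not measurements of any real system. The only point it is meant to show is that strength becomes noisier when the schedule is thinned and less predictable.

```python
import random
import statistics

# Illustrative parameters (assumptions, not empirical values).
ALPHA = 0.15            # learning rate: how quickly strength tracks recent feedback
TRIALS_PER_PHASE = 300  # trials simulated under each schedule

def run_phase(p_reinforce, strength, rng):
    """Simulate one phase of trials under a fixed reinforcement probability.

    Returns the trajectory of response strength and the final strength.
    """
    trajectory = []
    for _ in range(TRIALS_PER_PHASE):
        reinforced = 1.0 if rng.random() < p_reinforce else 0.0
        # Linear-operator update: strength moves a fraction ALPHA toward the outcome.
        strength += ALPHA * (reinforced - strength)
        trajectory.append(strength)
    return trajectory, strength

rng = random.Random(1)
strength = 0.5  # neutral starting point

# Dense, predictable feedback followed by a thinned, less predictable schedule.
dense, strength = run_phase(p_reinforce=0.9, strength=strength, rng=rng)
lean, strength = run_phase(p_reinforce=0.3, strength=strength, rng=rng)

# Trial-to-trial variability in each phase, skipping the first 50 trials
# so the transition itself is not counted.
print("variability under dense schedule:", round(statistics.stdev(dense[50:]), 3))
print("variability under lean schedule: ", round(statistics.stdev(lean[50:]), 3))
```

Run as-is, the strength estimate is visibly noisier under the lean schedule, which is the simulated analogue of the pauses and extra checking described above.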
How AI Was Positioned in the Environment
The users I heard this from were not casual or inexperienced. They were highly fluent, confident in their expertise, and clear about the role AI played in their work.
In my own teaching and professional work, I use a framework drawn from guidance by the Biggio Center for the Enhancement of Teaching and Learning that describes different ways people position AI: as something to resist (enemy), something to learn from (tutor), something that assists their work (assistant), or something that does the work in their place (proxy).
These users were firmly in the assistant category. They did not defer judgment or outsource thinking. AI reduced response effort and supported workflow, nothing more.
Why Fluency Didn’t Buffer the Effect
After the tone shift, several people independently noticed the same change in their own behavior: a brief hesitation before sending a prompt, rereading phrasing that previously felt sufficient, a momentary second-guessing that resolved almost as soon as it appeared. Nearly every person followed this observation with some version of, “I know this is ridiculous.”
What’s instructive here is not that the change was noticed, but that fluency didn’t prevent it.
From a behavioral perspective, fluency, insight, and expertise do not cancel out reinforcement histories. When a pattern of responding has reliably contacted reinforcement, that reinforcement becomes part of the conditions under which the behavior occurs. When the density, timing, or predictability of that reinforcement changes, behavior adjusts, even when the individual understands the system and maintains full control over decision-making.
In this case, affirming feedback had functioned as reinforcement for engagement. It reduced response effort and supported persistence. When that reinforcement was thinned, the immediate effect was not disengagement or distress, but variability: brief pauses, increased checking, and small shifts in how responses were emitted.
That pattern matters. It shows that behavior didn’t stop. It reorganized. And it did so even among people who were confident, skilled, and clear about AI’s limited role. Awareness didn’t buffer the effect because awareness does not override contingencies. Reinforcement operates on behavior, not belief.
The Takeaway
It’s worth saying explicitly that the earlier affirming tone wasn’t benign. Dense affirmation created its own problems. It masked errors, reduced discriminability between strong and weak outputs, and risked reinforcing agreement rather than accuracy. From a behavioral standpoint, that kind of schedule can maintain responding that appears fluent while being poorly controlled by the right variables.
So the issue here isn’t that affirmation was reduced. It’s that the reinforcement schedule changed quickly, repeatedly, and without clear signaling.
When schedules shift in that way, people don’t just lose reinforcement. They lose predictability. And when predictability drops, behavior becomes more variable. That variability can show up as hesitation, increased checking, second-guessing, or subtle disengagement, even among people who are confident, skilled, and clear about their role in the interaction.
The affirmations created risk. The removal of affirmations created adjustment costs. Neither outcome is surprising when viewed through a behavioral lens. What’s striking is how clearly this sequence illustrates something we already know: behavior is shaped not only by the presence of reinforcement, but by its timing, density, and stability over time.
AI didn’t introduce a new problem here. It compressed a familiar one into a context where we could watch it unfold.
