Killing 360 Reviews: How We Stopped Rating People and Started Managing Work

Hello! My name is Vitalina, and I am an account manager at SToFU Systems.

We are the kind of company where processes are born in motion and only later get names, rules, and adult supervision. At the start, we did not have our own management school, so we copied what "everyone copies". One of those borrowed management habits was 360 feedback.

On paper, 360 looks like something fair and mature. Many sources. Less bias. "Objectivity". Mmmm!

In practice, it turned into something else. For us, 360 was not a tool that strengthened the team. It was a tool that quietly pulled the team apart. Formally it looked correct, but in real work it pushed in the wrong direction. So we cut it out. Let's go step by step.

What is 360?

360 is when you collect feedback about a person "from all sides": from the manager, colleagues, adjacent teammates, and sometimes clients. Usually it's a questionnaire with ratings and questions like "what goes well", "what gets in the way", "what should be changed".

We did that too. We sent out a questionnaire, collected answers, aggregated them, and wrote recommendations. Formally, everything looked tidy. There were data points. There were conclusions. There was a "development plan".

To make it clear what kind of "questionnaire" we are talking about, I will give a very simplified example of what is usually asked in a 360. This is not a word-for-word quote from our form, but it captures the typical logic.

For example: "What should a person continue to do?". Then: "What should a person start doing?". Then: "What should a person stop doing?". And another question that always seems innocent: "Describe a situation when the interaction was difficult and why."

And the summary often looks like a small report from a collective all-seeing eye. Roughly like this:

Strengths: picks up tasks quickly, helps others, does not disappear from the radar. Risks: "ready" sometimes means "almost ready", discussion at meetings can get sharp, chat replies come in fragments. What to do next: agree on a Definition of Done, synchronize expectations, agree on an escalation format.

On paper everything is fine. Beautiful, even. But there is one detail. It is written by people who work together every day. And once you add anonymity, or delayed feedback "for later", it stops being a development tool and starts living a life of its own. Imagine a project team of five people. Then someone says: "Here is anonymous feedback." Anonymous. In a team of five. And in this reality, 360 begins to show its true face.

Why 360 Became a Tool for Pulling the Team Apart

1. Praise Is One Line, Criticism Is a Three-Volume Novel

When things are good, people write briefly. "Everything is ok." "Comfortable work." "Well done." This is feedback at the level of a sticker on a school notebook. When something is wrong, the literature begins. With details, with examples, with emotions. Technically this is normal: negatives are easier to specify than positives. But in 360 there is a problem: this whole "novel" then comes back to one person as a generalized verdict from the team.

Even if you phrase it as gently as possible, the human brain reads it like this: "We all got together and wrote down what is wrong with you." And that is it. After that, any attempt at "constructive feedback" feels like a court hearing. You wanted a development tool and got a collective tribunal.

Public accusation is not performance management

We honestly asked ourselves: what are we really trying to achieve with this practice? The format enforces discipline, yes, but it destroys trust.

In real life it looks like this. A person receives a report with three short "everything is ok" phrases and two long essays on "here is why it is not ok". And the brain, of course, remembers not the three "oks" but the two "not oks". That is how attention works: it hurts, so it must be important! And from that point on, the person moves not toward development, but toward self-defense. They start proving that "it is not true", or that "it was the context", or that "you are not saints either". At that moment 360 turns into a ritual where everyone supposedly wanted the best, but it turned out as it usually does.

2. Anonymity in a Small Team Is Self-Deception

There is a familiar management fairy tale: "Anonymous feedback removes fear." Anonymity? Really? In small project teams? In reality, it almost always turns into "I'm going to guess who wrote this!".

A person receives several unpleasant remarks, and then comes to the next project meeting where those same 3-5 people are sitting. They do not think about "organizational development". They think, "Which one of you was it?" And that starts a very toxic process: everyone stays outwardly polite, but underneath there is a hidden layer of mistrust.

It does not always explode into open conflict. It simply makes the team colder. Less cohesive. Less willing to help. And then we wonder why people do not share their problems "at an early stage". Because they already shared once, and it came back to them as an anonymous list of complaints.

3. As Soon as 360 Affects Promotion Decisions, the Games Begin

This was the most interesting part. And the saddest.

At one of our coordination discussions, we noticed something strange. A person can be polite, supportive, and easy to work with on calls. Inside the team they can look like a complete sweetheart. And then you hand them an "anonymous" questionnaire where they can "rate others", and suddenly the scores they give drop or the criticism becomes excessive. We saw that effect ourselves. It worked directly against integrity.

Imagine the situation. You tell the team, "360 ratings count toward promotion." And in the same second, you turn part of the team into players, not colleagues. Because if an "influence" button appears in the system, someone will press it. Not always out of malice. Sometimes out of a feeling of injustice ("and why is he being promoted, he is..."). Sometimes out of competition. Sometimes because "that's how the world works." And at that moment you get exactly what you wanted to avoid: politics, gaming, and scores lowered "just in case".

This, by the way, is one of the reasons why many researchers and practitioners advise separating feedback for development from administrative decisions (money, grades, promotions). The CIPD puts it quite bluntly: evaluative and administrative decisions are better kept apart from conversations where feedback is given for the purpose of development. Here is a PDF of their evidence review (practice summary): www.cipd.org/...e-review_tcm18-111378.pdf

4. Our Painful Insight: 360 Often Replaces the Work of the Manager

After several cycles, we ended up with a phrase that first sounded like an insult and then sounded like the truth.

If a manager needs a 360 to understand how people are working and where the problems are, that manager does not have a system of observable signals. There are no metrics. There are no regular conversations. There is no rhythm. Or all of that exists "somewhere", but is not actually being used.

And 360 becomes a crutch: "we will collect the questionnaire now and finally learn the truth." Except you do not get the "truth". You get an emotional picture, multiplied by anonymity, guesswork, and political games.

So that this does not sound like "it went badly for us, therefore 360 is evil", I will add a reference.

There is solid research on multisource feedback, and its conclusion is basically "do not expect magic". In a meta-analysis of 24 longitudinal studies, Smither, London, and Reilly write that improvement after 360 feedback is usually modest, and that you should not expect a large "mass upgrade" after one wave of feedback. The chance of improvement grows when a person sees a real need to change, accepts the feedback, believes they can change, sets concrete goals, and actually does something, instead of just "reading a PDF and moving on". Here is the text itself (PDF): www.bauer.uh.edu/...ngs/SmitherLondon2005.pdf

Okay, We Removed 360. What Replaced It?

We Stopped Measuring People by Ticket Count

One ticket can hold a week of research, a difficult decision, ten calls, and only 20 lines of code. Another can hold 50 features that came out of a single 20-minute run of an AI agent.

If you only measure tickets, you are not actually measuring the work. You are measuring how your process cuts the work into pieces.

That is why we changed the question. Not "how many tasks did I close?" but: what was delivered, what value did it create, what was the quality, was it predictable, and were the statuses honest?

How We Defined "Result" Without Management Theater

We converged on a simple framework.

An outcome is when value is delivered, quality is acceptable, the team is predictable, and progress is transparent.

Value is not "the code is written". It is "the user or customer got a better outcome" or "it became easier for the team to support the system".

Quality is not "I like your code". It is consequences: bugs in the product, incidents, rework.

Predictability is not "we always make it." It is: "we honestly understand what we have time for and what we do not, and we do not lie to ourselves about it."

Transparency is when "almost done" doesn't turn into a life philosophy.

Metrics

When people hear the word "metrics", many immediately remember a slightly traumatic past experience: being beaten over the head with numbers.

Metrics should not be a whip. They should be glasses. Without them, a manager is often blind. And when the manager is blind, they start looking for "objectivity" in questionnaires. You have already seen where that leads.

We kept the metrics at the team level, because most problems are systemic! And if we want to fix the process instead of hunting for someone to blame, then we need to measure the process!

We honestly did not invent anything here. DORA (DevOps Research and Assessment) already promotes a very practical set of delivery metrics: how quickly you deliver changes and how painful failures are when changes go wrong. Here is their quick guide: dora.dev/guides/dora-metrics.
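
To make this less abstract, here is a minimal sketch of how the four classic DORA keys could be computed from a team's own delivery log. The `Deploy` and `Incident` record shapes are our illustration, not a DORA API:

```python
# A minimal sketch of the four DORA keys computed from a team's own
# delivery log. Assumes non-empty lists; the record shapes are hypothetical.
from dataclasses import dataclass
from datetime import datetime
from statistics import median

@dataclass
class Deploy:
    merged_at: datetime    # change ready (e.g. PR merged)
    deployed_at: datetime  # change live in production
    failed: bool           # did this deploy cause a failure?

@dataclass
class Incident:
    started_at: datetime
    resolved_at: datetime

def dora_keys(deploys: list[Deploy], incidents: list[Incident],
              period_days: int) -> dict:
    return {
        # 1. Deployment frequency: deploys per day over the period
        "deploys_per_day": len(deploys) / period_days,
        # 2. Lead time for changes: median merge -> production
        "lead_time": median(d.deployed_at - d.merged_at for d in deploys),
        # 3. Change failure rate: share of deploys causing a failure
        "change_failure_rate": sum(d.failed for d in deploys) / len(deploys),
        # 4. Time to restore: median failure -> recovery
        "time_to_restore": median(i.resolved_at - i.started_at for i in incidents),
    }
```

Notice that every number here describes the pipeline, not a person.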

There is also an important remark that I would like to nail to the wall for anyone who has ever wanted to turn metrics into KPIs: "don't make the metric the goal"! Because as soon as you say "we need to deploy N times per day", part of the team starts delivering not value, but a number. This is the old story: once a metric becomes the goal, it stops being an indicator. DORA explicitly refers to Goodhart's law and describes the typical pitfalls. Here is the page about the "four/five keys" in simple words: dora.dev/...s/dora-metrics-four-keys

And one more thing DORA rightly emphasizes: context matters. These metrics are best viewed within one team or service and over time, not by comparing a mobile application with a mainframe, or a small team with a 200-person platform group.

If you are not a technical team, the logic is the same. Instead of "deploy" you may have "client accepted the result"; instead of "incident" you may have "escalation"; instead of "rollback", "rework after alignment". The point is still the same: how fast we move value through the system and how much the error costs us.

And now the very important "don't do this". Do not convert these metrics into individual KPIs! Because then people start optimizing the number, not the product. A KPI on "deploys", and suddenly you have deployments for the sake of deployments. A KPI on "cycle time", and tasks get broken down to the point of absurdity just to "close them quickly", while the hard parts quietly disappear into the shadows. A KPI on "incidents", and incidents start getting renamed "peculiarities" or "unexpected user scenarios".

The first thing that helped us was predictability. A simple question: what did we promise in planning, and what was actually delivered. Not for shaming. For understanding. Because if you constantly promise 10 and deliver 6, the problem is not "lazy people". The problem is in estimation, priorities, WIP, blockers, context switching. In other words, in management.
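
The whole "metric" can be this small. A sketch with made-up sprint data:

```python
# A sketch of "promised vs delivered" per planning cycle (made-up data).
def predictability(committed: int, delivered: int) -> float:
    """Share of the planned work that was actually delivered."""
    return delivered / committed if committed else 0.0

sprints = [(10, 6), (9, 6), (11, 7)]  # (committed, delivered) per sprint
for n, (c, d) in enumerate(sprints, start=1):
    print(f"sprint {n}: {predictability(c, d):.0%}")
# A stable ~60% is not a verdict on people; it points at estimation,
# priorities, or WIP at the process level.
```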

The second is delivery time: how long a task takes on the way from "taken into work" to "actually delivered". It is very sobering. Because it often turns out that we "write code" quickly but "deliver" slowly. And then you can see where the bottleneck is: review, testing, release, approvals, dependencies.
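
A sketch of the same idea, assuming each task records when it entered work and when it actually shipped (the timestamps below are hypothetical):

```python
# A sketch of delivery time: "taken into work" -> "actually delivered".
from datetime import datetime

def cycle_time_days(started_at: datetime, delivered_at: datetime) -> float:
    return (delivered_at - started_at).total_seconds() / 86400

# Coding may have taken 2 of these 7 days; the rest is usually waiting
# in review, testing, or release queues.
print(cycle_time_days(datetime(2025, 3, 3), datetime(2025, 3, 10)))  # 7.0
```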

The third is WIP. If the team has ten tasks "in progress" at the same time, then nobody is really finishing anything. More precisely, everyone is doing everything at once. And then we wonder why progress feels slow and nervous. When the brain is spread in a thin layer across ten tasks, it does not become more productive. It just becomes more tired.
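
Even a dumb counter over the board makes this visible. A sketch with a hypothetical snapshot:

```python
# A sketch of a WIP check over a hypothetical board snapshot.
from collections import Counter

in_progress = [("anna", "T-12"), ("anna", "T-19"), ("anna", "T-23"),
               ("boris", "T-15")]  # (person, task) pairs marked "in progress"

WIP_LIMIT = 2
for person, count in Counter(p for p, _ in in_progress).items():
    if count > WIP_LIMIT:
        print(f"{person}: {count} tasks in flight (limit {WIP_LIMIT})")
# -> anna: 3 tasks in flight (limit 2)
```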

Another thing that helped us think more clearly was this: if you want work to move faster through the system, the answer is often not to push harder on people. It is to reduce batch size. Smaller changes are easier to review, test, release, and roll back. DORA makes the same point directly: smaller batches usually improve both speed and stability. It sounds obvious until a team actually tries to live by it for a month.
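
Batch size is also cheap to watch; the change sizes below are made up:

```python
# A sketch of batch-size tracking: median lines changed per merged change.
from statistics import median

changed_lines = [40, 120, 35, 800, 60]  # per merged change, made-up numbers
print(median(changed_lines))  # 60
# If the median creeps up week over week, releases are getting riskier
# long before any incident tells you so.
```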

Quality: We Stopped Arguing "It Seems to Me" and Started Looking at Consequences

Assessing "quality" with questionnaires is a bad idea. Because "quality" in the questionnaire, it is often about sympathy, style and taste.

We moved on to the consequences.

How many defects reached production? How serious were they? Which way is the trend moving?

How many incidents did we have? How long did it take us to recover? Do the same root causes repeat?

How much rework? How many tasks were "handed over" and then came back, because "ready" turned out not to be ready?

You are no longer arguing about "who is the best". You see what is really happening with the product.
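
In practice, the whole "quality report" can be a small trend table of consequences per period; the counts below are invented for illustration:

```python
# A sketch of a quality trend: consequences per quarter, not opinions.
# All counts are invented for illustration.
quarters = {
    "Q1": {"prod_defects": 14, "incidents": 3, "rework_tasks": 9},
    "Q2": {"prod_defects": 9, "incidents": 2, "rework_tasks": 6},
}
for q, stats in quarters.items():
    print(q, ", ".join(f"{k}={v}" for k, v in stats.items()))
# The question is never "who wrote the bug", only "is the trend improving".
```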

A Counterintuitive Conclusion: Why Evaluate a Person at All?

The longer we dug, the more often we stumbled upon a strange question.

Why do we evaluate a person? To refuse a promotion? To show who the "star" is? To compare people?

And when we talked about it honestly, it turned out that most of our real needs were not about ranking people. They were about team performance: are we doing what we promised, is the quality good, is the work predictable, are we burning out, and does the client get what they expect?

That is, about the system.

And if so, then the first object of evaluation is the team and the process, not a person as a target.

If the team is underperforming, we review the whole chain: hiring, onboarding, task definition, planning, review, testing, release, communication, dependencies. This is managerially harder, yes. But it is more honest.

Sometimes it looks very down to earth. For example, when "we didn't have time again", the first question is not "who is slowing us down" but "where did the time go". Because time often disappears not in "work" but in waiting: tasks sit in review, queue in testing, wait on approvals, or there are simply too many of them in parallel.

When Someone Is Struggling, What We Check First

We also sometimes see situations where someone is not keeping up. But instead of a questionnaire, we run a simple diagnosis.

Has this problem always been there? If so, the issue is likely in hiring or onboarding. In that case, you fix the hiring process; you do not fix the person with questionnaires.

Was everything normal before, and then it got worse? Then it is likely context: burnout, a change of tasks, role conflict, personal circumstances. This is the zone of normal managerial work: find out what happened, reduce the load, help the person regain control, and build a short 2-4 week recovery plan.

Does the person have a clear understanding of what success looks like in the near future? Because often the problem is not that the person is "weak". It is that we placed them in fog and called it expectations.

And one more nuance we ran into. If a person has failed several times, the manager can sometimes subjectively raise the bar: "now prove that you are not a failure." That often happens on a subconscious level, and it works badly for the team. It is usually better to do the opposite: reduce the scope, make the tasks smaller, remove the noise, and give the person a chance to consistently do a few things well and regain self-confidence. This is a difficult task for managers. And that is exactly why we look very closely at the managers we hire.

Conclusions

Of course 360 can work: in large companies, where anonymity is actually possible, and where the team already has a culture of direct feedback, so 360 is an additional signal, not the main "source of truth".

We did not have that. We had small teams, a fast pace, and a high cost of distrust.

On paper, 360 looked like a grown-up tool. In our reality, it became a mechanism that collected fear, guesswork, competition, and "collective judgment" in one place.

We cut out 360 and went back to the basics: simple team metrics, open conversations and transparent statuses.

Materials and links (if you want to dig deeper)

DORA — software delivery performance metrics. A grounded explanation of the five metrics and the traps teams fall into when they misuse them.

DORA — DORA's software delivery performance metrics. The short version, with the history of "why four/five keys" and a warning about gaming the metrics.

DORA — 2024 DORA Report. The full research report (PDF downloads in several languages).

Smither, London, and Reilly (2005) — meta-analysis on multisource feedback. Why you should not expect a magical effect from one wave of 360.

CIPD — Performance feedback: an evidence review (PDF). About how feedback works/doesn't work and why confusing "assessment" with "development" is often harmful.

CCL — 360 Degree Assessment & Feedback: Best Practices Guidelines. If you still do a 360, there is something to think about before the start, not after the fire.

CCL — SBI feedback model. A very simple framework for "no shortcuts" feedback.

Google SRE — Postmortem Culture: Learning from Failure. On blameless postmortems and why a culture of blame kills learning.

Amy Edmondson (1999) — Psychological Safety (PDF). Primary source on psychological safety in teams.

WEF — Feedforward technique. About shifting feedback toward "what we do next" instead of "what happened yesterday".

Goodhart's law. One of the most useful "anti-theories" for anyone who wants to turn metrics into KPIs.

Questions for the Community

What does the community think about 360? Do you use it? What has your experience been like?

How do you solve the problem of "anonymity" when the team is small and everyone understands everything?

And what delivery/quality metrics actually help you manage the process, rather than just produce pretty reports?

Thank you for your time and attention.

Vitalina Kovalchuk – Account Manager
