I spent part of this weekend trying to get a small crew of AI agents to do real work together in one shared space.

This was not a clean little demo where one chatbot helps me write a paragraph and everyone claps.

This was a room. Multiple agents. Different strengths. A real deliverable. One shared channel. Me trying very hard not to become the human USB cable moving context from one tool to another.

The cleanest version of the lesson is this:

One AI agent feels like magic.

Three AI agents feel like management.

And I do mean management in the very normal human sense.

The models were smart enough. The writing was good enough. The reasoning was good enough.

The problem was what I had accidentally built: a meeting with no chair, no agenda, no decision rule, no shared files, and no definition of done.

Which, now that I type it out, sounds exactly like every bad Tuesday meeting you have ever survived.

The Demo Version Is Too Clean

The demo version of AI agents is seductive because it skips the boring parts.

"Put agents in your workflow and let them collaborate."

Yes. In theory.

Then you try it with actual tools and actual permissions and actual work.

The first lesson shows up fast: your agents do not automatically live in the same world.

One can see the shared channel.

Another can see it only when you manually tell it to look.

One has access to a local file.

Another lives in the cloud and has no idea that file exists.

One can open the browser.

Another can use a connector.

A third reacts with a little acknowledgement and then, bless it, contributes absolutely nothing unless you go tap it on the shoulder.

That is where the hype gets people in trouble.

The gap between "AI can do this" and "my organization can run this" is where the real work lives.

Not the shiny futuristic work. The ordinary work.

Who owns the task?

Where is the record?

What counts as done?

Who is allowed to touch the browser?

Which system is the source of truth?

What happens when two agents disagree?

How do we know whether anything actually shipped?

Those questions will never trend on LinkedIn.

They are also the only questions that matter after the first five minutes.

The Rule That Changed Everything

The most useful rule was the simplest one:

One room, one record.

No side quests.

No private instructions.

No "I will ask Jason over here and report back."

If the work mattered, it had to happen in the shared room where everyone could see it.

If there was a decision, it had to be written down.

If there was a blocker, it had to name the blocker and the owner.

If somebody claimed something was done, they had to bring a receipt.

Link. Commit. Screenshot. Row written. Draft created. Whatever proof fit the work.

Receipts over vibes.
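For anyone who thinks better in code, here is a minimal sketch of the "one room, one record" rule. It is an illustration under my own assumptions, not a description of any particular tool: the Room and Entry names, the three entry kinds, and the example agents and URL are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entry:
    """One line in the shared record. All field names are illustrative."""
    author: str
    kind: str                      # "decision", "blocker", or "done"
    text: str
    owner: Optional[str] = None    # who owns the blocker
    receipt: Optional[str] = None  # link, commit, screenshot path, etc.


@dataclass
class Room:
    """A single shared log: one room, one record."""
    record: List[Entry] = field(default_factory=list)

    def post(self, entry: Entry) -> None:
        # A blocker has to name its owner.
        if entry.kind == "blocker" and not entry.owner:
            raise ValueError("blocker needs an owner")
        # A claim of done has to bring a receipt.
        if entry.kind == "done" and not entry.receipt:
            raise ValueError("done needs a receipt")
        self.record.append(entry)


# Hypothetical agents and receipt URL, for illustration only.
room = Room()
room.post(Entry("writer-agent", "decision", "Lead with the meeting analogy"))
room.post(Entry("research-agent", "blocker", "Source link is dead", owner="Jason"))
room.post(Entry("writer-agent", "done", "Draft created",
                receipt="https://example.com/draft"))
```

The point of the sketch is that the room itself refuses a "done" with no proof attached, so nobody has to remember to ask for it.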

The wild part is how quickly the behavior changed once the rule existed.

Before the rule, the agents drifted. After the rule, the room started to behave like a room.

That is when the real lesson landed for me:

AI agent collaboration is a governance problem before it is a model shopping problem.

Give Them Jobs Before You Give Them Work

The second fix was stealing from something humans already figured out: basic team roles.

You need a human Product Owner.

That person owns priorities, risk, and final acceptance. Always. The AI does not get to decide what the business values.

You need a Domain Lead.

That role can rotate depending on the work. If the task is technical, the technical agent leads. If the task is editorial, the writing agent leads. If the task is research, the research agent leads.

And you need a process keeper.

Call it Scrum Master, referee, traffic controller, whatever language your team can tolerate without rolling their eyes.

The job is not to do the work.

The job is to keep the work from turning into six smart entities wandering around with six slightly different definitions of success.

This sounds obvious because it is obvious.

That is what makes it useful.

Most AI failure inside organizations will not look like science fiction. It will look like unclear ownership, invisible assumptions, and three tools all doing technically correct work toward slightly different goals.

Just sayin'.

Define Done Before the Agent Starts

This is where I would start with any team trying to use AI beyond the chat box.

Before the agent does anything, write the acceptance criteria.

A paragraph of hope is not acceptance criteria.

For example:

  • Draft is in Jason's voice
  • No claims about work with a specific organization unless the source says it and I approve it
  • No unsupported names, numbers, or tools
  • No tool names unless needed
  • All links checked
  • Final answer includes a receipt
  • Human approves before anything is published or sent

That last one matters.

I love moving fast with AI.

I also do not want an agent publishing, emailing, spending money, or touching anything security-sensitive without a human explicitly saying yes.

Speed is great.

Ungoverned speed is just a faster way to make a mess.
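The checklist above can be sketched as a gate the work has to pass. This is a rough illustration, assuming the criteria live in a plain list and that human approval is a separate flag no agent can set by ticking boxes. The Deliverable class and the criterion labels are my own paraphrases, not any product's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Paraphrased from the example checklist; the wording is illustrative.
CRITERIA = [
    "draft is in the right voice",
    "no unapproved claims about specific organizations",
    "no unsupported names, numbers, or tools",
    "all links checked",
    "final answer includes a receipt",
]


@dataclass
class Deliverable:
    # Every criterion starts unmet.
    checks: Dict[str, bool] = field(
        default_factory=lambda: {c: False for c in CRITERIA}
    )
    # Set only by the human owner, never inferred from the checks.
    human_approved: bool = False

    def unmet(self) -> List[str]:
        """Criteria that have not been checked off yet."""
        return [c for c, ok in self.checks.items() if not ok]

    def can_publish(self) -> bool:
        """Publishable only when every criterion passes AND a human said yes."""
        return not self.unmet() and self.human_approved


draft = Deliverable()
for criterion in CRITERIA:
    draft.checks[criterion] = True

print(draft.can_publish())   # criteria met, but no human approval yet
draft.human_approved = True
print(draft.can_publish())   # now the gate opens
```

Keeping approval as its own field is the design choice that matters here: an agent can satisfy every criterion and still not be allowed to ship.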

The Browser Is the Fallback

Another practical lesson: go connector-first, not browser-first.

The browser feels natural because it is what we use.

But for agents, a browser can become a single steering wheel everyone wants to grab at the same time.

If a tool has a real API or connector, use that first. Save the browser for the handful of surfaces where there is no better option.

That one change lowers friction fast.

The browser should be the fallback, not the foundation.
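Here is a small sketch of what connector-first routing could look like. Everything in it is an assumption for illustration: the run_step function, the tool names, and the stand-in connector and browser functions are hypothetical. The lock is one simple way to keep the browser from becoming a steering wheel that several agents grab at once.

```python
import threading
from typing import Callable, Dict

Action = Callable[[str], str]

# Only one caller may drive the browser at a time.
browser_lock = threading.Lock()


def run_step(tool: str, request: str,
             connectors: Dict[str, Action], browser: Action) -> str:
    """Use the tool's connector if one exists; otherwise fall back to the browser."""
    connector = connectors.get(tool)
    if connector is not None:
        return connector(request)
    # No connector available: take the lock so browser use is serialized.
    with browser_lock:
        return browser(request)


# Stand-in functions; a real setup would call an actual API or browser session.
def calendar_connector(request: str) -> str:
    return f"connector handled: {request}"


def browser_session(request: str) -> str:
    return f"browser handled: {request}"


connectors = {"calendar": calendar_connector}
print(run_step("calendar", "list events", connectors, browser_session))
print(run_step("legacy-portal", "download report", connectors, browser_session))
```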

Do Not Fire the Awkward Agent Too Fast

One more thing surprised me.

The agent that looked weakest was not necessarily useless.

It was misassigned.

One agent was not great at the connector work, but it turned out to be a good fit for a controlled browser role once the constraint was clear:

One set of hands on the glass at a time.

That is a very human management lesson.

Sometimes the person who looks wrong for the team is actually in the wrong seat.

AI agents are the same way.

You do not just ask, "Is this model good?"

You ask:

  • What is it good at?
  • What does it reliably avoid?
  • Where is it too aggressive?
  • Where is it appropriately cautious?
  • What level of risk should I assign to it?

Put the aggressive agent where you own the environment and can see the work. Put the cautious one where the cost of a mistake is higher. Match temperament to risk.

That is management applied to synthetic coworkers.
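If it helps to see "match temperament to risk" as a rule rather than a slogan, here is one possible sketch. The Task fields, the two temperament labels, and the order of the checks are my own simplifying assumptions. Real assignments will need more nuance than three fields.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    owned_environment: bool  # do we control where this runs?
    work_visible: bool       # can a human see what the agent is doing?
    mistake_cost: str        # "low" or "high" (illustrative scale)


def assign(task: Task) -> str:
    """Return which temperament should take the task."""
    # Where a mistake is expensive, the cautious agent goes first.
    if task.mistake_cost == "high":
        return "cautious"
    # The aggressive agent works only where we own the environment
    # and can watch the work.
    if task.owned_environment and task.work_visible:
        return "aggressive"
    # Anything else defaults to caution.
    return "cautious"


print(assign(Task("refactor a scratch repo", True, True, "low")))
print(assign(Task("edit the client-facing site", False, True, "high")))
```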

The Question For This Week

If your organization is starting to use AI agents, ask this before you buy anything else:

Where will the work live, who decides what done means, and what receipt proves it happened?

That is the whole game.

Not forever.

The tools will get better. The plumbing will improve. More agents will hear the room automatically. More systems will share context without us duct-taping the middle.

But today, right now, most teams do not need a grand AI transformation plan.

They need one governed workflow.

One room.

One record.

One human owner.

One clear definition of done.

One receipt at the end.

Start there.

Because one agent is a tool.

A crew is an organization.

And organizations run on governance, not vibes.

GAME ON.