We’ve all seen the explosion of AI tools over the past year. Tools that write, tools that plan, tools that analyze, generate, summarize. It feels like we’ve entered an age where we can spin up an AI to do almost any single task in minutes. And yet, when we try to make these tools work together, we fail miserably.
That shiny new AI assistant has a bad name now. It retrieves the wrong data, repeats earlier outputs, or fails to understand what you are asking for. We end up managing the AI instead of it supporting us. Why? We think it’s because most AI today is built like a tool, not like a team player.
And that’s the shift we need to make.
This realization, brought up by our CEO, Wiley Chin, during our weekly development update, changed everything for us. We are not just building AI applications. If we are planning something as large as a multi-agent system, we are essentially building a virtual organization.
At the beginning of our development work, we put all our effort into sophisticated (but individual) systems where each agent had its own expertise. As the scale of development grew, we needed all the agents to work in sync, seamlessly, to get genuinely complex tasks done. Once we started treating our agents like a team, the design questions got deeper.
We shifted from, “Can this model do the task?” to “How does it coordinate? Who does what? What happens when something goes wrong?”
That’s when it became clear: the real work ahead isn’t just about better algorithms. It’s about better organization and culture, even at the code level.
We’ve come a long way from the early days of AI. At first, AI was largely statistical, spotting patterns in data and predicting outcomes. Then came the era of large language models (LLMs) in the 2020s, which gave us AI that could think, reason and write.
With these impressive advancements, we’re entering a new phase: agentic AI. Instead of relying on one giant model to do everything, we’re starting to connect smaller, specialized agents into ecosystems. Each agent has a role. For example, one might fetch data, another summarize it, another draft a response, and a final one evaluate tone or factual accuracy.
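To make that concrete, here’s a minimal sketch of such a pipeline in Python. The agent functions below (fetch_data, summarize, draft_response, evaluate) are hypothetical stand-ins for real model or tool calls, not any particular framework:

```python
# A minimal agent pipeline: each function stands in for one specialized agent.
# In a real system, each step would wrap an LLM or tool call.

def fetch_data(query: str) -> str:
    # Data-retrieval agent: would query a database or search API.
    return f"raw documents matching '{query}'"

def summarize(raw: str) -> str:
    # Summarizer agent: would condense the retrieved material.
    return f"summary of ({raw})"

def draft_response(summary: str) -> str:
    # Drafting agent: would turn the summary into a user-facing reply.
    return f"draft based on: {summary}"

def evaluate(draft: str) -> str:
    # Reviewer agent: would check tone and factual accuracy before release.
    return f"approved: {draft}"

# The "ecosystem" is just the composition of roles, in order.
result = evaluate(draft_response(summarize(fetch_data("Q3 sales"))))
print(result)
```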
This approach mirrors how we work in organizations. No one person does everything. We rely on others with different skills and perspectives to get something done. When it works well, the outcome is far more than the sum of its parts. Multi-agent systems reflect this mindset. But they don’t work well by default. Just like any growing team, they need structure, communication, accountability, and a shared sense of purpose. Otherwise, even the smartest agents get in each other’s way.
When we began building our own multi-agent systems, it didn’t take long for familiar challenges to show up, challenges that mirrored what many of us have experienced when managing real teams.
Communication
One of the first things we noticed was how quickly communication between agents could break down. Just like in a team where people work in silos, agents that don’t share context or updates often repeat work, miss dependencies, or make inconsistent decisions. We had agents summarizing documents without knowing someone else had already processed them. On a bad run, agents could even generate responses based on outdated inputs. It was chaos, not because they lacked intelligence, but because they lacked coordination.
We also realized that task ownership was a real issue. If all agents are allowed to pick up any task, how do we prevent five agents from trying to solve the same problem at the same time? And if one fails, who’s responsible? In our early prototypes, we saw multiple agents simultaneously replying to the same user query, each unaware that someone else had already done the job. It reminded us of teams with no project manager, where everyone is enthusiastic but nothing really moves forward efficiently.
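One pattern that helped us here is a shared task board where an agent must claim a task before working on it, so no two agents ever own the same job. The sketch below is a simplified illustration, not our production code:

```python
import threading

class TaskBoard:
    """A shared board: an agent must claim a task before working on it,
    so two agents can never pick up the same job at once."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owners: dict[str, str] = {}  # task_id -> agent_id

    def claim(self, task_id: str, agent_id: str) -> bool:
        with self._lock:
            if task_id in self._owners:
                return False  # someone already owns it
            self._owners[task_id] = agent_id
            return True

    def release(self, task_id: str, agent_id: str) -> None:
        with self._lock:
            if self._owners.get(task_id) == agent_id:
                del self._owners[task_id]  # owner hands the task back

board = TaskBoard()
print(board.claim("reply-to-user-42", "agent_a"))  # True: agent_a owns it
print(board.claim("reply-to-user-42", "agent_b"))  # False: already claimed
```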
Structure
Structure was another missing piece. Without clear lines of escalation or fallback options, our agents would loop endlessly on tasks they couldn’t resolve, driven by their natural eagerness to be helpful and to act. There was no way for one agent to hand a case over to another, better-suited agent. In real life, it is like having a junior team with no senior support: always well-intentioned, but often stuck.
Performance Assessment
Then there was the issue of motivation and feedback. Unlike people, agents do not respond to praise or promotions. But if we don’t build some form of internal evaluation or scoring system, something that tracks success and adjusts opportunities accordingly, agents will either repeat poor performance or never improve at all.
We had one agent that continuously generated vague summaries because it never received any signal about whether those summaries were useful. There was no learning loop.
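A learning loop doesn’t have to be elaborate. Here’s a hedged sketch of the kind of scoring we mean: each agent’s track record is updated from downstream feedback, so the system can route work away from consistently poor performers. All names are illustrative:

```python
class Scoreboard:
    """Tracks a rolling success rate per agent, so the system can
    route future work away from consistently poor performers."""

    def __init__(self):
        self._wins: dict[str, int] = {}
        self._tries: dict[str, int] = {}

    def record(self, agent_id: str, useful: bool) -> None:
        # Feedback signal: was the agent's output actually used downstream?
        self._tries[agent_id] = self._tries.get(agent_id, 0) + 1
        self._wins[agent_id] = self._wins.get(agent_id, 0) + int(useful)

    def success_rate(self, agent_id: str) -> float:
        tries = self._tries.get(agent_id, 0)
        # Unknown agents start from a neutral prior rather than zero.
        return self._wins.get(agent_id, 0) / tries if tries else 0.5

scores = Scoreboard()
scores.record("summarizer", useful=False)  # vague summary, went unused
scores.record("summarizer", useful=True)   # revised summary, accepted
print(round(scores.success_rate("summarizer"), 2))  # 0.5
```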
Memory
Perhaps the most frustrating challenge of all was memory, or rather, the lack of it (which we will discuss in a separate entry). Having no memory means treating every assigned task as brand new: no context, no record of past mistakes. That means we lose the opportunity for improvement, permanently stuck in a loop where every cycle starts from zero, no matter how many times the task has been done in the past. Sounds frustrating, don’t you agree?
It became obvious to us: if we wanted to build a multi-agent system that truly worked, we had to stop thinking like engineers and start thinking like team builders. We turned to what we already knew about building strong organizations and started translating those principles into system design.
One of the first models we evaluated was holacracy. In this model, agents are free to communicate and pass tasks between one another without central control. This worked well in creative flows, like content ideation or campaign planning, where flexibility was key. Our summarizer agent could pass drafts to a rephraser, which then checked tone with a sentiment analyzer without any central coordinator.
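In code, holacracy boils down to agents handing work directly to whichever peer should act next, with no coordinator in the middle. A simplified sketch of that content flow, with hypothetical agent functions:

```python
# Holacracy-style flow: each agent decides for itself who acts next.
# There is no central coordinator; control lives entirely in the handoffs.

def summarizer(text: str) -> str:
    draft = f"summary: {text}"
    return rephraser(draft)          # passes work straight to a peer

def rephraser(draft: str) -> str:
    polished = f"rephrased({draft})"
    return sentiment_checker(polished)

def sentiment_checker(polished: str) -> str:
    # Last agent in the chain; nothing above it to report to.
    return f"tone OK: {polished}"

print(summarizer("Q3 campaign notes"))
```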
Great teamwork, right? But over time, we saw the cracks. When tasks failed or became unclear, the system had no “manager” to step in.
That’s when we brought in flatarchy, a structure where agents still act independently but have the option to escalate tasks, or pass them back, to a more experienced agent. This gave us both speed and safety.
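The difference from pure holacracy is one extra path: when an agent gives up, the task goes up instead of around in circles. A minimal sketch, with a hypothetical retry cap and “senior” fallback agent:

```python
MAX_ATTEMPTS = 3  # illustrative retry cap before escalation

def junior_agent(task: str) -> str | None:
    # Stand-in for an agent that sometimes cannot resolve a task.
    return None  # simulate failure

def senior_agent(task: str) -> str:
    # Stand-in for a more capable (and more expensive) fallback agent.
    return f"resolved by senior: {task}"

def run(task: str) -> str:
    for attempt in range(MAX_ATTEMPTS):
        result = junior_agent(task)
        if result is not None:
            return result
    # Flatarchy's safety valve: escalate instead of looping forever.
    return senior_agent(task)

print(run("ambiguous customer complaint"))
```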
While doing this, we also looked at corporate organizational strategy: the U-Form and the M-Form. The U-Form is centralized, which is ideal for fast decisions while the system is small. But as our agent networks grew, we moved to the M-Form, dividing agents into specialized clusters. Each cluster had its own rules, tools, and optimization paths, but all aligned to a shared outcome.
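Structurally, the M-Form is little more than grouping agents into clusters that carry their own rules and tools while reporting to the same outcome. A sketch of what such a configuration might look like (all cluster names and fields are hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class Cluster:
    """One M-Form division: its own agents, tools, and local rules,
    but evaluated against a shared, system-wide outcome."""
    name: str
    agents: list[str]
    tools: list[str]
    local_rules: dict[str, str] = field(default_factory=dict)

clusters = [
    Cluster("research", ["fetcher", "summarizer"], ["web_search"],
            {"max_sources": "5"}),
    Cluster("writing", ["drafter", "rephraser"], ["style_guide"],
            {"tone": "formal"}),
    Cluster("review", ["fact_checker", "sentiment_checker"], ["kb_lookup"]),
]

SHARED_OUTCOME = "customer query resolved accurately"
for c in clusters:
    print(f"{c.name}: {len(c.agents)} agents, aligned to '{SHARED_OUTCOME}'")
```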
To handle task distribution more intelligently, we developed a triage and cache model where agents would apply for a task, presenting their strengths. The system would then check their performance history and assign based on merit. This was especially useful in systems where quality varied.
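Here’s a hedged sketch of that triage idea: each candidate agent “applies” with a declared fit for the task, and the assignment weighs that against its cached performance history. The weighting and names are illustrative, not our exact production logic:

```python
# Merit-based triage: agents apply for a task, and history breaks ties.

# Hypothetical cached performance history (success rates from past work).
history = {"agent_a": 0.9, "agent_b": 0.6, "agent_c": 0.75}

# Each applicant declares how well the task matches its strengths (0..1).
applications = {"agent_a": 0.5, "agent_b": 0.8, "agent_c": 0.8}

def merit(agent: str) -> float:
    # Illustrative weighting: past performance counts as much as fit.
    return 0.5 * history.get(agent, 0.5) + 0.5 * applications[agent]

winner = max(applications, key=merit)
print(winner, round(merit(winner), 3))  # agent_c edges out agent_b on history
```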
The impact of these changes was immediate and tangible in our exploration phase. With the right communication protocols and memory structures in place, the enterprise-standard process flow began to run smoothly, like a real agency. When issues came up, tasks were escalated. When context was missing, agents retrieved it.
This discovery opened our minds that developing AI solutions isn’t just a technical challenge, it’s a design philosophy. To build AI systems that can scale, we need to design them like we design our best teams. We need clear roles. Defined handoffs. Performance tracking. Reliable communication. And most of all, we need systems that can grow smarter together.
This research and development has only just started, and we’re still actively learning and applying what we find in our pilot projects. Every system we build teaches us something new. But the direction is clear. The future of AI won’t be defined by individual tools. It will be defined by how well those tools collaborate, improve, and learn as a collective.
I believe this deeply: what we’re doing isn’t just technical. It’s culture-building. Women leaders, especially those of us who’ve led teams or held systems together in fast-moving environments, bring something powerful to this space.
Leaders understand how to build structure that supports. Good leaders value connection and communication. Good organizations recognize when people need clarity, encouragement, or escalation.
We are designing for people first. We are shaping the future of work. Virtual AI teams of tomorrow will reflect the values we embed into them today.
Let’s not stop at automating tasks; let’s elevate intelligence together.