1. Introduction: When Smart Agents Meet Sharp Constraints

I’ve spent over two decades building software systems that run on clarity, determinism, and architecture you can trace from first principle to final outcome. Most of my work has lived in the world of Domain-Driven Design (DDD), where behavior is not just modeled but owned by bounded contexts, explicit rules, and contracts that don’t negotiate.

Then along came Agentic AI.

A wave of enthusiasm surged through the AI community. Suddenly, we weren’t just building models anymore; we were orchestrating agents. Tools wrapped in language. Goals pursued with reasoning. Workflows narrated like thought processes.

It’s an intoxicating idea: systems that don’t just run code but explain themselves, adapt, decide. Systems where logic is written in English and delegation is a prompt away.

You might already hear the friction.

Because in many real-world systems, especially the kind I’ve helped design, there’s no room for improvisation. A warehouse doesn’t tolerate guesswork. A payment gateway doesn’t need storytelling. These are deterministic domains, where correctness isn’t a probability but a requirement.

And so I found myself asking: What happens when you introduce a probabilistic engine into a world that demands guarantees?

To explore this tension, I built a full multi-agent system for a fictional paper supply company covering quote requests, inventory checks, supplier restocking, order processing, and financial reporting. Each domain responsibility was assigned to a language-powered agent, coordinated through tool calls and LLM-driven reasoning.

What followed was an illuminating exercise in contrast.

In this post, I’ll walk you through what I learned: where Agentic AI architecture shines, where it collapses under its own abstraction, and why the tension between fuzzy intelligence and crisp constraints is not just technical but philosophical.

Let’s begin.

2. Background: Agentic AI in Theory vs. Practice

Let’s start with the theory.

Agentic AI promises a shift from task execution to goal-driven behavior. Instead of writing imperative code to process an order or update a stock level, you define what should happen and let agents decide how to get there. These agents are powered by large language models (LLMs), which interpret goals, decide which tools to call, and narrate their way through complex workflows.

At least, that’s the vision.

In principle, it’s elegant:

  • A natural-language interface to structured logic
  • Modular agents with clear responsibilities
  • Reasoning instead of routing
  • Explanation instead of execution

You might be thinking: That sounds like a clean abstraction layer.
And it is … until it isn’t.

Because when this idea touches the ground, especially in systems with rules, state transitions, and money on the line, things begin to wobble.

In practice, I gave each domain in the paper company system its own agent: quote parsing, inventory, ordering, quoting, reporting. These agents received goals, consulted their tools, reasoned about the situation, and replied in structured JSON. An orchestrator mediated their interaction.
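To make the shape of that setup concrete, here is a minimal sketch of the loop described above: deterministic tools, an agent replying in structured JSON, and an orchestrator that validates the reply before acting. All names are illustrative and the LLM call is stubbed; this is not code from the actual system.

```python
import json

# Stubbed tool registry: in the real system these are deterministic
# domain functions (inventory checks, order creation, etc.).
TOOLS = {
    "check_stock": lambda item: {"item": item, "available": 450},
}

def agent_reply(goal: str) -> str:
    # In the real system this string comes from an LLM;
    # here it is stubbed so the control flow is visible.
    return json.dumps({"tool": "check_stock", "args": {"item": "A4 paper"}})

def orchestrate(goal: str) -> dict:
    reply = json.loads(agent_reply(goal))      # raises on malformed JSON
    tool = TOOLS.get(reply.get("tool"))
    if tool is None:
        raise ValueError(f"unknown tool: {reply.get('tool')}")
    return tool(**reply.get("args", {}))

result = orchestrate("Do we have enough A4 paper?")
```

Even in this toy form, notice where the safety lives: in the `json.loads`, the registry lookup, and the tool itself, not in the agent.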

On paper, it looked like a cognitive workflow.
In execution, it looked more like babysitting. Yes, babysitting 😉

Working with LLM-based agents in a deterministic system feels like making a deal with the devil where you get exactly one wish and then spend eternity writing guardrails to keep him from misinterpreting it.

Why? Because language models are not state machines. They are not program counters. They don’t “know” that an item was just restocked unless you tell them. They don’t enforce constraints; they interpret them. And interpretation, in probabilistic systems, is a polite word for maybe.

So while the theory suggests a path to modular autonomy, the practice delivers something fuzzier: a system that looks intelligent, but needs constant supervision.

And that’s where the cracks begin to show.

3. The Reality Check: Five Core Problems

So what happens when the rubber meets the runway?

You get surprises. Not the good kind.

The idea of autonomous agents coordinating business logic sounds futuristic. But when you’re responsible for actual correctness, reliability, and traceability, the experience turns into something else entirely: a constant back-and-forth between hopeful delegation and nervous babysitting.

Let’s look at the fractures.

3.1 The Illusion of Delegation

Agentic systems suggest that you’re handing over control: “Let the agent handle it.”

But you don’t. Not really.

You still:

  • Precompute key values like inventory and cash
  • Define every tool and its constraints
  • Validate every output manually or through rigid parsing
  • Patch over hallucinated reasoning with additional checks

The agent isn’t making decisions. It’s narrating decisions you already embedded elsewhere.

At some point you realize: you’ve replaced a few if-statements with a thousand tokens of speculation.
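Most of that “babysitting” boils down to validating agent output against a schema before trusting it. A minimal sketch of such a guardrail, with hypothetical field names:

```python
def validate_order_reply(reply: dict) -> dict:
    # Required fields and their types; anything missing or malformed
    # fails loudly instead of flowing silently into the system.
    required = {"item": str, "quantity": int, "approved": bool}
    for field, typ in required.items():
        if field not in reply:
            raise ValueError(f"missing field: {field}")
        if not isinstance(reply[field], typ):
            raise TypeError(f"{field} must be {typ.__name__}")
    if reply["quantity"] <= 0:
        raise ValueError("quantity must be positive")
    return reply
```

Every one of these checks is an if-statement you thought you had delegated away.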

3.2 LLMs Are Not Deterministic Executors

You might think: But what if the agent gets smarter over time?

That’s missing the point.

In deterministic domains, getting smarter is irrelevant. The system has to be right, every time, under every condition.

But LLMs don’t execute. They improvise. I’ve had agents conclude that 659 units is enough to fulfill a 1000-unit order because somewhere, the language model decided that sounded plausible.

This breaks core software expectations:

  • Deterministic logic
  • Verifiable state transitions
  • Predictable control flow

In short: if correctness is non-negotiable, the LLM is the wrong actor for the job.
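The 659-versus-1000 failure above is exactly the kind of decision that belongs in code rather than in a reasoning chain. The deterministic version is a one-line comparison:

```python
def can_fulfill(available_units: int, requested_units: int) -> bool:
    # No plausibility, no interpretation: a comparison that is right
    # every time, under every condition.
    return available_units >= requested_units
```

Nothing about this function can “decide that it sounds plausible.” That is the whole point.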

3.3 Tools Are Still the Real Actors

Here’s the quiet truth most Agentic AI demos gloss over: the “agent” doesn’t do anything. It suggests, describes, or initiates, but it never performs.

Every meaningful action, whether it’s checking inventory, calculating cost, or creating a transaction, happens inside a tool written by you.

So what’s really happening?

You’re calling deterministic functions and wrapping the results in a verbose justification layer. The agent becomes a kind of narrating interface: half protocol, half roleplay.

Nice for demos. Problematic for systems where someone needs to own the outcome.

3.4 Misalignment with Proven Engineering Principles

As a software architect, this part grated the most.

Agentic AI breaks almost every principle we use to build reliable systems:

  • Control flow becomes entangled with data representation
  • There are no unit tests for agent behavior, only logs and hope
  • State tracking becomes guesswork unless explicitly injected
  • Each agent becomes a black box: opaque, brittle, and hard to reuse

This isn’t abstraction. It’s erosion.

Compare that to Domain-Driven Design, where logic lives in clear boundaries, state is explicit, and every behavior has a traceable intent. Agentic systems offer none of that structure out of the box.

3.5 False Innovation in Wrapping Logic

There’s a seductive argument floating around:

Just put all the real logic in tools, and let the agent coordinate them!

Sounds safe. And it is.
But once you do that, ask yourself: What’s left?

You’ve built a deterministic backend, wrapped in a probabilistic narrator. The agent isn’t making decisions; it’s describing them. You haven’t added intelligence. You’ve merely added indirection.

And if you’re reaching for language models just to call functions in order, you’re not innovating. You’re decorating.

4. Two Competing Architectures

As I built out the system, a dilemma kept resurfacing:
How much responsibility should the agents actually have?

Not in theory but in code.

Because every time I gave them more autonomy, I had to write more guards. And every time I moved logic out of the agents and into tools, I found myself asking: Why have the agents at all?

Eventually, two clear architectural strategies emerged.

🧠 4.1 Option 1: Full Agent-Based Architecture (Implemented)

In this approach, I followed the Agentic AI playbook:

  • Define distinct agents for each domain: Inventory, Quotes, Orders, Reporting
  • Create a central orchestrator to mediate their conversation (with a state machine)
  • Expose every deterministic operation (check stock, get delivery date, etc.) as a callable tool
  • Let the agents “decide” what to do based on LLM-driven reasoning

At first glance, this seemed like a clean separation of concerns. The agent thinks. The tool does. The orchestrator watches.

But here’s what actually happened:

  • The agents frequently misjudged constraints
  • Tool calls had to be carefully shaped and validated
  • Logic had to be duplicated: once for the agent’s explanation, once for the tool’s execution
  • Debugging became a forensics exercise

In short, I spent most of my time supervising the supervisors.

You might be tempted to think: With better models, this will improve.
Maybe. But the core issue isn’t accuracy. It’s accountability. Who owns the outcome? Who enforces the rule?

In this setup, the answer is simple: not the agent.

🧱 4.2 Option 2: Logic-in-Tools-Only (Not Implemented)

The alternative was stricter and, in many ways, saner:

Put all critical business logic in deterministic tools.
The LLMs become interface layers: they ask, explain, and route, but they don’t decide.

This resembles a traditional system architecture:

  • Functions encapsulate behavior
  • Tools enforce constraints
  • The language model adds a natural-language veneer
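A sketch of what that split looks like in practice: the tool owns the rule and the state change, and returns structured data that a language model would merely phrase for the user. Names and figures are illustrative, not from the actual system.

```python
def create_order(stock: dict, item: str, qty: int) -> dict:
    # The constraint is enforced here, in the tool, deterministically.
    available = stock.get(item, 0)
    if qty > available:
        return {"status": "rejected",
                "reason": f"only {available} of {qty} units in stock"}
    # The state change also lives in the tool, not in any prompt.
    stock[item] = available - qty
    return {"status": "created", "item": item, "quantity": qty}
```

At this point the LLM’s only remaining job is to turn `{"status": "rejected", ...}` into a polite sentence, which is precisely the “chatbot for your business logic” observation that follows.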

Cleaner? Absolutely.
More reliable? Without question.

But also: functionally indistinguishable from a traditional backend.

Once you’ve moved the logic out of the agents, you’re no longer building an agentic system. You’re building a chatbot for your business logic.

And that raises an uncomfortable truth:

If all the reasoning must be re-encoded in tools…
then what exactly is the agent reasoning about?

The answer is often: nothing that matters.

5. Where Agentic AI Does Add Value

Let’s be clear: language models are not useless.
They’re just often misplaced.

In deterministic domains, asking an LLM to decide what to do is risky.
But asking it to explain, extract, summarize, or translate intent?
That’s a different story.

In fact, some of the most satisfying moments in this project came from using LLMs not as orchestrators but as specialists.

Let me give you a concrete example.

🧾 Example: Parsing Natural Language Quote Requests

Imagine a customer sends this free-text request:

“Hi, I’m organizing a small seminar and need 2 packs of A4 paper, maybe some glossy sheets for certificates, and around 5 boxes of ballpoint pens. Delivery by next week, please.”

This is the kind of task that humans handle with ease but traditional systems struggle with. You need:

  • Named items mapped to inventory SKUs
  • Approximate quantities normalized
  • Delivery intent extracted and resolved to a date
  • Everything returned in structured format

An LLM, wrapped in a dedicated QuoteParserAgent, excels here.

Why? Because this is not a logic problem.
It’s a language problem.

There’s no state to track, no correctness to guarantee beyond format and plausibility. And when the LLM gets it wrong, you can easily flag, correct, or retry.

This is where the agent metaphor actually holds up: you give a messy input to a specialist, and get a clean, structured result. No orchestration, no hallucinated decision chains.
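To make this concrete, here is what a QuoteParserAgent might be asked to produce for the request above, together with the cheap plausibility check that makes “flag, correct, or retry” safe. The schema and SKU names are hypothetical.

```python
# Example of the structured output an LLM parser would return
# for the free-text request (illustrative schema and SKUs).
parsed = {
    "items": [
        {"sku": "PAPER-A4", "quantity": 2, "unit": "pack"},
        {"sku": "PAPER-GLOSSY", "quantity": 1, "unit": "pack"},
        {"sku": "PEN-BALLPOINT", "quantity": 5, "unit": "box"},
    ],
    "delivery_by": "next week",
}

def plausible(quote: dict, known_skus: set) -> bool:
    # Cheap deterministic check: every SKU must exist and every
    # quantity must be a positive integer; otherwise retry the parse.
    items = quote.get("items", [])
    return bool(items) and all(
        i.get("sku") in known_skus
        and isinstance(i.get("quantity"), int)
        and i["quantity"] > 0
        for i in items
    )
```

Because format and plausibility are the only guarantees needed, a failed parse costs one retry, not a corrupted order.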

🧠 The Contradiction

And here’s the rub:

Agentic AI performs best when the “agents” don’t interact.
When each one owns a single, narrow, human-adjacent task like parsing, summarizing, translating.

But the moment you chain them together, hoping for emergent intelligence or multi-step planning across a deterministic domain, the whole thing becomes fragile.

The irony is hard to ignore:

The closer an agent gets to actual autonomy,
the less trustworthy it becomes.

So while LLMs are brilliant at understanding language, they’re far less capable of owning responsibility.

✅ The Real Value Layer

Here’s where I’ve landed:

Language models (and agentic workflows more broadly) shine when they operate around the system:

  • Input translation: natural language → structured requests
  • Output interpretation: state → summary or report
  • Developer assistance: log interpretation, debugging hints
  • Interface augmentation: “why did this quote fail?” explained in plain English

None of these require the LLM to make decisions that affect business rules.
They simply ask it to help humans reason faster.

And in that capacity? LLMs are astonishing.

6. From Domain-Driven Design to Agent-Driven Disappointment

If you’ve worked with Domain-Driven Design (DDD), you’re used to systems that mean what they say.

A domain model in DDD is not a sketch or an approximation. It’s an executable expression of business logic: bounded, versionable, and testable. When a rule exists in a DDD system, it exists in code. It’s traceable. You can step through it, assert against it, and know when it breaks.

Agentic AI doesn’t share this philosophy.

It trades structure for flexibility, correctness for plausibility, and explicitness for language.

At first, this seems liberating.
Then, it starts to feel like erosion.

🧩 What DDD Gets Right

In a deterministic domain, DDD gives you:

  • Explicit state transitions: A purchase order moves from Created to Approved to Shipped with conditions attached to each move.
  • Ubiquitous language: Concepts like StockItem or QuoteRequest aren’t just labels. They’re central abstractions in both code and conversation.
  • Clear responsibility boundaries: Application services orchestrate. Aggregates enforce consistency. Repositories abstract persistence.

In short: every behavior lives somewhere. You can reason about it.
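The “explicit state transitions” bullet above can be made concrete in a few lines: a purchase order whose legal moves are enforced in code. This is a minimal sketch, not a full DDD aggregate.

```python
# Legal moves for a purchase order; anything else is rejected loudly.
TRANSITIONS = {
    "Created": {"Approved"},
    "Approved": {"Shipped"},
    "Shipped": set(),
}

class PurchaseOrder:
    def __init__(self):
        self.state = "Created"

    def move_to(self, new_state: str):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition: {self.state} -> {new_state}")
        self.state = new_state
```

You can step through this, assert against it, and know when it breaks — which is exactly the contrast with the prompt-diffused version that follows.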

🧠 What Agentic AI Undermines

Agentic systems, by contrast, diffuse responsibility across layers of natural language.

  • An agent might “decide” to approve an order, but what actually happened?
  • The decision logic lives partly in its prompt, partly in the LLM’s latent space, and partly in your post-processing code and tools.
  • When things go wrong, debugging becomes archaeology.

This isn’t abstraction. It’s obfuscation.

You might assume: Well, with the right tooling and tracing, we can fix that.

Maybe. But the further you go down that road, the more you start rebuilding… exactly what DDD already provides.

🧠 A Shift from Modeling to Prompting

The deeper philosophical shift is this:

DDD models a domain. Agentic AI prompts it.

And prompting, by nature, is suggestive, not declarative.

You’re not enforcing behavior. You’re proposing it and hoping the model agrees with you today the same way it did yesterday.

This softens the very thing DDD was built to sharpen: a shared, rigorous understanding of how the domain works.

🤹 The Fantasy of Emergence

There’s a subtle allure in believing that autonomy can emerge from well-prompted agents. That if we just phrase things right, the system will organize itself.

But domains don’t want to emerge. They want to be understood.

That’s why we model them. That’s why we version them. That’s why we test them, evolve them, and negotiate their meaning with stakeholders.

Agentic AI asks us to let go of that clarity at precisely the moment we need it most.

7. Guiding Principle: Code for What Must Be Correct

After two weeks of building, testing, backtracking, and babysitting hallucinations, I came away with a rule I now carry into every agentic project:

If something must be correct, it belongs in code.

Not a prompt. Not a reasoning chain. Not a suggestion from a large language model trying to “understand” your tool interface.

In code.

Because code has properties LLMs do not:

  • It’s inspectable
  • It’s testable
  • It’s deterministic
  • It fails loudly when it breaks

LLMs, by contrast, fail quietly. Persuasively. They produce plausible nonsense with the confidence of a senior engineer and the accountability of a stray comment in a Slack thread.

🧠 LLMs Are Interfaces, Not Engines

The moment you treat a language model as an engine of logic, you start losing control. But the moment you treat it as an interface, a way to bridge the human and the machine, it becomes powerful again.

And that’s the pivot:

Agentic AI shouldn’t be a replacement for software architecture. It should be a window into it.

Want to know why a quote was rejected?
Let the LLM explain the code’s decision.

Want to interpret logs, summarize reports, or rephrase outputs for different audiences?
Use the LLM as an augmentation layer.

But don’t let it drive the system.
Not if the system needs to do what it says and say the same thing tomorrow.

🧱 Architect First, Prompt Second

If you’re building serious systems with traceability, guarantees, or long-term maintenance, your mindset should remain architectural.

Even in agentic workflows, the LLM is not the architecture.
It’s a participant. Sometimes useful. Rarely accountable.

So design your logic like it matters.
And let the language model follow your lead, not the other way around.

8. Conclusion: Hype, Hope, and Hard Truths

Agentic AI is not a scam.
It’s just not a shortcut.

It promises autonomy, but delivers orchestration.
It hints at intelligence, but requires constant supervision.
It wraps itself in the language of emergence, but struggles with even basic consistency.

And yet, I’m glad I built the system.

Because there’s no better way to understand the limits of a paradigm than to follow it all the way to its edge, only to discover what it quietly leaves behind.

I saw firsthand:

  • Where agents drift from accountability
  • Where language models fall short of execution
  • Where logic disguised as prose creates more opacity than insight

And most importantly: I learned when not to reach for the agentic toolbox.

💬 What Remains

What remains is not disappointment but clarity.

Use LLMs where language is the problem.
Use code where correctness is the requirement.

Use agents when you need explanations.
Use architecture when you need guarantees.

These are not enemies.
But they are not interchangeable.

📌 Final Thought

Agentic AI is a fascinating experiment in system design.
But fascination isn’t enough.

If we want to build software that endures, systems that do what they say and say what they do, we must resist the temptation to replace discipline with prose.

The future of software is not prompt-shaped.
But with care, prompts may still help us see it more clearly.

Bonus: Exploring Agentic AI for Non-Deterministic Domains

So far, I’ve focused on where Agentic AI doesn’t fit: deterministic domains where correctness is paramount, state must be explicit, and every decision leaves a trace.

But what about the other side of the spectrum?

What about domains where:

  • Goals are fuzzy
  • Success is probabilistic
  • Exploration matters more than precision?

This is where Agentic AI may still have real traction.

🔍 Example: Research and Synthesis

Consider the task of researching a new market or technology.

You’re given a vague objective:

“Find out whether biodegradable packaging is gaining traction in the EU food industry, and what competitors are doing about it.”

This isn’t a query. It’s a journey.

You might:

  • Search for articles, patents, regulatory changes
  • Extract key arguments or market signals
  • Summarize conflicting perspectives
  • Propose possible actions or hypotheses

Here, multiple agents could coordinate:

  • A WebSearchAgent gathering sources
  • A SynthesisAgent structuring what’s been found
  • A RecommendationAgent proposing next steps
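One way the three agents above could be chained, with each function standing in for an LLM-backed agent. Everything here is stubbed and hypothetical; the point is the shape of the pipeline, not the implementation.

```python
def web_search_agent(query: str) -> list:
    # Stub: would gather articles, patents, regulatory changes.
    return ["EU packaging directive update", "competitor X pilot program"]

def synthesis_agent(sources: list) -> str:
    # Stub: would structure and reconcile what was found.
    return "Signals: " + "; ".join(sources)

def recommendation_agent(synthesis: str) -> list:
    # Stub: would propose next steps or hypotheses.
    return [f"Investigate further: {synthesis}"]

def research(goal: str) -> list:
    return recommendation_agent(synthesis_agent(web_search_agent(goal)))
```

Note there is no validator anywhere in this chain, and that is fine here: no single correct answer exists to validate against.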

And here’s the key difference:
There is no single correct answer.

Instead of enforcing constraints, the system explores a space of possible insights. Instead of guaranteeing outcomes, it presents reasoned approximations: fragments of clarity, stitched together from noise.

That’s not failure. That’s the task.

🧭 From Command to Collaboration

In this context, agents don’t execute rules.
They simulate collaborators that are curious, biased, fallible, and improvisational.

And that’s useful.

Because in non-deterministic domains (strategic planning, content synthesis, scenario modeling) the goal isn’t correctness. It’s perspective.

You don’t want a system that follows rules.
You want a system that thinks with you.

⚖️ The Tradeoff

Of course, the tradeoff remains: you lose determinism.
But in return, you gain something that deterministic systems rarely offer:

An interface that thinks aloud, and lets you decide what’s worth keeping.

Used wisely, that can be powerful.

Just don’t let it creep into places where clarity is king.

