On the Edge

Building on the edge of AI

Production agent systems, Model Context Protocol tooling, and a plain-English playbook for handing real work to AI and trusting it. Built and shipped in the open.

If you're hiring

A builder on the frontier

A hands-on AI engineer and systems architect who ships working systems, not slideware.

  • ST Metro: a Level 5 autonomous build pipeline
  • CMD: a live multi-agent command center
  • A second-brain memory and knowledge warehouse
  • ChartingHero: voice-AI in a regulated (HIPAA) domain

Jump to the proof of work →

If you're a founder

We become your AI team

You do not need to learn to code. You need to learn to delegate, and to trust what you delegated.

  • Catch the judgment you already use every day
  • Shape it into a job an agent can actually run
  • Prove it works before you trust it
  • Grow from one agent into a small team

Start the series →


How to Train Your Agent

A four-part, no-code series for anyone who keeps meaning to "use AI more." You do not program an agent. You train one, the same way you train a sharp new hire.

Part 1
You're Already Doing the Training
Your everyday "keep this, skip that" calls ARE the training data.

Open your email and look at the messages you have sent to yourself. A business owner I know does it twenty times a week: reads something useful on his phone, fires it to his own inbox, gets back to work. He thinks he is hoarding links. He is actually recording his taste, and nobody has told him yet.

Forget the scary word. "Train your agent" is not code. Think about the last good person you hired. On day one they were sharp but useless to you, because they knew nothing about how you work. You did not reprogram them. You showed them examples: keep this, ignore that, always flag this third thing. A few weeks later, they just knew. That is training, and an agent is the same brilliant new hire.

Here is the part people get backwards. The valuable thing is not the pile of saved articles. It is the judgment that decided what went into the pile and what got skipped. Do not hand the agent the catalog. Hand it the judgment that made the catalog.

What I actually do: every week I drop each thing I emailed myself into one of four buckets, Act on it, Use it now, Keep it, or Ignore, then give each a thumbs up or down. Every drop is a labeled example: this thing, in my world, is THAT kind of thing. A hundred of those tiny votes is a portrait of my taste drawn from real life instead of guesswork.

Notice what is missing. I have not built a robot or written a line of code. I made my own judgment visible and repeatable, so it can be handed off later.

The one thing to take: the everyday calls you already make are the training. Next time you save an article, sort it and tell yourself why. You just produced your first piece of training data. You cannot delegate a decision you cannot describe.
🎬 Early walkthrough (v1). A cleaner version is in production.
Part 2
Give It a Job With Edges
Turn a fuzzy wish into a job an agent can actually run.

A wish is not a job. People take a fuzzy wish, "keep an eye on my competitors," hand it over, and then act surprised when the thing wanders off, does something half-right, and never tells them whether it worked. The AI was never the problem. The instruction was.

A job is runnable when it has enough edges that someone else could pick it up, do it, and know for certain when they were finished. A wish has no edges. A job has five, and I run every wish through a five-box intake form before it becomes real work.

Box one, what does "done" look like? Something I could point at, not a feeling. Box two, who reads this? If the honest answer is nobody, the job should not exist. Box three, where does the result land? A specific file, a message to my phone, my morning summary. Box four, when does it stop trying? "Try three times, then stop and text me," so it neither gives up instantly nor grinds all night. Box five, how often should it run? Once, only while I am working, or on its own even while I sleep.

That last box is where people quietly fail. They build the "while I am sitting here" version, walk away expecting it to run overnight, and it stopped the second they closed the laptop. Then they wonder for a week why the morning reports never came.

The form never writes the job for me. It refuses to let me hand in a wish, and makes me fill the parts only I can know. The form holds the standard. I supply the judgment.
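You do not need code to use the form. For readers who do write code, here is a minimal sketch of an intake check that refuses a wish. The class name, field names, and cadence values are illustrative assumptions, not the actual form.

```python
from dataclasses import dataclass, fields

# Hypothetical labels for box five: once, only while I am working, or on its own.
CADENCES = {"once", "while_working", "unattended"}

@dataclass
class JobIntake:
    done_looks_like: str  # box 1: something you can point at
    reader: str           # box 2: who reads the result
    lands_in: str         # box 3: file, phone message, morning summary
    stop_rule: str        # box 4: e.g. "try three times, then stop and text me"
    cadence: str          # box 5: one of CADENCES

def missing_boxes(form: JobIntake) -> list[str]:
    """Return the names of boxes that are still empty or invalid."""
    problems = [f.name for f in fields(form) if not getattr(form, f.name).strip()]
    if form.cadence.strip() and form.cadence not in CADENCES:
        problems.append("cadence")
    return problems

wish = JobIntake("", "", "", "", "")
job = JobIntake(
    done_looks_like="a one-page summary of competitor price changes",
    reader="me, every Monday",
    lands_in="my morning summary",
    stop_rule="try three times, then stop and text me",
    cadence="unattended",
)
```

The check only reports which boxes are empty. It never fills them in, because those are the parts only you can know.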

The one thing to take: if you cannot fill the five boxes, you do not yet have a job. You have a wish, and a wish is the one thing no agent, however brilliant, can run.
🎬 Early walkthrough (v1). A cleaner version is in production.
Part 3
Prove It Before You Trust It
Trust is built, not given.

A job on paper is not a job that works. Handing over a clean job and immediately trusting it is the most expensive habit I see. You do not give an agent trust the way you hand someone a key. You build it the way you build it with a new hire: watch them work a few times, until you have seen enough to look away.

Move one, make it look before it acts. The first run is a look, not a leap. Have it show you the three things it is about to act on before it touches any of them. I once planned a week of work on the assumption that one of my agents published things on its own. Before building on top of it, I had it just look and report. It had never published a thing in its life. One look saved the whole build. It also protects you from the opposite: the helper that ran the same change five times, or the one told to read an inbox that tried to swallow a thousand messages at once and burned a fortune in a minute.
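In code, "look before it acts" is usually called a dry run. A minimal sketch, assuming a hypothetical run_job helper and made-up draft names: the first call only reports what it would touch.

```python
def run_job(items, act, dry_run=True, preview=3):
    """With dry_run=True, report what would be touched and touch nothing."""
    if dry_run:
        return {"would_act_on": list(items[:preview]), "acted_on": []}
    done = []
    for item in items:
        act(item)
        done.append(item)
    return {"would_act_on": [], "acted_on": done}

published = []  # stands in for the real side effect (publishing, sending, deleting)
drafts = ["post A", "post B", "post C", "post D"]

# The first run is a look, not a leap: nothing is published yet.
look = run_job(drafts, act=published.append)
```

Making the dry run the default is a deliberate choice in this sketch: the caller has to opt in to acting by passing dry_run=False.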

Move two, guard the important rule in two places. The stopping rule on paper is your intention. Add a backstop that fires at the moment things go wrong whether you remembered or not, a hard cap like "never send more than five messages." One guard is one mistake away from nothing. Belt and suspenders on anything that touches money, messages other people, or deletes.
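The second guard is the one that lives in code rather than in the instructions. A minimal sketch of a hard cap, with a hypothetical CappedSender wrapper and an in-memory outbox standing in for a real messaging call:

```python
class CapExceeded(RuntimeError):
    """Raised when the backstop fires."""

class CappedSender:
    def __init__(self, send, hard_cap=5):
        self._send = send
        self._hard_cap = hard_cap
        self.sent = 0

    def __call__(self, message):
        # Backstop: checked at the moment of action, independent of
        # whatever the written stopping rule says.
        if self.sent >= self._hard_cap:
            raise CapExceeded(f"refusing to send more than {self._hard_cap} messages")
        self._send(message)
        self.sent += 1

outbox = []  # illustrative stand-in for a real send function
send = CappedSender(outbox.append, hard_cap=5)
for i in range(5):
    send(f"message {i}")
```

A sixth call raises CapExceeded instead of sending, whether or not the agent remembered its paper rule.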

Move three, hand off slowly, by inches. Watch it run, then check every result, then spot-check, then trust the output. People get burned jumping from the first rung straight to the last. The slow watching is the training.
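The four rungs can be written down as a small rule, so nobody jumps from the first to the last. This is a sketch only: the stage names, the "three clean runs" threshold, and the spot-check interval are assumptions made for illustration.

```python
STAGES = ("watch", "check_every", "spot_check", "trust")

def needs_review(stage, result_index, spot_every=5):
    """Decide whether a human looks at this result at the given rung."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    if stage in ("watch", "check_every"):
        return True  # every result gets eyes ("watch" also means watching the run live)
    if stage == "spot_check":
        return result_index % spot_every == 0
    return False  # "trust": output is used without review

def next_stage(stage, clean_runs, required=3):
    """Move up exactly one rung after enough clean runs; never skip a rung."""
    i = STAGES.index(stage)
    if clean_runs >= required and i < len(STAGES) - 1:
        return STAGES[i + 1]
    return stage
```

However many clean runs pile up, next_stage only ever returns the adjacent rung.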

The one thing to take: trust is not granted the day you finish the instructions. It is built by proving the job works before you walk away. Skip it, and what you have is hope wearing trust's clothes.
🎬 Early walkthrough (v1). A cleaner version is in production.
Part 4
Build a Team, Not a One-Man Job
One agent, one job; grow into a roster.

You have one agent doing one job you trust. The natural next thought is a trap: "now let me pile everything else onto this one agent too." Do not.

Picture a house being built by exactly one person. He frames a wall, then runs the wiring, then fits the plumbing, then starts the roof. He can do a little of every trade, and from the street it looks impressive. But he is still one person, burning enormous energy to be every trade by himself, and he can only do one of them at a time. The wiring waits while he frames, and every trade ends up half-finished. That one-man job is you running your business: salesperson, then bookkeeper, then shipping clerk, switching trades a hundred times a day. Most people hand their first agent the whole house too, one helper that does email and invoices and research and scheduling, all a little badly.

There is a better model, and you already know it because you have hired people. One agent, one job. You would never make one brilliant person your salesperson and your accountant and your warehouse manager. Split across four jobs, they get worse at all four. When you are ready for invoices, you do not bolt them onto the email agent. You train a second agent on that one job, using the same three steps you already know.

You build the team one hire at a time, each one a job permanently off your own plate. Once you have a few, you add a chief of staff whose only job is to route incoming work to the right specialist. It can only route well if each agent's job is sharply defined.
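For readers who think in code, the chief of staff is a lookup from a kind of work to its single owner. A minimal sketch with a made-up roster; the agent names and job kinds are hypothetical.

```python
# Hypothetical roster: each kind of work has exactly one owner.
ROSTER = {
    "email": "email_agent",
    "invoice": "invoice_agent",
    "research": "research_agent",
}

def route(job_kind, roster=ROSTER):
    """Pick the one specialist that owns this kind of work, or hand it to the human."""
    owner = roster.get(job_kind)
    return owner if owner is not None else "human"
```

Here route("invoice") returns "invoice_agent", and a kind of work nobody owns, such as "scheduling", falls back to "human". The router stays this simple only while each job is sharply defined.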

The one thing to take: you do not scale by building one agent that does everything, any more than you scale a business by being every employee yourself. The test that keeps you honest: if a job is valuable, some agent should own it. If no agent will own it, that is your strongest evidence the job was not worth doing. You end as the one thing a one-man job can never be: not the worker doing all the trades, but the person who built the team that does.
🎬 Early walkthrough (v1). A cleaner version is in production.

Prompt Goodies

An escalating ladder of copy-paste delegation-prompt patterns, extracted from prompts a frontier model wrote for itself while orchestrating subagents. Each rung is a paste-ready asset, not a lecture. Rungs 1 to 5 work on any task you hand an AI. Rungs 6 to 8 are for when multiple agents share a workspace.

Rung 1
Show me the receipt

"Done" is a claim. A pasted command and its output is evidence. Ask for evidence and you never take the claim on faith.

When you finish, paste the exact command you ran to verify your work and its
full output. "It works" without the paste does not count as done.

What you will notice: the agent starts actually running the verification instead of describing it.

Rung 2
Stay in your lane

Agents wander. Telling them what to do is half the job; telling them what NOT to touch, and who owns it, is the other half.

You own: src/feature-x/ and its tests.
Do NOT create or modify ANY files outside that. Specifically:
- docs/ is owned by someone else
- package.json and CI config are owned by me; if you need a change there,
  STOP and ask instead of editing

What you will notice: the "helpful" side edits stop. No more surprise README rewrites you have to revert.
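The lane can also be checked after the fact. A minimal sketch, assuming you already have the list of changed file paths (for example from `git diff --name-only`); the out_of_lane helper and the sample paths are illustrative.

```python
from pathlib import PurePosixPath

def out_of_lane(changed_files, owned_dirs):
    """Return changed paths that fall outside every owned directory."""
    owned = [PurePosixPath(d) for d in owned_dirs]
    strays = []
    for name in changed_files:
        path = PurePosixPath(name)
        # is_relative_to compares whole path components (Python 3.9+),
        # so "src/feature-xyz" does not count as inside "src/feature-x".
        if not any(path.is_relative_to(d) for d in owned):
            strays.append(name)
    return strays

changed = ["src/feature-x/parser.py", "docs/README.md", "package.json"]
strays = out_of_lane(changed, owned_dirs=["src/feature-x"])
```

In this example strays holds docs/README.md and package.json, the two files the prompt said to leave alone.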

Rung 3
Iterate until green

Give the exact verification command, make passing it the definition of done, and require cleanup so verification leaves no residue.

Before you finish:
1. Run: npm install && npm test   (use the real command for your stack)
2. If anything fails, fix it and run again. Iterate until green. A failing
   check means keep working, not report back.
3. Delete any scratch files / test databases your verification created.
   The workspace ships clean.

What you will notice: "done" starts meaning done. The retry loop happens inside the agent's turn.

Rung 4
Warn them about your house

Workers do not inherit your scars. Anything your environment punishes goes in the prompt, WITH the approved workaround.

Environment constraints (these are real, you will hit them):
- <command X> is blocked here. Use <approved alternative> instead.
- No network calls at runtime (package install is fine).
- <path or service quirk and its workaround>

What you will notice: a whole class of wasted turns disappears. The agent stops discovering your environment by colliding with it.

Rung 5
Name the slop

Models steer away from a named failure far more reliably than they steer toward an adjective. Quote the bad output you do not want.

Quality bar: concrete, step-numbered procedures. Every step names the exact
command or file it touches. No vague steps like "update the database" or
"handle errors appropriately". This is a runbook, not marketing copy.

What you will notice: the fluff drains out. This is the last rung for solo tasks; everything below is for multiple workers.

Rung 6
Pin the contract

When two workers share an interface, the interface goes in BOTH prompts, verbatim, before it exists. Never let two agents independently imagine the same contract.

## Interface contract (implement EXACTLY this surface. Another worker is
## building against it; any deviation breaks them.)
<paste the full API signature / CLI grammar / schema here, even though no
code exists yet>

What you will notice: integration stops being a debugging phase. The pieces meet in the middle because they were built from the same sentence.

Rung 7
Least privilege you can grep

A permission list is only real if it is mechanically checkable against behavior, in both directions.

The allowed_commands list must be the minimal set your own procedures
actually use. Two-direction rule: every command cited in a procedure appears
in the list, and nothing appears in the list that no procedure cites. A
reviewer will grep both directions.

What you will notice: privilege creep becomes visible as a diff. Audits go from judgment calls to grep.
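The two-direction rule is a pair of set differences. A minimal sketch, assuming the allowed list and the commands cited in procedures have already been extracted as plain strings; the function name and sample commands are illustrative.

```python
def two_direction_check(allowed_commands, cited_commands):
    """Compare the permission list against what the procedures actually use."""
    allowed, cited = set(allowed_commands), set(cited_commands)
    return {
        "cited_but_not_allowed": sorted(cited - allowed),    # procedure would be blocked
        "allowed_but_never_cited": sorted(allowed - cited),  # privilege nobody needs
    }

report = two_direction_check(
    allowed_commands=["git status", "npm test", "rm"],
    cited_commands=["git status", "npm test", "curl"],
)
```

Both lists must be empty for the permission list to pass. In this example "curl" is cited without being allowed, and "rm" is allowed without being cited.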

Rung 8
The never list

Multi-agent role design is boundary design. Define each role by what it refuses to do, and point every "never" at the neighbor who owns that action.

Boundaries for <AgentName>:
- NEVER <action>: that belongs to <OtherAgent>; hand off by <mechanism>
- NEVER <irreversible action>: escalate to the human instead
- Owns: <the one stage/domain this agent fully controls>

What you will notice: handoffs happen on purpose. No two agents fight over the same record, and irreversible decisions keep landing on a human desk.


Proof of work

Things you can click, and things that shipped. Built and tracked in the open.

What I build

  • ST Metro
    A Level 5 autonomous software-production pipeline: a multi-agent ecosystem that carries an idea through to a shipped product. See the interactive visual ↗
  • CMD (Command Center)
    The control plane that dispatches real agents to missions and reviews their output through a judge / reasoner loop.
  • Second Brain
    A hub-and-spoke knowledge warehouse and shared memory layer. Agents capture signals, recall decisions, and compound what they learn across runs.
  • ChartingHero
    Voice-AI clinical documentation built on speech-to-text, the Claude API, and EMR tool-calling. HIPAA-compliant, with about 70% less documentation time in early deployment.


Want to build your own agents?

Join the Early AI-Dopters community on Skool. Every subscription includes a 1:1, agent-specific coaching call: we set up your first trained agent together, start to finish.

Join Early AI-Dopters  →  get your coaching call