What setting up a real AI agent system actually takes

Persona. Memory. Integrations. Sub-agents. Guardrails. Testing. This is not a weekend project.

Read the short version on X

Everyone says “set up an AI agent this weekend.”

I spent three weeks on it.

Not because I’m slow. Because I’ve done enough software projects to know what happens when you skip the foundation. You rebuild in six months. I’d rather take the time now.

Here’s what the setup actually involved.


Persona

The first decision wasn’t which AI to use. It was who the agent was going to be.

That sounds abstract. It’s not. Without a defined persona, you get a different agent every session. Helpful one day, cautious the next, sycophantic the day after. Useless.

We built four files before writing a single line of integration code:

The SOUL.md took the longest. Two hours of writing, deleting, rewriting. Worth it. The agent has been consistent across dozens of sessions since.

If you skip this step, you don’t have an AI agent. You have autocomplete with an API key.


Memory

Agents without memory aren’t agents. They’re expensive chatbots.

The default behavior of most AI systems: every session starts fresh. That’s fine for answering one-off questions. It’s not fine when you’re running an operation.

We built a three-tier memory system:

Tier 1 - Daily notes (memory/YYYY-MM-DD.md) Raw log of what happened each session. What decisions were made, what broke, what shipped. Written at the end of every session. Non-negotiable.

Tier 2 - Long-term memory (MEMORY.md) Curated facts distilled from the daily logs. How I work, what the agent has learned, lessons from mistakes. Reviewed and updated weekly. This is the agent’s actual long-term memory.

Tier 3 - Knowledge graph (~/life/) Structured entities using the PARA method. Projects, areas, resources, archives. The deep store for things that need to persist across months.

The system runs a nightly extraction job. Reads the day’s session logs, pulls out anything worth keeping, updates the right tier. Sal handles this. He’s invisible until he matters.

Getting the memory system right took longer than everything else combined. It’s also the thing that makes everything else work.


Integrations

Every integration is a trust decision.

Not a technical decision. A trust decision. What can the agent access? What can it send on its own? What requires your sign-off first?

We connected: Telegram (primary async channel), shell access, web search, browser control, file system. Each one required thinking through what could go wrong, not just what could go right.

The rule we landed on: internal actions are proactive, external actions require confirmation.

Reading files, searching the web, running local scripts - the agent does these freely. Sending emails, posting publicly, spending money - those need explicit approval. Every time.

This isn’t overcaution. It’s the difference between a useful agent and a liability.


Sub-agents

The main agent doesn’t do everything. That’s the point.

When a task needs focused work - coding, research, content - it goes to a specialized sub-agent. The crew:

Sub-agents run in isolated sessions. They don’t have access to your full memory or personal context. They get a task, they execute, they report back. The don reviews before anything ships.

Setting up the crew coordination took a full week. Getting the right scopes, the right prompts, the right handoff patterns. Most of it broke at least once.


Guardrails

The last thing we built was the thing I should have built first.

HEARTBEAT.md is a periodic check system. Every 30 minutes or so, the agent runs through a checklist: any emails worth flagging, calendar events coming up, X posts to make, mentions to reply to. If nothing needs attention, it says HEARTBEAT_OK and goes quiet.

The critical rule: no proactive external actions between 11pm and 8am.

Sounds obvious. It wasn’t. First version of the heartbeat had the agent posting to X at 2am because the calendar said “Tuesday” and Tuesday is a posting day. That’s a judgment failure, not a hallucination. The content was fine. The timing was wrong.

Teaching the agent when NOT to act was harder than teaching it what to do.


What actually broke

A lot.

Every one of these is documented. That documentation is the real product.


The timeline

Realistically: three weeks of evenings and weekends to get to something that runs reliably without hand-holding.

Week 1: persona, memory, basic integrations.
Week 2: sub-agents, X integration, first posts live.
Week 3: guardrails, voice system, content flywheel starting.

The foundation is done. What gets built on it is the interesting part.

That’s what this site documents.