I Spent $1,300+ Testing Every Major AI Agent Tool This Month. Here's the Honest Verdict.

I didn't plan to spend $1,300 on Perplexity Computer this month. Or max out both my OpenAI and Claude Max plans at $200 each. But when you're running a VC operations consultancy and trying to figure out which AI agent tools are actually ready for real work — you end up going deep.

This is what I found.

The Setup

At Black Matter VC, we build AI-powered systems, automations, and data infrastructure for VC funds and enterprises. So when I test these tools, I'm not testing them for fun — I'm testing them to see if they can do the kind of work my clients need done. CRM automation. Dealflow pipelines. Sales outreach. Back-office ops.

I also run Junglebee, a SaaS platform for tour operators that I've been building for nine years. It became one of my main test beds this month.

Here's what I put through their paces: OpenClaw (on Mac Mini and a VPS via Railway), Perplexity Computer, Paperclip multi-agent system, Claude Code with Garry Tan's GitHub repo, and Replit Agent 4.

OpenClaw: It Works, But the Hype Isn't Telling You the Whole Story

Let me say this upfront: OpenClaw can work for professionals. It can help businesses and it can make money. I'm not writing it off.

But if you've been on X lately, you've seen the posts — "my OpenClaw runs my company," "gave it $100, barely touch it, it's printing money." That narrative is misleading.

Behind every impressive OpenClaw setup you see on social media, there is someone spending serious hours training it, fixing broken flows, rebuilding skills, and monitoring tasks that silently failed. That behind-the-scenes work doesn't make the viral clips.

OpenClaw is genuinely good for content creation, social marketing, daily briefs, SEO audits, idea generation, and competitive analysis. These are real, valuable use cases for any business.

Where it struggles is the gap between capability and reliability. It can do the task, but doing it consistently without you hovering over it is a different story. The pattern I saw repeatedly: it reports that it fixed a problem when it hasn't, or it says it'll schedule a fix and never follows through. This cycle can repeat for days.
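One way to catch that pattern is to stop trusting the agent's own status report and run a separate check for each task. The sketch below is a minimal illustration of that idea, not anything OpenClaw provides: `TaskResult`, `check_task`, and `silent_failures` are hypothetical names, and each `verify` callable stands in for whatever independent check fits the task (a file exists, a post went live, a cron entry is present).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TaskResult:
    name: str
    agent_claimed_done: bool  # what the agent reported
    verified: bool            # what an independent check found


def check_task(name: str, agent_claimed_done: bool,
               verify: Callable[[], bool]) -> TaskResult:
    """Run an independent check instead of trusting the agent's report."""
    try:
        verified = bool(verify())
    except Exception:
        # A check that crashes counts as not verified.
        verified = False
    return TaskResult(name, agent_claimed_done, verified)


def silent_failures(results: List[TaskResult]) -> List[str]:
    """Names of tasks the agent reported as done but the check rejected."""
    return [r.name for r in results
            if r.agent_claimed_done and not r.verified]
```

Running something like this on a schedule gives you a short list of tasks to look at, rather than finding out days later that a "fixed" flow never ran.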

The tool has genuine merit. But if you're a professional with limited time, be prepared for significantly more hands-on work than the hype suggests. The people making it look effortless on X are putting in far more hours behind the scenes than they're showing.

Paperclip Multi-Agent: Promising, But Needs Serious Setup Investment

I want to be fair about Paperclip — it's a genuinely exciting project and I believe multi-agent systems are where this space is heading.

The challenge right now is the setup work. Environment variables need to be properly available to every sub-agent. Each agent's internal files and instructions need constant updating as you refine the system and add new capabilities. If any agent is missing context, things go sideways — and tracing the problem takes time.
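A small preflight check can surface missing environment variables before any agent starts. This is a sketch under assumptions: the agent roles and variable names in `REQUIRED_ENV` are illustrative placeholders, not Paperclip's actual configuration, and `missing_env` is a hypothetical helper.

```python
import os
from typing import Dict, List, Mapping, Optional

# Hypothetical roles and the variables each one needs. These names are
# illustrative only and are not taken from Paperclip.
REQUIRED_ENV: Dict[str, List[str]] = {
    "cto": ["ANTHROPIC_API_KEY", "GITHUB_TOKEN"],
    "developer": ["ANTHROPIC_API_KEY", "DATABASE_URL"],
}


def missing_env(required: Mapping[str, List[str]],
                env: Optional[Mapping[str, str]] = None) -> Dict[str, List[str]]:
    """Return {agent: [missing variable names]} for agents with gaps."""
    env = os.environ if env is None else env
    gaps: Dict[str, List[str]] = {}
    for agent, names in required.items():
        # Treat empty strings as missing, since a blank value is no use.
        missing = [name for name in names if not env.get(name)]
        if missing:
            gaps[agent] = missing
    return gaps
```

Calling `missing_env(REQUIRED_ENV)` before launching the agents and refusing to start if the result is non-empty turns a slow debugging session into a one-line error message.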

I tried it two ways. Approach one: OpenClaw acting as the "CEO," directing Paperclip agents in specific roles such as CTO and developers. The architecture looked impressive. The output quality wasn't where it needed to be for professional use.

Approach two: Pure Paperclip, with OpenClaw as the agent developer. Better results, and the potential was clearly visible. But for my specific use case, building apps, it underperformed compared to Replit Agent 4 and Claude Code with Garry Tan's repo.

I'm not writing it off. The architecture is compelling and with enough investment in the agent configurations, I think it could be very powerful. It just wasn't the fastest path to production-quality output for me this month. I'll keep iterating on it.

Claude Code + Garry Tan's GitHub Repo: Quietly Excellent

This combo is underrated.

Garry Tan's repo gives Claude Code a strong foundation to work from. In my testing, roughly 3 times out of 4, Claude Code actually tests what it builds before handing it back to you. And when you tell it to fix something, it fixes it close to 100% of the time.

That reliability matters. When you're building real tools and not just prototypes, knowing that "fix this" will actually result in a fix changes the entire workflow.

The setup does require assembling your own services: Supabase for database and auth, Vercel or Railway for hosting and deployment. You have more control, but you're wiring up the infrastructure yourself. For developers who want that control, it's worth it.

Replit Agent 4: A Genuine Step-Change

Replit Agent 4 is a step-change improvement over earlier versions. Beautiful designs. Understands visual hierarchy and responsiveness. Produces clean code without duplication. It's improving faster month over month than anything else I've tested.

If you wrote Replit off six months ago, go back and try it. It's a different product.

Here's my honest current assessment: Replit is visually far ahead of everything else I've tested. On the backend, it's as capable as Claude Code. The real difference is the infrastructure model. Replit gives you everything in-house — database, hosting, and deployment are all built in. You ship faster because there's nothing to wire up.

Claude Code with a strong repo structure gives you more control and flexibility, but you're assembling your own stack — Supabase, Vercel, Railway.

I'm genuinely still deciding between these two for full app builds. They're both strong. They just make different tradeoffs. I'll have a clearer view after another month of building with both.

Perplexity Computer: The One That Actually Changed My Business

I'll be honest — I went in skeptical. But Perplexity Computer handled tasks that OpenClaw repeatedly failed at, and it handled them the first time.

Here's the specific thing I built for Junglebee: a fully automated sales pipeline targeting Caribbean tour operators. The system researches companies, identifies owners, finds WhatsApp numbers and email addresses, populates an Airtable CRM, runs drip email campaigns, sends WhatsApp outreach, and generates daily briefs.

I gave it the goal. It built the pipeline. I'm generating leads now. From a tool I'd had for less than a month.
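To make the shape of that pipeline concrete, here is a rough sketch of how the stages could be modeled as a chain of steps over a lead record. It is not what Perplexity Computer generated. `Lead`, `run_pipeline`, and the two placeholder stages are hypothetical, and the placeholder values stand in for real research, Airtable, email, and WhatsApp integrations, which are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Lead:
    company: str
    owner: Optional[str] = None
    email: Optional[str] = None
    whatsapp: Optional[str] = None
    log: List[str] = field(default_factory=list)  # stages already applied


Stage = Callable[[Lead], Lead]


def run_pipeline(lead: Lead, stages: List[Stage]) -> Lead:
    """Apply each stage in order and record which ones ran."""
    for stage in stages:
        lead = stage(lead)
        lead.log.append(stage.__name__)
    return lead


# Placeholder stages: real ones would call research and contact-lookup tools.
def identify_owner(lead: Lead) -> Lead:
    lead.owner = "Example Owner"
    return lead


def find_contacts(lead: Lead) -> Lead:
    lead.email = "owner@example.com"
    return lead


def ready_for_outreach(lead: Lead) -> bool:
    """Only contact leads with a named owner and at least one channel."""
    return bool(lead.owner and (lead.email or lead.whatsapp))
```

Keeping a per-lead log of completed stages makes it easy to see where a record stalled, and the `ready_for_outreach` gate keeps half-researched companies out of the drip campaign.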

For founders and operators running lean teams, that's not a productivity gain — that's a structural change in what's possible.

The cost is real: expect $1–2K in your first month if you're using it properly. My advice — use standard Perplexity Pro for research (it's nearly free at that tier), and save your Computer credits for actual build work and automations.

The Playbook I'm Running Now

After all of this, here's where I've landed — with the honest caveat that the full-app question is still open:

Full application builds, Option A: Replit Agent 4. Everything in-house. Visually excellent. Fast to ship. Great if you want everything handled for you.
Full application builds, Option B: Claude Code, locally, with Garry Tan's repo. More control. You wire up Supabase, Vercel, Railway. Better for developers who want to own the stack.
Operations and automations: Perplexity Computer. Nothing else is close right now.

I'm running both Replit and Claude Code in parallel to figure out which I'll standardise on. I'll share what I find.

For April, I'm pushing Claude Cowork as far as it'll go. I'll document what happens.

If you're running a fund or scaling a startup and trying to figure out where AI agents actually fit into your operations, this is what I'm seeing on the ground. Happy to compare notes — what are you testing right now?

— Michael Rouveure


