Cohort 2 — applications opening soon

A method for working with AI coding agents — for engineers who've already shipped.

Two-weekend live cohort. Thirty engineers. By application. For people who've shipped a feature with Claude Code, Cursor, Devin or Aider, and noticed where it breaks.

You've already lived this.

You’ve shipped with Claude Code. You’ve also seen a confident hallucination get to staging. A refactor that wasn’t. A six-hour loop on a typo. Three weeks of subtle naming drift, found at merge.

The tools are real. The output, sometimes, is not. The next phase isn’t more AI. It’s method for AI.

  • // confident hallucination The agent invents an API that looks plausible. The PR looks fine. CI passes. It ships. It breaks in production a week later, on a code path nobody read.
  • // refactor-that-wasn't You asked for a refactor. You got a rewrite that changed the semantics in three places you didn't notice until QA found them.
  • // six-hour loop The agent gets stuck. You don't notice for an hour because it's chatting confidently. By the time you intervene, you've burned half a day and a hundred dollars in tokens on a typo.
  • // naming drift The agent picks names that are reasonable in isolation but wrong in context. Three sprints later your codebase has four words for the same thing and nobody knows which one is canonical.
  • // scope creep "While I was in there, I noticed…" The agent helpfully fixes things you didn't ask it to. The PR is now 600 lines and impossible to review.
  • // review collapse Reviewers stop reading carefully because the code looks too clean. Nobody's reading the spec because there is no spec. Bugs slip through that a junior would have caught.

The spec is the unit of work.

Not the prompt. Not the chat. Not the PR. The spec.

Every failure above is a spec failure dressed up as something else. Hallucination is a spec that didn’t constrain the API surface. Refactor-that-wasn’t is a spec that didn’t define non-goals. Naming drift is a spec that didn’t include a glossary. Once you write the spec the agent should be reading, the failure modes become legible — and most of them stop happening.

01.

Spec, not prompt.

The spec is a decision-grade document the agent reads before it reads your message. Problem, constraints, non-goals, success criteria. The prompt becomes a thin layer on top.
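A minimal sketch of the shape, not the course's template: the four headings come straight from the sentence above, while the file name, endpoint, and numbers are invented for illustration.

    // spec/spec.md (file name and contents illustrative)
    # Rate-limit the export endpoint

    ## Problem
    Exports over 10k rows time out and retry-storm the API.

    ## Constraints
    Touch only /api/export. No new dependencies. Response shape stays.

    ## Non-goals
    No pagination redesign. No caching layer. No UI changes.

    ## Success criteria
    p95 under 2 seconds at 10k rows. Existing integration tests pass unchanged.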

02.

Plan, analyse, design, re-plan.

A four-step loop the agent runs against the spec, not against your last instruction. The loop is what makes the work survive a long session without drift.

03.

Scaffolding that doesn't rot.

Variables, decisions, manuals — versioned alongside the code. The artifacts that let the next session pick up where this one stopped without re-explaining the whole project.

// the four-step loop — fig. 1

Engineers buy outcomes, not curricula.

By the end of weekend two, you walk away with four things. Not slides. Not a workbook. Four things you can take into Monday.

01. repo template

A working repo template.

The structure your agent reads from on every session. CLAUDE.md, AGENTS.md, and the four files in /spec. Forkable.
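For orientation, a hedged sketch of that structure. CLAUDE.md, AGENTS.md, and /spec are named above; the four file names inside /spec are illustrative guesses drawn from the scaffolding pillar (variables, decisions, manuals) plus the spec itself, not the template's actual names.

    your-repo/
    ├── CLAUDE.md          // what Claude Code reads at session start
    ├── AGENTS.md          // the equivalent for other agents
    ├── spec/
    │   ├── spec.md        // names illustrative only; the template
    │   ├── variables.md   // defines the actual four files
    │   ├── decisions.md
    │   └── manual.md
    └── src/               // your code, unchanged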

02. shipped feature

A shipped feature against your own spec.

Real code, against a real codebase you bring with you, demoed live in front of the cohort on day four.

03. the method, documented

A documented methodology.

The four-step loop, the failure forensics, the spec format — written down in your words so you can take it back to your team without re-deriving it.

04. the cohort

Twenty-nine peers across the Gulf and Sri Lanka.

A Slack that stays open after the cohort ends. Engineers who’ve lived the same failure modes and committed to the same method.

Two weekends. Four sessions.

Roughly eighteen hours total: four live sessions of three and a half to four hours each, plus about three hours async between weekends, paced by structured touchpoints. No prerecorded content; this is a workshop, not a course.

// weekend 1 — saturday

Failure forensics + the stack you actually need.

Diagnose. We catalogue the failure modes you’ve already lived through and name them precisely. No code yet. The diagnosis is the foundation.

// weekend 1 — sunday

The spec, part one.

Live spec writing against a real codebase you bring. Repo setup. The four files in /spec. By the end of the session, you have a spec the agent can read from.

// weekend 2 — saturday

Driving the agents + maintenance.

The four-step loop, run live in front of the room. How to intervene. How to scope. How to maintain the spec as the code grows. When each tool wins, when each fails.

// weekend 2 — sunday

Build sprint, demo day.

Ship a feature against your spec. Demo it to the cohort. Take questions. The artifact you leave with is the demo.

Method, not vibes.

There are three legitimate methodologies for working with AI coding agents — spec-first, eval-first, and vibe-then-verify. Each has a place. This program teaches one of them, and is honest about it.

If you're considering                      | Our distinction
YouTube tutorials, Twitter threads         | Live, structured, accountable. You demo in front of the room.
Generic prompt engineering courses         | We don't teach prompts. We teach specs. The prompt is downstream.
Internal experimentation at your company   | Twenty-nine peers across the Gulf and Sri Lanka, structured failure forensics, a named methodology.
Vibe-coding bootcamps                      | This is spec-first by design. If you want to feel it out, this isn't the cohort.
Hiring a consultant                        | This trains you to do the work. Cheaper, more durable, transferable to your team.

An honest filter.

The hard floor is two years of professional engineering and at least one feature shipped with an AI coding agent. Below that, the cohort doesn't work for you; above it, the failure modes will be familiar.

// this is for you if

  • You have two-plus years of professional engineering experience.
  • You've shipped at least one feature using Claude Code, Cursor, Devin or Aider.
  • You build B2B software, internal tools, or regulated work where "looks about right" doesn't ship.
  • You can attend live sessions in SGT across two weekends.
  • You read documentation. You write ADRs sometimes. You have opinions about naming.

// this is not for you if

  • You're new to engineering or to AI coding tools.
  • You build consumer-facing experimental products where vibes are fine.
  • You want async-only learning. This is live.
  • You're looking for a tools tutorial or a prompt collection.
  • You can't make three of four live sessions.

From engineers who ran the loop.

I'd been using Claude Code for six months and getting better at it incrementally. The spec-first method was the first thing that changed how I worked, not just how fast I worked. The repo template alone saved my team a fortnight in the first month.
[Engineer name] — [role], [company], [country]

More voices land as cohort 2 progresses. We don't publish stats below the threshold we'd want to see ourselves.

Cohort 2 is by application.

Applications open in the next few weeks. The form takes about ten minutes and includes one question that does most of the qualifying work: describe a build that went sideways with an agent, and what you learned. Anyone who can answer that question is in the audience.

Drop your email and we’ll send the form the day it opens — along with cohort 2 dates, fee, and the qualifying questions in advance so you can think about them.

$ notify --cohort 2

// we email twice. once when applications open. once when cohort 2 dates are confirmed.

What people usually ask.

What's the time commitment?

Two Saturdays and two Sundays, three and a half to four hours each, plus about three hours async between weekends. Eighteen hours total across two consecutive weekends.

What time zone?

SGT, 10:00 start. Confirmed at signup. Recordings are released within six hours, but live attendance is the default. If you can't make three of four sessions live, this isn't the right cohort.

What does it cost?

Sent after your application is accepted. We gate on applications to make sure the cohort fits before we talk pricing; the number lands in a conversation with someone we've already said yes to. That's the right shape for this kind of programme.

Do I need to know Claude Code specifically?

No. We cover Claude Code, Cursor, Devin and Aider. You should have used at least one of them to ship a feature. Tool-specific tutorials live on YouTube; we teach when each tool wins and when each fails.

What's spec-first vs eval-first vs vibe-then-verify?

Three legitimate methodologies for working with AI coding agents. Spec-first front-loads the contract: the agent reads a decision-grade spec before the prompt. Eval-first front-loads the test: you write the eval, then let the agent satisfy it. Vibe-then-verify lets the agent run loose, then formalises post-hoc. We teach spec-first because it's the methodology that holds up best for B2B software, internal tools, and regulated work, and we say so on the page.

What language is the programme in?

English.

Is there a certificate?

Yes, available on request. It's not the point. The repo and the method are the point.

Can my company pay?

Yes. We provide invoices. We also run private cohorts for teams of ten and above; ask us when you apply.

Who's running this?

Janaka Ediriweera and Tiran, co-founders of specshop.dev. Nine years of spec-first delivery across more than a hundred and twenty products in five countries.

What happens after the cohort?

The cohort Slack stays open. One-to-one advisory and a contractor pipeline are available for top performers.