Week 4 — Ultra-fast iteration + cross-discipline expansion

Why Week 4 is the throughput-unlock week

By now the substrate is in place. Week 1 set the principles. Week 2 installed the operating routines. Week 3 gave you a map of the support functions that need to scale with you. The pieces are there to ship at AI-native velocity.

Week 4 is what flips the switch culturally. It rests on three shifts that, taken together, change what a working week looks like on your team:

The almost-never-say-no reflex. Letting the cost of deciding whether to try something fall along with the cost of trying it.
Cross-discipline scope expansion. Everyone’s effective surface area widens because AI is the leveller across specialisms.
PR review automation. The mechanical layer that keeps the now-larger PR stream from bottlenecking on human review.

And one chapter that’s as important as the three shifts combined: the low-quality phase. The visible-output dip that most teams hit in Week 4 (or whenever they install ultra-fast iteration), and that causes the most reversion to old habits if the team isn’t expecting it.

Shift 1 — The almost-never-say-no reflex

The old reflex, deeply trained: someone proposes something, you weigh whether it’s worth doing, whether it fits the sprint, whether you have time, whether it might be a dead-end. The decision-cost is high because the build-cost is high. You say no to things often because the alternative (building something that doesn’t pan out) is expensive.

The new reflex: when something worth trying is proposed, you open a Claude Code tab and have a rough version this afternoon. The build-cost has collapsed, so the decision-cost should collapse with it.

In the new world you almost never say no to anything — you open a new Claude Code tab and let it do it. Rough versions arrive within hours. The interesting conversation moves from “is it worth building” to “does the rough version teach us something we didn’t know.”

What this changes in practice:

Sprint-planning meetings get noticeably shorter, because fewer things are being weighed in the abstract. The rough version is the weigh-in.
The PR count goes up significantly — including a lot of PRs that never merge. That’s the system working, not failing.
The interesting conversation across the team shifts from “what should we prioritise” to “what did this morning’s rough versions teach us.”
You stop being the bottleneck on “deciding whether to try things.” Anyone on the team who can articulate a thing can have a rough version of it in a few hours.

What it doesn’t mean: “say yes to everything.” The discipline is reserving the no’s for things that are actually the wrong direction (strategy mismatch, principled disagreement, real capacity constraint) — not things that are maybe the wrong direction (which is most of them, most of the time).

Shift 2 — Cross-discipline scope expansion

The most surprising effect of AI-native engineering for teams: specialism boundaries soften, sometimes dramatically. The backend engineer ships a frontend PR. The frontend engineer ships a backend PR. The PM, the designer, the data scientist ship PRs. The full-stack engineer takes on tech-lead work without the years of seniority traditionally required.

What’s happening: AI is the leveller across deep domain expertise. The backend engineer who couldn’t previously write idiomatic React can now ask Claude to generate the React, review it, iterate. The output isn’t at the level of a senior frontend specialist — but it’s usable, it ships, and it unblocks the backend engineer from waiting on a frontend-team handoff.

Practical implications:

Cross-team handoffs reduce significantly. The thing that used to require a frontend ticket can be done by the engineer who’s already in context.
PR review patterns shift: a backend engineer’s frontend PR gets reviewed by a frontend specialist, but the bar shifts from “is this the way I would have written it” to “is this correct, maintainable, and reasonable for the constraints it had to work under.”
Non-engineer roles ship code. The PM who used to write a JIRA ticket for the engineering team can now write the code themselves for simple changes — with the engineer reviewing, not implementing.
The org chart starts to look different. Specialism depth still matters for hard problems; specialism depth as a gate on routine work goes away.

This shift is uncomfortable for engineers whose identity is anchored on being “the only one who can do X.” The honest framing: that identity was built for a world where the cost of skill acquisition was high. That cost has collapsed for the routine bulk of the work, roughly 80% of it. The remaining 20% (the hard architectural decisions, the deep debugging, the places where domain expertise compounds) is where specialism still matters, and it is more valuable than ever.

Shift 3 — PR review automation

Both shifts above produce more PRs. Rising PR volume against flat review capacity is one of the most visible bottlenecks teams hit at AI-native velocity.

What to automate:

Status checks. Lint, format, type-check, unit tests, integration tests. All run on every PR automatically. Reviewers don’t look at PRs that haven’t passed these.
Auto-review for routine patterns. An agent reviews every PR for obviously bad patterns (commented-out code, debug prints, hardcoded secrets), deviation from team conventions (CLAUDE.md-defined patterns), and missing test coverage on new code. It drafts review comments; humans approve or dismiss them.
Reviewer nudging. Agents nudge reviewers via Slack/Teams when a PR has been waiting, respecting your team’s urgency conventions.
Draft / ready-for-review toggles. Automatic transitions based on CI state and self-review completion.
Auto-merge eligibility. When all gates pass and required approvals are in, the PR auto-merges (or gets queued for merge during low-risk windows).
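
The auto-review item above can be sketched in a few lines. This is a minimal illustration, not a production reviewer: it assumes the PR arrives as unified-diff text, the regexes in ROUTINE_PATTERNS are hypothetical and deliberately crude, and the names DraftComment and draft_review_comments are invented for this example. It needs Python 3.9 or later.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately crude heuristics; tune them to your own conventions.
ROUTINE_PATTERNS = {
    "debug print": re.compile(r"^\s*(print\(|console\.log\()"),
    "commented-out code": re.compile(r"^\s*(#|//)\s*\w[\w.]*\s*(=|\()"),
    "possible hardcoded secret": re.compile(
        r"(api_key|secret|password|token)\s*=\s*['\"][^'\"]+['\"]", re.IGNORECASE
    ),
}


@dataclass
class DraftComment:
    path: str
    line: int
    issue: str
    status: str = "draft"  # stays a draft until a human approves or dismisses it


def draft_review_comments(diff_text: str) -> list[DraftComment]:
    """Scan the added lines of a unified diff and draft one comment per hit."""
    comments = []
    path, new_line = "", 0
    for raw in diff_text.splitlines():
        if raw.startswith("+++ "):
            # File header: remember which file the following hunks belong to.
            path = raw[4:].removeprefix("b/")
        elif raw.startswith("@@"):
            # Hunk header: pick up the starting line number in the new file.
            m = re.match(r"@@ -\d+(?:,\d+)? \+(\d+)", raw)
            new_line = int(m.group(1)) if m else 0
        elif raw.startswith("+"):
            for issue, pattern in ROUTINE_PATTERNS.items():
                if pattern.search(raw[1:]):
                    comments.append(DraftComment(path, new_line, issue))
            new_line += 1
        elif raw.startswith(" "):
            new_line += 1  # context line: exists in the new file too
        # Removed lines ("-") do not advance the new-file line number.
    return comments
```

Nothing here posts to the PR. The function only returns drafts, which keeps the approve-or-dismiss step with a human reviewer.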

What not to automate: the actual judgement-call review on non-trivial PRs. The agent surfaces, summarises, and triages; the human reviewer makes the call on architectural decisions, on changes that affect product behaviour, on PRs in code areas the team flags as judgement-required.
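
One way to picture the split between automated gates and human judgement is a small triage function. The sketch below is an assumption-laden illustration: the glob list, the PullRequest fields, and the four outcome labels are all hypothetical, and it assumes that PRs touching flagged areas never auto-merge.

```python
from dataclasses import dataclass
from fnmatch import fnmatch

# Hypothetical path globs for code areas the team flags as judgement-required.
JUDGEMENT_REQUIRED_GLOBS = ["src/billing/*", "migrations/*", "infra/*"]


@dataclass
class PullRequest:
    changed_files: list[str]
    checks_passed: bool
    approvals: int
    required_approvals: int = 1
    changes_product_behaviour: bool = False


def triage(pr: PullRequest) -> str:
    """Return 'blocked', 'human-review', 'auto-merge-eligible', or 'awaiting-approval'."""
    if not pr.checks_passed:
        return "blocked"  # reviewers don't look at PRs that fail status checks
    touches_flagged_area = any(
        fnmatch(path, glob)
        for path in pr.changed_files
        for glob in JUDGEMENT_REQUIRED_GLOBS
    )
    if touches_flagged_area or pr.changes_product_behaviour:
        return "human-review"  # the human makes the call; no auto-merge
    if pr.approvals >= pr.required_approvals:
        return "auto-merge-eligible"
    return "awaiting-approval"
```

For example, a PR that only changes `src/ui/button.tsx`, passes its checks, and has one approval comes back as `auto-merge-eligible`, while any PR touching `src/billing/` is routed to `human-review` regardless of approvals.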

Review for direction, not size. The quiet shift that makes everything else workable: a one-line feature flip can carry the same weight as a huge refactor that doesn’t visibly change anything on the surface. Trying to scale per-PR review effort with per-PR line count breaks the moment you’re shipping 5× the PRs at a smaller average size. The review question stops being “is this code OK” and becomes “is this change moving the work-stream in the right direction.” Most of the time the answer is yes and the review takes 30 seconds; the few PRs that need real direction-discussion get it.

The low-quality phase — honest chapter

This is the part of Week 4 that determines whether the practice takes or reverts. It’s an emotional and cultural chapter, not a mechanical one. Most adopters who reverted did so here.

When you adopt ultra-fast iteration, your visible output looks worse for a while. You’re swimming in low-quality, half-finished, rough-iterated stuff every day. This is not failure. This is the phase before you reach a higher final quality than you could have achieved with the old slow-polish approach. The hard part is enduring the dip while trusting that the higher quality is on the other side.

What the dip looks like, concretely:

Your repo has more open PRs than it did a month ago, more half-finished work, more “tried this, didn’t pan out” branches.
Your team’s code review channel has more comments per PR — because rough first-passes need feedback to converge.
Senior engineers who measure quality by polish feel like the bar is dropping. Mid-career engineers who measure quality by throughput feel like the bar is finally rising. These two groups will disagree, often sharply, mid-Week-4.
Adjacent teams (QA, security, product) start to notice the higher PR volume and may push back — often before they’ve seen the higher final quality on the other side.

Why the higher final quality actually arrives:

You’re running 5–10× the iteration cycles in the same calendar time. More iterations = more chances to find the better solution.
The rough first pass surfaces early the problems that the slow-polish approach would have hit at integration time (or worse, in production).
Decisions are based on working code, not on speculation. The speculation-error rate collapses.
Specialism boundaries softening means the right expertise can land on a problem earlier in its lifecycle, not after it’s already shipped wrong.

How to stay the course:

Name the phase explicitly with your team. Make it a shared expectation that Week 4 (or its equivalent) is the dip, and that the dip ends.
Track final-quality metrics, not per-PR-polish metrics. Are production incidents going up or down? Is customer-reported quality going up or down? Are the shipped features more or less useful? These are the signals that matter.
Don’t over-correct mid-dip. The team that looks at Week 4 and tries to pull back to Week 0 usually loses the ground they gained.
Lean into Week 5’s polish + political layer work, which addresses both the calibrated-quality question (when polish actually matters) and the adjacent-team-pushback question (how to defend the practice with evidence).
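
The final-quality tracking suggested above can be kept very simple. As a rough sketch, assuming you record a few weekly counts by hand or from your incident tracker, the comparison might look like this; WeekStats, its fields, and trend are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class WeekStats:
    merged_prs: int
    production_incidents: int
    customer_reported_bugs: int


def trend(before: list[WeekStats], after: list[WeekStats], metric: str) -> str:
    """Compare the weekly average of one metric across two periods."""
    old = mean(getattr(week, metric) for week in before)
    new = mean(getattr(week, metric) for week in after)
    if new < old:
        return "down"
    if new > old:
        return "up"
    return "flat"
```

Called with a few pre-adoption weeks as `before` and the most recent weeks as `after`, `trend(before, after, "production_incidents")` answers the “up or down?” question directly, without reference to how polished any individual PR looked.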

This week’s assignment

Do this — Assignment 1 of 2

Have at least one non-engineer team member ship a meaningful PR this week.

PM, designer, data analyst, customer support, marketer — whichever non-engineer role is most adjacent to your engineering work.
Pick a real piece of work small enough to fit into their week (a copy change, a small UI tweak, a minor report query, a config flip). Not a toy assignment.
Pair-program with Claude Code: they drive, Claude implements, you review.
Land the PR. Notice what they could do that they couldn’t before. Notice what your team’s review process needed to absorb.

Time estimate: 2–4 hours for the pairing session, plus normal review time.

Do this — Assignment 2 of 2

Have at least one engineer ship a rough first-pass of a feature outside their classical specialism, into staging, within 24 hours of starting.

Backend engineer ships a frontend feature, or vice versa. Frontend ships a data-pipeline change, or vice versa. Whatever the team’s cross-discipline gap is.
Goal is rough first-pass in staging, not polished merge to main. The point is to demonstrate that the specialism gap is now spannable.
The engineer who normally owns that area reviews the rough first-pass and writes back what they’d change for production-readiness.
If everything that lands in staging needs to be polished further before merge, that’s correct and expected. The assignment is the rough-first-pass shift, not a quality compromise.

Time estimate: integrated into the engineer’s week. The 24-hour rough-pass constraint is the discipline, not a speed test.

Self-check — did this week land?

Has the almost-never-say-no reflex shown up at least once in your week — you opened a tab to try something you’d normally have deferred? What did you learn from the rough version?
Did a non-engineer on your team ship a meaningful PR? If you’re solo, did you do meaningful work outside your own primary specialism?
Did an engineer ship a rough first-pass in a domain outside their classical specialism, into staging, in under 24 hours?
Is at least one piece of PR-review automation now running on your team’s PRs? (Status checks, auto-reviewer, nudging, auto-merge eligibility — any of them.)
Have you and the team named the low-quality phase out loud? Has the “this is the dip, not the ceiling” framing been shared? If you’re solo: have you let yourself accept the dip without course-correcting back to old habits?
Are the Friday demo and weekly planning still running? By now the Friday demo should show noticeably more demos per person than it did in Week 1.

Why this week often hurts

Week 4 is the most common reversion point for teams adopting AI-native engineering. The dip is real, the ambiguity is real, and most teams haven’t built the cultural muscle to endure the visible-output regression in service of the final-quality rise.

Week 5 is partly built around exactly this: the polish + political layer that lets you defend the practice with evidence, calibrate per-surface what quality actually means, and build cross-team independence so that adjacent-team pushback doesn’t derail you.

If Week 4 feels harder than the previous three weeks combined, you’re probably doing it right.