Table of Contents
- First, What “Vibe Coding” Really Means (In Production Terms)
- The 3 Apps (And Why We Were Brave Enough to Ship Them)
- What Actually Worked (Our “Vibe Coding, But Make It Production” Playbook)
- 1) We Stopped Asking for Code First and Started Asking for a Plan
- 2) We Forced Smaller Diffs (Because “Accept All” Is How Legends Get Fired)
- 3) We Used Trunk-Based Development + Feature Flags (So Shipping Didn’t Mean Releasing)
- 4) We Put Tests Where the AI Was Weak (And Let It Excel Where It Was Strong)
- 5) We Measured Delivery Like Adults (DORA Metrics, Not Vibes)
- 6) Observability Became a Feature, Not a Nice-to-Have
- 7) Security: We Used Real Frameworks (Because “Seems Secure” Is Not a Control)
- What Nearly Killed Us (And How We Survived)
- The Checklist We Now Use Before Any Vibe-Coded App Hits Production
- So… Should You Vibe Code to Production?
- Bonus: 500 More Words of Hard-Won Experience (Because We Have the Scars)
- Conclusion
We didn’t set out to “vibe code” our way into production. We set out to ship. Fast. With AI help.
Then the AI help got… very helpful. Suddenly we were moving at a pace that felt illegal in most states.
Three apps later, we’ve learned a painfully clear truth:
vibe coding can absolutely get you to production, but only if you install guardrails before the wheels come off.
This is the honest post-mortem (minus the crying): what worked, what failed, and what nearly turned our on-call phone into a tiny haunted rectangle.
First, What “Vibe Coding” Really Means (In Production Terms)
“Vibe coding” is the AI-driven style of building where you describe what you want in plain English, accept big chunks of generated code,
and iterate by running the app, reading errors, and prompting again. The vibe part is real: you’re steering with intent, examples, and feedback,
not writing every line yourself.
The problem is that production doesn’t care about vibes. Production cares about:
authentication, reliability, data integrity, observability, and that one user who will absolutely paste 10,000 emojis into your input field.
So we treated vibe coding like a power tool: great for speed, terrible for fingers if you refuse to read the manual.
The 3 Apps (And Why We Were Brave Enough to Ship Them)
Each app started as “a quick tool” and ended as “please don’t break, we just told people about you.”
Different products, same pattern:
- App #1: An internal ops dashboard (low stakes… allegedly).
- App #2: A customer-facing workflow app (medium stakes, maximum opinions).
- App #3: A lightweight analytics companion (high stakes, because data).
We shipped each one faster than our old playbook allowed, but we didn’t ship blindly. We built a repeatable system that let AI move quickly
while still respecting the laws of software physics.
What Actually Worked (Our “Vibe Coding, But Make It Production” Playbook)
1) We Stopped Asking for Code First and Started Asking for a Plan
The biggest upgrade wasn’t a new model or tool. It was changing our prompts.
We stopped saying: “Build me a settings page.”
We started saying: “Design the architecture, list risks, define data flow, then generate code in small PR-sized chunks.”
Our best prompts forced structure:
- Inputs: constraints (stack, deadline, existing services), known requirements, what “done” means.
- Outputs: endpoint list, schema, error cases, auth rules, logging events, test plan, rollout plan.
- Risk scan: “What could go wrong in prod?” (The AI is surprisingly good at catastrophizing, when asked.)
Once we had a plan, code generation got cleaner, faster, and less “mysteriously haunted.”
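A plan-first prompt can be templated so nobody forgets the risk scan. A minimal sketch in Python; the function and field names are our own invention, not any tool’s API:

```python
# Hypothetical "plan before code" prompt builder: asks for architecture,
# risks, and a test plan before any code is generated.
def build_plan_prompt(feature, constraints, done_criteria):
    """Assemble a structured prompt that requests a plan, not code."""
    return "\n".join([
        f"Feature: {feature}",
        f"Constraints: {', '.join(constraints)}",
        f"Definition of done: {done_criteria}",
        "Before writing any code, produce:",
        "1. Endpoint list and data schema",
        "2. Error cases and auth rules",
        "3. Logging events and test plan",
        "4. A risk scan: what could go wrong in prod?",
        "Then generate code in small, PR-sized chunks.",
    ])

prompt = build_plan_prompt(
    "settings page",
    ["existing auth service", "2-day deadline"],
    "user can update email and notification prefs, with audit log",
)
```

The template is the point, not the code: the same fields that make a good prompt (constraints, done criteria, risks) are the fields a spec review would ask for anyway.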
2) We Forced Smaller Diffs (Because “Accept All” Is How Legends Get Fired)
Big AI-generated commits are a trap. They feel productive and ship-shaped, but they hide bugs like a toddler hides broccoli.
We enforced a hard rule: no change bigger than what a human can review in 15–20 minutes.
Practically, that meant:
- One feature per PR.
- One responsibility per file (when possible).
- “No refactors plus new features” (AI loves bundling those like it’s making a burrito).
The AI didn’t mind. It just needed the constraint spelled out, like a golden retriever that codes.
3) We Used Trunk-Based Development + Feature Flags (So Shipping Didn’t Mean Releasing)
Vibe coding increases iteration speed, which makes long-lived branches even more painful.
We moved fast by merging small changes frequently (trunk-based development) and hiding incomplete work behind feature flags.
Feature flags became our “seatbelt”:
- Deploy code continuously.
- Release gradually (internal users → small % → everyone).
- Kill switch instantly if something starts smoking.
The key discipline: flags are either short-lived (remove them) or intentionally permanent (document them). Otherwise, you’re just
building a museum of dead toggles.
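As a sketch of the seatbelt: a tiny percentage-rollout flag check with a kill switch, assuming an in-process flag table rather than a real flag service (flag and user names are illustrative):

```python
# Minimal feature-flag sketch: allowlist for internal users, stable
# percentage bucketing for everyone else, and an instant kill switch.
import hashlib

FLAGS = {
    "new_report_ui": {"enabled": True, "rollout_pct": 10, "allow_users": {"internal-qa"}},
}

def flag_on(name, user_id):
    flag = FLAGS.get(name)
    if not flag or not flag["enabled"]:   # kill switch: flip "enabled" to False
        return False
    if user_id in flag["allow_users"]:    # internal users see it first
        return True
    # Stable hash bucketing: the same user always lands in the same bucket,
    # so raising rollout_pct only ever adds users, never flip-flops them.
    bucket = int(hashlib.sha256(f"{name}:{user_id}".encode()).hexdigest(), 16) % 100
    return bucket < flag["rollout_pct"]
```

Deterministic bucketing is what makes “internal users → small % → everyone” a ratchet instead of a lottery.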
4) We Put Tests Where the AI Was Weak (And Let It Excel Where It Was Strong)
AI is great at generating lots of plausible code. It is less great at predicting your weirdest edge cases.
So we leaned on a test “pyramid” mindset:
- Many unit tests for pure logic (cheap, fast, catches dumb mistakes).
- Some integration tests for APIs + database behavior (where “it works on my machine” goes to die).
- Few end-to-end tests for critical paths (login, payment, core workflow).
We also used AI to generate test scaffolding and test cases, but we wrote the “golden paths” ourselves: the tests that define the product’s promise.
If you’re vibe coding, your tests are your memory.
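Here is what a hand-written golden-path test looks like in practice; the discount function is a made-up stand-in for “the product’s promise”:

```python
# A golden-path test we write ourselves, never delegate to the AI.
def apply_discount(total_cents, code):
    """Return the discounted total; unknown codes change nothing."""
    rates = {"LAUNCH10": 0.10, "PARTNER25": 0.25}
    rate = rates.get(code, 0.0)
    return total_cents - int(total_cents * rate)

def test_golden_paths():
    assert apply_discount(10_000, "LAUNCH10") == 9_000   # the promise
    assert apply_discount(10_000, "bogus") == 10_000     # unknown code is a no-op
    assert apply_discount(0, "PARTNER25") == 0           # empty cart stays empty

test_golden_paths()
```

If an AI refactor ever changes what an unknown code does, this test is the memory that says so.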
5) We Measured Delivery Like Adults (DORA Metrics, Not Vibes)
We started tracking a few delivery metrics that correlate with high-performing teams:
deployment frequency, lead time for changes, change failure rate, and time to restore service.
Here’s why this matters for vibe coding: when your throughput increases, you need a speedometer and brakes.
If change failure rate spikes while deployment frequency rises, the AI isn’t “making you faster”; it’s making you faster at breaking things.
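Two of those metrics can be computed from nothing more than a deploy log. A sketch with a hypothetical record shape:

```python
# Sketch: change failure rate and deployment frequency from a deploy log.
from datetime import date

deploys = [
    {"day": date(2024, 5, 1), "caused_incident": False},
    {"day": date(2024, 5, 2), "caused_incident": True},
    {"day": date(2024, 5, 2), "caused_incident": False},
    {"day": date(2024, 5, 3), "caused_incident": False},
]

deploy_frequency = len(deploys) / 7  # deploys per day over a one-week window
failure_rate = sum(d["caused_incident"] for d in deploys) / len(deploys)

# Watch both together: frequency up AND failure rate up means the AI is
# helping you break things faster, not ship faster.
```

The speedometer is trivial to build; the discipline is recording `caused_incident` honestly when you restore service.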
6) Observability Became a Feature, Not a Nice-to-Have
Our first near-disaster wasn’t caused by a giant bug. It was caused by a tiny bug we couldn’t see.
That’s when we learned:
if you can’t observe it, you can’t own it.
For every app, we added:
- Structured logs (request ID, user ID, error codes, key events).
- Metrics (latency, error rates, queue depth, external API failures).
- Tracing for slow requests (so we could find “the one call” that was ruining everyone’s day).
- Alerts that page you for user pain, not for meaningless noise.
AI can generate instrumentation quickly, if you specify what you want to measure and why.
Otherwise it will happily log “something happened” 4 million times and call it a day.
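A minimal sketch of the structured-logging convention, assuming one JSON line per event; the field names are our own, not a standard:

```python
# Every event carries an explicit name and a request ID, so logs are
# queryable instead of "something happened" four million times.
import json
import logging
import uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("app")

def log_event(event, request_id, **fields):
    """Emit (and return) one JSON log line per event."""
    line = json.dumps({"event": event, "request_id": request_id, **fields})
    log.info(line)
    return line

request_id = str(uuid.uuid4())
log_event("report.generated", request_id, user_id="u42", latency_ms=183)
```

The request ID threaded through every line is what later lets you find “the one call” without tracing infrastructure.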
7) Security: We Used Real Frameworks (Because “Seems Secure” Is Not a Control)
Vibe-coded apps can accidentally reinvent the greatest hits of security failures.
So we anchored our security checks to well-known practices:
- Secure development fundamentals (threat modeling, secure defaults, least privilege).
- Web app risk awareness (access control, injection, misconfiguration, insecure design patterns).
- Supply chain hygiene (dependency updates, vulnerability scanning, SBOM thinking).
We also did one painfully simple thing that saved us multiple times:
we wrote an “auth rules” document in plain English, then forced every endpoint and UI action to map to it.
If the AI generated code that violated the rules, tests failed, or reviewers did.
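The “auth rules” document can be enforced mechanically. A sketch, with illustrative endpoint and rule names:

```python
# The plain-English auth doc becomes a table; a test fails if any
# registered endpoint is missing a rule.
AUTH_RULES = {
    "GET /reports": "any_authenticated_user",
    "POST /reports": "editor_or_above",
    "DELETE /reports/{id}": "admin_only",
}

REGISTERED_ENDPOINTS = ["GET /reports", "POST /reports", "DELETE /reports/{id}"]

def unmapped_endpoints():
    """Every endpoint must map to a documented auth rule."""
    return [ep for ep in REGISTERED_ENDPOINTS if ep not in AUTH_RULES]

assert unmapped_endpoints() == [], f"missing auth rules: {unmapped_endpoints()}"
```

In a real app you would pull `REGISTERED_ENDPOINTS` from the router at test time, so a newly generated endpoint cannot ship unmapped.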
What Nearly Killed Us (And How We Survived)
1) Prompt Drift: The App Slowly Became a Different App
Over dozens of prompts, the AI “helpfully” optimized things… in directions we didn’t intend.
A settings page became a mini admin console. A simple report became a complex query builder.
The scope creep didn’t happen in a meeting. It happened in generated code.
Fix: we maintained a living product brief (requirements, non-requirements, data boundaries) and pasted the key section into every major prompt.
Repetition is annoying. So is rewriting your app.
2) Hallucinated Integrations (The API Endpoint That Never Existed)
AI will confidently call endpoints that look reasonable. Sometimes those endpoints are imaginary.
You find out later when production is returning “404” like it’s a personality trait.
Fix: contract tests, mocked staging, and a rule: “No external call without a documented contract and retries/timeouts.”
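The retries/timeouts half of that rule can live in one small wrapper. A sketch with a stand-in flaky call in place of a real client:

```python
# Retry with exponential backoff; re-raise after the final attempt so
# failures are loud, not silently swallowed.
import time

def call_with_retries(fn, attempts=3, base_delay=0.05):
    """Run fn, retrying on failure with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

calls = {"n": 0}
def flaky_fetch():
    """Stand-in for an external API call that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return {"status": 200}

result = call_with_retries(flaky_fetch, base_delay=0.01)
```

The wrapper is deliberately dumb; the rule it enforces (no bare external call) is what saved us, not the backoff math.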
3) Data Migrations: AI Is Brave, Databases Are Not
On App #3, the AI proposed a schema change that looked elegant… and would have been a disaster at scale.
Production databases don’t care about elegance. They care about locks, backfills, and not ruining your weekend.
Fix: we adopted a boring migration checklist:
- Backwards-compatible changes first (add columns before removing).
- Write code that supports old + new schema during rollout.
- Backfill in batches with monitoring.
- Only then remove old fields.
Boring is the point.
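The backfill step looks roughly like this; SQLite stands in for the production database, and the table is illustrative:

```python
# Batched backfill: copy data into the new column in small chunks instead
# of one giant UPDATE that locks the table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, display_name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [(f"user{i}",) for i in range(10)])

BATCH = 3
while True:
    rows = conn.execute(
        "SELECT id, name FROM users WHERE display_name IS NULL LIMIT ?", (BATCH,)
    ).fetchall()
    if not rows:
        break
    conn.executemany(
        "UPDATE users SET display_name = ? WHERE id = ?", [(n, i) for i, n in rows]
    )
    conn.commit()  # small transactions keep lock times short

remaining = conn.execute(
    "SELECT COUNT(*) FROM users WHERE display_name IS NULL"
).fetchone()[0]
```

In production you would also sleep between batches and watch replication lag; the shape of the loop stays the same.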
4) Secrets and “Helpful” Logging
AI loves logging. It also loves logging the wrong things.
At one point we caught a debug log line that would’ve exposed a token in plain text.
That’s the sort of bug that doesn’t crash your app; it crashes your reputation.
Fix: a redaction layer, lint rules for common secret patterns, and a strict “no secrets in logs” test.
Plus: we stopped passing secrets through the AI toolchain and used proper secret managers/environment injection.
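The redaction layer can be as small as a list of patterns applied before each line is written. A sketch; the patterns are illustrative, not exhaustive:

```python
# Scrub common secret shapes from a log message before it is emitted.
import re

SECRET_PATTERNS = [
    re.compile(r"(?i)(token|api[_-]?key|password)\s*[=:]\s*\S+"),
    re.compile(r"Bearer\s+[A-Za-z0-9._-]+"),
]

def redact(message):
    """Replace anything matching a known secret pattern with [REDACTED]."""
    for pattern in SECRET_PATTERNS:
        message = pattern.sub("[REDACTED]", message)
    return message

line = redact("retrying with token=sk_live_abc123 for user u42")
```

Pattern matching is a backstop, not a guarantee, which is why the “no secrets in logs” test and the secret manager still matter.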
5) The “Works Locally” Performance Cliff
AI-generated code can be correct and still painfully slow:
N+1 queries, unbounded loops, missing indexes, eager-loading everything because “why not?”
Fix: performance budgets and profiling in staging:
- Set a target latency for key endpoints.
- Load test the critical path.
- Add caching only after measurement (cache-first is how you store mistakes faster).
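The N+1 query is the most common of those cliffs: one lookup per row instead of one batched query per page. A sketch of the fix, using SQLite as a stand-in:

```python
# Batched lookup: fetch every needed author in a single IN (...) query
# instead of one query per post (the shape AI-generated loops tend to have).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'ada'), (2, 'grace');
    INSERT INTO posts VALUES (1, 1, 'first'), (2, 1, 'second'), (3, 2, 'third');
""")

posts = conn.execute("SELECT id, author_id, title FROM posts").fetchall()
author_ids = {a_id for _, a_id, _ in posts}
placeholders = ",".join("?" * len(author_ids))
authors = dict(conn.execute(
    f"SELECT id, name FROM authors WHERE id IN ({placeholders})", tuple(author_ids)
).fetchall())

feed = [(title, authors[a_id]) for _, a_id, title in posts]
```

Two queries total, regardless of page size; the N+1 version issues one per post and only reveals itself under production data volumes.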
The Checklist We Now Use Before Any Vibe-Coded App Hits Production
Here’s the “no drama” gate we apply every single time:
- Requirements: written spec + “not in scope” list.
- Security: auth rules mapped to endpoints + basic threat modeling pass.
- Tests: unit + integration coverage for critical logic; a few E2E smoke tests.
- CI/CD: automated build/test; deploy pipeline; rollback plan.
- Release safety: feature flags, staged rollout, kill switch.
- Observability: logs/metrics/traces + alerts tied to user pain.
- Dependencies: vulnerability scanning + automated update workflow.
- Runbook: “If X breaks, do Y” (written before it breaks, not during).
If you do only one thing from this list, do this: write the runbook. It will make Future You feel loved.
So… Should You Vibe Code to Production?
Yes, if you treat AI like a fast junior teammate:
capable, tireless, occasionally overconfident, and in desperate need of clear requirements and review.
Vibe coding shines when you:
- Need rapid iteration and can break work into small pieces.
- Have strong CI/CD and a culture of testing and review.
- Use feature flags and gradual rollouts to reduce blast radius.
- Care about observability and operational readiness from day one.
It fails when you:
- Ship massive diffs you don’t understand.
- Skip tests “just this once.”
- Ignore security basics because the app “is small.”
- Wait until users complain to learn what’s happening in production.
Bonus: 500 More Words of Hard-Won Experience (Because We Have the Scars)
After shipping the third app, we realized something awkward: the AI didn’t “replace” engineering effort. It moved the effort.
We spent less time typing and more time doing what software teams were always supposed to do: clarify, validate, and operationalize.
The vibe part is fun. The production part is a responsibility.
Here’s what changed in our day-to-day workflow once we accepted that:
We started every build with a one-page “definition of done.”
Not a novel. One page. It included the user story, the core flow, the scary edge cases, and the explicit list of what we were not building.
AI loves to be helpful. Without boundaries, it becomes “helpful” in the same way a puppy is helpful around an open bag of flour.
That one page became our anchor; when a prompt drifted, we pulled it back.
We learned to prompt for tradeoffs, not miracles.
Instead of asking, “Make it scalable,” we asked, “Given 10k daily active users, what breaks first and how do we instrument it?”
Instead of “Make it secure,” we asked, “List the top three abuse cases, then propose mitigations and tests.”
When you prompt for specifics (workload assumptions, threat scenarios, failure modes), the output becomes less magical and more engineering-grade.
We stopped treating refactors as “free.”
AI will happily rewrite large parts of your codebase because it sees a cleaner pattern. Sometimes it’s right. Often it’s not worth the risk.
In production work, “cleaner” is not the same as “safer.” We adopted a rule: refactors require a measurable reason
(performance, reliability, security, maintainability pain). Otherwise, it’s just novelty: fun until it breaks.
We made rollback boring on purpose.
We practiced rollback like a fire drill. We documented it. We timed it. We automated what we could.
This had an unexpected side effect: it made shipping less scary, which made us ship more, which made us learn faster.
A reliable undo button is the difference between confident delivery and anxious tinkering.
We changed how we do code review.
Review wasn’t about style. It was about risk:
“What’s the blast radius?” “What happens if this dependency fails?” “Are we logging anything sensitive?”
“Is this endpoint protected the way the spec says it is?” That lens made AI-generated code reviewable.
And yes, we rejected AI code. Often. The model didn’t sulk. It just generated a better version.
The biggest lesson: vibe coding can compress build time, but it can also compress the feedback loop in dangerous ways.
If you don’t slow down at the right moments (security, migrations, observability), you’ll pay for it later with interest.
Ship fast. But install guardrails faster.
