equityflow · understand the flow and follow it

Choosing your vibecoder: every AI building tool, compared

An honest, tool-by-tool tour of how to build with AI in 2026: the app builders (Lovable, Bolt, v0, Replit), the AI IDEs and agents (Cursor, Claude Code, Antigravity, Copilot, Codex, Grok), how to choose between them, and how usage limits and pricing really work.

There have never been more ways to build software with AI, and the list grows every month. They split into three families: app builders that turn a prompt into a running app, AI-native editors where you still see the code, and autonomous agents that work across a whole codebase. This guide walks through every major tool, honestly, with its strengths, its limits and roughly what it costs, then helps you choose. Prices and features move fast, so the durable advice is in the shape of each tool, not its sticker price as of mid-2026.

The app builders: from a prompt to a running app
6 chapters · ~23 min

Lovable: prompt to full-stack app

If Bolt and v0 feel like fast scaffolders, Lovable (lovable.dev) wants to be the whole studio. You describe an app in plain English, and it generates a real, running full-stack product: a React frontend, a database, authentication, and the wiring between them. It is the European breakout of the category, reportedly crossing roughly $100M in annual recurring revenue within about eight months of launch, with millions of projects created in its first year. That speed is the pitch, and mostly it delivers.

What it actually is

Under the hood, Lovable generates a TypeScript + React + Vite app. The backend is not a mystery box: it runs on Supabase (Postgres, auth, storage, and serverless Edge Functions), now wrapped as "Lovable Cloud" so non-technical users do not have to think about it. You add a login flow, a payments page, or a "save my data" feature by asking in the chat, and it provisions the Postgres tables, auth providers (email/password, magic links, Google or GitHub OAuth), and Stripe checkout for you. Two things matter here for builders who do not want a black box: native two-way GitHub sync and full code ownership. You can connect a repo, edit in your own IDE or Cursor, open pull requests, and eject entirely if you outgrow the platform. You are not locked in.

Where it genuinely shines

  • Speed from zero. Idea to a clickable, deployed app in an afternoon is normal, not aspirational. One-click deploy to a lovable.app subdomain (or your own domain) is built in.
  • Non-technical friendliness. The conversational loop is forgiving. You do not need to know what a migration is to add a "users table with a profile photo."
  • Design quality. Output leans clean and modern out of the box (Tailwind, sensible spacing, real component structure), which is why it is a favourite for landing pages and polished MVP front ends.

The honest limits

Lovable gets you roughly 60 to 70 percent of the way, and the last 30 to 40 percent is where people get burned. The consistent complaint across 2026 reviews: it stumbles on complex, nested business logic, multi-step workflows, and serious security or scalability decisions. Worse, debugging can spiral. When something breaks, the AI often regenerates whole files rather than making a surgical fix, so you hit "bug loops" where it patches one error and quietly introduces two more. The mitigation is to use its Plan/Chat mode to think before it writes, commit working states to GitHub often, and know when to drop into the actual code (or hand it to a developer) instead of re-prompting a broken build five more times.

Credits and cost, roughly (mid-2026)

Lovable bills in credits, consumed per AI message and weighted by task complexity. There is a free tier (about 5 build credits a day, capped near 30 a month). Paid plans start at approximately $25/month for Pro (around 100 monthly credits, shared across unlimited collaborators, with rollover), then Business near $50/month adding SSO and roles, then Enterprise. Treat these as approximate and time-stamped; they move. The real gotcha is burn rate: a basic MVP can easily consume 150 to 300 credits over a couple of weeks, and a debugging loop can eat credits with nothing to show. Watch consumption the way you would watch a cloud bill.
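The burn-rate arithmetic is worth doing before you start. Here is a minimal sketch in Python using only the approximate figures quoted above: about 100 monthly credits on Pro at roughly $25/month, and an MVP that consumes 150 to 300 credits. The function name is hypothetical. It also assumes, for simplicity, that extra credits cost the same as a further month of Pro and that nothing has rolled over; Lovable's real top-up pricing may differ.

```python
import math

PRO_MONTHLY_CREDITS = 100    # approximate figure from the text (mid-2026)
PRO_MONTHLY_PRICE_USD = 25   # approximate figure from the text (mid-2026)


def months_of_allowance(credits_needed: int,
                        monthly_credits: int = PRO_MONTHLY_CREDITS) -> int:
    """Whole months of plan allowance needed to cover a build."""
    return math.ceil(credits_needed / monthly_credits)


for credits in (150, 300):
    months = months_of_allowance(credits)
    print(f"{credits} credits -> {months} month(s) of Pro, "
          f"about ${months * PRO_MONTHLY_PRICE_USD}")
```

Under those assumptions, a two-week MVP uses two to three months of Pro allowance, which is why the burn rate matters more than the sticker price.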

Ideal projects, and a concrete example

Lovable is at its best for MVPs, marketing and landing pages, and internal tools or dashboards: things that need a database, a login, and a clean UI, but not bank-grade architecture. The proof is in what people ship. Lovable highlights customers like ShiftNex, a healthcare staffing product the company says reached around €1M ARR in five months, and Lumoo, an AI fashion platform cited near €800k ARR in nine months. The pattern is identical: a founder with domain knowledge and little or no engineering background gets a working, paid product in front of real users in weeks, not quarters.

Takeaway: treat Lovable as the fastest way to a real, owned full-stack MVP, not as your permanent engineering team. Sync to GitHub from day one, commit working states, watch your credit burn, and plan to bring in a developer (or your own code edits) the moment the logic gets genuinely hard.

Bolt: in-browser full-stack generation

Bolt (bolt.new, built by StackBlitz) is the most "actually a dev environment" of the prompt-to-app tools. You type what you want, and Bolt scaffolds a real project, installs real npm packages, runs a real Node.js server, and shows you a live preview. The twist: all of that happens inside your browser tab. There is no remote VM spinning up. If your laptop is the machine, the dev environment is the page you have open.

What makes it different: WebContainers

The thing under the hood is WebContainers, StackBlitz's own technology that runs a Node.js runtime compiled to WebAssembly directly in the browser. That sounds like a gimmick until you use it. It means Bolt gives you a genuine file tree, a terminal, and a package manager, not a sandboxed approximation. You can npm install a library, watch the dev server boot, and hit a real localhost-style preview, all without anything touching a backend server. Boot is near-instant and you can edit files by hand whenever the AI gets something wrong.

That architecture buys Bolt its biggest strength after raw speed: framework flexibility. Because it is just Node in a browser, Bolt is not locked to one stack. It happily generates Vite, React, Vue, Svelte, Astro, Next.js, or Remix projects, and you can pull in arbitrary npm dependencies. Most of its rivals are React-and-only-React. For deployment, Bolt offers one-click publishing to Netlify, and since the output is a standard Node project you can also push it to GitHub and deploy it anywhere yourself. Database needs are typically wired up through a Supabase integration.

The catch: token burn

Bolt runs on a token model, and this is the single most important thing to understand before you start. Roughly, as of mid-2026: a Free tier with about 1M tokens a month, a Pro plan around $25/month with roughly 10M tokens plus rollover, and Teams and Enterprise tiers above that. Treat those numbers as approximate and time-stamped; the tiers move.

Tokens are not consumed per app, they are consumed per message, and the cost of a message scales with the size of your project. Bolt tends to re-read and re-sync your file tree to stay in context, so a prompt that costs little on a three-file scaffold can cost a lot on a forty-file app. The failure mode that burns founders is the error loop: something breaks, you hit "fix it," the model patches a symptom instead of the cause, that introduces a new break, and you repeat. Six or eight rounds in, the app is in worse shape and a meaningful chunk of your monthly tokens is gone. The lesson is not "Bolt is bad." It is that Bolt rewards small, well-scoped projects and punishes letting a confused model flail on a large one.
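A toy model makes the scaling visible. The sketch below assumes, purely for illustration, that each message re-sends a fixed number of tokens per project file plus a fixed overhead for the prompt and reply. Both constants are invented for the example and are not Bolt's real accounting.

```python
TOKENS_PER_FILE = 1_500       # hypothetical average tokens per file re-read
OVERHEAD_PER_MESSAGE = 2_000  # hypothetical prompt + reply overhead


def message_cost(file_count: int) -> int:
    """Estimated tokens for one message on a project of `file_count` files."""
    return file_count * TOKENS_PER_FILE + OVERHEAD_PER_MESSAGE


def loop_cost(file_count: int, rounds: int) -> int:
    """Estimated tokens for `rounds` consecutive "fix it" messages."""
    return rounds * message_cost(file_count)


small = loop_cost(file_count=3, rounds=8)
large = loop_cost(file_count=40, rounds=8)
print(f"8 fix rounds, 3 files:  {small:,} tokens")
print(f"8 fix rounds, 40 files: {large:,} tokens")
```

With these made-up constants, the same eight-round error loop costs 52,000 tokens on the three-file scaffold and 496,000 on the forty-file app, about half of a roughly 1M-token free tier. The exact numbers are not the point. What matters is that loop cost grows with project size even when the prompts stay the same length.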

A concrete example

Say you want a tiny internal tool: a Vite + React dashboard that pulls rows from a Supabase table and renders a sortable table with a CSV export button. In Bolt this is a sweet spot. One prompt scaffolds the Vite project, installs the Supabase client and a table library, and gives you a running preview in under a minute. You connect Supabase, paste your keys, and iterate on the UI in two or three follow-ups. Twenty minutes later it is live on Netlify. Now contrast the wrong job: a multi-role SaaS with billing, background jobs, and a dozen interlinked screens. That is where the file tree grows, each message gets expensive, and an architectural bug you cannot see quietly drains tokens.

Bolt vs Lovable

They look similar and price similarly (both around $25/month for Pro), but they aim at different people. Lovable is design-first and chat-first, with deep Supabase integration and a structured, opinionated path that suits non-technical builders shipping a polished MVP. Bolt is code-first: a real IDE in the browser, a visible terminal and file tree, broad framework choice, and more low-level control. The pricing split matters too. Lovable meters by message credits, Bolt meters by tokens, so Lovable's cost is more predictable while Bolt's can spike on big projects or debugging spirals.

Practical takeaway:

  • Reach for Bolt when you want a real, editable Node project and framework freedom, and your scope is small to medium.
  • Watch the token meter; if you hit three failed "fix" attempts, stop and edit the code or re-prompt from a clean state instead of looping.
  • Prefer Lovable if you want predictable credit-based cost and a guided, design-led build over hands-on control.

v0 by Vercel: design-first generation

If most of the tools in this guide start from "what should this app do," v0 (at v0.app) starts from "what should this thing look like." Vercel built it, the same company behind Next.js and the hosting platform a huge slice of the modern web runs on, and that lineage shows in everything it produces. v0 is the tool you reach for when the interface is the point: a landing page, a marketing site, a dashboard, a component you want to drop into an existing React app.

What it actually generates

You describe a UI in plain English, or you paste a screenshot, a Figma frame, or a rough sketch, and v0 returns working front-end code. The output is opinionated in a good way: React, Next.js, Tailwind CSS, and shadcn/ui components. That is not a random stack. It is the default stack of a large chunk of the professional front-end world in 2026, which means the code v0 hands you is code a real engineer can read, fork, and ship without rewriting from scratch. Compare that to tools that generate a proprietary or locked-in format, and the difference matters the day you outgrow the generator.

The image-to-code path is v0's standout trick. Drop in a screenshot of a competitor's pricing page, or a Figma export of your designer's mockup, and it reconstructs an editable, reasonably faithful version in code. It is usually faster than describing a layout in words, and for anyone working from a designer's comps it closes the handoff gap that normally eats days. There is also a Design Mode: a point-and-click panel for nudging spacing, colors, and typography, where your visual edits write back into the underlying code instead of drifting out of sync with it.

Where it shines, and where it stops

v0's front-end quality is genuinely strong, arguably the best in this category for polished, on-brand UI on the first try. Through 2025 and into 2026 Vercel pushed it past pure prototyping: a VS Code-style editor, Git integration with branches and PRs, GitHub sync, and one-click deploy to Vercel's own infrastructure. The 2026 updates also added some backend reach (database connections, including enterprise sources like Snowflake and AWS, plus early agentic workflows), so it is no longer strictly a UI toy.

But be honest about the shape of the tool. v0 is front-end first, and its center of gravity is still the interface. Auth, complex business logic, background jobs, a real data model, third-party integrations: you are wiring most of that yourself or leaning on Vercel's ecosystem to do it. If your project is 80% backend and 20% screens, v0 is the wrong starting point. If it is 80% screens, you will move faster here than almost anywhere else.

Pricing, roughly

As of early 2026, v0 runs on a credit model rather than flat feature tiers. The free plan gives you about $5 of monthly credits (enough to explore, not to build a real project, since a single component generation can cost roughly $0.30 to $2.00 depending on complexity and which model handles it). The paid Premium tier is around $20/month with $20 of included credits and the ability to buy more, plus Figma import and API access. Team plans sit near $30/user/month, Business around $100/user/month, with Enterprise custom. Treat every number here as approximate and time-stamped: credit pricing shifts, and credits reset monthly without rolling over.
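To see why the free allowance is for exploring rather than building, divide the included credits by the per-generation cost range quoted above. This is a rough sketch: the function name is hypothetical, the figures are the approximate ones from this section, and amounts are in cents to avoid floating-point rounding.

```python
def generations_range(credit_cents: int,
                      min_cost_cents: int = 30,
                      max_cost_cents: int = 200) -> tuple[int, int]:
    """Return (worst case, best case) generations a credit balance can buy."""
    worst = credit_cents // max_cost_cents  # every generation is expensive
    best = credit_cents // min_cost_cents   # every generation is cheap
    return worst, best


print("Free ($5):    ", generations_range(500))
print("Premium ($20):", generations_range(2000))
```

On these figures the free plan covers between 2 and 16 component generations a month, and Premium's included credits cover between 10 and 66 before you buy more.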

A concrete example

Say you are launching a waitlist page for a fintech side project. You paste a screenshot of a layout you liked, type "make it ours: dark theme, our logo, a single email field, a three-card feature row, and an FAQ accordion at the bottom," and v0 returns a Next.js page using shadcn components. You open Design Mode, tighten the card spacing, swap the accent color to match your brand, then push it to a GitHub branch and deploy to Vercel in one click. Twenty minutes, live URL. What you do not get for free is the part that stores the email: you still wire the form to a backend, a database, or a service like a Vercel function plus a hosted DB. That split (gorgeous front end fast, plumbing on you) is the whole personality of v0 in one task.

Takeaway: reach for v0 when the UI is the hard part.

  • Best for: landing pages, marketing sites, dashboards, prototyping interfaces, and single components for an existing React/Next.js codebase.
  • Skip it for: backend-heavy apps where the screens are an afterthought.
  • Biggest advantage: it emits the real Next.js, Tailwind, and shadcn/ui stack, so the output survives the move into a production repo instead of trapping you.

Replit: build, run, and deploy in one place

Most tools in this series do one thing well: Cursor edits code, v0 designs UI, Lovable spins up a frontend. Replit tries to be the whole stack. It is a browser-based IDE, a runtime, a Postgres database, an auth system, and a hosting platform, all stitched together so that you can go from a one-line prompt to a live URL without ever opening a terminal on your own machine. For a founder who does not want to think about Docker, environment variables, or where the database lives, that all-in-one quality is the entire pitch.

What Replit Agent actually does

The headline product is Replit Agent. You describe an app in plain English, and the Agent plans the work, writes the code, provisions a database, wires up authentication, connects third-party services, and deploys, all inside the browser. The current generation (Agent 3, released September 2025) can run autonomously for long stretches, up to roughly 200 minutes per session, building, testing, and self-correcting with minimal hand-holding. It will click its own buttons, submit its own forms, and even log into an app it built with Replit Auth to verify the login flow works, then report back and fix what broke.

The backend defaults are sensible. When the Agent needs storage, it provisions a Postgres database in the container rather than asking you to pick one. Auth is a built-in module (Replit Auth) rather than a third-party integration you have to configure. When it connects to something like Notion or Dropbox, it surfaces a small auth UI instead of asking you to paste API keys into a file. This is the difference between "AI that writes code" and "AI that ships a working product," and it is genuinely useful for non-engineers.

A concrete example

Say you want an internal tool: a form where your team logs client meetings, with login, a searchable table, and CSV export. In Replit you prompt the Agent with exactly that sentence. It scaffolds a full-stack app, creates a Postgres table for the meetings, adds Replit Auth so only your team can log in, builds the form and the table view, tests the flow, and gives you a one-click deploy button. Twenty minutes later you have a real URL you can send to a colleague. No local setup, no separate database account, no Vercel config. That workflow is hard to match elsewhere.

The trade-offs: cost and performance

The all-in-one model has two real costs. The first is literal. Replit moved to effort-based pricing, where you pay for the time and compute each Agent task consumes rather than a flat per-message fee. Simple edits are cheap (often under $0.25), but a large, complex feature is bundled into one "checkpoint" that can run well over a dollar, and a hard multi-step build can cost several dollars in one go. Agent runs add up fast when you iterate, and because the Agent works autonomously, a single misguided 200-minute session can burn real money before you catch it.

The second cost is performance and control. You are working in a browser-based container, so heavy builds and large codebases feel slower than a local Cursor setup, and you have less fine-grained control than an engineer used to their own toolchain. Replit is excellent at getting to a working v1; it is less comfortable as the home for a mature, performance-sensitive codebase with a team of senior engineers.

Pricing, roughly (as of mid-2026)

  • Starter (free): a genuine test drive. You can learn the platform and prototype, but apps are public and Agent usage is limited.
  • Core (around $20/month): full Agent access, roughly $20 to $25 in monthly usage credits, unlimited published apps, and a handful of collaborators.
  • Pro (around $95 to $100/month): larger credit allowance with rollover, more powerful Agent modes, more builders, priority support.
  • Deployments: static (frontend-only) hosting is free; scheduled and autoscale deployments start around $1/month; a reserved always-on VM starts around $20/month.

Treat these as approximate and time-stamped: Replit changes plans and credit math regularly, so check the live pricing page before you commit.

Who it is for

Replit is the strongest pick when you value one environment over best-in-class in any single layer. It shines for learners (you can see and run real code without local setup), for non-technical founders shipping internal tools and MVPs, and for collaborative work where multiple people edit the same project live in the browser.

Takeaway: use Replit when "build, host, and deploy in one click, in the browser" is worth more to you than raw speed or low-level control. Watch your Agent spend like a hawk, and set the credit budget before you let a long autonomous run loose.

The wider builder landscape, and how to judge a new one

The big four (Lovable, Bolt, v0, Replit, give or take your own list) get the attention, but they are a fraction of the field. New AI app builders launch roughly every week, most ride one of three or four underlying models, and a meaningful share are thin wrappers: a chat box, a prompt template, and someone else's API. The skill that ages well is not memorising names. It is knowing how to take any new tool apart in ten minutes.

A quick map of the wider field

Beyond the headline players, a few names are worth knowing as of mid-2026:

  • Base44: an AI app builder acquired by Wix in June 2025 for roughly $80M up front plus earn-outs. Conversational, opinionated, handles its own database and deployment. Fast for non-technical founders, but now part of Wix's "vibe coding" line, which shapes where your app lives.
  • Google Firebase Studio: Google's full-stack, Gemini-powered builder. Note the churn: Google has been consolidating this toward Google AI Studio and its Antigravity agent, with documented migration paths (zip-and-download or export to GitHub). Good code export, but the product surface itself moved within a year, which is exactly the kind of platform risk to weigh.
  • Create, Tempo, and the weekly newcomers: prompt-to-app tools aimed at speed. Some are genuinely capable; some are a UI over a single foundation model. The label on the box rarely tells you which.
  • Softr and Glide (no-code): a different category. These build apps on top of existing data (Airtable, Notion, a database) with auth, payments, and views bolted on. Glide, by its own positioning, does not export code at all. That is fine for an internal tool or a portal, and a dead end if you ever need a real engineer to take the wheel.

The rubric: how to judge any new builder

When the next "this changes everything" tool shows up, run it through these questions before you trust it with anything that matters:

  1. Does it export real code you own? Can you get a working repository out, or are you renting access to your own app? "GitHub export" should mean a project that builds and runs locally, not a snapshot you cannot rebuild.
  2. What is the stack, and is it boring on purpose? React, a normal backend, Postgres, all readable. If the generated code is some proprietary DSL or a framework only that tool understands, every future developer pays a tax.
  3. Where does the data live, and can you take it? Whose database, whose cloud account, and can you export the data and point the app at infrastructure you control? Data lock-in is worse than code lock-in because it is harder to undo.
  4. Lock-in on hosting and runtime. Does the app only run inside their platform, or can you deploy it to Vercel, Fly, a VPS, anywhere? "Works only on our servers" is a business model, not a feature.
  5. Can a real dev take over later? The honest test: hand the output to an engineer you trust and ask, "could you maintain this without the tool?" If the answer is no, you have bought a prototype, not a product.
  6. Is it a thin wrapper? Check which model it runs, what it actually adds (agents, deploys, a real backend, testing), and whether the same result is a weekend's work on the underlying model directly. Wrappers are not worthless, but you should not pay platform prices for one.

A concrete example

Say a founder builds a client portal. On Glide it ships in a day, looks clean, and runs on Glide's data and servers with no code to export. Perfect for a 20-person internal tool. The same portal in Lovable comes out as a React app with a Supabase backend, exportable to GitHub, deployable anywhere, and handable to a contractor in month six. Same surface, very different exit options. Neither is "wrong"; the mistake is choosing the first when your roadmap clearly needs the second.

Practical takeaway: match the tool to the lifespan of the thing you are building. Throwaway or internal, optimise for speed and ignore lock-in. Anything you expect to still run and grow in a year, demand code you own, a boring stack, portable data, and a clean handoff to a human engineer. If a shiny new builder fails three or more rubric questions, it is a prototyping toy, and you should treat it as one.
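The rubric and the "three or more failures" rule can be written down as a small checklist function. This is a sketch only: the question labels are shortened paraphrases of the six questions above, and the function name and the two verdict strings are hypothetical.

```python
RUBRIC = (
    "exports real code you own",
    "boring, readable stack",
    "portable data",
    "deployable outside the platform",
    "maintainable by a developer without the tool",
    "adds real value beyond a thin wrapper",
)


def judge(answers: dict[str, bool]) -> str:
    """Classify a builder from pass/fail answers to the six rubric questions."""
    missing = [q for q in RUBRIC if q not in answers]
    if missing:
        raise ValueError(f"unanswered rubric questions: {missing}")
    failures = sum(1 for q in RUBRIC if not answers[q])
    # Three or more failed questions: treat the tool as a prototyping toy.
    if failures >= 3:
        return "prototyping toy"
    return "candidate for long-lived work"


# A hypothetical builder with no code export, no data export, platform-only hosting.
example = {q: True for q in RUBRIC}
example["exports real code you own"] = False
example["portable data"] = False
example["deployable outside the platform"] = False
print(judge(example))
```

A "prototyping toy" verdict is not a reason to avoid the tool. It is a reason to keep anything long-lived out of it.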

The builder trap: the 80% demo and the 20% that breaks

There is a specific feeling the first time a prompt-to-app tool builds you a working product in four minutes. You type "a marketplace for vintage synths with listings, search, and Stripe checkout," and Lovable or Bolt hands back something that runs, looks decent, and demos beautifully on your phone in the coffee queue. It feels like cheating. It feels like the future arrived early. That feeling is real, and it is also the start of the trap.

Why the demo is the easy 80%

AI coding tools are extraordinary at the happy path: the screens, the layout, the obvious flows, the parts that look like a thousand other apps in the training data. That is most of what you see, which is why a demo feels about 80% finished. Measured in code, though, the demo is closer to 20% of a real product. The remaining work, the part you do not see in a screen recording, is error handling, input validation, edge cases, authorization, concurrency, monitoring, and the quiet plumbing that keeps data correct. The model optimizes for the path where everything goes right, because that is the path that makes a convincing demo.

The gap shows up first, and most painfully, in three places:

  • Auth. Generated auth handles login and signup. It tends to miss session expiry mid-form, OAuth state validation, magic-link race conditions, password reset, and MFA. Real users hit these in week one.
  • Payments. A common failure: a Stripe webhook that does not verify signatures and has no retry handling. One widely-cited write-up describes a build where roughly 15% of successful charges never got recorded in the database. The customer was charged; your app never knew. That is not a bug, that is a refund queue and a trust problem.
  • Data integrity. No unique constraints, no transactions, no idempotency. The demo works because one tester clicks slowly. Production has two users clicking the same button at the same second.
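To make the payments point concrete, here is a minimal sketch of the check a generated webhook handler often skips. It follows Stripe's documented signing scheme as I understand it: the Stripe-Signature header carries a timestamp (t=) and one or more v1= signatures, each an HMAC-SHA256 of "timestamp.payload" keyed with the endpoint secret. The function name and the 300-second tolerance default are assumptions for illustration. In a real integration the official Stripe SDK's webhook helper is the safer choice.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_stripe_signature(payload: bytes, sig_header: str, secret: str,
                            tolerance_s: int = 300,
                            now: Optional[float] = None) -> bool:
    """Return True if `sig_header` holds a valid, recent signature of `payload`."""
    timestamp = None
    candidates = []
    for item in sig_header.split(","):
        key, _, value = item.strip().partition("=")
        if key == "t":
            timestamp = value
        elif key == "v1":
            candidates.append(value)
    if timestamp is None or not timestamp.isdigit() or not candidates:
        return False

    # Reject stale events so a captured request cannot be replayed later.
    current = time.time() if now is None else now
    if abs(current - int(timestamp)) > tolerance_s:
        return False

    signed_payload = timestamp.encode() + b"." + payload
    expected = hmac.new(secret.encode(), signed_payload,
                        hashlib.sha256).hexdigest()
    # compare_digest is constant-time, so it does not leak partial matches.
    return any(hmac.compare_digest(expected.encode(), c.encode())
               for c in candidates)
```

The handler should run this check on the raw request body, before any JSON parsing, and return an error status on failure so that nothing unverified reaches the database.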

None of this is exotic. It is predictable, and that is the point: the last 20% is where projects stall, because it is the part the demo trained you to assume was already done.
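The data-integrity gap is similarly small to close once you know to look for it. The sketch below uses Python's built-in sqlite3 module and an in-memory database; the table and function names are hypothetical. The idea is to let the database enforce uniqueness on the payment event ID, so a retried webhook or a double click cannot record the same charge twice.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a uniqueness guarantee on event_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE payments ("
        " event_id TEXT PRIMARY KEY,"      # uniqueness enforced by the database
        " customer TEXT NOT NULL,"
        " amount_cents INTEGER NOT NULL)"
    )
    return conn


def record_payment(conn: sqlite3.Connection, event_id: str,
                   customer: str, amount_cents: int) -> bool:
    """Insert a payment once; return False if this event was already recorded."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "INSERT OR IGNORE INTO payments (event_id, customer, amount_cents)"
            " VALUES (?, ?, ?)",
            (event_id, customer, amount_cents),
        )
    return cur.rowcount == 1


conn = open_db()
print(record_payment(conn, "evt_1", "alice", 1999))  # first delivery
print(record_payment(conn, "evt_1", "alice", 1999))  # retried delivery
```

The first call records the payment and the retry is ignored, so the handler can safely acknowledge both deliveries. The same pattern applies in Postgres with a unique constraint and `ON CONFLICT DO NOTHING`.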

Debugging code you did not write

When the 80% demo cracks, you hit the second wall: you have to fix code you have never read. With a hand-built app, you debug your own mental model. With a generated app, you are reverse-engineering an unfamiliar codebase under pressure, and the same AI that wrote the bug is often confidently wrong about where it lives. You ask it to fix the checkout, it rewrites three files, the symptom moves, and now two things are broken. This is the moment many non-technical builders quietly give up, or start paying someone to untangle it.

Lock-in, and why "own your code" is the whole game

This is where the platform you chose stops being a convenience and becomes a constraint. The single most important question to ask before you invest weeks in any prompt-to-app tool: can I export real, standard code and walk away?

The answer varies a lot, as of mid-2026:

  • v0 (v0.app) outputs standard React/Next.js you can drop into your own repo. Low lock-in.
  • Lovable (lovable.dev) offers two-way GitHub sync on paid plans: React, Tailwind, your Supabase config, the real folder structure. You can edit in a normal IDE and sync back.
  • Replit (replit.com) has real Git and a persistent filesystem, but tends to tie your database, auth, and hosting to its platform. Convenient, stickier.
  • Bolt (bolt.new) can push to GitHub, but being browser-first it is generally the most awkward to fully migrate off.

The practical rule: if a tool cannot give you a clean Git repository you control, treat anything you build in it as a prototype, not a foundation. Connect it to GitHub on day one, before you have anything worth losing. Owning the code is what turns "I am trapped in a tool" into "I can hire any engineer to take it from here."

The signal to graduate

Prompt-to-app tools are the right tool for validation, internal tooling, and the first version. The skill is knowing when you have outgrown them. Move to a real IDE (with an agent like Cursor or Claude Code, covered in the chapters that follow), or bring in an engineer, when you see:

  • You are taking money, storing personal data, or anything where a bug means liability, not just embarrassment.
  • You spend more time fighting regenerated code than moving forward, and fixes keep breaking other things.
  • The same edge cases keep reappearing because nobody owns the architecture.
  • You need things the platform hides: migrations, tests, background jobs, observability, a staging environment.

Hitting that wall is not failure. It is graduation. The demo got you to "people want this." The next 20% is how you keep them. Parts III and V are about exactly that handoff: taking the code you own into a real environment and making it survive contact with real users.

The AI IDEs and coding agents
7 chapters · ~26 min

Cursor: the AI-first IDE

If the browser-based builders in the previous chapters are about getting a working app without touching code, Cursor (cursor.com) is the opposite bet. It is a code editor first, an AI tool second, and it assumes you can read what the model writes. That assumption is the whole reason it has become the default workspace for so many working engineers. It is a fork of VS Code, so your extensions, keybindings, themes, and muscle memory carry over on day one, and the AI is woven through the editor rather than bolted on as a chat sidebar.

The four things that matter

Cursor has a lot of surface area, but four features do most of the work:

  • Tab. This is Cursor's autocomplete, and it is the feature people miss most when they leave. It does not just finish the current line, it predicts your next edit and the location of that edit, so you press Tab, jump down twelve lines, press Tab again, and watch a rename or refactor ripple through a file. After an hour you stop thinking about it.
  • Agent (formerly Composer). You describe a change in natural language and the agent reads the relevant files, writes code across multiple files, runs terminal commands, reads the output, and fixes its own errors in a loop. It shows you diffs you can accept or reject per file. As of mid-2026 Cursor also ships its own in-house model, Composer, tuned for low-latency agentic edits and noticeably faster than routing every step through a frontier model.
  • Cursor Rules. A .cursor/rules/ folder of version-controlled instruction files ("always use our logger, never console.log", "API routes live in src/api", "we use Tailwind not styled-components"). These get fed to the model so it matches your conventions instead of inventing its own. Because rules are file-scoped, they load only when relevant and keep the prompt lean as the set grows.
  • MCP support. Native Model Context Protocol support lets the agent reach external tools and data: query your Postgres schema, hit a Linear or GitHub server, pull live docs. This is how you stop the model guessing about your database and let it read the real thing.

A concrete workflow

Say you need to add a "downgrade plan" flow to a SaaS billing page. In Cursor you open the agent, point it at the existing upgrade code with @ file mentions, and write: "Add a downgrade path mirroring the upgrade flow in @billing/upgrade.ts, including the Stripe call, a confirmation modal, and a unit test." The agent reads the upgrade file, writes the new handler, the React modal, and a test, then runs the test suite in the integrated terminal. The first run fails on a type error; it reads the stack trace and patches it without you asking. You review three diffs, reject the modal styling because it ignored your design tokens, add a Cursor Rule pointing at your token file, and re-run. Five minutes, and crucially you saw every line before it landed.

Strengths and limits

The strength is flow. For someone who can read code, Cursor is the fastest way to stay in the loop while the AI does the typing, and the per-file diff review is a genuine safety rail against the "it changed forty files and I have no idea what happened" problem that plagues more autonomous tools.

The limits are real. You must be able to read code, or you are accepting diffs on faith, which is exactly how people ship subtle bugs and security holes. Usage is metered: heavy agent use burns through credits, and a long multi-step agent run costs far more than a single Tab completion. Budget accordingly.

Pricing, approximate, as of mid-2026

  • Hobby: free, with limited Tab and agent requests.
  • Pro: around $20/month: unlimited Tab, generous agent limits, all frontier models, MCP, and a monthly credit pool for premium requests.
  • Pro+ / Ultra: roughly $60 and $200/month for multiples more usage.
  • Teams: about $40/user/month with shared rules, SSO, and admin controls.

Treat these as a snapshot; Cursor's plans and the underlying credit math change often, so check the live pricing page before committing a team.

Takeaway: Cursor leads when a competent reader of code wants AI to accelerate real work in a real repo. If you cannot yet evaluate a diff, start with a builder from the earlier chapters and graduate to Cursor once you can.

Claude Code: the terminal agent

Most of the tools in this series put a visual layer between you and the work. Claude Code, from Anthropic, does the opposite. It lives in your terminal, reads and edits your actual repository, runs your actual commands, and treats the whole codebase as fair game. It is less "AI feature inside an editor" and more "an agent you hand the keys to." That framing is the single most useful thing to understand before you start.

What it actually is

Claude Code is an agentic coding tool that you launch from the command line in your project folder. Instead of you selecting which files to give it as context, it uses agentic search to crawl the codebase itself, then makes coordinated changes across multiple files, runs tests, reads the failures, and tries again. It asks permission before editing files or running commands, which is the seatbelt that keeps it from going off a cliff unsupervised.

As of mid-2026 it runs on Anthropic's Claude 4.x family: Opus (currently 4.8, the heavyweight for hard reasoning), Sonnet (4.6, the balanced default), and Haiku (4.5, fast and cheap for routine edits), plus the frontier Fable 5 tier. Opus 4.8 ships with a 1M-token context window, which is why it can hold a large codebase in its head at once. You can switch models mid-session and let cheap models do grunt work while the expensive one plans.

The features that matter

  • Plan mode: Claude researches and proposes a plan before touching anything. You read it, correct it, then let it execute. This is the highest-leverage habit in the whole tool. Plan first, build second.
  • Subagents: specialized helpers, each with its own context window and tool permissions, coordinated by a main agent. One subagent reviews while another writes, and the verbose work stays isolated so your main thread does not drown in noise.
  • MCP (Model Context Protocol): a standard way to plug in external systems. Point it at a GitHub server, a Postgres database, or a browser, and Claude can reason over them as first-class context instead of you copy-pasting.
  • Hooks: deterministic scripts that fire at lifecycle points (for example PreToolUse before any command runs). Use them to block dangerous operations or auto-run a linter. The model is probabilistic; hooks are not, which is exactly why they are valuable as guardrails.
  • Skills: reusable markdown instruction files (SKILL.md) that capture your conventions so you stop re-explaining them every session.
  • Long-running and remote work: it runs lengthy autonomous tasks, and beyond the terminal it works in VS Code and JetBrains, on the web at claude.ai/code (each session gets an isolated cloud VM with your repo cloned), and via Routines that run on a schedule, an API call, or a GitHub event.

A concrete example

Say you need to migrate a service from a deprecated payment SDK. In an IDE assistant you would open files one at a time. In Claude Code you say: "Plan a migration from the old Stripe SDK to the new one, find every call site, update them, and keep the tests green." It enters plan mode, greps the repo, lists the 14 files it intends to touch and the order it will do them in. You approve. It edits, runs the suite, sees three failures, reads the stack traces, fixes the mocks, and reruns until green. A PreToolUse hook you set blocks any git push so nothing leaves your machine without you. That end-to-end loop, across the whole repo, is the thing Claude Code does better than editor-bound tools.
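The git push guard in that example can be sketched in a few lines. This is a minimal illustration, not Anthropic's reference implementation: it assumes the hook receives the pending tool call as JSON with a `tool_name` field and a `tool_input.command` field, and that exit status 2 tells the agent the call was blocked. Verify both against the current hooks documentation. The names `should_block` and `main` are hypothetical.

```python
import json
import re
import sys

# Matches a literal "git push". Deliberately simple: it will miss forms such as
# "git -C some/dir push", so treat it as a sketch, not a complete shell parser.
GIT_PUSH = re.compile(r"\bgit\s+push\b")


def should_block(payload: dict) -> bool:
    """Return True when the pending tool call is a shell command that runs git push."""
    # Assumed payload shape: {"tool_name": "Bash", "tool_input": {"command": "..."}}
    if payload.get("tool_name") != "Bash":
        return False
    command = payload.get("tool_input", {}).get("command", "")
    return bool(GIT_PUSH.search(command))


def main(stdin_text: str) -> int:
    """Decide on one tool call; the return value is the process exit status."""
    payload = json.loads(stdin_text)
    if should_block(payload):
        print("Blocked: git push needs a human.", file=sys.stderr)
        return 2  # assumed convention: status 2 blocks the tool call
    return 0
```

A real hook script would end with `sys.exit(main(sys.stdin.read()))` and be registered for the PreToolUse event in your settings. Because the check is a plain regular expression, it behaves the same way on every run, which is the property that makes a hook a guardrail rather than a suggestion.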

Strengths, limits, and when to reach for it

Strengths: deep agentic work, orchestration via subagents, and a large context window that lets it reason across a real codebase rather than a few open tabs. Limits are just as real. It is terminal-first, which assumes comfort with a shell, git, and your test runner. And it costs money: it is bundled with Claude paid plans (Pro around $20/month, Max tiers at roughly $100 and $200/month as of mid-2026), but heavy autonomous runs burn tokens, and API usage for Opus runs around $5 in / $25 out per million tokens (higher in fast mode). Long unattended sessions can rack up cost or hit usage limits.
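To make the API figures concrete, here is a rough cost estimator using the approximate Opus prices quoted above ($5 in, $25 out per million tokens). The token counts in the usage note are invented for illustration, and the function ignores fast-mode surcharges and any caching discounts.

```python
# Approximate prices from the text, in USD per million tokens. They will drift.
OPUS_INPUT_PER_MTOK = 5.00
OPUS_OUTPUT_PER_MTOK = 25.00


def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the API cost in USD of one agent run at flat per-token prices."""
    total = input_tokens * OPUS_INPUT_PER_MTOK + output_tokens * OPUS_OUTPUT_PER_MTOK
    return total / 1_000_000
```

For a hypothetical long session that reads 2,000,000 input tokens and writes 150,000 output tokens, `run_cost(2_000_000, 150_000)` comes to about $13.75: $10.00 for input and $3.75 for output. Input usually dominates in agentic work because the agent re-reads far more than it writes.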

Takeaway:

  • Reach for Claude Code when the task spans many files, needs real test/run loops, or you want to delegate a whole chunk of work and review the result.
  • Stay in an IDE assistant (Cursor, Copilot) for tight, line-by-line edits where you want a human hand on every keystroke.
  • Whatever you do, use plan mode and at least one hook before you let it run long. Supervised autonomy beats blind autonomy.

Google Antigravity: the agentic IDE

Google launched Antigravity in November 2025, the same week as Gemini 3, and it is the clearest statement yet of where Google thinks AI coding is going. Where Cursor started as a code editor that grew an agent, Antigravity starts from the opposite premise: the agent is the product, and the editor is one of several surfaces it operates. It is free during the public preview, and it runs as a desktop app on macOS, Windows, and 64-bit Linux. If you have used VS Code, the bones will feel familiar (Antigravity is a fork), but the default workflow is deliberately different.

Agent-first, not editor-first

The headline surface is the Agent Manager, a "Mission Control" dashboard where you spawn, monitor, and steer several agents at once, each working asynchronously in its own workspace. You can drop into a classic editor view (file tree on the left, code in the middle, terminal at the bottom) whenever you want to read or hand-edit code, but the intended loop is higher up: you describe a task, the agent works across editor, terminal, and browser, and you review the result rather than babysitting every step. The browser piece is the genuinely novel part. Antigravity ships a Chrome integration so an agent can load the app it just built, click through a flow, and record video and screenshots as proof the feature works.

It runs primarily on Gemini 3 Pro, with browser control handled by a Gemini computer-use model. Importantly, it is not Gemini-only: you can switch the agent to Anthropic's Claude (Sonnet and Opus 4.5) or OpenAI's open-weight GPT-OSS models, which matters because no single model is best at everything.

Artifacts: the trust layer

Antigravity's best idea is Artifacts. Instead of dumping raw tool calls into a chat log, the agent produces reviewable documents: a task list, an implementation plan, code diffs, and a final walkthrough summarising what changed and how to test it, often with the browser recording attached. You comment on these the way you would on a Google Doc, and the agent reads your feedback on its next run. That asynchronous, document-style review loop is the part that genuinely improves on Cursor's chat-only feedback, and it is well suited to a builder reviewing work between meetings rather than watching a cursor move.

A concrete example

Say you want a password-reset flow added to a Next.js app. In Antigravity you would write the task once, and the agent drafts a plan you can edit (add the token model, the email send, two pages, an expiry check), scaffolds the code in the editor, runs the dev server in the terminal, then opens the app in Chrome, requests a reset, follows the emailed link, sets a new password, and logs in. It records that run, screenshots the success state, and hands you a walkthrough. You spot in the recording that expired tokens still work, leave a comment on the artifact, and the agent patches the expiry check and re-verifies, no re-prompting from scratch.

Strengths, limits, and who it suits

The strengths are real: built-in browser verification, multi-agent orchestration, the artifact review model, model choice, and a free preview. The limits are equally real. Rate limits have been the loudest complaint since launch, with free-tier users hitting cooldowns (windows that reset every few hours), so heavy use can stall. It is newer and less battle-tested than Cursor, the agent-first default has a learning curve if you just want to edit one file, and pushing agents to run terminal commands and drive a browser autonomously is exactly the surface where you must read diffs and scope permissions carefully.

  • Versus Cursor: Cursor is still the smoother in-editor, human-in-the-loop experience. Antigravity wins on asynchronous, verified, multi-agent work.
  • Versus Claude Code: Claude Code is a terminal-native agent you compose into your own workflow. Antigravity gives you a full GUI cockpit with built-in browser proof.
  • Who it suits: builders comfortable delegating whole tasks and reviewing artifacts, teams doing web work that benefits from automated end-to-end checks, and anyone already invested in Gemini.

Takeaway: try Antigravity for the artifact-and-browser-verification loop on a non-critical web feature, but keep reviewing every diff and watch the rate limits before you bet a deadline on it.

GitHub Copilot: the incumbent

If Cursor is the upstart that redefined what an AI editor could be, GitHub Copilot is the incumbent that got there first and never left. Launched in 2021 as a humble autocomplete, it now sits inside the place most professional code already lives: GitHub, owned by Microsoft, with Microsoft's commercial and security machinery behind it. For a lot of teams, that single fact decides the question. You do not have to get a new vendor through procurement, and your security team already trusts the name on the invoice.

What it actually does in 2026

Copilot is no longer one feature. It is a stack:

  • Completions: the original grey ghost text that finishes your line or block as you type. Still the most-used surface, and still genuinely good for boilerplate, tests, and the obvious next line.
  • Chat: a sidebar conversation that knows your open files and repo, for "explain this", "why is this failing", or "refactor this function".
  • Agent mode: the Cursor-style autonomous loop inside your editor. You give it a task, it decides which files to touch, edits across many of them, runs terminal commands, reads the output, and iterates until tests pass. It went generally available in VS Code and JetBrains around early 2026.
  • Coding agent on github.com: the part that is uniquely Copilot. You assign a GitHub issue to Copilot the way you would assign a colleague. It spins up its own sandboxed environment on GitHub Actions, writes the code, runs the tests, and opens a pull request for you to review. No editor required. You can fire it from the issue, from an agents panel on any GitHub page, or with a "delegate to coding agent" button in VS Code.

Model choice

Copilot stopped being a single-model product a while ago. The model picker now spans multiple vendors: Anthropic's Claude (Opus and Sonnet tiers), OpenAI's GPT-5 series, Google's Gemini, and Microsoft's own in-house models. You switch per task: a cheaper, faster model for routine edits, a heavyweight like Claude Opus for a gnarly multi-file refactor. This is a real strength. You are not betting your workflow on one lab's roadmap, and you can dial cost against capability without leaving the tool.

A concrete example

Say a bug report comes in: a date-formatting helper breaks for users in non-UTC timezones. You open the issue on github.com, type a one-line clarification, and assign it to Copilot. A few minutes later you get a pull request: the helper is patched, a new regression test covers the timezone case, the existing suite passes, and the PR description explains the change. You review it like any human PR, request one tweak in a comment, and the agent pushes a follow-up commit. You never opened your editor. That round trip, issue to reviewable PR without a human writing the first draft, is the thing Copilot does that pure editor tools do not.

Strengths and limits, honestly

The strengths are ubiquity, trust, and ecosystem. It runs in VS Code, Visual Studio, JetBrains, Neovim, and on github.com itself. Enterprise admin controls, audit logs, and policy management are mature because Microsoft has sold to enterprises for decades. If your code, issues, CI, and reviews already live on GitHub, Copilot is the option with zero integration friction.

The honest limit: for a long stretch, Copilot felt more conservative than Cursor and slower to ship the bleeding-edge agent behaviour that power users wanted. That gap has narrowed sharply through 2025 and into 2026 with agent mode and the coding agent, but the instinct still holds. Copilot optimises for the median enterprise developer and for not breaking things, where Cursor optimises for the power user who wants maximum autonomy. Which you prefer says more about your risk appetite than about raw capability.

Pricing, approximate and time-stamped (mid-2026)

Tiers, roughly: a Free plan (a couple thousand completions and around 50 agent or chat requests a month), Pro at about $10/user/month, Pro+ around $39/user/month for heavy agent users who want the premium models, Business near $19/user/month, and Enterprise around $39/user/month with codebase customisation. As of June 2026 GitHub moved usage-based billing to "AI Credits" priced by tokens processed, so heavy agent use is metered on top of the seat. Treat every number here as approximate and check the official page at github.com/features/copilot before you commit.

Takeaway: if your team already lives on GitHub and your blocker is trust, procurement, or governance rather than raw model horsepower, Copilot is the path of least resistance, and in 2026 it is no longer the underpowered choice it once was.

OpenAI Codex: CLI and cloud agent

Codex is a confusing name with two lives. The first was a 2021 model that powered the original GitHub Copilot and was quietly retired. The name you care about is the 2025 revival: an agentic coding product OpenAI relaunched in May 2025, now powered by GPT-5-class models specifically tuned for software work. If you think of Codex as "the old autocomplete thing", recalibrate. Today it is a direct competitor to Cursor's agent and to Anthropic's Claude Code, and it comes in two shapes that share one brain.

The two shapes: CLI and cloud agent

The Codex CLI is an open-source terminal agent (written in Rust, repo at github.com/openai/codex). You install it, point it at a directory, and it reads, edits, and runs code on your machine, asking permission before risky actions depending on the approval mode you choose. It is the same mental model as Claude Code or the Gemini CLI: a loop that plans, edits files, runs the tests, reads the failure, and tries again. It supports MCP servers for extra tools, can review your own diff with a separate reviewer pass before you commit, and reads an AGENTS.md file in your repo for project-specific instructions (build commands, conventions, what not to touch).

The cloud agent is the more distinctive half. From the Codex web UI, the VS Code extension, or ChatGPT, you hand it a task and it spins up an isolated sandbox container, clones your repo, works autonomously for minutes, and hands back a diff or opens a pull request. Because each task runs in its own sandbox, you can fire off several in parallel: "fix this failing test", "add the pagination endpoint", "bump these dependencies and make the build pass", all at once, then review the PRs when they land. This is the part that feels genuinely different from a local agent: it is less a pair-programmer and more a queue of junior contractors you delegate to and review later.

Models, access, and rough pricing

Codex runs on OpenAI's GPT-5 family, with code-specialized variants (marketed as GPT-5-Codex, and as of mid-2026 a GPT-5.5-class flagship plus cheaper mini variants for lighter work). The exact suffixes churn almost monthly, so do not anchor on a version number; anchor on the tier you are paying for.

The pricing model that matters: Codex is bundled into ChatGPT subscriptions rather than sold separately. As of mid-2026, signing in with a ChatGPT account gets you Codex on every plan, with usage governed by rate limits in a rolling 5-hour window:

  • Free / Go (around $0 to $8/month): enough to try it on small tasks, not enough to lean on.
  • Plus (around $20/month): a few focused coding sessions a week. The realistic entry point for a solo founder.
  • Pro (from around $100/month): roughly 5x or 20x the Plus limits. This is where daily, all-day use lives.
  • API key: pure token-based pricing if you want to drive the CLI without a subscription, billed per token used.

One honest gotcha: in 2026 the included usage shifted toward token-based accounting, so a sprawling multi-file refactor can quietly burn several times the budget of a one-line fix even though both count as "one task". Watch your usage meter the first week; it teaches you to scope prompts tighter.

A concrete example

Say you maintain a Next.js app and a flaky integration test has been failing on CI for a week. Instead of context-switching into it yourself, you open the Codex cloud agent, point it at the repo, and write: "The test in checkout.test.ts fails intermittently. Find the race condition, fix it, and make CI green. Do not change the public API." Codex clones the repo into a sandbox, runs the test until it reproduces the failure, traces it to an unawaited promise, patches it, reruns the suite to confirm it is stable, and opens a PR with the diff and a short explanation. You review the change in five minutes rather than spending an afternoon reproducing a Heisenbug. If you disagree with the fix, you comment and it iterates. That delegate-and-review loop is Codex's sharpest use case.

Where it fits, and where it does not

Codex shines if you already live in the OpenAI ecosystem, want parallel autonomous tasks running against a real repo, and value a strong sandbox-and-PR workflow over inline editing. It is weaker as a moment-to-moment "watch me type" assistant; tools like Cursor still feel more fluid for that. And like every agent in this article, it will confidently produce plausible-but-wrong code, so the review step is not optional, it is the job.

Practical takeaway: if you already pay for ChatGPT Plus or Pro, you already have Codex. Use the cloud agent for well-scoped, reviewable tasks (bug fixes, test repair, dependency bumps) you can delegate, and keep the CLI for hands-on work in your own terminal. Always read the diff before you merge.

xAI Grok Code: the speed-and-price play

Most of this article has steered you toward frontier models for hard problems, and that advice stands. But not every coding task is hard. A lot of agentic work is mechanical: rename a symbol across forty files, write the obvious test, scaffold a CRUD endpoint, fix the lint error the agent itself just introduced. Paying frontier prices and waiting on frontier latency for that work is like hiring a senior architect to move boxes. This is the niche xAI built grok-code-fast-1 for, and it is worth understanding because it changes how you think about model selection inside an agent loop.

What it actually is

xAI shipped grok-code-fast-1 in late August 2025 (it ran in stealth first under the codename "Sonic"). The pitch is in the name: a model purpose-built for the agentic inner loop, optimized for throughput and cost rather than raw reasoning. As of mid-2026 the API pricing is roughly $0.20 per million input tokens, $1.50 per million output tokens, and about $0.02 per million cached input tokens (approximate, time-stamped, and likely to drift). It carries a 256K-token context window and runs fast: xAI quotes output speeds well above what frontier reasoning models deliver, on the order of 90 to 190 tokens per second depending on how you measure. xAI reports around 70.8% on SWE-Bench-Verified using its own internal harness, which puts it firmly in the competent-but-not-frontier tier.

The cheap cached input price is not a footnote, it is the whole point. Agentic coding replays a growing context (the same files, the same instructions) on every turn. Cline reports cache hit rates above 90% in typical loops, which means most of your input tokens bill at the $0.02 rate, not the $0.20 one. The economics of "let the agent take twenty small steps" suddenly look very different.
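The effect of caching on the input bill is simple weighted arithmetic. The sketch below uses the approximate prices quoted above and treats the 90% hit rate as an assumption; your own loops may cache better or worse.

```python
def blended_input_price(uncached: float, cached: float, hit_rate: float) -> float:
    """Effective price per million input tokens for a given cache hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cached + (1.0 - hit_rate) * uncached
```

With `blended_input_price(0.20, 0.02, 0.90)` the effective input price is about $0.038 per million tokens, roughly a fifth of the list price. At a 0% hit rate you pay the full $0.20, which is why a workflow that keeps invalidating the cache (for example by reshuffling the prompt on every turn) costs noticeably more.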

Where you can use it

It is available directly via the xAI API, and as a selectable model inside most of the tools covered earlier in this article: Cursor (cursor.com), Cline, GitHub Copilot, Windsurf, Roo Code, Kilo Code, and opencode were all launch partners, with a free promotional window at launch. In practice you pick it from the same model dropdown where you would otherwise choose Claude or GPT. Note that xAI's broader lineup moves fast and prunes aggressively (the separate grok-4.1-fast reasoning model, for instance, was already deprecated by May 2026), so always confirm the current model id before wiring it into anything.

A concrete example

Say you are migrating a React codebase from a deprecated date library to a new one. The plan is genuinely hard: which call sites have timezone bugs, what the edge cases are, how to handle the one component that relies on the old library's mutation behavior. That is a job for a frontier model. But once you and the strong model have agreed the plan, the execution is 120 nearly identical mechanical edits across the repo. Hand that execution phase to grok-code-fast-1 in Cline: it churns through the find-replace-and-verify loop at high speed, caches the unchanged context cheaply, and you babysit a diff instead of a thought process. You spent frontier money on the 10% that needed judgment and pennies on the 90% that did not.

The honest limits

There is a real capability ceiling. Push it past mechanical work into novel architecture, subtle concurrency bugs, deep multi-file reasoning, or anything needing strong general-world knowledge, and it will confidently produce plausible-looking wrong code, which is exactly the hallucination failure mode this whole article warns about. It is fast at being wrong, too. Treat it as a fast, cheap pair of hands, not a brain you can stop checking.

Practical takeaway for a multi-model workflow:

  • Plan with a frontier model, execute with the fast one. Use Claude or a comparable model to design the change, then switch the agent to grok-code-fast-1 for the grunt work.
  • Reach for it on high-iteration, low-ambiguity loops: scaffolding, test generation, routine refactors, lint and type-error cleanup.
  • Do not reach for it on the hard 10%: architecture, security-sensitive logic, gnarly debugging. The price you save is not worth a subtle bug.
  • Verify regardless. Cheaper output means more output to review, not less reviewing.
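The plan-then-execute split in the list above can be written down as a small routing rule. This is a sketch of the idea, not a feature of any particular tool: the task categories and the `FRONTIER` placeholder are hypothetical, and only `grok-code-fast-1` is a model id taken from the text.

```python
FRONTIER = "frontier-model"   # placeholder, not a real model id
FAST = "grok-code-fast-1"

# Hypothetical task labels for high-iteration, low-ambiguity work.
MECHANICAL_KINDS = {"scaffolding", "test-generation", "routine-refactor", "lint-fix"}


def pick_model(task_kind: str) -> str:
    """Route mechanical work to the fast model and everything else to the frontier one."""
    if task_kind in MECHANICAL_KINDS:
        return FAST
    # Unknown or ambiguous work defaults to the stronger model on purpose:
    # a wasted frontier call is cheaper than a subtle bug.
    return FRONTIER
```

The design choice worth copying is the default. Anything the rule does not recognise, including architecture, security-sensitive logic, and debugging, falls through to the frontier model rather than the cheap one.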

The rest of the field worth knowing

The headline tools (Cursor, Claude Code, Lovable, Bolt, v0, Replit) cover most of what you will reach for. But the field is wider than that, and a few of the runners-up are genuinely better for specific jobs. Here is an honest map: one or two lines each on what makes the tool distinct and who should care. As always in mid-2026, names and ownership shift monthly, so treat status as a snapshot.

Agentic IDEs

  • Windsurf (formerly Codeium): the other serious agentic IDE next to Cursor. Its Cascade agent reads your repo and makes multi-file edits with a clean, low-friction flow. Worth knowing the ownership saga: an OpenAI deal fell through in 2025, then Cognition (the Devin company) acquired Windsurf and now pairs the local IDE with Devin as a cloud agent. For founders who want a Cursor alternative with a tighter local-to-cloud handoff.
  • Zed: a blisteringly fast editor written in Rust, built for real-time multiplayer. Multiple humans and multiple agents share one buffer, and it speaks the open Agent Client Protocol so you can plug in Claude, Gemini, or external CLI agents. Hit its 1.0 in April 2026. For engineers who care about latency and want pair-programming (human or AI) without the Electron bloat.

Open-source terminal and editor agents

  • Aider: free, open-source pair programmer that lives in your terminal and is git-native. You describe a change in plain English; it edits the files and makes an atomic commit with a sensible message. Vendor-agnostic: point it at Claude, GPT, Gemini, DeepSeek, or a local model via Ollama. For developers who want clean git history and zero lock-in.
  • Cline: open-source autonomous agent that runs inside VS Code with human-in-the-loop approval on every step. Model-agnostic, conservative, and stable enough for real work. Its more aggressive fork, Roo Code, adds multi-mode personas (Architect, Code, Ask, Debug) for people who like more knobs. For VS Code users who want an agent they can audit and stop, not a black box.

Async and autonomous agents

  • Amp (Sourcegraph): an agent built on top of Sourcegraph's industrial-strength code search, available as a CLI and a VS Code extension. The pitch is context: on a large or unfamiliar codebase, it finds the right files before it touches anything. It runs on a usage/credit model with a free tier and team plans. For teams working in big, sprawling repos where context retrieval is the hard part.
  • Jules (Google): an asynchronous agent that runs Gemini in an isolated cloud VM. You queue a task from a GitHub issue or a description, walk away, and a pull request shows up later. It never auto-merges, so nothing lands on main without your review. For people who think in terms of a backlog of small, well-scoped tickets rather than live chat.
  • Devin (Cognition): the most autonomous of the lot, marketed as an AI software engineer that takes a task end to end (plan, write, test, open a PR). It is the priciest tier in this space and works best on bounded, well-specified work, not vague exploration. Cognition reports that the bulk of its own internal code is now committed by Devin, which is a real signal and a marketing line at the same time. For teams ready to delegate whole tickets and review the output as they would a contractor's.

Practical takeaway: do not collect tools. Pick one driver in each mode you actually use, then stop.

  • Live, in-editor: Cursor or Windsurf.
  • Terminal and git-first: Aider or Claude Code.
  • Auditable VS Code agent: Cline.
  • Fire-and-forget tickets: Jules or Devin.

The differences between any two of these matter less than your own discipline: a clear spec, small diffs, and a human reading every change before it merges. The tool is a steering wheel, not a destination.

Choosing, and what it costs · 3 chapters · ~11 min

Choosing among them: a decision guide

By now you have seen the whole field: prompt-to-app builders, design-first generators, IDE assistants, and terminal agents. The honest answer to "which one should I use" is "it depends on the job in front of you." But "it depends" is a cop-out without a framework, so here is one. Match your situation to a primary tool, not the other way around.

The table

| Your situation | Primary tool | Why |
| --- | --- | --- |
| Non-technical founder building an MVP to test an idea | Lovable or Bolt | Prompt-to-app with hosting, auth and a database wired in. You describe the product, get a working URL, and never touch a terminal. Lovable leans into a guided React + Supabase stack; Bolt lets you pick the framework. |
| Designer building UI or a polished front end | v0 | It generates clean React components (forms, tables, cards) you can refine visually and drop into a Next.js project. It is a component and interface generator, not a full-app builder, which is exactly what you want here. |
| Solo dev shipping and maintaining a real SaaS | Cursor | IDE-first, with fast tab autocomplete and a multi-file agent (Composer), plus model choice (Claude, GPT, Gemini). You keep your hands on the code while moving fast, and you own the repo on GitHub. |
| Team working on a large, established codebase | Claude Code (or Codex) | Terminal-native agents shine at autonomous multi-file refactors, codebase exploration, test generation and CI work. Claude Code's large context window (up to roughly 1M tokens on higher tiers as of early 2026) helps it reason across many files at once. |
| A quick script or one-off automation | Claude Code or Cursor in agent mode | Describe the task, let the agent write and run it. For throwaway glue code the overhead of an app builder is wasted; a terminal agent gets you a working script in one pass. |
| Learning to code | Replit + Cursor | Replit gives you a zero-setup cloud environment to run things; using an IDE assistant in "explain, then suggest" mode (not full autopilot) lets you read every change and actually build a mental model. Autopilot teaches you nothing. |

A concrete example

Say a non-technical founder validates a niche scheduling tool. She starts in Lovable, gets a working booking flow and a landing page live in a weekend, and lands her first ten paying users. Three months later the codebase has grown, an edge case keeps breaking checkout, and she hires a contractor. He exports the repo, opens it in Cursor for day-to-day feature work, and reaches for Claude Code when he needs to untangle the messy auth logic the builder generated. Same product, three tools, each used where it earns its keep.

The meta-point

Pick by three things: the job (throwaway script vs product you will maintain), your skill (can you read and correct the output, or only describe what you want), and your codebase size (greenfield vs a hundred-thousand-line repo). And expect to use more than one. These tools are not religions, and the people shipping fastest are not loyal to a single logo: they start in a builder, graduate to an IDE assistant when the code matters, and call in a terminal agent for the heavy refactors. Fluency means knowing which tool the moment calls for.

That is the "what." The next question is the "how well": once you are building with these tools, how do you keep the output honest, tested, and safe to ship?

Understanding usage limits before you hit the wall

Nothing kills momentum like a wall you did not see coming. You are three hours into a build, the agent is finally doing the thing, and a banner tells you you are out of quota until Thursday. The frustration is real, but most of it comes from not understanding the meter. AI coding tools bill in at least four different currencies, and they rarely line up. Let us make the meter legible so you can plan around it instead of slamming into it.

The four units, and why they confuse everyone

  • Tokens are the atomic unit. A token is roughly three quarters of a word; your code, your prompt, the model's reply, and everything the agent reads from your files all count. This is what the model providers actually charge for, and almost every other unit is a wrapper around it.
  • Messages or requests are coarse counts of turns. Older plans capped "messages per day", which felt simple but hid the fact that one message in a giant repo costs far more than one message in a hello-world.
  • Credits or usage dollars are a prepaid pool that drains as tokens get spent. Cursor, for example, moved to credit-based billing in 2025: each paid plan includes a monthly pool, and a hard request can cost roughly ten times a simple one because newer models burn more tokens on longer tasks. On the Pro plan that pool covers on the order of a couple hundred Claude Sonnet requests in a typical month, fewer if your requests are heavy.
  • Compute units or rate-limit windows govern how fast you can spend, separate from how much. Claude Code uses a dual system: a rolling five-hour window plus a weekly cap, each measured in token and compute budget rather than message count.

Context windows are a different limit entirely

The context window is how much the model can hold in its head at once, not how much you are allowed to use over time. As of mid-2026 frontier models commonly offer context windows of a few hundred thousand tokens. Two things matter here. First, a big window does not mean free: you pay for every token you stuff into it, so dragging your whole codebase into context on every turn is expensive and often makes the model worse, not better. Second, when you exceed it the tool silently drops or summarizes earlier context, which is a common source of the agent "forgetting" a decision you made twenty minutes ago.

What resets, and when

This is where people get burned. Take Claude Code's structure as a durable example (numbers shift, the shape holds). The five-hour window starts with your first message and tracks every token, in and out, for the next 300 minutes, then resets automatically. Sitting on top is a weekly budget that restarts every seven days. You can be fine on the five-hour meter and still be locked out because the weekly budget is drained, or vice versa. Credit-based tools like Cursor reset on a monthly billing cycle instead. So before you commit to a deadline, know which clock you are racing and when it rolls over.
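
To make the two clocks concrete, here is a minimal Python sketch of a dual meter. It is a toy model under stated assumptions, not any vendor's real accounting: the class name DualWindowMeter, the budget numbers, and the choice to start the weekly clock on your first spend are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class DualWindowMeter:
    """Toy model of a rolling short window stacked under a weekly budget."""
    window_budget: int                      # tokens per five-hour window (made-up number)
    weekly_budget: int                      # tokens per seven days (made-up number)
    window_length: timedelta = timedelta(hours=5)
    week_length: timedelta = timedelta(days=7)
    window_start: Optional[datetime] = None
    week_start: Optional[datetime] = None
    window_used: int = 0
    week_used: int = 0

    def _expire(self, now: datetime) -> None:
        # Clear any clock whose period has fully elapsed.
        if self.window_start is not None and now - self.window_start >= self.window_length:
            self.window_start, self.window_used = None, 0
        if self.week_start is not None and now - self.week_start >= self.week_length:
            self.week_start, self.week_used = None, 0

    def spend(self, tokens: int, now: datetime) -> str:
        """Try to spend tokens; report which clock, if any, blocks the request."""
        self._expire(now)
        if self.week_used + tokens > self.weekly_budget:
            return "blocked: weekly budget"
        if self.window_used + tokens > self.window_budget:
            return "blocked: five-hour window"
        # A clock starts ticking on the first successful spend after a reset.
        if self.window_start is None:
            self.window_start = now
        if self.week_start is None:
            self.week_start = now
        self.window_used += tokens
        self.week_used += tokens
        return "ok"


meter = DualWindowMeter(window_budget=100, weekly_budget=150)
start = datetime(2026, 6, 1, 9, 0)
first = meter.spend(80, start)                          # fits both clocks
second = meter.spend(30, start + timedelta(hours=1))    # short window is full
third = meter.spend(60, start + timedelta(hours=5))     # short window has reset
fourth = meter.spend(20, start + timedelta(hours=10))   # window is fresh, week is drained
```

The fourth call is the case that burns people: the five-hour meter has just reset, yet the request is still refused because the weekly budget is spent.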

Why agents burn quota so much faster than chat

This is the single biggest source of surprise walls. In a chat window, one of your messages equals one model call. An agentic tool does not work that way. When you tell an agent "fix the failing tests", it might read ten files, run the test suite, read the errors, edit three files, re-run, read again, and loop. That single instruction can fan out into twenty or more model calls, each one re-reading context. A ten-minute autonomous agent run can quietly spend what would have been a full day of chat-style quota. The work is real and often worth it, but the meter does not care that you only typed one sentence.
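
A rough way to see the fan-out is to multiply it out. The sketch below is illustrative only: the function name run_tokens and every token count in it are assumptions picked for round numbers, not measurements from any tool.

```python
def run_tokens(calls: int, context_tokens: int, output_tokens: int) -> int:
    """Total tokens when every model call re-reads its context and writes a reply."""
    return calls * (context_tokens + output_tokens)


# One chat message: a single call over a small context (assumed sizes).
chat_turn = run_tokens(calls=1, context_tokens=2_000, output_tokens=500)

# One agent instruction: twenty calls, each re-reading a larger context (assumed sizes).
agent_run = run_tokens(calls=20, context_tokens=15_000, output_tokens=1_000)

# How many chat turns the single agent instruction is worth.
chat_turns_equivalent = agent_run / chat_turn

print(chat_turn, agent_run, chat_turns_equivalent)  # 2500 320000 128.0
```

Under these assumed sizes, one typed sentence costs as much as 128 chat messages, which is how a short agent run ends up eating a day of chat-style quota.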

Practical ways to avoid the wall

  • Match the model to the task. Use a cheaper, faster model for boilerplate, renames, and small edits; save the expensive frontier model for genuinely hard reasoning. Many tools let you switch per request.
  • Scope your context. Point the agent at the two or three relevant files instead of the whole repo. Less context is cheaper and usually produces better answers.
  • Watch the dashboard early. Most tools show remaining quota or spend. Check it at the start of a session, not when the banner appears.
  • Break big agent runs into checkpoints. A reviewed, committed step every fifteen minutes beats one hour-long autonomous run that burns your week and produces something you have to throw away.
  • Keep a fallback. If you hit a hard cap mid-deadline, having a second tool or an API key with pay-as-you-go billing means you are slowed, not stopped.

The takeaway: limits are not the enemy, surprise is. Learn which currency your tool bills in, which clock resets when, and that agents spend at a wildly different rate than chat. Do that, and the wall stops being a wall and becomes a budget you actually control.

Reading the pricing honestly

Every AI coding tool wants the same thing from you: a recurring charge. But the shape of that charge varies wildly, and the shape is what bites. There are three rough models, and most tools now blend them.

The three pricing models

  • Per-seat subscriptions. A flat monthly price per person. Predictable, easy to budget, and the default for editor-style tools. Cursor's individual Pro plan sits around $20/month (as of mid-2026), with Pro+ near $60 and an Ultra tier around $200. Team seats run roughly $32 to $40 per seat per month.
  • Usage-based (pay per token). You pay for what the model actually reads and writes. Anthropic's API prices Claude Opus 4.8 at roughly $5 per million input tokens and $25 per million output tokens (mid-2026), with cheaper Sonnet and Haiku tiers. Honest, but uncapped: a chatty agent on a big codebase can run up real money fast.
  • Credit systems. An abstraction layer on top of tokens. Lovable sells "credits" (Free gives around 30/month, Pro near $25/month), v0 meters input and output tokens into a credit balance, and Bolt prices in tokens directly from around $18/month. Credits feel tidy until you realize you cannot easily predict how many a given feature will consume.

The catch with all three: modern tools run agentically. One prompt does not equal one model call. An agent reads files, plans, writes code, runs tests, reads the failure, and tries again, sometimes looping a dozen times. Each loop spends tokens or credits. Replit's billing makes this brutally visible: Agent checkpoints have been billed around $0.25 each, so a single feature that takes twenty checkpoints is a real, itemized $5, not a rounding error.
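
The per-token and per-checkpoint arithmetic above can be sketched in a few lines of Python. The prices are the approximate mid-2026 figures quoted in this section; the function names and the token counts in the example session are assumptions for illustration.

```python
def api_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_usd_per_million: float = 5.0,    # approximate input price quoted above
    output_usd_per_million: float = 25.0,  # approximate output price quoted above
) -> float:
    """Pay-per-token cost: input and output are priced separately."""
    return (
        input_tokens / 1_000_000 * input_usd_per_million
        + output_tokens / 1_000_000 * output_usd_per_million
    )


def checkpoint_cost_usd(checkpoints: int, usd_per_checkpoint: float = 0.25) -> float:
    """Per-checkpoint billing: every agent checkpoint is a line item."""
    return checkpoints * usd_per_checkpoint


# A hypothetical agent session that reads 300k tokens and writes 20k.
session_cost = api_cost_usd(input_tokens=300_000, output_tokens=20_000)

# The twenty-checkpoint feature from the paragraph above.
feature_cost = checkpoint_cost_usd(20)

print(f"${session_cost:.2f} per session, ${feature_cost:.2f} per feature")
```

Note that input tokens dominate the volume in agentic work while output tokens dominate the unit price, so both halves of the formula matter.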

Cost per shipped feature, not cost per month

The right unit is not your monthly bill. It is the cost to ship one working thing. A $200/month plan that ships you four features a week is cheaper per feature than a $20 plan where you burn two days fighting a model that keeps looping on the wrong file. Cheap-but-slow is the most expensive option there is, because your time is the dominant cost.

This reframes the classic trap. The cheap plan is a false economy when your bottleneck is throughput: if you hit credit limits mid-feature and sit idle waiting for a reset, or you downgrade to a weaker model that produces code you have to rewrite, you are paying with hours to save dollars. The expensive plan is wasted when you are an occasional builder: a $200 Ultra or Max plan is dead money if you ship one small thing a month, and a $20 plan plus careful prompting would cover you with room to spare.
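
Here is a minimal sketch of that comparison. The plan prices come from this section; the hourly rate, the hours per feature, and the features per month on the cheap plan are hypothetical inputs you would replace with your own.

```python
def cost_per_feature(
    plan_usd_per_month: float,
    features_per_month: float,
    hours_per_feature: float,
    hourly_rate_usd: float,
) -> float:
    """Tool spend per feature plus the value of the time each feature takes."""
    tool_share = plan_usd_per_month / features_per_month
    time_cost = hours_per_feature * hourly_rate_usd
    return tool_share + time_cost


HOURLY_RATE = 50.0  # assumed value of an hour of your time

# $200 plan, four features a week (about 16 a month), 4 hours each (assumed).
expensive_plan = cost_per_feature(200, 16, 4, HOURLY_RATE)

# $20 plan, one feature a week, two working days (16 hours) each (assumed).
cheap_plan = cost_per_feature(20, 4, 16, HOURLY_RATE)

print(expensive_plan, cheap_plan)  # 212.5 805.0
```

With these inputs the subscription is a small fraction of either total. The hours are what move the number, which is why the plan that saves you time can win even at ten times the sticker price.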

A back-of-envelope MVP

Say you are building a typical SaaS MVP: auth, a dashboard, a CRUD database, Stripe checkout, and a marketing landing page. A realistic mid-2026 path:

  • Landing page and UI scaffolding in v0 or Lovable on a ~$25/month plan. One month, maybe two, of credits to get the front end looking right: call it $25 to $50.
  • App logic, database wiring, and Stripe in Cursor Pro (~$20/month) or Claude Code on a Pro/Max plan ($20 to $100/month), where the agentic loops on real backend code live. Budget one to two months of focused work: $40 to $200 depending on how much frontier-model time you burn.
  • Overflow token spend when you exceed the included pool on a heavy week: usually tens of dollars, occasionally over $100 if you let an agent thrash.

So a solo MVP plausibly lands somewhere around $150 to $500 in tool spend, spread over six to ten weeks. The variance is almost entirely behavior, not list price: tight scoping and short agent runs sit at the low end, while "let it run and hope" sits at the high end or worse.

Takeaway

  • Match the pricing model to your rhythm: per-seat for steady daily building, usage-based when work is bursty, credits only if you can stomach unpredictable consumption.
  • Set a hard spend cap or alert wherever the tool allows it. Uncapped agentic loops are how surprise bills happen.
  • Track cost per shipped feature for a week. It will tell you faster than any pricing page whether your plan is too cheap, too rich, or just right.
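
If your tool has no built-in cap, you can approximate one in your own scripts. The following is a hedged sketch, not any product's API: SpendGuard, SpendCapExceeded, the 80% alert threshold, and the per-step costs are all hypothetical, and you would feed it whatever cost figures your provider actually reports.

```python
class SpendCapExceeded(RuntimeError):
    """Raised when the next step would push spend past the hard cap."""


class SpendGuard:
    """Tracks running spend, alerts near the cap, and refuses to go over it."""

    def __init__(self, cap_usd: float, alert_fraction: float = 0.8) -> None:
        self.cap_usd = cap_usd
        self.alert_fraction = alert_fraction
        self.spent_usd = 0.0

    def record(self, cost_usd: float) -> str:
        # Refuse the step before it is counted, so spend never exceeds the cap.
        if self.spent_usd + cost_usd > self.cap_usd:
            raise SpendCapExceeded(
                f"step of ${cost_usd:.2f} would exceed cap of ${self.cap_usd:.2f}"
            )
        self.spent_usd += cost_usd
        if self.spent_usd >= self.alert_fraction * self.cap_usd:
            return "alert"
        return "ok"


guard = SpendGuard(cap_usd=10.0)
statuses = [guard.record(5.0), guard.record(3.5)]  # second step crosses the 80% line

try:
    guard.record(2.0)  # would take spend to 10.50, over the cap
    stopped = False
except SpendCapExceeded:
    stopped = True
```

Calling record before each agent step, rather than after, is the design choice that matters: the loop stops at the cap instead of discovering the overage on the invoice.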

Picked your tool? If it hosts everything for you, head to the all-in-one path. If you are going to own the stack, see the full-control path. Either way, do not skip security.
