In recent months, many AI programming tools have introduced Agent Mode, which lets users set custom Agent Rules to guide the agent's behavior. Users, even those who don't know how to code, can describe what they want in natural language and have the agent handle the development. This model of development, known as “Vibe Coding,” is gradually maturing.
I have tested several of the most popular AI programming tools and even used Cursor to build a lightweight frontend project without writing a single line of code myself. This article is a record of the experiences and thoughts I gathered during that process, in hopes it may be helpful.
Since AI technology is still evolving rapidly, this blog post may only be relevant for a short time (as of June 2025).
Table of Contents
- Exploring Different Tools
  - Cursor and Windsurf
  - v0
  - Devin
- Further Discussion & My Final Choice
  - Final Verdict
- What Determines Core Ability?
  - External Tool Integration
  - Agent Rules
  - The Underlying Model’s Performance
- Summary
  - What Are Vibe Coding Tools Made Of?
  - Which Tool to Use? It Might Not Matter Soon
- Current Limitations
  - Poor Understanding of Visual Design
  - Weak Runtime Debugging
  - Poor Project Architecture Management
- Cursor + Rules = Devin? How to Write a Good Agent Rule
- Looking Ahead
  - A Flood of Low-Quality Apps Might Be Coming
  - Ideal Workflow for Indie Developers
    - Ideal Flow
    - Ideal Skill Tree
  - Will AI Replace Programmers?
    - As mentioned above, there are still many limits
    - Reliability is also a concern
    - What about junior devs and interns?
Exploring Different Tools
When I first started exploring AI programming tools, I focused mainly on Cursor, Windsurf, v0, and Devin.
Cursor and Windsurf
The first impression I got from Cursor and Windsurf was that they gave large language models the ability to directly read documents and code, and even access the file system. A year ago, I still had to run shell commands like `tree` and manually paste the project structure so the model could understand it.

These tools have improved over time. For example, Cursor used to require the `@codebase` command to understand the project structure (essentially calling the model multiple times to summarize each file and feed those summaries to the main chat); now, in Agent Mode, Cursor can automatically read the relevant parts of the project on its own.

In Agent Mode, the model is given the ability to analyze and execute tasks with clearly defined success criteria. When a user asks for something, the agent analyzes the request, plans out steps, selects and uses tools, and adjusts based on feedback until it believes the goal has been achieved. I believe this is the core of all current AI programming tools and the foundation that makes Vibe Coding actually work. Whether an agent can truly understand and complete a task, however, still depends heavily on the underlying model and how well the rules guide it (we’ll discuss that more later).
Overall, Cursor and Windsurf greatly extend the boundaries of LLM programming ability by bundling in tools like file system access, terminal control, and web search. Agent Mode further enhances their ability to understand needs, define success, and take action autonomously.
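To make that loop concrete, here is a minimal sketch of the plan, act, observe cycle described above. Everything in it is illustrative: the model call and the tool set are stand-ins I made up, not how Cursor or Windsurf actually implement Agent Mode.

```typescript
// A minimal sketch of the plan -> act -> observe loop described above.
// Everything here is illustrative: the model call and the tools are stand-ins,
// not how Cursor or Windsurf actually implement Agent Mode.
type ToolCall = { tool: string; input: string };
type ModelReply = { done: boolean; call?: ToolCall; summary?: string };
type Model = (history: string[]) => Promise<ModelReply>;

// Stand-ins for the bundled integrations (file system, terminal, web search).
const tools: Record<string, (input: string) => Promise<string>> = {
  readFile: async (path) => `contents of ${path}`,
  runCommand: async (cmd) => `output of ${cmd}`,
};

export async function runAgent(task: string, callModel: Model, maxSteps = 20): Promise<string> {
  const history = [`TASK: ${task}`];
  for (let step = 0; step < maxSteps; step++) {
    const reply = await callModel(history);           // the model plans the next action
    if (reply.done) return reply.summary ?? "done";   // it believes the goal is reached
    if (reply.call) {
      const run = tools[reply.call.tool];
      const result = run ? await run(reply.call.input) : "unknown tool";
      history.push(`TOOL ${reply.call.tool}: ${result}`); // feed the observation back in
    }
  }
  return "stopped: step limit reached";
}
```

The important part is the feedback edge: every tool result goes back into the history, so the next model call can correct course or decide it is finished.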
v0
v0 represents a category of lightweight AI web development tools. Similar products include Figma AI and Google's Stitch. I only briefly tried out these tools because, although they can quickly turn mockups into static web pages, they lack control and flexibility when adding or refining features later on.
Devin
Devin is a different kind of tool. I could really feel the ambition behind it: to achieve fully end-to-end automated development. It’s even officially marketed as an "AI software engineer."
For example, if I ask Devin to build software, it will start with requirement analysis, then gradually build a development plan and carry out each step—setting up the environment, writing code, testing, etc. Every step is visualized, and I can view a progress timeline and even go back through its actions. I can interrupt it anytime, give suggestions, or make new requests.
Devin’s strong abilities are said to come from its long-term reasoning and planning skills, allowing it to execute complex tasks involving hundreds or thousands of steps and remember the details. From what I’ve seen, this is achieved by dynamically updating its planning documents and using RAG-based memory. Also, Devin interacts very well with the browser—it can browse for information and even "see" what’s happening on a webpage.
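To illustrate just the planning-document part, here is a purely speculative sketch of the general idea (my guess at the mechanism, not Devin’s actual implementation): keep the plan as a Markdown checklist on disk and rewrite it after every step, so re-reading the file restores long-term context. The file name is hypothetical.

```typescript
// Purely speculative sketch of the general idea, not Devin's implementation:
// keep the plan as a Markdown checklist on disk and rewrite it after each step,
// so re-reading the file restores long-term context. File name is hypothetical.
import { readFileSync, writeFileSync } from "node:fs";

const PLAN_PATH = "plan.md";

// Return the first unchecked step, if any.
export function nextStep(): string | undefined {
  const plan = readFileSync(PLAN_PATH, "utf8");
  return plan
    .split("\n")
    .find((line) => line.startsWith("- [ ] "))
    ?.slice("- [ ] ".length);
}

// Mark a step as done and attach a short note about what changed.
export function completeStep(step: string, note: string): void {
  const plan = readFileSync(PLAN_PATH, "utf8");
  writeFileSync(PLAN_PATH, plan.replace(`- [ ] ${step}`, `- [x] ${step} (${note})`));
}
```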
In short, I’m impressed by Devin’s power and its UI/tool integration. But it’s expensive. Thankfully, the recent change to a per-ACU pricing model let me try it at a low cost—before that, $500/month was too much.
Further Discussion & My Final Choice
Final Verdict
In the end, I chose Cursor as my main Vibe Coding tool. Tools like v0 didn’t offer enough control or extensibility. Devin was great, but too expensive. I also found that with a good MCP tool and a well-written Cursor Rule, Cursor could achieve around 90% of Devin’s capability—which made it much more cost-effective.
Between Cursor and Windsurf, I initially thought they were similar. But in a video by AI Evolution - Huasheng, I learned that Windsurf performs better in task understanding and multi-step execution, while Cursor is stronger in understanding code files. Based on my needs, I went with Cursor, since multi-step execution can be improved with the right Rules.
As a developer, I don’t need a full end-to-end solution. I prefer being in control of each step the AI takes. I know what I’m asking the AI to do and which part of the code it’s working on. So, it’s crucial that the AI can deeply understand code and follow instructions.
Based on all this, my tool recommendation for developers is:

Cursor > Windsurf > Devin

For non-developers or those without a programming background:

Devin > Windsurf > Cursor

(Only considering Agent Mode development.)
What Determines Core Ability?
After testing these tools, I think the key differences in their capabilities come from the following areas (listed in increasing order of importance):
External Tool Integration
Most Vibe Coding tools now support web search, file access, terminal usage, and even browser interaction. This transforms LLMs from passive language models into active programming agents. Devin is the best at this, with excellent browser interaction and GitHub integration.
However, with the rise of MCP protocols, integration gaps between tools are shrinking. For example, I used the Browser-tools MCP to let Cursor Agents operate the browser and view the console for debugging—making Cursor nearly equal to Devin in browser usage.
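For readers curious what sits behind an MCP integration, here is a rough sketch of a tiny custom MCP tool server in TypeScript. The imports and the `server.tool()` signature follow my understanding of the official `@modelcontextprotocol/sdk` package, so treat the exact API as an assumption and check the current SDK docs; the Browser-tools MCP itself ships prebuilt, and this only shows the general shape of such a tool.

```typescript
// Sketch only: a tiny custom MCP server exposing one tool. The imports and the
// server.tool() signature follow my understanding of @modelcontextprotocol/sdk;
// the exact API may differ between SDK versions, so verify against the docs.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "console-bridge", version: "0.1.0" });

// Hypothetical tool: hand the agent the last N captured console lines.
server.tool(
  "get_console_logs",
  { limit: z.number().default(50) },
  async ({ limit }) => ({
    content: [{ type: "text", text: readRecentLogs(limit) }],
  })
);

// Placeholder for wherever the logs actually come from (a debug endpoint, a file, ...).
function readRecentLogs(limit: number): string {
  return `last ${limit} log lines would go here`;
}

await server.connect(new StdioServerTransport());
```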
Agent Rules
One major insight I gained was that a well-written Cursor Rule can give Cursor about 90% of Devin’s capabilities. The core idea is to guide Cursor to maintain a planning document and perform multi-step decisions until the goal is reached.
Later, I found a GitHub project claiming to have collected the full system prompts of tools like v0, Cursor, Devin, Windsurf, and others.
From this, I learned that many of these tools are largely driven by well-structured internal prompts, defining tool usage and behavioral logic. Some of these prompts are surprisingly simple—but it’s likely that the tools also have mechanisms outside the model for task polling, validation, or memory (e.g., RAG).
This inspired me to go a step further: if the official prompts are incomplete, I can wrap my own prompts into the Cursor Project Rules—importing some of Devin’s logic into Cursor to push it even closer to Devin’s level.
The Underlying Model’s Performance
No matter how good the prompts are, the final result still depends on the model’s core performance.
Cursor became popular after Anthropic released Claude 3.5 Sonnet, which significantly improved programming ability and long-context memory and made Cursor’s code output more stable and controllable.
Let’s face it—the rise of Vibe Coding is mainly driven by improvements in LLMs themselves.
I’ve tested many LLMs: GPT-4.1, o3, Gemini 2.5 Pro, Claude 3.7, Claude 4, and more. Overall, Claude models are the most reliable and controllable for programming tasks. Other models often forget rules, skip steps, or behave erratically. For now, I mostly use Claude 4 Sonnet. When I hit tough problems, I switch to Claude 4 Opus; it once helped me locate a complex bug caused by recursive function calls across multiple files, which Sonnet had failed to identify after 10+ attempts.
So in short: the success of Vibe Coding largely depends on the model. Its ability to reason, follow long-term plans, and stick to rules is especially critical.
Summary
What Are Vibe Coding Tools Made Of?
Here’s how I think of them:
- LLM: The core capability
- Tools (MCP, plugins, etc.): External powers
- Prompts/Rules/RAG: Behavioral guidance
- Schedulers/Monitors (speculative): Goal tracking mechanisms
Which Tool to Use? It Might Not Matter Soon
As LLMs continue improving and MCP tools mature, differences between AI coding tools may disappear quickly. A few months from now, today’s comparisons might be irrelevant.
Current Limitations
As of now, I don’t recommend that users without a programming background, especially those lacking system-level engineering experience, try to build full software with Vibe Coding.
Small personal websites or games might be okay. But when it comes to more complex systems (e.g., frontend + backend, databases, cloud), even Devin can’t build them fully autonomously without human involvement.
Manual tuning by experienced engineers is still essential. Here are some issues I’ve personally faced:
Poor Understanding of Visual Design
For example, small UI alignment tweaks or color changes that I can fix in 2 minutes often take multiple failed interactions with the agent. Current models can "see" image content, but struggle with fine-grained UI issues like spacing or alignment.
Even when I have the agent take a screenshot with Browser-tools, and it correctly describes the issue and proposes a fix, the result still often fails to solve the problem.
Weak Runtime Debugging
When facing runtime bugs, agents often lack solid debugging tools. They tend to brute-force read the entire codebase.
In web development, I work around this by having the agent design a logging system and use Browser-tools to check the console. In other domains, I might save logs locally. But when it gets tricky, I still end up manually debugging and pointing the model toward likely causes.
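As a rough idea of the kind of logging shim I ask the agent to set up first (the names here are mine and purely illustrative): every console call is mirrored into a buffer that can be dumped and searched, instead of scrolling a live console.

```typescript
// A minimal sketch of my workaround (names are mine, nothing standard): mirror
// every console call into a buffer the agent can dump and search later.
type LogEntry = { level: string; time: string; message: string };

const buffer: LogEntry[] = [];

function capture(level: "log" | "warn" | "error"): void {
  const original = console[level].bind(console);
  console[level] = (...args: unknown[]) => {
    buffer.push({
      level,
      time: new Date().toISOString(),
      message: args.map((a) => (typeof a === "string" ? a : JSON.stringify(a))).join(" "),
    });
    original(...args); // still log normally for humans
  };
}

(["log", "warn", "error"] as const).forEach(capture);

// The agent (via Browser-tools or a small debug endpoint) asks for this dump.
export function dumpLogs(): string {
  return buffer.map((e) => `[${e.time}] ${e.level.toUpperCase()} ${e.message}`).join("\n");
}
```

The agent can then be told to read this dump before proposing any fix, which cuts down a lot of blind guessing.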
A non-engineer would likely get stuck here.
Poor Project Architecture Management
In my first Vibe Coding project, Cursor wrote over 2,000 lines in `main.js`. Later, I asked it to refactor, which introduced lots of bugs.

This kind of thing happens often when expanding a project. Without proper architectural control, it can become a mess. For non-engineers, this could be even worse.
Cursor + Rules = Devin? How to Write a Good Agent Rule
Many of the problems above can be partially solved with a well-designed Agent Rule (a short illustrative excerpt follows this list):
- Long-term memory and planning → Maintain a project plan document
- Codebase understanding → Maintain a file tree and consistent docstrings
- Debugging → Design a logging system from the start
- Complex workflows → Define steps like project analysis → page planning → tech stack selection → component development
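To make this concrete, here is a rough illustrative excerpt of the kind of rule I mean. It is not the full template (that is coming in the separate post) and not copied from any tool’s system prompt; the file names in it are placeholders.

```
## Planning and memory
- Before writing code, read PLAN.md. If it does not exist, create it with the goal,
  the intended steps, and open questions.
- After every step, update PLAN.md: mark the step done and note what changed.

## Codebase understanding
- Keep docs/file-tree.md up to date whenever files are added, moved, or removed.
- Every new module gets a short docstring describing its responsibility.

## Debugging
- Do not guess at runtime behavior. Add logs through the shared logger and read the
  captured output before proposing a fix.

## Workflow
- For new features, follow: project analysis → page planning → tech stack selection →
  component development, and record the outcome of each stage in PLAN.md.
```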
I’ll share a separate post about writing good Cursor Rules, including a template.
Essentially, good Rules reflect real engineering knowledge. So writing good Agent Rules requires a deep understanding of your project.
Looking Ahead
A Flood of Low-Quality Apps Might Be Coming
A lot of people with no engineering background are now using AI to build apps. Many of these apps may ship with security holes or privacy flaws; others may just be clones built for quick profit. The market may soon be flooded, raising challenges for app stores and regulators.
Ideal Workflow for Indie Developers
If you want to use AI to launch a product today, here’s what I suggest:
Ideal Flow
- Use a strong LLM’s deep research capabilities to explore ideas
- Once validated, plan your tech stack, frameworks, and architecture
- Write a custom Project README and Agent Rules
- Start building with Vibe Coding
- Use AI tools to generate visuals, videos, and marketing materials
- Launch the MVP, gather feedback, and iterate
Maintain your own model memory and document base to support future pivots or expansions.
Ideal Skill Tree
- Thinking
  - Product thinking
  - MVP mindset: Fast iterations
- Basic Programming
  - Scripting basics: variables, functions, debugging
  - Frontend/backend fundamentals: APIs, Postman
  - Basic databases
  - Git and version control
  - Cloud: AWS, Azure
  - Deployment: Vercel, auto-CI/CD
- Awareness
  - Security: Injection, vulnerabilities
  - Legal: Privacy, IP, compliance
- AI Usage
  - Prompt engineering
  - AI memory/docs management
- Marketing
  - Growth hacking
  - AI-assisted video production
Will AI Replace Programmers?
As mentioned above, there are still many limits
Things that feel intuitive to human programmers—like noticing visual issues in a layout—are still hard for AI. Whether agents can independently handle every development step will determine if they can fully replace human devs.
Reliability is also a concern
LLMs are still probabilistic. We might keep improving them, but the result may never be 100% reliable. For critical systems, skilled human engineers will still be necessary.
What about junior devs and interns?
As AI software engineers mature, companies may reduce their demand for junior and intern developers. CS graduates may face a historically tough job market, and existing junior developers may also risk being laid off.
Embracing AI and shifting toward Agent engineering, product management, or multi-skilled roles (like indie devs) may be the most promising paths in this time of radical change.
- Author:Jingsheng Chen
- URL:https://jingsheng.dev/article/006en
- Copyright:Unless otherwise stated, all articles on this blog are licensed under the CC BY-NC-SA agreement. Please credit the source when reposting!