In recent months, many AI programming tools have introduced Agent Mode, which lets users set custom Agent Rules to guide the agent's behavior. Users, even those who don't know how to code, can describe what they want in natural language and have the agent handle the development. This model of development, known as “Vibe Coding,” is gradually maturing.
I have tested several of the most popular AI programming tools and even used Cursor to build a lightweight frontend project without writing a single line of code myself. This article is a record of the experiences and thoughts I gathered during that process, in hopes it may be helpful.
Since AI technology is still evolving rapidly, this blog post may only be relevant for a short time (as of June 2025).
Table of Contents
- Exploring Different Tools
  - Cursor and Windsurf
  - v0
  - Devin
- Further Discussion & My Final Choice
  - Final Verdict
- What Determines Core Ability?
  - External Tool Integration
  - Agent Rules
  - The Underlying Model’s Performance
- Summary
  - What Are Vibe Coding Tools Made Of?
  - Which Tool to Use? It Might Not Matter Soon
- Current Limitations
  - Poor Understanding of Visual Design
  - Weak Runtime Debugging
  - Poor Project Architecture Management
- Cursor + Rules = Devin? How to Write a Good Agent Rule
- Looking Ahead
  - A Flood of Low-Quality Apps Might Be Coming
  - Ideal Workflow for Indie Developers
    - Ideal Flow
    - Ideal Skill Tree
  - Will AI Replace Programmers?
    - As mentioned above, there are still many limits
    - Reliability is also a concern
    - What about junior devs and interns?
Exploring Different Tools
When I first started exploring AI programming tools, I focused mainly on Cursor, Windsurf, v0, and Devin.
Cursor and Windsurf
The first impression I got from Cursor and Windsurf was that they gave large language models the ability to directly read documents and code, and even access the file system. A year ago, I still had to run shell commands like `tree` and manually paste the project structure so the model could understand it.

These tools have improved over time. For example, Cursor used to require the `@codebase` command to understand the project structure (essentially calling the model multiple times to summarize each file and feed those summaries to the main chat); now, in Agent Mode, Cursor can automatically read the relevant parts of the project on its own.

In Agent Mode, the model is given the ability to analyze and execute tasks with clearly defined success criteria. When a user asks for something, the agent analyzes the request, plans out steps, selects and uses tools, and adjusts based on feedback until it believes the goal has been achieved. I believe this is the core of all current AI programming tools and the foundation that makes Vibe Coding actually work. Whether an agent can truly understand and complete a task, however, still depends heavily on the underlying model and how well the rules guide it (we’ll discuss that more later).
Overall, Cursor and Windsurf greatly extend the boundaries of LLM programming ability by bundling in tools like file system access, terminal control, and web search. Agent Mode further enhances their ability to understand needs, define success, and take action autonomously.
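To make that loop concrete, here is a minimal sketch of the plan, act, observe cycle described above. Everything in it is illustrative: the model call and the tool set are stand-ins I made up, not how Cursor or Windsurf actually implement Agent Mode.

```typescript
// A minimal sketch of the plan -> act -> observe loop described above.
// Everything here is illustrative: the model call and the tools are stand-ins,
// not how Cursor or Windsurf actually implement Agent Mode.
type ToolCall = { tool: string; input: string };
type ModelReply = { done: boolean; call?: ToolCall; summary?: string };
type Model = (history: string[]) => Promise<ModelReply>;

// Stand-ins for the bundled integrations (file system, terminal, web search).
const tools: Record<string, (input: string) => Promise<string>> = {
  readFile: async (path) => `contents of ${path}`,
  runCommand: async (cmd) => `output of ${cmd}`,
};

export async function runAgent(task: string, callModel: Model, maxSteps = 20): Promise<string> {
  const history = [`TASK: ${task}`];
  for (let step = 0; step < maxSteps; step++) {
    const reply = await callModel(history);           // the model plans the next action
    if (reply.done) return reply.summary ?? "done";   // it believes the goal is reached
    if (reply.call) {
      const run = tools[reply.call.tool];
      const result = run ? await run(reply.call.input) : "unknown tool";
      history.push(`TOOL ${reply.call.tool}: ${result}`); // feed the observation back in
    }
  }
  return "stopped: step limit reached";
}
```

The important part is the feedback edge: every tool result goes back into the history, so the next model call can correct course or decide it is finished.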
v0
v0 represents a category of lightweight AI web development tools. Similar products include Figma AI and Google's Stitch. I only briefly tried out these tools because, although they can quickly turn mockups into static web pages, they lack control and flexibility when adding or refining features later on.
Devin
Devin is a different kind of tool. I could really feel the ambition behind it: to achieve fully end-to-end automated development. It’s even officially marketed as an "AI software engineer."
For example, if I ask Devin to build software, it will start with requirement analysis, then gradually build a development plan and carry out each step—setting up the environment, writing code, testing, etc. Every step is visualized, and I can view a progress timeline and even go back through its actions. I can interrupt it anytime, give suggestions, or make new requests.
Devin’s strong abilities are said to come from its long-term reasoning and planning skills, allowing it to execute complex tasks involving hundreds or thousands of steps and remember the details. From what I’ve seen, this is achieved by dynamically updating its planning documents and using RAG-based memory. Also, Devin interacts very well with the browser—it can browse for information and even "see" what’s happening on a webpage.
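To illustrate just the planning-document part, here is a purely speculative sketch of the general idea (my guess at the mechanism, not Devin’s actual implementation): keep the plan as a Markdown checklist on disk and rewrite it after every step, so re-reading the file restores long-term context. The file name is hypothetical.

```typescript
// Purely speculative sketch of the general idea, not Devin's implementation:
// keep the plan as a Markdown checklist on disk and rewrite it after each step,
// so re-reading the file restores long-term context. File name is hypothetical.
import { readFileSync, writeFileSync } from "node:fs";

const PLAN_PATH = "plan.md";

// Return the first unchecked step, if any.
export function nextStep(): string | undefined {
  const plan = readFileSync(PLAN_PATH, "utf8");
  return plan
    .split("\n")
    .find((line) => line.startsWith("- [ ] "))
    ?.slice("- [ ] ".length);
}

// Mark a step as done and attach a short note about what changed.
export function completeStep(step: string, note: string): void {
  const plan = readFileSync(PLAN_PATH, "utf8");
  writeFileSync(PLAN_PATH, plan.replace(`- [ ] ${step}`, `- [x] ${step} (${note})`));
}
```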
In short, I’m impressed by Devin’s power and its UI/tool integration. But it’s expensive. Thankfully, the recent change to a per-ACU pricing model let me try it at a low cost—before that, $500/month was too much.
Further Discussion & My Final Choice
Final Verdict
In the end, I chose Cursor as my main Vibe Coding tool. Tools like v0 didn’t offer enough control or extensibility. Devin was great, but too expensive. I also found that with a good MCP tool and a well-written Cursor Rule, Cursor could achieve around 90% of Devin’s capability—which made it much more cost-effective.
Between Cursor and Windsurf, I initially thought they were similar. But in a video by AI Evolution - Huasheng, I learned that Windsurf performs better in task understanding and multi-step execution, while Cursor is stronger in understanding code files. Based on my needs, I went with Cursor, since multi-step execution can be improved with the right Rules.
As a developer, I don’t need a full end-to-end solution. I prefer being in control of each step the AI takes. I know what I’m asking the AI to do and which part of the code it’s working on. So, it’s crucial that the AI can deeply understand code and follow instructions.
Based on all this, my tool recommendation for developers is:

Cursor > Windsurf > Devin

For non-developers or those without a programming background:

Devin > Windsurf > Cursor

(Only considering Agent Mode development.)
What Determines Core Ability?
After testing these tools, I think the key differences in their capabilities come from the following areas (listed in increasing order of importance):
External Tool Integration
Most Vibe Coding tools now support web search, file access, terminal usage, and even browser interaction. This transforms LLMs from passive language models into active programming agents. Devin is the best at this, with excellent browser interaction and GitHub integration.
However, with the rise of MCP protocols, integration gaps between tools are shrinking. For example, I used the Browser-tools MCP to let Cursor Agents operate the browser and view the console for debugging—making Cursor nearly equal to Devin in browser usage.
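For readers curious what sits behind an MCP integration, here is a rough sketch of a tiny custom MCP tool server in TypeScript. The imports and the `server.tool()` signature follow my understanding of the official `@modelcontextprotocol/sdk` package, so treat the exact API as an assumption and check the current SDK docs; the Browser-tools MCP itself ships prebuilt, and this only shows the general shape of such a tool.

```typescript
// Sketch only: a tiny custom MCP server exposing one tool. The imports and the
// server.tool() signature follow my understanding of @modelcontextprotocol/sdk;
// the exact API may differ between SDK versions, so verify against the docs.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "console-bridge", version: "0.1.0" });

// Hypothetical tool: hand the agent the last N captured console lines.
server.tool(
  "get_console_logs",
  { limit: z.number().default(50) },
  async ({ limit }) => ({
    content: [{ type: "text", text: readRecentLogs(limit) }],
  })
);

// Placeholder for wherever the logs actually come from (a debug endpoint, a file, ...).
function readRecentLogs(limit: number): string {
  return `last ${limit} log lines would go here`;
}

await server.connect(new StdioServerTransport());
```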
Agent Rules
One major insight I gained was that a well-written Cursor Rule can give Cursor about 90% of Devin’s capabilities. The core idea is to guide Cursor to maintain a planning document and perform multi-step decisions until the goal is reached.
Later, I found a GitHub project claiming to have collected the full system prompts of tools like v0, Cursor, Devin, Windsurf, and others.
From this, I learned that many of these tools are largely driven by well-structured internal prompts, defining tool usage and behavioral logic. Some of these prompts are surprisingly simple—but it’s likely that the tools also have mechanisms outside the model for task polling, validation, or memory (e.g., RAG).
This inspired me to go a step further: if the official prompts are incomplete, I can wrap my own prompts into the Cursor Project Rules—importing some of Devin’s logic into Cursor to push it even closer to Devin’s level.
The Underlying Model’s Performance
No matter how good the prompts are, the final result still depends on the model’s core performance.
Cursor became popular after Anthropic released Claude 3.5 Sonnet, which significantly improved programming ability and long-context memory and made Cursor’s code output more stable and controllable.
Let’s face it—the rise of Vibe Coding is mainly driven by improvements in LLMs themselves.
I’ve tested many LLMs: GPT-4.1, o3, Gemini 2.5 Pro, Claude 3.7, Claude 4, and more. Overall, Claude models are the most reliable and controllable for programming tasks. Other models often forget rules, skip steps, or behave erratically. For now, I mostly use Claude 4 Sonnet. When I hit tough problems, I switch to Claude 4 Opus; it once helped me locate a complex bug caused by recursive function calls across multiple files, which Sonnet had failed to identify after 10+ attempts.
So in short: the success of Vibe Coding largely depends on the model. Its ability to reason, follow long-term plans, and stick to rules is especially critical.
Summary
What Are Vibe Coding Tools Made Of?
Here’s how I think of them:
- LLM: The core capability
- Tools (MCP, plugins, etc.): External powers
- Prompts/Rules/RAG: Behavioral guidance
- Schedulers/Monitors (speculative): Goal tracking mechanisms
Which Tool to Use? It Might Not Matter Soon
As LLMs continue improving and MCP tools mature, differences between AI coding tools may disappear quickly. A few months from now, today’s comparisons might be irrelevant.
Current Limitations
As of now, I don’t recommend that users without a programming background, especially those lacking system-level engineering experience, try to build full software with Vibe Coding.
Small personal websites or games might be okay. But when it comes to more complex systems (e.g., frontend + backend, databases, cloud), even Devin can’t build them fully autonomously without human involvement.
Manual tuning by experienced engineers is still essential. Here are some issues I’ve personally faced:
Poor Understanding of Visual Design
For example, small UI alignment tweaks or color changes that I can fix in 2 minutes often take multiple failed interactions with the agent. Current models can "see" image content, but struggle with fine-grained UI issues like spacing or alignment.
Even when I have the agent take a screenshot with Browser-tools, and it correctly describes the issue and proposes a fix, the result still often fails to solve the problem.
Weak Runtime Debugging
When facing runtime bugs, agents often lack solid debugging tools. They tend to brute-force read the entire codebase.
In web development, I work around this by having the agent design a logging system and use Browser-tools to check the console. In other domains, I might save logs locally. But when it gets tricky, I still end up manually debugging and pointing the model toward likely causes.
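As a rough idea of the kind of logging shim I ask the agent to set up first (the names here are mine and purely illustrative): every console call is mirrored into a buffer that can be dumped and searched, instead of scrolling a live console.

```typescript
// A minimal sketch of my workaround (names are mine, nothing standard): mirror
// every console call into a buffer the agent can dump and search later.
type LogEntry = { level: string; time: string; message: string };

const buffer: LogEntry[] = [];

function capture(level: "log" | "warn" | "error"): void {
  const original = console[level].bind(console);
  console[level] = (...args: unknown[]) => {
    buffer.push({
      level,
      time: new Date().toISOString(),
      message: args.map((a) => (typeof a === "string" ? a : JSON.stringify(a))).join(" "),
    });
    original(...args); // still log normally for humans
  };
}

(["log", "warn", "error"] as const).forEach(capture);

// The agent (via Browser-tools or a small debug endpoint) asks for this dump.
export function dumpLogs(): string {
  return buffer.map((e) => `[${e.time}] ${e.level.toUpperCase()} ${e.message}`).join("\n");
}
```

The agent can then be told to read this dump before proposing any fix, which cuts down a lot of blind guessing.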
A non-engineer would likely get stuck here.
Poor Project Architecture Management
In my first Vibe Coding project, Cursor wrote over 2,000 lines in `main.js`. Later, I asked it to refactor, which introduced lots of bugs.

This kind of thing happens often when expanding a project. Without proper architectural control, it can become a mess. For non-engineers, this could be even worse.
Cursor + Rules = Devin? How to Write a Good Agent Rule
Many of the problems above can be partially solved with a well-designed Agent Rule (a short illustrative excerpt follows this list):
- Long-term memory and planning → Maintain a project plan document
- Codebase understanding → Maintain a file tree and consistent docstrings
- Debugging → Design a logging system from the start
- Complex workflows → Define steps like project analysis → page planning → tech stack selection → component development
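To make this concrete, here is a rough illustrative excerpt of the kind of rule I mean. It is not the full template (that is coming in the separate post) and not copied from any tool’s system prompt; the file names in it are placeholders.

```
## Planning and memory
- Before writing code, read PLAN.md. If it does not exist, create it with the goal,
  the intended steps, and open questions.
- After every step, update PLAN.md: mark the step done and note what changed.

## Codebase understanding
- Keep docs/file-tree.md up to date whenever files are added, moved, or removed.
- Every new module gets a short docstring describing its responsibility.

## Debugging
- Do not guess at runtime behavior. Add logs through the shared logger and read the
  captured output before proposing a fix.

## Workflow
- For new features, follow: project analysis → page planning → tech stack selection →
  component development, and record the outcome of each stage in PLAN.md.
```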
I’ll share a separate post about writing good Cursor Rules, including a template.
Essentially, good Rules reflect real engineering knowledge. So writing good Agent Rules requires a deep understanding of your project.
Looking Ahead
A Flood of Low-Quality Apps Might Be Coming
A lot of people with no engineering background are now using AI to build apps. Many of these apps may ship with security holes or privacy flaws; others may just be clones built for quick profit. The market may soon be flooded, raising challenges for app stores and regulators.
Ideal Workflow for Indie Developers
If you want to use AI to launch a product today, here’s what I suggest:
Ideal Flow
- Use a strong LLM’s deep research capabilities to explore ideas
- Once validated, plan your tech stack, frameworks, and architecture
- Write a custom Project README and Agent Rules
- Start building with Vibe Coding
- Use AI tools to generate visuals, videos, and marketing materials
- Launch the MVP, gather feedback, and iterate
Maintain your own model memory and document base to support future pivots or expansions.
Ideal Skill Tree
- Thinking
  - Product thinking
  - MVP mindset: Fast iterations
- Basic Programming
  - Scripting basics: variables, functions, debugging
  - Frontend/backend fundamentals: APIs, Postman
  - Basic databases
  - Git and version control
  - Cloud: AWS, Azure
  - Deployment: Vercel, auto-CI/CD
- Awareness
  - Security: Injection, vulnerabilities
  - Legal: Privacy, IP, compliance
- AI Usage
  - Prompt engineering
  - AI memory/docs management
- Marketing
  - Growth hacking
  - AI-assisted video production
Will AI Replace Programmers?
As mentioned above, there are still many limits
Things that feel intuitive to human programmers—like noticing visual issues in a layout—are still hard for AI. Whether agents can independently handle every development step will determine if they can fully replace human devs.
Reliability is also a concern
LLMs are still probabilistic. We might keep improving them, but the result may never be 100% reliable. For critical systems, skilled human engineers will still be necessary.
What about junior devs and interns?
As AI software engineers mature, companies may reduce their demand for junior and intern developers. CS graduates may face a historically tough job market, and existing junior developers may also risk being laid off.
Embracing AI and shifting toward Agent engineering, product management, or multi-skilled roles (like indie devs) may be the most promising paths in this time of radical change.
- Author:Jingsheng Chen
- URL:https://jingsheng.dev/article/006en
- Copyright:Unless otherwise stated, all articles on this blog are licensed under the CC BY-NC-SA agreement. Please credit the source when reposting!