AI-Assisted Software Development and the “Vibe Coding” Debate by Nick Kelly
AI Tools for Accelerating Software Development
AI-powered coding assistants like GitHub Copilot and ChatGPT have been adopted by developers to help write and review code more quickly. Recent research provides evidence that these tools can enhance productivity in software development:
- Productivity Boosts with Copilot: A 2024 study spanning three randomised trials (with ~4,800 developers across Microsoft, Accenture, and a Fortune 100 firm) found that using GitHub Copilot led to a 26% increase in developer productivity on coding tasks. In this field experiment, developers with Copilot completed more pull requests per week on average. Notably, less-experienced programmers benefited the most, as they more readily adopted the AI suggestions and saw larger efficiency gains.
- Case Study – Time Savings and Satisfaction: An internal evaluation at ZoomInfo (400 engineers) reported that Copilot helped developers save roughly 20% of coding time, contributing hundreds of thousands of lines of code to the codebase. Developer surveys in this study showed high satisfaction (around 72% positive sentiment) with AI assistance. Engineers appreciated the tool for quickly generating boilerplate code and routine functions, though they also had to spend time reviewing AI output for correctness and consistency. (The main limitations noted were Copilot’s lack of domain-specific knowledge and varying code quality, which required additional human scrutiny during code reviews.)
- ChatGPT and LLM-Based Assistants: Large language models like ChatGPT are likewise being used to speed up development. These conversational AI tools can generate code snippets or suggest fixes in response to natural-language prompts, effectively acting as on-demand “copilots.” For example, ChatGPT can produce working code for many programming assignments or help debug errors by explaining issues in plain language. Early studies even suggest ChatGPT can partially handle certain coding tasks autonomously, demonstrating high accuracy on routine programming challenges. However, researchers caution that developers should use these assistants with oversight. Relying on them exclusively can lead to blind spots (e.g. not fully understanding the generated code or missing subtle bugs), so human validation and testing remain critical.
Risks and Critiques of “Vibe Coding” (Prompt-foo, AI-First Coding)
Alongside the excitement for AI-assisted coding, experts have raised concerns about “vibe coding.” The term refers to coding by improvisation with AI: quickly building software with an LLM’s help, without careful review, planning, or architecture. Put generously, vibe coding is when a developer trusts the AI’s output on “gut feeling” or convenience rather than applying deliberate design and rigorous checks. Several studies and industry analyses critique this unstructured approach, warning of risks to code quality and project health:
- Code Quality and Maintainability: Rapid AI-generated code can devolve into “spaghetti code” – tangled, inconsistent source code that lacks clear structure. Because an AI may solve similar problems in different ways each time, a vibe-coded project can become a patchwork of heterogeneous styles with minimal documentation (the first sketch after this list illustrates the effect). This inconsistency accumulates technical debt: one article notes that “AI-generated code often lacks the structure, documentation, and clarity necessary for long-term maintenance. This can lead to increased technical debt, making future modifications and debugging significantly more difficult.” In practice, developers have reported that blindly accepting AI output can yield very poor code (surprise surprise). For instance, one programmer who reviewed an app built via vibe coding was “shocked to see how bad the code was in a lot of places. It looked like it was written by a lot of junior developers, all using different coding practices.” He had trouble tracking down bugs in the AI-written code and eventually reverted to planning the architecture himself before using AI, to avoid such chaos. In short, vibe coding’s speed comes at the cost of maintainable design, and the resulting codebase may be hard to read, fix, or extend over time.
- Security Vulnerabilities: A critical risk of vibe coding is the introduction of security flaws. AI models do not inherently understand secure coding practices; they generate code based on patterns in training data, which may include outdated or unsafe examples. Analyses have found that vibe-coded applications often skip essential security steps: for example, they may omit proper input validation, use generic error handling that exposes sensitive information, or pull in third-party packages without vetting their safety (the second sketch below shows a typical case). One industry report bluntly stated that “the code these models generate is full of security holes and can be easily hacked.” Likewise, an expert noted that AI-produced code is not only spaghetti, but “insecure spaghetti,” since it frequently draws on snippets from open-source projects that might harbour vulnerabilities. The consequence is that software built by unchecked AI might work on the surface but harbour serious weaknesses (SQL injection, buffer overflows, etc.). Without rigorous reviews, such flaws can slip into production, where they pose risks of data breaches or exploits. Thus, even proponents of AI coding advise never to use raw vibe-coded output in critical production code without extensive security auditing.
- Scalability and Performance Issues: Vibe coding tends to prioritise quick solutions over sound architecture, which can hurt scalability. AI-generated code often focuses on making a feature work in the simplest way, ignoring best practices for efficiency. Studies note that systems built by “just winging it” with an AI often exhibit poor resource utilisation and architecture that doesn’t scale well under load. For example, an LLM might generate data-access logic that is fine for a prototype but becomes a bottleneck at real-world data sizes (e.g. unoptimised database queries or memory-intensive routines; the third sketch below shows one common case). There’s also a tendency for vibe-coded projects to evolve into monolithic, tightly coupled code, since the AI isn’t enforcing modular design. All of this means that while vibe coding can whip up a quick demo, the same code may struggle in production at scale, requiring significant refactoring or re-engineering later on. In effect, any initial time saved can be lost when the team has to resolve performance problems or rewrite large portions of the codebase to handle growth.
- Debugging Difficulties and Lack of Accountability: An often-cited pitfall of vibe coding is that developers may not fully understand the code that the AI writes, making debugging a challenge. When errors arise in AI-written code, some programmers (especially novices) simply ask the AI for a fix or regeneration, rather than diagnosing the root cause themselves. Professors have observed this pattern in students: if a program produced via ChatGPT “didn’t work, [they] simply ask the AI system to do it again,” because they don’t actually grasp what the code is supposed to do. This leads to a dangerous cycle where bugs are patched by more AI output, potentially introducing new issues. Moreover, developer accountability can diminish. Since the human didn’t hand-write the code, they might feel less ownership or responsibility for its correctness. As one software manager put it, “when AI generates code, the traditional sense of ownership blurs,” and engineers may overlook flaws that would normally be caught if they had written and reasoned through the code themselves. The lack of personal accountability means critical reviews can be skimped on, and important details (like corner-case bugs or maintainability concerns) get missed. Ultimately, vibe coding can produce code that “works” initially but becomes a nightmare to debug or modify, since the people on the team did not build up a deep understanding of how it works internally.
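To make the maintainability complaint concrete, here is a minimal, contrived Python sketch of the “patchwork of styles” problem: two helpers doing essentially the same job, written the way two separate AI generations plausibly would. All names (getUserById, fetch_user) are hypothetical, invented purely for illustration.

```python
from typing import Optional

# Two hypothetical AI-generated helpers that do the same job (look up a user
# record by ID) in clashing styles; the kind of inconsistency that piles up
# in a vibe-coded project.

# Generation 1: camelCase, raises an exception on bad input, no type hints.
def getUserById(userId):
    if userId is None:
        raise ValueError("userId is required")
    return {"id": userId, "name": "example"}

# Generation 2: snake_case, type-hinted, silently returns None on bad input.
def fetch_user(user_id: int) -> Optional[dict]:
    if user_id < 0:
        return None
    return {"id": user_id, "name": "example"}

# Callers must now remember which naming convention and which error-handling
# contract (exceptions vs. None returns) applies where.
```

Neither function is wrong in isolation; the debt comes from mixing the two contracts across a codebase with no documentation saying which one is canonical.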
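On the security point, the classic failure mode is string-built SQL. Below is a minimal sketch, assuming a toy SQLite users table (the schema and function names are invented for illustration); the vulnerable and fixed versions differ only in whether user input is interpolated into the query string or passed as a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

def find_user_vulnerable(name: str):
    # Typical unchecked AI output: user input interpolated straight into SQL.
    # Input like "x' OR '1'='1" changes the query's meaning (SQL injection).
    query = f"SELECT * FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()

def find_user_safe(name: str):
    # The reviewed version: a parameterised query. The driver binds the value,
    # so attacker-controlled input can never alter the SQL structure.
    return conn.execute("SELECT * FROM users WHERE name = ?", (name,)).fetchall()

print(find_user_vulnerable("x' OR '1'='1"))  # returns every row: injected
print(find_user_safe("x' OR '1'='1"))        # returns []: input treated as data
```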
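Finally, on scalability, one of the most common patterns in quickly generated data-access code is the N+1 query: fetch a list, then issue one query per item. A sketch under the same assumptions (a hypothetical orders table in SQLite); the batched version does the same aggregation in a single round trip.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, user_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, i % 100, 9.99) for i in range(1000)])

def totals_naive(user_ids):
    # Prototype-friendly but O(N) round trips: one query per user. Fine for a
    # demo; a bottleneck once user_ids (or network latency) grows.
    return {
        uid: conn.execute(
            "SELECT COALESCE(SUM(total), 0) FROM orders WHERE user_id = ?",
            (uid,),
        ).fetchone()[0]
        for uid in user_ids
    }

def totals_batched(user_ids):
    # One query with GROUP BY: the database aggregates everything in a single
    # round trip. Only the placeholder COUNT is interpolated, never user data,
    # so this stays parameterised.
    placeholders = ",".join("?" * len(user_ids))
    rows = conn.execute(
        f"SELECT user_id, SUM(total) FROM orders "
        f"WHERE user_id IN ({placeholders}) GROUP BY user_id",
        list(user_ids),
    ).fetchall()
    return dict(rows)

print(totals_batched([1, 2, 3]))
```

One design note: the batched version omits users with no orders, so a real implementation would decide how to represent them; the point here is simply collapsing N round trips into one.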
AI coding tools offer promising boosts to developer productivity and can accelerate the software development process, as evidenced by multiple studies. Tools like Copilot and ChatGPT can generate code, assist with routine tasks, and even help newcomers learn faster, contributing to faster delivery of features. Moreover, accuracy is improving rapidly… as I pen this article, I’m fully aware that it may not age well even beyond six months. However, “vibe coding” serves as a cautionary tale: when developers lean too heavily on AI and code by “vibe” or intuition without structure (read: no real idea what they’re doing beyond prompting their way to “app release = profit and glory… and being pwned”), the short-term gains can be undermined by long-term costs. Peer-reviewed studies and expert analyses alike warn that uncritical reliance on AI-generated code may lead to unmaintainable, insecure, and fragile software. The consensus is that human oversight, good software engineering practices, and thorough code review are still essential. AI can be a powerful amplifier for developer productivity, but it works best as a tool under the guidance of skilled engineers, not as a replacement for sound design and due diligence.
Sources:
- Bakal et al., “Experience with GitHub Copilot for Developer Productivity at Zoominfo,” arXiv preprint 2501.13282 (2025) – Case study of Copilot deployment.
- Microsoft/MIT/Wharton study (2024) – Field experiments on Copilot’s impact, reported by InfoQ.
- Silva et al., “ChatGPT: Challenges and Benefits in Software Programming for Higher Education,” Sustainability 16(3):1245 (2024) – On ChatGPT assisting coding.
- Trotta, “5 Vibe Coding Risks and Ways to Avoid Them in 2025,” Zencoder Blog (2025) – Industry analysis of vibe coding pitfalls.
- Mack, “Vibe Coding Explained: Use Cases, Risks and Developer Guidance,” VKTR (2025) – Expert opinions on vibe coding vs. AI-assisted programming.
