AI and COBOL: Unpacking the Hype, Challenges, and Real-World Impact on Legacy Code

January 29, 2026

The impact of AI on coding, particularly for legacy languages like COBOL, presents a complex landscape of challenges and opportunities. While the notion that AI will rapidly replace human developers is widely debated, real-world experience suggests a more nuanced picture, especially in critical, long-standing systems.

Challenges for AI in Legacy Systems

For COBOL, a language foundational to much of the global economy, several unique hurdles limit AI's current utility:

  • Strict Formatting and Syntax: COBOL, especially in its traditional fixed-format form (free-format source only arrived with the 2002 standard), relies on fixed-column layout, where a misplaced character isn't a matter of visual preference but a syntax error or, worse, a subtle bug that still compiles, since text past column 72 is simply ignored. Current LLMs frequently struggle with this, often generating code that overruns column 72 or misplaces scope-terminating periods in nested IF statements, so that linting the output can take longer than writing it by hand (a column-check sketch follows this list).
  • Compliance and Data Security: A significant barrier in highly regulated industries like banking is the inability to send proprietary or sensitive code to external LLM providers. While local models offer a solution, they are often too heavy to run on the restricted virtual desktop infrastructure (VDI) instances commonly used in these environments. The risk of code leakage remains a critical concern for many organizations.
  • Undocumented Business Logic: Decades of accumulated, often undocumented, business rules and "why" knowledge embedded in COBOL systems are nearly impossible for AI to grasp. This institutional knowledge, residing primarily with experienced human developers, defines the true complexity of maintenance and development in these environments. LLMs lack the necessary context of specific system intricacies and historical quirks.
  • Proprietary Language Flavors: Unlike more modern, standardized languages, COBOL often exists in various proprietary "flavors" specific to particular financial institutions. This fragmentation limits the effectiveness of generically trained LLMs, as a model trained on one bank's COBOL might offer limited transference to another.
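
To make the formatting constraint concrete, here is a minimal Python sketch of the kind of fixed-format check that AI-generated COBOL often trips. The column boundaries follow the classic fixed layout (sequence area in columns 1-6, indicator in column 7, Area A in 8-11, Area B in 12-72, columns 73-80 ignored), and the sample MOVE statement is invented for illustration.

```python
# Minimal fixed-format line check: a sketch, not a full COBOL linter.
# Classic layout: cols 1-6 sequence, col 7 indicator, cols 8-11 Area A,
# cols 12-72 Area B, cols 73-80 ignored by the compiler.

AREA_B_END = 72  # anything past this column is silently dropped in fixed format

def check_fixed_format(line: str, lineno: int) -> list[str]:
    """Return human-readable warnings for one source line."""
    warnings = []
    stripped = line.rstrip("\n")
    if len(stripped) > AREA_B_END and stripped[AREA_B_END:].strip():
        warnings.append(
            f"line {lineno}: text past column 72 will be ignored: "
            f"{stripped[AREA_B_END:].strip()!r}"
        )
    indicator = stripped[6] if len(stripped) > 6 else " "
    if indicator not in (" ", "*", "/", "-", "D", "d"):
        warnings.append(f"line {lineno}: unexpected indicator {indicator!r} in column 7")
    return warnings

if __name__ == "__main__":
    # A hypothetical generated statement that overruns column 72.
    sample = " " * 11 + "MOVE WS-CUSTOMER-BALANCE TO RPT-CUSTOMER-BALANCE-FORMATTED-FIELD-0001."
    for msg in check_fixed_format(sample, 1):
        print(msg)
```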

General LLM Performance and Use Cases

Beyond COBOL, experiences with LLMs across various languages reveal a mix of capabilities and limitations:

  • Varying Language Proficiency: LLMs show inconsistent performance across languages. While some report strong results with Java, TypeScript, SQL, YAML, and Bicep, others find they struggle with C, Rust (especially lifetimes and complex types), Python, and Go. Success often depends on the specific model (e.g., Opus 4.5 vs. older versions) and the complexity of the task.
  • Hallucinations and Trust Issues: A common complaint is LLMs' tendency to confidently generate incorrect code or lead developers down unproductive rabbit holes, sometimes even suggesting insecure configurations (e.g., logging bearer tokens and session cookies). This necessitates extensive, vigilant review, which can often negate the time saved by AI generation. For junior developers, relying on AI without deep understanding can hinder learning and increase error rates.
  • Strengths in Specific Tasks: LLMs are highly effective for:
    • Tedious and boilerplate tasks: Generating test data, file layouts, simple functions, and configuration files (YAML, Bicep); a small test-data example follows this list.
    • Research and documentation: Quickly finding obscure information in large manuals or deciphering poorly documented APIs (e.g., AWS documentation, COM interfaces).
    • Overcoming writer's block: Providing an initial draft or "slop" that can be refined and rewritten, making it easier to start complex tasks.
    • Modernization efforts: Assisting with translating older codebases or refactoring small, well-defined sections.
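
As an example of the boilerplate work these tools handle well, the sketch below generates fixed-width test records in Python for a hypothetical customer layout. The field names and widths are invented for illustration; in practice they would be taken from the real copybook.

```python
import random

# Hypothetical fixed-width layout (field name, width); real widths would come
# from the actual copybook rather than this illustration.
LAYOUT = [("CUST-ID", 8), ("CUST-NAME", 30), ("BALANCE", 11), ("STATUS", 1)]

def make_record(cust_id: int) -> str:
    """Build one fixed-width test record padded to the layout's widths."""
    values = {
        "CUST-ID": str(cust_id).zfill(8),
        "CUST-NAME": f"TEST CUSTOMER {cust_id}",
        # Signed display formats are more involved; keep the amount unsigned here.
        "BALANCE": str(random.randint(0, 99_999_999_999)).zfill(11),
        "STATUS": random.choice("AIC"),
    }
    return "".join(values[name].ljust(width)[:width] for name, width in LAYOUT)

if __name__ == "__main__":
    with open("test-customers.dat", "w", encoding="ascii") as fh:
        for i in range(1, 101):
            fh.write(make_record(i) + "\n")
```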

Strategies for Effective AI Integration

To maximize the benefits of AI in coding, developers emphasize several key practices:

  • Treat AI as an Assistant, Not an Autonomous Agent: Every line of AI-generated code, including tests, must be thoroughly reviewed and vetted by a human. This ensures correctness, security, and alignment with project standards.
  • Become an "AI Architect": Successful users treat LLMs like junior developers requiring precise, detailed instructions. This involves prompting with excruciating specificity, outlining constraints, desired patterns, and even explicit security requirements. Providing a "style guide" or CLAUDE.md can help enforce consistency.
  • Leverage Iterative Workflows: AI code generation is rarely a one-shot solution. It often requires an iterative process of prompt, generate, test, review, and refine. Connecting LLMs to checkers, linters, and compilers allows them to identify and correct their own mistakes more effectively (see the loop sketch after this list).
  • Provide Sufficient Context: LLMs perform significantly better when operating within a well-structured codebase with clear documentation, examples, and established patterns. Feeding them a blank slate or a messy environment yields poorer results.
  • Focus on High-Level Design: AI currently struggles with abstraction and functional decomposition. Human developers remain crucial for architectural decisions and breaking down complex problems into manageable, well-defined components that AI can then assist in implementing.
  • Consider Fine-Tuning for Specialized Domains: For highly proprietary languages or specific codebases, fine-tuning LLMs with a custom corpus can yield much better results, though this requires significant investment from organizations (a corpus-preparation sketch follows below).
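
A minimal sketch of that generate-check-refine loop, assuming a hypothetical generate_code helper that wraps whatever model is in use and that GnuCOBOL's cobc is available for a syntax-only pass; the exact commands and flags will differ per toolchain.

```python
import subprocess
import tempfile
from pathlib import Path

def generate_code(prompt: str) -> str:
    """Hypothetical wrapper around whatever LLM the team uses."""
    raise NotImplementedError("plug in your model call here")

def syntax_check(source: str) -> str:
    """Run a syntax-only compile and return diagnostics (empty string if clean)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "candidate.cbl"
        path.write_text(source, encoding="ascii")
        # Assumes GnuCOBOL's cobc is on PATH; substitute your own compiler or linter.
        result = subprocess.run(
            ["cobc", "-fsyntax-only", str(path)],
            capture_output=True, text=True,
        )
        return "" if result.returncode == 0 else result.stderr

def generate_with_feedback(task: str, max_rounds: int = 3) -> str:
    """Prompt, check, and feed diagnostics back until the code is clean or we give up."""
    prompt = task
    source = ""
    for _ in range(max_rounds):
        source = generate_code(prompt)
        errors = syntax_check(source)
        if not errors:
            return source  # clean: hand off to human review
        prompt = f"{task}\n\nThe previous attempt failed to compile:\n{errors}\nPlease fix it."
    return source  # still failing after max_rounds: a human takes over
```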
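
For the fine-tuning route, a common starting point is simply assembling an in-house corpus of reviewed prompt/completion pairs. The sketch below writes such pairs as JSON Lines; the file names are purely illustrative, and the downstream fine-tuning call is left out since it varies by provider.

```python
import json
from pathlib import Path

# Purely illustrative pairs of (task description, reviewed COBOL change);
# a real corpus would be assembled from the organization's own history.
PAIRS = [
    ("prompts/add-field.txt", "completions/add-field.cbl"),
    ("prompts/fix-rounding.txt", "completions/fix-rounding.cbl"),
]

def build_corpus(out_path: str = "finetune-corpus.jsonl") -> None:
    """Write prompt/completion pairs as JSON Lines, a common fine-tuning input format."""
    with open(out_path, "w", encoding="utf-8") as out:
        for prompt_file, completion_file in PAIRS:
            record = {
                "prompt": Path(prompt_file).read_text(encoding="utf-8"),
                "completion": Path(completion_file).read_text(encoding="utf-8"),
            }
            out.write(json.dumps(record, ensure_ascii=False) + "\n")

if __name__ == "__main__":
    build_corpus()
```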

The Broader Impact

While AI may not lead to the immediate "death" of COBOL or other specialized programming roles, it is undoubtedly shifting the nature of work. Experienced engineers can use AI as a "force multiplier," offloading tedious tasks and accelerating development, potentially freeing them for higher-value activities. It might also lower the barrier for newer engineers to engage with complex legacy systems, addressing the talent shortage.

However, the probabilistic nature of LLMs, contrasted with the deterministic output of traditional compilers, underscores that AI is a tool that requires human oversight, critical thinking, and a deep understanding of the problem domain. The goal is not "vibe-coded" solutions but thoughtfully integrated AI assistance that enhances, rather than compromises, software quality and reliability.
