
Goldman Sachs Tests AI Agent Devin as a ‘New Employee’

Blockchain Council

Goldman Sachs is actively testing Devin, an autonomous AI software engineer developed by Cognition Labs. The bank has deployed hundreds of Devin instances in its internal systems, with plans to scale up. This marks one of the first real-world cases where an AI agent is evaluated like a human employee.

In practical terms, Goldman is testing how far AI can go in performing real coding tasks usually handled by junior developers.

What Is Devin?

Devin is an AI-powered coding agent trained to perform full software engineering tasks. It operates inside its own environment, complete with a shell, browser, and code editor. Unlike traditional AI coding assistants, Devin can execute multi-step programming tasks from start to finish.

Cognition Labs, the company behind Devin, claims Devin is the first AI capable of functioning as a full-stack software engineer. According to the company, Devin has already shown that it can:

  • Write and debug code
  • Test software in realistic dev environments
  • Deploy code to live systems

This hands-free, agentic behavior is what makes Devin different from tools like GitHub Copilot.

Why Goldman Sachs Is Testing Devin

Goldman Sachs sees Devin as a potential productivity multiplier. According to CIO Marco Argenti, the bank’s goal is to evaluate Devin “as if it were a new employee.”

Here’s why Goldman is testing this approach:

  • To automate repetitive tasks in coding
  • To support its 12,000+ human developers with AI agents
  • To increase speed and accuracy in delivery
  • To explore cost-effective ways to scale engineering operations

The bank has already deployed hundreds of Devin instances and may roll out thousands more if tests go well.

What Devin Can Do at Goldman

Devin is not just a demo or a prototype. Goldman Sachs is already using it to perform specific software development tasks under human supervision. These include:

  • Editing and updating legacy codebases
  • Writing unit and integration tests
  • Refactoring sections of internal tools
  • Preparing code for production environments

Supervising engineers review all output, but Devin does most of the heavy lifting in these assigned tasks.

Task Breakdown in Live Deployment

Here’s how Devin fits into Goldman Sachs’ coding workflow:

  • Engineers assign Devin a ticket through a project management system
  • Devin reads the codebase, identifies dependencies, and proposes a solution
  • It writes and tests the code in its own sandboxed environment
  • Human engineers review and merge the pull request
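The four steps above can be read as a small state machine. The sketch below is only an illustration of that reading: the names `Ticket`, `Status`, `agent_propose`, `sandbox_test`, and `human_review` are hypothetical and do not come from Goldman Sachs or Cognition Labs, and the agent and sandbox are replaced by stand-in functions.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Status(Enum):
    ASSIGNED = auto()
    PROPOSED = auto()
    TESTED = auto()
    MERGED = auto()
    REJECTED = auto()


@dataclass
class Ticket:
    ticket_id: str
    description: str
    status: Status = Status.ASSIGNED
    history: list = field(default_factory=list)

    def advance(self, new_status: Status) -> None:
        # Record the state being left, then move to the new one.
        self.history.append(self.status)
        self.status = new_status


def agent_propose(ticket: Ticket) -> str:
    """Stand-in for the agent reading the codebase and drafting a change."""
    ticket.advance(Status.PROPOSED)
    return f"patch for {ticket.ticket_id}"


def sandbox_test(ticket: Ticket, patch: str, tests_pass: bool) -> bool:
    """Stand-in for running the patch's tests in the agent's sandbox."""
    if tests_pass:
        ticket.advance(Status.TESTED)
    return tests_pass


def human_review(ticket: Ticket, approved: bool) -> None:
    """Only a human decision moves a tested change to MERGED or REJECTED."""
    if ticket.status is not Status.TESTED:
        raise ValueError("only tested changes can be reviewed")
    ticket.advance(Status.MERGED if approved else Status.REJECTED)


def run_ticket(ticket_id: str, description: str,
               tests_pass: bool = True, approved: bool = True) -> Ticket:
    ticket = Ticket(ticket_id, description)
    patch = agent_propose(ticket)
    if sandbox_test(ticket, patch, tests_pass):
        human_review(ticket, approved)
    return ticket
```

In this sketch a change whose tests fail never reaches review, and nothing is merged without the `human_review` step, which mirrors the supervision described in the article.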

Use Cases of Devin in Production

| Task Type | Fully AI-driven | Human-supervised | Manual by Engineers |
| --- | --- | --- | --- |
| Code refactoring | Yes | Yes | Occasionally |
| Test writing | Yes | Yes | Rarely |
| Codebase navigation | Yes | No | No |
| Deploy-ready output | No | Yes | Yes |
| Debugging complex logic | No | Yes | Yes |

This setup creates a hybrid workflow that blends AI automation with human expertise.

What Makes Devin Different

Devin isn’t just a chatbot for code. Its key differentiator is agency – it can carry out multi-step technical tasks independently. Unlike other AI tools that stop at suggesting code snippets, Devin:

  • Executes shell commands
  • Tests code in runtime environments
  • Uses context across multiple files
  • Follows task instructions with reasoning steps

This makes it more than just a helpful assistant. It becomes a co-worker – one that never sleeps, doesn’t need a paycheck, and can work in parallel with thousands of other instances.

Goldman’s Expected Productivity Gains

Goldman Sachs has already seen gains with AI-powered coding copilots. Some teams report a 20 percent boost in developer speed for small tasks. With Devin, internal estimates point to a 3x to 4x productivity increase in more complex workflows.

The firm believes this type of agentic AI could eventually take on 20 to 40 percent of all software engineering workload – especially in code-heavy departments like risk, compliance, and data engineering.
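To put the 20 to 40 percent figure in perspective, here is a back-of-the-envelope calculation. It assumes the article's figure of 12,000 developers and, purely for illustration, that workload is spread evenly across them; the article does not state the latter, so treat the result as a rough scale, not a forecast.

```python
DEVELOPERS = 12_000      # "12,000+ human developers" quoted in the article
SHARE_LOW_PCT = 20       # lower bound of the quoted workload share
SHARE_HIGH_PCT = 40      # upper bound of the quoted workload share


def developer_equivalents(headcount: int, share_pct: int) -> int:
    """Workload share expressed as a number of developers (even-split assumption)."""
    return headcount * share_pct // 100


low = developer_equivalents(DEVELOPERS, SHARE_LOW_PCT)
high = developer_equivalents(DEVELOPERS, SHARE_HIGH_PCT)
print(f"{low:,} to {high:,} developer-equivalents")
# prints "2,400 to 4,800 developer-equivalents"
```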

Devin vs Other AI Coding Tools

| Tool Name | Autonomy Level | Deployment Type | Ideal Use Case | Agentic Behavior |
| --- | --- | --- | --- | --- |
| Devin | High | Private instances | Multi-step coding tasks | Yes |
| GitHub Copilot | Low | Editor plugin | Code suggestions in real time | No |
| ChatGPT (Code) | Medium | Chat interface | Q&A and logic generation | No |

Devin’s main edge lies in its ability to plan, execute, and revise – without needing repeated prompts.
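A plan, execute, and revise loop can be sketched in a few lines. This is a generic illustration of the pattern, not Devin's actual implementation, which is not public: `run_agent`, `toy_execute`, and `toy_revise` are hypothetical names, and the executor is a toy function instead of a real shell or test runner.

```python
from typing import Callable, List, Tuple

Step = str
Executor = Callable[[Step], Tuple[bool, str]]   # returns (succeeded, feedback)
Reviser = Callable[[Step, str], Step]           # returns a revised step


def run_agent(plan: List[Step], execute: Executor, revise: Reviser,
              max_attempts: int = 3) -> List[str]:
    """Work through a plan, revising a failed step instead of waiting for a new prompt."""
    log = []
    for step in plan:
        for attempt in range(1, max_attempts + 1):
            ok, feedback = execute(step)
            log.append(f"{step}: {'ok' if ok else 'failed'} (attempt {attempt})")
            if ok:
                break
            # Use the failure feedback to produce a new version of the step.
            step = revise(step, feedback)
        else:
            # The inner loop ran out of attempts without a success.
            raise RuntimeError(f"gave up after {max_attempts} attempts: {step}")
    return log


def toy_execute(step: Step) -> Tuple[bool, str]:
    # Toy behavior: "run tests" fails until it has been revised once.
    if step == "run tests":
        return False, "2 tests failing"
    return True, ""


def toy_revise(step: Step, feedback: str) -> Step:
    return f"{step} [revised]"


log = run_agent(["read code", "write patch", "run tests"], toy_execute, toy_revise)
for line in log:
    print(line)
```

The point of the sketch is the inner loop: a failure feeds back into a revised attempt automatically, which is the behavior the article contrasts with tools that need a fresh prompt each time.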

Is Devin Replacing Human Engineers?

No. Goldman Sachs emphasizes that Devin supports developers rather than replacing them. Humans remain responsible for:

  • Final code reviews
  • Deciding architecture
  • Approving deployments
  • Integrating outputs into larger systems

Devin is not ready to handle creative, abstract engineering work or deal with novel problems that require deep judgment. It is a tool – not a peer.

Career Implications for Developers

Devin’s arrival signals a shift in how engineering teams may function. As AI agents take on more of the repetitive work, developers will be expected to:

  • Work alongside AI agents efficiently
  • Review and refine machine-generated code
  • Spend more time on architecture, security, and strategy

For those looking to stay competitive, investing in AI skills is critical.

Strategic Impact for Enterprise AI

Devin’s deployment is not just a tech milestone – it’s a business case. It shows that:

  • AI agents can be embedded in live workflows
  • Large institutions are ready to experiment at scale
  • Hybrid AI-human teams are not just the future – they’re already here

Goldman Sachs is using Devin to push boundaries and create a blueprint that others in finance, insurance, and tech may follow.

Why This Pilot Matters

This is one of the first times an AI tool has been treated as an employee equivalent. From HR to deployment structure, Goldman is setting a precedent:

  • AI onboarding as part of team operations
  • Tracking AI performance like human contributors
  • Iterating workflows to better integrate agents

It’s not just a trial – it’s a restructuring of how enterprise tech teams function.

Devin’s Enterprise Benefits

| Benefit Area | How Devin Helps | Resulting Impact |
| --- | --- | --- |
| Developer productivity | Handles low-level code tasks | Saves time and effort |
| Cost efficiency | Runs at scale with fewer overheads | Reduces operational cost |
| Quality assurance | Generates test cases and checks | Improves code quality |
| Scalability | Can deploy thousands of agents | Meets demand flexibly |

Devin is not a threat to human talent but a tool to amplify it.

Final Thoughts

Devin’s pilot at Goldman Sachs marks a turning point in enterprise AI adoption. It’s more than a coding assistant. It’s a software agent with real responsibilities and performance expectations.

As more organizations explore agentic AI, developers, managers, and business leaders will need to adapt. Goldman is showing what this transition looks like – not in theory, but in action.