Goldman Sachs Tests AI Agent Devin as a ‘New Employee’

Goldman Sachs is actively testing Devin, an autonomous AI software engineer developed by Cognition Labs. The bank has deployed hundreds of Devin instances in its internal systems, with plans to scale up. This marks one of the first real-world cases where an AI agent is evaluated like a human employee.
In short, Goldman is testing how far AI can go in performing real coding tasks usually handled by junior developers.
What Is Devin?
Devin is an AI-powered coding agent trained to perform full software engineering tasks. It operates inside its own environment, complete with a shell, browser, and code editor. Unlike traditional AI coding assistants, Devin can execute multi-step programming tasks from start to finish.
Cognition Labs, the company behind Devin, claims it is the first AI capable of functioning as a full-stack software engineer. According to the company, Devin can:
- Write and debug code
- Test software in realistic dev environments
- Deploy code to live systems
This hands-free, agentic behavior is what makes Devin different from tools like GitHub Copilot.
Why Goldman Sachs Is Testing Devin
Goldman Sachs sees Devin as a potential productivity multiplier. According to CIO Marco Argenti, the bank’s goal is to evaluate Devin “as if it were a new employee.”
Here’s why Goldman is testing this approach:
- To automate repetitive tasks in coding
- To support its 12,000+ human developers with AI agents
- To increase speed and accuracy in delivery
- To explore cost-effective ways to scale engineering operations
The bank has already deployed hundreds of Devin instances and may roll out thousands more if tests go well.
What Devin Can Do at Goldman
Devin is not just a demo or a prototype. Goldman Sachs is already using it to perform specific software development tasks under human supervision. These include:
- Editing and updating legacy codebases
- Writing unit and integration tests
- Refactoring sections of internal tools
- Preparing code for production environments
Supervising engineers review all output, but Devin does most of the heavy lifting in these assigned tasks.
Task Breakdown in Live Deployment
Here’s how Devin fits into Goldman Sachs’ coding workflow:
- Engineers assign Devin a ticket through a project management system
- Devin reads the codebase, identifies dependencies, and proposes a solution
- It writes and tests the code in its own sandboxed environment
- Human engineers review and merge the pull request
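The four steps above can be sketched as a small state machine. This is only an illustrative model, not Goldman's or Cognition's actual tooling: the `Ticket` class, the `Status` values, the `agent_work` and `human_review` functions, and the example ticket title are all hypothetical. What it shows is the human gate: nothing reaches `MERGED` without a reviewer's decision.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Status(Enum):
    ASSIGNED = auto()
    PROPOSED = auto()
    TESTED = auto()
    MERGED = auto()
    REJECTED = auto()


@dataclass
class Ticket:
    """Hypothetical ticket record; field names are illustrative only."""
    title: str
    status: Status = Status.ASSIGNED
    history: list = field(default_factory=list)

    def advance(self, new_status: Status) -> None:
        # Record the previous status so the full path can be audited later.
        self.history.append(self.status)
        self.status = new_status


def agent_work(ticket: Ticket, tests_pass: bool) -> None:
    """Stand-in for the agent: propose a change, then test it in a sandbox."""
    ticket.advance(Status.PROPOSED)
    if tests_pass:
        ticket.advance(Status.TESTED)


def human_review(ticket: Ticket, approved: bool) -> None:
    """Only a human reviewer can move a tested ticket to MERGED."""
    if ticket.status is not Status.TESTED:
        raise ValueError("only tested work can be reviewed")
    ticket.advance(Status.MERGED if approved else Status.REJECTED)


ticket = Ticket("Update legacy date parser")  # made-up example ticket
agent_work(ticket, tests_pass=True)
human_review(ticket, approved=True)
print(ticket.status.name)  # MERGED
```

In this sketch, work that fails its sandbox tests stays at `PROPOSED` and cannot be submitted for review, which mirrors the idea that engineers only see changes the agent has already tested.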
Use Cases of Devin in Production
| Task Type | Fully AI-driven | Human-supervised | Manual by Engineers |
| Code refactoring | Yes | Yes | Occasionally |
| Test writing | Yes | Yes | Rarely |
| Codebase navigation | Yes | No | No |
| Deploy-ready output | No | Yes | Yes |
| Debugging complex logic | No | Yes | Yes |
This setup creates a hybrid workflow that blends AI automation with human expertise.
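One way to read the table is as a routing policy. The sketch below encodes the first two columns in a dictionary and answers a single question: does a given task type need a human in the loop? The `POLICY` name and the `needs_human` helper are assumptions made for illustration, and the "Manual by Engineers" column is left out for brevity. The article does not describe how Goldman actually encodes these rules.

```python
# Rows copied from the table above: (fully_ai_driven, human_supervised)
POLICY = {
    "code refactoring": (True, True),
    "test writing": (True, True),
    "codebase navigation": (True, False),
    "deploy-ready output": (False, True),
    "debugging complex logic": (False, True),
}


def needs_human(task_type: str) -> bool:
    """Return True when the task is human-supervised or not fully AI-driven.

    Unknown task types default to requiring a human, the conservative choice.
    """
    fully_ai, supervised = POLICY.get(task_type.lower(), (False, True))
    return supervised or not fully_ai


print(needs_human("Codebase navigation"))  # False
print(needs_human("Deploy-ready output"))  # True
```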
What Makes Devin Different
Devin isn’t just a chatbot for code. Its key differentiator is agency – it can carry out multi-step technical tasks independently. Unlike other AI tools that stop at suggesting code snippets, Devin:
- Executes shell commands
- Tests code in runtime environments
- Uses context across multiple files
- Follows task instructions with reasoning steps
This makes it more than a helpful assistant. In day-to-day work it behaves more like a co-worker: one that never sleeps, doesn’t need a paycheck, and can run in parallel with thousands of other instances.
Goldman’s Expected Productivity Gains
Goldman Sachs has already seen gains with AI-powered coding copilots. Some teams report a 20 percent boost in developer speed for small tasks. With Devin, internal estimates point to a 3x to 4x productivity increase in more complex workflows.
The firm believes this type of agentic AI could eventually take on 20 to 40 percent of all software engineering workload – especially in code-heavy departments like risk, compliance, and data engineering.
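To see how these two figures could combine, here is a rough back-of-the-envelope calculation. It assumes, purely for illustration, that the agent handles a share `p` of the total workload and completes that share `s` times faster, while the rest of the work proceeds at its usual pace. This is the same form as Amdahl's law. The article does not say that Goldman models the gains this way, and the function name is hypothetical.

```python
def overall_speedup(p: float, s: float) -> float:
    """Team-level speedup when a fraction p of the work runs s times faster."""
    if not 0 <= p <= 1 or s <= 0:
        raise ValueError("p must be in [0, 1] and s must be positive")
    return 1 / ((1 - p) + p / s)


# Low and high ends of the ranges quoted above.
for p, s in [(0.20, 3), (0.40, 4)]:
    print(f"p={p:.0%}, s={s}x -> {overall_speedup(p, s):.2f}x overall")
# p=20%, s=3x -> 1.15x overall
# p=40%, s=4x -> 1.43x overall
```

Under these assumptions the overall gain is roughly 1.15x to 1.43x, well below the 3x to 4x quoted for the agent's own tasks, because the work that stays with humans dominates the total time.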
Devin vs Other AI Coding Tools
| Tool Name | Autonomy Level | Deployment Type | Ideal Use Case | Agentic Behavior |
| Devin | High | Private instances | Multi-step coding tasks | Yes |
| GitHub Copilot | Low | Editor plugin | Code suggestions in real time | No |
| ChatGPT (Code) | Medium | Chat interface | Q&A and logic generation | No |
Devin’s main edge lies in its ability to plan, execute, and revise – without needing repeated prompts.
Is Devin Replacing Human Engineers?
No. Goldman Sachs emphasizes that Devin supports developers rather than replacing them. Humans remain responsible for:
- Final code reviews
- Deciding architecture
- Approving deployments
- Integrating outputs into larger systems
Devin is not ready to handle creative, abstract engineering work or novel problems that require deep judgment. For now, it remains a supervised tool, not an independent peer.
Career Implications for Developers
Devin’s arrival signals a shift in how engineering teams may function. As AI agents take on more of the repetitive work, developers will be expected to:
- Work alongside AI agents efficiently
- Review and refine machine-generated code
- Spend more time on architecture, security, and strategy
For those looking to stay competitive, investing in AI skills is critical. The AI Certification helps professionals understand how agentic tools are built and deployed. If you’re more data-focused, the Data Science Certification is ideal. And if your goals lean toward strategy and growth, the Marketing and Business Certification explores how AI shapes commercial outcomes.
Strategic Impact for Enterprise AI
Devin’s deployment is not just a tech milestone – it’s a business case. It shows that:
- AI agents can be embedded in live workflows
- Large institutions are ready to experiment at scale
- Hybrid AI-human teams are not just the future – they’re already here
Goldman Sachs is using Devin to push boundaries and create a blueprint that others in finance, insurance, and tech may follow.
Why This Pilot Matters
This is one of the first times an AI tool has been treated as the equivalent of an employee. From onboarding to performance tracking, Goldman is setting a precedent:
- AI onboarding as part of team operations
- Tracking AI performance like human contributors
- Iterating workflows to better integrate agents
It’s not just a trial – it’s a restructuring of how enterprise tech teams function.
Devin’s Enterprise Benefits
| Benefit Area | How Devin Helps | Resulting Impact |
| Developer productivity | Handles low-level code tasks | Saves time and effort |
| Cost efficiency | Runs at scale with fewer overheads | Reduces operational cost |
| Quality assurance | Generates test cases and checks | Improves code quality |
| Scalability | Can deploy thousands of agents | Meets demand flexibly |
Devin is not a threat to human talent but a tool to amplify it.
Final Thoughts
Devin’s pilot at Goldman Sachs marks a turning point in enterprise AI adoption. It’s more than a coding assistant. It’s a software agent with real responsibilities and performance expectations.
As more organizations explore agentic AI, developers, managers, and business leaders will need to adapt. Goldman is showing what this transition looks like – not in theory, but in action.