AI Hiring

02.24.2026

AI legacy code modernization: why contractor governance must catch up


Gordie Hanrahan


AI-driven legacy modernization is quickly becoming a priority for financial services CIOs, but contractor governance frameworks have not kept pace.

This week, two announcements signaled a structural shift for financial services technology leaders. Anthropic released a COBOL modernization playbook to accelerate mainframe transformation. At the same time, OpenAI unveiled its Frontier Alliances program with global consulting firms, including Accenture, Boston Consulting Group, Capgemini, and McKinsey, to deploy AI agents at enterprise scale.

Karat’s finserv workforce transformation survey shows that legacy modernization is one of the most anticipated AI use cases for CIOs over the next 3-5 years. As Anthropic shared in this week’s announcement, “legacy code modernization stalled for years because understanding legacy code costs more than rewriting it. But AI flips that equation.”

Agents can help interpret and refactor legacy code, generate test cases, assist with documentation, and support migration planning. That makes these decade-long processes much more addressable in the near term.
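The test-generation and validation work described above can be sketched as a characterization test: capture the legacy routine's observed behavior, then require any AI-generated rewrite to reproduce it before it ships. This is a minimal, hypothetical illustration — the function names, the 360-day interest convention, and the sample cases are all assumptions for the sketch, not a real institution's code.

```python
# Hypothetical characterization test: pin the behavior of a legacy
# routine before accepting an AI-generated rewrite of it.
# Both function names below are illustrative, not from any real system.

def legacy_interest_calc(principal: float, rate: float, days: int) -> float:
    """Stand-in for a routine ported from a legacy system (e.g. COBOL)."""
    return round(principal * rate * days / 360, 2)  # legacy 360-day convention

def modernized_interest_calc(principal: float, rate: float, days: int) -> float:
    """Stand-in for the AI-generated rewrite under review."""
    return round(principal * rate * days / 360, 2)

# Characterization cases captured from the legacy implementation.
cases = [(10_000.0, 0.05, 30), (250_000.0, 0.031, 90), (0.0, 0.05, 30)]
for principal, rate, days in cases:
    assert modernized_interest_calc(principal, rate, days) == \
        legacy_interest_calc(principal, rate, days), (principal, rate, days)
print("rewrite matches legacy behavior on all captured cases")
```

The point of the pattern is that the AI can generate both the rewrite and the candidate test cases, but a human still decides which observed behaviors constitute the contract the rewrite must honor.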

The challenge is that legacy system AI workflows won’t happen in isolation. They will happen through the services ecosystem that already runs modernization. And in highly regulated environments, speed without judgment isn’t a competitive advantage. It’s risk exposure. 

If a coding assistant fails for five minutes at a startup, productivity dips.

If a major bank misroutes a wire transfer, that’s a regulatory event.

And because most financial institutions don’t fully control their own engineering workforce, that risk becomes harder to mitigate. 

Balancing AI modernization with contractor dependency in financial services

The IT and software outsourcing market is expected to eclipse $1T/year by 2030. In the financial services industry, contractors make up around 30% of the software engineering workforce, on average, and we’ve seen organizations where that percentage is over 70%. 

Modernization initiatives, especially those involving legacy systems, frequently sit with external partners: the exact companies that OpenAI and Anthropic are empowering to deploy AI at scale.

This means that the frontlines for AI deployments across financial services companies will likely be third-party contractors: a cohort of engineers who have historically underperformed their FTE counterparts.

What are the risks of deploying AI through consulting partners?

If 30–70% of the engineers executing your AI modernization aren’t on your payroll, your engineering quality is only as strong as your partner evaluation framework.

Ilene Eng, SVP of Digital Engineering at Bread Financial, shares this sentiment. “Embracing a standards-based approach to contractors is a strategic accelerator for our teams,” notes Eng. “It strengthens cross-functional collaboration, streamlines project delivery, and establishes a consistent framework for both internal teams and external partners to drive measurable outcomes.”

The problem is that most organizations lack the benchmarking and rigor to evaluate contract engineers. What’s more, the evaluations that are being used were designed for a pre-LLM world. 

CIOs must update their contractor evaluations to account for AI. But this is an area where financial services companies are falling behind. Just 25% of leaders are updating technical assessments to evaluate AI proficiency, and only 22% are training interviewers to assess AI skills.

This is especially problematic because, according to David Lau, VP of Engineering at OpenAI, “frontier models are advancing so quickly that last month’s edge cases become this month’s baseline. To ride this wave of AI momentum, organizations must continually re-evaluate how they empower great people with the latest models, and design their software, workflows, tools, and hiring processes to let humans and AI multiply each other’s impact.” 

How should CIOs evaluate contractors for AI-driven legacy modernization?

The safest way to scale AI modernization through contractors is to evaluate engineering judgment, not just code output. CIOs should evaluate contractors using structured Human + AI technical interview rubrics that assess AI tool usage, system navigation, and decision-making.

In a pre-LLM world, evaluating coding productivity was relatively straightforward. But in the human + AI era, a simple solution to a basic coding question can be generated in seconds. As a result, having contractors produce basic answers to code tests is no longer sufficient. Leaders need a more granular rubric to validate the candidate's approach and distinguish it from one that merely got lucky with a prompt.

What is a human + AI technical interview rubric?

A Human + AI technical interview rubric is a structured evaluation framework that measures not only coding output, but judgment, AI usage, system navigation, and decision-making in AI-augmented development environments. 

Critical skills for AI-driven legacy modernization include:

  • Scoping ambiguous enterprise problems
  • Validating and testing AI-generated output
  • Debugging and refining AI-assisted code
  • Navigating complex, regulated codebases
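The skill dimensions above can be sketched as a simple weighted scoring structure. This is a hypothetical illustration of how such a rubric might be encoded — the weights and 1–5 scale are assumptions for the sketch, not any vendor's actual framework.

```python
# Hypothetical Human + AI interview rubric as a scoring structure.
# Dimension names mirror the skills listed above; weights and scale
# are illustrative assumptions, not a real evaluation framework.
from dataclasses import dataclass

@dataclass
class RubricDimension:
    name: str
    weight: float   # relative importance; weights sum to 1.0
    score: int = 0  # interviewer rating on a 1-5 scale

def weighted_total(dimensions: list[RubricDimension]) -> float:
    """Combine per-dimension ratings into a single weighted score."""
    return round(sum(d.weight * d.score for d in dimensions), 2)

rubric = [
    RubricDimension("Scoping ambiguous enterprise problems", 0.25),
    RubricDimension("Validating and testing AI-generated output", 0.30),
    RubricDimension("Debugging and refining AI-assisted code", 0.25),
    RubricDimension("Navigating complex, regulated codebases", 0.20),
]

# Example: record one candidate's ratings and compute the total.
# 0.25*4 + 0.30*3 + 0.25*5 + 0.20*4 = 3.95
for dim, score in zip(rubric, (4, 3, 5, 4)):
    dim.score = score
print(weighted_total(rubric))
```

Making each dimension explicit, rather than collapsing everything into "did the code work," is what lets interviewers separate sound judgment from a lucky prompt.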

This is particularly critical when evaluating contractor talent. Unlike internal employees, contractors often join projects mid-stream while working in complex systems and operating across multiple client environments. Contractor assessments must evolve to explicitly measure these abilities rather than assuming they are implied.

Updating talent evaluations for the AI era may seem daunting, especially when developing multi-file code repositories for candidates to use in interviews. The good news is that there are plenty of resources to help leaders create human + AI interview rubrics that measure real-world performance in an AI-enabled enterprise.

The bottom line for CIOs: contractor evaluation is AI governance

If 30–70% of the engineers executing your AI transformation sit outside your org chart, contractor evaluation becomes a core governance function. This means that structured, human + AI technical interview rubrics are not a hiring optimization. They are a risk mitigation strategy.

FAQs

What is AI-driven legacy modernization?
AI-driven legacy modernization uses AI agents and tools to help interpret, refactor, test, and migrate legacy systems faster than traditional methods while reducing manual engineering effort.

Why is contractor governance important for AI modernization?
Because many financial institutions rely heavily on external engineering partners, AI modernization success depends on how well organizations evaluate and govern contractor quality, decision-making, and risk management.

How should CIOs evaluate contractors in the AI era?
CIOs should use structured Human + AI technical interview rubrics that assess engineering judgment, AI tool usage, system navigation, and decision-making — not just coding output.

What is a Human + AI technical interview rubric?
A Human + AI technical interview rubric is a structured framework that evaluates both traditional engineering skills and AI-assisted problem-solving capabilities, including judgment, validation, and workflow decisions.

What are the biggest risks of AI modernization in financial services?
Key risks include insufficient governance, poor contractor evaluation, lack of AI skill assessment, and increased regulatory exposure when AI-generated outputs are not properly validated.
