The Brief

DATA-DRIVEN INSIGHTS AND NEWS

ON HOW BANKS ARE ADOPTING AI

How to judge your model

Source: Adobe Firefly

13 November 2025

Welcome back to the Banking Brief. As we test out new features of this now-weekly newsletter on AI adoption in the industry, we want to hear from you. What do you like? Dislike? Want more of? Email us at [email protected].

This week: We’ve got a unique view into the China-U.S. model race, and the big lesson from how the market prices banks’ AI investment. In Use Case Corner, how one bank cut a cybersecurity task’s time from weeks to minutes. 

People mentioned: Jensen Huang, Andrew Bean, EJ Achtner, Pat Opet, Bori Cox, Michael Santomassimo, Tan Su Shan, Brendan Coughlin, Dermot McDonogh, Andrew Irvine, Fiona Browne, Claudio Balbo, Dror Ayalon, Meena Tumuluri, Benjamin Crestel, Sion Roberts, Robert Li and Yann LeCun.

This edition is 1,487 words, a 5-minute read. Check it out online. If you were forwarded the Brief, you can subscribe here.

– Alexandra Mousavizadeh & Annabel Ayles

TOP OF THE NEWS

ON-PAPER TIGER

China’s latest AI model just beat America’s best. The headlines here miss the point.

Yes, Moonshot AI released the Kimi K2 Thinking Model, an open source LLM that outperforms OpenAI’s GPT-5 and Anthropic’s Claude Sonnet 4.5 across several benchmarks. It looked like quick vindication of Nvidia CEO Jensen Huang’s claim last week that China “would win” on AI.

China may well beat the world to AGI. Here’s the problem – that’s not the only race worth winning. The race that matters isn’t about academic benchmark scores; it’s about how well AI can be applied in business.

Model benchmarks measure the wrong thing, according to new research from Oxford. Models may excel at “questions they’ve effectively seen before because they can memorize everything they’ve trained on,” lead researcher Andrew Bean told us. “The hard thing with language models is whether or not they generalize to new tasks.” That ability to generalize is the key to making them useful in the enterprise.

Researchers at Meta made a similar point last month: Even widely used software engineering benchmarks are too easy to game and say little about how models perform in practice. That has led to growing accusations that model-makers are, to use a new bit of jargon, “benchmaxxing” – training their LLMs in ways that deliberately boost their performance on tests like these.

So who can banks trust, if not the academics? The answer: themselves. They’re building their own custom evaluations, testing how reliable, accurate and cost-effective models are in real-world conditions for specific use cases. The Bank of Singapore, for example, shared with us last week that it designed a 1,000-question automated benchmark to test new models on the market for its KYC use case (see: “Follow the money,” The Brief, Nov. 6).

As BMO’s head of applied AI EJ Achtner wrote this week, “Models come and go, evals are eternal.” Put another way: It’s not about who has the most advanced model; it’s about how you turn performance into productivity.
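For readers who want to picture what a custom evaluation looks like in practice, here is a minimal sketch in Python. It is not the Bank of Singapore's benchmark or any bank's actual method; the names EvalCase, run_eval and stub_model, the two sample questions and the flat per-call cost are all invented for illustration. It scores a model on the three things mentioned above: accuracy (does the answer match a reference), reliability (does the model give the same answer when asked repeatedly) and cost.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    question: str
    expected: str  # reference answer, assumed to be written by domain experts


@dataclass
class EvalResult:
    accuracy: float       # share of cases whose first answer matches the reference
    consistency: float    # share of cases answered identically on every repeat
    cost_per_call: float  # average cost of one model call


def normalize(text: str) -> str:
    """Compare answers case-insensitively and ignore surrounding whitespace."""
    return text.strip().lower()


def run_eval(
    model: Callable[[str], tuple[str, float]],
    cases: list[EvalCase],
    repeats: int = 3,
) -> EvalResult:
    """Run every case `repeats` times; `model` returns (answer, cost of the call)."""
    if not cases or repeats < 1:
        raise ValueError("need at least one case and one repeat")
    correct = 0
    stable = 0
    total_cost = 0.0
    for case in cases:
        answers = []
        for _ in range(repeats):
            answer, cost = model(case.question)
            answers.append(normalize(answer))
            total_cost += cost
        if answers[0] == normalize(case.expected):
            correct += 1
        if len(set(answers)) == 1:
            stable += 1
    n = len(cases)
    return EvalResult(correct / n, stable / n, total_cost / (n * repeats))


def stub_model(question: str) -> tuple[str, float]:
    """Hypothetical stand-in for a real model call; always charges 0.002 per call."""
    canned = {"Is a passport a valid ID document?": "yes"}
    return canned.get(question, "unknown"), 0.002


# Invented sample questions; a real benchmark would hold far more cases.
cases = [
    EvalCase("Is a passport a valid ID document?", "Yes"),
    EvalCase("Is a gym card a valid ID document?", "No"),
]
result = run_eval(stub_model, cases)
print(result)
```

Swapping stub_model for a function that calls a real model lets the same question set be rerun whenever a new model hits the market, which is the point of keeping the evals constant while the models change. Exact string matching is the simplest possible scoring rule; real evaluations often need looser comparisons or human review.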

Two weeks ago, we gathered in New York with 400 AI leaders in financial services for our annual Evident AI Symposium. Movers and shakers from JPMorganChase, Goldman Sachs, Capital One, Morgan Stanley, Citi, UBS, CommBank, BMO, RBC, BNY and more discussed everything from practical ways to scale Gen AI tools to the playbook for building agentic systems inside a bank.

USE CASE CORNER

PLAYING DEFENSE

Cybersecurity AI use cases are rarely shared due to their sensitivity (patents are a different story, but more on that next week). This year, just three cybersecurity tools have been added to the Use Case Tracker – our database of banks’ publicly announced use cases. This week, we look at the newest one from JPMorganChase, out last month.

Use Case: AI threat modeling co-pilot
Vendor: OpenAI
Bank: JPMorganChase

Why it’s interesting: Threat modeling – how banks find their vulnerabilities before bad actors do – is one of the most human-dependent tasks in cybersecurity. Before a bank adopts a new platform or launches a new feature, the cybersecurity team needs to analyze how it might be exploited, which can take weeks. By using Gen AI to map the potential vulnerabilities, JPMC removed a big bottleneck.

How it works: Engineers give the tool a diagram of how one of the bank’s systems works or a description of how an application was built. The tool uses a technique called “tradecraft prompting” to mirror how a human would approach threat modeling: breaking each element down, flagging potential risks with the design and offering ways to mitigate them. The tool was trained using human-made threat models from eight different parts of the bank (its consumer app and payments platform, for example) to give it broad exposure to the different types of risks it should look for.

By the numbers: “We can now produce a fully elaborated threat model in context of our threat and control framework in minutes,” Pat Opet, the bank’s CISO, wrote. The co-pilot drove a 20% efficiency gain within the threat modeling process. And when it analyzes a new system, it finds an average of nine more threats than a human does on their own.

Want to know more about the specific ways banks are rolling out AI? Check out our Use Case Tracker – the inventory of all the AI use cases announced by the world’s largest banks available to members.

STAT OF THE WEEK

The number of workers Singapore-based DBS is actively retraining as a result of AI’s effects on the workforce. The bank last week said it will stop hiring for roles that would be done primarily by AI in the future. “If you’re in service, you’re in a call center, you might want to think about, how do you morph into a relationship manager?” said CEO Tan Su Shan.

EVIDENT ETF

UNTAPPED POTENTIAL

Investors sent a clear message to the market this week: It’s too late for bolt-on AI.

Rightmove, the UK’s biggest property site, saw its stock plummet after announcing late-in-the-game plans to ramp up AI investment. The same lesson is true in banking: As investors look to the future, they’re increasingly rewarding businesses where AI is built in, not layered on.

The Evident ETF – our quarterly way of testing whether being good on AI makes you a better business, or at least a market favorite – shows as much: The higher a bank ranks in the Evident AI Index, the higher its price-to-earnings ratio. In layman’s terms: The better a bank is on AI, the rosier Wall Street’s outlook.

Does all that goodwill come from AI alone? Hardly. But strong fundamentals and diversified businesses earned these banks the cash to make big bets on the tech. And now that early returns are showing just how deep those investments go, it’s clear laggards will struggle to catch up.

FORWARD LOOKING STATEMENTS

The banks that do the most on AI have the highest price-to-earnings ratios, a signal that Wall Street believes in their long-term profitability.

Source: Evident analysis

Morgan Stanley, for example, says it saves 280,000 hours with its legacy code translation tool. Citi frees up 100,000 developer hours a week with AI. JPMorganChase employees using LLM Suite are each saving four hours per week – roughly 600,000 hours company-wide.

Banks say there’s a lot more to come based on what’s going on behind the scenes. Bori Cox, the CFO of consumer banking at JPMC, said at a conference last week that her institution still had “embedded productivity gains” it was “working hard to unlock.” Wells Fargo CFO Michael Santomassimo added that “across the company there’s more we can do to get more efficient.” BNY’s Dermot McDonogh put it most plainly: “We really haven't factored in the opportunities, both in growth and in efficiency, that AI is going to bring to the firm over time.” Said another way: We’re working on translating these big numbers to the bottom line. Until then, keep the optimism coming.

NOTABLY QUOTABLE
“The amount of organizational change needed by financial services firms to utilize GenAI may be substantial. History suggests progress may be slow...to successfully leverage the potential of GenAI on a sustainable basis, decisions based on those models must be well controlled, numerically and legally precise, explainable, and replicable. AI developers still struggle to some extent with all of those criteria.”

– Michael Barr, Fed governor, at Singapore Fintech Festival, Nov. 11

ABOUT EVIDENT

Evident is the intelligence platform for AI adoption in financial services. We help leaders stay ahead of change with trusted insights, benchmarking, and real-time data through our flagship Banking Index, our new Insurance Index, Insights across Talent, Innovation, Leadership, Transparency and Responsible AI pillars, a real-time Use Case Tracker, community and events. Watch our latest roundtable exploring the insights from the 2025 Index for banks and get in touch to hear more about how Evident can help your business adopt AI faster.

TALENT MATTERS

AI IN IB

JPMorganChase hired Dror Ayalon as head of AI product for investment banking operations. Ayalon was most recently a group product manager at Google DeepMind.

Danske Bank UK hired Fiona Browne as its new head of AI. She was previously at 9fin, a debt market intelligence company. The bank brought Kasper Tjørntved Davidsen on as its group chief AI officer in June.

Sion Roberts is now AI strategy lead for the UK & Europe for Crédit Agricole’s commercial and investment banking arm. Roberts was previously an AI strategy manager at NatWest.

Benjamin Crestel joined BNP Paribas to head up the bank's AI & Analytics Lab in Canada. He was previously a VP on Morgan Stanley's Firmwide Innovation team.

Robert Li, VP of venture investing at Citi Ventures, left for Parafin, a fintech that offers cash advances to small businesses.

Meena Tumuluri left JPMC to be head of strategy and quantum finance director for quantum computing company IonQ. She’ll reunite with Marco Pistoia, whom she worked under at the bank before Pistoia departed earlier this year (see: “JPMC’s quantum shake-up,” The Brief, July 24).

Yann LeCun, chief scientist at Meta, is leaving the company to found a startup focused on world models, systems that use visual and spatial data instead of text.

IN THE NEWS

BANK WHIPAROUND

MUFG and OpenAI struck a deal Wednesday to bring the AI lab’s tech into more of the Japanese bank’s customer-facing tools. That includes letting customers use ChatGPT to better understand their finances and investments. At the same time, the bank announced plans to hire more than 350 AI specialists by March 2027.

Citizens Financial put a number on AI ROI – sort of. Speaking about a new simplification program, president Brendan Coughlin said the run rate would be $400 million over three years. It wasn’t solely tied to AI; some would come from supplier renegotiation and real estate, but a big portion would come from how the bank can “deploy generative AI, agentic AI in a number of different spots,” Coughlin said.

Intesa Sanpaolo’s GT GenAI now gets 30,000 to 35,000 interactions per week, the bank’s head of IT architecture Claudio Balbo shared on LinkedIn last week. The bank also rolled out its first agentic tool for “streamlining the creation of analysis documents,” he wrote.

Lloyds is touting a new agentic personal finance tool coming to its mobile banking app early next year. When it launches to the public, users will be able to chat with the tool to get budgeting help or investment advice and authorize it to act on their behalf.

Heads are going to roll at Australia’s NAB, CEO Andrew Irvine warned, and AI is to blame. “I do think over time as the technology matures that there will be some job impacts,” he said last week. Evident data shows AI is having a different effect: The banks leaning in are the ones that have grown their headcount the most (see: "Not guilty," The Brief, Nov. 6).

WHAT'S ON

Mon 17 - Tues 18 Nov
Momentum AI Finance 2025, New York

Sun 30 Nov - Sun 7 Dec
NeurIPS, Mexico City & San Diego

Mon 19 - Fri 23 Jan
WEF, Davos, Switzerland
