Chinese artificial intelligence startup DeepSeek released two powerful new AI models on Sunday that the company claims match or exceed the capabilities of OpenAI's GPT-5 and Google's Gemini-3.0-Pro — a development that could reshape the competitive landscape between American tech giants and their Chinese challengers.
The Hangzhou-based company launched DeepSeek-V3.2, designed as an everyday reasoning assistant, alongside DeepSeek-V3.2-Speciale, a high-powered variant that achieved gold-medal performance in four elite competitions: the 2025 International Mathematical Olympiad, the International Olympiad in Informatics, the ICPC World Finals, and the China Mathematical Olympiad.
The release carries profound implications for American technology leadership. DeepSeek has once again demonstrated that it can produce frontier AI systems despite U.S. export controls that restrict China's access to advanced Nvidia chips — and it has done so while making its models freely available under an open-source MIT license.
"People thought DeepSeek gave a one-time breakthrough but we came back much bigger," wrote Chen Fang, who identified himself as a contributor to the project, on X (formerly Twitter). The release drew swift reactions online, with one user declaring: "Rest in peace, ChatGPT."
At the heart of the new release lies DeepSeek Sparse Attention, or DSA — an architectural innovation that dramatically reduces the computational burden of running AI models on long documents and complex tasks.
Traditional AI attention mechanisms, the core technology allowing language models to understand context, scale poorly as input length increases. Processing a document twice as long typically requires four times the computation. DeepSeek's approach breaks this constraint using what the company calls a "lightning indexer" that identifies only the most relevant portions of context for each query, ignoring the rest.
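To make the idea concrete, here is a minimal, self-contained sketch of top-k sparse attention driven by a cheap relevance scorer. It illustrates the general principle of indexing the context and then attending to only a handful of positions; it is not DeepSeek's DSA implementation, and the function names and dimensions are purely illustrative.

```python
import numpy as np

def indexer_scores(query, keys):
    # Stand-in for a lightweight "indexer": a cheap pass that ranks every
    # context position by relevance to the current query.
    return keys @ query

def sparse_attention(query, keys, values, k=4):
    # Attend only to the k highest-scoring positions, so per-query cost grows
    # with k rather than with the full sequence length.
    top = np.argsort(indexer_scores(query, keys))[-k:]
    logits = keys[top] @ query / np.sqrt(keys.shape[1])
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ values[top]

# Toy usage: a 1,000-token context in which only 4 positions are attended to.
rng = np.random.default_rng(0)
q = rng.standard_normal(64)
K = rng.standard_normal((1000, 64))
V = rng.standard_normal((1000, 64))
out = sparse_attention(q, K, V, k=4)
```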
According to DeepSeek's technical report, DSA reduces inference costs by roughly half compared to previous models when processing long sequences. The architecture "substantially reduces computational complexity while preserving model performance," the report states.
Processing 128,000 tokens — roughly equivalent to a 300-page book — now costs approximately $0.70 per million tokens for decoding, compared to $2.40 for the previous V3.1-Terminus model. That represents a reduction of roughly 70% in inference costs.
The 685-billion-parameter models support context windows of 128,000 tokens, making them suitable for analyzing lengthy documents, codebases, and research papers. DeepSeek's technical report notes that independent evaluations on long-context benchmarks show V3.2 performing on par with or better than its predecessor "despite incorporating a sparse attention mechanism."
DeepSeek's claims of parity with America's leading AI systems rest on extensive testing across mathematics, coding, and reasoning tasks — and the numbers are striking.
On AIME 2025, a prestigious American mathematics competition, DeepSeek-V3.2-Speciale achieved a 96.0% pass rate, compared to 94.6% for GPT-5-High and 95.0% for Gemini-3.0-Pro. On the Harvard-MIT Mathematics Tournament, the Speciale variant scored 99.2%, surpassing Gemini's 97.5%.
The standard V3.2 model, optimized for everyday use, scored 93.1% on AIME and 92.5% on HMMT — marginally below frontier models but achieved with substantially fewer computational resources.
Most striking are the competition results. DeepSeek-V3.2-Speciale scored 35 out of 42 points on the 2025 International Mathematical Olympiad, earning gold-medal status. At the International Olympiad in Informatics, it scored 492 out of 600 points — also gold, ranking 10th overall. The model solved 10 of 12 problems at the ICPC World Finals, placing second.
These results came without internet access or tools during testing. DeepSeek's report states that "testing strictly adheres to the contest's time and attempt limits."
On coding benchmarks, DeepSeek-V3.2 resolved 73.1% of real-world software bugs on SWE-bench Verified, competitive with GPT-5-High at 74.9%. On Terminal Bench 2.0, which measures complex coding workflows, DeepSeek scored 46.4% — well above GPT-5-High's 35.2%.
The company acknowledges limitations. "Token efficiency remains a challenge," the technical report states, noting that DeepSeek "typically requires longer generation trajectories" to match Gemini-3.0-Pro's output quality.
Beyond raw reasoning, DeepSeek-V3.2 introduces "thinking in tool-use" — the ability to reason through problems while simultaneously executing code, searching the web, and manipulating files.
Previous AI models faced a frustrating limitation: each time they called an external tool, they lost their train of thought and had to restart reasoning from scratch. DeepSeek's architecture preserves the reasoning trace across multiple tool calls, enabling fluid multi-step problem solving.
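A minimal sketch of that pattern, assuming a hypothetical model stub and a single toy tool, is shown below: the point is simply that the running transcript, including the model's intermediate reasoning, persists across tool calls instead of being discarded. Nothing here reflects DeepSeek's actual interfaces.

```python
TOOLS = {"add": lambda a, b: a + b}   # one toy tool

def fake_model(transcript):
    # Stand-in for a real model: request a tool call once, then answer.
    if not any(step.get("role") == "tool" for step in transcript):
        return {"thought": "I should add the two numbers.", "tool": "add", "args": (2, 3)}
    return {"thought": "The tool returned 5, so the answer is 5.", "answer": 5}

def run_agent(task):
    transcript = [{"role": "user", "content": task}]
    while True:
        step = fake_model(transcript)
        # The reasoning trace stays in the transcript across tool invocations.
        transcript.append({"role": "assistant", "thought": step["thought"]})
        if "answer" in step:
            return step["answer"], transcript
        result = TOOLS[step["tool"]](*step["args"])
        transcript.append({"role": "tool", "content": result})

answer, trace = run_agent("What is 2 + 3?")
```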
To train this capability, the company built a massive synthetic data pipeline generating over 1,800 distinct task environments and 85,000 complex instructions. These included challenges like multi-day trip planning with budget constraints, software bug fixes across eight programming languages, and web-based research requiring dozens of searches.
The technical report describes one example: planning a three-day trip from Hangzhou with constraints on hotel prices, restaurant ratings, and attraction costs that vary based on accommodation choices. Such tasks are "hard to solve but easy to verify," making them ideal for training AI agents.
DeepSeek employed real-world tools during training — actual web search APIs, coding environments, and Jupyter notebooks — while generating synthetic prompts to ensure diversity. The result is a model that generalizes to unseen tools and environments, a critical capability for real-world deployment.
Unlike OpenAI and Anthropic, which guard their most powerful models as proprietary assets, DeepSeek has released both V3.2 and V3.2-Speciale under the MIT license — one of the most permissive open-source frameworks available.
Any developer, researcher, or company can download, modify, and deploy the 685-billion-parameter models without restriction. Full model weights, training code, and documentation are available on Hugging Face, the leading platform for AI model sharing.
The strategic implications are significant. By making frontier-capable models freely available, DeepSeek undermines competitors charging premium API prices. The Hugging Face model card notes that DeepSeek has provided Python scripts and test cases "demonstrating how to encode messages in OpenAI-compatible format" — making migration from competing services straightforward.
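In practice, "OpenAI-compatible" means a client can talk to the model using the same chat-completions request shape as OpenAI's API. The snippet below is an illustrative sketch only; the endpoint URL, model name, and API key are placeholders, not values published by DeepSeek.

```python
import requests

resp = requests.post(
    "https://example-provider.com/v1/chat/completions",  # placeholder OpenAI-compatible endpoint
    headers={"Authorization": "Bearer YOUR_API_KEY"},     # placeholder credential
    json={
        "model": "deepseek-v3.2",                         # placeholder model identifier
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Summarize the attached 300-page report."},
        ],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```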
For enterprise customers, the value proposition is compelling: frontier performance at dramatically lower cost, with deployment flexibility. But data residency concerns and regulatory uncertainty may limit adoption in sensitive applications — particularly given DeepSeek's Chinese origins.
DeepSeek's global expansion faces mounting resistance. In June, Berlin's data protection commissioner Meike Kamp declared that DeepSeek's transfer of German user data to China is "unlawful" under EU rules, asking Apple and Google to consider blocking the app.
The German authority expressed concern that "Chinese authorities have extensive access rights to personal data within the sphere of influence of Chinese companies." Italy ordered DeepSeek to block its app in February. U.S. lawmakers have moved to ban the service from government devices, citing national security concerns.
Questions also persist about U.S. export controls designed to limit China's AI capabilities. In August, DeepSeek hinted that China would soon have "next generation" domestically built chips to support its models. The company indicated its systems work with Chinese-made chips from Huawei and Cambricon without additional setup.
DeepSeek's original V3 model was reportedly trained on roughly 2,000 older Nvidia H800 chips — hardware since restricted for China export. The company has not disclosed what powered V3.2 training, but its continued advancement suggests export controls alone cannot halt Chinese AI progress.
The release arrives at a pivotal moment. After years of massive investment, some analysts question whether an AI bubble is forming. DeepSeek's ability to match American frontier models at a fraction of the cost challenges assumptions that AI leadership requires enormous capital expenditure.
The company's technical report reveals that post-training investment now exceeds 10% of pre-training costs — a substantial allocation credited for reasoning improvements. But DeepSeek acknowledges gaps: "The breadth of world knowledge in DeepSeek-V3.2 still lags behind leading proprietary models," the report states. The company plans to address this by scaling pre-training compute.
DeepSeek-V3.2-Speciale remains available through a temporary API until December 15, when its capabilities will merge into the standard release. The Speciale variant is designed exclusively for deep reasoning and does not support tool calling — a limitation the standard model addresses.
For now, the AI race between the United States and China has entered a new phase. DeepSeek's release demonstrates that open-source models can achieve frontier performance, that efficiency innovations can slash costs dramatically, and that the most powerful AI systems may soon be freely available to anyone with an internet connection.
As one commenter on X observed: "Deepseek just casually breaking those historic benchmarks set by Gemini is bonkers."
The question is no longer whether Chinese AI can compete with Silicon Valley. It's whether American companies can maintain their lead when their Chinese rival gives comparable technology away for free.
When Liquid AI, a startup founded by MIT computer scientists back in 2023, introduced its Liquid Foundation Models series 2 (LFM2) in July 2025, the pitch was straightforward: deliver the fastest on-device foundation models on the market using the new "liquid" architecture, with training and inference efficiency that made small models a serious alternative to cloud-only large language models (LLMs) such as OpenAI's GPT series and Google's Gemini.
The initial release shipped dense checkpoints at 350M, 700M, and 1.2B parameters, a hybrid architecture heavily weighted toward gated short convolutions, and benchmark numbers that placed LFM2 ahead of similarly sized competitors like Qwen3, Llama 3.2, and Gemma 3 on both quality and CPU throughput. The message to enterprises was clear: real-time, privacy-preserving AI on phones, laptops, and vehicles no longer required sacrificing capability for latency.
In the months since that launch, Liquid has expanded LFM2 into a broader product line — adding task-and-domain-specialized variants, a small video ingestion and analysis model, and an edge-focused deployment stack called LEAP — and positioned the models as the control layer for on-device and on-prem agentic systems.
Now, with the publication of the detailed, 51-page LFM2 technical report on arXiv, the company is going a step further: making public the architecture search process, training data mixture, distillation objective, curriculum strategy, and post-training pipeline behind those models.
And unlike earlier open models, LFM2 is built around a repeatable recipe: a hardware-in-the-loop search process, a training curriculum that compensates for smaller parameter budgets, and a post-training pipeline tuned for instruction following and tool use.
Rather than just offering weights and an API, Liquid is effectively publishing a detailed blueprint that other organizations can use as a reference for training their own small, efficient models from scratch, tuned to their own hardware and deployment constraints.
The technical report begins with a premise enterprises are intimately familiar with: real AI systems hit limits long before benchmarks do. Latency budgets, peak memory ceilings, and thermal throttling define what can actually run in production—especially on laptops, tablets, commodity servers, and mobile devices.
To address this, Liquid AI performed architecture search directly on target hardware, including Snapdragon mobile SoCs and Ryzen laptop CPUs. The result is a consistent outcome across sizes: a minimal hybrid architecture dominated by gated short convolution blocks and a small number of grouped-query attention (GQA) layers. This design was repeatedly selected over more exotic linear-attention and SSM hybrids because it delivered a better quality-latency-memory Pareto profile under real device conditions.
This matters for enterprise teams in three ways:
Predictability. The architecture is simple, parameter-efficient, and stable across model sizes from 350M to 2.6B.
Operational portability. Dense and MoE variants share the same structural backbone, simplifying deployment across mixed hardware fleets.
On-device feasibility. Prefill and decode throughput on CPUs is roughly 2× that of comparable open models in many cases, reducing the need to offload routine tasks to cloud inference endpoints.
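To make the architecture less abstract, here is a generic sketch of a gated short-convolution block of the kind the report describes, with the grouped-query attention layers omitted. It is illustrative only, with invented weight shapes; the actual LFM2 block differs in details the article does not cover.

```python
import numpy as np

def gated_short_conv_block(x, conv_w, w_in, w_gate, w_out):
    # x: (seq_len, d_model) activations; conv_w: (kernel_size, d_model) causal
    # depthwise filter. Each position mixes only the last kernel_size tokens,
    # and a learned sigmoid gate modulates the convolved features.
    a = x @ w_in
    gate = 1.0 / (1.0 + np.exp(-(x @ w_gate)))
    k = conv_w.shape[0]
    padded = np.vstack([np.zeros((k - 1, a.shape[1])), a])
    conv = sum(conv_w[i] * padded[i : i + len(x)] for i in range(k))
    return (gate * conv) @ w_out

d, k, L = 8, 3, 16
rng = np.random.default_rng(0)
w = lambda: rng.standard_normal((d, d)) / np.sqrt(d)
y = gated_short_conv_block(rng.standard_normal((L, d)),
                           rng.standard_normal((k, d)) / k, w(), w(), w())
```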
Instead of optimizing for academic novelty, the report reads as a systematic attempt to design models enterprises can actually ship.
That focus makes LFM2 notably more practical for enterprises in a field where many open models quietly assume access to multi-H100 clusters during inference.
LFM2 adopts a training approach that compensates for the smaller scale of its models with structure rather than brute force. Key elements include:
10–12T token pre-training and an additional 32K-context mid-training phase, which extends the model’s useful context window without exploding compute costs.
A decoupled Top-K knowledge distillation objective that sidesteps the instability of standard KL distillation when teachers provide only partial logits (a generic sketch follows this list).
A three-stage post-training sequence—SFT, length-normalized preference alignment, and model merging—designed to produce more reliable instruction following and tool-use behavior.
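For readers who want to see what a top-k distillation term looks like mechanically, the sketch below renormalizes the teacher's exposed logits and computes a cross-entropy against the student on those same tokens. It is a generic illustration under those assumptions, not the decoupled objective defined in the LFM2 report.

```python
import numpy as np

def topk_distillation_loss(student_logits, teacher_topk_ids, teacher_topk_logits):
    # Teacher distribution over only the tokens it exposes (renormalized top-k).
    t = np.exp(teacher_topk_logits - teacher_topk_logits.max())
    t /= t.sum()
    # Student distribution over the full vocabulary, restricted to those tokens.
    s = np.exp(student_logits - student_logits.max())
    s /= s.sum()
    s_topk = s[teacher_topk_ids]
    # Cross-entropy of the teacher's top-k distribution under the student.
    return float(-(t * np.log(s_topk + 1e-12)).sum())

rng = np.random.default_rng(0)
loss = topk_distillation_loss(
    rng.standard_normal(100),          # toy 100-token vocabulary
    np.array([3, 17, 42, 64]),         # token ids the teacher exposed
    rng.standard_normal(4),            # the teacher's corresponding logits
)
```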
For enterprise AI developers, the significance is that LFM2 models behave less like “tiny LLMs” and more like practical agents able to follow structured formats, adhere to JSON schemas, and manage multi-turn chat flows. Many open models at similar sizes fail not due to lack of reasoning ability, but due to brittle adherence to instruction templates. The LFM2 post-training recipe directly targets these rough edges.
In other words: Liquid AI optimized small models for operational reliability, not just scoreboards.
The LFM2-VL and LFM2-Audio variants reflect another shift: multimodality built around token efficiency.
Rather than embedding a massive vision transformer directly into an LLM, LFM2-VL attaches a SigLIP2 encoder through a connector that aggressively reduces visual token count via PixelUnshuffle. High-resolution inputs automatically trigger dynamic tiling, keeping token budgets controllable even on mobile hardware. LFM2-Audio uses a bifurcated audio path—one for embeddings, one for generation—supporting real-time transcription or speech-to-speech on modest CPUs.
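The token-reduction trick is easy to see in isolation: folding each r-by-r patch of visual features into the channel dimension cuts the number of visual tokens by a factor of r squared. The sketch below shows that operation generically; the shapes are illustrative, and the real LFM2-VL connector adds projection layers not shown here.

```python
import numpy as np

def pixel_unshuffle(feat, r=2):
    # Fold r x r spatial patches into channels: (H, W, C) -> (H/r, W/r, r*r*C).
    H, W, C = feat.shape
    feat = feat.reshape(H // r, r, W // r, r, C)
    return feat.transpose(0, 2, 1, 3, 4).reshape(H // r, W // r, r * r * C)

vision_features = np.zeros((24, 24, 768))   # 576 visual tokens from an encoder
reduced = pixel_unshuffle(vision_features)  # (12, 12, 3072): 144 tokens, 4x fewer
```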
For enterprise platform architects, this design points toward a practical future where:
document understanding happens directly on endpoints such as field devices;
audio transcription and speech agents run locally for privacy compliance;
multimodal agents operate within fixed latency envelopes without streaming data off-device.
The through-line is the same: multimodal capability without requiring a GPU farm.
LFM2-ColBERT extends late-interaction retrieval into a footprint small enough for enterprise deployments that need multilingual RAG without the overhead of specialized vector DB accelerators.
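Late interaction itself is a simple scoring rule: every query token finds its best-matching document token, and those per-token maxima are summed. The snippet below shows that MaxSim computation in its generic ColBERT form; it says nothing about LFM2-ColBERT's specific embedding sizes or training.

```python
import numpy as np

def late_interaction_score(query_embs, doc_embs):
    # MaxSim: for each query token embedding, take the best match among the
    # document's token embeddings, then sum those per-token maxima.
    sims = query_embs @ doc_embs.T      # (num_query_tokens, num_doc_tokens)
    return float(sims.max(axis=1).sum())

rng = np.random.default_rng(0)
score = late_interaction_score(
    rng.standard_normal((8, 128)),      # 8 query token embeddings
    rng.standard_normal((200, 128)),    # 200 document token embeddings
)
```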
This is particularly meaningful as organizations begin to orchestrate fleets of agents. Fast local retrieval—running on the same hardware as the reasoning model—reduces latency and provides a governance win: documents never leave the device boundary.
Taken together, the VL, Audio, and ColBERT variants show LFM2 as a modular system, not a single model drop.
Across all variants, the LFM2 report implicitly sketches what tomorrow’s enterprise AI stack will look like: hybrid local-cloud orchestration, where small, fast models operating on devices handle time-critical perception, formatting, tool invocation, and judgment tasks, while larger models in the cloud offer heavyweight reasoning when needed.
Several trends converge here:
Cost control. Running routine inference locally avoids unpredictable cloud billing.
Latency determinism. TTFT and decode stability matter in agent workflows; on-device eliminates network jitter.
Governance and compliance. Local execution simplifies PII handling, data residency, and auditability.
Resilience. Agentic systems degrade gracefully if the cloud path becomes unavailable.
Enterprises adopting these architectures will likely treat small on-device models as the “control plane” of agentic workflows, with large cloud models serving as on-demand accelerators.
LFM2 is one of the clearest open-source foundations for that control layer to date.
For years, organizations building AI features have accepted that “real AI” requires cloud inference. LFM2 challenges that assumption. The models perform competitively across reasoning, instruction following, multilingual tasks, and RAG—while simultaneously achieving substantial latency gains over other open small-model families.
For CIOs and CTOs finalizing 2026 roadmaps, the implication is direct: small, open, on-device models are now strong enough to carry meaningful slices of production workloads.
LFM2 will not replace frontier cloud models for the heaviest reasoning workloads. But it offers something enterprises arguably need more: a reproducible, open, and operationally feasible foundation for agentic systems that must run anywhere, from phones to industrial endpoints to air-gapped secure facilities.
In the broadening landscape of enterprise AI, LFM2 is less a research milestone and more a sign of architectural convergence. The future is not cloud or edge—it’s both, operating in concert. And releases like LFM2 provide the building blocks for organizations prepared to build that hybrid future intentionally rather than accidentally.
A stealth artificial intelligence startup founded by an MIT researcher emerged this morning with an ambitious claim: its new AI model can control computers better than systems built by OpenAI and Anthropic — at a fraction of the cost.
OpenAGI, led by chief executive Zengyi Qin, released Lux, a foundation model designed to operate computers autonomously by interpreting screenshots and executing actions across desktop applications. The San Francisco-based company says Lux achieves an 83.6 percent success rate on Online-Mind2Web, a benchmark that has become the industry's most rigorous test for evaluating AI agents that control computers.
That score is a significant leap over the leading models from well-funded competitors. OpenAI's Operator, released in January, scores 61.3 percent on the same benchmark. Anthropic's Claude Computer Use achieves 56.3 percent.
"Traditional LLM training feeds a large amount of text corpus into the model. The model learns to produce text," Qin said in an exclusive interview with VentureBeat. "By contrast, our model learns to produce actions. The model is trained with a large amount of computer screenshots and action sequences, allowing it to produce actions to control the computer."
The announcement arrives at a pivotal moment for the AI industry. Technology giants and startups alike have poured billions of dollars into developing autonomous agents capable of navigating software, booking travel, filling out forms, and executing complex workflows. OpenAI, Anthropic, Google, and Microsoft have all released or announced agent products in the past year, betting that computer-controlling AI will become as transformative as chatbots.
Yet independent research has cast doubt on whether current agents are as capable as their creators suggest.
The Online-Mind2Web benchmark, developed by researchers at Ohio State University and the University of California, Berkeley, was designed specifically to expose the gap between marketing claims and actual performance.
Published in April and accepted to the Conference on Language Modeling 2025, the benchmark comprises 300 diverse tasks across 136 real websites — everything from booking flights to navigating complex e-commerce checkouts. Unlike earlier benchmarks that cached parts of websites, Online-Mind2Web tests agents in live online environments where pages change dynamically and unexpected obstacles appear.
The results, according to the researchers, painted "a very different picture of the competency of current agents, suggesting over-optimism in previously reported results."
When the Ohio State team tested five leading web agents with careful human evaluation, they found that many recent systems — despite heavy investment and marketing fanfare — did not outperform SeeAct, a relatively simple agent released in January 2024. Even OpenAI's Operator, the best performer among commercial offerings in their study, achieved only 61 percent success.
"It seemed that highly capable and practical agents were maybe indeed just months away," the researchers wrote in a blog post accompanying their paper. "However, we are also well aware that there are still many fundamental gaps in research to fully autonomous agents, and current agents are probably not as competent as the reported benchmark numbers may depict."
The benchmark has gained traction as an industry standard, with a public leaderboard hosted on Hugging Face tracking submissions from research groups and companies.
OpenAGI's claimed performance advantage stems from what the company calls "Agentic Active Pre-training," a training methodology that differs fundamentally from how most large language models learn.
Conventional language models train on vast text corpora, learning to predict the next word in a sequence. The resulting systems excel at generating coherent text but were not designed to take actions in graphical environments.
Lux, according to Qin, takes a different approach. The model trains on computer screenshots paired with action sequences, learning to interpret visual interfaces and determine which clicks, keystrokes, and navigation steps will accomplish a given goal.
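OpenAGI has not published Lux's data format, but a screenshot-plus-action training record might conceptually resemble the hypothetical example below; every field name and value is invented for illustration.

```python
# Hypothetical illustration of a screenshot/action training record; none of
# these field names or values come from OpenAGI.
example_record = {
    "screenshot": "frame_000123.png",   # pixels the model conditions on
    "goal": "Book the cheapest nonstop flight to Denver for next Friday",
    "history": [                        # earlier actions in the episode
        {"type": "type_text", "text": "Denver", "target": "destination field"},
    ],
    "action": {                         # supervised target: the next action to take
        "type": "click",
        "x": 412,
        "y": 988,
        "element_hint": "Search flights button",
    },
}
```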
"The action allows the model to actively explore the computer environment, and such exploration generates new knowledge, which is then fed back to the model for training," Qin told VentureBeat. "This is a naturally self-evolving process, where a better model produces better exploration, better exploration produces better knowledge, and better knowledge leads to a better model."
This self-reinforcing training loop, if it functions as described, could help explain how a smaller team might achieve results that elude larger organizations. Rather than requiring ever-larger static datasets, the approach would allow the model to continuously improve by generating its own training data through exploration.
OpenAGI also claims significant cost advantages. The company says Lux operates at roughly one-tenth the cost of frontier models from OpenAI and Anthropic while executing tasks faster.
A critical distinction in OpenAGI's announcement: Lux can control applications across an entire desktop operating system, not just web browsers.
Most commercially available computer-use agents, including early versions of Anthropic's Claude Computer Use, focus primarily on browser-based tasks. That limitation excludes vast categories of productivity work that occur in desktop applications — spreadsheets in Microsoft Excel, communications in Slack, design work in Adobe products, code editing in development environments.
OpenAGI says Lux can navigate these native applications, a capability that would substantially expand the addressable market for computer-use agents. The company is releasing a developer software development kit alongside the model, allowing third parties to build applications on top of Lux.
The company is also working with Intel to optimize Lux for edge devices, which would allow the model to run locally on laptops and workstations rather than requiring cloud infrastructure. That partnership could address enterprise concerns about sending sensitive screen data to external servers.
"We are partnering with Intel to optimize our model on edge devices, which will make it the best on-device computer-use model," Qin said.
The company confirmed it is in exploratory discussions with AMD and Microsoft about additional partnerships.
Computer-use agents present novel safety challenges that do not arise with conventional chatbots. An AI system capable of clicking buttons, entering text, and navigating applications could, if misdirected, cause significant harm — transferring money, deleting files, or exfiltrating sensitive information.
OpenAGI says it has built safety mechanisms directly into Lux. When the model encounters requests that violate its safety policies, it refuses to proceed and alerts the user.
In an example provided by the company, when a user asked the model to "copy my bank details and paste it into a new Google doc," Lux responded with an internal reasoning step: "The user asks me to copy the bank details, which are sensitive information. Based on the safety policy, I am not able to perform this action." The model then issued a warning to the user rather than executing the potentially dangerous request.
Such safeguards will face intense scrutiny as computer-use agents proliferate. Security researchers have already demonstrated prompt injection attacks against early agent systems, where malicious instructions embedded in websites or documents can hijack an agent's behavior. Whether Lux's safety mechanisms can withstand adversarial attacks remains to be tested by independent researchers.
Qin brings an unusual combination of academic credentials and entrepreneurial experience to OpenAGI.
He completed his doctorate at the Massachusetts Institute of Technology in 2025, where his research focused on computer vision, robotics, and machine learning. His academic work appeared in top venues including the Conference on Computer Vision and Pattern Recognition, the International Conference on Learning Representations, and the International Conference on Machine Learning.
Before founding OpenAGI, Qin built several widely adopted AI systems. JetMoE, a large language model he led development on, demonstrated that a high-performing model could be trained from scratch for less than $100,000 — a fraction of the tens of millions typically required. The model outperformed Meta's LLaMA2-7B on standard benchmarks, according to a technical report that attracted attention from MIT's Computer Science and Artificial Intelligence Laboratory.
His previous open-source projects achieved remarkable adoption. OpenVoice, a voice cloning model, accumulated approximately 35,000 stars on GitHub and ranked in the top 0.03 percent of open-source projects by popularity. MeloTTS, a text-to-speech system, has been downloaded more than 19 million times, making it one of the most widely used audio AI models since its 2024 release.
Qin also co-founded MyShell, an AI agent platform that has attracted six million users who have collectively built more than 200,000 AI agents. Users have had more than one billion interactions with agents on the platform, according to the company.
The computer-use agent market has attracted intense interest from investors and technology giants over the past year.
OpenAI released Operator in January, allowing users to instruct an AI to complete tasks across the web. Anthropic has continued developing Claude Computer Use, positioning it as a core capability of its Claude model family. Google has incorporated agent features into its Gemini products. Microsoft has integrated agent capabilities across its Copilot offerings and Windows.
Yet the market remains nascent. Enterprise adoption has been limited by concerns about reliability, security, and the ability to handle edge cases that occur frequently in real-world workflows. The performance gaps revealed by benchmarks like Online-Mind2Web suggest that current systems may not be ready for mission-critical applications.
OpenAGI enters this competitive landscape as an independent alternative, positioning superior benchmark performance and lower costs against the massive resources of its well-funded rivals. The company's Lux model and developer SDK are available beginning today.
Whether OpenAGI can translate benchmark dominance into real-world reliability remains the central question. The AI industry has a long history of impressive demos that falter in production, of laboratory results that crumble against the chaos of actual use. Benchmarks measure what they measure, and the distance between a controlled test and an 8-hour workday full of edge cases, exceptions, and surprises can be vast.
But if Lux performs in the wild the way it performs in the lab, the implications extend far beyond one startup's success. It would suggest that the path to capable AI agents runs not through the largest checkbooks but through the cleverest architectures—that a small team with the right ideas can outmaneuver the giants.
The technology industry has seen that story before. It rarely stays true for long.
As AI, cloud, and other technology investments soar, organizations have to make investment decisions with increased speed and clarity. Practices like FinOps, IT financial management (ITFM), and strategic portfolio management (SPM) help stakeholders evaluate opportunities and trade-offs for maximum value. But they depend on unified, reliable data. And that’s often where the challenge begins.
AI can surface insights from data within specific domains, but important decisions rarely rely on a single source of data. To account for operational and organizational factors as well as financial impact, finance and IT teams have to cut through disconnected systems, outdated data, and inconsistent definitions of value. Real control over technology spend comes from financial intelligence — turning fragmented inputs into actionable, context-rich insights.
Apptio technology business management (TBM) solutions deliver that intelligence to technology and finance leaders. By connecting financial, operational, and business data across the enterprise, they give leaders the clarity to make every tech dollar count.
When different stakeholders rely on different sources of truth, they don’t share the same perspective on the finance and technology landscape. The CFO sees the cost structures in the ERP system. The CIO sees systems configuration and performance metrics in ITSM and monitoring tools. The business looks at outcomes in CRM and analytics platforms. But no single domain has the holistic understanding needed to balance organizational, operational, and financial priorities.
Organizations must also evaluate competing priorities across applications, infrastructure, cloud services, DevOps tools, and workforce investments. Informed trade-offs — such as carving out budget for AI investments without undermining existing capabilities — require visibility into usage patterns, system redundancies, and relative value across all these domains. Without visibility, FinOps, ITFM, and SPM practices can’t fulfill their potential for IT and cloud cost optimization.
Instead, siloed data sources force finance teams to spend hours gathering reports from different systems of record and trying to reconcile inconsistent data formats. This practice is not only time- and labor-intensive; it also exposes the organization to the risk of flawed forecasts, missed optimization opportunities, and wasted technology spend — potentially costing millions annually.
This critical gap reveals why generic BI platforms and DIY tools only go so far. They can’t connect costs back to their sources at a detailed level, making it hard to trace allocations across systems, identify redundancies, or even answer the simplest question: What’s driving our costs?
Financial intelligence translates domain-specific financial, operational, and business metrics into a shared language of value on which leaders can act. By aggregating, normalizing, and enriching data from ERP systems, cloud platforms, IT service management tools, HR systems, and more, the Financial Intelligence Layer in Apptio supports three critical ITFM, FinOps, and SPM capabilities:
Context. Aligning financial, operational, and outcome inputs so that:
Cloud spend connects to business impact
Infrastructure costs tie to application performance
Workforce investments link to service delivery
Insights. Connecting cost, usage, performance, and value across the enterprise. For example, mapping AI model usage to ROI can reveal which initiatives do and do not deserve continued investment.
Action. Empowering leaders to make informed, coordinated decisions rather than operating in silos.
Hyperscalers surface cloud cost optimization insights on their own platforms. Single-function tech platforms like ERP, HR, CRM, and ITSM provide valuable metrics for their specific domains. Apptio TBM solutions go further, delivering the financial context and actionable insights needed to manage technology spend across all areas: on-premises, multi-cloud, applications, and workforce.
Raw numbers don’t tell a story. What matters is structuring data so that it aligns with business goals and enables decision-makers to see patterns, weigh options, and chart the best path forward. Apptio has trained its AI specifically on FinOps, ITFM, and SPM to understand the questions these teams actually need to answer, so TBM teams can work faster and smarter.
Apptio TBM solutions ease the cognitive load by automating time-consuming ingestion, mapping, anomaly detection, and enrichment — so people can focus on strategic decisions. Clean, enriched inputs feed forecasting models that anticipate cost trends and surface optimization opportunities. And because Apptio offers ready-to-use cost modeling frameworks and governance, organizations can start realizing value far faster than they can using DIY or open-source tools.
Financial intelligence starts with clean, contextualized data — but how that data is organized and used is equally critical for optimizing technology spend. TBM principles like cost and consumption allocation, process optimization, and unit economics will help teams translate data into meaningful insights and smarter decisions.
Solutions purpose-built for technology spend management are essential. Spreadsheets don’t scale, and domain expertise matters. Apptio TBM solutions deliver enterprise-grade governance, financial context across all tech domains, and AI trained specifically for ITFM, FinOps, and SPM. These are capabilities that hyperscalers, with their single-cloud focus, and generic BI tools simply can’t provide at scale.
In an era when rapid innovation places a premium on technology spend management, financial intelligence is vital for maximizing budgets. By optimizing the inputs that fuel AI-driven financial workflows, leaders can equip every stakeholder with the confidence and intelligence to steer technology investments with data-driven precision.
Learn more here about how the Financial Intelligence Layer in Apptio transforms how enterprises decide, fund, and execute their TBM strategies in the AI era.
Ajay Patel is General Manager at Apptio, an IBM Company.
Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.
With necessary infrastructure now being developed for agentic commerce, enterprises must determine how to participate in this new form of buying and selling. But it remains a fragmented Wild West, with competing payment protocols, and it's unclear what enterprises need to do to prepare.
More cloud providers and AI model companies are beginning to provide the tools enterprises need to build agentic commerce-enabled systems.
AWS, which will list Visa’s Intelligent Commerce platform on the AWS Marketplace, says it's making it easier for enterprises to connect to tools that enable agentic payments and accelerate agentic commerce adoption.
While this doesn’t mean Amazon has formally implemented Visa’s Trusted Agent Protocol (TAP), which would bring the world’s largest e-commerce platform into the agentic shopping space, it does show how quickly agentic commerce is becoming an enterprise focus.
Scott Mullins, AWS managing director of worldwide financial services, told VentureBeat in an email that listing the platform “makes payment capabilities accessible” in a secure manner that quickly integrates with Visa’s system.
“We’re giving developers pre-built frameworks and standardized infrastructure to eliminate major development barriers,” Mullins said.
He noted that AWS is listing Visa’s platform to streamline integration with services like Bedrock and AgentCore.
In addition, the two companies will publish blueprints to the public Bedrock AgentCore repository. Mullins said this will “significantly reduce development time and complexity that anyone can use to create travel booking agents, retail shopping agents and B2B payment reconciliation agents.”
The Visa Intelligent Commerce platform will be MCP-compatible, allowing enterprises to connect agents running on it to other agents.
Through the Visa Intelligent Commerce platform, AWS customers can access authentication, agentic tokenization and data personalization tools. This allows organizations to register and connect their agents to Visa’s payment infrastructure.
The platform helps mask credit card details through tokenized digital credentials and lets companies set guidelines for agent transactions, such as spending limits.
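How such guidelines might be enforced on the enterprise side can be pictured with a small, purely hypothetical guardrail like the one below; this is not Visa's API, and the class and field names are invented.

```python
from dataclasses import dataclass

@dataclass
class AgentSpendPolicy:
    # Hypothetical per-agent spending guardrail; not Visa's API.
    per_transaction_limit: float
    daily_limit: float
    spent_today: float = 0.0

    def authorize(self, amount: float) -> bool:
        if amount > self.per_transaction_limit:
            return False
        if self.spent_today + amount > self.daily_limit:
            return False
        self.spent_today += amount
        return True

policy = AgentSpendPolicy(per_transaction_limit=500.0, daily_limit=2000.0)
print(policy.authorize(320.0))    # True: within both limits
print(policy.authorize(1900.0))   # False: would exceed the daily cap
```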
Rubali Birwadker, SVP and global head of growth at Visa, said in a press release that bringing the platform to AWS lets it scale, “helping to unlock faster innovation for developers and better experiences for consumers and businesses worldwide.”
Mullins said Visa and AWS are helping to provide the foundational infrastructure for developers and businesses to pursue agentic commerce projects; however, for this to work, developers must coordinate several agents and understand the different needs across industries.
“Real-world commerce often requires multiple agents working together,” Mullins said. “The travel booking agent blueprint, for instance, connects flight, hotel, car rental and train providers to deliver complete travel journeys with integrated payments. Developers need to design coordination patterns for these complex, multi-agent workflows.”
Different use cases also have different needs, so enterprises need to plan carefully around existing infrastructure.
This is where the MCP connection is vital, as it will enable communication between an organization’s agents and Visa’s platform while maintaining identity and security.
Mullins said that the biggest stumbling block for many enterprises experimenting with agentic commerce is the fragmentation of commerce systems, which creates integration challenges.
“This collaboration will address these challenges by providing reference architecture blueprints that developers can use as starting points, combined with AWS's cloud infrastructure and Visa's trusted payment network to create a standardized, secure foundation for agentic commerce,” he said.
The reference blueprints provide a framework for enterprise developers, solution architects and software vendors to follow when building new workflows. Mullins said the blueprints are being developed in coordination with Expedia Group, Intuit and the Eurostars Hotel company.
The blueprints will work with the Visa Intelligent Commerce MCP server and APIs and will be managed through Amazon Bedrock AgentCore.
AWS said that its goal is to “enable a foundation for agentic commerce at scale, where transactions are handled by agents capable of real-time reasoning and coordination.”
These blueprints would eventually become composable, reusable workflows for any organization looking to build travel booking agents or retail shopping agents. These don’t have to be consumer-focused agents; there can also be agents, for instance, buying flights for employees.
Agentic commerce, in which agents handle product search, cart building and payments, is fast becoming the next frontier for AI players.
Companies like OpenAI and Google have released AI-powered shopping tools to make it easier to surface products and allow agents to find them. Browsers like OpenAI’s Atlas and Comet from Perplexity also play a role in connecting agents to websites. Further, retailers like Walmart and Target have integrated with ChatGPT, so users can ask the chatbot to search for items through chat.
One of the biggest problems facing the adoption of agentic commerce revolves around enabling safe, secure transactions. OpenAI and Stripe launched the Agentic Commerce Protocol (ACP) in September, following Google’s announcement of the Agent Payments Protocol (AP2) in collaboration with American Express, Mastercard, PayPal, Salesforce and ServiceNow. Visa followed soon after with TAP, which connects to the Visa Intelligent Commerce platform.
“The foundation is now in place through this collaboration, but successful agentic commerce requires thoughtful design that considers the specific needs of industry, users and existing systems while leveraging the standardized infrastructure and blueprints now available,” Mullins said.