Co-building the open-source ML serving platform that Bloomberg's AI infrastructure runs on

Bloomberg needed production-grade ML model serving on Kubernetes: governed, extensible, independent of any vendor's roadmap. Ideas2IT co-built KServe into a CNCF-graduated platform now handling inference across Bloomberg's financial AI products, cutting test infrastructure costs by 68% and network overhead by 85%.

Client

Bloomberg

Industry

Financial Technology

Engagement

AI/ML Platform Engineering

Team

Embedded Engineering Pod

Platform

KServe · CNCF Graduated

01 Challenge

Bloomberg ran ML inference on KFServing, governed by Google. In production, it worked. The problem: Bloomberg needed serverless inference, LLM serving, and a standardised protocol across frameworks. None were on KFServing's roadmap. None could be built without owning the platform's governance.

02 Solution

Ideas2IT embedded an engineering pod inside the KServe project and co-built the platform Bloomberg needed. The codebase was decoupled from its Google-governed origins under independent governance, the Open Inference Protocol standardised the serving interface across model frameworks, gRPC replaced REST on the wire, and KEDA-based autoscaling plus Hugging Face integration added serverless inference and LLM serving. All of it was delivered as open-source contributions Bloomberg could run in production.

03 Outcome

Restructuring CI/CD onto GitHub Actions cut test infrastructure cost by 68% and image size by 47%. gRPC adoption reduced network overhead by 85%. Test coverage reached 84% from a 50% baseline. KServe graduated from LFAI incubation to CNCF, with serverless scale-to-zero in production.

Phase 01

Platform Independence and Core Architecture

Platform independence and the Open Inference Protocol: separating KServe from its origins

The first decision: establish KServe as an independent project, separate from its Google-governed origins.

That required:
  1. Decoupling the codebase.
  2. Restructuring the contribution model so Bloomberg had input into the roadmap.
  3. Managing a governance change without losing the contributors who built it.

The Open Inference Protocol was implemented in the PyTorch serving runtime, creating a standardised interface across model frameworks. Native PyTorch/TorchServe integration followed. gRPC replaced REST, reducing network overhead by 85%. The CI/CD pipeline moved from paid AWS compute to GitHub Actions with parallelised jobs: test cost dropped 68%, coverage extended from 50% to 84%, and Docker image size fell 47%. KServe entered LFAI incubation.
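To make the standardisation concrete, here is a minimal sketch of what an Open Inference Protocol (V2) request body looks like. The tensor name, values, and model name are illustrative, not from Bloomberg's deployment; the key point is that every input carries an explicit name, shape, and datatype, which is what lets one client speak to any framework behind KServe.

```python
import json

def build_v2_infer_request(input_name, data, datatype="FP32"):
    """Build an Open Inference Protocol (V2) inference request body.

    The explicit name/shape/datatype envelope is the same regardless of
    whether a PyTorch, scikit-learn, or XGBoost runtime sits behind it.
    """
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": [1, len(data)],
                "datatype": datatype,
                "data": data,
            }
        ]
    }

# Illustrative payload; a real client would POST this JSON to the
# protocol's inference endpoint, /v2/models/<model-name>/infer.
payload = build_v2_infer_request("input-0", [0.1, 0.2, 0.3])
print(json.dumps(payload))
```

The same envelope also maps onto the protocol's gRPC definition, which is where the serialisation savings over REST come from.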

PHASE 01 DELIVERABLES
  • KServe independent GitHub org + governance
  • Open Inference Protocol (OIP) in PyTorch
  • Native PyTorch / TorchServe integration
  • gRPC communication layer
  • 84% test coverage (Pytest + Ginkgo)
  • KServe LFAI incubation

Phase 02

Serverless Inference and LLM Readiness

Serverless inference and LLM readiness: KEDA integration, Hugging Face, and CNCF graduation

The platform could not answer one question: how do you run inference cost-efficiently when demand is uneven, without provisioning for idle workloads? KEDA integration solved it.

KEDA enabled KServe to scale deployments dynamically, including to zero during idle periods. The same team drafted the community-wide proposal, drove adoption, and built the production implementation Bloomberg deployed.
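As a sketch of the pattern (the resource names, Prometheus query, and thresholds here are illustrative assumptions, not Bloomberg's configuration), a KEDA ScaledObject drives the deployment backing an inference service between zero and N replicas off observed request volume:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sklearn-iris-scaler          # illustrative name
spec:
  scaleTargetRef:
    name: sklearn-iris-predictor     # deployment backing the InferenceService
  minReplicaCount: 0                 # scale-to-zero during idle periods
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090
        query: sum(rate(http_requests_total{service="sklearn-iris"}[1m]))
        threshold: "5"               # replicas added per 5 req/s of load
```

With `minReplicaCount: 0`, no pods run when the query reports no traffic; KEDA re-provisions replicas as requests arrive, which is the floor-elimination described above.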

LLM serving was added through Hugging Face integration, enabling Bloomberg to serve transformer models across financial products. The Gateway API and Envoy AI Gateway handled GenAI workload orchestration. Queue-Proxy managed queued request processing under load. All runtimes migrated to Poetry. KServe graduated from LFAI incubation to CNCF.

PHASE 02 DELIVERABLES
  • KEDA scale-to-zero autoscaling
  • Hugging Face / LLM serving integration
  • Gateway API + Envoy AI Gateway
  • Queue-Proxy for request queuing
  • Poetry package management migration
  • OpenAI API specification support
  • KServe CNCF graduation
  • Documentation revamp (Mike + Netlify)

The Ideas2IT team operated the way you'd want a core platform team to operate. They understood the community obligations and the production requirements, and they didn't treat those as separate problems.

The Outcome

One specialist team. A production ML serving platform that Bloomberg's AI runs on and that the industry has adopted.

  • 68% · Test infrastructure cost and run time reduced. The CI/CD pipeline was rebuilt from paid AWS compute to GitHub Actions with parallelised jobs. The cost reduction was a direct consequence of the infrastructure change, not an optimisation layered on top of what existed.
  • 85% · Reduction in network overhead. gRPC replaced REST for model serving communication, reducing serialisation overhead and connection cost on every inference request. At Bloomberg's inference volume the reduction compounds into meaningful latency and throughput gains.
  • 84% · Test code coverage achieved. Coverage extended from a 50% baseline through a purpose-built regression and end-to-end test suite using Pytest for the Python serving runtimes and Ginkgo for the Go Kubernetes operator.
  • 47% · Decrease in Docker image size. Layer caching, GitHub Artifacts replacing Docker Hub, and removal of build-time dependencies from runtime layers produced images nearly half the original size. Smaller images reduce pull times, registry costs, and attack surface in production.
  • Scale-to-zero · Serverless inference enabled. Before KEDA, Bloomberg's inference deployments maintained minimum replica counts to handle burst traffic: idle compute ran continuously. KEDA-based autoscaling eliminated that floor. Infrastructure now provisions and deprovisions against actual request volume.
  • CNCF Graduated · KServe elevated from LFAI incubation. CNCF graduation signals the project meets the engineering standards and production adoption criteria CNCF applies to infrastructure trusted at enterprise scale. For Bloomberg, it means the platform they depend on for ML inference is maintained to the same standard as Kubernetes itself.
The 68% test cost reduction, the 85% drop in network overhead, the serverless inference layer: each one was the direct result of a specific architectural decision, not a general improvement programme. The same engineers who delivered Bloomberg's production requirements drove the open-source roadmap forward in the same cycle. KServe's CNCF graduation is the external validation: the platform is now infrastructure the industry maintains, not just infrastructure Bloomberg uses.