

Co-building the open-source ML serving platform that Bloomberg's AI infrastructure runs on
Bloomberg needed production-grade ML model serving on Kubernetes: governed, extensible, independent of any vendor's roadmap. Ideas2IT co-built KServe into a CNCF-graduated platform now handling inference across Bloomberg's financial AI products, cutting test infrastructure costs by 68% and network overhead by 85%.

Client
Bloomberg

Industry
Financial Technology

Engagement
AI/ML Platform Engineering

Team
Embedded Engineering Pod

Platform
KServe · CNCF Graduated
01 Challenge
Bloomberg ran ML inference on KFServing, governed by Google. In production, it worked. The problem: Bloomberg needed serverless inference, LLM serving, and a standardised protocol across frameworks. None were on KFServing's roadmap. None could be built without owning the platform's governance.
02 Solution
Ideas2IT embedded a specialist engineering pod inside the KServe project itself. The team first separated the platform from its Google-governed origins, establishing independent governance and standardising inference through the Open Inference Protocol. It then rebuilt the serving layer for modern workloads: gRPC transport, KEDA-driven serverless scale-to-zero, and Hugging Face integration for LLM serving, all delivered upstream so Bloomberg's production requirements and the open-source roadmap advanced in the same cycle.
03 Outcome
Restructuring CI/CD onto GitHub Actions cut test infrastructure cost by 68% and image size by 47%. gRPC adoption reduced network overhead by 85%. Test coverage reached 84% from a 50% baseline. KServe graduated from LFAI incubation to CNCF, with serverless scale-to-zero in production.
Phase 01
Platform Independence and Core Architecture
Platform independence and the Open Inference Protocol: separating KServe from its origins
The first decision: establish KServe as an independent project, separate from its Google-governed origins.
That required:
- Decoupling the codebase.
- Restructuring the contribution model so Bloomberg had input into the roadmap.
- Managing a governance change without losing the contributors who built it.
The Open Inference Protocol was implemented for the PyTorch runtime, creating a standardised interface across model frameworks. Native PyTorch/TorchServe integration followed. gRPC replaced REST, reducing network overhead by 85%. The CI/CD pipeline moved from paid AWS infrastructure to GitHub Actions with parallelised jobs: test cost dropped 68%, coverage extended from 50% to 84%, and Docker image size fell 47%. KServe entered LFAI incubation.
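To make the protocol concrete, here is a minimal sketch of an Open Inference Protocol v2 request body as a framework-agnostic runtime would receive it. The helper function, tensor name, and values are illustrative, not taken from the KServe codebase.

```python
import json

def build_infer_request(model_inputs):
    """Assemble an OIP v2 infer request body from (name, shape, datatype, data) tuples."""
    return {
        "inputs": [
            {"name": name, "shape": shape, "datatype": dtype, "data": data}
            for name, shape, dtype, data in model_inputs
        ]
    }

# A 1x4 FP32 tensor, as a PyTorch/TorchServe-backed model would accept it
# at an OIP endpoint such as POST /v2/models/<model-name>/infer.
body = build_infer_request([("input-0", [1, 4], "FP32", [0.1, 0.2, 0.3, 0.4])])
print(json.dumps(body))
```

Because every runtime speaks the same envelope, a client written against one framework's model works unchanged against another; the same tensor schema also maps onto the protocol's gRPC service, which is where the 85% network-overhead reduction came from.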
PHASE 01 DELIVERABLES
- KServe independent GitHub org + governance
- Open Inference Protocol (OIP) in PyTorch
- Native PyTorch / TorchServe integration
- gRPC communication layer
- 84% test coverage (Pytest + Ginkgo)
- KServe LFAI incubation
Phase 02
Serverless Inference and LLM Readiness
Serverless inference and LLM readiness: KEDA integration, Hugging Face, and CNCF graduation
The platform could not answer one question: how do you run inference cost-efficiently when demand is uneven, without provisioning for idle workloads? KEDA integration solved it.
KEDA enabled KServe to scale deployments dynamically, including to zero during idle periods. The same team drafted the community-wide proposal, drove adoption, and built the production implementation Bloomberg deployed.
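What scale-to-zero looks like in practice: a minimal sketch of an InferenceService spec, expressed as the Python dict you would serialize to a Kubernetes manifest. The annotation key, service name, and storage URI are illustrative assumptions; exact field names vary by KServe version.

```python
# Hedged sketch of a scale-to-zero InferenceService. minReplicas: 0 lets the
# autoscaler remove all predictor pods during idle periods and recreate them
# on the first incoming request.
inference_service = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {
        "name": "sklearn-demo",  # hypothetical service name
        "annotations": {
            # Assumed annotation selecting the KEDA autoscaler; check your
            # KServe version's documentation for the exact key and value.
            "serving.kserve.io/autoscalerClass": "keda",
        },
    },
    "spec": {
        "predictor": {
            "minReplicas": 0,  # scale to zero when no traffic arrives
            "maxReplicas": 5,
            "model": {
                "modelFormat": {"name": "sklearn"},
                "storageUri": "gs://example-bucket/model",  # hypothetical
            },
        },
    },
}
```

The cost model follows directly: with uneven demand, a `minReplicas: 0` floor means idle models consume no compute at all, instead of holding warm capacity provisioned for peak load.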
LLM serving was added through Hugging Face integration, enabling Bloomberg to serve transformer models across financial products. The Gateway API and Envoy AI Gateway handled GenAI workload orchestration. Queue-Proxy managed queued request processing under load. All runtimes migrated to Poetry. KServe graduated from LFAI incubation to CNCF.
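The OpenAI API specification support listed below means a transformer model served through the Hugging Face runtime can be called with an OpenAI-style payload. A minimal sketch, in which the endpoint path, model name, and prompt are all illustrative assumptions:

```python
import json

# Hedged sketch: an OpenAI-style chat completion request body as an
# OpenAI-compatible serving runtime accepts it. The path and model name
# below are hypothetical; consult your deployment for the real values.
endpoint = "/openai/v1/chat/completions"  # illustrative path
payload = {
    "model": "llama-demo",  # hypothetical deployed model name
    "messages": [
        {"role": "user", "content": "Summarize today's bond market moves."}
    ],
    "max_tokens": 128,
    "temperature": 0.2,
}
print(json.dumps(payload))
```

Speaking the OpenAI dialect means existing client SDKs and tooling work against KServe-hosted models without a bespoke integration layer.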
PHASE 02 DELIVERABLES
- KEDA scale-to-zero autoscaling
- Hugging Face / LLM serving integration
- Gateway API + Envoy AI Gateway
- Queue-Proxy for request queuing
- Poetry package management migration
- OpenAI API specification support
- KServe CNCF graduation
- Documentation revamp (Mike + Netlify)
The Ideas2IT team operated the way you'd want a core platform team to operate. They understood the community obligations and the production requirements, and they didn't treat those as separate problems.
The Outcome
One specialist team. A production ML serving platform that Bloomberg's AI runs on and that the industry has adopted.
The 68% test cost reduction, the 85% drop in network overhead, the serverless inference layer: each one was the direct result of a specific architectural decision, not a general improvement programme. The same engineers who delivered Bloomberg's production requirements drove the open-source roadmap forward in the same cycle. KServe's CNCF graduation is the external validation: the platform is now infrastructure the industry maintains, not just infrastructure Bloomberg uses.