Issue No. 09 • May 28, 2026
Gradient Brief
MLOps & AI Infrastructure — for the engineers building it
Red Hat AI 3.4 Ships the Most Complete AgentOps Stack Yet
At Red Hat Summit in Atlanta this week, Red Hat announced Red Hat AI 3.4, which it describes as a "metal-to-agent" platform. It is the most technically dense MLOps release of the week, and it targets the operational friction that appears when autonomous agents move from experimentation to production. The core of the release is a new Model-as-a-Service (MaaS) layer that gives developers a single, governed interface to curated models, complete with self-service token key management, role-based administration, and usage showback.
The inference engine beneath the MaaS layer received significant upgrades. Powered by vLLM and the llm-d distributed inference engine, the platform now supports request prioritization, so interactive and background traffic can share the same endpoint while latency-sensitive requests are processed first under load. Red Hat also announced general availability for speculative decoding, in which a small draft model proposes tokens that the main model then verifies; Red Hat cites a 2x to 3x speedup without quality loss. On the hardware side, the platform expanded its accelerator support to include AMD Instinct MI355X GPUs and generally available vLLM CPU serving on both AMD EPYC and Intel Xeon processors.
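To make the request prioritization idea concrete, here is a minimal sketch of a priority queue in front of a shared endpoint. It is not how vLLM or llm-d implement scheduling; the `PriorityScheduler` class, the two priority levels, and the batch size are all hypothetical and chosen only for illustration.

```python
import heapq
import itertools
from dataclasses import dataclass, field

# Hypothetical priority levels: a lower number is served first.
INTERACTIVE = 0
BACKGROUND = 1


@dataclass(order=True)
class QueuedRequest:
    priority: int
    seq: int  # arrival order, used to break ties within a priority level
    prompt: str = field(compare=False)


class PriorityScheduler:
    """Toy scheduler: one queue, latency-sensitive requests leave first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def submit(self, prompt, priority):
        heapq.heappush(
            self._heap, QueuedRequest(priority, next(self._counter), prompt)
        )

    def next_batch(self, max_size):
        # Pop up to max_size requests, lowest priority number first.
        batch = []
        while self._heap and len(batch) < max_size:
            batch.append(heapq.heappop(self._heap).prompt)
        return batch


scheduler = PriorityScheduler()
scheduler.submit("nightly summarization job", BACKGROUND)
scheduler.submit("chat: user question", INTERACTIVE)
scheduler.submit("batch embedding refresh", BACKGROUND)
scheduler.submit("chat: follow-up", INTERACTIVE)

first_batch = scheduler.next_batch(max_size=2)
second_batch = scheduler.next_batch(max_size=2)
print(first_batch)
print(second_batch)
```

Both chat requests land in the first batch even though background work arrived earlier. A production scheduler would also need to guard against starving background traffic, which this sketch does not attempt.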
The most forward-looking component of the release is the AgentOps framework. To secure autonomous systems, Red Hat implemented SPIFFE/SPIRE-based cryptographic identity management, replacing static hardcoded keys with short-lived tokens to enforce least-privilege operations. The platform also introduced an MCP (Model Context Protocol) server catalog and gateway for governed tool access, and integrated automated adversarial scanning using technology from its Chatterbox Labs acquisition and the Garak LLM vulnerability scanner. This represents a structural shift from managing models to managing the entire lifecycle, identity, and security boundary of agentic workflows.
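For readers who have not worked with short-lived credentials, the sketch below shows the general pattern: a token bound to a workload identity and a single scope, which stops working once it expires. It uses only the Python standard library and is not the SPIFFE/SPIRE API; the signing key, claim names, scopes, and the `mint_token` and `authorize` helpers are assumptions made for illustration. Only the `spiffe://trust-domain/path` identifier shape follows the SPIFFE ID format.

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical demo key. In SPIRE, identity documents are signed by the
# server's own key material, not by a constant in application code.
SIGNING_KEY = b"demo-only-key"


def mint_token(spiffe_id, scope, ttl_seconds, now=None):
    """Issue a signed token that expires ttl_seconds after `now`."""
    now = time.time() if now is None else now
    claims = {"sub": spiffe_id, "scope": scope, "exp": now + ttl_seconds}
    payload = base64.urlsafe_b64encode(
        json.dumps(claims, sort_keys=True).encode()
    )
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + signature


def authorize(token, required_scope, now=None):
    """Allow the call only if the token is authentic, unexpired, and in scope."""
    now = time.time() if now is None else now
    payload, _, signature = token.rpartition(".")
    expected = hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return False
    claims = json.loads(base64.urlsafe_b64decode(payload))
    return now < claims["exp"] and claims["scope"] == required_scope


# Fixed timestamps keep the example deterministic.
token = mint_token(
    "spiffe://example.org/agent/report-writer",
    scope="tickets:read",
    ttl_seconds=300,
    now=1000.0,
)
allowed_fresh = authorize(token, "tickets:read", now=1100.0)
allowed_expired = authorize(token, "tickets:read", now=1400.0)
allowed_wrong_scope = authorize(token, "tickets:write", now=1100.0)
print(allowed_fresh, allowed_expired, allowed_wrong_scope)
```

The point of the pattern is that a leaked token is useful only for a few minutes and only for one narrow action, which is the least-privilege property the release is aiming for.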
Tool of the Week: CoreWeave Sandboxes
Commercial | On-Cluster or Serverless | CoreWeave
A unified execution layer that provides secure, isolated environments specifically designed for running reinforcement learning (RL), agent tool use, and model evaluation workflows.
As AI systems evolve to take actions, training requires more than just compute. RL and evaluation workflows require isolated environments that can run code safely, maintain state across steps, and scale massively. Most organizations currently rely on custom-built systems or third-party tools that sit outside their core infrastructure. CoreWeave Sandboxes solves this by running directly within a customer's CoreWeave Kubernetes Service (CKS) cluster, or as a serverless runtime through Weights & Biases (W&B).
The platform is built for massive concurrency. IBM Research reported spinning up thousands of sandboxes in parallel per training step for its RL workflows, each with its own container image and resource boundaries. For the serverless option, sandbox activity is captured directly in the same W&B run view as training metrics, allowing teams to debug in context rather than across disconnected tools.
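The fan-out pattern described above, many isolated rollouts per training step, can be sketched with plain `asyncio`. This is not the CoreWeave Sandboxes or W&B API: `run_in_sandbox` is a hypothetical stand-in that simulates a sandboxed episode, and the image name, episode count, and concurrency cap are made up for the example.

```python
import asyncio


async def run_in_sandbox(episode_id, image):
    # Hypothetical stand-in for launching an isolated sandbox and running
    # one rollout inside it. Here it just yields control and returns a
    # fake reward so the example runs anywhere.
    await asyncio.sleep(0)
    return {"episode": episode_id, "image": image, "reward": episode_id % 2}


async def run_training_step(num_episodes, image, max_concurrent):
    # The semaphore caps how many sandboxes are in flight at once.
    limiter = asyncio.Semaphore(max_concurrent)

    async def bounded(episode_id):
        async with limiter:
            return await run_in_sandbox(episode_id, image)

    # gather returns results in the same order the episodes were submitted.
    return await asyncio.gather(*(bounded(i) for i in range(num_episodes)))


results = asyncio.run(
    run_training_step(num_episodes=1000, image="rl-env:latest", max_concurrent=100)
)
mean_reward = sum(r["reward"] for r in results) / len(results)
print(len(results), mean_reward)
```

In a real RL loop the rewards collected here would feed the policy update for that step, and each sandbox would carry its own container image and resource limits rather than sharing one placeholder string.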
Quick Hits
- Fractile Raises $220M for In-Memory Inference Chip: The UK-based startup closed a Series B led by Accel to take its SRAM-based inference chip to production. Fractile's architecture performs matrix multiplications inside SRAM cells alongside compute logic, removing the DRAM bottleneck. The company claims this approach can run frontier models 25 times faster at one-tenth the cost of current GPU setups. Tape-out is expected in 2027, with Anthropic reportedly in early discussions to become a customer.
- Nebius Acquires Clarifai Core Team and Inference IP: AI cloud provider Nebius has acquired the core engineering and research team from Clarifai, led by founder Matthew Zeiler, who joins as SVP of Research. Nebius also licensed Clarifai's inference and compute orchestration technology to strengthen its Token Factory platform. This builds on Nebius's recent acquisition of Eigen AI, combining model-level optimization with Clarifai's system-level orchestration.
- Gartner Predicts 40% AI Observability Adoption by 2028: A new Gartner report projects that 40% of organizations deploying AI will implement dedicated AI observability tools by 2028. The firm cited the growing need for predictive issue detection and the inability of traditional software monitoring to trace opaque deep learning models and agentic AI logic.
Gradient Brief is published for ML engineers, data scientists, and technical founders. Forward to a colleague who should be reading this.