Gradient Brief

MLOps & AI Infrastructure — for the engineers building it

Red Hat AI 3.4 Ships the Most Complete AgentOps Stack Yet

At the Red Hat Summit in Atlanta this week, the company announced Red Hat AI 3.4, delivering what it describes as a "metal-to-agent" platform. This is the most technically dense MLOps release of the week, addressing the specific operational friction points that emerge when moving autonomous agents from experimentation to production. The core of the release is a new Model-as-a-Service (MaaS) layer that provides a single, governed interface for developers to access curated models, complete with self-service token key management, role-based administration, and usage showback.

The inference engine beneath the MaaS layer received significant upgrades. Powered by vLLM and the llm-d distributed inference engine, the platform now supports request prioritization, allowing interactive and background traffic to share the same endpoint while processing latency-sensitive requests first under load. Red Hat also announced general availability for speculative decoding, in which a small draft model proposes several tokens ahead and the full model verifies them in a single pass. Red Hat says this accelerates generation by 2x to 3x without quality loss. On the hardware side, the platform added support for AMD Instinct MI355X GPUs and moved vLLM CPU serving on AMD EPYC and Intel Xeon processors to general availability.
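The request prioritization described above can be pictured as a priority queue in front of the batcher. The following is a minimal sketch of that idea, not the scheduler vLLM or llm-d actually uses; the class name `PriorityEndpoint`, the two priority levels, and the batch-size parameter are all hypothetical names chosen for illustration.

```python
import heapq
import itertools
from dataclasses import dataclass, field

# Hypothetical priority levels: a lower number is served first.
INTERACTIVE = 0
BACKGROUND = 1


@dataclass(order=True)
class QueuedRequest:
    priority: int
    seq: int                      # arrival order, breaks ties within a priority
    prompt: str = field(compare=False)


class PriorityEndpoint:
    """Toy shared endpoint: one queue, latency-sensitive requests drain first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def submit(self, prompt, priority):
        request = QueuedRequest(priority, next(self._counter), prompt)
        heapq.heappush(self._heap, request)

    def next_batch(self, max_size):
        # Pop up to max_size requests, lowest priority number first,
        # then first-come-first-served within the same priority.
        batch = []
        while self._heap and len(batch) < max_size:
            batch.append(heapq.heappop(self._heap).prompt)
        return batch
```

With two background and two interactive requests queued and room for three in the next batch, both interactive requests are served before the oldest background request, which is the behavior under load that the release notes describe.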

The most forward-looking component of the release is the AgentOps framework. To secure autonomous systems, Red Hat implemented SPIFFE/SPIRE-based cryptographic identity management, replacing static hardcoded keys with short-lived tokens to enforce least-privilege operations. The platform also introduced an MCP (Model Context Protocol) server catalog and gateway for governed tool access, and integrated automated adversarial scanning using technology from its Chatterbox Labs acquisition and the Garak LLM vulnerability scanner. This represents a structural shift from managing models to managing the entire lifecycle, identity, and security boundary of agentic workflows.
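To make the short-lived, least-privilege idea concrete, here is a rough sketch of an expiring, scope-limited agent token. It is an assumption-laden stand-in: real SPIFFE/SPIRE deployments issue X.509 or JWT identity documents signed with asymmetric keys, whereas this example uses a shared HMAC secret from the Python standard library purely for brevity. The function names, the `tools` claim, and the example SPIFFE ID are hypothetical.

```python
import hashlib
import hmac
import json
import time

# Hypothetical signing key. A real deployment would not use a shared secret.
SIGNING_KEY = b"demo-signing-key"


def issue_token(spiffe_id, allowed_tools, ttl_seconds, now=None):
    """Return (payload, signature) for a token that expires after ttl_seconds."""
    now = time.time() if now is None else now
    claims = {
        "sub": spiffe_id,                 # workload identity
        "tools": sorted(allowed_tools),   # least-privilege scope
        "exp": now + ttl_seconds,         # short lifetime
    }
    payload = json.dumps(claims, sort_keys=True).encode()
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return payload, signature


def authorize(token, tool, now=None):
    """Allow a tool call only if the token is authentic, unexpired, and in scope."""
    payload, signature = token
    now = time.time() if now is None else now
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return False
    claims = json.loads(payload)
    return now < claims["exp"] and tool in claims["tools"]
```

A token issued to `spiffe://example.org/agent/billing` for the `search` tool with a 300-second lifetime authorizes `search` during that window, rejects any other tool, and rejects everything once it expires. A leaked credential of this kind loses its value quickly, which is the advantage over a static hardcoded key.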

Tool of the Week: CoreWeave Sandboxes

Commercial | On-Cluster or Serverless | CoreWeave

Training agentic systems takes more than raw GPU hours. Reinforcement learning and evaluation pipelines need disposable environments that execute untrusted code safely, hold state across multiple steps, and replicate to high counts on demand. CoreWeave Sandboxes provides that execution layer in two forms: inside an existing CoreWeave Kubernetes Service (CKS) cluster for teams already on the platform, or as a serverless runtime exposed through Weights & Biases for teams that want no cluster to manage.

Concurrency is the headline capability. IBM Research used it to launch thousands of isolated sandboxes per training step, each pinned to its own container image and resource limits, so a single RL run can fan out without contention. The serverless path folds observability back in as well: sandbox activity appears in the same W&B run view as the training metrics, which keeps debugging in one place instead of stitching it together across separate tools.
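The fan-out pattern looks roughly like the sketch below. It does not use the CoreWeave or W&B APIs: `run_in_sandbox` is a stub standing in for whatever call launches a real sandbox, and `SandboxSpec`, the image name, and the concurrency cap are hypothetical values chosen for illustration.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class SandboxSpec:
    image: str        # container image every sandbox is pinned to
    cpu_limit: float  # CPU cores per sandbox
    memory_mb: int    # memory cap per sandbox


async def run_in_sandbox(spec, task_id):
    # Stub: a real implementation would start an isolated sandbox here,
    # run the rollout inside it, and return the result.
    await asyncio.sleep(0)
    return {"task": task_id, "image": spec.image, "reward": float(task_id % 2)}


async def fan_out(spec, num_tasks, max_concurrent):
    # The semaphore caps how many sandboxes are in flight at once.
    gate = asyncio.Semaphore(max_concurrent)

    async def guarded(task_id):
        async with gate:
            return await run_in_sandbox(spec, task_id)

    # gather returns results in task order, regardless of completion order.
    return await asyncio.gather(*(guarded(i) for i in range(num_tasks)))


def rollout_step(spec, num_tasks, max_concurrent=256):
    """Run one training step's worth of rollouts and collect their results."""
    return asyncio.run(fan_out(spec, num_tasks, max_concurrent))
```

Calling `rollout_step(SandboxSpec("python:3.12-slim", 1.0, 512), 1000)` returns one result per rollout in task order. The spec is shared and immutable, so every sandbox in the step gets the same image and limits.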

Quick Hits

Fractile Raises $220M for In-Memory Inference Chip
The UK-based startup closed a Series B led by Accel to take its SRAM-based inference chip to production. Fractile's architecture performs matrix multiplications inside SRAM cells alongside compute logic, removing the DRAM bottleneck. The company claims this approach can run frontier models 25 times faster at one-tenth the cost of current GPU setups. Tape-out is expected in 2027, with Anthropic reportedly in early discussions to become a customer.

Nebius Acquires Clarifai Core Team and Inference IP
AI cloud provider Nebius has acquired the core engineering and research team from Clarifai, led by founder Matthew Zeiler, who joins as SVP of Research. Nebius also licensed Clarifai's inference and compute orchestration technology to strengthen its Token Factory platform. This builds on Nebius's recent acquisition of Eigen AI, combining model-level optimization with Clarifai's system-level orchestration.

Gartner Predicts 40% AI Observability Adoption by 2028
A new Gartner report projects that 40% of organizations deploying AI will implement dedicated AI observability tools within four years. The firm cited the growing need for predictive issue detection and the inability of traditional software monitoring to trace opaque deep learning models and agentic AI logic.

Gradient Brief is published for ML engineers, data scientists, and technical founders. Forward to a colleague who should be reading this.
