NVIDIA Levels Up Local AI Agents Across RTX PCs and DGX Spark

Personal agents are exploding in popularity, with open source projects like OpenClaw and Hermes seeing rapid adoption by AI developer communities on GitHub. Built to adapt to individual preferences and workflows, these agents can interact with applications, generate content, automate repetitive processes and manage multi-step tasks — all while running locally on device.

Today at NVIDIA GTC Taipei at COMPUTEX, NVIDIA unveiled NVIDIA RTX Spark — a new class of Windows PCs purpose-built for personal agents — alongside a wave of updates that expand local agents across the broader NVIDIA RTX and DGX ecosystems. 

Running agents securely and privately requires hardware that’s up to the task. RTX Spark’s 1 petaflop of AI compute and 128GB of unified memory can meet the computing demand of on-device agents, offering a new class of computer that goes from tool to teammate. Designed for AI, creating and gaming, RTX Spark brings NVIDIA’s 30 years of technology innovation to slim Windows laptops with all-day battery life and ultraefficient desktop PCs.

NVIDIA’s partnership with Windows scales from personal to enterprise solutions. Also introduced at the show was NVIDIA DGX Station for Windows, the ultimate AI deskside supercomputer for professionals, bringing a data-center-class GPU and CPU for inference in a desktop system equipped with Windows for manageability, security and compatibility. 

Other announcements include:

  • The NVIDIA OpenShell runtime is coming to Windows, built on Microsoft’s new security primitives for agents — providing developers an easy-to-deploy package for secure, on-device agents. Hermes Agent and OpenClaw will also integrate OpenShell and the Microsoft security primitives into their new Windows applications.
  • The NVIDIA NemoClaw blueprint is expanding across NVIDIA’s full local AI lineup — GeForce RTX, RTX PRO, RTX and DGX Spark, and DGX Station — with new streamlined installers and support for Hermes Agent.
  • 2x inference performance on top agentic models with multi-token prediction in llama.cpp and vLLM, as well as new multi-GPU optimizations for llama.cpp and ComfyUI.
  • H Company is releasing computer-use tools — including new models and an upcoming desktop agent harness — optimized for RTX and DGX PCs.
  • Adobe is rearchitecting its Photoshop and Premiere apps, Blender is adding NVIDIA DLSS 4.5 Ray Reconstruction, and NVIDIA unveiled RTX Video Frame Generation, which will be coming to ComfyUI. All these updates arrive this fall with RTX Spark.
  • The NVIDIA Broadcast 2.2 update brings Studio Voice feature optimizations and Elgato Stream Deck support. NVIDIA Project G-Assist also adds Stream Deck integration.

Local Agentic AI: Personal, Private and Fast on Windows RTX PCs

Broad agent adoption has been limited by the inability to run agents securely and privately on users’ primary PCs.

NVIDIA and Microsoft are partnering to address this challenge by delivering a robust, secure Windows platform for on-device agents.

The collaboration begins with a strong foundation — new Windows security primitives and the NVIDIA OpenShell runtime — to ensure agents run safely and under full user control.

The new Windows primitives deliver identity, containment, policy and end-to-end security capabilities to build and run agents natively. NVIDIA OpenShell provides additional policy capabilities for the user to define what agents can and cannot do, the ability to intelligently route queries to local models based on the user’s privacy policies, and the ability to disguise personal information in queries sent to cloud models.
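
The routing and disguising behavior described above can be illustrated with a minimal sketch. This is not the OpenShell API: the function names, the keyword-based policy and the regular expressions are all hypothetical, chosen only to show the idea of keeping sensitive queries local and masking personal details before anything is sent to a cloud model.

```python
import re

# Hypothetical patterns and names for illustration only; not OpenShell's API.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace e-mail addresses and phone numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


def route(query: str, private_keywords: set[str]) -> tuple[str, str]:
    """Return (destination, query_to_send) under a simple keyword policy."""
    lowered = query.lower()
    # Queries touching a user-defined private topic never leave the device.
    if any(keyword in lowered for keyword in private_keywords):
        return "local", query
    # Everything else may go to a cloud model, with personal details masked.
    return "cloud", redact(query)


print(route("Summarize my medical records", {"medical"}))
print(route("Email bob@example.com about lunch", {"medical"}))
```

A real policy layer would likely rely on stronger classifiers than keyword matching, but the two decisions are the same: where a query may run, and what must be masked before it leaves the PC.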

This robust security and privacy layer is being adopted by leading agent developers such as Hermes Agent and OpenClaw in their new Windows apps. These new apps will make it easy and secure for users to access powerful on-device agents that can execute tasks in Windows applications, reason through cross-app workflows, generate images and video, code plug-ins and apps, and semantically search local files.

Powering agents on local devices requires both robust security and performant hardware. RTX Spark features up to 1 petaflop of AI compute and 128GB of unified memory to meet the processing demands of on-device agents.

NVIDIA is also accelerating the local open model ecosystem these agents rely on. 

NVIDIA collaborated with the llama.cpp community to enable features and optimizations such as multi-token prediction (MTP), a speculative decoding technique in which a smaller draft model proposes multiple tokens at a time and the target model verifies them in a single pass. Coupled with other optimizations such as programmatic dependent launch, this delivers 2x performance on Qwen 3.6 and 3.5 27B and a 1.6x performance boost on Qwen 3.6 and 3.5 35B. These updates are available via the llama.cpp webUI and LM Studio.
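
The draft-and-verify loop behind speculative decoding can be sketched with toy next-token functions. This is an illustrative assumption, not llama.cpp's implementation: `target` and `draft` are hypothetical stand-ins for real models, decoding is greedy, and verification is written as a loop, whereas a real engine scores all proposed positions in one batched forward pass.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # toy "model": context -> next token


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], k: int, max_new: int) -> List[int]:
    """Greedy draft-and-verify decoding; output matches target-only decoding."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        ctx = list(out)
        for _ in range(k):
            token = draft(ctx)
            proposal.append(token)
            ctx.append(token)

        # 2. The target model checks each proposed position in order.
        accepted: List[int] = []
        ctx = list(out)
        for token in proposal:
            expected = target(ctx)
            if token != expected:
                accepted.append(expected)  # keep the target's token, stop here
                break
            accepted.append(token)
            ctx.append(token)
        else:
            # Every draft token matched, so the target adds one more token.
            accepted.append(target(ctx))

        out.extend(accepted)
    return out[:len(prompt) + max_new]
```

Because every emitted token is one the target model would have chosen anyway, the output is unchanged. The speedup comes from accepting several tokens per target pass whenever the draft guesses correctly.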

Performance gains shown with latest NVIDIA optimizations to llama.cpp: Qwen3.6-27B delivers up to 2x throughput and Qwen3.6-35B up to 1.6x on GeForce RTX 5090, accelerating local agentic AI workloads through open source community collaboration.

For AI enthusiasts running multi-GPU rigs, NVIDIA collaborated with the open source community to enhance two of the most popular local AI tools:

  • llama.cpp adds tensor parallelism for up to 2x memory and 1.8x compute on two equivalent GPUs.
  • ComfyUI gains a new classifier-free guidance method for up to 2x performance on two equivalent GPUs, plus the option to split model chains across GPUs to take advantage of the combined memory.

Figure: Token generation performance improvements for the tensor-parallel multi-GPU technique over pipeline-parallel and single-GPU inference in llama.cpp.

Figure: Generation time improvements for multi-GPU techniques in ComfyUI.
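
The idea behind tensor parallelism can be shown with a small pure-Python sketch. It is a simplified illustration, not llama.cpp's implementation: the helper names are hypothetical, plain lists stand in for GPU tensors, and each "device" is simulated by a separate matrix multiply on its own shard of the weights.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, w: Matrix) -> Matrix:
    """Plain matrix multiply: (rows x inner) times (inner x cols)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)]
            for row in x]


def split_columns(w: Matrix, parts: int) -> List[Matrix]:
    """Split the columns of w into contiguous shards, one per device."""
    n_cols = len(w[0])
    step = (n_cols + parts - 1) // parts
    return [[row[i:i + step] for row in w] for i in range(0, n_cols, step)]


def tensor_parallel_matmul(x: Matrix, w: Matrix, parts: int = 2) -> Matrix:
    """Compute x @ w with the weight matrix sharded column-wise."""
    shards = split_columns(w, parts)
    # Each shard would live on its own GPU and be multiplied concurrently.
    partials = [matmul(x, shard) for shard in shards]
    # Gather step: concatenate the column blocks back into full rows.
    return [sum((p[r] for p in partials), []) for r in range(len(x))]


x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1], [0, 1, 1, 3]]
assert tensor_parallel_matmul(x, w) == matmul(x, w)
```

Each device stores only its shard of the weights, which is why two equivalent GPUs can hold a model roughly twice as large. The gather step adds communication between devices, so the compute speedup stays below a full 2x.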

NVIDIA is also expanding agent capabilities with H Company. H Company’s computer-use harness lets agents navigate a PC by seeing the screen and operating a mouse and keyboard just like a user, even in apps with no application programming interfaces, and is coming soon to RTX and DGX PCs with local model support. 

NVIDIA has collaborated with H Company to quantize its state-of-the-art Holo Computer Use models, as well as accelerate its harness — driving a 2x speedup on NVIDIA GPUs while reducing memory consumption by 35%. The models are available for download now, and the Holo Desktop app will be available soon. 

Agent Optimizations for Linux

For developers who need always-accessible local agents in a Linux environment, NVIDIA DGX Spark is the most capable personal agent AI computer, unifying large memory, fast compute and compatibility with the NVIDIA CUDA ecosystem.

This month’s DGX Spark OS release brings the most streamlined out-of-the-box experience, with a simplified NemoClaw installer and faster inference on the top agentic models.

NemoClaw is now available for all NVIDIA RTX and DGX PCs on Linux and the Windows Subsystem for Linux. Safely deploy local agents on Linux with new streamlined installers, delivering automatic sandboxing and added support for Hermes Agent. 

NVIDIA has collaborated with vLLM to optimize inference for agents, with optimizations in vLLM and new optimized NVFP4 checkpoints for Qwen 3.6 35B. The updates deliver 2.6x performance on DGX Spark compared with the previously available NVFP4 checkpoints from Unsloth, and include kernel improvements, mixed precision and CUDA Graph support for MTP.

Read the vLLM blog for a full walkthrough of serving NVFP4 mixture-of-experts models on DGX Spark, from unified memory tuning to a working NVIDIA Nemotron 3 Super reference setup.

Delivering Powerful Creative Experiences With Adobe

NVIDIA is partnering with Adobe to rearchitect Adobe Premiere and Photoshop for RTX Spark. Firefly-powered Generative Fill in Photoshop and Generative Extend in Premiere are among the hundreds of accelerated tools that deliver creative power, precision and control. RTX Spark takes these capabilities further, delivering up to 2x faster AI, editing, coloring and effects across creative workflows.

Adobe Premiere will feature a new video pipeline that taps into RTX Spark’s unified memory, Blackwell GPU and TensorRT software, delivering real-time performance for editing and color correction, GPU-accelerated AI performance and more efficient rendering of complex timelines. In addition, Adobe’s Substance 3D Painter and Stager will run natively on RTX Spark for smoother and more responsive 3D texturing and scene creation workflows.

Adobe’s next-generation Photoshop engine will be optimized for GPU-accelerated compositing, enabling live filters, high dynamic range and modern natural brushing. The AI-native pipeline is built to harness the full power of RTX Spark, including TensorRT.

Adobe will further extend Premiere and Photoshop to allow users to create, edit and design with Windows agents, providing creators with a collaborative teammate to accelerate their workflows.

Updates to Adobe’s creative apps like Premiere, Photoshop and Substance are expected to start rolling out alongside RTX Spark availability.

New Tools and App Updates for Creators

New NVIDIA platform updates and partner app optimizations are rolling out across the broader RTX ecosystem — some shipping today and others arriving with RTX Spark this fall.

NVIDIA Broadcast 2.2 graduates Studio Voice — an AI feature that makes any microphone sound studio-quality — out of beta starting today. Studio Voice now runs on GeForce RTX 3060 GPUs and above with improved performance. The application also gets Elgato Stream Deck integration and configurable keyboard shortcuts. 

Project G-Assist also adds Stream Deck support via the Elgato MCP Server, letting users enable AI assistant capabilities for their stream setup.

In addition, Blender Cycles is integrating DLSS 4.5 Ray Reconstruction as a new denoiser, turning the path-tracing viewport into an interactive, real-time viewer. This lets 3D artists navigate around a scene while seeing near-final render quality, transforming the lighting and look-development workflow. The update will be released with Blender 5.3 this fall, alongside RTX Spark.

Also launching with RTX Spark, RTX Video Frame Generation is a new AI effect that doubles or quadruples video frame rate in real time — ideal for enhancing the 15-20 frames-per-second (fps) outputs that AI models typically generate. It arrives as a Python wheel and a ComfyUI node, letting AI artists generate videos faster at low fps and then interpolate up to smooth playback rates.

#ICYMI: The Latest From RTX AI Garage

🪐 Read the full NVIDIA RTX Spark announcement for details on the superchip, NVIDIA’s work with Windows on agents, and partner laptops and small desktops.

💻 ASUS ProArt creator laptops now ship with Black Forest Labs’ FLUX.2 Klein 4B, a distilled image model preinstalled through the MuseTree app and optimized with the NVFP4 format and the NVIDIA TensorRT for RTX software development kit. Creators get up to a 2.5x speedup and 560% memory reduction, with the first-run experience going straight from unbox to generating images locally, with no model downloads or ComfyUI setup required.

🎬 The NVIDIA AI for Media software development kit is introducing updates, including new LipSync NVIDIA NIM microservices optimized for French, German and Spanish. The Active Speaker Detection NIM microservice also adds multi-camera support with cross-video speaker correlation.

🤖 Check out the latest RTX AI Garage blog post on Hermes Agent and self-improving AI on RTX PCs and DGX Spark.

Plug in to RTX Spark on Facebook, Instagram, TikTok and X — and stay informed by subscribing to the RTX Spark newsletter.

See notice regarding software product information.