Integrating a Quantum Computer into HPC Infrastructure

Marin Ivezic May 23, 2026

This article is part of the How to Build a Quantum Computer Deep Dive series, which covers the practical engineering of assembling quantum computers from modular components across every major qubit modality. The capstone article introduces the series and the Quantum Open Architecture model that makes it possible.

This article draws extensively on Applied Quantum’s Systems Integration Playbook (v2.0, May 2026), the primary source for signal chain specifications, calibration sequences, integration timelines, and troubleshooting data throughout the series. Where other sources supplement the playbook, they are cited inline. Cost figures are list-price estimates from vendor disclosures and Applied Quantum’s field experience; negotiated prices vary 20–40%.

Introduction

A quantum computer that is not connected to classical compute resources is a physics experiment, not a computational tool. Every algorithm of practical interest requires hybrid execution: classical pre-processing to prepare the problem, quantum circuits to perform the computation that classical hardware cannot, and classical post-processing to interpret the results. Error correction decoding, the operation that makes fault-tolerant quantum computing possible, is itself a classical computation that must execute in real time while the quantum processor runs. Without tight integration between QPUs, GPUs, and CPUs, a quantum computer sits idle between shots while a researcher manually shuttles data between systems.

The IQM team at the Leibniz Supercomputing Centre in Munich learned this the hard way. Their arXiv:2509.12949 paper, covering 250 days of operational telemetry, documents the engineering required to make a 20-qubit superconducting system function as a genuine accelerator inside the SuperMUC-NG HPC environment rather than a standalone experiment in an adjacent room. The problems were not quantum physics problems. They were networking, scheduling, monitoring, and latency engineering problems, the same class of problems that the HPC community solved decades ago for GPUs, adapted for hardware that decoheres on microsecond timescales.

This article covers the integration stack that connects a quantum computer to classical HPC infrastructure: the NVQLink interconnect, the QRMI scheduling interface, the CUDA-Q programming model, the QEC decoder path, and the network, API, and staffing decisions that go with them. Two adjacent layers have their own articles in this series. The electronics that physically drive the qubits are covered in the control system build guide. The software that schedules jobs and orchestrates the machine, including the question of why no turnkey Western quantum operating system exists, is covered in the quantum OS and orchestration article. For the physical hardware being integrated, see the modality-specific build guides for superconducting, trapped-ion, neutral-atom, photonic, and silicon-spin systems.

NVQLink: the Western HPC-quantum interconnect

Before NVQLink, coupling a quantum computer to a GPU cluster meant building a custom integration for every deployment. Each control vendor had its own data format, its own timing semantics, and its own idea of how measurement results should travel from the QPU to the classical processor and back. The latency of that round trip determined whether real-time error correction was possible, and most custom integrations could not guarantee it.

NVIDIA introduced NVQLink at GTC Washington DC on 28 October 2025 with 17 quantum hardware builders, 5 control system providers, and 9 U.S. national laboratories (Brookhaven, Fermilab, LBNL, LANL, MIT Lincoln Lab, ORNL, PNNL, Sandia, plus NIST). At GTC 2026, NVQLink became publicly available through the release of the cudaq-realtime API in the CUDA-Q platform.

The architecture provides 400 Gb/s GPU-QPU throughput and measured round-trip latency of 3.96 µs maximum (Caldwell et al.) using RDMA over Converged Ethernet (RoCE) on commercially available 400 Gb/s Ethernet hardware. Dell has validated the XE7745, XE9680, R7715, and R770 server platforms as Real-time Host systems, reproducing NVQLink latencies under 4 µs. This is not exotic networking hardware. It is the same RoCE fabric that large-scale AI training clusters already use, which means an HPC center that runs GPU workloads today has most of the physical infrastructure NVQLink needs.

What this means in practice: a quantum control system sends qubit measurement data to a GPU node over a standard Ethernet link at microsecond-scale latency. The GPU processes that data (decoding error syndromes, computing conditional operations, running classical subroutines of hybrid algorithms) and returns instructions to the controller before the next quantum operation must begin. The round trip is fast enough for real-time quantum error correction on superconducting hardware, where a surface-code cycle takes approximately 1 µs and the decoder must respond within roughly 10 cycles. The choice of control electronics on the QPU side of that link is a procurement decision in its own right, covered in the control system build guide.

The ecosystem adoption since the October 2025 launch has been rapid. Qblox adopted cudaq-realtime for OQC’s full-stack quantum systems, enabling microsecond-level hybrid feedback loops. Quantum Machines integrated cudaq-realtime with their Open Acceleration Stack. Q-CTRL reports a 50x reduction in classical overhead and 5x speedup in overall wall-clock time by integrating NVQLink with their calibration software at the Q-PAC deployment, with enhanced NVQLink support planned throughout 2026. SDT deployed the first tightly-coupled hybrid quantum-classical data center in Korea using their controller with NVQLink connecting an Anyon Computing QPU to NVIDIA GPUs. Diraq is using NVQLink to connect silicon-spin quantum processors with accelerated computing for calibration, autotuning, and benchmarking. HPE has incorporated NVQLink into its vision for quantum supercomputing.

NVQLink combined with CUDA-Q and QRMI is the most documented and widely adopted Western integration path for low-latency QPU-HPC coupling as of May 2026. Any integrator planning an HPC-connected quantum deployment should specify NVQLink compatibility in the control electronics procurement.

QRMI: making QPUs schedulable like GPUs

NVQLink solves the latency problem. QRMI solves the scheduling problem. Before QRMI, every HPC center that added a quantum computer had to build a custom job-submission interface from scratch: its own queue, its own authentication flow, its own monitoring hooks, its own way of telling the classical scheduler that a QPU existed and could accept work. The result was a different integration at every site, none of them portable.

The Quantum Resource Management Interface (QRMI) changes this by formally exposing QPUs as Slurm-native schedulable compute resources. A researcher submitting a hybrid quantum-classical job to an HPC cluster sees the QPU as another accelerator alongside CPU and GPU nodes. Standard Slurm job submission, scheduling, and monitoring workflows apply. No quantum-specific scheduler is needed.
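To make the "QPU as another accelerator" idea concrete, here is a minimal sketch of a helper that renders a Slurm batch script requesting both a GPU and a QPU. The `HybridJob` class, the `render_sbatch` function, and in particular the `qpu:1` resource label are hypothetical placeholders, not QRMI's documented syntax; the directive a given QRMI installation expects should be taken from that site's plugin configuration.

```python
from dataclasses import dataclass


@dataclass
class HybridJob:
    """Description of a hybrid job (hypothetical helper, not part of QRMI)."""
    name: str
    command: str
    gpus: int = 1
    # Placeholder resource label: substitute whatever name the site's
    # QRMI/Slurm configuration actually registers for the QPU.
    qpu_gres: str = "qpu:1"
    time_limit: str = "00:30:00"


def render_sbatch(job: HybridJob) -> str:
    """Render a batch script that requests GPU and QPU resources together."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job.name}",
        f"#SBATCH --time={job.time_limit}",
        # One request line covers both accelerators, mirroring how GPUs are
        # requested on a classical cluster.
        f"#SBATCH --gres=gpu:{job.gpus},{job.qpu_gres}",
        "",
        f"srun {job.command}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_sbatch(HybridJob(name="vqe-demo", command="python vqe.py")))
```

The point of the sketch is the shape of the workflow: the researcher writes an ordinary batch script and the scheduler treats the QPU as one more resource to allocate.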

QRMI originated from an initiative established by IBM with collaborative development from Pasqal, Rensselaer Polytechnic Institute, and the STFC Hartree Centre. Pasqal demonstrated QRMI integration with NVIDIA CUDA-Q in March 2026, with the on-premises stack first deployed at CINECA to integrate with Leonardo, the EuroHPC pre-exascale supercomputer. The integration is also available on Pasqal’s cloud platform and in CUDA-Q 0.14 as a backend option.

QRMI is designed to be hardware-agnostic, modality-agnostic, and vendor-agnostic. A properly implemented QRMI layer means the HPC center does not need to know whether the QPU behind the scheduler is superconducting, trapped-ion, or neutral-atom. The abstraction handles authentication, resource allocation, job lifecycle management, and monitoring through standard interfaces that HPC operations teams already understand.

For an integrator, QRMI eliminates the need to build a custom quantum-aware scheduler. The HPC/DevOps engineer configures QRMI as a Slurm plugin, registers the QPU as a schedulable resource, and standard HPC workflows apply from that point. QRMI sits on the seam between this article and the orchestration layer: it is the interface through which the broader job-scheduling and multi-user orchestration stack, covered in the quantum OS and orchestration article, presents a QPU to classical HPC.

CUDA-Q: the programming model

CUDA-Q is NVIDIA’s open-source platform for hybrid quantum-classical programming. It provides a unified framework combining CPUs, GPUs, and QPUs under a single programming model, with support for C++ and Python. CUDA-Q compiles quantum circuits to target-specific backends (physical QPUs or simulators) and manages the interleaving of classical and quantum kernels within a single program.

For integrators, CUDA-Q matters because it is hardware-agnostic by design. A program written in CUDA-Q can target a Pasqal neutral-atom QPU, a Quantinuum trapped-ion system, an IQM superconducting processor, or a GPU-based quantum simulator without rewriting the quantum subroutines. The backend is selected at runtime. This portability is not complete in practice (pulse-level control and hardware-specific optimizations still require backend-specific code), but at the circuit level, CUDA-Q provides the closest thing the Western stack has to a universal quantum programming model.
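The runtime backend selection described above can be sketched in a few lines of CUDA-Q Python. This assumes the `cudaq` package is installed; `run_bell` and `correlated_fraction` are illustrative names, and `qpp-cpu` is CUDA-Q's CPU simulator target. Hardware targets generally need credentials and extra target-specific options that are not shown here, so treat this as a sketch rather than a deployment recipe.

```python
try:
    import cudaq  # NVIDIA CUDA-Q Python package; only needed by run_bell()
except ImportError:
    cudaq = None


def run_bell(target: str = "qpp-cpu", shots: int = 1000):
    """Sample a two-qubit Bell circuit on the named CUDA-Q target."""
    if cudaq is None:
        raise RuntimeError("CUDA-Q is not installed in this environment")

    @cudaq.kernel
    def bell():
        q = cudaq.qvector(2)
        h(q[0])              # put qubit 0 into superposition
        x.ctrl(q[0], q[1])   # entangle qubit 1 with qubit 0
        mz(q)                # measure both qubits

    # The kernel is unchanged; only the target name differs between a
    # simulator run and a run on a hardware backend.
    cudaq.set_target(target)
    return cudaq.sample(bell, shots_count=shots)


def correlated_fraction(counts) -> float:
    """Fraction of shots in which all measured bits agree (e.g. 00 or 11).

    Accepts any mapping of bitstring -> count that offers .items(),
    such as a plain dict.
    """
    total = sum(n for _, n in counts.items())
    same = sum(n for bits, n in counts.items() if len(set(bits)) == 1)
    return same / total if total else 0.0


if __name__ == "__main__" and cudaq is not None:
    result = run_bell("qpp-cpu")
    print(result)
```

Keeping the kernel separate from the `set_target` call is the design choice that matters: the quantum subroutine stays fixed while the site decides where it runs.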

CUDA-Q adoption is accelerating. TII integrated its Qibo framework for hybrid workload validation. memQ demonstrated its Extensible Distributed Quantum Compiler (xDQC) built on CUDA-Q, simulating hundreds of qubits across eight QPUs. Hiverge integrated CUDA-Q with an AI platform that uses LLM agents to translate natural-language problem descriptions into executable quantum circuits.

The QEC decoder path

Real-time quantum error correction is the capability that transforms a quantum computer from a NISQ-era research tool into a fault-tolerant computational resource. The decoder is the classical computation that processes error syndrome measurements (produced every surface-code cycle, roughly every microsecond on superconducting hardware) and determines which corrections to apply to the logical qubit. It is hosted on the classical side of the HPC link, which is why it belongs in this article rather than with the control electronics.
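For readers new to decoding, the following toy example shows the syndrome-to-correction step in its simplest form. It uses a three-bit repetition code with a lookup table, which is an assumption made purely for illustration: it is not a surface-code decoder and not any of the products discussed below, and the function names are hypothetical.

```python
# Toy model: three data bits protect one logical bit against a single flip.
# The two parity checks compare neighbouring bits, so the decoder never
# reads the data bits directly, only the syndrome.

# Syndrome (bit0 xor bit1, bit1 xor bit2) -> index of the bit to flip.
LOOKUP = {
    (0, 0): None,  # no error detected
    (1, 0): 0,     # only the first check fired: bit 0 flipped
    (1, 1): 1,     # both checks fired: bit 1 flipped
    (0, 1): 2,     # only the second check fired: bit 2 flipped
}


def measure_syndrome(bits):
    """Return the two parity-check outcomes for a three-bit codeword."""
    return (bits[0] ^ bits[1], bits[1] ^ bits[2])


def decode(syndrome):
    """Map a syndrome to the correction (bit index to flip, or None)."""
    return LOOKUP[syndrome]


def correct(bits):
    """Apply the decoder's correction and return the repaired codeword."""
    flip = decode(measure_syndrome(bits))
    repaired = list(bits)
    if flip is not None:
        repaired[flip] ^= 1
    return repaired


if __name__ == "__main__":
    print(correct([0, 1, 0]))  # single flip on the middle bit
```

A production decoder solves the same kind of problem on a far larger code, once per syndrome round, which is where the latency constraint comes from.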

The engineering constraint is straightforward and unforgiving. The decoder must keep up with the syndrome data rate. If it falls behind, unprocessed syndromes accumulate, the correction lags reality, and the logical qubit decoheres. For a distance-7 surface code on superconducting hardware with a 1 µs cycle time, the decoder must respond within roughly 10 µs. NVQLink’s 3.96 µs round-trip latency fits within this budget and leaves roughly 6 µs for the decoding computation itself, making GPU-hosted decoders practical for the first time on commercially available hardware.
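The backlog effect can be illustrated with a deliberately simple throughput model. The assumptions are mine, not the playbook's: one syndrome round arrives per cycle, and a single serial decoder spends a fixed time on each round. Real decoders batch, pipeline, and parallelize, so the function below (a hypothetical `backlog_after`) only shows the direction of the effect.

```python
def backlog_after(cycles: int, cycle_time_us: float, decode_time_us: float) -> int:
    """Syndrome rounds still undecoded after `cycles` rounds have been produced.

    Assumes one round per cycle and a serial decoder that needs
    `decode_time_us` per round (a simplification for illustration).
    """
    elapsed_us = cycles * cycle_time_us
    # The decoder cannot finish more rounds than have been produced.
    decoded = min(cycles, int(elapsed_us // decode_time_us))
    return cycles - decoded


if __name__ == "__main__":
    # A decoder 25% slower than the 1 us cycle falls steadily behind.
    for n in (1_000, 10_000, 100_000):
        print(n, backlog_after(n, cycle_time_us=1.0, decode_time_us=1.25))
```

In this model a decoder that is only 25% too slow is 200 rounds behind after 1,000 cycles, and the gap keeps growing linearly. That is why sustained throughput, not just a single fast response, is the requirement.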

Riverlane Deltaflow 2. FPGA-based decoder with an ASIC roadmap. Deployed at OQC CentreSquare (July 2025) and Oak Ridge National Laboratory (September 2025). Integrated with Qblox control electronics over QECi, Riverlane’s open QEC interface (March 2026). One of the clearest commercially available, vendor-supported, real-time decoder paths for integrators who do not want to build their own. Riverlane’s target: MegaQuOp (10⁶ error-corrected operations) as a milestone for production-grade fault tolerance.

Google AlphaQubit. ML-based decoder (a recurrent, transformer-based neural network) that achieved the highest accuracy on Google Willow data, published in Nature. Not yet real-time. Requires significant GPU resources for inference. Research-grade, not commercially deployed.

NVIDIA NVQLink decoder hosting. GPU-hosted decoders running on Grace Hopper or GB200 class hardware, connected via NVQLink to control electronics. Quantinuum’s NVQLink demonstration achieved a 67 µs reaction time for a qLDPC decoder, roughly 30x faster than Helios’ 2 ms requirement. This validates GPU-hosted decoding even for trapped-ion systems with slower gate times.

IBM FPGA/ASIC decoder. Custom FPGA implementation targeting sub-microsecond latency. Internal to IBM, not commercially available.

PyMatching. Open-source minimum-weight perfect matching (MWPM) decoder (Oscar Higgott). Production-ready for offline batch decoding. Real-time use requires integration with NVQLink or custom FPGA hosting.

For a 2026 deployment that needs real-time QEC capability, the recommendation is Riverlane Deltaflow 2 integrated with Qblox or Quantum Machines control electronics over QECi, Riverlane’s open, QEC-specific control-to-decoder interface. This is the path of least resistance for an integrator: Deltaflow runs on its own dedicated FPGA hardware, so it does not require a GPU node. The GPU-hosted alternative (PyMatching adapted for real-time use, or an ML decoder such as Google AlphaQubit once it is real-time capable) is what needs a dedicated GPU node, NVIDIA GH200 or GB200 class, connected via NVQLink to the control rack and running the decoder continuously. Either way, a real-time decoder path is not optional for QEC-ready systems; without it, error correction is offline-only.

Reference deployments

Five deployments demonstrate production-grade HPC-quantum integration as of May 2026. Each solved a different slice of the integration problem, and the lessons from them are more instructive than the specifications.

IQM/LRZ (Munich): the most documented public deployment in the field. The 20-qubit Q-Exa system was integrated into the SuperMUC-NG HPC environment via the Munich Quantum Software Stack in 2024. Its operation is documented in arXiv:2509.12949, which draws on 250 days of telemetry and covers the physical installation (four weeks on site plus three weeks of remote commissioning), recalibration cadences (~40 minutes for a quick cycle, ~100 minutes for a full recalibration), and drift characterization over months of operation. The follow-on Euro-Q-Exa system, built on IQM’s 54-qubit Radiance platform with Zurich Instruments control, was inaugurated in February 2026, with a 150-qubit upgrade planned by the end of 2026. Four IQM quantum computers are now installed at LRZ. The LRZ experience is the closest the public record has to a playbook for HPC-quantum integration, and anyone planning a comparable deployment should read the paper before writing a project plan.

Pasqal/CINECA (Bologna): 140-qubit neutral-atom QPU integrated with Leonardo pre-exascale supercomputer via QRMI/Slurm. First European supercomputer supporting hybrid HPC-QPU workloads in a standard Slurm environment. This deployment is the reference case for QRMI: the QPU appears in the Slurm queue alongside GPU partitions, and researchers submit jobs through the same interface they use for classical workloads.

Jülich Supercomputing Centre (Germany): First European DGX Quantum deployment, with Quantum Machines and Arque Systems. This is the tightest GPU-QPU coupling in Europe, with the DGX node and the control electronics co-located for minimum NVQLink latency.

Israel IQCC (Israel Quantum Computing Center): QuantWare Contralto QPU + ORCA photonic system + NVIDIA DGX classical compute + Quantum Machines OPX+ controller. The only multi-modality hybrid deployment in the reference set, combining superconducting and photonic QPUs in a single facility with shared classical infrastructure.

Q-PAC (Denver): QuantWare QPU + Qblox control + Maybell cryostat + Q-CTRL calibration. The fastest Western QOA deployment on record (five months from concept to cloud-accessible operation). NVQLink-based GPU cluster integration on the roadmap for 2026, with Q-CTRL reporting 50x classical overhead reduction and 5x wall-clock speedup in early NVQLink demonstrations.

Network design and API patterns

For the HPC/DevOps engineer responsible for the classical infrastructure connecting the quantum computer to the HPC environment:

Network topology. Dedicated 10/25/100 GbE VLAN from the cryostat control rack to the NVQLink GPU node. Private fiber to the HPC network. For NVQLink real-time operation, the GPU node must be within the Ethernet distance limit of the control electronics (practically, in the same room or adjacent rooms). For cloud-attached operation: 1+ Gbps egress with TLS-terminated REST/gRPC APIs. For classified workloads: dedicated VPN or air-gap.
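One way to see why co-location matters for the real-time path is to estimate the propagation delay that cable length alone adds. The sketch below assumes light travels through optical fiber at about 0.67 times its vacuum speed, a typical textbook figure rather than a measured value for any particular cable, and it ignores switch, NIC, and serialization delays. The function name is hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458


def fiber_round_trip_us(distance_m: float, velocity_factor: float = 0.67) -> float:
    """Round-trip propagation delay in microseconds over a fiber run.

    `velocity_factor` is the assumed fraction of the vacuum speed of light
    at which the signal travels in the fiber.
    """
    one_way_s = distance_m / (SPEED_OF_LIGHT_M_PER_S * velocity_factor)
    return 2 * one_way_s * 1e6


if __name__ == "__main__":
    for metres in (5, 30, 100, 500):
        print(f"{metres:>4} m  ->  {fiber_round_trip_us(metres):.3f} us round trip")
```

Under these assumptions a 100 m run adds roughly 1 µs of round-trip delay before any hardware latency is counted, which is a large share of a budget measured in single-digit microseconds.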

API design. OpenQASM 3 circuits plus provider-specific extensions. REST/Python SDK following the patterns established by Qiskit Runtime, AWS Braket, and Azure Quantum. Required capabilities: synchronous and asynchronous circuit submission, pulse-level access (gated by user role), calibration data retrieval, job and queue introspection, per-tenant quotas and quality metrics.
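As an illustration of the submission side of such an API, the sketch below builds a JSON job request that carries an OpenQASM 3 program. The `JobRequest` class and its field names (`tenant`, `program`, `shots`, `mode`) are hypothetical and do not follow any specific provider's schema; only the OpenQASM 3 circuit text uses standard syntax.

```python
import json
from dataclasses import asdict, dataclass

# A two-qubit Bell circuit in OpenQASM 3.
BELL_QASM3 = """OPENQASM 3.0;
include "stdgates.inc";
qubit[2] q;
bit[2] c;
h q[0];
cx q[0], q[1];
c = measure q;
"""


@dataclass
class JobRequest:
    """Hypothetical request body for a circuit-submission endpoint."""
    tenant: str
    program: str
    shots: int = 1000
    mode: str = "async"  # "sync" waits for results, "async" returns a job id

    def to_json(self) -> str:
        """Validate the request and serialize it for the HTTP body."""
        if self.mode not in ("sync", "async"):
            raise ValueError(f"unknown mode: {self.mode!r}")
        if self.shots <= 0:
            raise ValueError("shots must be positive")
        return json.dumps(asdict(self))


if __name__ == "__main__":
    print(JobRequest(tenant="lab-a", program=BELL_QASM3).to_json())
```

Validating the request before it leaves the client keeps malformed jobs out of the queue, and carrying the tenant in every request is one simple way to support the per-tenant quotas listed above.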

Authentication. OIDC or SAML integration with the organization’s identity provider. Per-tenant isolation of jobs and data. Role-based access control separating administrators (full system access), quantum developers (circuit submission, calibration data read), and end users (circuit submission only).
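The three roles and the tenant isolation rule can be expressed as a small permission check. This is a sketch under stated assumptions: the action names and the `authorize` function are hypothetical, and letting administrators act across tenants is my reading of "full system access", not a stated requirement.

```python
from enum import Enum


class Role(Enum):
    ADMINISTRATOR = "administrator"
    QUANTUM_DEVELOPER = "quantum_developer"
    END_USER = "end_user"


# Hypothetical action names standing in for real API operations.
ALL_ACTIONS = frozenset({"submit_circuit", "read_calibration", "manage_system"})

PERMISSIONS = {
    Role.ADMINISTRATOR: ALL_ACTIONS,
    Role.QUANTUM_DEVELOPER: frozenset({"submit_circuit", "read_calibration"}),
    Role.END_USER: frozenset({"submit_circuit"}),
}


def authorize(role: Role, action: str, user_tenant: str, resource_tenant: str) -> bool:
    """Return True if the role may perform the action on the tenant's resource."""
    if action not in PERMISSIONS[role]:
        return False
    # Assumption: only administrators may touch another tenant's resources.
    return role is Role.ADMINISTRATOR or user_tenant == resource_tenant


if __name__ == "__main__":
    print(authorize(Role.END_USER, "read_calibration", "lab-a", "lab-a"))  # False
    print(authorize(Role.QUANTUM_DEVELOPER, "submit_circuit", "lab-a", "lab-a"))  # True
```

In a real deployment the role and tenant would come from claims in the OIDC or SAML assertion rather than from function arguments.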

Post-quantum cryptography on the API surface. The classical API surface of a quantum computing service (the REST endpoints, the authentication flows, the data in transit) should be migrated to post-quantum cryptography during the initial deployment. Use ML-KEM (FIPS 203) for key exchange and ML-DSA (FIPS 204) for authentication. This is the one point where a quantum compute build intersects the broader quantum security picture, and it is straightforward to handle at deployment time rather than retrofitting it once external users are connected. The PQC Migration Framework at pqcframework.com provides the methodology.

Team and skills

HPC-quantum integration requires a different skill set from QPU hardware operation, and the distinction matters for hiring. The engineers who install a dilution refrigerator and calibrate a QPU are not the engineers who configure a VLAN and deploy a Slurm plugin. Both teams are essential. They speak different languages and come from different career paths.

One to two HPC/DevOps engineers for NVQLink configuration, QRMI/Slurm plugin deployment, network VLAN management, GPU node setup, and monitoring integration (Grafana, Prometheus). These engineers come from classical HPC backgrounds and do not need quantum physics expertise. They need Slurm administration experience, RDMA/Ethernet networking skills, and familiarity with GPU cluster management.

One software engineer for API development: REST/gRPC endpoint design, OIDC/SAML authentication integration, per-tenant isolation, and the Qiskit/CUDA-Q/Pulser portability layer. This role bridges the gap between the quantum hardware team (who speak in pulses and qubits) and the end users (who submit circuits through a web API).

One quantum software engineer for framework integration: translating algorithms from framework-level code (Qiskit, CUDA-Q, PennyLane) to the pulse-level instructions that the control electronics execute, and optimizing circuit compilation for the specific QPU topology.

For organizations running real-time QEC, one engineer with decoder integration experience: configuring Riverlane Deltaflow or GPU-hosted PyMatching, tuning decoder parameters, and monitoring decoder latency against the syndrome data rate.

These roles are distinct from the cryogenic engineers, laser specialists, and quantum device physicists described in the modality-specific build guides. In a production deployment, the HPC integration team and the hardware operations team work in parallel, with the control electronics as the interface between them.

What this means for procurement

Specify NVQLink compatibility in the control electronics procurement. This is non-negotiable for any system designed for 2027+ operation. All three major Western control vendors (Qblox, Quantum Machines, Zurich Instruments) support NVQLink.

Budget for a dedicated GPU node (NVIDIA GH200 or GB200 class) as part of the quantum computer procurement. This is not a “nice to have” for future QEC. It is required infrastructure for real-time error correction, and it accelerates calibration and hybrid algorithms today. Q-CTRL’s 50x classical overhead reduction via NVQLink at Q-PAC demonstrates the immediate value.

Plan QRMI deployment as part of the HPC integration, not as an afterthought. If your HPC center runs Slurm (most do), QRMI makes the QPU schedulable through existing workflows. The alternative is building a custom scheduler integration, which is more expensive and less portable.

Address the post-quantum cryptography migration of your quantum service’s API surface during the initial deployment, not after external users are connected. Practical Steps to Quantum Readiness covers the methodology.

For the physical infrastructure that the HPC stack connects to, see the facility preparation guide and the cryogenic infrastructure article. For the full cost picture including GPU nodes and HPC integration costs, see the cost and procurement article.

For HPC integration set within the full systems-integration program, see my book, Quantum Systems Integration.