“Tour de Gross” Bets That High‑Rate QLDPC Modules Can Beat the Surface Code on Qubit Economics – If Inter‑Module Measurements and Links Can Catch Up.
15 Jun 2025 – The June 2025 preprint “Tour de gross” proposes the bicycle architecture: a modular, long‑range‑connected fault‑tolerant quantum computing stack built around bivariate bicycle (BB) quantum LDPC codes and an explicit set of fault‑tolerant logical “bicycle instructions” (logical measurements, automorphisms, and T‑state injection).
The paper’s central claim is architectural: for a fixed number of physical qubits and a fixed physical error rate, a bicycle architecture can run circuits with roughly an order of magnitude more logical qubits than conventional surface‑code architectures, at comparable logical T counts – but often with longer runtime per logical rotation because compilation to the instruction set is measurement-heavy.
Two concrete BB codes anchor the proposal: the gross code [[144,12,12]] and the two‑gross code [[288,12,18]]. Each encodes 12 logical qubits per code block, with the architecture reserving one “pivot” logical qubit for measurement-based synthesis (leaving 11 “data” logical qubits per module in their compiler).
Under a uniform circuit-level depolarizing noise model and a modern belief‑propagation–style decoder (Relay‑BP), the authors estimate per‑instruction logical error probabilities. A headline asymmetry emerges: inter‑module logical measurements are currently the noisiest primitive in their benchmarked set (e.g., for gross at physical error rate ($$p=10^{-3}$$), inter‑module measurement fails with probability ($$\sim 2\times10^{-3}$$) per use).
The most “application-shaped” resource claim is a toy‑model benchmark: a 10×10 transverse‑field Ising model (TFIM) time‑evolution circuit (≈1.8×10⁵ rotations in their construction) is estimated to be reachable with ≈8.1k physical qubits using two‑gross modules, provided ($$p\approx 7.3\times10^{-4}$$) – versus tens of thousands of qubits in a surface-code baseline at similar noise.
Background
Fault tolerance trades many physical qubits (noisy hardware qubits) for fewer, more reliable logical qubits (encoded information protected by a quantum error‑correcting code). This framing is standard in the threshold literature: below a code‑ and decoder‑dependent error threshold, increasing code size (or effective distance) can suppress logical errors exponentially in distance (up to architecture and noise‑model constraints).
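As a rough quantitative anchor for that statement, the threshold literature commonly summarizes the suppression with a heuristic scaling law (a standard approximation, not a formula from this preprint):

```latex
P_L(d) \approx A \left(\frac{p}{p_{\mathrm{th}}}\right)^{\lfloor (d+1)/2 \rfloor},
\qquad p < p_{\mathrm{th}}
```

Every two additional units of code distance buy roughly one more factor of ($$p/p_{\mathrm{th}}$$) of logical-error suppression, which is why operating well below threshold matters so much.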
What “qLDPC” and “bivariate bicycle” mean in plain terms
Quantum LDPC codes are stabilizer codes whose parity checks have low weight (each check touches only a small number of qubits) and where each qubit participates in only a small number of checks. This sparsity is attractive because it suggests constant-rate encoding (more logical qubits per physical qubit) and potentially much lower qubit overhead than 2D topological codes – at the cost of harder decoding and more challenging connectivity and circuit design.
Bivariate bicycle codes are a family of quantum LDPC codes that can be viewed as toric-code-like layouts on a torus, augmented with structured long‑range connections parameterized by two variables (hence “bivariate”). In “Tour de gross,” the gross code lives on a 12×6 torus with two data qubits per site (144 data qubits) and two‑gross on a 12×12 torus (288 data qubits); both require an equal number of additional check qubits for syndrome extraction, bringing the module’s “memory circuit” qubit count to 288 and 576, respectively.
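To make “structured long‑range connections parameterized by two variables” concrete, here is a minimal Python sketch (mine, not the paper’s code) of the BB construction, using the gross-code polynomials published in arXiv:2308.07915 ($$A = x^3 + y + y^2$$, $$B = y^3 + x + x^2$$); it reproduces the [[144,12,12]] parameters:

```python
import numpy as np

def shift(n):
    """n x n cyclic-shift permutation matrix."""
    return np.roll(np.eye(n, dtype=int), 1, axis=1)

def gf2_rank(mat):
    """Rank of a binary matrix over GF(2), by Gaussian elimination."""
    g, rank = mat.copy() % 2, 0
    for c in range(g.shape[1]):
        rows = np.nonzero(g[rank:, c])[0]
        if rows.size == 0:
            continue
        piv = rank + rows[0]
        g[[rank, piv]] = g[[piv, rank]]
        for r in np.nonzero(g[:, c])[0]:
            if r != rank:
                g[r] = (g[r] + g[rank]) % 2
        rank += 1
        if rank == g.shape[0]:
            break
    return rank

# Gross code [[144,12,12]]: a 12x6 torus with two data qubits per site.
ell, em = 12, 6
x = np.kron(shift(ell), np.eye(em, dtype=int))  # "x" shift on the torus
y = np.kron(np.eye(ell, dtype=int), shift(em))  # "y" shift on the torus
mp = np.linalg.matrix_power
A = (mp(x, 3) + y + mp(y, 2)) % 2               # A = x^3 + y + y^2
B = (mp(y, 3) + x + mp(x, 2)) % 2               # B = y^3 + x + x^2

HX = np.hstack([A, B])                          # X-type checks (weight 6)
HZ = np.hstack([B.T, A.T])                      # Z-type checks (weight 6)
n = HX.shape[1]
assert not ((HX @ HZ.T) % 2).any()              # CSS commutation condition
print(n, n - gf2_rank(HX) - gf2_rank(HZ))       # -> 144 12
```

The two commuting shift operators x and y are exactly the “two variables”; the exponents appearing in A and B are what create the structured long-range check connections.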
Noise models and why they matter here
The paper primarily benchmarks under a stochastic circuit-level depolarizing noise model: every 1‑ and 2‑qubit operation, measurement, preparation, and idle has probability (p) of suffering a random Pauli error (with standard per-gate distributions like (p/15) over two‑qubit Paulis for CNOT). This is a widely used first-order fault‑tolerance stress test, but it is often optimistic relative to calibrated device noise (biased noise, leakage, correlated errors, and link‑dependent infidelities).
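For readers who want to see that convention in executable form, here is a minimal snippet using the open-source simulator stim (my illustration; the paper does not specify its simulation toolchain):

```python
import stim

p = 1e-3  # uniform physical error rate, as in the paper's baseline model
c = stim.Circuit()
c.append("H", [0])
c.append("DEPOLARIZE1", [0], p)     # each of 3 single-qubit Paulis w.p. p/3
c.append("CX", [0, 1])
c.append("DEPOLARIZE2", [0, 1], p)  # each of 15 two-qubit Paulis w.p. p/15
c.append("M", [0, 1], p)            # recorded outcome flipped w.p. p
print(c.compile_sampler().sample(shots=5))
```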
CRQC and bosonic codes: definitions and relevance
The preprint is not a continuous-variable hardware paper, but it does embrace a continuous‑rotation computing style at the software level: it compiles algorithms into Pauli-generated rotations ($$P(\phi)=\exp(i\phi P/2)$$) and Pauli measurements (a Pauli‑based computation flavor), then realizes those rotations via T-state injection plus synthesis/approximation.
Bosonic codes instead encode logical information in oscillator modes (photonic or microwave), with canonical examples like the Gottesman–Kitaev–Preskill (GKP) code, which embeds a qubit into a continuous-variable system to protect against small displacement errors in phase space. Bosonic codes are relevant to “CRQC” insofar as they can provide a physical layer with different error statistics (e.g., bias, analog error correction) and sometimes reduce overhead or improve effective gate fidelities – but “Tour de gross” assumes qubit-based BB code blocks and does not rely on bosonic encoding.
What the paper did and what it found
The architectural move: define a complete “computer,” not just a code
The authors propose six criteria for a scalable fault‑tolerant architecture – fault-tolerant, addressable, universal, adaptive, modular, and efficient – and argue the bicycle architecture can satisfy them using BB codes plus long‑range couplers (both within and between modules).
At a high level, the hardware is partitioned into modules:
- Code modules that store 12 logical qubits each (gross or two‑gross),
- Logical processing units (LPUs) attached to each code module to enable selective logical measurements via qLDPC “surgery,”
- Inter‑module adapters that turn Bell‑pair links into joint checks for inter‑module logical measurements,
- T factory modules (they use surface‑code-based factories as concrete, benchmarked stand‑ins).
This concretizes a key direction from earlier “low-overhead with long-range connectivity” proposals: treat logical Pauli measurements – not transversal logical gates – as the backbone of computation.
The instruction set: make logical operations explicit and simulatable
The bicycle architecture defines a universal instruction set (I) that includes: (i) idles (syndrome cycles), (ii) shift automorphisms (code automorphisms implemented as qubit permutations inducing logical CNOT networks), (iii) in‑module Pauli measurements via LPU‑assisted qLDPC surgery, (iv) inter‑module Pauli measurements using Bell‑pair couplers between modules, and (v) T injection via a connected magic‑state factory.
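To keep those five instruction classes straight in later sections, here is one hypothetical rendering (mine, not the paper’s) of the instruction set as plain data types:

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass
class Idle:                         # (i) run syndrome-extraction cycles
    cycles: int

@dataclass
class ShiftAutomorphism:            # (ii) qubit permutation inducing a logical CNOT network
    shift: Tuple[int, int]          # hypothetical (x, y) torus-shift parameters

@dataclass
class InModulePauliMeasurement:     # (iii) LPU-assisted qLDPC surgery
    module: int
    pauli: str                      # e.g. "X1"

@dataclass
class InterModulePauliMeasurement:  # (iv) joint measurement over a Bell-pair adapter
    modules: Tuple[int, int]
    pauli: str

@dataclass
class TInjection:                   # (v) consume a magic state from a connected factory
    module: int

BicycleInstruction = Union[Idle, ShiftAutomorphism, InModulePauliMeasurement,
                           InterModulePauliMeasurement, TInjection]
```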
Two implementation details matter for later feasibility discussions:
- They assume a Bell‑coupler that can create a high‑fidelity Bell state between modules in less than a syndrome-cycle time, and they treat that link as having the same error rate (p) as local operations in the baseline model.
- They rely on fast classical processing (syndrome decoding plus feedforward) and highlight the need for real‑time decoding hardware (FPGA/ASIC).
Benchmark methodology: circuit simulations + rare-event extrapolation
For each instruction, the authors specify a full circuit under stochastic circuit noise and decode using Relay‑BP, a belief‑propagation decoder designed for circuit-level decoding with low computational footprint.
To reach extremely low logical error regimes that brute‑force Monte Carlo can’t sample directly, they use a failure-spectrum ansatz technique for rare events (fitting the fraction of failing fault configurations by fault weight, then integrating over the binomial fault distribution).
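A stripped-down sketch of that recipe (my reconstruction of the technique as described; the spectrum numbers are illustrative, not the paper’s fits):

```python
from scipy.stats import binom

def logical_failure_prob(failing_fraction, num_fault_locations, p):
    """Integrate a fitted failure spectrum over the binomial fault-weight
    distribution: P_fail = sum_w Binom(w; N, p) * f(w), truncated to the
    weights where f has been sampled or fit."""
    return sum(binom.pmf(w, num_fault_locations, p) * f_w
               for w, f_w in failing_fraction.items())

# Hypothetical fitted spectrum f(w): the fraction of weight-w fault
# configurations that cause logical failure (zero below the circuit distance).
f_fit = {6: 2e-7, 7: 3e-6, 8: 4e-5, 9: 3e-4, 10: 2e-3}
print(logical_failure_prob(f_fit, num_fault_locations=30_000, p=1e-4))
```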
Compilation: from algorithms to bicycle instructions
A substantial fraction of the paper is “systems work”: it proposes a compilation flow that converts a circuit into Pauli-generated rotations and Pauli measurements (Pauli-based computation), distributes that workload across 12‑logical‑qubit modules (using a “pivot” ancilla qubit per module), synthesizes general Pauli measurements by conjugating a set of native measurements with native Clifford operations, and implements small-angle rotations via T-injection + synthesis (citing the Ross–Selinger approximation method for ($$Z(\phi)$$)).
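For scale, the cited Ross–Selinger method approximates a ($$Z(\phi)$$) rotation to precision ($$\epsilon$$) with a leading-order T-count of about ($$3\log_2(1/\epsilon)$$); a one-line estimate (low-order terms and constants vary by implementation):

```python
import math

def t_count_per_rotation(eps):
    """Leading-order Ross-Selinger T-count ~ 3*log2(1/eps) for approximating
    Z(phi) to precision eps; low-order terms omitted."""
    return math.ceil(3 * math.log2(1 / eps))

print(t_count_per_rotation(1e-10))  # ~100 T states per rotation at eps = 1e-10
```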
Two key resource-estimation equations appear in their end‑to‑end model:
- Physical qubits vs modules: ($$q = M(c+u+a) - a + a' + f$$), where ($$M$$) is the number of code modules, ($$c$$) the code‑memory qubits, ($$u$$) the LPU qubits, ($$a$$) the code‑code adapter qubits, ($$a'$$) the code‑factory adapter qubits, and ($$f$$) the factory qubits; the ($$-a$$) term reflects that ($$M$$) modules need only ($$M-1$$) code‑code adapters.
- Circuit failure probability (leading-order additive model): ($$P \approx \sum_j N_j P_j$$), summing instruction counts ($$N_j$$) times instruction logical error rates ($$P_j$$).
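Both formulas are simple enough to sanity-check directly; a minimal sketch (mine), with defaults set to the gross-module component counts quoted in the next subsection, and the factory terms left as parameters since they depend on the chosen factory:

```python
def physical_qubits(M, c=288, u=90, a=22, a_prime=0, f=0):
    """q = M(c+u+a) - a + a' + f; the -a term reflects that M modules
    share only M-1 code-code adapters. Defaults: gross-module counts."""
    return M * (c + u + a) - a + a_prime + f

def circuit_failure(counts_and_rates):
    """Leading-order additive model: P ~= sum_j N_j * P_j."""
    return sum(N_j * P_j for N_j, P_j in counts_and_rates)

print(physical_qubits(M=6))                           # -> 2378, before factories
print(circuit_failure([(1e5, 1e-11), (1e4, 1e-9)]))   # -> 1.1e-05 (illustrative)
```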
Key results and comparison to prior work
Concrete module sizes and instruction-level error rates
For the gross module, the paper’s component counts imply:
- Code memory circuit: (c=288) physical qubits,
- LPU: (u=90),
- Code‑code adapter: (a=22).
So a “compute‑capable” gross code module is roughly (c+u+a=400) physical qubits for 12 logical qubits, i.e. about 33 physical qubits per logical (or about 36 per data logical if 1/12 is reserved as pivot).
In the paper’s Table 2 (central estimates) the striking pattern is that two‑gross appears extraordinarily strong as a memory under the authors’ model (e.g., at ($$p=10^{-3}$$), idle ≈ ($$10^{-20}$$) per cycle), while the compute path is limited by measurement gadgets – especially inter‑module measurements, which dominate the failure budget in their end‑to‑end estimates.
End-to-end capability: “10× more logical qubits” (with caveats)
The paper’s capability study for random Pauli-rotation circuits reports that, for similar physical budgets and error rates, the bicycle architecture can support much larger logical workspaces than a surface-code baseline. For example, at ($$p=10^{-3}$$) and ($$q\approx 50{,}000$$) physical qubits, they estimate:
- Two‑gross bicycle: ($$n\approx 704$$) logical data qubits, ($$N_T\approx 3.5\times 10^6$$),
- Surface-code baseline (their comparable case): ($$n\approx 42$$), ($$N_T\approx 2.5\times 10^6$$).
This is the core “10×” narrative: more logical qubits at roughly comparable T volume, driven by code rate and lower qubit overhead per logical qubit.
But the same table reports a cost: the bicycle architecture’s timesteps per logical rotation are usually higher (e.g., ~7057 vs ~2269 in that example, assuming one factory), consistent with their observation that compilation into bicycle instructions is measurement- and scheduling-heavy.
Comparison table: how “Tour de gross” sits in the historical lineage
| Work (≤ June 2025) | Primary contribution | What it assumes | Quantitative hooks (from the source) | Relationship to “Tour de gross” |
|---|---|---|---|---|
| Tour de gross (arXiv:2506.03094, Jun 2025) | End-to-end modular architecture + explicit instruction set + compiler + resource estimates for gross/two‑gross BB codes | Long-range couplers; Bell‑pair links; circuit-level depolarizing noise; fast decoding/control | Module component counts (Tables 1–3); instruction logical error estimates (Table 2); TFIM at ≈8.1k qubits and ($$p\approx7.3\times10^{-4}$$) (Section 4) | System integration step: turns BB-code memory + surgery + factories into a “computer design” |
| Low-overhead FTQC using long-range connectivity (Sci. Adv., 2022) | Logical computation via logical Pauli measurements on qLDPC memory, enabled by long-range interactions | Long-range connectivity; LDPC decoding feasibility | Order-of-magnitude overhead improvements estimated for ~100 logical qubits (in their architecture-level estimates) | Conceptual ancestor: measurement-first computation model that bicycle instantiates concretely |
| High-threshold, low-overhead FT quantum memory (Nature 2024 / arXiv:2308.07915) | End-to-end memory protocol for LDPC family incl. BB codes with high threshold | Specific syndrome circuits; circuit noise; decoding | Reports a circuit-noise threshold ≈0.8% and a concrete example: 12 logical qubits with 288 physical qubits (memory circuit) | Provides the “memory engine” and credibility for BB-code threshold/overhead claims |
| Improved QLDPC surgery & bridging codes (arXiv:2407.18393, Jul 2024) | Lower-overhead logical measurement (“surgery”) + bridge constructions for joint measurements | Tanner-graph expansion; modular decoding | Shows Clifford capability on the [[144,12,12]] BB code with explicit ancilla overhead figures (e.g., ~O(w) ancillas in some regimes) | Bicycle builds directly on these ideas and optimizes LPUs/adapters for two specific BB codes |
| Extractors architecture (arXiv:2503.10390, Mar 2025) | General “extractor system” primitive to turn any qLDPC memory into a Pauli-measurement compute block | One logical cycle = ($$O(d)$$) syndrome cycles; scalable bridge systems | Overhead scaling claims (extractor size ($$\tilde O(n)$$)) and parallel measurement emphasis | Bicycle is an explicit, benchmarked instantiation for BB codes; extractors is broader and more abstract |
| Magic state cultivation (arXiv:2409.17595, Sep 2024) | High-efficiency T-state preparation within a surface-code patch | Uniform depolarizing circuit noise; postselection | Claims ≈order-of-magnitude fewer qubit-rounds to reach ($$\sim 2\times10^{-9}$$) at ($$p=10^{-3}$$) | Bicycle uses cultivation as its preferred ($$p=10^{-3}$$) factory baseline (Table 3) |
Critical analysis
The preprint is unusually concrete for an architecture paper – but several assumptions and open gaps are load-bearing.
The noise model is standard – but likely optimistic for the controversial parts
The baseline model treats all operations as equally faulty (same (p)), including: measurement, idles, and critically the inter-module Bell‑coupler operations. The authors themselves flag that measurements and couplers might be worse by up to an order of magnitude, and note that relaxing this assumption could materially change the logical error rates.
A realistic validation campaign would therefore need heterogeneous noise sweeps: separate ($$p_{\text{1q}}, p_{\text{2q}}, p_{\text{meas}}, p_{\text{link}}$$), plus correlated link noise, leakage, and latency. The paper frames this as future work rather than providing sensitivity curves.
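Even a crude version of such a sweep is easy to set up; the sketch below (my construction, not the paper’s) reuses the standard suppression heuristic from the Background section with guessed effective distances, letting the link error float independently of local operations:

```python
def logical_rate(p_phys, d_eff, p_th=8e-3, A=0.1):
    """Heuristic per-instruction logical error A*(p/p_th)^ceil(d_eff/2);
    p_th ~ 0.8% matches the BB-code circuit-noise threshold cited in the
    comparison table above. Only meaningful below threshold (p_phys < p_th)."""
    return A * (p_phys / p_th) ** ((d_eff + 1) // 2)

p_local = 1e-3
for link_penalty in (1, 2, 5, 10):  # how much worse the Bell link might be
    p_idle = logical_rate(p_local, d_eff=12)                 # in-module idle
    p_link = logical_rate(link_penalty * p_local, d_eff=6)   # weaker link gadget
    print(f"link {link_penalty:>2}x worse: idle {p_idle:.1e}, link {p_link:.1e}")
```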
Some of the most consequential two-gross compute numbers are explicitly assumed
For two‑gross at ($$p=10^{-3}$$) and ($$p=10^{-4}$$), Table 2 gives bracketed (assumed rather than directly simulated) values for the in‑module and inter‑module measurement logical error rates (e.g., ($$[10^{-11}]$$), ($$[10^{-9}]$$), ($$[10^{-20}]$$), ($$[10^{-18}]$$)). That is not a small detail: those measurements are the computational bottleneck, and the end‑to‑end capability estimates depend on them.
A key reproducibility question is therefore: do the full two‑gross measurement circuits under circuit noise, with a concrete decoder, actually achieve those bracketed regimes? The paper gives a methodology for rare-event extrapolation, but the datasets behind those bracketed entries are not provided.
Circuit distance shortfall hints that schedules/gadgets may leave distance on the table
For the gross code, they report circuit-distance upper bounds ($$d_{\text{circ}}\le 10$$) even though the code distance is 12, and they note that achieving full distance may require improved syndrome schedules.
This is more than pedantry: if the architecture’s selling point is “high distance at high rate,” then distance-losing measurement schedules are a silent overhead tax (you may need larger codes earlier). This is exactly the sort of systems‑level trap that can inflate logical overhead even when the underlying code family is excellent.
Inter-module measurements look like the first engineering “wall”
At ($$p=10^{-3}$$) for gross, their simulated inter‑module logical measurement is ≈($$10^{-2.7}$$) per use – orders of magnitude worse than memory idles and worse than in‑module measurements.
They suggest multiple plausible causes: adapter structure creating many low‑weight logicals, decoder optimization issues, and belief‑propagation difficulties with weight‑2 checks in adapters. The architecture is modular precisely to exploit long‑range links – but this result suggests that link‑enabled logic is the weakest link in the current design.
Benchmarks that would most directly validate (or falsify) the architecture’s practicality:
- Compiler‑in‑the‑loop experiments: run small Pauli‑rotation workloads that stress measurement synthesis overhead (their Figure 9 histogram is a red flag for time cost).
- Syndrome‑cycle experiments: demonstrate repeated BB-code syndrome extraction with the required connectivity and stable decoding performance (memory first).
- One in‑module logical measurement (e.g., ($$X_1$$) or ($$Y_1$$)) with an LPU, including repeated cycles ($$C$$) and decoding/majority logic under realistic measurement noise.
- One inter‑module joint measurement between two modules using a Bell‑coupler link model consistent with measured microwave‑interconnect performance (remote transfer and remote CNOT are relevant empirical building blocks).
Implications for CRQC timeline
“Tour de gross” doesn’t change the fundamental threshold theorem story: to run crypto-scale algorithms you still need logical error rates far below 10⁻¹⁰ and a sustained pipeline of non‑Clifford resources (T states or equivalents).
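As a back-of-envelope check on that number (my arithmetic, using the additive failure model from the resource-estimation section): a crypto-scale circuit with roughly ($$N \approx 10^{10}$$) logical instructions, run with, say, 90% success probability, needs

```latex
P \approx N \,\bar{P}_{\mathrm{instr}} \lesssim 0.1
\quad\Longrightarrow\quad
\bar{P}_{\mathrm{instr}} \lesssim 10^{-11}
```

which is why per-instruction logical error rates, not just memory idle rates, set the bar.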
What the bicycle architecture can change is the capital efficiency of reaching useful logical workspaces – especially the “hundreds to low‑thousands of logical qubits” regime that tends to bracket early scientific advantage and the pre‑RSA‑breaking era. Its argument is that surface‑code qubit overhead is the dominant bottleneck, and that high‑rate qLDPC modules plus long‑range links can cut that overhead dramatically.
Why this matters specifically for continuous-rotation workloads
Many near-term “scientific” workloads are naturally expressed as sequences of small-angle rotations (Hamiltonian simulation, phase estimation variants, product formulas). The paper’s compilation strategy explicitly targets Pauli-generated rotations with arbitrary ($$\phi$$), then routes the burden to a local factory-adjacent pivot via synthesis and T injection.
In that sense, the bicycle architecture is compatible with “CRQC” understood as:
- continuous-rotation at the algorithm layer (arbitrary ($$\phi$$)),
- Pauli-measurement‑driven execution (compile away internal Cliffords),
- and magic-state‑driven non‑Clifford supply.
Likely milestones before it accelerates anything
Based on the paper’s own bottlenecks, the architecture becomes timeline‑relevant only if three milestones are met:
- Reliable long-range connections with strong, stable error budgets, because the architecture’s modularity depends on Bell‑pair links that are “good enough” to not dominate logical failure. (Remote state transfer and remote CNOT demonstrations are suggestive but not yet an architecture guarantee.)
- Inter‑module logical measurement improvements by at least ~1–2 orders of magnitude in logical failure probability (their own resource discussion calls this out as plausibly unlocking a 10× capability improvement).
- A high-throughput magic-state story that doesn’t explode at lower (p). The paper uses cultivation at ($$p=10^{-3}$$) (compact) but switches to conservative distillation choices at ($$p=10^{-4}$$) (which can become enormous for certain targets). That underscores a general point: modular qLDPC compute doesn’t eliminate the T factory as a pacing resource; it changes how much workspace you can afford around it.
Quantum Upside & Quantum Risk - Handled
My company - Applied Quantum - helps governments, enterprises, and investors prepare for both the upside and the risk of quantum technologies. We deliver concise board and investor briefings; demystify quantum computing, sensing, and communications; craft national and corporate strategies to capture advantage; and turn plans into delivery. We help you mitigate the quantum risk by executing crypto‑inventory, crypto‑agility implementation, PQC migration, and broader defenses against the quantum threat. We run vendor due diligence, proof‑of‑value pilots, standards and policy alignment, workforce training, and procurement support, then oversee implementation across your organization. Contact me if you want help.