Capability 3.3: Continuous Operation (Long-Duration Stability)
This piece is part of an eight‑article series mapping the capabilities needed to reach a cryptanalytically relevant quantum computer (CRQC). For definitions, interdependencies, and the Q‑Day roadmap, begin with the overview: The Path to CRQC – A Capability‑Driven Method for Predicting Q‑Day.
(Updated in Sep 2025)
(Note: This is a living document. I update it as credible results, vendor roadmaps, or standards shift. Figures and timelines may lag new announcements; no warranties are given; always validate key assumptions against primary sources and your own risk posture.)
Introduction
One of the most critical requirements for a cryptographically relevant quantum computer (CRQC) is continuous operation – the ability to run a complex quantum algorithm non-stop for an extended period (on the order of days) without losing quantum coherence or needing a reset. In practical terms, the entire quantum computing stack – qubits, control electronics, error-correction processes, cooling systems – must sustain stable performance for the full duration of a long computation. This is the difference between a laboratory demo and actually factoring a 2048-bit RSA key (the classic CRQC benchmark): if the machine can only run for a few minutes before drifting out of spec or requiring manual recalibration, it will never complete the ~5-day marathon that Shor’s algorithm demands. In Craig Gidney’s recent scenario for breaking RSA-2048, for example, the quantum computer would need to sustain about 5 days of continuous operation with 1 μs cycle times and error rates below 0.1%. Any significant interruption or downtime during that period would derail the computation – unlike classical computers, quantum processors cannot simply “pause and resume” a calculation because the fragile quantum state would decohere and be lost. Continuous operation is thus absolutely blocking for a CRQC; it implies the machine must behave like a marathon runner, not a sprinter, executing billions of gate operations in sequence without a break.
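To make the scale concrete, here is a quick back-of-the-envelope check (a minimal sketch using only the round numbers quoted above, not any vendor's specification) of how many error-correction cycles a 5-day run at a 1 μs cycle time implies:

```python
# Rough scale of a 5-day run at a 1 microsecond QEC cycle time.
# These are the illustrative figures quoted above, not a vendor spec.

cycle_time_s = 1e-6          # assumed QEC cycle time (1 us)
run_days = 5                 # assumed total runtime for the RSA-2048 scenario

run_seconds = run_days * 24 * 3600
total_cycles = run_seconds / cycle_time_s

print(f"Run length: {run_seconds:,.0f} s")
print(f"QEC cycles executed back-to-back: {total_cycles:.2e}")
# -> roughly 4.3e11 cycles, every one of which must stay below threshold
```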
At first glance, this might sound impossible given that individual qubits often lose their state (decohere) within microseconds or milliseconds. However, the theory of quantum error correction (QEC) provides a roadmap: as long as physical error rates can be suppressed below a certain threshold, a quantum computer can in principle compute for arbitrarily long durations by actively correcting errors as they occur. This “threshold theorem,” developed in the late 1990s, proved that robust, arbitrarily long quantum computations are possible in principle if error rates per operation are beneath a fixed value and error-correcting code cycles run continuously throughout the algorithm. In essence, QEC can extend a qubit’s effective coherence indefinitely – if all parts of the system work perfectly. The challenge for continuous operation is turning this principle into practice at scale. It means engineering a quantum computer that actually realizes that indefinite error-corrected stability: not just for a single logical qubit, but for thousands of logical qubits, over billions of cycles. In the sections below, we’ll explore why this is so challenging, what progress researchers have made, and what remains to be done to achieve long-duration stability in quantum processors.
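Quantitatively, the standard textbook approximation for this suppression in a distance-d surface code (a generic heuristic, not a figure for any particular device) is

p_L ≈ A · (p / p_th)^((d+1)/2)

where p_L is the logical error rate per cycle, p is the physical error rate per operation, p_th is the threshold, and A is a constant of order one (the exponent holds for odd code distance d). Every step up in code distance multiplies the suppression, which is why a modest below-threshold margin can, in principle, be converted into arbitrarily long stable runtimes.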
The Challenge of Long-Duration Stability
Building a quantum computer that runs non-stop for days is a monumental engineering challenge. Today’s quantum hardware, whether superconducting circuits or trapped ions or other modalities, is far from meeting this mark. Some of the key obstacles to long-duration quantum operation include:
Limited Qubit Coherence and Gate Errors
Physical qubits have finite coherence times (often microseconds to seconds at best), and gates introduce errors on the order of 0.1-1% per operation in state-of-the-art devices. Without error correction, a quantum state will irreversibly degrade after a short time. Even with QEC, every additional gate or cycle is another opportunity for error, so running trillions of operations in sequence is daunting. The error rates must not only be below the fault-tolerance threshold on average, but remain consistently below threshold throughout the run – any significant spike could cause the encoded logical qubits to fail. This requires extraordinary qubit quality and error rate stability over time. Recent experiments have only just reached the point of demonstrating error rates below threshold in small systems. For example, Google’s 105-qubit “Willow” processor showed that larger QEC codes can actually lengthen qubit lifetimes (a 7×7 qubit logical patch lived over twice as long as the best physical qubit), a landmark proof that errors can be exponentially suppressed. Yet even in that experiment, achieving the ultra-low logical error rates (~10⁻¹² per cycle) needed for a multi-day algorithm will require several more orders of magnitude of improvement. In short, the qubits and gates must be extremely error-free and consistent over time to support continuous operation.
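To give a feel for the size of that gap, here is a hedged extrapolation (a toy calculation; the starting error rate, target, and suppression factors Λ are assumed round numbers, not measured values):

```python
# How far is a ~1e-12 per-cycle logical error rate from today's demonstrations?
# A hedged extrapolation: assume each code-distance step (d -> d+2) suppresses
# the logical error rate by a constant factor Lambda. Lambda ~ 2 is roughly the
# per-step gain seen in recent below-threshold surface-code experiments; larger
# Lambda would require better physical error rates. All numbers are illustrative.

# Going from ~1e-3 per cycle (a small code today) to ~1e-12 per cycle
# means nine orders of magnitude of additional suppression.
required_suppression = 10.0 ** 9
d_today = 7

for lam in (2.0, 4.0, 10.0):
    steps, suppression = 0, 1.0
    while suppression < required_suppression:   # add distance steps until we get there
        suppression *= lam
        steps += 1
    print(f"Lambda = {lam:>4}: ~{steps} more distance steps -> d ~ {d_today + 2 * steps}")

# Takeaway: with Lambda near 2 the required code patches become enormous, so
# lowering physical error rates (raising Lambda) matters as much as adding qubits.
```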
Calibration Drift and Environmental Stability
Real quantum processors are notoriously sensitive to drift. Qubit frequencies shift, laser alignments wander, control electronics develop phase jitter – and these changes can increase error rates if left uncorrected. Today, most quantum computers require frequent calibration interruptions to retune qubit parameters. For instance, on IBM’s superconducting cloud processors, the system is typically recalibrated every 24 hours to adjust for drift. In many labs, calibrations are even more frequent (multiple times per day or between experimental runs). Obviously, one cannot stop a days-long computation to recalibrate without losing the quantum state. Thus, continuous operation demands either ultra-stable hardware that doesn’t drift over days, or the ability to recalibrate “on the fly” without halting the quantum circuit. This is an active area of engineering research. Some progress is being made – for example, automating and speeding up calibrations so they can run in the background. A notable achievement was reported by IQM Quantum Computers, who kept a 20-qubit system running at a customer site for over 100 days of continuous operation without human intervention, thanks to automated calibration software that regularly tuned the qubits and gates. In that case, calibrations were performed automatically in roughly one hour each day, allowing the system to maintain high-fidelity operation with minimal downtime. Such automation is essential to approach multi-day stability. Similarly, improvements in cryogenic and environmental controls (vibration isolation, temperature stability, shielding from electromagnetic interference) aim to reduce drift at the source. The bottom line is that a CRQC will need server-like stability, where the quantum processor can run for days in a steady-state “cruise control” mode, autonomously correcting any drifts in real-time. This is far from the manual, hands-on tuning that current lab experiments often entail.
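The arithmetic of that example is instructive (a trivial calculation based only on the figures quoted above):

```python
# Duty cycle implied by "roughly one hour of automated calibration per day",
# as in the IQM example above. Simple arithmetic, not vendor telemetry.

calibration_hours_per_day = 1.0
availability = 1 - calibration_hours_per_day / 24.0
print(f"Availability with a daily calibration pause: {availability:.1%}")   # ~95.8%

# A CRQC-style run is stricter still: a ~5-day algorithm cannot tolerate even
# that daily pause, so the retuning has to happen concurrently with the
# computation ("calibration on the fly") or the hardware must hold spec for
# the full ~120 hours without it.
```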
Rare Event Disruptions (Cosmic Rays & Beyond)
Even if average error rates are under control, rare but powerful error bursts can spoil a long computation. A prime example is high-energy radiation (cosmic rays or environmental radioactivity) causing correlated qubit errors. When a high-energy particle (a cosmic ray muon or gamma photon) strikes a superconducting chip, it can generate a shower of quasi-particles that simultaneously upset many qubits across the processor. These “error bursts” are relatively infrequent, but over a multi-day period you are almost guaranteed to experience some. Recent studies have directly measured this effect: on a 63-qubit superconducting array, cosmic ray muon impacts were observed roughly once every ~67 seconds on average, with even more frequent gamma-ray bursts, together causing clusters of qubit errors. Such events violate the usual assumption of most QEC codes that errors occur independently and rarely all at once. In fact, today’s QEC schemes struggle with correlated errors that hit multiple qubits simultaneously – a single cosmic ray can knock a whole patch of the code temporarily above the error threshold. Over 5 days (≈432,000 seconds), many such particle hits could occur, effectively limiting the computation time unless mitigated. Addressing this may require heavy shielding, active monitoring, or robust fault-tolerant protocols that can handle bursts. Researchers are exploring solutions like installing quantum processors in deep underground labs (e.g. SNOLAB) to reduce cosmic-ray flux, and adding in-fridge radiation detectors to trigger protective measures. Engineering “rare-event resilience” will be key – the system must either ride through these incidents or quickly recover from them, or else a single stray cosmic ray could undo days of computation.
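A quick estimate using the rates quoted above (illustrative only; actual rates depend on chip area, altitude, and shielding) shows why this matters over a multi-day run:

```python
# Expected number of radiation-induced burst events during a ~5-day run, using
# the "one muon impact per ~67 s" figure cited above for a 63-qubit array.
# Real rates depend on chip area, altitude, and shielding; this is illustrative.

run_seconds = 5 * 24 * 3600          # ~432,000 s
mean_interval_s = 67                 # average time between observed muon impacts

expected_hits = run_seconds / mean_interval_s
print(f"Expected burst events in one run: ~{expected_hits:,.0f}")
# -> thousands of events, each of which the error-correction stack must survive
```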
Fault Tolerance and Redundancy
When running non-stop for days, there is also a higher probability that some component will outright fail during the run. A classical data center deals with this via redundancy (backup hardware, error-correcting memory, checkpointing, etc.). A quantum computer will need analogous strategies. For example, what if one physical qubit in a large logical qubit suffers a catastrophic failure (gets stuck or dies) mid-computation? In current designs that would likely crash the whole algorithm. A future CRQC might need to include spare qubits and flexible routing such that a “dead” qubit can be bypassed and a spare brought in, without stopping the computation. This concept has been demonstrated in principle in certain systems – notably, a recent 2025 experiment with a 3,000-atom quantum simulator showed the ability to continuously replenish qubits on the fly. In that work, a neutral-atom array normally lost atoms from its traps every minute or so (which would limit run time), but the researchers engineered a conveyor-belt system that delivered fresh atoms into the array to replace lost ones continuously. They managed to maintain a stable 3,000 atom register for over two hours, far beyond the usual 60-second lifetime of such arrays. While that was an analog quantum simulator (not a full digital quantum computer), it illustrates the kind of autonomic fault-tolerance that may be needed: the machine actively repairs itself (or at least mitigates failures) during operation. Similarly, all the classical control systems (clock generation, classical processors for decoding, etc.) must have high uptime or redundancy. The decoder, for instance, will be running in real-time for the entire multi-day period processing error syndromes; it must not crash or stall. This might entail using error-corrected classical computation or failover mechanisms in the control hardware to ensure that a glitch in the classical side doesn’t interrupt the quantum side. In summary, every link in the chain – quantum and classical – has to be robust against both steady wear-and-tear and unexpected glitches.
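A toy reliability calculation (with invented figures for component counts and failure rates, purely to illustrate the point) shows why redundancy becomes unavoidable at this timescale:

```python
# Why redundancy matters over a multi-day run: a toy reliability estimate.
# Assume N independent classical subsystems (control racks, decoder nodes,
# clock sources, ...) with exponential failure times and a given MTBF.
# All figures below are invented for illustration, not measurements.

import math

run_hours = 120          # ~5 days
n_components = 50        # assumed number of critical classical subsystems
mtbf_hours = 10_000      # assumed per-component mean time between failures

p_one_survives = math.exp(-run_hours / mtbf_hours)
p_all_survive = p_one_survives ** n_components

print(f"P(one component survives 120 h): {p_one_survives:.4f}")
print(f"P(all {n_components} survive with no spares): {p_all_survive:.1%}")
# -> only ~55%, which is why failover paths and hot spares (classical and
#    quantum) are needed rather than hoping nothing breaks for five days.
```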
These challenges are interdependent and compound each other. Long-duration stability is truly a system-level problem: it demands qubits with low enough error rates operated with an effective QEC code and decoder such that the logical error rate stays below threshold with margin over time. It also relies on excellent engineering of the physical system (cryogenics, shielding, control software) so that the error environment doesn’t worsen with time. If any one piece – say, the error rate drifts upward, or a burst event injects too many errors at once, or the decoder falls behind – then continuous operation can fail even if everything else was nominal. This is why continuous operation is often seen as one of the final and hardest milestones on the road to a CRQC.
Current Status and Progress
As of 2025, no quantum computer has demonstrated the ability to run a fully error-corrected quantum algorithm for more than a few hours, let alone the multiple days that CRQC-scale tasks would require. By most assessments, Continuous Operation is at a low Technology Readiness Level (TRL ~1-2) – meaning the basic concept is understood and required, but we are still in the earliest stages of experimental validation. Here’s a snapshot of the current state and recent progress:
Hours-Scale Experiments
The longest continuous quantum operations to date have been on the order of hours, and typically these have been special-purpose demonstrations. For example, in 2023 Google Quantum AI reported running a quantum error-correcting code for billions of cycles over several hours on their superconducting processor. In one test, they used a simplified repetition code (protecting against only bit-flip errors) and managed nearly 10 billion QEC cycles (roughly 5-6 hours of runtime) without observing a logical error. This was a remarkable stability test, indicating that with careful calibration and below-threshold error rates, the system can in principle keep a qubit “alive” far longer than any single physical qubit would last. However, it’s important to note this was not running a useful algorithm, just cycling the error-correction loop to measure lifetime. The reported hours of aggregate runtime also likely came from stitching together many shorter runs and re-initializations rather than one unbroken execution. Still, it’s evidence that we’re pushing into the regime of hour-plus sustained operation in a research setting. Likewise, some trapped-ion experiments have demonstrated multi-round fault-tolerant operations (e.g. multiple rounds of Steane-code error correction on ion qubits), though ion trap gates are slower and typically experiments are minutes long at most. The recent neutral-atom continuous operation demo (2+ hours with live atom replacement) is another proof-of-concept in a different technology. So, we’re starting to see subsystems run on the order of 1-2 hours under special conditions. But no one has run a complex quantum circuit continuously for, say, 24 hours, much less 100+ hours.
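A quick sanity check on those numbers (using assumed ballpark cycle times, not the experiment's exact parameters):

```python
# Sanity check: wall-clock time for ~10 billion QEC cycles at typical
# superconducting cycle times. Cycle times are assumed ballpark values,
# not the exact parameters of the Google experiment.

cycles = 10e9
for cycle_time_us in (1.0, 1.5, 2.0):
    hours = cycles * cycle_time_us * 1e-6 / 3600
    print(f"{cycle_time_us} us/cycle -> {hours:.1f} hours")
# -> a few hours of sustained operation, consistent with an hours-scale run
```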
Automation and Uptime Improvements
A positive trend is the move toward automated control systems that keep quantum hardware stable without constant human tuning. The IQM 100-day result mentioned earlier is one such example, focusing on the classical control software side of the problem. Similarly, cloud quantum providers like IBM have developed automated calibration routines that run on a daily schedule or on-demand. Some research groups are exploring “calibration on the fly” where calibration pulses are interleaved with computational pulses in idle moments of the algorithm. While not yet used in major algorithm runs, these techniques could allow a quantum computer to self-correct its calibration drift continually, rather than pausing for a dedicated recalibration period. Another development is better system monitoring: modern quantum stacks increasingly have sensors and diagnostics (for fridge temperature, microwave power levels, etc.) that can warn of impending issues so adjustments can be made proactively. On the hardware front, engineering improvements like more stable microwave sources, better vacuum and cryogenic stability, and even simple things like minimizing cable creep or dielectric charging, all contribute to longer stable run times. It’s worth noting that classical supercomputers and large telescopes faced analogous issues of drift and downtime, and over decades they evolved to extremely high uptimes. Quantum computing is now entering that engineering maturation phase – companies are beginning to advertise uptime metrics. For instance, IBM Quantum occasionally reports system uptimes and has a “first failure time” metric (how long a quantum job can run before likely encountering a hardware error). Right now, those times are measured in hours, not days, for large processors.
Error Correction Advancements
The field of QEC itself is aiming not just to correct errors, but to do so more efficiently and robustly, which feeds into continuous operation. One important aspect is the decoder – the classical algorithm that processes syndrome measurement data to pinpoint errors. Decoders must output corrections quickly (often within microseconds) and reliably. Recent advances like machine-learning-based decoders and hardware-accelerated decoding are showing better accuracy and speed, which will help maintain stability over long runs. Moreover, researchers are considering how to handle correlated errors in decoders; for example, if a burst event is detected, can the decoder adapt or flag that chunk of data as unreliable? There’s active research on incorporating knowledge of spatially or temporally correlated errors into the QEC logic. The goal is a more resilient QEC system that won’t be thrown off by a sudden cluster of errors (essentially, making the logical qubits themselves more robust to rare hits). Another line of work is fault-tolerant circuit design – creating algorithms and schedules that can better tolerate pauses or asynchronous operation. One idea is to design routines that periodically refresh the state or have built-in “checkpoints” via error correction cycles, though true checkpointing of quantum state is theoretically challenging. Nonetheless, clever protocol design might allow certain mid-computation measurements that don’t collapse the algorithm but ensure nothing has gone awry. All of these QEC and protocol improvements directly contribute to longer feasible run times.
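To see why real-time decoding is demanding, here is a rough throughput budget (the code distance, cycle time, and logical qubit count are assumed round numbers, not any published system's parameters):

```python
# Back-of-the-envelope decoder throughput budget, to show why "real time"
# decoding over days is demanding. The code distance, cycle time, and logical
# qubit count are assumed round numbers, not any published system's parameters.

d = 25                                 # assumed surface-code distance
cycle_time_s = 1e-6                    # assumed QEC cycle time
syndrome_bits_per_cycle = d * d - 1    # roughly one measurement per ancilla

per_logical_bps = syndrome_bits_per_cycle / cycle_time_s
n_logical = 1_000                      # assumed logical qubits in a CRQC-scale run

print(f"Syndrome stream per logical qubit: ~{per_logical_bps / 1e6:.0f} Mbit/s")
print(f"Across {n_logical} logical qubits: ~{n_logical * per_logical_bps / 1e9:.0f} Gbit/s")
# The decoder has to consume this stream continuously for days; any backlog
# grows without bound and drags the logical error rate up with it.
```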
In summary, the status today is that pieces of continuous operation have been individually demonstrated (automated calibrations, hours-long error-corrected runs, mitigation of specific failure modes like atom loss). But these have not yet been integrated into a single system capable of a multi-day, fault-tolerant computation. No multi-day quantum factoring experiment has been attempted – because simply no current hardware could support it. The best we can do is extrapolate from shorter experiments. Right now, continuous operation is an aspirational target that lies beyond the demonstrated horizon, which is why it’s marked as a critical gap for CRQC. However, the steady progress in error rates and stability gives reason for optimism: each year, qubits become a bit more reliable and labs learn more about keeping them stable for longer.
Outlook and Indicators of Progress
Achieving continuous operation at the scale required for breaking RSA will likely be one of the last hurdles cleared on the path to a true CRQC. In fact, we might not know we’ve cleared it until someone actually strings everything together and completes a marathon computation. That said, there are several developments to watch in the coming years that will signal we’re closing the gap on long-duration stability:
Increasing Logical Qubit Lifetimes
One concrete metric is the lifetime (or coherence time) of a logical qubit under error correction. Researchers will be pushing this number upward by increasing code sizes and improving hardware. As of 2023, a logical qubit lived a bit over twice as long as a single physical qubit in the Google demo – on the order of a few milliseconds. To run for days, we need logical lifetimes extended to hours or more (effectively “infinite” for practical purposes, if error correction can keep up). If we start seeing reports that a logical qubit memory survived for, say, minutes or hours with QEC, that’s huge news. It would mean error correction is really working that well. Keep an eye on experiments with larger distance codes (d=11, d=13 surface codes, etc.) and on any announcements that the logical error rate per cycle has been pushed down to, e.g., 10⁻⁹, 10⁻¹², etc. — each order of magnitude gained translates to more feasible runtime. Academic papers or press releases might explicitly state “we maintained the logical qubit state for X seconds/minutes”.
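A simple conversion (with assumed round numbers) shows how per-cycle logical error rates map onto the lifetimes to watch for:

```python
# Translating a per-cycle logical error rate into a mean time to the first
# logical error, assuming a 1 us cycle. Both the cycle time and the rates
# are illustrative round numbers.

cycle_time_s = 1e-6

for p_cycle in (1e-6, 1e-9, 1e-12):
    mttf_s = cycle_time_s / p_cycle        # mean cycles to failure x cycle time
    print(f"p_logical = {p_cycle:.0e}/cycle -> ~{mttf_s:,.0f} s "
          f"(~{mttf_s / 86400:.3g} days) to a likely logical error")

# A single logical qubit at 1e-12 per cycle lasts ~1e6 s (about 12 days) on
# average, but a CRQC run needs thousands of logical qubits to survive ~5 days
# together, so the per-qubit target must be lower still, with margin.
```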
Length of Quantum Algorithm Runs
Another sign will be the successful execution of longer algorithms on prototype fault-tolerant hardware. Perhaps the first milestone will be a fully error-corrected algorithm running end-to-end for hours (for example, some quantum chemistry simulation that takes a few hours, completed without resetting). Eventually, a demonstration of a multi-day algorithm (not necessarily breaking RSA, but something intentionally long to test the system) would be a clear marker of this capability. Organizations might do a showpiece calculation – e.g., factor a smaller RSA number in a long run – to prove continuous operation. As of now, quantum algorithms that run on current devices typically finish in seconds or minutes due to hardware limitations; seeing that go to hours will be noteworthy.
System Uptime Metrics and Roadmaps
Quantum hardware providers are increasingly publishing roadmaps that include reliability and uptime goals. For instance, IBM’s quantum roadmap and others (Quantinuum, Google, PsiQuantum) discuss scaling to millions of qubits by the early 2030s, but they also implicitly need to address reliability by that stage. If those companies start reporting that their cryogenic systems, control electronics, etc., can run continuously for weeks without failure, that’s a positive sign. Similarly, metrics like “quantum volume” might be extended to incorporate time/stability. We might see something like a “quantum uptime” or “sustained quantum volume” metric introduced. In 2025, the startup IQM’s claim of 100 days continuous operation was a bit of marketing, but it points to an important shift: selling quantum machines on reliability, not just qubit count. As the industry matures, expect more of these claims. Users (especially cloud users) will demand SLAs (service-level agreements) for quantum uptime, which will drive vendors to quantify and improve continuous operation.
Mitigation of Environmental and Rare Faults
Advances in radiation shielding or detection will also be a key indicator. The ongoing work to test quantum chips in underground labs (like the collaboration with SNOLAB in Canada) is something to watch. If results show that moving hardware underground indeed reduces error bursts significantly, it might spur data centers to consider that for operational quantum machines. On the flip side, if new error-correcting codes or mitigation techniques are announced to handle cosmic-ray-induced errors (for example, special error-correcting code constructions that can correct double or triple faults within a local region), that will also be big news. Any solution to the “correlated error problem” will directly lengthen achievable run times. Even improvements in materials (to reduce background radiation from chip packaging) or better vacuum in ion trap enclosures can incrementally help. When you see research about “radiation-hard” qubit design or improved decoupling from the environment, it’s all feeding into continuous operation reliability.
Fully Automated Quantum Operations
Ultimately, a CRQC will need to run with minimal human babysitting. Progress toward that end will be seen in things like AI-driven calibration (using machine learning to predict and correct drifts in real-time) and health monitoring systems for qubits. Projects like Quantum Machines’ open-source “QUAlibrate”, which aims to calibrate devices in minutes, or Q-CTRL’s efforts to dynamically stabilize gates against drift, are examples. As those tools mature, they’ll likely be deployed in larger setups. If a quantum data center can report that it corrected a small qubit frequency drift at hour 50 of a run without stopping the computation, that’s essentially the vision realized. We aren’t there yet, but the groundwork is being laid now.
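Conceptually, the kind of background stabilization loop being described might look like the toy model below (everything in it, including the drift model, thresholds, and correction step, is invented for illustration and is not QUAlibrate's or Q-CTRL's actual API):

```python
# A toy sketch of an automated drift-stabilization loop running alongside a
# computation. The drift process, tolerance, and correction are all invented
# for illustration; this is not any vendor's real interface.

import random

def simulate_background_stabilization(hours=120, check_interval_min=10,
                                       drift_sigma=0.002, tolerance=0.01):
    """Toy model: a control parameter drifts as a slow random walk; whenever it
    wanders out of tolerance, a background correction re-zeros it on the fly."""
    checks = int(hours * 60 / check_interval_min)
    drift, corrections = 0.0, 0
    for _ in range(checks):
        drift += random.gauss(0, drift_sigma)   # slow random-walk drift
        if abs(drift) > tolerance:              # parameter out of spec?
            drift = 0.0                         # retune without stopping the run
            corrections += 1
    return corrections

random.seed(0)
n = simulate_background_stabilization()
print(f"Background corrections applied over a 120 h run: {n}")
```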
Given all these moving parts, when might continuous 100-hour operation be achieved? It’s hard to predict, but many experts believe that achieving this level of stability is possible by the time hardware reaches the scale of roughly a million qubits. That could be in the early 2030s if optimistic roadmaps hold. However, it’s not a guarantee – some refer to continuous operation as the “unsung hero” capability because it blends physics, engineering, and even software. It doesn’t follow Moore’s Law or any simple scaling; it requires eliminating every source of downtime or failure. It’s possible that even after we have enough qubits and fidelity, debugging the stability could take a few extra years. From a risk management perspective (for those worried about Q-Day, the day a quantum computer breaks encryption), this is one of the reasons many believe we still have a bit of time – not only must someone build a huge quantum computer, they must also operate it like a flawless supercomputer for days on end. That said, progress can be nonlinear. Once a few groups demonstrate smaller-scale continuous runs, know-how will spread quickly.
How can a reader track this capability? A good approach is to follow technical news from major quantum hardware teams (IBM, Google Quantum AI, IonQ, Quantinuum, PsiQuantum, academic groups like at Harvard, MIT, etc.). Look specifically for achievements related to stability: terms like “quantum error correction for X hours,” “automated calibration,” “uptime,” or “long coherence.” White papers or preprints that mention “sustained operation” or “rare event mitigation” are directly relevant. Also, watch the NIST and national lab reports; they often summarize state-of-the-art and might highlight if a logical qubit has reached a new lifetime record or if someone ran a small instance of Shor’s algorithm over an unprecedented duration. Conferences in quantum computing (APS, IEEE Quantum, etc.) often have talks on system engineering and reliability. As the field progresses, we may even see a public demonstration – for instance, a live factorization of a 128-bit number taking many hours, to prove it can be done. When such a demo happens, continuous operation will have moved from theory to reality.
Conclusion
In conclusion, Continuous Operation (Long-Duration Stability) is the capstone capability that turns a quantum computer from an experimental device into an industrial-grade factoring machine. It requires qubits that behave for as long as necessary, error correction that never takes a coffee break, and a whole lot of clever engineering to avoid any “gotchas” over days of runtime. We’re still at the beginning (TRL 1-2) of this journey – basic principles identified, early experiments underway – but the progress in recent years is encouraging. Each incremental improvement in qubit coherence, calibration automation, and fault tolerance brings us closer to the day when a quantum computer can run non-stop until the job is done. On that day, if and when it arrives, the machine won’t just be doing a fancy demo – it will be capable of accomplishing cryptographically significant feats like breaking RSA-2048, because it will finally have the endurance to go the distance.
Quantum Upside & Quantum Risk - Handled
My company - Applied Quantum - helps governments, enterprises, and investors prepare for both the upside and the risk of quantum technologies. We deliver concise board and investor briefings; demystify quantum computing, sensing, and communications; craft national and corporate strategies to capture advantage; and turn plans into delivery. We help you mitigate the quantum risk by executing crypto‑inventory, crypto‑agility implementation, PQC migration, and broader defenses against the quantum threat. We run vendor due diligence, proof‑of‑value pilots, standards and policy alignment, workforce training, and procurement support, then oversee implementation across your organization. Contact me if you want help.