Bernstein Demonstrates ML-DSA Key Recovery in Under One Second, Argues Solo Deployment Damages Security

Marin Ivezic June 4, 2026

June 4, 2026 – Daniel J. Bernstein, the cryptographer behind Ed25519 and Curve25519, has published a 59-page paper titled “Exploiting ML-DSA bugs” that provides open-source attack demonstrations against two classes of ML-DSA (FIPS 204) software vulnerabilities. Both attacks recover an equivalent secret key in under one second on a single laptop core, then forge arbitrary signatures that pass standard verification.

The paper estimates that 25% of ML-DSA libraries will ship with severe vulnerabilities at initial release, projects the number of breakable ML-DSA keys per year through 2039, and concludes that deploying solo ML-DSA rather than hybrid Ed25519+ML-DSA would mean an order of magnitude more breakable signature keys for at least the next five years. The bottom line: the rush to deploy ML-DSA without retaining ECC as a safety layer is creating measurable, quantifiable risk that will persist years after the first quantum attacks arrive.

The paper also directly challenges claims made in April 2026 by cryptography engineer Filippo Valsorda, who argued that hybrid ECC+ML-DSA signatures are unnecessary and would slow urgent PQC deployment.

The Attack Demonstrations

Bernstein’s paper provides two concrete, open-source attack demos (supplement available), each targeting a different class of ML-DSA software bug.

The first attack targets what Bernstein calls the “AABBCC” bug, where coefficients in the secret polynomial $y$ end up repeating in pairs. This is an ML-DSA-specific version of a vulnerability announced in 2018 by the Dilithium team’s own Vadim Lyubashevsky, who said the bug “can easily be exploited to recover the secret key.” That 2018 announcement was never backed by a working demo. Bernstein’s paper provides one. The attack inspects a public key and two signatures, solves a system of linear equations over the polynomial ring, and recovers the four secret polynomials $s_0, s_1, s_2, s_3$ needed to forge signatures on arbitrary messages. It ran successfully on 50 consecutive randomly generated keys.
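
To make the linear-algebra step concrete, here is a toy sketch in Python. It is not Bernstein's code and it is not a full ML-DSA attack. It assumes the simplified signing relation $z = y + c \cdot s$ for a single secret polynomial, a toy ring degree of 8 instead of 256, and no rejection sampling or hints. All function names are hypothetical. If $y$ repeats in pairs, then $z_{2i} - z_{2i+1}$ no longer depends on $y$, so each signature contributes $N/2$ linear equations in the coefficients of $s$, and two signatures give a square system. Requires Python 3.8 or later for modular inverses via `pow`.

```python
import random

Q = 8380417  # the ML-DSA modulus
N = 8        # toy ring degree (assumption); ML-DSA uses 256

def mul_matrix(c):
    """Matrix M with (c * s)[k] = sum_j M[k][j] * s[j] in Z_Q[x]/(x^N + 1)."""
    return [[c[k - j] if k >= j else -c[k - j + N] for j in range(N)]
            for k in range(N)]

def solve_mod_q(rows):
    """Gauss-Jordan elimination mod Q on an augmented matrix; None if singular."""
    rows = [[v % Q for v in row] for row in rows]
    n = len(rows)
    for col in range(n):
        pivot = next((r for r in range(col, n) if rows[r][col]), None)
        if pivot is None:
            return None
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], -1, Q)
        rows[col] = [v * inv % Q for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(a - f * b) % Q for a, b in zip(rows[r], rows[col])]
    return [row[n] for row in rows]

def buggy_sign(s, rng):
    """Toy signature (c, z) with z = y + c*s, where the mask y repeats in pairs."""
    half = [rng.randrange(Q) for _ in range(N // 2)]
    y = [half[i // 2] for i in range(N)]           # AABBCC pattern
    c = [rng.choice((-1, 0, 1)) for _ in range(N)]
    m = mul_matrix(c)
    z = [(y[k] + sum(m[k][j] * s[j] for j in range(N))) % Q for k in range(N)]
    return c, z

def recover(signatures):
    """Use z[2i] - z[2i+1] = (c*s)[2i] - (c*s)[2i+1] to solve for s."""
    rows = []
    for c, z in signatures:
        m = mul_matrix(c)
        for i in range(0, N, 2):
            rows.append([m[i][j] - m[i + 1][j] for j in range(N)]
                        + [z[i] - z[i + 1]])
    sol = solve_mod_q(rows)
    if sol is None:
        return None  # singular system; a further signature would be needed
    return [v - Q if v > Q // 2 else v for v in sol]  # map back to small values

def demo(seed):
    rng = random.Random(seed)
    s = [rng.randint(-2, 2) for _ in range(N)]     # small toy secret
    signatures = [buggy_sign(s, rng) for _ in range(2)]
    return s, recover(signatures)
```

Calling `demo(seed)` returns the toy secret alongside the recovered one. The attacker-side function `recover` only ever sees the public pairs `(c, z)`.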

The second attack targets nonce repetition, the ML-DSA analog of the Sony PlayStation 3 ECDSA disaster from 2010. If ML-DSA signature software accidentally reuses the secret nonce across signatures (which can happen through input-length miscalculation or truncated hash inputs), the attacker can recover the secret key by simple polynomial division. The attack again completes in under one second and works roughly 80% of the time from two signatures, with additional signatures resolving the remainder.
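
The "polynomial division" step can be sketched with the same simplified relation $z = y + c \cdot s$. If two signatures share $y$, subtracting them gives $z_1 - z_2 = (c_1 - c_2) \cdot s$, and dividing by $c_1 - c_2$ in the ring yields $s$. The Python below is a toy illustration under stated assumptions (ring degree 8 rather than 256, a single secret polynomial, no rejection sampling, hypothetical function names), not the paper's attack code. It performs the division by solving the equivalent linear system mod $q$.

```python
Q = 8380417  # the ML-DSA modulus
N = 8        # toy ring degree (assumption); ML-DSA uses 256

def mul_matrix(c):
    """Matrix M with (c * s)[k] = sum_j M[k][j] * s[j] in Z_Q[x]/(x^N + 1)."""
    return [[c[k - j] if k >= j else -c[k - j + N] for j in range(N)]
            for k in range(N)]

def ring_mul(c, s):
    """Negacyclic product c * s mod Q."""
    return [sum(m * v for m, v in zip(row, s)) % Q for row in mul_matrix(c)]

def ring_divide(num, den):
    """Solve den * s = num by Gauss-Jordan elimination mod Q.

    Returns None when den is not invertible in the ring.
    """
    rows = [[v % Q for v in row] + [num[k] % Q]
            for k, row in enumerate(mul_matrix(den))]
    for col in range(N):
        pivot = next((r for r in range(col, N) if rows[r][col]), None)
        if pivot is None:
            return None
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], -1, Q)   # Python 3.8+
        rows[col] = [v * inv % Q for v in rows[col]]
        for r in range(N):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(a - f * b) % Q for a, b in zip(rows[r], rows[col])]
    return [row[N] for row in rows]

def toy_sign(s, y, c):
    """Toy signature (c, z) with z = y + c*s; y is the secret nonce."""
    return c, [(yi + v) % Q for yi, v in zip(y, ring_mul(c, s))]

def recover_from_reuse(sig1, sig2):
    """Recover s from two signatures that reused the same nonce y."""
    (c1, z1), (c2, z2) = sig1, sig2
    dz = [(a - b) % Q for a, b in zip(z1, z2)]
    dc = [(a - b) % Q for a, b in zip(c1, c2)]
    sol = ring_divide(dz, dc)
    if sol is None:
        return None
    return [v - Q if v > Q // 2 else v for v in sol]  # map back to small values
```

In this toy, recovery fails only when $c_1 - c_2$ is not invertible, for example when both signatures use the same challenge. That toy failure condition should not be read as an explanation of the roughly 80% success rate reported for the real attack.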

Both attacks produce “universal forgeries,” allowing the attacker to sign any message of their choosing under the victim’s public key.

Why These Bugs Are Likely to Ship

The paper’s most unsettling contribution is its analysis of why these bugs are not hypothetical but near-certain to appear in production code.

Bernstein walks through the ML-DSA signing code in three production libraries: the official Dilithium reference implementation, the mldsa-native library, and OpenSSL’s crypto/ml_dsa directory. He shows how each library’s coefficient-unpacking code is dense with copy-paste-modify opportunities where a single index error produces an exploitable AABBCC pattern. The code in OpenSSL, for example, has a chain of bitwise operations where accidentally changing one subtraction to a bitwise AND would zero out every other polynomial coefficient, producing an equally exploitable A0B0C0 pattern.
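
The flavor of such a slip can be shown with a deliberately simplified example. The packing format below (two 4-bit fields per byte, each mapped to `center - field`) and all function names are hypothetical. None of this is taken from the Dilithium reference code, mldsa-native, or OpenSSL. The sketch only shows how a one-token copy-paste error turns correct unpacking into an AABBCC pattern, and how cheap it is to check for that pattern.

```python
def unpack_nibbles(packed, center=2):
    """Unpack two 4-bit fields per byte into coefficients center - field."""
    coeffs = []
    for byte in packed:
        coeffs.append(center - (byte & 0x0F))  # low nibble
        coeffs.append(center - (byte >> 4))    # high nibble
    return coeffs

def unpack_nibbles_buggy(packed, center=2):
    """Same loop after a copy-paste slip: the low nibble is read twice."""
    coeffs = []
    for byte in packed:
        coeffs.append(center - (byte & 0x0F))
        coeffs.append(center - (byte & 0x0F))  # bug: should be byte >> 4
    return coeffs

def repeats_in_pairs(coeffs):
    """True if every even-indexed coefficient equals its right neighbor."""
    return all(coeffs[i] == coeffs[i + 1] for i in range(0, len(coeffs) - 1, 2))

packed = bytes([0x10, 0x32, 0x04])
good = unpack_nibbles(packed)        # [2, 1, 0, -1, -2, 2]
bad = unpack_nibbles_buggy(packed)   # [2, 2, 0, 0, -2, -2]
```

Both functions return a list of the right length with values in the right range, which is why a test that only checks shape and range would not distinguish them.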

The critical point: these bugs pass all standard functionality tests. An ML-DSA signature generated with the AABBCC bug is a valid ML-DSA signature. It interoperates with correct verifiers. It passes known-answer tests if those tests were generated by code with the same bug (which is exactly what happened with both official Dilithium 1.0 implementations in 2017). It passes interoperability tests across libraries. The only tests that reliably catch these bugs are cross-implementation checksum comparisons using derandomized RNGs, and Bernstein documents how inconsistently such tests are applied across the ML-DSA ecosystem.
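
A cross-implementation checksum test of the kind described here can be sketched as follows. The idea is to replace the system RNG with a deterministic byte stream, feed identical inputs to two independently written implementations, hash everything they output, and compare the digests. The harness below is a minimal illustration: `DeterministicRNG`, `checksum`, and the two stand-in signers are hypothetical, and the stand-ins are plain hash calls rather than ML-DSA. The second stand-in truncates its hash input, loosely mirroring the truncated-input failure mode mentioned earlier.

```python
import hashlib

class DeterministicRNG:
    """SHAKE-256 byte stream from a fixed seed, standing in for the system RNG."""

    def __init__(self, seed: bytes):
        self.seed = seed
        self.counter = 0

    def read(self, n: int) -> bytes:
        data = self.seed + self.counter.to_bytes(8, "little")
        self.counter += 1
        return hashlib.shake_256(data).digest(n)

def checksum(sign, seed=b"fixed-test-seed", rounds=100):
    """Hash the outputs of `sign` over a fixed sequence of derandomized inputs."""
    rng = DeterministicRNG(seed)
    digest = hashlib.sha256()
    for i in range(rounds):
        key_seed = rng.read(32)
        message = rng.read(1 + i % 64)   # vary the message length
        digest.update(sign(key_seed, message))
    return digest.hexdigest()

def reference_sign(key_seed: bytes, message: bytes) -> bytes:
    """Stand-in for a correct implementation (not real ML-DSA)."""
    return hashlib.sha3_256(key_seed + message).digest()

def truncating_sign(key_seed: bytes, message: bytes) -> bytes:
    """Stand-in with a bug: only the first 32 message bytes are hashed."""
    return hashlib.sha3_256(key_seed + message[:32]).digest()

same = checksum(reference_sign) == checksum(reference_sign)      # reproducible
caught = checksum(reference_sign) != checksum(truncating_sign)   # mismatch exposes the bug
```

The comparison is only meaningful when the two implementations were written independently. A checksum generated by the same buggy code it is later used to test will match, which is the known-answer-test failure described above.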

The Wycheproof test suite, cited by Valsorda as evidence of “better testing infrastructure,” does not include ML-DSA key-generation tests and uses a nonstandard interface for signature-generation tests that many implementations will skip, Bernstein argues.

The Quantitative Case

Section 7 of the paper builds a statistical model of ML-DSA vulnerability rates using published empirical data. The model draws on Blessing, Specter, and Weitzner (2021), who analyzed 312 CVEs across eight major cryptographic libraries during 2010–2020, finding 0.45 to 1.19 CVEs per 1,000 lines of code added, with an average exploitable lifetime of 5.13 years.

Bernstein estimates that 50 ML-DSA libraries will see meaningful deployment over the next five years, each containing 2,000–4,000 lines of ML-DSA-specific code, producing an aggregate 100,000–200,000 lines of new code. At published vulnerability rates, this predicts roughly 100 new vulnerabilities across the ecosystem, or about 2 per library. If one-eighth of those are severe (allowing signature forgery), approximately 25% of ML-DSA libraries will ship with at least one severe vulnerability at initial release.
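
The arithmetic behind these figures can be reproduced in a few lines. The sketch below uses only the numbers quoted above. The one addition is labeled in the code: reading an expected 0.25 severe vulnerabilities per library as "about 25% of libraries" is an approximation, and under a Poisson assumption (mine, not stated in the article's summary of the paper) the share with at least one severe vulnerability comes out slightly lower.

```python
import math

def estimate(libraries=50,
             lines_per_library=(2_000, 4_000),
             cves_per_kloc=(0.45, 1.19),
             assumed_total_vulns=100,
             severe_fraction=1 / 8):
    """Reproduce the ecosystem-level vulnerability arithmetic quoted in the text."""
    total_lines = tuple(libraries * n for n in lines_per_library)
    # Range implied by the published CVE rates; "roughly 100" sits inside it.
    vuln_range = (total_lines[0] / 1000 * cves_per_kloc[0],
                  total_lines[1] / 1000 * cves_per_kloc[1])
    per_library = assumed_total_vulns / libraries
    severe_per_library = per_library * severe_fraction
    # Assumption (not from the source): severe bugs per library are Poisson
    # distributed, so P(at least one) = 1 - exp(-mean).
    p_at_least_one = 1 - math.exp(-severe_per_library)
    return {
        "total_lines": total_lines,
        "vuln_range": vuln_range,
        "per_library": per_library,
        "severe_per_library": severe_per_library,
        "p_at_least_one_poisson": p_at_least_one,
    }

result = estimate()
```

With the default inputs, `total_lines` is (100000, 200000), `vuln_range` is about 45 to 238, `per_library` is 2, and `severe_per_library` is 0.25. The Poisson variant gives roughly 22%, which does not change the order-of-magnitude picture.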

For Ed25519 by comparison, Bernstein reviews all CVEs mentioning Ed25519 or EdDSA, finding only 1–2 severe vulnerabilities across an ecosystem with roughly 100 libraries and 10+ years of deployment. He estimates that Ed25519 keys have roughly a 2% chance of running on software with an unpatched severe vulnerability in 2026, compared to 18% for ML-DSA keys in 2027.

The paper’s Figure 9.1.1 plots the estimated number of breakable keys per year for three scenarios (solo Ed25519, solo ML-DSA, and Ed25519+ML-DSA) under both small-scale ($2^{20}$ dollars of equipment) and large-scale ($2^{30}$ dollars plus quantum computers from 2029) attackers. The result: solo ML-DSA produces roughly an order of magnitude more breakable keys than either alternative for the first five years. The gap persists even after quantum attacks begin, because the initial quantum computers are expensive enough that per-key attack costs limit the number of ECC keys an attacker can break.

My Analysis

This paper is technically sound, and the attack demonstrations are the strongest evidence yet that the “just ship ML-DSA” position carries quantifiable risk. But it also arrives with context that readers need to evaluate carefully.

What’s Strong

The core argument is hard to dispute on technical grounds. ML-DSA software is new, complex, and being deployed in a rush. Historical CVE data tells us what happens when complex new cryptographic code ships at scale: it ships with bugs, some of them severe, and those bugs take years to find and patch. Bernstein is applying well-established empirical data to a specific prediction, and the recent track record of ML-DSA bugs confirms the prediction is already playing out.

The Kobeissi “Verification Theatre” paper from earlier this year found 13 vulnerabilities in Cryspen’s libcrux, a library that advertised formal verification. Firefox and OpenSSH rely on libcrux. Four of those bugs were inside the formal verification boundary itself, including a wrong multiplication specification in ML-DSA that rendered axiomatized AVX2 proofs unsound. If “formally verified” ML-DSA code can ship with exploitable bugs, the prospects for the dozens of unverified implementations are exactly what Bernstein’s statistical model predicts.

The Lee, Lim, and Yoon paper published the same week adds further evidence: production ML-DSA implementations have already shipped with arithmetic overflow bugs from aggressive removal of Montgomery reductions. That paper’s finding that bugs produce “non-conformant signatures” raises the question Bernstein poses: do those non-conformant signatures leak information about the secret key?

The rebuttal of Valsorda’s arguments in Section 10 is methodical. Valsorda wrote that “ML-KEM and ML-DSA are a lot easier to implement securely than their classical alternatives.” Bernstein identifies this as unsubstantiated and difficult to reconcile with the documented bug history. Valsorda cited Wycheproof as evidence of improved testing infrastructure; Bernstein shows that Wycheproof does not test ML-DSA key generation at all and uses a nonstandard interface for signature testing that many implementations will skip. Valsorda argued that hybrid signatures would “only slow us down”; Bernstein points to the existing IETF specification for composite ML-DSA and notes that the new code required for Ed25519+ML-DSA is almost entirely the same new code required for solo ML-DSA.

What Requires Nuance

Bernstein has been the most persistent advocate for hybrid ECC+PQ deployment for a decade and one of the most vocal critics of NIST’s PQC process. His technical work is rigorous, but readers should understand the advocacy context. This paper is not a dispassionate survey; it is a carefully constructed argument for a position Bernstein has held since at least 2016. That does not make it wrong. It does mean the reader should be aware that the modeling choices (the specific vulnerability rates, the 1/5 probability of buffer overflows bypassing the other signature system, the 2029 date for secret quantum attacks) were selected by someone with a clear thesis to support.

The quantitative estimates in Figure 9.1.1 should be treated as order-of-magnitude reasoning, not precise predictions. Bernstein is transparent about this: the number of ML-DSA libraries, the number of active keys, the severity fraction, the buffer-overflow probability are all stated as assumptions with stated rationales. But when a graph with six curves and logarithmic axes appears in a 59-page paper with 131 references, it can create an impression of precision that the underlying estimates do not support. The directional conclusion (more breakable ML-DSA keys than Ed25519+ML-DSA keys) holds up. The specific numbers do not.

The comparison between Ed25519+ML-DSA and solo Ed25519 is the paper’s weakest point, and Bernstein acknowledges this (Section 9.4). If adding ML-DSA introduces severe bugs that include buffer overflows capable of bypassing the ECC signature check, then Ed25519+ML-DSA could be less secure than keeping Ed25519 alone. Bernstein models this as a 1/5 probability per severe vulnerability. If the true probability is 1/20, the conclusion reverses. This is an active area of debate that depends on assumptions about memory safety, language choice, and library architecture that vary widely across the ecosystem.

The paper also deliberately excludes the risk of a mathematical break of the ML-DSA specification itself (Section 7.1). If Dilithium’s underlying lattice assumptions turn out to be weaker than expected, the number of breakable keys would be vastly higher than anything in Figure 9.1.1. Bernstein flags this as a concern but sets it aside to focus on the “definite problem” of software vulnerabilities. This conservative scope choice means the paper may actually understate the case for hybrid deployment.

What This Means for PQC Migration

From the perspective of my CRQC Quantum Capability Framework, this paper does not change the quantum timeline. It changes the migration calculus. The question is no longer just “when will a CRQC arrive?” but “what damage will we do to ourselves in the rush to prepare for one?”

Bernstein’s seatbelt analogy (Section 1.4) is apt. Hybrid signatures are a seatbelt with negligible cost. The car industry once argued that seatbelts were too expensive; a 1983 Supreme Court ruling ultimately rejected that position. The current argument that Ed25519+ML-DSA hybrid signatures would “slow down” PQC deployment does not survive contact with the evidence presented here: the IETF specification already exists, the additional code is minimal, and the security benefit is quantifiable.

For organizations planning PQC migration, the practical implications are:

First, do not deploy solo ML-DSA for new signature applications if hybrid Ed25519+ML-DSA is available. The cost difference is negligible; the security difference is an order of magnitude in the number of breakable keys.

Second, invest in testing infrastructure for your ML-DSA implementations that goes beyond standard functionality tests. Cross-library checksum comparisons using derandomized RNGs catch the classes of bugs Bernstein demonstrates. Standard interoperability tests do not.

Third, recognize that PQC migration timelines must account for software maturation, not just algorithm deployment. An ML-DSA library that ships today and receives its first serious security audit in 2028 is carrying two years of undetected vulnerabilities. This is not speculation; it is what the CVE data predicts.

As I have argued before, debating the exact arrival date of Q-Day is increasingly beside the point. Regulatory deadlines, insurance requirements, and client expectations are already set. Bernstein’s paper adds a new dimension to this argument: even the tools we deploy to prepare for Q-Day can damage security if we deploy them recklessly. The irony of making systems less secure in the name of quantum security should not be lost on anyone.