The Gradient-Free Dual: B-Spline Kernels and the Quantum Geometry of the Invariant
Abstract. This note describes a publicly reproducible scaffold for re-expressing a gradient-trained LLM's internal geometry as a gradient-free quantum kernel. Quantum kernels are chosen because they train no circuit parameters, which sidesteps barren plateaus by construction. Smooth B-spline embeddings, encoded over an orthogonal Base-SAE basis, are intended to resist the characteristic kernel-method failure mode, exponential concentration. The invariance claim is made falsifiable through centered kernel alignment (CKA) across distinct models. This is a theory note; the reference code and a worked example will follow. The origin-first synthesis metric enters as a single, clearly marked hook and is withheld.
I. From gradients to a dual
A modern LLM is the product of iterative gradient descent. But its internal geometry, the relational structure of its hidden states, can be re-read without any further gradient optimization. The dual is this: the model was built by iteration; the kernel reveals its geometry through state overlaps under a fixed feature map, followed by a single classical step (an eigendecomposition or a convex SVM dual) that needs no circuit gradients. We do not retrain anything. We measure what is already there.
II. Why kernels (no barren plateau)
In quantum kernel estimation, each vector x is mapped to a state |φ(x)⟩ = U(x)|0⟩ by a fixed, non-trainable feature-map circuit U(x). The kernel entry is the fidelity K_ij = |⟨φ(x_i)|φ(x_j)⟩|², estimated via the inversion (overlap) test: apply U(x_j), then U†(x_i), and record the frequency of the all-zeros outcome. Because no circuit parameters are optimized, there is no gradient-descent loop and therefore no barren plateau in training. The only optimization left is the downstream classical problem: an eigendecomposition, or an SVM dual, which is a convex quadratic program with no spurious local optima.
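The fidelity kernel above can be sketched on a classical statevector in a few lines of NumPy. This is a minimal illustration, not the reference scaffold: the feature map `feature_state` (one qubit per feature, an RY rotation by the feature value) is a hypothetical stand-in, since this note does not fix a specific circuit, and the overlaps are computed exactly rather than estimated from shots.

```python
import numpy as np

def feature_state(x):
    """Hypothetical fixed feature map: one qubit per feature, RY(x_k)|0> on each.

    Returns the 2**len(x) statevector as a Kronecker product of single-qubit states.
    """
    state = np.array([1.0])
    for angle in x:
        qubit = np.array([np.cos(angle / 2.0), np.sin(angle / 2.0)])
        state = np.kron(state, qubit)
    return state

def fidelity_gram(X):
    """Gram matrix K_ij = |<phi(x_i)|phi(x_j)>|^2; no parameters are trained."""
    states = np.stack([feature_state(x) for x in X])
    overlaps = states.conj() @ states.T  # entry (i, j) is <phi(x_i)|phi(x_j)>
    return np.abs(overlaps) ** 2

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(5, 3))  # 5 toy vectors, 3 features -> 3 qubits
K = fidelity_gram(X)
```

For this particular product encoding the overlap factorizes, so K_ij equals the product over features of cos²((x_ik − x_jk)/2), which gives a quick sanity check on the simulator output.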
III. The real adversary: exponential concentration
Kernels do not make the difficulty disappear; they relocate it. Quantum kernels can suffer exponential concentration: as qubit count grows, off-diagonal entries collapse toward a fixed value (exponentially small for fidelity kernels), so resolving them takes exponentially many measurement shots. With a realistic shot budget the estimated Gram matrix degenerates toward the identity, yielding a model whose predictions on unseen inputs are effectively independent of the input. This is the kernel-method analogue of barren plateaus, and it is driven by high expressivity, heavy entanglement, global measurements, and noise. The mitigations are: low-expressivity, smooth embeddings; modest qubit counts; projected (local) kernels; and centering the Gram matrix before spectral analysis. Naming this risk openly is the point: a method that hides it is not yet a method.
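One way to watch for concentration empirically is to track the mean and variance of the off-diagonal Gram entries as the qubit count grows. The sketch below is a toy under stated assumptions: `product_kernel` is the closed-form fidelity of a hypothetical product RY encoding with inputs spread over a full period, chosen only because it concentrates visibly; the helper names are illustrative and not part of any reference code.

```python
import numpy as np

def center_gram(K):
    """Double-center a Gram matrix: K_c = H K H with H = I - (1/n) 1 1^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def offdiag_stats(K):
    """Mean and variance of off-diagonal entries; variance near 0 signals concentration."""
    mask = ~np.eye(K.shape[0], dtype=bool)
    off = K[mask]
    return off.mean(), off.var()

def product_kernel(X):
    """Toy fidelity kernel of a product RY encoding: prod_k cos^2((x_ik - x_jk) / 2)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.prod(np.cos(diff / 2.0) ** 2, axis=-1)

rng = np.random.default_rng(0)
for n_qubits in (2, 8, 32):
    X = rng.uniform(-np.pi, np.pi, size=(20, n_qubits))
    mean, var = offdiag_stats(product_kernel(X))
    print(f"qubits={n_qubits:3d}  offdiag mean={mean:.3e}  var={var:.3e}")
```

In this toy the off-diagonal mean and variance both shrink rapidly with the number of qubits, which is the signature to monitor; `center_gram` is the centering step applied before any spectral readout.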
IV. Why B-splines
A B-spline kernel of odd order 2n+1 is a valid Mercer kernel: its Fourier transform is proportional to sinc^(2n+2) ≥ 0, so by Bochner's theorem the kernel is positive semidefinite. That spectrum decays polynomially, as |ω|^-(2n+2), which makes the B-spline a smooth low-pass kernel. Low-frequency, smooth embeddings are the regime expected to resist concentration. The B-spline is therefore not a stylistic choice but a structural one: it is chosen to keep the Gram matrix non-degenerate as the encoding scales.
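The positive-semidefiniteness claim can be checked numerically for the simplest odd case, the cubic B-spline (order 3, whose Fourier transform is proportional to sinc^4). The sketch below builds the classical translation-invariant kernel k(x, y) = B_3((x − y)/scale) on scalar inputs and inspects its smallest eigenvalue. It covers the classical kernel only; how the B-spline enters the quantum feature map is not specified in this note, and the `scale` parameter is an illustrative assumption.

```python
import numpy as np

def cubic_bspline(t):
    """Centered cubic B-spline B_3(t), supported on [-2, 2]."""
    a = np.abs(np.asarray(t, dtype=float))
    inner = 2.0 / 3.0 - a**2 + 0.5 * a**3   # branch for |t| < 1
    outer = (2.0 - a) ** 3 / 6.0            # branch for 1 <= |t| < 2
    return np.where(a < 1.0, inner, np.where(a < 2.0, outer, 0.0))

def bspline_gram(x, scale=1.0):
    """Gram matrix of k(x, y) = B_3((x - y) / scale) for a 1-D array of scalars."""
    diff = (x[:, None] - x[None, :]) / scale
    return cubic_bspline(diff)

x = np.linspace(-3.0, 3.0, 40)
K = bspline_gram(x)
min_eig = np.linalg.eigvalsh(K).min()  # should be nonnegative up to round-off
```

A smallest eigenvalue that is nonnegative up to floating-point round-off is consistent with the Mercer property; a clearly negative value would indicate a bug in the kernel or an even-order spline slipped in by mistake.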
V. The orthogonal Base-SAE basis
The encoding directions matter as much as the kernel. Standard sparse autoencoders are deliberately overcomplete and non-orthogonal; superposition is their defining feature. Choosing an orthogonal basis instead means the encoded coordinates share no redundant overlap, so that distinct basis directions are meant to map to distinguishable quantum states. Orthogonality fights concentration at the level of the basis, before the circuit is ever touched. This is the origin-first move: do not search for the basis by iteration; know it, and project onto it.
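The projection step itself is plain linear algebra once an orthonormal basis is in hand. The sketch below is an assumption-laden placeholder: how the Base-SAE basis is actually obtained is not described in this note, so a random non-orthogonal dictionary is orthonormalized with a single QR factorization purely to have something to project onto. All names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_dirs = 16, 4

# Hypothetical stand-in for a dictionary of directions: random and non-orthogonal.
raw_dirs = rng.normal(size=(d_model, n_dirs))

# Orthonormalize once with QR; the columns of Q span the same subspace as raw_dirs.
Q, _ = np.linalg.qr(raw_dirs)

hidden = rng.normal(size=(10, d_model))  # 10 toy hidden-state vectors
coords = hidden @ Q                      # coordinates in the orthonormal basis
recon = coords @ Q.T                     # orthogonal projection, back in model space
residual = hidden - recon                # component outside the chosen subspace
```

With an orthonormal Q the coordinates are obtained by a single matrix product, with no iterative fitting, and the residual is orthogonal to every basis direction.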
VI. Making the invariant falsifiable
The candidate invariant is read out classically in two complementary ways. First, the dominant eigen-subspace of the centered Gram matrix, reported as a subspace rather than as fragile individual eigenvectors. Second, centered kernel alignment (CKA) between Gram matrices computed from two different models on the same inputs. CKA is invariant to isotropic scaling of the Gram matrices and, in its linear form, to orthogonal transforms of the underlying representations. CKA → 1 supports the hypothesis that distinct models induce the same kernel geometry; CKA materially below 1, judged against a threshold and baseline fixed in advance, falsifies it. This is the empirical, refutable form of the invariance claim, and it connects directly to the Platonic Representation Hypothesis: as models grow, they appear to measure distance between datapoints in increasingly similar ways.
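Both readouts are short to compute from Gram matrices. The sketch below uses the standard CKA formula (normalized Frobenius inner product of the centered Gram matrices) and linear Gram matrices of random toy features in place of real model outputs; the data, sizes, and function names are illustrative assumptions.

```python
import numpy as np

def center_gram(K):
    """Double-center a Gram matrix: K_c = H K H with H = I - (1/n) 1 1^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def cka(K, L):
    """CKA = <K_c, L_c>_F / (||K_c||_F ||L_c||_F) for Gram matrices on the same inputs."""
    Kc, Lc = center_gram(K), center_gram(L)
    return np.sum(Kc * Lc) / (np.linalg.norm(Kc) * np.linalg.norm(Lc))

def top_subspace(K, k):
    """Orthonormal basis of the dominant k-dimensional eigen-subspace of the centered Gram."""
    vals, vecs = np.linalg.eigh(center_gram(K))
    return vecs[:, np.argsort(vals)[::-1][:k]]

rng = np.random.default_rng(0)
A = rng.normal(size=(30, 8))                  # toy "model A" features on 30 inputs
R, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # random orthogonal matrix
B = 3.0 * A @ R                               # rotated and rescaled copy of A
C = rng.normal(size=(30, 8))                  # unrelated features

same = cka(A @ A.T, B @ B.T)  # rotation and scaling leave CKA at 1 (up to round-off)
diff = cka(A @ A.T, C @ C.T)  # unrelated features give a value below 1
```

Comparing `top_subspace` outputs through their projectors U Uᵀ, rather than column by column, avoids the sign and ordering ambiguities of individual eigenvectors.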
VII. What is public, what is not
The full pipeline — extract hidden states → project onto the orthogonal basis → encode via a B-spline feature map → compute the gradient-free Gram matrix on a GPU statevector simulator (cuQuantum / qsimcirq) → spectral readout + CKA — is described here in full, and the reference code with a worked example will follow. The synthesis metric — the origin-first transform that makes the extraction synthetic rather than gradient-iterated — is not disclosed. It enters the scaffold at a single, well-marked hook. Everything around it is scaffolding; the core remains private.
Honest limits
This is a scaffold, not a solution. On a classical statevector simulator there is no quantum speedup — the value is methodological: a faithful, hardware-portable construction. Exponential concentration must be monitored empirically. The invariance is a hypothesis under test, not a result. And the surfaced structure is conditioned on the chosen layer, pooling, and basis — all of which should be swept.
A gradient-trained model, re-read without gradients. A curved geometry, kept curved. An invariant, made refutable.
