Noise-Aware Hybrid VQC · Uzair Ahmad

The Problem

Near-term quantum hardware lives in the NISQ regime - Noisy Intermediate-Scale Quantum. Devices have on the order of tens of qubits, two-qubit gate fidelities around 99%, and coherence times short enough that any circuit deeper than a few dozen entangling gates accumulates more noise than signal. In that regime, the classic Variational Quantum Classifier (VQC) - feed encoded data into a parameterized circuit, train the parameters end-to-end against a loss - tends to underperform even simple classical baselines on real tasks.

The root causes are well-known but often treated separately:

Barren plateaus. As the circuit gets deeper or wider, the loss landscape flattens: the variance of the gradient shrinks exponentially with qubit count, so gradients vanish and gradient-free optimizers like COBYLA stall in flat, uninformative regions.
Encoding mismatch. Naïve angle encoding (x → R_Y(x)) maps unnormalized feature ranges onto a periodic rotation and silently destroys class separability.
Decision pressure on a noisy estimator. Asking a noisy 4-qubit circuit to both learn a representation and emit calibrated class probabilities is two jobs at once. The noise eats whichever job is more sensitive.
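To make the encoding-mismatch point concrete, here is a minimal sketch of how a raw feature value can alias under naïve angle encoding. It assumes a single qubit prepared as R_Y(x)|0⟩ and read out in the Z basis; the two sepal lengths are hypothetical values chosen inside the Iris range, not rows from the dataset.

```python
import math

def z_expectation_after_ry(angle: float) -> float:
    """<Z> for the state R_Y(angle)|0> = cos(angle/2)|0> + sin(angle/2)|1>."""
    amp0 = math.cos(angle / 2)
    amp1 = math.sin(angle / 2)
    return amp0 ** 2 - amp1 ** 2  # algebraically equal to cos(angle)

# Two hypothetical raw sepal lengths (cm), both inside the 4.3-7.9 range.
short_sepal = 5.5
long_sepal = 4 * math.pi - 5.5  # mirror image of 5.5 around 2*pi, about 7.07 cm

z_short = z_expectation_after_ry(short_sepal)
z_long = z_expectation_after_ry(long_sepal)

print(f"{short_sepal:.2f} cm -> <Z> = {z_short:.4f}")
print(f"{long_sepal:.2f} cm -> <Z> = {z_long:.4f}")
```

The two inputs differ by more than 1.5 cm yet produce the same Z-expectation, so a Z-basis readout cannot tell them apart. That is the kind of silent collision the per-feature scaling described later is meant to avoid.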

We wanted to know: if a pure VQC plateaus at near-random accuracy under realistic noise, is the bottleneck the quantum component itself, or the decision head we glue onto it?

Our Approach

We compare three configurations on the Iris benchmark, holding the circuit ansatz and the noise model fixed across all three so the only thing that changes is what the quantum part is asked to do.

Baseline VQC - random, untrained parameters. Establishes the noise floor and confirms the encoder–ansatz pair is non-degenerate.
Trained VQC - same circuit, parameters optimized end-to-end with COBYLA against a cross-entropy loss measured from Pauli-Z expectations.
Hybrid Quantum-Classical (NA-HVQC) - the quantum circuit becomes a fixed feature transformer. Its four measured Z-expectations are passed to a small classical linear-softmax head. Only the classical head is trained.

Figure 1 · NA-HVQC pipeline. The quantum circuit is run once per sample; its four Z-expectations become the input vector to a classical linear-softmax head.

Method

Adaptive feature encoding

Iris features have wildly different ranges (sepal length ≈ 4.3–7.9, petal width ≈ 0.1–2.5). Encoding raw values as rotation angles squashes one feature into a tiny arc of the Bloch sphere and wraps another past π. We fit a per-feature affine map on the training set so each feature lands in [−π, π], then feed those angles into a ZZ feature map for entangled encoding:

import numpy as np
from qiskit.circuit.library import ZZFeatureMap

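class AdaptiveEncoder:
    """Per-feature affine map onto [-π, π], fitted on the training set only."""

    def fit(self, X):
        self.lo, self.hi = X.min(axis=0), X.max(axis=0)
        # guard against constant features, which would otherwise divide by zero
        self.span = np.where(self.hi > self.lo, self.hi - self.lo, 1.0)
        return self

    def transform(self, X):
        # scale each feature to [-π, π] using the train-set range;
        # test points outside that range land slightly outside [-π, π]
        scaled = 2 * (X - self.lo) / self.span - 1
        return np.pi * scaled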

feature_map = ZZFeatureMap(feature_dimension=4, reps=2, entanglement="linear")
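A short usage sketch of the scaling step, written so it runs on its own with NumPy. The rows are hypothetical values in Iris-like units rather than actual dataset rows, and the `clip` option is an assumption of this sketch (the encoder above does not clip); it is included only to show one way to handle test points that fall outside the training range.

```python
import numpy as np

class AdaptiveEncoder:
    """Per-feature min-max scaling to [-pi, pi], fitted on training data only."""

    def fit(self, X):
        self.lo, self.hi = X.min(axis=0), X.max(axis=0)
        return self

    def transform(self, X, clip=False):
        # Constant features would give a zero range; fall back to 1.0 there.
        span = np.where(self.hi > self.lo, self.hi - self.lo, 1.0)
        angles = np.pi * (2 * (X - self.lo) / span - 1)
        # clip=True is an assumption of this sketch, not part of the original encoder.
        return np.clip(angles, -np.pi, np.pi) if clip else angles

# Hypothetical rows: sepal length, sepal width, petal length, petal width (cm).
X_train = np.array([[4.3, 2.0, 1.0, 0.1],
                    [5.8, 3.0, 4.0, 1.2],
                    [7.9, 4.4, 6.9, 2.5]])
# First and last features of this test row fall outside the training range.
X_test = np.array([[8.2, 3.1, 5.0, 0.05]])

enc = AdaptiveEncoder().fit(X_train)
train_angles = enc.transform(X_train)
test_raw = enc.transform(X_test)
test_clipped = enc.transform(X_test, clip=True)

print(train_angles.min(axis=0), train_angles.max(axis=0))
print(test_raw, test_clipped)
```

Training angles span exactly [−π, π] per feature, while the unclipped test row overshoots on two features. Whether to clip or let such points wrap is a design choice; clipping keeps the angle in range at the cost of saturating outliers.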

Variational ansatz

A RealAmplitudes ansatz with 4 qubits and 3 layers. Linear entanglement keeps the CNOT count low: 3 entangling gates per layer × 3 layers = 9 CNOTs in the ansatz, comfortably inside the < 50 budget we set for NISQ-compatibility. This count covers the ansatz only; the ZZ feature map contributes its own entangling gates on top of it. Trainable parameters: 4 × (3 + 1) = 16.

from qiskit.circuit.library import RealAmplitudes

ansatz = RealAmplitudes(
    num_qubits=4,
    reps=3,
    entanglement="linear",
)
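The gate and parameter counts quoted above can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: linear entanglement over nearest-neighbour pairs, the standard CX-P-CX decomposition of each ZZ interaction in the feature map, and no extra SWAPs from routing to a device's connectivity. The helper names are illustrative, not part of Qiskit.

```python
def linear_pairs(num_qubits: int) -> int:
    """Nearest-neighbour pairs in a linear chain: (0,1), (1,2), ..."""
    return num_qubits - 1

def real_amplitudes_counts(num_qubits: int, reps: int) -> dict:
    """CNOT and parameter counts for a RealAmplitudes ansatz, linear entanglement."""
    return {
        # one CNOT per neighbouring pair in each entangling layer
        "cnots": linear_pairs(num_qubits) * reps,
        # reps + 1 rotation layers of R_Y, one angle per qubit per layer
        "params": num_qubits * (reps + 1),
    }

def zz_feature_map_cnots(num_qubits: int, reps: int) -> int:
    """CNOTs in a ZZ feature map, linear entanglement: each ZZ term is CX, P, CX."""
    return 2 * linear_pairs(num_qubits) * reps

ansatz_counts = real_amplitudes_counts(num_qubits=4, reps=3)
encoder_cnots = zz_feature_map_cnots(num_qubits=4, reps=2)
total_cnots = ansatz_counts["cnots"] + encoder_cnots

print(ansatz_counts, encoder_cnots, total_cnots)
```

Under these assumptions the ansatz accounts for 9 CNOTs and 16 parameters, and the two-repetition feature map adds 12 more CNOTs, for 21 in the full circuit. That total is still below the 50-CNOT budget.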

Hybrid forward pass

For the hybrid configuration, the circuit is built once with random (then frozen) ansatz parameters. Each sample produces a 4-vector of Z-expectations, and a single linear layer maps that into class logits:

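import numpy as np
from scipy.special import softmax
from qiskit.quantum_info import SparsePauliOp
from qiskit_aer.primitives import EstimatorV2 as Estimator

class HybridNAHVQC:
    def __init__(self, feature_map, ansatz, theta_frozen, n_classes=3, estimator=None):
        # feature_map is the ZZFeatureMap circuit; inputs must already be
        # scaled to angles by AdaptiveEncoder.transform
        self.feature_map, self.ansatz = feature_map, ansatz
        self.theta = theta_frozen           # random, then frozen
        # build the estimator once; pass in one configured with the noise model
        self.estimator = estimator if estimator is not None else Estimator()
        self.observables = [
            SparsePauliOp.from_sparse_list([("Z", [q], 1.0)], num_qubits=4)
            for q in range(4)
        ]
        self.W = np.zeros((4, n_classes))   # trainable
        self.b = np.zeros(n_classes)        # trainable

    def quantum_features(self, x):
        circ = self.feature_map.assign_parameters(x).compose(
            self.ansatz.assign_parameters(self.theta)
        )
        # library circuits are wrapped as composite instructions; expand them
        # so the simulator receives plain gates
        circ = circ.decompose()
        job = self.estimator.run([(circ, obs) for obs in self.observables])
        return np.array([float(r.data.evs) for r in job.result()])

    def forward(self, x):
        z = self.quantum_features(x)        # shape: (4,)
        logits = z @ self.W + self.b
        return softmax(logits)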
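Since only the classical head is trained, the quantum features can be computed once per sample and cached, after which fitting W and b is ordinary softmax regression. The sketch below assumes plain full-batch gradient descent on cross-entropy; the optimizer, learning rate, and epoch count are assumptions of this example, and the feature matrix is synthetic stand-in data in [−1, 1], not output from the circuit.

```python
import numpy as np

def softmax_rows(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def train_head(Z, y, n_classes=3, lr=0.5, epochs=500):
    """Fit a linear-softmax head on precomputed features Z of shape (N, 4)."""
    n_samples, n_features = Z.shape
    W = np.zeros((n_features, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                      # one-hot targets
    losses = []
    for _ in range(epochs):
        P = softmax_rows(Z @ W + b)
        losses.append(-np.mean(np.log(P[np.arange(n_samples), y] + 1e-12)))
        grad_logits = (P - Y) / n_samples         # gradient of mean cross-entropy
        W -= lr * Z.T @ grad_logits
        b -= lr * grad_logits.sum(axis=0)
    return W, b, losses

# Synthetic stand-in for cached Z-expectations: three clusters inside [-1, 1]^4.
rng = np.random.default_rng(0)
centers = np.array([[0.8, -0.5, 0.1, 0.3],
                    [-0.6, 0.4, 0.7, -0.2],
                    [0.0, 0.9, -0.8, -0.7]])
y = np.repeat(np.arange(3), 30)
Z = np.clip(centers[y] + 0.1 * rng.standard_normal((90, 4)), -1.0, 1.0)

W, b, losses = train_head(Z, y)
accuracy = float(np.mean(np.argmax(Z @ W + b, axis=1) == y))
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}, train accuracy {accuracy:.2f}")
```

Because the loss is convex in W and b, this step has none of the plateau problems of the variational circuit; the noise in the features only affects how separable the inputs are, not whether the optimizer converges.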

Training the trained-VQC baseline

For the second configuration, the ansatz parameters are trained end-to-end against negative log-likelihood. We use COBYLA because the noisy estimator makes finite-difference gradients unreliable in this regime:

from scipy.optimize import minimize

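def nll(theta, X, y):
    # model.forward_endtoend(x, theta) is expected to return the class
    # probabilities for one sample with the ansatz parameters set to theta
    probs = [model.forward_endtoend(x, theta) for x in X]
    # mean negative log-likelihood; the 1e-12 keeps log() finite
    return -np.mean([np.log(p[yi] + 1e-12) for p, yi in zip(probs, y)])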

result = minimize(
    nll,
    x0=np.random.randn(16) * 0.1,
    args=(X_train, y_train),
    method="COBYLA",
    options={"maxiter": 200, "rhobeg": 0.3},
)
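The claim that finite-difference gradients are unreliable here can be illustrated with shot noise alone, before any hardware noise is added. The sketch below assumes a one-qubit toy model where ⟨Z⟩ = cos(θ) is estimated from a finite number of shots; the angle, step size, and shot count are illustrative values, not settings from the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

def sampled_z(theta, shots):
    """Shot-based estimate of <Z> = cos(theta) for the state R_Y(theta)|0>."""
    p0 = np.cos(theta / 2) ** 2            # probability of measuring outcome 0
    n0 = rng.binomial(shots, p0)           # number of 0 outcomes observed
    return (2 * n0 - shots) / shots        # average of +1 / -1 outcomes

def central_difference(theta, step, shots):
    """Two-point finite-difference estimate of d<Z>/dtheta."""
    return (sampled_z(theta + step, shots) - sampled_z(theta - step, shots)) / (2 * step)

theta, step, shots = 1.0, 0.01, 1024       # illustrative values only
true_grad = -np.sin(theta)
estimates = np.array([central_difference(theta, step, shots) for _ in range(500)])
spread = estimates.std()

print(f"true gradient {true_grad:.3f}, estimate spread {spread:.3f}")
```

Dividing two noisy estimates by a small step amplifies their sampling error, so the spread of a single gradient estimate exceeds the gradient itself in this setting. Larger steps or more shots reduce the problem, at the cost of bias or runtime, which is one reason a gradient-free method is a reasonable choice for the trained-VQC baseline.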

Results

All three configurations share the same ansatz, the same encoder, and the same Aer noise model. Numbers are on the held-out Iris test split.

Configuration	Accuracy	Weighted F1	CNOTs
Baseline VQC (untrained)	30%	0.22	9
Trained VQC (COBYLA)	50%	0.43	9
NA-HVQC (hybrid)	90%	0.90	9

Figure 2 · Test-set accuracy across the three configurations, same circuit and same noise model in every run.

The hybrid model nearly doubles the trained VQC's accuracy (90% vs. 50%) and triples that of the untrained baseline (30%), with zero additional quantum resources. The improvement does not come from a better circuit, since all three runs use the same 9-CNOT, 4-qubit ansatz. It comes from offloading the decision boundary to a classical layer that can integrate the noisy quantum expectations without itself being noisy.
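For readers unfamiliar with the weighted F1 column in the table above, here is a minimal sketch of how that metric is computed: per-class F1 scores averaged with weights proportional to each class's support. The labels are hypothetical and exist only to exercise the function; they are not the experiment's predictions.

```python
import numpy as np

def weighted_f1(y_true, y_pred, n_classes):
    """Support-weighted average of per-class F1 scores."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    total = 0.0
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom > 0 else 0.0   # F1 = 2TP / (2TP + FP + FN)
        total += np.sum(y_true == c) * f1           # weight by class support
    return float(total / len(y_true))

# Hypothetical labels for illustration only.
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]

accuracy = float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
score = weighted_f1(y_true, y_pred, n_classes=3)
print(f"accuracy {accuracy:.2f}, weighted F1 {score:.4f}")
```

Weighted F1 can sit below accuracy when errors concentrate in a few classes, which is consistent with the gap between the two columns for the untrained and trained VQC rows.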

Takeaways

Hybrid > pure quantum on NISQ. When the quantum part is treated as a feature extractor and the classical part is trusted with the decision, you sidestep the optimization pathologies that wreck end-to-end VQC training.
Encoding is doing real work. Adaptive per-feature scaling was the single most impactful preprocessing step. Naïve angle encoding ate 10–15 accuracy points before we caught it.
Depth budget held. 9 CNOTs is well inside the NISQ-compatible envelope, which means this configuration could plausibly run on a real device, not just on Aer.
What this is not. 90% on Iris is not evidence of quantum advantage - a classical RBF SVM hits the same number on the same split. The contribution is methodological: a noise-tolerant pattern for sub-50-CNOT VQCs.

Stack

Qiskit (circuit construction), Qiskit Aer (noisy simulator), scikit-learn (Iris loader and metrics), SciPy (COBYLA).

Full Paper

The full report PDF is embedded below.