


QUANTUM ADVANTAGE WITH QUANTUM ERROR MITIGATION: INTRODUCTION AND COSTS

Based on these two articles:
Zhenyu Cai et al. "Quantum error mitigation." Reviews of Modern Physics 95.4 (2023)
Kento Tsubouchi et al. "Universal cost bound of quantum error mitigation based on quantum estimation theory." Physical Review Letters 131.21 (2023)

Andy Chu (Chia-Tung)
August 16, 2024
Yale Quantum Systems Lab

1


TABLE OF CONTENTS

 1.  Quantum Error Mitigation - Definition

 2.  Quantifying a "Good" Error-Mitigated Estimator (1/2)

 3.  Zero-Noise Extrapolation 📄, 📄
 4.  Probabilistic Error Cancellation 📄 (1/2)
 5.  Measurement Error Mitigation 📄, 📄 (1/4)
 6.  Purity Constraints
 7.  Subspace Expansions 📄, 📄, 📄

 8.  Quantum Fisher Information

 9.  Multi-parametric QFI and re-parametrization

 10. Applying Cramér-Rao Bound

 11. Noise channel characterization

 12. Main Results: Complexity of $N$

 13. Connection between QEM and QEC & Open Problems

2


WHY QUANTUM ERROR MITIGATION? (1/2)



 * Where are we now?
   Somewhere between NISQ ↔︎ full-scale fault tolerance.

 * Do we need to wait for full-scale fault tolerance to achieve quantum
   advantage?
   (exponential/polynomial/super-polynomial speedups over the best-known
   classical algorithms for: Hamiltonian simulation,
   algebraic/optimization/search problems, randomness sampling, etc.)

 * On the fault-tolerant side, we have already made plenty of progress,
   but it will still take a while before we can implement thousands to
   millions of logical qubits.

 * Why wait, despite all this progress?


FAULT TOLERANT PROGRESS

 * logical qubit in a diamond quantum processor 📄
 * scaling a surface code logical qubit 📄
 * fault-tolerant entangling gates on the five-qubit code and the color code 📄
 * QEC with silicon spin qubits 📄 …and more



HIGH QUALITY PHYSICAL QUBIT CONTROL

 * multiplexed two-dimensional cluster state 📄
 * 256-atom programmable quantum simulator 📄
 * Q. volume of 64 on superconducting platform 📄
 * spin qubits crossing the surface code threshold 📄

... and more

3


WHY QUANTUM ERROR MITIGATION? (2/2)

 * Existing barrier for error correction:
   the threshold theorem (meaningful QEC requires a hardware failure rate
   $p < p_\text{threshold}$).

 * Usefulness of QEM:
   it takes continuous progress in quantum hardware
   → and turns it into immediate improvements for quantum information
   processing.

 * To put it simply,
   QEC: has a threshold (so that we can correct errors faster than they
   accumulate).
   QEM: has no such "threshold".

4


WHERE CAN WE USE QEM?



 * Wide use cases:
   
   * Variational quantum circuits in physics simulations
   * approximate optimization algorithms
   * heuristic algorithms for quantum machine learning
     (and much more…)

 * What do they have in common?
   
   1. applying a short-depth quantum circuit to a simple initial state
   2. estimating the expectation value of some relevant observable.

 * We want our output, the "expectation value of the relevant observable", to
   be as accurate as possible.
   (for example, in physical/chemical simulation, we want the ground-state
   energy to be within chemical accuracy (1 kcal/mol))

 * This leads to the most essential feature of QEM: the ability to minimize the
   noise-induced bias in expectation values on noisy hardware.

 * But this can also be achieved by QEC! And by many other tools like
   decoherence-free subspaces 📄 and dynamical decoupling sequences 📄 !
   
   > So what sets QEM apart from all these other techniques? It’s better to
   > answer this question with the definition of QEM.


5


QUANTUM ERROR MITIGATION - DEFINITION

 * Algorithmic schemes that reduce the noise-induced bias in the expectation
   value.
 * By post-processing outputs from an ensemble of circuit runs.
 * Using circuits at the same noise level as the original unmitigated circuit or
   above.


DIFFERENCE BETWEEN QEM AND QEC

 * Basic Idea:
   QEM: by post-processing outputs.
   QEC: by applying adaptive gates conditioned on mid-circuit measurements.

 * Noise-level:
   QEM: unchanged or amplified in each circuit run.
   QEC: reduced in EVERY SINGLE circuit run.

 * What does it take?
   QEM: many repeated circuit runs.
   QEC: more qubits required (overhead).

6


MANY TIMES OF CIRCUIT RUNS: WHY?

How do we get the expectation value of some variable of interest?
By simply averaging the measurement results of repeated executions (unless we
use a one-shot measurement like phase estimation → but that belongs more to
the fault-tolerant era).


Sampling overhead!
So an error-mitigated estimator is good if it has a favorable sampling
overhead,

and it is also good if its bias from the ideal expectation value is small.

7


(APPENDIX) QUANTUM ERROR MITIGATION - DESIRED FEATURES

 * Mitigation methods should ideally only require a very modest qubit overhead
   to remain practical on current and near-term quantum hardware.

 * Error mitigation techniques should provide an accuracy guarantee for the
   method. Such a guarantee should ideally provide a formal error bound on the
   mitigated expectation values that indicate how well the method works at
   varying levels of noise.

 * A reliable error mitigation method should require few (or no) assumptions
   about the final state that is prepared by the computation.

8


QUANTIFYING A "GOOD" ERROR-MITIGATED ESTIMATOR (1/2)

 * Goal: to estimate the expectation value of some observable of interest $O$
   → to estimate $\text{Tr}[\rho_0 O]$.

 * How: we can construct an estimator $\hat O$ for estimating $O$. And how
   good is $\hat O$? Two ways of assessing $\hat O$:

   1. Use the mean square error (the method we adopt here):

   $$\text{MSE}[\hat O] = \text{E}\left[\left( \hat O - \text{Tr}[\rho_0 O]\right)^2\right]$$

   2. Use prediction intervals:

   calculate the interval within which the outcome of the estimator will fall
   with a given probability, offering a rigorous bound on the worst-case
   deviation of the estimator.

Formally defined goal: to reduce $\text{MSE}[\hat O]$ as much as possible,
using finite resources.

9


QUANTIFYING A "GOOD" ERROR-MITIGATED ESTIMATOR (2/2)

What contributes to $\text{MSE}[\hat O]$?

It’s convenient to break it into two terms:

$$\text{MSE}[\hat O] = \left(\text{Bias}[\hat O]\right)^2 + \text{Var}[\hat O]$$

Each of them defined as:

$$\begin{align*} \text{Bias}[\hat O] &= \text{E}[\hat O] - \text{Tr}[\rho_0 O] &&\rightarrow \text{related to accuracy!} \\ \text{Var}[\hat O] &= \text{E}\left[\hat O^2\right] - \left(\text{E}[\hat O]\right)^2 &&\rightarrow \text{related to precision!} \end{align*}$$

Define the sampling overhead:

$$C_\text{em} = \frac{N^\epsilon_\text{shot}(\hat O_\text{em})}{N^\epsilon_\text{shot}(\hat O_\rho)} = \frac{\text{Var}[\hat O_\text{em}]}{\text{Var}[\hat O_\rho]}$$

(How many more circuit runs do we need for $\overline{O}_\text{em}$ to reach
the same level of variance as the original noisy $\overline{O}_\rho$?)

10


(APPENDIX) SIMPLEST ESTIMATOR $\overline{O}_\rho$

Just don’t use QEM:

> Although we are given a noisy state $\rho$ instead of $\rho_0$, we still
> measure $O$ on that $\rho$, and repeat $N_\text{cir}$ times (effectively
> running the noisy circuit $N_\text{cir}$ times), so we now have
> $N_\text{cir}$ independent $\hat O_\rho$. Finally, we construct
> $\overline{O}_\rho$ by simply taking the average of these $\hat O_\rho$.
> → the more $N_\text{cir}$ we run, the higher the quality of
> $\overline{O}_\rho$:

$$\text{MSE}[\overline{O}_\rho]= (\underbrace{\text{Tr}[\rho O] - \text{Tr}[\rho_0 O]}_{\text{Bias}[\overline{O}_\rho]=\text{Bias}[\hat{O}_\rho]})^2 + \underbrace{\frac{\text{Tr}\left[\rho O^2\right] - \text{Tr}[\rho O]^2}{N_\text{cir}}}_{\text{Var}[\overline{O}_\rho] = \text{Var}[\hat{O}_\rho]/N_\text{cir}}$$

 * The second term is related to shot noise, and will shrink as we increase
   $N_\text{cir}$.

 * So eventually $\text{MSE}[\overline{O}_\rho]$ will be limited by the first
   term, the bias of the estimator $\text{Bias}[\overline{O}_\rho]$ (its
   magnitude, to be clear), because this is a systematic error that cannot be
   reduced by increasing the number of circuit runs.
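
A quick numerical illustration of this bias floor (a minimal sketch with an
assumed Gaussian toy model, not hardware data):

```python
# Monte Carlo check that MSE = Bias^2 + Var, and that only Var decays with N_cir.
import numpy as np

rng = np.random.default_rng(0)
true_value = 1.0      # stands in for Tr[rho_0 O]
noisy_mean = 0.9      # stands in for Tr[rho O] (noise-induced systematic bias)
shot_noise = 0.5      # per-shot standard deviation

for N_cir in (10, 100, 1000):
    # many independent O_bar estimates, each the mean of N_cir single shots
    estimates = rng.normal(noisy_mean, shot_noise, (100_000, N_cir)).mean(axis=1)
    mse = np.mean((estimates - true_value) ** 2)
    bias, var = estimates.mean() - true_value, estimates.var()
    print(f"N_cir={N_cir:5d}  MSE={mse:.5f}  Bias^2={bias**2:.5f}  Var={var:.5f}")
# Var shrinks as 1/N_cir; Bias^2 ≈ 0.01 stays: the floor that QEM targets.
```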

11


(APPENDIX) AFTER APPLYING QEM

We can construct $\overline{O}_\text{em}$ (error-mitigated) using the same
number of circuit runs $N_\text{cir}$ to achieve:

 * smaller bias than the original naive estimator $\overline{O}_\rho$:
   magnitude of $\text{Bias}[\overline{O}_\text{em}] \leq$ magnitude of
   $\text{Bias}[\overline{O}_\rho]$.

 * but usually, inevitably, larger variance:
   $\text{Var}[\overline{O}_\text{em}] \geq \text{Var}[\overline{O}_\rho]$,
   because we usually extract the useful part from the noise to reduce bias,
   so the error-mitigated estimator is also more sensitive to noise.

Here we post-define $\hat O_\text{em}$ to be a "one-shot" error-mitigated
estimator satisfying
$\text{E}[\hat O_\text{em}] = \text{E}[\overline{O}_\text{em}]$ and
$\text{Var}[\hat{O}_\text{em}] = N_\text{cir}\times\text{Var}[\overline{O}_\text{em}]$,
so that:

$$C_\text{em} = \frac{N^\epsilon_\text{shot}(\hat O_\text{em})}{N^\epsilon_\text{shot}(\hat O_\rho)} = \frac{\text{Var}[\hat O_\text{em}]}{\text{Var}[\hat O_\rho]}$$

12


(APPENDIX) AN ALTERNATIVE WAY TO QUANTIFY $C_\text{em}$

If we know the range of a random variable $\hat X$ (the difference between
the maximum and minimum possible values taken by $\hat X$), denoted
$R[\hat X]$, we can plug this range into Hoeffding’s inequality to get the
required sample size:

To guarantee an estimate of $\text{E}[\hat X]$ to $\epsilon$-precision with
probability $(1-\delta)$, we need:

$$N^{\epsilon,\delta}_\text{hff}(\hat X) = \frac{\ln(2/\delta)}{2\epsilon^2} R[\hat X]^2$$

This way, our previously defined sampling overhead can be approximated as:

$$C_\text{em} \approx \frac{N^{\epsilon,\delta}_\text{hff}(\hat O_\text{em})}{N^{\epsilon,\delta}_\text{hff}(\hat O_\rho)} = \frac{R[\hat O_\text{em}]^2}{R[\hat O_\rho]^2}$$
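
A quick numerical sketch of this bound (the precision target, the ranges, and
the one-norm $\gamma$ below are assumed toy numbers):

```python
import numpy as np

def n_hoeffding(eps, delta, value_range):
    """Shots guaranteeing an estimate of E[X] to eps-precision with
    probability >= 1 - delta, for a variable with range R[X] = value_range."""
    return np.log(2.0 / delta) / (2.0 * eps**2) * value_range**2

# Raw Pauli estimator: outcomes in [-1, 1], so R = 2.
# A PEC-style mitigated estimator stretches the range by its one-norm gamma.
gamma = 1.5
n_raw = n_hoeffding(eps=0.01, delta=0.05, value_range=2.0)
n_em  = n_hoeffding(eps=0.01, delta=0.05, value_range=2.0 * gamma)
print(f"N_raw ≈ {n_raw:.0f}, N_em ≈ {n_em:.0f}, C_em ≈ {n_em/n_raw:.2f} = gamma^2")
```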

13


(APPENDIX) MODELING THE ERRORS

Similar to QEC, we model noise as discrete, probabilistic events called
faults. Faults can occur at various locations in the circuit; locations
include gates, idling steps, and measurements.


QUANTIFYING THE ERRORS

Assume all locations are afflicted with Pauli noise, and that the Pauli noise
at location $f$ occurs with probability $p_f$.

 * If error events at all locations are independent, then the probability
   that no fault occurs at all is $P_0 = \prod_f (1-p_f)$ → the fault-free
   probability of the circuit.

14


(APPENDIX) AN ALTERNATIVE WAY TO QUANTIFY ERRORS:

Run the circuit once and count how many faults occur. Then average this
number over many circuit runs. This average is defined as the circuit fault
rate $\lambda = \sum_f p_f$.

 * Simple case: all $p_f$ are the same ($p$); then $\lambda = Mp$ for $M$
   locations in total. If $M$ is large, we can model "$k$ errors occur during
   one run" using the Poisson distribution: $P_k = e^{-\lambda}\lambda^k/k!$

   Then the fault-free probability is just the special case $k=0$:
   $P_0 = e^{-\lambda}$.

> All of these arguments still apply if we define ($1-p_f$) as the
> coefficient of the error-free part of the Kraus representation of some
> general noise. More generally, we can always apply Pauli twirling to
> transform all noise into Pauli noise so that our arguments become valid.
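
A quick numerical check of this model (toy values for $M$ and $p$ assumed):

```python
from math import comb, exp, factorial

M, p = 1000, 0.002
lam = M * p                                   # circuit fault rate lambda = M p

for k in range(4):
    exact   = comb(M, k) * p**k * (1 - p)**(M - k)  # exactly k faults among M
    poisson = exp(-lam) * lam**k / factorial(k)     # large-M approximation P_k
    print(f"k={k}:  exact={exact:.4f}  Poisson={poisson:.4f}")
# fault-free probability: P_0 = (1-p)^M ≈ e^{-lambda} ≈ 0.135 here
```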

15


(APPENDIX) BEHAVIOR OF SAMPLING OVERHEAD


> INEVITABILITY
>
> Even if we could always magically distinguish all perfect circuit runs
> (which occur with probability $P_0 = e^{-\lambda}$) from the remaining
> "flawed" runs, we would still need exponentially many circuit runs to
> collect enough "perfect" runs to reach $\epsilon$-level shot noise.

As of today, we don’t have magic. So instead of impractically extracting the
error-free state $\rho_0$, we focus on how to construct error-mitigated
estimators.


QEC VS QEM

For a given noisy circuit with circuit fault rate $\lambda$:

 * QEC: active correction → reduces $\lambda$ in each circuit run

 * QEM: relies on post-processing the outputs from an ensemble of circuit
   runs, each run with circuit fault rate $\geq\lambda$.

16


NOW LET’S SEE (AT LEAST 3) WIDELY-USED ERROR MITIGATION TECHNIQUES

17


ZERO-NOISE EXTRAPOLATION 📄, 📄


IDEA:

If we can tune our circuit fault rate $\lambda$
→ we obtain a series of different states $\rho_\lambda$ at different $\lambda$
→ view $\text{Tr}[\rho_\lambda O]$ as a function of $\lambda$
→ the ideal expectation value is obtained by substituting $\lambda=0$ into
the function.


CANONICAL DEFINITION:

 1. Model the trend of the data using a parametrized function
    $f(\lambda; \vec\theta)$
 2. Once we’re done fitting, we get a set of optimal parameters $\vec\theta^*$
 3. The error-mitigated estimate of the zero-noise output is then
    $\text{E}[\hat O_\text{em}] \equiv f(0; \vec\theta^*)$.



18


ZNE: OPERATIONAL DEGREE OF FREEDOM (1/2)

How do we scale the noise (circuit fault rate $\lambda$)?

 1. Under the assumption that the noise was time-invariant, the noise could be
    effectively amplified by stretching and recalibrating the gate pulses.

 * On superconducting platform. Kandala et al., 2019
 * For two-qubit cross-resonance gates. Chow et al., 2011

 2. Or, by inserting sequences of redundant gates that are equivalent to the
    identity (if operated noiselessly).
    Examples: global folding, local folding, identity insertion

 * Cloud quantum computing. Dumitrescu et al., 2018
 * Digital zero noise extrapolation. Giurgica-Tiron et al., 2020
 * Identity insertions. He et al., 2020

19


ZNE: OPERATIONAL DEGREE OF FREEDOM (2/2)

How do we fit the {noise, expectation value} data?

For sufficiently small $\lambda$ (which is usually the case) → Richardson
extrapolation (a kind of "truncated Taylor expansion"):

$$\text{Tr}[\rho_{\lambda} O] \approx f(\lambda; \vec\theta) = \sum_{k=0}^{M-1} \theta_k \frac{\lambda^k}{k!}$$

which is an $(M-1)^\text{th}$-order polynomial with $M$ coefficients (or
degrees of freedom).

→ simplest case: $M=2$, which is just linear extrapolation

→ if we first assign a value for $M$ (related to how sophisticated we want
the model $f$ to be), then we need at least $M$ data points (in the spirit of
Lagrange polynomials) to obtain our error-mitigated estimate:

$$\text{E}[\hat O_\text{em}] = \theta_0^* = \sum_{m=1}^M \left(\text{Tr}[\rho_{\lambda_m}O] \prod_{k\neq m} \frac{\lambda_k-0}{\lambda_k-\lambda_m} \right)$$

20


ZNE: PERFORMANCE ↔︎ SAMPLING OVERHEAD TRADE-OFFS


PERFORMANCE

 * The bias of $\text{E}[\hat O_\text{em}]$:
   → originates from our omission of higher-degree terms in
   $f(\lambda; \vec\theta)$.
   → magnitude of $\text{Bias}[\hat O_\text{em}] = \mathcal{O}(\lambda^M)$
   for $M$ data points.

 * Other ways to fit the dataset:
   linear fit, polynomial fit, exponential fit, poly-exponential fit,
   adaptive-exponential fit, …

 * Theoretically, ZNE cannot completely remove the bias, but its cost is
   relatively cheap compared to other QEM methods.


SAMPLING OVERHEAD

 * If we use the Monte Carlo method (because $\text{E}[\hat O_\text{em}]$ is
   a linear combination of the set of noisy expectation values), the overhead
   for (equal-gap) Richardson extrapolation is:

   $$C_\text{em} \sim \left( \sum_{m=1}^M \left| \prod_{k\neq m} \frac{\lambda_k-0}{\lambda_k-\lambda_m} \right| \right)^2\sim (2^M-1)^2$$

   → grows exponentially with the number of data points $M$.

 * We can choose between different fitting functions to reach an optimal
   result and a more favorable sampling overhead.


21


ZNE: QUANTUM ADVANTAGE V.S. LIMITATIONS


EXPERIMENTAL REALIZATIONS

 * Kim et al. 2023 📄 : 26-qubit simulation of the 2D transverse field Ising
   model in the limit of strong entanglement
   
   → result is competitive in comparison to standard tensor network methods.

 * Kim et al. 2023 📄 : simulated the time dynamics of the Ising model up to a
   circuit size of 127 qubits and 60 layers of 2-qubit gates
   
   → result in agreement with state-of-the-art classical simulations.

 * cloud platforms (2020 experiments): 📄, 📄, 📄, … and more.


LIMITATIONS

 * Need the ability to "scale the noise"

 * Noise model might change under different noise strength

 * Theoretically cannot remove the entire bias
   → but can get very close to 0.


22


PROBABILISTIC ERROR CANCELLATION 📄 (1/2)




IDEA:

→ we can write the noise-free expectation value as a linear combination of
noisy expectation values.
→ this is a quasi-probability decomposition.
→ it can be sampled using a Monte Carlo procedure.

23


PROBABILISTIC ERROR CANCELLATION 📄 (2/2)


FORMAL DEFINITION

 * We want to implement an ideal channel $\mathcal U$, but with a set of
   noisy basis operations $\{\mathcal B_n\}$, since we can only perform these
   noisy $\{\mathcal B_n\}$ on our hardware.

 * For a sufficiently large basis, we can decompose the ideal channel as
   $\mathcal U = \sum_n\alpha_n \mathcal B_n$ with real coefficients
   $\alpha_n$.
   $\alpha_n$ can often be negative → this can be an unphysical map, but that
   doesn’t matter.

 * Because we can express the ideal expectation value of any observable $O$
   as $\text{Tr}[O\rho] = \langle\!\langle O \vert \rho \rangle\!\rangle$ and
   decompose it into:

   $$\langle\!\langle O \vert \rho \rangle\!\rangle = \langle\!\langle O \vert \mathcal U \vert \rho_0 \rangle\!\rangle = \sum_n\alpha_n\langle\!\langle O \vert \mathcal B_n \vert \rho_0 \rangle\!\rangle$$

 * Therefore, in practice, we can sample each
   $\langle\!\langle O \vert \mathcal B_n \vert \rho_0 \rangle\!\rangle$ term
   with probability $|\alpha_n|/\gamma$, where $\gamma = \sum_n|\alpha_n|$
   (the one-norm). And don’t forget to multiply each sampled term by
   $\text{sign}(\alpha_n)\cdot\gamma$ before adding everything up.
   A Monte Carlo procedure would usually do the sampling work above. The
   sampling overhead in this case is
   $C_\text{em}\sim \gamma^2 = (\sum_n|\alpha_n|)^2$.
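
A minimal sketch of this sampling loop (`run_noisy_basis` and the toy values
`mu` below are assumed stand-ins for executing the noisy basis circuits
$\mathcal B_n$ on hardware):

```python
import numpy as np

rng = np.random.default_rng(1)

def pec_estimate(alphas, run_noisy_basis, n_shots):
    """Monte Carlo estimate of sum_n alpha_n <<O|B_n|rho_0>>."""
    alphas = np.asarray(alphas, float)
    gamma = np.abs(alphas).sum()      # one-norm -> overhead C_em ~ gamma^2
    probs = np.abs(alphas) / gamma    # sample term n with prob |alpha_n|/gamma
    total = 0.0
    for _ in range(n_shots):
        n = rng.choice(len(alphas), p=probs)
        total += np.sign(alphas[n]) * gamma * run_noisy_basis(n)
    return total / n_shots

# Toy stand-in: two basis "circuits" with known noisy expectations mu_n,
# each shot returning +/-1.  Ideal value = sum_n alpha_n mu_n = 0.76.
mu = np.array([0.6, -0.2])
alphas = np.array([1.2, -0.2])
run = lambda n: rng.choice([1.0, -1.0], p=[(1 + mu[n]) / 2, (1 - mu[n]) / 2])
print(pec_estimate(alphas, run, n_shots=100_000))   # ≈ 0.76
```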

24


(APPENDIX) PEC: FROM SINGLE GATE TO MULTIPLE GATES

Any circuit we want to implement can be decomposed into a sequence of $M$
ideal gates $\{\mathcal U_m\}$, written as

$$\langle\!\langle O \vert \rho \rangle\!\rangle = \langle\!\langle O \vert \prod_{m=1}^M \mathcal U_m \vert \rho_0 \rangle\!\rangle$$

therefore the decomposition is:

$$\begin{align*}\langle\!\langle O \vert \rho \rangle\!\rangle &= \langle\!\langle O \vert \prod_{m=1}^M \left( \sum_{n_m} \alpha_{m, n_m} \mathcal B_{n_m} \right) \vert \rho_0 \rangle\!\rangle \\ &= \sum_{\vec n} \alpha_{\vec n}\langle\!\langle O \vert \mathcal B_{\vec n} \vert \rho_0 \rangle\!\rangle \end{align*}$$

where $\vec n = \{n_1, n_2, \cdots, n_M\}$ is the set of labels for a
sequence of basis elements, and we’ve used the definitions
$\mathcal B_{\vec n} = \prod_{m=1}^M \mathcal B_{n_m}$,
$\alpha_{\vec n} = \prod_{m=1}^M \alpha_{m,n_m}$ and, accordingly,
$\gamma = \sum_{\vec n}|\alpha_{\vec{n}}|$.

25


PEC: A SIMPLE EXAMPLE (1/2)

Suppose our basis operations only suffer from the single-qubit depolarizing
channel
$\rho \mapsto (1-p)\rho + \frac p3 X\rho X+ \frac p3 Y\rho Y+ \frac p3 Z\rho Z$.

Then our ideal Hadamard gate $\text H$ can be expanded into noisy operations:

$$\begin{align*} (\text H)_\text{ideal} &= \alpha_1 (\text H)_\text{noisy} \\ &+\alpha_2 (\text X\text H)_\text{noisy} \\ &+\alpha_3 (\text Y\text H)_\text{noisy} \\ &+\alpha_4 (\text Z\text H)_\text{noisy} \end{align*}$$

> We only want the pure noiseless $\text H$ (with coefficient 1) in the end.
> This gives us equations of constraint to satisfy:
> $\alpha_1(1-p) + (\alpha_2+\alpha_3+\alpha_4)(p/3) = 1$, and
> $\alpha_1(p/3) + \alpha_2(1-p) + (\alpha_3+\alpha_4)(p/3) = 0$ (and
> likewise for $\alpha_3$ and $\alpha_4$).


26


PEC: A SIMPLE EXAMPLE (2/2)

(Same setup as the previous slide: basis operations suffering single-qubit
depolarizing noise, and the ideal Hadamard expanded into the four noisy
operations with coefficients $\alpha_1,\dots,\alpha_4$.)

 * If our noise level is $p=0.1$, then
   $(\alpha_1, \alpha_2, \alpha_3, \alpha_4) = (\frac{29}{26}, \frac{-1}{26}, \frac{-1}{26}, \frac{-1}{26}) \approx (1.115, -0.038, -0.038, -0.038)$.
 * If our noise level is $p=0.2$, then
   $(\alpha_1, \alpha_2, \alpha_3, \alpha_4) = (\frac{14}{11}, \frac{-1}{11}, \frac{-1}{11}, \frac{-1}{11}) \approx (1.273, -0.091, -0.091, -0.091)$.

> We can see from this simple case that the higher the noise level, the
> higher the one-norm $\gamma = |\alpha_1|+|\alpha_2|+|\alpha_3|+|\alpha_4|$,
> indicating a higher sampling overhead $\sim\gamma^2$.

 * How do we sample with negative probability?

   e.g., for $\alpha_2=-0.038$, we just sample
   $(\text X\text H)_\text{noisy}$ with probability
   $\frac{0.038}{1.115+0.038+0.038+0.038}$, and finally add the negative of
   the circuit output value to the total expectation value.
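
A minimal sketch reproducing the coefficients above for any $p$, by solving
the full $4\times4$ constraint system of the depolarizing example (an
illustration of this one example, not a general PEC compiler):

```python
import numpy as np

def pec_alphas(p):
    """Solve for (alpha_1..alpha_4) in the depolarizing-Hadamard example.
    Column j = noisy op (H, XH, YH, ZH); row i = ideal term it feeds:
    each noisy op keeps its own ideal term with weight 1-p and leaks p/3
    into each of the other three (an extra Pauli just permutes the terms)."""
    M = np.full((4, 4), p / 3)
    np.fill_diagonal(M, 1 - p)
    return np.linalg.solve(M, np.array([1.0, 0.0, 0.0, 0.0]))  # want pure H only

for p in (0.1, 0.2):
    a = pec_alphas(p)
    print(f"p={p}: alphas ≈ {np.round(a, 3)},  gamma = {np.abs(a).sum():.3f}")
# p=0.1 -> [ 1.115 -0.038 -0.038 -0.038];  p=0.2 -> [ 1.273 -0.091 -0.091 -0.091]
```

As the printout shows, $\gamma$ (and hence the overhead $\gamma^2$) grows
with the noise level $p$, matching the observation above.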

27


PEC: PERFORMANCE ↔︎ SAMPLING OVERHEAD TRADE-OFFS


PERFORMANCE

 * Theoretically, PEC can completely remove the bias,
   but its cost is also very expensive.

 * It can also tackle correlated noise: 📄

   → by considering the noise model in terms of a sparse Pauli-Lindbladian
   $\mathcal L(\rho) = \sum_k\lambda_k(P_k\rho P_k - \rho)$ for a set of
   Pauli matrices $P_k$, which can be learned efficiently for a polynomial
   number of Pauli terms.

 * It can also use variational circuits to construct the basis set: 📄


SAMPLING OVERHEAD

 * If we assume all the gates suffer from Pauli noise with the same error
   rate $p$, the circuit size $M$ is large, and the circuit fault rate
   $\lambda = Mp$ is finite, then

   $$C_\text{em} \sim \prod_{m=1}^M \left( \frac{1+p}{1-p} \right)^2\sim e^{4\lambda}$$

   → grows exponentially with the circuit fault rate $\lambda$ (i.e., with
   circuit size $M$ at fixed $p$).

 * The sampling overhead of probabilistic error cancellation is highly
   dependent on the basis we choose.


28


PEC: QUANTUM ADVANTAGE V.S. LIMITATIONS


EXPERIMENTAL REALIZATIONS

 * Song et al. 2019 📄
   
   → on superconducting platform.

 * Zhang et al. 2020 📄
   
   → on trapped-ion platform.

 * Can also be applied to continuous noise processes, and perform partial
   mitigation (by expanding the noise process into a perturbation series):
   Sun et al. 2021 📄, Hama and Nishi. 2022 📄

 * Hakoshima et al., 2021 📄: use the non-Markovianity in noise to reduce the
   sampling cost.


LIMITATIONS

 * Need to have a set of basis elements that span the ideal operations, and full
   characterisation of this basis.

 * Need to learn the hardware-specific noise model.

 * The sampling overhead blows up even if the hardware error rate increases
   only mildly.

 * PEC is more suitable for Pauli noise (which general noise can be converted
   into via twirling).


29


MEASUREMENT ERROR MITIGATION 📄, 📄 (1/4)




IDEA:

 * Ideal Measurement

   * measure the circuit → obtain a binary string output $x\in \{0,1\}^N$
   * ideally: we want to perform projective measurements in the computational
     basis $\{\langle\!\langle x\vert\}$.

 * Real-World Measurement

   * some measurement noise $\mathcal A$ occurs
   * it turns the projective measurement into a POVM
     $\{\langle\!\langle E_x\vert\}=\{\langle\!\langle x\vert \mathcal A\}$ →
     this leads to different output statistics.

     > physical origins of readout errors
     >
     >  * superconducting: thermal excitations, $T_1$-decay.
     >  * ion traps: hard to detect an ion’s dark state, collisions in the
     >    trap
     >    → it’s better (or sufficient) to construct a simplified noise model
     >    that is agnostic of the actual origin of the noise.

   * the goal is to invert $\mathcal A$, and thus restore the ideal
     measurement statistics.

30


MEASUREMENT ERROR MITIGATION 📄, 📄 (2/4)


FORMAL DEFINITION: STEP 1. NOISE MODEL

 * Assume: the noise channel $\mathcal A$ has the computational subspace
   $\{ \langle\!\langle x\vert,\ x\in \{0,1\}^N \}$ as its invariant subspace.
   → the resulting POVM basis $\{\langle\!\langle E_x\vert\}$ lives within
   the computational basis.

 * So, we can decompose the POVM in this way:

   $$\begin{align*} \langle\!\langle E_x\vert &= \sum_y \langle\!\langle E_x\vert y \rangle\!\rangle \langle\!\langle y\vert \\ &= \sum_y \langle\!\langle x\vert \mathcal A\vert y \rangle\!\rangle \langle\!\langle y\vert \\ &= \sum_y A_{xy}\langle\!\langle y\vert . \end{align*}$$

 * The coefficient matrix $A_{xy}$ here is the assignment matrix
   (sometimes called the confusion matrix)

   > The coefficient $\langle\!\langle x\vert \mathcal A\vert y \rangle\!\rangle$
   > represents the probability of originally measuring outcome $y$ before
   > the noise, but measuring outcome $x$ after the noise.

 * so $A$ is a transition (or "stochastic") matrix

   > How do we actually get the coefficient
   > $\langle\!\langle x\vert \mathcal A\vert y \rangle\!\rangle$?
   > Ans: we just do the noisy measurement on the computational state
   > $\vert y \rangle\!\rangle$ and see what the probability of outcome $x$
   > is.

31


MEASUREMENT ERROR MITIGATION 📄, 📄 (3/4)


FORMAL DEFINITION: STEP 2. ASSIGNMENT MATRIX




 * normal condition for inversion:

   > If the matrix $A$ is full-rank, then we can do the reverse: map noisy
   > outcomes back to noiseless outcomes,
   > $\langle\!\langle y\vert = \sum_x A_{yx}^{-1}\langle\!\langle E_x\vert$
   > → we restore the perfect outcome by linearly combining the noisy
   > outcomes.


32


MEASUREMENT ERROR MITIGATION 📄, 📄 (4/4)


FORMAL DEFINITION: STEP 3. MATRIX INVERSION AND RESTORING THE IDEAL
EXPECTATION VALUE

 * For our incoming state $\rho$:

   * noiseless measurement gives the output distribution
     $\vec{p_0} = \{\langle\!\langle y\vert \rho \rangle\!\rangle\}$

   * noisy measurement gives the output distribution
     $\vec{p_\text{noisy}} = \{\langle\!\langle E_x\vert \rho \rangle\!\rangle\}$

   → $\vec{p_\text{noisy}} = A \vec{p_0}$, hence
   $\vec{p_0} = A^{-1} \vec{p_\text{noisy}}$.

 * If we want to measure an observable $O$ with spectrum $\vec O = \{O_x\}$
   (i.e., $\langle\!\langle O\vert = \sum_x O_x \langle\!\langle x\vert$),
   then once we know (or obtain) the assignment matrix $A$ and the noisy
   output distribution $\vec{p_\text{noisy}}$, we can restore the noiseless
   expectation value:

$$\langle\!\langle O\vert \rho \rangle\!\rangle = \vec{O}^T \vec{p_0} = \vec{O}^T A^{-1} \vec{p_\text{noisy}}$$
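
A minimal end-to-end sketch of this inversion for $n=2$ qubits, with an
assumed independent bit-flip readout model standing in for the calibrated
assignment matrix:

```python
import numpy as np

eps = [0.02, 0.05]                       # assumed per-qubit readout flip rates

# A[x, y] = P(read x | prepared y): tensor product of single-qubit models.
# In practice each column of A comes from calibration runs on |y>.
A = np.array([[1.0]])
for e in eps:
    A = np.kron(A, np.array([[1 - e, e],
                             [e, 1 - e]]))

p_ideal = np.array([0.5, 0.0, 0.0, 0.5])   # assumed ideal Bell-state statistics
p_noisy = A @ p_ideal                       # what the device would report

O = np.array([1.0, -1.0, -1.0, 1.0])        # spectrum of Z (tensor) Z
print("noisy     <ZZ> =", O @ p_noisy)                      # ≈ 0.864
print("mitigated <ZZ> =", O @ np.linalg.inv(A) @ p_noisy)   # = O^T A^{-1} p_noisy = 1.0
```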

33
34


MEM: WORKAROUND FOR LARGE-SIZED $A_{xy}$ (2/3)


A MORE REALISTIC APPROACH:

Bravyi et al. construct the assignment matrix from continuous-time Markov
processes whose generators (transition-rate matrices) $\{G_i\}$ are single-
and two-qubit operators:

$$A = e^G \;\text{ with }\; G = \sum_{i=1}^{2N^2} r_iG_i$$

> Why are we summing up to $2N^2$?
> Because for the generators we can just pick $\{\sigma_x, \sigma_y\}$.
> Therefore the number of single-qubit operators → $2N$,
> and the number of two-qubit operators → $\frac{N(N-1)}{2} \cdot 2^2$ (pick
> $2$ qubits from $N$, then assign each of them either $\sigma_x$ or
> $\sigma_y$).
> Adding the two numbers up, we get $2N^2$.

 * Once we’ve obtained all $2N^2$ positive coefficients $\{r_i\}$, we can
   simply calculate $A^{-1} = e^{-G}$.

35


MEM: WORKAROUND FOR LARGE-SIZED $A_{xy}$ (3/3)

Introducing Pauli twirling: simplifying the process 📄, 📄

After twirling, we can turn the original noise channel $\mathcal A$ into a
Pauli channel $\mathcal D$; i.e., by twirling we remove all the off-diagonal
elements of $A$ (in the Pauli basis).

 * Before twirling: ideally we want to measure in
   $\langle\!\langle \sigma_z^x\vert$ ($x$ is the bit string marking the
   qubits acted on by $\sigma_z$), but because of the noise, we can only
   measure in $\langle\!\langle \sigma_z^x\vert\mathcal A$.
 * After twirling: $\mathcal A$ becomes $\mathcal D$, and what’s special is
   that $\langle\!\langle \sigma_z^x\vert\mathcal D = D_x\langle\!\langle \sigma_z^x\vert$
   where $D_x = \langle\!\langle \sigma_z^x\vert\mathcal D \vert\sigma_z^x\rangle\!\rangle$
   → the noisy measurement is just a rescaling of the noiseless measurement.

For a given input state $\vert \rho \rangle\!\rangle$, we simply have:

$$\underbrace{\langle\!\langle \sigma_z^x\vert \rho \rangle\!\rangle}_{\text{ideal}} = D_x^{-1} \underbrace{\langle\!\langle \sigma_z^x\vert\mathcal D\vert \rho \rangle\!\rangle}_{\text{noisy, but also twirled}}$$

> What’s more awesome: for Pauli twirling in this case, we only need to
> conjugate the original noise channel $\mathcal A$ with $\{I, \sigma_x\}$
> instead of $\{I, \sigma_x, \sigma_y, \sigma_z\}$, because conjugating
> $\mathcal A$ with $\sigma_y$ or $\sigma_z$ has only a trivial effect within
> the computational subspace.

36
37
38


PURITY CONSTRAINTS


ORIGIN

 * Many quantum algorithms target the preparation of an ideal state $\rho_0$
   that is pure: $\rho_0 = \ket{\psi_0}\bra{\psi_0}$.

 * Commonly seen noise channels are stochastic, and will turn our ideal pure
   state $\rho_0$ into some noisy mixed state $\rho$.

 * So our goal is to find (or approximate) the closest pure state to $\rho$.

 * We can take the spectral decomposition of the $N$-qubit state
   $\rho = \sum_{i=1}^{2^N} p_i \ket{\phi_i}\bra{\phi_i}$, where the
   eigenvalues are in order ($p_1\geq p_2 \geq p_3 \geq \cdots$); then our
   goal is to seek $\ket{\phi_1}\bra{\phi_1}$.

 * There’s already a way to do this: quantum principal component analysis →
   sample from the eigenbasis of $\rho$. But the circuit associated with this
   method is too deep for near-term devices → we need alternative strategies.

39


VIRTUAL DISTILLATION / ERROR SUPPRESSION BY DERANGEMENT (VD / ESD)

 * By collectively measuring $M$ copies of $\rho$ (which is noisy), we infer
   expectation values w.r.t. the $M^\text{th}$-degree purified state:

   $$\rho_\text{pur}^{(M)} = \frac{\rho^M}{\text{Tr}[\rho^M]} = \frac{1}{\sum_{i=1}^{2^N} p_i^M } \sum_{i=1}^{2^N} p_i^M \ket{\phi_i}\bra{\phi_i}$$

 * As $M\rightarrow\infty$,
   $\rho_\text{pur}^{(M)} \rightarrow\ket{\phi_1}\bra{\phi_1}$. The remaining
   bias in the estimator is called the coherent mismatch or noise floor.

 * How do we measure to obtain $\text{Tr}[O\rho_\text{pur}^{(M)}]$? Notice
   from the formula above that we have:

   $$\text{Tr}[O\rho_\text{pur}^{(M)}] = \frac{\text{Tr}[O\rho^M]}{\text{Tr}[\rho^M]}$$

   → so we can separately measure $\text{Tr}[O\rho^M]$ and
   $\text{Tr}[\rho^M]$ and divide one by the other.
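
A minimal classical sanity check of this formula (plain density-matrix
arithmetic on an assumed depolarized $|+\rangle$ state, not the copy-based
measurement protocol):

```python
import numpy as np

plus = np.array([1.0, 1.0]) / np.sqrt(2)
rho0 = np.outer(plus, plus)                   # ideal pure state |+><+|
rho = 0.9 * rho0 + 0.1 * np.eye(2) / 2        # assumed depolarized version
X = np.array([[0.0, 1.0], [1.0, 0.0]])        # observable O, ideal <X> = 1

for M in (1, 2, 3, 5):
    rhoM = np.linalg.matrix_power(rho, M)
    est = np.trace(X @ rhoM) / np.trace(rhoM)  # Tr[O rho^M] / Tr[rho^M]
    print(f"M={M}:  {est:.6f}")
# bias shrinks rapidly with M (0.9, 0.9945, 0.9997, ...); here the dominant
# eigenvector is exactly |+>, so there is no coherent mismatch left over.
```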

40


HOW TO MEASURE $\text{Tr}[O\rho^M]$?

(and $\text{Tr}[\rho^M]$, of course)

Let $S_M$ be the cyclic permutation operator between $M$ copies of $\rho$;
then $\text{Tr}[O\rho^M]$ is just $\text{Tr}[S_M O_m\rho^{\otimes M}]$, or
$\text{Tr}[S_M\bar O\rho^{\otimes M}]$ where $\bar O = \frac1M\sum_m O_m$
(the observable symmetrized under copy permutation); (circuit diagram for the
$M=3$ case not captured here).

And for $\text{Tr}[\rho^M]$, we simply remove $O$ from the above diagram.


41
42


SUBSPACE EXPANSIONS 📄, 📄, 📄

One class of methods that can take advantage of knowledge of the problem is
quantum subspace expansion.


PROBLEM FORMULATION

 * e.g., optimization or state preparation can be phrased as the desire to
   find some $\ket{\psi}$ minimizing the objective function
   $\bra{\psi} H \ket{\psi}$, with an already-known $H$.

 * but we cannot directly prepare the perfect $\ket{\psi}$ (due to some
   coherent error in the circuit, or simply not knowing what the ideal
   circuit should be)

 * the good news is that we can easily prepare a set of $M$ different states
   $\{\ket{\phi_i}\}$ which are linearly independent (but not necessarily
   orthogonal to each other), and express our desired state as a linear
   combination of them:

   $$\ket{\psi_{\vec w}} = \sum_{i=0}^{M-1} w_i\ket{\phi_i}$$

43

 * Dual problem: it’s equivalent to finding the optimal set of coefficients
   $\vec w^*$ such that:

   $$\vec w^*= \arg\min_{\vec w} \bra{\psi_{\vec w}} H \ket{\psi_{\vec w}} \;\; \text{ s.t. }\;\; \braket{\psi_{\vec w}|\psi_{\vec w}} =1$$

 * This is a generalized linear eigenvalue problem for the pair
   $(\bar H, \bar S)$:

   $$\bar H W = \bar S W E$$

   where $\bar H_{ij} = \bra{\phi_i} H \ket{\phi_j}$ and
   $\bar S_{ij} = \braket{\phi_i | \phi_j}$. In plain English, $\bar H$ is
   just $H$ in the $\{\ket{\phi_i}\}$-basis (hence with dimension
   $M\times M$), and $\bar S$ is just the overlap matrix of the basis set
   $\{\ket{\phi_i}\}$. (A sketch of the classical solve follows the note
   below.)

generalized linear eigenvalue problem

Eigenvalue problem:
$\bm A \vec\phi_i = \lambda_i \vec\phi_i$, or in matrix form:
$\bm A \bm\Phi = \bm\Phi \bm\Lambda$ (the matrix $\bm\Lambda$ is diagonal).

Generalized linear eigenvalue problem:
$\bm A \vec\phi_i = \lambda_i \bm B\vec\phi_i$, or in matrix form:
$\bm A \bm\Phi = \bm B \bm\Phi \bm\Lambda$.
→ this reduces to the previous, simpler problem when $\bm B$ is the identity
matrix.
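
A minimal sketch of the classical solve (the toy Hamiltonian and basis states
below are assumed):

```python
import numpy as np
from scipy.linalg import eigh

H = np.diag([0.0, 1.0, 3.0])              # assumed toy Hamiltonian, true E0 = 0
phi = [np.array([0.9, 0.4, 0.2]),         # easy-to-prepare, linearly
       np.array([0.2, 0.9, 0.4])]         # independent, non-orthogonal states
phi = [v / np.linalg.norm(v) for v in phi]

M = len(phi)
Hbar = np.array([[phi[i] @ H @ phi[j] for j in range(M)] for i in range(M)])
Sbar = np.array([[phi[i] @ phi[j] for j in range(M)] for i in range(M)])

E, W = eigh(Hbar, Sbar)        # generalized eigenproblem Hbar W = Sbar W E
w_star = W[:, 0]               # coefficients for the lowest eigenvalue
print("subspace ground energy:", E[0])                     # variational: >= 0
print("best single basis state:", min(Hbar[0, 0], Hbar[1, 1]))
```

The subspace estimate can only improve on the best individual $\ket{\phi_i}$,
since the span contains each basis state itself.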

44


ILLUSTRATION OF SUBSPACE EXPANSION ROUTINE

in the next slide:
calculating the code Hamiltonian matrix $\bar H_{ij} = \bra{\phi_i} H \ket{\phi_j}$
and the overlap matrix $\bar S_{ij} = \braket{\phi_i | \phi_j}$

45
46


FINDING $\vec w^*$

 * We know $H$, and we also know $\{\ket{\phi_i}\}$, so we can formulate the
   above problem explicitly.

 * And this problem (a generalized linear eigenvalue problem) can be solved
   classically (on a classical computer), obtaining the matrices $W$ and $E$
   (the "eigenvector matrix" and "eigenvalue matrix").

 * Then what we want is the lowest eigenvalue in $E$ (not yet!), and its
   associated eigenvector (in $W$) is the optimal combination of coefficients
   $\vec w^*$.

47
48


NEXT UP

What are the costs (sample complexity) of these techniques?

49


QUANTUM FISHER INFORMATION

 * $p(x|\lambda)$ → the conditional probability of getting $x$ as the
   measurement result when the parameter is $\lambda$; then the Fisher
   information is:

$$F(\lambda) = \int p(x|\lambda) \left( \frac{\partial \ln p(x|\lambda)}{\partial \lambda} \right)^2 dx = \int \frac{1}{p(x|\lambda)} \left( \frac{\partial p(x|\lambda)}{\partial \lambda} \right)^2 dx$$

 * Now introduce a POVM $\{\Pi_x\}$ which obeys completeness,
   $\int\Pi_x\, dx=I$, and for our later convenience define the symmetric
   logarithmic derivative (SLD) $L_\lambda$ implicitly as the Hermitian
   operator satisfying:

   $$\frac{L_\lambda\rho_\lambda + \rho_\lambda L_\lambda}{2} = \frac{\partial\rho_\lambda}{\partial\lambda}$$

   * the naming origin of SLD:

     symmetric: $L_\lambda\rho_\lambda + \rho_\lambda L_\lambda$ → obvious.
     logarithmic derivative: dividing both sides by $\rho_\lambda$, we
     roughly get
     $L_\lambda := \frac{\partial\ln\rho_\lambda}{\partial\lambda}$.
   * This is a Lyapunov matrix equation, with a general solution available.

50
 * From the definition on the previous slide, we can rewrite the Fisher
   information as

$$\begin{align*} F(\lambda) &= \int \frac{1}{p(x|\lambda)} \left( \frac{\partial p(x|\lambda)}{\partial \lambda} \right)^2 dx \\ &= \int \frac{1}{\text{tr}[\rho\Pi]} \left( \frac{\partial}{\partial\lambda} \text{tr}[\rho\Pi] \right)^2 dx \\ &= \int \frac{1}{\text{tr}[\rho\Pi]} \left( \text{tr}\left[\frac{\partial\rho}{\partial\lambda} \Pi\right] \right)^2 dx \\ &= \int \frac{1}{\text{tr}[\rho\Pi]} \left( \frac12\text{tr}\left[L\rho \Pi + \rho L \Pi\right] \right)^2 dx \\ &= \int \frac{\text{Re}(\text{tr}[\rho\Pi L])^2}{\text{tr}[\rho\Pi]} dx \end{align*}$$

51
52

 * In the previous slide, the first inequality saturates when
   $\text{tr}[\rho\Pi L]$ is a real number for any $\lambda$, and the second
   inequality comes from the Schwarz inequality
   $|\text{tr}[A^\dagger B]|^2 \leq \text{tr}[A^\dagger A] \text{tr}[B^\dagger B]$,
   with $A^\dagger\equiv \sqrt\rho\sqrt\Pi/\sqrt{\text{tr}[\rho\Pi]}$ and
   $B\equiv \sqrt\Pi L\sqrt\rho$; it is saturated when
   $\frac{\sqrt\Pi\sqrt\rho}{\text{tr}[\rho\Pi]} = \frac{\sqrt\Pi L\sqrt\rho}{\text{tr}[\rho\Pi L]}$
   for any $\lambda$. ← this is satisfied iff $\{\Pi_x\}$ is the set of
   projectors onto the eigenstates of $L_\lambda$, which, in turn, represents
   the optimal POVM for estimating $\lambda$.

 * The above chain of inequalities proves that the Fisher information
   $F(\lambda)$ of any quantum measurement is bounded by the so-called
   quantum Fisher information
   $H(\lambda) \equiv \text{tr}[\rho L^2] = \text{tr}[\frac{\partial\rho}{\partial\lambda}L]$.

 * You can view the quantum Fisher information as the supremum of the
   classical Fisher information over all measurements.

53


LYAPUNOV MATRIX EQUATION



The general solution of
$\frac{L_\lambda\rho_\lambda + \rho_\lambda L_\lambda}{2} = \frac{\partial\rho_\lambda}{\partial\lambda}$
for the SLD $L_\lambda$ may be written as:

$$L_\lambda = 2\int_0^\infty e^{-\rho t }\frac{\partial\rho}{\partial \lambda}e^{-\rho t } dt$$

Now if we write $\rho$ in its eigenbasis
$\sum_n\rho_n\ket{\psi_n}\bra{\psi_n}$, the SLD easily becomes (you can verify
this by direct substitution):

$$L_\lambda = 2\sum_{m,n} \frac{\bra{\psi_m}\frac{\partial\rho}{\partial\lambda}\ket{\psi_n}}{\rho_m+\rho_n}\ket{\psi_m}\bra{\psi_n}$$

where the sum only includes terms with $\rho_n+\rho_m\neq0$. From this we can
express our quantum Fisher information $H(\lambda)$ as (to verify this, just
use the dummy-index trick):

$$H(\lambda) = \text{tr}[\rho L^2] = 4\sum_{m,n} \rho_m \frac{\left|\bra{\psi_m}\frac{\partial\rho}{\partial\lambda}\ket{\psi_n}\right|^2}{(\rho_m+\rho_n)^2} = 2\sum_{m,n} \frac{\left|\bra{\psi_m}\frac{\partial\rho}{\partial\lambda}\ket{\psi_n}\right|^2}{\rho_m+\rho_n}$$
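
A minimal numerical sketch of these eigenbasis formulas (the one-qubit
depolarized phase-encoding family below is assumed for illustration):

```python
import numpy as np

def rho_of(lam, p=0.9):
    plus = np.array([1.0, 1.0]) / np.sqrt(2)
    U = np.diag([1.0, np.exp(1j * lam)])       # e^{-i lam A}, A = -|1><1| (assumed)
    pure = U @ np.outer(plus, plus) @ U.conj().T
    return p * pure + (1 - p) * np.eye(2) / 2  # depolarized probe state

lam, d = 0.3, 1e-6
drho = (rho_of(lam + d) - rho_of(lam - d)) / (2 * d)   # numerical d(rho)/d(lam)

evals, V = np.linalg.eigh(rho_of(lam))
dr = V.conj().T @ drho @ V                             # <psi_m| drho |psi_n>
L = np.zeros_like(dr)
for m in range(2):
    for n in range(2):
        if evals[m] + evals[n] > 1e-12:                # restrict to the support
            L[m, n] = 2 * dr[m, n] / (evals[m] + evals[n])

qfi = np.real(np.trace(np.diag(evals) @ L @ L))        # H = tr[rho L^2]
print("QFI:", qfi)    # ≈ 0.81 = p^2 for this family (unitary-family formula)
```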

54


USUAL FORM OF QUANTUM FISHER INFORMATION $H(\lambda)$

If the dynamics of $\rho$ is governed by some Hamiltonian-like operator $A$,
that is, $\rho = e^{-i\lambda A} \rho_0 e^{i\lambda A}$, the quantum Fisher
information reduces to:

$$H(\lambda) = 2\sum_{m,n} \frac{(\rho_m-\rho_n)^2}{\rho_m+\rho_n} \left|\bra{\psi_m}A\ket{\psi_n}\right|^2$$

 * Proof:

   $$\begin{align*} &\bra{\psi_m}\frac{\partial\rho}{\partial\lambda}\ket{\psi_n} \\ &= \bra{\psi_m} (-iA)\rho + \rho (iA) \ket{\psi_n} \\ &= \bra{\psi_m} \left((-iA)\sum_k\rho_k\ket{\psi_k}\bra{\psi_k} + \sum_k\rho_k\ket{\psi_k}\bra{\psi_k} (iA)\right) \ket{\psi_n} \\ &= -i\rho_n \bra{\psi_m}A \ket{\psi_n} + i\rho_m \bra{\psi_m}A \ket{\psi_n} \\ &\implies\left|\bra{\psi_m}\frac{\partial\rho}{\partial\lambda}\ket{\psi_n}\right|^2 = (\rho_m-\rho_n)^2 \cdot \left| \bra{\psi_m}A \ket{\psi_n}\right|^2 \end{align*}$$

55


OPTIMAL QUANTUM ESTIMATOR

Notice, however, that $L_\lambda$ itself may not represent the optimal
observable to be measured. Using the fact that
$\text{Tr}[\rho_\lambda L_\lambda]=0$ (because the trace of $\rho$ is always
one; plug this into the SLD definition and it becomes obvious), we can come
up with an explicit form for the optimal quantum estimator:

$$O_\lambda = \lambda \mathbb{I} + \frac{L_\lambda}{H(\lambda)}$$

for which we have:

$$\begin{cases} \text{Tr}[\rho_\lambda O_\lambda] = \lambda \\ \text{Tr}[\rho_\lambda O_\lambda^2] = \lambda^2 + \dfrac{\text{Tr}[\rho_\lambda L_\lambda^2] }{H^2(\lambda)} \end{cases}$$

therefore $\braket{\Delta O^2_\lambda} = 1/H(\lambda)$.

56


EXPANSION OF SLD

We can further expand $\partial\rho/\partial\lambda$ in the eigenbasis of
$\rho$ as

$$\frac{\partial\rho}{\partial\lambda} = \sum_n \frac{\partial\rho_n}{\partial\lambda} \ket{\psi_n}\bra{\psi_n} + \rho_n \ket{\frac{\partial\psi_n}{\partial\lambda}} \bra{\psi_n} + \rho_n\ket{\psi_n} \bra{\frac{\partial\psi_n}{\partial\lambda}}$$

but since the SLD is defined only on the support of $\rho$, and both the
eigenvalues $\rho_n$ and the eigenvectors $\ket{\psi_n}$ may depend on the
parameter $\lambda$, we may moreover expand $\ket{\psi_n}$ in some arbitrary
basis $\{\ket k\}$ that is independent of $\lambda$. If we do this expansion
for $\braket{\psi_n | \psi_m} = \delta_{nm}$, we get (though this will not be
used later):

$$\braket{\psi_n | \psi_m} = \sum_{k,l} (\psi_{nk}^* \bra{k} ) (\psi_{ml} \ket l) = \sum_k \psi_{nk}^*\psi_{mk}=\delta_{nm}$$

57

On the other hand, from the product rule we have

$$\frac{\partial\braket{\psi_m|\psi_n}}{\partial\lambda} = \braket{\frac{\partial\psi_m}{\partial\lambda} | \psi_n} + \braket{\psi_m|\frac{\partial\psi_n}{\partial\lambda}} = \frac{\partial \delta_{mn}}{\partial\lambda} = 0$$

therefore, using the fact that
$\braket{\frac{\partial\psi_m}{\partial\lambda} | \psi_n} =-\braket{\psi_m|\frac{\partial\psi_n}{\partial\lambda}}$,
we can rewrite the matrix form of $L_\lambda$ in the SLD general solution as:

$$\begin{align*} L_\lambda &= 2\sum_{m,n} \frac{\bra{\psi_m}\frac{\partial\rho}{\partial\lambda}\ket{\psi_n}}{\rho_m+\rho_n}\ket{\psi_m}\bra{\psi_n} \\ &= \sum_n \frac{\partial\rho_n}{\partial\lambda}\frac{\ket{\psi_n}\bra{\psi_n}}{\rho_n} + 2\sum_{m\neq n} \left( \rho_n \braket{\psi_m|\frac{\partial\psi_n}{\partial\lambda}} + \rho_m\braket{\frac{\partial\psi_m}{\partial\lambda} | \psi_n} \right)\frac{\ket{\psi_m}\bra{\psi_n}}{\rho_m+\rho_n} \\ &= \sum_n \frac{\partial\rho_n}{\partial\lambda}\frac{\ket{\psi_n}\bra{\psi_n}}{\rho_n} + 2\sum_{m\neq n} \frac{\rho_n-\rho_m}{\rho_n+\rho_m} \braket{\psi_m|\frac{\partial\psi_n}{\partial\lambda}} \ket{\psi_m}\bra{\psi_n} \end{align*}$$

58
59


MULTI-PARAMETRIC QFI AND RE-PARAMETRIZATION

When our quantum state $\rho$ depends on a set of parameters
$\bm\lambda = \{\lambda_i\}, i=1,2,\cdots,N$, the quantum Fisher information
changes from a scalar into a matrix:

$$\begin{align*} \bm H(\bm\lambda)_{ij} &= \text{Tr}\left[ \rho \frac{L_iL_j + L_jL_i}{2} \right] = \text{Tr}\left[\frac{\partial\rho}{\partial\lambda_i}L_j\right] = \text{Tr}\left[\frac{\partial\rho}{\partial\lambda_j}L_i\right] \\ &= \sum_n \frac{(\partial_i\rho_n)(\partial_j\rho_n)}{\rho_n} + \sum_{n\neq m} \frac{(\rho_m-\rho_n)^2}{\rho_m+\rho_n} \left[ \braket{\psi_n|\partial_i\psi_m} \braket{\partial_j\psi_m|\psi_n} + \braket{\psi_n|\partial_j\psi_m} \braket{\partial_i\psi_m|\psi_n} \right] \end{align*}$$

60

The Cramér-Rao theorem for multi-parameter estimation says that the inverse
of the Fisher matrix provides a lower bound on the covariance matrix
$\text{Cov}[\bm\gamma]_{ij} = \braket{\lambda_i\lambda_j} - \braket{\lambda_i}\braket{\lambda_j}$,
that is,

$$\text{Cov}[\bm\gamma] \geq \frac{1}{N} \bm H(\bm\lambda)^{-1}$$

 * this is a matrix inequality, hence the corresponding bound may not be
   achievable in a multi-parameter estimation.

 * but on the other hand, the diagonal elements of the inverse Fisher matrix
   provide achievable bounds for the variances of single-parameter estimators
   at fixed values of the others, i.e.,
   $\text{Var}(\lambda_i) = \gamma_{ii} \geq \frac{1}{N} (\bm H^{-1})_{ii}$.

61
62


APPLYING CRAMÉR-RAO BOUND

Our task is now:

Given: an unknown, noiseless quantum state $\rho$ which can be parameterized
as $\hat \rho(\bm{\theta})$, with $\bm{\theta}$ being the unknown parameters.

Goal: to estimate the quantity $f(\bm{\theta})$, which is a function of
$\bm{\theta}$.

How: by performing POVMs on $N$ independent copies of
$\hat \rho(\bm{\theta})$.

63


APPLYING CRAMÉR-RAO BOUND: QUICK RECAP

$$\text{Variance} \geq \frac{1}{N\cdot\{\text{Quantum Fisher Information}\}}$$

$$\text{Var}(f(\bm\theta)^\text{est}) \geq \frac 1N \cdot \bm\nabla_{\bm\theta} (f(\bm\theta))^T \; J(\rho(\bm\theta))^{-1} \; \bm\nabla_{\bm\theta} (f(\bm\theta))$$

(can jump to Main Results)

The quantum Fisher information matrix $J$ of $\hat\rho(\bm\theta)$ is defined
exactly as in "Multi-parametric QFI and re-parametrization":

$$[J]_{ij} = \frac12 \text{tr} \left[\hat\rho\Big\{ \hat L_i, \hat L_j \Big\}\right]$$

where $\hat L_i$ is the symmetric logarithmic derivative (SLD) operator,
defined exactly as in "Quantum Fisher Information" (with slightly changed
notation):

$$\frac12\Big\{ \hat\rho, \hat L_i \Big\} = \frac{\partial \hat\rho}{\partial \theta_i}$$

64
65
66


VARIANCE OF NOISELESS STATE

The bound for $\text{Var}(\braket{\hat X}^\text{est})$ satisfies:

$$\begin{align*} \text{Var}(\braket{\hat X}^\text{est}) &\geq \frac1N \cdot \left( 2^{(n-1)/2}\; \bm{x}^T\right) J(\rho(\bm\theta))^{-1} \left( 2^{(n-1)/2}\; \bm{x}\right) \\ &= \frac{2^{n-1}}{N} \bm{x}^T J(\rho(\bm\theta))^{-1} \bm{x} \\ &= \frac1N (\Delta\hat X)^2 \end{align*}$$

where $(\Delta\hat X)^2 \equiv$ the variance of the observable $\hat X$ in
the noiseless state $\hat\rho$.

 * The equality above can be achieved by the optimal POVM, which is the
   projective measurement in the diagonal basis of the operator $\hat X$.

67


VARIANCE OF NOISY STATE

Now suppose we add noise channels of the form described in "Virtual Quantum
Circuit", $\mathcal{E}_l: \bm{\theta} \mapsto A_l \bm{\theta} + \bm{c}_l$,
with two components:

 * $(A_l)_{ij} = \mathrm{Tr}[\hat P_i \mathcal{E}_l (\hat P_j)] / 2^n = \frac12\text{Tr}[\hat\lambda_i \mathcal{E}_l (\hat\lambda_j)]$
   → a $(4^n-1)\times(4^n-1)$ real-valued matrix,
   → represents the unital action of the noise channel.

 * $(\bm{c}_l)_i = \mathrm{Tr}[\hat P_i \mathcal{E}_l(\hat I)] / \sqrt{2^{3n-1}} = \frac{1}{2^n}\text{Tr}[\hat\lambda_i \mathcal{E}_l (\hat I)]$
   → a $(4^n-1)$-dimensional real-valued vector,
   → represents the non-unital action of the noise channel.

68

Now we can substitute $J(\rho(\bm\theta))^{-1}$ with
$J(\mathcal{E}(\rho(\bm\theta)))^{-1}$ and get:

$$\begin{align*} \text{Var}(\braket{\hat X}^\text{est}) &\geq \frac{2^{n-1}}{N} \bm{x}^T J(\mathcal{E}(\rho(\bm\theta)))^{-1}\bm{x} \\ &= \frac1N \Bigg\{ \text{Tr}\left[\mathcal{E}(\rho)\; \left((A^{-1})^T\bm{x} \cdot \bm{\hat P}\right)^2\right] - \text{Tr}\left[\mathcal{E}(\rho) \left((A^{-1})^T\bm{x} \cdot \bm{\hat P}\right)\right] ^2 \Bigg\} \\ &= \frac1N (\Delta\hat Y)^2 \end{align*}$$

where $(\Delta\hat Y)^2 \equiv$ the variance of the observable $\hat Y$ in
the noisy state $\mathcal{E}(\hat\rho)$.

69

 * What is $\hat Y$? It is the observable realizing the optimal estimation strategy. Since we require $\mathrm{Tr}[\hat\rho\hat X] = \mathrm{Tr}[\mathcal{E}(\hat\rho)\hat Y] = \mathrm{Tr}[\hat\rho\,\mathcal{E}^\dagger(\hat Y)]$, we want $\mathcal{E}^\dagger(\hat Y) = \hat X$; and since we have already assumed the form of $\mathcal{E}$, we can express $\hat Y$ explicitly as:
   
   $$\hat Y = \left(-2^{(n-1)/2}\,(A^{-1})^T\bm x\cdot\bm c\right)\hat I + (A^{-1})^T\bm x\cdot\bm{\hat P}$$

 * $\hat Y$ can be interpreted as an observable that absorbs the effect of the noise.

 * Therefore, the way to minimize the variance of the unbiased estimator $\langle\hat X\rangle^\text{est}$ for the noiseless state, through measurement of the noisy state, is to perform the projective measurement in the eigenbasis of the operator $\hat Y$.
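Here is a sketch (for $n=1$, using the amplitude-damping Bloch map that appears among the special cases later in the deck, so that the non-unital identity term of $\hat Y$ is actually exercised) verifying $\mathrm{Tr}[\hat\rho\hat X] = \mathrm{Tr}[\mathcal{E}(\hat\rho)\hat Y]$:

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = np.array([sx, sy, sz])
I2 = np.eye(2, dtype=complex)

p = 0.2
# single-qubit amplitude damping: Bloch map theta -> A theta + c
A = np.diag([np.sqrt(1 - p), np.sqrt(1 - p), 1 - p])
c = np.array([0.0, 0.0, p])
K = [np.array([[1, 0], [0, np.sqrt(1 - p)]]),      # Kraus operators
     np.array([[0, np.sqrt(p)], [0, 0]])]

theta = np.array([0.2, 0.1, -0.4])                 # assumed example state
x = np.array([0.5, -1.0, 2.0])                     # assumed observable X = x . P
rho = 0.5 * (I2 + np.einsum("i,ijk->jk", theta, paulis))
E_rho = sum(k @ rho @ k.conj().T for k in K)

X = np.einsum("i,ijk->jk", x, paulis)
y = np.linalg.inv(A).T @ x
Y = -(y @ c) * I2 + np.einsum("i,ijk->jk", y, paulis)  # 2^{(n-1)/2} = 1 for n = 1

print(np.trace(rho @ X).real, np.trace(E_rho @ Y).real)  # should match
```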

70


NOISELESS TARGET STATE ($n$-QUBIT)

 * $\hat\rho$ is obtained by applying $L$ layers of noiseless unitary gates $\{\mathcal{U}_l\}_{l=1}^L$ to the initial state $\hat\rho_0$.

 * $\hat\rho$ can be parameterized by a generalized Bloch vector $\bm\theta$: $\hat\rho = \dfrac{\hat I}{2^n} + \dfrac{\bm\theta\cdot\bm{\hat P}}{\sqrt{2^{n+1}}}$ with $\bm\theta \in \mathbb{R}^{4^n-1}$.

 * The coefficient $2^{-(n+1)/2}$ is the normalization of the generalized ($n$-qubit) Bloch vector; for $n=1$ it reduces to the familiar $\hat\rho = (\hat I + \bm\theta\cdot\hat{\bm\sigma})/2$.

 * $\hat I \equiv \hat\sigma_0^{\otimes n}$, and $\bm{\hat P} \equiv \{\hat P_i\}_{i=1}^{4^n-1}$ is the $n$-qubit Pauli group excluding the all-identity string:
   
   $\hat P_i \in \{\hat\sigma_0, \hat\sigma_x, \hat\sigma_y, \hat\sigma_z\}^{\otimes n} \setminus \{\hat\sigma_0\}^{\otimes n}$
   → notice that any $\hat P_i^2$ is just $\hat I_{2^n\times 2^n}$.

 * Assume the preparation of the initial state $\hat\rho_0$ is noiseless.
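A short sketch (illustrative, shown for $n=2$ with a randomly chosen small Bloch vector) that builds $\bm{\hat P}$, forms $\hat\rho$, and checks the normalization by recovering $\theta_i = \mathrm{Tr}[\hat\rho\hat P_i]/\sqrt{2^{n-1}}$, together with the identity $\langle\hat X\rangle = \sqrt{2^{n-1}}\,\bm\theta\cdot\bm x$ used on the QEM-objective slide below:

```python
import numpy as np
from functools import reduce
from itertools import product

s0 = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def pauli_basis(n):
    """All n-qubit Pauli strings, excluding the all-identity string."""
    return [reduce(np.kron, ops) for ops in product([s0, sx, sy, sz], repeat=n)][1:]

n = 2
P = pauli_basis(n)                        # 4^n - 1 = 15 operators
rng = np.random.default_rng(1)
theta = rng.normal(size=4**n - 1)
theta *= 0.1 / np.linalg.norm(theta)      # small Bloch vector keeps rho >= 0

rho = np.eye(2**n) / 2**n + sum(t * Pi for t, Pi in zip(theta, P)) / np.sqrt(2**(n + 1))

# recover theta_i = Tr[rho P_i] / sqrt(2^{n-1}) ...
theta_rec = np.array([np.trace(rho @ Pi).real for Pi in P]) / np.sqrt(2**(n - 1))
print(np.allclose(theta_rec, theta))

# ... and check <X> = Tr[rho X] = sqrt(2^{n-1}) theta . x for X = x . P
x = rng.normal(size=4**n - 1)
X = sum(xi * Pi for xi, Pi in zip(x, P))
print(np.isclose(np.trace(rho @ X).real, np.sqrt(2**(n - 1)) * theta @ x))
```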

71


VIRTUAL QUANTUM CIRCUIT

Consider one single sample (one single row):

 * start from the initial state $\hat\rho_0$;

 * apply $L$ layers of unitaries, each followed by a noisy operation, $\{\mathcal{E}_l\circ\mathcal{U}_l\}_{l=1}^L$, with the noise constrained to be Markovian;

 * finally, use a noiseless POVM $\mathcal{M}_0$ to estimate the expectation value of a traceless observable $\hat X = \bm x\cdot\bm{\hat P}$, parameterized by $\bm x \in \mathbb{R}^{4^n-1}$.


72


OBJECTIVE OF QEM

To remove the effect of the noise channels $\{\mathcal{E}_l\}_{l=1}^L$, so that we obtain an unbiased estimator of $\hat X = \bm x\cdot\bm{\hat P}$,

$$\langle\hat X\rangle \equiv \mathrm{tr}[\hat\rho\hat X] = \sqrt{2^{n-1}}\,\bm\theta\cdot\bm x,$$

with standard deviation $\varepsilon$. Construct the Virtual Quantum Circuit as follows:

 * boosted noise $\mathcal{E}_{lm}$ in the $l^\text{th}$ layer and $m^\text{th}$ copy, such that $\Gamma(\mathcal{E}_{lm}) \geq \Gamma(\mathcal{E}_l) \geq \gamma$;

 * a classical register $\hat\rho_{c,m}$ coupled with the system qubits via additional operations $\mathcal{C}_{lm}$;

 * a POVM measurement $\mathcal{M}$ performed on the entire set of copies to output the estimator of $\langle\hat X\rangle$.


CONSTRAINT ON LOCCS

 * The classical register $\hat\rho_{c,m}$ is initialized in a probabilistic mixture of computational basis states, $\hat\rho_{c,m} = \sum_i p_{mi}\ket{i}\bra{i}$.

 * It is coupled with the system only via controlled-unitary operations, $\mathcal{C}_{lm} = \sum_i \mathcal{C}_{lmi}\otimes\ket{i}\bra{i}$, because we are not allowed to perform QEC.

74


CALCULATE SLD FISHER INFORMATION MATRIX

First, re-express the $m^\text{th}$ copy (or $m^\text{th}$ row) of the quantum state in the virtual circuit as $\mathcal{E}'_m(\rho(\bm\theta)\otimes\hat\rho_{c,m})$, with $\mathcal{E}'_m$ being an effective noise channel:

$$\mathcal{E}'_m = (\mathcal{E}_{Lm}\circ\mathcal{U}_L\circ\mathcal{C}_{(L-1)m})\cdots(\mathcal{E}_{1m}\circ\mathcal{U}_1\circ\mathcal{C}_{(0)m})\circ(\mathcal{U}_1^{-1}\circ\cdots\circ\mathcal{U}_L^{-1})$$

Why define it this way? Because $\hat\rho_0\otimes\hat\rho_{c,m} = (\mathcal{U}_1^{-1}\circ\cdots\circ\mathcal{U}_L^{-1})\,\rho(\bm\theta)\otimes\hat\rho_{c,m}$.

This compilation allows for the calculation of the quantum Fisher information matrix of the state right before the POVM → analyze the SLD Fisher information matrix $J$ of the quantum state $\bigotimes_{m=1}^N \mathcal{E}'_m(\rho(\bm\theta)\otimes\hat\rho_{c,m})$.

75

Note that we have $N$ copies in total (you can think of $N$ as the sample complexity).

$J$ can be bounded as:

$$J \lesssim \sum_m \frac{I}{\Gamma(\mathcal{E}'_m)^2} \lesssim N\,\frac{1}{\gamma^{2L}}$$

 * indicates the exponential decay of $J$ as the circuit depth (layer count) $L$ grows;

 * can be combined with the quantum Cramér-Rao inequality
   → so we can relate $J$ to the standard deviation $\varepsilon$ of an unbiased estimator.

76


MAIN RESULTS: COMPLEXITY OF $N$

Prerequisite: the noise $\mathcal{E}_{lm}$ satisfies, for all $l$ (layers) and $m$ (copies):

 * for all $\hat\rho \neq \hat\sigma$, we can still tell the states apart after the same noise channel: $\mathcal{E}_{lm}(\hat\rho) \neq \mathcal{E}_{lm}(\hat\sigma)$
   → a necessary condition: the quantum information is not completely destroyed by the noise;

 * for all $\hat\rho$, the state after noise, $\mathcal{E}_{lm}(\hat\rho)$, is full rank with all eigenvalues greater than 0
   → $\mathcal{E}_{lm}(\hat\rho)$ is a positive-definite matrix
   → after the noise, the variance of every observable in any state is non-zero; hence our sample cost must be greater than 0.

77
Theorem 1: Complexity of $N$ (lower bound)

The cost $N$ required for any unbiased estimator of $\langle\hat X\rangle$ with standard deviation $\varepsilon$, constructed from any QEM scheme that can be translated into the virtual quantum circuit, satisfies:

$$N \geq \frac{\|\bm x\|^2}{\varepsilon^2}\,\beta\,\gamma^{2L},$$

where $\beta$ is the largest $0 < \beta < 1$ such that $\mathcal{E}_{lm}(\hat\rho) - \frac{\beta}{2^n}\hat I \geq 0$ for all $\hat\rho$ and all $l, m$.

→ $\beta$ can be viewed as "how far away" the generalized Bloch sphere is pulled from its original surface by the noise.


IMPLICATION

If $\gamma > 1$, the cost $N$ of unbiased QEM grows exponentially with the circuit depth $L$, no matter how we choose $\mathcal{E}_{lm}$, $\mathcal{C}_{lm}$, or the POVM.

(see Noise channel characterization)
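Plugging illustrative numbers into the Theorem 1 bound (the parameter values below are assumptions for illustration, not taken from the paper) shows how quickly it blows up with depth:

```python
import numpy as np

def qem_cost_lower_bound(x_norm, eps, beta, gamma, L):
    """Theorem 1: N >= (||x|| / eps)^2 * beta * gamma^(2L)."""
    return (x_norm / eps) ** 2 * beta * gamma ** (2 * L)

# hypothetical numbers: per-layer error rate p = 1%, so gamma = 1/(1-p);
# beta = p as in the global-depolarization special case below
p = 0.01
gamma = 1 / (1 - p)
for L in (10, 100, 1000):
    N_min = qem_cost_lower_bound(x_norm=1.0, eps=0.01, beta=p, gamma=gamma, L=L)
    print(f"L = {L:5d}:  N >= {N_min:.3e}")
# grows like gamma^(2L) ~ e^(2pL): mild for shallow circuits,
# astronomical once p*L >> 1
```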

78


WHEN THE QUANTUM CIRCUIT SCRAMBLES THE STATE STRONGLY ENOUGH

then we must take the dependence on $n$ into account in the sample complexity (or else our lower bound becomes very loose):

> we must pay overhead to eliminate every local noise, and thus encounter a
> dependence on $n$

Theorem 2: Complexity in the presence of strong entangling gates

Let $\mathcal{U}_1,\cdots,\mathcal{U}_L$ all be $n$-qubit unitary gates (drawn from a set of random unitaries) that form a unitary 2-design, and let $\mathcal{E}_l$ be local noise. Then the number of copies $N$ required to perform unbiased estimation of $\langle\hat X\rangle$, averaged over $\{\mathcal{U}_1,\cdots,\mathcal{U}_L\}$, grows exponentially in both the qubit count $n$ and the depth $L$.

 * To put it simply: if we encounter a random or chaotic quantum circuit, the system's qubit number $n$ also enters the exponential scaling of the sample complexity, yielding a tighter bound.

 * Pure unital noise case: if $\mathcal{E}_{lm}(\frac{\hat I}{2^n}) = \frac{\hat I}{2^n}$, then the cost bound satisfies

$$N \geq \frac{\|\bm x\|^2}{\varepsilon^2}\left(1-(1-\beta)^L\right)\gamma^{2L} \approx \frac{\|\bm x\|^2}{\varepsilon^2}\,\gamma^{2L}$$

79


SPECIAL CASES THAT TIGHTEN THE BOUND (1/3)


1. GLOBAL DEPOLARIZATION CHANNEL (THIS IS A UNITAL CHANNEL)

Assume that every unitary gate is followed by the global depolarization channel

$$\mathcal{E}_{lm}: \bm\theta \mapsto (1-p_{lm})\bm\theta.$$

Since we assume boosted noise in our framework, the error rate is bounded from below as $p_{lm} \geq p$ (likewise for the other two noise types below), so the minimum noise strength is $\gamma = 1/(1-p)$ and we have $\beta = p$; therefore $N \approx \frac{\|\bm x\|^2}{\varepsilon^2}(1-p)^{-2L}$.

 * The effective error rate $p'$ for $\mathcal{E}'_m$, mentioned in Calculate SLD Fisher Information Matrix, is then $p' = 1-(1-p)^L$ → can be verified directly.
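The direct verification takes a couple of lines: the global depolarization Bloch map is a scalar contraction, so it commutes with every unitary layer and the compiled channel $\mathcal{E}'_m$ is just the $L$-fold product of contractions (a minimal sketch, shown on the $n=1$ Bloch space):

```python
import numpy as np

p, L = 0.01, 20
A = (1 - p) * np.eye(3)                     # one layer's Bloch map (n = 1)
A_eff = np.linalg.matrix_power(A, L)        # compiled channel: L-fold contraction
p_eff = 1 - A_eff[0, 0]
print(np.isclose(p_eff, 1 - (1 - p) ** L))  # p' = 1 - (1-p)^L
```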

80


SPECIAL CASES THAT TIGHTEN THE BOUND (2/3)


2. LOCAL DEPOLARIZATION CHANNEL (THIS IS A UNITAL CHANNEL TOO)

$$\mathcal{E}_{lm} = (\mathcal{E}^{(0)}_{lm})^{\otimes n}, \qquad \mathcal{E}^{(0)}_{lm}: \bm\theta \mapsto (1-p_{lm})\bm\theta$$

If, moreover, the circuit is random with unitary gates drawn from a unitary 2-design (such as the $n$-qubit Clifford group), the bound tightens to

$$\mathbb{E}[N] \geq \mathcal{O}\!\left(\left(1+\frac{3}{2}\,\frac{4^n}{4^n-1}\,p\right)^{nL}\right)$$

81


SPECIAL CASES THAT TIGHTEN THE BOUND (3/3)


3. LOCAL AMPLITUDE DAMPING CHANNEL (THIS IS A NON-UNITAL CHANNEL)

$$\mathcal{E}_{lm} = (\mathcal{E}^{(0)}_{lm})^{\otimes n}, \qquad \mathcal{E}^{(0)}_{lm}: (\theta_x, \theta_y, \theta_z) \mapsto \left(\sqrt{1-p_{lm}}\,\theta_x,\ \sqrt{1-p_{lm}}\,\theta_y,\ (1-p_{lm})\theta_z + p_{lm}\right)$$

Again, if the same 2-design condition is satisfied, the bound tightens to

$$\mathbb{E}[N] \geq \mathcal{O}\!\left(\left(1+\frac{4^n}{4^n-1}\,p\right)^{nL}\right)$$
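For a side-by-side feel of the three special cases, this sketch evaluates the right-hand sides of the bounds, dropping the $\mathcal{O}(\cdot)$ constants; the numbers are illustrative assumptions. It shows that the qubit count $n$ enters the exponent only for the local-noise cases, as Theorem 2 predicts:

```python
def global_depol(p, n, L):    # N ~ (1-p)^(-2L); no n-dependence
    return (1 - p) ** (-2 * L)

def local_depol(p, n, L):     # E[N] >~ (1 + (3/2) * 4^n/(4^n-1) * p)^(nL)
    return (1 + 1.5 * 4**n / (4**n - 1) * p) ** (n * L)

def local_amp_damp(p, n, L):  # E[N] >~ (1 + 4^n/(4^n-1) * p)^(nL)
    return (1 + 4**n / (4**n - 1) * p) ** (n * L)

p, L = 0.001, 100
for n in (2, 10, 50):
    print(n, global_depol(p, n, L), local_depol(p, n, L), local_amp_damp(p, n, L))
# the local-noise exponents scale with n*L, so the qubit count enters the
# exponential cost only when the noise must be removed qubit by qubit
```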

82


CONNECTION BETWEEN QEM AND QEC & OPEN PROBLEMS




 * AMONG DIFFERENT QEMS: THEY CAN BE USED IN PARALLEL (OR CONCATENATED), E.G.
   RESILIENCE L.



 * BETWEEN QEMS AND QECS: QEM CAN BE USED ALONGSIDE QEC
   
   Variational Eigensolver (needs a parametrized unitary), Hamiltonian Simulation
   (Trotterization)


 * OTHER WAYS OF BENCHMARKING QEMS? (OTHER THAN SAMPLING COMPLEXITY)
   
   something like the MNIST or ImageNet datasets in Computer Vision.


 * OPTIMAL QEM + QEC STRATEGY?
   
   Given a limited number of physical qubits, how should we partition them into
   QEC blocks so that, in a given amount of time, we get the best estimator?
   
   > It is usually cheaper to build a hundred QPUs with 100 physical qubits each
   > than to build a single QPU with 10,000 physical qubits. (finding the sweet
   > spot)

83


THANK YOU FOR LISTENING

Any questions?

Thanks to Hyukgun Kwon for important discussions, useful reference resources,
assistance, and critical advice when finalizing the slides.

Thanks to Adi Gandotra, David Noachtar, Debayan Bandyopadhyay, Mahadevan
Subramanian, and Mariesa Teo for all the important advice, questions, and
assistance on the early version of the slides.

86