
(TITLE TO BE DETERMINED) SOME (MAJOR) QUANTUM ERROR MITIGATION TECHNIQUES AND
THEIR COSTS

arXiv: 2208.09385
arXiv: 2210.00921
based on these 2 reviews
Andy Chu
May 21, 2024
Group Talk at MeMo


1


TABLE OF CONTENTS

 1.  Quantum Error Mitigation - Definition

 2.  Error-Mitigated Estimators

 3.  Trade-offs between Bias & Variance

 4.  Exponential scaling of Sampling overhead

 5.  Zero-Noise Extrapolation 📄, 📄
 6.  Probabilistic Error Cancellation 📄
 7.  Measurement Error Mitigation 📄
 8.  Purity Constraints
 9.  Subspace Expansions 📄, 📄, 📄

 10. Quantum Fisher Information

 11. Multi-parametric QFI and re-parametrization

 12. Applying Cramér-Rao Bound

 13. Virtual Quantum Circuit

 14. Main Results: Complexity of $N$

2


A SNEAK PEEK OF IBM’S QUANTUM ERROR MITIGATION ROUTINE

in next slide

3
4


A SNEAK PEEK OF PENNYLANE’S QUANTUM ERROR MITIGATION METHODS (ZNE FOR EXAMPLE)

in next slide

5
6


QUANTUM ERROR MITIGATION - DEFINITION

 * Algorithmic schemes that reduce the noise-induced bias in the expectation
   value.

 * By post-processing outputs from an ensemble of circuit runs.

 * Using circuits at the same noise level as the original unmitigated circuit or
   above.



ERROR MITIGATION IS NOT ERROR CORRECTION

 * QEM will only reduce the effective damage due to noise for the whole ensemble
   of circuit runs.

 * But zooming into each individual circuit run, the circuit noise level remains
   unchanged or even increases.

 * Different from QEC, which aims to reduce the effect of noise on the output in
   every single circuit run.

7


QUANTUM ERROR MITIGATION - DESIRED FEATURE

 * the mitigation method should ideally require only a very modest qubit
   overhead, to remain practical on current and near-term quantum hardware.

 * error mitigation techniques should provide an accuracy guarantee. Such a
   guarantee should ideally provide a formal error bound on the mitigated
   expectation values that indicates how well the method works at varying
   levels of noise.

 * a reliable error mitigation method should require few (or no) assumptions
   about the final state that is prepared by the computation.

8


ERROR-MITIGATED ESTIMATORS

 * Goal: to estimate the expectation value of some observable of interest $O$ →
   to estimate $\mathrm{Tr}[\rho_0 O]$.

 * How: we can construct an estimator $\hat O$ for estimating $O$. And how
   good is $\hat O$? There are different ways to assess $\hat O$:
   
   1. Use prediction intervals
   
   calculate the interval within which the outcome of the estimator will fall
   with a given probability, offering a rigorous bound on the worst-case
   deviation of the estimator.
   
   2. Use mean square error (we choose this one for assessing $\hat O$)
   
   so that we can see more clearly the different factors that contribute to
   the deviation of the estimator.
   
   $$\mathrm{MSE}[\hat O] = \mathrm{E}\left[\left( \hat O - \mathrm{Tr}[\rho_0 O]\right)^2\right]$$

Formally defined goal: to reduce $\mathrm{MSE}[\hat O]$ as much as possible,
using finite resources.

9


WHAT CONTRIBUTES TO $\mathrm{MSE}[\hat O]$?

It's convenient to break it into two terms:

$$\mathrm{MSE}[\hat O] = \left(\mathrm{Bias}[\hat O]\right)^2 + \mathrm{Var}[\hat O]$$

Each of them is defined as:

$$\begin{align*} \mathrm{Bias}[\hat O] &= \mathrm{E}[\hat O] - \mathrm{Tr}[\rho_0 O] \\ \mathrm{Var}[\hat O] &= \mathrm{E}\left[\hat O^2\right] - \left(\mathrm{E}[\hat O]\right)^2 \end{align*}$$

10


SIMPLEST ESTIMATOR $\overline{O}_\rho$ (JUST DON'T USE QEM)

> Although we are given a noisy state $\rho$ instead of $\rho_0$, we still
> measure $O$ on that $\rho$ and repeat $N_\text{cir}$ times (effectively
> running the noisy circuit $N_\text{cir}$ times), so we now have
> $N_\text{cir}$ independent $\hat O_\rho$. Finally, $\overline{O}_\rho$ is
> constructed by simply taking the average of these $\hat O_\rho$.
> → the more runs $N_\text{cir}$, the higher the quality of $\overline{O}_\rho$:

$$\mathrm{MSE}[\overline{O}_\rho]= (\underbrace{\mathrm{Tr}[\rho O] - \mathrm{Tr}[\rho_0 O]}_{\mathrm{Bias}[\overline{O}_\rho]=\mathrm{Bias}[\hat{O}_\rho]})^2 + \underbrace{\frac{\mathrm{Tr}\left[\rho O^2\right] - \mathrm{Tr}[\rho O]^2}{N_\text{cir}}}_{\mathrm{Var}[\overline{O}_\rho] = \mathrm{Var}[\hat{O}_\rho]/N_\text{cir}}$$

 * the second term is related to shot noise, and will shrink as we increase
   $N_\text{cir}$.

 * so eventually $\mathrm{MSE}[\overline{O}_\rho]$ will be limited by the
   first term, the bias of the estimator $\mathrm{Bias}[\overline{O}_\rho]$
   (its magnitude, to be clear), because this is a systematic error that
   cannot be reduced by increasing the number of circuit runs (a quick
   numerical check follows below).
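A minimal numerical sketch of this bias–variance split (the numbers are
made-up single-qubit values, not from the talk: measuring $O = Z$ on the noisy
state gives $\mathrm{Tr}[\rho O] = 0.8$ while the ideal value is $1.0$):

```python
import numpy as np

rng = np.random.default_rng(0)
tr_rho0_O, tr_rho_O = 1.0, 0.8   # hypothetical ideal vs. noisy expectation

def naive_estimator(n_cir):
    # each shot of a Z measurement returns +1 or -1
    p_plus = (1 + tr_rho_O) / 2
    shots = rng.choice([1.0, -1.0], size=n_cir, p=[p_plus, 1 - p_plus])
    return shots.mean()

estimates = [naive_estimator(1000) for _ in range(2000)]
bias = np.mean(estimates) - tr_rho0_O     # ~ -0.2, independent of N_cir
variance = np.var(estimates)              # ~ (1 - 0.8**2) / 1000, shrinks with N_cir
print(bias, variance, bias**2 + variance) # MSE is eventually dominated by bias^2
```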

11


WHAT IF WE USE QEM?

We can construct $\overline{O}_\text{em}$ (error-mitigated) using the same
number of circuit runs $N_\text{cir}$ to achieve:

 * smaller bias than the original naive estimator $\overline{O}_\rho$:
   magnitude of $\mathrm{Bias}[\overline{O}_\text{em}]$ ≤
   magnitude of $\mathrm{Bias}[\overline{O}_\rho]$.

 * but usually, inevitably, larger variance:
   $\mathrm{Var}[\overline{O}_\text{em}] \geq \mathrm{Var}[\overline{O}_\rho]$,
   because we usually extract the useful part from the noise to reduce bias,
   so the error-mitigated estimator is also more sensitive to noise.

12


TRADE-OFFS BETWEEN BIAS & VARIANCE

You can choose either:

 * a quickly converging QEM method, with a large residual error,

 * or a more costly, but more accurate, QEM method.


13


QUANTIFYING THE TRADE-OFF:

How many more circuit runs do we need for $\overline{O}_\text{em}$ to have a
variance at the same level as the original noisy $\overline{O}_\rho$?


DEFINE SAMPLING OVERHEAD:

$$C_\text{em} = \frac{N^\epsilon_\text{shot}(\hat O_\text{em})}{N^\epsilon_\text{shot}(\hat O_\rho)} = \frac{\mathrm{Var}[\hat O_\text{em}]}{\mathrm{Var}[\hat O_\rho]}$$

here we post-define $\hat O_\text{em}$ to be a 'one-shot' error-mitigated
estimator satisfying $\mathrm{E}[\hat O_\text{em}] = \mathrm{E}[\overline{O}_\text{em}]$ and
$\mathrm{Var}[\hat{O}_\text{em}] = N_\text{cir}\times\mathrm{Var}[\overline{O}_\text{em}]$

14


AN ALTERNATIVE WAY TO QUANTIFY:

If we know the range of a random variable $\hat X$ (the difference between the
maximum and minimum possible values taken by $\hat X$), denoted $R[\hat X]$,
we can plug this range into Hoeffding's inequality to get the sample size
required:

To guarantee an estimate of $\mathrm{E}[\hat X]$ to $\epsilon$-precision with
probability $(1-\delta)$, we need:

$$N^{\epsilon,\delta}_\text{hff}(\hat X) = \frac{\ln(2/\delta)}{2\epsilon^2} R[\hat X]^2$$

This way our previously defined sampling overhead can be approximated as
(a small sketch follows below):

$$C_\text{em} \approx \frac{N^{\epsilon,\delta}_\text{hff}(\hat O_\text{em})}{N^{\epsilon,\delta}_\text{hff}(\hat O_\rho)} = \frac{R[\hat O_\text{em}]^2}{R[\hat O_\rho]^2}$$
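A small sketch of this Hoeffding-based sample count (the factor-1.5 range
inflation of the mitigated estimator is a made-up illustration):

```python
import numpy as np

def hoeffding_shots(range_width, eps, delta):
    """Sample size from Hoeffding's inequality for a variable of given range."""
    return np.log(2 / delta) / (2 * eps**2) * range_width**2

eps, delta = 0.01, 0.05
n_noisy = hoeffding_shots(2.0, eps, delta)        # e.g. a Pauli observable, range 2
n_mitig = hoeffding_shots(2.0 * 1.5, eps, delta)  # mitigated range blown up by 1.5
print(n_noisy, n_mitig, n_mitig / n_noisy)        # overhead C_em ~ 1.5**2 = 2.25
```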

15


MODELING THE ERRORS

Similar to QEC, we model noise as discrete, probabilistic events called faults.
Faults can occur at various locations in the circuit; locations include gates,
idling steps, and measurements.


QUANTIFYING THE ERRORS

Assume all locations are afflicted with Pauli noise, and the Pauli noise at
location $f$ occurs with probability $p_f$.

 * if error events at all locations are independent, then the probability of no
   faults occurring at all is $P_0 = \prod_f (1-p_f)$ → the fault-free
   probability of the circuit.

16


AN ALTERNATIVE WAY TO QUANTIFY:

Run the circuit once and count how many faults occur. Then average this number
over many circuit runs. This final average is defined as the circuit fault rate
$\lambda = \sum_f p_f$.

 * simple case: all $p_f$ are the same ($p$), then $\lambda = Mp$ for $M$
   locations in total. If $M$ is large, we can model "$k$ errors occur during
   one run" using a Poisson distribution: $P_k = e^{-\lambda}\lambda^k/k!$
   
   Then the fault-free probability is just the special case $k=0$:
   $P_0 = e^{-\lambda}$.

> All of the arguments still apply if we define ($1-p_f$) as the
> coefficient of the error-free part of the Kraus representation of some general
> noise. More generally, we can always apply Pauli twirling to transform all
> noise to Pauli noise such that our arguments become valid.

17


EXPONENTIAL SCALING OF SAMPLING OVERHEAD


> INEVITABILITY
> 
> Even if we could always magically distinguish all perfect circuit runs (which
> happen with probability $P_0 = e^{-\lambda}$) from the rest of the "flawed"
> runs, we would still need an exponential number of circuit runs to collect
> enough "perfect" runs to reach $\epsilon$-level shot noise.

As of today, we don't have magic. So instead of impractically extracting the
error-free state $\rho_0$, we focus on how to construct error-mitigated
estimators.


QEC VS QEM

For a given noisy circuit with circuit fault rate $\lambda$:

 * QEC: active correction → reduces $\lambda$ in each circuit run

 * QEM: relies on "post-processing the outputs" from an ensemble of circuit
   runs, each run with circuit fault rate $\geq\lambda$.

18


NOW WE INTRODUCE 5 COMMONLY SEEN ERROR MITIGATION TECHNIQUES

19


ZERO-NOISE EXTRAPOLATION 📄, 📄

If we can tune our circuit fault rate
→ we obtain a series of different states $\rho_\lambda$ at different $\lambda$
→ so we can view $\mathrm{Tr}[\rho_\lambda O]$ as a function of $\lambda$
→ then the ideal expectation value is obtained by substituting $\lambda=0$
into the function.


FORMAL DEFINITION:

 1. Model the trend of the data using a parametrized function $f(\lambda; \vec\theta)$

 2. Once we're done fitting, we get a set of optimal parameters $\vec\theta^*$

 3. The error-mitigated estimate of the zero-noise output is then
    $\mathrm{E}[\hat O_\text{em}] \equiv f(0; \vec\theta^*)$.

20


FOR SUFFICIENTLY SMALL $\lambda$ (USUALLY THE CASE) → RICHARDSON
EXTRAPOLATION

We can fit the dataset with a "truncated Taylor expansion":

$$\mathrm{Tr}[\rho_{\lambda} O] \approx f(\lambda; \vec\theta) = \sum_{k=0}^{M-1} \theta_k \frac{\lambda^k}{k!}$$

which is an $(M-1)^\text{th}$-order polynomial with $M$ coefficients (or
degrees of freedom).

→ simplest case: $M=2$, it's just linear extrapolation

→ if we first assign a value for $M$ (related to how sophisticated we want the
model $f$ to be), then we need at least $M$ data points (in the spirit of
Lagrange polynomials) to obtain our error-mitigated estimate:

$$\mathrm{E}[\hat O_\text{em}] = \theta_0^* = \sum_{m=1}^M \left(\mathrm{Tr}[\rho_{\lambda_m}O] \prod_{k\neq m} \frac{\lambda_k-0}{\lambda_k-\lambda_m} \right)$$

21

 * Performance: the bias of $\mathrm{E}[\hat O_\text{em}]$ originates
   from our omission of higher-degree terms in $f(\lambda; \vec\theta)$, so
   the bias is of magnitude $\mathrm{Bias}[\hat O_\text{em}] =
   \mathcal{O}(\lambda^M)$ for $M$ data points.

 * Sampling overhead: if we use a Monte Carlo method (because
   $\mathrm{E}[\hat O_\text{em}]$ is a linear combination of the set of noisy
   expectation values), the overhead is (see the sketch below):
   
   $$C_\text{em} \sim \left( \sum_{m=1}^M \left| \prod_{k\neq m} \frac{\lambda_k-0}{\lambda_k-\lambda_m} \right| \right)^2$$
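A minimal NumPy sketch of this Richardson estimator and its one-norm overhead
(the noise curve $\mathrm{Tr}[\rho_\lambda O] = e^{-\lambda}$ is a made-up
example, not from the talk):

```python
import numpy as np

def richardson_zne(lams, exp_vals):
    """Zero-noise estimate via Lagrange interpolation at lambda = 0,
    plus the squared one-norm that sets the sampling overhead."""
    lams, exp_vals = np.asarray(lams, dtype=float), np.asarray(exp_vals)
    coeffs = []
    for m, lam_m in enumerate(lams):
        others = np.delete(lams, m)
        coeffs.append(np.prod(others / (others - lam_m)))
    coeffs = np.array(coeffs)
    return coeffs @ exp_vals, np.abs(coeffs).sum() ** 2

lams = [0.1, 0.2, 0.3]                 # three boosted noise levels
vals = np.exp(-np.array(lams))         # hypothetical measured expectations
estimate, overhead = richardson_zne(lams, vals)
print(estimate, overhead)              # ~0.999 (vs. true 1.0); overhead = 49
```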


22


HOW DO WE "SCALE THE NOISE" ON QUANTUM HARDWARE?

in next slide: including global/local unitary folding, identity insertion,
pulse stretching, etc.

23
24


WHAT OTHER FUNCTIONS CAN WE USE FOR FITTING THE {NOISE, EXPECTATION VALUE}
DATASET? (OTHER THAN RICHARDSON)

in next slide: see Extrapolation methods: Factory objects section

25
26


PROBABILISTIC ERROR CANCELLATION 📄

Key idea origin:
→ we can write the noise-free expectation value as a linear combination of noisy
expectation values.
→ this is a quasi-probability decomposition,
→ which can be sampled using a Monte Carlo procedure.


FEATURE:

→ can fully remove the bias of expectation values of generic quantum circuits
($\mathrm{Bias}[\hat O_\text{em}]=0$)
→ but at the expense of a sampling overhead $C_\text{em}$ that grows
exponentially with the circuit fault rate $\lambda$.

27


FORMAL DEFINITION

 * We want to implement an ideal channel $\mathcal U$, but with a set of noisy
   basis operations $\{\mathcal B_n\}$, because we can only perform
   $\{\mathcal B_n\}$ on our hardware.

 * For a sufficiently large basis, we can decompose the ideal channel
   $\mathcal U = \sum_n\alpha_n \mathcal B_n$ with real coefficients $\alpha_n$.

 * $\alpha_n$ can often be negative → this can be an unphysical map, but
   that doesn't matter.

 * Because we can express the ideal expectation value of any observable $O$ as
   $\mathrm{Tr}[O\rho] = \langle\!\langle O \vert \rho \rangle\!\rangle$, and decompose it into:
   
   $$\langle\!\langle O \vert \rho \rangle\!\rangle = \langle\!\langle O \vert \mathcal U \vert \rho_0 \rangle\!\rangle = \sum_n\alpha_n\langle\!\langle O \vert \mathcal B_n \vert \rho_0 \rangle\!\rangle$$

 * So in the real world we can sample each $\langle\!\langle O \vert \mathcal B_n \vert \rho_0 \rangle\!\rangle$
   term with probability $|\alpha_n|/\gamma$ where $\gamma = \sum_n|\alpha_n|$.
   And don't forget to multiply the term by $\mathrm{sign}(\alpha_n)\cdot\gamma$
   before adding everything up.

 * Monte Carlo will usually do the work. The sampling overhead in this case is
   $C_\text{em}\sim \gamma^2 = (\sum_n|\alpha_n|)^2$ (see the sketch below).
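A toy Monte Carlo sketch of this sampling rule (the $\alpha_n$ and noisy
expectation values are made-up; a real run would execute circuit $\mathcal B_n$
and measure $O$ once per draw):

```python
import numpy as np

def pec_estimate(alphas, noisy_expvals, n_samples, rng):
    """Quasi-probability sampling: draw basis index n with probability
    |alpha_n|/gamma, reweight the outcome by sign(alpha_n)*gamma."""
    alphas = np.asarray(alphas, dtype=float)
    noisy_expvals = np.asarray(noisy_expvals, dtype=float)
    gamma = np.abs(alphas).sum()
    idx = rng.choice(len(alphas), size=n_samples, p=np.abs(alphas) / gamma)
    # fake the +/-1 shots from the known noisy expectation values
    p_plus = (1 + noisy_expvals[idx]) / 2
    shots = np.where(rng.random(n_samples) < p_plus, 1.0, -1.0)
    return np.mean(np.sign(alphas)[idx] * gamma * shots)

rng = np.random.default_rng(1)
# hypothetical decomposition: gamma = 1.4, so shot-noise overhead ~ gamma**2
print(pec_estimate([1.2, -0.1, -0.1], [0.9, 0.5, 0.4], 100_000, rng))
# -> fluctuates around the unbiased value sum_n alpha_n * <O>_n = 0.99
```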

28
29


A SIMPLE EXAMPLE

Suppose our basis operations only suffer from the single-qubit depolarizing
channel $\rho \mapsto (1-p)\rho + \frac p3 X\rho X+ \frac p3 Y\rho Y+ \frac p3 Z\rho Z$.
Then our ideal Hadamard gate $\text H$ can be expanded into noisy operations:

$$\begin{align*} (\text H)_\text{ideal} &= \alpha_1 (\text H)_\text{noisy} \\ &+\alpha_2 (\text X\text H)_\text{noisy} \\ &+\alpha_3 (\text Y\text H)_\text{noisy} \\ &+\alpha_4 (\text Z\text H)_\text{noisy} \end{align*}$$

 * If our noise level is $p=0.1$, then $(\alpha_1, \alpha_2, \alpha_3, \alpha_4) = (\frac{29}{26}, \frac{-1}{26}, \frac{-1}{26}, \frac{-1}{26}) \approx (1.115, -0.038, -0.038, -0.038)$.
 * If our noise level is $p=0.2$, then $(\alpha_1, \alpha_2, \alpha_3, \alpha_4) = (\frac{14}{11}, \frac{-1}{11}, \frac{-1}{11}, \frac{-1}{11}) \approx (1.273, -0.091, -0.091, -0.091)$.


> We only want the pure noiseless -[H]- in the end, with no other noiseless
> components, that is, $\alpha_1(1-p) + (\alpha_2+\alpha_3+\alpha_4)(p/3) = 1$ and
> $\alpha_1(p/3) + \alpha_2(1-p) + (\alpha_3+\alpha_4)(p/3) = 0$
> (similarly for $\alpha_3$ and $\alpha_4$).

> We can see from this simple case: the higher the noise level, the higher the
> one-norm $\gamma = |\alpha_1|+|\alpha_2|+|\alpha_3|+|\alpha_4|$,
> indicating a higher sampling overhead $\sim\gamma^2$ (checked numerically
> below).
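A quick sanity check of these coefficients by solving the $4\times4$ linear
system above (a sketch; the channel-basis bookkeeping follows the boxed
conditions):

```python
import numpy as np

def hadamard_pec_coeffs(p):
    """Solve for (alpha_1..alpha_4) so the depolarized {H, XH, YH, ZH}
    operations combine to the ideal H channel."""
    # column j = weights of basis op j spread over {H, XH, YH, ZH} by the noise
    M = np.full((4, 4), p / 3)
    np.fill_diagonal(M, 1 - p)
    target = np.array([1.0, 0.0, 0.0, 0.0])   # pure noiseless H only
    return np.linalg.solve(M, target)

for p in (0.1, 0.2):
    alphas = hadamard_pec_coeffs(p)
    # reproduces (29/26, -1/26, ...) and (14/11, -1/11, ...); prints gamma**2
    print(p, alphas, np.abs(alphas).sum() ** 2)
```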

30


MEASUREMENT ERROR MITIGATION 📄


IDEAL MEASUREMENT

 * measure the circuit
   → obtain a binary string output $x\in \{0,1\}^N$

 * ideally: we want to perform projective measurements in the computational
   basis $\{\langle\!\langle x\vert\}$.


REAL-WORLD MEASUREMENT:

 * some measurement noise $\mathcal A$ occurs and turns the projective
   measurement into the POVM $\{\langle\!\langle E_x\vert\} =\{\langle\!\langle x\vert \mathcal A\}$
   → leads to different output statistics

> physical origins of readout errors
> 
>  * superconducting: thermal excitations, $T_1$ decay.
>  * ion traps: hard to detect an ion's dark state, collisions in the trap

→ it's better (or sufficient) to construct a simplified noise model that is
agnostic of the actual origin of the noise.

31
32

 * The $A_{xy}$ in the previous slide is the assignment matrix, which can be
   explained as:
   
   * The coefficient $\langle\!\langle x\vert \mathcal A\vert y \rangle\!\rangle$
     represents the probability of originally measuring outcome $y$ before
     noise, but measuring outcome $x$ after the noise.
   
   * so $A$ is a transition matrix (or "stochastic matrix" in Bill's Physics of
     Computation lecture)
   
   * How do we actually get the coefficient $\langle\!\langle x\vert \mathcal A\vert y \rangle\!\rangle$?
     Ans: we just do the noisy measurement on the computational state
     $\vert y \rangle\!\rangle$ and see what the probability of outcome $x$ is.

 * If the matrix $A$ is full-rank, then we can do the reverse: map noisy
   outcomes back to noiseless outcomes,
   $\langle\!\langle y\vert = \sum_x A_{yx}^{-1}\langle\!\langle E_x\vert$
   → we restore the perfect outcome by linearly combining the noisy outcomes
   (somewhat in the spirit of Probabilistic Error Cancellation).
33


RESTORING IDEAL EXPECTATION VALUE

 * For our incoming state $\rho$:
   
   * noiseless measurement gives output distribution
     $\vec{p_0} = \{\langle\!\langle y\vert \rho \rangle\!\rangle\}$
   
   * noisy measurement gives output distribution
     $\vec{p_\text{noisy}} = \{\langle\!\langle E_x\vert \rho \rangle\!\rangle\}$
   
   → $\vec{p_\text{noisy}} = A \vec{p_0}$, hence $\vec{p_0} = A^{-1} \vec{p_\text{noisy}}$.

 * If we want to measure an observable $O$ which has spectrum $\vec O = \{O_x\}$
   (i.e., $\langle\!\langle O\vert = \sum_x O_x \langle\!\langle x\vert$), once
   we know (or obtain) the assignment matrix $A$ and the noisy output
   distribution $\vec{p_\text{noisy}}$, we can restore the noiseless
   expectation value (a NumPy sketch follows below):

$$\langle\!\langle O\vert \rho \rangle\!\rangle = \vec{O}^T \vec{p_0} = \vec{O}^T A^{-1} \vec{p_\text{noisy}}$$
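A minimal NumPy sketch of this inversion (the per-qubit assignment matrices are
made-up, and the tensor-product structure assumes the uncorrelated-readout
simplification from the next slide):

```python
import numpy as np

# made-up per-qubit assignment matrices (rows: noisy x, columns: true y)
A1 = np.array([[0.97, 0.05],
               [0.03, 0.95]])
A2 = np.array([[0.98, 0.08],
               [0.02, 0.92]])
A = np.kron(A1, A2)                    # full 4x4 assignment matrix

p0 = np.array([0.5, 0.0, 0.0, 0.5])    # ideal distribution (e.g. a Bell state)
p_noisy = A @ p0                       # what the device would report

O = np.array([1, -1, -1, 1])           # spectrum of Z(x)Z on outcomes 00,01,10,11
print(O @ p_noisy)                     # biased expectation value
print(O @ np.linalg.inv(A) @ p_noisy)  # restored: back to O^T p0 = 1.0
```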

34


MAKE $A$ EASIER TO OBTAIN (ASSUMING $N$ QUBITS)

 * Why? Because the size of $A$ normally scales exponentially as $N$ increases.

 * Simplest way: make the additional assumption that measurement errors of
   different qubits are not correlated → $A$ is just the tensor product of each
   qubit's own matrix → $A = \bigotimes_{n=1}^N A_n$.
   
   * But this is too simplified: previous experiments actually observed
     correlated bit-flips.

35


WORKAROUND

Bravyi et al. construct the assignment matrix using continuous-time Markov
processes, with the generators (transition rate matrices) $\{G_i\}$ being
single- and two-qubit operators:

$$A = e^G \;\text{ with }\; G = \sum_{i=1}^{2N^2} r_iG_i$$

 * Why sum up to $2N^2$?
   Because for generators we can just pick $\{\sigma_x, \sigma_y\}$, therefore:
   # of single-qubit operators → $2N$; # of two-qubit operators →
   $\frac{N(N-1)}{2} \cdot 2^2$ (pick $2$ from $N$ qubits, and assign each of
   them either $\sigma_x$ or $\sigma_y$). Adding the two counts up, we get
   $2N^2$.

 * Once we've obtained all $2N^2$ positive coefficients $\{r_i\}$, $A^{-1}$ is
   simply $e^{-G}$.

36


INTRODUCING PAULI TWIRLING - SIMPLIFYING THE PROCESS 📄, 📄

After twirling, we can turn the original noise channel $\mathcal A$ into a Pauli
channel $\mathcal D$; i.e., by twirling we can remove all the off-diagonal
elements of $A$ (in the Pauli basis).

 * before twirling: ideally we want to measure in $\langle\!\langle \sigma_z^x\vert$
   ($x$ is the bit string marking the qubits acted on by $\sigma_z$), but
   because of the noise, we can only measure in $\langle\!\langle \sigma_z^x\vert\mathcal A$.

 * after twirling: $\mathcal A$ becomes $\mathcal D$, and what's special about
   it is that $\langle\!\langle \sigma_z^x\vert\mathcal D = D_x\langle\!\langle \sigma_z^x\vert$
   where $D_x = \langle\!\langle \sigma_z^x\vert\mathcal D \vert\sigma_z^x\rangle\!\rangle$
   → the noisy measurement is just a rescaling of the noiseless measurement
   (due to the fact that the twirled channel $\mathcal D$ is diagonal in the
   Pauli basis)

For a given input state $\vert \rho \rangle\!\rangle$, we simply have:

$$\underbrace{\langle\!\langle \sigma_z^x\vert \rho \rangle\!\rangle}_{\text{ideal}} = D_x^{-1} \underbrace{\langle\!\langle \sigma_z^x\vert\mathcal D\vert \rho \rangle\!\rangle}_{\text{noisy, but also twirled}}$$

It's just a rescaling to restore the ideal expectation value! What's more
awesome: for Pauli twirling in this case, we only need to conjugate the
original noise channel $\mathcal A$ with $\{I, \sigma_x\}$ instead of
$\{I, \sigma_x, \sigma_y, \sigma_z\}$, because conjugating $\mathcal A$ with
$\sigma_y$ or $\sigma_z$ only has a trivial effect within the computational
subspace.

37


REAL-WORLD IMPLEMENTATION (IBM QISKIT FOR EXAMPLE)

in next slide: expand the Resilience Level 1 bullet point

38
39

As measurement noise is a major obstacle in almost all experimental set-ups,
implementations of measurement error mitigation are nearly ubiquitous in
near-term experiments.

($\mathcal A$ is sometimes called the confusion matrix)

40


A SNEAK PEEK OF $\mathcal A$'S EXPLICIT FORM FOR A 2-QUBIT SYSTEM

in next slide (confusion matrix)

41
42


PURITY CONSTRAINTS


ORIGIN

 * Many quantum algorithms target the preparation of an ideal state $\rho_0$
   that is pure: $\rho_0 = \ket{\psi_0}\bra{\psi_0}$.

 * Commonly seen noise channels are stochastic, which turn our ideal pure
   state $\rho_0$ into some noisy mixed state $\rho$.

 * So our goal is to find (or approximate) the closest pure state to $\rho$.

 * We can do a spectral decomposition of the $N$-qubit state:
   $\rho = \sum_{i=1}^{2^N} p_i \ket{\phi_i}\bra{\phi_i}$ where the
   eigenvalues are in order ($p_1\geq p_2 \geq p_3 \geq \cdots$); then our
   goal is to seek $\ket{\phi_1}\bra{\phi_1}$.

 * There's already a way to do this: quantum principal component analysis →
   sample from the eigenbasis of $\rho$. But the circuit associated with this
   method is too deep for near-term devices → need alternative strategies.

43


VIRTUAL DISTILLATION / ERROR SUPPRESSION BY DERANGEMENT (VD/ESD)

 * By collectively measuring $M$ copies of the (noisy) $\rho$, we infer the
   expectation values w.r.t. the $M^\text{th}$-degree purified state:
   
   $$\rho_\text{pur}^{(M)} = \frac{\rho^M}{\mathrm{Tr}[\rho^M]} = \frac{1}{\sum_{i=1}^{2^N} p_i^M } \sum_{i=1}^{2^N} p_i^M \ket{\phi_i}\bra{\phi_i}$$

 * As $M\rightarrow\infty$, $\rho_\text{pur}^{(M)} \rightarrow\ket{\phi_1}\bra{\phi_1}$.
   The remaining bias in the estimator is called the coherent mismatch or
   noise floor.

 * How do we measure to obtain $\mathrm{Tr}[O\rho_\text{pur}^{(M)}]$? Notice
   from the above formula that we have:
   
   $$\mathrm{Tr}[O\rho_\text{pur}^{(M)}] = \frac{\mathrm{Tr}[O\rho^M]}{\mathrm{Tr}[\rho^M]}$$
   
   → so we can separately measure $\mathrm{Tr}[O\rho^M]$ and
   $\mathrm{Tr}[\rho^M]$ and divide one by the other (a numerical sketch
   follows below).
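A direct density-matrix sketch of this ratio (NumPy matrix powers stand in for
the collective measurement; the state is a made-up noisy $\ket 0$):

```python
import numpy as np

def purified_expectation(rho, O, M):
    """Tr[O rho^M] / Tr[rho^M]: the degree-M virtually distilled expectation."""
    rho_M = np.linalg.matrix_power(rho, M)
    return np.trace(O @ rho_M).real / np.trace(rho_M).real

# made-up example: |0> damaged by 20% depolarizing noise, observable Z
rho = 0.8 * np.diag([1.0, 0.0]) + 0.2 * np.eye(2) / 2
Z = np.diag([1.0, -1.0])
for M in (1, 2, 3):
    print(M, purified_expectation(rho, Z, M))  # 0.8, ~0.976, ~0.997 -> toward 1
```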

44


HOW TO MEASURE $\mathrm{Tr}[O\rho^M]$?

(and $\mathrm{Tr}[\rho^M]$ of course)

Let $S_M$ be the cyclic permutation operator between $M$ copies of $\rho$; then
$\mathrm{Tr}[O\rho^M]$ is just $\mathrm{Tr}[S_MO_m\rho^{\otimes M}]$, or
$\mathrm{Tr}[S_M\bar O\rho^{\otimes M}]$ where $\bar O = \frac1M\sum_mO_m$
(the observable symmetrized under copy permutation); the circuit for $M=3$ is
shown in the slide (figure not reproduced here).

For $\mathrm{Tr}[\rho^M]$, we simply remove $O$ from the above circuit.


45
Do we really need to do a global measurement (because of the global operator
$S_M$)?

→ no: it can be decomposed into transversal operations among the different copies,

$$S_M = \bigotimes_{n=1}^N \tilde{S}_M^{(n)},$$

where $\tilde{S}_M^{(n)}$ cyclically permutes the $n^\text{th}$ qubits of all
copies.

> moreover, if $O$ is only a single-qubit observable, then the symmetrized
> $\bar O$ commutes with all $\tilde{S}_M^{(n)}$. So we don't need a global
> measurement; instead we measure $\tilde{S}_M^{(n)}$ and $O_m$ for all $m,n$
> (this is many measurements, but they are all low-weight operators), then
> post-process to get $\mathrm{Tr}[S_M\bar O\rho^{\otimes M}]$.

the case of $\tilde{S}_{M=2}^{(n)}$ is shown on the right (figure not
reproduced here)


46


SUBSPACE EXPANSIONS 📄, 📄, 📄

One class of methods that can take advantage of knowledge about the problem is
quantum subspace expansion.


PROBLEM FORMULATION

 * Many tasks, e.g. optimization or state preparation, can be phrased as the
   desire to find some $\ket{\psi}$ minimizing the objective function
   $\bra{\psi} H \ket{\psi}$, with some already-known $H$.

 * but we cannot directly prepare the perfect $\ket{\psi}$ (due to some
   coherent error in the circuit, or simply not knowing what the ideal circuit
   should be)

 * the good news is that we can easily prepare a set of $M$ different states
   $\{\ket{\phi_i}\}$, which are linearly independent (but not necessarily
   orthogonal to each other), and express our desired state as a linear
   combination of them:
   
   $$\ket{\psi_{\vec w}} = \sum_{i=0}^{M-1} w_i\ket{\phi_i}$$

47

 * Dual problem: it's equivalent to finding the optimal set of coefficients
   $\vec w^*$ such that:
   
   $$\vec w^*= \arg\min_{\vec w} \bra{\psi_{\vec w}} H \ket{\psi_{\vec w}} \;\; \text{ s.t. }\;\; \braket{\psi_{\vec w}|\psi_{\vec w}} =1$$

 * This is a generalized linear eigenvalue problem for the pair $(\bar H, \bar S)$:
   
   $$\bar H W = \bar S WE$$
   
   where $\bar H_{ij} = \bra{\phi_i} H \ket{\phi_j}$ and
   $\bar S_{ij} = \braket{\phi_i | \phi_j}$. In plain words, $\bar H$ is just
   $H$ in the $\{\ket{\phi_i}\}$ basis (hence with dimension $M\times M$), and
   $\bar S$ is just the overlap matrix of the basis set $\{\ket{\phi_i}\}$
   (see the sketch below).

generalized linear eigenvalue problem

Eigenvalue problem:
$\bm A \vec\phi_i = \lambda_i \vec\phi_i$, or in matrix form:
$\bm A \bm\Phi = \bm\Phi \bm\Lambda$ (the lambda matrix $\bm\Lambda$ is diagonal).

Generalized linear eigenvalue problem:
$\bm A \vec\phi_i = \lambda_i \bm B\vec\phi_i$, or in matrix form:
$\bm A \bm\Phi = \bm B \bm\Phi \bm\Lambda$.
→ this problem reduces to the previous, simpler one when
$\bm B = \text{identity matrix}$.
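Classically solving the generalized eigenvalue problem is a single SciPy call
(the $2\times2$ matrices here are made-up stand-ins for measured $\bar H$ and
$\bar S$):

```python
import numpy as np
from scipy.linalg import eigh

H_bar = np.array([[-1.0, -0.4],        # <phi_i| H |phi_j>, measured on device
                  [-0.4,  0.3]])
S_bar = np.array([[1.0, 0.2],          # overlap matrix <phi_i|phi_j>
                  [0.2, 1.0]])

# scipy solves the generalized problem H_bar w = E S_bar w directly
energies, W = eigh(H_bar, S_bar)
w_star = W[:, 0]                       # eigenvector of the lowest eigenvalue
print(energies[0], w_star)             # mitigated ground-state energy estimate
print(w_star @ S_bar @ w_star)         # eigh normalizes so w* S_bar w* = 1
```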

48


ILLUSTRATION OF THE SUBSPACE EXPANSION ROUTINE

in next slide:
calculating the code Hamiltonian matrix $\bar H_{ij} = \bra{\phi_i} H \ket{\phi_j}$
and the overlap matrix $\bar S_{ij} = \braket{\phi_i | \phi_j}$

49
50


FINDING $\vec w^*$

 * We know $H$, and we also know $\{\ket{\phi_i}\}$, so we can formulate the
   above problem explicitly.

 * This generalized linear eigenvalue problem can be solved classically (on a
   classical computer), yielding the matrices $W$ and $E$ (the eigenvector and
   eigenvalue matrices).

 * Then what we want is the lowest eigenvalue in $E$ (we're not done yet!),
   whose associated eigenvector (in $W$) is the optimal set of coefficients
   $\vec w^*$.

51


UTILIZING $\vec w^*$ TO IMPROVE $O$'S EXPECTATION VALUE

The improved (or mitigated) expectation value of some observable $O$ with
respect to our newfound state $\ket{\psi_{\vec w^*}}$ is just:

$$\bra{\psi_{\vec w^*}} O \ket{\psi_{\vec w^*}} = \sum_{i,j=0}^{M-1} w_i^* w_j^* \bra{\phi_i} O \ket{\phi_j}$$

→ we can evaluate this value directly using $\vec w^*$, combined with
measuring $\bra{\phi_i} O \ket{\phi_j}$ (which we can do easily).
→ we don't need to explicitly prepare $\ket{\psi_{\vec w^*}}$.

>  * the more "complete" the basis set $\{\ket{\phi_i}\}$, the more ideal our
>    $\ket{\psi_{\vec w^*}}$ can be.
>  * but as our basis set grows, solving the generalized linear eigenvalue
>    problem on the classical computer consumes more and more resources →
>    it grows exponentially.
>  * hence the key is choosing the most suitable $\{\ket{\phi_i}\}$. So
>    usually the first basis state ($\ket{\phi_0}$) in the set is chosen to be
>    the best state we can prepare.

52


NEXT UP

What is the cost (sample complexity) of these techniques?

53


QUANTUM FISHER INFORMATION

 * $p(x|\lambda)$ → the conditional probability of getting $x$ as the
   measurement result when the parameter is $\lambda$; the Fisher information
   is then:

$$F(\lambda) = \int p(x|\lambda) \left( \frac{\partial \ln(p(x|\lambda))}{\partial \lambda} \right)^2 dx = \int \frac{1}{p(x|\lambda)} \left( \frac{\partial p(x|\lambda)}{\partial \lambda} \right)^2 dx$$

 * Now introduce a POVM $\{\Pi_x\}$ which obeys completeness,
   $\int\Pi_x dx=I$, and for later convenience define the symmetric
   logarithmic derivative (SLD) $L_\lambda$ implicitly as the Hermitian
   operator satisfying:
   
   $$\frac{L_\lambda\rho_\lambda + \rho_\lambda L_\lambda}{2} = \frac{\partial\rho_\lambda}{\partial\lambda}$$
   
   * the naming origin of SLD:
     
     symmetric: $L_\lambda\rho_\lambda + \rho_\lambda L_\lambda$ → obvious.
     logarithmic derivative: dividing both sides by $\rho_\lambda$, we roughly
     get $L_\lambda := \frac{\partial\ln\rho_\lambda}{\partial\lambda}$
   * This is a Lyapunov matrix equation, with a general solution available.

54
55

 * This Fisher information is ultimately upper-bounded, over all POVMs (in
   the sense of maximizing $F(\lambda)$), by:
   
   $$\begin{align*} F(\lambda) &\leq \int\left|\frac{\mathrm{tr}[\rho\Pi L]}{\sqrt{\mathrm{tr}[\rho\Pi ]}}\right|^2 dx \\ &= \int \left| \mathrm{tr}\left[ \frac{\sqrt\rho\sqrt\Pi}{\sqrt{\mathrm{tr}[\rho\Pi]}} \sqrt\Pi L\sqrt\rho\right] \right|^2 dx \\ &\leq \int \mathrm{tr}[\Pi L\rho L] dx \\ &= \mathrm{tr}[L\rho L] \\ &= \mathrm{tr}[\rho L^2] \equiv H(\lambda) \end{align*}$$

56

 * In the previous slide, the first inequality saturates when
   $\mathrm{tr}[\rho\Pi L]$ is a real number for any $\lambda$; the second
   inequality comes from the Schwarz inequality
   $|\mathrm{tr}[A^\dagger B]|^2 \leq \mathrm{tr}[A^\dagger A]\, \mathrm{tr}[B^\dagger B]$,
   with $A^\dagger\equiv \sqrt\rho\sqrt\Pi/\sqrt{\mathrm{tr}[\rho\Pi]}$ and
   $B\equiv \sqrt\Pi L\sqrt\rho$, and it saturates when
   $\frac{\sqrt\Pi\sqrt\rho}{\mathrm{tr}[\rho\Pi]} = \frac{\sqrt\Pi L\sqrt\rho}{\mathrm{tr}[\rho\Pi L]}$
   for any $\lambda$. ← this is satisfied iff $\{\Pi_x\}$ is the set of
   projectors onto the eigenstates of $L_\lambda$, which, in turn, represents
   the optimal POVM for estimating $\lambda$.

 * The above chain of inequalities proves that the Fisher information
   $F(\lambda)$ of any quantum measurement is bounded by the so-called
   quantum Fisher information
   $H(\lambda) \equiv \mathrm{tr}[\rho L^2] = \mathrm{tr}[\frac{\partial\rho}{\partial\lambda}L]$.

 * You can view the quantum Fisher information as the supremum of the
   classical Fisher information over all observables.

57


LYAPUNOV MATRIX EQUATION

The general solution of
$\frac{L_\lambda\rho_\lambda + \rho_\lambda L_\lambda}{2} = \frac{\partial\rho_\lambda}{\partial\lambda}$
for the SLD $L_\lambda$ may be written as:

$$L_\lambda = 2\int_0^\infty e^{-\rho t }\frac{\partial\rho}{\partial \lambda}e^{-\rho t } dt$$

Now if we write $\rho$ in its eigenbasis
$\sum_n\rho_n\ket{\psi_n}\bra{\psi_n}$, the SLD becomes (you can verify this
by direct substitution):

$$L_\lambda = 2\sum_{m,n} \frac{\bra{\psi_m}\frac{\partial\rho}{\partial\lambda}\ket{\psi_n}}{\rho_m+\rho_n}\ket{\psi_m}\bra{\psi_n}$$

where the sum only includes terms with $\rho_n+\rho_m\neq0$. From this we can
express the quantum Fisher information $H(\lambda)$ as (to verify this, just
use the dummy-index trick; a numerical sketch follows below):

$$H(\lambda) = \mathrm{tr}[\rho L^2] = 4\sum_{m,n} \rho_m \frac{\left|\bra{\psi_m}\frac{\partial\rho}{\partial\lambda}\ket{\psi_n}\right|^2}{(\rho_m+\rho_n)^2} = 2\sum_{m,n} \frac{\left|\bra{\psi_m}\frac{\partial\rho}{\partial\lambda}\ket{\psi_n}\right|^2}{\rho_m+\rho_n}$$
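A direct numerical sketch of this eigenbasis formula (the diagonal qubit
family is a made-up test case where $H(\lambda)$ reduces to the classical
Fisher information $1/(1-\lambda^2)$):

```python
import numpy as np

def qfi(rho, drho, tol=1e-12):
    """H = 2 * sum_{m,n} |<psi_m|drho|psi_n>|^2 / (p_m + p_n), computed in the
    eigenbasis of rho (terms with p_m + p_n ~ 0 are dropped)."""
    p, V = np.linalg.eigh(rho)
    drho_eig = V.conj().T @ drho @ V   # matrix elements <psi_m| drho |psi_n>
    H = 0.0
    for m in range(len(p)):
        for n in range(len(p)):
            if p[m] + p[n] > tol:
                H += 2 * abs(drho_eig[m, n]) ** 2 / (p[m] + p[n])
    return H

lam = 0.3
rho = np.diag([(1 + lam) / 2, (1 - lam) / 2])
drho = np.diag([0.5, -0.5])            # d(rho)/d(lambda)
print(qfi(rho, drho))                  # -> 1/(1 - 0.3**2) ~ 1.0989
```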

58


USUAL FORM OF THE QUANTUM FISHER INFORMATION $H(\lambda)$

If the dynamics of $\rho$ is governed by some Hamiltonian-like operator $A$,
that is, $\rho = e^{-i\lambda A} \rho_0\, e^{i\lambda A}$, the quantum Fisher
information reduces to:

$$H(\lambda) = 2\sum_{m,n} \frac{(\rho_m-\rho_n)^2}{\rho_m+\rho_n} \left|\bra{\psi_m}A\ket{\psi_n}\right|^2$$

 * Proof:
   
   $$\begin{align*} \bra{\psi_m}\frac{\partial\rho}{\partial\lambda}\ket{\psi_n} &= \bra{\psi_m} (-iA)\rho + \rho (iA) \ket{\psi_n} \\ &= \bra{\psi_m} \left((-iA)\sum_k\rho_k\ket{\psi_k}\bra{\psi_k} + \sum_k\rho_k\ket{\psi_k}\bra{\psi_k} (iA)\right) \ket{\psi_n} \\ &= -i\rho_n \bra{\psi_m}A \ket{\psi_n} + i\rho_m \bra{\psi_m}A \ket{\psi_n} \\ &\implies\left|\bra{\psi_m}\frac{\partial\rho}{\partial\lambda}\ket{\psi_n}\right|^2 = (\rho_m-\rho_n)^2 \cdot \left| \bra{\psi_m}A \ket{\psi_n}\right|^2 \end{align*}$$

59
60


EXPANSION OF THE SLD

We can further expand $\partial\rho/\partial\lambda$ in the eigenbasis of
$\rho$ as

$$\frac{\partial\rho}{\partial\lambda} = \sum_n \frac{\partial\rho_n}{\partial\lambda} \ket{\psi_n}\bra{\psi_n} + \rho_n \ket{\tfrac{\partial\psi_n}{\partial\lambda}} \bra{\psi_n} + \rho_n\ket{\psi_n} \bra{\tfrac{\partial\psi_n}{\partial\lambda}}$$

but since the SLD is defined only on the support of $\rho$, and both the
eigenvalues $\rho_n$ and the eigenvectors $\ket{\psi_n}$ may depend on the
parameter $\lambda$, we may moreover expand $\ket{\psi_n}$ in some arbitrary
basis $\{\ket k\}$ that is independent of $\lambda$. If we do this expansion
for $\braket{\psi_n | \psi_m} = \delta_{nm}$, we get (but this will not be
used in the later context, tbh):

$$\braket{\psi_n | \psi_m} = \sum_{k,l} (\psi_{nk}^* \bra{k} ) (\psi_{ml} \ket l) = \sum_k \psi_{nk}^*\psi_{mk}=\delta_{nm}$$

61

On the other hand, from the product rule we have

$$\frac{\partial\braket{\psi_m|\psi_n}}{\partial\lambda} = \braket{\tfrac{\partial\psi_m}{\partial\lambda} | \psi_n} + \braket{\psi_m|\tfrac{\partial\psi_n}{\partial\lambda}} = \frac{\partial \delta_{mn}}{\partial\lambda} = 0$$

therefore, using the fact that
$\braket{\tfrac{\partial\psi_m}{\partial\lambda} | \psi_n} =-\braket{\psi_m|\tfrac{\partial\psi_n}{\partial\lambda}}$,
we can rewrite the matrix form of $L_\lambda$ from the SLD general solution as:

$$\begin{align*} L_\lambda &= 2\sum_{m,n} \frac{\bra{\psi_m}\frac{\partial\rho}{\partial\lambda}\ket{\psi_n}}{\rho_m+\rho_n}\ket{\psi_m}\bra{\psi_n} \\ &= \sum_n \frac{\partial\rho_n}{\partial\lambda}\frac{\ket{\psi_n}\bra{\psi_n}}{\rho_n} + 2\sum_{m\neq n} \left( \rho_n \braket{\psi_m|\tfrac{\partial\psi_n}{\partial\lambda}} + \rho_m\braket{\tfrac{\partial\psi_m}{\partial\lambda} | \psi_n} \right)\frac{\ket{\psi_m}\bra{\psi_n}}{\rho_m+\rho_n} \\ &= \sum_n \frac{\partial\rho_n}{\partial\lambda}\frac{\ket{\psi_n}\bra{\psi_n}}{\rho_n} + 2\sum_{m\neq n} \frac{\rho_n-\rho_m}{\rho_n+\rho_m} \braket{\psi_m|\tfrac{\partial\psi_n}{\partial\lambda}} \ket{\psi_m}\bra{\psi_n} \end{align*}$$

62

and hence the quantum Fisher information is:

$$H(\lambda) = \sum_n \frac{1}{\rho_n} \left(\frac{\partial\rho_n}{\partial\lambda}\right)^2 + 2\sum_{n\neq m} \sigma_{nm} \left|\braket{\psi_m|\tfrac{\partial\psi_n}{\partial\lambda}}\right|^2$$

where $\sigma_{nm} = \dfrac{(\rho_n-\rho_m)^2}{\rho_n+\rho_m} + \text{(any anti-symmetric term)}$.

 * The first term in the above $H(\lambda)$ represents the classical Fisher
   information of the distribution $\{\rho_n\}$.

 * The second term arises from the dependence of $\rho_\lambda$'s eigenvectors
   on $\lambda$. The corresponding optimal quantum estimator is then:

$$O_\lambda = \sum_n\left(\lambda+\frac{1}{\rho_n}\frac{\partial\rho_n}{\partial\lambda}\right) \ket{\psi_n}\bra{\psi_n} + \frac{2}{H(\lambda)} \sum_{n\neq m} \frac{\rho_n-\rho_m}{\rho_n+\rho_m} \braket{\psi_m|\tfrac{\partial\psi_n}{\partial\lambda}} \ket{\psi_m}\bra{\psi_n}$$

63


MULTI-PARAMETRIC QFI AND RE-PARAMETRIZATION

When our quantum state $\rho$ depends on a set of parameters
$\bm\lambda = \{\lambda_i\},\ i=1,2,\cdots,N$, the quantum Fisher information
changes from a scalar into a matrix:

$$\begin{align*} \bm H(\bm\lambda)_{ij} &= \mathrm{Tr}\left[ \rho \frac{L_iL_j + L_jL_i}{2} \right] = \mathrm{Tr}\left[\frac{\partial\rho}{\partial\lambda_i}L_j\right] = \mathrm{Tr}\left[\frac{\partial\rho}{\partial\lambda_j}L_i\right] \\ &= \sum_n \frac{(\partial_i\rho_n)(\partial_j\rho_n)}{\rho_n} + \sum_{n\neq m} \frac{(\rho_m-\rho_n)^2}{\rho_m+\rho_n} \left[ \braket{\psi_n|\partial_i\psi_m} \braket{\partial_j\psi_m|\psi_n} + \braket{\psi_n|\partial_j\psi_m} \braket{\partial_i\psi_m|\psi_n} \right] \end{align*}$$

64

The Cramér-Rao theorem for multi-parameter estimation says that the inverse of
the Fisher matrix provides a lower bound on the covariance matrix
$\mathrm{Cov}[\bm\gamma]_{ij} = \braket{\lambda_i\lambda_j} - \braket{\lambda_i}\braket{\lambda_j}$,
that is,

$$\mathrm{Cov}[\bm\gamma] \geq \frac{1}{N} \bm H(\bm\lambda)^{-1}$$

 * this is a matrix inequality, hence the corresponding bound may not be
   achievable in a multi-parameter estimation.

 * but on the other hand, the diagonal elements of the inverse Fisher matrix
   provide achievable bounds for the variances of single-parameter estimators
   at fixed values of the others:
   $\mathrm{Var}(\lambda_i) = \gamma_{ii} \geq \frac{1}{N} (\bm H^{-1})_{ii}$.

65

 * Now suppose our quantity of interest is a known function of the set of
   parameters → $f(\bm\lambda)$. Then we need to re-parametrize our quantum
   state with a new set of parameters $\bm\lambda \rightarrow \tilde{\bm\lambda}$
   that includes the quantity of interest (for example, let
   $\tilde{\lambda}_1 \equiv f(\bm\lambda)$).

 * Since $\tilde{\partial}_\mu = \sum_\nu B_{\mu\nu} \partial_\nu$ where
   $B_{\mu\nu} = \partial\lambda_\nu/\partial\tilde{\lambda}_\mu$, our new SLD
   and QFI matrix can be expressed as
   $\tilde{L}_\mu = \sum_\nu B_{\mu\nu} L_\nu$ and
   $\tilde{\bm H} = \bm B\bm H\bm B^T$, hence
   $\tilde{\bm H}^{-1} = (\bm B^T)^{-1}{\bm H}^{-1}{\bm B}^{-1}$.

 * So if our new parameter set has only one parameter, $f(\bm\lambda)$, then
   $\bm B = [\frac{\partial\lambda_1}{\partial f}, \frac{\partial\lambda_2}{\partial f},\cdots,\frac{\partial\lambda_r}{\partial f}]$
   and therefore $\bm B^{-1} = \bm\nabla_{\bm{\lambda}}(f(\bm\lambda))$. The
   resulting Cramér-Rao inequality has the form (see the sketch below):

$$\mathrm{Var}(f(\bm\lambda)^\text{est}) \geq \frac1N \cdot \bm\nabla_{\bm{\lambda}}(f(\bm\lambda))^T \cdot \bm{H}(\rho(\bm\lambda))^{-1} \cdot \bm\nabla_{\bm{\lambda}}(f(\bm\lambda))$$
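As a sketch, the reparametrized bound is a one-liner given a QFI matrix and
the gradient of $f$ (the numbers below are made-up):

```python
import numpy as np

def crb_for_function(H, grad_f, n_copies):
    """Lower bound on Var(f_est): (1/N) * grad_f^T H^{-1} grad_f."""
    return grad_f @ np.linalg.solve(H, grad_f) / n_copies

# hypothetical two-parameter example with f(lam) = lam_1 + 2*lam_2
H = np.array([[4.0, 1.0],
              [1.0, 2.0]])             # QFI matrix
grad_f = np.array([1.0, 2.0])          # gradient of f w.r.t. (lam_1, lam_2)
print(crb_for_function(H, grad_f, n_copies=1000))   # -> 0.002
```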

66


APPLYING THE CRAMÉR-RAO BOUND

Our task is now:

Given: an unknown, noiseless quantum state $\rho$ which can be parameterized
as $\hat \rho(\bm{\theta})$ with $\bm{\theta}$ being the unknown parameters.

Goal: to estimate the quantity $f(\bm{\theta})$, which is a function of
$\bm{\theta}$.

How: by performing a POVM on $N$ independent copies of $\hat \rho(\bm{\theta})$.

67


QUICK RECAP:

$$\text{Variance} \geq \frac{1}{N\cdot\{\text{Quantum Fisher Information}\}}$$

$$\mathrm{Var}(f(\bm\theta)^\text{est}) \geq \frac 1N \cdot \bm\nabla_{\bm\theta} (f(\bm\theta))^T \; J(\rho(\bm\theta))^{-1} \; \bm\nabla_{\bm\theta} (f(\bm\theta))$$

The quantum Fisher information matrix $J$ of $\hat\rho(\bm\theta)$ is
defined exactly as in Multi-parametric QFI and re-parametrization:

$$[J]_{ij} = \frac12 \mathrm{tr} \left[\hat\rho\Big\{ \hat L_i, \hat L_j \Big\}\right]$$

where $\hat L_i$ is the symmetric logarithmic derivative (SLD) operator,
defined exactly as in Quantum Fisher Information (with slightly changed
notation):

$$\frac12\Big\{ \hat\rho, \hat L_i \Big\} = \frac{\partial \hat\rho}{\partial \theta_i}$$

68


FUNCTION OF THE GENERAL BLOCH VECTOR $\bm{\theta}$

In order to apply the bound, we need to calculate the quantum Fisher
information; in order to calculate the QFI, we first calculate
$f(\bm{\theta})$ using the components defined in Noiseless Target State
(n-qubit) and Virtual Quantum Circuit:

$$\begin{align*} f(\bm{\theta}) &= \mathrm{Tr}[\hat\rho\hat X] = \mathrm{Tr}\left[ \left(\dfrac{\hat I}{2^n} + \dfrac{\bm\theta \cdot \bm{\hat P}}{\sqrt{2^{n+1}}}\right) \left(\bm{x} \cdot \bm{\hat P}\right) \right] \\ &= 0 + \frac{2^n}{\sqrt{2^{n+1}}}\bm{\theta}\cdot\bm{x} \\ &= 2^{(n-1)/2}\bm{\theta}\cdot\bm{x} \end{align*}$$

69

 * we can re-factor the Pauli group $\bm{\hat P}$ as
   $\hat\lambda_i \equiv 2^{(1-n)/2}\hat P_i$, which are related to the
   generators of the Lie algebra of $SU(2^n)$.

 * our density matrix (state) then becomes:
   
   $$\hat\rho(\bm\theta) = \frac{1}{2^n}\hat I + \frac12\bm\theta\cdot\hat{\bm\lambda}$$

 * the corresponding state after the noise channel is:
   
   $$\mathcal{E}(\hat\rho(\bm\theta)) = \frac{1}{2^n}\hat I + \frac12(A\bm\theta+c)\cdot\hat{\bm\lambda}$$

70


VARIANCE OF NOISELESS STATE

The bound for $\mathrm{Var}(\braket{\hat X}^\text{est})$ satisfies:

$$\begin{align*} \mathrm{Var}(\braket{\hat X}^\text{est}) &\geq \frac1N \cdot \left( 2^{(n-1)/2}\; \bm{x}^T\right) J(\rho(\bm\theta))^{-1} \left( 2^{(n-1)/2}\; \bm{x}\right) \\ &= \frac{2^{n-1}}{N} \bm{x}^T J(\rho(\bm\theta))^{-1} \bm{x} \\ &= \frac1N (\Delta\hat X)^2 \end{align*}$$

where $(\Delta\hat X)^2 \equiv$ the variance of the observable $\hat X$ in the
noiseless state $\hat\rho$.

 * The equality in the above equation is achieved by the optimal POVM, which
   is the projective measurement in the diagonal basis of the operator $\hat X$.

71


VARIANCE OF NOISY STATE

Now suppose we add a noise channel of the form described in Virtual Quantum
Circuit, $\mathcal{E}_l: \bm{\theta} \mapsto A_l \bm{\theta} + \bm{c}_l$, with
its two components:

 * $(A_l)_{ij} = \mathrm{Tr}[\hat P_i \mathcal{E}_l (\hat P_j)] / 2^n = \frac12\mathrm{Tr}[\hat\lambda_i \mathcal{E}_l (\hat\lambda_j)]$
   → a $(4^n-1)\times(4^n-1)$ real-valued matrix,
   → represents the unital action of the noise channel.

 * $(\bm{c}_l)_i = \mathrm{tr}[\hat P_i \mathcal{E}_l(\hat I)] / \sqrt{2^{3n-1}} = \frac{1}{2^n}\mathrm{Tr}[\hat\lambda_i \mathcal{E}_l (\hat I)]$
   → a $(4^n-1)$-dimensional real-valued vector,
   → represents the non-unital action of the noise channel.

72

Now we can substitute $J(\rho(\bm\theta))^{-1}$ with
$J(\mathcal{E}(\rho(\bm\theta)))^{-1}$ and get:

$$\begin{align*} \mathrm{Var}(\braket{\hat X}^\text{est}) &\geq \frac{2^{n-1}}{N} \bm{x}^T J(\mathcal{E}(\rho(\bm\theta)))^{-1}\bm{x} \\ &= \frac1N \Bigg\{ \mathrm{Tr}\left[\mathcal{E}(\rho)\; \left((A^{-1})^T\bm{x} \cdot \bm{\hat P}\right)^2\right] - \mathrm{Tr}\left[\mathcal{E}(\rho) \left((A^{-1})^T\bm{x} \cdot \bm{\hat P}\right)\right] ^2 \Bigg\} \\ &= \frac1N (\Delta\hat Y)^2 \end{align*}$$

where $(\Delta\hat Y)^2 \equiv$ the variance of the observable $\hat Y$ in the
noisy state $\mathcal{E}(\hat\rho)$.

73

 * What is $\hat Y$? This is the observable realizing the optimal estimation
   strategy. Since $\mathrm{Tr}[\hat\rho\hat X] = \mathrm{Tr}[\mathcal{E}(\hat\rho) \hat Y]$,
   we want it to satisfy $\mathcal{E}^\dagger(\hat Y) = \hat X$, and since we
   already assume the form of $\mathcal{E}$, we can explicitly express
   $\hat Y$ as:
   
   $$\hat Y = \left( -2^{(n-1)/2}(A^{-1})^T\bm{x} \cdot \bm{c} \right) \hat I +(A^{-1})^T\bm{x} \cdot \bm{\hat P}$$

 * $\hat Y$ can be interpreted as an observable that absorbs the effect of the
   noise.

 * Therefore, the way to minimize the variance of the unbiased estimator
   $\braket{\hat X}^\text{est}$ for the noiseless state through measurement of
   the noisy state is to perform the projective measurement in the diagonal
   basis of the operator $\hat Y$.

74


NOISELESS TARGET STATE ($n$-QUBIT)

 * $\hat\rho$ is obtained by applying $L$ layers of noiseless unitary gates
   $\{\mathcal{U}_l\}_{l=1}^L$ to the initial state $\hat\rho_0$.

 * $\hat\rho$ can be parameterized by a general Bloch vector $\bm\theta$:
   $\hat \rho = \dfrac{\hat I}{2^n} + \dfrac{\bm\theta \cdot \bm{\hat P}}{\sqrt{2^{n+1}}}$
   with $\bm\theta \in \mathbb R^{4^n-1}$.

 * The coefficient $2^{(-1-n)/2}$ is somehow related to the general $n$-level
   Bloch vector.

 * $\hat I \equiv \hat\sigma_0^{\otimes n}$ and
   $\bm{\hat P} \equiv \{\hat P_i\}_{i=1}^{4^n-1}$ is the $n$-qubit Pauli
   group excluding the all-identity string:
   $\hat{P_i} \in \{\hat\sigma_0, \hat\sigma_x, \hat\sigma_y, \hat\sigma_z\}^{\otimes n} \backslash \{\hat\sigma_0\}^{\otimes n}$.
   → notice that any $\hat P_i^2$ is just $\hat I_{2^n\times2^n}$.

 * assume the preparation of the initial state $\hat\rho_0$ is noiseless.

75


VIRTUAL QUANTUM CIRCUIT

Consider one single sample (one single row):

 * start from the initial state $\hat\rho_0$

 * apply $L$ layers of unitaries, each followed by a noisy operation:
   $\{\mathcal{E}_l \circ \mathcal{U}_l\}_{l=1}^L$, and constrain the noise to
   be Markovian

 * finally use a noiseless POVM $\mathcal{M}_0$ to estimate the expectation
   value of a traceless observable $\hat X =\bm{x} \cdot \bm{\hat P}$; the
   observable is parametrized by $\bm{x} \in \mathbb{R}^{4^n-1}$.


76

Each noise channel $\mathcal{E}_l$ makes the generalized Bloch vector shrink,
plus some non-unital part:

$$\mathcal{E}_l: \bm{\theta} \mapsto A_l \bm{\theta} + \bm{c}_l$$

 * $(A_l)_{ij} = \mathrm{tr}[\hat P_i \mathcal{E}_l (\hat P_j)] / 2^n$ is the
   unital part of the Pauli transfer matrix

 * $(\bm{c}_l)_i = \mathrm{tr}[\hat P_i \mathcal{E}_l(\hat I)] / \sqrt{2^{3n-1}}$
   is the non-unital part of the noise.


LOWER BOUND OF NOISE STRENGTH

 1. a single $\mathcal{E}_l$ causes the generalized Bloch sphere to shrink
    with at least a strength of $\Gamma(\mathcal{E}_l)$
    
    * $\Gamma(\mathcal{E}_l) \equiv \| A_l \|^{-1}$ with
      $\| A_l \| = \max_{\bm{e}\in \mathbb{R}^{4^n-1}} \frac{\|A_l\bm{e}\|}{\|\bm{e}\|}$
      where $\|\bm{e}\| = \sqrt{\sum_i |{e_i}|^2}$

 2. across all $\mathcal{E}_l$, each of them applies at least a strength of
    $\gamma = \min_l\{ \Gamma(\mathcal{E}_l) \}_l$ of noise to the state.

77


OBJECTIVE OF QEM

To remove the effect of the noise channels $\{\mathcal{E}_l\}_{l=1}^L$, so
that we have an unbiased estimator of $\hat X =\bm{x} \cdot \bm{\hat P}$,

$$\braket{\hat X} \equiv \mathrm{tr}[\hat\rho\hat X] = \sqrt{2^{n-1}} \bm{\theta}\cdot\bm{x},$$

with standard deviation $\varepsilon$. Construct the virtual quantum circuit
as follows:

 * boosted noise $\mathcal{E}_{lm}$ in the $l^\text{th}$ layer and
   $m^\text{th}$ copy such that $\Gamma(\mathcal{E}_{lm}) \geq \Gamma(\mathcal{E}_l) \geq \gamma$

 * classical registers $\hat\rho_{c,m}$ coupled with the system qubits via the
   additional operations $\mathcal{C}_{lm}$

 * a POVM measurement $\mathcal M$ performed on the entire set of copies to
   output the estimator of $\braket{\hat X}$


CONSTRAINT ON LOCCS

 * The classical register $\hat\rho_{c,m}$ is initialized with a probabilistic
   mixture of computational basis states,
   $\hat\rho_{c,m} = \sum_i p_{mi}\ket i\bra i$

 * It is coupled to the system via only unitary operations,
   $\mathcal{C}_{lm} = \sum_i \mathcal{C}_{lmi}\otimes\ket i\bra i$, because
   we're not allowed to perform QEC.

78


CALCULATE SLD FISHER INFORMATION MATRIX

First, re-express the $m^\text{th}$ copy (or $m^\text{th}$ row) of the quantum
state in the virtual circuit as
$\mathcal{E}'_m(\rho(\bm\theta)\otimes\hat\rho_{c,m})$, with $\mathcal{E}'_m$
being an effective noise channel:

$$\mathcal{E}'_m = (\mathcal{E}_{Lm} \circ \mathcal{U}_L \circ \mathcal{C}_{(L-1)m}) \cdots (\mathcal{E}_{1m} \circ \mathcal{U}_1 \circ \mathcal{C}_{(0)m}) \circ (\mathcal{U}_1^{-1}\circ \cdots \circ\mathcal{U}_L^{-1})$$

Why define it this way? Because
$\hat\rho_0\otimes\hat\rho_{c,m} = (\mathcal{U}_1^{-1}\circ \cdots \circ\mathcal{U}_L^{-1})\,\rho(\bm\theta)\otimes\hat\rho_{c,m}$.

This compilation allows for the calculation of the quantum Fisher information
matrix of the state right before the POVM → analyze the SLD Fisher information
matrix $J$ of the quantum state
$\bigotimes_{m=1}^N \mathcal{E}'_m(\rho(\bm\theta)\otimes\hat\rho_{c,m})$.

79

Note that we have $N$ copies in total (you can think of $N$ as the sample
complexity).

$J$ can be bounded as:

$$J \lesssim \sum_m \frac{I}{\Gamma(\mathcal{E}'_m)^2} \lesssim N\frac{1}{\gamma^{2L}}$$

 * indicates the exponential decay of $J$ as the circuit depth (number of
   layers) $L$ grows.

 * we can combine this with the quantum Cramér-Rao bound
   → so we can relate $J$ to the standard deviation $\varepsilon$ of an
   unbiased estimator

80


MAIN RESULTS: COMPLEXITY OF $N$

Prerequisite: the noise $\mathcal{E}_{lm}$ satisfies, for all $l$ (layers)
and $m$ (copies):

 * for all $\hat\rho \neq \hat \sigma$, we can still tell them apart even
   after the same noise channel:
   $\mathcal{E}_{lm}(\hat\rho) \neq \mathcal{E}_{lm}(\hat\sigma)$
   → this is a necessary condition: quantum information is not completely
   destroyed by the noise

 * for all $\hat\rho$, the state after noise,
   $\mathcal{E}_{lm}(\hat\rho)$, is full rank with all eigenvalues greater
   than 0
   → $\mathcal{E}_{lm}(\hat\rho)$ is a positive-definite matrix
   → means that after the noise, the variance of every observable on any state
   is non-zero. Hence our sample cost must be greater than 0.

81
Theorem 1: Complexity of $N$

$$N \geq \frac{\|\bm{x}\|^2}{\varepsilon^2} \beta \gamma^{2L},$$

where $\beta$ is the largest $0<\beta<1$ such that
$\mathcal{E}_{lm}(\hat\rho) - \frac{\beta}{2^n}\hat I \geq 0$ for all
$\hat\rho$ and all $l,m$ (I have gained some physical intuition for this term)
→ $\beta$ can be viewed as "how far away" the generalized Bloch sphere is
pushed from its original surface by the noise.


IMPLICATION

If $\gamma>1$, the cost $N$ of unbiased QEM grows exponentially with the
circuit depth $L$, no matter how we choose $\mathcal{E}_{lm}$,
$\mathcal{C}_{lm}$, or the POVM (a numeric sketch follows below).
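Plugging hypothetical numbers into Theorem 1 (using the global-depolarizing
identifications $\gamma = 1/(1-p)$ and $\beta = p$ from the special case a few
slides below):

```python
def n_lower_bound(x_norm, eps, beta, gamma, L):
    """Theorem 1: N >= (||x||^2 / eps^2) * beta * gamma^(2L)."""
    return x_norm**2 / eps**2 * beta * gamma ** (2 * L)

p, eps = 0.01, 0.01                    # made-up error rate and target precision
for L in (10, 100, 1000):
    print(L, n_lower_bound(1.0, eps, beta=p, gamma=1 / (1 - p), L=L))
# ~1e2 at L=10, but the bound blows up exponentially as L grows
```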

82


WHEN THE QUANTUM CIRCUIT SCRAMBLES THE STATE STRONGLY ENOUGH

then we must take the dependence of the sample complexity on $n$ into
consideration (or else our lower bound will look like trash):

> we must pay overhead to eliminate every local noise and thus encounter
> dependence on $n$

Theorem 2: Complexity in the presence of strong entangling gates

Let $\mathcal{U}_1,\cdots,\mathcal{U}_L$ all be $n$-qubit gates (drawn from a
set of random unitaries) that form a unitary 2-design (what's this?), and let
$\mathcal{E}_l$ be local noise. Then the number of copies $N$ required to
perform unbiased estimation of $\braket{\hat X}$, averaged over
$\{\mathcal{U}_1,\cdots,\mathcal{U}_L\}$, grows exponentially in both the
qubit count $n$ and the depth $L$.

 * To put it simply: if we encounter a random or chaotic quantum circuit, then
   the system qubit number $n$ also plays a role in the exponential scaling of
   the sample complexity, if we want a better/tighter bound.

83


PURE UNITAL NOISE CASE

If $\mathcal{E}_{lm}(\frac{\hat I}{2^n}) = \frac{\hat I}{2^n}$, then the cost
bound satisfies

$$N \geq \frac{\|\bm{x}\|^2}{\varepsilon^2} \Big(1-(1-\beta)^L\Big) \gamma^{2L} \approx \frac{\|\bm{x}\|^2}{\varepsilon^2} \gamma^{2L}$$

84


SPECIAL CASES THAT TIGHTEN THE BOUND


1. GLOBAL DEPOLARIZATION CHANNEL (THIS IS A UNITAL CHANNEL)

Assume the entire layers of unitary gates are followed by a global
depolarization channel

$$\mathcal{E}_{lm}: \bm\theta \mapsto (1-p_{lm})\bm\theta.$$

Since we assume boosted noise in our framework, the error rate is bounded from
below as $p_{lm} \geq p$ (likewise for the other two noise types below), so
the minimum noise strength is $\gamma = 1/(1-p)$ and we have $\beta = p$,
therefore $N \approx \frac{\|\bm{x}\|^2}{\varepsilon^2} (1-p)^{-2L}$.

 * the effective error rate $p'$ for the $\mathcal{E}'_m$ mentioned in
   Calculate SLD Fisher Information Matrix is then $p' = 1-(1-p)^L$ →
   can be verified directly.

85


2. LOCAL DEPOLARIZATION CHANNEL (THIS IS A UNITAL CHANNEL TOO)

$$\mathcal{E}_{lm} = (\mathcal{E}^{(0)}_{lm})^{\otimes n},\;\;\;\;\; \mathcal{E}^{(0)}_{lm}:\bm\theta\mapsto(1-p_{lm})\bm\theta$$

If moreover the unitary gates of the circuit satisfy the condition of a random
circuit whose gates are drawn from a unitary 2-design (such as the $n$-qubit
Clifford group), the bound is tightened to

$$\mathbb{E}[N] \geq \mathcal O\left( \left(1+\frac{3}{2}\frac{4^n}{4^n-1}p\right)^{nL} \right)$$

86


3. LOCAL AMPLITUDE DAMPING CHANNEL (THIS IS A NON-UNITAL CHANNEL)

$$\mathcal{E}_{lm} = (\mathcal{E}^{(0)}_{lm})^{\otimes n},\;\;\;\;\; \mathcal{E}^{(0)}_{lm}:(\theta_x, \theta_y, \theta_z) \mapsto (\sqrt{1-p_{lm}}\,\theta_x, \sqrt{1-p_{lm}}\,\theta_y, (1-p_{lm})\theta_z + p_{lm})$$

Again, if we satisfy the same condition, the bound is tightened to

$$\mathbb{E}[N] \geq \mathcal O\left( \left(1+\frac{4^n}{4^n-1}p\right)^{nL} \right)$$

87


THANK YOU FOR LISTENING

Any questions?

88