(TITLE TO BE DETERMINED) SOME (MAJOR) QUANTUM ERROR MITIGATION TECHNIQUES AND THEIR COSTS

arXiv: 2208.09385, arXiv: 2210.00921 (based on these 2 reviews)

Andy Chu, May 21, 2024, Group Talk at MeMo

TABLE OF CONTENT

1. Quantum Error Mitigation - Definition
2. Error-Mitigated Estimators
3. Trade-offs between Bias & Variance
4. Exponential Scaling of Sampling Overhead
5. Zero-Noise Extrapolation 📄, 📄
6. Probabilistic Error Cancellation 📄
7. Measurement Error Mitigation 📄
8. Purity Constraints
9. Subspace Expansions 📄, 📄, 📄
10. Quantum Fisher Information
11. Multi-parametric QFI and Re-parametrization
12. Applying Cramér-Rao Bound
13. Virtual Quantum Circuit
14. Main Results: Complexity of $N$

A SNEAK PEEK OF IBM'S QUANTUM ERROR MITIGATION ROUTINE

(next slide shows a screenshot of IBM's documentation)

A SNEAK PEEK OF PENNYLANE'S QUANTUM ERROR MITIGATION METHODS (ZNE FOR EXAMPLE)

(next slide shows a screenshot of PennyLane's documentation)

QUANTUM ERROR MITIGATION - DEFINITION

* Algorithmic schemes that reduce the noise-induced bias in the expectation value.
* By post-processing outputs from an ensemble of circuit runs.
* Using circuits at the same noise level as the original unmitigated circuit, or above.

ERROR MITIGATION IS NOT ERROR CORRECTION

* QEM only reduces the effective damage due to noise for the whole ensemble of circuit runs.
* Zooming into each individual circuit run, the circuit noise level remains unchanged or even increases.
* This differs from QEC, which aims to reduce the effect of noise on the output in every single circuit run.

QUANTUM ERROR MITIGATION - DESIRED FEATURES

* The mitigation method should ideally require only a very modest qubit overhead, to remain practical on current and near-term quantum hardware.
* Error mitigation techniques should provide an accuracy guarantee: ideally a formal error bound on the mitigated expectation values that indicates how well the method works at varying levels of noise.
* A reliable error mitigation method should require few assumptions (or no assumptions) about the final state that is prepared by the computation.
ERROR-MITIGATED ESTIMATORS

* Goal: to estimate the expectation value of some observable of interest $O$ → to estimate $\text{Tr}[\rho_0 O]$.
* How: we construct an estimator $\hat O$ for estimating $O$. And how good is $\hat O$? There are different ways to assess $\hat O$:

1. Use prediction intervals: calculate the interval within which the outcome of the estimator will fall with a given probability, offering a rigorous bound on the worst-case deviation of the estimator.
2. Use the mean square error: we choose this one for assessing $\hat O$ so that we can see more clearly the different factors that contribute to the deviation of the estimator.

$$\text{MSE}[\hat O] = \text{E}\left[\left( \hat O - \text{Tr}[\rho_0 O]\right)^2\right]$$

Formally defined goal: to reduce $\text{MSE}[\hat O]$ as much as possible, using finite resources.

WHAT CONTRIBUTES TO $\text{MSE}[\hat O]$?

It's convenient to break it into 2 terms:

$$\text{MSE}[\hat O] = \left(\text{Bias}[\hat O]\right)^2 + \text{Var}[\hat O]$$

with each of them defined as:

$$\begin{align*} \text{Bias}[\hat O] &= \text{E}[\hat O] - \text{Tr}[\rho_0 O] \\ \text{Var}[\hat O] &= \text{E}\left[\hat O^2\right] - \left(\text{E}[\hat O]\right)^2 \end{align*}$$

SIMPLEST ESTIMATOR $\overline{O}_\rho$ (JUST DON'T USE QEM)

> Although we are given a noisy state $\rho$ instead of $\rho_0$, we still measure $O$ on that $\rho$, and repeat $N_\text{cir}$ times (effectively running the noisy circuit $N_\text{cir}$ times), so we now have $N_\text{cir}$ independent $\hat O_\rho$. Finally, we construct $\overline{O}_\rho$ by simply averaging these $\hat O_\rho$.
> → the larger $N_\text{cir}$, the higher the quality of $\overline{O}_\rho$:

$$\text{MSE}[\overline{O}_\rho]= (\underbrace{\text{Tr}[\rho O] - \text{Tr}[\rho_0 O]}_{\text{Bias}[\overline{O}_\rho]=\text{Bias}[\hat{O}_\rho]})^2 + \underbrace{\frac{\text{Tr}\left[\rho O^2\right] - \text{Tr}[\rho O]^2}{N_\text{cir}}}_{\text{Var}[\overline{O}_\rho] = \text{Var}[\hat{O}_\rho]/N_\text{cir}}$$

* The second term is related to shot noise, and will shrink as we increase $N_\text{cir}$.
* So eventually $\text{MSE}[\overline{O}_\rho]$ will be limited by the first term, the bias $\text{Bias}[\overline{O}_\rho]$ (its magnitude, to be precise), because this is a systematic error that cannot be reduced by increasing the number of circuit runs.

WHAT IF WE USE QEM?

We can construct an error-mitigated estimator $\overline{O}_\text{em}$ using the same number of circuit runs $N_\text{cir}$ to achieve:

* smaller bias than the original naive estimator $\overline{O}_\rho$: $|\text{Bias}[\overline{O}_\text{em}]| \leq |\text{Bias}[\overline{O}_\rho]|$.
* but usually, inevitably, larger variance: $\text{Var}[\overline{O}_\text{em}] \geq \text{Var}[\overline{O}_\rho]$, because we usually extract the useful part from the noise to reduce bias, so the error-mitigated estimator is also more sensitive to noise.

TRADE-OFFS BETWEEN BIAS & VARIANCE

You can choose either:

* a quickly converging QEM method, with large residual error,
* or a more costly QEM method that is more accurate.
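To make the decomposition concrete, here is a minimal numerical sketch (my own toy model, not from the reviews): a ±1-valued observable whose noisy mean is 0.8 while the ideal value is 1.0, so the bias is pinned at −0.2 while the variance shrinks as $1/N_\text{cir}$.

```python
# Toy illustration of MSE[O] = Bias^2 + Var for the naive estimator O_rho.
# All numbers (ideal = 1.0, noisy = 0.8) are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
ideal, noisy = 1.0, 0.8          # Tr[rho0 O] and Tr[rho O] (assumed values)
p_plus = (1 + noisy) / 2         # P(outcome = +1) so that E[shot] = noisy

def naive_estimator(n_cir):
    """Average of n_cir single-shot measurements of O on the noisy state."""
    shots = rng.choice([1.0, -1.0], size=n_cir, p=[p_plus, 1 - p_plus])
    return shots.mean()

for n_cir in [10, 100, 10_000]:
    estimates = np.array([naive_estimator(n_cir) for _ in range(2000)])
    bias = estimates.mean() - ideal
    var = estimates.var()
    print(f"N_cir={n_cir:6d}  Bias^2={bias**2:.4f}  Var={var:.5f}  "
          f"MSE={np.mean((estimates - ideal) ** 2):.4f}")
# Var shrinks as 1/N_cir while Bias^2 saturates at (0.8 - 1.0)^2 = 0.04,
# so the MSE is eventually dominated by the systematic bias term.
```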
QUANTIFYING THE TRADE-OFF:

How many more circuit runs do we need for $\overline{O}_\text{em}$ to have a variance at the same level as the original noisy $\overline{O}_\rho$?

DEFINE SAMPLING OVERHEAD:

$$C_\text{em} = \frac{N^\epsilon_\text{shot}(\hat O_\text{em})}{N^\epsilon_\text{shot}(\hat O_\rho)} = \frac{\text{Var}[\hat O_\text{em}]}{\text{Var}[\hat O_\rho]}$$

Here we post-define $\hat O_\text{em}$ to be a "one-shot" error-mitigated estimator satisfying $\text{E}[\hat O_\text{em}] = \text{E}[\overline{O}_\text{em}]$ and $\text{Var}[\hat{O}_\text{em}] = N_\text{cir}\times\text{Var}[\overline{O}_\text{em}]$.

AN ALTERNATIVE WAY TO QUANTIFY:

If we know the range of a random variable $\hat X$ (the difference between the maximum and minimum possible values taken by $\hat X$), denoted $R[\hat X]$, we can plug this range into Hoeffding's inequality to get the required sample size. To guarantee an estimation of $\text{E}[\hat X]$ to $\epsilon$-precision with probability $(1-\delta)$, we need:

$$N^{\epsilon,\delta}_\text{hff}(\hat X) = \frac{\ln(2/\delta)}{2\epsilon^2} R[\hat X]^2$$

This way, our previously defined sampling overhead can be approximated as:

$$C_\text{em} \approx \frac{N^{\epsilon,\delta}_\text{hff}(\hat O_\text{em})}{N^{\epsilon,\delta}_\text{hff}(\hat O_\rho)} = \frac{R[\hat O_\text{em}]^2}{R[\hat O_\rho]^2}$$
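As a quick illustration of these two formulas, here is a small helper (a sketch under assumed numbers; the ranges $R[\hat O_\rho] = 2$ and $R[\hat O_\text{em}] = 6$ are hypothetical) computing $N^{\epsilon,\delta}_\text{hff}$ and the resulting overhead ratio:

```python
# Hoeffding sample count N_hff = ln(2/delta)/(2 eps^2) * R^2 and the
# approximate overhead C_em ~ R[O_em]^2 / R[O_rho]^2.
import numpy as np

def n_hoeffding(rng_width, eps, delta):
    """Samples needed to estimate E[X] to +-eps with probability 1-delta,
    for a random variable whose range (max - min) is rng_width."""
    return np.log(2 / delta) / (2 * eps**2) * rng_width**2

eps, delta = 0.01, 0.05
r_noisy, r_mitigated = 2.0, 6.0   # hypothetical ranges of O_rho and O_em
n_rho = n_hoeffding(r_noisy, eps, delta)
n_em = n_hoeffding(r_mitigated, eps, delta)
print(f"N(noisy) ~ {n_rho:.0f}, N(mitigated) ~ {n_em:.0f}, "
      f"overhead C_em ~ {n_em / n_rho:.1f}")   # = (6/2)^2 = 9
```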
MODELING THE ERRORS

Similar to QEC, we model noise as discrete, probabilistic events called faults. Faults can occur at various locations in the circuit; locations include gates, idling steps, and measurements.

QUANTIFYING THE ERRORS

Assume all locations are afflicted with Pauli noise, and the Pauli noise at location $f$ occurs with probability $p_f$.

* If the error events at all locations are independent, then the probability of no faults occurring at all is $P_0 = \prod_f (1-p_f)$ → the fault-free probability of the circuit.

AN ALTERNATIVE WAY TO QUANTIFY:

Run the circuit once and count how many faults occur; then average this number over many circuit runs. This average is defined as the circuit fault rate $\lambda = \sum_f p_f$.

* Simple case: all $p_f$ are the same ($p$); then $\lambda = Mp$ for $M$ locations in total. If $M$ is large, we can model "$k$ errors occur during one run" using the Poisson distribution: $P_k = e^{-\lambda}\lambda^k/k!$. The fault-free probability is then just the special case $k=0$: $P_0 = e^{-\lambda}$.

> All of the arguments still apply if we define $(1-p_f)$ as the coefficient of the error-free part of the Kraus representation of some general noise. More generally, we can always apply Pauli twirling to transform all noise into Pauli noise so that our arguments become valid.

EXPONENTIAL SCALING OF SAMPLING OVERHEAD

> INEVITABILITY
>
> Even if we could always magically distinguish all perfect circuit runs (which occur with probability $P_0 = e^{-\lambda}$) from the rest of the "flawed" runs, we would still need to run the circuit exponentially many times to collect enough "perfect" runs to reach $\epsilon$-level shot noise.

As of today, we don't have magic. So instead of impractically extracting the error-free state $\rho_0$, we focus on how to construct error-mitigated estimators.

QEC VS QEM

For a given noisy circuit with circuit fault rate $\lambda$:

* QEC: active correction → reduces $\lambda$ in each circuit run.
* QEM: relies on post-processing the outputs from an ensemble of circuit runs, each run with circuit fault rate $\geq\lambda$.

NOW WE INTRODUCE 5 COMMONLY SEEN ERROR MITIGATION TECHNIQUES

ZERO-NOISE EXTRAPOLATION 📄, 📄

If we can tune our circuit fault rate → we obtain a series of different states $\rho_\lambda$ at different $\lambda$ → so we can view $\text{Tr}[\rho_\lambda O]$ as a function of $\lambda$ → then the ideal expectation value is what we get by substituting $\lambda=0$ into that function.

FORMAL DEFINITION:

1. Model the trend of the data using a parametrized function $f(\lambda; \vec\theta)$.
2. Once we're done fitting, we get a set of optimal parameters $\vec\theta^*$.
3. The error-mitigated estimate of the zero-noise output is then $\text{E}[\hat O_\text{em}] \equiv f(0; \vec\theta^*)$.

FOR SUFFICIENTLY SMALL $\lambda$ (USUALLY IT IS THE CASE) → RICHARDSON EXTRAPOLATION

We can fit the dataset with a truncated Taylor expansion:

$$\text{Tr}[\rho_{\lambda} O] \approx f(\lambda; \vec\theta) = \sum_{k=0}^{M-1} \theta_k \frac{\lambda^k}{k!}$$

which is an $(M-1)^\text{th}$-order polynomial with $M$ coefficients (degrees of freedom).

→ simplest case $M=2$: it's just linear extrapolation.
→ if we first assign a value to $M$ (related to how sophisticated we want the model $f$ to be), we then need at least $M$ data points (in the spirit of Lagrange polynomials) to obtain the error-mitigated estimate:

$$\text{E}[\hat O_\text{em}] = \theta_0^* = \sum_{m=1}^M \left(\text{Tr}[\rho_{\lambda_m}O] \prod_{k\neq m} \frac{\lambda_k-0}{\lambda_k-\lambda_m} \right)$$

* Performance: the bias of $\text{E}[\hat O_\text{em}]$ originates from our omission of higher-degree terms in $f(\lambda; \vec\theta)$, so the bias is of magnitude $\text{Bias}[\hat O_\text{em}] = \mathcal{O}(\lambda^M)$ for $M$ data points.
* Sampling overhead: if we use the Monte Carlo method (because $\text{E}[\hat O_\text{em}]$ is a linear combination of the set of noisy expectation values), the overhead is:

$$C_\text{em} \sim \left( \sum_{m=1}^M \left| \prod_{k\neq m} \frac{\lambda_k-0}{\lambda_k-\lambda_m} \right| \right)^2$$
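A minimal sketch of Richardson extrapolation as defined above, on synthetic data (the fault rates, the quadratic model for $\text{Tr}[\rho_\lambda O]$, and the true zero-noise value 1.0 are all made up for illustration):

```python
# Richardson / Lagrange extrapolation to lambda = 0, following the slide.
import numpy as np

lambdas = np.array([1.0, 1.5, 2.0])              # boosted fault rates lambda_m
# Synthetic "measured" values Tr[rho_lambda O]; true zero-noise value is 1.0.
expvals = 1.0 - 0.3 * lambdas + 0.02 * lambdas**2

coeffs = []
for m, lam_m in enumerate(lambdas):
    # Lagrange coefficient  prod_{k != m} (lambda_k - 0)/(lambda_k - lambda_m)
    c = np.prod([lam_k / (lam_k - lam_m)
                 for k, lam_k in enumerate(lambdas) if k != m])
    coeffs.append(c)
coeffs = np.array(coeffs)

zne_estimate = np.sum(coeffs * expvals)          # E[O_em] = sum_m c_m Tr[rho_m O]
overhead = np.sum(np.abs(coeffs)) ** 2           # C_em ~ (sum_m |c_m|)^2
print(f"zero-noise estimate = {zne_estimate:.4f}, sampling overhead ~ {overhead:.0f}")
# With 3 points and quadratic data the extrapolation is exact (1.0000),
# but the coefficient one-norm already gives an overhead of ~289.
```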
HOW DO WE "SCALE THE NOISE" ON QUANTUM HARDWARE?

(next slide: including global/local unitary folding, identity insertion, pulse stretching, etc.)

WHAT OTHER FUNCTIONS CAN WE USE FOR FITTING THE {NOISE, EXPECTATION VALUE} DATASET? (OTHER THAN RICHARDSON)

(next slide: see the Extrapolation methods: Factory objects section)

PROBABILISTIC ERROR CANCELLATION 📄

Key idea origin:
→ we can write the noise-free expectation value as a linear combination of noisy expectation values
→ this is a quasi-probability decomposition
→ which can be sampled using a Monte Carlo procedure.

FEATURE:
→ can fully remove the bias of expectation values of generic quantum circuits ($\text{Bias}[\hat O_\text{em}]=0$)
→ but at the expense of a sampling overhead $C_\text{em}$ that grows exponentially with the circuit fault rate $\lambda$.

FORMAL DEFINITION

* We want to implement an ideal channel $\mathcal U$, but we only have a set of noisy basis operations $\{\mathcal B_n\}$, because $\{\mathcal B_n\}$ is all we can perform on our hardware.
* For a sufficiently large basis, we can decompose the ideal channel as $\mathcal U = \sum_n\alpha_n \mathcal B_n$ with real coefficients $\alpha_n$.
* $\alpha_n$ can often be negative → this can be an unphysical map, but that doesn't matter,
* because we can express our ideal expectation value of any observable $O$ as $\text{Tr}[O\rho] = \langle\!\langle O \vert \rho \rangle\!\rangle$ and decompose it into:

$$\langle\!\langle O \vert \rho \rangle\!\rangle = \langle\!\langle O \vert \mathcal U \vert \rho _0 \rangle\!\rangle = \sum_n\alpha_n\langle\!\langle O \vert \mathcal B_n \vert \rho _0 \rangle\!\rangle$$

* So in the real world we can sample each term $\langle\!\langle O \vert \mathcal B_n \vert \rho _0 \rangle\!\rangle$ with probability $|\alpha_n|/\gamma$, where $\gamma = \sum_n|\alpha_n|$ is the one-norm of the coefficients. And don't forget to multiply each sampled term by $\text{sign}(\alpha_n)\cdot\gamma$ before adding everything up.
* Monte Carlo usually does the work. The sampling overhead in this case is $C_\text{em}\sim \gamma^2 = (\sum_n|\alpha_n|)^2$.

A SIMPLE EXAMPLE

Suppose our basis operations only suffer from the single-qubit depolarizing channel $\rho \mapsto (1-p)\rho + \frac p3 X\rho X+ \frac p3 Y\rho Y+ \frac p3 Z\rho Z$. Then our ideal Hadamard gate $\text H$ can be expanded into noisy operations:

$$(\text H)_\text{ideal} = \alpha_1 (\text H)_\text{noisy} +\alpha_2 (\text X\text H)_\text{noisy} +\alpha_3 (\text Y\text H)_\text{noisy} +\alpha_4 (\text Z\text H)_\text{noisy}$$

* If our noise level is $p=0.1$, then $(\alpha_1, \alpha_2, \alpha_3, \alpha_4) = (\frac{29}{26}, \frac{-1}{26}, \frac{-1}{26}, \frac{-1}{26}) \approx (1.115, -0.038, -0.038, -0.038)$.
* If our noise level is $p=0.2$, then $(\alpha_1, \alpha_2, \alpha_3, \alpha_4) = (\frac{14}{11}, \frac{-1}{11}, \frac{-1}{11}, \frac{-1}{11}) \approx (1.273, -0.091, -0.091, -0.091)$.

> We only want a pure noiseless $-[\text H]-$ in the end, with no other noiseless components; that is, $\alpha_1(1-p) + (\alpha_2+\alpha_3+\alpha_4)(p/3) = 1$ and $\alpha_1(p/3) + \alpha_2(1-p) + (\alpha_3+\alpha_4)(p/3) = 0$ (and similarly for $\alpha_3$ and $\alpha_4$).

> We can see from this simple case that the higher the noise level, the higher the one-norm $\gamma = |\alpha_1|+|\alpha_2|+|\alpha_3|+|\alpha_4|$, indicating a higher sampling overhead $\sim\gamma^2$.
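A toy Monte Carlo sketch of the sampling rule above, reusing the $p=0.1$ quasi-probabilities from the Hadamard example (the per-basis-op expectation values are hypothetical placeholders; a real experiment would sample shots from the corresponding circuits):

```python
# PEC sampling: draw basis operation B_n with probability |alpha_n|/gamma,
# then reweight each sample by sign(alpha_n) * gamma.
import numpy as np

rng = np.random.default_rng(1)
alphas = np.array([29, -1, -1, -1]) / 26          # quasi-probabilities, p = 0.1
gamma = np.abs(alphas).sum()                      # one-norm, overhead ~ gamma^2

# Hypothetical noisy expectation values <<O|B_n|rho0>> for each basis op.
noisy_expvals = np.array([0.70, -0.65, -0.72, 0.68])

probs = np.abs(alphas) / gamma
draws = rng.choice(4, size=200_000, p=probs)
samples = np.sign(alphas[draws]) * gamma * noisy_expvals[draws]
print(f"gamma = {gamma:.3f}, PEC estimate = {samples.mean():.4f}, "
      f"exact sum_n alpha_n <O>_n = {np.dot(alphas, noisy_expvals):.4f}")
# The estimator is unbiased, but the variance of `samples` is inflated by
# roughly gamma^2 relative to an unmitigated estimate.
```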
MEASUREMENT ERROR MITIGATION 📄

IDEAL MEASUREMENT

* Measure the circuit → obtain a binary string output $x\in \{0,1\}^N$.
* Ideally: we want to perform projective measurements in the computational basis $\{\langle\!\langle x\vert\}$.

REAL-WORLD MEASUREMENT:

* Some measurement noise $\mathcal A$ occurs and turns the projective measurement into the POVM $\{\langle\!\langle E_x\vert\} =\{\langle\!\langle x\vert \mathcal A\}$ → leads to different output statistics.

> physical origins of readout errors
>
> * superconducting: thermal excitations, $T_1$ decay
> * ion traps: hard to detect an ion's dark state, collisions in the trap

→ it's better (or sufficient) to construct a simplified noise model that is agnostic of the actual origin of the noise.

* $A_{xy}$ is the assignment matrix, which can be explained as follows: the coefficient $\langle\!\langle x\vert \mathcal A\vert y \rangle\!\rangle$ represents the probability of originally measuring outcome $y$ before noise, but measuring outcome $x$ after the noise.
* So $A$ is a transition matrix (or "stochastic matrix", in Bill's Physics of Computation Lecture).
* How do we actually get the coefficient $\langle\!\langle x\vert \mathcal A\vert y \rangle\!\rangle$? Answer: we just perform the noisy measurement on the computational state $\vert y \rangle\!\rangle$ and record the probability of outcome $x$.
* If the matrix $A$ is full-rank, then we can do the reverse and map noisy outcomes back to noiseless outcomes, $\langle\!\langle y\vert = \sum_x A_{yx}^{-1}\langle\!\langle E_x\vert$ → we restore the perfect outcome by linearly combining the noisy outcomes (somewhat like the spirit of Probabilistic Error Cancellation).

RESTORING IDEAL EXPECTATION VALUE

* For our incoming state $\rho$:
  * noiseless measurement gives the output distribution $\vec{p_0} = \{\langle\!\langle y\vert \rho \rangle\!\rangle\}$
  * noisy measurement gives the output distribution $\vec{p}_\text{noisy} = \{\langle\!\langle E_x\vert \rho \rangle\!\rangle\}$
  → $\vec{p}_\text{noisy} = A \vec{p_0}$, hence $\vec{p_0} = A^{-1} \vec{p}_\text{noisy}$.
* If we want to measure an observable $O$ with spectrum $\vec O = \{O_x\}$ (i.e., $\langle\!\langle O\vert = \sum_x O_x \langle\!\langle x\vert$), then once we know (or obtain) the assignment matrix $A$ and the noisy output distribution $\vec{p}_\text{noisy}$, we can restore the noiseless expectation value:

$$\langle\!\langle O\vert \rho \rangle\!\rangle = \vec{O}^T \vec{p_0} = \vec{O}^T A^{-1} \vec{p}_\text{noisy}$$

MAKE $A$ EASIER TO OBTAIN (ASSUMING $N$ QUBITS)

* Why? Because the normal scaling of $A$ as $N$ increases is exponential.
* Simplest way: make the additional assumption that measurement errors on different qubits are uncorrelated → $A$ is just the tensor product of each qubit's assignment matrix → $A = \bigotimes_{n=1}^N A_n$.
* But this is too simplified; previous experiments have actually observed correlated bit-flips.
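A minimal numerical sketch of the pipeline above, under the uncorrelated-qubit assumption $A = \bigotimes_n A_n$ (the per-qubit flip rates and the Bell-state example are made up):

```python
# Build A from per-qubit assignment matrices, then recover <O> = O^T A^{-1} p_noisy.
import numpy as np

# Per-qubit assignment matrix: A1[x, y] = P(read x | prepared y).
# 2% 0->1 flips and 5% 1->0 flips (hypothetical rates).
A1 = np.array([[0.98, 0.05],
               [0.02, 0.95]])
A = np.kron(A1, A1)                       # two qubits, uncorrelated readout noise

p_ideal = np.array([0.5, 0.0, 0.0, 0.5])  # ideal Bell-state Z-basis distribution
p_noisy = A @ p_ideal                     # what the device would report

O_spectrum = np.array([1, -1, -1, 1])     # eigenvalues of Z(x)Z on 00,01,10,11
raw = O_spectrum @ p_noisy
mitigated = O_spectrum @ np.linalg.solve(A, p_noisy)
print(f"raw <ZZ> = {raw:.4f}, mitigated <ZZ> = {mitigated:.4f} (ideal = 1.0)")
```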
WORKAROUND

Bravyi et al. construct the assignment matrix using continuous-time Markov processes, with the generators (transition-rate matrices) $\{G_i\}$ being single- and two-qubit operators:

$$A = e^G \;\text{ with }\; G = \sum_{i=1}^{2N^2} r_iG_i$$

* Why does the sum run to $2N^2$? Because for the generators we can just pick $\{\sigma_x, \sigma_y\}$: the number of single-qubit operators → $2N$; the number of two-qubit operators → $\frac{N(N-1)}{2} \cdot 2^2$ (pick $2$ of the $N$ qubits, and assign each of them either $\sigma_x$ or $\sigma_y$). Adding the two counts, we get $2N^2$.
* Once we've obtained all $2N^2$ positive coefficients $\{r_i\}$, $A^{-1}$ is simply $e^{-G}$.

INTRODUCING PAULI TWIRLING - SIMPLIFYING THE PROCESS 📄, 📄

After twirling, we can turn the original noise channel $\mathcal A$ into a Pauli channel $\mathcal D$; i.e., by twirling we remove all the off-diagonal elements of $A$ (in the Pauli basis).

* before twirling: ideally we want to measure in $\langle\!\langle \sigma_z^x\vert$ ($x$ is the bit string marking the qubits acted on by $\sigma_z$), but because of the noise, we can only measure in $\langle\!\langle \sigma_z^x\vert\mathcal A$.
* after twirling: $\mathcal A$ becomes $\mathcal D$, and what's special about it is that $\langle\!\langle \sigma_z^x\vert\mathcal D = D_x\langle\!\langle \sigma_z^x\vert$, where $D_x = \langle\!\langle \sigma_z^x\vert\mathcal D \vert\sigma_z^x\rangle\!\rangle$
  → the noisy measurement is just a rescaling of the noiseless measurement (due to the fact that the twirled channel $\mathcal D$ is diagonal in the Pauli basis).

For a given input state $\vert \rho \rangle\!\rangle$, we simply have:

$$\underbrace{\langle\!\langle \sigma_z^x\vert \rho \rangle\!\rangle}_{\text{ideal}} = D_x^{-1} \underbrace{\langle\!\langle \sigma_z^x\vert\mathcal D\vert \rho \rangle\!\rangle}_{\text{noisy, but also twirled}}$$

It's just a rescaling to restore the ideal expectation value! What's even better: for Pauli twirling in this case, we only need to conjugate the original noise channel $\mathcal A$ with $\{I, \sigma_x\}$ instead of $\{I, \sigma_x, \sigma_y, \sigma_z\}$, because conjugating $\mathcal A$ with $\sigma_y$ or $\sigma_z$ only has a trivial effect within the computational subspace.

REAL-WORLD IMPLEMENTATION (IBM QISKIT FOR EXAMPLE)

(next slide: expand the Resilience Level 1 bullet point)

As measurement noise is a major obstacle in almost all experimental set-ups, implementation of measurement error mitigation is nearly ubiquitous in near-term experiments. ($\mathcal A$ is sometimes called the confusion matrix.)

A SNEAK PEEK OF $\mathcal A$'S EXPLICIT FORM FOR A 2-QUBIT SYSTEM

(next slide shows the confusion matrix)
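One convenient property of the $A = e^G$ form above is that inversion comes for free: $A^{-1} = e^{-G}$. Below is a single-qubit toy (not Bravyi et al.'s actual calibration procedure; the rates are hypothetical) just to illustrate that point:

```python
# Toy illustration: once the rates r_i are fitted, inverting A = e^G is
# just negating the generator.
import numpy as np
from scipy.linalg import expm

# Single-qubit transition-rate generator: off-diagonal entries are flip rates,
# columns sum to zero so that A = e^G is a stochastic (assignment) matrix.
r01, r10 = 0.02, 0.05                     # hypothetical 0->1 and 1->0 rates
G = np.array([[-r01,  r10],
              [ r01, -r10]])
A = expm(G)
A_inv = expm(-G)                          # A^{-1} for free
print(A)
print(A @ A_inv)                          # ~ identity
```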
PURITY CONSTRAINTS

ORIGIN

* Many quantum algorithms target the preparation of an ideal state $\rho_0$ that is pure: $\rho_0 = \ket{\psi_0}\bra{\psi_0}$.
* Commonly seen noise channels are stochastic, and will turn our ideal pure state $\rho_0$ into some noisy mixed state $\rho$.
* So our goal is to find (or approximate) the pure state closest to $\rho$.
* We can spectrally decompose the $N$-qubit state, $\rho = \sum_{i=1}^{2^N} p_i \ket{\phi_i}\bra{\phi_i}$, with the eigenvalues in order ($p_1\geq p_2 \geq p_3 \geq \cdots$); our goal is then to seek $\ket{\phi_1}\bra{\phi_1}$.
* There is already a way to do this: quantum Principal Component Analysis → sample from the eigenbasis of $\rho$. But the circuit associated with this method is too deep for near-term devices → we need alternative strategies.

VIRTUAL DISTILLATION / ERROR SUPPRESSION BY DERANGEMENT (VD/ESD)

* By collectively measuring $M$ copies of the (noisy) $\rho$, we infer expectation values w.r.t. the $M^\text{th}$-degree purified state:

$$\rho_\text{pur}^{(M)} = \frac{\rho^M}{\text{Tr}[\rho^M]} = \frac{1}{\sum_{i=1}^{2^N} p_i^M } \sum_{i=1}^{2^N} p_i^M \ket{\phi_i}\bra{\phi_i}$$

* As $M\rightarrow\infty$, our $\rho_\text{pur}^{(M)} \rightarrow\ket{\phi_1}\bra{\phi_1}$. The remaining bias in the estimator is called the coherent mismatch, or noise floor.
* How do we measure to obtain $\text{Tr}[O\rho_\text{pur}^{(M)}]$? Notice from the formula above that

$$\text{Tr}[O\rho_\text{pur}^{(M)}] = \frac{\text{Tr}[O\rho^M]}{\text{Tr}[\rho^M]}$$

→ so we can separately measure $\text{Tr}[O\rho^M]$ and $\text{Tr}[\rho^M]$, and divide one by the other.

HOW TO MEASURE $\text{Tr}[O\rho^M]$? (and $\text{Tr}[\rho^M]$, of course)

Let $S_M$ be the cyclic permutation operator between the $M$ copies of $\rho$. Then $\text{Tr}[O\rho^M]$ is just $\text{Tr}[S_MO_m\rho^{\otimes M}]$, or $\text{Tr}[S_M\bar O\rho^{\otimes M}]$, where $\bar O = \frac1M\sum_mO_m$ is the observable symmetrized under copy permutation (the slide shows the circuit for the $M=3$ case). For $\text{Tr}[\rho^M]$, we simply remove $O$ from the same circuit.

Do we really need a global measurement (because of the global operator $S_M$)?

→ it can be decomposed into transversal operations among the different copies: $S_M = \bigotimes_{n=1}^N \tilde{S}_M^{(n)}$, where $\tilde{S}_M^{(n)}$ cyclically permutes the $n^\text{th}$ qubits of all copies.

> Moreover, if $O$ is only a single-qubit observable, then the symmetrized $\bar O$ commutes with all $\tilde{S}_M^{(n)}$. So we don't need a global measurement; instead we measure $\tilde{S}_M^{(n)}$ and $O_m$ for all $m,n$ (this is many measurements, but they are all low-weight operators), then post-process to get $\text{Tr}[S_M\bar O\rho^{\otimes M}]$ (the slide shows the $\tilde{S}_{M=2}^{(n)}$ case).
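A density-matrix-level sketch of the purified estimator (directly exponentiating a toy $\rho$ instead of running the $S_M$ measurement circuit; the state is made up):

```python
# Numerically evaluate Tr[O rho^M] / Tr[rho^M] for increasing M.
import numpy as np

# A toy noisy single-qubit state: mostly |0> with some mixture.
rho = np.array([[0.9, 0.05],
                [0.05, 0.1]])
O = np.array([[1, 0], [0, -1]])           # observable Z

for M in [1, 2, 4, 8]:
    rhoM = np.linalg.matrix_power(rho, M)
    est = np.trace(O @ rhoM) / np.trace(rhoM)
    print(f"M={M}: Tr[O rho^M]/Tr[rho^M] = {est:.6f}")
# As M grows the estimate converges to <phi1|Z|phi1> for the dominant
# eigenvector |phi1> of rho; any residual offset from the ideal value
# is the coherent mismatch (noise floor).
```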
SUBSPACE EXPANSIONS 📄, 📄, 📄

Quantum subspace expansion techniques are one class of methods that can take advantage of knowledge of the problem.

PROBLEM FORMULATION

* E.g., optimization or state preparation can be phrased as the desire to find some $\ket{\psi}$ that minimizes the objective function $\bra{\psi} H \ket{\psi}$, for some already-known $H$.
* But we cannot directly prepare the perfect $\ket{\psi}$ (due to coherent errors in the circuit, or simply not knowing what the ideal circuit should be).
* The good news is that we can easily prepare a set of $M$ different states $\{\ket{\phi_i}\}$ that are linearly independent (but not necessarily orthogonal to each other), and express our desired state as a linear combination of them:

$$\ket{\psi_{\vec w}} = \sum_{i=0}^{M-1} w_i\ket{\phi_i}$$

* Dual problem: it's equivalent to finding the optimal set of coefficients $\vec w^*$ such that:

$$\vec w^*= \arg\min_{\vec w} \bra{\psi_{\vec w}} H \ket{\psi_{\vec w}} \;\; \text{ s.t. }\;\; \braket{\psi_{\vec w}|\psi_{\vec w}} =1$$

* This is a generalized linear eigenvalue problem for the pair $(\bar H, \bar S)$: $\bar H W = \bar S W E$, where $\bar H_{ij} = \bra{\phi_i} H \ket{\phi_j}$ and $\bar S_{ij} = \braket{\phi_i | \phi_j}$. To say it in plain English: $\bar H$ is just $H$, but in the $\{\ket{\phi_i}\}$ basis (hence with dimension $M\times M$); and $\bar S$ is just the overlap matrix of the basis set $\{\ket{\phi_i}\}$.

> generalized linear eigenvalue problem
>
> Eigenvalue problem: $\bm A \vec\phi_i = \lambda_i \vec\phi_i$, or in matrix form $\bm A \bm\Phi = \bm\Phi \bm\Lambda$ (the matrix $\bm\Lambda$ is diagonal).
> Generalized linear eigenvalue problem: $\bm A \vec\phi_i = \lambda_i \bm B\vec\phi_i$, or in matrix form $\bm A \bm\Phi = \bm B \bm\Phi \bm\Lambda$.
> → this problem reduces to the previous, simpler one when $\bm B$ is the identity matrix.

ILLUSTRATION OF SUBSPACE EXPANSION ROUTINE

(next slide: calculating the code Hamiltonian matrix $\bar H_{ij} = \bra{\phi_i} H \ket{\phi_j}$ and the overlap matrix $\bar S_{ij} = \braket{\phi_i | \phi_j}$)

FINDING $\vec w^*$

* We know $H$, and we also know $\{\ket{\phi_i}\}$, so we can formulate the above problem explicitly.
* This generalized linear eigenvalue problem can be solved classically (on a classical computer), yielding the matrices $W$ and $E$ (the "eigenvector matrix" and the "eigenvalue matrix").
* What we want is the lowest eigenvalue in $E$ — or rather, its associated eigenvector in $W$, which is the optimal combination of coefficients $\vec w^*$.

UTILIZING $\vec w^*$ TO IMPROVE $O$'S EXPECTATION VALUE

The improved (or mitigated) expectation value of some observable $O$ with respect to our newly found state $\ket{\psi_{\vec w^*}}$ is just:

$$\bra{\psi_{\vec w^*}} O \ket{\psi_{\vec w^*}} = \sum_{i,j=0}^{M-1} w_i^* w_j^* \bra{\phi_i} O \ket{\phi_j}$$

→ we can evaluate this value directly from $\vec w^*$, combined with measuring $\bra{\phi_i} O \ket{\phi_j}$ (which we can do easily)
→ we never need to explicitly prepare $\ket{\psi_{\vec w^*}}$.

> * The more "complete" the basis set $\{\ket{\phi_i}\}$, the closer to ideal our $\ket{\psi_{\vec w^*}}$ can be.
> * But as our basis set grows, solving the generalized linear eigenvalue problem on a classical computer consumes more and more resources → this grows exponentially.
> * Hence the key is choosing the most suitable $\{\ket{\phi_i}\}$. Usually the first basis state in the set ($\ket{\phi_0}$) is chosen to be the best state we can prepare.
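The classical step can be sketched in a few lines with SciPy's generalized Hermitian eigensolver (the 2×2 $\bar H$ and $\bar S$ below are placeholder values standing in for measured matrix elements):

```python
# Solve H_bar W = S_bar W E classically and extract w*.
import numpy as np
from scipy.linalg import eigh

H_bar = np.array([[-1.0, -0.3],
                  [-0.3, -0.2]])          # <phi_i|H|phi_j>, assumed values
S_bar = np.array([[1.0, 0.1],
                  [0.1, 1.0]])            # overlap matrix <phi_i|phi_j>

energies, W = eigh(H_bar, S_bar)          # generalized Hermitian eigenproblem
w_star = W[:, 0]                          # eigenvector of the lowest eigenvalue
print(f"E_min = {energies[0]:.4f}, w* = {w_star}")
# <psi_w*|O|psi_w*> is then assembled from w* and the measured <phi_i|O|phi_j>.
```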
NEXT UP

What is the cost (sample complexity) of these techniques?

QUANTUM FISHER INFORMATION

* $p(x|\lambda)$ → the conditional probability of getting $x$ as the measurement result when the parameter is $\lambda$. The Fisher information is then:

$$F(\lambda) = \int p(x|\lambda) \left( \frac{\partial \ln p(x|\lambda)}{\partial \lambda} \right)^2 dx = \int \frac{1}{p(x|\lambda)} \left( \frac{\partial p(x|\lambda)}{\partial \lambda} \right)^2 dx$$

* Now introduce a POVM $\{\Pi_x\}$ obeying completeness, $\int\Pi_x dx=I$, and for later convenience define the symmetric logarithmic derivative (SLD) $L_\lambda$ implicitly as the Hermitian operator satisfying:

$$\frac{L_\lambda\rho_\lambda + \rho_\lambda L_\lambda}{2} = \frac{\partial\rho_\lambda}{\partial\lambda}$$

* The naming origin of the SLD — symmetric: $L_\lambda\rho_\lambda + \rho_\lambda L_\lambda$ → obvious; logarithmic derivative: dividing both sides by $\rho_\lambda$, we "kind of" get $L_\lambda := \frac{\partial\ln\rho_\lambda}{\partial\lambda}$.
* This is a Lyapunov matrix equation, for which a general solution is available.

* The Fisher information is ultimately upper-bounded, for a perfect POVM (in the sense of maximizing $F(\lambda)$), by:

$$\begin{align*} F(\lambda) &\leq \int\left|\frac{\text{tr}[\rho\Pi L]}{\sqrt{\text{tr}[\rho\Pi ]}}\right|^2 dx = \int \left| \text{tr}\left[ \frac{\sqrt\rho\sqrt\Pi}{\sqrt{\text{tr}[\rho\Pi]}} \sqrt\Pi L\sqrt\rho\right] \right|^2 dx \\ &\leq \int \text{tr}[\Pi L\rho L] dx = \text{tr}[L\rho L] = \text{tr}[\rho L^2] \equiv H(\lambda) \end{align*}$$

* In the chain above, the first inequality saturates when $\text{tr}[\rho\Pi L]$ is a real number for any $\lambda$. The second inequality comes from the Schwartz inequality $|\text{tr}[A^\dagger B]|^2 \leq \text{tr}[A^\dagger A]\,\text{tr}[B^\dagger B]$, with $A^\dagger\equiv \sqrt\rho\sqrt\Pi/\sqrt{\text{tr}[\rho\Pi]}$ and $B\equiv \sqrt\Pi L\sqrt\rho$, and it is saturated when $\frac{\sqrt\Pi\sqrt\rho}{\text{tr}[\rho\Pi]} = \frac{\sqrt\Pi L\sqrt\rho}{\text{tr}[\rho\Pi L]}$ for any $\lambda$ ← this is satisfied iff $\{\Pi_x\}$ is the set of projectors onto the eigenstates of $L_\lambda$, which, in turn, represents the optimal POVM for estimating $\lambda$.
* The above chain of inequalities proves that the Fisher information $F(\lambda)$ of any quantum measurement is bounded by the so-called quantum Fisher information $H(\lambda) \equiv \text{tr}[\rho L^2] = \text{tr}[\frac{\partial\rho}{\partial\lambda}L]$.
* You can view the quantum Fisher information as the supremum of the classical Fisher information over all observables.
LYAPUNOV MATRIX EQUATION

The general solution to $\frac{L_\lambda\rho_\lambda + \rho_\lambda L_\lambda}{2} = \frac{\partial\rho_\lambda}{\partial\lambda}$ for the SLD $L_\lambda$ may be written as:

$$L_\lambda = 2\int_0^\infty e^{-\rho t }\frac{\partial\rho}{\partial \lambda}e^{-\rho t } dt$$

Now if we write $\rho$ in its eigenbasis $\sum_n\rho_n\ket{\psi_n}\bra{\psi_n}$, the SLD easily becomes (you can verify this by direct substitution):

$$L_\lambda = 2\sum_{m,n} \frac{\bra{\psi_m}\frac{\partial\rho}{\partial\lambda}\ket{\psi_n}}{\rho_m+\rho_n}\ket{\psi_m}\bra{\psi_n}$$

where the sum only includes terms with $\rho_n+\rho_m\neq0$. From this we can express our quantum Fisher information $H(\lambda)$ as (to verify, just use the dummy-index trick):

$$H(\lambda) = \text{tr}[\rho L^2] = 4\sum_{m,n} \rho_m \frac{\left|\bra{\psi_m}\frac{\partial\rho}{\partial\lambda}\ket{\psi_n}\right|^2}{(\rho_m+\rho_n)^2} = 2\sum_{m,n} \frac{\left|\bra{\psi_m}\frac{\partial\rho}{\partial\lambda}\ket{\psi_n}\right|^2}{\rho_m+\rho_n}$$

USUAL FORM OF THE QUANTUM FISHER INFORMATION $H(\lambda)$

If the dynamics of $\rho$ is governed by some Hamiltonian-like operator $A$, that is, $\rho = (e^{-i\lambda A}) \rho_0 (e^{i\lambda A})$, the quantum Fisher information reduces to:

$$H(\lambda) = 2\sum_{m,n} \frac{(\rho_m-\rho_n)^2}{\rho_m+\rho_n} \left|\bra{\psi_m}A\ket{\psi_n}\right|^2$$

* Proof:

$$\begin{align*} \bra{\psi_m}\frac{\partial\rho}{\partial\lambda}\ket{\psi_n} &= \bra{\psi_m} (-iA)\rho + \rho (iA) \ket{\psi_n} \\ &= \bra{\psi_m} \left((-iA)\sum_k\rho_k\ket{\psi_k}\bra{\psi_k} + \sum_k\rho_k\ket{\psi_k}\bra{\psi_k} (iA)\right) \ket{\psi_n} \\ &= -i\rho_n \bra{\psi_m}A \ket{\psi_n} + i\rho_m \bra{\psi_m}A \ket{\psi_n} \\ \implies \left|\bra{\psi_m}\frac{\partial\rho}{\partial\lambda}\ket{\psi_n}\right|^2 &= (\rho_m-\rho_n)^2 \cdot \left| \bra{\psi_m}A \ket{\psi_n}\right|^2 \end{align*}$$
EXPANSION OF THE SLD

We can further expand $\partial\rho/\partial\lambda$ in the eigenbasis of $\rho$ as:

$$\frac{\partial\rho}{\partial\lambda} = \sum_n \frac{\partial\rho_n}{\partial\lambda} \ket{\psi_n}\bra{\psi_n} + \rho_n \ket{\tfrac{\partial\psi_n}{\partial\lambda}} \bra{\psi_n} + \rho_n\ket{\psi_n} \bra{\tfrac{\partial\psi_n}{\partial\lambda}}$$

Since the SLD is defined only on the support of $\rho$, and since both the eigenvalues $\rho_k$ and the eigenvectors $\ket{\psi_n}$ may depend on the parameter $\lambda$, we may moreover expand $\ket{\psi_n}$ in some arbitrary basis $\{\ket k\}$ that is independent of $\lambda$. If we do this expansion for $\braket{\psi_n | \psi_m} = \delta_{nm}$, we get (though this will not be used in the later context, tbh):

$$\braket{\psi_n | \psi_m} = \sum_{k,l} (\psi_{nk}^* \bra{k} ) (\psi_{ml} \ket l) = \sum_k \psi_{nk}^*\psi_{mk}=\delta_{nm}$$

On the other hand, from the product rule we have:

$$\frac{\partial\braket{\psi_m|\psi_n} }{\partial\lambda} = \braket{\tfrac{\partial\psi_m}{\partial\lambda} | \psi_n} + \braket{\psi_m|\tfrac{\partial\psi_n}{\partial\lambda}} = \frac{\partial \delta_{mn}}{\partial\lambda} = 0$$

Therefore, using the fact that $\braket{\frac{\partial\psi_m}{\partial\lambda} | \psi_n} =-\braket{\psi_m|\frac{\partial\psi_n}{\partial\lambda}}$, we can rewrite the matrix form of $L_\lambda$ from the SLD general solution as:

$$\begin{align*} L_\lambda &= 2\sum_{m,n} \frac{\bra{\psi_m}\frac{\partial\rho}{\partial\lambda}\ket{\psi_n}}{\rho_m+\rho_n}\ket{\psi_m}\bra{\psi_n} \\ &= \sum_n \frac{\partial\rho_n}{\partial\lambda}\frac{\ket{\psi_n}\bra{\psi_n}}{\rho_n} + 2\sum_{m\neq n} \left( \rho_n \braket{\psi_m|\tfrac{\partial\psi_n}{\partial\lambda}} + \rho_m\braket{\tfrac{\partial\psi_m}{\partial\lambda} | \psi_n} \right)\frac{\ket{\psi_m}\bra{\psi_n}}{\rho_m+\rho_n} \\ &= \sum_n \frac{\partial\rho_n}{\partial\lambda}\frac{\ket{\psi_n}\bra{\psi_n}}{\rho_n} + 2\sum_{m\neq n} \frac{\rho_n-\rho_m}{\rho_n+\rho_m} \braket{\psi_m|\tfrac{\partial\psi_n}{\partial\lambda}} \ket{\psi_m}\bra{\psi_n} \end{align*}$$

and hence the quantum Fisher information is:

$$H(\lambda) = \sum_n \frac{1}{\rho_n} \left(\frac{\partial\rho_n}{\partial\lambda}\right)^2 + 2\sum_{n\neq m} \sigma_{nm} \left|\braket{\psi_m|\tfrac{\partial\psi_n}{\partial\lambda}}\right|^2$$

where $\sigma_{nm} = \dfrac{(\rho_n-\rho_m)^2}{\rho_n+\rho_m} + \text{(any anti-symmetric term)}$.

* The first term in the above $H(\lambda)$ represents the classical Fisher information of the distribution $\{\rho_n\}$.
* The second term arises from the dependence of $\rho_\lambda$'s eigenvectors on $\lambda$.
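As a numerical cross-check of these formulas, the sketch below (my own toy example) evaluates $H(\lambda) = 2\sum_{m,n}|\bra{\psi_m}\partial_\lambda\rho\ket{\psi_n}|^2/(\rho_m+\rho_n)$ for a single-qubit unitary family $\rho_\lambda = e^{-i\lambda X}\rho_0 e^{i\lambda X}$; for this family the result should match the "usual form" $2\sum(\rho_m-\rho_n)^2/(\rho_m+\rho_n)\,|\bra{\psi_m}X\ket{\psi_n}|^2$, which for the chosen $\rho_0$ is 2.56.

```python
# Compute H(lambda) from the eigendecomposition formula above.
import numpy as np

A = np.array([[0.0, 1.0], [1.0, 0.0]])    # generator (Pauli X)
rho0 = np.diag([0.9, 0.1])                # a mixed initial state
lam = 0.3

U = np.cos(lam) * np.eye(2) - 1j * np.sin(lam) * A   # e^{-i lam X} since X^2 = I
rho = U @ rho0 @ U.conj().T
drho = -1j * (A @ rho - rho @ A)          # d rho / d lambda = -i [A, rho]

evals, vecs = np.linalg.eigh(rho)
H = 0.0
for m in range(2):
    for n in range(2):
        denom = evals[m] + evals[n]
        if denom > 1e-12:                 # sum restricted to rho_m + rho_n != 0
            elem = vecs[:, m].conj() @ drho @ vecs[:, n]
            H += 2 * abs(elem) ** 2 / denom
print(f"H(lambda) = {H:.4f}")             # prints 2.5600 for this rho0
```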
The corresponding optimal quantum estimator is then:

$$O_\lambda = \sum_n\left(\lambda+\frac{1}{H(\lambda)}\frac{1}{\rho_n}\frac{\partial\rho_n}{\partial\lambda}\right) \ket{\psi_n}\bra{\psi_n} + \frac{2}{H(\lambda)} \sum_{n\neq m} \frac{\rho_n-\rho_m}{\rho_n+\rho_m} \braket{\psi_m|\tfrac{\partial\psi_n}{\partial\lambda}} \ket{\psi_m}\bra{\psi_n}$$

(i.e., $O_\lambda = \lambda\, I + L_\lambda/H(\lambda)$).

MULTI-PARAMETRIC QFI AND RE-PARAMETRIZATION

When our quantum state $\rho$ depends on a set of parameters $\bm\lambda = \{\lambda_i\},\ i=1,2,\cdots,N$, the quantum Fisher information changes from a scalar into a matrix:

$$\begin{align*} \bm H(\bm\lambda)_{ij} &= \text{Tr}\left[ \rho \frac{L_iL_j + L_jL_i}{2} \right] = \text{Tr}\left[\frac{\partial\rho}{\partial\lambda_i}L_j\right] = \text{Tr}\left[\frac{\partial\rho}{\partial\lambda_j}L_i\right] \\ &= \sum_n \frac{(\partial_i\rho_n)(\partial_j\rho_n)}{\rho_n} + \sum_{n\neq m} \frac{(\rho_m-\rho_n)^2}{\rho_m+\rho_n} \left[ \braket{\psi_n|\partial_i\psi_m} \braket{\partial_j\psi_m|\psi_n} + \braket{\psi_n|\partial_j\psi_m} \braket{\partial_i\psi_m|\psi_n} \right] \end{align*}$$

The Cramér-Rao theorem for multi-parameter estimation says that the inverse of the Fisher matrix provides a lower bound on the covariance matrix $\text{Cov}[\bm\gamma]_{ij} = \braket{\lambda_i\lambda_j} - \braket{\lambda_i}\braket{\lambda_j}$, that is:

$$\text{Cov}[\bm\gamma] \geq \frac{1}{N} \bm H(\bm\lambda)^{-1}$$

* This is a matrix inequality, hence the corresponding bound may not be achievable in a multi-parameter estimation.
* On the other hand, the diagonal elements of the inverse Fisher matrix provide achievable bounds for the variances of single-parameter estimators at fixed values of the others: $\text{Var}(\lambda_i) = \gamma_{ii} \geq \frac{1}{N} (\bm H^{-1})_{ii}$.

* Now suppose our quantity of interest is a known function of the set of parameters → $f(\bm\lambda)$. In this case we need to re-parametrize our quantum states with a new set of parameters $\bm\lambda \rightarrow \tilde{\bm\lambda}$ that includes the quantity of interest (for example: let $\tilde{\lambda_1} \equiv f(\bm\lambda)$).
* Since $\tilde{\partial_\mu} = \sum_\nu B_{\mu\nu} \partial_\nu$, where $B_{\mu\nu} = \partial\lambda_\nu/\partial\tilde{\lambda_\mu}$, our new SLD and QFI matrix can be expressed as $\tilde{L_\mu} = \sum_\nu B_{\mu\nu} L_\nu$ and $\tilde{\bm H} = \bm B\bm H\bm B^T$, hence $\tilde{\bm H}^{-1} = (\bm B^T)^{-1}{\bm H}^{-1}{\bm B}^{-1}$.
* So if our new parameter set has only one parameter $f(\bm\lambda)$, then $\bm B = [\frac{\partial\lambda_1}{\partial f}, \frac{\partial\lambda_2}{\partial f},\cdots,\frac{\partial\lambda_r}{\partial f}]$ and therefore $\bm B^{-1} = \bm\nabla_{\bm{\lambda}}(f(\bm\lambda))$.

The resulting Cramér-Rao inequality has the form:

$$\text{Var}(f(\bm\lambda)^\text{est}) \geq \frac1N \cdot \bm\nabla_{\bm{\lambda}}(f(\bm\lambda))^T \cdot \bm{H}(\rho(\bm\lambda))^{-1} \cdot \bm\nabla_{\bm{\lambda}}(f(\bm\lambda))$$
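A tiny numerical sketch of this final inequality (the Fisher matrix and gradient below are assumed values, standing in for quantities computed from a parametrized state):

```python
# Evaluate the re-parametrized Cramer-Rao bound Var(f) >= grad^T H^{-1} grad / N.
import numpy as np

H = np.array([[4.0, 0.5],
              [0.5, 2.0]])                # QFI matrix H(lambda), assumed values
grad_f = np.array([1.0, -2.0])            # gradient of f at the true parameters
N = 1000                                  # number of independent copies

var_bound = grad_f @ np.linalg.solve(H, grad_f) / N
print(f"Var(f_est) >= {var_bound:.6f}")
```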
APPLYING THE CRAMÉR-RAO BOUND

Our task is now:

* Given: an unknown, noiseless quantum state $\rho$ which can be parameterized as $\hat \rho(\bm{\theta})$, with $\bm{\theta}$ being the unknown parameters.
* Goal: to estimate the quantity $f(\bm{\theta})$, a function of $\bm{\theta}$.
* How: by performing POVMs on $N$ independent copies of $\hat \rho(\bm{\theta})$.

QUICK RECAP:

$$\text{Variance} \geq \frac{1}{N\cdot\{\text{Quantum Fisher Information}\}}$$

$$\text{Var}(f(\bm\theta)^\text{est}) \geq \frac 1N \cdot \bm\nabla_{\bm\theta} (f(\bm\theta))^T \; J(\rho(\bm\theta))^{-1} \; \bm\nabla_{\bm\theta} (f(\bm\theta))$$

The quantum Fisher information matrix $J$ of $\hat\rho(\bm\theta)$ is defined exactly as in MULTI-PARAMETRIC QFI AND RE-PARAMETRIZATION:

$$[J]_{ij} = \frac12 \text{tr} \left[\hat\rho\Big\{ \hat L_i, \hat L_j \Big\}\right]$$

where $\hat L_i$ is the symmetric logarithmic derivative (SLD) operator, defined exactly as in QUANTUM FISHER INFORMATION (with slightly changed notation):

$$\frac12\Big\{ \hat\rho, \hat L_i \Big\} = \frac{\partial \hat\rho}{\partial \theta_i}$$

FUNCTION OF THE GENERAL BLOCH VECTOR $\bm\theta$

In order to apply the bound, we need to calculate the quantum Fisher information; in order to calculate the QFI, we first need to calculate $f(\bm{\theta})$ using the components defined in NOISELESS TARGET STATE ($n$-QUBIT) and VIRTUAL QUANTUM CIRCUIT below:

$$\begin{align*} f(\bm{\theta}) &= \text{Tr}[\hat\rho\hat X] = \text{Tr}\left[ \left(\dfrac{\hat I}{2^n} + \dfrac{\bm\theta \cdot \bm{\hat P}}{\sqrt{2^{n+1}}}\right) \left(\bm{x} \cdot \bm{\hat P}\right) \right] \\ &= 0 + \frac{2^n}{\sqrt{2^{n+1}}}\bm{\theta}\cdot\bm{x} = 2^{(n-1)/2}\bm{\theta}\cdot\bm{x} \end{align*}$$
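A one-qubit sanity check of this identity ($n = 1$, so the prefactor is $2^{(n-1)/2} = 1$; the Bloch vector and observable are arbitrary choices of mine):

```python
# Verify Tr[rho X] = 2^{(n-1)/2} theta . x for n = 1 with the general
# Bloch parametrization rho = I/2^n + theta . P / sqrt(2^{n+1}).
import numpy as np

paulis = [np.array([[0, 1], [1, 0]], dtype=complex),     # X
          np.array([[0, -1j], [1j, 0]]),                 # Y
          np.array([[1, 0], [0, -1]], dtype=complex)]    # Z

theta = np.array([0.3, 0.1, 0.5])        # Bloch vector (|theta| <= 1 for n = 1)
x = np.array([0.0, 0.0, 1.0])            # observable X_hat = Z

rho = np.eye(2) / 2 + sum(t * P for t, P in zip(theta, paulis)) / np.sqrt(4)
X_hat = sum(xi * P for xi, P in zip(x, paulis))
lhs = np.trace(rho @ X_hat).real
rhs = 2 ** ((1 - 1) / 2) * theta @ x     # = theta . x for n = 1
print(f"Tr[rho X] = {lhs:.4f}, 2^((n-1)/2) theta.x = {rhs:.4f}")
```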
* We can re-factor the Pauli group $\bm{\hat P}$ as $\hat\lambda_i \equiv 2^{(1-n)/2}\hat P_i$, which is related to the generators of the Lie algebra of $SU(2^n)$.
* Our density matrix (state) then becomes:

$$\hat\rho(\bm\theta) = \frac{1}{2^n}\hat I + \frac12\bm\theta\cdot\hat{\bm\lambda}$$

* The corresponding state after the noise channel is:

$$\mathcal{E}(\hat\rho(\bm\theta)) = \frac{1}{2^n}\hat I + \frac12(A\bm\theta+\bm c)\cdot\hat{\bm\lambda}$$

VARIANCE OF NOISELESS STATE

The bound for $\text{Var}(\braket{\hat X}^\text{est})$ satisfies:

$$\begin{align*} \text{Var}(\braket{\hat X}^\text{est}) &\geq \frac1N \cdot \left( 2^{(n-1)/2}\; \bm{x}^T\right) J(\rho(\bm\theta))^{-1} \left( 2^{(n-1)/2}\; \bm{x}\right) \\ &= \frac{2^{n-1}}{N} \bm{x}^T J(\rho(\bm\theta))^{-1} \bm{x} = \frac1N (\Delta\hat X)^2 \end{align*}$$

where $(\Delta\hat X)^2 \equiv$ the variance of the observable $\hat X$ in the noiseless state $\hat\rho$.

* The equality above is achieved when we perform the optimal POVM, which is the projective measurement in the diagonal basis of the operator $\hat X$.

VARIANCE OF NOISY STATE

Now suppose we add a noise channel of the form described in VIRTUAL QUANTUM CIRCUIT, $\mathcal{E}_l: \bm{\theta} \mapsto A_l \bm{\theta} + \bm{c}_l$, with its two components:

* $(A_l)_{ij} = \mathrm{Tr}[\hat P_i \mathcal{E}_l (\hat P_j)] / 2^n = \frac12\text{Tr}[\hat\lambda_i \mathcal{E}_l (\hat\lambda_j)]$ → a $(4^n-1)\times(4^n-1)$ real-valued matrix → represents the unital action of the noise channel.
* $(\bm{c}_l)_i = \mathrm{tr}[\hat P_i \mathcal{E}_l(\hat I)] / \sqrt{2^{3n-1}} = \frac{1}{2^n}\text{Tr}[\hat\lambda_i \mathcal{E}_l (\hat I)]$ (with $\hat I$ the $2^n\times 2^n$ identity) → a $(4^n-1)$-dimensional real-valued vector → represents the non-unital action of the noise channel.

Now we can substitute $J(\rho(\bm\theta))^{-1}$ with $J(\mathcal{E}(\rho(\bm\theta)))^{-1}$ and get:

$$\begin{align*} \text{Var}(\braket{\hat X}^\text{est}) &\geq \frac{2^{n-1}}{N} \bm{x}^T J(\mathcal{E}(\rho(\bm\theta)))^{-1}\bm{x} \\ &= \frac1N \Bigg\{ \text{Tr}\left[\mathcal{E}(\rho)\; \left((A^{-1})^T\bm{x} \cdot \bm{\hat P}\right)^2\right] - \text{Tr}\left[\mathcal{E}(\rho) \left((A^{-1})^T\bm{x} \cdot \bm{\hat P}\right)\right] ^2 \Bigg\} \\ &= \frac1N (\Delta\hat Y)^2 \end{align*}$$

where $(\Delta\hat Y)^2 \equiv$ the variance of the observable $\hat Y$ in the noisy state $\mathcal{E}(\hat\rho)$.

* What is $\hat Y$? It is an observable that encodes the optimal estimation strategy.
Since $\text{Tr}[\hat\rho\hat X] = \text{Tr}[\mathcal{E}(\hat\rho) \hat Y]$, we want it to satisfy $\mathcal{E}^\dagger(\hat Y) = \hat X$; and since we have already assumed the form of $\mathcal{E}$, we can express $\hat Y$ explicitly as:

$$\hat Y = \left( -2^{(n-1)/2}(A^{-1})^T\bm{x} \cdot \bm{c} \right) \hat I +(A^{-1})^T\bm{x} \cdot \bm{\hat P}$$

* $\hat Y$ can be interpreted as an observable that absorbs the effect of the noise.
* Therefore, the way to minimize the variance of the unbiased estimator $\braket{\hat X}^\text{est}$ for the noiseless state, through measurement of the noisy state, is to perform the projective measurement in the diagonal basis of the operator $\hat Y$.

NOISELESS TARGET STATE ($n$-QUBIT)

* $\hat\rho$ is obtained by applying $L$ layers of noiseless unitary gates $\{\mathcal{U}_l\}_{l=1}^L$ to the initial state $\hat\rho_0$.
* $\hat\rho$ can be parameterized by a general Bloch vector $\bm\theta$: $\hat \rho = \dfrac{\hat I}{2^n} + \dfrac{\bm\theta \cdot \bm{\hat P}}{\sqrt{2^{n+1}}}$ with $\bm\theta \in \mathbb R^{4^n-1}$.
* The coefficient $2^{(-1-n)/2}$ is related to the general $n$-level Bloch vector.
* $\hat I \equiv \hat\sigma_0^{\otimes n}$, and $\bm{\hat P} \equiv \{\hat P_i\}_{i=1}^{4^n-1}$ is the $n$-qubit Pauli group excluding the all-identity element: $\hat{P_i} \in \{\hat\sigma_0, \hat\sigma_x, \hat\sigma_y, \hat\sigma_z\}^{\otimes n} \backslash \{\hat\sigma_0\}^{\otimes n}$ → notice that any $\hat{P_i}^2$ is just $\hat I_{2^n\times2^n}$.
* We assume the preparation of the initial state $\hat\rho_0$ is noiseless.

VIRTUAL QUANTUM CIRCUIT

Consider one single sample (one single row):

* start from the initial state $\hat\rho_0$
* apply $L$ layers of unitaries, each followed by a noisy operation, $\{\mathcal{E}_l \circ \mathcal{U}_l\}_{l=1}^L$, and constrain the noise to be Markovian
* finally, use a noiseless POVM $\mathcal{M}_0$ to estimate the expectation value of a traceless observable $\hat X =\bm{x} \cdot \bm{\hat P}$; the observable is parametrized by $\bm{x} \in \mathbb{R}^{4^n-1}$.

Each noise channel $\mathcal{E}_l$ makes the generalized Bloch vector shrink, plus some non-unital part:

$$\mathcal{E}_l: \bm{\theta} \mapsto A_l \bm{\theta} + \bm{c}_l$$

* $(A_l)_{ij} = \mathrm{tr}[\hat P_i \mathcal{E}_l (\hat P_j)] / 2^n$ is the unital part of the Pauli transfer matrix.
* $(\bm{c}_l)_i = \mathrm{tr}[\hat P_i \mathcal{E}_l(\hat I)] / \sqrt{2^{3n-1}}$ is the non-unital part of the noise.

LOWER BOUND OF NOISE STRENGTH

1. A single $\mathcal{E}_l$ causes the generalized Bloch sphere to shrink with at least a strength of $\Gamma(\mathcal{E}_l)$:
   * $\Gamma(\mathcal{E}_l) \equiv \| A_l \|^{-1}$ with $\| A_l \| = \max_{\bm{e}\in \mathbb{R}^{4^n-1}} \frac{\|A_l\bm{e}\|}{\|\bm{e}\|}$, where $\|\bm{e}\| = \sqrt{\sum_i |{e_i}|^2}$.
2. Across all the $\mathcal{E}_l$, each of them applies at least a strength $\gamma = \min_l \Gamma(\mathcal{E}_l)$ of noise to the state.
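For intuition, the sketch below computes $\Gamma(\mathcal E) = \|A\|^{-1}$ for a single-qubit depolarizing channel, whose Bloch-vector action is $A = (1-p)I$ (so $\Gamma = 1/(1-p)$, matching the value used in the special cases at the end):

```python
# Noise strength Gamma(E) = ||A||^{-1} from the Pauli-transfer block A.
import numpy as np

p = 0.05
A = (1 - p) * np.eye(3)                   # Bloch action of depolarizing noise
op_norm = np.linalg.norm(A, ord=2)        # largest singular value = max ||Ae||/||e||
Gamma = 1 / op_norm
print(f"||A|| = {op_norm:.3f}, Gamma(E) = {Gamma:.4f}  (= 1/(1-p))")
```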
OBJECTIVE OF QEM

To remove the effect of the noise channels $\{\mathcal{E}_l\}_{l=1}^L$, so that we have an unbiased estimator of $\hat X =\bm{x} \cdot \bm{\hat P}$,

$$\braket{\hat X} \equiv \mathrm{tr}[\hat\rho\hat X] = \sqrt{2^{n-1}} \bm{\theta}\cdot\bm{x},$$

with standard deviation $\varepsilon$. Construct the virtual quantum circuit as follows:

* boosted noise $\mathcal{E}_{lm}$ in the $l^\text{th}$ layer and $m^\text{th}$ copy, such that $\Gamma(\mathcal{E}_{lm}) \geq \Gamma(\mathcal{E}_l) \geq \gamma$
* a classical register $\hat\rho_{c,m}$ coupled with the system qubits via the additional operations $\mathcal{C}_{lm}$
* a POVM measurement $\mathcal M$ performed on the entire set of copies to output the estimator of $\braket{\hat X}$.

CONSTRAINT ON LOCCS

* Each classical register $\hat\rho_{c,m}$ is initialized as a probabilistic mixture of computational basis states, $\hat\rho_{c,m} = \sum_i p_{mi}\ket i\bra i$.
* The registers are coupled to the system only via unitary operations of the form $\mathcal{C}_{lm} = \sum_i \mathcal{C}_{lmi}\otimes\ket i\bra i$, because we are not allowed to perform QEC.

CALCULATE THE SLD FISHER INFORMATION MATRIX

First, re-express the $m^\text{th}$ copy (or $m^\text{th}$ row) of the quantum state in the virtual circuit as $\mathcal{E}'_m(\rho(\bm\theta)\otimes\hat\rho_{c,m})$, with $\mathcal{E}'_m$ being an effective noise channel:

$$\mathcal{E}'_m = (\mathcal{E}_{Lm} \circ \mathcal{U}_L \circ \mathcal{C}_{(L-1)m}) \cdots (\mathcal{E}_{1m} \circ \mathcal{U}_1 \circ \mathcal{C}_{(0)m}) \circ (\mathcal{U}_1^{-1}\circ \cdots \circ\mathcal{U}_L^{-1})$$

Why define it this way? Because $\hat\rho_0\otimes\hat\rho_{c,m} = (\mathcal{U}_1^{-1}\circ \cdots \circ\mathcal{U}_L^{-1})\,\rho(\bm\theta)\otimes\hat\rho_{c,m}$.

This compilation allows the calculation of the quantum Fisher information matrix of the state right before the POVM → analyze the SLD Fisher information matrix $J$ of the quantum state $\bigotimes_{m=1}^N \mathcal{E}'_m(\rho(\bm\theta)\otimes\hat\rho_{c,m})$.

Note that we have $N$ copies in total (you can also think of this as the sample complexity). $J$ can be bounded as:

$$J \lesssim \sum_m \frac{I}{\Gamma(\mathcal{E}'_m)^2} \lesssim N\frac{1}{\gamma^{2L}}$$

* indicates the exponential decay of $J$ as the circuit depth (layer count) $L$ grows
* can be combined with the quantum Cramér-Rao bound inequality → so we can relate $J$ to the standard deviation $\varepsilon$ of an unbiased estimator
MAIN RESULTS: COMPLEXITY OF $N$

Prerequisite: the noise $\mathcal{E}_{lm}$ satisfies, for all $l$ (layers) and $m$ (copies):

* for all $\hat\rho \neq \hat \sigma$, we can still tell the states apart even after the same noise channel: $\mathcal{E}_{lm}(\hat\rho) \neq \mathcal{E}_{lm}(\hat\sigma)$ → this is a necessary condition: the quantum information is not completely destroyed by the noise.
* for all $\hat\rho$, the state after noise, $\mathcal{E}_{lm}(\hat\rho)$, is full rank with all eigenvalues greater than 0 → $\mathcal{E}_{lm}(\hat\rho)$ is a positive-definite matrix → meaning that after the noise, the variance of every observable is non-zero for any state; hence our sample cost must be greater than 0.

Theorem 1: Complexity of $N$

$$N \geq \frac{\|\bm{x}\|^2}{\varepsilon^2} \beta \gamma^{2L},$$

where $\beta$ is the largest $0<\beta<1$ such that $\mathcal{E}_{lm}(\hat\rho) - \frac{\beta}{2^n}\hat I \geq 0$ for all $\hat\rho$ and all $l,m$
→ one physical intuition: $\beta$ can be viewed as "how far away" the generalized Bloch sphere has been pushed from its original surface by the noise.

IMPLICATION

If $\gamma>1$, the cost $N$ of unbiased QEM grows exponentially with the circuit depth $L$, no matter how we choose $\mathcal{E}_{lm}$, $\mathcal{C}_{lm}$, or the POVM.

WHEN THE QUANTUM CIRCUIT SCRAMBLES THE STATE STRONGLY ENOUGH

then we must also take the $n$-dependence of the sample complexity into consideration (or else our lower bound will look like trash):

> we must pay overhead to eliminate every local noise, and thus encounter a dependence on $n$

Theorem 2: Complexity in the presence of strong entangling gates

Let $\mathcal{U}_1,\cdots,\mathcal{U}_L$ all be $n$-qubit gates (drawn from a set of random unitaries) that form a unitary 2-design (an ensemble whose first two moments match those of Haar-random unitaries), and let $\mathcal{E}_l$ be local noise. Then the number of copies $N$ required to perform unbiased estimation of $\braket{\hat X}$, averaged over $\{\mathcal{U}_1,\cdots,\mathcal{U}_L\}$, grows exponentially in both the qubit count $n$ and the depth $L$.

* To put it simply: if we encounter a random or chaotic quantum circuit, then the system qubit number $n$ also enters the exponential scaling of the sample complexity, if we want a better/tighter bound.

PURE UNITAL NOISE CASE

If $\mathcal{E}_{lm}(\frac{\hat I}{2^n}) = \frac{\hat I}{2^n}$, then the cost bound satisfies

$$N \geq \frac{\|\bm{x}\|^2}{\varepsilon^2} \Big(1-(1-\beta)^L\Big) \gamma^{2L} \approx \frac{\|\bm{x}\|^2}{\varepsilon^2} \gamma^{2L}$$
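To see the implication numerically, here is a quick evaluation of the Theorem 1 lower bound for depolarizing-like noise with $\gamma = 1/(1-p)$ and $\beta = p$ (made-up target precision; the point is only the exponential growth with $L$):

```python
# Evaluate N >= (||x||^2 / eps^2) * beta * gamma^{2L} for the depolarizing case.
import numpy as np

x_norm2, eps = 1.0, 0.01
p = 0.01                                  # per-layer error rate (assumed)
gamma, beta = 1 / (1 - p), p
for L in [10, 100, 1000]:
    n_min = x_norm2 / eps**2 * beta * gamma ** (2 * L)
    print(f"L={L:5d}: N >= {n_min:.3e}")
# With gamma > 1 the required N grows exponentially with the depth L.
```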
SPECIAL CASES THAT TIGHTEN THE BOUND

1. GLOBAL DEPOLARIZATION CHANNEL (THIS IS A UNITAL CHANNEL)

Assume that all the unitary gates are followed by the global depolarization channel

$$\mathcal{E}_{lm}: \bm\theta \mapsto (1-p_{lm})\bm\theta.$$

Since we assume boosted noise in our framework, the error rate is bounded from below as $p_{lm} \geq p$ (likewise for the other two types of noise below), so the minimum noise strength is $\gamma = 1/(1-p)$ and we have $\beta = p$; therefore $N \approx \frac{\|\bm{x}\|^2}{\varepsilon^2} (1-p)^{-2L}$.

* The effective error rate $p'$ for the $\mathcal{E}'_m$ mentioned in CALCULATE THE SLD FISHER INFORMATION MATRIX is then $p' = 1-(1-p)^L$ → can be verified directly.

2. LOCAL DEPOLARIZATION CHANNEL (THIS IS A UNITAL CHANNEL TOO)

$$\mathcal{E}_{lm} = (\mathcal{E}^{(0)}_{lm})^{\otimes n},\;\;\;\;\; \mathcal{E}^{(0)}_{lm}:\bm\theta\mapsto(1-p_{lm})\bm\theta$$

If moreover the unitary gates of the circuit satisfy the condition of a random circuit whose unitaries are drawn from a unitary 2-design (such as the $n$-qubit Clifford group), the bound is tightened to:

$$\mathbb{E}[N] \geq \mathcal O\left( \left(1+\frac{3}{2}\frac{4^n}{4^n-1}p\right)^{nL} \right)$$

3. LOCAL AMPLITUDE DAMPING CHANNEL (THIS IS A NON-UNITAL CHANNEL)

$$\mathcal{E}_{lm} = (\mathcal{E}^{(0)}_{lm})^{\otimes n},\;\;\;\;\; \mathcal{E}^{(0)}_{lm}:(\theta_x, \theta_y, \theta_z) \mapsto (\sqrt{1-p_{lm}}\,\theta_x, \sqrt{1-p_{lm}}\,\theta_y, (1-p_{lm})\theta_z + p_{lm})$$

Again, if we satisfy the same condition, the bound is tightened to:

$$\mathbb{E}[N] \geq \mathcal O\left( \left(1+\frac{4^n}{4^n-1}p\right)^{nL} \right)$$

THANK YOU FOR LISTENING

Any questions?