QUANTUM ADVANTAGE WITH QUANTUM ERROR MITIGATION: INTRO AND THEIR COSTS

Based on these two articles:
* Zhenyu Cai et al., "Quantum error mitigation," Reviews of Modern Physics 95.4 (2023)
* Kento Tsubouchi et al., "Universal cost bound of quantum error mitigation based on quantum estimation theory," Physical Review Letters 131.21 (2023)

Andy Chu (Chia-Tung), August 16, 2024, Yale Quantum Systems Lab

TABLE OF CONTENTS
1. Quantum Error Mitigation - Definition
2. Quantifying a "Good" Error-Mitigated Estimator
3. Zero-Noise Extrapolation 📄, 📄
4. Probabilistic Error Cancellation 📄
5. Measurement Error Mitigation 📄, 📄
6. Purity Constraints
7. Subspace Expansions 📄, 📄, 📄
8. Quantum Fisher Information
9. Multi-parametric QFI and re-parametrization
10. Applying Cramér-Rao Bound
11. Noise channel characterization
12. Main Results: Complexity of $N$
13. Connection between QEM and QEC & Open Problems

WHY QUANTUM ERROR MITIGATION? (1/2)
* Where are we now? Somewhere between NISQ ↔ full-scale fault tolerance.
* Do we need to wait for full-scale fault tolerance to achieve quantum advantage? (Exponential/polynomial/super-polynomial speedups over the best-known classical algorithms for: Hamiltonian simulation, algebraic/optimization/search problems, randomness sampling, etc.)
* On the fault-tolerant side we already have plenty of progress, but it will still take a while before we can implement thousands to millions of logical qubits.
* So why wait, given all this progress?

FAULT-TOLERANT PROGRESS
* logical qubit in a diamond quantum processor 📄
* scaling a surface code logical qubit 📄
* fault-tolerant entangling gates on the five-qubit code and the color code 📄
* QEC with silicon spin qubits 📄
...and more

HIGH-QUALITY PHYSICAL QUBIT CONTROL
* multiplexed two-dimensional cluster state 📄
* 256-atom programmable quantum simulator 📄
* quantum volume of 64 on a superconducting platform 📄
* spin qubits crossing the surface code threshold 📄
...and more

WHY QUANTUM ERROR MITIGATION? (2/2)
* There is a barrier to error correction: the threshold theorem
  (meaningful QEC requires a hardware failure rate $p < p_\text{threshold}$).
* Usefulness of QEM: it turns continuous progress in quantum hardware into immediate improvements in quantum information processing.
* To put it simply: QEC has a threshold (so that we can correct errors faster than they accumulate); QEM has no such "threshold".

WHERE CAN WE USE QEM?
* Wide use cases:
  * variational quantum circuits in physics simulations
  * approximate optimization algorithms
  * heuristic algorithms for quantum machine learning (and much more...)
* What do they have in common?
  1. applying a short-depth quantum circuit to a simple initial state
  2. estimating the expectation value of some relevant observable
* We want our output, the expectation value of the relevant observable, to be as accurate as possible. (For example, in physical/chemical simulation we want the ground-state energy to within chemical accuracy, 1 kcal/mol.)
* This leads to the most essential feature of QEM: the ability to minimize the noise-induced bias in expectation values on noisy hardware.
* But this can also be achieved by QEC, and by many other tools like decoherence-free subspaces 📄 and dynamical decoupling sequences 📄!

> So what sets QEM apart from all these other techniques? It's best to answer this question with the definition of QEM.

QUANTUM ERROR MITIGATION - DEFINITION
* Algorithmic schemes that reduce the noise-induced bias in the expectation value,
* by post-processing outputs from an ensemble of circuit runs,
* using circuits at the same noise level as the original unmitigated circuit or above.

DIFFERENCE BETWEEN QEM AND QEC
* Basic idea: QEM works by post-processing outputs; QEC by applying adaptive gates conditioned on mid-circuit measurements.
* Noise level: QEM leaves it unchanged or amplified in each circuit run; QEC reduces it in EVERY SINGLE circuit run.
* What does it take? QEM: many circuit runs. QEC: more qubits (overhead).

MANY CIRCUIT RUNS: WHY?
How do we get the expectation value of some variable of interest? By simply averaging the measurement results of repeated executions (unless we use a one-shot measurement like phase estimation, but that belongs more to the fault-tolerant era). Sampling overhead!! So an error-mitigated estimator is good if its sampling overhead is favorable, and also good if its bias from the ideal expectation value is small.

(APPENDIX) QUANTUM ERROR MITIGATION - DESIRED FEATURES
* Mitigation methods should ideally require only a very modest qubit overhead to remain practical on current and near-term quantum hardware.
* Error mitigation techniques should provide an accuracy guarantee: ideally a formal error bound on the mitigated expectation values that indicates how well the method works at varying levels of noise.
* A reliable error mitigation method should require few (or no) assumptions about the final state prepared by the computation.

QUANTIFYING A "GOOD" ERROR-MITIGATED ESTIMATOR (1/2)
* Goal: estimate the expectation value of some observable of interest $O$, i.e., estimate $\text{Tr}[\rho_0 O]$.
* How: construct an estimator $\hat O$ for estimating $O$. And how good is $\hat O$? Two ways of assessing $\hat O$:
1. Use the mean squared error (we choose this method for assessing $\hat O$):
$$\text{MSE}[\hat O] = \text{E}\left[\left( \hat O - \text{Tr}[\rho_0 O]\right)^2\right]$$
2. Use prediction intervals: calculate the interval within which the outcome of the estimator falls with a given probability, offering a rigorous bound on the worst-case deviation of the estimator.

Formally defined goal: reduce $\text{MSE}[\hat O]$ as much as possible, using finite resources.

QUANTIFYING A "GOOD" ERROR-MITIGATED ESTIMATOR (2/2)
What contributes to $\text{MSE}[\hat O]$? It is convenient to break it into two terms:
$$\text{MSE}[\hat O] = \left(\text{Bias}[\hat O]\right)^2 + \text{Var}[\hat O]$$
with each term defined as:
$$\begin{align*} \text{Bias}[\hat O] &= \text{E}[\hat O] - \text{Tr}[\rho_0 O] &&\rightarrow \text{related to accuracy!} \\ \text{Var}[\hat O] &= \text{E}\left[\hat O^2\right] - \left(\text{E}[\hat O]\right)^2 &&\rightarrow \text{related to precision!} \end{align*}$$
Define the sampling overhead:
$$C_\text{em} = \frac{N^\epsilon_\text{shot}(\hat O_\text{em})}{N^\epsilon_\text{shot}(\hat O_\rho)} = \frac{\text{Var}[\hat O_\text{em}]}{\text{Var}[\hat O_\rho]}$$
(How many more circuit runs do we need for $\overline{O}_\text{em}$ to reach the same level of variance as the original noisy $\overline{O}_\rho$?)

(APPENDIX) SIMPLEST ESTIMATOR $\overline{O}_\rho$: just don't use QEM

> Although we are given a noisy state $\rho$ instead of $\rho_0$, we still measure $O$ on that $\rho$ and repeat $N_\text{cir}$ times (effectively running the noisy circuit $N_\text{cir}$ times), yielding $N_\text{cir}$ independent $\hat O_\rho$. Finally, construct $\overline{O}_\rho$ by simply averaging these $\hat O_\rho$.
> → the more $N_\text{cir}$ we run, the higher the quality of $\overline{O}_\rho$:

$$\text{MSE}[\overline{O}_\rho]= (\underbrace{\text{Tr}[\rho O] - \text{Tr}[\rho_0 O]}_{\text{Bias}[\overline{O}_\rho]=\text{Bias}[\hat{O}_\rho]})^2 + \underbrace{\frac{\text{Tr}\left[\rho O^2\right] - \text{Tr}[\rho O]^2}{N_\text{cir}}}_{\text{Var}[\overline{O}_\rho] = \text{Var}[\hat{O}_\rho]/N_\text{cir}}$$
* The second term is shot noise, and it shrinks as we increase $N_\text{cir}$.
* So eventually $\text{MSE}[\overline{O}_\rho]$ is limited by the first term, the magnitude of the bias $\text{Bias}[\overline{O}_\rho]$, because this is a systematic error that cannot be reduced by increasing the number of circuit runs.

(APPENDIX) AFTER APPLYING QEM
We can construct an error-mitigated $\overline{O}_\text{em}$ using the same number of circuit runs $N_\text{cir}$ to achieve:
* smaller bias than the original naive estimator $\overline{O}_\rho$: $|\text{Bias}[\overline{O}_\text{em}]| \leq |\text{Bias}[\overline{O}_\rho]|$;
* but usually, inevitably, larger variance: $\text{Var}[\overline{O}_\text{em}] \geq \text{Var}[\overline{O}_\rho]$, because we extract the useful part out of the noise to reduce the bias, which also makes the error-mitigated estimator more sensitive to noise.
> Here we post-define $\hat O_\text{em}$ to be a "one-shot" error-mitigated estimator satisfying $\text{E}[\hat O_\text{em}] = \text{E}[\overline{O}_\text{em}]$ and $\text{Var}[\hat{O}_\text{em}] = N_\text{cir}\times\text{Var}[\overline{O}_\text{em}]$, so that:
$$C_\text{em} = \frac{N^\epsilon_\text{shot}(\hat O_\text{em})}{N^\epsilon_\text{shot}(\hat O_\rho)} = \frac{\text{Var}[\hat O_\text{em}]}{\text{Var}[\hat O_\rho]}$$

(APPENDIX) AN ALTERNATIVE WAY TO QUANTIFY $C_\text{em}$
If we know the range of a random variable $\hat X$ (the difference between the maximum and minimum possible values taken by $\hat X$), denoted $R[\hat X]$, we can plug this range into Hoeffding's inequality to get the required sample size: to guarantee an estimate of $\text{E}[\hat X]$ to $\epsilon$-precision with probability $(1-\delta)$, we need
$$N^{\epsilon,\delta}_\text{hff}(\hat X) = \frac{\ln(2/\delta)}{2\epsilon^2} R[\hat X]^2$$
This way, our previously defined sampling overhead can be approximated as:
$$C_\text{em} \approx \frac{N^{\epsilon,\delta}_\text{hff}(\hat O_\text{em})}{N^{\epsilon,\delta}_\text{hff}(\hat O_\rho)} = \frac{R[\hat O_\text{em}]^2}{R[\hat O_\rho]^2}$$

(APPENDIX) MODELING THE ERRORS
As in QEC, we model noise as discrete, probabilistic events called faults. Faults can occur at various locations in the circuit; locations include gates, idling steps, and measurements.

QUANTIFYING THE ERRORS
Assume all locations are afflicted with Pauli noise, and the Pauli noise at location $f$ occurs with probability $p_f$.
* If the error events at all locations are independent, then the probability that no fault occurs at all is $P_0 = \prod_f (1-p_f)$ → the fault-free probability of the circuit.

(APPENDIX) AN ALTERNATIVE WAY TO QUANTIFY ERRORS
Run the circuit once and count how many faults occur; then average this count over many circuit runs. This average is the circuit fault rate $\lambda = \sum_f p_f$.
* Simple case: all $p_f$ equal the same $p$; then $\lambda = Mp$ for $M$ locations in total. If $M$ is large, we can model "$k$ errors occur during one run" with a Poisson distribution: $P_k = e^{-\lambda}\lambda^k/k!$. The fault-free probability is then just the special case $k=0$: $P_0 = e^{-\lambda}$.

> All of these arguments still apply if we define $(1-p_f)$ as the coefficient of the error-free part of the Kraus representation of some general noise. More generally, we can always apply Pauli twirling to transform all noise into Pauli noise so that our arguments become valid.

(APPENDIX) BEHAVIOR OF THE SAMPLING OVERHEAD

> INEVITABILITY
>
> Even if we could always magically distinguish the perfect circuit runs (which occur with probability $P_0 = e^{-\lambda}$) from the remaining flawed runs, we would still need to run the circuit exponentially many times to collect enough perfect runs to reach $\epsilon$-level shot noise.

As of today, we don't have magic. So instead of impractically extracting the error-free state $\rho_0$, we focus on how to construct error-mitigated estimators.
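To make the inevitability argument above concrete, here is a minimal numeric sketch, assuming Poisson-distributed faults and a hypothetical oracle that post-selects only the fault-free runs (the function and its inputs are illustrative, not from the papers):

```python
# Minimal sketch of the "inevitability" argument: keep only fault-free runs,
# which occur with probability P0 = exp(-lam) under the Poisson fault model.
import numpy as np

def runs_needed_postselect(lam, eps, var_shot=1.0):
    """Total circuit runs so that the surviving fault-free runs push the
    shot noise (~ sqrt(var_shot / N_perfect)) below eps."""
    n_perfect = var_shot / eps**2      # perfect shots needed for eps-level noise
    p0 = np.exp(-lam)                  # fault-free probability of one run
    return n_perfect / p0              # grows as exp(lam): exponential cost

for lam in (1, 5, 10, 20):
    print(f"lambda = {lam:>2}: ~{runs_needed_postselect(lam, eps=0.01):.2e} runs")
```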
QEC VS. QEM
For a given noisy circuit with circuit fault rate $\lambda$:
* QEC: active correction → reduces $\lambda$ in each circuit run.
* QEM: relies on post-processing the outputs from an ensemble of circuit runs, each run with circuit fault rate $\geq\lambda$.

NOW LET'S SEE (AT LEAST 3) WIDELY-USED ERROR MITIGATION TECHNIQUES

ZERO-NOISE EXTRAPOLATION 📄, 📄
IDEA: If we can tune our circuit fault rate $\lambda$ → obtain a series of different states $\rho_\lambda$ at different $\lambda$ → view $\text{Tr}[\rho_\lambda O]$ as a function of $\lambda$ → the ideal expectation value is obtained by substituting $\lambda=0$ into that function.

CANONICAL DEFINITION:
1. Model the trend of the data using a parametrized function $f(\lambda; \vec\theta)$.
2. Once the fit is done, we get a set of optimal parameters $\vec\theta^*$.
3. The error-mitigated estimate of the zero-noise output is then $\text{E}[\hat O_\text{em}] \equiv f(0; \vec\theta^*)$.

ZNE: OPERATIONAL DEGREES OF FREEDOM (1/2)
How do we scale the noise (circuit fault rate $\lambda$)?
1. Under the assumption that the noise is time-invariant, the noise can be effectively amplified by stretching and recalibrating the gate pulses.
   * on a superconducting platform (Kandala et al., 2019)
   * for two-qubit cross-resonance gates (Chow et al., 2011)
2. Or, by inserting sequences of redundant gates that are equivalent to the identity (if operated noiselessly); examples: global folding, local folding, identity insertion.
   * cloud quantum computing (Dumitrescu et al., 2018)
   * digital zero-noise extrapolation (Giurgica-Tiron et al., 2020)
   * identity insertions (He et al., 2020)

ZNE: OPERATIONAL DEGREES OF FREEDOM (2/2)
How do we fit the {noise, expectation value} data? For sufficiently small $\lambda$ (which is usually the case) → Richardson extrapolation (a truncated Taylor expansion):
$$\text{Tr}[\rho_{\lambda} O] \approx f(\lambda; \vec\theta) = \sum_{k=0}^{M-1} \theta_k \frac{\lambda^k}{k!}$$
which is an $(M-1)^\text{th}$-order polynomial with $M$ coefficients (degrees of freedom).
→ Simplest case: $M=2$, which is just linear extrapolation.
→ If we first fix $M$ (reflecting how sophisticated we want the model $f$ to be), we need at least $M$ data points (in the spirit of Lagrange polynomials) to obtain the error-mitigated estimate:
$$\text{E}[\hat O_\text{em}] = \theta_0^* = \sum_{m=1}^M \left(\text{Tr}[\rho_{\lambda_m}O] \prod_{k\neq m} \frac{\lambda_k-0}{\lambda_k-\lambda_m} \right)$$

ZNE: PERFORMANCE ↔ SAMPLING OVERHEAD TRADE-OFFS

PERFORMANCE
* The bias of $\text{E}[\hat O_\text{em}]$ originates from our omission of the higher-degree terms in $f(\lambda; \vec\theta)$: the magnitude of $\text{Bias}[\hat O_\text{em}]$ is $\mathcal{O}(\lambda^M)$ for $M$ data points.
* Other ways to fit the dataset: linear fit, polynomial fit, exponential fit, poly-exponential fit, adaptive exponential fit, ...
* Theoretically, ZNE cannot completely remove the bias, but its cost is relatively cheap compared to other QEM methods.
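As a concrete illustration of the Lagrange form above, here is a small sketch of Richardson extrapolation to $\lambda = 0$; the noise levels and the exponentially decaying "measurements" are made-up stand-ins for real noisy expectation values:

```python
# Sketch of Richardson (Lagrange) extrapolation to lambda = 0, matching
# theta0* = sum_m Tr[rho_{lam_m} O] * prod_{k != m} lam_k / (lam_k - lam_m).
import numpy as np

def richardson_zero_noise(lams, noisy_vals):
    """Extrapolate noisy expectation values measured at lams to lambda = 0."""
    lams = np.asarray(lams, dtype=float)
    est = 0.0
    for m in range(len(lams)):
        weight = np.prod([lams[k] / (lams[k] - lams[m])
                          for k in range(len(lams)) if k != m])
        est += noisy_vals[m] * weight
    return est

# Toy data: expectation decaying as exp(-lam) from an ideal value of 1.0.
lams = [1.0, 2.0, 3.0]                    # boosted noise levels
vals = [np.exp(-l) for l in lams]         # stand-in noisy measurements
print(richardson_zero_noise(lams, vals))  # ~0.75: much closer to the ideal 1.0
                                          # than the raw exp(-1) ~ 0.37;
                                          # residual bias is O(lam^M)
```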
SAMPLING OVERHEAD
* Since $\text{E}[\hat O_\text{em}]$ is a linear combination of the set of noisy expectation values, we can estimate it by the Monte Carlo method; the overhead for (equal-gap) Richardson extrapolation is then:
$$C_\text{em} \sim \left( \sum_{m=1}^M \left| \prod_{k\neq m} \frac{\lambda_k-0}{\lambda_k-\lambda_m} \right| \right)^2\sim (2^M-1)^2$$
→ grows exponentially with the number of data points $M$.
* We can choose between different fitting functions to reach an optimal result and a more favorable sampling overhead.

ZNE: QUANTUM ADVANTAGE VS. LIMITATIONS

EXPERIMENTAL REALIZATIONS
* Kim et al. 2023 📄: 26-qubit simulation of the 2D transverse-field Ising model in the limit of strong entanglement → result is competitive with standard tensor-network methods.
* Kim et al. 2023 📄: simulated the time dynamics of the Ising model up to a circuit size of 127 qubits and 60 layers of two-qubit gates → result in agreement with state-of-the-art classical simulations.
* cloud platforms (2020 experiments): 📄, 📄, 📄, ... and more.

LIMITATIONS
* Need the ability to "scale the noise".
* The noise model might change under different noise strengths.
* Theoretically cannot remove the entire bias → but can get very close to 0.

PROBABILISTIC ERROR CANCELLATION 📄 (1/2)
IDEA:
→ We can write the noise-free expectation value as a linear combination of noisy expectation values.
→ This is a quasi-probability decomposition.
→ It can be sampled using a Monte Carlo procedure.

PROBABILISTIC ERROR CANCELLATION 📄 (2/2)
FORMAL DEFINITION
* We want to implement an ideal channel $\mathcal U$, but we can only perform a set of noisy basis operations $\{\mathcal B_n\}$ on our hardware.
* For a sufficiently large basis, we can decompose the ideal channel as $\mathcal U = \sum_n\alpha_n \mathcal B_n$ with real coefficients $\alpha_n$. The $\alpha_n$ can often be negative → the decomposition is then an unphysical map, but this doesn't matter,
* because we can express the ideal expectation value of any observable $O$ as $\text{Tr}[O\rho] = \langle\!\langle O \vert \rho \rangle\!\rangle$ and decompose it into:
$$\langle\!\langle O \vert \rho \rangle\!\rangle = \langle\!\langle O \vert \mathcal U \vert \rho _0 \rangle\!\rangle = \sum_n\alpha_n\langle\!\langle O \vert \mathcal B_n \vert \rho _0 \rangle\!\rangle$$
* Therefore, in practice, we can sample each term $\langle\!\langle O \vert \mathcal B_n \vert \rho _0 \rangle\!\rangle$ with probability $|\alpha_n|/\gamma$, where $\gamma = \sum_n|\alpha_n|$ is the one-norm of the coefficients, and multiply each sampled term by $\text{sign}(\alpha_n)\cdot\gamma$ before averaging. Monte Carlo sampling usually does this work. The sampling overhead in this case is $C_\text{em}\sim \gamma^2 = (\sum_n|\alpha_n|)^2$.
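A minimal sketch of this sampling loop, assuming a hypothetical `run_noisy_circuit(n)` that returns one measurement outcome of $O$ after the noisy basis operation $\mathcal B_n$ has been applied:

```python
import numpy as np

rng = np.random.default_rng(0)

def pec_estimate(alphas, run_noisy_circuit, shots):
    """Quasi-probability Monte Carlo: sample B_n with probability |a_n|/gamma,
    reweight each outcome by sign(a_n) * gamma, and average."""
    alphas = np.asarray(alphas, dtype=float)
    gamma = np.abs(alphas).sum()           # one-norm; overhead ~ gamma**2
    probs = np.abs(alphas) / gamma
    total = 0.0
    for _ in range(shots):
        n = rng.choice(len(alphas), p=probs)
        total += np.sign(alphas[n]) * gamma * run_noisy_circuit(n)
    return total / shots

# Toy usage: pretend each basis op n has a fixed noisy expectation value b[n],
# so "running the circuit" just returns b[n] plus shot noise.
b = np.array([0.8, 0.7, 0.6, 0.5])
fake_run = lambda n: b[n] + rng.normal(scale=0.1)
print(pec_estimate([1.2, -0.1, -0.05, -0.05], fake_run, shots=20000))
```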
(APPENDIX) PEC: FROM A SINGLE GATE TO MULTIPLE GATES
Any circuit we want to implement can be decomposed into a sequence of $M$ ideal gates $\{\mathcal U_m\}$, written as
$$\langle\!\langle O \vert \rho \rangle\!\rangle = \langle\!\langle O \vert \prod_{m=1}^M \mathcal U_m \vert \rho _0 \rangle\!\rangle$$
therefore the decomposition is:
$$\begin{align*}\langle\!\langle O \vert \rho \rangle\!\rangle &= \langle\!\langle O \vert \prod_{m=1}^M \left( \sum_{n_m} \alpha_{m, n_m} \mathcal B_{n_m} \right) \vert \rho _0 \rangle\!\rangle \\ &= \sum_{\vec n} \alpha_{\vec n}\langle\!\langle O \vert \mathcal B_{\vec n} \vert \rho _0 \rangle\!\rangle \end{align*}$$
where $\vec n = \{n_1, n_2, \cdots, n_M\}$ is the set of labels for a sequence of basis elements, and we have used the definitions $\mathcal B_{\vec n} = \prod_{m=1}^M \mathcal B_{n_m}$, $\alpha_{\vec n} = \prod_{m=1}^M \alpha_{m,n_m}$, and accordingly $\gamma = \sum_{\vec n}|\alpha_{\vec{n}}|$.

PEC: A SIMPLE EXAMPLE (1/2)
Suppose our basis operations only suffer from the single-qubit depolarizing channel $\rho \mapsto (1-p)\rho + \frac p3 X\rho X+ \frac p3 Y\rho Y+ \frac p3 Z\rho Z$. Then our ideal Hadamard gate $\text H$ can be expanded into noisy operations:
$$(\text H)_\text{ideal} = \alpha_1 (\text H)_\text{noisy} +\alpha_2 (\text X\text H)_\text{noisy} +\alpha_3 (\text Y\text H)_\text{noisy} +\alpha_4 (\text Z\text H)_\text{noisy}$$

> We only want the pure noiseless $\text H$ (with coefficient 1) in the end. This gives us equations of constraint to satisfy: $\alpha_1(1-p) + (\alpha_2+\alpha_3+\alpha_4)(p/3) = 1$, and $\alpha_1(p/3) + \alpha_2(1-p) + (\alpha_3+\alpha_4)(p/3) = 0$ (and likewise with $\alpha_3$ or $\alpha_4$ in place of $\alpha_2$, so by symmetry $\alpha_2=\alpha_3=\alpha_4$).
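Solving these constraints with the symmetry $\alpha_2=\alpha_3=\alpha_4$ gives the closed form $\alpha_1 = \frac{3-p}{3-4p}$, $\alpha_2 = \frac{-p}{3-4p}$ (my own reduction of the constraints above; it reproduces the coefficients quoted on the next slide). A quick sketch:

```python
# Solve the PEC constraints for the depolarizing-noise Hadamard in closed form:
# alpha1 = (3 - p)/(3 - 4p), alpha2 = alpha3 = alpha4 = -p/(3 - 4p).
from fractions import Fraction

def pec_hadamard_coeffs(p):
    p = Fraction(p).limit_denominator()
    a1 = (3 - p) / (3 - 4 * p)
    a2 = -p / (3 - 4 * p)
    return a1, a2

for p in (0.1, 0.2):
    a1, a2 = pec_hadamard_coeffs(p)
    gamma = abs(a1) + 3 * abs(a2)   # one-norm -> sampling overhead ~ gamma**2
    print(p, a1, a2, float(gamma**2))
# p=0.1 -> (29/26, -1/26), overhead ~1.51; p=0.2 -> (14/11, -1/11), ~2.39
```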
PEC: A SIMPLE EXAMPLE (2/2)
(Same setup and constraints as above.)
* If our noise level is $p=0.1$, then $(\alpha_1, \alpha_2, \alpha_3, \alpha_4) = (\frac{29}{26}, \frac{-1}{26}, \frac{-1}{26}, \frac{-1}{26}) \approx (1.115, -0.038, -0.038, -0.038)$.
* If our noise level is $p=0.2$, then $(\alpha_1, \alpha_2, \alpha_3, \alpha_4) = (\frac{14}{11}, \frac{-1}{11}, \frac{-1}{11}, \frac{-1}{11}) \approx (1.273, -0.091, -0.091, -0.091)$.

> We can see from this simple case that the higher the noise level, the higher the one-norm $\gamma = |\alpha_1|+|\alpha_2|+|\alpha_3|+|\alpha_4|$, indicating a higher sampling overhead $\sim\gamma^2$.

* How do we sample with a negative probability? E.g., for $\alpha_2=-0.038$, we just sample $(\text X\text H)_\text{noisy}$ with probability $\frac{0.038}{1.115+0.038+0.038+0.038}$, and finally add the negative of the circuit output value to the total expectation value.

PEC: PERFORMANCE ↔ SAMPLING OVERHEAD TRADE-OFFS

PERFORMANCE
* Theoretically, PEC can completely remove the bias, but its cost is also very expensive.
* It can also tackle correlated noise 📄 → by modeling the noise as a sparse Pauli-Lindbladian $\mathcal L(\rho) = \sum_k\lambda_k(P_k\rho P_k - \rho)$ for a set of Pauli matrices $P_k$, which can be efficiently learned for a polynomial number of Pauli terms.
* It can also use variational circuits to construct the basis set: 📄

SAMPLING OVERHEAD
* If we assume all gates suffer from Pauli noise with the same error rate $p$, the circuit size $M$ is large, and the circuit fault rate $\lambda = Mp$ is finite, then
$$C_\text{em} \sim \prod_{m=1}^M \left( \frac{1+p}{1-p} \right)^2\sim e^{4\lambda}$$
→ grows exponentially with the circuit fault rate $\lambda$, i.e., with the number of gates $M$ at fixed $p$.
* The sampling overhead of probabilistic error cancellation depends strongly on the basis we choose.

PEC: QUANTUM ADVANTAGE VS. LIMITATIONS

EXPERIMENTAL REALIZATIONS
* Song et al. 2019 📄 → on a superconducting platform.
* Zhang et al. 2020 📄 → on a trapped-ion platform.
* PEC can also be applied to continuous noise processes and perform partial mitigation (by expanding the noise process into a perturbation series): Sun et al. 2021 📄, Hama and Nishi 2022 📄.
* Hakoshima et al. 2021 📄: use non-Markovianity in the noise to reduce the sampling cost.

LIMITATIONS
* Need a set of basis elements that spans the ideal operations, and full characterization of this basis.
* Need to learn the hardware-specific noise model.
* The sampling overhead blows up if the hardware error rate increases even mildly.
* PEC is most suitable for Pauli noise (which can be enforced via twirling).

MEASUREMENT ERROR MITIGATION 📄, 📄 (1/4)
IDEA:
* Ideal measurement:
  * measure the circuit → obtain a binary string output $x\in \{0,1\}^N$;
  * ideally, we want to perform projective measurements in the computational basis $\{\langle\!\langle x\vert\}$.
* Real-world measurement:
  * some measurement noise $\mathcal A$ occurs,
  * turning the projective measurement into the POVM $\{\langle\!\langle E_x\vert\}=\{\langle\!\langle x\vert \mathcal A\vert\}$ → leads to different output statistics.

> Physical origins of readout errors:
> * superconducting: thermal excitations, $T_1$ decay
> * ion traps: hard to detect an ion's dark state; collisions in the trap
>
> → It is better (and sufficient) to construct a simplified noise model that is agnostic of the actual origin of the noise.

* The goal is to invert $\mathcal A$ and thus restore the ideal measurement statistics.

MEASUREMENT ERROR MITIGATION 📄, 📄 (2/4)
FORMAL DEFINITION, STEP 1: NOISE MODEL
* Assume the noise channel $\mathcal A$ has the computational subspace $\{ \langle\!\langle x\vert , x\in \{0,1\}^N \}$ as its invariant subspace → the resultant POVM basis $\{\langle\!\langle E_x\vert\}$ lives within the computational basis.
* So we can decompose the POVM this way:
$$\langle\!\langle E_x\vert = \sum_y \langle\!\langle E_x\vert y \rangle\!\rangle \langle\!\langle y\vert = \sum_y \langle\!\langle x\vert \mathcal A\vert y \rangle\!\rangle \langle\!\langle y\vert = \sum_y A_{xy}\langle\!\langle y\vert$$
* The $A_{xy}$ above is the assignment matrix (sometimes called the confusion matrix).

> The coefficient $\langle\!\langle x\vert \mathcal A\vert y \rangle\!\rangle$ represents the probability of originally measuring outcome $y$ before the noise, but measuring outcome $x$ after the noise.

* So $A$ is a transition (or "stochastic") matrix.

> How do we actually get the coefficient $\langle\!\langle x\vert \mathcal A\vert y \rangle\!\rangle$? We just perform the noisy measurement on the computational state $\vert y \rangle\!\rangle$ and record the probability of outcome $x$.

MEASUREMENT ERROR MITIGATION 📄, 📄 (3/4)
FORMAL DEFINITION, STEP 2: INVERTING THE ASSIGNMENT MATRIX
* Normal condition for inversion:

> If the matrix $A$ is full-rank, then we can do the reverse and map noisy outcomes back to noiseless outcomes, $\langle\!\langle y\vert = \sum_x A_{yx}^{-1}\langle\!\langle E_x\vert$ → we restore the perfect outcome by linearly combining the noisy outcomes.
MEASUREMENT ERROR MITIGATION 📄, 📄 (4/4)
FORMAL DEFINITION, STEP 3: MATRIX INVERSION AND RESTORING THE IDEAL EXPECTATION VALUE
* For our incoming state $\rho$:
  * the noiseless measurement gives the output distribution $\vec{p_0} = \{\langle\!\langle y\vert \rho \rangle\!\rangle\}$,
  * the noisy measurement gives the output distribution $\vec{p_\text{noisy}} = \{\langle\!\langle E_x\vert \rho \rangle\!\rangle\}$,
  * → $\vec{p_\text{noisy}} = A \vec{p_0}$, hence $\vec{p_0} = A^{-1} \vec{p_\text{noisy}}$.
* If we want to measure an observable $O$ with spectrum $\vec O = \{O_x\}$ (i.e., $\langle\!\langle O\vert = \sum_x O_x \langle\!\langle x\vert$), then once we know (or obtain) the assignment matrix $A$ and the noisy output distribution $\vec{p_\text{noisy}}$, we can restore the noiseless expectation value:
$$\langle\!\langle O\vert \rho \rangle\!\rangle = \vec{O}^T \vec{p_0} = \vec{O}^T A^{-1} \vec{p_\text{noisy}}$$
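A small end-to-end sketch of steps 1-3 for a single qubit; the calibration-derived assignment matrix and the noisy distribution below are made-up numbers:

```python
# Steps 1-3: build the assignment matrix A from calibration runs on each
# basis state |y>, then un-mix the noisy output distribution.
import numpy as np

# A[x, y] = Pr(measure x | prepared y), estimated from calibration shots.
A = np.array([[0.97, 0.06],      # column y=0: |0> read out as 0 or 1
              [0.03, 0.94]])     # column y=1: |1> read out as 0 or 1

p_noisy = np.array([0.52, 0.48])          # measured output distribution
p_ideal = np.linalg.solve(A, p_noisy)     # p0 = A^{-1} p_noisy

O_spec = np.array([+1.0, -1.0])           # spectrum of O = Z, computational basis
print(O_spec @ p_ideal)                   # mitigated <O> = O^T A^{-1} p_noisy
```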
MEM: WORKAROUND FOR A LARGE-SIZED $A_{xy}$ (2/3)
A MORE REALISTIC APPROACH: Bravyi et al. construct the assignment matrix from continuous-time Markov processes, with the generators (transition-rate matrices) $\{G_i\}$ being single- and two-qubit operators:
$$A = e^G \;\text{ with }\; G = \sum_{i=1}^{2N^2} r_iG_i$$

> Why do we sum up to $2N^2$? Because for the generators we can just pick $\{\sigma_x, \sigma_y\}$. Therefore the number of single-qubit operators is $2N$, and the number of two-qubit operators is $\frac{N(N-1)}{2} \cdot 2^2$ (pick 2 qubits from $N$, then assign each of them either $\sigma_x$ or $\sigma_y$). Adding the two numbers up, we get $2N^2$.

* Once we have obtained all $2N^2$ positive coefficients $\{r_i\}$, we can simply calculate $A^{-1} = e^{-G}$.

MEM: WORKAROUND FOR A LARGE-SIZED $A_{xy}$ (3/3)
Introducing Pauli twirling, which simplifies the process 📄, 📄. After twirling, we can turn the original noise channel $\mathcal A$ into a Pauli channel $\mathcal D$; i.e., twirling removes all the off-diagonal elements of $A$ (in the Pauli basis).
* Before twirling: ideally we want to measure in $\langle\!\langle \sigma_z^x\vert$ ($x$ is the bit string marking the qubits acted on by $\sigma_z$), but because of the noise we can only measure in $\langle\!\langle \sigma_z^x\vert\mathcal A$.
* After twirling: $\mathcal A$ becomes $\mathcal D$, and what is special about $\mathcal D$ is that $\langle\!\langle \sigma_z^x\vert\mathcal D = D_x\langle\!\langle \sigma_z^x\vert$ where $D_x = \langle\!\langle \sigma_z^x\vert\mathcal D \vert\sigma_z^x\rangle\!\rangle$ → the noisy measurement is just a rescaling of the noiseless measurement.

For a given input state $\vert \rho \rangle\!\rangle$, we simply have:
$$\underbrace{\langle\!\langle \sigma_z^x\vert \rho \rangle\!\rangle}_{\text{ideal}} = D_x^{-1} \underbrace{\langle\!\langle \sigma_z^x\vert\mathcal D\vert \rho \rangle\!\rangle}_{\text{noisy, but twirled}}$$

> What's more, for Pauli twirling in this case we only need to conjugate the original noise channel $\mathcal A$ with $\{I, \sigma_x\}$ instead of $\{I, \sigma_x, \sigma_y, \sigma_z\}$, because conjugating $\mathcal A$ with $\sigma_y$ or $\sigma_z$ only has a trivial effect within the computational subspace.

PURITY CONSTRAINTS

ORIGIN
* Many quantum algorithms target the preparation of an ideal state $\rho_0$ that is pure: $\rho_0 = \ket{\psi_0}\bra{\psi_0}$.
* Commonly seen noise channels are stochastic, and they turn our ideal pure state $\rho_0$ into some noisy mixed state $\rho$.
* So our goal is to find (or approximate) the closest pure state to $\rho$.
* We can spectrally decompose the $N$-qubit state $\rho = \sum_{i=1}^{2^N} p_i \ket{\phi_i}\bra{\phi_i}$ with the eigenvalues in order ($p_1\geq p_2 \geq p_3 \geq \cdots$); our goal is then to seek $\ket{\phi_1}\bra{\phi_1}$.
* There is already a way to do this: quantum principal component analysis → sample from the eigenbasis of $\rho$. But the circuit associated with this method is too deep for near-term devices → we need alternative strategies.

VIRTUAL DISTILLATION / ERROR SUPPRESSION BY DERANGEMENT (VD / ESD)
* Collectively measure $M$ copies of the (noisy) $\rho$ to infer expectation values with respect to the $M^\text{th}$-degree purified state:
$$\rho_\text{pur}^{(M)} = \frac{\rho^M}{\text{Tr}[\rho^M]} = \frac{1}{\sum_{i=1}^{2^N} p_i^M } \sum_{i=1}^{2^N} p_i^M \ket{\phi_i}\bra{\phi_i}$$
* As $M\rightarrow\infty$, $\rho_\text{pur}^{(M)} \rightarrow\ket{\phi_1}\bra{\phi_1}$. The remaining bias in the estimator is called the coherent mismatch or noise floor.
* How do we measure to obtain $\text{Tr}[O\rho_\text{pur}^{(M)}]$? Notice from the formula above that
$$\text{Tr}[O\rho_\text{pur}^{(M)}] = \frac{\text{Tr}[O\rho^M]}{\text{Tr}[\rho^M]}$$
→ so we can separately measure $\text{Tr}[O\rho^M]$ and $\text{Tr}[\rho^M]$ and divide one by the other.
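A numeric sketch of this purified estimator on a toy single-qubit state (the state and observable are made up; a real experiment would instead estimate the two traces from collective measurements, as described next):

```python
# Purified estimator Tr[O rho^M] / Tr[rho^M] on a toy mixed state.
import numpy as np

Z = np.diag([1.0, -1.0])
# Noisy state: dominant eigenvector |0> plus some depolarizing-like mixing.
rho = 0.9 * np.diag([1.0, 0.0]) + 0.1 * np.eye(2) / 2

for M in (1, 2, 4, 8):
    rhoM = np.linalg.matrix_power(rho, M)
    est = np.trace(Z @ rhoM) / np.trace(rhoM)
    print(M, est)   # approaches <phi_1|Z|phi_1> = 1 as M grows
```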
HOW TO MEASURE $\text{Tr}[O\rho^M]$?
(and $\text{Tr}[\rho^M]$, of course)
Let $S_M$ be the cyclic permutation operator between $M$ copies of $\rho$. Then $\text{Tr}[O\rho^M]$ is just $\text{Tr}[S_MO_m\rho^{\otimes M}]$, or $\text{Tr}[S_M\bar O\rho^{\otimes M}]$ where $\bar O = \frac1M\sum_mO_m$ (the observable symmetrized under copy permutation). [Circuit diagram for the $M=3$ case not captured.] For $\text{Tr}[\rho^M]$, we simply remove $O$ from the same circuit.

SUBSPACE EXPANSIONS 📄, 📄, 📄
Quantum subspace expansion techniques are one class of methods that can take advantage of knowledge of the problem.

PROBLEM FORMULATION
* Many problems, e.g., optimization or state preparation, can be phrased as the desire to find some $\ket{\psi}$ that minimizes the objective function $\bra{\psi} H \ket{\psi}$ for some known $H$.
* But we cannot directly prepare the perfect $\ket{\psi}$ (due to coherent errors in the circuit, or simply because we don't know what the ideal circuit should be).
* The good news: we can easily prepare a set of $M$ different states $\{\ket{\phi_i}\}$ that are linearly independent (but not necessarily orthogonal to each other), and express our desired state as a linear combination of them:
$$\ket{\psi_{\vec w}} = \sum_{i=0}^{M-1} w_i\ket{\phi_i}$$

* Dual problem: this is equivalent to finding the optimal set of coefficients $\vec w^*$ such that:
$$\vec w^*= \arg\min_{\vec w} \bra{\psi_{\vec w}} H \ket{\psi_{\vec w}} \;\; \text{ s.t. }\;\; \braket{\psi_{\vec w}|\psi_{\vec w}} =1$$
* This is a generalized linear eigenvalue problem for the pair $(\bar H, \bar S)$: $\bar H W = \bar S WE$, where $\bar H_{ij} = \bra{\phi_i} H \ket{\phi_j}$ and $\bar S_{ij} = \braket{\phi_i | \phi_j}$. In plain English, $\bar H$ is just $H$ in the $\{\ket{\phi_i}\}$ basis (hence of dimension $M\times M$), and $\bar S$ is just the overlap matrix of the basis set $\{\ket{\phi_i}\}$.

> GENERALIZED LINEAR EIGENVALUE PROBLEM
> Eigenvalue problem: $\bm A \vec\phi_i = \lambda_i \vec\phi_i$, or in matrix form $\bm A \bm\Phi = \bm\Phi \bm\Lambda$ (the matrix $\bm\Lambda$ is diagonal).
> Generalized linear eigenvalue problem: $\bm A \vec\phi_i = \lambda_i \bm B\vec\phi_i$, or in matrix form $\bm A \bm\Phi = \bm B \bm\Phi \bm\Lambda$.
> → This reduces to the previous, simpler problem when $\bm B$ is the identity matrix.
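A sketch of the classical post-processing step, assuming the matrix elements $\bar H_{ij}$ and $\bar S_{ij}$ have already been measured (the numbers here are made up):

```python
# Solve H_bar W = S_bar W E classically; H_bar and S_bar stand in for the
# measured <phi_i|H|phi_j> and <phi_i|phi_j>.
import numpy as np
from scipy.linalg import eigh

H_bar = np.array([[1.0, 0.2],
                  [0.2, 2.0]])
S_bar = np.array([[1.0, 0.1],     # basis states overlap but are independent
                  [0.1, 1.0]])

E, W = eigh(H_bar, S_bar)         # generalized symmetric eigenproblem
w_star = W[:, 0]                  # eigenvector of the lowest eigenvalue
print(E[0], w_star)               # ground-energy estimate and coefficients w*
```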
ILLUSTRATION OF THE SUBSPACE EXPANSION ROUTINE
[Figure not captured.] It computes the code Hamiltonian matrix $\bar H_{ij} = \bra{\phi_i} H \ket{\phi_j}$ and the overlap matrix $\bar S_{ij} = \braket{\phi_i | \phi_j}$.

FINDING $\vec w^*$
* We know $H$, and we also know $\{\ket{\phi_i}\}$, so we can formulate the above problem explicitly.
* This generalized linear eigenvalue problem can be solved classically (on a classical computer), yielding the eigenvector matrix $W$ and the eigenvalue matrix $E$.
* Then what we want is the lowest eigenvalue in $E$ (not quite, yet!): its associated eigenvector (in $W$) is the optimal combination of coefficients $\vec w^*$.

NEXT UP
What are the costs (sample complexity) of these techniques?

QUANTUM FISHER INFORMATION
* Let $p(x|\lambda)$ be the conditional probability of getting $x$ as the measurement result when the parameter is $\lambda$. The Fisher information is then:
$$F(\lambda) = \int p(x|\lambda) \left( \frac{\partial \ln(p(x|\lambda))}{\partial \lambda} \right)^2 dx = \int \frac{1}{p(x|\lambda)} \left( \frac{\partial p(x|\lambda)}{\partial \lambda} \right)^2 dx$$
* Now introduce a POVM $\{\Pi_x\}$ obeying completeness, $\int\Pi_x dx=I$, and for later convenience define the symmetric logarithmic derivative (SLD) $L_\lambda$ implicitly as the Hermitian operator satisfying:
$$\frac{L_\lambda\rho_\lambda + \rho_\lambda L_\lambda}{2} = \frac{\partial\rho_\lambda}{\partial\lambda}$$
* The naming of the SLD: symmetric, because of the symmetrized product $L_\lambda\rho_\lambda + \rho_\lambda L_\lambda$; logarithmic derivative, because dividing both sides by $\rho_\lambda$ informally gives $L_\lambda := \frac{\partial\ln\rho_\lambda}{\partial\lambda}$.
* This is a Lyapunov matrix equation, with a general solution available.

From the definitions above, we can rewrite the Fisher information as
$$\begin{align*} F(\lambda) &= \int \frac{1}{p(x|\lambda)} \left( \frac{\partial p(x|\lambda)}{\partial \lambda} \right)^2 dx = \int \frac{1}{\text{tr}[\rho\Pi]} \left( \frac{\partial}{\partial\lambda} \text{tr}[\rho\Pi] \right)^2 dx \\ &= \int \frac{1}{\text{tr}[\rho\Pi]} \left( \text{tr}\left[\frac{\partial\rho}{\partial\lambda} \Pi\right] \right)^2 dx = \int \frac{1}{\text{tr}[\rho\Pi]} \left( \frac12\text{tr}\left[L\rho \Pi + \rho L \Pi\right] \right)^2 dx \\ &= \int \frac{\text{Re}(\text{tr}[\rho\Pi L])^2}{\text{tr}[\rho\Pi]} dx \end{align*}$$

[The slide showing the resulting chain of inequalities was not captured.] In that chain, the first inequality saturates when $\text{tr}[\rho\Pi L]$ is a real number for any $\lambda$, and the second follows from the Schwarz inequality $|\text{tr}[A^\dagger B]|^2 \leq \text{tr}[A^\dagger A]\, \text{tr}[B^\dagger B]$ with $A^\dagger\equiv \sqrt\rho\sqrt\Pi/\sqrt{\text{tr}[\rho\Pi]}$ and $B\equiv \sqrt\Pi L\sqrt\rho$; it saturates when $\frac{\sqrt\Pi\sqrt\rho}{\text{tr}[\rho\Pi]} = \frac{\sqrt\Pi L\sqrt\rho}{\text{tr}[\rho\Pi L]}$ for any $\lambda$. This is satisfied iff $\{\Pi_x\}$ is the set of projectors onto the eigenstates of $L_\lambda$, which in turn represents the optimal POVM for estimating $\lambda$.

* This chain of inequalities proves that the Fisher information $F(\lambda)$ of any quantum measurement is bounded by the so-called quantum Fisher information $H(\lambda) \equiv \text{tr}[\rho L^2] = \text{tr}[\frac{\partial\rho}{\partial\lambda}L]$.
* You can view the quantum Fisher information as the supremum of the classical Fisher information over all observables.

LYAPUNOV MATRIX EQUATION
The general solution of $\frac{L_\lambda\rho_\lambda + \rho_\lambda L_\lambda}{2} = \frac{\partial\rho_\lambda}{\partial\lambda}$ for the SLD $L_\lambda$ may be written as:
$$L_\lambda = 2\int_0^\infty e^{-\rho t }\frac{\partial\rho}{\partial \lambda}e^{-\rho t } dt$$
Now if we write $\rho$ in its eigenbasis $\sum_n\rho_n\ket{\psi_n}\bra{\psi_n}$, the SLD easily becomes (you can verify this by direct substitution):
$$L_\lambda = 2\sum_{m,n} \frac{\bra{\psi_m}\frac{\partial\rho}{\partial\lambda}\ket{\psi_n}}{\rho_m+\rho_n}\ket{\psi_m}\bra{\psi_n}$$
where the sum only includes terms with $\rho_n+\rho_m\neq0$. From this we can express the quantum Fisher information $H(\lambda)$ as (to verify this, just use the dummy-index trick):
$$H(\lambda) = \text{tr}[\rho L^2] = 4\sum_{m,n} \rho_m \frac{\left|\bra{\psi_m}\frac{\partial\rho}{\partial\lambda}\ket{\psi_n}\right|^2}{(\rho_m+\rho_n)^2} = 2\sum_{m,n} \frac{\left|\bra{\psi_m}\frac{\partial\rho}{\partial\lambda}\ket{\psi_n}\right|^2}{\rho_m+\rho_n}$$

USUAL FORM OF THE QUANTUM FISHER INFORMATION $H(\lambda)$
If the dynamics of $\rho$ is governed by some Hamiltonian-like operator $A$, that is, $\rho = e^{-i\lambda A} \rho_0 e^{i\lambda A}$, the quantum Fisher information reduces to:
$$H(\lambda) = 2\sum_{m,n} \frac{(\rho_m-\rho_n)^2}{\rho_m+\rho_n} \left|\bra{\psi_m}A\ket{\psi_n}\right|^2$$
* Proof:
$$\begin{align*} \bra{\psi_m}\frac{\partial\rho}{\partial\lambda}\ket{\psi_n} &= \bra{\psi_m} \big((-iA)\rho + \rho (iA)\big) \ket{\psi_n} \\ &= \bra{\psi_m} \left((-iA)\sum_k\rho_k\ket{\psi_k}\bra{\psi_k} + \sum_k\rho_k\ket{\psi_k}\bra{\psi_k} (iA)\right) \ket{\psi_n} \\ &= -i\rho_n \bra{\psi_m}A \ket{\psi_n} + i\rho_m \bra{\psi_m}A \ket{\psi_n} \\ \implies \left|\bra{\psi_m}\frac{\partial\rho}{\partial\lambda}\ket{\psi_n}\right|^2 &= (\rho_m-\rho_n)^2 \cdot \left| \bra{\psi_m}A \ket{\psi_n}\right|^2 \end{align*}$$
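A numeric sketch of the eigenbasis formula for $H(\lambda)$, using a finite-difference derivative on a made-up one-parameter family (a Bloch vector of length $r$ rotating in the x-z plane, for which the usual form above evaluates to $H = r^2$):

```python
# Evaluate H(lambda) = 2 * sum_{m,n} |<psi_m|drho|psi_n>|^2 / (rho_m + rho_n)
# numerically, restricting the sum to the support of rho.
import numpy as np

def rho_of(lam, r=0.9):
    # Bloch vector of length r rotating in the x-z plane by angle lambda.
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Z = np.diag([1.0, -1.0]).astype(complex)
    return 0.5 * (np.eye(2) + r * (np.sin(lam) * X + np.cos(lam) * Z))

def qfi(lam, d=1e-6):
    rho = rho_of(lam)
    drho = (rho_of(lam + d) - rho_of(lam - d)) / (2 * d)   # finite difference
    evals, vecs = np.linalg.eigh(rho)
    H = 0.0
    for m in range(2):
        for n in range(2):
            if evals[m] + evals[n] > 1e-12:                # support of rho only
                amp = vecs[:, m].conj() @ drho @ vecs[:, n]
                H += 2 * abs(amp) ** 2 / (evals[m] + evals[n])
    return H

print(qfi(0.3))   # ~0.81 = r^2 for this family (a pure state would give 1)
```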
OPTIMAL QUANTUM ESTIMATOR
Notice, however, that $L_\lambda$ itself may not be the optimal observable to measure. Using the fact that $\text{Tr}[\rho_\lambda L_\lambda]=0$ (take the trace of the SLD definition and use that $\text{Tr}[\rho_\lambda]=1$ is constant in $\lambda$, and it becomes obvious), we can write an explicit form for the optimal quantum estimator:
$$O_\lambda = \lambda \mathbb{I} + \frac{L_\lambda}{H(\lambda)}$$
for which we have:
$$\begin{cases} \text{Tr}[\rho_\lambda O_\lambda] = \lambda \\ \text{Tr}[\rho_\lambda O_\lambda^2] = \lambda^2 + \dfrac{\text{Tr}[\rho_\lambda L_\lambda^2] }{H^2(\lambda)} \end{cases}$$
therefore $\braket{\Delta O^2_\lambda} = 1/H(\lambda)$.

EXPANSION OF THE SLD
We can further expand $\partial\rho/\partial\lambda$ in the eigenbasis of $\rho$ as
$$\frac{\partial\rho}{\partial\lambda} = \sum_n \frac{\partial\rho_n}{\partial\lambda} \ket{\psi_n}\bra{\psi_n} + \rho_n \ket{\tfrac{\partial\psi_n}{\partial\lambda}} \bra{\psi_n} + \rho_n\ket{\psi_n} \bra{\tfrac{\partial\psi_n}{\partial\lambda}}$$
Since the SLD is defined only on the support of $\rho$, and since both the eigenvalues $\rho_n$ and the eigenvectors $\ket{\psi_n}$ may depend on the parameter $\lambda$, we may moreover expand $\ket{\psi_n}$ in some arbitrary basis $\{\ket k\}$ that is independent of $\lambda$. Doing this expansion for $\braket{\psi_n | \psi_m} = \delta_{nm}$ gives (though this will not be used later):
$$\braket{\psi_n | \psi_m} = \sum_{k,l} (\psi_{nk}^* \bra{k} ) (\psi_{ml} \ket l) = \sum_k \psi_{nk}^*\psi_{mk}=\delta_{nm}$$

On the other hand, from the product rule we have
$$\frac{\partial\braket{\psi_m|\psi_n} }{\partial\lambda} = \braket{\tfrac{\partial\psi_m}{\partial\lambda} | \psi_n} + \braket{\psi_m|\tfrac{\partial\psi_n}{\partial\lambda}} = \frac{\partial \delta_{mn}}{\partial\lambda} = 0$$
Therefore, using the fact that $\braket{\tfrac{\partial\psi_m}{\partial\lambda} | \psi_n} =-\braket{\psi_m|\tfrac{\partial\psi_n}{\partial\lambda}}$, we can rewrite the matrix form of $L_\lambda$ from the SLD general solution as:
$$\begin{align*} L_\lambda &= 2\sum_{m,n} \frac{\bra{\psi_m}\frac{\partial\rho}{\partial\lambda}\ket{\psi_n}}{\rho_m+\rho_n}\ket{\psi_m}\bra{\psi_n} \\ &= \sum_n \frac{\partial\rho_n}{\partial\lambda}\frac{\ket{\psi_n}\bra{\psi_n}}{\rho_n} + 2\sum_{m\neq n} \left( \rho_n \braket{\psi_m|\tfrac{\partial\psi_n}{\partial\lambda}} + \rho_m\braket{\tfrac{\partial\psi_m}{\partial\lambda} | \psi_n} \right)\frac{\ket{\psi_m}\bra{\psi_n}}{\rho_m+\rho_n} \\ &= \sum_n \frac{\partial\rho_n}{\partial\lambda}\frac{\ket{\psi_n}\bra{\psi_n}}{\rho_n} + 2\sum_{m\neq n} \frac{\rho_n-\rho_m}{\rho_n+\rho_m} \braket{\psi_m|\tfrac{\partial\psi_n}{\partial\lambda}} \ket{\psi_m}\bra{\psi_n} \end{align*}$$
MULTI-PARAMETRIC QFI AND RE-PARAMETRIZATION
When our quantum state $\rho$ depends on a set of parameters $\bm\lambda = \{\lambda_i\}, i=1,2,\cdots,N$, the quantum Fisher information turns from a scalar into a matrix:
$$\begin{align*} \bm H(\bm\lambda)_{ij} &= \text{Tr}\left[ \rho \frac{L_iL_j + L_jL_i}{2} \right] = \text{Tr}\left[\frac{\partial\rho}{\partial\lambda_i}L_j\right] = \text{Tr}\left[\frac{\partial\rho}{\partial\lambda_j}L_i\right] \\ &= \sum_n \frac{(\partial_i\rho_n)(\partial_j\rho_n)}{\rho_n} + \sum_{n\neq m} \frac{(\rho_m-\rho_n)^2}{\rho_m+\rho_n} \left[ \braket{\psi_n|\partial_i\psi_m} \braket{\partial_j\psi_m|\psi_n} + \braket{\psi_n|\partial_j\psi_m} \braket{\partial_i\psi_m|\psi_n} \right] \end{align*}$$

The Cramér-Rao theorem for multi-parameter estimation says that the inverse of the Fisher matrix provides a lower bound on the covariance matrix $\text{Cov}[\bm\gamma]_{ij} = \braket{\lambda_i\lambda_j} - \braket{\lambda_i}\braket{\lambda_j}$, that is,
$$\text{Cov}[\bm\gamma] \geq \frac{1}{N} \bm H(\bm\lambda)^{-1}$$
* This is a matrix inequality, so the corresponding bound may not be achievable in multi-parameter estimation.
* On the other hand, the diagonal elements of the inverse Fisher matrix provide achievable bounds on the variances of single-parameter estimators at fixed values of the other parameters: $\text{Var}(\lambda_i) = \gamma_{ii} \geq \frac{1}{N} (\bm H^{-1})_{ii}$.

APPLYING THE CRAMÉR-RAO BOUND
Our task is now:
* Given: an unknown, noiseless quantum state, which can be parameterized as $\hat \rho(\bm{\theta})$ with $\bm{\theta}$ the unknown parameters.
* Goal: estimate the quantity $f(\bm{\theta})$, a function of $\bm{\theta}$.
* How: by performing POVMs on $N$ independent copies of $\hat \rho(\bm{\theta})$.

APPLYING THE CRAMÉR-RAO BOUND: QUICK RECAP
$$\text{Variance} \geq \frac{1}{N\cdot\{\text{quantum Fisher information}\}}$$
$$\text{Var}(f(\bm\theta)^\text{est}) \geq \frac 1N \cdot \bm\nabla_{\bm\theta} (f(\bm\theta))^T \; J(\rho(\bm\theta))^{-1} \; \bm\nabla_{\bm\theta} (f(\bm\theta))$$
(can jump to Main Results)
The quantum Fisher information matrix $J$ of $\hat\rho(\bm\theta)$ is defined exactly as in Multi-parametric QFI and re-parametrization:
$$[J]_{ij} = \frac12 \text{tr} \left[\hat\rho\Big\{ \hat L_i, \hat L_j \Big\}\right]$$
where $\hat L_i$ is the symmetric logarithmic derivative (SLD) operator defined exactly as in Quantum Fisher Information (with slightly changed notation):
$$\frac12\Big\{ \hat\rho, \hat L_i \Big\} = \frac{\partial \hat\rho}{\partial \theta_i}$$
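A tiny sketch of evaluating this recap bound, with a made-up two-parameter Fisher matrix $J$ and target function $f(\bm\theta)=\theta_1+\theta_2$:

```python
# Evaluate Var(f_est) >= (1/N) grad_f^T J^{-1} grad_f for illustrative numbers.
import numpy as np

J = np.array([[4.0, 1.0],
              [1.0, 3.0]])         # stand-in quantum Fisher information matrix
grad_f = np.array([1.0, 1.0])      # gradient of f(theta) = theta1 + theta2
N = 1000                           # number of independent copies measured

bound = grad_f @ np.linalg.solve(J, grad_f) / N
print(bound)                       # lower bound on the variance of any unbiased estimator
```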
VARIANCE OF THE NOISELESS STATE
The bound for $\text{Var}(\braket{\hat X}^\text{est})$ satisfies:
$$\begin{align*} \text{Var}(\braket{\hat X}^\text{est}) &\geq \frac1N \cdot \left( 2^{(n-1)/2}\; \bm{x}^T\right) J(\rho(\bm\theta))^{-1} \left( 2^{(n-1)/2}\; \bm{x}\right) \\ &= \frac{2^{n-1}}{N} \bm{x}^T J(\rho(\bm\theta))^{-1} \bm{x} = \frac1N (\Delta\hat X)^2 \end{align*}$$
where $(\Delta\hat X)^2 \equiv$ the variance of the observable $\hat X$ in the noiseless state $\hat\rho$.
* Equality is achieved by the optimal POVM, which is the projective measurement in the diagonal basis of the operator $\hat X$.

VARIANCE OF THE NOISY STATE
Now suppose we add a noise channel of the form described in Virtual Quantum Circuit, $\mathcal{E}_l: \bm{\theta} \mapsto A_l \bm{\theta} + \bm{c}_l$, with its two components:
* $(A_l)_{ij} = \mathrm{Tr}[\hat P_i \mathcal{E}_l (\hat P_j)] / 2^n = \frac12\text{Tr}[\hat\lambda_i \mathcal{E}_l (\hat\lambda_j)]$
  → a $(4^n-1)\times(4^n-1)$ real-valued matrix,
  → represents the unital action of the noise channel.
* $(\bm{c}_l)_i = \mathrm{Tr}[\hat P_i \mathcal{E}_l(\hat I)] / \sqrt{2^{3n-1}} = \frac{1}{2^n}\text{Tr}[\hat\lambda_i \mathcal{E}_l (\hat I)]$
  → a $(4^n-1)$-dimensional real-valued vector,
  → represents the non-unital action of the noise channel.
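A single-qubit ($n=1$) sketch of these two maps, where they reduce to $A_{ij} = \text{Tr}[\hat P_i \mathcal E(\hat P_j)]/2$ and $c_i = \text{Tr}[\hat P_i \mathcal E(\hat I)]/2$; the amplitude-damping channel is my own choice of a non-unital example, evaluated via its Kraus operators:

```python
# Pauli-transfer components (A, c) of a single-qubit amplitude-damping channel.
import numpy as np

g = 0.1  # damping probability
K = [np.array([[1, 0], [0, np.sqrt(1 - g)]]),   # Kraus operators of the channel
     np.array([[0, np.sqrt(g)], [0, 0]])]
P = [np.array([[0, 1], [1, 0]], dtype=complex),  # X
     np.array([[0, -1j], [1j, 0]]),              # Y
     np.diag([1.0, -1.0]).astype(complex)]       # Z

def channel(rho):
    return sum(k @ rho @ k.conj().T for k in K)

A = np.array([[np.trace(Pi @ channel(Pj)).real / 2 for Pj in P] for Pi in P])
c = np.array([np.trace(Pi @ channel(np.eye(2))).real / 2 for Pi in P])
print(A)   # diag(sqrt(1-g), sqrt(1-g), 1-g): unital part shrinks the Bloch vector
print(c)   # (0, 0, g): non-unital shift of the Bloch vector toward |0>
```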
* Therefore the way to minimize the variance of the unbiased estimator ⟨X^⟩est\braket{\hat X}^\text{est}⟨X^⟩est for the noiseless state through the measurement of the noisy state is to perform the projection measurement in the diagonal basis of the operator Y^\hat YY^. 70 NOISELESS TARGET STATE (NNN-QUBIT) * ρ^\hat\rhoρ^ is obtained from applying LLL layers of noiseless unitary gates {Ul}l=1L\{\mathcal{U_l}\}_{l=1}^L{Ul }l=1L to the initial state ρ^0\hat\rho_0ρ^ 0 . * ρ^\hat\rhoρ^ can be parameterized by a general Bloch vector θ\bm\thetaθ: ρ^=I^2n+θ⋅P^2n+1\hat \rho = \dfrac{\hat I}{2^n} + \dfrac{\bm\theta \cdot \bm{\hat P}}{\sqrt{2^{n+1}}}ρ^ =2nI^ +2n+1 θ⋅P^ with θ∈R4n−1\bm\theta \in \mathbb R^{4^n-1}θ∈R4n−1. * The coefficient 2(−1−n)/22^{(-1-n)/2}2(−1−n)/2 is somehow related to the general n-level Bloch vector * I^≡σ^0⊗n\hat I \equiv \hat\sigma_0^{\otimes n}I^≡σ^0⊗n and P^≡{P^i}i=14n−1\bm{\hat P} \equiv \{\hat P_i\}_{i=1}^{4^n-1}P^≡{P^i }i=14n−1 is the nnn-qubit Pauli group excluding all-identity: Pi^∈{σ^0,σ^x,σ^y,σ^z}⊗n\{σ^0}⊗n\hat{P_i} \in \{\hat\sigma_0, \hat\sigma_x, \hat\sigma_y, \hat\sigma_z\}^{\otimes n} \backslash \{\hat\sigma_0\}^{\otimes n}Pi ^ ∈{σ^0 ,σ^x ,σ^y ,σ^z }⊗n\{σ^0 }⊗n. → notice that any Pi2^\hat{P_i^2}Pi2 ^ is just I^2n×2n\hat I_{2^n\times2^n}I^2n×2n . * assume the preparation of initial state ρ^0\hat\rho_0ρ^ 0 is noiseless. 71 VIRTUAL QUANTUM CIRCUIT Consider one single sample (one single row): * start from initial state ρ^0\hat\rho_0ρ^ 0 * with LLL layers of unitary+followed by noisy operations {El∘Ul}l=1L\{\mathcal{E}_l \circ \mathcal{U}_l\}_{l=1}^L{El ∘Ul }l=1L and constraint the noise to be Markovian * finally use noiseless POVM M0\mathcal{M}_0M0 to estimate the expectation value of a traceless observable X^=x⋅P^\hat X =\bm{x} \cdot \bm{\hat P}X^=x⋅P^, the observable is parametrized by x∈R4n−1\bm{x} \in \mathbb{R}^{4^n-1}x∈R4n−1. 72 Loading slide... 73 OBJECTIVE OF QEM To remove the effect of noise channels {El}l=1L\{\mathcal{E}_l\}_{l=1}^L{El }l=1L , so that we can have an unbiased estimator of X^=x⋅P^\hat X =\bm{x} \cdot \bm{\hat P}X^=x⋅P^ ⟨X^⟩≡tr[ρ^X^]=2n−1θ⋅x\braket{\hat X} \equiv \mathrm{tr}[\hat\rho\hat X] = \sqrt{2^{n-1}} \bm{\theta}\cdot\bm{x} ⟨X^⟩≡tr[ρ^ X^]=2n−1 θ⋅x with standard deviation ε\varepsilonε. Construct Virtual Quantum Circuit as the follows: * boosted noise Elm\mathcal{E}_{lm}Elm in the lthl^\text{th}lth layer and mthm^\text{th}mth copy such that Γ(Elm)≥Γ(El)≥γ\Gamma(\mathcal{E}_{lm}) \geq \Gamma(\mathcal{E}_l) \geq \gammaΓ(Elm )≥Γ(El )≥γ * classical register ρ^c,m\hat\rho_{c,m}ρ^ c,m coupled with the system qubits via the additional operations Clm\mathcal{C}_{lm}Clm * POVM measurement M\mathcal MM performed on the entire set of copies to output the estimator of ⟨X^⟩\braket{\hat X}⟨X^⟩ CONSTRAINT ON LOCCS * Classical register ρ^c,m\hat\rho_{c,m}ρ^ c,m is initialized with probabilistic mixtures of computational bases as ρ^c,m=∑ipmi∣i⟩⟨i∣\hat\rho_{c,m} = \sum_i p_{mi}\ket i\bra iρ^ c,m =∑i pmi ∣i⟩⟨i∣ * They are coupled with system via only unitary operations as Clm=∑iClmi⊗∣i⟩⟨i∣\mathcal{C}_{lm} = \sum_i \mathcal{C}_{lmi}\otimes\ket i\bra iClm =∑i Clmi ⊗∣i⟩⟨i∣ because we’re not allowed to QEC. 
74

CALCULATE SLD FISHER INFORMATION MATRIX

First, re-express the $m^\text{th}$ copy (the $m^\text{th}$ row) of the quantum state in the virtual circuit as $\mathcal{E}'_m(\rho(\bm\theta)\otimes\hat\rho_{c,m})$, with $\mathcal{E}'_m$ being an effective noise channel:

$$\mathcal{E}'_m = (\mathcal{E}_{Lm} \circ \mathcal{U}_L \circ \mathcal{C}_{(L-1)m}) \circ \cdots \circ (\mathcal{E}_{1m} \circ \mathcal{U}_1 \circ \mathcal{C}_{(0)m}) \circ (\mathcal{U}_1^{-1}\circ \cdots \circ \mathcal{U}_L^{-1})$$

Why define it this way? Because $\hat\rho_0\otimes\hat\rho_{c,m} = \left[(\mathcal{U}_1^{-1}\circ \cdots \circ \mathcal{U}_L^{-1})\,\rho(\bm\theta)\right]\otimes\hat\rho_{c,m}$.

This compilation allows us to calculate the quantum Fisher information of the state right before the POVM → analyze the SLD Fisher information matrix $J$ of the quantum state $\bigotimes_{m=1}^N \mathcal{E}'_m(\rho(\bm\theta)\otimes\hat\rho_{c,m})$.

75

Note that we have $N$ copies in total (you can think of $N$ as the sample complexity). $J$ can be bounded as:

$$J \lesssim \sum_m \frac{I}{\Gamma(\mathcal{E}'_m)^2} \lesssim N\,\frac{1}{\gamma^{2L}}$$

* indicates the exponential decay of $J$ as the circuit depth (number of layers) $L$ grows;
* can be combined with the quantum Cramér-Rao bound → this relates $J$ to the standard deviation $\varepsilon$ of an unbiased estimator.

76

MAIN RESULTS: COMPLEXITY OF $N$

Prerequisite: the noise $\mathcal{E}_{lm}$ must satisfy, for all $l$ (layers) and $m$ (copies):

* for all $\hat\rho \neq \hat\sigma$, the states remain distinguishable after passing through the same noise channel: $\mathcal{E}_{lm}(\hat\rho) \neq \mathcal{E}_{lm}(\hat\sigma)$
  → a necessary condition: the quantum information is not completely destroyed by the noise;
* for all $\hat\rho$, the state after the noise, $\mathcal{E}_{lm}(\hat\rho)$, is full rank with all eigenvalues greater than 0
  → $\mathcal{E}_{lm}(\hat\rho)$ is a positive-definite matrix
  → after the noise, the variance of every observable is non-zero for any state; hence the sample cost must be strictly greater than 0.

77

Theorem 1: Complexity of $N$ (lower bound)

The cost $N$ required for any unbiased estimator of $\braket{\hat X}$ with standard deviation $\varepsilon$ constructed from QEM (that can be translated into the virtual quantum circuit) satisfies:

$$N \geq \frac{\|\bm{x}\|^2}{\varepsilon^2}\, \beta\, \gamma^{2L},$$

where $\beta$ is the largest $0<\beta<1$ such that $\mathcal{E}_{lm}(\hat\rho) - \frac{\beta}{2^n}\hat I \geq 0$ for all $\hat\rho$ and all $l,m$.
→ $\beta$ can be viewed as "how far away" the generalized Bloch sphere is pushed from its original surface by the noise.

IMPLICATION

If $\gamma>1$, the cost $N$ of unbiased QEM grows exponentially with the circuit depth $L$, no matter how we choose $\mathcal{E}_{lm}$, $\mathcal{C}_{lm}$, or the POVM (see Noise channel characterization). A numerical illustration follows below.

78
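To make Theorem 1 concrete, here is a small sketch (ours) that numerically estimates $\beta$ for a single-qubit depolarizing channel, by minimizing the smallest eigenvalue of $\mathcal{E}(\hat\rho)$ over random pure states (the extreme points, where the minimum is attained), and then evaluates the lower bound $N \geq \frac{\|\bm{x}\|^2}{\varepsilon^2}\beta\gamma^{2L}$. For depolarizing noise this should recover $\beta \approx p$, matching the first special case on the next slides; the parameter values are arbitrary.

```python
import numpy as np

def depolarize(rho, p):
    """Single-qubit depolarizing channel: rho -> (1-p) rho + p I/2."""
    return (1 - p) * rho + p * np.eye(2) / 2

def estimate_beta(channel, n=1, samples=2000, seed=0):
    """beta = largest value with E(rho) - beta I/2^n >= 0 for all rho,
    i.e. 2^n times the worst-case smallest eigenvalue of E(rho); since
    the smallest eigenvalue is concave in rho, the worst case is
    attained on pure states, so we sample those."""
    rng = np.random.default_rng(seed)
    d = 2**n
    beta = np.inf
    for _ in range(samples):
        psi = rng.normal(size=d) + 1j * rng.normal(size=d)
        psi /= np.linalg.norm(psi)
        rho = np.outer(psi, psi.conj())
        beta = min(beta, d * np.linalg.eigvalsh(channel(rho)).min())
    return beta

p, L, eps = 0.01, 100, 0.01
x = np.array([0.0, 0.0, 1.0])        # estimate <Z>
beta = estimate_beta(lambda r: depolarize(r, p))
gamma = 1 / (1 - p)                  # minimum noise strength (cf. slide 79)
N_lower = (x @ x) / eps**2 * beta * gamma**(2 * L)
print(f"beta ~ {beta:.4f} (expect ~{p}),  N >= {N_lower:.3e}")
```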
WHEN THE QUANTUM CIRCUIT SCRAMBLES THE STATE STRONGLY ENOUGH

then we must take the dependence of the sample complexity on $n$ into account (otherwise our lower bound becomes too loose to be useful):

> we must pay overhead to eliminate every local noise channel and thus encounter dependence on $n$

Theorem 2: Complexity under the presence of strong entangling gates

Let $\mathcal{U}_1,\cdots,\mathcal{U}_L$ all be $n$-qubit unitary gates (drawn from a set of random unitaries) that form a unitary 2-design, and let $\mathcal{E}_l$ be local noise. Then the number of copies $N$ required to perform unbiased estimation of $\braket{\hat X}$, averaged over $\{\mathcal{U}_1,\cdots,\mathcal{U}_L\}$, grows exponentially in both the qubit count $n$ and the depth $L$.

* To put it simply: if we encounter a random or chaotic quantum circuit, then the system's qubit number $n$ also enters the exponential scaling of the sample complexity, giving a tighter bound.
* Pure unital noise case: if $\mathcal{E}_{lm}(\frac{\hat I}{2^n}) = \frac{\hat I}{2^n}$, then the cost bound satisfies

$$N \geq \frac{\|\bm{x}\|^2}{\varepsilon^2} \Big(1-(1-\beta)^L\Big)\, \gamma^{2L} \approx \frac{\|\bm{x}\|^2}{\varepsilon^2}\, \gamma^{2L}$$

79

SPECIAL CASES THAT TIGHTEN THE BOUND (1/3)

1. GLOBAL DEPOLARIZATION CHANNEL (THIS IS A UNITAL CHANNEL)

Assume that all the unitary gates are followed by the global depolarization channel

$$\mathcal{E}_{lm}: \bm\theta \mapsto (1-p_{lm})\,\bm\theta.$$

Since we assume boosted noise in our framework, the error rate is bounded from below as $p_{lm} \geq p$ (likewise for the other two noise types below), so the minimum noise strength is $\gamma = 1/(1-p)$ and we have $\beta = p$; therefore $N \approx \frac{\|\bm{x}\|^2}{\varepsilon^2}\, (1-p)^{-2L}$.

* The effective error rate $p'$ for $\mathcal{E}'_m$ mentioned in Calculate SLD Fisher Information Matrix is then $p' = 1-(1-p)^L$ → can be verified directly.

80

SPECIAL CASES THAT TIGHTEN THE BOUND (2/3)

2. LOCAL DEPOLARIZATION CHANNEL (THIS IS A UNITAL CHANNEL TOO)

$$\mathcal{E}_{lm} = (\mathcal{E}^{(0)}_{lm})^{\otimes n}, \qquad \mathcal{E}^{(0)}_{lm}:\bm\theta\mapsto(1-p_{lm})\,\bm\theta$$

If, moreover, the unitary gates of the circuit are drawn from a unitary 2-design (such as the $n$-qubit Clifford group), the bound is tightened to

$$\mathbb{E}[N] \geq \mathcal{O}\!\left( \left(1+\frac{3}{2}\frac{4^n}{4^n-1}\,p\right)^{nL} \right)$$

(both depolarizing bounds are tabulated in the sketch below, which also verifies $p'$ directly)

81
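A small numerical sketch (ours) of the two depolarizing special cases: it verifies the effective error rate $p' = 1-(1-p)^L$ by composing $L$ layers of $\bm\theta \mapsto (1-p)\bm\theta$, and tabulates how the global-depolarization bound and the local-depolarization (2-design) bound grow with depth; the parameter values are arbitrary.

```python
import numpy as np

p, n, eps = 0.001, 10, 0.01
x_norm2 = 1.0   # ||x||^2 of the target observable

for L in (10, 100, 1000):
    # Composing L global-depolarizing layers theta -> (1-p) theta gives
    # theta -> (1-p)^L theta = (1-p') theta, i.e. p' = 1 - (1-p)^L.
    shrink = np.prod(np.full(L, 1 - p))
    p_eff = 1 - shrink
    assert np.isclose(p_eff, 1 - (1 - p)**L)

    # Global depolarization:  N >~ ||x||^2 / eps^2 * (1-p)^{-2L}
    N_global = x_norm2 / eps**2 * (1 - p)**(-2 * L)

    # Local depolarization + unitary-2-design circuit:
    # E[N] >= O((1 + (3/2) * 4^n/(4^n - 1) * p)^{nL})
    N_local = (1 + 1.5 * 4**n / (4**n - 1) * p)**(n * L)

    print(f"L={L:5d}  p'={p_eff:.4f}  N_global>~{N_global:.3e}  "
          f"E[N]>~O({N_local:.3e})")
```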
SPECIAL CASES THAT TIGHTEN THE BOUND (3/3)

3. LOCAL AMPLITUDE DAMPING CHANNEL (THIS IS A NON-UNITAL CHANNEL)

$$\mathcal{E}_{lm} = (\mathcal{E}^{(0)}_{lm})^{\otimes n}, \qquad \mathcal{E}^{(0)}_{lm}:(\theta_x, \theta_y, \theta_z) \mapsto \left(\sqrt{1-p_{lm}}\,\theta_x,\; \sqrt{1-p_{lm}}\,\theta_y,\; (1-p_{lm})\,\theta_z + p_{lm}\right)$$

Again, if the same 2-design condition is satisfied, the bound is tightened to

$$\mathbb{E}[N] \geq \mathcal{O}\!\left( \left(1+\frac{4^n}{4^n-1}\,p\right)^{nL} \right)$$

82

CONNECTION BETWEEN QEM AND QEC & OPEN PROBLEMS

* AMONG DIFFERENT QEMS: THEY CAN BE USED IN PARALLEL (OR CONCATENATED), E.G. RESILIENCE L.
* BETWEEN QEM AND QEC: QEM CAN BE USED ALONGSIDE QEC
  Variational Eigensolver (needs parameterized unitaries), Hamiltonian Simulation (Trotterization)
* OTHER WAYS OF BENCHMARKING QEMS (OTHER THAN SAMPLING COMPLEXITY)?
  something like the MNIST or ImageNet databases in computer vision.
* OPTIMAL QEM + QEC STRATEGY?
  Given a limited number of physical qubits, how should we partition them into QEC blocks so that, in a given amount of time, we obtain the best estimator?

> It is usually cheaper to build a hundred QPUs, each with 100 physical qubits, than a single QPU with 10000 physical qubits. (finding the sweet spot)

83

(slide not captured)

84

(slide not captured)

85

THANK YOU FOR LISTENING

Any questions?

Thanks to Hyukgun Kwon for important discussions, useful reference resources, assistance, and critical advice when finalizing the slides.

Thanks to Adi Gandotra, David Noachtar, Debayan Bandyopadhyay, Mahadevan Subramanian, and Mariesa Teo for all the important advice, questions, and assistance with the early version of the slides.

86