www.nlpsummit.org
Effective URL: https://www.nlpsummit.org/lessons-learned-de-identifying-700-million-patient-notes-with-spark-nlp/
LESSONS LEARNED DE-IDENTIFYING 700 MILLION PATIENT NOTES WITH SPARK NLP

Providence St. Joseph Health’s (PSJH) methodology for de-identifying unstructured data relies on pre-trained BiLSTM-CNN-Char NER models provided by John Snow Labs. The PSJH Data Science department evaluated the John Snow Labs models on accuracy and speed.

Accuracy was evaluated by randomly selecting 1,000 patient notes, de-identifying them with the John Snow Labs de-identification model, and having human experts validate each de-identified note. The notes contain a total of 34,701 sentences, and the total number of leaked PHI events is 281. This corresponds to about 0.81 leaked PHI events per 100 sentences (281 / 34,701).

Speed was evaluated by measuring the time needed to process 100K and 500K patient notes (the expected daily load ranges from 100K to 500K) on a moderately sized cluster. The cluster used for this test has 15 workers, each with 112 GB of memory, 1 GPU, and 5 DBU. It took 43.76 minutes to de-identify 100K patient notes and 2.46 hours to de-identify 500K patient notes.

In conclusion, the John Snow Labs de-identification model performs well on speed, and its accuracy is reasonable and consistent with the advertised performance.

About the speaker

Vivek Tomer
Principal Data Scientist at Providence St. Joseph Health

Vivek Tomer is a Principal Data Scientist in the Healthcare Intelligence department at Providence St. Joseph Health (PSJH), where he is responsible for creating and leading strategic enterprise data science projects.
Prior to PSJH, Mr. Tomer was Vice President, Model Development at Umpqua Bank, where he led the development of the bank’s first loan-level credit risk and customer analytics models. Mr. Tomer has two master’s degrees from the University of Illinois at Urbana-Champaign, one in Theoretical Statistics and the other in Quantitative Finance, and has over a decade of experience in solving complex business problems using statistical models.

WHEN
Sessions: October 5 – 7
Trainings: October 4, 12 – 15

CONTACT
nlpsummit@johnsnowlabs.com
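
The accuracy and speed figures quoted in the talk abstract can be checked with a few lines of arithmetic. The following is a minimal sketch in Python: the input numbers come from the abstract, while the function and variable names are illustrative assumptions and are not part of Spark NLP or of the PSJH evaluation code.

```python
# Figures quoted in the talk abstract.
SENTENCES = 34_701          # sentences in the 1,000 sampled notes
LEAKED_PHI_EVENTS = 281     # PHI leak events found by human reviewers

# Timed runs: number of notes -> wall-clock minutes on the 15-worker cluster.
RUNS_MINUTES = {
    100_000: 43.76,
    500_000: 2.46 * 60,     # 2.46 hours expressed in minutes
}


def leak_rate_percent(events: int, sentences: int) -> float:
    """Leaked PHI events per 100 sentences (hypothetical helper)."""
    return 100.0 * events / sentences


def notes_per_minute(notes: int, minutes: float) -> float:
    """Average throughput of one timed run (hypothetical helper)."""
    return notes / minutes


leak_pct = leak_rate_percent(LEAKED_PHI_EVENTS, SENTENCES)
throughput = {n: notes_per_minute(n, m) for n, m in RUNS_MINUTES.items()}

print(f"PHI leak events per 100 sentences: {leak_pct:.2f}")
for notes, rate in throughput.items():
    print(f"{notes:>7,} notes: {rate:,.0f} notes per minute")
```

Under these numbers the larger batch is processed at a higher rate per minute than the smaller one. The abstract does not say why; one possible reading is that fixed start-up costs are spread over more notes in the larger run.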