DATA FOR CHANGE

BUILDING A COVID-19 MAP USING ELK

CREATE YOUR OWN CUSTOM COVID-19 MAP USING ELASTICSEARCH

Carlos Cilleruelo · Sep 3, 2020 · 11 min read

Covid-19 ELK Map, available at https://covid19map.uah.es/app/dashboards#/view/478e9b90-71e1-11ea-8dd8-e1599462e413 | Image by the author

Probably most of you are familiar with the Johns Hopkins University (JHU) map representing the current situation of the COVID-19 pandemic.

Image of the Johns Hopkins University (JHU) map (Johns Hopkins University)

This map was developed using ArcGIS technology, which has become the de facto standard for pandemic maps, used for example by the WHO and the Italian Government. After seeing it I thought about creating my own map with ELK, and in a few days, with the help of a friend, everything was running. Based on that experience I decided to write up how you can easily do the same. This series of posts is centred on how you can create your own custom map using the ELK stack.

WHY ELASTICSEARCH?

The first question to answer is: why ELK, and not ArcGIS technology? Elasticsearch is open source and everyone can easily deploy a running cluster. Furthermore, Elasticsearch has beautiful visualizations through Kibana, including maps, so it has everything we need to build an incredible Covid-19 map. I really love the ELK stack, so I decided to give it a try.

Based on this, our only cost will be the infrastructure for running ELK, and a small VPS can run a small cluster. I do not have a lot of spare money, so I always try to keep costs to a minimum. I have run the ELK stack on a single $10/month Digital Ocean VPS, obviously without redundancy and without much disk, but we will see that we do not need a lot of space for this data.

Another choice is to use Elastic Cloud.

My current deployment in Elastic Cloud | Image by the author

Elastic Cloud is the easiest way to run an ELK cluster and offers a lot of useful capabilities. For example, one of the "problems" of ELK is the frequency of updates with new functionalities; in Elastic Cloud, updating your cluster is as easy as pressing a button in the control panel. Also, the cost of the smallest deployment is under $20/month. In my case, I ended up using Elastic Cloud because of its usability and easier administration. I totally recommend this option, and do not forget that you get a 14-day trial.

COVID-19 DATA SOURCES

After agreeing on the awesomeness of the ELK stack, we can start to think about how to insert Covid-19 data into it. First of all, we need to identify a reliable and up-to-date source of data for our map, since we need to retrieve and insert new data every day to keep the map current. Johns Hopkins University (JHU) publishes its data on GitHub. The files are all in CSV format and are easy to work with. Something similar happens with the Covid-19 Italy data. These formats can be easily parsed and then inserted; a quick illustration with pandas follows.
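This is only a sketch: the repository path and the column names (Country_Region, Confirmed, Deaths) are my assumptions based on the public CSSEGISandData/COVID-19 layout, and they have changed over time, which is part of the parsing fragility discussed next.

import pandas as pd

# Minimal sketch (assumed repository layout): JHU daily reports are CSV files
# named MM-DD-YYYY.csv inside csse_covid_19_daily_reports.
day = "09-03-2020"
url = (
    "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
    "csse_covid_19_data/csse_covid_19_daily_reports/" + day + ".csv"
)

df = pd.read_csv(url)

# Column names are assumptions and have been renamed in the past,
# which is exactly why parsing these files directly is fragile.
by_country = df.groupby("Country_Region")[["Confirmed", "Deaths"]].sum()
print(by_country.sort_values("Confirmed", ascending=False).head(10))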
Parsing those repositories directly works, but it comes with several problems; I ran into a few of them myself.

The first one is the data updating process: JHU, for example, is not always the fastest at updating their data. And that is normal, they need to wait for the release of new data and then incorporate it into their dataset. You will see that the Italy repository is updated more frequently.

Another problem is changes in the file structure. There have been times when the CSV structure changed and you had to parse those files again: columns can be renamed or their order can change. Because of that, if you are planning to run an updated map, the best option is not to use one of those GitHub repositories.

In order to avoid those problems, the best option is to use a Covid-19 API. We have several options, but I decided to use the Covid-19 Narrativa API. To retrieve Covid-19 data for the 3rd of September 2020 we just need to make a request to https://api.covid19tracking.narrativa.com/api/2020-09-03. After performing that request we obtain a JSON response, avoiding all the problems mentioned earlier. Narrativa already checks and downloads Covid-19 information from several official data sources:

* Spain: Ministerio de Sanidad
* Italy: Dipartimento della Protezione Civile
* Germany: Robert Koch Institute
* France: Santé publique France
* Johns Hopkins University

JSON response of Covid-19 Narrativa API | Image by the author

Using this API we can retrieve all the data we need, from world data down to country and region data. Full documentation of the API can be found here. Having reached this step, we can start to consume and insert data into Elasticsearch.

INSERTING COVID-19 INSIDE ELASTIC

In order to consume data from an API we could use Logstash. Logstash is the standard tool for collecting, parsing and transforming information before inserting it into Elasticsearch, and it has a lot of preconfigured pipelines already published for common log formats. But there are other possibilities, like Python. I really love Python's syntax, so when I am consuming an API I usually end up building a script that consumes the data and then inserts it into Elasticsearch. As I said, you can do this with Logstash, but I feel more comfortable programming in Python. So let's start with the code!

First of all, we need all the data from the beginning of the pandemic until today. We would also like this data separated by date, in order to filter or create visualizations for different periods of time. Using Python's requests and datetime we can easily iterate through all the dates and retrieve all the data. I personally prefer requests to urllib; I find it much simpler and more elegant.

import requests
from datetime import datetime, date, timedelta

start_date = date(2020, 3, 1)
end_date = date(2020, 4, 9)
delta = timedelta(days=1)

while start_date <= end_date:
    day = start_date.strftime("%Y-%m-%d")
    print("Downloading " + day)
    url = "https://api.covid19tracking.narrativa.com/api/" + day
    r = requests.get(url)
    data = r.json()
    start_date += delta

After obtaining the data for each date we could just insert it into Elasticsearch, but before doing that it is necessary to perform some formatting over the data. First, though, it helps to take a quick look at the shape of a single daily response.
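The sketch below navigates one response. The data['dates'][day]['countries'] nesting and the today_confirmed field come from the snippets used later in this article, while the 'Spain' key is just an assumed example of a country name as returned by the API.

import requests

# Download a single day and peek at the structure of the response.
day = "2020-09-03"
r = requests.get("https://api.covid19tracking.narrativa.com/api/" + day)
data = r.json()

# Countries are keyed by name inside data['dates'][<day>]['countries'];
# 'Spain' is an assumed example key, 'today_confirmed' is used later on.
countries = data['dates'][day]['countries']
print(len(countries), "countries reported on", day)
print(countries['Spain']['today_confirmed'])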
We are going to represent the data associated with countries on an Elastic map. In order to associate data with each country, Elasticsearch needs to identify the name of the country or region. The Narrativa API offers the name in English, Italian and Spanish, but some country names can be problematic. That is exactly why ISO 3166-1 alpha-2 (iso2) and ISO 3166-1 alpha-3 (iso3) codes were invented: using this nomenclature we can identify each country without mistakes. Python has a package, countryinfo, that can help us translate country names into the iso3 format.

from countryinfo import CountryInfo

for day in data['dates']:
    for country in data['dates'][day]['countries']:
        try:
            country_info = CountryInfo(country)
            country_iso_3 = country_info.iso(3)
            population = country_info.population()
        except Exception as e:
            print("Error with " + country)
            country_iso_3 = country
            population = None
            infection_rate = 0
            print(e)

If you checked the code, you will probably have noticed a population value. Unfortunately, Elasticsearch does not include population values for each country right now (they are working on that). In order to map the Covid-19 pandemic representatively, it is necessary to have the population of each country: a heat map built only on the number of cases is not representative. You need to compare infection rates among the population, or use statistical metrics, but not absolute numbers. This topic is covered in detail in an ArcGIS post that I totally recommend.

An easy way of obtaining a representative metric of infected people in each country is to use an infection rate. The infection rate represents the probability or risk of infection in a population: 100 times the number of confirmed cases divided by the population.

Rate of infection formula | Image by the author

Again, using Python we can easily calculate that number. Unfortunately, Python's countryinfo does not include the population of every country, which is why I am catching the exception. Most countries are supported, but I caught some errors with the Bahamas, Cabo Verde and a few others.

def getInfectionRate(confirmed, population):
    infectionRate = 100 * (confirmed / population)
    return float(infectionRate)

if population != None:
    try:
        infection_rate = getInfectionRate(
            data['dates'][day]['countries'][country]['today_confirmed'],
            population)
        print(infection_rate)
    except:
        infection_rate = 0

After performing all of these modifications we just need to insert the data into Elasticsearch. Before inserting it I preferred to build a custom dictionary with all the data: I added the original data, the population, the country iso3 name and the infection rate, and I replaced the timestamp with a datetime-formatted date. This way Elasticsearch will automatically detect the date format and you can forget about creating the Kibana index by hand.
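If you prefer not to rely on dynamic field detection, you can also create the index with an explicit mapping before inserting anything. The sketch below is my own addition, not part of the original script; the field names mirror the ones added above, but the chosen types are assumptions.

from elasticsearch import Elasticsearch

es = Elasticsearch(hosts="")  # your cluster / auth info

# Optional explicit mapping: field names mirror the document built above,
# the types are assumptions rather than the author's original choice.
mapping = {
    "mappings": {
        "properties": {
            "timestamp": {"type": "date"},
            "country_iso_3": {"type": "keyword"},
            "population": {"type": "long"},
            "infection_rate": {"type": "float"},
            "today_confirmed": {"type": "long"},
        }
    }
}

es.indices.create(index="covid-19-live-global", body=mapping, ignore=400)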
The insertion itself, together with building that dictionary, looks like this:

from elasticsearch import Elasticsearch

def save_elasticsearch_es(index, result_data):
    es = Elasticsearch(hosts="")  # your auth info
    es.indices.create(
        index=index,
        ignore=400  # ignore the 400 "already exists" code
    )
    id_case = str(result_data['timestamp'].strftime("%d-%m-%Y")) + \
        '-' + result_data['name']
    es.update(index=index, id=id_case,
              body={'doc': result_data, 'doc_as_upsert': True})

result_data = data['dates'][day]['countries'][country]
del result_data['regions']
result_data['timestamp'] = result_data.pop('date')
result_data.update(
    timestamp=datetime.strptime(day, "%Y-%m-%d"),
    country_iso_3=country_iso_3,
    population=population,
    infection_rate=infection_rate,
)
save_elasticsearch_es('covid-19-live-global', result_data)

The complete script can be found on GitHub; just remember to add your Elasticsearch host before running it and to install all the dependencies.

CREATING COVID-19 VISUALIZATIONS USING KIBANA

After running the script, an index will be created inside Elasticsearch and you will be able to configure it from Kibana. Kibana will automatically recognise the timestamp field as a time filter; you just need to select it.

Kibana Index pattern | Image by the author

DATA TABLE VISUALIZATION

One of the easiest visualizations we can build is a simple table showing the countries with the highest number of Covid-19 cases.

Kibana table showing the countries with the highest number of Covid-19 cases | Image by the author

To create it we need a Data Table visualization. The first column can be the total number of confirmed cases; using a simple max aggregation over today_confirmed we can obtain that number.

Total confirmed cases configuration in Kibana | Image by the author

Another interesting metric is the number of cases confirmed in the last 48 hours. A 24-hour figure might seem more interesting, but a lot of countries take longer to report their cases, so with a 48-hour window you will capture more results. To build this in Kibana we need a Sum Bucket aggregation: inside it we use a Date Range of the last 48 hours, now-2d, and then again a max aggregation over the number of confirmed cases.

Last 48h confirmed cases configuration in Kibana | Image by the author

After setting up our aggregations, the only thing left is to split the rows by country name; selecting a split by name.keyword will do this. I also recommend setting a size limit so that only the most relevant countries are shown. A descending limit of 25 seems to be enough in my dashboard, but you can adjust this number to your preferences.

Split rows by country name | Image by the author

COVID-19 PANDEMIC MAP VISUALIZATION

The most impressive visualization we can create, or at least the most popular one, is a map showing the evolution of the Covid-19 pandemic. Kibana offers several options for creating maps; in this case we will select the choropleth option. Using this option we can select the World Countries layer and ISO 3166-1 alpha-3 as the format; remember that we included this field with our script. The Statistics source will be our index name, and the field containing the ISO 3166-1 alpha-3 code will be our Join field, in our case country_iso_3.
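Before styling the layer, it can be worth a quick sanity check that the stored country_iso_3 values really are ISO 3166-1 alpha-3 codes that the World Countries layer can join on. This check is my own addition; the .keyword sub-field assumes the default dynamic mapping for strings (if you created an explicit keyword mapping as sketched earlier, drop the suffix).

from elasticsearch import Elasticsearch

es = Elasticsearch(hosts="")  # your cluster / auth info

# List the distinct country_iso_3 values stored in the index so you can
# verify they will match the Kibana World Countries join field.
resp = es.search(
    index="covid-19-live-global",
    body={
        "size": 0,
        "aggs": {
            "iso_codes": {
                "terms": {"field": "country_iso_3.keyword", "size": 300}
            }
        }
    },
)

for bucket in resp["aggregations"]["iso_codes"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])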
Kibana map choropleth configuration to create a layer in the map | Image by the author

After adding this layer we are not finished: all the countries still present the same data. We need to select the infection_rate variable as the metric so that colours are drawn based on its value, then choose Fill color by value and select infection_rate again. Furthermore, we can choose between several colour palettes under Layer Style; I prefer the one with red tones. At this point the map should start showing some colours.

Also, be aware of this: if after adding the layer the map is still black, check your Kibana date filter and change the default "Last 15 minutes" selector to "Last 1 year". I spent a lot of time thinking I had done something wrong, and the problem was just the Kibana time selector.

Kibana map choropleth layer configuration | Image by the author

After selecting these options you should be able to see something like this:

Kibana map creation | Image by the author

CONCLUSIONS

Hopefully, with the code and examples presented in this article, you will be able to create your own custom maps and visualizations. There are a lot of possible Kibana visualizations; in this article I only addressed a few ideas. My recommendation is to try as many Kibana visualizations as you can. Kibana is an incredible tool: with a few clicks you will be able to create graphs centred on your country, region or continent.

CARLOS CILLERUELO

Bachelor of Computer Science and MSc in Cyber Security. Currently working as a cybersecurity researcher at the University of Alcalá.