lig-getalp-new.imag.fr
2001:660:5301:61::9
Public Scan
Submitted URL: https://lig-getalp-new.imag.fr/
Effective URL: https://lig-getalp-new.imag.fr/about-dbnary/
Submission: On July 20 via api from US — Scanned from FR
HOME

Dbnary is an effort to provide multilingual lexical data extracted from Wiktionary. The extracted data is made available as LLOD (Linguistic Linked Open Data). This dataset won the Monnet Challenge in 2012.

Linguistic data currently includes Bulgarian, Catalan, Chinese, Dutch, English, Finnish, French, Irish, German, Greek, Indonesian, Italian, Japanese, Kurdish, Latin, Lithuanian, Malagasy, Norwegian, Polish, Portuguese, Russian, Serbo-Croat, Spanish, Swedish and Turkish.

LICENCE

Dbnary is derived from Wiktionary and is distributed under Creative Commons Attribution-ShareAlike 3.0.

ATTRIBUTION

If you use DBnary in one way or another, please link to this web page. When citing this work in a scientific article, please cite:

Sérasset Gilles (2014). DBnary: Wiktionary as a Lemon-Based Multilingual Lexical Resource in RDF. To appear in Semantic Web Journal (special issue on Multilingual Linked Open Data).

DATASET

The DBnary dataset is registered on the datahub. The dataset contains extracts from 22 Wiktionary language editions. It also contains a set of additional data computed from the extracted content, which I call enhancements. Up to now, the main enhancement (and the one you can reasonably count on) is a set of disambiguated translations (see the files named ll_dbnary_enhancement.ttl.bz2, where ll is the language code). In these files you will find links from translation pairs to the specific word sense(s) for which the translation is valid. The dataset may be downloaded or accessed online.

STATISTICS

The Dashboard lets you see the number of Pages, Entries, Senses, Translations, Lexical Relations… that are available globally or in each language edition.
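As a rough illustration of the enhancement file naming described under DATASET, the sketch below builds the file name for a language code and streams a bz2-compressed Turtle file without unpacking it to disk. The function names are hypothetical, the language code "fr" is only an example, and counting non-empty lines is a crude size check, not an RDF parse.

```python
import bz2
from pathlib import Path


def enhancement_filename(lang_code: str) -> str:
    """Build the enhancement file name for a language code (the 'll' in the pattern)."""
    return f"{lang_code}_dbnary_enhancement.ttl.bz2"


def count_nonempty_lines(path: Path) -> int:
    """Stream a bz2-compressed Turtle file and count its non-empty lines.

    This is only a quick size check; it does not parse the Turtle syntax.
    """
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())
```

A real consumer would more likely hand the decompressed stream to an RDF library; this sketch stays within the standard library so that it runs anywhere.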
A SHORT HISTORY OF DBNARY

In August 2012, the first version of DBnary was released as an entry to the Monnet Challenge for Lexical Linked Data. At that time, only a few languages were extracted (mainly English, French, German, Italian and Portuguese).

From the beginning, extraction has been designed as an ongoing process where each Wiktionary dump is extracted as it is produced. This way, the dataset evolves with the Wiktionary data (hence it also follows the evolution of languages). Moreover, new languages were introduced from time to time and we now maintain 22 different extractors. In practice, this means that the dataset evolves twice a month.

Until July 2017, the dataset was modeled using the lemon vocabulary. At that date, all extractors switched to the ontolex vocabulary, which extends lemon and is now a W3C specification.

All extracted versions are still available for download if anybody wants to study the evolution of the extracted data. From now on, the early versions of DBnary (modeled using lemon) are only available on Zenodo, as we have difficulties maintaining the full history on our servers. The later versions (from July 2017 onward) are still available for download on this server.

As the extraction process goes on for years, the extractors and the original data can drift out of sync, and the extracted data then no longer faithfully reflects the Wiktionary information. To cope with this, statistics on the extracted versions are computed, and we use a dashboard where the extraction history of each language may be studied. Usually, when the number of elements (pages, entries, translations, relations, etc.) decreases, it means that the Wiktionary community of the corresponding language edition has decided to change the way it represents the lexical information. When we detect such a decrease, we try to adapt the extractors and re-synchronize them with the Wiktionary data.
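The decrease check described above can be sketched as a comparison of successive extraction counts. This is a minimal illustration, not the project's actual tooling: the function name, the 5% threshold, and the sample figures are all assumptions made up for the example.

```python
from typing import List, Tuple

# Each history entry is (extraction label, element count), oldest first.
History = List[Tuple[str, int]]


def find_decreases(history: History, min_drop_ratio: float = 0.05) -> List[Tuple[str, str, int]]:
    """Return (previous label, current label, lost elements) for each notable drop.

    A drop is reported when the count falls by at least `min_drop_ratio`
    relative to the previous extraction. The threshold is an assumed value.
    """
    drops = []
    for (prev_label, prev_count), (label, count) in zip(history, history[1:]):
        if prev_count > 0 and (prev_count - count) / prev_count >= min_drop_ratio:
            drops.append((prev_label, label, prev_count - count))
    return drops


# Invented counts for one language edition, for illustration only.
sample_history = [
    ("2020-01-01", 1000),
    ("2020-01-15", 1010),
    ("2020-02-01", 800),
    ("2020-02-15", 805),
]
print(find_decreases(sample_history))
```

A relative threshold is used here so that small fluctuations between dumps are ignored, while a large drop, such as one caused by a change in how a language edition formats its entries, is flagged for review.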
In the beginning, these statistics were maintained in CSV format and were external to the dataset. Now, the full history of the statistics is available in RDF (using the Data Cube vocabulary). These statistics are available online and may be queried along with the DBnary data through the SPARQL endpoint.

RECENT POSTS

* More examples extracted from Wiktionary
* Gaelic and Catalan editions are now part of DBnary
* Examples are now extracted from the English version
* Eager to meet the Exolexica?
* DBnary dataset is now made available in HDT

Dbnary Extraction Software by Gilles Sérasset is licensed under the MIT License. Dbnary Dataset by Gilles Sérasset is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. Based on a work at http://www.wiktionary.org.

Copyright © 2024 Dbnary