



Dbnary



HOME



DBnary is an effort to provide multilingual lexical data extracted from
Wiktionary. The extracted data is made available as Linguistic Linked Open Data
(LLOD). This dataset won the Monnet Challenge in 2012.

Linguistic data currently includes Bulgarian, Catalan, Chinese, Dutch, English,
Finnish, French, German, Greek, Indonesian, Irish, Italian, Japanese, Kurdish,
Latin, Lithuanian, Malagasy, Norwegian, Polish, Portuguese, Russian,
Serbo-Croat, Spanish, Swedish and Turkish.


LICENCE

DBnary is derived from Wiktionary and is distributed under the Creative Commons
Attribution-ShareAlike 3.0 license.


ATTRIBUTION

If you use DBnary in any way, please link to this web page. When citing this
work in a scientific article, please cite:

Sérasset, Gilles (2014). DBnary: Wiktionary as a Lemon-Based Multilingual
Lexical Resource in RDF. To appear in Semantic Web Journal (special issue on
Multilingual Linked Open Data).


DATASET

The DBnary dataset is registered on the Datahub.

The dataset contains extracts from 22 Wiktionary language editions. It also
contains additional data computed from the extracted content, which I call
enhancements. So far, the main enhancement (and the one you can reasonably
count on) is a set of disambiguated translations (see the files named
ll_dbnary_enhancement.ttl.bz2, where ll is the language code). These files link
each translation pair to the specific word sense(s) for which the translation
is valid.
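A minimal sketch of reading one of these enhancement files once it has been
downloaded locally. Only the file-name pattern comes from this page; the helper
names are hypothetical, and the sketch does not parse the RDF. It merely streams
the decompressed Turtle text line by line with Python's standard bz2 module, so
that large files need not be inflated in memory.

```python
import bz2
from pathlib import Path
from typing import Iterator


def enhancement_filename(lang: str) -> str:
    """Build a file name following the ll_dbnary_enhancement.ttl.bz2 pattern.

    Hypothetical helper: 'lang' is the language code of the edition, e.g. "fr".
    """
    return f"{lang}_dbnary_enhancement.ttl.bz2"


def stream_turtle_lines(path: Path) -> Iterator[str]:
    """Yield decompressed lines one at a time, stripped of their newline."""
    # Text mode ("rt") decompresses lazily as we iterate over the handle.
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            yield line.rstrip("\n")
```

For example, iterating over stream_turtle_lines(Path(enhancement_filename("fr")))
walks through the French enhancement file, assuming it sits in the current
directory. Feeding the text to a proper RDF parser is the natural next step.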

The dataset may be downloaded or accessed online.
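Online access goes through the SPARQL endpoint mentioned further down this page.
The sketch below prepares a standard SPARQL 1.1 Protocol GET request using only
Python's standard library. The endpoint address is an assumption (check the
Online Access page for the current one), and the query only lists a few
resources typed as ontolex:LexicalEntry, relying on the OntoLex modelling used
since July 2017.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint address; verify it on the Online Access page before use.
DBNARY_SPARQL_ENDPOINT = "http://kaiko.getalp.org/sparql"

# Lists a handful of lexical entries typed with the OntoLex class.
QUERY = """
PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>
SELECT ?entry WHERE { ?entry a ontolex:LexicalEntry } LIMIT 5
"""


def build_sparql_request(endpoint: str, query: str) -> urllib.request.Request:
    """Prepare a query-via-GET request that asks for SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(endpoint: str, query: str) -> list[dict]:
    """Send the request and return the result bindings (needs network access)."""
    request = build_sparql_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    return payload["results"]["bindings"]
```

Calling run_query(DBNARY_SPARQL_ENDPOINT, QUERY) should return a list of
bindings whose "entry" values are entry IRIs. Keeping request construction
separate from sending makes the query easy to inspect before hitting the server.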


STATISTICS

The Dashboard lets you see the number of pages, entries, senses, translations,
lexical relations, etc. available globally or in each language edition.


A SHORT HISTORY OF DBNARY

In August 2012, the first version of DBnary was released as an entry in the
Monnet Challenge for Lexical Linked Data. At that time, only a few languages
were extracted (mainly English, French, German, Italian and Portuguese).

From the beginning, the extraction process has been designed as an ongoing
process where each Wiktionary dump is extracted as soon as it is produced. This
way, the dataset evolves with Wiktionary data (and hence also follows the
evolution of languages). Moreover, new languages have been introduced from time
to time, and we now maintain 22 different extractors.

In practice, this means that the dataset is updated twice a month, following
the Wiktionary dump schedule.

Until July 2017, the dataset was modeled using the lemon vocabulary. At that
date, all extractors switched to the OntoLex vocabulary, which builds on lemon
and is now published as a W3C Community Group specification (OntoLex-Lemon).

All extracted versions are still available for download if anybody wants to
study the evolution of the extracted data. From now on, the early versions of
DBnary (modeled using lemon) are only available on Zenodo, as we have difficulty
maintaining the full history on our servers. Later versions (from July 2017
onwards) are still available for download from this server.
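When comparing early and later dumps, you may first need to know which
vocabulary a given file uses. A minimal sketch, assuming the Turtle files
declare the usual namespace IRIs near their top (http://lemon-model.net/lemon#
for the original lemon model, http://www.w3.org/ns/lemon/ontolex# for OntoLex).
The function names and the header-scanning heuristic are hypothetical choices,
not part of the DBnary tooling.

```python
import bz2
from itertools import islice

LEMON_NS = "http://lemon-model.net/lemon#"
ONTOLEX_NS = "http://www.w3.org/ns/lemon/ontolex#"


def read_header(path, max_lines: int = 200) -> str:
    """Return the first 'max_lines' decompressed lines of a .ttl.bz2 file."""
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        return "".join(islice(handle, max_lines))


def detect_vocabulary(header_text: str) -> str:
    """Guess the modelling vocabulary: "ontolex", "lemon" or "unknown".

    OntoLex is checked first, so a file declaring both namespaces is
    classified as using the newer model.
    """
    if ONTOLEX_NS in header_text:
        return "ontolex"
    if LEMON_NS in header_text:
        return "lemon"
    return "unknown"
```

A namespace check is cheap and needs no RDF library, but it is only a heuristic:
a file that declares its prefixes late, or not at all, will come out "unknown".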

As the extraction process has been running for years, the extractors and the
original data can drift out of sync, so that the extracted data no longer
faithfully reflects the Wiktionary information. To cope with this, statistics
are computed on each extracted version and published on a dashboard where the
extraction history of each language can be studied. Usually, when the number of
elements (pages, entries, translations, relations, etc.) decreases, it means
that the Wiktionary community of the corresponding language edition has changed
the way it represents lexical information. When we detect such a decrease, we
try to adapt the extractor and re-synchronize it with the Wiktionary data.

In the beginning, these statistics were maintained in CSV format, outside the
dataset. Now, the whole history of statistics is available in RDF (using the
RDF Data Cube vocabulary). These statistics are available online and may be
queried together with DBnary data through the SPARQL endpoint.
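The rule of thumb described above (a drop in element counts between successive
extractions usually signals that an edition changed how it encodes its data)
can be sketched as a simple check over a series of counts. The input format is
hypothetical: plain (version, count) pairs such as one might collect from the
dashboard or from the statistics endpoint. The actual statistics use the RDF
Data Cube vocabulary, whose DBnary-specific dimensions and measures are not
described on this page, and the tolerance parameter is purely illustrative.

```python
from typing import NamedTuple


class Drop(NamedTuple):
    previous_version: str
    version: str
    previous_count: int
    count: int


def find_drops(series: list[tuple[str, int]], tolerance: float = 0.0) -> list[Drop]:
    """Return consecutive versions whose count fell by more than 'tolerance'.

    'series' must be in chronological order. 'tolerance' is a relative
    fraction: with 0.05, decreases of 5% or less are ignored.
    """
    drops = []
    for (prev_version, prev_count), (version, count) in zip(series, series[1:]):
        # Relative decrease; increases give a negative value and are skipped.
        if prev_count > 0 and (prev_count - count) / prev_count > tolerance:
            drops.append(Drop(prev_version, version, prev_count, count))
    return drops
```

With made-up counts such as [("v1", 1000), ("v2", 1010), ("v3", 700),
("v4", 705)] and tolerance=0.05, only the v2 to v3 transition is reported. A
relative threshold keeps small fluctuations from flooding the report while still
flagging the large drops that typically call for adapting an extractor.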



RECENT POSTS

 * More examples extracted from Wiktionary
 * Gaelic and Catalan editions are now part of DBnary
 * Examples are now extracted from the English version
 * Eager to meet the Exolexica ?
 * DBnary dataset is now made available in HDT

Dbnary Extraction Software by Gilles Sérasset is licensed under the MIT License.
Dbnary Dataset by Gilles Sérasset is licensed under a Creative Commons
Attribution-ShareAlike 3.0 Unported License.
Based on a work at http://www.wiktionary.org.



Copyright © 2024 Dbnary