Public scan of www.researchgate.net (2606:4700::6811:2169)

URL: https://www.researchgate.net/publication/257391740_Evolutionary_algorithm_based_on_different_semantic_similarity_functions_fo...
Submission: On August 26 via manual from AT

Form analysis 3 forms found in the DOM


Text Content

Article


EVOLUTIONARY ALGORITHM BASED ON DIFFERENT SEMANTIC SIMILARITY FUNCTIONS FOR
SYNONYM RECOGNITION IN THE BIOMEDICAL DOMAIN

 * January 2013
 * Knowledge-Based Systems 37:62–69

DOI:10.1016/j.knosys.2012.07.005
 * Project: Semantic Similarity Measurement

Authors:
Jose M. Chaves-González
 * Universidad de Extremadura



Jorge Martinez-Gil
 * Software Competence Center Hagenberg



Request full-text PDF

To read the full-text of this research, you can request a copy directly from the
authors.

Citations (21)
References (32)





ABSTRACT

One of the most challenging problems in the semantic web field is computing the
semantic similarity between different terms. The main difficulty is the lack of
accurate dictionaries for domain-specific and dynamic fields such as
biomedicine or finance. In this article we propose a new approach that uses
several existing semantic similarity methods to obtain precise results in the
biomedical domain. Specifically, we have developed an evolutionary algorithm
that combines the information provided by different semantic similarity
metrics. Our results have been validated against a variety of biomedical
datasets and different collections of similarity functions. The proposed system
provides very high quality results when compared against similarity ratings
provided by human experts (in terms of the Pearson correlation coefficient),
surpassing other relevant works previously published in the literature.
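The core idea of the abstract, evolving a combination of similarity metrics so that the combined score correlates maximally with human judgment, can be sketched as a small Python toy. This is illustrative only, not the paper's actual algorithm; the metric scores and human ratings are placeholders:

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for degenerate (constant) input."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def evolve_weights(metric_scores, human_ratings, pop_size=30, generations=150, seed=0):
    """Evolve one non-negative weight per similarity metric so that the weighted
    sum of metric scores correlates maximally (Pearson) with human ratings.
    metric_scores holds one row of per-metric scores for every term pair."""
    rng = random.Random(seed)
    n_metrics = len(metric_scores[0])

    def fitness(weights):
        combined = [sum(w * s for w, s in zip(weights, row)) for row in metric_scores]
        return pearson(combined, human_ratings)

    population = [[rng.random() for _ in range(n_metrics)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]          # truncation selection
        children = [[max(0.0, w + rng.gauss(0.0, 0.1)) for w in p] for p in survivors]
        population = survivors + children                # elitism + Gaussian mutation
    return max(population, key=fitness)
```

With one metric that tracks the human ratings and two noise metrics, the evolved weights concentrate on the informative metric and the combined score correlates highly with the ratings.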







CITATIONS (21)


REFERENCES (32)




... This aggregation is often computed by means of heuristic and meta-heuristic
functions [7]. Our hypothesis is that these methods are not optimal and can
therefore be improved. The reason is that these methods are not able to deal
with the non-stochastic uncertainty induced by the subjectivity, vagueness and
imprecision of human language (even when naming concepts from the biomedical
field). ...
... In the past, there have been great efforts to find new semantic similarity
measures, mainly because they are of fundamental importance in many
application-oriented fields of computer science [7]. The reason is that these
techniques can be used to go beyond the literal lexical match of many kinds of
text expressions. Past works in this field include the automatic processing of
text and email messages [14], healthcare dialogue systems [5], question
answering [21], and sentence fusion. ...
... For the rest of this work, we consider similarity measures, since most
authors in this field use semantic similarity measures for identifying semantic
correspondences [7]. Currently, the semantic similarity for a pair of
biological expressions is computed using an aggregation function of the
individual semantic similarity values. This approach has proven to achieve very
good results. ...
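The aggregation-function idea described in this excerpt can be illustrated with a minimal sketch; the measure functions passed in are hypothetical placeholders for real similarity metrics:

```python
def aggregate_similarity(term_a, term_b, measures, weights=None):
    """Aggregate the scores of several similarity measures for one term pair.
    measures: functions (term_a, term_b) -> score in [0, 1] (hypothetical).
    With no weights given, this reduces to the arithmetic mean."""
    scores = [m(term_a, term_b) for m in measures]
    if weights is None:
        weights = [1.0 / len(scores)] * len(scores)
    return sum(w * s for w, s in zip(weights, scores))
```

The weights could come from any source, for instance the evolutionary optimization the surveyed paper proposes, or a fixed expert-chosen profile.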

Accurate Semantic Similarity Measurement of Biomedical Nomenclature by Means of
Fuzzy Logic
Article
 * Apr 2016
 * INT J UNCERTAIN FUZZ

 * Jorge Martinez-Gil

Semantic similarity measurement of biomedical nomenclature aims to determine
the likeness between two biomedical expressions that use different
lexicographies to represent the same real biomedical concept. There are many
semantic similarity measures that try to address this issue, and many of them
have represented an incremental improvement over the previous ones. In this
work, we present yet another incremental solution that is able to outperform
existing approaches by using a sophisticated aggregation method based on fuzzy
logic. Results show that our strategy consistently beats existing approaches on
well-known biomedical benchmark data sets.
... The relevant works conduct semantic fusion from different aspects. We
define their classifications as vector-level [4,7], metric-level [1,2,6,42],
and model-level [10,41,43] according to the increasing granularity of semantic
fusion between corpus and ontology. ...
... Alves et al. proposed a regression function where lexical similarity,
syntactic similarity, semantic similarity, and distributional similarity are
input as factors [2]. Chaves-González and Martínez-Gil used an evolutionary
algorithm to optimize the unsupervised combination of various WordNet-based
similarity metrics [6]. Yih and Qazvinian averaged the similarity results
derived from heterogeneous vector space models on Wikipedia, web search,
thesaurus, and WordNet, respectively [42]. ...

Joint semantic similarity assessment with raw corpus and structured ontology for
semantic-oriented service discovery
Article
Full-text available
 * Jun 2016

 * Wei Lu
 * Yuanyuan Cai
 * Xiaoping Che
 * Yuxun Lu

Semantic-oriented service matching is one of the challenges in automatic Web
service discovery. Service users may search for Web services using keywords and
receive the matching services in terms of their functional profiles. A number
of approaches to computing the semantic similarity between words have been
developed to enhance the precision of matchmaking, and they can be classified
into ontology-based and corpus-based approaches. The ontology-based approaches
commonly use the differentiated concept information provided by a large
ontology for measuring lexical similarity with word sense disambiguation.
Nevertheless, most ontologies are domain-specific and limited in lexical
coverage, which restricts their applicability. On the other hand, corpus-based
approaches rely on the distributional statistics of context to represent each
word as a vector and measure the distance between word vectors. However,
polysemy may lead to low computational accuracy. In this paper, in order to
augment the semantic information content of word vectors, we propose a multiple
semantic fusion (MSF) model to generate a sense-specific vector for each word.
In this model, various semantic properties of the general-purpose ontology
WordNet are integrated to fine-tune the distributed word representations
learned from the corpus, in terms of vector combination strategies. The
retrofitted word vectors are modeled as semantic vectors for estimating
semantic similarity. The MSF model-based similarity measure is validated
against other similarity measures on multiple benchmark datasets. Experimental
results of word similarity evaluation indicate that our computational method
obtains a higher correlation coefficient with human judgment in most cases.
Moreover, the proposed similarity measure is demonstrated to improve the
performance of Web service matchmaking based on a single semantic resource.
Accordingly, our findings provide a new method and perspective for
understanding and representing lexical semantics.
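A very simple instance of the vector combination strategies mentioned in the abstract (not the actual MSF model) is blending a corpus-learned vector with the centroid of its ontology neighbours:

```python
def fuse_vector(corpus_vec, neighbour_vecs, alpha=0.5):
    """Blend a corpus-learned word vector with the centroid of the vectors of
    its ontology neighbours (e.g. WordNet synonyms). alpha balances the two
    sources; this is just one simple combination strategy among many."""
    if not neighbour_vecs:
        return list(corpus_vec)
    dim = len(corpus_vec)
    centroid = [sum(v[i] for v in neighbour_vecs) / len(neighbour_vecs)
                for i in range(dim)]
    return [alpha * corpus_vec[i] + (1 - alpha) * centroid[i] for i in range(dim)]
```

Words with no ontology neighbours keep their original corpus vector, so the fusion degrades gracefully for out-of-ontology vocabulary.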
... These measures are: (a) Hirst, (b) Jiang, (c) Resnik, (d) Leacock and (e)
Lin. A detailed description of these measures is out of the scope of this work,
but some explanatory insights are given in Chaves-Gonzalez and Martinez-Gil
(2013). For us, it is enough to know that these single measures are the state
of the art in the field of semantic similarity. ...
... Table 6 shows the results for the aggregation of the different semantic
similarity measures based on cutting-edge similarity measures from the
biomedical domain. Explaining each of them is out of the scope of this work,
but a detailed description can be found in Chaves-Gonzalez and Martinez-Gil
(2013). Once again, the CoTO (Consensus or Trade-Off) strategy is able to beat
all the single measures as well as all the compensative operators by a wide
margin. ...
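As a concrete taste of the individual measures named above, Lin's classic information-content formula can be sketched as follows; the toy probabilities and the tiny is-a hierarchy are invented for illustration:

```python
import math

def lin_similarity(ic_a, ic_b, ic_lcs):
    """Lin's information-content similarity: sim = 2*IC(lcs) / (IC(a) + IC(b)),
    where lcs is the least common subsumer of the two concepts in the taxonomy
    and IC(c) = -log p(c) for corpus probability p(c)."""
    return 2.0 * ic_lcs / (ic_a + ic_b) if ic_a + ic_b else 0.0

# Toy corpus probabilities for a tiny hypothetical is-a hierarchy.
p = {"disease": 0.1, "tumor": 0.02, "lymphoma": 0.005}
ic = {concept: -math.log(prob) for concept, prob in p.items()}

# lymphoma is-a tumor in this toy hierarchy, so lcs(tumor, lymphoma) = tumor.
sim = lin_similarity(ic["tumor"], ic["lymphoma"], ic["tumor"])
```

The score is 1 when the two concepts coincide with their subsumer and decreases as the subsumer becomes more general (lower information content).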

CoTO: A Novel Approach for Fuzzy Aggregation of Semantic Similarity Measures
Article
Full-text available
 * Feb 2016
 * COGN SYST RES

 * Jorge Martinez-Gil

Semantic similarity measurement aims to determine the likeness between two text
expressions that use different lexicographies for representing the same real
object or idea. There are a lot of semantic similarity measures for addressing
this problem. However, the best results have been achieved when aggregating a
number of simple similarity measures. This means that after the various
similarity values have been calculated, the overall similarity for a pair of
text expressions is computed using an aggregation function of these individual
semantic similarity values. This aggregation is often computed by means of
statistical functions. In this work, we present CoTO (Consensus or Trade-Off) a
solution based on fuzzy logic that is able to outperform these traditional
approaches.
... In this way, when analyzing the curricula of job candidates, this kind of
technique can operate at the conceptual level, so that comparing specific terms
(e.g., Finance) also yields matches on related terms (e.g., Economics, Economic
Affairs, Financial Affairs, etc.). As another example, in the healthcare field,
an expert on the treatment of cancer could also be considered an expert on
oncology, lymphoma or tumor treatment, etc. [9]. The potential of these
techniques is that they can help Human Resource Management cut through massive
volumes of potential candidate information more quickly and easily, without
giving up the way human experts make decisions in the real world. ...

A Smart Approach for Matching, Learning and Querying Information from the Human
Resources Domain
Conference Paper
Full-text available
 * Aug 2016

 * Jorge Martinez-Gil
 * Alejandra Lorena Paoletti
 * Klaus-dieter Schewe

We face the complex problem of timely, accurate and mutually satisfactory
mediation between job offers and suitable applicant profiles by means of
semantic processing techniques. In fact, this problem has become a major
challenge for all public and private recruitment agencies around the world as
well as for employers and job seekers. It is widely agreed that smart algorithms
for automatically matching, learning, and querying job offers and candidate
profiles will provide a key technology of high importance and impact and will
help to counter the lack of skilled labor and/or appropriate job positions for
unemployed people. Additionally, such a framework can support global matching
aiming at finding an optimal allocation of job seekers to available jobs, which
is relevant for independent employment agencies, e.g. in order to reduce
unemployment.
... The SR is also exploited in the biomedical domain. For example, the authors
of [62] have combined a set of semantic similarity methods in order to
recognize synonymy. The SR has also been used to provide optimal centroids for
datasets [63] (i.e., those that minimize the distance to all elements belonging
to the dataset). ...

Computing semantic relatedness using Wikipedia features
Article
Full-text available
 * Sep 2013
 * KNOWL-BASED SYST

 * Mohamed Ali Hadj Taieb
 * Mohamed Ben Aouicha
 * Abdelmajid Ben Hamadou

Measuring semantic relatedness is a critical task in many domains such as
psychology, biology, linguistics, cognitive science and artificial
intelligence. In this paper, we propose a novel system for computing semantic
relatedness between words. Recent approaches have exploited Wikipedia as a huge
semantic resource and shown good performance. We therefore utilize Wikipedia
features (articles, categories, the Wikipedia category graph and redirection)
in a system that combines this semantic information in its different
components. The approach is preceded by a pre-processing step that provides,
for each category in the Wikipedia category graph, a semantic description
vector including the weights of stems extracted from articles assigned to the
target category. Next, for each candidate word, we collect its set of
categories using an algorithm for category extraction from the Wikipedia
category graph. Then, we compute the semantic relatedness degree using existing
vector similarity metrics (Dice, Overlap and Cosine) and a newly proposed
metric that performs as well as the cosine formula. The basic system is
followed by a set of modules that exploit further Wikipedia features to
quantify the semantic relatedness between words as accurately as possible. We
evaluate our measure on two tasks: comparison with human judgments using five
datasets, and the specific application of solving a choice problem. Our system
shows good performance and sometimes outperforms the ESA (Explicit Semantic
Analysis) and TSA (Temporal Semantic Analysis) approaches.
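The three vector similarity metrics named in the abstract (Dice, Overlap and Cosine) have standard definitions that can be sketched over sparse vectors. Set-based Dice and Overlap variants are shown here, one common convention among several:

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors given as {dimension: weight} dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def dice(u, v):
    """Dice coefficient over the sets of non-zero dimensions."""
    a, b = set(u), set(v)
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0

def overlap(u, v):
    """Overlap coefficient: intersection size over the smaller set."""
    a, b = set(u), set(v)
    return len(a & b) / min(len(a), len(b)) if a and b else 0.0
```

In the surveyed system such vectors would be the semantic description vectors of Wikipedia categories; here any weighted dicts work.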
... Semantic similarity measurement techniques have gained great importance
with the advent of the Semantic Web (Chaves-González and Martínez-Gil, 2013).
The term semantic similarity denotes the computation of the amount of
similarity between concepts, which need not be lexical similarity but can be
conceptual similarity. ...

Semantic Similarity Measurement Methods: The State-of-the-art
Article
 * Nov 2014

 * Fatmah Nazar Mahmood
 * Amirah Ismail

Given the increasing importance of estimating the semantic similarity between
concepts, this study highlights some of the methods used in this area.
Similarity measurement between concepts has become a significant component in
most intelligent knowledge management applications, especially in the fields of
Information Extraction (IE) and Information Retrieval (IR). Measuring
similarity among concepts is considered a quantitative measure of information;
the computation of similarity relies on the relations and properties linking
the concepts in an ontology. In this study we briefly review the main
categories of semantic similarity.
... For example, consumers may use "Small4L" to replace the "Audi A4L." The
semantic behaviour of synonyms follows these key points: substitution between
synonyms always occurs in a similar context, which means that the appearance of
different synonym items in a group of synonyms often implies similarity of
context. Therefore, this can be regarded as a sign of the same "topic context"
[11]. A mathematical structure, such as a vector, is often used in text mining
to represent a word. ...

Predicting sales by online searching data keywords based on text mining:
Evidence from the Chinese automobile market
Article
Full-text available
 * Oct 2019
 * J Phys Conf

 * Yi Li
 * Liangru Yu
 * Rui Wen

... This means that regression analysis tries to predict the final semantic
similarity score based on the known values of other semantic similarity
measures. In this context, there are different types of regression analysis:
linear regression (Chaves-González & Martinez-Gil, 2013), N-gram regression
(Malandrakis et al., 2012), or support vector regression (Croce et al., 2012).
In general, the results obtained through regression analysis are usually quite
easy for a human to interpret, and therefore they are commonly used as
interpretable models. ...
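Linear regression, the simplest of the regression types listed in this excerpt, can be sketched for a single predictor: one base similarity measure predicting the human score. The fitted data below are invented for illustration:

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y ≈ a*x + b, where x is the score of a base
    similarity measure and y the human similarity rating (single predictor)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx                     # slope
    return a, mean_y - a * mean_x     # slope, intercept

def predict(x, a, b):
    """Predicted human-style similarity for a new measure score."""
    return a * x + b
```

Because the fitted model is just a slope and an intercept, its behaviour is trivially interpretable, which is exactly the property the excerpt attributes to regression-based combiners.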

A novel method based on symbolic regression for interpretable semantic
similarity measurement
Article
 * Jun 2020
 * EXPERT SYST APPL

 * Jorge Martinez-Gil
 * José Manuel Chaves-González

The problem of automatically measuring the degree of semantic similarity
between textual expressions is the challenge of calculating the degree of
likeness between two text fragments that have few or no features in common,
according to human judgment. In recent times, several machine learning methods
have established a new state of the art regarding accuracy, but little or no
attention has been paid to their interpretability, i.e. the extent to which an
end user can understand the cause of the output of these approaches. Although
solutions based on symbolic regression already exist in the field of clustering
(Lensen et al., 2019), we propose here a new approach that is able to reach
high levels of interpretability without sacrificing accuracy in the context of
semantic textual similarity. After a complete empirical evaluation using
several benchmark datasets, it is shown that our approach yields promising
results in a wide range of scenarios.
... For instance, in the fields of Natural Language Processing (NLP) and IR,
ontology-based semantic similarity measures have been used in Word Sense
Disambiguation (WSD) methods [92], text similarity measures [86], spelling
error detection [20], sentence similarity models [44,66,91], paraphrase
detection [36], unified sense disambiguation methods for different types of
structured sources of knowledge [73], document clustering [31], ontology
alignment [30], document [74] and query anonymization [11], clustering of
nominal information [9,10], chemical entity identification [40],
interoperability among agent-based systems [34], and ontology-based Information
Retrieval (IR) models [55,62] to solve the lack of an intrinsic semantic
distance in vector ontology-based IR models [23]. In the field of
bioengineering, ontology-based similarity measures have been proposed for
synonym recognition [24] and biomedical text mining [14,98,112]. However, since
the pioneering work of Lord et al. [72], the proposal of similarity measures
for genomics and proteomics based on the Gene Ontology (GO) [5] has attracted a
lot of attention, as detailed in a recent survey on the topic [76]. ...

HESML: A scalable ontology-based semantic similarity measures library with a set
of reproducible experiments and a replication dataset
Article
Full-text available
 * Jun 2017
 * INFORM SYST

 * Juan José Lastra-Díaz
 * Ana M Garcia-Serrano
 * Montserrat Batet
 * Fernando Chirigati

This work is a detailed companion reproducibility paper of the methods and
experiments proposed by Lastra-Díaz and García-Serrano in [56, 57, 58], which
introduces the following contributions: (1) a new and efficient representation
model for taxonomies, called PosetHERep, which is an adaptation of the half-edge
data structure commonly used to represent discrete manifolds and planar graphs;
(2) a new Java software library called the Half-Edge Semantic Measures Library
(HESML) based on PosetHERep, which implements most ontology-based semantic
similarity measures and Information Content (IC) models reported in the
literature; (3) a set of reproducible experiments on word similarity based on
HESML and ReproZip with the aim of exactly reproducing the experimental surveys
in the three aforementioned works; (4) a replication framework and dataset,
called WNSimRep v1, whose aim is to assist the exact replication of most methods
reported in the literature; and finally, (5) a set of scalability and
performance benchmarks for semantic measures libraries. PosetHERep and HESML are
motivated by several drawbacks in the current semantic measures libraries,
especially the performance and scalability, as well as the evaluation of new
methods and the replication of most previous methods. The reproducible
experiments introduced herein are encouraged by the lack of a set of large,
self-contained and easily reproducible experiments with the aim of replicating
and confirming previously reported results. Likewise, the WNSimRep v1 dataset is
motivated by the discovery of several contradictory results and difficulties in
reproducing previously reported methods and experiments. PosetHERep proposes a
memory-efficient representation for taxonomies which linearly scales with the
size of the taxonomy and provides an efficient implementation of most
taxonomy-based algorithms used by the semantic measures and IC models, whilst
HESML provides an open framework to aid research into the area by providing a
simpler and more efficient software architecture than the current software
libraries. Finally, we show that HESML outperforms the state-of-the-art
libraries, and that PosetHERep makes it possible to significantly improve their
performance and scalability without caching.
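The linear-scaling ancestor traversal that the abstract attributes to PosetHERep can be illustrated with a toy parent-edge taxonomy (this is a hedged sketch, not the actual PosetHERep/HESML half-edge API; class and method names are invented for illustration):

```python
from collections import defaultdict

class Taxonomy:
    """Toy IS-A taxonomy: stores parent edges and answers ancestor
    queries by upward breadth-first traversal. Each edge is visited at
    most once, so time and memory scale linearly with taxonomy size."""

    def __init__(self):
        self.parents = defaultdict(set)

    def add_edge(self, child, parent):
        self.parents[child].add(parent)

    def ancestors(self, node):
        seen, frontier = set(), [node]
        while frontier:
            nxt = []
            for n in frontier:
                for p in self.parents[n]:
                    if p not in seen:
                        seen.add(p)
                        nxt.append(p)
            frontier = nxt
        return seen

t = Taxonomy()
for c, p in [("dog", "canine"), ("canine", "carnivore"),
             ("carnivore", "animal"), ("cat", "feline"),
             ("feline", "carnivore")]:
    t.add_edge(c, p)
print(sorted(t.ancestors("dog")))  # ['animal', 'canine', 'carnivore']
```

Ancestor sets like these are the basic ingredient of most taxonomy-based similarity measures and IC models mentioned in the abstract.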
... In this way, when analyzing the curricula of job candidates, these
techniques can operate at the conceptual level, so that comparing specific
terms (e.g., Finance) also yields matches on related terms (e.g., Economics,
Economic Affairs, Financial Affairs, etc.). As another example, in the
healthcare field, an expert on the treatment of cancer could also be considered
an expert on oncology, lymphoma or tumor treatment, etc. [9]. The potential of
these techniques is that they can support Human Resource Management by cutting
through massive volumes of candidate information more quickly and easily,
without giving up the way human experts make decisions in the real world. ...

A Smart Approach for Matching, Learning and Querying Information from the Human
Resources Domain
Book
Full-text available
 * Jan 2016

 * Jorge Martinez-Gil
 * Alejandra Lorena Paoletti
 * Klaus-dieter Schewe


A Smart Approach for Matching, Learning and Querying Information from the Human
Resources Domain
Preprint
Full-text available
 * Sep 2017

 * Jorge Martinez-Gil
 * Alejandra Lorena Paoletti
 * Klaus-dieter Schewe

We face the complex problem of timely, accurate and mutually satisfactory
mediation between job offers and suitable applicant profiles by means of
semantic processing techniques. In fact, this problem has become a major
challenge for all public and private recruitment agencies around the world as
well as for employers and job seekers. It is widely agreed that smart algorithms
for automatically matching, learning, and querying job offers and candidate
profiles will provide a key technology of high importance and impact and will
help to counter the lack of skilled labor and/or appropriate job positions for
unemployed people. Additionally, such a framework can support global matching
aiming at finding an optimal allocation of job seekers to available jobs, which
is relevant for independent employment agencies, e.g. in order to reduce
unemployment.
... A detailed description of these measures is out of the scope of this work,
but some explanatory insights are described in [8]. For our purposes, it is
enough to know that these individual measures represent the state of the art in
the field of semantic similarity measurement [9]. Table 2 shows the results for
the aggregation of the different semantic similarity measures based on
dictionary measures. ...

CoTO: A Novel Approach for Fuzzy Aggregation of Semantic Similarity Measures
Preprint
Full-text available
 * Sep 2017

 * Jorge Martinez-Gil

Semantic similarity measurement aims to determine the likeness between two text
expressions that use different lexicographies for representing the same real
object or idea. There are a lot of semantic similarity measures for addressing
this problem. However, the best results have been achieved when aggregating a
number of simple similarity measures. This means that after the various
similarity values have been calculated, the overall similarity for a pair of
text expressions is computed using an aggregation function of these individual
semantic similarity values. This aggregation is often computed by means of
statistical functions. In this work, we present CoTO (Consensus or Trade-Off) a
solution based on fuzzy logic that is able to outperform these traditional
approaches.
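The consensus-versus-trade-off idea the abstract describes can be sketched as follows (a minimal illustration, not the actual CoTO fuzzy rule base; the spread threshold and the fallback to the median are hypothetical choices):

```python
import statistics

def aggregate(scores, spread_threshold=0.15):
    """Hedged sketch of consensus-vs-trade-off aggregation: if the
    individual similarity measures roughly agree, average them
    (consensus); if they diverge, fall back to the median, which blurs
    the effect of outlier measures (trade-off)."""
    spread = max(scores) - min(scores)
    if spread <= spread_threshold:       # measures agree -> consensus
        return statistics.mean(scores)
    return statistics.median(scores)     # measures diverge -> trade-off

print(aggregate([0.80, 0.82, 0.78]))  # agreement: arithmetic mean
print(aggregate([0.80, 0.20, 0.78]))  # one outlier: median dominates
```

The point of the sketch is the branching itself: a statistical aggregator such as the plain mean would let the outlier 0.20 drag the result down, while the trade-off branch keeps it near the majority opinion.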
... In this way, it is possible to reduce the risk of relying on a single ssm
operating within production environments. Moreover, this approach has proven to
achieve good results in the past (Chaves-González & Martinez-Gil, 2013). The
rationale behind this way of working is very intuitive: if some specific ssm
are not able to perform reasonably well for a particular comparison of terms or
textual expressions, their effects can be blurred by other ssm that achieve
better performance. ...

Automatic Design of Semantic Similarity Controllers based on Fuzzy Logics
Article
 * Apr 2019
 * EXPERT SYST APPL

 * Jorge Martinez-Gil
 * José Manuel Chaves-González

Recent advances in machine learning have been able to make improvements over the
state-of-the-art regarding semantic similarity measurement techniques. In fact,
we have all seen how classical techniques have given way to promising neural
techniques. Nonetheless, these new techniques have a weak point: they are hardly
interpretable. For this reason, we have oriented our research towards the design
of strategies being able to be accurate enough but without sacrificing their
interpretability. As a result, we have obtained a strategy for the automatic
design of semantic similarity controllers based on fuzzy logics, which are
automatically identified using genetic algorithms (GAs). After an exhaustive
evaluation using a number of well-known benchmark datasets, we can conclude that
our strategy fulfills both expectations: it is able to achieve reasonably good
results, and at the same time, it offers a high degree of interpretability.
... Current approaches to semantic similarity measurement usually follow an
approach based on the aggregation of similarity scores retrieved from a number
of different ssm, i.e., aggregation methods try to accurately combine different
viewpoints to come to a final decision (Grabisch et al. 2011). As a
consequence, some authors have proposed similarity aggregation techniques that
have achieved good results in the past (Chaves-González and Martinez-Gil 2013).
The rationale behind this approach is very intuitive: if some ssm are not able
to perform reasonably well for a particular comparison, their effects on the
overall performance can be blurred by other ssm that are able to provide better
results (Martinez-Gil 2016b). ...

Semantic similarity aggregators for very short textual expressions: a case study
on landmarks and point of interest
Article
Full-text available
 * Oct 2019
 * J Intell Inform Syst

 * Jorge Martinez-Gil

Semantic similarity measurement aims to automatically compute the degree of
similarity between two textual expressions that use different representations
for naming the same concepts. However, very short textual expressions cannot
always follow the syntax of a written language and, in general, do not provide
enough information to support proper analysis. This means that in some fields,
such as the processing of landmarks and points of interest, results are not
entirely satisfactory. In order to overcome this situation, we explore the idea
of aggregating existing methods by means of two novel aggregation operators
aiming to model an appropriate interaction between the similarity measures. As a
result, we have been able to improve the results of existing techniques when
solving the GeReSiD and the SDTS, two of the most popular benchmark datasets for
dealing with geographical information.
Near-synonym substitution using a discriminative vector space model
Article
 * May 2016
 * KNOWL-BASED SYST

 * Liang-Chih Yu
 * Lung-Hao Lee
 * Jui-Feng Yeh
 * Yu-Ling Lai

Near-synonyms are fundamental and useful knowledge resources for
computer-assisted language learning (CALL) applications. For example, in online
language learning systems, learners may have a need to express a similar meaning
using different words. However, it is usually difficult to choose suitable
near-synonyms to fit a given context because the differences of near-synonyms
are not easily grasped in practical use, especially for second language (L2)
learners. Accordingly, it is worth developing algorithms to verify whether
near-synonyms match given contexts. Such algorithms could be used in
applications to assist L2 learners in discovering the collocational differences
between near-synonyms. We propose a discriminative vector space model for the
near-synonym substitution task, and consider this task as a classification task.
There are two components: a vector space model and discriminative training. The
vector space model is used as a baseline classifier to classify test examples
into one of the near-synonyms in a given near-synonym set. A discriminative
training technique is then employed to improve the vector space model by
distinguishing positive and negative features for each near-synonym.
Experimental results show that the DT-VSM achieves higher accuracy than both
pointwise mutual information and n-gram-based methods that have been used in
previous studies.
Data Reduction for Continuum of Care: An Exploratory Study Using the
Predicate-Argument Structure to Pre-process Radiology Sentences for Measurement
of Semantic Similarity
Conference Paper
 * Jul 2013

 * Eric T. Newsom
 * Josette Jones

In the clinical setting, continuum of care depends on integrated information
services to assure a smooth progression for patient centered care, and these
integrated information services must understand past events and personal
circumstances to make care relevant. Clinicians face a problem that the amount
of information produced in disparate electronic clinical notes is increasing to
levels incapable of being processed by humans. Clinicians need a function in
information services that can reduce the free text data to a message useful at
time of care. Information extraction (IE) is a sub-field of natural language
processing with the goal of data reduction of unstructured free text. Pertinent
to IE is an annotated corpus that frames how IE methods should create a logical
expression necessary for processing meaning of text. This study explores and
reports on the requirements to using the predicate-argument statement (PAS) as
the framework. A convenient sample from a prior study with ten synsets of 100
unique sentences from radiology reports deemed by domain experts to mean the
same thing will be the text from which PAS structures are formed. Through
content analysis and pattern recognition, the findings show that PAS is a
feasible framework for structuring sentences for semantic similarity
measurement.
DMS2015short-1: Semantic similarity assessment using differential evolution
algorithm in continuous vector space
Article
 * Nov 2015

 * Wei Lu
 * Yuanyuan Cai
 * Xiaoping Che
 * Kailun Shi

The assessment of semantic similarity between terms is one of the challenging
tasks in knowledge-based applications, such as multimedia retrieval, automatic
service discovery and emotion mining. By means of similarity estimation, the
comprehension of textual resources can become more feasible and accurate. Some
studies have proposed the integration of various assessment methods for taking
advantage of different semantic resources, but most of them simply employ
average operation or regression training. In this paper, we address this problem
by combining the corpus-based similarity methods with the WordNet-based methods
based on a differential evolution (DE) algorithm. Specifically, this DE-based
approach conducts similarity assessment in a continuous vector space. It is
validated against a variety of similarity approaches on multiple benchmark
datasets. Empirical results demonstrate that our approach outperforms existing
works and conforms more closely to human judgments of similarity. The results
also demonstrate the expressiveness of continuous vectors learned by neural
networks for latent lexical semantics.
Differential evolution with Pareto tournament for the multi-objective next
release problem
Article
 * Feb 2015

 * Jose M. Chaves-González
 * Miguel A. Pérez-Toledano

Software requirements selection is the engineering process in which the set of
new requirements which will be included in the next release of a software
product are chosen. This NP-hard problem is an important issue involving several
contradictory objectives that have to be tackled by software companies when
developing new releases of software packages. Software projects have to stick to
a budget, but they also have to cover the highest number of customer
requirements. Furthermore, in real instances of the problem, the requirements
tackled suffer interactions and other restrictions which complicate the problem.
In this paper, we use an adapted multi-objective version of the differential
evolution (DE) evolutionary algorithm which has been successfully applied to
several real instances of the problem. To do this, the software requirements
selection problem has been formulated as a multiobjective optimization problem
with two objectives, the total software development cost and the overall
customer satisfaction, and with three interaction constraints. On the other
hand, the original DE algorithm has been adapted to solve real instances of the
problem generated from data provided by experts. Numerical experiments with case
studies on software requirements selection have been carried out to demonstrate
the effectiveness of the multiobjective proposal and the obtained results show
that the developed algorithm performs better than other relevant algorithms
previously published in the literature under a set of public datasets.
Analysis of word co-occurrence in human literature for supporting semantic
correspondence discovery
Conference Paper
 * Sep 2014

 * Jorge Martinez-Gil
 * Mario Pichler

Semantic similarity measurement aims to determine the likeness between two text
expressions that use different lexicographies for representing the same real
object or idea. In this work, we describe the way to exploit broad cultural
trends for identifying semantic similarity. This is possible through the
quantitative analysis of a vast digital book collection representing the
digested history of humanity. Our research work has revealed that appropriately
analyzing the co-occurrence of words in some periods of human literature can
help us to determine the semantic similarity between these words by means of
computers with a high degree of accuracy.
Comparison of the effectiveness of different accessibility plugins based on
important accessibility criteria
Conference Paper
Full-text available
 * Jul 2013

 * Alireza Darvishy
 * Hans-Peter Hutter

This paper compares two new freely available software plugins for MS PowerPoint
and Word documents that we have developed at the ZHAW with similar tools with
respect to important accessibility criteria. Our plugins [1, 2, 3] allow the
analysis of accessibility issues and consequently the generation of fully
accessible PDF documents. The document authors using these plugins require no
specific accessibility knowledge. The plugins are based on a flexible software
architecture concept [1] that allows the automatic generation of fully
accessible PDF documents originating from various authoring tools, such as Adobe
InDesign [5], Word or PowerPoint [6, 7]. Other available plugins, on the other
hand, need accessibility knowledge in order to use them properly and
effectively.
Thinking on the Web: Berners-Lee, Gödel and Turing.
Article
 * Jan 2007
 * COMPUT J

 * Jorge Martinez-Gil

Differential Evolution: A Simple and Efficient Adaptive Scheme for Global
Optimization Over Continuous Spaces
Article
Full-text available
 * Jan 1995
 * J GLOBAL OPTIM

 * Rainer Martin Storn
 * Kenneth V. Price

A new heuristic approach for minimizing possibly nonlinear and
non-differentiable continuous space functions is presented. By means of an
extensive testbed, which includes the De Jong functions, it will be
demonstrated that the new method converges faster and with more certainty than
Adaptive Simulated Annealing as well as the Annealed Nelder & Mead approach,
both of which have a reputation for being very powerful. The new method
requires few control variables, is robust, easy to use and lends itself very
well to parallel computation.
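The algorithm this abstract introduces can be sketched as a minimal DE/rand/1/bin loop (a generic textbook sketch, not Storn and Price's original code; the parameter names F, CR and the population size are the conventional ones, and the sphere function is just a test objective):

```python
import random

def differential_evolution(f, bounds, pop_size=20, F=0.8, CR=0.9,
                           gens=200, seed=1):
    """Minimal DE/rand/1/bin sketch: mutate with the scaled difference
    of two random members added to a third, binomial crossover, then
    greedy one-to-one selection."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(pop_size)]
    for _ in range(gens):
        for i in range(pop_size):
            # three distinct members, all different from the target i
            a, b, c = rng.sample([x for j, x in enumerate(pop) if j != i], 3)
            jrand = rng.randrange(dim)  # forces at least one mutant gene
            trial = [a[k] + F * (b[k] - c[k])
                     if (rng.random() < CR or k == jrand) else pop[i][k]
                     for k in range(dim)]
            if f(trial) <= f(pop[i]):   # greedy selection
                pop[i] = trial
    return min(pop, key=f)

sphere = lambda x: sum(v * v for v in x)
best = differential_evolution(sphere, [(-5, 5)] * 3)
print(sphere(best))  # should be very close to 0
```

The "few control variables" claimed in the abstract are visible here: only F, CR and the population size steer the search.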
Verbs Semantics and Lexical Selection
Conference Paper
Full-text available
 * Jan 1994

 * Zhibiao Wu
 * Martha Palmer

This paper will focus on the semantic representation of verbs in computer
systems and its impact on lexical selection problems in machine translation
(MT). Two groups of English and Chinese verbs are examined to show that lexical
selection must be based on interpretation of the sentences as well as selection
restrictions placed on the verb arguments. A novel representation scheme is
suggested, and is compared to representations with selection restrictions used
in transfer-based MT. We see our approach as closely aligned with
knowledge-based MT approaches (KBMT), and as a separate component that could be
incorporated into existing systems. Examples and experimental results will show
that, using this scheme, inexact matches can achieve correct lexical selection.
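The conceptual similarity measure introduced in this paper is what is now known as the Wu-Palmer formula, sim = 2*N3 / (N1 + N2 + 2*N3), where N1 and N2 are the edge counts from each concept up to their least common subsumer (LCS) and N3 is the depth of the LCS. A minimal sketch, under the simplifying assumption that concept depths and the LCS depth are already known:

```python
def wu_palmer(depth1, depth2, depth_lcs):
    """Wu-Palmer conceptual similarity from node depths.
    Assumes the shortest paths to the root pass through the LCS, so
    N1 = depth1 - depth_lcs and N2 = depth2 - depth_lcs."""
    n1 = depth1 - depth_lcs
    n2 = depth2 - depth_lcs
    return 2 * depth_lcs / (n1 + n2 + 2 * depth_lcs)

# e.g. concepts at depths 4 and 5 with an LCS at depth 3:
print(round(wu_palmer(4, 5, 3), 3))  # 0.667
```

The measure is 1.0 for identical concepts and decreases as the LCS moves toward the root, which matches the intuition of "inexact matches" discussed in the abstract.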
A comparative analysis of crossover variants in differential evolution
Article
Full-text available
 * May 2006

 * Daniela Zaharie

This paper presents a comparative analysis of binomial and exponential crossover
in differential evolution. Some theoretical results concerning the probabilities
of mutating an arbitrary component and that of mutating a given number of
components are obtained for both crossover variants. The differences between
binomial and exponential crossover are identified and the impact of these
results on the choice of control parameters and on the adaptive variants is
analyzed.
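The two crossover variants compared in this paper can be sketched as follows (a generic illustration following common DE descriptions, not Zaharie's experimental code; in the exponential variant the mutated components form one contiguous circular block):

```python
import random

def binomial_crossover(target, mutant, CR, rng):
    """Each component is taken from the mutant independently with
    probability CR; one random position jrand always comes from the
    mutant, so the trial never equals the target exactly."""
    d = len(target)
    jrand = rng.randrange(d)
    return [mutant[j] if (rng.random() < CR or j == jrand) else target[j]
            for j in range(d)]

def exponential_crossover(target, mutant, CR, rng):
    """A contiguous circular block of components, started at a random
    position and extended while rng.random() < CR, is copied from the
    mutant; block length is geometrically distributed."""
    d = len(target)
    trial = list(target)
    j = rng.randrange(d)
    length = 0
    while True:
        trial[j] = mutant[j]
        j = (j + 1) % d
        length += 1
        if length >= d or rng.random() >= CR:
            break
    return trial
```

This difference in shape is exactly why, as the abstract notes, the probability of mutating a given number of components differs between the two variants even at the same CR.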
Measuring Semantic Similarity Between Biomedical Concepts Within Multiple
Ontologies
Article
Full-text available
 * Aug 2009
 * IEEE T SYST MAN CY C

 * Hisham Al-Mubaid
 * Hoa A. Nguyen

Most of the intelligent knowledge-based applications contain components for
measuring semantic similarity between terms. Many of the existing semantic
similarity measures that use ontology structure as their primary source cannot
measure semantic similarity between terms and concepts using multiple
ontologies. This research explores a new way to measure semantic similarity
between biomedical concepts using multiple ontologies. We propose a new
ontology-structure-based technique for measuring semantic similarity in single
ontology and across multiple ontologies in the biomedical domain within the
framework of unified medical language system (UMLS). The proposed measure is
based on three features: 1) cross-modified path length between two concepts; 2)
a new feature of common specificity of concepts in the ontology; and 3) local
granularity of ontology clusters. The proposed technique was evaluated relative
to human similarity scores and compared with other existing measures using two
terminologies within the UMLS framework: Medical Subject Headings (MeSH) and
the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT). The
experimental results validate the efficiency of the proposed technique in
single and multiple ontologies, and demonstrate that our proposed measure
achieves the best correlation with human scores in all experiments.
Ontology-based semantic similarity: A new feature-based approach
Article
Full-text available
 * Mar 2012
 * EXPERT SYST APPL

 * David Sánchez
 * Montserrat Batet
 * David Isern
 * Aida Valls

Ontology-based information content computation
Article
Full-text available
 * Mar 2011
 * KNOWL-BASED SYST

 * David Sánchez
 * Montserrat Batet
 * David Isern

The information content (IC) of a concept provides an estimation of its degree
of generality/concreteness, a dimension which enables a better understanding of
a concept's semantics. As a result, IC has been successfully applied to the
automatic assessment of the semantic similarity between concepts. In the past,
IC has been estimated as the probability of appearance of concepts in corpora.
However, the applicability and scalability of this method are hampered due to
corpora dependency and data sparseness. More recently, some authors proposed
IC-based measures using taxonomical features extracted from an ontology for a
particular concept, obtaining promising results. In this paper, we analyse these
ontology-based approaches for IC computation and propose several improvements
aimed to better capture the semantic evidence modelled in the ontology for the
particular concept. Our approach has been evaluated and compared with related
works (both corpora and ontology-based ones) when applied to the task of
semantic similarity estimation. Results obtained for a widely used benchmark
show that our method enables similarity estimations which are better correlated
with human judgements than related works.
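One intrinsic, ontology-based IC in the style this abstract describes can be sketched as follows. The formula is quoted from memory and should be checked against the paper; `num_leaves`, `num_subsumers` and `max_leaves` are taxonomy statistics for the concept, and the example numbers are hypothetical:

```python
import math

def intrinsic_ic(num_leaves, num_subsumers, max_leaves):
    """Intrinsic IC in the style of Sanchez et al. (2011):
    IC(c) = -log( (leaves(c)/subsumers(c) + 1) / (max_leaves + 1) ).
    Concepts with few leaves under them and many subsumers above them
    are specific and get high IC; general concepts get low IC."""
    return -math.log((num_leaves / num_subsumers + 1) / (max_leaves + 1))

print(intrinsic_ic(1, 10, 1000))    # specific leaf concept: high IC
print(intrinsic_ic(800, 1, 1000))   # very general concept: low IC
```

Unlike the corpus-based estimate of IC as -log p(c), this only needs counts read off the ontology, which is the corpus-independence argument made in the abstract.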
Measuring semantic similarity between words using Web search engines
Conference Paper
Full-text available
 * Jan 2007

 * Danushka Bollegala
 * Yutaka Matsuo
 * Mitsuru Ishizuka

Semantic similarity measures play important roles in information retrieval and
Natural Language Processing. Previous work in semantic web-related applications
such as community mining, relation extraction, and automatic metadata
extraction has used various semantic similarity measures. Despite the
usefulness of semantic similarity measures in these applications, robustly
measuring semantic similarity between two words (or entities) remains a
challenging task. We propose a robust semantic similarity measure that uses the
information available on the Web to measure similarity between words or
entities. The proposed method exploits page counts and text snippets returned
by a Web search engine. We define various similarity scores for two given words
P and Q, using the page counts for the queries P, Q and P AND Q. Moreover, we
propose a novel approach to compute semantic similarity using automatically
extracted lexico-syntactic patterns from text snippets. These different
similarity scores are integrated using support vector machines to leverage a
robust semantic similarity measure. Experimental results on the Miller-Charles
benchmark dataset show that the proposed measure outperforms all the existing
web-based semantic similarity measures by a wide margin, achieving a
correlation coefficient of 0.834. Moreover, the proposed semantic similarity
measure significantly improves the accuracy (F-measure of 0.78) in a community
mining task and in an entity disambiguation task, thereby verifying the
capability of the proposed measure to capture semantic similarity using web
content.
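The page-count-based scores the abstract mentions can be sketched in simplified form (WebJaccard and WebPMI without the low-count cutoff used in the paper; the constant N, standing for the assumed number of indexed pages, and the example counts are hypothetical):

```python
import math

N = 1e10  # assumed total number of indexed Web pages (hypothetical)

def web_jaccard(hp, hq, hpq):
    """Jaccard-style score from page counts H(P), H(Q), H(P AND Q)."""
    return 0.0 if hpq == 0 else hpq / (hp + hq - hpq)

def web_pmi(hp, hq, hpq):
    """Pointwise mutual information from page counts, treating
    count/N as a probability estimate."""
    return 0.0 if hpq == 0 else math.log2((hpq / N) / ((hp / N) * (hq / N)))

# Hypothetical counts for two related words and their conjunctive query:
print(web_jaccard(3e8, 1e8, 6e7))  # ~0.176
print(web_pmi(3e8, 1e8, 6e7))      # ~4.32
```

In the paper several such scores, plus snippet-pattern features, are the inputs that the support vector machine combines into the final measure.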
A comparative study of differential evolution variants for global optimization
Conference Paper
Full-text available
 * Jan 2006

 * Efrén Mezura-Montes
 * Jesús Velázquez-Reyes
 * Carlos A. Coello Coello

In this paper, we present an empirical comparison of some Differential
Evolution variants to solve global optimization problems. The aim is to
identify which one of them is more suitable to solve an optimization problem,
depending on the problem's features, and also to identify the variant with the
best performance, regardless of the features of the problem to be solved. Eight
variants were implemented and tested on 13 benchmark problems taken from the
specialized literature. These variants vary in the type of recombination
operator used and also in the way in which the mutation is computed. A set of
statistical tests were performed in order to obtain more confidence on the
validity of the results and to reinforce our discussion. The main aim is that
this study can help both researchers and practitioners interested in using
differential evolution as a global optimizer, since we expect that our
conclusions can provide some insights regarding the advantages or limitations
of each of the variants studied.
Using a Natural Language Understanding System to Generate Semantic Web Content
Article
Full-text available
 * Oct 2007
 * INT J SEMANT WEB INF

 * Akshay Java
 * Sergei Nirenburg
 * M. McShane
 * Anupam Joshi

We describe our research on automatically generating rich semantic annotations
of text and making it available on the Semantic Web. In particular, we discuss
the challenges involved in adapting the OntoSem natural language processing
system for this purpose. OntoSem, an implementation of the theory of ontological
semantics under continuous development for over fifteen years, uses a specially
constructed NLP-oriented ontology and an ontological-semantic lexicon to
translate English text into a custom ontology-motivated knowledge
representation language, the language of text meaning representations (TMRs).
OntoSem concentrates on a variety of ambiguity resolution tasks as well as
processing unexpected input and reference. To adapt OntoSem's representation to
the Semantic Web, we developed a translation system, OntoSem2OWL, from the TMR
language into the Semantic Web language OWL. We next used OntoSem and
OntoSem2OWL to support SemNews, an experimental web service that monitors RSS
news sources, processes the summaries of the news stories and publishes a
structured representation of the meaning of the text in the news story.
Evaluating WordNet-based Measures of Lexical Semantic Relatedness
Article
Full-text available
 * Mar 2006
 * COMPUT LINGUIST

 * Alexander Budanitsky
 * Graeme Hirst

The quantification of lexical semantic relatedness has many applications in NLP,
and many different measures have been proposed. We evaluate five of these
measures, all of which use WordNet as their central resource, by comparing their
performance in detecting and correcting real-word spelling errors. An
information-content-based measure proposed by Jiang and Conrath is found
superior to those proposed by Hirst and St-Onge, Leacock and Chodorow, Lin, and
Resnik. In addition, we explain why distributional similarity is not an adequate
proxy for lexical semantic relatedness.
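The IC-based measures evaluated here reduce to simple formulas once IC values are available: Lin's similarity and the Jiang-Conrath distance are both functions of the IC of the two concepts and of their least common subsumer (lcs). A minimal sketch; the IC values in the example are hypothetical:

```python
def lin(ic1, ic2, ic_lcs):
    """Lin similarity: 2*IC(lcs) / (IC(c1) + IC(c2)); equals 1.0 when
    both concepts coincide with their lcs."""
    return 2 * ic_lcs / (ic1 + ic2)

def jiang_conrath_distance(ic1, ic2, ic_lcs):
    """Jiang-Conrath distance: IC(c1) + IC(c2) - 2*IC(lcs); 0 for
    identical concepts, growing as the lcs becomes more general."""
    return ic1 + ic2 - 2 * ic_lcs

# hypothetical IC values for two sibling concepts and their lcs:
ic_dog, ic_cat, ic_carnivore = 8.0, 8.5, 5.0
print(round(lin(ic_dog, ic_cat, ic_carnivore), 3))          # 0.606
print(jiang_conrath_distance(ic_dog, ic_cat, ic_carnivore)) # 6.5
```

Jiang-Conrath is a distance, so implementations that need a similarity typically invert or rescale it before comparing against measures like Lin's.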
Enabling semantic similarity estimation across multiple ontologies: An
evaluation in the biomedical domain
Article
Full-text available
 * Oct 2012
 * J Biomed Informat

 * David Sánchez
 * Albert sole ribalta
 * Montserrat Batet
 * Francesc Serratosa

The estimation of the semantic similarity between terms provides a valuable tool
to enable the understanding of textual resources. Many semantic similarity
computation paradigms have been proposed, either as general-purpose solutions
or framed in concrete fields such as biomedicine. In particular, ontology-based
approaches have been very successful due to their efficiency, scalability, lack
of constraints and thanks to the availability of large and consensus ontologies
(like WordNet or those in the UMLS). These measures, however, are hampered by
the fact that only one ontology is exploited and, hence, their recall depends on
the ontological detail and coverage. In recent years, some authors have extended
some of the existing methodologies to support multiple ontologies. The problem
of integrating heterogeneous knowledge sources is tackled by means of simple
terminological matchings between ontological concepts. In this paper, we aim to
improve these methods by analysing the similarity between the modelled
taxonomical knowledge and the structure of different ontologies. As a result, we
are able to better discover the commonalities between different ontologies and
hence, improve the accuracy of the similarity estimation. Two methods are
proposed to tackle this task. They have been evaluated and compared with related
works by means of several widely-used benchmarks of biomedical terms using two
standard ontologies (WordNet and MeSH). Results show that our methods correlate
better, compared to related works, with the similarity assessments provided by
experts in biomedicine.
WordNet::Similarity - Measuring the Relatedness of Concepts
Article
Full-text available
 * Apr 2004

 * Ted Pedersen
 * Siddharth Patwardhan
 * Jason Michelizzi

WordNet::Similarity is a freely available software package that makes it
possible to measure the semantic similarity and relatedness between a pair of
concepts (or synsets). It provides six measures of similarity, and three
measures of relatedness, all of which are based on the lexical database WordNet.
These measures are implemented as Perl modules which take as input two concepts,
and return a numeric value that represents the degree to which they are similar
or related.
Lexical Chains as Representations of Context for the Detection and Correction of
Malapropisms
Article
Full-text available
 * Oct 1995

 * Graeme Hirst
 * David St-onge

In this paper, we examine the idea of lexical chains as such a representation. We
show how they can be constructed by means of WordNet, and how they can be
applied in one particular linguistic task: the detection and correction of
malapropisms.
Comparison and Classification of Documents Based on Layout Similarity
Article
Full-text available
 * Dec 1999
 * INFORM RETRIEVAL

 * Jianying Hu
 * Ramanujan S. Kashi
 * Gordon Wilfong

This paper describes features and methods for document image comparison and
classification at the spatial layout level. The methods are useful for visual
similarity based document retrieval as well as fast algorithms for initial
document type classification without OCR. A novel feature set called interval
encoding is introduced to capture elements of spatial layout. This feature set
encodes region layout information in fixed-length vectors by capturing
structural characteristics of the image. These fixed-length vectors are then
compared to each other through a Manhattan distance computation for fast page
layout comparison. The paper describes experiments and results to rank-order a
set of document pages in terms of their layout similarity to a test document. We
also demonstrate the usefulness of the features derived from interval coding in
a hidden Markov model based page layout classification system that is trainable
and extendible. The methods described in the paper can be ...
Extended Gloss Overlaps as a Measure of Semantic Relatedness
Article
Full-text available
 * May 2003

 * Satanjeev Banerjee
 * Ted Pedersen

This paper presents a new measure of semantic relatedness between concepts that
is based on the number of shared words (overlaps) in their definitions
(glosses). This measure is unique in that it extends the glosses of the concepts
under consideration to include the glosses of other concepts to which they are
related according to a given concept hierarchy. We show that this new measure
reasonably correlates to human judgments. We introduce a new method of word
sense disambiguation based on extended gloss overlaps, and demonstrate that it
fares well on the SENSEVAL-2 lexical sample data.
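The gloss-overlap idea can be sketched in simplified single-word form (the actual Banerjee-Pedersen measure also scores multi-word phrasal overlaps quadratically, extends glosses with those of related synsets, and ignores function words; the example glosses are paraphrased, not WordNet's):

```python
def gloss_overlap(gloss1, gloss2):
    """Count distinct words shared by two glosses; a toy stand-in for
    the extended gloss overlap measure."""
    w1, w2 = set(gloss1.lower().split()), set(gloss2.lower().split())
    return len(w1 & w2)

bank_river = "sloping land beside a body of water"
bank_money = "a financial institution that accepts deposits"
river = "a large natural stream of water"

print(gloss_overlap(bank_river, river))       # 3 ('a', 'of', 'water')
print(gloss_overlap(bank_money, bank_river))  # 1 ('a')
```

Even this toy version shows the word-sense-disambiguation intuition: the river sense of "bank" overlaps more with the gloss of "river" than the financial sense does.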
Using Corpus Statistics and WordNet Relations for Sense Identification
Article
Full-text available
 * Jul 2002
 * COMPUT LINGUIST

 * Claudia Leacock
 * George A. Miller
 * Martin Chodorow

Introduction. An impressive array of statistical methods has been developed for
word sense identification. They range from dictionary-based approaches that
rely on definitions (Véronis and Ide 1990; Wilks et al. 1993) to corpus-based
approaches that use only word co-occurrence frequencies extracted from large
textual corpora (Schütze 1995; Dagan and Itai 1994). We have drawn on these two
traditions, using corpus-based co-occurrence and the lexical knowledge base
that is embodied in the WordNet lexicon. The two traditions complement each
other. Corpus-based approaches have the advantage of being generally applicable
to new texts, domains, and corpora without needing costly and perhaps
error-prone parsing or semantic analysis. They require only training corpora in
which the sense distinctions have been marked, but therein lies their weakness.
Obtaining training materials for statistical methods is costly and
time-consuming; it is a "knowledge acquisition bottleneck" (Gale, Church, and Y
Smoothing Methods In Statistics
Article
 * Sep 1997
 * J AM STAT ASSOC

 * Jeffrey S. Simonoff

Differential Evolution: A Survey of the State-of-the-Art
Article
 * Mar 2011
 * IEEE T EVOLUT COMPUT

 * Sanjoy Das
 * Ponnuthurai N. Suganthan

Differential evolution (DE) is arguably one of the most powerful stochastic
real-parameter optimization algorithms in current use. DE operates through
similar computational steps as employed by a standard evolutionary algorithm
(EA). However, unlike traditional EAs, the DE-variants perturb the
current-generation population members with the scaled differences of randomly
selected and distinct population members. Therefore, no separate probability
distribution has to be used for generating the offspring. Since its inception in
1995, DE has drawn the attention of many researchers all over the world
resulting in a lot of variants of the basic algorithm with improved performance.
This paper presents a detailed review of the basic concepts of DE and a survey
of its major variants, its application to multiobjective, constrained, large
scale, and uncertain optimization problems, and the theoretical studies
conducted on DE so far. Also, it provides an overview of the significant
engineering applications that have benefited from the powerful nature of DE.
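The perturbation scheme the survey describes can be sketched as a minimal DE/rand/1/bin loop; the parameter values below are conventional defaults, not taken from the survey:

```python
import random

def differential_evolution(f, bounds, pop_size=20, F=0.8, CR=0.9, gens=200, seed=0):
    """Minimize f over box constraints `bounds` with DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(gens):
        for i in range(pop_size):
            # Mutation: the scaled difference of two distinct members added to a third.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)
            trial = [pop[a][k] + F * (pop[b][k] - pop[c][k])
                     if (rng.random() < CR or k == j_rand) else pop[i][k]
                     for k in range(dim)]
            # Clamp to bounds, then greedy one-to-one selection.
            trial = [min(max(t, lo), hi) for t, (lo, hi) in zip(trial, bounds)]
            f_trial = f(trial)
            if f_trial <= fit[i]:
                pop[i], fit[i] = trial, f_trial
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]
```

Note that, exactly as the abstract says, the offspring are generated from population differences alone: no separate probability distribution is sampled.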
A semantic similarity metric combining features and intrinsic information
content
Article
 * Nov 2009
 * DATA KNOWL ENG

 * Giuseppe Pirrò

In many research fields such as Psychology, Linguistics, Cognitive Science and
Artificial Intelligence, computing semantic similarity between words is an
important issue. This paper presents a new semantic similarity metric that
exploits notions from the feature-based theory of similarity and translates
them into the information-theoretic domain, leveraging the notion of
Information Content (IC). In particular, the proposed metric exploits the
notion of intrinsic IC, which quantifies IC values by examining how concepts
are arranged in an ontological structure. To evaluate this metric, an online
experiment asking the community of researchers to rank a list of 65 word pairs
was conducted. The experiment's web setup made it possible to collect 101
similarity ratings and to differentiate between native and non-native English
speakers. Such a large and diverse dataset allows similarity metrics to be
evaluated with confidence by correlating them with human assessments. Experimental evaluations
using WordNet indicate that the proposed metric, coupled with the notion of
intrinsic IC, yields results above the state of the art. Moreover, the intrinsic
IC formulation also improves the accuracy of other IC-based metrics. In order to
investigate the generality of both the intrinsic IC formulation and proposed
similarity metric a further evaluation using the MeSH biomedical ontology has
been performed. Even in this case significant results were obtained. The
proposed metric and several others have been implemented in the Java WordNet
Similarity Library.
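The intrinsic IC notion can be sketched as follows. The formula below follows the hyponym-counting formulation of Seco et al., which the intrinsic-IC line of work builds on; this is an assumption here, since the abstract does not spell the formula out:

```python
import math

def intrinsic_ic(num_hyponyms, total_concepts):
    """Intrinsic information content from taxonomy structure alone:
    a concept subsuming many hyponyms carries little information."""
    return 1.0 - math.log(num_hyponyms + 1) / math.log(total_concepts)
```

The appeal of this formulation is exactly what the abstract claims: IC values come from scrutinizing the ontological structure itself, with no corpus frequency counts required.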
Evaluating the usability of natural language query languages and interfaces to
Semantic Web knowledge bases
Article
 * Nov 2010
 * J WEB SEMANT

 * Esther Kaufmann
 * Abraham Bernstein

The need to make the contents of the Semantic Web accessible to end-users
becomes increasingly pressing as the amount of information stored in
ontology-based knowledge bases steadily increases. Natural language interfaces
(NLIs) provide a familiar and convenient means of query access to Semantic Web
data for casual end-users. While several studies have shown that NLIs can
achieve high retrieval performance as well as domain independence, this paper
focuses on usability and investigates if NLIs and natural language query
languages are useful from an end-user's point of view. To that end, we introduce
four interfaces each allowing a different query language and present a usability
study benchmarking these interfaces. The results of the study reveal a clear
preference for full natural language query sentences with a limited set of
sentence beginnings over keywords or formal query languages. NLIs to
ontology-based knowledge bases can, therefore, be considered to be useful for
casual or occasional end-users. As such, the overarching contribution is one
step towards the theoretical vision of the Semantic Web becoming reality.
Data Integration: The Teenage Years.
Conference Paper
 * Jan 2006

 * Alon Halevy
 * Anand Rajaraman
 * Joann J. Ordille

Data integration is a pervasive challenge faced in applications that need to
query across multiple autonomous and heterogeneous data sources. Data
integration is crucial in large enterprises that own a multitude of data
sources, for progress in large-scale scientific projects, where data sets are
being produced independently by multiple researchers, for better cooperation
among government agencies, each with their own data sources, and in offering
good search quality across the millions of structured data sources on the
World-Wide Web. Ten years ago we published "Querying Heterogeneous Information
Sources using Source Descriptions" [73], a paper describing some aspects of the
Information Manifold data integration project. The Information Manifold and many
other projects conducted at the time [5, 6, 20, 25, 38, 43, 51, 66, 100] have
led to tremendous progress on data integration and to quite a few commercial
data integration products. This paper offers a perspective on the contributions
of the Information Manifold and its peers, describes some of the important
bodies of work in the data integration field in the last ten years, and
outlines some challenges to data integration research today. We note in advance
that this is not intended to be a comprehensive survey of data integration, and
even though the reference list is long, it is by no means complete.
Information in Data: Using the Oxford English Dictionary on a Computer
Article
 * May 1986

 * Michael Lesk

I believe that the concept of a metric (or a dissimilarity measure) defined on a
set of records is one of the most fundamental concepts related to information
retrieval, although historically, the first science to introduce this concept as
a basic one ...
Word AdHoc Network: Using Google Core Distance to extract the most relevant
information
Article
 * Apr 2011
 * KNOWL-BASED SYST

 * Ping-I Chen
 * Shi-Jen Lin

In recent years, finding the most relevant documents or search results in a
search engine has become an important issue. Most previous research has focused
on expanding the keyword into a more meaningful sequence or using a higher
concept to form the semantic search. All of those methods need predictive
models, which are based on the training data or Web log of the users’ browsing
behaviors. In this way, they can only be used in a single knowledge domain, not
only because of the complexity of the model construction but also because the
keyword extraction methods are limited to certain areas. In this paper, we
describe a new algorithm called “Word AdHoc Network” (WANET) and use it to
extract the most important sequences of keywords to provide the most relevant
search results to the user. Our method needs no pre-processing, and all the
executions are real-time. Thus, we can use this system to extract any keyword
sequence from various knowledge domains. Our experiments show that the extracted
sequence of the documents can achieve high accuracy and, in most cases, can
find the most relevant information in the top-1 search result. This new system
can increase users’ effectiveness in finding useful information for the articles
or research papers they are reading or writing.
Conceptual query expansion
Article
 * Feb 2006
 * DATA KNOWL ENG

 * Franc Grootjen
 * Theo P. van der Weide

This article presents a new, hybrid approach that projects an initial query
result onto global information, yielding a local conceptual overview. Concepts
found this way are candidates for query refinement. We show that the resulting
conceptual structure after a typical short query of 2 terms contains
refinements that perform just as well as a most accurate query formulation.
Subsequently, we illustrate that query by navigation is an effective
mechanism which in most cases finds the optimal concept in a small number of
steps. When an optimal concept is not found, the navigation process still finds
an acceptable sub-optimum.
Statistical Comparisons of Classifiers over Multiple Data Sets
Article
 * Jan 2006
 * J MACH LEARN RES

 * Janez Demsar

While methods for comparing two learning algorithms on a single data set have
been scrutinized for quite some time already, the issue of statistical tests for
comparisons of more algorithms on multiple data sets, which is even more
essential to typical machine learning studies, has been all but ignored. This
article reviews the current practice and then theoretically and empirically
examines several suitable tests. Based on that, we recommend a set of simple,
yet safe and robust non-parametric tests for statistical comparisons of
classifiers: the Wilcoxon signed ranks test for comparison of two classifiers
and the Friedman test with the corresponding post-hoc tests for comparison of
more classifiers over multiple data sets. Results of the latter can also be
neatly presented with the newly introduced CD (critical difference) diagrams.
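The recommended procedure for k classifiers on N data sets can be sketched as follows: rank the classifiers per data set, then compute the Friedman chi-square statistic from the average ranks (tie handling is omitted in this sketch):

```python
def friedman_statistic(scores):
    """scores[i][j] = performance of classifier j on data set i (higher is better).
    Returns the per-classifier average ranks and the Friedman chi-square statistic."""
    N, k = len(scores), len(scores[0])
    avg_rank = [0.0] * k
    for row in scores:
        # Rank classifiers on this data set: best performance gets rank 1.
        order = sorted(range(k), key=lambda j: -row[j])
        for rank, j in enumerate(order, start=1):
            avg_rank[j] += rank / N
    chi2 = (12 * N) / (k * (k + 1)) * (
        sum(R * R for R in avg_rank) - k * (k + 1) ** 2 / 4)
    return avg_rank, chi2
```

If the statistic rejects the null hypothesis of equal ranks, the corresponding post-hoc tests compare individual pairs of classifiers, and the average ranks feed directly into the CD diagrams mentioned above.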
Measures of semantic similarity and relatedness in the biomedical domain
Article
 * Jul 2007
 * J Biomed Informat

 * Ted Pedersen
 * Serguei V.S. Pakhomov
 * Siddharth Patwardhan
 * Christopher G Chute

Measures of semantic similarity between concepts are widely used in Natural
Language Processing. In this article, we show how six existing
domain-independent measures can be adapted to the biomedical domain. These
measures were originally based on WordNet, an English lexical database of
concepts and relations. In this research, we adapt these measures to the
SNOMED-CT ontology of medical concepts. The measures include two path-based
measures, and three measures that augment path-based measures with information
content statistics from corpora. We also derive a context vector measure based
on medical corpora that can be used as a measure of semantic relatedness. These
six measures are evaluated against a newly created test bed of 30 medical
concept pairs scored by three physicians and nine medical coders. We find that
the medical coders and physicians differ in their ratings, and that the context
vector measure correlates most closely with the physicians, while the path-based
measures and one of the information content measures correlate most closely
with the medical coders. We conclude that there is a role both for more flexible
measures of relatedness based on information derived from corpora and for
measures that rely on existing ontological structures.
The Semantic Web Revisited
Article
 * Feb 2006
 * IEEE INTELL SYST

 * Nigel Shadbolt
 * Wendy Hall
 * Tim Berners-Lee

The article included many scenarios in which intelligent agents and bots
undertook tasks on behalf of their human or corporate owners. Of course,
shopbots and auction bots abound on the Web, but these are essentially
handcrafted for particular tasks: they have little ability to interact with
heterogeneous data and information types. Because we haven't yet delivered
large-scale, agent-based mediation, some commentators argue that the semantic
Web has failed to deliver. We argue that agents can only flourish when standards
are well established and that the Web standards for expressing shared meaning
have progressed steadily over the past five years.
An Information-Theoretic Definition of Similarity
Article
 * Aug 1998

 * Dekang Lin

Similarity is an important and widely used concept. Previous definitions of
similarity are tied to a particular application or a form of knowledge
representation. We present an information-theoretic definition of similarity that
is applicable as long as there is a probabilistic model. We demonstrate how our
definition can be used to measure the similarity in a number of different
domains.
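Instantiated for an is-a taxonomy with a probabilistic model p(c), the definition reduces to a ratio of information contents; in this sketch the probabilities stand in for corpus-estimated concept frequencies:

```python
import math

def ic(p):
    """Information content of a concept with probability p."""
    return -math.log(p)

def lin_similarity(p_a, p_b, p_lcs):
    """Lin similarity: information shared via the least common subsumer,
    relative to the total information in the two concepts:
    2 * IC(lcs) / (IC(a) + IC(b))."""
    return 2 * ic(p_lcs) / (ic(p_a) + ic(p_b))
```

The value is 1 for identical concepts and falls toward 0 as the most specific common subsumer becomes more general.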
Using Information Content to Evaluate Semantic Similarity in a Taxonomy
Article
 * Feb 1970

 * Philip Resnik

This paper presents a new measure of semantic similarity in an is-a taxonomy,
based on the notion of information content. Experimental evaluation suggests
that the measure performs encouragingly well (a correlation of r = 0.79 with a
benchmark set of human similarity judgments, with an upper bound of r = 0.90 for
human subjects performing the same task), and significantly better than the
traditional edge counting approach (r = 0.66). Evaluating
semantic relatedness using network representations is a problem with a long
history in artificial intelligence and psychology, dating back to the spreading
activation approach of Quillian [1968] and Collins and Loftus [1975].
Semantic similarity represents a special case of semantic relatedness: for
example, cars and gasoline would seem to be more closely related than, say, cars
and bicycles, but the latter pair are certainly more similar. Rada et al. [1989]
suggest that the assessment of similarity in semantic n...
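The measure itself fits in a few lines: the similarity of two concepts is the information content, -log p(c), of their most informative common subsumer, with p(c) estimated from corpus frequencies (the probabilities below are illustrative placeholders):

```python
import math

def resnik_similarity(subsumer_probs):
    """subsumer_probs: corpus-estimated probabilities of the concepts that
    subsume both terms; the most informative (least probable) subsumer wins."""
    return max(-math.log(p) for p in subsumer_probs)
```

If the only shared subsumer is the taxonomy root (p = 1), the similarity is 0, which matches the intuition that the two concepts share no specific information.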
Verb Semantics And Lexical Selection
Article
 * May 2002

 * Zhibiao Wu
 * Martha Palmer

This paper will focus on the semantic representation of verbs in computer
systems and its impact on lexical selection problems in machine translation
(MT). Two groups of English and Chinese verbs are examined to show that lexical
selection must be based on interpretation of the sentence as well as
selection restrictions placed on the verb arguments. A novel representation
scheme is suggested, and is compared to representations with selection
restrictions used in transfer-based MT. We see our approach as closely aligned
with knowledge-based MT approaches (KBMT), and as a separate component that
could be incorporated into existing systems. Examples and experimental results
will show that, using this scheme, inexact matches can achieve correct lexical
selection.
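The conceptual similarity measure widely attributed to this paper (now usually called Wu-Palmer similarity) scores two concepts by the depth of their least common subsumer relative to their own depths; a sketch, with depths counted from the taxonomy root:

```python
def wu_palmer(depth_a, depth_b, depth_lcs):
    """Wu-Palmer conceptual similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    return 2 * depth_lcs / (depth_a + depth_b)
```

The score is 1 when the two concepts coincide with their common subsumer and decreases as the subsumer sits closer to the root.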
Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy
Article
 * Oct 1997

 * Jay J. Jiang
 * David W. Conrath

This paper presents a new approach for measuring semantic similarity/distance
between words and concepts. It combines a lexical taxonomy structure with corpus
statistical information so that the semantic distance between nodes in the
semantic space constructed by the taxonomy can be better quantified with the
computational evidence derived from a distributional analysis of corpus data.
Specifically, the proposed measure is a combined approach that inherits the
edge-based approach of the edge counting scheme, which is then enhanced by the
node-based approach of the information content calculation. When tested on a
common data set of word pair similarity ratings, the proposed approach
outperforms other computational models. It gives the highest correlation value
(r = 0.828) with a benchmark based on human similarity judgements, whereas an
upper bound (r = 0.885) is observed when human subjects replicate the same task.
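The combined measure can be sketched as a distance in information-content terms, with IC(c) = -log p(c) derived from corpus frequencies (the probabilities below are illustrative placeholders, not taken from the paper's data):

```python
import math

def jiang_conrath_distance(p_a, p_b, p_lcs):
    """Jiang-Conrath distance: IC(a) + IC(b) - 2 * IC(lcs(a, b)).
    Zero for identical concepts; grows as the common subsumer gets more general."""
    def ic(p):
        return -math.log(p)
    return ic(p_a) + ic(p_b) - 2 * ic(p_lcs)
```

This is the sense in which the measure "inherits" the edge-based scheme while enhancing it with node information content: the distance plays the role of a weighted path length through the least common subsumer.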
 * Jan 1996

 * J S Simonoff

J.S. Simonoff, Smoothing Methods in Statistics, Springer, 1996.





Evolutionary Algorithm Based on Different Semantic Similarity Functions for
Synonym Recognition in T...
Preprint
 * Sep 2017

 * Jorge Martinez-Gil
 * José M. Chaves-González

One of the most challenging problems in the semantic web field consists of
computing the semantic similarity between different terms. The problem here is
the lack of accurate domain-specific dictionaries in fields such as biomedicine,
finance, or any other specialized and dynamic domain. In this article we propose
a new approach which uses different existing semantic similarity methods to
obtain precise results in the biomedical domain. Specifically, we have developed
an evolutionary algorithm which uses information provided by different semantic
similarity metrics. Our results have been validated against a variety of
biomedical datasets and different collections of similarity functions. The
proposed system provides very high quality results when compared against
similarity ratings provided by human experts (in terms of the Pearson
correlation coefficient), surpassing the results of other relevant works
previously published in the literature.


