
URL: https://www.researchgate.net/publication/354359339_A_Novel_Neurofuzzy_Approach_for_Semantic_Similarity_Measurement
Submission: On July 26 via manual from AT — Scanned from DE

Chapter


A NOVEL NEUROFUZZY APPROACH FOR SEMANTIC SIMILARITY MEASUREMENT

 * September 2021

DOI: 10.1007/978-3-030-86534-4_18
 * In book: Big Data Analytics and Knowledge Discovery (pp. 192-203)

Authors:
Jorge Martinez-Gil
 * Software Competence Center Hagenberg

Riad Mokadem
 * Paul Sabatier University - Toulouse III

Josef Küng

Abdelkader Hameurlain

Request full-text PDF

To read the full-text of this research, you can request a copy directly from the
authors.

ABSTRACT

The problem of automatically identifying the degree of semantic similarity
between two textual statements has grown in importance in recent times. Its
impact on various computer-related domains and recent breakthroughs in neural
computation have increased the opportunities for better solutions to be
developed. This research takes these efforts a step further by designing and
developing a novel neurofuzzy approach for semantic textual similarity that
uses neural networks and fuzzy logic. The fundamental notion is to combine the
remarkable capabilities of current neural models for working with text with
the possibilities that fuzzy logic offers for aggregating numerical
information in a tailored manner. The results of our experiments suggest that
this approach is capable of accurately determining semantic textual
similarity.
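The aggregation idea in the abstract can be sketched in a few lines: several
similarity scores (for instance, produced by neural sentence encoders) are
combined by fuzzy rules rather than a fixed statistical function. The
membership functions and rule base below are invented for illustration; they
are not the chapter's actual model.

```python
def mu_low(x):
    # Membership of a similarity score x in [0, 1] in the fuzzy set "low".
    # Linear membership functions are an illustrative choice.
    return max(0.0, 1.0 - x)

def mu_high(x):
    # Membership in the fuzzy set "high".
    return min(1.0, max(0.0, x))

def neurofuzzy_aggregate(s1, s2):
    """Sugeno-style aggregation of two similarity scores.

    Illustrative rule base (min as the AND t-norm):
      R1: s1 HIGH and s2 HIGH -> similar    (output 1.0)
      R2: s1 LOW  and s2 LOW  -> dissimilar (output 0.0)
      R3: mixed evidence      -> uncertain  (output 0.5)
    """
    w1 = min(mu_high(s1), mu_high(s2))
    w2 = min(mu_low(s1), mu_low(s2))
    w3 = max(min(mu_high(s1), mu_low(s2)), min(mu_low(s1), mu_high(s2)))
    den = w1 + w2 + w3
    return (w1 * 1.0 + w2 * 0.0 + w3 * 0.5) / den if den else 0.5
```

Agreeing high scores yield a high aggregate, agreeing low scores a low one,
and conflicting scores are pulled towards 0.5.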








CITATIONS (1)


REFERENCES (34)




End-to-End Generation of Multiple-Choice Questions Using Text-to-Text Transfer
Transformer Models
Article
 * Jul 2022
 * EXPERT SYST APPL

The increasing worldwide adoption of e-learning tools and the widespread
growth of online education have brought multiple challenges, including the
ability to generate assessments at the scale and speed demanded by this
environment. In this sense, recent advances in language models and
architectures such as the Transformer provide opportunities to explore how to
assist educators in these tasks. This study focuses on using neural language
models for the generation of questionnaires composed of multiple-choice
questions, based on English Wikipedia articles as input. The problem is
addressed along three dimensions: Question Generation (QG), Question Answering
(QA), and Distractor Generation (DG). A processing pipeline based on
pre-trained T5 language models is designed, and a REST API is implemented for
its use. The DG task is defined using a text-to-text format, and a T5 model is
fine-tuned on the DG-RACE dataset, showing an improvement in the ROUGE-L
metric compared to the reference for the dataset. A discussion of the lack of
an adequate metric for DG is presented, and cosine similarity using word
embeddings is considered as a complement. Questionnaires were evaluated by
human experts, who report that questions and options are generally well
formed; however, they are more oriented towards measuring retention than
comprehension.
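The cosine-similarity complement mentioned above is straightforward to state:
embed each text as the mean of its word vectors and compare the results by
cosine. The three-dimensional vectors below are toy values invented for the
example; a real setup would load pretrained embeddings such as word2vec or
GloVe.

```python
import math

def cosine(u, v):
    # Cosine similarity between two vectors given as plain lists.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy 3-dimensional "embeddings", for illustration only.
toy_vectors = {
    "coffee": [0.9, 0.1, 0.3],
    "tea":    [0.8, 0.2, 0.35],
    "toffee": [0.1, 0.9, 0.2],
}

def sentence_vector(words, vectors, dim=3):
    # Mean of the vectors of the words we have embeddings for.
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]
```

With these toy values, "coffee" lands closer to "tea" than to the
orthographically similar "toffee", which is exactly the behaviour a
semantics-aware metric should show.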
A reproducible survey on word embeddings and ontology-based methods for word
similarity: Linear combinations outperform the state of the art
Article
Full-text available
 * Aug 2019
 * ENG APPL ARTIF INTEL

 * Juan José Lastra-Díaz
 * Josu Goikoetxea
 * Mohamed Ali Hadj Taieb
 * Eneko Agirre

Human similarity and relatedness judgements between concepts underlie most
cognitive capabilities, such as categorisation, memory, decision-making and
reasoning. For this reason, the proposal of methods for estimating the degree
of similarity and relatedness between words and concepts has been a very
active line of research in the fields of artificial intelligence, information
retrieval and natural language processing, among others. The main approaches
proposed in the literature can be categorised into two large families: (1)
Ontology-based semantic similarity Measures (OM) and (2) distributional
measures, whose most recent and successful methods are based on Word Embedding
(WE) models. However, the lack of a deep analysis of both families of methods
slows down the advance of this line of research and its applications. This
work introduces the largest, most reproducible and most detailed experimental
survey of OM measures and WE models reported in the literature, based on the
evaluation of both families of methods on the same software platform, with the
aim of elucidating the state of the problem. We show that WE models which
combine distributional and ontology-based information obtain the best results
and, in addition, we show for the first time that a simple average of two of
the best-performing WE models with other ontology-based measures or WE models
improves the state of the art by a large margin. In addition, we provide a
very detailed reproducibility protocol together with a collection of software
tools and datasets as complementary material to allow the exact replication of
our results.
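The "simple average" finding translates directly into code. A minimal sketch:
the `weights` argument is an illustrative generalisation of the unweighted
mean the survey reports as its best simple combination.

```python
def linear_combination(scores, weights=None):
    """Combine similarity scores in [0, 1] with a weighted average.

    With no weights this is the plain mean, i.e. the simple average of
    two (or more) similarity methods described in the survey.
    """
    if weights is None:
        weights = [1.0 / len(scores)] * len(scores)
    return sum(w * s for w, s in zip(weights, scores))
```

For example, averaging an ontology-based score of 0.8 with an embedding-based
score of 0.6 yields a combined similarity of 0.7.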
Universal Sentence Encoder for English
Conference Paper
Full-text available
 * Jan 2018

 * Daniel Cer
 * Yinfei Yang
 * Sheng-yi Kong
 * Ray Kurzweil

Semantic similarity aggregators for very short textual expressions: a case
study on landmarks and points of interest
Article
Full-text available
 * Oct 2019
 * J INTELL INF SYST

 * Jorge Martinez-Gil

Semantic similarity measurement aims to automatically compute the degree of
similarity between two textual expressions that use different representations
for naming the same concepts. However, very short textual expressions cannot
always follow the syntax of a written language and, in general, do not provide
enough information to support proper analysis. This means that in some fields,
such as the processing of landmarks and points of interest, results are not
entirely satisfactory. In order to overcome this situation, we explore the idea
of aggregating existing methods by means of two novel aggregation operators
aiming to model an appropriate interaction between the similarity measures. As a
result, we have been able to improve the results of existing techniques when
solving the GeReSiD and the SDTS, two of the most popular benchmark datasets for
dealing with geographical information.
Universal Sentence Encoder
Article
Full-text available
 * Mar 2018

 * Daniel Cer
 * Yinfei Yang
 * Sheng-yi Kong
 * Ray Kurzweil

We present models for encoding sentences into embedding vectors that
specifically target transfer learning to other NLP tasks. The models are
efficient and result in accurate performance on diverse transfer tasks. Two
variants of the encoding models allow for trade-offs between accuracy and
compute resources. For both variants, we investigate and report the relationship
between model complexity, resource consumption, the availability of transfer
task training data, and task performance. Comparisons are made with baselines
that use word level transfer learning via pretrained word embeddings as well as
baselines that do not use any transfer learning. We find that transfer learning using
sentence embeddings tends to outperform word level transfer. With transfer
learning via sentence embeddings, we observe surprisingly good performance with
minimal amounts of supervised training data for a transfer task. We obtain
encouraging results on Word Embedding Association Tests (WEAT) targeted at
detecting model bias. Our pre-trained sentence encoding models are made freely
available for download and on TF Hub.
HESML: A scalable ontology-based semantic similarity measures library with a set
of reproducible experiments and a replication dataset
Article
Full-text available
 * Jun 2017
 * INFORM SYST

 * Juan José Lastra-Díaz
 * Ana M Garcia-Serrano
 * Montserrat Batet
 * Fernando Chirigati

This work is a detailed companion reproducibility paper of the methods and
experiments proposed by Lastra-Díaz and García-Serrano in [56, 57, 58], which
introduces the following contributions: (1) a new and efficient representation
model for taxonomies, called PosetHERep, which is an adaptation of the half-edge
data structure commonly used to represent discrete manifolds and planar graphs;
(2) a new Java software library called the Half-Edge Semantic Measures Library
(HESML) based on PosetHERep, which implements most ontology-based semantic
similarity measures and Information Content (IC) models reported in the
literature; (3) a set of reproducible experiments on word similarity based on
HESML and ReproZip with the aim of exactly reproducing the experimental surveys
in the three aforementioned works; (4) a replication framework and dataset,
called WNSimRep v1, whose aim is to assist the exact replication of most methods
reported in the literature; and finally, (5) a set of scalability and
performance benchmarks for semantic measures libraries. PosetHERep and HESML are
motivated by several drawbacks in the current semantic measures libraries,
especially the performance and scalability, as well as the evaluation of new
methods and the replication of most previous methods. The reproducible
experiments introduced herein are encouraged by the lack of a set of large,
self-contained and easily reproducible experiments with the aim of replicating
and confirming previously reported results. Likewise, the WNSimRep v1 dataset is
motivated by the discovery of several contradictory results and difficulties in
reproducing previously reported methods and experiments. PosetHERep proposes a
memory-efficient representation for taxonomies which linearly scales with the
size of the taxonomy and provides an efficient implementation of most
taxonomy-based algorithms used by the semantic measures and IC models, whilst
HESML provides an open framework to aid research into the area by providing a
simpler and more efficient software architecture than the current software
libraries. Finally, we show that HESML outperforms the state-of-the-art
libraries, and that their performance and scalability can be significantly
improved, without caching, by using PosetHERep.
CoTO: A Novel Approach for Fuzzy Aggregation of Semantic Similarity Measures
Article
Full-text available
 * Feb 2016
 * COGN SYST RES

 * Jorge Martinez-Gil

Semantic similarity measurement aims to determine the likeness between two
text expressions that use different lexicographies to represent the same real
object or idea. Many semantic similarity measures have been proposed to
address this problem. However, the best results have been achieved by
aggregating a number of simple similarity measures. This means that after the
various similarity values have been calculated, the overall similarity for a
pair of text expressions is computed using an aggregation function over these
individual semantic similarity values. This aggregation is often computed by
means of statistical functions. In this work, we present CoTO (Consensus or
Trade-Off), a solution based on fuzzy logic that is able to outperform these
traditional approaches.
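The consensus-versus-trade-off distinction can be illustrated with a toy
aggregator. This is a hedged reading of the general idea, not the CoTO
operators themselves; the agreement threshold and the fallback rule are
invented for the example.

```python
def consensus_or_tradeoff(scores, agreement_threshold=0.2):
    """Aggregate similarity scores in [0, 1].

    If the individual measures roughly agree (consensus), return their
    mean; otherwise (conflict) fall back to a cautious trade-off, here
    the midpoint between the lowest and highest score. Illustrative only.
    """
    spread = max(scores) - min(scores)
    if spread <= agreement_threshold:
        return sum(scores) / len(scores)
    return (min(scores) + max(scores)) / 2.0
```

Three agreeing measures (0.8, 0.85, 0.9) produce their mean, 0.85, while two
conflicting measures (0.2, 0.9) are resolved to the midpoint 0.55 instead of
letting either extreme dominate.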
Semantic Similarity from Natural Language and Ontology Analysis
Book
Full-text available
 * May 2015

 * Sébastien Harispe
 * Sylvie Ranwez
 * Stefan Janaqi
 * Jacky Montmain

Artificial Intelligence federates numerous scientific fields with the aim of
developing machines able to assist human operators performing complex tasks,
most of which demand high cognitive skills (e.g. learning or decision
processes). Central to this quest is giving machines the ability to estimate
the likeness or similarity between things in the way human beings estimate the
similarity between stimuli. In this context, this book focuses on semantic
measures: approaches designed for comparing semantic entities such as units of
language (e.g. words, sentences) or concepts and instances defined in
knowledge bases. The aim of these measures is to assess the similarity or
relatedness of such semantic entities by taking into account their semantics,
i.e. their meaning. Intuitively, the words tea and coffee, which both refer to
stimulating beverages, will be estimated to be more semantically similar than
the words toffee (a confection) and coffee, even though the latter pair has a
higher syntactic similarity. The two state-of-the-art approaches for
estimating and quantifying the semantic similarity/relatedness of semantic
entities are presented in detail: the first relies on corpora analysis and is
based on Natural Language Processing techniques and semantic models, while the
second is based on more or less formal, computer-readable and workable forms
of knowledge such as semantic networks, thesauri or ontologies. (...) Beyond a
simple inventory and categorization of existing measures, the aim of this
monograph is to guide novices as well as researchers in these domains towards
a better understanding of semantic similarity estimation and, more generally,
semantic measures.
jFuzzyLogic: A Java Library to Design Fuzzy Logic Controllers According to the
Standard for Fuzzy Control Programming
Article
Full-text available
 * Jun 2013

 * Pablo Cingolani
 * Jesus Alcala-Fdez

Fuzzy Logic Controllers are a specific model of Fuzzy Rule-Based Systems
suitable for engineering applications for which classic control strategies do
not achieve good results, or for which it is too difficult to obtain a
mathematical model. Recently, the International Electrotechnical Commission
has published a standard for fuzzy control programming in part 7 of the IEC
61131 norm, in order to offer a well-defined common understanding of the basic
means with which to integrate fuzzy control applications in control systems.
In this paper, we introduce an open-source Java library called jFuzzyLogic
which offers a fully functional and complete implementation of a fuzzy
inference system according to this standard, providing a programming interface
and an Eclipse plugin to easily write and test code for fuzzy control
applications. A case study is given to illustrate the use of jFuzzyLogic.
Distributed Representations of Words and Phrases and their Compositionality
Article
Full-text available
 * Oct 2013
 * Adv Neural Inform Process Syst

 * Tomas Mikolov
 * Ilya Sutskever
 * Kai Chen
 * Jeffrey Dean

The recently introduced continuous Skip-gram model is an efficient method for
learning high-quality distributed vector representations that capture a large
number of precise syntactic and semantic word relationships. In this paper we
present several extensions that improve both the quality of the vectors and the
training speed. By subsampling the frequent words we obtain a significant
speedup and also learn more regular word representations. We also describe a
simple alternative to the hierarchical softmax called negative sampling. An
inherent limitation of word representations is their indifference to word order
and their inability to represent idiomatic phrases. For example, the meanings of
"Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated
by this example, we present a simple method for finding phrases in text, and
show that learning good vector representations for millions of phrases is
possible.
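The negative-sampling objective described in this abstract is compact enough
to write out. A sketch for a single (center, context) pair, with vectors as
plain lists: the loss rewards a high dot product with the true context word
and a low dot product with each sampled negative.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def neg_sampling_loss(center, context, negatives):
    """Skip-gram negative-sampling loss for one (center, context) pair.

    L = -log sigma(u_ctx . v_c) - sum_n log sigma(-u_n . v_c)
    where v_c is the center vector, u_ctx the true context vector and
    u_n the sampled negative vectors. Returns the loss to minimise.
    """
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    loss = -math.log(sigmoid(dot(context, center)))
    for neg in negatives:
        loss -= math.log(sigmoid(-dot(neg, center)))
    return loss
```

A well-aligned pair with a misaligned negative incurs a much smaller loss than
the reverse configuration, which is what drives the vectors apart during
training.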
The Google Similarity Distance
Article
Full-text available
 * Apr 2007

 * Rudi Cilibrasi
 * Paul M. B. Vitányi

Words and phrases acquire meaning from the way they are used in society, from
their relative semantics to other words and phrases. For computers, the
equivalent of "society" is "database," and the equivalent of "use" is "a way to
search the database". We present a new theory of similarity between words and
phrases based on information distance and Kolmogorov complexity. To fix
thoughts, we use the World Wide Web (WWW) as the database, and Google as the
search engine. The method is also applicable to other search engines and
databases. This theory is then applied to construct a method to automatically
extract similarity, the Google similarity distance, of words and phrases from
the WWW using Google page counts. The WWW is the largest database on earth, and
the context information entered by millions of independent users averages out to
provide automatic semantics of useful quality. We give applications in
hierarchical clustering, classification, and language translation. We give
examples to distinguish between colors and numbers, cluster names of paintings
by 17th century Dutch masters and names of books by English novelists, the
ability to understand emergencies and primes, and we demonstrate the ability to
do a simple automatic English-Spanish translation. Finally, we use the WordNet
database as an objective baseline against which to judge the performance of our
method. We conduct a massive randomized trial in binary classification using
support vector machines to learn categories based on our Google distance,
resulting in a mean agreement of 87 percent with the expert-crafted WordNet
categories.
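The Google similarity distance has a closed form, the Normalized Google
Distance, computed from search-engine page counts. Below is a direct
implementation of the published formula; the counts used in the test are made
up for illustration.

```python
import math

def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from page counts.

    NGD(x, y) = (max(log fx, log fy) - log fxy)
                / (log n - min(log fx, log fy))

    fx, fy: hit counts for each term alone; fxy: hit count for the terms
    together; n: (an estimate of) the total number of indexed pages.
    Terms that always co-occur get distance 0; rarer co-occurrence
    yields a larger distance.
    """
    lfx, lfy, lfxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lfx, lfy) - lfxy) / (math.log(n) - min(lfx, lfy))
```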
Multiple Positional Self-Attention Network for Text Classification
Article
 * Apr 2020

 * Biyun Dai
 * Jinlong Li
 * Ruoyi Xu

Self-attention mechanisms have recently attracted considerable attention in
Natural Language Processing (NLP) tasks, and relative positional information is
important to them. We propose a Faraway Mask, which restricts attention to the
surrounding (2m + 1)-gram words, and a Scaled-Distance Mask, which applies a
logarithmic distance penalty, to block and to weaken the self-attention of
distant words, respectively. To exploit
different masks, we present Positional Self-Attention Layer for generating
different Masked-Self-Attentions and a following Position-Fusion Layer in which
fused positional information multiplies the Masked-Self-Attentions for
generating sentence embeddings. To evaluate our sentence embeddings approach
Multiple Positional Self-Attention Network (MPSAN), we perform the comparison
experiments on sentiment analysis, semantic relatedness and sentence
classification tasks. The results show that our MPSAN outperforms
state-of-the-art methods on five datasets, improving test accuracy by 0.81% and
0.6% on the SST and CR datasets, respectively. In addition, we reduce the number
of training parameters and improve the time efficiency of MPSAN by lowering the
dimensionality of the self-attention and simplifying the fusion mechanism.
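As a rough illustration of the idea (not the paper's exact formulation), a self-attention score matrix can be penalized by log-distance before the softmax, so attention to far-away words is weakened:

```python
import numpy as np

def distance_penalized_attention(q, k, v):
    """Toy single-head self-attention with a logarithmic distance
    penalty subtracted from the scores, illustrating (not reproducing)
    the Scaled-Distance Mask idea of weakening attention to far words."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :])
    scores = scores - np.log1p(dist)              # farther => larger penalty
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)            # softmax over positions
    return w @ v
```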
Automatic Design of Semantic Similarity Controllers based on Fuzzy Logics
Article
 * Apr 2019
 * EXPERT SYST APPL

 * Jorge Martinez-Gil
 * Jose M. Chaves-González

Recent advances in machine learning have been able to make improvements over the
state-of-the-art regarding semantic similarity measurement techniques. In fact,
we have all seen how classical techniques have given way to promising neural
techniques. Nonetheless, these new techniques have a weak point: they are hardly
interpretable. For this reason, we have oriented our research towards the design
of strategies being able to be accurate enough but without sacrificing their
interpretability. As a result, we have obtained a strategy for the automatic
design of semantic similarity controllers based on fuzzy logics, which are
automatically identified using genetic algorithms (GAs). After an exhaustive
evaluation using a number of well-known benchmark datasets, we can conclude that
our strategy fulfills both expectations: it is capable of achieving reasonably
good results and, at the same time, offering a high degree of interpretability.
Distributed Representations of Words and Phrases and their Compositionality
Conference Paper
 * Jan 2013
 * Adv Neural Inform Process Syst

 * Tomas Mikolov
 * Kai Chen
 * G.s. Corrado
 * Jeffrey Dean

The recently introduced continuous Skip-gram model is an efficient method for
learning high-quality distributed vector representations that capture a large
number of precise syntactic and semantic word relationships. In this paper we
present several extensions that improve both the quality of the vectors and the
training speed. By subsampling the frequent words we obtain significant
speedup and also learn more regular word representations. We also describe a
simple alternative to the hierarchical softmax called negative sampling. An
inherent limitation of word representations is their indifference to word order
and their inability to represent idiomatic phrases. For example, the meanings of
“Canada” and “Air” cannot be easily combined to obtain “Air Canada”. Motivated
by this example, we present a simple method for finding phrases in text, and show
that learning good vector representations for millions of phrases is possible.
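The phrase-finding step can be sketched with a count-based bigram score (delta discounts against forming phrases from very rare words; the threshold for merging is a hypothetical choice here):

```python
def phrase_score(bigram_count, count_w1, count_w2, delta=5):
    """Score for merging a bigram such as ("Air", "Canada") into a single
    phrase token; bigrams scoring above a chosen threshold are merged."""
    return (bigram_count - delta) / (count_w1 * count_w2)

# A bigram that nearly always co-occurs scores much higher than a chance pair:
frequent = phrase_score(95, 100, 100)      # 0.009
chance = phrase_score(6, 10_000, 10_000)   # 1e-08
```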
An experiment in linguistic synthesis with a fuzzy logic controller
Article
 * Jan 1995
 * Int J Man Mach Stud

 * E. Mamdani

LWCR: Multi-Layered Wikipedia representation for Computing word Relatedness
Article
 * Aug 2016
 * NEUROCOMPUTING

 * Mohamed Ben Aouicha
 * Mohamed Ali Hadj Taieb
 * Abdelmajid Ben Hamadou

The measurement of the semantic relatedness between words has gained increasing
interest in several research fields, including cognitive science, artificial
intelligence, biology, and linguistics. The development of efficient measures is
based on knowledge resources, such as Wikipedia, a huge and living encyclopedia
supplied by net surfers. In this paper, we propose a novel approach based on
multi-Layered Wikipedia representation for Computing word Relatedness (LWCR)
exploiting a weighting scheme based on Wikipedia Category Graph (WCG): Term
Frequency-Inverse Category Frequency (tf×icf). Our proposal provides for each
category pertaining to the WCG a Category Description Vector (CDV) including the
weights of stems extracted from articles assigned to a category. The semantic
relatedness degree is computed using the cosine measure between the CDVs
assigned to the target words couple. The basic idea is followed by enhancement
modules exploiting other Wikipedia features, such as article titles, redirection
mechanism, and neighborhood category enrichment, to exploit semantic features
and better quantify the semantic relatedness between words. To the best of our
knowledge, this is the first attempt to incorporate the WCG-based term-weighting
scheme (tf×icf) into a computational model of semantic relatedness. It is also
the first work to exploit 17 datasets in the assessment process, divided into
two sets. The first set includes datasets designed for semantic similarity:
RG65, MC30, AG203, WP300, SimLexNoun666 and GeReSiD50Sim; the second includes
datasets for semantic relatedness evaluation: WordSim353, GM30, Zeigler25,
Zeigler30, MTurk287, MTurk771, MEN3000, Rel122, ReWord26, GeReSiD50 and
SCWS1229. The results obtained are compared to WordNet-based measures and to
the distributional measures cosine and PMI computed on Wikipedia articles.
Experiments show that our approach provides consistent improvements over the
state of the art results on multiple benchmarks.
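The core computation, weighting stems per category and comparing Category Description Vectors with the cosine measure, can be sketched as follows (the weight here is the category analogue of tf-idf; the paper's exact normalizations may differ):

```python
import math

def tf_icf(tf, n_categories, category_freq):
    """tf x icf weight: term frequency scaled by inverse category
    frequency (how few categories the stem appears in)."""
    return tf * math.log(n_categories / category_freq)

def cosine(u, v):
    """Cosine similarity between two sparse Category Description
    Vectors, represented as {stem: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```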
Enriching Word Vectors with Subword Information
Article
 * Jul 2016

 * Piotr Bojanowski
 * Edouard Grave
 * Armand Joulin
 * Tomas Mikolov

Continuous word representations, trained on large unlabeled corpora, are useful
for many natural language processing tasks. Many popular models to learn such
representations ignore the morphology of words, by assigning a distinct vector
to each word. This is a limitation, especially for morphologically rich
languages with large vocabularies and many rare words. In this paper, we propose
a new approach based on the skip-gram model, where each word is represented as a
bag of character n-grams. A vector representation is associated to each
character n-gram, words being represented as the sum of these representations.
Our method is fast, allowing models to be trained on large corpora quickly. We
evaluate the obtained word representations on five different languages, on word
similarity and analogy tasks.
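The bag-of-character-n-grams representation is easy to sketch; a word vector is then the sum of the vectors of these n-grams, with boundary markers < and > distinguishing prefixes and suffixes (3 to 6 are the commonly used n-gram sizes):

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word with boundary markers, as in the
    subword skip-gram model described above (a sketch)."""
    w = f"<{word}>"
    return [w[i:i + n]
            for n in range(n_min, min(n_max, len(w)) + 1)
            for i in range(len(w) - n + 1)]
```

Because unseen or rare words still share n-grams with seen words, their vectors can be composed from subword vectors, which is the key benefit for morphologically rich languages.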
Combining local context and WordNet similarity for word sense identification
Article
 * Jan 1998

 * Claudia Leacock
 * Martin Chodorow
 * C. Fellbaum

Improving Vector Space Word Representations Using Multilingual Correlation
Article
 * Jan 2014

 * Manaal Faruqui
 * Chris Dyer

The distributional hypothesis of Harris (1954), according to which the meaning
of words is evidenced by the contexts they occur in, has motivated several
effective techniques for obtaining vector space semantic representations of
words using unannotated text corpora. This paper argues that lexico-semantic
content should additionally be invariant across languages and proposes a simple
technique based on canonical correlation analysis (CCA) for incorporating
multilingual evidence into vectors generated monolingually. We evaluate the
resulting word representations on standard lexical semantic evaluation tasks and
show that our method produces substantially better semantic representations than
monolingual techniques.
Improving word representations via global context and multiple word prototypes
Conference Paper
 * Jul 2012

 * Eric H. Huang
 * Richard Socher
 * Christopher D. Manning
 * Andrew Y. Ng

Unsupervised word representations are very useful in NLP tasks both as inputs to
learning algorithms and as extra word features in NLP systems. However, most of
these models are built with only local context and one representation per word.
This is problematic because words are often polysemous and global context can
also provide useful information for learning word meanings. We present a new
neural network architecture which 1) learns word embeddings that better capture
the semantics of words by incorporating both local and global document context,
and 2) accounts for homonymy and polysemy by learning multiple embeddings per
word. We introduce a new dataset with human judgments on pairs of words in
sentential context, and evaluate our model on it, showing that our model
outperforms competitive baselines and other neural language models.
An experiment in linguistic synthesis of fuzzy controllers
Article
 * Jan 1974

 * E.H. Mamdani
 * S. Assilian

Fuzzy Identification of Systems and Its Applications to Modeling and Control
Article
 * Jan 1985

 * Tomohiro Takagi
 * Michio Sugeno

A mathematical tool to build a fuzzy model of a system where fuzzy implications
and reasoning are used is presented. The premise of an implication is the
description of fuzzy subspace of inputs and its consequence is a linear
input-output relation. The method of identification of a system using its
input-output data is then shown. Two applications of the method to industrial
processes are also discussed: a water cleaning process and a converter in a
steel-making process.
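A minimal one-input sketch of this kind of model (Gaussian premise memberships, linear consequents, weighted-average defuzzification; the membership shapes and parameters are illustrative, not from the paper):

```python
import numpy as np

def ts_infer(x, rules):
    """First-order Takagi-Sugeno inference for one input: each rule is
    (centre, spread, (a, b)) meaning "IF x is about centre THEN
    y = a*x + b"; the output is the firing-strength-weighted average."""
    w = np.array([np.exp(-((x - c) ** 2) / (2 * s ** 2)) for c, s, _ in rules])
    y = np.array([a * x + b for _, _, (a, b) in rules])
    return float(np.dot(w, y) / w.sum())

rules = [(0.0, 1.0, (0.0, 0.0)),   # near zero: output ~ 0
         (5.0, 1.0, (2.0, 1.0))]   # near five: output ~ 2x + 1
```

Identification then amounts to fitting the premise parameters and the linear consequents from input-output data.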
Contextual Correlates of Semantic Similarity
Article
 * Jan 1991
 * LANG COGNITIVE PROC

 * George A. Miller
 * Walter G. Charles

The relationship between semantic and contextual similarity is investigated for
pairs of nouns that vary from high to low semantic similarity. Semantic
similarity is estimated by subjective ratings; contextual similarity is
estimated by the method of sorting sentential contexts. The results show an
inverse linear relationship between similarity of meaning and the
discriminability of contexts. This relation, is obtained for two separate
corpora of sentence contexts. It is concluded that, on average, for words in the
same language drawn from the same syntactic and semantic categories, the more
often two words can be substituted into the same contexts the more similar in
meaning they are judged to be.
A Historical Review of Evolutionary Learning Methods for Mamdani-type Fuzzy
Rule-based Systems: Designing Interpretable Genetic Fuzzy Systems
Article
 * Sep 2011
 * INT J APPROX REASON

 * Oscar Cordon

The need for trading off interpretability and accuracy is intrinsic to the use
of fuzzy systems. The obtaining of accurate but also human-comprehensible fuzzy
systems played a key role in Zadeh and Mamdani’s seminal ideas and system
identification methodologies. Nevertheless, before the advent of soft computing,
accuracy progressively became the main concern of fuzzy model builders, making
the resulting fuzzy systems get closer to black-box models such as neural
networks. Fortunately, the fuzzy modeling scientific community has come back to
its origins by considering design techniques dealing with the
interpretability-accuracy tradeoff. In particular, the use of genetic fuzzy
systems has been widely extended thanks to their inherent flexibility and their
capability to jointly consider different optimization criteria. The current
contribution constitutes a review on the most representative genetic fuzzy
systems relying on Mamdani-type fuzzy rule-based systems to obtain interpretable
linguistic fuzzy models with a good accuracy.
An Experiment in Linguistic Synthesis with a Fuzzy Logic Controller
Article
 * Jan 1975

 * E.H. Mamdani
 * S. Assilian

This paper describes an experiment on the “linguistic” synthesis of a controller
for a model industrial plant (a steam engine). Fuzzy logic is used to convert
heuristic control rules stated by a human operator into an automatic control
strategy. The experiment was initiated to investigate the possibility of human
interaction with a learning controller. However, the control strategy set up
linguistically proved to be far better than expected in its own right, and the
basic experiment of linguistic control synthesis in a non-learning controller is
reported here.
Semantic Similarity in a Taxonomy: An Information Based Measure and Its
Application to Problems of Ambiguity in Natural Language
Article
 * Jul 1999
 * JAIR

 * Ps Resnik

This article presents a measure of semantic similarity in an IS-A taxonomy based
on the notion of shared information content. Experimental evaluation against a
benchmark set of human similarity judgments demonstrates that the measure
performs better than the traditional edge-counting approach. The article
presents algorithms that take advantage of taxonomic similarity in resolving
syntactic and semantic ambiguity, along with experimental results demonstrating
their effectiveness.
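The measure can be sketched over a toy IS-A taxonomy; the child-to-parent dict and concept probabilities below are hypothetical inputs, not data from the article:

```python
import math

def ancestors(concept, parent):
    """The concept and all its ancestors in an IS-A taxonomy given as a
    child -> parent dict (a hypothetical toy representation)."""
    seen = {concept}
    while concept in parent:
        concept = parent[concept]
        seen.add(concept)
    return seen

def resnik_sim(c1, c2, parent, p):
    """Resnik similarity: the information content -log p(c) of the most
    informative (least probable) subsumer shared by both concepts."""
    shared = ancestors(c1, parent) & ancestors(c2, parent)
    return max(-math.log(p[c]) for c in shared)

# Toy taxonomy: dog and cat share the subsumer "mammal".
parent = {"dog": "mammal", "cat": "mammal", "mammal": "entity"}
p = {"mammal": 0.3, "entity": 1.0}  # hypothetical corpus probabilities
```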
Automatic Generation of Fuzzy Rule-based Models from Data by Genetic Algorithms
Article
 * Mar 2003
 * INFORM SCIENCES

 * Plamen P. Angelov
 * R.A. Buswell

A methodology for the encoding of the chromosome of a genetic algorithm (GA) is
described in the paper. The encoding procedure is applied to the problem of
automatically generating fuzzy rule-based models from data. Models generated by
this approach have much of the flexibility of black-box methods, such as neural
networks. In addition, they implicitly express information about the process
being modelled through the linguistic terms associated with the rules. They can
be applied to problems that are too complex to model in a first principles sense
and can reduce the computational overhead when compared to established first
principles based models. The encoding mechanism allows the rule base structure
and parameters of the fuzzy model to be estimated simultaneously from data. The
principal advantage is the preservation of the linguistic concept without the
need to consider the entire rule base. The GA searches for the optimum solution
over a comparatively small number of rules relative to all possible rules. This
minimises the computational demand of the model generation and allows problems
with realistic dimensions to be considered. A further feature is that the rules
are extracted from the data without the need to establish any information about
the model structure a priori. The implementation of the algorithm is described
and the approach is applied to the modelling of components of heating,
ventilating and air-conditioning systems.
Errata to "Flexible neuro-fuzzy systems".
Article
 * May 2003

 * Leszek Rutkowski
 * Krzysztof Cpałka

In this paper, we derive new neuro-fuzzy structures called flexible neuro-fuzzy
inference systems or FLEXNFIS. Based on the input-output data, we learn not only
the parameters of the membership functions but also the type of the systems
(Mamdani or logical). Moreover, we introduce: 1) softness to fuzzy implication
operators, to aggregation of rules and to connectives of antecedents; 2)
certainty weights to aggregation of rules and to connectives of antecedents; and
3) parameterized families of T-norms and S-norms to fuzzy implication operators,
to aggregation of rules and to connectives of antecedents. Our approach
introduces more flexibility to the structure and design of neuro-fuzzy systems.
Through computer simulations, we show that Mamdani-type systems are more
suitable to approximation problems, whereas logical-type systems may be
preferred for classification problems.
An Approach for Measuring Semantic Similarity between Words Using Multiple
Information Sources
Article
 * Aug 2003

 * Yuhua Li
 * Zuhair Bandar
 * David McLean

This article was originally published, following peer review, in IEEE
Transactions on Knowledge and Data Engineering (© IEEE). Semantic
similarity between words is becoming a generic problem for many applications of
computational linguistics and artificial intelligence. This paper explores the
determination of semantic similarity by a number of information sources, which
consist of structural semantic information from a lexical taxonomy and
information content from a corpus. To investigate how information sources could
be used effectively, a variety of strategies for using various possible
information sources are implemented. A new measure is then proposed which
combines information sources nonlinearly. Experimental evaluation against a
benchmark set of human similarity ratings demonstrates that the proposed measure
significantly outperforms traditional similarity measures.
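A sketch of a nonlinear combination in this spirit (similarity decays exponentially with taxonomy path length and saturates with subsumer depth; alpha and beta are illustrative constants, not necessarily the paper's):

```python
import math

def nonlinear_sim(path_len, depth, alpha=0.2, beta=0.45):
    """Combine taxonomy path length and subsumer depth nonlinearly:
    exp decay in path length, tanh saturation in the depth of the
    words' common subsumer; result lies in [0, 1]."""
    return math.exp(-alpha * path_len) * math.tanh(beta * depth)
```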
Errata to "Flexible neuro-fuzzy systems".
Article
 * Feb 2003

 * Leszek Rutkowski
 * Krzysztof Cpałka

First Page of the Article
An Information-Theoretic Definition of Similarity
Article
 * Aug 1998

 * Dekang Lin

Similarity is an important and widely used concept. Previous definitions of
similarity are tied to a particular application or a form of knowledge
representation. We present an informationtheoretic definition of similarity that
is applicable as long as there is a probabilistic model. We demonstrate how our
definition can be used to measure the similarity in a number of different
domains.
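With information content IC(c) = -log p(c), Lin's definition reduces to the ratio of shared information to total information; a sketch:

```python
import math

def lin_sim(p_common, p1, p2):
    """Lin's similarity: twice the information content of the most
    specific common ancestor, over the summed information content of
    the two concepts themselves."""
    return 2 * -math.log(p_common) / (-math.log(p1) - math.log(p2))
```

When the only common ancestor is the root (probability 1), the numerator vanishes and the similarity is 0; identical concepts score 1.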
Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy
Article
 * Oct 1997

 * Jay J. Jiang
 * David W. Conrath

This paper presents a new approach for measuring semantic similarity/distance
between words and concepts. It combines a lexical taxonomy structure with corpus
statistical information so that the semantic distance between nodes in the
semantic space constructed by the taxonomy can be better quantified with the
computational evidence derived from a distributional analysis of corpus data.
Specifically, the proposed measure is a combined approach that inherits the
edge-based approach of the edge counting scheme, which is then enhanced by the
node-based approach of the information content calculation. When tested on a
common data set of word pair similarity ratings, the proposed approach
outperforms other computational models. It gives the highest correlation value
(r = 0.828) with a benchmark based on human similarity judgements, whereas an
upper bound (r = 0.885) is observed when human subjects replicate the same task.
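The combined measure is usually stated as a distance over information content, IC(c) = -log p(c) estimated from corpus counts; a sketch:

```python
import math

def jcn_distance(p1, p2, p_lcs):
    """Jiang-Conrath distance: the information content of the two
    concepts minus twice that of their lowest common subsumer; 0 for
    identical concepts, growing as they share less information."""
    return (-math.log(p1)) + (-math.log(p2)) - 2 * (-math.log(p_lcs))
```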
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
 * Jan 2018

 * J Devlin
 * M W Chang
 * K Lee
 * K Toutanova

Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: Bert: pre-training of deep
bidirectional transformers for language understanding. arXiv preprint
arXiv:1810.04805 (2018)

Multiple Positional Self-Attention Network for Text Classification
 * Feb 2020
 * 7610-7617

 * B Dai
 * J Li
 * R Xu

Dai, B., Li, J., Xu, R.: Multiple positional self-attention network for text
classification. In: The Thirty-Fourth AAAI Conference on Artificial
Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial
Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational
Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, 7-12 February
2020, pp. 7610-7617. AAAI Press (2020)





RECOMMENDED PUBLICATIONS

Article


A NOVEL SPLIT-SIM APPROACH FOR EFFICIENT IMAGE RETRIEVAL

April 2022 · Multimedia Systems
 * Aiswarya S Kumar
 * [...]
 * Jyothisha Nair

Recent advancements in computer vision have given image understanding and
retrieval a new face, but current image retrieval systems do not meet users'
demands for retrieving alike and meaningful images with respect to the query.
This paper proposes a novel method for image retrieval using scene graphs. We
generate scene graphs for images, each of which is a collection of objects
(apple, table), attributes ((apple, red), (table, wooden)) and relationships
(apple, on, table). When an image is given as a query, images whose scene graphs
are similar to the query image's scene graph are retrieved. An algorithm called
SPLIT–SIM is proposed to find the similarity between two scene graphs; it
grants rewards for similar objects, attributes and relationships. Based on the
rewards awarded, a final ranking list of images is generated for retrieval. The
proposed algorithm was evaluated at three levels (object, attribute and
relation), and the results demonstrate the efficiency of the proposed semantic
similarity measure, significantly improving on existing similarity measures.
Article


HIERARCHY-BASED SEMANTIC EMBEDDINGS FOR SINGLE-VALUED & MULTI-VALUED CATEGORICAL
VARIABLES

June 2022 · Journal of Intelligent Information Systems
 * Summaya Mumtaz
 * [...]
 * Martin Giese

In low-resource domains, it is challenging to achieve good performance using
existing machine learning methods due to a lack of training data and mixed data
types (numeric and categorical). In particular, categorical variables with high
cardinality pose a challenge to machine learning tasks such as classification
and regression because training requires sufficiently many data points for the
possible values of each variable. Since interpolation
is not possible, nothing can be learned for values not seen in the training set.
This paper presents a method that uses prior knowledge of the application domain
to support machine learning in cases with insufficient data. We propose to
address this challenge by using embeddings for categorical variables that are
based on an explicit representation of domain knowledge (KR), namely a hierarchy
of concepts. Our approach is to 1. define a semantic similarity measure between
categories, based on the hierarchy—we propose a purely hierarchy-based measure,
but other similarity measures from the literature can be used—and 2. use that
similarity measure to define a modified one-hot encoding. We propose two
embedding schemes for single-valued and multi-valued categorical data. We
perform experiments on three different use cases. We first compare existing
similarity approaches with our approach on a word pair similarity use case. This
is followed by creating word embeddings using different similarity approaches. A
comparison with existing methods such as Google, Word2Vec and GloVe embeddings
on several benchmarks shows better performance on concept categorisation tasks
when using knowledge-based embeddings. The third use case uses a medical dataset
to compare the performance of semantic-based embeddings and standard binary
encodings. Significant improvement in performance of the downstream
classification tasks is achieved by using semantic information.
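The "modified one-hot encoding" idea can be sketched as follows; the similarity function and category list here are hypothetical placeholders, not the paper's:

```python
import numpy as np

def semantic_encoding(value, categories, sim):
    """Similarity-smoothed one-hot vector: instead of a single 1 in the
    observed category's slot, each slot holds sim(value, slot_category),
    so hierarchically close categories receive nonzero mass."""
    return np.array([sim(value, c) for c in categories])

# Hypothetical toy similarity: 1 for identical, 0.5 for same first letter.
toy_sim = lambda a, b: 1.0 if a == b else (0.5 if a[0] == b[0] else 0.0)
```

In the paper's setting, `sim` would be the hierarchy-based semantic similarity derived from the concept taxonomy.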
Article


EMBEDDINGS EVALUATION USING A NOVEL MEASURE OF SEMANTIC SIMILARITY

March 2022 · Cognitive Computation
 * Anna Giabelli
 * Lorenzo Malandri
 * Fabio Mercorio
 * [...]
 * Navid Nobani

Lexical taxonomies and distributional representations are largely used to
support a wide range of NLP applications, including semantic similarity
measurements. Recently, several scholars have proposed new approaches to combine
those resources into unified representation preserving distributional and
knowledge-based lexical features. In this paper, we propose and implement
TaxoVec, a novel approach to selecting word embeddings
based on their ability to preserve taxonomic similarity. In TaxoVec, we first
compute the pairwise semantic similarity between taxonomic words through a new
measure we previously developed, the Hierarchical Semantic Similarity (HSS),
which we show outperforms previous measures on several benchmark tasks. Then, we
train several embedding models on a text corpus and select the best model, that
is, the model that maximizes the correlation between the HSS and the cosine
similarity of the pair of words that are in both the taxonomy and the corpus. To
evaluate TaxoVec, we repeat the embedding selection process using three other
semantic similarity benchmark measures. We use the vectors of the four selected
embeddings as machine learning model features to perform several NLP tasks. The
performances of those tasks constitute an extrinsic evaluation of the criteria
for the selection of the best embedding (i.e. the adopted semantic similarity
measure). Experimental results show that (i) HSS outperforms state-of-the-art
measures for measuring semantic similarity in taxonomy on a benchmark intrinsic
evaluation and (ii) the embedding selected through TaxoVec achieves a clear
victory against embeddings selected by the competing measures on benchmark NLP
tasks. We implemented the HSS, together with other benchmark measures of
semantic similarity, as a full-fledged Python package called TaxoSS, whose
documentation is available at https://pypi.org/project/TaxoSS.
Article


GONTOSIM: A SEMANTIC SIMILARITY MEASURE BASED ON LCA AND COMMON DESCENDANTS

March 2022 · Scientific Reports
 * Amna Binte Kamran
 * [...]
 * Hammad Naveed

The Gene Ontology (GO) is a controlled vocabulary that captures the semantics or
context of an entity based on its functional role. Biomedical entities are
frequently compared to each other to find similarities to help in data
annotation and knowledge transfer. In this study, we propose GOntoSim, a novel
method to determine the functional similarity between genes. GOntoSim quantifies
the similarity between pairs of GO terms, by taking the
graph structure and the information content of nodes into consideration. Our
measure quantifies the similarity between the ancestors of the GO terms
accurately. It also takes into account the common children of the GO terms.
GOntoSim is evaluated using the entire Enzyme Dataset containing 10,890 proteins
and 97,544 GO annotations. The enzymes are clustered and compared with the Gold
Standard EC numbers. At level 1 of the EC Numbers for Molecular Function,
GOntoSim achieves a purity score of 0.75, as compared to 0.47 and 0.51 for GOGO
and Wang, respectively. GOntoSim can handle the noisy IEA annotations. We
achieve a purity score
of 0.94 in contrast to 0.48 for both GOGO and Wang at level 1 of the EC Numbers
with IEA annotations. GOntoSim can be freely accessed at (
http://www.cbrlab.org/GOntoSim.html ).
Last Updated: 05 Jul 2022



© 2008-2022 ResearchGate GmbH. All rights reserved.