urlscan Public Scan of www.researchgate.net (2606:4700::6811:2169)
URL:
https://www.researchgate.net/publication/354359339_A_Novel_Neurofuzzy_Approach_for_Semantic_Similarity_Measurement
Submission: On July 26 via manual from AT — Scanned from DE
Form analysis: 3 forms found in the DOM, consisting of a GET site-search form and two POST login forms (header and modal).
Text Content
Chapter: A Novel Neurofuzzy Approach for Semantic Similarity Measurement
September 2021 * DOI: 10.1007/978-3-030-86534-4_18
In book: Big Data Analytics and Knowledge Discovery (pp. 192-203)

Authors:
* Jorge Martinez-Gil, Software Competence Center Hagenberg
* Riad Mokadem, Paul Sabatier University - Toulouse III
* Josef Küng
* Abdelkader Hameurlain
ABSTRACT

The problem of automatically identifying the degree of semantic similarity between two textual statements has grown in importance in recent times. Its impact on various computer-related domains, together with recent breakthroughs in neural computation, has increased the opportunities for better solutions to be developed. This research takes these efforts a step further by designing and developing a novel neurofuzzy approach to semantic textual similarity that uses neural networks and fuzzy logic. The fundamental notion is to combine the remarkable capabilities of current neural models for working with text with the possibilities that fuzzy logic provides for aggregating numerical information in a tailored manner. The results of our experiments suggest that this approach is capable of accurately determining semantic textual similarity.

NO FULL-TEXT AVAILABLE
To read the full-text of this research, you can request a copy directly from the authors.

CITATIONS (1)

End-to-End Generation of Multiple-Choice Questions Using Text-to-Text Transfer Transformer Models
Article * Jul 2022 * Expert Systems with Applications
The increasing worldwide adoption of e-learning tools and the widespread growth of online education have brought multiple challenges, including the ability to generate assessments at the scale and speed demanded by this environment. In this sense, recent advances in language models and architectures like the Transformer provide opportunities to explore how to assist educators in these tasks. This study focuses on using neural language models for the generation of questionnaires composed of multiple-choice questions, based on English Wikipedia articles as input. The problem is addressed along three dimensions: Question Generation (QG), Question Answering (QA), and Distractor Generation (DG). A processing pipeline based on pre-trained T5 language models is designed, and a REST API is implemented for its use. The DG task is defined in a text-to-text format, and a T5 model is fine-tuned on the DG-RACE dataset, showing an improvement in the ROUGE-L metric over the reference for the dataset. The lack of an adequate metric for DG is discussed, and cosine similarity using word embeddings is considered as a complement. Questionnaires were evaluated by human experts, who report that questions and options are generally well formed; however, they are more oriented to measuring retention than comprehension.

REFERENCES (34)

A reproducible survey on word embeddings and ontology-based methods for word similarity: Linear combinations outperform the state of the art
Article * Full-text available * Aug 2019 * Engineering Applications of Artificial Intelligence
Juan José Lastra-Díaz, Josu Goikoetxea, Mohamed Ali Hadj Taieb, Eneko Agirre
Human similarity and relatedness judgements between concepts underlie most cognitive capabilities, such as categorisation, memory, decision-making and reasoning. For this reason, the proposal of methods for estimating the degree of similarity and relatedness between words and concepts has been a very active line of research in the fields of artificial intelligence, information retrieval and natural language processing, among others. The main approaches proposed in the literature can be categorised into two large families: (1) ontology-based semantic similarity measures (OM), and (2) distributional measures, whose most recent and successful methods are based on word embedding (WE) models. However, the lack of a deep analysis of both families of methods slows down the advance of this line of research and its applications. This work introduces the largest, most reproducible and most detailed experimental survey of OM measures and WE models reported in the literature, based on the evaluation of both families of methods on the same software platform, with the aim of elucidating the state of the problem. We show that WE models which combine distributional and ontology-based information obtain the best results and, in addition, we show for the first time that a simple average of the two best-performing WE models with other ontology-based measures or WE models improves the state of the art by a large margin. We also provide a very detailed reproducibility protocol, together with a collection of software tools and datasets, as complementary material to allow the exact replication of our results.

Universal Sentence Encoder for English
Conference Paper * Full-text available * Jan 2018
Daniel Cer, Yinfei Yang, Sheng-yi Kong, Ray Kurzweil

Semantic similarity aggregators for very short textual expressions: a case study on landmarks and points of interest
Article * Full-text available * Oct 2019 * Journal of Intelligent Information Systems
Jorge Martinez-Gil
Semantic similarity measurement aims to automatically compute the degree of similarity between two textual expressions that use different representations for naming the same concepts. However, very short textual expressions cannot always follow the syntax of a written language and, in general, do not provide enough information to support proper analysis. This means that in some fields, such as the processing of landmarks and points of interest, results are not entirely satisfactory. To overcome this situation, we explore the idea of aggregating existing methods by means of two novel aggregation operators that aim to model an appropriate interaction between the similarity measures. As a result, we have been able to improve the results of existing techniques when solving GeReSiD and SDTS, two of the most popular benchmark datasets for dealing with geographical information.

Universal Sentence Encoder
Article * Full-text available * Mar 2018
Daniel Cer, Yinfei Yang, Sheng-yi Kong, Ray Kurzweil
We present models for encoding sentences into embedding vectors that specifically target transfer learning to other NLP tasks. The models are efficient and result in accurate performance on diverse transfer tasks. Two variants of the encoding models allow for trade-offs between accuracy and compute resources. For both variants, we investigate and report the relationship between model complexity, resource consumption, the availability of transfer task training data, and task performance. Comparisons are made with baselines that use word-level transfer learning via pretrained word embeddings, as well as baselines that do not use any transfer learning. We find that transfer learning using sentence embeddings tends to outperform word-level transfer. With transfer learning via sentence embeddings, we observe surprisingly good performance with minimal amounts of supervised training data for a transfer task. We obtain encouraging results on Word Embedding Association Tests (WEAT) targeted at detecting model bias. Our pre-trained sentence encoding models are made freely available for download and on TF Hub.
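Several of the works listed here reduce sentence comparison to the cosine similarity between embedding vectors. The sketch below illustrates only that final step, using toy hand-written vectors in place of the 512-dimensional embeddings an encoder such as the Universal Sentence Encoder would actually produce:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 4-dimensional "sentence embeddings" (invented values for illustration;
# a real encoder would produce much higher-dimensional vectors).
emb_a = [0.9, 0.1, 0.3, 0.4]    # "a cat sat on the mat"
emb_b = [0.8, 0.2, 0.4, 0.3]    # "a kitten rested on the rug" (similar)
emb_c = [-0.5, 0.9, -0.2, 0.1]  # "stock prices fell sharply" (unrelated)

print(round(cosine_similarity(emb_a, emb_b), 3))  # close to 1.0
print(round(cosine_similarity(emb_a, emb_c), 3))  # near or below 0
```

Scores close to 1 indicate semantically similar sentences; values near zero or negative indicate unrelated ones. These raw scores are exactly the numerical inputs that the aggregation techniques surveyed below operate on.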
View Show abstract HESML: A scalable ontology-based semantic similarity measures library with a set of reproducible experiments and a replication dataset Article Full-text available * Jun 2017 * INFORM SYST * Juan José Lastra-Díaz * Ana M Garcia-Serrano * Montserrat Batet * Fernando Chirigati This work is a detailed companion reproducibility paper of the methods and experiments proposed by Lastra-Díaz and García-Serrano in [56, 57, 58], which introduces the following contributions: (1) a new and efficient representation model for taxonomies, called PosetHERep, which is an adaptation of the half-edge data structure commonly used to represent discrete manifolds and planar graphs; (2) a new Java software library called the Half-Edge Semantic Measures Library (HESML) based on PosetHERep, which implements most ontology-based semantic similarity measures and Information Content (IC) models reported in the literature; (3) a set of reproducible experiments on word similarity based on HESML and ReproZip with the aim of exactly reproducing the experimental surveys in the three aforementioned works; (4) a replication framework and dataset, called WNSimRep v1, whose aim is to assist the exact replication of most methods reported in the literature; and finally, (5) a set of scalability and performance benchmarks for semantic measures libraries. PosetHERep and HESML are motivated by several drawbacks in the current semantic measures libraries, especially the performance and scalability, as well as the evaluation of new methods and the replication of most previous methods. The reproducible experiments introduced herein are encouraged by the lack of a set of large, self-contained and easily reproducible experiments with the aim of replicating and confirming previously reported results. Likewise, the WNSimRep v1 dataset is motivated by the discovery of several contradictory results and difficulties in reproducing previously reported methods and experiments. 
PosetHERep proposes a memory-efficient representation for taxonomies which linearly scales with the size of the taxonomy and provides an efficient implementation of most taxonomy-based algorithms used by the semantic measures and IC models, whilst HESML provides an open framework to aid research into the area by providing a simpler and more efficient software architecture than the current software libraries. Finally, we prove the outperformance of HESML on the state-of-the-art libraries, as well as the possibility of significantly improving their performance and scalability without caching using PosetHERep. View Show abstract CoTO: A Novel Approach for Fuzzy Aggregation of Semantic Similarity Measures Article Full-text available * Feb 2016 * COGN SYST RES * Jorge Martinez-Gil Semantic similarity measurement aims to determine the likeness between two text expressions that use different lexicographies for representing the same real object or idea. There are a lot of semantic similarity measures for addressing this problem. However, the best results have been achieved when aggregating a number of simple similarity measures. This means that after the various similarity values have been calculated, the overall similarity for a pair of text expressions is computed using an aggregation function of these individual semantic similarity values. This aggregation is often computed by means of statistical functions. In this work, we present CoTO (Consensus or Trade-Off) a solution based on fuzzy logic that is able to outperform these traditional approaches. View Show abstract Semantic Similarity from Natural Language and Ontology Analysis Book Full-text available * May 2015 * Sébastien Harispe * Sylvie Ranwez * Stefan Janaqi * Jacky Montmain Artificial Intelligence federates numerous scientific fields in the aim of developing machines able to assist human operators performing complex treatments -- most of which demand high cognitive skills (e.g. learning or decision processes). 
Central to this quest is to give machines the ability to estimate the likeness or similarity between things in the way human beings estimate the similarity between stimuli. In this context, this book focuses on semantic measures: approaches designed for comparing semantic entities such as units of language, e.g. words, sentences, or concepts and instances defined into knowledge bases. The aim of these measures is to assess the similarity or relatedness of such semantic entities by taking into account their semantics, i.e. their meaning -- intuitively, the words tea and coffee, which both refer to stimulating beverage, will be estimated to be more semantically similar than the words toffee (confection) and coffee, despite that the last pair has a higher syntactic similarity. The two state-of-the-art approaches for estimating and quantifying semantic similarities/relatedness of semantic entities are presented in detail: the first one relies on corpora analysis and is based on Natural Language Processing techniques and semantic models while the second is based on more or less formal, computer-readable and workable forms of knowledge such as semantic networks, thesaurus or ontologies. (...) Beyond a simple inventory and categorization of existing measures, the aim of this monograph is to convey novices as well as researchers of these domains towards a better understanding of semantic similarity estimation and more generally semantic measures. View Show abstract jFuzzyLogic: A Java Library to Design Fuzzy Logic Controllers According to the Standard for Fuzzy Control Programming Article Full-text available * Jun 2013 * Pablo Cingolani * Jesus Alcala-Fdez Fuzzy Logic Controllers are a specific model of Fuzzy Rule Based Systems suitable for engineering applications for which classic control strategies do not achieve good results or for when it is too difficult to obtain a mathematical model. 
Recently, the International Electrotechnical Commission has published a standard for fuzzy control programming in part 7 of the IEC 61131 norm in order to offer a well defined common understanding of the basic means with which to integrate fuzzy control applications in control systems. In this paper, we introduce an open source Java library called jFuzzyLogic which offers a fully functional and complete implementation of a fuzzy inference system according to this standard, providing a programming interface and Eclipse plugin to easily write and test code for fuzzy control applications. A case study is given to illustrate the use of jFuzzyLogic. View Show abstract Distributed Representations of Words and Phrases and their Compositionality Article Full-text available * Oct 2013 * Adv Neural Inform Process Syst * Tomas Mikolov * Ilya Sutskever * Kai Chen * Jeffrey Dean The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible. View Show abstract The Google Similarity Distance Article Full-text available * Apr 2007 * Rudi Cilibrasi * Paul M. B. 
Vitányi Words and phrases acquire meaning from the way they are used in society, from their relative semantics to other words and phrases. For computers, the equivalent of "society" is "database," and the equivalent of "use" is "a way to search the database". We present a new theory of similarity between words and phrases based on information distance and Kolmogorov complexity. To fix thoughts, we use the World Wide Web (WWW) as the database, and Google as the search engine. The method is also applicable to other search engines and databases. This theory is then applied to construct a method to automatically extract similarity, the Google similarity distance, of words and phrases from the WWW using Google page counts. The WWW is the largest database on earth, and the context information entered by millions of independent users averages out to provide automatic semantics of useful quality. We give applications in hierarchical clustering, classification, and language translation. We give examples to distinguish between colors and numbers, cluster names of paintings by 17th century Dutch masters and names of books by English novelists, the ability to understand emergencies and primes, and we demonstrate the ability to do a simple automatic English-Spanish translation. Finally, we use the WordNet database as an objective baseline against which to judge the performance of our method. We conduct a massive randomized trial in binary classification using support vector machines to learn categories based on our Google distance, resulting in an a mean agreement of 87 percent with the expert crafted WordNet categories View Show abstract Multiple Positional Self-Attention Network for Text Classification Article * Apr 2020 * Biyun Dai * Jinlong Li * Ruoyi Xu Self-attention mechanisms have recently caused many concerns on Natural Language Processing (NLP) tasks. Relative positional information is important to self-attention mechanisms. 
We propose Faraway Mask focusing on the (2m + 1)-gram words and Scaled-Distance Mask putting the logarithmic distance punishment to avoid and weaken the self-attention of distant words respectively. To exploit different masks, we present Positional Self-Attention Layer for generating different Masked-Self-Attentions and a following Position-Fusion Layer in which fused positional information multiplies the Masked-Self-Attentions for generating sentence embeddings. To evaluate our sentence embeddings approach Multiple Positional Self-Attention Network (MPSAN), we perform the comparison experiments on sentiment analysis, semantic relatedness and sentence classification tasks. The result shows that our MPSAN outperforms state-of-the-art methods on five datasets and the test accuracy is improved by 0.81%, 0.6% on SST, CR datasets, respectively. In addition, we reduce training parameters and improve the time efficiency of MPSAN by lowering the dimension number of self-attention and simplifying fusion mechanism. View Show abstract Automatic Design of Semantic Similarity Controllers based on Fuzzy Logics Article * Apr 2019 * EXPERT SYST APPL * Jorge Martinez-Gil * Jose M. Chaves-González Recent advances in machine learning have been able to make improvements over the state-of-the-art regarding semantic similarity measurement techniques. In fact, we have all seen how classical techniques have given way to promising neural techniques. Nonetheless, these new techniques have a weak point: they are hardly interpretable. For this reason, we have oriented our research towards the design of strategies being able to be accurate enough but without sacrificing their interpretability. As a result, we have obtained a strategy for the automatic design of semantic similarity controllers based on fuzzy logics, which are automatically identified using genetic algorithms (GAs). 
After an exhaustive evaluation using a number of well-known benchmark datasets, we can conclude that our strategy fulfills both expectations: it achieves reasonably good results and, at the same time, offers a high degree of interpretability. Distributed Representations of Words and Phrases and their Compositionality Conference Paper * Jan 2013 * Adv Neural Inform Process Syst * Tomas Mikolov * Kai Chen * G.s. Corrado * Jeffrey Dean The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible. An experiment in linguistic synthesis with a fuzzy logic controller Article * Jan 1995 * Int J Man Mach Stud * E. Mamdani LWCR: Multi-Layered Wikipedia representation for Computing word Relatedness Article * Aug 2016 * NEUROCOMPUTING * Mohamed Ben Aouicha * Mohamed Ali Hadj Taieb * Abdelmajid Ben Hamadou The measurement of semantic relatedness between words has gained increasing interest in several research fields, including cognitive science, artificial intelligence, biology, and linguistics.
The development of efficient measures is based on knowledge resources, such as Wikipedia, a huge and living encyclopedia supplied by net surfers. In this paper, we propose a novel approach based on a multi-Layered Wikipedia representation for Computing word Relatedness (LWCR), exploiting a weighting scheme based on the Wikipedia Category Graph (WCG): Term Frequency-Inverse Category Frequency (tf×icf). Our proposal provides, for each category pertaining to the WCG, a Category Description Vector (CDV) holding the weights of stems extracted from the articles assigned to that category. The semantic relatedness degree is computed using the cosine measure between the CDVs assigned to the target word pair. This basic idea is followed by enhancement modules exploiting other Wikipedia features, such as article titles, the redirection mechanism, and neighborhood category enrichment, to exploit semantic features and better quantify the semantic relatedness between words. To the best of our knowledge, this is the first attempt to incorporate a WCG-based term-weighting scheme (tf×icf) into a computational model of semantic relatedness. It is also the first work to exploit 17 datasets in the assessment process, divided into two sets. The first set includes those designed for semantic similarity purposes: RG65, MC30, AG203, WP300, SimLexNoun666 and GeReSiD50Sim; the second includes datasets for semantic relatedness evaluation: WordSim353, GM30, Zeigler25, Zeigler30, MTurk287, MTurk771, MEN3000, Rel122, ReWord26, GeReSiD50 and SCWS1229. The results are compared to WordNet-based measures and to the distributional measures cosine and PMI computed on Wikipedia articles. Experiments show that our approach provides consistent improvements over state-of-the-art results on multiple benchmarks.
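The CDV comparison described above reduces to a cosine between sparse weighted term vectors. A minimal sketch, assuming icf is defined analogously to idf but over categories (the stems, counts, and category numbers below are invented for illustration):

```python
from math import log, sqrt

def tf_icf(tf, cf, n_categories):
    """Term frequency x inverse category frequency (assumed tf-idf analogue):
    tf: frequency of a stem in one category's articles;
    cf: number of categories whose articles contain the stem."""
    return tf * log(n_categories / cf)

def cosine(u, v):
    """Cosine similarity between two sparse weight vectors given as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = sqrt(sum(w * w for w in u.values()))
    nv = sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy CDVs for two categories sharing the stem "music" (100 categories assumed).
cdv_a = {"music": tf_icf(10, 2, 100), "guitar": tf_icf(4, 1, 100)}
cdv_b = {"music": tf_icf(7, 2, 100), "opera": tf_icf(3, 1, 100)}
print(round(cosine(cdv_a, cdv_b), 3))  # → 0.808
```

Stems that occur in many categories receive a low icf and thus contribute little to the cosine, which is the intended discriminative effect of the weighting.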
Enriching Word Vectors with Subword Information Article * Jul 2016 * Piotr Bojanowski * Edouard Grave * Armand Joulin * Tomas Mikolov Continuous word representations, trained on large unlabeled corpora, are useful for many natural language processing tasks. Many popular models that learn such representations ignore the morphology of words by assigning a distinct vector to each word. This is a limitation, especially for morphologically rich languages with large vocabularies and many rare words. In this paper, we propose a new approach based on the skip-gram model, where each word is represented as a bag of character n-grams. A vector representation is associated with each character n-gram, and words are represented as the sum of these representations. Our method is fast, allowing models to be trained on large corpora quickly. We evaluate the obtained word representations on five different languages, on word similarity and analogy tasks. Combining local context and WordNet similarity for word sense identification Article * Jan 1998 * Claudia Leacock * Martin Chodorow * C. Fellbaum Improving Vector Space Word Representations Using Multilingual Correlation Article * Jan 2014 * Manaal Faruqui * Chris Dyer The distributional hypothesis of Harris (1954), according to which the meaning of words is evidenced by the contexts they occur in, has motivated several effective techniques for obtaining vector space semantic representations of words using unannotated text corpora. This paper argues that lexico-semantic content should additionally be invariant across languages and proposes a simple technique based on canonical correlation analysis (CCA) for incorporating multilingual evidence into vectors generated monolingually. We evaluate the resulting word representations on standard lexical semantic evaluation tasks and show that our method produces substantially better semantic representations than monolingual techniques.
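The bag-of-character-n-grams representation from "Enriching Word Vectors with Subword Information" above can be sketched as follows; the `<`/`>` boundary markers and the inclusion of the whole word as a special sequence follow the paper, and the 3-to-6 range is its reported default:

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word with boundary markers, fastText-style;
    the special sequence for the whole word is included as well."""
    w = f"<{word}>"
    grams = {w[i:i + n] for n in range(n_min, n_max + 1)
             for i in range(len(w) - n + 1)}
    grams.add(w)
    return grams

# The paper's example: 3-grams of "where", plus the whole-word sequence.
print(sorted(char_ngrams("where", n_min=3, n_max=3)))
# → ['<wh', '<where>', 'ere', 'her', 're>', 'whe']
```

A word vector is then the sum of the vectors of its n-grams, which is why unseen or rare words still receive sensible representations.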
Improving word representations via global context and multiple word prototypes Conference Paper * Jul 2012 * Eric H. Huang * Richard Socher * Christopher D. Manning * Andrew Y. Ng Unsupervised word representations are very useful in NLP tasks, both as inputs to learning algorithms and as extra word features in NLP systems. However, most of these models are built with only local context and one representation per word. This is problematic because words are often polysemous, and global context can also provide useful information for learning word meanings. We present a new neural network architecture which 1) learns word embeddings that better capture the semantics of words by incorporating both local and global document context, and 2) accounts for homonymy and polysemy by learning multiple embeddings per word. We introduce a new dataset with human judgments on pairs of words in sentential context, and evaluate our model on it, showing that our model outperforms competitive baselines and other neural language models. An experiment in linguistic synthesis of fuzzy controllers Article * Jan 1974 * E.H. Mamdani * S. Assilian Fuzzy Identification of Systems and Its Applications to Modeling and Control Article * Jan 1985 * Tomohiro Takagi * Michio Sugeno A mathematical tool to build a fuzzy model of a system where fuzzy implications and reasoning are used is presented. The premise of an implication is the description of a fuzzy subspace of inputs, and its consequence is a linear input-output relation. The method of identification of a system using its input-output data is then shown. Two applications of the method to industrial processes are also discussed: a water cleaning process and a converter in a steel-making process. Contextual Correlates of Semantic Similarity Article * Jan 1991 * LANG COGNITIVE PROC * George A. Miller * Walter G.
Charles The relationship between semantic and contextual similarity is investigated for pairs of nouns that vary from high to low semantic similarity. Semantic similarity is estimated by subjective ratings; contextual similarity is estimated by the method of sorting sentential contexts. The results show an inverse linear relationship between similarity of meaning and the discriminability of contexts. This relation is obtained for two separate corpora of sentence contexts. It is concluded that, on average, for words in the same language drawn from the same syntactic and semantic categories, the more often two words can be substituted into the same contexts, the more similar in meaning they are judged to be. Sugeno, M.: Fuzzy Identification of Systems and its Applications to Modeling and Control. IEEE Transactions on Systems, Man, and Cybernetics SMC-15(1), 116-132 Article * Jan 1985 * IEEE Trans Syst Man Cybern Syst Hum * Tomohiro Takagi * Michio Sugeno A Historical Review of Evolutionary Learning Methods for Mamdani-type Fuzzy Rule-based Systems: Designing Interpretable Genetic Fuzzy Systems Article * Sep 2011 * INT J APPROX REASON * Oscar Cordon The need to trade off interpretability and accuracy is intrinsic to the use of fuzzy systems. Obtaining accurate yet human-comprehensible fuzzy systems played a key role in Zadeh and Mamdani's seminal ideas and system identification methodologies.
Nevertheless, before the advent of soft computing, accuracy progressively became the main concern of fuzzy model builders, making the resulting fuzzy systems ever closer to black-box models such as neural networks. Fortunately, the fuzzy modeling scientific community has come back to its origins by considering design techniques that deal with the interpretability-accuracy tradeoff. In particular, the use of genetic fuzzy systems has become widespread thanks to their inherent flexibility and their capability to jointly consider different optimization criteria. The current contribution reviews the most representative genetic fuzzy systems relying on Mamdani-type fuzzy rule-based systems to obtain interpretable linguistic fuzzy models with good accuracy. An Experiment in Linguistic Synthesis with a Fuzzy Logic Controller Article * Jan 1975 * E.H. Mamdani * S. Assilian This paper describes an experiment on the "linguistic" synthesis of a controller for a model industrial plant (a steam engine). Fuzzy logic is used to convert heuristic control rules stated by a human operator into an automatic control strategy. The experiment was initiated to investigate the possibility of human interaction with a learning controller. However, the control strategy set up linguistically proved to be far better than expected in its own right, and the basic experiment of linguistic control synthesis in a non-learning controller is reported here. Semantic Similarity in a Taxonomy: An Information Based Measure and Its Application to Problems of Ambiguity in Natural Language Article * Jul 1999 * JAIR * Ps Resnik This article presents a measure of semantic similarity in an IS-A taxonomy based on the notion of shared information content. Experimental evaluation against a benchmark set of human similarity judgments demonstrates that the measure performs better than the traditional edge-counting approach.
The article presents algorithms that take advantage of taxonomic similarity in resolving syntactic and semantic ambiguity, along with experimental results demonstrating their effectiveness. Automatic Generation of Fuzzy Rule-based Models from Data by Genetic Algorithms Article * Mar 2003 * INFORM SCIENCES * Plamen P. Angelov * R.A. Buswell A methodology for the encoding of the chromosome of a genetic algorithm (GA) is described in the paper. The encoding procedure is applied to the problem of automatically generating fuzzy rule-based models from data. Models generated by this approach have much of the flexibility of black-box methods, such as neural networks. In addition, they implicitly express information about the process being modelled through the linguistic terms associated with the rules. They can be applied to problems that are too complex to model in a first-principles sense and can reduce the computational overhead compared to established first-principles models. The encoding mechanism allows the rule-base structure and the parameters of the fuzzy model to be estimated simultaneously from data. The principal advantage is the preservation of the linguistic concept without the need to consider the entire rule base. The GA searches for the optimum solution over a comparatively small number of rules compared to all possible rules. This minimises the computational demand of model generation and allows problems with realistic dimensions to be considered. A further feature is that the rules are extracted from the data without the need to establish any information about the model structure a priori. The implementation of the algorithm is described, and the approach is applied to the modelling of components of heating, ventilating and air-conditioning systems. Errata to "Flexible neuro-fuzzy systems".
Article * May 2003 * Leszek Rutkowski * Krzysztof Cpałka In this paper, we derive new neuro-fuzzy structures called flexible neuro-fuzzy inference systems, or FLEXNFIS. Based on the input-output data, we learn not only the parameters of the membership functions but also the type of the system (Mamdani or logical). Moreover, we introduce: 1) softness to fuzzy implication operators, to the aggregation of rules and to the connectives of antecedents; 2) certainty weights to the aggregation of rules and to the connectives of antecedents; and 3) parameterized families of T-norms and S-norms to fuzzy implication operators, to the aggregation of rules and to the connectives of antecedents. Our approach introduces more flexibility to the structure and design of neuro-fuzzy systems. Through computer simulations, we show that Mamdani-type systems are more suitable for approximation problems, whereas logical-type systems may be preferred for classification problems. An Approach for Measuring Semantic Similarity between Words Using Multiple Information Sources Article * Aug 2003 * Yuhua Li * Zuhair Bandar * David McLean Originally published, following peer review, in IEEE Transactions on Knowledge and Data Engineering. Semantic similarity between words is becoming a generic problem for many applications of computational linguistics and artificial intelligence. This paper explores the determination of semantic similarity from a number of information sources, which consist of structural semantic information from a lexical taxonomy and information content from a corpus. To investigate how information sources can be used effectively, a variety of strategies for using the possible information sources are implemented. A new measure is then proposed which combines the information sources nonlinearly.
Experimental evaluation against a benchmark set of human similarity ratings demonstrates that the proposed measure significantly outperforms traditional similarity measures. Errata to "Flexible neuro-fuzzy systems". Article * Feb 2003 * Leszek Rutkowski * Krzysztof Cpałka An Information-Theoretic Definition of Similarity Article * Aug 1998 * Dekang Lin Similarity is an important and widely used concept. Previous definitions of similarity are tied to a particular application or a form of knowledge representation. We present an information-theoretic definition of similarity that is applicable as long as there is a probabilistic model. We demonstrate how our definition can be used to measure similarity in a number of different domains. Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy Article * Oct 1997 * Jay J. Jiang * David W. Conrath This paper presents a new approach for measuring semantic similarity/distance between words and concepts. It combines a lexical taxonomy structure with corpus statistical information so that the semantic distance between nodes in the semantic space constructed by the taxonomy can be better quantified with the computational evidence derived from a distributional analysis of corpus data. Specifically, the proposed measure is a combined approach that inherits the edge-based approach of the edge-counting scheme, which is then enhanced by the node-based approach of information content calculation. When tested on a common data set of word pair similarity ratings, the proposed approach outperforms other computational models. It gives the highest correlation value (r = 0.828) with a benchmark based on human similarity judgements, whereas an upper bound (r = 0.885) is observed when human subjects replicate the same task.
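The information-content measures discussed in the abstracts above (Resnik's shared-information measure, Lin's information-theoretic definition, and the Jiang-Conrath combined approach) share one ingredient: the information content IC(c) = -log p(c) of a concept, evaluated in particular at the least common subsumer (LCS) of two concepts. A minimal sketch, assuming concept probabilities have already been estimated from a corpus (the toy probabilities below are invented):

```python
from math import log

def ic(p):
    """Information content of a concept with corpus probability p."""
    return -log(p)

def resnik(p_lcs):
    """Resnik similarity: IC of the least common subsumer (LCS)."""
    return ic(p_lcs)

def lin(p1, p2, p_lcs):
    """Lin similarity: ratio of shared to total information content (in [0, 1])."""
    return 2 * ic(p_lcs) / (ic(p1) + ic(p2))

def jiang_conrath_distance(p1, p2, p_lcs):
    """Jiang-Conrath distance: unshared information content (0 = identical)."""
    return ic(p1) + ic(p2) - 2 * ic(p_lcs)

# Toy probabilities: two specific concepts under a more general subsumer.
p_cat, p_dog, p_carnivore = 0.001, 0.002, 0.01
print(round(lin(p_cat, p_dog, p_carnivore), 3))  # → 0.702
```

The three measures order word pairs differently in general: Resnik looks only at the subsumer, while Lin and Jiang-Conrath also penalize information specific to each concept.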
Bert: pre-training of deep bidirectional transformers for language understanding * Jan 2018 * J Devlin * M W Chang * K Lee * K Toutanova Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: Bert: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018) Dai, B., Li, J., Xu, R.: Multiple positional self-attention network for text classification. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, 7-12 February 2020, pp. 7610-7617. AAAI Press (2020) RECOMMENDED PUBLICATIONS A NOVEL SPLIT-SIM APPROACH FOR EFFICIENT IMAGE RETRIEVAL Article * April 2022 · Multimedia Systems * Aiswarya S Kumar * [...] * Jyothisha Nair Recent advancements in computer vision have given image understanding and retrieval a new face. However, current image retrieval systems do not meet users' demands for retrieving alike and meaningful images with respect to a query. This paper proposes a novel method for image retrieval using scene graphs. We generate scene graphs for images, each of which is a collection of objects (apple, table), attributes ((apple, red), (table, wooden)) and relationships (apple, on, table). When an image is given as a query, images whose scene graphs are similar to the query image's scene graph are retrieved. An algorithm called SPLIT-SIM is proposed to find the similarity between two scene graphs, granting rewards for similar objects, attributes and relationships. Based on these rewards, a final ranking list of images is generated for retrieval.
The proposed algorithm was evaluated at three levels: object level, attribute level and relation level, and the results demonstrate the efficiency of the proposed semantic similarity measure, improving significantly on existing similarity measures. HIERARCHY-BASED SEMANTIC EMBEDDINGS FOR SINGLE-VALUED & MULTI-VALUED CATEGORICAL VARIABLES Article * June 2022 · Journal of Intelligent Information Systems * Summaya Mumtaz * [...] * Martin Giese In low-resource domains, it is challenging to achieve good performance using existing machine learning methods due to a lack of training data and mixed data types (numeric and categorical). In particular, categorical variables with high cardinality pose a challenge to machine learning tasks such as classification and regression because training requires sufficiently many data points for the possible values of each variable. Since interpolation is not possible, nothing can be learned for values not seen in the training set. This paper presents a method that uses prior knowledge of the application domain to support machine learning in cases with insufficient data. We propose to address this challenge by using embeddings for categorical variables that are based on an explicit representation of domain knowledge (KR), namely a hierarchy of concepts. Our approach is to 1. define a semantic similarity measure between categories based on the hierarchy (we propose a purely hierarchy-based measure, but other similarity measures from the literature can be used), and 2. use that similarity measure to define a modified one-hot encoding. We propose two embedding schemes for single-valued and multi-valued categorical data. We perform experiments on three different use cases. We first compare existing similarity approaches with our approach on a word-pair similarity use case. This is followed by creating word embeddings using different similarity approaches.
A comparison with existing methods such as Google, Word2Vec and GloVe embeddings on several benchmarks shows better performance on concept categorisation tasks when using knowledge-based embeddings. The third use case uses a medical dataset to compare the performance of semantic-based embeddings and standard binary encodings. A significant improvement in the performance of the downstream classification tasks is achieved by using semantic information. EMBEDDINGS EVALUATION USING A NOVEL MEASURE OF SEMANTIC SIMILARITY Article * March 2022 · Cognitive Computation * Anna Giabelli * Lorenzo Malandri * Fabio Mercorio * [...] * Navid Nobani Lexical taxonomies and distributional representations are widely used to support a broad range of NLP applications, including semantic similarity measurement. Recently, several scholars have proposed new approaches to combine these resources into unified representations preserving distributional and knowledge-based lexical features. In this paper, we propose and implement TaxoVec, a novel approach to selecting word embeddings based on their ability to preserve taxonomic similarity. In TaxoVec, we first compute the pairwise semantic similarity between taxonomic words through a measure we previously developed, the Hierarchical Semantic Similarity (HSS), which we show outperforms previous measures on several benchmark tasks. Then, we train several embedding models on a text corpus and select the best model, that is, the model that maximizes the correlation between the HSS and the cosine similarity of the pairs of words that appear in both the taxonomy and the corpus. To evaluate TaxoVec, we repeat the embedding selection process using three other semantic similarity benchmark measures. We use the vectors of the four selected embeddings as machine learning model features to perform several NLP tasks.
The performance of those tasks constitutes an extrinsic evaluation of the criterion for selecting the best embedding (i.e. the adopted semantic similarity measure). Experimental results show that (i) HSS outperforms state-of-the-art measures for measuring semantic similarity in a taxonomy on a benchmark intrinsic evaluation, and (ii) the embedding selected through TaxoVec achieves a clear victory over embeddings selected by the competing measures on benchmark NLP tasks. We implemented HSS, together with other benchmark measures of semantic similarity, as a full-fledged Python package called TaxoSS, whose documentation is available at https://pypi.org/project/TaxoSS. GONTOSIM: A SEMANTIC SIMILARITY MEASURE BASED ON LCA AND COMMON DESCENDANTS Article * March 2022 · Scientific Reports * Amna Binte Kamran * [...] * Hammad Naveed The Gene Ontology (GO) is a controlled vocabulary that captures the semantics or context of an entity based on its functional role. Biomedical entities are frequently compared to each other to find similarities, to help in data annotation and knowledge transfer. In this study, we propose GOntoSim, a novel method to determine the functional similarity between genes. GOntoSim quantifies the similarity between pairs of GO terms by taking the graph structure and the information content of nodes into consideration. Our measure quantifies the similarity between the ancestors of the GO terms accurately. It also takes into account the common children of the GO terms. GOntoSim is evaluated using the entire Enzyme Dataset, containing 10,890 proteins and 97,544 GO annotations. The enzymes are clustered and compared with the gold-standard EC numbers. At level 1 of the EC numbers for Molecular Function, GOntoSim achieves a purity score of 0.75, compared to 0.47 and 0.51 for GOGO and Wang, respectively. GOntoSim can handle the noisy IEA annotations.
We achieve a purity score of 0.94, in contrast to 0.48 for both GOGO and Wang, at level 1 of the EC numbers with IEA annotations. GOntoSim can be freely accessed at http://www.cbrlab.org/GOntoSim.html. Last Updated: 05 Jul 2022