www.researchgate.net
2606:4700::6811:2169
Public Scan
URL:
https://www.researchgate.net/publication/257391740_Evolutionary_algorithm_based_on_different_semantic_similarity_functions_fo...
Submission: On August 26 via manual from AT
Form analysis
3 forms found in the DOM: a GET search form and two POST login forms (both named loginForm, a header and a modal variant) posting to https://www.researchgate.net/login.
Text Content
Article
Evolutionary algorithm based on different semantic similarity functions for synonym recognition in the biomedical domain
* January 2013 * Knowledge-Based Systems 37:62–69 * DOI: 10.1016/j.knosys.2012.07.005
* Project: Semantic Similarity Measurement
Authors: Jose M. Chaves-González (Universidad de Extremadura), Jorge Martinez-Gil (Software Competence Center Hagenberg)

No full-text available. To read the full-text of this research, you can request a copy directly from the authors.

ABSTRACT
One of the most challenging problems in the semantic web field consists of computing the semantic similarity between different terms. The difficulty lies in the lack of accurate dictionaries for domain-specific and dynamic fields such as biomedicine or finance. In this article we propose a new approach which uses different existing semantic similarity methods to obtain precise results in the biomedical domain. Specifically, we have developed an evolutionary algorithm which uses information provided by different semantic similarity metrics. Our results have been validated against a variety of biomedical datasets and different collections of similarity functions. The proposed system provides very high-quality results when compared against similarity ratings provided by human experts (in terms of the Pearson correlation coefficient), surpassing the results of other relevant works previously published in the literature.

CITATIONS (21) * REFERENCES (32)
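The approach summarized in the abstract, evolving a combination of individual similarity metrics and scoring candidates by Pearson correlation against human expert ratings, can be sketched as follows. This is a minimal toy illustration, not the authors' published algorithm: the per-pair metric scores and the human ratings are invented, and a simple (1+1) evolutionary strategy over linear weights stands in for their actual evolutionary algorithm.

```python
import random
import statistics

def pearson(xs, ys):
    # Pearson correlation coefficient between two equal-length sequences.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Toy data: each row holds the scores of 3 similarity metrics for one
# term pair; `human` holds the expert rating for the same pair.
metric_scores = [
    (0.9, 0.7, 0.8), (0.2, 0.1, 0.3), (0.6, 0.5, 0.7),
    (0.8, 0.9, 0.6), (0.1, 0.2, 0.2), (0.5, 0.4, 0.6),
]
human = [0.95, 0.15, 0.55, 0.85, 0.05, 0.50]

def combined(weights):
    # Aggregate the individual metrics into one score per term pair.
    return [sum(w * s for w, s in zip(weights, row)) for row in metric_scores]

def evolve(generations=200, seed=42):
    # (1+1) evolutionary strategy: mutate the weight vector and keep the
    # mutant only if it correlates better with the human ratings.
    rng = random.Random(seed)
    best = [1.0, 1.0, 1.0]
    best_fit = pearson(combined(best), human)
    for _ in range(generations):
        cand = [max(0.0, w + rng.gauss(0, 0.1)) for w in best]
        if sum(cand) == 0:
            continue
        fit = pearson(combined(cand), human)
        if fit > best_fit:
            best, best_fit = cand, fit
    return best, best_fit

weights, fitness = evolve()
print(weights, round(fitness, 3))
```

By construction the evolved weights can never correlate worse with the human ratings than the uniform starting point, which is the intuition behind aggregating several imperfect measures instead of trusting one.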
... This aggregation is often computed by means of heuristic and meta-heuristic functions [7]. Our hypothesis is that these methods are not optimal and can therefore be improved. The reason is that these methods are not able to deal with the non-stochastic uncertainty induced by the subjectivity, vagueness and imprecision of human language (even when naming concepts from the biomedical field). ...

... In the past, there have been great efforts in finding new semantic similarity measures, mainly because they are of fundamental importance in many application-oriented fields of computer science [7]. The reason is that these techniques can be used to go beyond the literal lexical match of many kinds of text expressions. Past works in this field include the automatic processing of text and email messages [14], healthcare dialogue systems [5], question answering [21], and sentence fusion. ...

... For the rest of this work, we consider similarity measures, since most authors in this field use semantic similarity measures for identifying semantic correspondences [7]. Currently, the semantic similarity for a pair of biological expressions is computed using an aggregation function of the individual semantic similarity values. This approach has proven to achieve very good results. ...

Accurate Semantic Similarity Measurement of Biomedical Nomenclature by Means of Fuzzy Logic
Article * Apr 2016 * INT J UNCERTAIN FUZZ * Jorge Martinez-Gil
Semantic similarity measurement of biomedical nomenclature aims to determine the likeness between two biomedical expressions that use different lexicographies for representing the same real biomedical concept. There are many semantic similarity measures that try to address this issue, and many of them have represented an incremental improvement over the previous ones. In this work, we present yet another incremental solution that is able to outperform existing approaches by using a sophisticated aggregation method based on fuzzy logic. Results show that our strategy is able to consistently beat existing approaches when solving well-known biomedical benchmark data sets.

... The relevant works conduct semantic fusion from different aspects. We define their classifications as vector-level [4,7], metric-level [1,2,6,42], and model-level [10,41,43] according to the increasing granularity of semantic fusion between corpus and ontology. ...

... Alves et al. proposed a regression function where the lexical similarity, syntactic similarity, semantic similarity, and distributional similarity are input as factors [2]. Chaves-González and Martínez-Gil used an evolutionary algorithm to optimize the unsupervised combination of various WordNet-based similarity metrics [6]. Yih and Qazvinian averaged the similarity results derived from heterogeneous vector space models on Wikipedia, web search, thesaurus, and WordNet, respectively [42]. ...

Joint semantic similarity assessment with raw corpus and structured ontology for semantic-oriented service discovery
Article, Full-text available * Jun 2016 * Wei Lu * Yuanyuan Cai * Xiaoping Che * Yuxun Lu
Semantic-oriented service matching is one of the challenges in automatic Web service discovery. Service users may search for Web services using keywords and receive the matching services in terms of their functional profiles. A number of approaches to computing the semantic similarity between words have been developed to enhance the precision of matchmaking, which can be classified into ontology-based and corpus-based approaches. The ontology-based approaches commonly use the differentiated concept information provided by a large ontology for measuring lexical similarity with word sense disambiguation. Nevertheless, most ontologies are domain-specific and limited in lexical coverage, which limits their applicability. On the other hand, corpus-based approaches rely on the distributional statistics of context to represent each word as a vector and measure the distance of word vectors. However, the problem of polysemy may lead to low computational accuracy. In this paper, in order to augment the semantic information content in word vectors, we propose a multiple semantic fusion (MSF) model to generate a sense-specific vector per word. In this model, various semantic properties of the general-purpose ontology WordNet are integrated to fine-tune the distributed word representations learned from corpus, in terms of vector combination strategies. The retrofitted word vectors are modeled as semantic vectors for estimating semantic similarity. The MSF model-based similarity measure is validated against other similarity measures on multiple benchmark datasets. Experimental results of word similarity evaluation indicate that our computational method obtains a higher correlation coefficient with human judgment in most cases. Moreover, the proposed similarity measure is demonstrated to improve the performance of Web service matchmaking based on a single semantic resource. Accordingly, our findings provide a new method and perspective to understand and represent lexical semantics.

... These measures are: (a) Hirst, (b) Jiang, (c) Resnik, (d) Leacock and (e) Lin. A detailed description of these measures is out of the scope of this work, but some explanatory insights are described in Chaves-Gonzalez and Martinez-Gil (2013). For us, it is enough to know that these single measures are the state-of-the-art in the field of semantic similarity. ...

... Table 6 shows the results for the aggregation of the different semantic similarity measures based on cutting-edge similarity measures from the biomedical domain. Explaining each of them is out of the scope of this work, but a detailed description can be found in Chaves-Gonzalez and Martinez-Gil (2013). Once again, the strategy CoTO (Consensus or Trade-Off) is able to beat all the single measures as well as all the compensative operators by a wide margin. ...

CoTO: A Novel Approach for Fuzzy Aggregation of Semantic Similarity Measures
Article, Full-text available * Feb 2016 * COGN SYST RES * Jorge Martinez-Gil
Semantic similarity measurement aims to determine the likeness between two text expressions that use different lexicographies for representing the same real object or idea. There are a lot of semantic similarity measures for addressing this problem. However, the best results have been achieved when aggregating a number of simple similarity measures. This means that after the various similarity values have been calculated, the overall similarity for a pair of text expressions is computed using an aggregation function of these individual semantic similarity values. This aggregation is often computed by means of statistical functions. In this work, we present CoTO (Consensus or Trade-Off), a solution based on fuzzy logic that is able to outperform these traditional approaches.

... In this way, when analyzing the curricula of job candidates, this kind of technique can operate at the conceptual level, so that comparing specific terms (e.g., Finance) also yields matches on related terms (e.g., Economics, Economic Affairs, Financial Affairs, etc.). As another example, in the healthcare field, an expert on the treatment of cancer could also be considered an expert on oncology, lymphoma or tumor treatment, etc. [9]. The potential of this kind of technique is that it can support Human Resource Management by cutting through massive volumes of potential candidate information more quickly and easily, without giving up the way human experts take decisions in the real world. ...
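The "compensative operators" and statistical aggregation functions that the CoTO excerpts above compare against can be sketched with toy numbers. The four scores below are invented; min, max, the arithmetic mean, and an Ordered Weighted Averaging (OWA) operator are standard aggregations, not code from the cited papers.

```python
def owa(values, weights):
    # Ordered Weighted Averaging: the i-th weight applies to the i-th
    # largest value, so [1,0,...] gives max, [...,0,1] gives min, and
    # uniform weights give the arithmetic mean.
    assert len(values) == len(weights)
    return sum(w * v for w, v in zip(weights, sorted(values, reverse=True)))

# Toy outputs of four hypothetical similarity measures for one term pair.
scores = [0.82, 0.74, 0.91, 0.65]

agg_min = min(scores)                         # pessimistic consensus
agg_max = max(scores)                         # optimistic consensus
agg_mean = sum(scores) / len(scores)          # compensative trade-off
agg_owa = owa(scores, [0.4, 0.3, 0.2, 0.1])   # biased towards high scores

print(agg_min, agg_max, round(agg_mean, 3), round(agg_owa, 3))
```

The appeal of such operators is that a weak individual measure is compensated by the stronger ones, at the cost of treating every term pair with the same fixed policy.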
A Smart Approach for Matching, Learning and Querying Information from the Human Resources Domain
Conference Paper, Full-text available * Aug 2016 * Jorge Martinez-Gil * Alejandra Lorena Paoletti * Klaus-Dieter Schewe
We face the complex problem of timely, accurate and mutually satisfactory mediation between job offers and suitable applicant profiles by means of semantic processing techniques. In fact, this problem has become a major challenge for all public and private recruitment agencies around the world as well as for employers and job seekers. It is widely agreed that smart algorithms for automatically matching, learning, and querying job offers and candidate profiles will provide a key technology of high importance and impact and will help to counter the lack of skilled labor and/or appropriate job positions for unemployed people. Additionally, such a framework can support global matching aiming at finding an optimal allocation of job seekers to available jobs, which is relevant for independent employment agencies, e.g. in order to reduce unemployment.

... The SR is also exploited in the biomedical domain. For example, the authors in [62] have combined a set of semantic similarity methods in order to recognize synonymy. Actually, the SR has been used to provide optimal centroids for datasets [63] (i.e., those that minimize the distance to all elements pertaining to the dataset). ...

Computing semantic relatedness using Wikipedia features
Article, Full-text available * Sep 2013 * KNOWL-BASED SYST * Mohamed Ali Hadj Taieb * Mohamed Ben Aouicha * Abdelmajid Ben Hamadou
Measuring semantic relatedness is a critical task in many domains such as psychology, biology, linguistics, cognitive science and artificial intelligence. In this paper, we propose a novel system for computing semantic relatedness between words. Recent approaches have exploited Wikipedia as a huge semantic resource and have shown good performance. Therefore, we utilized the Wikipedia features (articles, categories, Wikipedia category graph and redirection) in a system that combines this Wikipedia semantic information in its different components. The approach is preceded by a pre-processing step that provides, for each category pertaining to the Wikipedia category graph, a semantic description vector including the weights of stems extracted from articles assigned to the target category. Next, for each candidate word, we collect its set of categories using an algorithm for category extraction from the Wikipedia category graph. Then, we compute the semantic relatedness degree using existing vector similarity metrics (Dice, Overlap and Cosine) and a newly proposed metric that performs as well as the cosine formula. The basic system is followed by a set of modules in order to exploit the Wikipedia features to quantify the semantic relatedness between words as well as possible. We evaluate our measure on two tasks: comparison with human judgments using five datasets and a specific application, "solving choice problem". Our system shows good performance and sometimes outperforms the ESA (Explicit Semantic Analysis) and TSA (Temporal Semantic Analysis) approaches.

... Semantic similarity measurement techniques have gained great importance with the advent of the Semantic Web (Chaves-González and Martínez-Gil, 2013). The term semantic similarity indicates the computation of the amount of similarity among concepts, which need not be a lexical similarity but could be a conceptual similarity. ...

Semantic Similarity Measurement Methods: The State-of-the-art
Article * Nov 2014 * Fatmah Nazar Mahmood * Amirah Ismail
With the increasing importance of estimating the semantic similarity between concepts, this study tries to highlight some methods used in this area. Similarity measurement between concepts has become a significant component in most intelligent knowledge management applications, especially in the fields of Information Extraction (IE) and Information Retrieval (IR). Measuring similarity among concepts has been considered as a quantitative measure of information; the computation of similarity relies on the relations and the properties linking the concepts in an ontology. In this study we have briefly reviewed the main categories of semantic similarity.

... For example, consumers may use "Small4L" to replace "Audi A4L." Synonyms' semantic functions and rules comply with the following key points: substitution between synonyms always occurs in a similar context, which means that the appearance of different synonym items in a group of synonyms often implies the similarity of context. Therefore, this can be regarded as a sign of the same "topic context" [11]. A mathematical structure, such as a "vector" representing a word, is always used in text mining. ...

Predicting sales by online searching data keywords based on text mining: Evidence from the Chinese automobile market
Article, Full-text available * Oct 2019 * J Phys Conf * Yi Li * Liangru Yu * Rui Wen

... This means that regression analysis tries to predict the final semantic similarity score based on the known values of other semantic similarity measures. In this context, regression analysis comes in different types: linear regression (Chaves-González & Martinez-Gil, 2013), N-gram regression (Malandrakis et al., 2012), or support vector regression (Croce et al., 2012). In general, the results obtained through regression analysis are usually quite easy to interpret by a human, and therefore they are commonly used as interpretable models. ...

A novel method based on symbolic regression for interpretable semantic similarity measurement
Article * Jun 2020 * EXPERT SYST APPL * Jorge Martinez-Gil * José Manuel Chaves-González
The problem of automatically measuring the degree of semantic similarity between textual expressions is a challenge that consists of calculating the degree of likeness between two text fragments that have none or few features in common according to human judgment. In recent times, several machine learning methods have been able to establish a new state-of-the-art regarding the accuracy, but none or little attention has been paid to their interpretability, i.e. the extent to which an end-user could be able to understand the cause of the output from these approaches. Although such solutions based on symbolic regression already exist in the field of clustering (Lensen et al., 2019), we propose here a new approach which is able to reach high levels of interpretability without sacrificing accuracy in the context of semantic textual similarity. After a complete empirical evaluation using several benchmark datasets, it is shown that our approach yields promising results in a wide range of scenarios.
For instance, in the fields of Natural Language Processing (NLP) and IR, ontology-based semantic similarity measures have been used in Word Sense Disambiguation (WSD) methods [92] , text similarity measures [86] , spelling error detection [20] , sentence similarity models [44,66,91] , paraphrase detection [36] , unified sense disambiguation methods for different types of structured sources of knowledge [73] , document clustering [31] , ontology alignment [30] , document [74] and query anonymization [11] , clustering of nominal information [9,10] , chemical entity identification [40] , interoperability among agent-based systems [34] , and ontology-based Information Retrieval (IR) models [55,62] to solve the lack of an intrinsic semantic distance in vector ontologybased IR models [23] . In the field of bioengineering, ontologybased similarity measures have been proposed for synonym recognition [24] and biomedical text mining [14,98,112] . However, since the pioneering work of Lord et al. [72] , the proposal of similarity measures for genomics and proteomics based on the Gene Ontology (GO) [5] have attracted a lot of attention, as detailed in a recent survey on the topic [76] . ... 
HESML: A scalable ontology-based semantic similarity measures library with a set of reproducible experiments and a replication dataset Article Full-text available * Jun 2017 * INFORM SYST * Juan José Lastra-Díaz * Ana M Garcia-Serrano * Montserrat Batet * Fernando Chirigati This work is a detailed companion reproducibility paper of the methods and experiments proposed by Lastra-Díaz and García-Serrano in [56, 57, 58], which introduces the following contributions: (1) a new and efficient representation model for taxonomies, called PosetHERep, which is an adaptation of the half-edge data structure commonly used to represent discrete manifolds and planar graphs; (2) a new Java software library called the Half-Edge Semantic Measures Library (HESML) based on PosetHERep, which implements most ontology-based semantic similarity measures and Information Content (IC) models reported in the literature; (3) a set of reproducible experiments on word similarity based on HESML and ReproZip with the aim of exactly reproducing the experimental surveys in the three aforementioned works; (4) a replication framework and dataset, called WNSimRep v1, whose aim is to assist the exact replication of most methods reported in the literature; and finally, (5) a set of scalability and performance benchmarks for semantic measures libraries. PosetHERep and HESML are motivated by several drawbacks in the current semantic measures libraries, especially the performance and scalability, as well as the evaluation of new methods and the replication of most previous methods. The reproducible experiments introduced herein are encouraged by the lack of a set of large, self-contained and easily reproducible experiments with the aim of replicating and confirming previously reported results. Likewise, the WNSimRep v1 dataset is motivated by the discovery of several contradictory results and difficulties in reproducing previously reported methods and experiments. 
PosetHERep proposes a memory-efficient representation for taxonomies which scales linearly with the size of the taxonomy and provides an efficient implementation of most taxonomy-based algorithms used by the semantic measures and IC models, whilst HESML provides an open framework to aid research in the area by offering a simpler and more efficient software architecture than the current software libraries. Finally, we show that HESML outperforms the state-of-the-art libraries, and that their performance and scalability can be significantly improved, without caching, by using PosetHERep. ... In this way, when analyzing the curricula of job candidates, these techniques can operate at the conceptual level, so that comparing specific terms (e.g., Finance) also yields matches on related terms (e.g., Economics, Economic Affairs, Financial Affairs, etc.). As another example, in the healthcare field, an expert on the treatment of cancer could also be considered an expert on oncology, lymphoma or tumor treatment, etc. [9]. The potential of these techniques is that they can support Human Resource Management by cutting through massive volumes of candidate information more quickly and easily, without giving up the way human experts make decisions in the real world. ... A Smart Approach for Matching, Learning and Querying Information from the Human Resources Domain Book Full-text available * Jan 2016 * Jorge Martinez-Gil * Alejandra Lorena Paoletti * Klaus-dieter Schewe
A Smart Approach for Matching, Learning and Querying Information from the Human Resources Domain Preprint Full-text available * Sep 2017 * Jorge Martinez-Gil * Alejandra Lorena Paoletti * Klaus-dieter Schewe We face the complex problem of timely, accurate and mutually satisfactory mediation between job offers and suitable applicant profiles by means of semantic processing techniques. In fact, this problem has become a major challenge for all public and private recruitment agencies around the world, as well as for employers and job seekers. It is widely agreed that smart algorithms for automatically matching, learning, and querying job offers and candidate profiles will provide a key technology of high importance and impact, and will help to counter the lack of skilled labor and/or appropriate job positions for unemployed people. Additionally, such a framework can support global matching aimed at finding an optimal allocation of job seekers to available jobs, which is relevant for independent employment agencies, e.g. in order to reduce unemployment. ... A detailed description of these measures is outside the scope of this work, but some explanatory insights are described in [8]. For us, it is enough to know that these individual measures are the state of the art in the field of semantic similarity measurement [9]. Table 2 shows the results for the aggregation of the different semantic similarity measures based on dictionary measures. ...
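The score-aggregation idea in the excerpt above (compute several individual semantic similarity values for a pair of expressions, then combine them with an aggregation function) can be sketched as follows. The two toy string measures and the mean/median combiners are illustrative assumptions, not the measures used in the cited works:

```python
from statistics import mean, median

def jaccard_sim(a: str, b: str) -> float:
    """Character-trigram Jaccard similarity (a simple illustrative measure)."""
    grams = lambda s: {s[i:i + 3] for i in range(len(s) - 2)} or {s}
    ga, gb = grams(a.lower()), grams(b.lower())
    return len(ga & gb) / len(ga | gb)

def prefix_sim(a: str, b: str) -> float:
    """Normalized common-prefix length (another toy measure)."""
    n = 0
    for x, y in zip(a.lower(), b.lower()):
        if x != y:
            break
        n += 1
    return n / max(len(a), len(b))

def aggregate(a: str, b: str, measures, combine=mean) -> float:
    """Overall similarity = aggregation function over the individual scores."""
    return combine(m(a, b) for m in measures)
```

Swapping `combine` for a fuzzy or consensus operator, rather than a plain statistical function, is exactly the kind of refinement the CoTO entry below proposes.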
CoTO: A Novel Approach for Fuzzy Aggregation of Semantic Similarity Measures Preprint Full-text available * Sep 2017 * Jorge Martinez-Gil Semantic similarity measurement aims to determine the likeness between two text expressions that use different lexicographies for representing the same real object or idea. There are many semantic similarity measures for addressing this problem. However, the best results have been achieved when aggregating a number of simple similarity measures. This means that after the various similarity values have been calculated, the overall similarity for a pair of text expressions is computed using an aggregation function of these individual semantic similarity values. This aggregation is often computed by means of statistical functions. In this work, we present CoTO (Consensus or Trade-Off), a solution based on fuzzy logic that is able to outperform these traditional approaches. ... In this way, it is possible to reduce the risk of relying on a single ssm operating within production environments. Moreover, this approach has proven to achieve good results in the past (Chaves-González & Martinez-Gil, 2013). The rationale behind this way of working is very intuitive: if some specific ssm are unable to perform reasonably well for the particular comparison of terms or textual expressions, their effects can be blurred by other ssm that achieve better performance. ... Automatic Design of Semantic Similarity Controllers based on Fuzzy Logics Article * Apr 2019 * EXPERT SYST APPL * Jorge Martinez-Gil * José Manuel Chaves-González Recent advances in machine learning have brought improvements over the state of the art in semantic similarity measurement techniques. In fact, we have all seen how classical techniques have given way to promising neural techniques. Nonetheless, these new techniques have a weak point: they are hardly interpretable.
For this reason, we have oriented our research towards the design of strategies that are accurate enough without sacrificing interpretability. As a result, we have obtained a strategy for the automatic design of semantic similarity controllers based on fuzzy logics, which are automatically identified using genetic algorithms (GAs). After an exhaustive evaluation using a number of well-known benchmark datasets, we can conclude that our strategy fulfills both expectations: it is able to achieve reasonably good results and, at the same time, it can offer high degrees of interpretability. ... Current approaches to semantic similarity measurement usually follow an approach based on the aggregation of similarity scores retrieved from a number of different ssm, i.e. aggregation methods try to accurately aggregate different viewpoints to come to a final decision (Grabisch et al. 2011). As a consequence, some authors have proposed similarity aggregation techniques that have achieved good results in the past (Chaves-González and Martinez-Gil 2013). The rationale behind this approach is very intuitive: if some ssm are unable to perform reasonably well for a particular comparison, their effects on the overall performance can be blurred by other ssm able to provide better results (Martinez-Gil 2016b). ... Semantic similarity aggregators for very short textual expressions: a case study on landmarks and points of interest Article Full-text available * Oct 2019 * J Intell Inform Syst * Jorge Martinez-Gil Semantic similarity measurement aims to automatically compute the degree of similarity between two textual expressions that use different representations for naming the same concepts. However, very short textual expressions cannot always follow the syntax of a written language and, in general, do not provide enough information to support proper analysis.
This means that in some fields, such as the processing of landmarks and points of interest, results are not entirely satisfactory. In order to overcome this situation, we explore the idea of aggregating existing methods by means of two novel aggregation operators aiming to model an appropriate interaction between the similarity measures. As a result, we have been able to improve the results of existing techniques when solving the GeReSiD and the SDTS, two of the most popular benchmark datasets for dealing with geographical information. Near-synonym substitution using a discriminative vector space model Article * May 2016 * KNOWL-BASED SYST * Liang-Chih Yu * Lung-Hao Lee * Jui-Feng Yeh * Yu-Ling Lai Near-synonyms are fundamental and useful knowledge resources for computer-assisted language learning (CALL) applications. For example, in online language learning systems, learners may have a need to express a similar meaning using different words. However, it is usually difficult to choose suitable near-synonyms to fit a given context because the differences of near-synonyms are not easily grasped in practical use, especially for second language (L2) learners. Accordingly, it is worth developing algorithms to verify whether near-synonyms match given contexts. Such algorithms could be used in applications to assist L2 learners in discovering the collocational differences between near-synonyms. We propose a discriminative vector space model for the near-synonym substitution task, and consider this task as a classification task. There are two components: a vector space model and discriminative training. The vector space model is used as a baseline classifier to classify test examples into one of the near-synonyms in a given near-synonym set. A discriminative training technique is then employed to improve the vector space model by distinguishing positive and negative features for each near-synonym.
Experimental results show that the DT-VSM achieves higher accuracy than both pointwise mutual information and n-gram-based methods that have been used in previous studies. Data Reduction for Continuum of Care: An Exploratory Study Using the Predicate-Argument Structure to Pre-process Radiology Sentences for Measurement of Semantic Similarity Conference Paper * Jul 2013 * Eric T. Newsom * Josette Jones In the clinical setting, continuum of care depends on integrated information services to assure a smooth progression for patient-centered care, and these integrated information services must understand past events and personal circumstances to make care relevant. Clinicians face the problem that the amount of information produced in disparate electronic clinical notes is increasing to levels incapable of being processed by humans. Clinicians need a function in information services that can reduce the free-text data to a message useful at the time of care. Information extraction (IE) is a sub-field of natural language processing with the goal of data reduction of unstructured free text. Pertinent to IE is an annotated corpus that frames how IE methods should create a logical expression necessary for processing the meaning of text. This study explores and reports on the requirements for using the predicate-argument statement (PAS) as the framework. A convenience sample from a prior study, with ten synsets of 100 unique sentences from radiology reports deemed by domain experts to mean the same thing, provides the text from which PAS structures are formed. Through content analysis of the recognized patterns, the findings show that PAS is a feasible framework for structuring sentences for semantic similarity measurement.
DMS2015short-1: Semantic similarity assessment using differential evolution algorithm in continuous vector space Article * Nov 2015 * Wei Lu * Yuanyuan Cai * Xiaoping Che * Kailun Shi The assessment of semantic similarity between terms is one of the challenging tasks in knowledge-based applications, such as multimedia retrieval, automatic service discovery and emotion mining. By means of similarity estimation, the comprehension of textual resources can become more feasible and accurate. Some studies have proposed the integration of various assessment methods for taking advantage of different semantic resources, but most of them simply employ an average operation or regression training. In this paper, we address this problem by combining corpus-based similarity methods with WordNet-based methods using a differential evolution (DE) algorithm. Specifically, this DE-based approach conducts similarity assessment in a continuous vector space. It is validated against a variety of similarity approaches on multiple benchmark datasets. Empirical results demonstrate that our approach outperforms existing works and conforms more closely to human judgement of similarity. The results also prove the expressiveness of continuous vectors learned by neural networks on latent lexical semantics. Differential evolution with Pareto tournament for the multi-objective next release problem Article * Feb 2015 * Jose M. Chaves-González * Miguel A. Pérez-Toledano Software requirements selection is the engineering process in which the set of new requirements which will be included in the next release of a software product is chosen. This NP-hard problem is an important issue involving several contradictory objectives that have to be tackled by software companies when developing new releases of software packages. Software projects have to stick to a budget, but they also have to cover the highest number of customer requirements.
Furthermore, in real instances of the problem, the requirements tackled are subject to interactions and other restrictions which complicate the problem. In this paper, we use an adapted multi-objective version of the differential evolution (DE) evolutionary algorithm which has been successfully applied to several real instances of the problem. To do this, the software requirements selection problem has been formulated as a multiobjective optimization problem with two objectives, the total software development cost and the overall customer satisfaction, subject to three interaction constraints. In addition, the original DE algorithm has been adapted to solve real instances of the problem generated from data provided by experts. Numerical experiments with case studies on software requirements selection have been carried out to demonstrate the effectiveness of the multiobjective proposal, and the obtained results show that the developed algorithm performs better than other relevant algorithms previously published in the literature on a set of public datasets. Analysis of word co-occurrence in human literature for supporting semantic correspondence discovery Conference Paper * Sep 2014 * Jorge Martinez-Gil * Mario Pichler Semantic similarity measurement aims to determine the likeness between two text expressions that use different lexicographies for representing the same real object or idea. In this work, we describe a way to exploit broad cultural trends for identifying semantic similarity. This is possible through the quantitative analysis of a vast digital book collection representing the digested history of humanity. Our research work has revealed that appropriately analyzing the co-occurrence of words in some periods of human literature can help us to determine the semantic similarity between these words by means of computers with a high degree of accuracy.
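The word co-occurrence analysis described in the abstract above can be illustrated with a small pointwise mutual information (PMI) sketch. The three-sentence corpus is an illustrative stand-in for a digital book collection, and PMI is one common choice of association measure, not necessarily the cited work's exact method:

```python
import math
from collections import Counter
from itertools import combinations

def pmi_scores(sentences):
    """Estimate word association strength via pointwise mutual information:
    PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) ), with probabilities estimated
    from sentence-level co-occurrence counts in a (tiny) corpus."""
    word_counts, pair_counts = Counter(), Counter()
    for sent in sentences:
        words = set(sent.lower().split())
        word_counts.update(words)
        pair_counts.update(frozenset(p) for p in combinations(sorted(words), 2))
    n = len(sentences)
    return {
        pair: math.log2((c / n) / ((word_counts[x] / n) * (word_counts[y] / n)))
        for pair, c in pair_counts.items()
        for x, y in [tuple(pair)]
    }
```

Words that co-occur more often than their individual frequencies predict (e.g. "car" and "road" below) receive a higher score than incidental pairings with very common words.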
Comparison of the effectiveness of different accessibility plugins based on important accessibility criteria Conference Paper Full-text available * Jul 2013 * Alireza Darvishy * Hans-Peter Hutter This paper compares two new freely available software plugins for MS PowerPoint and Word documents that we have developed at the ZHAW with similar tools with respect to important accessibility criteria. Our plugins [1, 2, 3] allow the analysis of accessibility issues and consequently the generation of fully accessible PDF documents. Document authors using these plugins require no specific accessibility knowledge. The plugins are based on a flexible software architecture concept [1] that allows the automatic generation of fully accessible PDF documents originating from various authoring tools, such as Adobe InDesign [5], Word or PowerPoint [6, 7]. Other available plugins, on the other hand, need accessibility knowledge in order to be used properly and effectively. Thinking on the Web: Berners-Lee, Gödel and Turing. Article * Jan 2007 * COMPUT J * Jorge Martinez-Gil Differential Evolution: A Simple and Efficient Adaptive Scheme for Global Optimization Over Continuous Spaces Article Full-text available * Jan 1995 * J GLOBAL OPTIM * Rainer Martin Storn * Kenneth V. Price A new heuristic approach for minimizing possibly nonlinear and non-differentiable continuous space functions is presented. By means of an extensive testbed, which includes the De Jong functions, it will be demonstrated that the new method converges faster and with more certainty than Adaptive Simulated Annealing as well as the Annealed Nelder & Mead approach, both of which have a reputation for being very powerful. The new method requires few control variables, is robust, easy to use and lends itself very well to parallel computation.
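The differential mutation scheme introduced by Storn and Price above (perturb a base vector with the scaled difference of two other randomly chosen population members) can be sketched as a minimal DE/rand/1/bin loop. The sphere objective in the usage example and the parameter values are illustrative assumptions:

```python
import random

def differential_evolution(f, bounds, pop_size=20, F=0.8, CR=0.9,
                           generations=200, seed=0):
    """Minimal DE/rand/1/bin: mutant v = a + F * (b - c), binomial crossover
    with the current target vector, then greedy one-to-one selection."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # three distinct donors, none equal to the target index i
            a, b, c = rng.sample([x for j, x in enumerate(pop) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantee at least one mutated gene
            trial = [
                min(max(a[d] + F * (b[d] - c[d]), bounds[d][0]), bounds[d][1])
                if (rng.random() < CR or d == j_rand) else pop[i][d]
                for d in range(dim)
            ]
            ft = f(trial)
            if ft <= fit[i]:  # keep the trial only if it is no worse
                pop[i], fit[i] = trial, ft
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]

# Usage: minimize the 3-dimensional sphere function over [-5, 5]^3.
sphere = lambda x: sum(v * v for v in x)
best_x, best_f = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
```

Note that, as the abstract says, no separate probability distribution is needed for the offspring: the population's own difference vectors supply the perturbations.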
Verbs Semantics and Lexical Selection Conference Paper Full-text available * Jan 1994 * Zhibiao Wu * Martha Palmer This paper focuses on the semantic representation of verbs in computer systems and its impact on lexical selection problems in machine translation (MT). Two groups of English and Chinese verbs are examined to show that lexical selection must be based on the interpretation of the sentences as well as on selection restrictions placed on the verb arguments. A novel representation scheme is suggested and compared to representations with selection restrictions used in transfer-based MT. We see our approach as closely aligned with knowledge-based MT approaches (KBMT), and as a separate component that could be incorporated into existing systems. Examples and experimental results show that, using this scheme, inexact matches can achieve correct lexical selection. A comparative analysis of crossover variants in differential evolution Article Full-text available * May 2006 * Daniela Zaharie This paper presents a comparative analysis of binomial and exponential crossover in differential evolution. Some theoretical results concerning the probability of mutating an arbitrary component and that of mutating a given number of components are obtained for both crossover variants. The differences between binomial and exponential crossover are identified, and the impact of these results on the choice of control parameters and on the adaptive variants is analyzed. Measuring Semantic Similarity Between Biomedical Concepts Within Multiple Ontologies Article Full-text available * Aug 2009 * IEEE T SYST MAN CY C * Hisham Al-Mubaid * Hoa A. Nguyen Most intelligent knowledge-based applications contain components for measuring semantic similarity between terms.
Many of the existing semantic similarity measures that use ontology structure as their primary source cannot measure semantic similarity between terms and concepts using multiple ontologies. This research explores a new way to measure semantic similarity between biomedical concepts using multiple ontologies. We propose a new ontology-structure-based technique for measuring semantic similarity within a single ontology and across multiple ontologies in the biomedical domain, within the framework of the Unified Medical Language System (UMLS). The proposed measure is based on three features: 1) cross-modified path length between two concepts; 2) a new feature of common specificity of concepts in the ontology; and 3) local granularity of ontology clusters. The proposed technique was evaluated relative to human similarity scores and compared with other existing measures using two terminologies within the UMLS framework: Medical Subject Headings and the Systematized Nomenclature of Medicine Clinical Terms. The experimental results validate the efficiency of the proposed technique in single and multiple ontologies, and demonstrate that our proposed measure achieves the best correlation with human scores in all experiments. Ontology-based semantic similarity: A new feature-based approach Article Full-text available * Mar 2012 * EXPERT SYST APPL * David Sánchez * Montserrat Batet * David Isern * Aida Valls Ontology-based information content computation Article Full-text available * Mar 2011 * KNOWL-BASED SYST * David Sánchez * Montserrat Batet * David Isern The information content (IC) of a concept provides an estimation of its degree of generality/concreteness, a dimension which enables a better understanding of a concept's semantics. As a result, IC has been successfully applied to the automatic assessment of the semantic similarity between concepts. In the past, IC has been estimated as the probability of appearance of concepts in corpora.
However, the applicability and scalability of this method are hampered by corpora dependency and data sparseness. More recently, some authors proposed IC-based measures using taxonomical features extracted from an ontology for a particular concept, obtaining promising results. In this paper, we analyse these ontology-based approaches for IC computation and propose several improvements aimed at better capturing the semantic evidence modelled in the ontology for the particular concept. Our approach has been evaluated and compared with related works (both corpora- and ontology-based ones) when applied to the task of semantic similarity estimation. Results obtained for a widely used benchmark show that our method enables similarity estimations which are better correlated with human judgements than related works. Measuring semantic similarity between words using Web search engines Conference Paper Full-text available * Jan 2007 * Danushka Bollegala * Yutaka Matsuo * Mitsuru Ishizuka Semantic similarity measures play important roles in information retrieval and Natural Language Processing. Previous work in semantic web-related applications such as community mining, relation extraction, and automatic metadata extraction has used various semantic similarity measures. Despite the usefulness of semantic similarity measures in these applications, robustly measuring semantic similarity between two words (or entities) remains a challenging task. We propose a robust semantic similarity measure that uses the information available on the Web to measure similarity between words or entities. The proposed method exploits page counts and text snippets returned by a Web search engine. We define various similarity scores for two given words P and Q, using the page counts for the queries P, Q and P AND Q. Moreover, we propose a novel approach to compute semantic similarity using automatically extracted lexico-syntactic patterns from text snippets.
These different similarity scores are integrated using support vector machines, to leverage a robust semantic similarity measure. Experimental results on the Miller-Charles benchmark dataset show that the proposed measure outperforms all the existing web-based semantic similarity measures by a wide margin, achieving a correlation coefficient of 0.834. Moreover, the proposed semantic similarity measure significantly improves the accuracy (F-measure of 0.78) in a community mining task, and in an entity disambiguation task, thereby verifying the capability of the proposed measure to capture semantic similarity using web content. A comparative study of differential evolution variants for global optimization Conference Paper Full-text available * Jan 2006 * Efrén Mezura-Montes * Jesús Velázquez-Reyes * Carlos A. Coello Coello In this paper, we present an empirical comparison of some Differential Evolution variants for solving global optimization problems. The aim is to identify which of them is more suitable to solve an optimization problem, depending on the problem's features, and also to identify the variant with the best performance regardless of the features of the problem to be solved. Eight variants were implemented and tested on 13 benchmark problems taken from the specialized literature. These variants vary in the type of recombination operator used and also in the way in which the mutation is computed. A set of statistical tests were performed in order to obtain more confidence in the validity of the results and to reinforce our discussion. The main aim is that this study can help both researchers and practitioners interested in using differential evolution as a global optimizer, since we expect that our conclusions can provide some insights regarding the advantages or limitations of each of the variants studied.
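The page-count-based similarity scores defined by Bollegala et al. above can be sketched, for the Jaccard-style variant, as a set-overlap coefficient over search-engine hit counts. The noise threshold and the hypothetical hit counts below are illustrative assumptions, not values from the cited paper:

```python
def web_jaccard(hits_p: int, hits_q: int, hits_pq: int, threshold: int = 5) -> float:
    """Jaccard-style similarity over page counts: H(P AND Q) divided by
    H(P) + H(Q) - H(P AND Q), zeroed when the co-occurrence count is too
    small to be distinguished from search-engine noise."""
    if hits_pq <= threshold:
        return 0.0
    return hits_pq / (hits_p + hits_q - hits_pq)

# Hypothetical hit counts for the pair ("car", "automobile"):
sim = web_jaccard(hits_p=400_000, hits_q=120_000, hits_pq=90_000)
```

In the cited approach, several such page-count scores (Jaccard, Dice, overlap, PMI style) are computed and then combined by a trained classifier rather than used in isolation.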
Using a Natural Language Understanding System to Generate Semantic Web Content Article Full-text available * Oct 2007 * INT J SEMANT WEB INF * Akshay Java * Sergei Nirenburg * M. McShane * Anupam Joshi We describe our research on automatically generating rich semantic annotations of text and making them available on the Semantic Web. In particular, we discuss the challenges involved in adapting the OntoSem natural language processing system for this purpose. OntoSem, an implementation of the theory of ontological semantics under continuous development for over fifteen years, uses a specially constructed NLP-oriented ontology and an ontological-semantic lexicon to translate English text into a custom ontology-motivated knowledge representation language, the language of text meaning representations (TMRs). OntoSem concentrates on a variety of ambiguity resolution tasks as well as processing unexpected input and reference. To adapt OntoSem's representation to the Semantic Web, we developed a translation system, OntoSem2OWL, from the TMR language into the Semantic Web language OWL. We next used OntoSem and OntoSem2OWL to support SemNews, an experimental web service that monitors RSS news sources, processes the summaries of the news stories and publishes a structured representation of the meaning of the text in each news story. Evaluating WordNet-based Measures of Lexical Semantic Relatedness Article Full-text available * Mar 2006 * COMPUT LINGUIST * Alexander Budanitsky * Graeme Hirst The quantification of lexical semantic relatedness has many applications in NLP, and many different measures have been proposed. We evaluate five of these measures, all of which use WordNet as their central resource, by comparing their performance in detecting and correcting real-word spelling errors.
An information-content-based measure proposed by Jiang and Conrath is found superior to those proposed by Hirst and St-Onge, Leacock and Chodorow, Lin, and Resnik. In addition, we explain why distributional similarity is not an adequate proxy for lexical semantic relatedness. Enabling semantic similarity estimation across multiple ontologies: An evaluation in the biomedical domain Article Full-text available * Oct 2012 * J Biomed Informat * David Sánchez * Albert Solé-Ribalta * Montserrat Batet * Francesc Serratosa The estimation of the semantic similarity between terms provides a valuable tool to enable the understanding of textual resources. Many semantic similarity computation paradigms have been proposed, both as general-purpose solutions and framed in concrete fields such as biomedicine. In particular, ontology-based approaches have been very successful due to their efficiency, scalability and lack of constraints, and thanks to the availability of large and consensus ontologies (like WordNet or those in the UMLS). These measures, however, are hampered by the fact that only one ontology is exploited and, hence, their recall depends on the ontological detail and coverage. In recent years, some authors have extended some of the existing methodologies to support multiple ontologies. The problem of integrating heterogeneous knowledge sources is tackled by means of simple terminological matchings between ontological concepts. In this paper, we aim to improve these methods by analysing the similarity between the modelled taxonomical knowledge and the structure of different ontologies. As a result, we are able to better discover the commonalities between different ontologies and, hence, improve the accuracy of the similarity estimation. Two methods are proposed to tackle this task. They have been evaluated and compared with related works by means of several widely-used benchmarks of biomedical terms using two standard ontologies (WordNet and MeSH).
Results show that our methods correlate better with the similarity assessments provided by experts in biomedicine than related works do. WordNet::Similarity - Measuring the Relatedness of Concepts Article Full-text available * Apr 2004 * Ted Pedersen * Siddharth Patwardhan * Jason Michelizzi WordNet::Similarity is a freely available software package that makes it possible to measure the semantic similarity and relatedness between a pair of concepts (or synsets). It provides six measures of similarity and three measures of relatedness, all of which are based on the lexical database WordNet. These measures are implemented as Perl modules which take as input two concepts and return a numeric value that represents the degree to which they are similar or related. Lexical Chains as Representations of Context for the Detection and Correction of Malapropisms Article Full-text available * Oct 1995 * Graeme Hirst * David St-Onge In this paper, we examine the idea of lexical chains as such a representation. We show how they can be constructed by means of WordNet, and how they can be applied in one particular linguistic task: the detection and correction of malapropisms. Comparison and Classification of Documents Based on Layout Similarity Article Full-text available * Dec 1999 * INFORM RETRIEVAL * Jianying Hu * Ramanujan S. Kashi * Gordon Wilfong This paper describes features and methods for document image comparison and classification at the spatial layout level. The methods are useful for visual-similarity-based document retrieval as well as for fast algorithms for initial document type classification without OCR. A novel feature set called interval encoding is introduced to capture elements of spatial layout. This feature set encodes region layout information in fixed-length vectors by capturing structural characteristics of the image.
These fixed-length vectors are then compared to each other through a Manhattan distance computation for fast page layout comparison. The paper describes experiments and results to rank-order a set of document pages in terms of their layout similarity to a test document. We also demonstrate the usefulness of the features derived from interval coding in a hidden Markov model based page layout classification system that is trainable and extendible. The methods described in the paper can be ... Extended Gloss Overlaps as a Measure of Semantic Relatedness Article Full-text available * May 2003 * Satanjeev Banerjee * Ted Pedersen This paper presents a new measure of semantic relatedness between concepts that is based on the number of shared words (overlaps) in their definitions (glosses). This measure is unique in that it extends the glosses of the concepts under consideration to include the glosses of other concepts to which they are related according to a given concept hierarchy. We show that this new measure reasonably correlates to human judgments. We introduce a new method of word sense disambiguation based on extended gloss overlaps, and demonstrate that it fares well on the SENSEVAL-2 lexical sample data. Using Corpus Statistics and WordNet Relations for Sense Identification Article Full-text available * Jul 2002 * COMPUT LINGUIST * Claudia Leacock * George A. Miller * Martin Chodorow Introduction An impressive array of statistical methods has been developed for word sense identification. They range from dictionary-based approaches that rely on definitions (Véronis and Ide 1990; Wilks et al. 1993) to corpus-based approaches that use only word co-occurrence frequencies extracted from large textual corpora (Schütze 1995; Dagan and Itai 1994). We have drawn on these two traditions, using corpus-based co-occurrence and the lexical knowledge base that is embodied in the WordNet lexicon. The two traditions complement each other.
Corpus-based approaches have the advantage of being generally applicable to new texts, domains, and corpora without needing costly and perhaps error-prone parsing or semantic analysis. They require only training corpora in which the sense distinctions have been marked, but therein lies their weakness. Obtaining training materials for statistical methods is costly and time-consuming; it is a "knowledge acquisition bottleneck" (Gale, Church, and Y Smoothing Methods In Statistics Article * Sep 1997 * J AM STAT ASSOC * Jeffrey S. Simonoff Differential Evolution: A Survey of the State-of-the-Art Article * Mar 2011 * IEEE T EVOLUT COMPUT * Sanjoy Das * Ponnuthurai N. Suganthan Differential evolution (DE) is arguably one of the most powerful stochastic real-parameter optimization algorithms in current use. DE operates through similar computational steps to those employed by a standard evolutionary algorithm (EA). However, unlike traditional EAs, the DE variants perturb the current-generation population members with the scaled differences of randomly selected and distinct population members. Therefore, no separate probability distribution has to be used for generating the offspring. Since its inception in 1995, DE has drawn the attention of many researchers all over the world, resulting in a lot of variants of the basic algorithm with improved performance. This paper presents a detailed review of the basic concepts of DE and a survey of its major variants, its application to multiobjective, constrained, large-scale, and uncertain optimization problems, and the theoretical studies conducted on DE so far. It also provides an overview of the significant engineering applications that have benefited from the powerful nature of DE.
A semantic similarity metric combining features and intrinsic information content
Article * Nov 2009 * DATA KNOWL ENG * Giuseppe Pirrò
In many research fields, such as Psychology, Linguistics, Cognitive Science, and Artificial Intelligence, computing semantic similarity between words is an important issue. This paper presents a new semantic similarity metric that exploits notions from the feature-based theory of similarity and translates them into the information-theoretic domain, leveraging the notion of Information Content (IC). In particular, the proposed metric exploits the notion of intrinsic IC, which quantifies IC values by scrutinizing how concepts are arranged in an ontological structure. To evaluate this metric, an online experiment asking the community of researchers to rank a list of 65 word pairs was conducted. The experiment's web setup allowed us to collect 101 similarity ratings and to differentiate native and non-native English speakers. Such a large and diverse dataset makes it possible to evaluate similarity metrics confidently by correlating them with human assessments. Experimental evaluations using WordNet indicate that the proposed metric, coupled with the notion of intrinsic IC, yields results above the state of the art. Moreover, the intrinsic IC formulation also improves the accuracy of other IC-based metrics. To investigate the generality of both the intrinsic IC formulation and the proposed similarity metric, a further evaluation using the MeSH biomedical ontology was performed. Even in this case significant results were obtained. The proposed metric and several others have been implemented in the Java WordNet Similarity Library.
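The intrinsic IC idea above can be read off the taxonomy alone: the more hyponyms a concept has, the less informative it is, so leaves are maximally informative and the root carries no information. A hedged sketch on a toy taxonomy; the concept names and the 3·IC(lcs) − IC(c1) − IC(c2) combination used here are illustrative assumptions, not the paper's exact formulation:

```python
import math

# Toy is-a taxonomy (child -> parent); "entity" is the implicit root.
PARENT = {
    "organism": "entity", "artifact": "entity",
    "animal": "organism", "plant": "organism",
    "dog": "animal", "cat": "animal", "oak": "plant",
}
N = len(PARENT) + 1  # total number of concepts, including the root

def ancestors(c):
    out = []
    while c in PARENT:
        c = PARENT[c]
        out.append(c)
    return out

def hypo(c):
    # number of transitive hyponyms (descendants) of c
    return sum(1 for x in PARENT if c in ancestors(x))

def ic(c):
    # intrinsic IC: leaves score 1, the root scores 0
    return 1 - math.log(hypo(c) + 1) / math.log(N)

def lcs(c1, c2):
    a1 = [c1] + ancestors(c1)
    return next(a for a in [c2] + ancestors(c2) if a in a1)

def sim(c1, c2):
    # feature-style combination expressed in IC terms (illustrative form)
    if c1 == c2:
        return 1.0
    return 3 * ic(lcs(c1, c2)) - ic(c1) - ic(c2)
```

With this toy taxonomy, sim("dog", "cat") exceeds sim("dog", "oak") because "animal" is a more informative shared ancestor than "organism".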
Evaluating the usability of natural language query languages and interfaces to Semantic Web knowledge bases
Article * Nov 2010 * J WEB SEMANT * Esther Kaufmann * Abraham Bernstein
The need to make the contents of the Semantic Web accessible to end-users becomes increasingly pressing as the amount of information stored in ontology-based knowledge bases steadily increases. Natural language interfaces (NLIs) provide a familiar and convenient means of query access to Semantic Web data for casual end-users. While several studies have shown that NLIs can achieve high retrieval performance as well as domain independence, this paper focuses on usability and investigates whether NLIs and natural language query languages are useful from an end-user's point of view. To that end, we introduce four interfaces, each allowing a different query language, and present a usability study benchmarking these interfaces. The results of the study reveal a clear preference for full natural language query sentences with a limited set of sentence beginnings over keywords or formal query languages. NLIs to ontology-based knowledge bases can, therefore, be considered useful for casual or occasional end-users. As such, the overarching contribution is one step towards the theoretical vision of the Semantic Web becoming reality.

Data Integration: The Teenage Years
Conference Paper * Jan 2006 * Alon Halevy * Anand Rajaraman * Joann J. Ordille
Data integration is a pervasive challenge faced in applications that need to query across multiple autonomous and heterogeneous data sources.
Data integration is crucial in large enterprises that own a multitude of data sources, for progress in large-scale scientific projects, where data sets are being produced independently by multiple researchers, for better cooperation among government agencies, each with their own data sources, and in offering good search quality across the millions of structured data sources on the World-Wide Web. Ten years ago we published "Querying Heterogeneous Information Sources using Source Descriptions" [73], a paper describing some aspects of the Information Manifold data integration project. The Information Manifold and many other projects conducted at the time [5, 6, 20, 25, 38, 43, 51, 66, 100] have led to tremendous progress on data integration and to quite a few commercial data integration products. This paper offers a perspective on the contributions of the Information Manifold and its peers, describes some of the important bodies of work in the data integration field in the last ten years, and outlines some challenges to data integration research today. We note in advance that this is not intended to be a comprehensive survey of data integration, and even though the reference list is long, it is by no means complete.

Information in Data: Using the Oxford English Dictionary on a Computer
Article * May 1986 * Michael Lesk
I believe that the concept of a metric (or a dissimilarity measure) defined on a set of records is one of the most fundamental concepts related to information retrieval, although historically, the first science to introduce this concept as a basic one …

Word AdHoc Network: Using Google Core Distance to extract the most relevant information
Article * Apr 2011 * KNOWL-BASED SYST * Ping-I Chen * Shi-Jen Lin
In recent years, finding the most relevant documents or search results in a search engine has become an important issue.
Most previous research has focused on expanding the keyword into a more meaningful sequence or on using a higher-level concept to form the semantic search. All of those methods need predictive models, which are based on training data or Web logs of the users' browsing behaviors. As a result, they can only be used in a single knowledge domain, not only because of the complexity of the model construction but also because the keyword extraction methods are limited to certain areas. In this paper, we describe a new algorithm called "Word AdHoc Network" (WANET) and use it to extract the most important sequences of keywords to provide the most relevant search results to the user. Our method needs no pre-processing, and all executions are real-time, so the system can be used to extract keyword sequences from various knowledge domains. Our experiments show that the extracted sequences can achieve high accuracy and, in most cases, find the most relevant information in the top-1 search results. This new system can increase users' effectiveness in finding useful information for the articles or research papers they are reading or writing.

Conceptual query expansion
Article * Feb 2006 * DATA KNOWL ENG * Franc Grootjen * Theo P. van der Weide
This article presents a new, hybrid approach that projects an initial query result onto global information, yielding a local conceptual overview. Concepts found this way are candidates for query refinement. We show that the resulting conceptual structure, after a typical short query of two terms, contains refinements that perform just as well as the most accurate query formulation. Subsequently, we illustrate that query by navigation is an effective mechanism which in most cases finds the optimal concept in a small number of steps. When an optimal concept is not found, the navigation process still finds an acceptable sub-optimum.
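WANET's "Google Core Distance" belongs to the family of search-based distances built from co-occurrence counts. The best-known member of that family, the normalized Google distance of Cilibrasi and Vitányi, gives a feel for how such distances behave; the hit counts below are invented for illustration, and this sketch is not the WANET algorithm itself:

```python
import math

def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from page-hit counts:
    fx, fy = hits for each term alone, fxy = hits for both, n = total indexed pages."""
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

# Hypothetical hit counts: related terms co-occur often, unrelated ones rarely.
d_close = ngd(9_000, 8_000, 6_000, 10**9)  # frequent co-occurrence -> small distance
d_far   = ngd(9_000, 8_000, 50, 10**9)     # rare co-occurrence -> larger distance
```

Terms that almost always appear together get a distance near 0; terms that almost never do drift towards 1 and beyond, which is what makes such measures usable as a relevance signal.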
Statistical Comparisons of Classifiers over Multiple Data Sets
Article * Jan 2006 * J MACH LEARN RES * Janez Demsar
While methods for comparing two learning algorithms on a single data set have been scrutinized for quite some time already, the issue of statistical tests for comparisons of more algorithms on multiple data sets, which is even more essential to typical machine learning studies, has been all but ignored. This article reviews the current practice and then theoretically and empirically examines several suitable tests. Based on that, we recommend a set of simple, yet safe and robust non-parametric tests for statistical comparisons of classifiers: the Wilcoxon signed-ranks test for comparison of two classifiers, and the Friedman test with the corresponding post-hoc tests for comparison of more classifiers over multiple data sets. Results of the latter can also be neatly presented with the newly introduced CD (critical difference) diagrams.

Measures of semantic similarity and relatedness in the biomedical domain
Article * Jul 2007 * J Biomed Informat * Ted Pedersen * Serguei V.S. Pakhomov * Siddharth Patwardhan * Christopher G. Chute
Measures of semantic similarity between concepts are widely used in Natural Language Processing. In this article, we show how six existing domain-independent measures can be adapted to the biomedical domain. These measures were originally based on WordNet, an English lexical database of concepts and relations. In this research, we adapt these measures to the SNOMED-CT ontology of medical concepts. The measures include two path-based measures, and three measures that augment path-based measures with information content statistics from corpora. We also derive a context vector measure based on medical corpora that can be used as a measure of semantic relatedness.
These six measures are evaluated against a newly created test bed of 30 medical concept pairs scored by three physicians and nine medical coders. We find that the medical coders and physicians differ in their ratings, and that the context vector measure correlates most closely with the physicians, while the path-based measures and one of the information content measures correlate most closely with the medical coders. We conclude that there is a role both for more flexible measures of relatedness based on information derived from corpora, and for measures that rely on existing ontological structures.

The Semantic Web Revisited
Article * Feb 2006 * IEEE INTELL SYST * Nigel Shadbolt * Wendy Hall * Tim Berners-Lee
The article included many scenarios in which intelligent agents and bots undertook tasks on behalf of their human or corporate owners. Of course, shopbots and auction bots abound on the Web, but these are essentially handcrafted for particular tasks: they have little ability to interact with heterogeneous data and information types. Because we haven't yet delivered large-scale, agent-based mediation, some commentators argue that the Semantic Web has failed to deliver. We argue that agents can only flourish when standards are well established, and that the Web standards for expressing shared meaning have progressed steadily over the past five years.

An Information-Theoretic Definition of Similarity
Article * Aug 1998 * Dekang Lin
Similarity is an important and widely used concept. Previous definitions of similarity are tied to a particular application or a form of knowledge representation. We present an information-theoretic definition of similarity that is applicable as long as there is a probabilistic model. We demonstrate how our definition can be used to measure the similarity in a number of different domains.
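Lin's definition above measures similarity as the ratio between the information two concepts share (the information content of their lowest common subsumer) and the information needed to describe each of them fully. A minimal sketch under assumed toy counts; the taxonomy and corpus frequencies below are invented for illustration:

```python
import math

# Toy is-a taxonomy (child -> parent) and corpus counts (hypothetical numbers).
PARENT = {"animal": "entity", "dog": "animal", "cat": "animal"}
COUNT = {"dog": 30, "cat": 20, "animal": 10, "entity": 5}

def ancestors(c):
    out = [c]
    while c in PARENT:
        c = PARENT[c]
        out.append(c)
    return out

def prob(c):
    # p(c): probability of encountering an instance of c (counts propagate upward)
    total = sum(COUNT.values())
    mass = sum(COUNT[x] for x in COUNT if c in ancestors(x))
    return mass / total

def ic(c):
    # corpus-based information content: -log p(c)
    return -math.log(prob(c))

def lcs(c1, c2):
    a1 = ancestors(c1)
    return next(a for a in ancestors(c2) if a in a1)

def sim_lin(c1, c2):
    # Lin (1998): shared information divided by total description length
    return 2 * ic(lcs(c1, c2)) / (ic(c1) + ic(c2))
```

The score is 1 for identical concepts and falls towards 0 as the shared subsumer becomes less informative, which is exactly the probabilistic-model framing the abstract describes.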
Using Information Content to Evaluate Semantic Similarity in a Taxonomy
Article * 1995 * Philip Resnik
This paper presents a new measure of semantic similarity in an is-a taxonomy, based on the notion of information content. Experimental evaluation suggests that the measure performs encouragingly well (a correlation of r = 0.79 with a benchmark set of human similarity judgments, with an upper bound of r = 0.90 for human subjects performing the same task), and significantly better than the traditional edge-counting approach (r = 0.66). Evaluating semantic relatedness using network representations is a problem with a long history in artificial intelligence and psychology, dating back to the spreading activation approach of Quillian [1968] and Collins and Loftus [1975]. Semantic similarity represents a special case of semantic relatedness: for example, cars and gasoline would seem to be more closely related than, say, cars and bicycles, but the latter pair are certainly more similar. Rada et al. [1989] suggest that the assessment of similarity in semantic n…

Verb Semantics and Lexical Selection
Article * May 2002 * Zhibiao Wu * Martha Palmer
This paper focuses on the semantic representation of verbs in computer systems and its impact on lexical selection problems in machine translation (MT). Two groups of English and Chinese verbs are examined to show that lexical selection must be based on interpretation of the sentence as well as on selection restrictions placed on the verb arguments. A novel representation scheme is suggested and compared to representations with selection restrictions used in transfer-based MT. We see our approach as closely aligned with knowledge-based MT approaches (KBMT), and as a separate component that could be incorporated into existing systems. Examples and experimental results show that, using this scheme, inexact matches can achieve correct lexical selection.
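The Wu and Palmer paper above is the origin of the depth-based conceptual similarity often used alongside Resnik's information-content measure: two concepts are similar in proportion to how deep their lowest common ancestor sits relative to the concepts themselves. A sketch of the formulation commonly attributed to Wu and Palmer, on a toy taxonomy whose names and structure are illustrative:

```python
# Toy is-a taxonomy (child -> parent); the root "entity" has depth 1.
PARENT = {"organism": "entity", "animal": "organism",
          "dog": "animal", "cat": "animal", "plant": "organism"}

def path_to_root(c):
    path = [c]
    while c in PARENT:
        c = PARENT[c]
        path.append(c)
    return path

def depth(c):
    return len(path_to_root(c))  # root has depth 1

def lcs(c1, c2):
    p1 = path_to_root(c1)
    return next(a for a in path_to_root(c2) if a in p1)

def sim_wup(c1, c2):
    # Wu-Palmer: how deep the shared ancestor sits relative to both concepts
    return 2 * depth(lcs(c1, c2)) / (depth(c1) + depth(c2))
```

Here "dog" and "cat" share the deep ancestor "animal" and score 0.75, while "dog" and "plant" only share the shallower "organism" and score lower, matching the intuition that siblings deep in a taxonomy are more similar than distant cousins.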
Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy
Article * Oct 1997 * Jay J. Jiang * David W. Conrath
This paper presents a new approach for measuring semantic similarity/distance between words and concepts. It combines a lexical taxonomy structure with corpus statistical information so that the semantic distance between nodes in the semantic space constructed by the taxonomy can be better quantified with the computational evidence derived from a distributional analysis of corpus data. Specifically, the proposed measure is a combined approach that inherits the edge-based approach of the edge-counting scheme, enhanced by the node-based approach of the information content calculation. When tested on a common data set of word pair similarity ratings, the proposed approach outperforms other computational models. It gives the highest correlation value (r = 0.828) with a benchmark based on human similarity judgements, whereas an upper bound (r = 0.885) is observed when human subjects replicate the same task.

J.S. Simonoff, Smoothing Methods in Statistics, Springer, 1996.

Evolutionary Algorithm Based on Different Semantic Similarity Functions for Synonym Recognition in the Biomedical Domain
Preprint * September 2017 * Jorge Martinez-Gil * José M. Chaves-González
One of the most challenging problems in the semantic web field consists of computing the semantic similarity between different terms.
The problem is aggravated by the lack of accurate dictionaries for specific and dynamic domains such as biomedicine or finance. In this article we propose a new approach which uses different existing semantic similarity methods to obtain precise results in the biomedical domain. Specifically, we have developed an evolutionary algorithm which uses information provided by different semantic similarity metrics. Our results have been validated against a variety of biomedical datasets and different collections of similarity functions. The proposed system provides very high quality results when compared against similarity ratings provided by human experts (in terms of the Pearson correlation coefficient), surpassing the results of other relevant works previously published in the literature.
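The preprint's abstract evaluates the evolved similarity function by its Pearson correlation with expert ratings. The coefficient itself is straightforward to compute; a small sketch with invented ratings (the numbers below are purely illustrative, not data from the paper):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length rating lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical example: expert ratings vs. scores from some similarity function.
human  = [4.0, 3.5, 1.0, 0.5, 2.0]
metric = [0.9, 0.8, 0.2, 0.1, 0.5]
```

A value near 1 means the metric ranks word pairs almost exactly as the experts do, which is the success criterion the abstract reports.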