wickerlab.org
185.112.146.228  Public Scan

URL: https://wickerlab.org/
Submission Tags: phishingrod
Submission: On June 26 via api from DE — Scanned from IS

Form analysis: 2 forms found in the DOM

Name: tppublistform (GET)

<form name="tppublistform" method="get"><a name="tppubs" id="tppubs"></a></form>

GET https://wickerlab.org/

<form role="search" method="get" action="https://wickerlab.org/" class="wp-block-search__button-inside wp-block-search__icon-button wp-block-search"><label class="wp-block-search__label screen-reader-text"
    for="wp-block-search__input-2">Search</label>
  <div class="wp-block-search__inside-wrapper "><input class="wp-block-search__input" id="wp-block-search__input-2" placeholder="" value="" type="search" name="s" required=""><button aria-label="Search"
      class="wp-block-search__button has-icon wp-element-button" type="submit"><svg class="search-icon" viewBox="0 0 24 24" width="24" height="24">
        <path d="M13 5c-3.3 0-6 2.7-6 6 0 1.4.5 2.7 1.3 3.7l-3.8 3.8 1.1 1.1 3.8-3.8c1 .8 2.3 1.3 3.7 1.3 3.3 0 6-2.7 6-6S16.3 5 13 5zm0 10.5c-2.5 0-4.5-2-4.5-4.5s2-4.5 4.5-4.5 4.5 2 4.5 4.5-2 4.5-4.5 4.5z"></path>
      </svg></button></div>
</form>

Text Content

 * Mastodon
 * GitHub

 * Home
 * Lab
 * Publications
 * Research
 * Projects
 * News
 * Join us
   * Join us
   * Funding

Our lab researches machine learning and its application to cheminformatics,
bioinformatics, and computational sustainability. We are always keen to explore
new research areas in both applied and fundamental machine learning. Currently,
we are particularly interested in the reliability of machine learning models,
adversarial machine learning, and bias, with applications in chemistry,
epidemiology, and environmental research.

To learn more about our lab, check out our publications or read more about our
research and projects.

You can join us as a PhD student, Honours student, or other postgraduate
student. You can also visit our lab as a visiting researcher or student.




NEWS


 * ARTIFICIAL INTELLIGENCE AND FRESHWATER MODELLING
   
   To protect our freshwater for future generations, we develop a framework
   enabling an understanding of how environmental factors impact our water
   quality and how mitigation strategies can help. Our project […]


 * ATTACKING THE LOOP: ADVERSARIAL ATTACKS ON GRAPH-BASED LOOP CLOSURE DETECTION
   
   


 * REGIONAL BIAS IN MONOLINGUAL ENGLISH LANGUAGE MODELS
   
   


 * A SYSTEMATIC REVIEW OF ASPECT-BASED SENTIMENT ANALYSIS: DOMAINS, METHODS, AND
   TRENDS
   
   


SOCIAL

Fediverse
 * Congrats Katharina Dost!
   May 6, 2024
   Joerg Simon Wicker wrote the following post Tue, 07 May 2024 11:56:39 +1200:
   Congrats Katharina Dost! https://youtu.be/wkY4QP4KpMQ?t=6838
 * (no title)
   March 22, 2024
   Joerg Simon Wicker wrote the following post Fri, 22 Mar 2024 22:33:12 +1300:
   William Stafford Noble – Deep learning applications in mass spectrometry
   proteomics and single-cell genomics
   UoA ML Seminar: William Stafford Noble – DL applications in MS proteomics and
   single-cell genomics, by Machine Learning Group – University of Auckland on
   YouTube. Our most recent UoA Machine […]
 * (no title)
   March 19, 2024
   Joerg Simon Wicker wrote the following post Wed, 20 Mar 2024 00:41:09 +1300:
   Attacking the Loop: Adversarial Attacks on Graph-based Loop Closure Detection
   Jonathan Kim's latest paper – Attacking the Loop: #Adversarial Attacks on
   Graph-based Loop Closure Detection – attacking loop closure in #SLAM. Check it
   out here.
 * (no title)
   February 14, 2024
   New publication alert! 'Making waves: Enhancing pollutant biodegradation via
   rational engineering of microbial consortia'
   https://www.sciencedirect.com/science/article/pii/S004313542301196X?via%3Dihub
 * (no title)
   November 21, 2023
   Likes @enviPath's note: New data package: EAWAG-SLUDGE! EAWAG-SLUDGE contains
   biotransformation data from activated sludge experiments extracted from 27
   scientific articles, including our own paper by Trostel et al. (2023).
   https://envipath.org/package/7932e576-03c7-4106-819d-fe80dc605b8a


RECENT PUBLICATIONS


JOURNAL ARTICLES

Long, Derek; Eade, Liam; Dost, Katharina; Meier-Menches, Samuel M; Goldstone,
David C; Sullivan, Matthew P; Hartinger, Christian; Wicker, Jörg; Taskova,
Katerina

AdductHunter: Identifying Protein-Metal Complex Adducts in Mass Spectra
Journal Article

In: Journal of Cheminformatics, vol. 16, iss. 1, 2024, ISSN: 1758-2946.

@article{Long2023adducthunter,

title = {AdductHunter: Identifying Protein-Metal Complex Adducts in Mass Spectra},

author = {Derek Long and Liam Eade and Katharina Dost and Samuel M Meier-Menches and David C Goldstone and Matthew P Sullivan and Christian Hartinger and J\"{o}rg Wicker and Katerina Taskova},

url = {https://adducthunter.wickerlab.org

https://doi.org/10.21203/rs.3.rs-3322854/v1},

doi = {10.1186/s13321-023-00797-7},

issn = {1758-2946},

year  = {2024},

date = {2024-02-06},

urldate = {2023-05-29},

journal = {Journal of Cheminformatics},

volume = {16},

issue = {1},

abstract = {Mass spectrometry (MS) is an analytical technique for molecule identification that can be used for investigating protein-metal complex interactions. Once the MS data is collected, the mass spectra are usually interpreted manually to identify the adducts formed as a result of the interactions between proteins and metal-based species. However, with increasing resolution, dataset size, and species complexity, the time required to identify adducts and the error-prone nature of manual assignment have become limiting factors in MS analysis. AdductHunter is an open-source web-based analysis tool that automates the peak identification process using constraint integer optimization to find feasible combinations of protein and fragments, and dynamic time warping to calculate the dissimilarity between the theoretical isotope pattern of a species and its experimental isotope peak distribution. Empirical evaluation on a collection of 22 unique MS datasets shows fast and accurate identification of protein-metal complex adducts in deconvoluted mass spectra.},

keywords = {},

pubstate = {published},

tppubtype = {article}

}

Mass spectrometry (MS) is an analytical technique for molecule identification
that can be used for investigating protein-metal complex interactions. Once the
MS data is collected, the mass spectra are usually interpreted manually to
identify the adducts formed as a result of the interactions between proteins and
metal-based species. However, with increasing resolution, dataset size, and
species complexity, the time required to identify adducts and the error-prone
nature of manual assignment have become limiting factors in MS analysis.
AdductHunter is an open-source web-based analysis tool that automates the peak
identification process using constraint integer optimization to find feasible
combinations of protein and fragments, and dynamic time warping to calculate the
dissimilarity between the theoretical isotope pattern of a species and its
experimental isotope peak distribution. Empirical evaluation on a collection of
22 unique MS datasets shows fast and accurate identification of protein-metal
complex adducts in deconvoluted mass spectra.
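
The abstract names dynamic time warping (DTW) as the dissimilarity measure between a species' theoretical isotope pattern and its experimental peak distribution. The sketch below shows textbook DTW with an absolute-difference cost, assuming both patterns are already reduced to lists of relative intensities; `dtw_distance`, `normalise` and the example intensities are hypothetical and not taken from AdductHunter's code.

```python
import math
from typing import List, Sequence


def normalise(intensities: Sequence[float]) -> List[float]:
    """Scale peak intensities so they sum to 1 (relative abundances)."""
    total = sum(intensities)
    return [x / total for x in intensities]


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Textbook dynamic time warping distance with |a_i - b_j| as the local cost.

    Illustrative sketch only; not AdductHunter's implementation.
    """
    n, m = len(a), len(b)
    # cost[i][j]: cheapest cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # reuse b[j-1] for another a peak
                cost[i][j - 1],      # reuse a[i-1] for another b peak
                cost[i - 1][j - 1],  # advance both patterns by one peak
            )
    return cost[n][m]


if __name__ == "__main__":
    # Hypothetical isotope patterns (four peaks each), not real data.
    theoretical = normalise([45.0, 35.0, 15.0, 5.0])
    experimental = normalise([40.0, 38.0, 16.0, 6.0])
    print(f"DTW dissimilarity: {dtw_distance(theoretical, experimental):.3f}")
```

Because DTW lets one peak align with several neighbours, small shifts or splits between the theoretical and measured patterns can cost less than a strict point-by-point comparison would charge.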

 * https://adducthunter.wickerlab.org
 * https://doi.org/10.21203/rs.3.rs-3322854/v1
 * doi:10.1186/s13321-023-00797-7

Miller, Catriona J; Golovina, Evgenija; Wicker, Jörg; Jacobson, Jessie C;
O'Sullivan, Justin M

De novo network analysis reveals autism causal genes and developmental links to
co-occurring traits
Journal Article

In: Life Science Alliance, vol. 6, no. 10, 2023.

@article{Miller2023denovo,

title = {De novo network analysis reveals autism causal genes and developmental links to co-occurring traits},

author = {Catriona J Miller and Evgenija Golovina and J\"{o}rg Wicker and Jessie C Jacobson and Justin M O'Sullivan},

url = {https://www.medrxiv.org/content/10.1101/2023.04.24.23289060v1},

doi = {10.26508/lsa.202302142},

year  = {2023},

date = {2023-08-08},

urldate = {2023-08-08},

journal = {Life Science Alliance},

volume = {6},

number = {10},

abstract = {Autism is a complex neurodevelopmental condition that manifests in various ways. Autism is often accompanied by other conditions, such as attention-deficit/hyperactivity disorder and schizophrenia, which can complicate diagnosis and management. Although research has investigated the role of specific genes in autism, their relationship with co-occurring traits is not fully understood. To address this, we conducted a two-sample Mendelian randomisation analysis and identified four genes located at the 17q21.31 locus that are putatively causal for autism in fetal cortical tissue (LINC02210, LRRC37A4P, RP11-259G18.1, and RP11-798G7.6). LINC02210 was also identified as putatively causal for autism in adult cortical tissue. By integrating data from expression quantitative trait loci, genes and protein interactions, we identified that the 17q21.31 locus contributes to the intersection between autism and other neurological traits in fetal cortical tissue. We also identified a distinct cluster of co-occurring traits, including cognition and worry, linked to the genetic loci at 3p21.1. Our findings provide insights into the relationship between autism and co-occurring traits, which could be used to develop predictive models for more accurate diagnosis and better clinical management.},

keywords = {},

pubstate = {published},

tppubtype = {article}

}

Autism is a complex neurodevelopmental condition that manifests in various ways.
Autism is often accompanied by other conditions, such as
attention-deficit/hyperactivity disorder and schizophrenia, which can complicate
diagnosis and management. Although research has investigated the role of
specific genes in autism, their relationship with co-occurring traits is not
fully understood. To address this, we conducted a two-sample Mendelian
randomisation analysis and identified four genes located at the 17q21.31 locus
that are putatively causal for autism in fetal cortical tissue (LINC02210,
LRRC37A4P, RP11-259G18.1, and RP11-798G7.6). LINC02210 was also identified as
putatively causal for autism in adult cortical tissue. By integrating data from
expression quantitative trait loci, genes and protein interactions, we
identified that the 17q21.31 locus contributes to the intersection between
autism and other neurological traits in fetal cortical tissue. We also
identified a distinct cluster of co-occurring traits, including cognition and
worry, linked to the genetic loci at 3p21.1. Our findings provide insights into
the relationship between autism and co-occurring traits, which could be used to
develop predictive models for more accurate diagnosis and better clinical
management.
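
The abstract mentions a two-sample Mendelian randomisation analysis without naming the estimator. For orientation only, the sketch below shows two standard summary-statistic estimators, the per-variant Wald ratio and the fixed-effect inverse-variance weighted (IVW) combination, assuming the variant effects on the exposure (plausibly gene expression, since the abstract mentions eQTL data) and on the outcome come from separate samples. Whether the paper used IVW is an assumption, and the example numbers are invented.

```python
from typing import Sequence


def wald_ratio(beta_exposure: float, beta_outcome: float) -> float:
    """Causal effect estimate from one genetic variant: outcome effect / exposure effect."""
    return beta_outcome / beta_exposure


def ivw_estimate(beta_exposure: Sequence[float],
                 beta_outcome: Sequence[float],
                 se_outcome: Sequence[float]) -> float:
    """Fixed-effect inverse-variance weighted (IVW) estimate over several variants.

    Assumption: first-order weights 1 / se_outcome**2; the paper's exact method
    is not stated in the abstract. Equivalent to a weighted regression of
    beta_outcome on beta_exposure through the origin.
    """
    num = sum(bx * by / se ** 2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome))
    return num / den


if __name__ == "__main__":
    # Invented summary statistics for three independent variants.
    bx = [0.30, 0.25, 0.40]   # effect of each variant on the exposure
    by = [0.06, 0.04, 0.09]   # effect of each variant on the outcome
    se = [0.02, 0.02, 0.03]   # standard error of each outcome effect
    print([round(wald_ratio(x, y), 3) for x, y in zip(bx, by)])
    print(round(ivw_estimate(bx, by, se), 3))
```

The IVW estimate is a weighted average of the per-variant Wald ratios, with variants measured more precisely (small `se`) and with stronger exposure effects counting more.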

 * https://www.medrxiv.org/content/10.1101/2023.04.24.23289060v1
 * doi:10.26508/lsa.202302142

Dost, Katharina; Pullar-Strecker, Zac; Brydon, Liam; Zhang, Kunyang; Hafner,
Jasmin; Riddle, Pat; Wicker, Jörg

Combatting over-specialization bias in growing chemical databases
Journal Article

In: Journal of Cheminformatics, vol. 15, iss. 1, pp. 53, 2023, ISSN: 1758-2946.

@article{Dost2023Combatting,

title = {Combatting over-specialization bias in growing chemical databases},

author = {Katharina Dost and Zac Pullar-Strecker and Liam Brydon and Kunyang Zhang and Jasmin Hafner and Pat Riddle and J\"{o}rg Wicker},

url = {https://jcheminf.biomedcentral.com/articles/10.1186/s13321-023-00716-w},

doi = {10.1186/s13321-023-00716-w},

issn = {1758-2946},

year  = {2023},

date = {2023-05-19},

urldate = {2023-05-19},

journal = {Journal of Cheminformatics},

volume = {15},

issue = {1},

pages = {53},

abstract = {Background



Predicting in advance the behavior of new chemical compounds can support the design process of new products by directing the research toward the most promising candidates and ruling out others. Such predictive models can be data-driven using Machine Learning or based on researchers’ experience and depend on the collection of past results. In either case: models (or researchers) can only make reliable assumptions about compounds that are similar to what they have seen before. Therefore, consequent usage of these predictive models shapes the dataset and causes a continuous specialization shrinking the applicability domain of all trained models on this dataset in the future, and increasingly harming model-based exploration of the space.

Proposed solution



In this paper, we propose cancels (CounterActiNg Compound spEciaLization biaS), a technique that helps to break the dataset specialization spiral. Aiming for a smooth distribution of the compounds in the dataset, we identify areas in the space that fall short and suggest additional experiments that help bridge the gap. Thereby, we generally improve the dataset quality in an entirely unsupervised manner and create awareness of potential flaws in the data. cancels does not aim to cover the entire compound space and hence retains a desirable degree of specialization to a specified research domain.

Results



An extensive set of experiments on the use-case of biodegradation pathway prediction not only reveals that the bias spiral can indeed be observed but also that cancels produces meaningful results. Additionally, we demonstrate that mitigating the observed bias is crucial as it cannot only intervene with the continuous specialization process, but also significantly improves a predictor’s performance while reducing the number of required experiments. Overall, we believe that cancels can support researchers in their experimentation process to not only better understand their data and potential flaws, but also to grow the dataset in a sustainable way. All code is available under github.com/KatDost/Cancels.},

keywords = {},

pubstate = {published},

tppubtype = {article}

}

Background

Predicting in advance the behavior of new chemical compounds can support the
design process of new products by directing the research toward the most
promising candidates and ruling out others. Such predictive models can be
data-driven using Machine Learning or based on researchers’ experience and
depend on the collection of past results. In either case: models (or
researchers) can only make reliable assumptions about compounds that are similar
to what they have seen before. Therefore, consequent usage of these predictive
models shapes the dataset and causes a continuous specialization shrinking the
applicability domain of all trained models on this dataset in the future, and
increasingly harming model-based exploration of the space.

Proposed solution

In this paper, we propose cancels (CounterActiNg Compound spEciaLization biaS),
a technique that helps to break the dataset specialization spiral. Aiming for a
smooth distribution of the compounds in the dataset, we identify areas in the
space that fall short and suggest additional experiments that help bridge the
gap. Thereby, we generally improve the dataset quality in an entirely
unsupervised manner and create awareness of potential flaws in the data. cancels
does not aim to cover the entire compound space and hence retains a desirable
degree of specialization to a specified research domain.

Results

An extensive set of experiments on the use-case of biodegradation pathway
prediction not only reveals that the bias spiral can indeed be observed but also
that cancels produces meaningful results. Additionally, we demonstrate that
mitigating the observed bias is crucial as it cannot only intervene with the
continuous specialization process, but also significantly improves a predictor’s
performance while reducing the number of required experiments. Overall, we
believe that cancels can support researchers in their experimentation process to
not only better understand their data and potential flaws, but also to grow the
dataset in a sustainable way. All code is available under
github.com/KatDost/Cancels.
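
cancels itself is available at github.com/KatDost/Cancels and is not reproduced here. As a loose illustration of the general idea of spotting regions that fall short of a smooth distribution, the sketch below bins compounds along a single hypothetical descriptor and flags bins whose count drops well below the mean of their neighbours. The one-dimensional descriptor, the `ratio` threshold and both function names are assumptions made for this example.

```python
from typing import List, Sequence


def histogram(values: Sequence[float], lo: float, hi: float, bins: int) -> List[int]:
    """Count values per equal-width bin on [lo, hi]; values outside are ignored."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if lo <= v <= hi:
            # A value exactly equal to hi falls into the last bin.
            counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts


def underpopulated_bins(counts: Sequence[int], ratio: float = 0.5) -> List[int]:
    """Indices of bins holding fewer than `ratio` times the mean of their neighbours.

    Hypothetical gap heuristic for illustration; not the cancels algorithm.
    """
    flagged = []
    for i, c in enumerate(counts):
        neighbours = [counts[j] for j in (i - 1, i + 1) if 0 <= j < len(counts)]
        if not neighbours:
            continue  # a single bin has nothing to compare against
        expected = sum(neighbours) / len(neighbours)
        if c < ratio * expected:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Hypothetical descriptor values (e.g. a scaled molecular property).
    values = [0.05, 0.08, 0.12, 0.15, 0.18, 0.22, 0.71, 0.75, 0.78, 0.82, 0.85, 0.91]
    counts = histogram(values, 0.0, 1.0, 5)
    print(counts, underpopulated_bins(counts))
```

Flagged bins would be candidate regions for additional experiments; a real method has to work in a high-dimensional compound space and decide how much specialization to keep, which this toy does not attempt.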

 * https://jcheminf.biomedcentral.com/articles/10.1186/s13321-023-00716-w
 * doi:10.1186/s13321-023-00716-w

Bensemann, Joshua; Cheena, Hasnain; Huang, David Tse Jung; Broadbent, Elizabeth;
Williams, Jonathan; Wicker, Jörg

From What You See to What We Smell: Linking Human Emotions to Bio-markers in
Breath
Journal Article

In: IEEE Transactions on Affective Computing, pp. 1-13, 2023, ISSN: 1949-3045.

@article{bensemann2023from,

title = {From What You See to What We Smell: Linking Human Emotions to Bio-markers in Breath},

author = {Joshua Bensemann and Hasnain Cheena and David Tse Jung Huang and Elizabeth Broadbent and Jonathan Williams and J\"{o}rg Wicker},

url = {https://ieeexplore.ieee.org/document/10123109

https://doi.org/10.17608/k6.auckland.22777364 

https://doi.org/10.17608/k6.auckland.22777352 },

doi = {10.1109/TAFFC.2023.3275216},

issn = {1949-3045},

year  = {2023},

date = {2023-05-11},

urldate = {2023-05-11},

journal = {IEEE Transactions on Affective Computing},

pages = {1-13},

abstract = {Research has shown that the composition of breath can differ based on the human’s behavioral patterns and mental and physical states immediately before being collected. These breath-collection techniques have also been extended to observe the general processes occurring in groups of humans and can link them to what those groups are collectively experiencing. In this research, we applied machine learning techniques to the breath data collected from cinema audiences. These techniques included XGBOOST Regression, Hierarchical Clustering, and Item Basket analyses created using the Apriori algorithm. They were conducted to find associations between the biomarkers in the crowd’s breath and the movie’s audio-visual stimuli and thematic events. This analysis enabled us to directly link what the group was experiencing and their biological response to that experience. We first extracted visual and auditory features from a movie to achieve this. We compared it to the biomarkers in the crowd’s breath using regression and pattern mining techniques. Our results supported the theory that a crowd’s collective experience directly correlates to the biomarkers in the crowd’s breath. Consequently, these findings suggest that visual and auditory experiences have predictable effects on the human body that can be monitored without requiring expensive or invasive neuroimaging techniques.},

keywords = {},

pubstate = {published},

tppubtype = {article}

}

Research has shown that the composition of breath can differ based on the
human’s behavioral patterns and mental and physical states immediately before
being collected. These breath-collection techniques have also been extended to
observe the general processes occurring in groups of humans and can link them to
what those groups are collectively experiencing. In this research, we applied
machine learning techniques to the breath data collected from cinema audiences.
These techniques included XGBOOST Regression, Hierarchical Clustering, and Item
Basket analyses created using the Apriori algorithm. They were conducted to find
associations between the biomarkers in the crowd’s breath and the movie’s
audio-visual stimuli and thematic events. This analysis enabled us to directly
link what the group was experiencing and their biological response to that
experience. We first extracted visual and auditory features from a movie to
achieve this. We compared it to the biomarkers in the crowd’s breath using
regression and pattern mining techniques. Our results supported the theory that
a crowd’s collective experience directly correlates to the biomarkers in the
crowd’s breath. Consequently, these findings suggest that visual and auditory
experiences have predictable effects on the human body that can be monitored
without requiring expensive or invasive neuroimaging
techniques.
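
The abstract lists Item Basket analyses built with the Apriori algorithm. Below is a minimal, textbook Apriori in plain Python, assuming each time window of the screening is encoded as a set of labels (for example a scene event plus the breath compounds that rose). The labels and this encoding are hypothetical; this is not the authors' pipeline.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def apriori(transactions: List[Set[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose support (fraction of transactions containing it)
    is at least min_support, growing candidate itemsets one item at a time."""
    n = len(transactions)
    frequent: Dict[FrozenSet[str], float] = {}
    # Level 1: frequent single items.
    counts = Counter(item for t in transactions for item in t)
    current = {frozenset([i]): c / n for i, c in counts.items() if c / n >= min_support}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a, b in combinations(list(current), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                current[c] = support
    return frequent


if __name__ == "__main__":
    # Hypothetical time windows: scene labels plus breath compounds that rose.
    windows = [
        {"suspense", "isoprene_up"},
        {"suspense", "isoprene_up", "co2_up"},
        {"comedy", "co2_up"},
        {"suspense", "co2_up"},
    ]
    for itemset, support in sorted(apriori(windows, 0.5).items(), key=lambda kv: -kv[1]):
        print(sorted(itemset), support)
```

The prune step is what makes Apriori cheaper than brute force: a larger itemset cannot be frequent if any of its subsets is not, so such candidates are discarded before their support is counted.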

 * https://ieeexplore.ieee.org/document/10123109
 * https://doi.org/10.17608/k6.auckland.22777364
 * https://doi.org/10.17608/k6.auckland.22777352
 * doi:10.1109/TAFFC.2023.3275216

Roeslin, Samuel; Ma, Quincy; Chigullapally, Pavan; Wicker, Jörg; Wotherspoon,
Liam

Development of a Seismic Loss Prediction Model for Residential Buildings using
Machine Learning – Christchurch, New Zealand
Journal Article

In: Natural Hazards and Earth System Sciences, vol. 23, no. 3, pp. 1207-1226,
2023.

@article{Roeslin2023development,

title = {Development of a Seismic Loss Prediction Model for Residential Buildings using Machine Learning \textendash Christchurch, New Zealand},

author = {Samuel Roeslin and Quincy Ma and Pavan Chigullapally and J\"{o}rg Wicker and Liam Wotherspoon},

url = {https://nhess.copernicus.org/articles/23/1207/2023/},

doi = {10.5194/nhess-23-1207-2023},

year  = {2023},

date = {2023-03-22},

urldate = {2023-03-22},

journal = {Natural Hazards and Earth System Sciences},

volume = {23},

number = {3},

pages = {1207-1226},

abstract = {This paper presents a new framework for the seismic loss prediction of residential buildings in Christchurch, New Zealand. It employs data science techniques, geospatial tools, and machine learning (ML) trained on insurance claims data from the Earthquake Commission (EQC) collected following the 2010\textendash2011 Canterbury Earthquake Sequence (CES). The seismic loss prediction obtained from the ML model is shown to outperform the output from existing risk analysis tools for New Zealand for each of the main earthquakes of the CES. In addition to the prediction capabilities, the ML model delivered useful insights into the most important features contributing to losses during the CES. ML correctly highlighted that liquefaction significantly influenced buildings losses for the 22 February 2011 earthquake. The results are consistent with observations, engineering knowledge, and previous studies, confirming the potential of data science and ML in the analysis of insurance claims data and the development of seismic loss prediction models using empirical loss data.},

keywords = {},

pubstate = {published},

tppubtype = {article}

}

This paper presents a new framework for the seismic loss prediction of
residential buildings in Christchurch, New Zealand. It employs data science
techniques, geospatial tools, and machine learning (ML) trained on insurance
claims data from the Earthquake Commission (EQC) collected following the
2010–2011 Canterbury Earthquake Sequence (CES). The seismic loss prediction
obtained from the ML model is shown to outperform the output from existing risk
analysis tools for New Zealand for each of the main earthquakes of the CES. In
addition to the prediction capabilities, the ML model delivered useful insights
into the most important features contributing to losses during the CES. ML
correctly highlighted that liquefaction significantly influenced buildings
losses for the 22 February 2011 earthquake. The results are consistent with
observations, engineering knowledge, and previous studies, confirming the
potential of data science and ML in the analysis of insurance claims data and
the development of seismic loss prediction models using empirical loss data.

 * https://nhess.copernicus.org/articles/23/1207/2023/
 * doi:10.5194/nhess-23-1207-2023


PROCEEDINGS ARTICLES

Kim, Jonathan; Urschler, Martin; Riddle, Pat; Wicker, Jörg

Attacking the Loop: Adversarial Attacks on Graph-based Loop Closure Detection
Proceedings Article

In: Proceedings of the 19th International Joint Conference on Computer Vision,
Imaging and Computer Graphics Theory and Applications, pp. 90-97, 2024.

@inproceedings{kim2024attacking,

title = {Attacking the Loop: Adversarial Attacks on Graph-based Loop Closure Detection},

author = {Jonathan Kim and Martin Urschler and Pat Riddle and J\"{o}rg Wicker },

url = {http://arxiv.org/abs/2312.06991

https://doi.org/10.48550/arxiv.2312.06991},

doi = {10.5220/0012313100003660},

year  = {2024},

date = {2024-02-27},

urldate = {2024-02-27},

booktitle = {Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications},

volume = {4},

pages = {90-97},

abstract = {With the advancement in robotics, it is becoming increasingly common for large factories and warehouses to incorporate visual SLAM (vSLAM) enabled automated robots that operate closely next to humans. This makes any adversarial attacks on vSLAM components potentially detrimental to humans working alongside them. Loop Closure Detection (LCD) is a crucial component in vSLAM that minimizes the accumulation of drift in mapping, since even a small drift can accumulate into a significant drift over time. Previous work by Kim et al. unified visual features and semantic objects into a single graph structure for finding loop closure candidates. While this provided a performance improvement over visual feature-based LCD, it also created a single point of vulnerability for potential graph-based adversarial attacks. Unlike previously reported visual-patch based attacks, small graph perturbations are far more challenging to detect, making them a more significant threat. In this paper, we present Adversarial-LCD, a novel black-box evasion attack framework that employs an eigencentrality-based perturbation method and an SVM-RBF surrogate model with a Weisfeiler-Lehman feature extractor for attacking graph-based LCD. Our evaluation shows that the attack performance of Adversarial-LCD was superior to that of other machine learning surrogate algorithms, including SVM-linear, SVM-polynomial, and Bayesian classifier, demonstrating the effectiveness of our attack framework. Furthermore, we show that our eigencentrality-based perturbation method outperforms other algorithms, such as Random-walk and Shortest-path, highlighting the efficiency of Adversarial-LCD’s perturbation selection method.},

keywords = {},

pubstate = {published},

tppubtype = {inproceedings}

}

With the advancement in robotics, it is becoming increasingly common for large
factories and warehouses to incorporate visual SLAM (vSLAM) enabled automated
robots that operate closely next to humans. This makes any adversarial attacks
on vSLAM components potentially detrimental to humans working alongside them.
Loop Closure Detection (LCD) is a crucial component in vSLAM that minimizes the
accumulation of drift in mapping, since even a small drift can accumulate into a
significant drift over time. Previous work by Kim et al. unified visual
features and semantic objects into a single graph structure for finding loop
closure candidates. While this provided a performance improvement over visual
feature-based LCD, it also created a single point of vulnerability for potential
graph-based adversarial attacks. Unlike previously reported visual-patch based
attacks, small graph perturbations are far more challenging to detect, making
them a more significant threat. In this paper, we present Adversarial-LCD, a
novel black-box evasion attack framework that employs an eigencentrality-based
perturbation method and an SVM-RBF surrogate model with a Weisfeiler-Lehman
feature extractor for attacking graph-based LCD. Our evaluation shows that the
attack performance of Adversarial-LCD was superior to that of other machine
learning surrogate algorithms, including SVM-linear, SVM-polynomial, and
Bayesian classifier, demonstrating the effectiveness of our attack framework.
Furthermore, we show that our eigencentrality-based perturbation method
outperforms other algorithms, such as Random-walk and Shortest-path,
highlighting the efficiency of Adversarial-LCD’s perturbation selection method.
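
Adversarial-LCD selects graph perturbations by eigencentrality. As a hedged illustration of that ranking step only, the sketch below computes eigenvector centrality by power iteration and returns the k most central nodes, which is where an eigencentrality heuristic would look first. The adjacency-list encoding and `top_k_nodes` are assumptions; the surrogate model, the Weisfeiler-Lehman features and the perturbations themselves are not shown.

```python
import math
from typing import Dict, List


def eigenvector_centrality(adj: Dict[int, List[int]],
                           max_iter: int = 200,
                           tol: float = 1e-10) -> Dict[int, float]:
    """Eigenvector centrality of an undirected graph by power iteration.

    Iterates x <- (A + I) x and normalises x to unit length after each step.
    """
    if not adj:
        return {}
    x = {v: 1.0 for v in adj}
    for _ in range(max_iter):
        # New score: own score plus the sum of the neighbours' scores.
        new = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = math.sqrt(sum(s * s for s in new.values())) or 1.0
        new = {v: s / norm for v, s in new.items()}
        converged = max(abs(new[v] - x[v]) for v in adj) < tol
        x = new
        if converged:
            break
    return x


def top_k_nodes(adj: Dict[int, List[int]], k: int) -> List[int]:
    """The k nodes with the highest eigenvector centrality (hypothetical helper)."""
    scores = eigenvector_centrality(adj)
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    # Toy graph: node 2 bridges two triangles.
    graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3, 4], 3: [2, 4], 4: [2, 3]}
    print(top_k_nodes(graph, 1))
```

Iterating on A + I rather than A keeps the same leading eigenvector but avoids the oscillation that plain power iteration shows on bipartite graphs such as stars.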

 * http://arxiv.org/abs/2312.06991
 * https://doi.org/10.48550/arxiv.2312.06991
 * doi:10.5220/0012313100003660

Pullar-Strecker, Zac; Chang, Xinglong; Brydon, Liam; Ziogas, Ioannis; Dost,
Katharina; Wicker, Jörg

Memento: Facilitating Effortless, Efficient, and Reliable ML Experiments
Proceedings Article

In: Morales, Gianmarco De Francisci; Perlich, Claudia; Ruchansky, Natali;
Kourtellis, Nicolas; Baralis, Elena; Bonchi, Francesco (Ed.): Machine Learning
and Knowledge Discovery in Databases: Applied Data Science and Demo Track, pp.
310-314, Springer Nature Switzerland, Cham, 2023, ISBN: 978-3-031-43430-3.

@inproceedings{Pullar-Strecker2023memento,

title = {Memento: Facilitating Effortless, Efficient, and Reliable ML Experiments},

author = {Zac Pullar-Strecker and Xinglong Chang and Liam Brydon and Ioannis Ziogas and Katharina Dost and J\"{o}rg Wicker},

editor = {Gianmarco De Francisci Morales and Claudia Perlich and Natali Ruchansky and Nicolas Kourtellis and Elena Baralis and Francesco Bonchi },

url = {https://arxiv.org/abs/2304.09175

https://github.com/wickerlab/memento},

doi = {10.1007/978-3-031-43430-3_21},

isbn = {978-3-031-43430-3},

year  = {2023},

date = {2023-09-17},

urldate = {2023-09-17},

booktitle = {Machine Learning and Knowledge Discovery in Databases: Applied Data Science and Demo Track},

series = {Lecture Notes in Computer Science},

pages = {310-314},

publisher = {Springer Nature Switzerland},

address = {Cham},

abstract = { Running complex sets of machine learning experiments is challenging and time-consuming due to the lack of a unified framework. This leaves researchers forced to spend time implementing necessary features such as parallelization, caching, and checkpointing themselves instead of focussing on their project. To simplify the process, in our paper, we introduce Memento, a Python package that is designed to aid researchers and data scientists in the efficient management and execution of computationally intensive experiments. Memento has the capacity to streamline any experimental pipeline by providing a straightforward configuration matrix and the ability to concurrently run experiments across multiple threads.



Code related to this paper is available at: https://github.com/wickerlab/memento.},

keywords = {},

pubstate = {published},

tppubtype = {inproceedings}

}

Running complex sets of machine learning experiments is challenging and
time-consuming due to the lack of a unified framework. This leaves researchers
forced to spend time implementing necessary features such as parallelization,
caching, and checkpointing themselves instead of focussing on their project. To
simplify the process, in our paper, we introduce Memento, a Python package that
is designed to aid researchers and data scientists in the efficient management
and execution of computationally intensive experiments. Memento has the capacity
to streamline any experimental pipeline by providing a straightforward
configuration matrix and the ability to concurrently run experiments across
multiple threads.



Code related to this paper is available at:
https://github.com/wickerlab/memento.
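The abstract describes two mechanisms: expanding a configuration matrix into individual experiments and running those experiments concurrently, with caching so that work is not repeated. The sketch below illustrates that general idea using only the Python standard library. It is not Memento's own API (see the repository above for that); `expand_matrix`, `run_experiment`, and `run_all` are hypothetical names, the experiment body is a placeholder, and the cache here is in-memory only, unlike persistent caching and checkpointing.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache


def expand_matrix(matrix: dict[str, list]) -> list[dict]:
    """Expand a configuration matrix into one dict per parameter combination."""
    keys = list(matrix)
    return [dict(zip(keys, values)) for values in itertools.product(*(matrix[k] for k in keys))]


@lru_cache(maxsize=None)
def run_experiment(model: str, lr: float, seed: int) -> float:
    """Hypothetical experiment; cached in memory so repeated configurations are not recomputed."""
    # Placeholder workload standing in for training and evaluating a model.
    return round(len(model) * lr + seed, 6)


def run_all(matrix: dict[str, list], workers: int = 4) -> list[tuple[dict, float]]:
    """Run every configuration concurrently; results keep the order of the expansion."""
    configs = expand_matrix(matrix)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda cfg: run_experiment(**cfg), configs))
    return list(zip(configs, results))


if __name__ == "__main__":
    matrix = {"model": ["svm", "forest"], "lr": [0.1, 0.01], "seed": [0, 1]}
    for config, score in run_all(matrix):
        print(config, score)
```

Each key of the matrix is one experimental factor; the Cartesian product yields every combination, and `pool.map` returns results in the same order as the expanded configurations.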

 * https://arxiv.org/abs/2304.09175
 * https://github.com/wickerlab/memento
 * doi:10.1007/978-3-031-43430-3_21

Chang, Luke; Dost, Katharina; Zhai, Kaiqi; Demontis, Ambra; Roli, Fabio; Dobbie,
Gillian; Wicker, Jörg

BAARD: Blocking Adversarial Examples by Testing for Applicability, Reliability
and Decidability Proceedings Article

In: Kashima, Hisashi; Ide, Tsuyoshi; Peng, Wen-Chih (Ed.): The 27th Pacific-Asia
Conference on Knowledge Discovery and Data Mining (PAKDD), pp. 3-14, Springer
Nature Switzerland, Cham, 2023, ISBN: 978-3-031-33374-3.

@inproceedings{chang2021baard,

title = {BAARD: Blocking Adversarial Examples by Testing for Applicability, Reliability and Decidability},

author = {Luke Chang and Katharina Dost and Kaiqi Zhai and Ambra Demontis and Fabio Roli and Gillian Dobbie and J\"{o}rg Wicker},

editor = {Hisashi Kashima and Tsuyoshi Ide and Wen-Chih Peng},

url = {https://arxiv.org/abs/2105.00495

https://github.com/wickerlab/baard},

doi = {10.1007/978-3-031-33374-3_1},

isbn = {978-3-031-33374-3},

year  = {2023},

date = {2023-05-27},

urldate = {2023-05-27},

booktitle = {The 27th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD)},

pages = {3-14},

publisher = {Springer Nature Switzerland},

address = {Cham},

abstract = {Adversarial defenses protect machine learning models from adversarial attacks, but are often tailored to one type of model or attack. The lack of information on unknown potential attacks makes detecting adversarial examples challenging. Additionally, attackers do not need to follow the rules made by the defender. To address this problem, we take inspiration from the concept of Applicability Domain in cheminformatics. Cheminformatics models struggle to make accurate predictions because only a limited number of compounds are known and available for training. Applicability Domain defines a domain based on the known compounds and rejects any unknown compound that falls outside the domain. Similarly, adversarial examples start as harmless inputs, but can be manipulated to evade reliable classification by moving outside the domain of the classifier. We are the first to identify the similarity between Applicability Domain and adversarial detection. Instead of focusing on unknown attacks, we focus on what is known, the training data. We propose a simple yet robust triple-stage data-driven framework that checks the input globally and locally, and confirms that they are coherent with the model’s output. This framework can be applied to any classification model and is not limited to specific attacks. We demonstrate these three stages work as one unit, effectively detecting various attacks, even for a white-box scenario.},

keywords = {},

pubstate = {published},

tppubtype = {inproceedings}

}



Adversarial defenses protect machine learning models from adversarial attacks,
but are often tailored to one type of model or attack. The lack of information
on unknown potential attacks makes detecting adversarial examples challenging.
Additionally, attackers do not need to follow the rules made by the defender. To
address this problem, we take inspiration from the concept of Applicability
Domain in cheminformatics. Cheminformatics models struggle to make accurate
predictions because only a limited number of compounds are known and available
for training. Applicability Domain defines a domain based on the known compounds
and rejects any unknown compound that falls outside the domain. Similarly,
adversarial examples start as harmless inputs, but can be manipulated to evade
reliable classification by moving outside the domain of the classifier. We are
the first to identify the similarity between Applicability Domain and
adversarial detection. Instead of focusing on unknown attacks, we focus on what
is known, the training data. We propose a simple yet robust triple-stage
data-driven framework that checks the input globally and locally, and confirms
that they are coherent with the model’s output. This framework can be applied to
any classification model and is not limited to specific attacks. We demonstrate
these three stages work as one unit, effectively detecting various attacks, even
for a white-box scenario.
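To make the "check globally, locally, and against the model's output" idea concrete, here is a minimal sketch of a three-stage, applicability-domain style detector on numeric feature vectors. It is a generic simplification inspired by the abstract, not the BAARD implementation (see the linked repository for that): stage 1 checks the input against per-feature bounds of the training data, stage 2 compares the mean distance to the `k` nearest training samples with a cut-off learned from the training set, and stage 3 checks that those neighbours' majority label matches the model's prediction. The class name, `k`, and the quantile cut-off are illustrative assumptions.

```python
import numpy as np


class ThreeStageDetector:
    """Simplified three-stage, applicability-domain style input check (illustrative)."""

    def __init__(self, k: int = 3, dist_quantile: float = 0.95):
        self.k = k                          # number of nearest training neighbours
        self.dist_quantile = dist_quantile  # quantile used as the stage-2 distance cut-off

    def fit(self, X: np.ndarray, y: np.ndarray) -> "ThreeStageDetector":
        self.X, self.y = X, y
        # Stage 1 (global): per-feature bounds of the training data.
        self.lower, self.upper = X.min(axis=0), X.max(axis=0)
        # Stage 2 (local): mean distance of each training sample to its k nearest
        # other training samples; the chosen quantile becomes the cut-off.
        d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        knn_mean = np.sort(d, axis=1)[:, : self.k].mean(axis=1)
        self.max_dist = np.quantile(knn_mean, self.dist_quantile)
        return self

    def is_accepted(self, x: np.ndarray, predicted_label) -> bool:
        # Stage 1: reject inputs outside the training bounding box.
        if np.any(x < self.lower) or np.any(x > self.upper):
            return False
        # Stage 2: reject inputs that are far from their nearest training samples.
        dist = np.linalg.norm(self.X - x, axis=1)
        nearest = np.argsort(dist)[: self.k]
        if dist[nearest].mean() > self.max_dist:
            return False
        # Stage 3: reject if the neighbours' majority label disagrees with the model.
        labels, counts = np.unique(self.y[nearest], return_counts=True)
        return bool(labels[np.argmax(counts)] == predicted_label)
```

Fitting builds a full pairwise distance matrix, which is quadratic in memory; that is fine for a toy example, but a spatial index such as a k-d tree would be the usual choice for larger training sets.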

 * https://arxiv.org/abs/2105.00495
 * https://github.com/wickerlab/baard
 * doi:10.1007/978-3-031-33374-3_1

Chen, Zeyu; Dost, Katharina; Zhu, Xuan; Chang, Xinglong; Dobbie, Gillian;
Wicker, Jörg

Targeted Attacks on Time Series Forecasting Proceedings Article

In: Kashima, Hisashi; Ide, Tsuyoshi; Peng, Wen-Chih (Ed.): The 27th Pacific-Asia
Conference on Knowledge Discovery and Data Mining (PAKDD), pp. 314-327, Springer
Nature Switzerland, Cham, 2023, ISBN: 978-3-031-33383-5.

@inproceedings{Chen2023targeted,

title = {Targeted Attacks on Time Series Forecasting},

author = {Zeyu Chen and Katharina Dost and Xuan Zhu and Xinglong Chang and Gillian Dobbie and J\"{o}rg Wicker},

editor = {Hisashi Kashima and Tsuyoshi Ide and Wen-Chih Peng},

url = {https://github.com/wickerlab/nvita},

doi = {10.1007/978-3-031-33383-5_25},

isbn = {978-3-031-33383-5},

year  = {2023},

date = {2023-05-26},

urldate = {2023-05-25},

booktitle = {The 27th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD)},

pages = {314-327},

publisher = {Springer Nature Switzerland},

address = {Cham},

abstract = {Time Series Forecasting (TSF) is well established in domains dealing with temporal data to predict future events yielding the basis for strategic decision-making. Previous research indicated that forecasting models are vulnerable to adversarial attacks, that is, maliciously crafted perturbations of the original data with the goal of altering the model’s predictions. However, attackers targeting specific outcomes pose a substantially more severe threat as they could manipulate the model and bend it to their needs. Regardless, there is no systematic approach for targeted adversarial learning in the TSF domain yet. In this paper, we introduce targeted attacks on TSF in a systematic manner. We establish a new experimental design standard regarding attack goals and perturbation control for targeted adversarial learning on TSF. For this purpose, we present a novel indirect sparse black-box evasion attack on TSF, nVita. Additionally, we adapt the popular white-box attacks Fast Gradient Sign Method (FGSM) and Basic Iterative Method (BIM). Our experiments confirm not only that all three methods are effective but also that current state-of-the-art TSF models are indeed susceptible to attacks. These results motivate future research in this area to achieve higher reliability of forecasting models.},

keywords = {},

pubstate = {published},

tppubtype = {inproceedings}

}



Time Series Forecasting (TSF) is well established in domains dealing
with temporal data to predict future events yielding the basis for strategic
decision-making. Previous research indicated that forecasting models are
vulnerable to adversarial attacks, that is, maliciously crafted perturbations of
the original data with the goal of altering the model’s predictions. However,
attackers targeting specific outcomes pose a substantially more severe threat as
they could manipulate the model and bend it to their needs. Regardless, there is
no systematic approach for targeted adversarial learning in the TSF domain yet.
In this paper, we introduce targeted attacks on TSF in a systematic manner. We
establish a new experimental design standard regarding attack goals and
perturbation control for targeted adversarial learning on TSF. For this purpose,
we present a novel indirect sparse black-box evasion attack on TSF, nVita.
Additionally, we adapt the popular white-box attacks Fast Gradient Sign Method
(FGSM) and Basic Iterative Method (BIM). Our experiments confirm not only that
all three methods are effective but also that current state-of-the-art TSF
models are indeed susceptible to attacks. These results motivate future research
in this area to achieve higher reliability of forecasting models.
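The targeted variants of FGSM and BIM mentioned above can be sketched on a toy model. The example below assumes a linear one-step forecaster `w @ x + b` (so the input gradient is analytic), a squared-error loss towards a chosen target value, and an L-infinity budget `eps`; all function names are hypothetical. It shows only the general targeted gradient-sign idea, not the paper's experimental setup, and it does not cover the black-box nVita attack. A real attack on a neural forecaster would obtain the gradient via automatic differentiation.

```python
import numpy as np


def forecast(x: np.ndarray, w: np.ndarray, b: float) -> float:
    """Toy linear one-step-ahead forecaster over a window of past values."""
    return float(w @ x + b)


def target_loss_grad(x: np.ndarray, w: np.ndarray, b: float, target: float) -> np.ndarray:
    """Gradient of (forecast - target)**2 with respect to the input window x."""
    return 2.0 * (forecast(x, w, b) - target) * w


def targeted_fgsm(x, w, b, target, eps):
    # One signed step *against* the gradient, moving the forecast towards the target.
    return x - eps * np.sign(target_loss_grad(x, w, b, target))


def targeted_bim(x, w, b, target, eps, alpha, steps):
    x_adv = x.copy()
    for _ in range(steps):
        # Repeated small signed steps towards the target...
        x_adv = x_adv - alpha * np.sign(target_loss_grad(x_adv, w, b, target))
        # ...projected back into the L-infinity ball of radius eps around x.
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv
```

For example, with `w = [0.2, 0.3, 0.5]`, `b = 0` and the window `[1, 2, 3]`, the clean forecast is 2.3; a targeted FGSM step with `eps = 0.5` towards a target of 3.0 shifts every value up by 0.5 and moves the forecast to 2.8.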

 * https://github.com/wickerlab/nvita
 * doi:10.1007/978-3-031-33383-5_25


MISCELLANEOUS

Lorsbach, Tim; Wicker, Jörg

enviPath-python: v0.2.3 Miscellaneous

Zenodo, 2024.

@misc{lorsbach2024envipath,

title = {enviPath-python: v0.2.3},

author = {Tim Lorsbach and J\"{o}rg Wicker},

url = {https://github.com/enviPath/enviPath-python/tree/v0.2.3},

doi = {10.5281/zenodo.10929408},

year  = {2024},

date = {2024-04-05},

urldate = {2024-04-05},

howpublished = {Zenodo},

keywords = {},

pubstate = {published},

tppubtype = {misc}

}



 * https://github.com/enviPath/enviPath-python/tree/v0.2.3
 * doi:10.5281/zenodo.10929408

Chang, Xinglong; Brydon, Liam; Wicker, Jörg

Memento: v1.1.1 Miscellaneous

Zenodo, 2024.

@misc{chang2024memento,

title = {Memento: v1.1.1},

author = {Xinglong Chang and Liam Brydon and J\"{o}rg Wicker},

url = {https://github.com/wickerlab/memento/tree/v1.1.1},

doi = {10.5281/zenodo.10929406},

year  = {2024},

date = {2024-04-05},

howpublished = {Zenodo},

keywords = {},

pubstate = {published},

tppubtype = {misc}

}



 * https://github.com/wickerlab/memento/tree/v1.1.1
 * doi:10.5281/zenodo.10929406

Wicker, Jörg; Krauter, Nicolas; Derstorff, Bettina; Stönner, Christof;
Bourtsoukidis, Efstratios; Klüpfel, Thomas; Williams, Jonathan; Kramer, Stefan

Cinema Experiments 2013 Miscellaneous

2023.

@misc{Wicker2023cinema,

title = {Cinema Experiments 2013},

author = { J\"{o}rg Wicker and Nicolas Krauter and Bettina Derstorff and Christof St\"{o}nner and Efstratios Bourtsoukidis and Thomas Kl\"{u}pfel and Jonathan Williams and Stefan Kramer},

url = {https://auckland.figshare.com/articles/dataset/Cinema_Experiments_2013/22777364},

doi = {10.17608/k6.auckland.22777364.v3},

year  = {2023},

date = {2023-05-23},

keywords = {},

pubstate = {published},

tppubtype = {misc}

}



 * https://auckland.figshare.com/articles/dataset/Cinema_Experiments_2013/22777364
 * doi:10.17608/k6.auckland.22777364.v3

Stönner, Christof; Edtbauer, Achim; Derstorff, Bettina; Bourtsoukidis,
Efstratios; Klüpfel, Thomas; Wicker, Jörg; Williams, Jonathan

Cinema Experiments 2015 Miscellaneous

2023.

@misc{Stoenner2023cinema,

title = {Cinema Experiments 2015},

author = { Christof St\"{o}nner and Achim Edtbauer and Bettina Derstorff and Efstratios Bourtsoukidis and Thomas Kl\"{u}pfel and J\"{o}rg Wicker and Jonathan Williams},

url = {https://auckland.figshare.com/articles/dataset/Cinema_Experiments_2015/22777352},

doi = {10.17608/k6.auckland.22777352.v2},

year  = {2023},

date = {2023-05-23},

keywords = {},

pubstate = {published},

tppubtype = {misc}

}



 * https://auckland.figshare.com/articles/dataset/Cinema_Experiments_2015/22777352
 * doi:10.17608/k6.auckland.22777352.v2


UNPUBLISHED

Graffeuille, Olivier; Lehmann, Moritz; Allan, Matthew; Wicker, Jörg; Koh, Yun
Sing

Lake by Lake, Globally: Enhancing Water Quality Remote Sensing with Multi-Task
Learning Models Unpublished Forthcoming

Forthcoming, ISSN: 1556-5068.

@unpublished{graffeuille2024lake,

title = {Lake by Lake, Globally: Enhancing Water Quality Remote Sensing with Multi-Task Learning Models},

author = {Olivier Graffeuille and Moritz Lehmann and Matthew Allan and J\"{o}rg Wicker and Yun Sing Koh },

doi = {10.2139/ssrn.4762429},

issn = {1556-5068},

year  = {2024},

date = {2024-03-17},

abstract = {The estimation of water quality from satellite remote sensing data in inland and coastal waters is an important yet challenging problem. Recent collaborative efforts have produced large global datasets with sufficient data to train machine learning models with high accuracy. In this work, we investigate global water quality remote sensing models at the granularity of individual water bodies. We introduce Multi-Task Learning (MTL), a machine learning technique that learns a distinct model for each water body in the dataset from few data points by sharing knowledge between models. This approach allows MTL to learn water body differences, leading to more accurate predictions. We train and validate our model on the GLORIA dataset of in situ measured remote sensing reflectance and three water quality indicators: chlorophyll-$a$, total suspended solids (TSS) and coloured dissolved organic matter (CDOM). MTL outperforms other machine learning models by 8-31% in Root Mean Squared Error (RMSE) and 12-34% in Mean Absolute Percentage Error (MAPE). Training on a smaller dataset of chlorophyll-$a$ measurements from New Zealand lakes with simultaneous Sentinel-3 OLCI remote sensing reflectance further demonstrates the effectiveness of our model when applied regionally. Additionally, we investigate the performance of machine learning models at estimating the variation in water quality indicators within individual water bodies. Our results reveal that overall performance metrics overestimate the quality of model fit of models trained on a large number of water bodies due to the large between-water body variability of water quality indicators. In our experiments, when estimating TSS or CDOM, all models excluding multi-task learning fail to learn within-water body variability, and fail to outperform a naive baseline approach, suggesting that these models may be of limited usefulness to practitioners monitoring water quality.
Overall, our research highlights the importance of considering water body differences in water quality remote sensing research for both model design and evaluation. },

keywords = {},

pubstate = {forthcoming},

tppubtype = {unpublished}

}



The estimation of water quality from satellite remote sensing data in inland and
coastal waters is an important yet challenging problem. Recent collaborative
efforts have produced large global datasets with sufficient data to train
machine learning models with high accuracy. In this work, we investigate global
water quality remote sensing models at the granularity of individual water
bodies. We introduce Multi-Task Learning (MTL), a machine learning technique
that learns a distinct model for each water body in the dataset from few data
points by sharing knowledge between models. This approach allows MTL to learn
water body differences, leading to more accurate predictions. We train and
validate our model on the GLORIA dataset of in situ measured remote sensing
reflectance and three water quality indicators: chlorophyll-a, total suspended
solids (TSS) and coloured dissolved organic matter (CDOM). MTL outperforms other
machine learning models by 8-31% in Root Mean Squared Error (RMSE) and 12-34% in
Mean Absolute Percentage Error (MAPE). Training on a smaller dataset of
chlorophyll-a measurements from New Zealand lakes with simultaneous Sentinel-3
OLCI remote sensing reflectance further demonstrates the effectiveness of our
model when applied regionally. Additionally, we investigate the performance of
machine learning models at estimating the variation in water quality indicators
within individual water bodies. Our results reveal that overall performance
metrics overestimate the quality of model fit of models trained on a large
number of water bodies due to the large between-water body variability of water
quality indicators. In our experiments, when estimating TSS or CDOM, all models
excluding multi-task learning fail to learn within-water body variability, and
fail to outperform a naive baseline approach, suggesting that these models may
be of limited usefulness to practitioners monitoring water quality. Overall, our
research highlights the importance of considering water body differences in
water quality remote sensing research for both model design and evaluation.
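The point that pooled metrics can overstate model fit when water bodies differ strongly can be illustrated with a toy calculation. The sketch below defines RMSE, MAPE and R², plus a hypothetical naive baseline that predicts each water body's own mean (computed in-sample, for illustration only; the abstract does not specify the paper's baseline). The values are synthetic, not GLORIA data.

```python
import numpy as np


def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root Mean Squared Error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mape(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean Absolute Percentage Error in percent; assumes y_true contains no zeros."""
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))


def r2(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def per_lake_mean_baseline(y: np.ndarray, lake_ids: np.ndarray) -> np.ndarray:
    """Hypothetical naive baseline: predict each observation as its water body's mean."""
    pred = np.empty_like(y, dtype=float)
    for lake in np.unique(lake_ids):
        mask = lake_ids == lake
        pred[mask] = y[mask].mean()
    return pred


if __name__ == "__main__":
    # Synthetic values for two water bodies with very different levels.
    y = np.array([1.0, 2.0, 3.0, 50.0, 60.0, 70.0])
    lakes = np.array(["A", "A", "A", "B", "B", "B"])
    pred = per_lake_mean_baseline(y, lakes)
    print("pooled R2:", round(r2(y, pred), 3))
    for lake in ("A", "B"):
        m = lakes == lake
        print(lake, "within-lake R2:", round(r2(y[m], pred[m]), 3))
```

Here the baseline reaches a pooled R² of about 0.96 while its within-lake R² is 0 for both lakes: the high pooled score comes entirely from between-lake differences, which is why within-water-body evaluation matters.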

 * doi:10.2139/ssrn.4762429

Lyu, Jiachen; Dost, Katharina; Koh, Yun Sing; Wicker, Jörg

Regional Bias in Monolingual English Language Models Unpublished Forthcoming

Forthcoming (accepted at the ECML / PKDD 2024 Journal Track).

@unpublished{lyu2023regional,

title = {Regional Bias in Monolingual English Language Models},

author = {Jiachen Lyu and Katharina Dost and Yun Sing Koh and J\"{o}rg Wicker},

doi = {10.21203/rs.3.rs-3713494/v1},

year  = {2023},

date = {2023-12-06},

urldate = {2023-12-06},

abstract = {In Natural Language Processing (NLP), pre-trained language models (LLMs) are widely employed and refined for various tasks. These models have shown considerable social and geographic biases creating skewed or even unfair representations of certain groups. Research focuses on biases toward L2 (English as a second language) regions but neglects bias within L1 (first language) regions. In this work, we ask if there is regional bias within L1 regions already inherent in pre-trained LLMs and, if so, what the consequences are in terms of downstream model performance. We contribute an investigation framework specifically tailored for low-resource regions, offering a method to identify bias without imposing strict requirements for labeled datasets. Our research reveals subtle geographic variations in the word embeddings of BERT, even in cultures traditionally perceived as similar. These nuanced features, once captured, have the potential to significantly impact downstream tasks. Generally, models exhibit comparable performance on datasets that share similarities, and conversely, performance may diverge when datasets differ in their nuanced features embedded within the language. It is crucial to note that estimating model performance solely based on standard benchmark datasets may not necessarily apply to the datasets with distinct features from the benchmark datasets. Our proposed framework plays a pivotal role in identifying and addressing biases detected in word embeddings, particularly evident in low-resource regions such as New Zealand.},

note = {accepted at the ECML / PKDD 2024 Journal Track},

keywords = {},

pubstate = {forthcoming},

tppubtype = {unpublished}

}



In Natural Language Processing (NLP), pre-trained language models (LLMs) are
widely employed and refined for various tasks. These models have shown
considerable social and geographic biases creating skewed or even unfair
representations of certain groups. Research focuses on biases toward L2 (English
as a second language) regions but neglects bias within L1 (first language)
regions. In this work, we ask if there is regional bias within L1 regions
already inherent in pre-trained LLMs and, if so, what the consequences are in
terms of downstream model performance. We contribute an investigation framework
specifically tailored for low-resource regions, offering a method to identify
bias without imposing strict requirements for labeled datasets. Our research
reveals subtle geographic variations in the word embeddings of BERT, even in
cultures traditionally perceived as similar. These nuanced features, once
captured, have the potential to significantly impact downstream tasks.
Generally, models exhibit comparable performance on datasets that share
similarities, and conversely, performance may diverge when datasets differ in
their nuanced features embedded within the language. It is crucial to note that
estimating model performance solely based on standard benchmark datasets may not
necessarily apply to the datasets with distinct features from the benchmark
datasets. Our proposed framework plays a pivotal role in identifying and
addressing biases detected in word embeddings, particularly evident in
low-resource regions such as New Zealand.

 * doi:10.21203/rs.3.rs-3713494/v1

Hua, Yan Cathy; Denny, Paul; Wicker, Jörg; Taskova, Katerina

A Systematic Review of Aspect-based Sentiment Analysis: Domains, Methods, and
Trends Unpublished Forthcoming

Forthcoming.

@unpublished{hua2023systematic,

title = {A Systematic Review of Aspect-based Sentiment Analysis: Domains, Methods, and Trends},

author = {Yan Cathy Hua and Paul Denny and J\"{o}rg Wicker and Katerina Taskova},

url = {https://arxiv.org/abs/2311.10777},

doi = {10.48550/arXiv.2311.10777},

year  = {2023},

date = {2023-11-17},

urldate = {2023-11-17},

abstract = {Aspect-based Sentiment Analysis (ABSA) is a fine-grained type of sentiment analysis that identifies aspects and their associated opinions from a given text. With the surge of digital opinionated text data, ABSA has gained increasing popularity for its ability to mine more detailed and targeted insights. Many review papers on ABSA subtasks and solution methodologies exist; however, few focus on trends over time or systemic issues relating to research application domains, datasets, and solution approaches. To fill the gap, this paper presents a Systematic Literature Review (SLR) of ABSA studies with a focus on trends and high-level relationships among these fundamental components. This review is one of the largest SLRs on ABSA, and also, to our knowledge, the first that systematically examines the trends and inter-relations among ABSA research and data distribution across domains and solution paradigms and approaches. Our sample includes 519 primary studies screened from 4191 search results without time constraints via an innovative automatic filtering process. Our quantitative analysis not only identifies trends in nearly two decades of ABSA research development but also unveils a systemic lack of dataset and domain diversity as well as domain mismatch that may hinder the development of future ABSA research. We discuss these findings and their implications and propose suggestions for future research.},

keywords = {},

pubstate = {forthcoming},

tppubtype = {unpublished}

}



Aspect-based Sentiment Analysis (ABSA) is a fine-grained type of sentiment
analysis that identifies aspects and their associated opinions from a given
text. With the surge of digital opinionated text data, ABSA has gained increasing
popularity for its ability to mine more detailed and targeted insights. Many
review papers on ABSA subtasks and solution methodologies exist; however, few
focus on trends over time or systemic issues relating to research application
domains, datasets, and solution approaches. To fill the gap, this paper presents
a Systematic Literature Review (SLR) of ABSA studies with a focus on trends and
high-level relationships among these fundamental components. This review is one
of the largest SLRs on ABSA, and also, to our knowledge, the first that
systematically examines the trends and inter-relations among ABSA research and
data distribution across domains and solution paradigms and approaches. Our
sample includes 519 primary studies screened from 4191 search results without
time constraints via an innovative automatic filtering process. Our quantitative
analysis not only identifies trends in nearly two decades of ABSA research
development but also unveils a systemic lack of dataset and domain diversity as
well as domain mismatch that may hinder the development of future ABSA research.
We discuss these findings and their implications and propose suggestions for
future research.

 * https://arxiv.org/abs/2311.10777
 * doi:10.48550/arXiv.2311.10777

Dost, Katharina; Tam, Jason; Lorsbach, Tim; Schmidt, Sebastian; Wicker, Jörg

Defining Applicability Domain in Biodegradation Pathway Prediction Unpublished
Forthcoming

Forthcoming.

@unpublished{dost2023defining,

title = {Defining Applicability Domain in Biodegradation Pathway Prediction},

author = {Katharina Dost and Jason Tam and Tim Lorsbach and Sebastian Schmidt and J\"{o}rg Wicker},

doi = {10.21203/rs.3.rs-3587632/v1},

year  = {2023},

date = {2023-11-10},

urldate = {2023-11-10},

abstract = {When developing a new chemical, investigating its long-term influences on the environment is crucial to prevent harm. Unfortunately, these experiments are time-consuming. In silico methods can learn from already obtained data to predict biotransformation pathways, and thereby help focus all development efforts on only the most promising chemicals. Like all data-based models, these predictors will output pathway predictions for all input compounds in a suitable format; however, these predictions will be faulty unless the model has seen similar compounds during the training process. A common approach to prevent this for other types of models is to define an Applicability Domain for the model that makes predictions only for in-domain compounds and rejects out-of-domain ones. Nonetheless, although exploration of the compound space is particularly interesting in the development of new chemicals, no Applicability Domain method has been tailored to the specific data structure of pathway predictions yet. In this paper, we are the first to define Applicability Domain specialized in biodegradation pathway prediction. Assessing a model’s reliability from different angles, we suggest a three-stage approach that checks for applicability, reliability, and decidability of the model for a queried compound and only allows it to output a prediction if all three stages are passed. Experiments confirm that our proposed technique reliably rejects unsuitable compounds and therefore improves the safety of the biotransformation pathway predictor.},

keywords = {},

pubstate = {forthcoming},

tppubtype = {unpublished}

}



When developing a new chemical, investigating its long-term influences on the
environment is crucial to prevent harm. Unfortunately, these experiments are
time-consuming. In silico methods can learn from already obtained data to
predict biotransformation pathways, and thereby help focus all development
efforts on only the most promising chemicals. Like all data-based models, these
predictors will output pathway predictions for all input compounds in a suitable
format; however, these predictions will be faulty unless the model has seen
similar compounds during the training process. A common approach to prevent this
for other types of models is to define an Applicability Domain for the model
that makes predictions only for in-domain compounds and rejects out-of-domain
ones. Nonetheless, although exploration of the compound space is particularly
interesting in the development of new chemicals, no Applicability Domain method
has been tailored to the specific data structure of pathway predictions yet. In
this paper, we are the first to define Applicability Domain specialized in
biodegradation pathway prediction. Assessing a model’s reliability from
different angles, we suggest a three-stage approach that checks for
applicability, reliability, and decidability of the model for a queried compound
and only allows it to output a prediction if all three stages are passed.
Experiments confirm that our proposed technique reliably rejects unsuitable
compounds and therefore improves the safety of the biotransformation pathway
predictor.
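A common, simple way to express an applicability check for chemical models is a similarity threshold: accept a query compound only if its fingerprint is sufficiently similar to at least one training compound. The sketch below uses Tanimoto similarity on binary fingerprints represented as sets of on-bit indices. The function names, the threshold of 0.5, and the toy fingerprints are assumptions, and the sketch covers only an applicability-style check, not the paper's full three-stage method for pathway prediction.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two binary fingerprints given as on-bit sets."""
    if not a and not b:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / len(a | b)


def in_domain(query: set[int], training: list[set[int]], threshold: float = 0.5) -> bool:
    """Accept the query only if its most similar training compound reaches the threshold."""
    return max((tanimoto(query, fp) for fp in training), default=0.0) >= threshold


if __name__ == "__main__":
    training = [{1, 2, 3, 4}, {10, 11, 12}]
    print(in_domain({1, 2, 3}, training))  # similarity 0.75 to the first compound
    print(in_domain({20, 21}, training))   # no shared bits with any training compound
```

In practice the fingerprints would come from a cheminformatics toolkit, and the threshold would be tuned on held-out data rather than fixed in advance.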

 * doi:10.21203/rs.3.rs-3587632/v1

Hafner, Jasmin; Lorsbach, Tim; Schmidt, Sebastian; Brydon, Liam; Dost,
Katharina; Zhang, Kunyang; Fenner, Kathrin; Wicker, Jörg

Advancements in Biotransformation Pathway Prediction: Enhancements, Datasets,
and Novel Functionalities in enviPath Unpublished Forthcoming

Forthcoming.

@unpublished{hafner2023advancements,

title = {Advancements in Biotransformation Pathway Prediction: Enhancements, Datasets, and Novel Functionalities in enviPath},

author = {Jasmin Hafner and Tim Lorsbach and Sebastian Schmidt and Liam Brydon and Katharina Dost and Kunyang Zhang and Kathrin Fenner and J\"{o}rg Wicker},

doi = {10.21203/rs.3.rs-3607847/v1},

year  = {2023},

date = {2023-11-03},

urldate = {2023-11-03},

abstract = {enviPath is a widely used database and prediction system for microbial biotransformation pathways of primarily xenobiotic compounds. Data and prediction system are freely available both via a web interface and a public REST API. Since its initial release in 2016, we extended the data available in enviPath and improved the performance of the prediction system and usability of the overall system. We now provide three diverse data sets, covering microbial biotransformation in different environments and under different experimental conditions. This also enabled developing a pathway prediction model that is applicable to a more diverse set of chemicals. In the prediction engine, we implemented a new evaluation tailored towards pathway prediction that gives a more honest and holistic view of performance. We also implemented a novel applicability domain algorithm, which allows the user to estimate how well the model will perform on their data. Finally, we improved the implementation to speed up the overall system and provide new functionality via a plugin system. Overall, enviPath has developed into a reliable database and prediction system with a unique use case in research in microbial biotransformations.},

keywords = {},

pubstate = {forthcoming},

tppubtype = {unpublished}

}



enviPath is a widely used database and prediction system for microbial
biotransformation pathways of primarily xenobiotic compounds. The data and the
prediction system are freely available both via a web interface and a public
REST API. Since its initial release in 2016, we have extended the data available
in enviPath and improved the performance of the prediction system and the
usability of the overall system. We now provide three diverse data sets,
covering microbial biotransformation in different environments and under
different experimental conditions. This also enabled the development of a
pathway prediction model that is applicable to a more diverse set of chemicals.
In the prediction engine, we implemented a new evaluation tailored to pathway
prediction that gives a more honest and holistic view of the performance. We
also implemented a novel applicability domain algorithm, which allows users to
estimate how well the model will perform on their data. Finally, we improved
the implementation to speed up the overall system and to provide new
functionality via a plugin system. Overall, enviPath has developed into a
reliable database and prediction system with a unique use case in research on
microbial biotransformations.
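The abstract does not describe how enviPath's applicability domain algorithm works. As a generic illustration only, one common way to define an applicability domain is by similarity to the training compounds. The minimal Python sketch below assumes molecular fingerprints represented as sets of on-bit indices and a hypothetical rule (mean Tanimoto similarity to the k most similar training compounds, compared with a threshold). The names `tanimoto` and `in_domain` and the default `k` and `threshold` values are illustrative assumptions, not part of enviPath.

```python
from typing import FrozenSet, Sequence


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0  # two empty fingerprints are treated as identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def in_domain(query: FrozenSet[int], training: Sequence[FrozenSet[int]],
              k: int = 3, threshold: float = 0.5) -> bool:
    """Hypothetical rule: a query compound is inside the applicability domain
    if its mean similarity to the k most similar training compounds is at
    least `threshold`."""
    sims = sorted((tanimoto(query, t) for t in training), reverse=True)[:k]
    if not sims:
        return False  # no training data, so nothing is in the domain
    return sum(sims) / len(sims) >= threshold


# Illustrative usage with toy fingerprints (not real enviPath data).
train = [frozenset({1, 2, 3}), frozenset({1, 2, 4}), frozenset({7, 8, 9})]
print(in_domain(frozenset({1, 2, 3}), train, k=2))  # True
print(in_domain(frozenset({10, 11}), train))        # False
```

Representing fingerprints as plain sets keeps the sketch dependency-free; a real cheminformatics pipeline would typically compute fingerprints with a dedicated toolkit.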

 * doi:10.21203/rs.3.rs-3607847/v1

Chang, Xinglong; Dost, Katharina; Dobbie, Gillian; Wicker, Jörg

Poison is Not Traceless: Fully-Agnostic Detection of Poisoning Attacks
Unpublished Forthcoming

Forthcoming.

@unpublished{Chang2023poison,
  title = {Poison is Not Traceless: Fully-Agnostic Detection of Poisoning Attacks},
  author = {Xinglong Chang and Katharina Dost and Gillian Dobbie and J\"{o}rg Wicker},
  url = {http://arxiv.org/abs/2310.16224},
  doi = {10.48550/arXiv.2310.16224},
  year = {2023},
  date = {2023-10-23},
  urldate = {2023-10-23},
  abstract = {The performance of machine learning models depends on the quality of the underlying data. Malicious actors can attack a model by poisoning its training data. Current detectors are tied to specific data types, models, or attacks, and therefore have limited applicability in real-world scenarios. This paper presents a novel fully-agnostic framework, Diva (Detecting InVisible Attacks), that detects attacks by relying solely on an analysis of the potentially poisoned data set. Diva is based on the idea that poisoning attacks can be detected by comparing the classifier's accuracy on poisoned and clean data. It pre-trains a meta-learner using complexity measures to estimate the otherwise unknown accuracy on a hypothetical clean dataset. The framework applies to generic poisoning attacks; in this paper, we evaluate Diva on label-flipping attacks.},
  keywords = {},
  pubstate = {forthcoming},
  tppubtype = {unpublished}
}

The performance of machine learning models depends on the quality of the
underlying data. Malicious actors can attack a model by poisoning its training
data. Current detectors are tied to specific data types, models, or attacks,
and therefore have limited applicability in real-world scenarios. This paper
presents a novel fully-agnostic framework, Diva (Detecting InVisible Attacks),
that detects attacks by relying solely on an analysis of the potentially
poisoned data set. Diva is based on the idea that poisoning attacks can be
detected by comparing the classifier's accuracy on poisoned and clean data. It
pre-trains a meta-learner using complexity measures to estimate the otherwise
unknown accuracy on a hypothetical clean dataset. The framework applies to
generic poisoning attacks; in this paper, we evaluate Diva on label-flipping
attacks.
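The abstract does not specify Diva's meta-learner or its exact set of complexity measures. The sketch below is a hedged illustration of the two ingredients it names: one classic data complexity measure, the maximum Fisher's discriminant ratio (often called F1), and a comparison between the observed accuracy and an estimate of the clean accuracy. The functions `fisher_ratio` and `flag_poisoning` and the tolerance `tol` are hypothetical; the meta-learner that would map complexity measures to `estimated_clean_acc` is assumed to exist and is not implemented here.

```python
from statistics import mean, pvariance
from typing import Sequence


def fisher_ratio(X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """Maximum Fisher's discriminant ratio over all features for binary
    labels 0/1. Higher values mean the classes are easier to separate.
    Assumes X is non-empty and both classes are present."""
    best = 0.0
    for j in range(len(X[0])):
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        spread = pvariance(neg) + pvariance(pos)
        if spread == 0:
            continue  # skip features with no within-class variance
        best = max(best, (mean(neg) - mean(pos)) ** 2 / spread)
    return best


def flag_poisoning(observed_acc: float, estimated_clean_acc: float,
                   tol: float = 0.05) -> bool:
    """Hypothetical decision rule: flag the data set as possibly poisoned when
    the observed accuracy differs from the meta-learner's clean-accuracy
    estimate by more than `tol`."""
    return abs(estimated_clean_acc - observed_acc) > tol
```

On the toy data `X = [[0.0], [1.0], [10.0], [11.0]]`, the labels `[0, 0, 1, 1]` give a ratio of 200.0, while the flipped labels `[0, 1, 0, 1]` give about 0.02, which illustrates why complexity measures can carry a signal about label flipping.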

 * http://arxiv.org/abs/2310.16224
 * doi:10.48550/arXiv.2310.16224

Chang, Xinglong; Dobbie, Gillian; Wicker, Jörg

Fast Adversarial Label-Flipping Attack on Tabular Data
Unpublished Forthcoming

Forthcoming.

@unpublished{Chang2023fast,
  title = {Fast Adversarial Label-Flipping Attack on Tabular Data},
  author = {Xinglong Chang and Gillian Dobbie and J\"{o}rg Wicker},
  url = {https://arxiv.org/abs/2310.10744},
  doi = {10.48550/arXiv.2310.10744},
  year = {2023},
  date = {2023-10-16},
  urldate = {2023-10-16},
  abstract = {Machine learning models are increasingly used in fields that require high reliability, such as cybersecurity. However, these models remain vulnerable to various attacks, among which the adversarial label-flipping attack poses a significant threat. In label-flipping attacks, the adversary maliciously flips a portion of the training labels to compromise the machine learning model. This paper raises significant concerns, as these attacks can camouflage a highly skewed dataset as an easily solvable classification problem, often misleading machine learning practitioners into lowering their defenses and miscalculating potential risks. This concern is amplified in tabular data settings, where identifying true labels requires expertise, allowing malicious label-flipping attacks to easily slip under the radar. To demonstrate that this risk is inherent in the adversary's objective, we propose FALFA (Fast Adversarial Label-Flipping Attack), a novel, efficient attack for crafting adversarial labels. FALFA is based on transforming the adversary's objective and employs linear programming to reduce computational complexity. Using ten real-world tabular datasets, we demonstrate FALFA's superior attack potential, highlighting the need for robust defenses against such threats.},
  keywords = {},
  pubstate = {forthcoming},
  tppubtype = {unpublished}
}

Machine learning models are increasingly used in fields that require high
reliability, such as cybersecurity. However, these models remain vulnerable to
various attacks, among which the adversarial label-flipping attack poses a
significant threat. In label-flipping attacks, the adversary maliciously flips
a portion of the training labels to compromise the machine learning model. This
paper raises significant concerns, as these attacks can camouflage a highly
skewed dataset as an easily solvable classification problem, often misleading
machine learning practitioners into lowering their defenses and miscalculating
potential risks. This concern is amplified in tabular data settings, where
identifying true labels requires expertise, allowing malicious label-flipping
attacks to easily slip under the radar. To demonstrate that this risk is
inherent in the adversary's objective, we propose FALFA (Fast Adversarial
Label-Flipping Attack), a novel, efficient attack for crafting adversarial
labels. FALFA is based on transforming the adversary's objective and employs
linear programming to reduce computational complexity. Using ten real-world
tabular datasets, we demonstrate FALFA's superior attack potential, highlighting
the need for robust defenses against such threats.
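The abstract does not give FALFA's linear-programming formulation, so the sketch below shows only the threat model it targets: an adversary flipping a fixed fraction (the budget) of binary training labels. It is a random-flip baseline, not FALFA; the function name `random_label_flip` and the `budget` and `seed` parameters are illustrative assumptions. An adversarial attack such as FALFA would instead choose which labels to flip so as to maximize the damage to the model.

```python
import random
from typing import List, Sequence


def random_label_flip(y: Sequence[int], budget: float, seed: int = 0) -> List[int]:
    """Flip round(budget * len(y)) binary (0/1) labels chosen uniformly at
    random. This is a random baseline, not FALFA's optimized attack."""
    if not 0.0 <= budget <= 1.0:
        raise ValueError("budget must be in [0, 1]")
    rng = random.Random(seed)  # seeded for reproducibility
    n_flip = round(budget * len(y))
    flip_idx = set(rng.sample(range(len(y)), n_flip))
    return [1 - label if i in flip_idx else label for i, label in enumerate(y)]
```

For example, `random_label_flip([0] * 10, 0.3)` returns a list of ten labels in which exactly three are set to 1.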

 * https://arxiv.org/abs/2310.10744
 * doi:10.48550/arXiv.2310.10744

Contact

j.wicker@auckland.ac.nz
Private Bag 92019
School of Computer Science
University of Auckland
Auckland, 1142
New Zealand

© 2023 Wickerlab
