Bruges, Belgium, April 22–24
Content of the proceedings
Reliable counterfactuals for machine learning models
Interpretability, privacy, fairness
Neuro Symbolic AI and Complex Data
Beyond Performance: Comprehensive Evaluation Strategies for Impactful Machine Learning
Time series, online learning and domain adaptation
Natural Language Processing
Efficient and Resilient Machine Learning for Industrial Applications
Reliability, Safety and Robustness of AI applications
Classification and regression
Learning and Reasoning on Knowledge and Heterogeneous Graphs
Vision, image processing and healthcare AI
Recurrent and reinforcement learning
Dimension reduction, feature selection and unsupervised learning
Graph learning
Deep models and learning principles
Reliable counterfactuals for machine learning models
Reliable Counterfactuals for Machine Learning Models - Current Aspects and Perspectives
Marika Kaden, Benjamin Paassen, Barbara Hammer, Ronny Schubert, Thomas Villmann
https://doi.org/10.14428/esann/2026.ES2026-2
Abstract:
Counterfactuals are becoming increasingly important for explaining and evaluating machine learning approaches. In crucial and challenging applications such as medical diagnosis support, credit scoring, and technical control systems, evaluating counterfactuals is a promising methodology for exploring the applicability and limits of AI systems. This approach also empowers users with actionable advice on how to influence automatic decisions. Thus, reliable and faithful generation as well as prudent interpretation of counterfactuals contribute to establishing more reliable AI systems and improving their trustworthiness.
Determination and generation of counterfactual samples can be motivated and processed from two different perspectives: The cognitive perspective considers trustworthiness and reliance based on empirical evidence about human reasoning and on model explanations inspired by findings in the social sciences. The technical perspective is mainly driven by issues such as the plausibility and actionability of counterfactuals, as well as their efficient computation and evaluation. Current developments in counterfactual research have made substantial progress but remain far from sufficient for the field.
In this introduction paper, we highlight some current aspects in this interdisciplinary field of research inspired by cognitive models of inference and reasoning as well as triggered by technical developments in the field of machine learning and artificial intelligence.
Identifying counterfactual probabilities using bivariate distributions and uplift modeling
Théo Verhelst, Gianluca Bontempi
https://doi.org/10.14428/esann/2026.ES2026-96
Abstract:
Uplift modeling estimates the causal effect of an intervention as the difference between potential outcomes under treatment and control, whereas counterfactual identification aims to recover the joint distribution of these potential outcomes (e.g., “Would this customer still have churned had we given them a marketing offer?”). This joint counterfactual distribution provides richer information than the uplift but is harder to estimate. However, the two approaches are synergistic: uplift models can be leveraged for counterfactual estimation. We propose a counterfactual estimator that fits a bivariate beta distribution to predicted uplift scores, yielding posterior distributions over counterfactual outcomes. Our approach requires no causal assumptions beyond those of uplift modeling. Simulations show the efficacy of the approach, which can be applied, for example, to the problem of customer churn in telecom, where it reveals insights unavailable to standard ML or uplift models alone.
Graph Diffusion Counterfactual Explanation
David Bechtoldt, Sidney Bender
https://doi.org/10.14428/esann/2026.ES2026-138
Abstract:
Machine learning models that operate on graph-structured data, such as molecular graphs or social networks, often make accurate predictions but offer little insight into why certain predictions are made. Counterfactual explanations address this challenge by seeking the closest alternative scenario where the model’s prediction would change. Although counterfactual explanations are extensively studied in tabular data and computer vision, the graph domain remains comparatively underexplored. Constructing graph counterfactuals is intrinsically difficult because graphs are discrete and non-Euclidean objects. We introduce Graph Diffusion Counterfactual Explanation, a novel framework for generating counterfactual explanations on graph data, combining discrete diffusion models and classifier-free guidance. We empirically demonstrate that our method reliably generates in-distribution as well as minimally structurally different counterfactuals for both discrete classification targets and continuous properties.
Geometric-analytical Generation of Counterfactuals for Prototype-based Classifiers
Marika Kaden, Lynn Reuss, Thomas Villmann
https://doi.org/10.14428/esann/2026.ES2026-62
Abstract:
Counterfactuals are useful objects for explaining the decisions of machine learning classifiers. In the best case, counterfactuals can help to derive the causal inference structure realized by the model. Yet counterfactual generation is, in general, a constrained optimization problem.
In this contribution we demonstrate that counterfactuals can be determined geometric-analytically in the case of prototype-based classifiers. For this we only require that nearest-prototype classification is based on a norm induced by an inner product, which, for consistency, must also be used to evaluate the deviation between a given sample and a desired counterfactual class.
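The geometric idea can be illustrated in the simplest setting: a nearest-prototype classifier with the Euclidean norm, where the decision boundary between two prototypes is their bisector hyperplane, and the minimal-change counterfactual is the orthogonal projection onto it. The sketch below is not taken from the paper; it is a minimal illustration under these assumptions, with all names (`euclidean_counterfactual`, `w_src`, `w_tgt`) invented for the example.

```python
import numpy as np

def euclidean_counterfactual(x, w_src, w_tgt, eps=1e-6):
    """Project x onto the bisector hyperplane between two prototypes,
    then step slightly onto the target prototype's side."""
    n = w_tgt - w_src                     # hyperplane normal
    m = 0.5 * (w_src + w_tgt)             # point on the hyperplane
    d = np.dot(x - m, n) / np.dot(n, n)   # signed offset of x along n
    return x - (d - eps) * n              # land just past the boundary

# toy example: two prototypes in 2-D
w1, w2 = np.array([0.0, 0.0]), np.array([2.0, 0.0])
x = np.array([0.2, 1.0])                  # nearest prototype is w1
cf = euclidean_counterfactual(x, w1, w2)
# cf is now (marginally) closer to w2 than to w1
assert np.linalg.norm(cf - w2) < np.linalg.norm(cf - w1)
```

For a general inner-product-induced norm, the same projection applies with the dot products replaced by that inner product.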
Interpretability, privacy, fairness
Learning Counterfactual Densities via Marginal Contrastive Discrimination
Katia Meziani, Aminata Ndiaye, Madalina Olteanu
https://doi.org/10.14428/esann/2026.ES2026-290
Abstract:
Estimating counterfactual outcome densities provides a richer understanding of causal effects than traditional estimators based, for example, on average treatment effects (ATE). However, reliable conditional density estimation remains challenging, especially in high-dimensional settings. We propose to use Marginal Contrastive Discrimination (MCD), a recent methodology that reframes conditional density estimation as a generalised contrastive learning task, enabling the use of supervised machine learning. The result is a new framework which delivers accurate counterfactual density estimates and which is illustrated through simulated numerical examples. The latter show that the proposed technique handles high-dimensional data and/or multivariate treatments while improving on state-of-the-art methods, providing empirical support of its potential for nuanced and robust causal effect analysis.
Low-Rank Lens for Scalable LLMs Interpretability
Giuseppe Trimigno, Gianfranco Lombardo, Stefano Cagnoni
https://doi.org/10.14428/esann/2026.ES2026-221
Abstract:
Representation lenses expose layer-wise predictions in LLMs. Current methods rely on full-rank affine maps with quadratic cost. However, spectral evidence across multiple model families shows these maps are intrinsically low-rank. We propose LoRA-Lens, a low-rank residual alignment mechanism that reduces parameters by over 95% while preserving fidelity to the model's final output. Experiments on OLMo, Qwen, and Gemma (up to 32B) demonstrate strong fidelity, large memory savings, robust transfer to instruction-tuned models, and effective early-exit inference.
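The reported parameter reduction follows directly from the arithmetic of a rank-r factorization of a d×d map. The snippet below is a generic illustration (not the authors' LoRA-Lens code): it builds the best rank-r approximation of a random full-rank map via truncated SVD and compares parameter counts, with d and r chosen arbitrarily for the example.

```python
import numpy as np

d, r = 512, 10                  # hidden size and chosen rank (illustrative)
rng = np.random.default_rng(0)
W = rng.normal(size=(d, d))     # full-rank affine lens: d*d parameters

# best rank-r approximation via truncated SVD
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :r] * s[:r]            # d x r factor
B = Vt[:r, :]                   # r x d factor
W_lowrank = A @ B               # rank-r reconstruction of W

full_params = d * d
lowrank_params = 2 * d * r      # store A and B instead of W
reduction = 1 - lowrank_params / full_params
print(f"parameter reduction: {reduction:.1%}")
```

With d = 512 and r = 10 the factorization stores about 4% of the original parameters, consistent in spirit with the >95% reduction the abstract reports at much larger hidden sizes.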
RKLU: Redistributive KL Distillation for Efficient Retain-Free Machine Unlearning
Varun Sampath Kumar, Esmaeil S Nadimi, Vinay Chakravarthi Gogineni
https://doi.org/10.14428/esann/2026.ES2026-80
Abstract:
Machine unlearning aims to remove the influence of specific training samples from a model, motivated by privacy regulations and data revocation requirements. Existing approaches often depend on retain data, which compromises privacy and becomes computationally expensive. To address this challenge, we propose RKLU, a novel retain-data-free unlearning method that fine-tunes models by minimizing the KL divergence between their outputs and a target distribution that suppresses the probabilities of the samples to be forgotten. RKLU achieves effective unlearning with minimal utility loss on diverse vision and text classification benchmarks, offering a privacy-preserving and computationally efficient alternative.
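A minimal sketch of the core idea, assuming a softmax classifier and a single forget class (the function names and the KL direction are illustrative choices, not taken from the paper): the target distribution zeroes out the forget class and renormalizes the remaining probability mass, and the unlearning objective is the KL divergence between the model output and that target.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def suppressed_target(p, forget_class):
    """Zero out the forget class and renormalize the remaining mass."""
    q = p.copy()
    q[..., forget_class] = 0.0
    return q / q.sum(axis=-1, keepdims=True)

def kl(p, q, eps=1e-12):
    """Elementwise KL divergence KL(p || q) over the last axis."""
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

logits = np.array([[2.0, 0.5, -1.0]])       # model output on a forget sample
p = softmax(logits)
q = suppressed_target(p, forget_class=0)    # distribution the model is pushed toward
loss = kl(q, p).mean()                      # one possible KL direction
```

In an actual fine-tuning loop this scalar would be minimized by gradient descent over the model parameters.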
Breaking the Trade-off between Performance and Cost with an Improved Untargeted APGD
Ammar Al-Najjar, Mark Jelasity
https://doi.org/10.14428/esann/2026.ES2026-164
Abstract:
Auto-PGD (APGD), a component of AutoAttack, relies on two surrogate loss functions: cross-entropy (CE) and Difference of Logits Ratio (DLR). AutoAttack includes untargeted APGD variants that are fast but less effective and targeted variants that are very effective but expensive. In this work, we introduce a low-cost untargeted variant of APGD that represents a substantial improvement over previous untargeted variants, and is almost as effective as the targeted APGD. Our simplest APGD variant begins with a CE-based initialization, and then transitions to DLR-based optimization. We then introduce an improvement to this idea, and propose a novel initialization approach. Our experimental results demonstrate that our hybrid attacks have a cost similar to standard untargeted APGD variants while they achieve attack success rates comparable to targeted APGD.
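The untargeted DLR loss referenced above has a known closed form (Croce & Hein, 2020): DLR(z, y) = -(z_y - max_{i≠y} z_i) / (z_{π1} - z_{π3}), where π sorts the logits in decreasing order. A minimal NumPy version (illustrative, not the authors' implementation; ties between logits are handled only in the common case):

```python
import numpy as np

def dlr_loss(logits, y):
    """Untargeted Difference-of-Logits-Ratio loss, batched over rows."""
    z = np.asarray(logits, dtype=float)
    z_sorted = np.sort(z, axis=-1)[:, ::-1]          # logits in decreasing order
    z_y = z[np.arange(len(y)), y]                    # true-class logit
    # largest logit that is not the true class
    z_max_other = np.where(z_sorted[:, 0] == z_y, z_sorted[:, 1], z_sorted[:, 0])
    return -(z_y - z_max_other) / (z_sorted[:, 0] - z_sorted[:, 2] + 1e-12)

logits = np.array([[3.0, 1.0, 0.0, -1.0]])
print(dlr_loss(logits, np.array([0])))   # negative while the model is still correct
```

An attack maximizes this quantity: it becomes positive once the model misclassifies, and the denominator makes it invariant to rescaling of the logits.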
Fairness in machine learning: A Compact Survey
Jeremy de Bodt, Dounia Mulders, Cyril de Bodt, John Lee, Marco Saerens
https://doi.org/10.14428/esann/2026.ES2026-294
Abstract:
Considering, assessing, and ensuring fairness is key when relying on machine learning (ML) for sensitive decision making. Yet, despite growing attention, multiple fairness definitions are currently adopted and consensual guidelines across learning paradigms are still lacking. Therefore, this review first describes sources of bias and focuses on outcome fairness, comparing popular criteria such as independence, separation, and sufficiency. Fairness interventions are then examined at pre-, in-, and post-processing stages and across supervised, semi-supervised, unsupervised, and self-supervised settings.
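Two of the criteria named in the abstract can be checked directly from predictions: independence compares positive-prediction rates across groups (demographic parity), while separation compares error rates given the true label (here, true-positive rates, one component of equalized odds). A small illustrative check, with function names and data invented for the example:

```python
import numpy as np

def parity_gap(y_pred, group):
    """Independence: gap in positive-prediction rates across groups."""
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return max(rates) - min(rates)

def tpr_gap(y_true, y_pred, group):
    """Separation (one component): gap in true-positive rates across groups."""
    tprs = [y_pred[(group == g) & (y_true == 1)].mean() for g in np.unique(group)]
    return max(tprs) - min(tprs)

y_true = np.array([1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(parity_gap(y_pred, group), tpr_gap(y_true, y_pred, group))
```

This toy example shows why the criteria differ: the two groups receive positive predictions at the same rate (parity gap 0), yet the true positives are found much more often in one group than the other.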
DeepFedNAS: Pareto Optimal Supernet Training for Improved and Predictor-Free Federated Neural Architecture Search
Bostan Khan, Masoud Daneshtalab
https://doi.org/10.14428/esann/2026.ES2026-32
Abstract:
Federated Neural Architecture Search (FedNAS) is hindered by unguided supernet training and costly post-training search pipelines. We introduce DeepFedNAS, a two-phase framework that resolves these issues. We propose Federated Pareto Optimal Supernet Training, using a pre-computed path of elite architectures as an intelligent curriculum to train a superior supernet. Subsequently, our Predictor-Free Search Method uses a principled fitness function as a zero-cost proxy for accuracy, finding optimal subnets in seconds. DeepFedNAS achieves state-of-the-art accuracy, superior robustness to data heterogeneity, and a ~61x search pipeline speedup, making FedNAS practical and efficient.
FedHENet: A Frugal Federated Learning Framework for Heterogeneous Environments
Alejandro Dopico-Castro, Oscar Fontenla-Romero, Amparo Alonso-Betanzos, Bertha Guijarro-Berdiñas, Ivan Perez Digon
https://doi.org/10.14428/esann/2026.ES2026-165
Abstract:
Federated Learning (FL) enables collaborative training without centralizing data, which is essential for privacy compliance in real-world scenarios involving sensitive visual information. Most FL approaches rely on expensive, iterative deep-network optimization, which still risks privacy via shared gradients. In this work, we propose FedHENet, extending the FedHEONN framework to image classification. By using a fixed, pre-trained feature extractor and learning only a single output layer, we avoid costly local fine-tuning. This layer is learned by analytically aggregating client knowledge in a single round of communication using homomorphic encryption (HE). Experiments show that FedHENet achieves competitive accuracy compared to iterative FL baselines while demonstrating superior stability and up to 70% better energy efficiency. Crucially, our method is hyperparameter-free, removing the carbon footprint associated with hyperparameter tuning in standard FL. Code available at https://github.com/AlejandroDopico2/FedHENet/
Neuro Symbolic AI and Complex Data
Neuro Symbolic AI and Complex Data
Luca Oneto, Nicolò Navarin, Luca Pasa, Davide Rigoni, Davide Anguita
https://doi.org/10.14428/esann/2026.ES2026-1
Abstract:
The widespread use of Artificial Intelligence (AI) for decision-making on complex data - such as images, text, graphs, and nonlinear systems - has enabled significant progress across many application domains.
However, purely data-driven approaches often struggle to provide structured reasoning, interpretability, robustness, and effective integration of domain knowledge.
Neuro-Symbolic AI addresses these challenges by combining sub-symbolic learning with symbolic reasoning, allowing models to incorporate logical rules, constraints, ontologies, and expert knowledge.
This tutorial offers a concise overview of Neuro-Symbolic AI for complex data, presenting core concepts, representative methods, and key application areas, including constraint-aware learning, AI for science, socially responsible AI, and knowledge-guided inference.
It discusses both the opportunities and limitations of current approaches, highlighting how neuro-symbolic integration can improve transparency, trustworthiness, and human-centric alignment in AI systems.
This tutorial aims to foster cross-disciplinary understanding and support the development of robust and explainable AI solutions for real-world problems.
Multi-label Complementary Labels Learning under Hard Logical Constraints
Luca Oneto, Yi Gao, Davide Anguita, Fabio Roli, Min-Ling Zhang, Fulvio Mastrogiovanni
https://doi.org/10.14428/esann/2026.ES2026-66
Abstract:
Two of the main challenges in multi-label classification are the need to collect labeled data, which can be costly or impractical, and the need to satisfy hard logical constraints between labels, which is often computationally expensive.
In some applications, complementary labels - that is, labels specifying a class to which a sample does not belong - are available and much less costly to obtain.
Researchers have therefore developed methods to learn from such labels efficiently and effectively.
Similar efforts have been made to address the problem of learning under hard logical constraints.
Nevertheless, to the best of our knowledge, no prior work has investigated the problem of learning from complementary labels under hard logical constraints.
In this work, we propose and compare methods to address this problem, showing that hard logical constraints, besides representing restrictions to be satisfied, can also serve as an additional source of weak supervision.
The relationships between labels can help bridge the information gap between relevant and complementary labels.
Experimental results on different datasets and scenarios support our claims.
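The role of a hard constraint as weak supervision can be seen in a toy enumeration: combined with a complementary label, a constraint such as "label A implies label B" rules out candidate label sets that neither signal excludes on its own. The sketch below is an invented illustration, not the paper's method:

```python
from itertools import product

labels = ["A", "B", "C"]

def satisfies(assignment):
    """Hard logical constraint (illustrative): label A implies label B."""
    return not (assignment["A"] and not assignment["B"])

# complementary label observed for a sample: it does NOT carry label C
complementary = {"C"}

# enumerate all 2^3 candidate label sets as boolean assignments
candidates = [
    dict(zip(labels, bits))
    for bits in product([False, True], repeat=len(labels))
]
# keep only assignments consistent with both signals
feasible = [
    a for a in candidates
    if satisfies(a) and all(not a[l] for l in complementary)
]
print(len(candidates), len(feasible))
```

The complementary label alone leaves four candidate label sets; the constraint removes one more, narrowing what the learner must disambiguate.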
Ring-constrained Molecular Graph Generation with Diffusion Models
Davide Rigoni, Rana İşlek, Nicolò Navarin
https://doi.org/10.14428/esann/2026.ES2026-129
Abstract:
Designing molecules with specific attributes is vital in drug discovery and materials science. Ring structures are key to a molecule's stability, reactivity, and biological interactions, ensuring that designed compounds are both feasible and synthetically viable, thereby increasing their potential for lab production and therapeutic use. Generative diffusion models have become essential tools for in silico molecule generation. However, integrating structural constraints, especially those involving ring structures, remains challenging. This study introduces a method for applying hard ring-related constraints in molecule generation, enhancing synthetic validity and utility, with evaluations on the QM9 dataset.
Code-Guided Reasoning in Vision-Language Models for Complex Diagram Understanding
Daniel Steinigen, Lucie Flek, Sebastian Houben
https://doi.org/10.14428/esann/2026.ES2026-372
Abstract:
Understanding complex structured diagrams, such as circuit schematics, molecular structures, musical notation, or business process models, requires precise symbolic, spatial, and relational reasoning. Current vision-language models (VLMs) struggle with such tasks because they lack access to the underlying symbolic structure that governs these diagrams. We introduce a training paradigm in which VLMs explicitly learn to reason through an intermediate symbolic representation of the image that is expressed in code. We generate a large synthetic dataset covering 21 diagram types across 7 domains by prompting large language models to generate code in specific formal representation languages (FRLs) and rendering them into paired code-image samples. During VLM training, the FRL code is provided along with the image, enabling the model to incorporate the symbolic representation during reasoning. Experiments show that models capable of producing valid code benefit from this symbolic intermediate layer, yielding improved accuracy on diagram understanding tasks. Our results demonstrate that integrating symbolic code into VLM training offers a promising direction for VLM design to handle complex visual data by bridging diagram perception with symbolic reasoning.
NSA: Neuro-symbolic ARC Challenge
Pawel Batorski, Jannik Brinkmann, Paul Swoboda
https://doi.org/10.14428/esann/2026.ES2026-119
Abstract:
The Abstraction and Reasoning Corpus (ARC) evaluates general reasoning skills that remain challenging for both machine learning models and combinatorial search. We propose NSA, a neuro-symbolic approach that couples a small 25.3M-parameter transformer for proposing DSL primitives with a symbolic program search. The transformer narrows the search space by suggesting promising transformations, is pre-trained on synthetically generated tasks, and is further adapted at test time using task-specific synthetic data. NSA surpasses the comparable state of the art on the ARC evaluation set by 27% and compares favourably on the ARC train set. Code: https://github.com/Batorskq/NSA.
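As a toy illustration of the primitive-guided search idea, the sketch below pairs a tiny hand-made DSL of grid transformations with hand-set primitive scores standing in for the transformer's proposals. The primitives, scores, and `search_program` helper are invented for this example and are not the paper's actual DSL or model.

```python
# Toy sketch of primitive-guided program search (not the paper's DSL or model).
# A "grid" is a list of lists of ints; primitives are simple transformations.
from itertools import product

def flip_h(g): return [row[::-1] for row in g]
def flip_v(g): return g[::-1]
def transpose(g): return [list(r) for r in zip(*g)]
def increment(g): return [[v + 1 for v in row] for row in g]

PRIMITIVES = {"flip_h": flip_h, "flip_v": flip_v,
              "transpose": transpose, "increment": increment}

# Stand-in for the transformer: scores saying which primitives look promising.
# In NSA these would come from the learned proposer; here they are hand-set.
SCORES = {"flip_h": 0.9, "transpose": 0.7, "flip_v": 0.2, "increment": 0.1}

def search_program(task_pairs, max_depth=2):
    """Try primitive sequences, highest-scored first, until one solves all pairs."""
    ranked = sorted(PRIMITIVES, key=lambda p: -SCORES[p])
    for depth in range(1, max_depth + 1):
        for seq in product(ranked, repeat=depth):
            def run(g, seq=seq):
                for name in seq:
                    g = PRIMITIVES[name](g)
                return g
            if all(run(x) == y for x, y in task_pairs):
                return list(seq)
    return None

# Example task solved by horizontal flip followed by transpose:
program = search_program([([[1, 2], [3, 4]], [[2, 4], [1, 3]])])
```

Ranking the primitives before enumeration is the essential point: the search still falls back to exhaustive enumeration, but good proposals let it hit a solution much earlier.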
Beyond Performance: Comprehensive Evaluation Strategies for Impactful Machine Learning
Beyond Performance: Comprehensive Evaluation Strategies for Impactful Machine Learning
Valerie Vaquet, Ulrike Kuhl, Saša Brdnik, Benjamin Paassen
https://doi.org/10.14428/esann/2026.ES2026-3
Abstract:
Evaluation is an integral part of developing machine learning and AI-based systems for real-world applications. Given the transformative changes induced by ML/AI systems like large language models, evaluation needs to go beyond performance and include robustness, fairness, user perception, and legal compliance to ensure responsible usage. Further, evaluation practices also need to consider the full range of application settings beyond classic batch machine learning, including data streams, recommender systems, reinforcement learning, and foundation models. This paper provides an analysis of current evaluation practices and gaps across settings and dimensions, and argues for holistic, reproducible evaluation beyond benchmark performance.
High Performance, Low Reliability: Uncertainty Benchmarking for Tabular Foundation Models
José Lucas De Melo Costa, Fabrice Popineau, Arpad Rimmel, Bich-Liên Doan
https://doi.org/10.14428/esann/2026.ES2026-261
Abstract:
Recent Tabular Foundation Models (TFMs) have demonstrated state-of-the-art predictive performance, often surpassing Gradient-Boosted Decision Trees (GBDTs). However, the trustworthiness of these models, particularly their uncertainty quantification, has been largely overlooked. We investigate this gap through an extensive study comparing TFMs, GBDTs, and classical baselines on the 112 datasets of the TALENT benchmark. Our results reveal a performance–uncertainty trade-off: although TFMs achieve the highest predictive performance (AUC), they exhibit lower conditional coverage under conformal prediction (SSCS) compared to GBDTs. Complementary experiments on synthetic datasets further characterize the regimes in which this effect intensifies. We conclude that while TFMs advance predictive frontiers, achieving well-calibrated uncertainty remains a major open challenge for their reliable adoption. Code is available at: https://github.com/jose-melo/high-performance-low-reliability
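For readers unfamiliar with conformal prediction, the minimal split-conformal sketch below shows how a coverage level is obtained from a calibration set. It illustrates marginal coverage only, not the conditional-coverage metric (SSCS) used in the paper, and all names are illustrative.

```python
# Minimal split conformal sketch (illustrative; not the paper's SSCS metric).
# Nonconformity score = |y - prediction|; threshold = empirical quantile of
# the calibration scores at level ceil((n+1)(1-alpha))/n.
import math

def conformal_threshold(cal_scores, alpha=0.1):
    """Threshold such that a fresh score falls below it with prob >= 1-alpha."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return sorted(cal_scores)[min(k, n) - 1]

def coverage(test_scores, q):
    """Empirical fraction of test points covered by the threshold q."""
    return sum(s <= q for s in test_scores) / len(test_scores)
```

With 19 calibration scores 1..19 and alpha = 0.1, the threshold is the 18th smallest score, and the empirical coverage on a matching test distribution sits near 90%; conditional coverage additionally asks this to hold within subgroups, which is where the paper finds TFMs lacking.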
Adversarial Robustness by Combining Prototype Models with Lipschitz Training
Benjamin Paassen, Adia Khalid
https://doi.org/10.14428/esann/2026.ES2026-41
Abstract:
Beyond accuracy, adversarial robustness and interpretability are crucial elements of trustworthy machine learning systems. Prototype-based models such as generalized learning vector quantization (GLVQ) have favourable robustness and interpretability properties but do not achieve state-of-the-art accuracy in many practical domains, such as image classification. Applying prototype-based models in the embedding space of a deep neural network boosts their accuracy but removes their interpretability and robustness. We partially resolve this dilemma: We prove that robustness guarantees of shallow classifiers translate to robustness guarantees of deep classifiers when imposing Lipschitz continuity, we provide a training scheme to achieve Lipschitz continuity, and we empirically validate our approach on three image classification data sets against fast gradient sign attacks.
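The robustness argument can be illustrated with a small sketch: a nearest-prototype prediction cannot change while a perturbation of the embedded point stays below half the margin between the two closest prototype distances, and an L-Lipschitz embedding shrinks this into an input-space radius divided by L. The code below is a hedged toy version of this reasoning, not the paper's exact guarantee or training scheme.

```python
# Sketch of the certified-radius idea for a nearest-prototype classifier.
# If the embedding f is L-Lipschitz, an input perturbation of size r moves
# f(x) by at most L*r, so the prediction is stable while L*r < (d2 - d1)/2.
# (Illustrative; the paper's precise guarantee and training differ.)
import math

def certified_radius(x, prototypes, labels, L=1.0):
    """Return (predicted label, input-space radius with unchanged prediction)."""
    dists = sorted((math.dist(x, p), c) for p, c in zip(prototypes, labels))
    d1, c1 = dists[0]
    # distance to the nearest prototype of a *different* class
    d2 = next(d for d, c in dists if c != c1)
    return c1, (d2 - d1) / (2 * L)

# Two prototypes on a line: x = (1, 0) is certified up to radius 1 for L = 1.
label, radius = certified_radius((1, 0), [(0, 0), (4, 0)], [0, 1])
```

Doubling the Lipschitz constant halves the certified radius, which is why the paper's Lipschitz training scheme matters for carrying shallow-model guarantees through a deep embedding.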
Drift-Aware Evaluation of Fair Stream Learning
Kathrin Lammers, Fabian Hinder, Barbara Hammer, Valerie Vaquet
https://doi.org/10.14428/esann/2026.ES2026-152
Abstract:
Algorithmic fairness, a key concern in algorithmic decision making, is a well-studied topic in the batch setup. Recently, several extensions for improving fairness in classification tasks have been proposed for the important scenario of non-stationary data streams. Yet, the question of how to reliably evaluate fairness on non-stationary data streams is still open, as popular batch measures can lead to misleading results. In particular, the typically used cumulative fairness measures can be problematic if concept drift results in significant changes in model fairness across the data stream. In this contribution, we propose novel fairness scores that are suitable for the streaming scenario, and we demonstrate their suitability on streaming data benchmarks.
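A simple way to see why cumulative measures can mislead is to track fairness over a sliding window instead. The toy metric below, a plain windowed demographic-parity gap invented here and not the paper's proposed scores, reacts to drift that a cumulative average would wash out.

```python
# Illustrative sliding-window demographic-parity gap for a data stream
# (a simple stand-in; the paper proposes its own streaming fairness scores).
from collections import deque

class WindowedParityGap:
    """|P(yhat=1 | group A) - P(yhat=1 | group B)| over the last `size` items."""
    def __init__(self, size=100):
        self.window = deque(maxlen=size)

    def update(self, group, prediction):
        """Add one (group, binary prediction) pair and return the current gap."""
        self.window.append((group, prediction))
        rates = {}
        for g in ("A", "B"):
            preds = [p for gr, p in self.window if gr == g]
            rates[g] = sum(preds) / len(preds) if preds else 0.0
        return abs(rates["A"] - rates["B"])
```

Because old items fall out of the `deque`, a model that was fair before a drift but unfair afterwards shows a growing gap immediately, whereas a cumulative estimate would dilute the recent unfairness with the long fair prefix.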
Real vs. Virtual Drift: Creating Realistic Stream Learning Benchmarks
Fabian Hinder, Johannes Brinkrolf, Kathrin Lammers, Barbara Hammer
https://doi.org/10.14428/esann/2026.ES2026-200
Abstract:
Concept drift -- changes in the data distribution over time -- is a central challenge in stream learning. However, existing benchmarks either lack controlled drift or fail to capture the characteristics of real-world data. We propose a pipeline for constructing verifiable and realistic drift, enabling more systematic evaluation of stream learning algorithms. Here, we pay special attention to controlling both real and virtual drift. To underscore the relevance of our contribution, we analyze the effects of real and virtual drift on both real-world and synthetic data streams using our method, revealing a substantial mismatch between the two setups.
Linearity of Sensitive Concepts in Language Models
Sarah Schröder, Valerie Vaquet, Barbara Hammer
https://doi.org/10.14428/esann/2026.ES2026-217
Abstract:
Identifying how sensitive attributes like ethnicity are encoded in language models can yield valuable insights in terms of fairness. This knowledge could enhance explanations of model decisions, aid in mitigating social biases, or indicate under-represented minorities. Based on the literature on fairness and explainable AI, it should be possible to learn sensitive attributes such as gender or ethnicity with linear methods. Unfortunately, there is little work at the intersection of concept learning and fairness, and many fairness papers restrict their evaluation to binary gender without considering more complex test cases. It is therefore not entirely clear whether all sensitive attributes and identity groups are encoded linearly in language models. Hence, we evaluate this question on a broad selection of identity groups, datasets, and language models.
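The linearity question is typically probed with a linear classifier trained on embeddings. The self-contained sketch below trains a plain perceptron on toy 2-D points as a stand-in for language-model hidden states; all data and helper names are illustrative.

```python
# A minimal linear probe (perceptron) testing whether a "concept" label is
# linearly separable in toy embeddings; real probes would use LM hidden states.
def train_perceptron(X, y, epochs=20, lr=0.1):
    """Classic perceptron updates: move the hyperplane toward misclassified points."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0
            err = yi - pred
            w = [wj + lr * err * xj for wj, xj in zip(w, xi)]
            b += lr * err
    return w, b

def accuracy(w, b, X, y):
    preds = [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0 for xi in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)
```

If such a probe reaches high accuracy, the concept is (approximately) linearly encoded; the paper asks whether this holds across identity groups beyond binary gender.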
Evaluation of Rashomon Sets for the Determination of Stable and Plausible Model Explanations
Marika Kaden, Mahrokh Karimi, Subhashree Panda, Thomas Pfaff, Thomas Villmann
https://doi.org/10.14428/esann/2026.ES2026-250
Abstract:
Training machine learning models for classification frequently yields several different solutions with approximately the same performance, i.e., one observes many close-to-optimum solutions with only marginal performance differences that are, however, qualitatively well distinguishable. This behaviour is known as the Rashomon effect and may be attributed to stochasticity in the training process, different learning strategies, or varying initial settings. Hence, model explanations may become difficult and have to be related to a given configuration. Therefore, stable and plausible explanations are required, based on an evaluation of the Rashomon set. Yet, the consistency of the resulting explanations has remained largely unexplored so far. Here we propose to evaluate the Rashomon set qualitatively by means of a cluster analysis based on the determination of feature importance. The feature importance of a model gives insights into its decision-making process and, hence, provides an appropriate criterion to distinguish model decision realizations. Clustering these importance profiles reveals stable and plausible classification strategies and, hence, contributes to reliable explanations.
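A minimal version of the proposed analysis might group per-model feature-importance vectors by similarity. The sketch below uses a greedy cosine-similarity grouping as a simple stand-in for the paper's cluster analysis; the `group_models` helper and its threshold are assumptions of this example.

```python
# Toy grouping of per-model feature-importance vectors by cosine similarity
# (a simple stand-in for the cluster analysis described in the paper).
import math

def cos_sim(u, v):
    num = sum(a * b for a, b in zip(u, v))
    return num / (math.hypot(*u) * math.hypot(*v))

def group_models(importances, threshold=0.95):
    """Greedily assign each importance vector to the first similar-enough group."""
    groups = []  # each group is a list of model indices; first member is the anchor
    for i, vec in enumerate(importances):
        for g in groups:
            if cos_sim(vec, importances[g[0]]) >= threshold:
                g.append(i)
                break
        else:
            groups.append([i])
    return groups
```

Models whose importance vectors land in the same group plausibly follow the same classification strategy; groups with many members indicate stable strategies within the Rashomon set.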
Towards meaningful evaluation of uncertainty-aware segmentation workflows for medical applications
Dany Rimez, Lee John, Ana Maria Barragan Montero
https://doi.org/10.14428/esann/2026.ES2026-256
Abstract:
In radiation oncology, image segmentation with deep learning models must reduce clinician workload without compromising patient safety. Uncertainty quantification is therefore essential to provide reliable error estimates and to determine which segmentations require correction. Despite progress, we argue that current evaluation protocols only enable general comparisons and ignore the decision thresholds used in practice.
We propose a new paradigm for the evaluation of such automated systems through the quantification of clinical outcomes. Based on a decision threshold, we compute the fraction of high-confidence segmentation predictions meeting quality standards and their corresponding average performance. We calibrate this threshold to bound the fraction of segmentations falling below the standard to a specified risk tolerance.
Through experiments across four medical datasets, we show our approach delivers meaningful performance guarantees essential for regulatory compliance (e.g., the EU AI Act) and for building trust in automated systems. Code: https://github.com/Dany546/esann2026
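The threshold calibration described above can be sketched as a search over candidate confidence thresholds: accept predictions above the threshold and pick the smallest threshold whose accepted set violates the quality standard no more often than the risk tolerance. The code below is an illustrative version; the paper's calibration procedure may differ in detail.

```python
# Sketch of calibrating a confidence threshold so that, among accepted cases,
# the fraction below a quality standard stays under a risk tolerance.
# (Illustrative; not necessarily the paper's exact procedure.)
def calibrate_threshold(confidences, meets_standard, risk=0.1):
    """Smallest threshold whose accepted set has failure rate <= risk.

    Returns (threshold, automation rate), or (None, 0.0) if none qualifies.
    """
    for t in sorted(set(confidences)):
        accepted = [ok for c, ok in zip(confidences, meets_standard) if c >= t]
        if accepted and (1 - sum(accepted) / len(accepted)) <= risk:
            return t, len(accepted) / len(confidences)
    return None, 0.0
```

The returned automation rate is the clinically relevant quantity: the fraction of cases the system may handle without review while keeping the failure rate among accepted cases below the tolerated risk.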
Time series, online learning and domain adaptation
A Deep Learning Diagnostic Observer for Time Series Anomaly Detection
Assmaa ALSAMADI, Fannia Pacheco, Paul Honeine
https://doi.org/10.14428/esann/2026.ES2026-190
Abstract:
Mathematical models for anomaly detection (AD) in dynamic systems have demonstrated high performance, particularly diagnostic observer models. Deep learning (DL) models are also effective, but they often require complex architectures, large training datasets, and costly fine-tuning to achieve generalization across diverse time series (TS) types. This paper introduces a DL AD model based on the algebraic design of a diagnostic observer, marking its first adaptation for TS data. Experiments on large TS benchmark datasets demonstrate its superiority over various recent DL models.
Time Series Forecasting in the Presence of Explosive Bubbles
Julien Peignon, Fabrice Rossi, Arthur Thomas
https://doi.org/10.14428/esann/2026.ES2026-206
Abstract:
Neural forecasting methods typically assume Gaussian distributions, focusing on point prediction via MSE minimization. This overlooks heavy-tailed, locally explosive time series where predictive densities exhibit multimodality. We propose a Mixture Density Network with skewed Student-t components for density forecasting. To address extreme event rarity, we develop a dual reweighting strategy with post-hoc recalibration correcting distributional shift. Experiments on noncausal autoregressive processes demonstrate competitive point prediction with well-calibrated uncertainty quantification.
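To see why heavy-tailed mixture components yield multimodal predictive densities, the sketch below evaluates a two-component Student-t mixture (symmetric t components for brevity; the paper uses skewed Student-t components inside a Mixture Density Network).

```python
# Evaluating a two-component Student-t mixture density (symmetric t's for
# brevity; the paper's MDN emits skewed Student-t components).
import math

def t_pdf(x, df, loc=0.0, scale=1.0):
    """Density of a location-scale Student-t with `df` degrees of freedom."""
    z = (x - loc) / scale
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + z * z / df) ** (-(df + 1) / 2) / scale

def mixture_pdf(x, weights, params):
    """params: list of (df, loc, scale) triples, one per component."""
    return sum(w * t_pdf(x, *p) for w, p in zip(weights, params))

# Two well-separated components produce a bimodal predictive density:
bimodal = [(3, -4.0, 1.0), (3, 4.0, 1.0)]
```

A Gaussian MSE-trained forecaster would summarize this density by its mean near zero, a region where the true density is nearly empty, which is exactly the failure mode the paper targets for locally explosive series.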
On the Components That Enable Robust Generalization in HAR Models
Otávio Napoli, Edson Borin
https://doi.org/10.14428/esann/2026.ES2026-193
Abstract:
Generalizing human activity recognition (HAR) models across datasets remains challenging due to variations in sensors, environments, and user behavior. Domain Generalization (DG) methods attempt to address these shifts through objective-level modifications, architecture-level augmentations, and model pretraining strategies, but prior HAR studies often evaluate these components in isolation using suboptimal baselines. We systematically assess the contribution of each DG component across multiple HAR architectures, from CNNs to Transformers, using the DAGHAR benchmark. Our results show that pretraining the model with a self-supervised learning technique provides the most substantial and consistent gains in cross-dataset generalization, while architecture-level augmentations offer complementary improvements, and objective-level methods alone yield limited benefits across architectures. This suggests that DG studies should treat model pretraining as a standard baseline rather than an optional enhancement.
Drift Localization using Conformal Predictions
Fabian Hinder, Valerie Vaquet, Johannes Brinkrolf, Barbara Hammer
https://doi.org/10.14428/esann/2026.ES2026-198
Abstract:
Concept drift -- the change of the distribution over time -- poses significant challenges for learning systems and is of central interest for monitoring. Understanding drift is thus paramount, and drift localization -- determining which samples are affected by the drift -- is essential. While several approaches exist, most rely on local testing schemes, which tend to fail in high-dimensional, low-signal settings. In this work, we consider a fundamentally different approach based on conformal predictions. We discuss and show the shortcomings of common approaches and demonstrate the performance of our approach on state-of-the-art image datasets.
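The conformal idea can be sketched with a simple nonconformity score: rank each window sample's score against a reference sample and flag those with low conformal p-values. The choice of score (distance to the reference mean) and the helpers below are illustrative simplifications, not the paper's scheme.

```python
# Sketch of flagging drifted samples via conformal p-values: a sample the
# reference distribution would rarely produce receives a small p-value.
# (Illustrative; the paper's drift-localization scheme is more involved.)
def conformal_p_value(score, reference_scores):
    """Rank-based p-value of `score` within the reference scores."""
    n = len(reference_scores)
    return (1 + sum(s >= score for s in reference_scores)) / (n + 1)

def localize_drift(reference, window, alpha=0.1):
    """Return window samples whose nonconformity p-value falls below alpha."""
    mu = sum(reference) / len(reference)
    ref_scores = [abs(x - mu) for x in reference]
    return [x for x in window
            if conformal_p_value(abs(x - mu), ref_scores) <= alpha]
```

The appeal over local two-sample tests is that only the one-dimensional nonconformity score is compared, which is what makes the approach viable in high-dimensional, low-signal settings where local testing fails.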
Stochastic Parroting in Temporal Attention - Regulating the Diagonal Sink
Victoria Hankemeier, Malte Schilling
https://doi.org/10.14428/esann/2026.ES2026-150
Abstract:
Spatio-temporal models analyze spatial structures and temporal dynamics, which makes them prone to information degeneration across space and time. Prior literature has demonstrated that over-squashing in causal attention or temporal convolutions creates a bias towards the first tokens. To analyze whether such a bias is present in temporal attention mechanisms, we derive sensitivity bounds on the expected value of the Jacobian of a temporal attention layer. We theoretically show how off-diagonal attention scores depend on the sequence length, and that temporal attention matrices suffer from a diagonal attention sink. We suggest regularization methods and experimentally demonstrate their effectiveness.
Exploring the Relationship Between Synaptic Dynamics Properties: Gain-Control and Temporal Filtering
Ferney Beltran-Velandia, Nico Scherf, Martin Bogdan
https://doi.org/10.14428/esann/2026.ES2026-236
Abstract:
The gain-control property of Synaptic Dynamics (SD) describes how the response of a neuron, mediated by short-term depression, can be sensitive to proportional changes of stochastic inputs. This property has been related to the temporal filtering property of synaptic efficacy. However, the limits of the strength of gain control in relation to the efficacy have not been explored. This paper addresses this gap by simulating networks using two synapses with fast and slow decays of efficacy. The results show that the gain-control effect decreases with the decay of efficacy. Formally describing this relationship can facilitate the integration of SD into Spiking Neural Networks.
On the Importance of Time Constants in Spiking Neural Networks
Filippa Brandt, Saeed Bastani, Alexander Hunt, Amir Aminifar, Baktash Behmanesh
https://doi.org/10.14428/esann/2026.ES2026-325
Abstract:
Time constants in spiking neural networks (SNNs) are crucial for determining performance. While prior work shows that learning time constants can improve accuracy, it typically assumes near-optimal initial values and rarely examines recovery from poor initializations. We systematically study how membrane and synaptic time constants affect SNN performance using multiple training strategies. Our results show that suboptimal values for time constants can reduce accuracy by nearly 10%, but networks can recover through optimization during the training process.
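The effect of time constants can be reproduced with a few lines of leaky integrate-and-fire simulation: a small membrane time constant leaks charge too quickly for a weak input ever to reach threshold. The discretization and parameters below are illustrative, not the paper's training setup.

```python
# A leaky integrate-and-fire neuron sketch showing how the membrane time
# constant shapes spiking (illustrative; not the paper's experimental setup).
def lif_spike_count(inputs, tau_mem=10.0, threshold=1.0, dt=1.0):
    """Simulate a discretized LIF neuron and count its output spikes."""
    v, spikes = 0.0, 0
    decay = 1.0 - dt / tau_mem  # discretized membrane leak per step
    for i in inputs:
        v = decay * v + i
        if v >= threshold:
            spikes += 1
            v = 0.0  # reset after a spike
    return spikes
```

With a constant input of 0.3, a membrane time constant of 10 integrates toward a steady state of 3 and spikes repeatedly, while a time constant of 2 saturates at 0.6 and stays silent, illustrating why a badly initialized time constant can cripple accuracy until training adjusts it.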
A New Positional Encoding Loss for Anomaly Transformer in Time Series Anomaly Detection
Quan Khuu, Huynh Ngu
https://doi.org/10.14428/esann/2026.ES2026-370
Abstract:
Time-series anomaly detection (TSAD) is central to automated monitoring, where early detection of unexpected behaviors helps prevent system failures. Despite its strong performance, Anomaly Transformer remains limited by conventional positional encoding, which can cause positional duplication across input tokens and weaken temporal separability. To address this issue, we propose a learnable positional encoding (PE) module trained with a PE loss that explicitly penalizes duplicated positional representations, thereby improving temporal distinguishability. Experiments on three benchmarks (PSM, MSL, and SMAP) show consistent gains over the Anomaly Transformer baseline, improving F1 by 0.76, 0.63, and 0.21 percentage points, respectively. These results suggest that regularizing positional representations is a simple and general way to strengthen Transformer-based TSAD. The main code is available on GitHub.
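One plausible way to penalize duplicated positional representations is a pairwise hinge on cosine similarity, as sketched below. This is an assumed formulation for illustration, not necessarily the PE loss proposed in the paper.

```python
# Sketch of a loss penalizing near-duplicate positional encodings by pushing
# pairwise cosine similarity below a margin (assumed formulation; the paper's
# PE loss may be defined differently).
import math

def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    return num / (math.hypot(*u) * math.hypot(*v))

def pe_duplication_loss(encodings, margin=0.5):
    """Sum of hinge penalties over positional-vector pairs that are too similar."""
    loss, n = 0.0, len(encodings)
    for i in range(n):
        for j in range(i + 1, n):
            loss += max(0.0, cosine(encodings[i], encodings[j]) - margin)
    return loss
```

Identical positional vectors incur the maximal penalty while orthogonal ones incur none, so minimizing this term drives positions apart and restores temporal separability.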
Optimal Training of the Online Newton Algorithm based on Conformal CUSUM
Thomas Grava, Frédéric Vrins
https://doi.org/10.14428/esann/2026.ES2026-91
Abstract:
We apply conformal CUSUM to identify the training period of the Online Newton Algorithm in a continuous learning setup. Numerical simulations demonstrate the ability of our approach, called Lean-ONS, to efficiently detect breaks and identify a relevant training set, which improves prediction performance compared to standard alternatives in the literature.
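The break-detection mechanism underlying this approach can be illustrated with a plain one-sided CUSUM on forecast residuals; the paper's conformal variant calibrates the statistic differently, so the fixed `threshold` and `drift` parameters below are assumptions for illustration only:

```python
def cusum_detect(residuals, threshold=5.0, drift=0.5):
    """One-sided CUSUM detector on a stream of residuals.

    Accumulates evidence that residuals systematically exceed `drift`
    and signals a break when the cumulative sum crosses `threshold`.
    Returns the index of the first detected break, or None.
    """
    s = 0.0
    for t, r in enumerate(residuals):
        # Reset to zero whenever the evidence drops below zero,
        # so isolated small residuals never trigger an alarm.
        s = max(0.0, s + r - drift)
        if s > threshold:
            return t
    return None
```

Once a break index is found, the observations after it can serve as the relevant training window for the online learner, which is the role CUSUM plays in the setup described above.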
Lightweight Personalisation for MEMS-Based Wearables: A Padel Stroke Recognition Case Study
Alberto Gascón Roche, Fatemeh Akbarian, Amir Aminifar, Alvaro Marco, Roberto Casas
https://doi.org/10.14428/esann/2026.ES2026-180
Abstract:
This work investigates lightweight personalisation of MEMS-based wearables using a public padel database as a case study. We compare a centralised CNN model, single-user models and two fine-tuning schemes (full and last-layer) on wrist-worn IMU data from 23 players and 13 stroke classes. Personalised models with data augmentation achieve weighted F1-scores above 90%, closing most of the gap to an optimistic single-user upper bound while reducing inter-subject variability. FLOP and memory analyses show that last-layer fine-tuning offers a favourable trade-off between accuracy and efficiency for on-device deployment in MEMS-based wearables.
General Continual Unsupervised Learning with Augmented Vector Quantization
Hervé Frezza-Buet, Nasr Allah Aghelias, Bernard Girau
https://doi.org/10.14428/esann/2026.ES2026-179
Abstract:
In this paper, we introduce a novel online heuristic for dynamic topological vector quantization models, called SAND (Stable Adaptive Network Drift-handling). This algorithm enhances existing models by enabling them to track non-stationary distributions, i.e., input streams composed of distinct contexts that drift over time. It employs an online out-of-distribution detector to identify context switches and subsequently updates the labels of vector quantization model nodes accordingly. The proposed algorithm is fully unsupervised, requiring no external labels to learn, recall, or adapt to varying contexts. SAND thus provides an efficient solution for continuously learning representations of evolving data streams in dynamic environments.
Natural Language Processing
Lost in Modality: Evaluating the Effectiveness of Text-Based Membership Inference Attacks on Large Multimodal Models
Ziyi Tong, Feifei Sun, Le Minh Nguyen
https://doi.org/10.14428/esann/2026.ES2026-176
Abstract:
Large Multimodal Language Models (MLLMs) are emerging as one of the foundational tools in an expanding range of applications. Consequently, understanding training-data leakage in these systems is increasingly critical. Log-probability-based membership inference attacks (MIAs) have become a widely adopted approach for assessing data exposure in large language models (LLMs), yet their effectiveness in MLLMs remains unclear. We present the first comprehensive evaluation of extending these text-based MIA methods to multimodal settings. Our experiments under vision-and-text (V+T) and text-only (T-only) conditions across the DeepSeek-VL and InternVL model families show that in in-distribution settings, logit-based MIAs perform comparably across configurations, with a slight V+T advantage. Conversely, in out-of-distribution settings, visual inputs act as regularizers, effectively masking membership signals.
Task-Specific Efficiency Analysis: When Small Language Models Outperform Large Language Models
Jinghan Cao, Yu Ma, Xinjin Li, Qingyang Ren, Xiangyun Chen
https://doi.org/10.14428/esann/2026.ES2026-274
Abstract:
Large Language Models achieve remarkable performance but incur substantial computational costs unsuitable for resource-constrained deployments. This paper presents the first comprehensive task-specific efficiency analysis comparing 16 language models across five diverse NLP tasks. We introduce the Performance-Efficiency Ratio (PER), a novel metric integrating accuracy, throughput, memory, and latency through geometric mean normalization. Our systematic evaluation reveals that small models (0.5–3B parameters) achieve superior PER scores across all given tasks. These findings establish quantitative foundations for deploying small models in production environments prioritizing inference efficiency over marginal accuracy gains.
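A metric of this kind, combining one benefit-type and several cost-type quantities through a geometric mean, can be sketched as follows. The reference values and the exact normalization are assumptions for illustration; the paper defines PER's normalization precisely:

```python
def performance_efficiency_ratio(accuracy, throughput, memory_gb, latency_ms,
                                 ref=(1.0, 100.0, 16.0, 1000.0)):
    """Combine accuracy and efficiency into one score via a geometric mean.

    Accuracy and throughput are benefits (higher is better); memory and
    latency are costs, so their normalized ratios are inverted. `ref`
    holds illustrative reference values used for normalization.
    """
    a = accuracy / ref[0]
    t = throughput / ref[1]
    m = ref[2] / memory_gb      # invert: lower memory -> higher score
    l = ref[3] / latency_ms     # invert: lower latency -> higher score
    # Geometric mean of the four normalized factors.
    return (a * t * m * l) ** 0.25
```

The geometric mean makes the score scale-free in each factor and penalizes any single dimension collapsing to zero, which is why a small model with slightly lower accuracy but much higher throughput and lower memory can dominate a larger one.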
Emotion Recognition in Multimodal Social Data
Lucia Passaro, Davide Amadei, Davide Bacciu
https://doi.org/10.14428/esann/2026.ES2026-287
Abstract:
Emotion recognition on social media is often approached in unimodal or single-label settings, despite the multimodal nature of online communication. This paper presents a study of multilabel emotion recognition from paired text-image data. We evaluate vision--language encoders and compare them with strong unimodal baselines and a zero-shot multimodal LLM. A simple multimodal classifier built on CLIP achieves the most reliable performance. Data-centric additions such as emoji transcription, caption augmentation, and pseudo-labelling offer limited gains, whereas calibrated decision thresholds have a consistent effect. The results highlight the value of visual cues and show limitations of recent VLMs.
Alignment of Islamic Legal Texts
HAZIM BAROUDI, Wassim Ammar, Farid Bouchiba, Shadha Karoumi Marmardji, Christian Mueller, Fabrice Rossi
https://doi.org/10.14428/esann/2026.ES2026-334
Abstract:
This work addresses the automatic alignment of chapters across different Arabic legal texts. We examine two representation strategies: TF-IDF-based lexical embeddings and contextual semantic embeddings generated by AraBERT. These embeddings enable us to assess the semantic proximity between chapters. We then frame the text alignment process as an optimal transport problem, incorporating soft structural constraints. To analyze the impact of method parameters, we use a curated ground-truth dataset derived from a pair of representative texts.
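Framing alignment as an optimal transport problem can be illustrated with plain Sinkhorn iterations on a chapter-to-chapter cost matrix. This is a generic entropic-OT sketch with uniform marginals and an assumed regularization strength, not the paper's constrained formulation:

```python
import numpy as np

def sinkhorn_align(cost, reg=0.1, n_iter=200):
    """Soft alignment between two chapter sets via entropic optimal transport.

    cost: (n, m) matrix of embedding distances between chapters of two texts.
    Returns a transport plan whose rows and columns sum to uniform marginals;
    large entries indicate likely chapter correspondences.
    """
    n, m = cost.shape
    K = np.exp(-cost / reg)            # Gibbs kernel of the cost matrix
    a, b = np.ones(n) / n, np.ones(m) / m
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):
        u = a / (K @ v)                # match row marginals
        v = b / (K.T @ u)              # match column marginals
    return u[:, None] * K * v[None, :]
```

Soft structural constraints, such as discouraging alignments that cross chapter order, could then be folded into the cost matrix before running the iterations.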
Network analysis of conferences: Mapping the backbone of ESANN topics
Arya Nair, Jens Christian Claussen
https://doi.org/10.14428/esann/2026.ES2026-343
Abstract:
Academic conferences are central to knowledge dissemination and community formation. The "maps of science" approach [Boyack, Klavans and Börner 2005] introduced a framework for visualizing sets of text documents, based on distances calculated from (dis)similarity between the text documents. Can a similar framework be applied to conferences, to visualize subtopics, their fine structure, and their development over time? Here, we focus on the abstracts of the ESANN conference series to trace topic and collaboration dynamics from 2010 to 2025, based on Topic Modelling of ~2,000 abstracts using SPECTER Embeddings, DBSCAN/HDBSCAN Clustering, UMAP Visualisation, and Temporal Drift Analysis. A co-authorship network of 3,500 authors and 7,500 ties was examined through Centrality Measures, Clustering Coefficients, Louvain Community Detection, and Largest Connected Component Analysis. Our findings reveal continuity in Data Mining and Graph Learning, rapid growth in Deep Learning for Natural Language Processing and Medical Imaging, and a decline of Feature Selection and Spectral Clustering. The collaboration network shows a fragmented core-periphery structure reliant on a few hubs and brokers, reflecting both continuity and disruption in Machine Learning research.
Four Quadrants of Difficulty: A Categorisation for Curriculum Learning in NLP
Vanessa Toborek, Sebastian Müller, Christian Bauckhage
https://doi.org/10.14428/esann/2026.ES2026-351
Abstract:
Curriculum Learning (CL) aims to improve the outcome of model training by estimating the difficulty of training samples and scheduling them accordingly. In NLP, difficulty is commonly approximated using task-agnostic linguistic heuristics or human intuition, implicitly assuming that these signals correlate with what neural models find difficult to learn. We propose a four-quadrant categorisation of difficulty signals -- human vs. model and task-agnostic vs. task-dependent -- and systematically analyse their interactions on a natural language understanding dataset. We find that task-agnostic features behave largely independently and that only task-dependent features align. These findings challenge common CL intuitions and highlight the need for lightweight, task-dependent difficulty estimators that better reflect model learning behaviour.
AsymPuzl: A minimal puzzle testbed for LLM-based two agent communication with information asymmetry
Xavier Cadet, Edward Koh, Peter Chin
https://doi.org/10.14428/esann/2026.ES2026-366
Abstract:
Large Language Model (LLM) agents are increasingly studied in multi-turn, multi-agent scenarios, yet most existing setups emphasize open-ended roleplay rather than controlled evaluation.
We introduce AsymPuzl, a minimal but expressive two-agent puzzle environment isolating communication under information asymmetry.
Each agent observes complementary but incomplete views of a puzzle and must exchange messages to solve it.
Using contemporary LLMs, we show that (i) models such as GPT-5 and Claude-4.0 reliably solve puzzles of different sizes by sharing complete information in a few turns, and (ii) feedback design in multi-agent LLM systems is non-trivial: more information can degrade performance.
Model Sees but Does Not Learn: Eliminating Error Propagation in Reasoning Distillation
Jaeeun Jang, Hansle Lee, Wonjun Cho, Sangmin Kim
https://doi.org/10.14428/esann/2026.ES2026-94
Abstract:
Small LLMs struggle to acquire robust reasoning through RLVR due to instability and reward sparsity, and standard SFT, which directly imitates teacher-generated reasoning traces, inherits erroneous intermediate steps and collapses reasoning diversity. We introduce VGGM, a selective-learning objective that applies gradients only to verified correction segments, preventing error memorization and mitigating excessive entropy collapse. VGGM yields more stable optimization, stronger metacognitive correction behavior, and substantially higher data efficiency. Across GSM8K, MATH, and AMC'23, VGGM consistently outperforms standard SFT and, when combined with GRPO, achieves performance approaching DeepSeek-R1-distillation models while using 40 times less supervised data.
Efficient and Resilient Machine Learning for Industrial Applications
Efficient and Resilient Machine Learning for Industrial Applications
Philipp Wissmann, Philip Naumann, Daniel Hein, Steffen Udluft, Marc Weber, Simon Leszek, Thomas Runkler
https://doi.org/10.14428/esann/2026.ES2026-6
Abstract:
Machine learning is rapidly transforming industrial landscapes, yet it faces significant hurdles related to efficiency and resilience. This paper discusses industrial challenges and provides a structured overview of current approaches, encompassing data-centric methodologies, efficient training for reliable solutions, hardware-optimized deployment, and the emerging role of foundation models.
Variational Deep Embedding for Unsupervised Clustering of Industrial Noise in Steelmaking Plants
Muhammad Waseem Akram, Marco Vannucci, Giorgio Carlo Buttazzo, Valentina Colla, Stefano Dettori, Donatella Salvatore
https://doi.org/10.14428/esann/2026.ES2026-280
Abstract:
Industrial plants are major sources of environmental noise, producing complex and high-intensity acoustic emissions that vary across different operational conditions. Automatically characterizing the sources that generate harmful acoustic emissions is crucial to take the necessary actions to reduce them. However, manually labeling these sounds is impractical due to their volume and variability. In this study, we employ an unsupervised deep learning framework for clustering industrial sound emissions in steelmaking plants, focusing on areas such as the hot rolling mill, Electric Arc Furnace, and scrapyard. The approach integrates Variational Autoencoders with Gaussian Mixture Models to learn compact latent representations from Mel-spectrogram features of raw, unlabelled audio data. We compare this approach to traditional clustering techniques such as K-means and GMM, as well as Deep Embedding Clustering. The results demonstrate that the approach significantly outperforms traditional methods, offering reliable and interpretable clustering of industrial acoustic events. This research contributes to the development of automated, efficient, and sustainable noise-monitoring systems for industrial operations, addressing key challenges in environmental noise monitoring.
TSFM in-context learning for time-series classification of bearing-health status
Michel Tokic, Slobodan Djukanovic, Anja von Beuningen, Cheng Feng
https://doi.org/10.14428/esann/2026.ES2026-77
Abstract:
We introduce a classification method based on in-context learning using time-series foundation models (TSFMs). We demonstrate how data not included in the TSFM training can be classified without fine-tuning the foundation model or training a traditional classification model. Examples are represented as targets (class labels) and covariates (data matrices) within the TSFM prompt, enabling the classification of unknown covariate data patterns alongside the forecast horizon through in-context learning.
We apply this method to vibration data to assess the health state of a bearing within a servo-press motor. The method transforms frequency-domain reference signals into pseudo time-series patterns, generates aligned covariate and target signals, and uses the TSFM to predict class-membership probabilities for predefined labels.
Leveraging the scalability of pre-trained models, the proposed method demonstrates effectiveness across varying operational conditions. This represents significant progress beyond traditional, custom AI solutions towards broader AI-driven maintenance systems that could potentially be provided as Model- or Software-as-a-Service applications.
Label-Efficient and Adaptable Image Selection for Large-Scale E-Commerce Catalogs
Maytal Messing, Guy Shani
https://doi.org/10.14428/esann/2026.ES2026-375
Abstract:
E-commerce catalogs depend on high-quality, representative product images for user trust and engagement. Candidate images often include distracting backgrounds, irrelevant elements, or non-compliant branding or certification marks. We present a label-efficient, adaptable image selection pipeline combining: (i) prompt-guided zero-shot image cropping, which reduces non-product or branded elements and improves downstream quality predictions, and (ii) weakly supervised embedding-based outlier filtering that removes images inconsistent with the target product. Evaluated on high-impact categories, the approach adapts rapidly to new products, image sources, and catalog requirements while requiring no category-specific labels or retraining, providing a practical, scalable solution for industrial catalog curation.
Context-Aware Graph Attention for Unsupervised Telco Anomaly Detection
Sara Malacarne, Eirik Hoel-Høiseth, David Zsolt Biro, Erlend Aune, Massimiliano Ruocco
https://doi.org/10.14428/esann/2026.ES2026-124
Abstract:
We propose C-MTAD-GAT, an unsupervised, context-aware graph-attention model for anomaly detection on multivariate Key Performance Indicator (KPI) time series from mobile networks. C-MTAD-GAT combines temporal and feature-wise graph attention with lightweight context embeddings, and uses a deterministic GRU-based reconstruction head and a multi-step forecaster to produce per-feature, per-timestamp anomaly scores. Detection thresholds are calibrated without labels from validation residuals, keeping the pipeline fully unsupervised. On the public TELCO dataset, C-MTAD-GAT consistently outperforms the graph-based MTAD-GAT and the Telco-specific DC-VAE, two state-of-the-art baselines, in both event-level and pointwise F1, while triggering substantially fewer alarms than DC-VAE under the same calibration. C-MTAD-GAT is also deployed in the core network of a national mobile operator, demonstrating its efficiency and resilience in a real industrial monitoring pipeline.
A comparison of open time-series foundation models for industrial manufacturing applications
Can Calisir, Simon Leszek
https://doi.org/10.14428/esann/2026.ES2026-310
Abstract:
Large-scale, pre-trained foundation models have recently been introduced for time-series modeling. While typically evaluated on broad forecasting benchmarks, we assess their suitability for industrial manufacturing. We benchmark three open-source time series foundation models (TSFMs) on two representative datasets: steel-plant energy consumption and computer numerical control (CNC) milling spindle current. In the structured, pattern-driven steel setting, TSFMs consistently outperform classical baselines, even without task-specific training. In contrast, the highly dynamic CNC process reveals limited TSFM gains without domain-specific signals, with simple models excelling once control covariates are provided. These results highlight both the promise and current limitations of TSFMs for real-world industrial applications.
Reliability, Safety and Robustness of AI applications
Techniques for Reliable, Safe and Robust AI Applications
Caroline König, Cecilio Angulo, Pedro Jesús Copado, G. Kumar Venayagamoorthy
https://doi.org/10.14428/esann/2026.ES2026-5
Abstract:
Reliability, safety, and robustness are essential requirements for safety-critical applications. Implementing these properties in artificial-intelligence-based systems introduces additional challenges, particularly in the development and validation of data-driven models. To address these requirements, new techniques are needed for assessing predictive uncertainty, ensuring robustness against intentional or environmental input perturbations, integrating explicit safety constraints into model architectures, and enabling human oversight and interpretability to support auditing and supervision.
Out-of-Distribution Segmentation via Wasserstein-Based Evidential Uncertainty
Arnold Brosch, Abdelrahman Eldesokey, Michael Felsberg, Kira Maag
https://doi.org/10.14428/esann/2026.ES2026-108
Abstract:
Deep neural networks achieve superior performance in semantic segmentation, but are limited to a predefined set of classes, which leads to failures when they encounter unknown objects in open-world scenarios. Recognizing and segmenting these out-of-distribution (OOD) objects is crucial for safety-critical applications such as automated driving. In this work, we present an evidence segmentation framework using a Wasserstein loss, which captures distributional distances while respecting the probability simplex geometry. Combined with Kullback-Leibler regularization and Dice structural consistency terms, our approach leads to improved OOD segmentation performance compared to uncertainty-based approaches.
Neuromodulated Delta Adapters: Stabilizing Test-Time Adaptation via Gated Error Correction
Jiarui Zhang, Deng Yifan
https://doi.org/10.14428/esann/2026.ES2026-143
Abstract:
Static neural networks rapidly degrade under distribution shift, and existing Test-Time Adaptation (TTA) methods either backpropagate during inference or exhibit unstable Hebbian dynamics. We propose the Neuromodulated Delta Adapter (NDA), a plug-and-play PEFT module that inserts a rank-r fast-weight bottleneck into frozen Transformers. NDA couples a gated Delta rule with a three-factor "surprise" signal, providing adaptive gain control that keeps the fast weights Lyapunov-stable. On the corrected FLORES-101 continuous benchmark NDA surpasses TENT by +0.9 spBLEU and remains the sole dynamic adapter capable of maintaining stability in the challenging English-to-Yoruba translation regime. On PG-19 streaming it lowers perplexity to 36.9 at 8k tokens and recalls 94% of long-range "needle" facts, demonstrating practical, low-overhead test-time learning.
Weakly Supervised Shortcut Learning Mitigation Using Sparse Autoencoders
Sari Sadiya, Muhammad Ahsan, Despina Tawadros, Phuong Quynh Le, Jörg Schlötterer, Christin Seifert, Gemma Roig
https://doi.org/10.14428/esann/2026.ES2026-170
Abstract:
Reliance on spurious features that coincidentally correlate with task labels (i.e., shortcut learning) remains a major barrier to the reliable deployment of machine learning models, particularly in high-stakes domains like medical diagnostics. Moreover, in such settings retraining models or collecting and labeling additional data is often impractical, limiting the applicability of many existing shortcut mitigation methods. In this paper we propose a lightweight framework that leverages sparse autoencoders to disentangle spurious from core features to mitigate shortcut learning. Our approach requires no model retraining and works even when group annotations are scarce or unavailable for certain classes. Results on standard benchmarks demonstrate that, even with as few as 50 labeled examples, reliance on spurious features can be significantly reduced.
Revisiting Neural Activation Coverage for Uncertainty Estimation
Benedikt Franke, Nils Förster, Frank Köster, Markus Lange, Arne Raulf, Asja Fischer
https://doi.org/10.14428/esann/2026.ES2026-29
Abstract:
Neuron activation coverage (NAC) is a recently-proposed technique for out-of-distribution detection and generalization. We build upon this promising foundation and extend the method to work as an uncertainty estimation technique for already-trained artificial neural networks in the domain of regression. Our experiments confirm NAC uncertainty scores to be more meaningful than other techniques, e.g. Monte-Carlo Dropout.
Adversarial Confusion Attack: Disrupting Multimodal Large Language Models
Jakub Hoscilowicz, Artur Janicki
https://doi.org/10.14428/esann/2026.ES2026-153
Abstract:
We introduce the adversarial confusion attack as a new class of threats against multimodal large language models (MLLMs). Unlike jailbreaks or targeted misclassification, the goal is to induce systematic disruption that makes the model generate incoherent or confidently incorrect outputs. Practical applications include embedding such adversarial images into websites to prevent MLLM-powered AI Agents from operating reliably. The proposed attack maximizes next-token entropy using a small ensemble of open-source MLLMs. In the white-box setting, we show that a single adversarial image can disrupt all models in the ensemble, both in the full-image and CAPTCHA-style adversarial patch settings. Despite relying on a basic adversarial technique, projected gradient descent (PGD), the attack generates perturbations that transfer to both unseen open-source (e.g., Qwen3-VL) and proprietary (e.g., GPT-5.1) models.
Model Selection Hijacking Adversarial Attack
Luca Pajola, Riccardo Petrucci, Francesco Marchiori, Luca Pasa, Mauro Conti
https://doi.org/10.14428/esann/2026.ES2026-162
Abstract:
Model selection plays a critical role in the deployment of machine learning systems, yet its vulnerability to adversarial manipulation remains largely unexplored. We introduce MOSHI (MOdel Selection HIjacking), a novel framework that examines whether targeted poisoning of only the validation set, without any access to training data, model internals, or system configuration, can systematically bias the selection process toward inferior models. Leveraging a VAE-based perturbation mechanism, we empirically demonstrate that MOSHI can induce coherent misselection in both vision and speech benchmarks, leading to models with degraded generalization, as well as increased inference latency and energy consumption. Our results highlight that model selection, typically viewed as a benign step, can significantly affect robustness, suggesting it should be treated as an integral component of adversarial ML analysis.
The Alignment Gate: Intent and Instruction Guardrails for Agentic AI
Akash Borigi, Peggy Lindner, Alexander Schlager, Saifullah Shoaib, Rupendra Lekkala, Sai Sowjanya Bhamidipati, Amaury Lendasse
https://doi.org/10.14428/esann/2026.ES2026-172
Abstract:
This paper proposes an alignment framework for Agentic AI systems, designed to map user intents to corresponding system instructions through interpretable probabilistic associations. The framework introduces a min-median threshold rule to determine whether an instruction is plausibly linked to a given intent, providing a tunable balance between strict and lenient execution criteria. The approach is both lightweight and explainable, enabling clear visualization of alignment scores and transparent control over execution decisions. At this stage, the goal is not to obtain the optimal or final alignment verification mechanism, but rather to assess feasibility and establish a structured foundation for future, more comprehensive alignment frameworks. The method supports modern AI governance by offering a scalable, interpretable path to safer Agentic AI.
SPARC: Superpixel-based Black-Box Adversarial Attack with Regional Confidence
Tram Ho, Ngoc-Thao Nguyen, Bac Le
https://doi.org/10.14428/esann/2026.ES2026-205
Abstract:
Deep learning models for critical vision tasks remain vulnerable to adversarial attacks. We present SPARC, the first superpixel-targeted black-box attack offering high interpretability and strong performance. Our method uses regional confidence maps to guide perturbations to the most important regions and controls their magnitude using L2 and L1 constraints, keeping changes small and spatially coherent. In targeted attacks, SPARC achieves a competitive success rate with subtle perturbations, while in untargeted attacks, it attains the highest success among superpixel-based and score-based methods with the fewest black-box queries. SPARC provides a practical balance of performance and interpretability, suitable for real-world black-box scenarios.
On the Impact of Differential Privacy on Federated Neuromorphic Learning Accuracy
Luiz Pereira, Dalton Valadares, Mirko Perkusich, Kyller Gorgônio
https://doi.org/10.14428/esann/2026.ES2026-246
Abstract:
Federated Neuromorphic Learning (FNL) applies Spiking Neural Networks (SNNs) to enable energy-efficient collaborative learning on devices without centralizing data. However, integrating Differential Privacy (DP) introduces critical changes to the SNN firing dynamics, which propagate to server coordination strategies. This paper investigates DP-induced firing-rate distortions and their influence on global model convergence and generalization. Experimental ablation studies across privacy budgets and clipping bounds highlight firing distortions directly related to global accuracy degradation. Additionally, client selection instabilities related to DP noise degrade the model aggregation performance. The results reinforce that firing-rate-based FNL strategies are fragile under DP and require precise calibration to maintain the effectiveness of federated coordination.
Interpreting Logical Explanations of Classifying Neural Networks
Fabrizio Leopardi, Faezeh Labbaf, Tomas Kolarik, Michael Wand, Natasha Sharygina
https://doi.org/10.14428/esann/2026.ES2026-326
Abstract:
Formal methods are routinely used to address the issue of explainability of machine learning models. Yet, it is not always trivial to understand how a logical explanation could be useful in practice due to human readability challenges. This paper applies classical geometric methods for interpreting logical explanations and illustrates their usefulness for users on datasets from medical and image classification domains previously studied in the context of formal explainability.
Classification and regression
Implementation of Multi-Matrix Median Generalized Learning Vector Quantization
Lukas Bader, Ina Terwey-Scheulen, Dietlind Zühlke
https://doi.org/10.14428/esann/2026.ES2026-157
Abstract:
We present Multi-Matrix Median Generalized Learning Vector Quantization (M³GLVQ), a median-based LVQ method for multiple heterogeneous proximity matrices. The model combines class-wise medoid prototypes with a learnable simplex-constrained relevance vector that determines each matrix’s contribution. Prototype positions are updated via the established greedy hill-climbing procedure of median GLVQ, while matrix weights are adapted through a normalized gradient step followed by simplex projection, ensuring stable, scale-independent updates. This alternating optimization operates directly on proximity data and requires no embedding or feature representation. Experiments on industrial customer data with four complementary proximity sources show that M³GLVQ leads to higher recall than standard MGLVQ.
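The matrix-weight update described in the abstract, a normalized gradient step followed by projection back onto the probability simplex, can be sketched with the standard Euclidean simplex projection (the sort-based algorithm); the loss gradient below is a hypothetical placeholder, since the actual gradient depends on the M³GLVQ cost function.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex
    {w : w_i >= 0, sum_i w_i = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u + (1.0 - css) / k > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(v + theta, 0.0)

# One alternating step for the relevance vector over 4 proximity matrices:
w = np.full(4, 0.25)                          # start from uniform weights
grad = np.array([0.3, -0.1, 0.2, -0.4])      # hypothetical loss gradient
step = grad / np.linalg.norm(grad)            # normalized gradient step
w = project_to_simplex(w - 0.1 * step)        # project back onto the simplex
```

After the projection, `w` is again a valid convex combination of the proximity matrices (non-negative, summing to one), which is what makes the update scale-independent across heterogeneous matrices.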
Enhancing classification performance at the RAM-neuron level
Antonio Sorgente, Gianluca Coda, Alessandro De Gregorio, Massimo De Gregorio, Paolo Vanacore
https://doi.org/10.14428/esann/2026.ES2026-226
Abstract:
Even though RAM-neurons may have different discriminative reliability, DRASiW treats their contributions uniformly. In this work, we introduce RDA (RAM Discrimination Amplifier), a new metric that assigns to each RAM a class-specific amplification (or reduction) factor. This factor is calculated from divergence metrics applied to the RAM address distributions. RDA preserves the weightless nature of DRASiW while improving decision quality. Experiments on 41 datasets show consistent accuracy and F1-score gains across different evaluation protocols.
Diminishing Returns - Data Integer Quantization and its Effects on Training Dynamics of Distance Based Classifiers
Thomas Davies, Alexander Engelsberger, Magdalena Psenickova, Thomas Villmann
https://doi.org/10.14428/esann/2026.ES2026-27
Abstract:
In certain subfields of machine learning, such as those involving homomorphic encryption or quantum computing, it is crucial to estimate the numerical precision of data used for training without compromising model quality. This paper introduces a method for precisely quantifying the loss of accuracy in distance-based classifiers, such as Generalized Learning Vector Quantization, when operating on quantized data represented by bounded integer sets. Our approach employs conditional entropy to measure the information loss induced by quantization, which closely correlates with the model’s mean performance.
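The core idea, quantifying how much label information a bounded integer quantization destroys via a conditional entropy, can be illustrated as follows; the estimator and the toy data are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def integer_quantize(x, bits):
    """Map features in [0, 1] to the bounded integer set {0, ..., 2**bits - 1}."""
    levels = 2 ** bits
    return np.minimum((x * levels).astype(int), levels - 1)

def conditional_entropy(y, cell):
    """H(Y | Q): label uncertainty (in bits) that remains once only the
    quantized cell of a sample is known. Estimated from empirical counts."""
    h = 0.0
    for c in np.unique(cell):
        mask = cell == c
        _, cnt = np.unique(y[mask], return_counts=True)
        p = cnt / cnt.sum()
        h -= mask.mean() * np.sum(p * np.log2(p))
    return h

# Coarser quantization merges cells and increases the residual uncertainty:
x = np.array([0.05, 0.10, 0.60, 0.90])
y = np.array([0, 0, 1, 1])
h_fine = conditional_entropy(y, integer_quantize(x, bits=1))    # 0.0: classes separated
h_coarse = conditional_entropy(y, integer_quantize(x, bits=0))  # 1.0: one cell, labels mixed
```

A low H(Y | Q) means the quantized representation still determines the label almost completely, which is consistent with the abstract's observation that this quantity tracks the mean performance of the distance-based classifier.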
SMOTE k-out: Enhancing Class Separability through Outer Synthetic Sampling
Verónica Bolón-Canedo, José Luis Morillo-Salas, Laura Morán-Fernández, Amparo Alonso-Betanzos
https://doi.org/10.14428/esann/2026.ES2026-72
Abstract:
Oversampling techniques are commonly used to address class imbalance in supervised classification, with SMOTE being a popular approach. However, traditional SMOTE generates synthetic samples within the neighbourhood of minority instances, which can increase data complexity and hinder class separability. This work proposes SMOTE k-out, which creates synthetic samples outside the local neighbourhood to increase minority class sparsity. This aims to reduce overfitting and mitigate the impact of noise, thereby improving the definition of the decision boundary. Experiments on multiple imbalanced datasets demonstrate that SMOTE k-out consistently reduces complexity and achieves higher accuracy and F-measure, particularly with SVM and LDA classifiers.
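The contrast with classic SMOTE can be sketched as follows: standard SMOTE places a synthetic point at x + u·(neighbour − x) with u in [0, 1], i.e. inside the local neighbourhood, while an "outer" variant extrapolates with u > 1, beyond the neighbour. This is an illustrative reading of the idea; the exact generation rule of SMOTE k-out may differ.

```python
import numpy as np

def smote_outer(X_min, n_samples, k=3, alpha=(1.0, 1.5), seed=0):
    """Illustrative 'outer' oversampling of a minority class X_min:
    synthetic points are extrapolated beyond a random one of the k
    nearest minority neighbours (factor u >= 1 instead of u in [0, 1])."""
    rng = np.random.default_rng(seed)
    # pairwise distances among minority samples, self excluded
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    idx = np.argsort(d, axis=1)[:, :k]        # k nearest neighbours per point
    synth = np.empty((n_samples, X_min.shape[1]))
    for s in range(n_samples):
        i = rng.integers(len(X_min))
        j = idx[i, rng.integers(k)]           # pick one of the k neighbours
        u = rng.uniform(*alpha)               # u >= 1: beyond the neighbour
        synth[s] = X_min[i] + u * (X_min[j] - X_min[i])
    return synth
```

Pushing samples outside the neighbourhood spreads the minority class out, which is the mechanism the abstract credits for reduced overfitting and a better-defined decision boundary.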
Kernel Thinning for faster KSVM hyper-parametrization
Blanca Cano-Camarero, Ángela Fernández, José R. Dorronsoro
https://doi.org/10.14428/esann/2026.ES2026-34
Abstract:
This work presents KT-Funnel, a novel procedure to accelerate hyperparameter tuning in kernel methods, together with an empirical study of efficient cross-validation strategies for Kernel Support Vector Machines (KSVM) based on Kernel Thinning (KT). The main aim is to reduce computational cost while preserving predictive accuracy. Experiments on 25 classification datasets show that KT and KT-Funnel significantly speed up hyperparameter tuning, being at least twice as fast as traditional cross-validation. In particular, the proposed method with higher thinning levels attains comparable balanced accuracy and improved hyperparameter ranking stability, demonstrating its scalability and reliability for KSVM model selection.
Local Concept Embeddings in the Context of Self-Supervised Learning
Kim Paulke, Hans-Oliver Hansen, Thomas Martinetz, Gesina Schwalbe
https://doi.org/10.14428/esann/2026.ES2026-85
Abstract:
This work investigates how self-supervised learning (SSL) frameworks encode semantic structure within their latent representations using the introspection technique Local Concept Embeddings (LoCE). We analyse three complementary SSL paradigms—contrastive (Barlow Twins), generative (Denoising Autoencoder), and predictive (Relative Patch Location)—all pretrained on the Cityscapes dataset and evaluated in a semantic segmentation setting. LoCE reveals that the Denoising Autoencoder produces the most distinct and coherent concept clusters (highest separability and clustering metrics), while Barlow Twins and RPL exhibit moderate structure and higher intra-class variability. Furthermore, we find that greater latent disentanglement before fine-tuning correlates with improved segmentation performance, uncovering an interesting link between latent organization and downstream generalization.
THDC: Training Hyperdimensional Computing Models with Backpropagation
Hanne Dejonghe, Sam Leroux
https://doi.org/10.14428/esann/2026.ES2026-186
Abstract:
Hyperdimensional computing (HDC) offers lightweight learning for energy-constrained devices by encoding data into high-dimensional vectors. However, its reliance on ultra-high dimensionality and static, randomly initialized hypervectors limits memory efficiency and learning capacity. Therefore, we propose Trainable Hyperdimensional Computing (THDC), which enables end-to-end HDC via backpropagation. THDC replaces randomly initialized vectors with trainable embeddings and introduces a one-layer binary neural network to optimize class representations. Evaluated on MNIST, Fashion-MNIST and CIFAR-10, THDC achieves equal or better accuracy than state-of-the-art HDC, with dimensionality reduced from 10,000 to 64.
AdaCap: An Adaptive Contrastive Approach for Small-Data Neural Networks
Bruno Belucci, Karim Lounici, Katia Meziani
https://doi.org/10.14428/esann/2026.ES2026-182
Abstract:
Neural networks struggle on small tabular datasets, where tree-based models remain dominant. We introduce Adaptive Contrastive Approach (AdaCap), a training scheme that combines a permutation-based contrastive loss with a Tikhonov-based closed-form output mapping. Across 85 real-world regression datasets and multiple architectures, AdaCap yields consistent and statistically significant improvements in the small-sample regime, particularly for residual models. A meta-predictor trained on dataset characteristics (size, skewness, noise) accurately anticipates when AdaCap is beneficial.
These results show that AdaCap acts as a targeted regularization mechanism, strengthening neural networks precisely where they are most fragile. All results and code are publicly available at https://github.com/BrunoBelucci/adacap.
Clagging: Generating and combining predictions using clustering
Arnaud Germain, Frédéric Vrins
https://doi.org/10.14428/esann/2026.ES2026-56
Abstract:
We introduce a new ensemble learning strategy called clagging (for cluster aggregating), which consists of combining models fitted on different clusters. First, we perform K clustering tasks on the same training set, linearly increasing the number of clusters from 1 to K. Next, we fit a model on each of those 1+2+...+K clusters. Finally, the output for a given test point is obtained by combining the predictions of the corresponding models using the distance of the test point to the clusters' centroids. We perform an extensive horse-race study in which we benchmark clagging on 10 regression datasets and 7 prediction algorithms. Our results suggest that clagging outperforms the standard version of bagging and typically performs best when choosing K>1, indicating that it outperforms the considered model trained on the whole training set (K=1).
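The three-step procedure described in the abstract (cluster, fit per cluster, distance-weighted combination) can be sketched in plain NumPy. The minimal k-means routine, the linear least-squares base model, and the inverse-distance weighting below are our illustrative assumptions, not necessarily the paper's exact choices.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain k-means (any clustering algorithm could be substituted)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return centers, labels

def clagging_fit(X, y, K):
    """For k = 1..K, cluster the training set into k parts and fit one
    linear model per cluster; collect (centroid, weights) pairs."""
    members = []
    Xb = np.c_[X, np.ones(len(X))]                 # add bias column
    for k in range(1, K + 1):
        centers, labels = kmeans(X, k, seed=k)
        for c in range(k):
            m = labels == c
            if not np.any(m):
                continue
            w, *_ = np.linalg.lstsq(Xb[m], y[m], rcond=None)
            members.append((centers[c], w))
    return members

def clagging_predict(X, members):
    """Combine member predictions, weighted by inverse distance of each
    test point to the member's cluster centroid."""
    Xb = np.c_[X, np.ones(len(X))]
    preds = np.array([Xb @ w for _, w in members])              # (M, n)
    cents = np.array([c for c, _ in members])
    dist = np.linalg.norm(X[:, None] - cents[None], axis=2).T    # (M, n)
    wts = 1.0 / (dist + 1e-9)
    wts /= wts.sum(axis=0, keepdims=True)
    return (wts * preds).sum(axis=0)
```

With K=1 this reduces to a single model fitted on the whole training set, which is the baseline the abstract compares against.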
mDAE: modified Denoising AutoEncoder for missing data imputation
Mariette Dupuy, Marie Chavent, Rémi Dubois
https://doi.org/10.14428/esann/2026.ES2026-68
Abstract:
This paper introduces a method based on Denoising AutoEncoders (DAEs) for missing data imputation. This method, called mDAE hereafter, results from a modification of the loss function and a straightforward procedure for choosing the hyper-parameters. An ablation study on several UCI Machine Learning Repository datasets shows the benefit of using this modified loss function and an overcomplete structure, in terms of the Root Mean Squared Error (RMSE) of reconstruction. This numerical study is complemented by a comparison between mDAE and eight alternative approaches (four classical and four more recent), using the Mean Distance to the Best (MDB) criterion, which quantifies the overall performance of each method across all the datasets.
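The paper's specific loss modification is not reproduced here; the following is only a generic sketch of the standard device in DAE-based imputation of restricting the reconstruction loss to observed entries, so that missing values contribute no gradient. The function name and masking convention are our assumptions.

```python
import numpy as np

def masked_mse(x_hat, x, observed_mask):
    """Reconstruction loss over observed entries only.
    x_hat: network reconstruction, x: input with missing values filled
    arbitrarily, observed_mask: 1 where the entry was observed, 0 where
    missing. Missing entries are excluded from the average."""
    diff = (x_hat - x) * observed_mask
    return (diff ** 2).sum() / observed_mask.sum()
```

At imputation time, the network's outputs at the masked positions serve as the imputed values, while the loss above is what drives training.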
Learning and Reasoning on Knowledge and Heterogeneous Graphs
Learning and Reasoning on Knowledge and Heterogeneous Graphs in the era of Graph Foundation and Large Language Models
Matteo Zignani, Pasquale Minervini, Roberto Interdonato, Manuel Dileo
https://doi.org/10.14428/esann/2026.ES2026-4
Abstract:
Knowledge Graphs (KGs) and heterogeneous graphs (HGs) offer a principled way to represent multi-entity, multi-relational systems, while also revealing a persistent tension between expressive modeling, scalable learning, and faithful reasoning. Two trends are rapidly reshaping the field: graph foundation models (GFMs), which seek transfer across graphs, tasks, and domains via large-scale pretraining, and the growing integration of large language models (LLMs) with graph-structured knowledge to improve grounding, interaction, and reasoning. Temporal settings add further challenges, as evolving facts and interactions demand time-consistent modeling and evaluation. This tutorial provides a structured survey of these directions: we introduce a unified background and notation for typed heterogeneous graphs, (temporal) KGs, and event-based temporal heterogeneous graphs; we then formalize the main task families (KG completion, query answering, node/graph prediction, and temporal variants), emphasizing evaluation protocols and leakage pitfalls. Finally, we review recent advances in GFMs and LLM–graph integration, and summarize the state of the art in learning over temporal heterogeneous graphs and temporal KGs.
A Possible Human-Centered Embedding Space Search in Degenerate Clifford Algebras
Isaac Roberts, Louis Mozart Kamdem Teyou, Alexander Schulz, N'Dah Jean Kouagou, Axel Ngonga Ngomo, Barbara Hammer
https://doi.org/10.14428/esann/2026.ES2026-283
Abstract:
Recent knowledge graph embedding (KGE) models increasingly exploit algebraic structures to encode relational semantics. Clifford-based models, in particular, offer strong expressiveness and geometric interpretability. In this work, we analyze the representations and decision boundaries of such models using an embedding-based reasoner as a classification function. To interpret Clifford-based geometric effects, we adapt DeepView, a visualization framework that approximates decision functions of deep classification models. This study provides one of the first systematic visual analyses of Clifford-based KGE models, helping bridge algebraic representation learning and interpretability.
Graph Representation Learning for Software Architecture Recovery
Rakhshanda Jabeen, Morgan Ericsson, Jonas Nordqvist, Anna Wingkvist
https://doi.org/10.14428/esann/2026.ES2026-316
Abstract:
Software architecture recovery aims to infer a system’s modular organization from source code, bridging the gap between design intent and implementation structure. Traditional techniques rely on handcrafted heuristics and often fail to capture deeper architectural relationships. We investigate whether graph neural networks (GNNs) can recover these relationships by framing the task as unsupervised representation learning over a multi-relational software graph. Our approach learns node embeddings that reflect architectural boundaries, offering a promising alternative to existing recovery methods.
A Multi-Agent LLM System for Natural Language Querying of Operational Knowledge Graphs in Satellite Ground Stations
Fosco Eugenio Quadri, Filippo Bianchini
https://doi.org/10.14428/esann/2026.ES2026-42
Abstract:
Satellite ground-station maintenance generates vast operational data, yet traditional query interfaces limit discoverability and slow time-critical decision making. We present a multi-agent system deployed at Fucino Space Centre that combines Large Language Models with knowledge graphs and Retrieval-Augmented Generation to support operators in troubleshooting by exploiting 40,000 historical maintenance tickets. Specialized agents collaborate on intent mapping, multi-hop reasoning, and explainable synthesis. This work brings the following contributions: (1) an explainable architecture for conversational retrieval, (2) a domain knowledge graph operationalizing antenna-system context, and (3) integration lessons for operator-in-the-loop deployment. Our approach demonstrates how agentic AI enhances transparency and operational reliability in aerospace ground operations.
GraphTreeBoost: Soft Decision Tree-Based Graph Learning With Spectral Aggregation
László Fetter, András Gézsi
https://doi.org/10.14428/esann/2026.ES2026-111
Abstract:
We propose GraphTreeBoost, a gradient-boosted framework for graph-structured data that couples soft decision trees with spectral feature aggregation. Each split operates on features filtered by truncated Chebyshev or Chebyshev–Bessel (heat kernel) expansions of the normalized adjacency, enabling efficient graph-aware learning without eigendecomposition. Training combines analytic second-order leaf updates with AdamW for routing and filter parameters, yielding stable optimization and interpretable thresholds with per-node gain scores. All spectral operations are implemented in the feature space using sparse graph primitives, scaling linearly with the number of edges and remaining practical on CPU hardware. Experiments on six benchmarks show consistent accuracy gains over feature-only baselines, demonstrating that GraphTreeBoost unites the transparency of decision trees with scalable spectral graph learning.
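The truncated Chebyshev aggregation mentioned above relies only on repeated matrix products with the normalized adjacency, via the recurrence T_k = 2·Â·T_{k-1} − T_{k-2}, which is why no eigendecomposition is required. The dense-NumPy sketch below (function names and the uniform default coefficients are our assumptions) illustrates the idea; in practice Â would be a sparse matrix so each step costs O(|E|).

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetric normalization D^{-1/2} A D^{-1/2}; its eigenvalues lie
    in [-1, 1], the natural domain of Chebyshev polynomials."""
    d = A.sum(axis=1)
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(d), 0.0)
    return d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def chebyshev_features(A_hat, X, order, coeffs=None):
    """Filter node features X with a truncated Chebyshev expansion of
    A_hat, using only matrix products:
    T_0 = X, T_1 = A_hat X, T_k = 2 A_hat T_{k-1} - T_{k-2}."""
    coeffs = np.ones(order + 1) if coeffs is None else np.asarray(coeffs)
    T_prev = X
    out = coeffs[0] * T_prev
    if order == 0:
        return out
    T_curr = A_hat @ X
    out = out + coeffs[1] * T_curr
    for k in range(2, order + 1):
        T_prev, T_curr = T_curr, 2.0 * (A_hat @ T_curr) - T_prev
        out = out + coeffs[k] * T_curr
    return out
```

A soft decision tree would then split on columns of the filtered feature matrix rather than on the raw node features.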
Vision, image processing and healthcare AI
See Without Decoding: Motion-Vector-Based Tracking in Compressed Video
Axel Duché, Clement Chatelain, Gilles Gasso
https://doi.org/10.14428/esann/2026.ES2026-184
Abstract:
We propose a lightweight compressed-domain tracking model that operates directly on video streams, without requiring full RGB video decoding. Using motion vectors and transform coefficients from compressed data, our deep model propagates object bounding boxes across frames, achieving a computational speed-up of up to 3.7× with only a slight 4% mAP@0.5 drop vs. the RGB baseline on the MOTS15/17/20 datasets. These results highlight the efficiency of codec-domain motion modeling for real-time analytics in large monitoring systems.
But Are These Images Conceptually Similar?
Isaac Roberts, Riza Velioglu, Inaam Ashraf, Luca Hermes, Barbara Hammer
https://doi.org/10.14428/esann/2026.ES2026-292
Abstract:
Assessing the similarity between two images remains a core challenge in computer vision. Traditional full-reference image quality assessment (FR-IQA) metrics measure pixel-wise or low-level structural distortions and falter where human perception effortlessly recognizes equivalence. Perceptual metrics such as LPIPS and DISTS improve correlation with human judgments but remain opaque black boxes.
We propose Conceptual Similarity (CSIM), a transparent and steerable image similarity metric that operates directly on human-interpretable semantic concepts.
The resulting metric is simultaneously (1) an FR-IQA metric that is robust to non-semantic distortions while remaining sensitive to meaningful semantic changes, and (2) a general image-to-image similarity measure. Most importantly, CSIM offers transparency: users can inspect which concepts drive the score. It also gives rise to a novel capability, which we call Human Similarity Steering, that permits user-determined per-concept weighting to influence the similarity score according to their preferences.
Hierarchical Multi-Scale Deep Neural Network for Schizophrenia Detection in Neuroimaging
Carlos Dias Maia, Gabriel Fonseca, Silvio Jamil, Luis Zárate
https://doi.org/10.14428/esann/2026.ES2026-331
Abstract:
Schizophrenia remains difficult to diagnose due to its reliance on subjective clinical assessment. This work proposes a pipeline for automated schizophrenia classification using functional MRI data from the UCLA CNP dataset. The method extracts multi-view slices from nine anatomical orientations using a hierarchical analysis and processes them with a Vision Transformer model (MultiSliceViT). Under stratified 5-fold cross-validation, the approach achieved 82.6% accuracy, outperforming models with fewer views. Interpretability analyses highlighted consistent attention to key regions, including the dorsolateral prefrontal cortex, hippocampus, and anterior cingulate. These results demonstrate the effectiveness of multi-view transformer architectures for identifying meaningful functional biomarkers.
Predictive Coding inspired convolutional networks can capture the neural dynamics of recurrent processing in human image recognition
Manshan Guo, Bhavin Choksi, Sari Sadiya, Pablo Oyarzo, Radoslaw Cichy, Gemma Roig
https://doi.org/10.14428/esann/2026.ES2026-168
Abstract:
Inspired by the robustness of human vision, various attempts have been made to incorporate brain-inspired mechanisms into artificial neural networks. A popular candidate has been predictive coding, a prominent theory in neuroscience which posits that feedback connections communicate top-down predictions to earlier regions. While recurrence in models has been demonstrated to be useful when processing noisy and difficult stimuli, direct evidence of its utility for explaining brain data in such situations had yet to be shown. Here, we investigated whether such a brain-inspired mechanism actually helps to capture neural dynamics. Specifically, we measured the brain alignment between representations of a predictive version of a popular feedforward CNN often used as a computational model of the visual cortex, VGG16, and human EEG collected while viewing images that were relatively easy (Control) or difficult to classify (Challenge). We demonstrate that the recurrent dynamics significantly enhanced the model’s alignment with EEG responses, underscoring the importance of recurrent connectivity in computational models of human vision, an effect distinctly visible for challenging stimuli.
Ensembling Post-Hoc Image Explanations: When It Works, When It Fails, and How to Tell the Difference
Luca Oneto, Jinhua Xu, Davide Anguita, Fabio Roli, Jing Yuan
https://doi.org/10.14428/esann/2026.ES2026-73
Abstract:
Post-hoc explanation methods of image recognition models often exhibit high variance or disagreement across explanations when the input data are perturbed, the underlying models are modified, or different explainability techniques are employed.
To mitigate this issue, several approaches have been proposed, among which ensemble strategies that aggregate multiple explanations have attracted particular attention.
Although some of these methods demonstrate good empirical performance, most existing works remain largely empirical, with limited theoretical justification or understanding of why ensemble strategies work and when they fail.
In this paper, we analyze the factors that influence the success and failure of ensemble strategies that combine multiple explanations, using different datasets, convolutional neural network architectures, post-hoc explanation techniques, and ensembling strategies to identify the most influential image patches.
In particular, we compare various ensembling strategies based on distinct voting principles - namely, Borda Count, Kemeny–Young, Reciprocal Rank Fusion, and the Schulze method - and show that the performance of such ensemble methods depends on the degree of satisfaction of their underlying theoretical assumptions.
Towards Learning a Generalizable 3D Scene Representation from 2D Observations
Martin Gromniak, Jan-Gerrit Habekost, Sebastian Kamp, Sven Magg, Stefan Wermter
https://doi.org/10.14428/esann/2026.ES2026-122
Abstract:
We introduce a Generalizable Neural Radiance Field approach for predicting 3D workspace occupancy from egocentric robot observations. Unlike prior methods operating in camera-centric coordinates, our model constructs occupancy representations in a global workspace frame, making it directly applicable to robotic manipulation. The model integrates flexible source views and generalizes to unseen object arrangements without scene-specific finetuning. We demonstrate the approach on a humanoid robot and evaluate predicted geometry against 3D sensor ground truth. Trained on 40 real scenes, our model achieves 26mm reconstruction error, including occluded regions, validating its ability to infer complete 3D occupancy beyond traditional stereo vision methods.
When Curvature Counts: Hyperbolic Geometry in Prototype-Based Image Classification
Silvia Grosso, Samuele Fonio, Mirko Polato, Roberto Esposito, Sara Bouchenak
https://doi.org/10.14428/esann/2026.ES2026-214
Abstract:
Prototype Learning offers an interpretable and efficient classification framework by mapping data into an embedding space structured around class prototypes. Recent research has explored non-Euclidean geometries, such as hyperspherical and hyperbolic spaces, to more effectively model latent hierarchical structures and complex data relationships. While these geometries have shown potential, leveraging them within an image classification context is not trivial. To address this, we propose HypPNet, a hyperbolic prototypical model on the Poincaré ball that integrates Riemannian optimization and norm-based regularization to perform effectively without prior data knowledge. Experiments on three benchmark datasets and multiple embedding dimensions show that HypPNet outperforms its competitors across alternative geometries, improving classification performance over various metrics.
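Prototype classification on the Poincaré ball reduces to nearest-prototype assignment under the hyperbolic geodesic distance d(u, v) = arccosh(1 + 2‖u−v‖² / ((1−‖u‖²)(1−‖v‖²))). The sketch below (function names are ours) illustrates this distance and the resulting decision rule, not HypPNet's training procedure or regularization.

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance on the Poincare ball; u, v must have norm < 1."""
    uu = np.sum(u * u, axis=-1)
    vv = np.sum(v * v, axis=-1)
    duv = np.sum((u - v) ** 2, axis=-1)
    arg = 1.0 + 2.0 * duv / ((1.0 - uu) * (1.0 - vv) + eps)
    return np.arccosh(np.maximum(arg, 1.0))   # clamp guards rounding error

def prototype_predict(x, prototypes):
    """Assign each embedded point to the nearest prototype in
    hyperbolic distance."""
    d = np.array([poincare_distance(x, p) for p in prototypes])  # (K, n)
    return np.argmin(d, axis=0)
```

Because distances grow rapidly near the boundary of the ball, prototype norms effectively encode hierarchy depth, which is what motivates the norm-based regularization mentioned in the abstract.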
Movements as Images: CNNs are Good Feature Extractors in Sign Language Recognition
Pierre Poitier, Loïc Brangier, Ariel Basso Madjoukeng, Benoit Frénay
https://doi.org/10.14428/esann/2026.ES2026-300
Abstract:
This work explores a simple approach to Isolated Sign Language Recognition (ISLR) by reframing the classification of pose sequences as a standard image classification task. While recent trends in Sign Language Processing (SLP) heavily favor complex temporal architectures like Transformers, we investigate the projection of spatio-temporal pose information into a static image representation. By mapping time and skeletal joints to spatial dimensions and coordinate values to color channels, we allow standard Convolutional Neural Networks (CNNs), like ResNets, to extract features effectively. Our experiments on challenging real-world ISLR datasets demonstrate that this method is not only computationally efficient, but also outperforms existing architectures like Pose-VIT and SPOTER in a simple classification setting.
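The time-and-joint-to-image mapping described in this abstract can be sketched as follows. This is a minimal illustration, assuming per-channel min-max normalisation; the array shapes and the helper name `poses_to_image` are illustrative, not the paper's exact preprocessing.

```python
import numpy as np

def poses_to_image(seq, out_range=(0, 255)):
    """Map a pose sequence (T frames, J joints, C coords) to a T x J x C 'image'.

    Time -> rows, joints -> columns, coordinate values -> color channels,
    rescaled per channel to the output range so a standard CNN can consume it.
    """
    lo = seq.min(axis=(0, 1), keepdims=True)
    hi = seq.max(axis=(0, 1), keepdims=True)
    norm = (seq - lo) / np.maximum(hi - lo, 1e-8)   # per-channel [0, 1]
    a, b = out_range
    return (a + norm * (b - a)).astype(np.uint8)

# 32 frames, 21 joints, (x, y, z) coordinates -> a 32x21 RGB-like image
seq = np.random.default_rng(1).standard_normal((32, 21, 3))
img = poses_to_image(seq)
assert img.shape == (32, 21, 3) and img.dtype == np.uint8
```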
XAI-Enabled Custom CNN for Cross-Modal Generalization in Breast Cancer Detection
Maram Issaoui, Amal Jlassi, Abir Baâzaoui, Walid Barhoumi
https://doi.org/10.14428/esann/2026.ES2026-305
Abstract:
This study presents a unified deep learning framework for breast cancer detection that generalizes effectively across mammography and histopathology. Using fine-tuned CNN architectures evaluated under a consistent cross-modal protocol, the method achieves stable high accuracy on both imaging types, demonstrating robustness to domain shifts and heterogeneous clinical conditions. Moreover, the integration of model-agnostic and model-specific explainability techniques enables a balanced trade-off between performance and interpretability. The proposed strategy provides clinically meaningful visual and feature-level insights, supporting transparent, reliable, and multi-modal diagnostic decision-making.
Fast denoising of low-count Monte Carlo proton therapy dose distributions with ResUNet
Pierre Merveille, Ana Maria Barragan Montero, Kevin Souris, Lee John
https://doi.org/10.14428/esann/2026.ES2026-333
Abstract:
Monte Carlo (MC) dose calculation is the gold standard for proton therapy but remains computationally expensive due to statistical noise at low-count particle histories. We propose a deep learning approach based on a ResUNet architecture to denoise high uncertainty (5%) MC dose distributions and approximate low uncertainty (0.5%) reference doses. Using the MCsquare engine, dose pairs were generated for 150 patients from head-and-neck, prostate, and lung datasets. The proposed model achieved high structural similarity and dosimetric accuracy while reducing computation time by a factor of 30, enabling fast and accurate MC dose estimation for clinical proton therapy.
Non-Linear Activation Functions for Deep Riemannian Neural Networks
Lucas H. dos Santos, Joao Barbon, Sylvain Chevallier, Denis G. Fantinato
https://doi.org/10.14428/esann/2026.ES2026-312
Abstract:
In the context of EEG-based Brain-Computer Interfaces (BCIs), Deep Riemannian Neural Networks (DRNNs) have emerged as a state-of-the-art framework, particularly in classifying motor imagery. A crucial component of these networks is the activation function, which must preserve the manifold's geometry. The ReEig function is the prevailing choice, providing a foundational but potentially limited nonlinear transformation. This work investigates whether alternative activation functions can improve the performance of DRNNs. We conduct a comparative analysis of the standard ReEig function against four alternatives -- cosh, sinh, ReLU, and SiLU -- within the SPDNet and EE(G)-SPDNet architectures. The experiments are performed on three public motor imagery datasets: BCI Competition IV2a, PhysioNetMI, and Cho. The results consistently indicate that alternative nonlinear functions perform better than the conventionally used ReEig, achieving superior classification accuracy.
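The ReEig activation this abstract takes as its baseline rectifies small eigenvalues of an SPD matrix, X -> U max(eps I, Sigma) U^T, which keeps the output symmetric positive definite. A minimal sketch (the function name and threshold value are illustrative):

```python
import numpy as np

def reeig(spd, eps=1e-4):
    """ReEig activation: clamp eigenvalues of an SPD matrix from below at eps.

    Computes U @ diag(max(lam, eps)) @ U.T, so the output stays SPD.
    """
    lam, u = np.linalg.eigh(spd)      # eigendecomposition of the symmetric input
    lam = np.maximum(lam, eps)        # rectify small eigenvalues
    return (u * lam) @ u.T            # reassemble U diag(lam) U^T

# Toy SPD matrix built as A A^T
rng = np.random.default_rng(0)
a = rng.standard_normal((4, 4))
spd = a @ a.T
out = reeig(spd, eps=0.5)
assert np.all(np.linalg.eigvalsh(out) >= 0.5 - 1e-8)  # all eigenvalues rectified
```

The alternatives compared in the paper (cosh, sinh, ReLU, SiLU) would replace the `np.maximum` step with the corresponding map applied to the eigenvalues.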
Recurrent and reinforcement learning
Random Unicycle Network (RUN!): supercharging harmonic oscillator networks via non-holonomic constraints
Mariano Ramirez, Andrea Ceni, Andrea Cossu, Davide Bacciu, Claudio Gallicchio, Cosimo Della Santina
https://doi.org/10.14428/esann/2026.ES2026-136
Abstract:
Motivated by advances in physical reservoir computing, we seek models that retain the modularity of echo state networks while enriching their internal dynamics. Recent studies have demonstrated that oscillator networks can achieve this balance, although their simple harmonic nature may limit their expressiveness. Here, we investigate the idea of augmenting harmonic oscillators with non-holonomic (velocity-level) constraints, known to induce rich, nonlocal behaviors. We implement these constraints intrinsically within each dynamical unit, yielding a model equivalent to the unicycle — the canonical representation of the simplest vehicle. We test the model on three time-series classification benchmarks, achieving competitive or superior accuracy compared to the state of the art, with reservoirs as small as 20 unicycles.
Memristive-Friendly Hadamard Reservoirs
Andrea Ceni, Gianluca Milano, Carlo Ricciardi, Claudio Gallicchio
https://doi.org/10.14428/esann/2026.ES2026-228
Abstract:
Reservoir Computing (RC) processes temporal data using a fixed recurrent system and a trained linear readout, making it appealing for hardware-limited neuromorphic settings. Memristive-friendly reservoirs refine this idea by adopting neuron dynamics inspired by resistive devices, but they still rely on dense recurrent matrices that are costly to implement physically. We introduce a Hadamard-based alternative in which the recurrence is replaced by an orthogonal, multiplier-free operator with $O(N\log N)$ complexity and $O(N)$ parameters. Experiments on time-series tasks show that the proposed approach matches the performance of dense RC baselines while improving hardware compatibility.
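The orthogonal, multiplier-free operator with $O(N\log N)$ complexity referenced in this abstract is the Walsh-Hadamard transform; the sketch below illustrates its key properties (only additions and subtractions, norm preservation after scaling). How the operator is wired into the reservoir update is specific to the paper and not reproduced here.

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform in O(N log N) additions/subtractions.

    No multiplications are used inside the butterfly, which is what makes
    the operator attractive for constrained hardware. N must be a power of 2.
    """
    x = x.copy()
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                x[j], x[j + h] = x[j] + x[j + h], x[j] - x[j + h]
        h *= 2
    return x

n = 8
x = np.random.default_rng(2).standard_normal(n)
y = fwht(x) / np.sqrt(n)                                   # H / sqrt(n) is orthogonal
assert np.allclose(np.linalg.norm(y), np.linalg.norm(x))   # norm preserved
assert np.allclose(fwht(y) / np.sqrt(n), x)                # self-inverse: H^2 = n I
```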
Unpacking the Role of Intrinsic Motivation in Elastic Decision Transformers: A Post-Hoc Analysis of Embedding Geometry and Performance
Leonardo Guiducci, Antonio Rizzo, Giovanna Maria Dimitri
https://doi.org/10.14428/esann/2026.ES2026-281
Abstract:
Elastic Decision Transformers (EDTs) augmented with intrinsic motivation exhibit improved performance in offline reinforcement learning, yet the cognitive processes driving these gains remain unclear. We present a systematic post-hoc explainability framework that examines how intrinsic motivation influences learned embeddings through statistical characterization of covariance structure, vector magnitudes, and orthogonality. Our findings show that distinct intrinsic-motivation variants induce qualitatively different representational organizations: EDT-SIL (state-based) produces significantly more compact embedding spaces than baseline EDT, whereas EDT-TIL (transformer output-based) increases representational orthogonality. We identify environment-dependent correlations between embedding metrics and performance across locomotion domains. The results indicate that intrinsic motivation acts as a representational prior that shapes embedding geometry in cognitively meaningful ways, yielding environment-specific structures that support improved decision-making beyond simple exploration bonuses.
Physics-Informed Recurrent Architecture with Embedded Thermodynamic Dynamics for Robust Sequence Modeling
Zafer Yigit, Håkan Forsberg, Masoud Daneshtalab
https://doi.org/10.14428/esann/2026.ES2026-43
Abstract:
Physics-informed machine learning has shown strong potential in improving generalisation under limited or noisy data, but most existing approaches treat physical priors only as soft regularisation terms on the loss. This work introduces a physics-structured recurrent architecture where thermodynamic differential equations are embedded directly into LSTM state updates. Adaptive physical parameters are learned through auxiliary multilayer perceptrons, forming a differentiable hybrid dynamical system that fuses physics priors with sequence learning. Experiments on industrial datasets show improved robustness under unseen fault conditions, outperforming conventional LSTMs and PINN-style models. The framework offers a scalable and generalizable approach to physics-aware recurrent modeling.
Constraint Guided Recurrent Convolutional AutoEncoders for Condition Indicator Estimation
Maarten Meire, Quinten Van Baelen, Ted Ooijevaar, Peter Karsmakers
https://doi.org/10.14428/esann/2026.ES2026-189
Abstract:
To effectively monitor industrial applications, an accurate estimate of their condition, or a Condition Indicator (CI), is required. Recently, a CI estimation method called Monotonically Constraint Guided Autoencoders (MCGAE) was introduced, which constrains the CI to remain within predefined ranges for both normal and anomalous data while also enforcing monotonic behavior over time. However, that work employed a Convolutional AutoEncoder (CAE) architecture that did not capture longer-term temporal dependencies. In this study, we evaluate a recurrent CAE variant that incorporates a Long Short-Term Memory (LSTM) layer and compare its performance against alternative architectures without a recurrent component. Experimental results on a bearing run-to-failure dataset indicate that the addition of LSTM improves the monotonic behavior of the estimated CI.
Mixed-Density Diffuser: Efficient Planning with Non-Uniform Temporal Resolution
Crimson Stambaugh, Rajesh Rao
https://doi.org/10.14428/esann/2026.ES2026-30
Abstract:
Recent studies demonstrate that diffusion planners benefit from sparse-step planning over single-step planning. Training models to skip steps in their trajectories helps capture long-term dependencies without additional memory or computational cost. However, predicting excessively sparse plans degrades performance. We hypothesize this temporal density threshold is non-uniform across a temporal horizon and that certain parts of a planned trajectory should be more densely planned. We propose Mixed-Density Diffuser (MDD), a diffusion planner where the densities throughout the horizon are tunable hyperparameters. We show that MDD achieves a new SOTA across the Maze2D, Franka Kitchen, and Antmaze D4RL task domains.
Integrating Potential-Based Reward Shaping into AlphaZero
Koen Boeckx, Xavier Neyt
https://doi.org/10.14428/esann/2026.ES2026-104
Abstract:
AlphaZero achieves superhuman performance through pure self-play without human expertise, but its dependence on sparse terminal rewards limits learning efficiency. This paper investigates integrating potential-based reward shaping into AlphaZero to accelerate learning while preserving optimality. We address whether reward shaping improves sample efficiency without compromising final performance, and which integration methods prove most effective. We present two implementation approaches: search-time shaping and auxiliary network heads, each targeting different components of the learning process. Experimental evaluation on Othello provides initial evidence of benefits, with ongoing work on comprehensive performance characterization across diverse environments.
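Potential-based reward shaping, the technique this abstract builds on, adds F(s, s') = gamma * Phi(s') - Phi(s) to the environment reward; with the potential fixed to zero at terminal states, the optimal policy is unchanged (Ng, Harada & Russell, 1999). A minimal sketch, with a hypothetical `shaped_reward` helper; how the paper injects this into AlphaZero's search or network heads is not reproduced here:

```python
def shaped_reward(r, phi_s, phi_s_next, gamma=0.99, terminal=False):
    """Potential-based shaping: r' = r + gamma * Phi(s') - Phi(s).

    Setting Phi to 0 at terminal states preserves the optimal policy of
    the original MDP while densifying sparse terminal rewards.
    """
    phi_next = 0.0 if terminal else phi_s_next
    return r + gamma * phi_next - phi_s

# A zero environment reward becomes a dense learning signal
assert shaped_reward(0.0, phi_s=0.25, phi_s_next=0.5, gamma=1.0) == 0.25
```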
Hybrid-AIRL: Enhancing Inverse Reinforcement Learning with Supervised Expert Guidance
Bram Silue, Santiago Amaya-Corredor, Patrick Mannion, Lander Willem, Pieter Libin
https://doi.org/10.14428/esann/2026.ES2026-114
Abstract:
Adversarial Inverse Reinforcement Learning (AIRL) addresses sparse rewards by inferring dense reward functions from expert demonstrations, but its performance in complex, imperfect-information settings is underexplored. We evaluate AIRL in Heads-Up Limit Hold'em (HULHE) poker and observe that it faces challenges producing sufficiently informative rewards. To address this, we introduce Hybrid-AIRL (H-AIRL), which improves reward inference and policy learning using a partially supervised loss from expert data and stochastic regularization. Experiments on Gymnasium benchmarks and HULHE poker show that H-AIRL improves sample efficiency and training stability, highlighting the value of supervised signals in inverse RL.
Deobfuscation as a GNN-Based Graph-Edit Problem by Reinforcement Learning
Roxane Cohen, Robin David, Samuel Hangouët, Florian Yger, Fabrice Rossi
https://doi.org/10.14428/esann/2026.ES2026-155
Abstract:
Obfuscation is a software protection technique that transforms a program's binary code to conceal its behavior and to hinder analysis. Conversely, deobfuscation is an adversarial process that seeks to partially or fully remove the applied obfuscation in order to recover the original, unobfuscated code. This work introduces the first deobfuscation framework based on Reinforcement Learning (RL). It models an obfuscated function's binary code using a novel graph representation integrating both data and control-flow. The graph is then progressively simplified through a sequence of graph-edit operations, selected iteratively by a Graph Neural Network-based agent operating within an RL pipeline. Experiments demonstrate promising results on Mixed Boolean-Arithmetic (MBA) obfuscation, where multiple variants of diverse expressions can successfully be simplified into valid deobfuscated variants.
Point-wise Q-value maximization for converging Q-learning in continuous state-spaces
Philipp Wissmann, Daniel Hein, Steffen Udluft, Thomas Runkler
https://doi.org/10.14428/esann/2026.ES2026-253
Abstract:
This paper introduces a novel Q-learning framework to address instabilities in offline reinforcement learning with continuous state spaces. We identify the recurring collapse of Q-value targets as a core challenge and propose a stabilization technique that replaces the iteration-wise targets with their point-wise maximum across iterations. This approach enforces convergence and fully mitigates recursive errors. We show that a performance metric linking Q-values to policy performance is directly available. Our findings represent a first step toward stabilizing Q-learning in challenging settings and highlight the potential of model-based approaches.
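The point-wise maximum over iterations described in this abstract can be sketched as an elementwise running maximum of the target vectors; the helper name and toy values below are illustrative, not the paper's implementation.

```python
import numpy as np

def stabilised_targets(target_history):
    """Point-wise maximum of Q-value targets across training iterations.

    Instead of regressing on the latest (possibly collapsed) targets, regress
    on the elementwise maximum over all iterations so far, which is
    non-decreasing over iterations by construction.
    """
    return np.maximum.reduce(target_history)

# Three iterations of targets for the same batch of state-action pairs;
# later iterations partially collapse, but the running maximum does not.
t1 = np.array([1.0, 0.5, 2.0])
t2 = np.array([0.2, 0.8, 1.5])
t3 = np.array([0.9, 0.1, 0.3])
assert np.allclose(stabilised_targets([t1, t2, t3]), [1.0, 0.8, 2.0])
```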
Dimension reduction, feature selection and unsupervised learning
LMAP: Local PCA Models with Global MDS Embeddings
Oliver Kramer
https://doi.org/10.14428/esann/2026.ES2026-58
Abstract:
This paper introduces LMAP (Local PCA Models with Global MDS Embeddings), a geometric method for nonlinear dimensionality reduction that combines local PCA-based tangent charts with global MDS alignment to obtain smooth embeddings with coherent local and global structure. Landmark points define locally linear models that approximate the manifold’s tangent geometry, while classical multidimensional scaling aligns these charts into a consistent low-dimensional representation.
The resulting atlas admits a closed-form out-of-sample extension via weighted blending of multiple tangent charts, yielding a continuous and reproducible mapping from the ambient space to the embedding.
Experiments on synthetic manifolds analyze the influence of landmark density and neighborhood size and show that LMAP produces globally consistent embeddings that bridge the gap between linear PCA and stochastic neighbor-based methods, achieving low global distortion while maintaining reliable trustworthiness and out-of-sample stability.
SPDNet-AE: a Compact SPD Representation through Riemannian Autoencoding
Thibault de Surrel, Charlotte Boucherie, Florian Yger
https://doi.org/10.14428/esann/2026.ES2026-158
Abstract:
When building dimension reduction methods tailored for Symmetric Positive Definite (SPD) matrices, it is crucial to account for their Riemannian geometry. In this work, we propose an SPDNet-based autoencoder, which we call \emph{SPDNet-AE}, that learns low-dimensional SPD representations of high-dimensional SPD matrices while preserving the geometry throughout the network. The SPDNet-AE is built using the BiMap layer of the SPDNet, but we allow it to have multiple channels. We show that our SPDNet-AE is able to learn a useful low-dimensional representation of the data for classification (without any class information). Moreover, we show that with a comparable number of parameters, a classical Euclidean autoencoder is not able to learn and maintain the SPD constraint on the input matrices.
Multi-Scale Stochastic Neighbor Embedding with Twice Adaptive Bandwidths
Lee John, Pierre Lambert, Edouard Couplet, Pierre Merveille, Dounia Mulders, Cyril de Bodt, Michel Verleysen
https://doi.org/10.14428/esann/2026.ES2026-335
Abstract:
Neighbor embedding has been a quantum leap in nonlinear dimensionality reduction, revolutionizing the way data can be visualized.
Neighbor embedding typically adapts to the local density in the high-dimensional data space with adaptive bandwidths in entropic affinities, while it resolves scale indeterminacies by having unit bandwidths in the low-dimensional embedding space.
In this paper, multi-scale stochastic neighbor embedding (Ms.SNE) is improved by allowing it to adapt low-dimensional bandwidths in a data-driven way instead of having fixed ones.
In practice, Ms.SNE goes through a multi-scale optimization process; coordinates and bandwidths are optimized separately, in an alternate fashion, to avoid interferences: (i) bandwidths are optimized from previous coordinates and (ii) coordinates are optimized given the new bandwidths.
Experimentally, twice adaptive bandwidths improve Ms.SNE's capability to preserve neighborhoods on all scales, i.e., local \emph{and} global data structure; this claim is supported with quantitative results on several benchmarks.
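The alternating (i)/(ii) scheme can be sketched in code; a toy least-squares cost over bandwidth-scaled pairwise distances stands in for the actual Ms.SNE objective, and all names, step sizes, and iteration counts are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5)                     # 1-D "embedding" coordinates
b = np.ones(5)                             # per-point bandwidths, start fixed at 1
target = np.abs(rng.normal(size=(5, 5)))   # desired bandwidth-scaled distances
target = (target + target.T) / 2
np.fill_diagonal(target, 0.0)

def loss(x, b):
    # toy surrogate: mismatch between scaled pairwise distances and targets
    d = np.abs(x[:, None] - x[None, :]) / np.sqrt(b[:, None] * b[None, :])
    return float(np.sum((d - target) ** 2))

def fd_grad(f, v, eps=1e-6):
    # central finite differences, one parameter group at a time
    g = np.zeros_like(v)
    for i in range(v.size):
        vp, vm = v.copy(), v.copy()
        vp[i] += eps
        vm[i] -= eps
        g[i] = (f(vp) - f(vm)) / (2 * eps)
    return g

loss_before = loss(x, b)
for _ in range(200):
    # (i) bandwidths are optimized from the previous coordinates ...
    b -= 0.005 * fd_grad(lambda v: loss(x, v), b)
    b = np.clip(b, 0.1, None)              # bandwidths must stay positive
    # (ii) ... then coordinates are optimized given the new bandwidths
    x -= 0.005 * fd_grad(lambda v: loss(v, b), x)
loss_after = loss(x, b)
```

Keeping the two phases separate means each update sees the other parameter group fixed, which is the interference-avoidance idea described in the abstract.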
Interpretable Parametric Neighbour Embedding
Edouard Couplet, Pierre Lambert, Michel Verleysen, Lee John, Cyril de Bodt
https://doi.org/10.14428/esann/2026.ES2026-338
Abstract:
Neighbour embedding methods effectively preserve local structures in low-dimensional spaces but are difficult to interpret due to their nonlinear nature, limiting their full potential for data exploration. Post-hoc interpretability methods exist but require extra effort and only approximate the embedding. We propose an interpretable-by-design neighbour embedding approach, where each data point is projected via a linear combination of shared basis matrices, enabling exact and direct explanations in terms of local coefficients and global directions. We demonstrate the approach on a single-cell dataset using a t-SNE loss, showing that it can provide useful interpretations while maintaining embedding quality.
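The projection form described here (each point mapped by a linear combination of shared basis matrices) can be sketched as follows; the dimensions, the Dirichlet-distributed coefficients, and all variable names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n, D, K = 6, 4, 3                      # points, input dim, number of shared bases
X = rng.normal(size=(n, D))            # high-dimensional data
B = rng.normal(size=(K, 2, D))         # K shared 2-D projection bases (global directions)
C = rng.dirichlet(np.ones(K), size=n)  # per-point mixing coefficients (local)

# Each point's 2-D position is a point-specific linear map built from the
# shared bases: y_i = sum_k C[i, k] * (B[k] @ x_i).
Y = np.einsum('ik,kld,id->il', C, B, X)

# The same embedding written point by point, for clarity:
Y_check = np.stack([sum(C[i, k] * B[k] @ X[i] for k in range(K))
                    for i in range(n)])
```

Because the map is exactly linear given the coefficients, each embedded point can be explained directly through its local coefficients `C[i]` and the global bases `B`, with no post-hoc approximation.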
Enforcing Feature Sparseness for Reliable Classification by Prototype-Based Models
Marika Kaden, Julius Voigt, Sascha Saralajew, Thomas Villmann
https://doi.org/10.14428/esann/2026.ES2026-88
Abstract:
Machine learning classifiers implicitly or explicitly adjust the importance of data features to solve a given classification task. This feature weighting often does not imply feature sparseness, which, however, may be important for interpretability and model evaluation. This contribution proposes a method to enforce feature sparseness in combination with feature relevance in prototype-based classification learning, in order to obtain reliable and interpretable classification decisions.
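One simple way to picture the relevance/sparseness trade-off is soft-thresholding a normalized relevance profile so that small relevances drop to exactly zero; the threshold rule and the profile below are made-up illustrations, not the enforcement scheme proposed in the paper:

```python
import numpy as np

# Hypothetical learned relevance profile over six features (sums to 1).
relevance = np.array([0.40, 0.25, 0.15, 0.10, 0.06, 0.04])

def sparsify(r, tau):
    r = np.maximum(r - tau, 0.0)   # soft-threshold: small relevances become 0
    return r / r.sum()             # renormalize to keep a probability profile

sparse = sparsify(relevance, tau=0.08)
n_active_before = int(np.count_nonzero(relevance))
n_active_after = int(np.count_nonzero(sparse))
```

The result is a strictly sparser profile that still sums to one, so classification decisions can be attributed to a small, interpretable subset of features.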
Autoencoders versus PCA for feature extraction in FDG PET scans in neurodegenerative diseases
Roland Veen, Sofie Lövdal, Kaitlin Vos, Ciro Setolino, Sanne Meles, Michael Biehl
https://doi.org/10.14428/esann/2026.ES2026-216
Abstract:
Positron Emission Tomography (PET) neuroimaging is a valuable tool for studying neurodegenerative disorders. Using N=236 FDG PET scans from healthy individuals and three patient classes, we compare linear and non-linear feature extraction using Principal Component Analysis (PCA), a convolutional autoencoder (CAE) and a variational autoencoder (VAE). We investigate whether non-linear dimensionality reduction improves disease classification performance when used with Generalised Matrix Learning Vector Quantisation (GMLVQ) classifiers trained in the latent space. Although PCA achieved a smaller reconstruction error between the original and reconstructed images, the features from both AEs achieved higher classification performance, with the VAE showing a slight advantage. The interpretability of the AE-GMLVQ combination was retained by visualising the GMLVQ classification space and the decoded prototypes in voxel space. Even with limited training data, using AEs for feature extraction improved classification performance by a significant margin while maintaining interpretability.
Information-Theoretic Unsupervised Feature Selection for High-Dimensional Spatial Data
Samuel Suárez-Marcote, Abhijeet Vishwasrao, Ricardo Vinuesa, Laura Morán-Fernández, Verónica Bolón-Canedo
https://doi.org/10.14428/esann/2026.ES2026-252
Abstract:
High-dimensional unlabelled datasets present significant challenges for efficient analysis, storage and interpretation. Unsupervised feature selection offers a way to retain the most informative variables while discarding redundant or uninformative ones, enabling more scalable processing. We introduce a spatially aware, unsupervised method that uses information-theoretic criteria to identify informative variables while limiting redundancy, producing compact and spatially dispersed subsets of features. Our approach avoids dependence on labelled data or model-specific wrappers, making it suitable for large unstructured datasets. Experiments on MNIST and EMNIST datasets, including high-resolution upscaled versions, show that the selected features preserve both discriminative structure and reconstruction quality better than chosen supervised and unsupervised baselines, demonstrating the effectiveness of coupling entropy and mutual information in unlabelled high-dimensional settings.
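The selection principle (keep informative features, penalize redundancy) can be sketched with a greedy entropy-minus-mutual-information criterion; this particular scoring rule and the toy discrete data are assumptions for illustration, not the paper's exact algorithm:

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(2)

def entropy(col):
    # plug-in Shannon entropy (bits) from empirical frequencies
    p = np.array(list(Counter(col).values()), dtype=float)
    p /= p.sum()
    return float(-(p * np.log2(p)).sum())

def mutual_info(a, b):
    # I(A; B) = H(A) + H(B) - H(A, B)
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))

# Toy discrete data: f0 informative, f1 an exact copy of f0 (redundant),
# f2 independent and mildly informative.
n = 200
f0 = rng.integers(0, 4, size=n)
f1 = f0.copy()
f2 = rng.integers(0, 2, size=n)
X = [list(f0), list(f1), list(f2)]

# Greedy selection: maximize entropy, penalize redundancy w.r.t. chosen set.
selected = []
for _ in range(2):
    best, best_score = None, -np.inf
    for j in range(len(X)):
        if j in selected:
            continue
        redundancy = max((mutual_info(X[j], X[s]) for s in selected), default=0.0)
        score = entropy(X[j]) - redundancy
        if score > best_score:
            best, best_score = j, score
    selected.append(best)
```

The greedy pass first picks the highest-entropy feature `f0`, then skips the duplicate `f1` (its redundancy cancels its entropy) in favour of the independent `f2`.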
Topology-Preserving Prototype Learning on Riemannian Manifolds
Lucas Schwarz, Magdalena Psenickova, Thomas Villmann, Florian Röhrbein
https://doi.org/10.14428/esann/2026.ES2026-53
Abstract:
Learning prototypes in an unsupervised manner that respects the data density and topology is crucial for tasks such as clustering, representation learning, and visualization of high-dimensional datasets. In this paper, we propose a generalization of the Neural Gas algorithm to Riemannian manifolds, leveraging geodesic distances for prototype adaptation. The approach additionally generates a prototype neighborhood structure, enabling faithful approximation of both geometry and topology of data distributed on Riemannian manifolds. We demonstrate its effectiveness on real-world datasets from manifolds such as $SO(n)$, $S_{++}^n$ and $Gr(n,k)$ and compare our approach to Riemannian versions of other related methods such as K-Means, K-Medoids and a Riemannian Self-Organizing Map.
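A neural-gas-style update with geodesic distances can be sketched on the unit sphere $S^2$ using its exponential and logarithmic maps; the learning rate, neighborhood decay, and sampling scheme below are illustrative assumptions, not the paper's parametrization:

```python
import numpy as np

rng = np.random.default_rng(3)

def normalize(v):
    return v / np.linalg.norm(v)

def log_map(p, x):
    # tangent vector at p pointing toward x along the geodesic on the unit sphere
    c = np.clip(p @ x, -1.0, 1.0)
    theta = np.arccos(c)
    if theta < 1e-12:
        return np.zeros_like(p)
    return theta / np.sin(theta) * (x - c * p)

def exp_map(p, v):
    # follow the geodesic from p in tangent direction v
    t = np.linalg.norm(v)
    if t < 1e-12:
        return p
    return np.cos(t) * p + np.sin(t) * v / t

# Neural-gas-style adaptation: rank prototypes by geodesic distance to the
# sample and move each toward it along the geodesic, with rank-decayed steps.
prototypes = np.array([normalize(rng.normal(size=3)) for _ in range(4)])
for _ in range(100):
    x = normalize(rng.normal(size=3) + np.array([3.0, 0.0, 0.0]))  # data near +e_x
    dists = np.arccos(np.clip(prototypes @ x, -1, 1))              # geodesic distances
    ranks = np.argsort(np.argsort(dists))                          # 0 = closest
    for i in range(len(prototypes)):
        step = 0.3 * np.exp(-float(ranks[i]))                      # neighborhood decay
        prototypes[i] = exp_map(prototypes[i], step * log_map(prototypes[i], x))

norms = np.linalg.norm(prototypes, axis=1)
```

Because every update moves along a geodesic via `exp_map`, the prototypes remain exactly on the manifold, which is the key difference from a Euclidean Neural Gas update followed by reprojection.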
Effects of a Parametrized Neighborhood Family on the Quality of Self-Organizing Maps
Cesar Cardenas, Erzsébet Merényi
https://doi.org/10.14428/esann/2026.ES2026-263
Abstract:
We present a parametrized family of neighborhood functions and experiments to analyze the isolated effects of their shapes on the quality of SOM representations. There appears to be an emerging trend in the relationship between topological and quantization errors, and Shannon entropy. We highlight the trade-offs among these SOM quality measures.
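A parametrized neighborhood family can be illustrated with a generalized-Gaussian shape, where a single exponent interpolates between heavy-tailed and near-rectangular neighborhoods; this specific family is an assumption for illustration, not necessarily the one studied in the paper:

```python
import numpy as np

def neighborhood(d, sigma=2.0, beta=2.0):
    # beta = 2 recovers the standard Gaussian neighborhood; smaller beta gives
    # heavier tails, larger beta approaches a rectangular (cut-off) neighborhood.
    return np.exp(-(d / sigma) ** beta)

d = np.arange(0, 6, dtype=float)      # lattice distances on the SOM grid
gauss = neighborhood(d, beta=2.0)
heavy = neighborhood(d, beta=1.0)
boxy = neighborhood(d, beta=8.0)
```

Sweeping such a shape parameter while holding the width fixed is one way to isolate the effect of neighborhood shape on topological error, quantization error, and entropy of the resulting SOM.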
Polarizing Kernels: A Definite Approach to Clustering with Indefinite Similarities
Frank-Michael Schleif, Manuel Röder, Maximilian Münch, Peter Preinesberger
https://doi.org/10.14428/esann/2026.ES2026-36
Abstract:
Many real-world similarity measures are indefinite, violating the assumptions of kernel-based clustering methods. We propose a principled framework based on the polar decomposition of the similarity matrix, yielding a positive semi-definite component that preserves relational structure while enabling consistent out-of-sample extensions, also for dissimilarities. The resulting polarized kernels support stable and interpretable clustering across synthetic and real datasets, demonstrating that polar decomposition provides a theoretically sound and practically effective bridge between indefinite similarity learning and kernel-based methods.
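For a symmetric similarity matrix, the positive semi-definite polar factor is computable directly from an eigendecomposition, since $P = V |\Lambda| V^\top$ when $S = V \Lambda V^\top$; the random test matrix below is an illustrative stand-in for a real indefinite similarity matrix:

```python
import numpy as np

rng = np.random.default_rng(4)

A = rng.normal(size=(6, 6))
S = (A + A.T) / 2                 # symmetric, generically indefinite similarity

# Polar decomposition S = U P with PSD factor P = (S^T S)^{1/2};
# for symmetric S this reduces to flipping negative eigenvalues.
lam, V = np.linalg.eigh(S)
P = (V * np.abs(lam)) @ V.T       # the "polarized kernel" (PSD by construction)
U = (V * np.sign(lam)) @ V.T      # orthogonal polar factor

min_eig_S = float(np.min(lam))
min_eig_P = float(np.min(np.linalg.eigvalsh(P)))
reconstructs = bool(np.allclose(U @ P, S))
```

Unlike clipping or shifting the spectrum, the polar factor keeps the magnitude of every eigenvalue, which is what lets it preserve the relational structure of the original indefinite similarities.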
Graph learning
Assessing Graph Neural Networks for latency and power consumption prediction in application mappings on multicore architectures
Oscar Roussel, Zainab Ghrayeb, Sébastien Le Nours, Christine Sinoquet
https://doi.org/10.14428/esann/2026.ES2026-149
Abstract:
Accurately estimating the latency and power consumption of software applications deployed on multicore systems remains a major challenge for early-stage optimization, as existing methods typically rely on slow and resource-intensive simulations. This paper explores modeling application-to-architecture mappings as heterogeneous graphs and investigates Graph Neural Networks (GNNs) for predicting their performance. Four GNN models are evaluated across eleven datasets, considering five neural network-based software applications. The two best models achieve mean absolute percentage errors of about 2\% for power prediction and 15\% for latency, with prediction times of only a few tens of milliseconds. These results indicate the potential of GNN-based prediction as an efficient alternative to simulation-driven estimation, paving the way for early-stage AI-assisted mapping optimization.
GNNs Don't Need Backprop
Pascal Welke, Benoit Goupil, Fabian Jogl
https://doi.org/10.14428/esann/2026.ES2026-340
Abstract:
We propose an alternative training method for graph neural networks (GNNs) that does not require gradient information.
Instead, we sample randomly initialized models and select the one that maximizes an alignment score between its graph embedding space and the label space.
Our method is easy to parallelize on CPU and GPU architectures and achieves competitive results with state-of-the-art stochastic gradient descent training on several graph classification benchmarks.
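The gradient-free selection loop can be sketched as follows; a random linear map over precomputed per-graph feature vectors stands in for a GNN readout, and the correlation-style alignment between embedding similarity and label agreement is an illustrative assumption, not the paper's exact score:

```python
import numpy as np

rng = np.random.default_rng(5)

n, d, k = 40, 8, 3                       # graphs, feature dim, embedding dim
X = rng.normal(size=(n, d))              # per-graph feature vectors (GNN stand-in)
y = (X[:, 0] > 0).astype(int)            # toy binary graph labels

def alignment(Z, y):
    # correlation between embedding similarity and label agreement
    K = Z @ Z.T
    L = np.where(y[:, None] == y[None, :], 1.0, -1.0)
    K = K - K.mean()
    return float((K * L).sum() / (np.linalg.norm(K) * np.linalg.norm(L)))

# "Training" = sample random models, keep the best-aligned one. Each candidate
# is independent, so this loop parallelizes trivially across CPUs/GPUs.
models = [rng.normal(size=(d, k)) for _ in range(50)]
scores = [alignment(X @ W, y) for W in models]
best = models[int(np.argmax(scores))]
best_score = max(scores)
```

No gradient ever flows through a candidate model; only forward passes and a scalar score per candidate are needed.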
A Collaborative Distillation Framework for Graph Neural Networks
Paul Agbaje, Arkajyoti Mitra, Afia Anjum, Pranali Khose, Ebelechukwu Nwafor, Habeeb Olufowobi
https://doi.org/10.14428/esann/2026.ES2026-174
Abstract:
Graph Neural Networks (GNNs) power applications such as content recommendation, knowledge graph reasoning, and social network analysis, where modeling both structure and features is essential. Knowledge Distillation (KD) enables transferring knowledge from large GNNs to compact models for efficient deployment, yet most approaches rely on a pre-trained teacher. We propose a mutual learning framework in which shallow GNNs collaboratively distill knowledge by iteratively exchanging predictions during training. The framework integrates adaptive logit weighting to balance peer influence and entropy enhancement to promote exploration and prevent early convergence. Experiments on multiple benchmark datasets show that our approach improves GNN performance and that the learned knowledge can be effectively transferred to lightweight graph-less models, offering a scalable alternative for graph learning.
Enriching Graph Topology Representations with Line Graph Transformations
Paolo Frazzetto, Luca Pasa, Nicolò Navarin, Alessandro Sperduti
https://doi.org/10.14428/esann/2026.ES2026-210
Abstract:
Many Graph Neural Networks (GNNs) in the literature are based on message-passing, which introduces a strong learning bias that may fail to capture critical relational information encoded in the edges of the graph, particularly in tasks where the structural role of edges is as significant as that of nodes, such as chemical molecular analysis or social network dynamics. We propose a novel architecture inspired by line graph theory that explicitly models edge adjacencies, iteratively transforming a graph into its corresponding line graph. Unlike message-passing, the iterative application of this transformation enables the exchange of information among non-adjacent nodes, allowing the capture of complex topological dependencies that standard GNNs overlook.
Experiments on standard benchmarks show promising results.
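One step of the graph-to-line-graph transformation follows directly from the definition: the nodes of L(G) are the edges of G, and two of them are adjacent exactly when the corresponding edges share an endpoint. A minimal implementation (edge lists only, undirected):

```python
from itertools import combinations

def line_graph(edges):
    """Nodes of L(G) are the edges of G; two are adjacent iff they share an endpoint."""
    nodes = [frozenset(e) for e in edges]
    lg_edges = [(i, j) for i, j in combinations(range(len(nodes)), 2)
                if nodes[i] & nodes[j]]
    return list(range(len(nodes))), lg_edges

# Path a-b-c: its two edges share node b, so L(G) is a single edge.
nodes1, edges1 = line_graph([("a", "b"), ("b", "c")])

# Triangle: every pair of edges shares a node, so L(G) is again a triangle.
nodes2, edges2 = line_graph([("a", "b"), ("b", "c"), ("c", "a")])
```

Applying `line_graph` repeatedly yields the iterated transformation used by the architecture: after one step, "node" features live on the original edges, and further steps connect increasingly distant parts of the original graph.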
On the Rank Properties of the Renormalization Trick in GCNs
Anna Bison, Alessandro Sperduti
https://doi.org/10.14428/esann/2026.ES2026-233
Abstract:
We analyze the renormalization trick in GCNs beyond its established spectral smoothing effect. We prove that self-loops can increase the rank of the propagation matrix by resolving local symmetries that otherwise induce linear dependencies, providing a rigorous explanation for the trick's effectiveness: through Oono and Suzuki's framework, the rank increment counteracts the loss of expressive power. From a spectral point of view, the addition of self-loops in GCNs ensures that some information located in the normalized adjacency's kernel is preserved and propagated rather than discarded.
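The rank effect can be checked numerically on a star graph, whose three symmetric leaves induce exactly the kind of linear dependencies discussed above; the example graph is an illustrative choice:

```python
import numpy as np

# Star graph K_{1,3}: node 0 is the hub, nodes 1-3 are interchangeable leaves,
# so the adjacency (and its symmetric normalization) is rank-deficient.
A = np.array([[0, 1, 1, 1],
              [1, 0, 0, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 0]], dtype=float)

def sym_normalize(M):
    # D^{-1/2} M D^{-1/2} with D the degree matrix of M
    d = M.sum(axis=1)
    Dinv = np.diag(1.0 / np.sqrt(d))
    return Dinv @ M @ Dinv

rank_plain = np.linalg.matrix_rank(sym_normalize(A))
rank_renorm = np.linalg.matrix_rank(sym_normalize(A + np.eye(4)))  # self-loops
```

Without self-loops the propagation matrix has rank 2 (eigenvalues ±1 plus a two-fold zero from the leaf symmetry); adding self-loops, as in the renormalization trick, resolves the symmetry and restores full rank, so no feature directions are annihilated by propagation.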
Domination Reliability Analysis Based on Graph Features Using Generalized Matrix LVQ
Mandy Lange-Geisler, Klaus Dohmen, Thomas Villmann
https://doi.org/10.14428/esann/2026.ES2026-348
Abstract:
Evaluating domination reliability -- a network reliability measure related to service networks -- is a computationally expensive task due to its proven NP-hardness. To address this challenge, we propose an interpretable prototype-based classification approach that predicts domination reliability levels from selected graph features using Generalized Matrix Learning Vector Quantization (GMLVQ), with a particular focus on analyzing how these graph features influence the predicted reliability levels. Interpretability is further enhanced by the visualization of a threshold graph derived from the learned relevance matrix.
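The GMLVQ dissimilarity underlying this approach has the standard form $d(x,w) = (x-w)^\top \Lambda (x-w)$ with $\Lambda = \Omega^\top \Omega$, so it is non-negative by construction; the sketch below (illustrative dimensions and random values) also shows how the normalized diagonal of $\Lambda$ yields the feature relevances referred to in the abstract:

```python
import numpy as np

rng = np.random.default_rng(6)

D = 5                                # number of graph features
Omega = rng.normal(size=(D, D))      # learned projection (here: random stand-in)
Lam = Omega.T @ Omega                # relevance matrix, PSD by construction

def gmlvq_dist(x, w):
    # adaptive squared dissimilarity between a sample and a prototype
    diff = x - w
    return float(diff @ Lam @ diff)

x = rng.normal(size=D)               # feature vector of a graph
w = rng.normal(size=D)               # class prototype
d_xw = gmlvq_dist(x, w)
relevances = np.diag(Lam) / np.trace(Lam)   # normalized feature relevances
```

Inspecting `relevances` (and the off-diagonal correlations of `Lam`) is what makes the trained classifier interpretable with respect to the individual graph features.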
Scaling up graph-based classifiers with a divide and conquer approach
Caius Souza, Rafael Lopes Almeida, Frederico Gualberto Ferreira Coelho, Antonio Padua Braga
https://doi.org/10.14428/esann/2026.ES2026-222
Abstract:
Traditional techniques, such as SVMs, may become impractical for large-scale real-world applications. This case study introduces a divide-and-conquer ensemble framework, the Gabriel Graph Network Ensemble (GGNE), applied to ChipClass, a graph-based SVM. By training on independent data partitions and then aggregating predictions, GGNE achieves a 1500x speed-up while preserving comparable AUC across 23 datasets. These results demonstrate that our ensemble decomposition can achieve critical acceleration without sacrificing performance, offering a feasible approach for classifying massive volumes of data.
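The divide-and-conquer pattern (train on independent partitions, then aggregate predictions) can be sketched as follows; a nearest-centroid learner on toy Gaussian blobs stands in for the graph-based base classifier, so none of the components below reflect GGNE's actual internals:

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy two-class data: well-separated blobs, shuffled.
n_per = 150
X = np.vstack([rng.normal(-5, 0.5, size=(n_per, 2)),
               rng.normal(+5, 0.5, size=(n_per, 2))])
y = np.array([0] * n_per + [1] * n_per)
perm = rng.permutation(len(y))
X, y = X[perm], y[perm]

def fit_centroids(Xp, yp):
    # stand-in base learner: one centroid per class
    return {c: Xp[yp == c].mean(axis=0) for c in np.unique(yp)}

def predict(centroids, Xq):
    classes = sorted(centroids)
    d = np.stack([np.linalg.norm(Xq - centroids[c], axis=1) for c in classes])
    return np.array(classes)[np.argmin(d, axis=0)]

# Divide: split into independent partitions, train one model per partition.
parts = np.array_split(np.arange(len(y)), 5)
models = [fit_centroids(X[idx], y[idx]) for idx in parts]

# Conquer: aggregate the five base predictions by majority vote.
votes = np.stack([predict(m, X) for m in models])
ensemble_pred = (votes.mean(axis=0) > 0.5).astype(int)
accuracy = float((ensemble_pred == y).mean())
```

Because each base model trains on a fraction of the data and partitions are independent, the fitting stage parallelizes trivially, which is where the speed-up of this decomposition comes from.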
Boundary-Constrained Diffusion Models for Floorplan Generation: Balancing Realism and Diversity
Leonardo Stoppani, Davide Bacciu, Shahab Mokarizadeh
https://doi.org/10.14428/esann/2026.ES2026-75
Abstract:
Diffusion models have become widely popular for automated floorplan generation, producing highly realistic layouts conditioned on user-defined constraints. However, optimizing for perceptual metrics such as the Fréchet Inception Distance (FID) causes limited design diversity. To address this, we propose the Diversity Score (DS), a metric that quantifies layout diversity under fixed constraints. Moreover, to improve geometric consistency, we introduce a Boundary Cross-Attention (BCA) module that enables conditioning on building boundaries. Our experiments show that BCA significantly improves boundary adherence, while prolonged training drives diversity collapse undiagnosed by FID, revealing a critical trade-off between realism and diversity. Out-Of-Distribution evaluations further demonstrate the models’ reliance on dataset priors, emphasizing the need for generative systems that explicitly balance fidelity, diversity, and generalization in architectural design tasks.
Deep models and learning principles
Scalable Linearized Laplace Approximation via Surrogate Neural Kernel
Luis A. Ortega, Simón Rodríguez-Santana, Daniel Hernández-Lobato
https://doi.org/10.14428/esann/2026.ES2026-71
Abstract:
We introduce a scalable method to approximate the kernel of the Linearized Laplace Approximation (LLA). For this, we use a surrogate deep neural network (DNN) that learns a compact feature representation whose inner product replicates the Neural Tangent Kernel (NTK). This avoids the need to compute large Jacobians. Training relies solely on efficient Jacobian-vector products, allowing predictive uncertainty to be computed on large-scale pre-trained DNNs. Experimental results show similar or improved uncertainty estimation and calibration compared to existing LLA approximations. Moreover, biasing the learned kernel significantly enhances out-of-distribution detection. This highlights the benefits of the proposed method for finding better kernels than the NTK in the context of LLA to compute prediction uncertainty given a pre-trained DNN.
Where to grow: a surprisingly straightforward criterion to detect under-expressive layers
Yifan WANG, Julien MILLE, Moncef HIDANE
https://doi.org/10.14428/esann/2026.ES2026-117
Abstract:
The idea of gradually increasing the capacity of a neural network, during or after training, has gained attention in recent years. Online network growing raises three fundamental questions: when, where, and how to expand. Among these, the present paper focuses on the "where". We introduce the Normalized Activation Gradient Norm (NAGN), a lightweight criterion to detect under-expressive layers using standard backpropagation signals. Experiments on image classification demonstrate that this approach consistently discovers compact architectures that match larger static baselines at reduced training cost.
Improving the Linearized Laplace Approximation via Quadratic Approximations
Pedro Jiménez García-Ligero, Luis A. Ortega, Pablo Morales-Álvarez, Daniel Hernández-Lobato
https://doi.org/10.14428/esann/2026.ES2026-154
Abstract:
Deep neural networks (DNNs) often produce overconfident out-of-distribution predictions, motivating Bayesian uncertainty quantification. The Linearized Laplace Approximation (LLA) achieves this by linearizing the DNN and applying Laplace inference to the resulting model. Importantly, the linear model is also used for prediction. We argue this linearization in the posterior may degrade fidelity to the true Laplace approximation. To alleviate this problem without significantly increasing the computational cost, we propose the Quadratic Laplace Approximation (QLA). QLA approximates each second order factor in the approximate Laplace log-posterior using a rank-one factor obtained via efficient power iterations. QLA is expected to yield a posterior precision closer to that of the full Laplace without forming the full Hessian, which is typically intractable. For prediction, QLA also uses the linearized model. Empirically, QLA yields modest yet consistent uncertainty estimation improvements over LLA on five regression datasets.
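The rank-one factors obtained via power iterations can be sketched as follows; the random PSD matrix stands in for a second-order factor of the log-posterior, and the iteration count and dimensions are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(8)

B = rng.normal(size=(8, 8))
H = B @ B.T                            # symmetric PSD stand-in for a Hessian factor

# Power iteration: repeatedly apply H and renormalize to converge to the
# dominant eigenvector; only matrix-vector products are needed, never H itself
# in factored problems (H @ v can be a Hessian-vector product).
v = rng.normal(size=8)
for _ in range(1000):
    v = H @ v
    v /= np.linalg.norm(v)
lam = float(v @ H @ v)                 # Rayleigh quotient = dominant eigenvalue
rank_one = lam * np.outer(v, v)        # rank-one approximation of H

top_true = float(np.linalg.eigvalsh(H)[-1])
```

Replacing each per-datum second-order factor by such a rank-one term keeps more curvature information than a pure linearization while remaining cheap to store and apply.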
Self-Certified Deep Metric Learning with N-Tuple Losses
Oritsemisan Meggison, Sijia Zhou, Ata Kaban
https://doi.org/10.14428/esann/2026.ES2026-203
Abstract:
Deep metric learning (DML) excels in retrieval and re-identification tasks, driven by tuple-wise loss functions that capture rich inter-sample relationships. Yet, the non-i.i.d. nature of such training complicates generalisation analysis, and the impact of tuple size remains unclear. While PAC-Bayes bounds have been applied to pairwise learning, their behaviour for higher-order tuples is unexplored. We extend this analysis to general N-tuple settings using neural networks trained with a PAC-Bayes regularised surrogate loss. Experiments on CIFAR-10 show that sample complexity increases with tuple size, revealing trade-offs between tuple size, model capacity, and certificate tightness.
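For orientation, self-certified bounds of this kind typically build on the PAC-Bayes-kl inequality, stated here in its standard i.i.d. form (the tuple-wise setting must additionally account for dependence between overlapping tuples, which is the crux of the analysis): with probability at least $1-\delta$ over an i.i.d. sample of size $m$, simultaneously for all posteriors $Q$,

$$\mathrm{kl}\big(\hat{L}(Q)\,\big\|\,L(Q)\big) \;\le\; \frac{\mathrm{KL}(Q\,\|\,P) + \ln\frac{2\sqrt{m}}{\delta}}{m},$$

where $P$ is the (data-independent) prior, $\hat{L}$ the empirical risk, $L$ the true risk, and $\mathrm{kl}$ the binary KL divergence.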
Towards Understanding The Winner-Take-Most Behavior Of Neural Network Representations
Gilles Peiffer, Christophe De Vleeschouwer, Simon Carbonnelle
https://doi.org/10.14428/esann/2026.ES2026-262
Abstract:
We analyze neuron-level representations of generalizing and memorizing networks, using a synthetic dataset designed by aggregating hidden patterns into supervision classes.
We observe that the average pre-activation of the most activated patterns of a class (and inversely) in each neuron increases during training: a winner-take-most phenomenon.
The network applies a divide-and-conquer strategy, where each neuron specializes in classifying different patterns of a class.
Through an ablation study, we describe three necessary conditions for this phenomenon.
Finally, we provide intuition for why it occurs, drawing links with existing work on sample difficulty, gradient coherence, and implicit clustering.
Energy-Based Dropout with Patch-Level Regularization
Tom Devynck, Bilal Faye, Djamel Bouchaffra, Nadjib Lazaar, Hanane Azzag, Mustapha Lebbah
https://doi.org/10.14428/esann/2026.ES2026-293
Abstract:
Dropout is a widely used stochastic regularization technique, yet it overlooks structural dependencies within feature maps. We introduce PB-EDropout, an energy-based approach that preserves low-energy spatial patches within each channel while suppressing the rest. During training, candidate masks are sampled from Gibbs distributions and refined using genetic operators, and a running exponential moving average yields deterministic masks for inference. Experiments on shallow CNNs demonstrate that PB-EDropout consistently improves test accuracy over standard dropout, remains effective even with frozen masks, and generates interpretable visualizations of discriminative features. Code is available at https://github.com/Tom-Dvk/PB-EDropout/tree/main
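As a rough illustration of the mask-sampling idea (per-patch energies turned into Gibbs keep-probabilities), here is a simplified NumPy sketch; the genetic refinement and exponential-moving-average steps of PB-EDropout are omitted, and the energy definition (mean squared activation) is an assumption, not the paper's:

```python
import numpy as np

def gibbs_patch_mask(fmap, patch=4, keep_frac=0.5, temp=1.0, seed=0):
    """Sample a binary patch mask from a Gibbs distribution in which
    low-energy (low mean squared activation) patches are more likely
    to be kept. Simplified sketch of the sampling step only."""
    rng = np.random.default_rng(seed)
    H, W = fmap.shape
    ph, pw = H // patch, W // patch
    # energy of each patch = mean squared activation over the patch
    patches = fmap[:ph * patch, :pw * patch].reshape(ph, patch, pw, patch)
    energy = (patches ** 2).mean(axis=(1, 3))
    # Gibbs keep-probabilities: lower energy -> higher probability
    p = np.exp(-energy / temp)
    p = keep_frac * p / p.mean()          # calibrate expected keep rate
    keep = rng.random((ph, pw)) < np.clip(p, 0.0, 1.0)
    # upsample the per-patch decision back to the feature-map grid
    return np.kron(keep, np.ones((patch, patch)))

fmap = np.abs(np.random.default_rng(1).standard_normal((16, 16)))
mask = gibbs_patch_mask(fmap)
```

Masking whole patches rather than independent units is what preserves the spatial structure that standard dropout destroys.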
Distillation of a tractable model from the VQ-VAE
Armin Hadzic, Milan Papež, Tomáš Pevný
https://doi.org/10.14428/esann/2026.ES2026-167
Abstract:
Deep generative models with a discrete latent space, such as the Vector-Quantized Variational Autoencoder (VQ-VAE), offer excellent data generation capabilities, but---due to the large size of their latent space---their probabilistic inference is deemed intractable.
We demonstrate that the VQ-VAE can be \emph{distilled} into a tractable model by selecting a subset of latent variables with high probability under the prior.
We frame the distilled model as a probabilistic circuit, and show that it preserves the expressiveness of the VQ-VAE while providing tractable probabilistic inference.
Experiments illustrate competitive performance in both density estimation and conditional generation tasks, challenging the view of the VQ-VAE as an inherently intractable model.
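The core selection idea, keeping a high-probability subset of the discrete latent codes and renormalising so that exact inference over the subset becomes feasible, can be sketched as follows (a deliberate simplification: the paper additionally compiles the result into a probabilistic circuit, which is not shown here):

```python
import numpy as np

def distill_top_k(prior_logits, k):
    """Keep the k most probable discrete latent codes under the prior
    and renormalise, yielding a small mixture with exact inference."""
    p = np.exp(prior_logits - prior_logits.max())  # stable softmax
    p /= p.sum()
    top = np.argsort(p)[::-1][:k]      # indices of the k likeliest codes
    w = p[top] / p[top].sum()          # renormalised mixture weights
    return top, w

rng = np.random.default_rng(0)
top, w = distill_top_k(rng.standard_normal(512), k=16)
```

With only k components, marginals and conditionals of the resulting mixture cost O(k) instead of a sum over the full latent space.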
Cross-tested Aggregated Hold-out
Joseph Rynkiewicz
https://doi.org/10.14428/esann/2026.ES2026-224
Abstract:
Model selection is central to machine learning. Recently, Aggregated Hold-Out (Agghoo) has been shown to have optimal theoretical properties. This procedure mixes cross-validation and aggregation and is undoubtedly one of the most popular methods for obtaining state-of-the-art predictors. However, suppose that we have to choose the best family of deep learning models from a wide range of architectures. In that case, it isn't easy to know which one will perform best during aggregation, especially if the sample size is too small. The purpose of this paper is to explore, from both a practical and a theoretical perspective, what can happen with this aggregation method and to provide a method for assessing the performance of the aggregated model.
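For readers unfamiliar with the procedure, Agghoo can be sketched in a few lines: repeat hold-out selection over V independent random splits, then aggregate the V selected predictors by averaging. A toy 1-D polynomial-regression sketch, where the candidate family is the polynomial degree (all settings here are illustrative, not from the paper):

```python
import numpy as np

def agghoo_predict(X, y, degrees, X_test, V=5, val_frac=0.2, seed=0):
    """Aggregated Hold-Out (Agghoo) sketch: V hold-out selections over
    random splits, then average the V selected predictors."""
    rng = np.random.default_rng(seed)
    n = len(X)
    preds = []
    for _ in range(V):
        idx = rng.permutation(n)
        n_val = int(val_frac * n)
        val, tr = idx[:n_val], idx[n_val:]
        best_err, best_coef = np.inf, None
        for d in degrees:                       # candidate = poly degree
            coef = np.polyfit(X[tr], y[tr], d)
            err = np.mean((np.polyval(coef, X[val]) - y[val]) ** 2)
            if err < best_err:
                best_err, best_coef = err, coef
        preds.append(np.polyval(best_coef, X_test))
    return np.mean(preds, axis=0)               # aggregate by averaging

rng = np.random.default_rng(1)
X = np.linspace(-1, 1, 200)
y = X ** 2 + 0.05 * rng.standard_normal(200)
yhat = agghoo_predict(X, y, degrees=[1, 2, 5], X_test=X)
```

The failure mode the abstract discusses arises exactly here: different splits may select models from different families, and the average of heterogeneous winners need not behave like any single one of them.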
LLM-based Vulnerable Code Augmentation: Generate or Refactor?
Dyna Soumhane Ouchebara, Stéphane Dupont
https://doi.org/10.14428/esann/2026.ES2026-374
Abstract:
Vulnerability code-bases often suffer from severe imbalance, limiting the effectiveness of Deep Learning-based vulnerability classifiers. Data Augmentation could help solve this by mitigating the scarcity of under-represented vulnerability types. In this context, we investigate LLM-based augmentation for vulnerable functions, comparing controlled generation of new vulnerable samples with semantics-preserving refactoring of existing ones. Using Qwen2.5-Coder to produce augmented data and CodeBERT as a classifier on the SVEN dataset, we find that our approaches are indeed effective in enriching vulnerable code-bases through a simple process and with reasonable quality, and that a hybrid strategy best boosts vulnerability classifiers' performance. The code repository is available at https://github.com/DynaSoumhaneOuchebara/LLM-based-code-augmentation-Generate-or-Refactor- .
Boosting the Lottery Ticket Hypothesis with Knowledge Distillation: Finding Sparser Winning Tickets
Daan Luyckx, Peter Karsmakers
https://doi.org/10.14428/esann/2026.ES2026-237
Abstract:
The Lottery Ticket Hypothesis (LTH) suggests that dense networks contain sparse, trainable subnetworks, “winning tickets”, that can match the original model’s performance when trained in isolation. These subnetworks are usually found through iterative pruning, but at high sparsity many potentially effective subnetworks fail to converge under standard training. Knowledge distillation (KD) mitigates this issue by providing richer supervision from a teacher model. We propose the Knowledge-Distilled Lottery Ticket (KDLT) procedure, a dual-phase method that applies KD during pruning and retraining to recover stronger sparse subnetworks. Experiments on MNIST, CIFAR-10/100, and Tiny-ImageNet show that KDLT delivers higher accuracy at fixed sparsity or comparable accuracy at higher sparsity.
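The iterative magnitude pruning loop underlying the LTH (train, prune the smallest-magnitude weights, rewind survivors to their initial values, retrain) can be sketched on a toy linear model; the distillation component of KDLT is omitted, and all sizes and rates below are illustrative:

```python
import numpy as np

def train(w, mask, X, y, lr=0.1, steps=200):
    """Gradient descent on masked linear least squares."""
    for _ in range(steps):
        grad = X.T @ (X @ (w * mask) - y) / len(y)
        w = w - lr * grad * mask          # pruned weights stay frozen
    return w

def iterative_magnitude_pruning(X, y, rounds=2, prune_frac=0.5, seed=0):
    """Lottery-ticket loop: train, prune the smallest surviving weights,
    rewind to the initial weights, and repeat under the new mask."""
    rng = np.random.default_rng(seed)
    w0 = rng.standard_normal(X.shape[1])  # initial weights to rewind to
    mask = np.ones_like(w0)
    for _ in range(rounds):
        w = train(w0.copy(), mask, X, y)
        alive = np.flatnonzero(mask)
        k = int(len(alive) * prune_frac)
        drop = alive[np.argsort(np.abs(w[alive]))[:k]]
        mask[drop] = 0.0                  # prune; next round rewinds to w0
    return train(w0.copy(), mask, X, y), mask

rng = np.random.default_rng(1)
X = rng.standard_normal((300, 20))
true_w = np.zeros(20)
true_w[:4] = [3.0, -2.0, 1.5, 2.5]       # sparse ground-truth signal
y = X @ true_w
w_sparse, mask = iterative_magnitude_pruning(X, y)
```

On this noiseless toy problem the surviving "ticket" contains the true support, mirroring the claim that sparse subnetworks found by magnitude pruning can match the dense model.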
Hub-Aware Hybrid Search: Accelerating the Locally Aligned Ant Technique
Simone Vilardi, Reynier Peletier, Felipe Contreras, Kerstin Bunte
https://doi.org/10.14428/esann/2026.ES2026-110
Abstract:
Finding manifold structures in noisy and high-dimensional point clouds is a challenging but important problem. In astronomical observation survey and simulation data, the detection of filaments, streams (1D), walls (2D) and clusters (3D) gives rise to a deeper understanding of the evolution of our universe. The Locally Aligned Ant Technique (LAAT) uses biologically inspired agents to efficiently recover faint and multidimensional structures. However, very dense hubs (e.g. nodes or globular clusters) dominate the ants’ activity, creating unnecessary computational overhead. In this paper we propose a two-stage solution. First, a fast preprocessing step locates the hubs and replaces them with a tailored likelihood model. Subsequently, a mixed likelihood-pheromone strategy guides the ants to efficiently bridge the dense regions. We demonstrate improvements in the detection efficiency and robustness of LAAT on synthetic data and a large-scale astronomical N-body simulation of the cosmic web.
Linear Evaluation Complexity of Surrogate-Assisted (1+1)-EA on OneMax
Oliver Kramer
https://doi.org/10.14428/esann/2026.ES2026-74
Abstract:
Fitness evaluations often dominate the runtime of evolutionary algorithms (EAs), yet most runtime analyses assume that every offspring is evaluated on the true objective function.
This work presents a unified runtime-theoretic framework for \emph{surrogate-assisted} evolutionary algorithms that selectively schedule true evaluations based on predictive information.
The framework characterizes expected optimization time directly in terms of evaluation frequency and improvement probabilities, abstracting from specific surrogate implementations.
Focusing on a surrogate model with imperfect prediction accuracy, we show that linear $\Theta(n)$ scaling of true fitness evaluations can be achieved on OneMax when accuracy remains bounded away from zero and evaluations are performed periodically with logarithmic frequency. Simulation results confirm this prediction and clearly separate the surrogate-assisted process from the classical $\Theta(n\log n)$ behavior of the (1+1)-EA.
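The scheduling idea, spending true evaluations only when a surrogate predicts an improvement or on a periodic schedule of logarithmic frequency, can be simulated directly. This sketch is not the paper's exact framework: the surrogate is emulated by flipping the ground-truth comparison with probability 1 − accuracy, and acceptance always relies on true fitness:

```python
import math
import random

def surrogate_ea(n, accuracy=0.9, seed=0):
    """(1+1)-EA on OneMax where a true fitness evaluation is spent only
    when a noisy surrogate (correct with probability `accuracy`) predicts
    an improvement, or on a periodic ~log2(n) schedule."""
    rng = random.Random(seed)
    x = [rng.randrange(2) for _ in range(n)]
    fx = sum(x)                            # initial true evaluation
    true_evals, step = 1, 0
    period = max(2, int(math.log2(n)))
    while fx < n:
        step += 1
        # standard bit mutation: flip each bit with probability 1/n
        y = [b ^ (rng.random() < 1.0 / n) for b in x]
        improving = sum(y) > fx            # ground truth, simulation only
        predicts = improving if rng.random() < accuracy else not improving
        if predicts or step % period == 0:
            fy = sum(y)                    # spend a true evaluation
            true_evals += 1
            if fy >= fx:
                x, fx = y, fy
    return true_evals, step

evals, gens = surrogate_ea(128)
```

Because fitness only changes after a true evaluation, the process never regresses; the quantity of interest is how far `evals` falls below the generation count `gens`.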