Bruges, Belgium October 05 - 07
Content of the proceedings
Feature extraction & Prototype learning
Continual Learning beyond classification
Classification
Learning theory and principles
Deep learning, signal, image
Anomaly and change point detection
Deep Semantic Segmentation Models in Computer Vision
Regression and forecasting
Recurrent learning and reservoir computing
Natural language processing, and recommender systems
Machine Learning and Information Theoretic Methods for Molecular Biology and Medicine
Concept drift
Deep Learning for Graphs
Reinforcement learning
Feature extraction & Prototype learning
Modular Representations for Weak Disentanglement
Andrea Valenti, Davide Bacciu
https://doi.org/10.14428/esann/2022.ES2022-52
Abstract:
Weakly disentangled representations were recently proposed to relax some constraints of previous definitions of disentanglement in exchange for more flexibility. However, at the moment, weak disentanglement can only be achieved by increasing the amount of supervision as the number of factors of variation of the data increases. In this paper, we introduce modular representations for weak disentanglement, a novel method that keeps the amount of supervised information constant with respect to the number of generative factors. The experiments show that models using modular representations can increase their performance with respect to previous work without the need for additional supervision.
Feature selection for transfer learning using particle swarm optimization and complexity measures
Verónica Bolón-Canedo, Guillermo Castillo García, Laura Morán-Fernández
https://doi.org/10.14428/esann/2022.ES2022-57
Abstract:
Particle Swarm Optimization is an optimization algorithm that explores a search space guided by a fitness function in order to find a good solution. We apply it to perform feature selection for domain adaptation. Usually, classification error is used in the fitness function to evaluate the goodness of subsets of features. In this paper, we propose to employ complexity metrics instead, as we assume that reducing the complexity of the problem will lead to good results while being less computationally demanding and independent from the classifier used for testing. We found out that our method is indeed faster and selects fewer features, obtaining competitive classification accuracy results.
Supervised dimensionality reduction technique accounting for soft classes
Sorina Mustatea, Michael Aupetit, Jaakko Peltonen, Sylvain Lespinats, Denys Dutykh
https://doi.org/10.14428/esann/2022.ES2022-26
Abstract:
Exploratory visual analysis of multidimensional labeled data is challenging. Multidimensional projections for labeled data attempt to separate classes while preserving neighborhoods. In this work, we consider the case where instances are assigned multiple labels with probabilities or weights: for example, the output of a probabilistic classifier, fuzzy membership functions in fuzzy logic, or the votes of each voter for each candidate in an election. We propose a new technique to better preserve neighborhoods of such data. Our experiments show improved qualitative results compared to unsupervised and existing dimensionality reduction techniques.
Graph-Induced Geodesics Approximation for Non-Euclidian K-Means
Hervé Frezza-Buet
https://doi.org/10.14428/esann/2022.ES2022-14
Abstract:
In this paper, an adaptation of the k-means algorithm and related methods to non-Euclidian topology is presented. The paper introduces a rationale for approximating the geodesics of that topology, as well as a learning rule that is robust to noise. The first results on artificial but very noisy distributions presented here are promising for further experimentation on real cases.
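A common way to approximate geodesics on a topology induced by data is to take shortest paths in a neighbourhood graph. The sketch below assumes that reading and is not the paper's algorithm: the helper names, the radius rule for connecting points, and the toy U-shaped point set are all hypothetical.

```python
import heapq
import math


def build_graph(points, radius):
    """Connect every pair of points whose Euclidean distance is at most `radius`."""
    graph = {i: [] for i in range(len(points))}
    for i, p in enumerate(points):
        for j in range(i + 1, len(points)):
            d = math.dist(p, points[j])
            if d <= radius:
                graph[i].append((j, d))
                graph[j].append((i, d))
    return graph


def geodesic_distances(graph, source):
    """Dijkstra shortest-path lengths from `source`, used as geodesic estimates."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical toy data: points along a U shape, so the two tips are close
# in Euclidean terms but far apart along the data.
points = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0)]
graph = build_graph(points, radius=1.0)
geo = geodesic_distances(graph, source=0)
euclid = math.dist(points[0], points[6])
```

Under these assumptions, a k-means variant would assign samples to prototypes using values like `geo[6]` rather than `euclid`, so clusters follow the shape of the data instead of cutting across the gap.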
A WiSARD-based conditional branch predictor
Luis A. Q. Villon, Zachary Susskind, Alan T. L. Bacellar, Igor D. S. Miranda, Leandro Santiago de Araújo, Priscila Lima, Mauricio Breternitz Jr., Lizy John, Felipe França, Diego Leonel Cadette Dutra
https://doi.org/10.14428/esann/2022.ES2022-65
Abstract:
Conditional branch prediction is a technique used to speculatively execute instructions before knowing the direction of conditional branch statements. Perceptron-based predictors have been extensively studied; however, they need large input sizes for the data to be linearly separable. To learn nonlinear functions from the inputs, we propose a conditional branch predictor based on the WiSARD model and compare it with two state-of-the-art predictors, the TAGE-SC-L and the Multiperspective Perceptron. We show that the WiSARD-based predictor with a smaller input size outperforms the perceptron-based predictor by about 0.09% and achieves similar accuracy to that of TAGE-SC-L.
Distributive Thermometer: A New Unary Encoding for Weightless Neural Networks
Alan T. L. Bacellar, Zachary Susskind, Luis A. Q. Villon, Igor D. S. Miranda, Leandro Santiago de Araújo, Diego Leonel Cadette Dutra, Mauricio Breternitz Jr., Lizy John, Priscila Lima, Felipe França
https://doi.org/10.14428/esann/2022.ES2022-94
Abstract:
The binary encoding of real-valued inputs is a crucial part of Weightless Neural Networks. The Linear Thermometer and its variations are the most prominent methods to determine binary encoding for input data but, as they make assumptions about the input distribution, the resulting encoding is sub-optimal and possibly wasteful when the assumption is incorrect. We propose a new thermometer approach that doesn’t require such assumptions. Our results show that it achieves similar or better accuracy when compared to a thermometer that correctly assumes the distribution, and accuracy gains of up to 26.3% when other thermometers assume an unsound distribution.
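As a rough illustration of the encodings this abstract discusses, the sketch below contrasts a linear thermometer with a quantile-based one. The quantile variant is only an assumption about how a thermometer could avoid distributional assumptions; the function names and the threshold rule are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right


def linear_thresholds(lo, hi, bits):
    """Evenly spaced thresholds in (lo, hi); implicitly assumes uniform inputs."""
    step = (hi - lo) / (bits + 1)
    return [lo + step * (i + 1) for i in range(bits)]


def quantile_thresholds(train_values, bits):
    """Thresholds at empirical quantiles of the training data.

    Assumption for illustration only: each code interval should hold a
    roughly equal share of the training samples.
    """
    ordered = sorted(train_values)
    n = len(ordered)
    return [ordered[min(n - 1, (n * (i + 1)) // (bits + 1))] for i in range(bits)]


def thermometer_encode(x, thresholds):
    """Unary code: bit i is 1 when x >= thresholds[i] (thresholds sorted)."""
    ones = bisect_right(thresholds, x)
    return [1] * ones + [0] * (len(thresholds) - ones)


# Hypothetical skewed training data: most of the mass sits near small values.
train = [1, 1, 1, 2, 2, 3, 10, 100]
lin = linear_thresholds(0.0, 8.0, 3)
qnt = quantile_thresholds(train, 3)
code_linear = thermometer_encode(5.0, lin)
code_quantile = thermometer_encode(3, qnt)
```

With skewed data such as `train`, evenly spaced thresholds over the full value range would put almost every sample into the same code, whereas the quantile thresholds place the cut points where the samples actually are.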
Pruning Weightless Neural Networks
Zachary Susskind, Alan T. L. Bacellar, Aman Arora, Luis A. Q. Villon, Renan Mendanha, Leandro Santiago de Araújo, Diego Leonel Cadette Dutra, Priscila Lima, Felipe França, Igor D. S. Miranda, Mauricio Breternitz Jr., Lizy John
https://doi.org/10.14428/esann/2022.ES2022-55
Abstract:
Weightless neural networks (WNNs) are a type of machine learning model which performs prediction using lookup tables (LUTs) instead of arithmetic operations. Recent advancements in WNNs have reduced model sizes and improved accuracies, reducing the gap in accuracy with deep neural networks (DNNs). Modern DNNs leverage "pruning" techniques to reduce model size, but this has not been previously explored for WNNs. We propose a WNN pruning strategy based on identifying and culling the LUTs which contribute least to overall model accuracy. We demonstrate an average 40% reduction in model size with at most 1% reduction in accuracy.
Classification of preclinical markers in Alzheimer's disease via WiSARD classifier
Massimo De Gregorio, Alfonso Di Costanzo, Andrea Motta, Debora Paris, Antonio Sorgente
https://doi.org/10.14428/esann/2022.ES2022-63
Abstract:
Weightless Neural Networks (WNNs) have shown good results in various classification problems in different domains where a significant number of instances for each class was available. In this work, we present different WiSARD classifiers facing a quite difficult problem from both the clinical and the machine learning points of view: the classification of preclinical markers in Alzheimer's disease continuum patients. The four domain classes show overlapping molecular features and each has few instances (around 40). Together with improved class separation, the quality of the results is confirmed by a series of experiments that compared the WiSARD classifiers to many state-of-the-art classifiers, including ensembles, showing that the obtained results are very close to those of the best models.
A bayesian variational principle for dynamic self organizing maps
Anthony Fillion, Thibaut Kulak, François Blayo
https://doi.org/10.14428/esann/2022.ES2022-11
Abstract:
We propose organisation conditions that yield a method for training SOMs with an adaptive neighborhood radius in a variational Bayesian framework. This method is validated in a non-stationary setting and compared in a high-dimensional setting with another adaptive method.
The role of feature selection in personalized recommender systems
Roger Bagué-Masanés, Verónica Bolón-Canedo, Beatriz Remeseiro
https://doi.org/10.14428/esann/2022.ES2022-43
Abstract:
Recommender systems suggest products to users, based on their popularity or the users' preferences. This paper proposes a hybrid personalized recommender system based on users' tastes and also on information available about items. We used a dataset downloaded from TripAdvisor, which contains some information from restaurants (items), such as price range or special diets. Feature selection techniques are employed to analyze the impact that each variable has on personalized recommendations, allowing us to understand not only the process underlying the recommendation to favor the transparency of the system, but also what users value the most when choosing a restaurant.
Adaptive Gabor Filters for Interpretable Color Texture Classification
Gerrit Luimstra, Kerstin Bunte
https://doi.org/10.14428/esann/2022.ES2022-87
Abstract:
We introduce the use of trainable feature extractors, based on the Gabor function, into the interpretable machine learning domain. The use of adaptive Gabor filters allows for interpretable feature extraction to be learned automatically in a domain agnostic way, and comes with the benefit of a large reduction in trainable parameters. We implemented the filters into an image classification variant of learning vector quantization. We extend and compare the image classification variant of learning vector quantization with adaptive Gabor filters and demonstrate the proposed technique on VisTex color texture images. The adaptive Gabor filters show promising results for interpretable and efficient color texture classification.
Continual Learning beyond classification
Tutorial - Continual Learning beyond classification
Alexander Gepperth, Timothée Lesort
https://doi.org/10.14428/esann/2022.ES2022-4
Abstract:
Continual Learning (CL, sometimes also termed incremental learning) is a flavor of machine learning where the usual assumption of stationary data distribution is relaxed or omitted. When naively applying, e.g., DNNs in CL problems, changes in the data distribution can cause the so-called catastrophic forgetting (CF) effect: an abrupt loss of previous knowledge. Although many significant contributions to enabling CL have been made in recent years, most works address supervised (classification) problems. This article reviews literature that studies CL in other settings, such as learning with reduced supervision, fully unsupervised learning, and reinforcement learning. Besides proposing a simple schema for classifying CL approaches w.r.t. their level of autonomy and supervision, we discuss the specific challenges associated with each setting and the potential contributions to the field of CL in general.
Continual Learning for Human State Monitoring
Federico Matteoni, Andrea Cossu, Claudio Gallicchio, Vincenzo Lomonaco, Davide Bacciu
https://doi.org/10.14428/esann/2022.ES2022-38
Abstract:
Continual Learning (CL) on time series data represents a promising but under-studied avenue for real-world applications. We propose two new CL benchmarks for Human State Monitoring. We carefully designed the benchmarks to mirror real-world environments in which new subjects are continuously added. We conducted an empirical evaluation to assess the ability of popular CL strategies to mitigate forgetting in our benchmarks. Our results highlight the fact that, due to the domain-incremental properties of our benchmarks, forgetting can be easily tackled even with simple finetuning and that existing strategies struggle to accumulate knowledge over a fixed, held-out test subject.
Continual Incremental Language Learning for Neural Machine Translation
Michele Resta, Davide Bacciu
https://doi.org/10.14428/esann/2022.ES2022-80
Abstract:
The paper provides an experimental investigation of the phenomenon of catastrophic forgetting for Neural Machine Translation systems. We introduce and describe the continual incremental language learning setting and its analogy with the classical continual learning scenario. The experiments measure the performance loss of a naive incremental training strategy against a jointly trained baseline, and we show the mitigating effect of the replay strategy. To this end, we also introduce a prioritized replay buffer strategy informed by the specific application domain.
Diverse Memory for Experience Replay in Continual Learning
Andrii Krutsylo, Pawel Morawiecki
https://doi.org/10.14428/esann/2022.ES2022-83
Abstract:
Neural networks trained on data whose distribution is shifted in time suffer greatly from performance degradation. This problem is known as catastrophic forgetting, i.e. learning new classes leads to loss of accuracy on previously seen ones. A replay buffer can mitigate this problem by storing and reusing some of the data. In this paper, we propose a modification of sampling to the memory buffer using deep features extracted from the classifier itself to increase the diversity of stored samples. Our method demonstrates a consistent reduction in forgetting verified on different settings for MNIST, SVHN and CIFAR-10 datasets.
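For readers unfamiliar with replay buffers, the sketch below shows a plain fixed-size buffer filled by reservoir sampling, which keeps a uniform random subset of the stream. It is a generic baseline, not the diversity-based selection proposed in the paper; the class name, its methods, and the toy stream are hypothetical.

```python
import random


class ReservoirBuffer:
    """Fixed-capacity memory holding a uniform random subset of all samples seen."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, sample):
        """Store the sample, possibly overwriting a random stored one."""
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(sample)
        else:
            # Keep the new sample with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = sample

    def sample(self, k):
        """Draw up to k stored samples to mix into the current training batch."""
        return self.rng.sample(self.items, min(k, len(self.items)))


# Hypothetical stream of (input_id, label) pairs arriving class by class.
buffer = ReservoirBuffer(capacity=5)
for i in range(100):
    buffer.add((i, i // 50))
replayed = buffer.sample(3)
```

A diversity-oriented variant would replace the random overwrite in `add` with a rule that compares feature vectors of the stored samples, which is the part of the pipeline the abstract says the paper modifies.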
Classification
Model Agnostic Local Explanations of Reject
André Artelt, Roel Visser, Barbara Hammer
https://doi.org/10.14428/esann/2022.ES2022-34
Abstract:
The application of machine-learning-based decision-making systems in safety-critical areas requires reliable, high-certainty predictions. Reject options are a common way of ensuring a sufficiently high certainty of predictions. While being able to reject uncertain samples is important, it is also important to be able to explain why a particular sample was rejected. However, explaining reject options is still an open problem. We propose a model-agnostic method for locally explaining reject options by means of interpretable models and counterfactual explanations.
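A reject option in its simplest form can be sketched as a confidence threshold on the predicted class probabilities. This is only the generic mechanism, under the assumption that the classifier outputs probabilities; it does not cover the explanation method the paper proposes, and the function name and default threshold are hypothetical.

```python
def predict_with_reject(probabilities, threshold=0.8):
    """Return the index of the most probable class, or None to reject.

    `probabilities` is a list of class probabilities for one sample;
    `threshold` is a hypothetical certainty level chosen by the user.
    """
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return best if probabilities[best] >= threshold else None


confident = predict_with_reject([0.05, 0.90, 0.05])
uncertain = predict_with_reject([0.40, 0.35, 0.25])
```

Here `confident` is a class index and `uncertain` is `None`; explaining a rejection then amounts to asking which changes to the input would have lifted the top probability above the threshold.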
Adaptive multi-modal positive semi-definite and indefinite kernel fusion for binary classification
Maximilian Münch, Christoph Raab, Simon Heilig, Manuel Röder, Frank-Michael Schleif
https://doi.org/10.14428/esann/2022.ES2022-70
Abstract:
Data and information are nowadays frequently available in multiple modalities like different sensor signals, textual descriptions, graph structures, and other formats. The maximum information from these heterogeneous representations can be obtained by fusing the various modalities by specific embeddings or proximity measures. Current approaches are widely limited in the fusion model and the measures used, especially when the data is non-vectorial. We propose a model to learn the spectral properties of the different inner product representations in a joint optimization problem. The approach is evaluated on various multi-modal data and compared to modern multiple-kernel learning and baseline techniques.
A Kernel Based Multilinear SVD Approach for Multiple Sclerosis Profiles Classification
Berardino Barile, Pooya Ashtari, Francoise Durand-Dubief, Frederik Maes, Dominique Sappey-Marinier, Sabine Van Huffel
https://doi.org/10.14428/esann/2022.ES2022-17
Abstract:
In machine learning, kernel data analysis represents a new approach to the study of neurological diseases such as Multiple Sclerosis (MS). In this work, a kernelization technique was combined with a tensor factorization method based on Multilinear Singular Value Decomposition (MLSVD) for MS profile classification. Our simple, yet effective, approach generates a meaningful feature embedding of multi-view data, allowing good classification performance. The results presented in this work define an interesting approach, given that only the anatomical T1-weighted image was used, which represents the most important modality in clinical applications.
A Machine Learning Approach for School Dropout Prediction in Brazil
João Gabriel Corrêa Krüger, Jean Paul Barddal, Alceu de Souza Britto Jr.
https://doi.org/10.14428/esann/2022.ES2022-15
Abstract:
School dropout is a problem that impacts many socio-economic aspects, including inequality. Dropout prediction algorithms can help remediate this problem, although several past attempts in the literature did so using small datasets. This paper brings forward an experimental machine learning approach to school dropout prediction in Brazilian schools. The data used for this study was first retrieved from the academic systems of a group of Brazilian private schools and later enriched with socio-economic data extracted from governmental sources. Using the dataset to train different types of classifiers, we obtained up to 95.2% precision rates when predicting dropout at different years and educational stages, thus allowing schools to plan and apply retention strategies.
An empirical comparison of generators in replay-based continual learning
Nadzeya Dzemidovich, Alexander Gepperth
https://doi.org/10.14428/esann/2022.ES2022-111
Abstract:
This study in the context of continual learning (CL) with DNNs compares several types of generators when performing replay, i.e., the generation of previously seen samples, to avoid catastrophic forgetting. Principal generators are generative adversarial networks (GANs) and variational autoencoders (VAEs). We evaluate these generators in various flavors (conditional, Wasserstein etc.) w.r.t. CL performance on a variety of CL tasks generated from the MNIST benchmark. Concerning generators, we find that VAEs are generally more compatible with CL than GANs. More generally, we find that replay-based CL faces counter-intuitive issues for seemingly simple problems: first, that performance degrades more strongly as less information is added, and, furthermore, that performance degrades even when only known information is added.
Machine learning for automated quality control in injection moulding manufacturing
Steven Michiels, Cédric De Schryver, Lynn Houthuys, Frederik Vogeler, Frederik Desplentere
https://doi.org/10.14428/esann/2022.ES2022-48
Abstract:
Machine learning (ML) may improve and automate quality control (QC) in injection moulding manufacturing. As the labelling of extensive, real-world process data is costly, however, the use of simulated process data may offer a first step towards successful implementation. In this study, simulated data was used to develop a predictive model for the product quality of an injection moulded sorting container. The achieved accuracy, specificity and sensitivity on the test set were 99.4%, 99.7% and 94.7%, respectively. This study thus shows the potential of ML towards automated QC in injection moulding and encourages extension to models trained on real-world data.
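For readers less familiar with the three figures reported above, the following minimal sketch shows how accuracy, specificity and sensitivity are usually derived from binary predictions. The function name and the label convention (1 = defective part, 0 = good part) are assumptions made for illustration and do not come from the paper.

```python
def binary_qc_metrics(y_true, y_pred):
    """Return (accuracy, specificity, sensitivity) for 0/1 labels.

    Assumed convention (not from the paper): 1 = defective part, 0 = good part.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # defects caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # good parts accepted
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # good parts rejected
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # defects missed

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty classes so the sketch never divides by zero.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return accuracy, specificity, sensitivity
```

When defects are rare, accuracy and specificity can both be high while sensitivity stays lower, which is why the three values are reported together.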
Simple Non Regressive Informed Machine Learning Model for Predictive Maintenance of Railway Critical Assets
Luca Oneto, Simone Minisi, Andrea Garrone, Renzo Canepa, Carlo Dambra, Davide Anguita
https://doi.org/10.14428/esann/2022.ES2022-59
Abstract:
Signals, track circuits, switches, and relay rooms are simultaneously the most critical and most maintained railway assets. A fault in one of these assets may strongly reduce the railway network capacity or even disrupt circulation. Effectively predicting which assets may need maintenance makes it possible to anticipate the intervention, thus avoiding a failure. Currently, this problem is tackled by infrastructure managers relying mostly on operators' experience, with limited help from decision support tools. In this paper, we propose a Simple Informed Machine Learning (ML) based model able to automatically predict which assets need to be maintained while fully leveraging the operators' experience. However, ML models in modern industrial MLOps pipelines demand continuous data collection, model re-training, testing, and monitoring, creating a large technical debt. In fact, one of the main requirements of these pipelines is to be non-regressive, i.e., not only to improve average performance but also to avoid incorrectly predicting an output that was correctly classified by the reference model (negative flips). In this work we address this problem by endowing the proposed ML model with Non Regressive properties. Results on real data coming from a portion of an Italian Railway Network managed by Rete Ferroviaria Italiana, the Italian Infrastructure Manager, support our proposal.
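The notion of a negative flip can be made concrete with a short sketch. It counts the samples that a reference model classified correctly and an updated model gets wrong; the function name and the toy labels are hypothetical and only illustrate the definition given in the abstract, not the authors' model.

```python
def negative_flip_rate(y_true, ref_pred, new_pred):
    """Fraction of samples the reference model got right and the new model gets wrong."""
    flips = sum(
        1
        for truth, ref, new in zip(y_true, ref_pred, new_pred)
        if ref == truth and new != truth
    )
    return flips / len(y_true)


# Toy example: both models are right on 3 of 4 samples,
# yet the new model breaks one prediction the reference had right.
y_true = [0, 1, 1, 0]
ref_pred = [0, 1, 0, 0]
new_pred = [0, 0, 1, 0]
rate = negative_flip_rate(y_true, ref_pred, new_pred)
```

The toy example shows why average accuracy alone does not capture regressions: two models with the same accuracy can still disagree on which samples they get right.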
Price direction prediction in financial markets, using Random Forest and Adaboost
Mohammadmahdi Ghahramani, Fabio Aiolli
https://doi.org/10.14428/esann/2022.ES2022-115
Abstract:
Experience shows that trading in financial markets can be highly profitable. In this light, a great deal of effort has been devoted to using machine learning to predict market behavior. By using Random Forest and Adaboost models, we present a novel method for modeling candlestick patterns in financial markets. Our first contribution, in the preprocessing part, is to prepare data, develop additional features, and modify data. Our second contribution is a novel prediction approach, named dataset ensembling, to predict daily prices. Using three years of daily Bitcoin prices, the models are trained, tuned and then tested on one year of unseen data, showing the feasibility of the approach in terms of accuracy.
Learning theory and principles
Multioutput Regression Neural Network Training via Gradient Boosting
seyedsaman emami, Gonzalo Martínez-Muñoz
https://doi.org/10.14428/esann/2022.ES2022-95
Abstract:
A novel sequential procedure based on Gradient Boosting is proposed to train the final layers of a multi-output regression neural network (NN), in which the NN is the additive expansion built by Gradient Boosting. The method trains portions of the network iteratively, in such a way that each new portion of the NN is learnt to compensate for the errors of the already trained portions, and the final output of the network is formed from the provided weights and the last hidden layer output. This is in contrast to the standard training of NNs, in which the whole network is trained to learn the concept at hand. Extensive experiments show the good performance of the proposed method with respect to standard NNs.
Do We Really Need a New Theory to Understand the Double-Descent?
Luca Oneto, Sandro Ridella, Davide Anguita
https://doi.org/10.14428/esann/2022.ES2022-13
Abstract:
This century saw an unprecedented increase of public and private investments in Artificial Intelligence (AI) and especially in Machine Learning (ML). This led to breakthroughs in their practical ability to solve complex real-world problems, impacting research and society at large. In contrast, our ability to understand the fundamental mechanisms behind these breakthroughs has slowed down because of their increased complexity. This has led researchers to question whether a new theoretical framework is needed to help them catch up on this lag. One of the mechanisms that is still not well understood is the so-called over-parametrization, namely the ability of certain models to increase their generalization performance (reduce test error) when the number of parameters is above the interpolating threshold (zero training error), and the associated double-descent curve. In this paper we show that this phenomenon can be better understood using both known theory, i.e., algorithmic stability theory, and empirical evidence.
Filtering participants improves generalization in competitions and benchmarks
Adrien Pavao, Isabelle Guyon, Zhengying Liu
https://doi.org/10.14428/esann/2022.ES2022-72
Abstract:
We address the problem of selecting a winning algorithm in a challenge or benchmark. While evaluations of algorithms carried out by third party organizers eliminate the inventor-evaluator bias, little attention has been paid to the risk of over-fitting the winner's selection by the organizers. In this paper, we carry out an empirical evaluation using the results of several challenges and benchmarks, evidencing this phenomenon. We show that a heuristic commonly used by organizers, which consists of pre-filtering participants using a trial run, reduces over-fitting. We formalize this method and derive a semi-empirical formula to determine the optimal number k of top participants to retain from the trial run.
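The pre-filtering heuristic described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the dictionary-based score format and the convention that higher scores are better are hypothetical, and the paper's semi-empirical formula for choosing k is not reproduced here.

```python
def select_winner(trial_scores, final_scores, k):
    """Keep the k best participants of the trial run, then pick the best of them on the final run.

    Both arguments map a participant name to a score; higher is assumed to be better.
    """
    # Rank participants by their trial-run score, best first.
    ranked = sorted(trial_scores, key=trial_scores.get, reverse=True)
    finalists = ranked[:k]
    # The winner is chosen among the finalists only, using the final-run score.
    return max(finalists, key=lambda name: final_scores[name])


trial = {"a": 0.90, "b": 0.80, "c": 0.30}
final = {"a": 0.70, "b": 0.75, "c": 0.95}
winner_filtered = select_winner(trial, final, k=2)    # "c" is filtered out by the trial run
winner_unfiltered = select_winner(trial, final, k=3)  # no filtering: "c" wins on the final run
```

In the toy data, participant "c" performs poorly on the trial run but well on the final run. With k=2 such a possibly lucky final score cannot decide the winner, which is the intuition behind filtering as a guard against over-fitting the selection.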
Sliced-Wasserstein normalizing flows: beyond maximum likelihood training
Florentin Coeurdoux, Nicolas Dobigeon, Pierre Chainais
https://doi.org/10.14428/esann/2022.ES2022-101
Abstract:
Despite their advantages, normalizing flows generally suffer from several shortcomings, including their tendency to generate unrealistic data (e.g., images) and their failure to detect out-of-distribution data. One reason for these deficiencies lies in the training strategy, which traditionally exploits a maximum likelihood principle only. This paper proposes a new training paradigm based on a hybrid objective function combining the maximum likelihood principle (MLE) and a Sliced-Wasserstein distance. Results obtained on synthetic toy examples and real image data sets show better generative abilities in terms of both likelihood and visual aspects of the generated samples. Reciprocally, the proposed approach leads to a lower likelihood of out-of-distribution data, demonstrating a greater data fidelity of the resulting flows.
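As background for the Sliced-Wasserstein term mentioned above, here is a minimal Monte Carlo sketch of the squared sliced 2-Wasserstein distance between two samples. It assumes equally sized samples given as lists of coordinate lists; the function name, the number of projections and the seed are illustrative choices, and the way the paper combines this term with the likelihood is not reproduced.

```python
import math
import random


def sliced_wasserstein2(xs, ys, n_projections=50, seed=0):
    """Monte Carlo estimate of the squared sliced 2-Wasserstein distance.

    xs and ys are equally sized lists of points (lists of floats of equal dimension).
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally sized samples")
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(n_projections):
        # Draw a random direction on the unit sphere.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in direction))
        direction = [c / norm for c in direction]
        # Project both samples onto the direction and sort: in 1D, optimal
        # transport between equal-size samples matches sorted values.
        px = sorted(sum(a * b for a, b in zip(p, direction)) for p in xs)
        py = sorted(sum(a * b for a, b in zip(p, direction)) for p in ys)
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(px)
    return total / n_projections
```

Because every projection reduces the problem to one dimension, the estimate only needs sorting, which is what makes the sliced distance cheap enough to use as a training term.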
A Fast and Simple Evolution Strategy with Covariance Matrix Estimation
Oliver Kramer
https://doi.org/10.14428/esann/2022.ES2022-112
Abstract:
With the rise of A.I. methods, the demand for efficient optimization methods that are easy to implement and use is increasing. This paper introduces a simple optimization method for numerical blackbox optimization. It proposes to apply covariance matrix estimation to the (1+1)-ES with Rechenberg's step size control. Experiments on a small set of benchmark functions demonstrate that the approach outperforms its isotropic variant, allowing competitive convergence on problems with scaled and correlated dimensions.
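For orientation, the isotropic baseline mentioned above can be sketched as follows. This is a minimal (1+1)-ES with a commonly used per-iteration form of Rechenberg's 1/5th success rule; the constants, names and the sphere test function are assumptions for illustration, and the covariance matrix estimation proposed in the paper is deliberately not included.

```python
import random


def one_plus_one_es(f, x0, sigma=1.0, iterations=2000, seed=0):
    """Minimize f with an isotropic (1+1)-ES and a 1/5th success rule."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    factor = 1.5  # assumed adaptation constant, not taken from the paper
    for _ in range(iterations):
        # Isotropic Gaussian mutation of the single parent.
        child = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
        fc = f(child)
        if fc <= fx:
            # Success: accept the child and enlarge the step size.
            x, fx = child, fc
            sigma *= factor
        else:
            # Failure: shrink the step size; the exponents balance out
            # when roughly one mutation in five succeeds.
            sigma *= factor ** -0.25
    return x, fx


def sphere(x):
    return sum(xi * xi for xi in x)


best_x, best_f = one_plus_one_es(sphere, [3.0] * 5)
```

On scaled or correlated problems an isotropic mutation of this kind wastes samples in unpromising directions, which is the motivation for estimating a covariance matrix.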
Constraint Guided Gradient Descent: Guided Training with Inequality Constraints
Quinten Van Baelen, Peter Karsmakers
https://doi.org/10.14428/esann/2022.ES2022-105
Abstract:
Deep learning is typically performed by learning a neural network solely from data in the form of input-output pairs, ignoring available domain knowledge. In this work, the Constraint Guided Gradient Descent (CGGD) framework is proposed, which enables the injection of domain knowledge into the training procedure. The domain knowledge is assumed to be described as a conjunction of hard inequality constraints, which appears to be a natural choice for several applications. Compared to other neuro-symbolic approaches, the proposed method converges to a model that satisfies any inequality constraint on the training data and does not require first transforming the constraints into some ad-hoc term that is added to the learning (optimisation) objective. Under certain conditions, it is shown that CGGD can converge to a model that satisfies the constraints on the training set, while prior work does not necessarily converge to such a model. It is empirically shown on two independent and small data sets that CGGD makes training less dependent on the initialisation of the network and improves the constraint satisfiability on all data.
Bayes Point Rule Set Learning
Mirko Polato, Fabio Aiolli, Luca Bergamin, Tommaso Carraro
https://doi.org/10.14428/esann/2022.ES2022-108
Abstract:
This paper proposes an effective bottom-up extension of the popular FIND-S algorithm to learn (monotone) DNF-type rulesets. The algorithm greedily finds a partition of the positive examples. The produced monotone DNF is a set of conjunctive rules, each corresponding to the most specific rule consistent with a part of the positive examples and all negative examples. We also propose two principled extensions of this method, approximating the Bayes Optimal Classifier by aggregating monotone DNF decision rules. Finally, we provide a methodology to improve the explainability of the learned rules while retaining their generalization capabilities. An extensive comparison with state-of-the-art symbolic and statistical methods on several benchmark data sets shows that our proposal provides an excellent balance between explainability and accuracy.
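The building block behind the abstract above, the most specific monotone conjunction consistent with a group of positive examples, can be illustrated with a small sketch. It assumes binary feature vectors and shows only the classic FIND-S step restricted to monotone conjunctions; the function names are hypothetical and the greedy partitioning and Bayes-point aggregation of the paper are not reproduced.

```python
def most_specific_rule(positives):
    """Most specific monotone conjunction covering all given examples.

    Examples are 0/1 lists; the rule is the set of feature indices that are 1 in every example.
    """
    rule = set(range(len(positives[0])))
    for example in positives:
        rule &= {i for i, value in enumerate(example) if value == 1}
    return rule


def covers(rule, example):
    """A monotone conjunction fires when all of its features are 1."""
    return all(example[i] == 1 for i in rule)


def consistent(rule, negatives):
    """A rule is consistent with the negatives if it covers none of them."""
    return not any(covers(rule, example) for example in negatives)


rule = most_specific_rule([[1, 1, 0], [1, 1, 1]])  # features 0 and 1 are shared
ok = consistent(rule, [[1, 0, 1], [0, 1, 1]])      # neither negative has both features set
```

A monotone DNF is then a set of such rules, one per group of positive examples, and an example is classified positive when any rule covers it.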
Neural-network-based estimation of normal distributions in black-box optimization
Jiří Tumpach, Jan Koza, Martin Holeňa
https://doi.org/10.14428/esann/2022.ES2022-113
Abstract:
The paper presents a novel application of artificial neural networks (ANNs) in the context of surrogate models for black-box optimization, i.e. optimization of objective functions that are accessed through empirical evaluation. For active learning of surrogate models, learning multidimensional normal distributions plays a very important role, and Gaussian processes (GPs) have traditionally been used for it. The research reported in this paper instead evaluated the applicability of two ANN-based methods to this end: combining GPs with ANNs and learning normal distributions with evidential ANNs. After sketching the methods, the paper compares them on a large collection of data from surrogate-assisted black-box optimization. It shows that combining GPs using linear covariance functions with ANNs yields lower errors than the investigated methods of evidential learning.
Deep learning, signal, image
Feature Compression Using Dynamic Switches in Multi-split CNNs
Suresh Kirthi Kumaraswamy, Alexey Ozerov, Ngoc Q. K. Duong, Anne Lambert, François Schnitzler, Patrick Fontaine
https://doi.org/10.14428/esann/2022.ES2022-18
Abstract:
Convolutional neural networks (CNNs) are often computationally demanding for mobile devices. Offloading some computation lowers this burden: initial convolutional layers are processed on a smartphone, the resulting high-dimensional features are transmitted, and later layers are processed in the cloud, at the edge or on another device. To improve this process, we propose Dynamic Switch, a convolutional subnetwork enabling anywhere-splittable CNNs with multirate feature compression using a single set of network parameters. We achieve 90% feature compression with at most 3% accuracy loss for MobileNet and MSDNet on the ImageNet dataset and at most 4.58% on the CIFAR100 dataset with MSDNet, ResNet-18, MobileNet/MobileNetv2 and ShuffleNet/ShuffleNetv2.
Hyperspectral Wavelength Analysis with U-Net for Larynx Cancer Detection
Felix Meyer-Veit, Rania Rayyes, Andreas O. H. Gerstner, Jochen J. Steil
https://doi.org/10.14428/esann/2022.ES2022-100
Abstract:
Early detection of laryngeal tumors is critical for their successful therapy. In this paper, we investigate how hyperspectral (HS) imaging can contribute to this aim based on an in-vivo data set of 13 HS image cubes recorded in clinical practice. We perform semantic segmentation with a tailored U-Net trained on labels provided by the clinicians. We specifically investigate the influence of exposure time during image acquisition and the suitable wavelengths to determine the most informative image channels, and we present quantitative results on accuracy and the AUC measure.
Lightening CNN architectures by regularization driven weights' pruning
Giovanni Bonetta, Rossella Cancelliere
https://doi.org/10.14428/esann/2022.ES2022-102
Abstract:
Deep learning models are getting increasingly big, leading towards overparametrized architectures with high computational and storage requirements. This hinders the possibility of training or deploying them on IoT or mobile devices, while also creating concerns about their environmental footprint. We propose a regularization technique that selectively shrinks the norm of non-significant weights in order to subsequently prune them, generating highly compressed models. We tested the proposed technique on three well-known image classification tasks, obtaining results on par with or better than competitors in terms of sparsity and metrics.
1D vs 2D convolutional neural networks for scalp high frequency oscillations identification
Gaëlle MILON-HARNOIS, Nisrine JRAD, Daniel Schang, Patrick VAN BOGAERT, Pierre CHAUVET
https://doi.org/10.14428/esann/2022.ES2022-84
Abstract:
Scalp High Frequency Oscillations (HFOs) are promising biomarkers of epileptogenic zones. Since visual detection of HFOs is strenuous, there is a real need to develop accurate automatic HFO detectors. In this paper, we present a comparative study of two detectors: a one-dimensional (1D) Convolutional Neural Network (CNN) running on High-Density Electroencephalogram signals and a two-dimensional (2D) CNN running on time-frequency maps of those signals. Experimental results show that the 1D-CNN enables easy end-to-end learning of preprocessing, feature extraction and classification modules while achieving competitive performance.
Deep latent position model for node clustering in graphs
Dingge Liang, Marco Corneli, Charles Bouveyron, Pierre Latouche
https://doi.org/10.14428/esann/2022.ES2022-30
Abstract:
With the significant increase of interactions between individuals through digital means, the clustering of vertices in graphs has become a fundamental approach for analysing large and complex networks. We propose here the deep latent position model (DeepLPM), an end-to-end clustering approach which combines the widely used latent position model (LPM) for network analysis with a graph convolutional network (GCN) encoding strategy. Thus, DeepLPM can automatically assign each node to its group without using any additional algorithms and better preserves the network topology. Numerical experiments on simulated data and an application to the Cora citation network are conducted to demonstrate its effectiveness and interest in performing unsupervised clustering tasks.
Deep Convolutional Neural Networks with Sequentially Semiseparable Weight Matrices
Matthias Kissel, Klaus Diepold
https://doi.org/10.14428/esann/2022.ES2022-21
Abstract:
Modern Convolutional Neural Networks (CNNs) comprise millions of parameters. Therefore, the use of these networks requires high computing and memory resources. We propose to reduce these resource requirements by using structured matrices. To this end, we replace the weight matrices of the fully connected classifier part of several pre-trained CNNs with Sequentially Semiseparable (SSS) matrices. This drastically reduces both the number of parameters in these layers and the number of operations required to evaluate them. We show that approximating the original weight matrices with SSS matrices and then training with gradient descent leads to the best prediction results (compared to just approximating or training from scratch).
Deep networks with ReLU activation functions can be smooth statistical models
Joseph Rynkiewicz
https://doi.org/10.14428/esann/2022.ES2022-20
Abstract:
Most deep neural networks use ReLU activation functions. Since these functions are not differentiable at $0$, one might expect such models to behave irregularly. In this paper, we show that the issue lies more in the data than in the model: if the data are "smooth", the model is differentiable in a suitable sense. We give a striking illustration of this fact with the example of adversarial attacks.
PCA improves the adversarial robustness of neural networks
István Megyeri, Ammar Al-Najjar
https://doi.org/10.14428/esann/2022.ES2022-96
Abstract:
Deep neural networks perform well in many visual recognition tasks, but they are sensitive to adversarial input perturbation. More robust models can be learned when attacks are applied to the training data or preprocessing is used. However, the effect of preprocessing is frequently underestimated and it has not received sufficient attention as it usually does not affect the network's clean accuracy. Here, we seek to demonstrate that preprocessing can play a role in improving adversarial robustness. Our empirical results show that principal component analysis, a simple yet effective preprocessing method, can significantly improve neural networks' robustness for both regular and adversarial training.
Battery detection of XRay images using transfer learning
Nermeen Abou Baker, David Rohrschneider, Uwe Handmann
https://doi.org/10.14428/esann/2022.ES2022-60
Abstract:
The need for detecting and sorting batteries is drastically increasing for many applications. This study demonstrates the potential of transfer learning for predicting whether an image contains a battery, locating it, and identifying three types of batteries, namely prismatic, pouch, and cylindrical Lithium-Ion Batteries (LIB). In particular, it applies transfer learning in two steps: training on a large-scale dataset to detect electronic devices using a pre-trained YOLOv5m, then using the resulting weights to detect and classify the batteries. Battery detection reaches a precision of 94%, which outperforms the pre-trained YOLOv5m weights by 5%, with an inference time of 22 ms.
Real-time capable Ensemble Estimation for 2D Object Detection
Lukas Enderich, Simon Heming
https://doi.org/10.14428/esann/2022.ES2022-41
Abstract:
Deep neural networks tend to make overconfident predictions. Although ensemble methods improve the predictive performance by producing better-calibrated confidences, they are computationally expensive. Thus, we propose a real-time capable ensemble method for object detection that significantly improves the performance with only a minor increase in runtime. Our method diversifies the prediction of the class probabilities on the anchor space using multiple classification heads. A regularization further increases the diversity of the heads, making ensemble distillation unnecessary. On the KITTI benchmark dataset, our approach increases the mean average precision of an SSD-based network from 0.58 to 0.71.
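As a rough illustration of how several classification heads can act as an ensemble, the sketch below averages the softmax outputs of the heads for a single anchor. Averaging is an assumption made here for illustration; the abstract does not state how the heads are combined, and the function names and logits are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_probabilities(head_logits):
    """Average the per-head class probabilities for one anchor (assumed combination rule)."""
    per_head = [softmax(logits) for logits in head_logits]
    n_heads = len(per_head)
    n_classes = len(per_head[0])
    return [sum(probs[c] for probs in per_head) / n_heads for c in range(n_classes)]


# Two hypothetical heads that disagree confidently on a two-class anchor.
combined = ensemble_probabilities([[2.0, 0.0], [0.0, 2.0]])
```

When the heads disagree, the averaged probabilities move toward uniform, which is the calibration effect ensembles are used for.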
Appearance-Context aware Axial Attention for Fashion Landmark Detection
Nikhil Kilari, Gaurab Bhattacharya, Pavan Kumar Reddy K, Jayavardhana Gubbi, Arpan Pal
https://doi.org/10.14428/esann/2022.ES2022-74
Abstract:
Fashion landmark detection is a fundamental task in several fashion image analysis problems. The associated challenges, involving non-rigid structures and variations in style and orientation, make it extremely hard to accurately detect the landmarks. In this paper, we propose the Appearance-Context network (ACNet), which encapsulates both global and local contextual information by extending the axial attention mechanism. We design an axial-attention-augmented local appearance network and introduce a novel Global-Context aware axial attention module which aggregates the global features by attending to discriminatory cues across the height, width and channel axes. The proposed ACNet architecture outperforms existing methods on two large-scale fashion landmark datasets.
ROP inception: signal estimation with quadratic random sketching
Remi Delogne, Vincent Schellekens, Laurent Jacques
https://doi.org/10.14428/esann/2022.ES2022-97
Abstract:
Rank-one projections (ROP) of matrices and quadratic random sketching of signals support several data processing and machine learning methods, as well as recent imaging applications, such as phase retrieval or optical processing units. In this paper, we demonstrate how signal estimation can be performed directly on such quadratic sketches (equivalent to the ROPs of the "lifted signal" obtained as its outer product with itself) without explicitly reconstructing that signal. Our analysis relies on showing that, up to a minor debiasing trick, the ROP measurement operator satisfies a generalised sign product embedding (SPE) property. In a nutshell, the SPE shows that the scalar product of a signal sketch with the sign of the sketch of a given pattern approximates the square of the projection of that signal on this pattern. This amounts to an insertion (an inception) of a ROP model inside a ROP sketch. The effectiveness of our approach is evaluated in several synthetic experiments.
Semi-synthetic Data for Automatic Drone Shadow Detection
Mohammed El Amine Mokhtari, Virginie Vandenbulcke, Sohaib Laraba, Matei Mancas, Elias Ennadifi, Mohamed Lamine Tazir, Bernard Gosselin
https://doi.org/10.14428/esann/2022.ES2022-82
Abstract:
In this paper, we deal with the problem of detecting the shadows of UAVs, which impact their navigation. We propose to generate synthetic images containing shadows in random locations, backgrounds, sizes, and opacities in order to augment our dataset. The generated data is used to train and compare several models to effectively detect UAV shadows in real time, which will help to stabilize their localization and navigation. Deep learning models such as SSD, YOLOv3, and YOLOv5 are tested for the detection part. With our approach, we achieved a mean average precision of 99% when using YOLOv5.
Deep learning for Parkinson’s disease symptom detection and severity evaluation using accelerometer signal
Tomasz Gutowski
https://doi.org/10.14428/esann/2022.ES2022-107
Abstract:
This paper presents a neural network for predicting the severity/presence of Parkinson’s disease motor symptoms – tremor, bradykinesia and dyskinesia, based on accelerometer signals collected while the patient is executing selected tasks. The suggested network uses accelerometer signals as input along with the type of completed task and the side the device is worn on. The data was collected in the Levodopa Response Study funded by MJFF. The model has been trained for every symptom separately and the results have helped to identify the tasks that result in the best accuracy of symptom detection and evaluation.
Anomaly and change point detection
Challenges in anomaly and change point detection
Madalina Olteanu, Fabrice Rossi, Florian Yger
https://doi.org/10.14428/esann/2022.ES2022-6
Abstract:
This paper presents an introduction to the state of the art in anomaly and change-point detection. On the one hand, the main concepts needed to understand the vast scientific literature on those subjects are introduced. On the other hand, a selection of important surveys and books, as well as two selected active research topics in the field, are presented.
Anomaly detections on the oil system of a turbofan engine by a neural autoencoder
Jean Coussirou, Thomas Vanaret, Jérôme Lacaille
https://doi.org/10.14428/esann/2022.ES2022-24
Abstract:
The turbofan engine uses oil to lubricate and cool its components. This extremely sensitive system can cause in-flight engine shutdowns in the event of a failure. This article presents the implementation of a fully automatic anomaly detection system capable of detecting both known phenomena and exceptional cases using weak signals.
Contrasting Explanation of Concept Drift
Fabian Hinder, André Artelt, Valerie Vaquet, Barbara Hammer
https://doi.org/10.14428/esann/2022.ES2022-71
Abstract:
The notion of concept drift refers to the phenomenon that the distribution underlying the observed data changes over time; as a consequence, machine learning models may become inaccurate and need adjustment. While methods exist to detect concept drift or to adjust models in the presence of observed drift, the question of explaining drift is still largely unsolved. This problem is important, since solving it enables an understanding of the most prominent drift characteristics. In this work, we propose to explain concept drift by means of contrasting explanations describing characteristic changes of spatial features. We demonstrate the usefulness of the explanation in several examples.
Anomaly detection and representation learning in an instrumented railway bridge
Yacine Bel-Hadj, Wout Weijtjens, Francisco de Nolasco Santos
https://doi.org/10.14428/esann/2022.ES2022-29
Abstract:
In this contribution, the strain measurements of a railway bridge are used for anomaly detection, in the context of Structural Health Monitoring (SHM). The methodology used is a combination of a sparse convolutional autoencoder (CSAE) and a Mahalanobis distance. Due to the lack of labeled anomalous data, a simulated fault is used to evaluate the performance of the algorithm. The proposed approach far outperforms the classical feature-based approach. Finally, the latent dimension of the autoencoder is studied and shown to be structured and representative of the underlying physics of the problem.
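The Mahalanobis-distance scoring step can be sketched as follows. This is a minimal, dependency-free illustration that assumes the distance is computed on two-dimensional latent codes of healthy data; the abstract does not say which quantity the distance is applied to, and the function names and toy values are hypothetical.

```python
import math


def fit_gaussian_2d(points):
    """Return the sample mean and covariance entries (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    return (mean_x, mean_y), (sxx, sxy, syy)


def mahalanobis_2d(point, mean, cov):
    """Mahalanobis distance of a 2-D point, using the closed-form 2x2 inverse covariance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    quadratic_form = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(quadratic_form)


# Hypothetical latent codes of healthy measurements.
healthy_codes = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
mean, cov = fit_gaussian_2d(healthy_codes)
score_typical = mahalanobis_2d((1.0, 1.0), mean, cov)
score_anomalous = mahalanobis_2d((9.0, 9.0), mean, cov)
```

A sample would be flagged as anomalous when its score exceeds a threshold chosen on healthy data; the threshold itself is not specified in the abstract.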
Deep Semantic Segmentation Models in Computer Vision
Deep Semantic Segmentation Models in Computer Vision
Paolo Andreini, Giovanna Maria Dimitri
https://doi.org/10.14428/esann/2022.ES2022-5
Abstract:
Recently, deep learning models have had a huge impact on computer vision applications, in particular in semantic segmentation, in which many challenges are open. As an example, the lack of large annotated datasets implies the need for new semi-supervised and unsupervised techniques. This problem is particularly relevant in the medical field due to privacy issues and high costs of image tagging by medical experts. The aim of this tutorial paper is to provide a short overview of the recent results and advances regarding deep learning applications in computer vision, particularly with regard to semantic segmentation.
Deep Semantic Segmentation in Skin Detection
Daniela Cuza, Andrea Loreggia, Alessandra Lumini, Loris Nanni
https://doi.org/10.14428/esann/2022.ES2022-35
Abstract:
Deep semantic segmentation is a task that identifies objects and their boundaries in images. To do that, a classification task is performed at the pixel level to tag whether a pixel belongs to an object. In skin detection, areas of images are classified as skin or non-skin regions. In this work, we report a short survey of the recent literature covering the task, to help researchers select the most suitable method for their application and to expand the knowledge about the available datasets for this topic. A compact empirical evaluation comparing recent models and a new ensemble model is reported.
A weakly supervised approach to skin lesion segmentation
Simone Bonechi
https://doi.org/10.14428/esann/2022.ES2022-46
Abstract:
Early detection of skin cancers greatly increases patients' chances of recovery. To support dermatologists in this diagnosis, many decision support systems based on Convolutional Neural Networks have recently been proposed to segment the lesion and classify it. The use of the information coming from the segmentation, as an additional input to the classifier, proved to be fundamental to increase its performance and, in fact, the shape of the lesion is of diagnostic importance unanimously recognized by clinicians. However, in the ISIC database, the public reference dataset that collects a huge number of skin lesion images, all samples are labeled for classification but only a very small fraction of them are also labeled for segmentation. To overcome this limitation, the present paper proposes a weakly supervised approach to extract the segmentation label maps of approximately 43,000 ISIC images, used to train a segmentation network, with very promising performance.
A Deep Learning approach for oocytes segmentation and analysis
Paolo Andreini, Niccolò Pancino, Filippo Costanti, Gabriele Eusepi, Barbara Toniella Corradini
https://doi.org/10.14428/esann/2022.ES2022-44
Abstract:
Medical Assisted Procreation (MAP) has seen a sharp increase in demand over the past decade, due to a variety of reasons, including genetic factors, health conditions altered by stress and pollution, as well as delayed pregnancy and age-related loss of fertility. The success of MAP techniques is strongly correlated to the dexterity of a human operator, who is asked to classify and select healthy oocytes to fertilize and return to the uterus. This work describes a deep learning approach to the segmentation of oocyte images, intended to support operators in their selection and thereby improve the success probability of MAP.
Deep Learning Approaches for mice glomeruli segmentation
Duccio Meconcelli, Simone Bonechi, Giovanna Maria Dimitri
https://doi.org/10.14428/esann/2022.ES2022-40
Abstract:
Deep learning (DL) is widely applied in biomedical image processing nowadays. In this paper, we propose the use of DL architectures for glomerulus segmentation in histopathological images of mouse kidneys. Indeed, in humans, the analysis of the glomeruli is fundamental to decide on the transplantability of the organ. However, no datasets with human samples are publicly available. Therefore, obtaining good segmentation performance on the kidneys of mice could be the first step for a transfer learning approach to humans. We compared the use of two well-known architectures for image segmentation, namely MobileNet and DeepLab V2. Both models showed very promising results.
Detection and Localization of GAN Manipulated Multi-spectral Satellite Images
Lydia Abady, Giovanna Maria Dimitri, Mauro Barni
https://doi.org/10.14428/esann/2022.ES2022-39
Abstract:
Owing to their realistic features and continuous improvements, images manipulated by Generative Adversarial Networks (GANs) have become a compelling research topic. In this paper, we apply detection and localization to GAN-manipulated images by means of models based on EfficientNet-B4 architectures. Detection is tested on multiple generated multi-spectral datasets from several world regions and different GAN architectures, whereas localization is tested on a dataset of inpainted images of size 2048×2048×13. The results obtained for both detection and localization are shown to be promising.
Regression and forecasting
Wind power forecasting based on bagging extreme learning machine ensemble model
Matheus Henrique Dal Molin Ribeiro, Sinvaldo Rodrigues Moreno, Ramon Gomes da Silva, José Henrique Kleinubing Larcher, Cristiane Canton, Viviana Cocco Mariani, Leandro dos Santos Coelho
https://doi.org/10.14428/esann/2022.ES2022-117
Abstract:
The wind energy forecast is a useful tool for wind farm production planning and operation, facilitating decision making in terms of maintenance, electricity market clearing, and load sharing. This study proposes a cooperative ensemble learning model, using time series pre-processing, multi-objective optimization, and artificial intelligence to forecast wind energy generation in two wind farms in Brazil. Multi-objective optimization is employed to combine variational mode decomposition-based components of a model with bootstrap aggregation (bagging) and extreme learning machine models. Forecasting accuracy is evaluated through the root mean squared error, mean absolute error, mean absolute percentage error, and the Diebold-Mariano hypothesis test. The empirical results suggest that the proposed ensemble learning model achieved better forecasting performance than bootstrap stacking, machine learning, artificial neural networks, and statistical models, with root mean squared error reductions of approximately 12.76%, 25.25%, 31.91%, and 34.76%, respectively, for out-of-sample forecasting.
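The three error measures named in the abstract, and a relative RMSE reduction, can be computed as in the sketch below. The relative-reduction formula is an assumption about how the reported percentages were obtained, and the function names and toy values are illustrative only.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse_reduction(rmse_baseline, rmse_model):
    """Relative RMSE reduction of a model over a baseline, in percent (assumed definition)."""
    return 100.0 * (rmse_baseline - rmse_model) / rmse_baseline


# Toy out-of-sample values, not taken from the paper.
actual = [10.0, 20.0, 40.0]
predicted = [12.0, 18.0, 40.0]
errors = (rmse(actual, predicted), mae(actual, predicted), mape(actual, predicted))
```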
Dynamics-aware Representation Learning via Multivariate Time Series Transformers
Michael Potter, Ilkay Yildiz Potter, Octavia Camps, Mario Sznaier
https://doi.org/10.14428/esann/2022.ES2022-12
Abstract:
We propose a novel multivariate time series autoencoder, which produces interpretable linear-dynamical latent features that govern the predictions for several downstream tasks. To this end, we combine a transformer autoencoder with a dynamical atoms-based autoencoder to mimic Koopman operators in the latent space, without the need for finding eigenfunctions or their low-dimensional approximations. We demonstrate that our approach significantly outperforms deep Koopman operator learning baselines for time series forecasting on chaotic systems such as the Lorenz attractor. Furthermore, the dynamics-aware representations, combined with a transformer classifier, lead to state-of-the-art classification accuracy on four benchmark multivariate time series datasets.
Minkowski logarithmic error: A physics-informed neural network approach for wind turbine lifetime assessment
Francisco de Nolasco Santos, Pietro D'Antuono, Nymfa Noppe, Wout Weijtjens, Christof Devriendt
https://doi.org/10.14428/esann/2022.ES2022-16
Abstract:
In this contribution, we present a physics-informed neural network (PINN) approach for wind turbine fatigue estimation. This PINN incorporates physical information about the structure's fatigue profile in its loss function, referred to as the Minkowski logarithmic error (MLE), an extension of the log loss to any given Lp space. The function is mathematically analysed and differentiated in order to better understand its behaviour. The results obtained using the MLE compare favourably with previous efforts using the mean squared logarithmic error. Finally, the long-term error is evaluated based on the effect of p.
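One plausible reading of "an extension of the log loss for any given Lp space" is the p-norm of the differences between log-transformed targets and predictions. The sketch below follows that reading. It is an assumption, not the paper's definition: the abstract gives no formula, and the function names and the use of `log(1 + x)` (borrowed from the usual mean squared logarithmic error) are illustrative choices.

```python
import math


def minkowski_log_error(actual, predicted, p=2.0):
    """Assumed form: p-norm of |log(1 + y) - log(1 + y_hat)| over all samples."""
    diffs = [abs(math.log1p(a) - math.log1p(q)) for a, q in zip(actual, predicted)]
    return sum(d ** p for d in diffs) ** (1.0 / p)


def msle(actual, predicted):
    """Mean squared logarithmic error, the baseline loss named in the abstract."""
    return sum((math.log1p(a) - math.log1p(q)) ** 2
               for a, q in zip(actual, predicted)) / len(actual)


# Hypothetical values chosen so that the log differences are 1 and 2.
actual = [math.expm1(1.0), math.expm1(2.0)]
predicted = [0.0, 0.0]
print(minkowski_log_error(actual, predicted, p=1.0))
print(minkowski_log_error(actual, predicted, p=2.0))
```

Under this assumed form, the case p = 2 is tied to the mean squared logarithmic error by `minkowski_log_error(...) ** 2 / n == msle(...)`, so varying p changes how strongly large log-scale errors dominate the loss.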
Improving Laplacian Pyramids Regression with Localization in Frequency and Time
Neta Rabin, Ben Hen, Ángela Fernández
https://doi.org/10.14428/esann/2022.ES2022-106
Abstract:
Auto-Adaptive Laplacian Pyramids (ALP) is an iterative kernel-based regression model. It constructs a multi-scale representation of the training data, where the multi-scale modes are average residuals. In this work, we demonstrate two extensions of the model. The first is a hybrid approach that combines ALP with Empirical Mode Decomposition to provide localization in the frequency domain. The second modifies ALP to fit datasets with non-uniform noise. This is achieved by computing the optimal stopping criterion in a point-dependent manner. Experimental results demonstrate these models for solar energy prediction and for forecasting epidemiological infections.
Gap filling in air temperature series by matrix completion methods
Benoît Loucheur, Pierre-Antoine Absil, Michel Journée
https://doi.org/10.14428/esann/2022.ES2022-67
Abstract:
Quality control of meteorological data is an important part of atmospheric analysis and prediction, as missing or erroneous data can have a negative impact on the accuracy of these environmental products. In practice, missing data in weather series are quite common and problematic for many uses. We have compared the performance of matrix completion methods with the state of the art to solve this missing data problem. The experiments were carried out using the daily minimum and maximum temperature measurements of the network of weather stations operated by the Royal Meteorological Institute (RMI) of Belgium.
Predicting Test Execution Times with Asymmetric Random Forests
Francisco Pereira, Helio Silva, João Gomes, Javam Machado
https://doi.org/10.14428/esann/2022.ES2022-114
Abstract:
Being able to estimate a test execution time is of fundamental importance when you need to prioritize tests. Furthermore, it is also important that an estimation algorithm does not underestimate the execution time, since time can be a hard constraint in many problems. If a test takes longer than expected, some test that is planned to be executed in the future may have to be cancelled. For such a scenario, in this paper, we developed two simple variants of the Random Forest regression algorithm to predict test execution times in storage diagnostics tests. The proposed methods are compared to a baseline time estimation method (already available in a commercial product) and other machine learning based models. On the basis of our experiments, we can state that the proposed variants achieved promising results when considering an asymmetric error metric.
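An asymmetric error metric of the kind the abstract mentions penalizes underestimation more heavily than overestimation. The sketch below shows one simple such metric. The abstract does not specify the paper's actual metric or its Random Forest variants, so the function name and the default weights are hypothetical.

```python
def asymmetric_error(actual, predicted, under_weight=3.0, over_weight=1.0):
    """Mean absolute error with a heavier penalty when the prediction is too low.

    under_weight and over_weight are illustrative defaults, not values from the paper.
    """
    total = 0.0
    for a, p in zip(actual, predicted):
        diff = a - p  # positive when the execution time was underestimated
        if diff > 0:
            total += under_weight * diff
        else:
            total += over_weight * (-diff)
    return total / len(actual)


# Hypothetical execution times in seconds.
print(asymmetric_error([10.0, 10.0], [8.0, 8.0]))    # both tests underestimated
print(asymmetric_error([10.0, 10.0], [12.0, 12.0]))  # both tests overestimated
```

With these weights, an estimate that is 2 seconds too low costs three times as much as one that is 2 seconds too high, which matches the motivation that an overrunning test may force later tests to be cancelled.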
Recurrent learning and reservoir computing
Orthogonality in Additive Echo State Networks
Andrea Ceni, Claudio Gallicchio
https://doi.org/10.14428/esann/2022.ES2022-47
Abstract:
Reservoir computing (RC) is a state-of-the-art approach for efficient training in temporal domains. In this paper, we explore new RC architectures that generalise the popular leaky echo state network model (leaky-ESN) by introducing an additive orthogonal term outside the nonlinear part of the ESN equation. We investigate the benefits of employing orthogonal matrices in ESNs both inside the nonlinearity and outside of it. We show empirically how to boost the memory capacity towards the theoretical maximum value while still preserving the power of nonlinear computations. Thus, we optimise the compromise between computing with memory and computing with nonlinearity. The proposed model is shown to outperform both leaky-ESN and orthogonal reservoir ESN models on tasks requiring nonlinear computations with memory.
Towards Better Transition Modeling in Recurrent Neural Networks: the Case of Sign Language Tokenization
Pierre Poitier, Jérôme Fink, Benoit Frénay
https://doi.org/10.14428/esann/2022.ES2022-50
Abstract:
Recurrent neural networks can be used to segment sequences such as videos, where transitions can be challenging to detect. This paper benchmarks strategies to better model the transition between states. The specific task of sign language (SL) video tokenization is chosen for the evaluation, as it remains challenging. Tokenizers are the cornerstone of natural language processing pipelines. Powerful tokenizers exist for text, but SL video tokenizers are still under development. The benchmarked strategies prove useful for improving SL video tokenization, but there is still room for improvement in modeling state transitions.
Federated Adaptation of Reservoirs via Intrinsic Plasticity
Valerio De Caro, Claudio Gallicchio, Davide Bacciu
https://doi.org/10.14428/esann/2022.ES2022-98
Abstract:
We propose a novel algorithm for performing federated learning with Echo State Networks (ESNs) in a client-server scenario. In particular, our proposal focuses on the adaptation of reservoirs by combining Intrinsic Plasticity with Federated Averaging. The former is a gradient-based method for adapting the reservoir's non-linearity in a local and unsupervised manner, while the latter provides the framework for learning in the federated scenario. We evaluate our approach on real-world datasets from human monitoring, in comparison with the previous approach for federated ESNs in the literature. Results show that adapting the reservoir with our algorithm provides a significant improvement in the performance of the global model.
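The server-side step of Federated Averaging can be sketched as a sample-size-weighted mean of the parameters each client sends back. The sketch below assumes that each client's locally adapted reservoir non-linearity is summarized as a flat list of numbers (for example, per-neuron gains). That representation, the function name, and the sample values are illustrative assumptions, not details from the paper.

```python
def federated_average(client_params, client_sizes):
    """Average client parameter vectors, weighting each client by its sample count."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(n * params[i] for params, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


# Two hypothetical clients holding 1 and 3 local sequences respectively.
client_params = [[1.0, 2.0], [3.0, 4.0]]
client_sizes = [1, 3]
print(federated_average(client_params, client_sizes))
```

In a full round, the server would broadcast the averaged vector back to the clients, which would then continue their local, unsupervised adaptation from it.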
Recurrent Restricted Kernel Machines for Time-series Forecasting
Arun Pandey, Hannes De Meulemeester, Henri De Plaen, Bart De Moor, Johan Suykens
https://doi.org/10.14428/esann/2022.ES2022-120
Abstract:
In this paper, we propose a novel method for time-series modeling and forecasting. It is based on the temporal formulation of Restricted Kernel Machines, leading to a dynamical equation in the latent variables. Forecasting involves finding the next latent variable and then solving a pre-image problem to predict a new point in the input space. Further, we benchmark our model on several standard data sets against other well-known time-series models.
Input Routed Echo State Networks
Luca Argentieri, Claudio Gallicchio, Alessio Micheli
https://doi.org/10.14428/esann/2022.ES2022-90
Abstract:
We introduce a novel Reservoir Computing (RC) approach for multi-dimensional temporal signals. Our proposal is based on routing the different dimensions of the driving input towards different dynamical sub-modules in a multi-reservoir architecture. At the same time, controllable interconnections among the sub-modules allow modeling the interplay between the different dynamics that might be required by the task. Experiments on synthetic and real-world time-series classification problems clearly show the advantages of the proposed approach in dealing with multi-dimensional signals in comparison to standard RC neural networks.
Natural language processing, and recommender systems
Attention-based Ingredient Phrase Parser
Zhengxiang Shi, Pin Ni, Meihui Wang, To Eun Kim, Aldo Lipani
https://doi.org/10.14428/esann/2022.ES2022-10
Abstract:
As virtual personal assistants have now penetrated the consumer market, with products such as Siri and Alexa, the research community has produced several works on task-oriented dialogue tasks such as hotel booking, restaurant booking, and movie recommendation. Assisting users to cook is one of the tasks that intelligent assistants are expected to solve, where ingredients and their corresponding attributes, such as name, unit, and quantity, should be provided to users precisely and promptly. However, existing ingredient information scraped from cooking websites is in unstructured form with huge variation in lexical structure, for example, “1 garlic clove, crushed”, and “1 (8 ounce) package cream cheese, softened”, making it difficult to extract information exactly. To provide an engaging and successful conversational service to users for cooking tasks, we propose a new ingredient parsing model that can parse an ingredient phrase from a recipe into a structured form with its corresponding attributes, with an F1-score of over 0.93. Experimental results show that our model achieves state-of-the-art performance on the AllRecipes and Food.com datasets.
Neural Architecture Search for Sentence Classification with BERT
Philip Kenneweg, Sarah Schröder, Barbara Hammer
https://doi.org/10.14428/esann/2022.ES2022-45
Abstract:
Pre-training of language models on large text corpora is common practice in Natural Language Processing. Subsequently, fine-tuning of these models is performed to achieve the best results on a variety of tasks. In this paper, we question the common practice of only adding a single output layer as a classification head on top of the network. We perform an AutoML search to find architectures that outperform the current single layer at only a small compute cost. We validate our classification architecture on a variety of NLP benchmarks from the GLUE dataset. The source code is open-source and free (MIT licensed) software, available at https://github.com/TheMody/NASforSentenceEmbeddingHeads.
High Accuracy and Low Regret for User-Cold-Start Using Latent Bandits
David Young, Douglas Leith
https://doi.org/10.14428/esann/2022.ES2022-79
Abstract:
We develop a novel latent-bandit algorithm for tackling the cold-start problem for new users joining a recommender system. This new algorithm significantly outperforms the state of the art, simultaneously achieving both higher accuracy and lower regret.
Machine Learning and Information Theoretic Methods for Molecular Biology and Medicine
Tutorial - Machine Learning and Information Theoretic Methods for Molecular Biology and Medicine
Thomas Villmann, Jonas Almeida, Lee John, Susana Vinga
https://doi.org/10.14428/esann/2022.ES2022-3
Abstract:
A short introduction to the application of information-theoretic and machine learning methods to biomolecular and medical data is provided as the motivating material that supports the special session dedicated to this topic at ESANN 2022. In particular, we highlight current foundational developments such as interpretability and model certainty. Further, we emphasize how theoretic models provide a natural framework to deal with the heterogeneous and complex data structures that frequently occur in biomedical research.
Interactive dual projections for gene expression analysis
Ignacio Diaz-Blanco, Jose M. Enguita-Gonzalez, Diego Garcia-Perez, Ana Gonzalez-Muñiz, Abel A. Cuadrado-Vega, Maria Dolores Chiara-Romero, Nuria Valdes-Gallego
https://doi.org/10.14428/esann/2022.ES2022-22
Abstract:
We present an application of interactive dimensionality reduction (DR) for exploratory analysis of gene expression data that produces two live-updated projections, a sample map and a gene map, by rendering intermediate results of a t-SNE. The user can condition the projections "on the fly" on subsets of genes or samples, so that the updated views reveal co-expression patterns for different cancer types or gene groups.
Efficient classification learning of biochemical structured data by means of relevance weighting for sensoric response features
Katrin Sophie Bohnsack, Marika Kaden, Julius Voigt, Thomas Villmann
https://doi.org/10.14428/esann/2022.ES2022-36
Abstract:
We present an approach for generating vectorial representations of graphs for machine learning applications based on a sensoric response principle and multiple graph kernels. The sensor perspective reduces the graph kernel computations significantly. Thus, multiple kernel (relevance) learning can be realized using the interpretable generalized matrix learning vector quantization (GMLVQ) classifier. Results obtained in small molecule classification serve as proof of concept.
Interactive visual analytics for medical data: application to COVID-19 clinical information during the first wave
Ignacio Diaz-Blanco, Jose M. Enguita-Gonzalez, Diego Garcia-Perez, Maria Dolores Chiara-Romero, Nuria Valdes-Gallego, Ana Gonzalez-Muñiz, Abel A. Cuadrado-Vega
https://doi.org/10.14428/esann/2022.ES2022-31
Abstract:
Biomedical data recorded as a result of clinical practice are often multi-domain (involving lab measurements, medication, patient attributes, and logistic information) and also highly unstructured, with high rates of missing data and asynchronously sampled measurements. In this scenario, we need tools capable of providing a broad picture prior to more detailed analyses. We present here a visual analytics approach that uses the morphing projections technique to combine the visualization of a t-SNE projection of clinical time series with views of other clinical or patient information. The proposed approach is demonstrated on an application case study of COVID-19 clinical information taken during the first wave.
Improving Intensive Care Chest X-Ray Classification by Transfer Learning and Automatic Label Generation
Helen Schneider, David Biesner, Sebastian Nowak, Yannik Layer, Maike Theis, Wolfgang Block, Benjamin Wulff, Alois M. Sprinkart, Ulrike I. Attenberger, Rafet Sifa
https://doi.org/10.14428/esann/2022.ES2022-85
Abstract:
Radiologists commonly conduct chest X-rays for the diagnosis of pathologies or the evaluation of extrathoracic material positions in intensive care unit (ICU) patients. Automated assessments of radiographs have the potential to assist physicians by detecting pathologies that pose an emergency, leading to faster initiation of treatment and optimization of clinical workflows. The amount and quality of training data are key aspects for developing deep learning models with reliable performance. This work investigates the effects of transfer learning on public data, automatically generated data labels, and manual data annotation on the classification of ICU chest X-rays from a German hospital.
Concept drift
Federated learning vector quantization for dealing with drift between nodes
Johannes Brinkrolf, Valerie Vaquet, Fabian Hinder, Patrick Menz, Udo Seiffert, Barbara Hammer
https://doi.org/10.14428/esann/2022.ES2022-89
Abstract:
Federated learning is an efficient methodology to reduce the data transmissions to the server when working with large amounts of (sensor) data from diverse physical locations. When using data from different sensor devices, concept drift between the individual sensors poses an additional challenge. In this contribution, we define a formal framework for federated learning with concept drift and propose a version of federated LVQ dealing with concept drift induced by different hyperspectral cameras. We evaluate this approach experimentally and demonstrate its robustness to class imbalance and missing classes.
From hyperspectral to multispectral sensing – from simulation to reality: A comprehensive approach for calibration model transfer
Patrick Menz, Valerie Vaquet, Barbara Hammer, Udo Seiffert
https://doi.org/10.14428/esann/2022.ES2022-56
Abstract:
High-resolution hyperspectral sensors provide precise but expensive information of the chemical composition of an object in various industries. We present a method to transfer this ability into customized low-budget multispectral solutions. Based on a relevance analysis of spectra for a given problem, we simulated and constructed a multispectral sensor based on inverse spectroscopy. The corresponding calibration model derived from simulation of such a device and linked to the multispectral sensor hardware must not drop in precision significantly. For this purpose, different methods of calibration model transfer were tested, which can cope with a limited subset of the data. The latent space transformation with Chebychev polynomials outperformed all other methods by achieving high performance with the fewest labeled data.
Data stream generation through real concept's interpolation
Joanna Komorniczak, Pawel Ksieniewicz
https://doi.org/10.14428/esann/2022.ES2022-49
Abstract:
Among the recently published works in the field of data stream analysis – both in the context of classification tasks and concept drift detection – the shortage of real-world data streams is a recurring problem. This article proposes a method for generating data streams with given parameters based on real-world static data. The method uses one-dimensional interpolation to generate sudden or incremental concept drifts. The generated streams were subjected to an exemplary analysis in the concept drift detection task using a detector ensemble. The method has the potential to contribute to the development of methods focused on data stream processing.
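As a rough illustration of how one-dimensional interpolation can produce sudden or incremental drift, the sketch below mixes two concepts with a time-dependent weight. It is not the authors' generator: the function names (`drift_weight`, `interpolate_concepts`), the linear ramp, and the representation of a concept as a plain feature vector are all assumptions made for this example.

```python
def drift_weight(t, start, end):
    """Weight of the new concept at time step t.

    start == end gives a sudden drift (a step); start < end gives an
    incremental drift (a linear ramp). Both shapes are illustrative
    assumptions, not taken from the paper.
    """
    if t < start:
        return 0.0
    if t >= end:
        return 1.0
    return (t - start) / (end - start)


def interpolate_concepts(old, new, weight):
    """Linear interpolation between two concept vectors, feature by feature."""
    return [(1.0 - weight) * a + weight * b for a, b in zip(old, new)]


# Hypothetical concepts: two feature vectors standing in for real-data concepts.
concept_a = [0.0, 2.0]
concept_b = [4.0, 0.0]

# Incremental drift between steps 10 and 20; sudden drift at step 10.
incremental = [
    interpolate_concepts(concept_a, concept_b, drift_weight(t, 10, 20))
    for t in range(30)
]
sudden = [
    interpolate_concepts(concept_a, concept_b, drift_weight(t, 10, 10))
    for t in range(30)
]
```

In this sketch the only difference between the two drift types is the width of the transition window, which keeps the stream parameters easy to control.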
Deep Learning for Graphs
Deep Learning for Graphs
Davide Bacciu, Federico Errica, Nicolò Navarin, Luca Pasa, Daniele Zambon
https://doi.org/10.14428/esann/2022.ES2022-7
Abstract:
The flourishing field of deep learning for graphs relies on the layered computation of representations from graph-structured data given as input. A widely used strategy for processing graphs is via message passing, based on exchanging information among the connected nodes of the graph. Subsequently, node representations are employed to address tasks associated with the nodes and edges of a graph, or even entire graphs. The present tutorial paper reviews fundamental concepts and open challenges of deep learning for graphs and summarizes the contributed papers of the ESANN 2022 special session on the topic.
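The message-passing idea mentioned in the abstract can be sketched in a few lines. The example below is a minimal, weight-free illustration (mean aggregation with a self-loop), not a model from the tutorial; the function name `message_passing_step` and the toy path graph are assumptions made here.

```python
def message_passing_step(features, edges):
    """One round of mean-aggregation message passing on an undirected graph.

    features: dict mapping node -> list of floats
    edges: iterable of (u, v) pairs
    Each node's new representation is the mean of its own features and
    those of its neighbours (no learned weights in this sketch).
    """
    neighbours = {node: [node] for node in features}  # include the node itself
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)

    updated = {}
    for node, group in neighbours.items():
        dim = len(features[node])
        updated[node] = [
            sum(features[n][d] for n in group) / len(group) for d in range(dim)
        ]
    return updated


# Path graph 0 - 1 - 2 with one scalar feature per node.
features = {0: [1.0], 1: [0.0], 2: [0.0]}
edges = [(0, 1), (1, 2)]

h1 = message_passing_step(features, edges)  # node 0's signal reaches node 1
h2 = message_passing_step(h1, edges)        # after two rounds it reaches node 2
```

Stacking rounds widens each node's receptive field by one hop per layer, which is the layered computation the abstract refers to.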
Beyond Homophily with Graph Echo State Networks
Domenico Tortorella, Alessio Micheli
https://doi.org/10.14428/esann/2022.ES2022-58
Abstract:
Graph Echo State Networks (GESN) have already demonstrated their efficacy and efficiency in graph classification tasks. However, semi-supervised node classification brought out the problem of over-smoothing in end-to-end trained deep models, which causes a bias towards high-homophily graphs. We evaluate GESN for the first time on node classification tasks with different degrees of homophily, also analyzing the impact of the reservoir radius. Our experiments show that reservoir models are able to achieve better or comparable accuracy with respect to fully trained deep models that implement ad hoc variations in the architectural bias, with a gain in terms of efficiency.
Biased Edge Dropout in NIFTY for Fair Graph Representation Learning
Federico Caldart, Luca Pasa, Luca Oneto, Alessandro Sperduti, Nicolò Navarin
https://doi.org/10.14428/esann/2022.ES2022-99
Abstract:
Graph Neural Networks (GNNs) are nowadays widely used in many real-world applications. Nonetheless, the data relationships can be a source of biases based on sensitive attributes (e.g., gender or ethnicity). Several methods have been proposed to learn fair graph node representations. In this work, we extend NIFTY, an approach that exploits additional terms in the loss function based on perturbing the input data to enforce the fairness of the GNNs. In particular, we exploit a biased perturbation of the graph's adjacency matrix that reduces the edge homophily. We show the effectiveness of our approach on four real-world graph datasets.
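To make the notion of edge homophily and a biased edge perturbation concrete, here is a small sketch. It is not the NIFTY extension itself: the helper names (`edge_homophily`, `biased_edge_dropout`), the two drop probabilities, and the toy graph are assumptions chosen for illustration.

```python
import random


def edge_homophily(edges, group):
    """Fraction of edges whose endpoints share the same sensitive attribute."""
    if not edges:
        return 0.0
    same = sum(1 for u, v in edges if group[u] == group[v])
    return same / len(edges)


def biased_edge_dropout(edges, group, p_same, p_diff, rng):
    """Drop each edge independently, with probability p_same if its endpoints
    share the sensitive attribute and p_diff otherwise.

    Choosing p_same > p_diff removes homophilous edges more often.
    """
    kept = []
    for u, v in edges:
        p_drop = p_same if group[u] == group[v] else p_diff
        if rng.random() >= p_drop:
            kept.append((u, v))
    return kept


# Toy graph: nodes 0 and 1 in group "a", nodes 2 and 3 in group "b".
group = {0: "a", 1: "a", 2: "b", 3: "b"}
edges = [(0, 1), (2, 3), (0, 2), (1, 3)]

rng = random.Random(0)
# Extreme setting for clarity: always drop same-group edges, never the others.
perturbed = biased_edge_dropout(edges, group, p_same=1.0, p_diff=0.0, rng=rng)
```

With the extreme probabilities used here the result is deterministic; in practice intermediate values would give a stochastic perturbation.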
Embedding-based next song recommendation for playlists
Raphaël Romero, Tijl De Bie
https://doi.org/10.14428/esann/2022.ES2022-28
Abstract:
In recent years, music storage and consumption have shifted massively to digital platforms, where large-scale libraries of songs are stored along with their metadata. As a byproduct of this transformation, music is increasingly being organized and accessed in the form of playlists. User-curated playlists have become massively available online, and the challenge of automatically generating playlists has gained popularity in the music information retrieval community. In this paper, we build on link prediction for graphs to propose a flexible music playlist generation method. We transform a playlist dataset into a weighted graph of songs and posit a Poisson model on the count of co-occurrences between songs, where the rate is modulated by the Euclidean distance between song embeddings. Our method yields prediction results superior to common deterministic baselines, suggesting that the learned embeddings can be used to derive a meaningful notion of song similarity.
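A distance-modulated Poisson model of co-occurrence counts can be sketched as follows. The abstract does not give the exact link between distance and rate, so the exponential form `exp(bias - distance)` used here is a hypothetical choice, as are the function names and the toy embeddings.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def poisson_rate(x, y, bias=0.0):
    """Assumed link function: rate = exp(bias - distance).

    The abstract only says the rate is modulated by the distance; this
    exponential form is a hypothetical choice for illustration.
    """
    return math.exp(bias - euclidean(x, y))


def poisson_log_pmf(count, rate):
    """log P(count | rate) for a Poisson distribution."""
    return count * math.log(rate) - rate - math.lgamma(count + 1)


# Hypothetical 2-D embeddings for three songs.
song_a, song_b, song_c = [0.0, 0.0], [0.3, 0.4], [3.0, 4.0]

rate_near = poisson_rate(song_a, song_b)  # songs close together in the embedding
rate_far = poisson_rate(song_a, song_c)   # songs far apart in the embedding
```

Under this assumed link, nearby songs get a higher expected co-occurrence count, so maximizing the summed log-likelihood over observed counts would pull frequently co-occurring songs together.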
Graph Neural Networks for Propositional Model Counting
Gaia Saveri
https://doi.org/10.14428/esann/2022.ES2022-88
Abstract:
Graph Neural Networks (GNNs) have recently been leveraged to solve several logical reasoning tasks. Nevertheless, counting problems such as propositional model counting (#SAT) are still mostly approached with traditional solvers. Here we tackle this gap by presenting an architecture based on the GNN framework for belief propagation (BP) of [Kuck et al., 2020], extended with a self-attentive GNN and trained to approximately solve the #SAT problem. We experimentally show that our model, trained on a small set of random Boolean formulae, is able to scale effectively to much larger problem sizes, outperforming state-of-the-art approximate solvers. Moreover, we show that it can be efficiently fine-tuned to provide good generalization results on different formulae distributions.
Revisiting Edge Pooling in Graph Neural Networks
Francesco Landolfi
https://doi.org/10.14428/esann/2022.ES2022-92
Abstract:
Sparse pooling methods for graph neural networks typically perform graph reduction by keeping only the top-k vertices according to an adaptive scoring function. Although fast and scalable, these methods destroy the relational information of the graph or even make it disconnected. EdgePool is one of the few sparse alternatives that preserve the connectivity of the input graph by performing a series of edge contractions according to an adaptive scoring of the edges, but it has the drawback of being sequential and not scalable to large-scale graphs. In this paper, we show that EdgePool can be efficiently computed using a well-known parallel algorithm from the literature, and we also propose a novel, lightweight alternative that leverages an adaptive scoring function of the nodes. We tested both methods on standard benchmark datasets, showing that they generally outperform other sparse pooling methods from the literature.
Reinforcement learning
Size Scaling in Self-Play Reinforcement Learning
Oren Neumann, Claudius Gros
https://doi.org/10.14428/esann/2022.ES2022-53
Abstract:
Performance scaling laws with resources are heavily studied in supervised deep learning models but not in reinforcement learning. We examine the scaling of the AlphaZero algorithm’s performance with model size by training agents on three competitive two-player games, Connect Four, Oware and Pentago. We find that performance, in the form of Elo rating, scales logarithmically with the number of free neural network parameters, a trend consistent across games and when using deeper neural networks. This leads to a universal expression for the average match outcome which depends only on the ratio of sizes between opponents, which is supported by an agnostic rating method.
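The step from logarithmic Elo scaling to an outcome that depends only on the size ratio can be checked numerically. The sketch below assumes Elo = alpha * log10(N) + offset together with the standard Elo expected-score formula, which gives E = 1 / (1 + (N_b / N_a)^(alpha / 400)). The slope `ALPHA` is a made-up value and the function names are hypothetical; neither comes from the paper.

```python
import math


def elo_from_size(n_params, alpha, offset=0.0):
    """Assumed logarithmic scaling: Elo = alpha * log10(n_params) + offset."""
    return alpha * math.log10(n_params) + offset


def expected_score(n_a, n_b, alpha):
    """Standard Elo expected score of A against B under the assumed scaling.

    The offset cancels in the rating difference, so only n_a / n_b matters.
    """
    diff = elo_from_size(n_a, alpha) - elo_from_size(n_b, alpha)
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


ALPHA = 500.0  # hypothetical slope, not a value from the paper

# Same 2:1 size ratio at two very different absolute sizes.
small_pair = expected_score(2e4, 1e4, ALPHA)
large_pair = expected_score(2e6, 1e6, ALPHA)
```

Both pairs give the same expected score because the rating difference reduces to alpha * log10(n_a / n_b).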
Improving Zorro Explanations for Sparse Observations with Dense Proxy Data
Andreas Mazur, André Artelt, Barbara Hammer
https://doi.org/10.14428/esann/2022.ES2022-27
Abstract:
Explanation methods are considered the most prominent way of achieving the ubiquitous requirement of transparency. Ideally, in order to be useful, explanations should be "easy to understand", i.e., of low complexity. In this work, we empirically study explanations generated by Zorro, a prominent explanation method for Graph Neural Networks. We propose a methodology to improve the quality of generated explanations in the case of sparse observations in the particular application of a standard reinforcement learning scenario.
Reinforcement learning for constructing low density sign representations of Boolean functions
Oytun Yapar, Erhan Oztop
https://doi.org/10.14428/esann/2022.ES2022-25
Abstract:
Boolean functions (BFs) can be uniquely represented with polynomial functions by representing True and False with ±1. With the 'sign-representation' framework, i.e., when the sign of the polynomials is used instead of the exact ±1, the representation is not unique anymore, and several measures of sign-representation become the target of research. One such measure is the polynomial threshold function density (PTF density), i.e., the minimum number of monomials that suffices to sign-represent a given BF. Several algorithms can find sign-representations with a low number of monomials; however, to find a representation with the minimum number of monomials possible is a combinatorial search problem. The recent success of reinforcement learning (RL) algorithms in solving combinatorial search problems poses the question of whether RL can perform well in finding sign-representations with a low number of monomials. To address this question, we focused on Deep Q-Networks (DQN) and explored its applicability to the sign-representation problem. To be concrete, we present our work on modeling RL agents for solving the sign-representation problem and give our results on the application of DQN to BFs with a low number of variables (n = 4). Our results indicate that the trained DQN agent generalizes well and exploits intrinsic structure of BFs, such as their equivalence in terms of certain equivalence relations.
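The difference between an exact polynomial and a sign-representation can be seen on a small example. The sketch below uses the 3-input majority function (chosen here for illustration; the paper works with n = 4) and a brute-force check over all ±1 inputs. The dictionary encoding of polynomials and the helper names are assumptions of this example, not the authors' implementation.

```python
from itertools import product
from math import prod


def evaluate(poly, x):
    """Evaluate a multilinear polynomial given as {monomial: coefficient},
    where a monomial is a tuple of variable indices (empty tuple = constant)."""
    return sum(c * prod(x[i] for i in mono) for mono, c in poly.items())


def sign_represents(poly, f, n):
    """True if sign(poly(x)) == f(x) for every x in {-1, +1}^n."""
    for x in product((-1, 1), repeat=n):
        value = evaluate(poly, x)
        if value == 0 or (value > 0) != (f(x) > 0):
            return False
    return True


def majority(x):
    """Majority of three +/-1 inputs, returned as +/-1."""
    return 1 if sum(x) > 0 else -1


# Exact polynomial for 3-input majority: 4 monomials.
exact = {(0,): 0.5, (1,): 0.5, (2,): 0.5, (0, 1, 2): -0.5}
# A sign-representation of the same function that uses only 3 monomials.
sparse = {(0,): 1, (1,): 1, (2,): 1}
```

Here `exact` reproduces the function value on every input, while `sparse` only agrees in sign, which is all the sign-representation framework requires. Finding the sparsest such polynomial is the search problem the abstract describes.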
Developmental Modular Reinforcement Learning
Jianyong Xue, Frédéric Alexandre
https://doi.org/10.14428/esann/2022.ES2022-19
Abstract:
In this article, we propose a modular reinforcement learning (MRL) architecture that coordinates the competition and the cooperation between modules, and inspires, in a developmental approach, the generation of new modules in cases where new goals have been detected. We evaluate the effectiveness of our approach in a multiple-goal torus grid world. Results show that our approach has better performance than previous MRL methods in learning separate strategies for sub-goals, and reusing them for solving task-specific or unseen multi-goal problems, as well as maintaining the independence of the learning in each module.
Adaptive Behavior Cloning Regularization for Stable Offline-to-Online Reinforcement Learning
Yi Zhao, Rinu Boney, Alexander Ilin, Juho Kannala, Joni Pajarinen
https://doi.org/10.14428/esann/2022.ES2022-110
Abstract:
Offline reinforcement learning, by learning from a fixed dataset, makes it possible to learn agent behaviors without interacting with the environment. However, depending on the quality of the offline dataset, such pre-trained agents may have limited performance and would further need to be fine-tuned online by interacting with the environment. During online fine-tuning, the performance of the pre-trained agent may collapse quickly due to the sudden distribution shift from offline to online data. We propose to adaptively weigh the behavior cloning loss during online fine-tuning based on the agent's performance and training stability. Moreover, we use a randomized ensemble of Q functions to further increase the sample efficiency of online fine-tuning by performing a large number of learning updates. Experiments show that the proposed method yields state-of-the-art offline-to-online reinforcement learning performance on the popular D4RL benchmark.