Bruges, Belgium, April 24-25-26
Content of the proceedings
Classification and Bayesian learning
Embeddings and Representation Learning for Structured Data
Deep learning and CNN
Learning methods and optimization
60 Years of Weightless Neural Systems
Domain adaptation and learning
Streaming data analysis, concept drift and analysis of dynamic data sets
Societal Issues in Machine Learning: When Learning from Data is Not Enough
Statistical physics of learning and inference
Image processing and transfer learning
Time series and signal processing
Dynamical systems and reinforcement learning
Classification and Bayesian learning
        ES2019-35
Conditional BRUNO: a neural process for exchangeable labelled data
Iryna Korshunova, Yarin Gal, Arthur Gretton, Joni Dambre
            Abstract:
We present a neural process that models exchangeable sequences of high dimensional complex observations conditionally on a set of labels or tags. Our model combines the expressiveness of deep neural networks with the data-efficiency of Gaussian processes, resulting in a probabilistic model for which the posterior distribution is easy to evaluate and sample from, and the computational complexity scales linearly with the number of observations. The advantages of the proposed architecture are demonstrated on a challenging few-shot view reconstruction task which requires generalisation from short sequences of viewpoints.
        
        ES2019-98
Interpretable dynamics models for data-efficient reinforcement learning
Markus Kaiser, Clemens Otte, Thomas Runkler, Carl Henrik Ek
            Abstract:
In this paper, we present a Bayesian view on model-based reinforcement learning. We use expert knowledge to impose structure on the transition model and present an efficient learning scheme based on variational inference. This scheme is applied to a heteroskedastic and bimodal benchmark problem on which we compare our results to NFQ and show how our approach yields human-interpretable insight about the underlying dynamics while also increasing data-efficiency.
        
        ES2019-77
PAC-Bayes and Fairness: Risk and Fairness Bounds on Distribution Dependent Fair Priors
Luca Oneto, Michele Donini, Massimiliano Pontil
            Abstract:
We address the problem of algorithmic fairness: ensuring that sensitive information does not unfairly influence the outcome of a classifier. We tackle this issue in the PAC-Bayes framework and present an approach which trades off and bounds the risk and the fairness of the Gibbs Classifier measured with respect to different state-of-the-art fairness measures. For this purpose, we further develop the idea that the PAC-Bayes prior can be defined based on the data-generating distribution without actually needing to know it. In particular, we define a prior and a posterior which give more weight to functions which exhibit good generalization and fairness properties.
        
        ES2019-131
DropConnect for Evaluation of Classification Stability in Learning Vector Quantization
Jensun Ravichandran, Sascha Saralajew, Thomas Villmann
            Abstract:
In this paper we consider DropOut/DropConnect techniques known from deep neural networks to evaluate the stability of learning vector quantization (LVQ) classifiers. For this purpose, we treat the LVQ as a multilayer network and transfer the respective concepts to LVQ. In particular, we consider the output as a stochastic ensemble such that an information-theoretic measure is obtained to judge the stability level.
        
        ES2019-189
Pixel-wise Conditioning of Generative Adversarial Networks
Cyprien Ruffino, Romain HERAULT, Eric Laloy, Gilles Gasso
            Abstract:
Generative Adversarial Networks (GANs) have proven successful for unsupervised image generation. Several works extended GANs to image inpainting by conditioning the generation with parts of the image one wants to reconstruct. However, these methods have limitations in settings where only a small subset of the image pixels is known beforehand. In this paper, we study the effectiveness of conditioning GANs by adding an explicit regularization term to enforce pixel-wise conditions when very few pixel values are provided. In addition, we investigate the influence of this regularization term on the quality of the generated images and the satisfaction of the conditions. Experiments conducted on MNIST and FashionMNIST show evidence that this regularization term allows for controlling the trade-off between quality of the generated images and constraint satisfaction.
        
        ES2019-142
Committees as Artificial Organisms - Evolution and Adaptation
Roberto Alamino
            Abstract:
Generalised committee machines are proposed here to model the interaction of an organism's DNA with its environment, and are shown to induce a unique genotype-phenotype map. An application to organisms subjected to a toxic environment is shown to allow antagonistic pleiotropy. The same scenario is studied in order to show the difference in adaptation when there is a fitness cost given by a lower reproduction rate.
        
        ES2019-68
Towards a device-free passive presence detection system with Bluetooth Low Energy beacons
Maximilian Münch, Karsten Huffstadt, Frank-Michael Schleif
            Abstract:
In an era of smart information systems and smart buildings, detecting, tracking and identifying the presence of attendants inside enclosed rooms has evolved into a key challenge in the research area of smart building systems. Therefore, several types of sensing systems were proposed over the past decade to tackle this challenge. Depending on the component’s arrangement, a distinction is made between so-called device-based active and device-free passive sensing systems. Here we focus on the device-free passive concept and introduce a strategy of using Bluetooth Low Energy beacons for passive presence detection.
        
        ES2019-84
Defending against poisoning attacks in online learning settings
Greg Collinge, Emil C Lupu, Luis Muñoz-González
            Abstract:
Machine learning systems are vulnerable to data poisoning, a coordinated attack where a fraction of the training dataset is manipulated by an attacker to subvert learning. In this paper we first formulate an optimal attack strategy against online learning classifiers to assess worst-case scenarios. We also propose two defence mechanisms to mitigate the effect of online poisoning attacks by analysing the impact of the data points in the classifier and by means of an adaptive combination of machine learning classifiers with different learning rates. Our experimental evaluation supports the usefulness of our proposed defences to mitigate the effect of poisoning attacks in online learning settings.
        
        ES2019-90
Hybrid vibration signal monitoring approach for rolling element bearings
Jarno Kansanaho, Tommi Kärkkäinen
            Abstract:
A new approach to identifying the different lifetime stages of rolling element bearings, aimed at improving early bearing fault detection, is presented. We extract characteristic features from vibration signals generated by rolling element bearings. This data is first pre-labelled with an unsupervised clustering method. Then, supervised methods are used to improve the labelling. Moreover, we assess feature importance with each classifier. From the practical point of view, the classifiers are compared on how early they indicate the emergence of a bearing fault. The results show that all of the classifiers are usable for bearing fault detection and that the importance of the features was consistent.
        
        ES2019-93
Modal sense classification with task-specific context embeddings
Bo Li, Mathieu Dehouck, Pascal Denis
            Abstract:
Sense disambiguation of modal constructions is a crucial part of natural language understanding. Framed as a supervised learning task, this problem heavily depends on an adequate feature representation of the modal verb context. Inspired by recent work on general word sense disambiguation, we propose a simple approach to modal sense classification in which standard shallow features are enhanced with task-specific context embedding features. Comprehensive experiments show that these enriched contextual representations fed into a simple SVM model lead to significant classification gains over shallow feature sets.
        
        ES2019-114
Adversarial robustness of linear models: regularization and dimensionality
Istvan Megyeri, Istvan Hegedus, Mark Jelasity
            Abstract:
Many machine learning models are sensitive to adversarial input, meaning that very small but carefully designed noise added to correctly classified examples may lead to misclassification. The reasons for this are still poorly understood, even in the simple case of linear models. Here, we study linear models and offer a number of novel insights. We focus on the effect of regularization and dimensionality. We show that in very high dimensions adversarial robustness is inherently very low due to some mathematical properties of high-dimensional spaces that have received little attention so far. We also demonstrate that, although regularization may help, adversarial robustness is harder to achieve than high accuracy during the learning process. This is typically overlooked when researchers set optimization meta-parameters.
        
        ES2019-167
A Simple and Effective Scheme for Data Pre-processing in Extreme Classification
Sujay Khandagale, Rohit Babbar
            Abstract:
Extreme multi-label classification (XMC) refers to supervised multi-label learning involving hundreds of thousands or even millions of labels. It has been shown to be an effective framework for addressing crucial tasks such as recommendation, ranking and web-advertising. In this paper, we propose an effective and well-motivated data pre-processing scheme for XMC. We show that our proposed algorithm, PrunEX, can remove up to 90% of the input data as redundant from a classification viewpoint. Our scheme is universal in the sense that it is applicable to all known public datasets in the domain of XMC.
        
        ES2019-154
MAP best performances prediction for endurance runners
Ángel Campo, Marc Francaux, Laurent Baijot, Michel Verleysen
            Abstract:
The preparation of long-distance runners requires estimating their potential race performances beforehand. Athlete performances can be modeled based on their past records, but the task is made difficult by the high variability in runner race performances. This paper presents a maximum a posteriori (MAP) estimation that addresses the issues related to this high variability. The athlete priors and a specific residual model that it includes are inferred with the help of a large set of race results.
        
        ES2019-58
TrIK-SVM: an alternative decomposition for kernel methods in Kreĭn spaces
Gaelle Loosli
            Abstract:
This work proposes an alternative kernel decomposition in the context of kernel machines with indefinite kernels. The original KSVM paper (SVM in Kreĭn spaces) uses the eigendecomposition; our proposal avoids this decomposition. We explain how it can help in designing an algorithm that does not require computing the full kernel matrix. Finally, we illustrate the good behavior of the proposed method compared to KSVM.
Embeddings and Representation Learning for Structured Data
        ES2019-4
Embeddings and Representation Learning for Structured Data
Benjamin Paaßen, Claudio Gallicchio, Alessio Micheli, Alessandro Sperduti
            Abstract:
Learning models of structured data, such as sequences, trees, and graphs, has become a rich and promising research objective in many fields of machine learning, such as (deep) neural networks, probabilistic models, kernels, metric learning, and dimensionality reduction. All these seemingly disparate approaches are connected by their construction of vectorial representations and embeddings of structured data, be it implicit or explicit, fixed or learned, deterministic or stochastic. Such embeddings can be utilized not only for classification or regression, but also for generation of structured data, visualization, and interpretation.
        
        ES2019-107
Graph generation by sequential edge prediction
Davide Bacciu, Alessio Micheli, Marco Podda
            Abstract:
Graph generation with Machine Learning models is a challenging problem with applications in various research fields. Here, we propose a recurrent Deep Learning based model to generate graphs by learning to predict their ordered edge sequence. Despite its simplicity, our experiments on a wide range of datasets show that our approach is able to generate graphs originating from very different distributions, outperforming canonical graph generative models from graph theory, and reaching performances comparable to the current state of the art on graph generation.
        
        ES2019-137
On the definition of complex structured feature spaces
Nicolò Navarin, Dinh Van Tran, Alessandro Sperduti
            Abstract:
In this paper, we propose a graph kernel whose feature space is defined by combining pairs of features of an existing base graph kernel. Furthermore, we propose a variation where the feature space is adaptive with respect to the learning task at hand, which allows a representation suited to the task to be learned. Experimental results on six real-world graph datasets from different domains show that the proposed kernels achieve a consistent performance improvement over the considered base kernel and over feature combination methods previously defined in the literature.
        
        ES2019-60
Deep Weisfeiler-Lehman assignment kernels via multiple kernel learning
Nils Morten Kriege
            Abstract:
Kernels for structured data are commonly obtained by decomposing objects into their parts and adding up the similarities between all pairs of parts measured by a base kernel. Assignment kernels are based on an optimal bijection between the parts and have proven to be an effective alternative to the established convolution kernels. We explore how the base kernel can be learned as part of the classification problem. We build on the theory of valid assignment kernels derived from hierarchies defined on the parts. We show that the weights of this hierarchy can be optimized via multiple kernel learning. We apply this result to learn vertex similarities for the Weisfeiler-Lehman optimal assignment kernel for graph classification. We present first experimental results which demonstrate the feasibility and effectiveness of the approach.
        
        ES2019-67
Predicting vehicle behaviour using LSTMs and a vector power representation for spatial positions
Florian Mirus, Peter Blouw, Terrence Stewart, Jörg Conradt
            Abstract:
Predicting future vehicle behaviour is an essential task to enable safe and situation-aware automated driving. In this paper, we propose to encapsulate spatial information of multiple objects in a semantic vector-representation. Assuming that future vehicle motion is influenced not only by past positions but also by the behaviour of other traffic participants, we use this representation as input for a Long Short-Term Memory (LSTM) network for sequence to sequence prediction of vehicle positions. We train and evaluate our system on real-world driving data collected mainly on highways in southern Germany and compare it to other models for reference.
        
        ES2019-79
Efficient learning of email similarities for customer support
Jelle Bakker, Kerstin Bunte
            Abstract:
One way to increase customer satisfaction is efficient and consistent customer email support. In this contribution we investigate the use of dimensionality reduction, metric learning and classification methods to predict answer templates that can be used by an employee, or to retrieve historic conversations with potentially suitable answers given an email query. The strategies are tested on email data and the publicly available Reuters data. We conclude that prototype-based metric learning is fast to train and the parameters provide a compressed representation of the database enabling efficient content-based retrieval. Furthermore, learning customer email embeddings based on the similarity of employee answers is a promising direction for computer-aided customer support.
        
        ES2019-140
Nonnegative matrix factorization with polynomial signals via hierarchical alternating least squares
Cécile Hautecoeur, François Glineur
            Abstract:
Nonnegative matrix factorization (NMF) is a widely used tool in data analysis due to its ability to extract significant features from data vectors. Among algorithms developed to solve NMF, hierarchical alternating least squares (HALS) is often used to obtain state-of-the-art results. We generalize HALS to tackle an NMF problem where both input data and features consist of nonnegative polynomial signals. Compared to standard HALS applied to a discretization of the problem, our algorithm is able to recover smoother features, with a computational time growing moderately with the number of observations compared to existing approaches.
Deep learning and CNN
        ES2019-30
Deep Embedded SOM: joint representation learning and self-organization
Florent Forest, Lebbah Mustapha, Azzag Hanane, Jérôme Lacaille
            Abstract:
In the wake of recent advances in joint clustering and deep learning, we introduce the Deep Embedded Self-Organizing Map, a model that jointly learns representations and the code vectors of a self-organizing map. Our model is composed of an autoencoder and a custom SOM layer that are optimized in a joint training procedure, motivated by the idea that the SOM prior could help learn SOM-friendly representations. We evaluate SOM-based models in terms of clustering quality and unsupervised clustering accuracy, and study the benefits of joint training.
        
        ES2019-48
Deep convolutional neural network for survival estimation of Amyotrophic Lateral Sclerosis patients
Enrico Grisan, Alessandro Zandonà, Barbara Di Camillo
            Abstract:
We propose a convolutional neural network (CNN) coupled with a fully connected top layer for survival estimation. We design an objective function to directly estimate the probability of survival at discrete time intervals, conditional on the patient not having incurred any adverse event at previous time points. We test our CNN and objective function on a large dataset of longitudinal data of patients with Amyotrophic Lateral Sclerosis (ALS). We compare our CNN and the objective function against other neural networks designed for survival analysis, and against the optimization of the Cox partial likelihood or a simple logistic classifier. Our objective function outperforms both the Cox partial likelihood and the logistic classifier, independently of the network architecture, and our deep CNN provides the best results in terms of AU-ROC, accuracy and mean absolute error.
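To make the discrete-time formulation concrete: if a model outputs, for each interval, the probability of surviving that interval given survival up to its start, the overall survival curve is the running product of those conditional probabilities. The sketch below shows only this generic relationship; it is not the authors' objective function, and the function names are hypothetical.

```python
import numpy as np

def survival_curve(conditional_survival):
    """S(t) = product over k <= t of P(survive interval k | survived to k-1)."""
    p = np.asarray(conditional_survival, dtype=float)
    return np.cumprod(p)

def event_probabilities(conditional_survival):
    """P(event in interval t) = S(t-1) * (1 - p_t), with S(0) = 1."""
    p = np.asarray(conditional_survival, dtype=float)
    survived_before = np.concatenate(([1.0], np.cumprod(p)[:-1]))
    return survived_before * (1.0 - p)
```

The event probabilities over all intervals, plus the probability of surviving past the last interval, sum to one, which is a quick sanity check on any such model output.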
        
        ES2019-89
Detecting adversarial examples with inductive Venn-ABERS predictors
Jonathan Peck, Bart Goossens, Yvan Saeys
            Abstract:
Inductive Venn-ABERS predictors (IVAPs) are a type of probabilistic predictor with the theoretical guarantee that their predictions are perfectly calibrated. We propose to exploit this calibration property for the detection of adversarial examples in binary classification tasks. By rejecting predictions if the uncertainty of the IVAP is too high, we obtain an algorithm that is both accurate on the original test set and significantly more robust to adversarial examples. The method appears to be competitive with the state of the art in adversarial defense, in terms of both robustness and scalability.
        
        ES2019-113
Learning Rich Event Representations and Interactions for Temporal Relation Classification
Onkar Pandit, Pascal Denis, Liva Ralaivola
            Abstract:
Most existing systems for identifying temporal relations between events rely heavily on hand-crafted features derived from event words and explicit temporal markers. Moreover, little attention has been given to automatically learning contextualized event representations or to finding complex interactions between events. This paper fills this gap by showing that a combination of rich event representations and interaction learning is essential to more accurate temporal relation classification. Specifically, we propose a neural architecture in which i) a Recurrent Neural Network (RNN) is used to extract contextual information for pairs of events, and ii) a deep Convolutional Neural Network (CNN) architecture is used to uncover intricate interactions between events. We show that the proposed approach outperforms most existing systems on commonly used datasets, while using fully automatic feature extraction and simple local inference.
        
        ES2019-156
L1-norm double backpropagation adversarial defense
Ismaila Seck, Gaelle Loosli, Stéphane Canu
            Abstract:
Adversarial examples are a challenging open problem for deep neural networks. In this paper we propose to add a penalization term that forces the decision function to be flat in some regions of the input space, such that it becomes, at least locally, less sensitive to attacks. Our proposal is theoretically motivated, and a first set of carefully conducted experiments shows that it behaves as expected when used alone and seems promising when coupled with adversarial training.
        
        ES2019-174
Application of deep neural networks for automatic planning in radiation oncology treatments
Ana Barragan Montero, Dan Nguyen, Weiguo Lu, Mu-Han Lin, Xavier Geets, Edmond Sterpin, Steve Jiang
            Abstract:
Treatment planning for radiotherapy patients is a time-consuming and manual process. In this work, we investigate the use of deep neural networks to learn from previous clinical cases and directly predict the optimal dose distribution for a new patient. The proposed model combines two architectures, UNet and DenseNet, and uses mean squared error as the loss function. Ten input channels were used to include dosimetric and anatomical information. A set of 100 patients was used for training/validation and 29 for testing. Dice similarity coefficients ≥ 0.9 for the isodose lines in the predicted versus the clinical dose showed the excellent accuracy of the model.
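The Dice similarity coefficient used as the evaluation measure above compares two binary regions A and B as 2|A ∩ B| / (|A| + |B|). A minimal sketch follows, assuming that an isodose region is taken as the set of voxels receiving at least a given dose level; that thresholding convention and the function names are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def isodose_mask(dose, level):
    """Voxels receiving at least `level` (assumed definition of an isodose region)."""
    return np.asarray(dose) >= level

def dice_coefficient(mask_a, mask_b):
    """Dice similarity 2|A & B| / (|A| + |B|) between two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / total
```

A value of 1 means the predicted and clinical regions coincide exactly, and 0 means they do not overlap at all.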
        
        ES2019-25
Conditional WGAN for grasp generation
Florian Patzelt, Robert Haschke, Helge Ritter
            Abstract:
This work proposes a new approach to robotic grasping exploiting conditional Wasserstein generative adversarial networks (WGANs), which output promising grasp candidates from depth image inputs. In contrast to discriminative models, the WGAN approach enables deliberative navigation in the set of feasible grasps and thus allows a smooth integration with other motion planning tools. We find that the training autonomously partitioned the space of feasible grasps into several regions corresponding to different grasp types. Each region forms a smooth grasp manifold with latent parameters corresponding to important grasp parameters like approach direction. We evaluate the model in simulation on the multi-fingered Shadow Robot hand, comparing it a) to a classical grasp planner for primitive geometric object shapes and b) to a state-of-the-art discriminative network model. The proposed generative model matches the grasp success rate of its trainer models and exhibits better generalization.
        
        ES2019-19
Multilingual short text categorization using convolutional neural network
Liriam Enamoto, Li Weigang
            Abstract:
One of the most meaningful uses of online social media is quick communication during emergencies. In a global emergency, the threat might cross national borders and affect different cultures and languages. This article explores Convolutional Neural Networks (CNNs) for multilingual short text categorization in English, Japanese and Portuguese to identify useful information in social media. A CNN is constructed for this specific purpose. The experimental results show that the CNN model performs better than an SVM even on a small dataset. More interestingly, the cross-language test suggests that English, Japanese and Portuguese text can use the same model with few hyperparameter changes.
        
        ES2019-26
Fast and reliable architecture selection for convolutional neural networks
Lukas Hahn, Lutz Roese-Koerner, Klaus Friedrichs, Anton Kummert
            Abstract:
The performance of a Convolutional Neural Network (CNN) depends on its hyperparameters, such as the number of layers, the kernel sizes, or the learning rate. Especially in smaller networks and applications with limited computational resources, optimisation is key. We present a fast and efficient approach for CNN architecture selection. Taking into account time consumption, precision and robustness, we develop a heuristic to quickly and reliably assess a network's performance. In combination with Bayesian optimisation, to effectively cover the vast parameter space, our contribution offers a plain and powerful architecture search for this machine learning technique.
        
        ES2019-32
On the Speedup of Deep Reinforcement Learning Deep Q-Networks (RL-DQNs)
Anas Albaghajati, Lahouari Ghouti
            Abstract:
Deep reinforcement learning (DRL) merges reinforcement learning (RL) and deep learning (DL). DRL-based agents rely on high-dimensional imagery inputs to make accurate decisions. Such excessively high-dimensional inputs and sophisticated algorithms require very powerful computing resources and longer training times. To alleviate the need for powerful resources and reduce the training times, this paper proposes novel solutions to mitigate the curse of dimensionality without compromising the DRL agent performance. Using these solutions, the deep Q-network model (DQN) and its improved versions require less training time while achieving better performance.
        
        ES2019-37
Deep Autoencoder Feature Extraction for Fault Detection of Elevator Systems
Krishna Mohan Mishra, Tomi Krogerus, Kalevi Huhtala
            Abstract:
In this research, we propose a generic deep autoencoder model for automated feature extraction from elevator sensor data. Extracted deep features are classified with a random forest algorithm for fault detection. Sensor data are labelled as healthy or faulty based on the maintenance actions recorded. In our research, we have included all fault types present for each elevator. The remaining healthy data are used for validation of the model to prove its efficacy in terms of avoiding false positives. We have achieved nearly 100% accuracy in fault detection, along with avoiding false positives, based on the new extracted deep features, which outperform the results using existing features.
        
        ES2019-121
Detecting Ghostwriters in High Schools
Magnus Stavngaard, August Sørensen, Stephan Lorenzen, Niklas Hjuler, Stephen Alstrup
            Abstract:
Students hiring ghostwriters to write their assignments is an increasing problem in educational institutions all over the world, with companies selling these services as a product. In this work, we develop automatic techniques with special focus on detecting such ghostwriting in high school assignments. This is done by training deep neural networks on an unprecedentedly large amount of data supplied by the Danish company MaCom, which covers 90% of Danish high schools. We achieve an accuracy of 0.875 and an AUC score of 0.947 on an evenly split data set.
        
        ES2019-125
Design of Power-Efficient FPGA Convolutional Cores with Approximate Log Multiplier
Leonardo Tavares Oliveira, Min Soo Kim, Alberto Antonio Del Barrio García, Nader Bagherzadeh, Ricardo Menotti
            Abstract:
This paper presents the design of a convolutional core that utilizes an approximate log multiplier to significantly reduce the power consumption of FPGA acceleration of convolutional neural networks. The core also exploits FPGA reconfigurability as well as the parallelism and input sharing opportunities in convolutional layers to minimize the costs. The simulation results show reductions of up to 78.19% in LUT usage and 60.54% in power consumption compared to a core that uses an exact fixed-point multiplier, while maintaining comparable accuracy on a subset of the MNIST dataset.
        
        ES2019-144
Improving Pedestrian Recognition using Incremental Cross Modality Deep Learning
Danut Ovidiu Pop, Alexandrina Rogozan, Fawzi Nashashibi, Abdelaziz Bensrhair
            Abstract:
Late fusion schemes with deep learning classification patterns set up with multi-modality images have an essential role in pedestrian protection systems since they have achieved prominent results in the pedestrian recognition task. In this paper, the late fusion scheme merged with Convolutional Neural Networks (CNN) is investigated for pedestrian recognition based on the Daimler stereo vision data sets. An independent CNN-based classifier for each imaging modality (Intensity, Depth, and Optical Flow) is handled before the fusion of its probabilistic output scores with a Multi-Layer Perceptron, which provides the recognition decision. In this paper, we set out to prove that the incremental cross-modality deep learning approach enhances pedestrian recognition performance. It also outperforms state-of-the-art pedestrian classifiers on the Daimler stereo vision data sets.
        
        ES2019-152
Machine learning in research and development of new vaccines products: opportunities and challenges
Paul Smyth, Gaël de Lannoy, Moritz Von Stosch, Alexander Pysik, Amin Khan
            Abstract:
Modern high-throughput technologies deployed in research and development of new vaccine products have opened the door to machine learning applications that allow the automation of tasks and support for data-driven, risk-based decision making. In this paper, the opportunities and the challenges faced in the deployment of machine learning algorithms in the field of vaccine development are discussed.
        
        ES2019-157
Real-time Convolutional Neural Networks for emotion and gender classification
Matias Valdenegro-Toro, Octavio Arriaga, Paul Plöger
            Abstract:
Emotion and gender recognition from facial features are important properties of human empathy. Robots should also have these capabilities. For this purpose we have designed special convolutional modules that allow a model to recognize emotions and gender with a considerably lower number of parameters, enabling real-time evaluation on a constrained platform. We report accuracies of 96% on the IMDB gender dataset and 66% on the FER-2013 emotion dataset, while requiring a computation time of less than 0.008 seconds on a Core i7 CPU. All our code, demos and pre-trained architectures have been released under an open-source license in our repository at https://github.com/oarriaga/face_classification
Learning methods and optimization
        ES2019-57
Experimental study of the neuron-level mechanisms emerging from backpropagation
Simon Carbonnelle, Christophe De Vleeschouwer
            Abstract:
The backpropagation algorithm is the most successful learning algorithm for training deep artificial neural networks. Its inner workings are in stark contrast with other learning rules, as it is based on a global, black-box optimization procedure rather than the repetition of a local, neuron-level procedure (e.g. Hebbian learning). In this paper, we present preliminary evidence suggesting that local, neuron-level mechanisms in fact emerge during backpropagation-based training of neural networks, and we describe what could be their key components.
        
        ES2019-69
Learning multimodal fixed-point weights using gradient descent
Lukas Enderich, Fabian Timm, Lars Rosenbaum, Wolfram Burgard
            Abstract:
Due to their high computational complexity, deep neural networks are still limited to powerful processing units. To promote a reduced model complexity by dint of low-bit fixed-point quantization, we propose a gradient-based optimization strategy to generate a symmetric mixture of Gaussian modes (SGM) where each mode belongs to a particular quantization stage. We achieve 2-bit state-of-the-art performance and illustrate the model's ability for self-dependent weight adaptation during training.
        
        ES2019-133
Preconditioned conjugate gradient algorithms for graph regularized matrix completion
Shuyu Dong, Pierre-Antoine Absil, Kyle Gallivan
            Abstract:
Low-rank matrix completion is the problem of recovering the missing entries of a data matrix under the assumption that a good low-rank approximation to the true matrix exists. Much attention has recently been paid to exploiting correlations between the column/row entities through side information to improve the matrix completion quality. In this paper, we propose an efficient algorithm for solving low-rank matrix completion with graph-based regularizers. Experiments on synthetic data show that our approach achieves significant speedup compared to the alternating minimization scheme.
        
        ES2019-194
Direct calculation of out-of-sample predictions in multi-class kernel FDA
Treder Matthias
            Abstract:
After a two-class kernel Fisher Discriminant Analysis (KFDA) has been trained on the full dataset, matrix inverse updates allow for the direct calculation of out-of-sample predictions for different test sets. Here, this approach is extended to the multi-class case by casting KFDA in an Optimal Scoring framework. In simulations using 10-fold cross-validation and permutation tests, the approach is shown to be more than 1000x faster than retraining the classifier in each fold. Direct out-of-sample predictions can be useful on large datasets and in studies with many training-testing iterations.
        ES2019-176
Complex Valued Gated Auto-encoder for Video Frame Prediction
Niloofar Azizi, Nils Wandel, Sven Behnke
            Abstract:
Over recent years, complex-valued artificial neural networks have gained increasing interest as they allow neural networks to learn richer representations while potentially incorporating fewer parameters. Especially in the domain of computer graphics, many traditional operations such as image smoothing/sharpening rely heavily on computations in the complex domain, so complex-valued neural networks apply naturally. In this paper, we perform frame prediction in video sequences using a complex-valued gated auto-encoder with tied input weights. First, we motivate our method by showing how the Fourier transform can be seen as the basis for translational operations. Then, we present how a complex neural network can learn such transformations and compare its performance and parameter efficiency to a real-valued gated auto-encoder. Furthermore, we show how extending both the real-valued and the complex-valued neural networks with convolutional units can significantly improve prediction performance and parameter efficiency. All networks are assessed on the bouncing ball dataset.
        ES2019-80
On overfitting of multilayer perceptrons for classification
Joseph Rynkiewicz
            Abstract:
In this paper, we consider classification models involving multilayer perceptrons (MLP) with rectified linear (ReLU) functions for activation units. Studying the statistical properties of such models is difficult, mainly because in practice they may be heavily overparameterized. We study the asymptotic behavior of the difference between the loss function of estimated models and the loss function of the theoretical best model. These theoretical results give us information on the overfitting properties of such models. Some simulations illustrate our theoretical findings and raise new questions.
        ES2019-141
Very Simple Classifier: a concept binary classifier to investigate features based on subsampling and locality
Luca Masera, Enrico Blanzieri
            Abstract:
We propose Very Simple Classifier (VSC), a novel method designed to incorporate the concepts of subsampling and locality in the definition of features to be used as the input of a perceptron. The rationale is that locality theoretically guarantees a bound on the generalization error. Each feature in VSC is a max-margin classifier built on randomly selected pairs of samples. The locality in VSC is achieved by multiplying the value of the feature by a confidence measure that can be characterized in terms of the Chebyshev inequality. The output of this layer is then fed into an output layer of neurons, whose weights are determined by a regularized pseudoinverse. An extensive comparison of VSC against 9 competitors on the task of binary classification is carried out. Results on 22 benchmark datasets with fixed parameters show that VSC is competitive with the Multi Layer Perceptron (MLP) and outperforms the other competitors. An exploration of the parameter space shows that VSC can outperform MLP.
        ES2019-178
Sparse minimal learning machine using a diversity measure minimization
Madson Dias, Lucas Sousa, Ajalmar Rocha Neto, Cesar Mattos, Joao Gomes, Tommi Kärkkäinen
            Abstract:
The minimal learning machine (MLM) training procedure consists of solving a linear system with multiple measurement vectors (MMV) created between the geometric configurations of points in the input and output spaces. Such geometric configurations are built upon two matrices created using subsets of input and output points, named reference points (RPs). The present paper considers an extension of the focal underdetermined system solver (FOCUSS) for MMV linear system problems with additive noise, named regularized MMV FOCUSS (regularized M FOCUSS), and evaluates it in the task of selecting input reference points for regression settings. Experiments were carried out using UCI datasets, where the proposal was able to produce sparser models and achieve competitive performance when compared to the regular strategy of selecting MLM input RPs.
        ES2019-87
Minimax center to extract a common subspace from multiple datasets
Emilie Renard, Pierre-Antoine Absil, Kyle Gallivan
            Abstract:
We address the problem of extracting common information from multiple datasets. More specifically, we look for a common subspace minimizing the maximal dissimilarity with all datasets, and we propose an algorithm derived from the first-order necessary conditions of optimality. On synthetic datasets, the proposed method gives results as good as those of a Riemannian-based approach, and also provides an evaluation of how far the iterate is from a critical point.
        ES2019-164
Interpolation on the manifold of fixed-rank positive-semidefinite matrices for parametric model order reduction: preliminary results
Estelle Massart, Pierre-Yves Gousenbourger, Thanh Son Nguyen, Tatjana Stykel, Pierre-Antoine Absil
            Abstract:
We present several interpolation schemes on the manifold of fixed-rank positive-semidefinite (PSD) matrices. We explain how these techniques can be used for model order reduction of parameterized linear dynamical systems, and obtain preliminary results on an application.
        ES2019-115
Progress Towards Graph Optimization: Efficient Learning of Vector to Graph Space Mappings
Stefan Mautner, Rolf Backofen, Fabrizio Costa
            Abstract:
Optimization in vector space domains is well understood. However, in high-dimensional settings or when dealing with structured data such as sequences and graphs, optimization becomes difficult. A possible strategy is to map graphs to vector codes and use machine learning to learn a map from codes back to graphs. This in turn allows standard optimization techniques over vectors to be employed to optimize graphs. Here we propose an approach to invert a vector mapping based on a combination of graph kernels and graph grammars. We evaluate the proposed approach in an artificial setup and on real molecular graphs.
60 Years of Weightless Neural Systems
        ES2019-1
Systems with 'subjective feelings' - the perspective from weightless automata
Igor Aleksander, Helen Morton
        ES2019-108
Prediction of palm oil production with an enhanced n-Tuple Regression Network
Leopoldo Lusquino Filho, Luiz Oliveira, Aluizio Lima Filho, Gabriel Guarisa, Priscila Machado Vieira Lima, Felipe Maia Galvão França
            Abstract:
This paper introduces Regression WiSARD and ClusRegression WiSARD, two new weightless neural network models that were applied to the challenging task of predicting the total palm oil production of a set of 28 differently located sites under different climate and soil profiles. Both models were derived from the n-tuple regression weightless neural model and obtained error rates of 8.737% and 8.938%, respectively, which are very competitive with the state of the art (7.569%), whilst being four (4) orders of magnitude faster during the training phase.
        ES2019-83
Memory Efficient Weightless Neural Network using Bloom Filter
Leandro Santiago de Araújo, Letícia Dias Verona, Fábio Medeiros Rangel, Fabricio Firmino de Faria, Daniel Sadoc Menasche, Wouter Caarls, Maurício Breternitz, Sandip Kundu, Priscila Machado Vieira Lima, Felipe Maia Galvão França
            Abstract:
Weightless Neural Networks are Artificial Neural Networks based on RAM memory that have been broadly explored as a solution for pattern recognition applications. Due to their memory-based approach, they can be easily implemented in hardware and software, providing an efficient learning mechanism. Unfortunately, the straightforward implementation requires a large amount of memory, making its adoption impracticable on memory-constrained systems. In this paper, we propose a new model of Weightless Neural Network which uses Bloom Filters to implement RAM nodes. By using Bloom Filters, the memory requirements are greatly reduced at the cost of allowing false-positive entries. The experimental results show that our model using Bloom Filters achieves competitive accuracy, training time and testing time, while consuming up to 6 orders of magnitude less memory than the standard Weightless Neural Network model.
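The idea of replacing a RAM node with a Bloom filter can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names `BloomFilter` and `BloomRamNode`, the filter size `m_bits`, the number of hashes `k_hashes`, and the use of salted SHA-256 as the hash family are all assumptions made for illustration. The node stores the address tuples seen during training in an m-bit filter rather than in a table with one cell per possible address; a lookup never misses a trained address but may report a false positive.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: k hash positions over an m-bit array (illustrative)."""

    def __init__(self, m_bits=64, k_hashes=3):
        self.m = m_bits
        self.k = k_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key: bytes):
        # Derive k positions by salting SHA-256 with the hash index (assumed scheme).
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key: bytes):
        for p in self._positions(key):
            self.bits |= 1 << p

    def __contains__(self, key: bytes):
        return all((self.bits >> p) & 1 for p in self._positions(key))


class BloomRamNode:
    """Stands in for a RAM node: remembers trained addresses in a Bloom filter."""

    def __init__(self, m_bits=64, k_hashes=3):
        self.filter = BloomFilter(m_bits, k_hashes)

    def train(self, address_bits):
        # address_bits is a tuple of 0/1 values addressing the node.
        self.filter.add(bytes(address_bits))

    def respond(self, address_bits):
        # 1 if the address was (probably) seen in training, else 0.
        return 1 if bytes(address_bits) in self.filter else 0


node = BloomRamNode()
node.train((1, 0, 1, 1))
print(node.respond((1, 0, 1, 1)))  # a trained address is always recognised
```

The memory cost is fixed by `m_bits` regardless of the tuple length, which is where the saving over a table of 2^n cells would come from in this sketch.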
        ES2019-56
A WNN model based on Probabilistic Quantum Memories
Priscila G.M. dos Santos, Rodrigo S Sousa, Adenilton J. da Silva
            Abstract:
In this work, we evaluate a Weightless Neural Network model based on a Probabilistic Quantum Memory. The model does not require any training and performs the classification by calculating the Hamming distance between a new sample and the training samples stored in the quantum memory. In order to evaluate the classification capabilities of this quantum model, we conducted classical experiments using an equivalent classical description of the Probabilistic Quantum Memory algorithm. We present the first evaluation of a quantum weightless neural network on public benchmark datasets.
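Classification by Hamming distance to stored samples can be sketched classically as follows. This is a plain deterministic nearest-pattern rule, not the Probabilistic Quantum Memory algorithm or its classical description from the paper; the function names `hamming` and `classify` and the toy patterns are assumptions for illustration only.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def classify(sample, stored):
    """Return the label of the stored pattern closest to `sample`.

    `stored` is a list of (pattern, label) pairs; no training step is needed,
    the patterns are simply kept in memory. Ties go to the earliest entry.
    """
    _, label = min(stored, key=lambda item: hamming(sample, item[0]))
    return label


memory = [((0, 0, 0, 0), "a"), ((1, 1, 1, 1), "b")]
print(classify((1, 1, 0, 1), memory))  # distance 3 to "a", 1 to "b"
```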
        ES2019-153
Weightless neural systems for deforestation surveillance and image-based navigation of UAVs in the Amazon forest
Eduardo Ribeiro, Vitor Torres, Brayan James, Mateus Braga, Elcio Shiguemori, Haroldo Velho, Luiz Torres, Antônio Braga
            Abstract:
This work proposes a novel methodology for the recognition of deforestation areas in tropical forests using weightless neural systems in UAVs. Embedding the weightless neural systems in hardware brings a considerable improvement in the processing speed of image-based navigation of UAVs. In our approach, the UAV navigates along the frontier of the deforestation area by means of previously trained descriptors and is thus able to monitor the growth of the deforested area. Experiments using images of the Amazon rainforest have been performed to validate the proposed approach.
        ES2019-54
An evolutionary approach for optimizing weightless neural networks
Maurizio Giordano, Massimo De Gregorio
            Abstract:
WiSARD is a weightless neural network model using RAMs to store the function computed by each neuron rather than storing it in connection weights between neurons. Non-linearity in WiSARD is implemented by a mapping that splits the binary input into tuples of bits and associates these tuples with neurons. In this work, we apply an evolutionary algorithm to evolve an initial population of mappings through combinations and mutations, generating new mappings that yield significant improvements in classification accuracy in the conducted experiments.
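The mapping described above, which is the object the evolutionary algorithm would act on, can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's code: `make_mapping`, `Discriminator`, the seeded shuffle, and the use of Python sets as RAM contents are all hypothetical choices, and the evolutionary search itself is not shown.

```python
import random


def make_mapping(input_size, tuple_size, seed=0):
    """Randomly split input bit positions into tuples, one tuple per RAM neuron."""
    idx = list(range(input_size))
    random.Random(seed).shuffle(idx)
    return [tuple(idx[i:i + tuple_size]) for i in range(0, input_size, tuple_size)]


class Discriminator:
    """One WiSARD-style discriminator: a RAM (here a set of addresses) per tuple."""

    def __init__(self, mapping):
        self.mapping = mapping
        self.rams = [set() for _ in mapping]

    def _addresses(self, x):
        # Read the input bits selected by each tuple to form that RAM's address.
        return [tuple(x[p] for p in positions) for positions in self.mapping]

    def train(self, x):
        for ram, address in zip(self.rams, self._addresses(x)):
            ram.add(address)

    def score(self, x):
        # Count the RAMs that have already seen the address produced by x.
        return sum(address in ram for ram, address in zip(self.rams, self._addresses(x)))


mapping = make_mapping(input_size=8, tuple_size=2)
d = Discriminator(mapping)
d.train([1, 0, 1, 0, 1, 1, 0, 0])
print(d.score([1, 0, 1, 0, 1, 1, 0, 0]))  # all 4 RAMs recognise the trained pattern
```

Different mappings group different bits together, so two discriminators trained on the same data can generalise differently; that is the degree of freedom a search over mappings would exploit.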
        ES2019-187
Modeling Sparse Data as Input for Weightless Neural Network
Luis Kopp, Jose Barbosa Filho, Priscila Machado Vieira Lima, Claudio de Farias
            Abstract:
Dealing with large and sparse input data has been a challenge for machine learning algorithms. In Natural Language Processing (NLP), the number of words used in a text is only a small fraction of a dictionary with all possible words, which leads to a very sparse matrix. In this paper, we propose an alternative method for constructing the input vector in a Weightless Neural Network model using WiSARD. Our algorithm significantly outperformed the benchmark method in accuracy by 3.7% on average when aggregating columns in groups of 3 or 6 words.
Domain adaptation and learning
        ES2019-20
Multi-target feature selection through output space clustering
Konstantinos Sechidis, Eleftherios Spyromitros-Xioufis, Ioannis Vlahavas
            Abstract:
A key challenge in information theoretic feature selection is to estimate mutual information expressions that capture three desirable terms: the relevancy of a feature with the output, the redundancy and the complementarity between groups of features. The challenge becomes more pronounced in multi-target problems, where the output space is multi-dimensional. Our work presents a generic algorithm that captures these three desirable terms and is suitable for the well-known multi-target prediction settings of multi-label/dimensional classification and multivariate regression. We achieve this by combining two ideas: deriving low-order information theoretic approximations for the input space and using clustering for deriving low-dimensional approximations of the output space.
        ES2019-162
Feature relevance bounds for ordinal regression
Lukas Pfannschmidt, Jonathan Jakob, Michael Biehl, Peter Tino, Barbara Hammer
            Abstract:
The increasing occurrence of ordinal data, mainly sociodemographic, led to a renewed research interest in ordinal regression, i.e. the prediction of ordered classes. Besides model accuracy, the interpretation of these models itself is of high relevance, and existing approaches therefore enforce e.g. model sparsity. For high dimensional or highly correlated data, however, this might be misleading due to strong variable dependencies. In this contribution, we aim for an identification of feature relevance bounds which – besides identifying all relevant features – explicitly differentiates between strongly and weakly relevant features.
        ES2019-44
User-steering interpretable visualization with probabilistic principal components analysis
Viet Minh Vu, Benoît Frénay
            Abstract:
A lack of interpretability is often encountered in machine learning in general and in visualization in particular. Integrating user feedback into the visualization process is a potential solution. This paper shows that the user's knowledge, expressed by the positions of fixed points in the visualization, can be transferred directly into a probabilistic principal components analysis (PPCA) model to help the user steer the visualization. Our proposed interactive PPCA model is evaluated on different datasets to demonstrate the feasibility of creating explainable axes for the visualization.
        ES2019-49
Metric learning with submodular functions
Jiajun Pan, Hoel Le Capitaine
            Abstract:
Metric learning mainly focuses on learning distances (or similarities) that use single feature weights with Lp norms, or pairs of features with Mahalanobis distances. In this paper, we consider higher-order interactions in the feature space with the help of submodular set functions. We propose to define a distance metric for continuous features based on submodular functions, and then present a dedicated metric learning approach. This naturally comes at the price of higher complexity, so we propose a method that decreases this complexity by reducing the order of the interactions that are taken into account. This approach finally gives a computationally feasible problem. Experiments on various datasets show the effectiveness of the approach.
        ES2019-135
Fusing Features based on Signal Properties and TimeNet for Time Series Classification
Arijit Ukil, Pankaj Malhotra, Soma Bandyopadhyay, Tulika Bose, Ishan Sahu, Ayan Mukherjee, Lovekesh Vig, Arpan Pal, Gautam Shroff
            Abstract:
Automated feature extraction from time series to capture statistical, temporal, spectral, and morphological properties is highly desirable but challenging due to the diverse nature of real-world time series applications. In this paper, we consider extracting a rich and robust set of time series features encompassing signal-processing-based features as well as generic hierarchical features extracted via deep neural networks. We present SPGF-TimeNet: a generic feature extractor for time series that allows fusion of signal processing, information-theoretic, and statistical features (Signal Properties based Generic Features (SPGF)) with features from an off-the-shelf pre-trained deep recurrent neural network (TimeNet). Through empirical evaluation on diverse benchmark datasets from the UCR Time Series Classification (TSC) Archive, we show that classifiers trained on SPGF-TimeNet-based hybrid and generic features outperform state-of-the-art TSC algorithms such as BOSS, while being computationally efficient.
        ES2019-51
Metric learning with relational data
Jiajun Pan, Hoel Le Capitaine
            Abstract:
The vast majority of metric learning approaches are meant to be applied to data described by feature vectors, with some notable exceptions such as time series, trees or graphs. The objective of this paper is to propose metric learning algorithms that consider multi-relational data. More specifically, we present a metric learning approach taking into account the features of the observations, as well as the relationships between observations. Experiments and comparisons of the two settings for a collective classification task on real-world datasets show that our method i) performs better than other approaches in both settings, and ii) scales well with the volume of the data.
        
        ES2019-110
Feature and Algorithm Selection for Capacitated Vehicle Routing Problems
Jussi Rasku, Nysret Musliu, Tommi Kärkkäinen
            Abstract:
Many exact, heuristic, and metaheuristic algorithms have been proposed to effectively produce high quality solutions to vehicle routing problems. However, it remains an open question which algorithm is the most appropriate for solving a given problem instance, mostly because the different strengths and weaknesses of algorithms are still not well understood. We propose an extensive feature set for describing capacitated vehicle routing problem instances and illustrate how it can be used in algorithm selection, and how different feature selection approaches can be used to recognize the most relevant features for this task.
        
        ES2019-112
Topic-based historical information selection for personalized sentiment analysis
Siwen Guo, Sviatlana Höhn, Christoph Schommer
            Abstract:
In this paper, we present a selection approach designed for personalized sentiment analysis with the aim of extracting related information from a user's history. Analyzing a person's past is key to modeling individuality and understanding the current state of the person. We consider a user's past expressions as historical information, and target posts from social platforms, for which Twitter texts are chosen as an example. While implementing the personalized model PERSEUS, we observed information loss due to the lack of flexibility in the design of the input sequence. To compensate for this issue, we provide a procedure for information selection based on the similarities in the topics of a user's historical posts. The evaluation compares different similarity measures, and improvements are seen with the proposed method.
        
        ES2019-143
Bridging face and sound modalities through domain adaptation metric learning
Christos Athanasiadis, Enrique Hortal, Stylianos Asteriadis
            Abstract:
Robust emotion recognition systems require extensive training, employing a huge number of training samples with the purpose of generating sophisticated models. Furthermore, research is mostly focused on facial expression recognition, mainly due to the wide availability of related datasets. However, rich and publicly available datasets do not exist for other modalities such as sound. In this work, a heterogeneous domain adaptation framework is introduced for bridging two inherently different domains (namely face and audio). The purpose is to perform affect recognition on the modality where only a small amount of data is available, leveraging large amounts of data from another modality.
        
        ES2019-18
Model selection for Extreme Minimal Learning Machine using sampling
Tommi Kärkkäinen
            Abstract:
A combination of Extreme Learning Machine (ELM) and Minimal Learning Machine (MLM), which uses a distance-based basis from MLM in the ridge-regression-like learning framework of ELM, was proposed in [8]. In further experiments with the technique [9], it was concluded that in multilabel classification one can obtain a good validation error level without overlearning simply by using the whole training data for constructing the basis. Here, we consider possibilities to reduce the complexity of the resulting machine learning model, referred to as the Extreme Minimal Learning Machine (EMLM), by using a bidirectional sampling strategy: sampling both the feature space and the space of observations in order to identify a simpler EMLM without sacrificing its generalization performance.
        
        ES2019-34
Knowledge Discovery in Quarterly Financial Data of Stocks Based on the Prime Standard using a Hybrid of a Swarm with SOM
Michael Thrun
            Abstract:
Companies listed in the German Prime Standard have to publish financial reports every three months, which so far have not been fully used for fundamental analysis. Through web scraping, an up-to-date high-dimensional dataset of 45 features of 269 companies was extracted, but finding meaningful cluster structures in a high-dimensional dataset with a low number of cases is still a challenge in data science. A hybrid of a swarm with a SOM called Databionic swarm (DBS) found meaningful structures in the financial reports. Using the Chord distance, the DBS algorithm results in a topographic map of high-dimensional structures and a clustering. Knowledge from the clustering is acquired using CART. The cluster structures can be explained by simple rules that allow predicting which future stock prices will fall with a 70% probability.
        
        ES2019-55
Dimensionality reduction in a hydraulic valve positioning application
Travis Wiens
            Abstract:
This paper presents an application of neural network signal processing to estimate the position of a hydraulic valve spool, based on acoustic excitation of the spool's end chamber. The spool's end chamber acts somewhat like a Helmholtz resonator whose frequency response changes based on its volume (and therefore spool position). However, non-ideal characteristics of the system, including wave propagation effects and distributed parameters, mean that estimating the volume is more complicated than simply evaluating the resonant frequency. In this case the frequency response has high dimensionality with high redundancy and noise. We present the use of linear and nonlinear principal component analysis to preprocess the frequency response data prior to neural network regression.
        
        ES2019-117
Class-aware t-SNE: cat-SNE
Cyril de Bodt, Dounia Mulders, Daniel Lopez-Sanchez, Michel Verleysen, John Lee
            Abstract:
Stochastic Neighbor Embedding (SNE) and variants like $t$-distributed SNE are popular methods of unsupervised dimensionality reduction (DR) that deliver outstanding experimental results. Regular $t$-SNE is often used to visualize data with class labels in colored scatterplots, even though those labels are not actually involved in the DR process. This paper proposes a modification of $t$-SNE that uses class labels to adjust the individual widths of the Gaussian neighborhoods around each datum, instead of deriving those from a perplexity set by the user. The widths are adjusted such that neighbors of the same class around a datum exceed a certain fraction of the probability, typically above $50\%$. Doing so tends to shrink the bulk of the classes and to stretch their separation. Experimental results show that the proposed class-aware $t$-SNE ($\mathrm{ca}t$-SNE) outperforms regular $t$-SNE in a $K$NN classification task carried out in the embedding.
        
        ES2019-42
Variational auto-encoders with Student’s t-prior
Najmeh Abiri, Mattias Ohlsson
            Abstract:
We propose a new structure for the variational autoencoder (VAE) prior, with the weakly informative multivariate Student-t distribution. In the proposed model all distribution parameters are trained, thereby allowing for a more robust approximation of the underlying data distribution. We used Fashion-MNIST data in two experiments to compare the proposed VAE with the standard Gaussian prior. Both experiments showed a better reconstruction of the images with the VAE using the Student-t prior distribution.
Streaming data analysis, concept drift and analysis of dynamic data sets
        ES2019-3
Recent trends in streaming data analysis, concept drift and analysis of dynamic data sets
Albert Bifet, Barbara Hammer, Frank-Michael Schleif
            Abstract:
Today, many data are no longer static but occur as dynamic data streams with high velocity, variability and volume. This leads to new challenges to be addressed by novel or adapted algorithms. In this tutorial we provide an introduction to the field of streaming data analysis, summarizing its major characteristics and highlighting important research directions in the analysis of dynamic data.
        
        ES2019-105
Online Bayesian Shrinkage Regression
Waqas Jamil, Abdelhamid Bouchachia
            Abstract:
The present work introduces a new online regression method that extends the Shrinkage via Limit of Gibbs sampler (SLOG) in the context of online learning. In particular, we theoretically demonstrate that the proposed Online SLOG (OSLOG) is derived using the Bayesian framework without resorting to the Gibbs sampler. We also prove the performance guarantee of OSLOG.
        
        ES2019-33
Reactive Soft Prototype Computing for frequent reoccurring Concept Drift
Christoph Raab, Moritz Heusinger, Frank-Michael Schleif
            Abstract:
Today's datasets, especially in a stream context, are more and more non-static and require algorithms to detect and adapt to change. Recent work shows vital research in the field, but mainly lacks stable performance during model adaptation. In this work, a bound detection strategy followed by a prototype-based insertion strategy is proposed. Validated through experimental results on a variety of typical non-static data, our solution provides stability and quick adjustment in times of change.
        
        ES2019-59
Beta Distribution Drift Detection for Adaptive Classifiers
Lukas Fleckenstein, Sebastian Kauschke, Johannes Fürnkranz
            Abstract:
With today's abundant streams of data, the only constant we can rely on is change. For stream classification algorithms, it is necessary to adapt to concept drift. This can be achieved by monitoring the model error and triggering countermeasures as changes occur. In this paper, we propose a drift detection mechanism that fits a beta distribution to the model error and treats abnormal behavior as drift. It works with any given model, leverages prior knowledge about this model, and allows setting application-specific confidence thresholds. Experiments confirm that it performs well, in particular when drift occurs abruptly.
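The general idea of fitting a beta distribution to the model error and flagging abnormal values can be illustrated with a minimal sketch. This is not the authors' algorithm: the method-of-moments fit, the z-score rule, the function names `fit_beta_moments` and `is_drift`, and the synthetic error rates are all assumptions made for illustration.

```python
import math
import random


def fit_beta_moments(errors):
    """Fit Beta(a, b) to error rates in (0, 1) by the method of moments."""
    n = len(errors)
    mean = sum(errors) / n
    var = sum((e - mean) ** 2 for e in errors) / (n - 1)
    # The moment estimates are valid only when var < mean * (1 - mean).
    common = mean * (1.0 - mean) / var - 1.0
    if common <= 0:
        raise ValueError("variance too large for a beta fit")
    return mean * common, (1.0 - mean) * common


def is_drift(a, b, new_error, z=3.0):
    """Flag an error rate more than z standard deviations above the beta mean."""
    mean = a / (a + b)
    std = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1.0)))
    return new_error > mean + z * std


# Hypothetical per-batch error rates of a model under a stable concept.
rng = random.Random(0)
reference = [rng.betavariate(10, 90) for _ in range(200)]
a, b = fit_beta_moments(reference)

print(is_drift(a, b, 0.10))  # error rate close to the reference mean
print(is_drift(a, b, 0.40))  # error rate far above the reference mean
```

The threshold `z` plays the role of an application-specific confidence setting in this sketch; a quantile of the fitted distribution would be a natural alternative.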
        
        ES2019-63
Importance of user inputs while using incremental learning to personalize human activity recognition models
Pekka Siirtola, Heli Koskimäki, Juha Röning
            Abstract:
In this study, the importance of user inputs is studied in the context of personalizing human activity recognition models using incremental learning. Inertial sensor data from three body positions are used, and the classification is based on the Learn++ ensemble method. Three different approaches to updating models are compared: non-supervised, semi-supervised and supervised. The non-supervised approach relies fully on predicted labels, the supervised approach fully on user-labeled data, and the proposed method for semi-supervised learning is a combination of these two. Our experiments show that by relying on predicted labels with high confidence, and asking the user to label only uncertain observations (from 12% to 26% of the observations, depending on the base classifier used), error rates almost as low as with the supervised approach can be achieved: the difference was less than 2 percentage points. Moreover, unlike the non-supervised approach, the semi-supervised approach does not suffer from drastic concept drift, and thus the error rate of the non-supervised approach is over 5 percentage points higher than that of the semi-supervised approach.
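The routing step described above, keeping confident predicted labels and asking the user about uncertain observations, can be sketched as follows. The function name `route_labels`, the 0.9 threshold and the example probabilities are hypothetical and not taken from the paper.

```python
def route_labels(probabilities, threshold=0.9):
    """Split observations into auto-labelled ones and ones to ask the user about.

    probabilities: list of dicts mapping class name -> predicted probability.
    Returns (auto, ask): auto holds (index, predicted_label) pairs whose top
    probability reaches the threshold; ask holds indices needing a user label.
    """
    auto, ask = [], []
    for i, dist in enumerate(probabilities):
        label, confidence = max(dist.items(), key=lambda kv: kv[1])
        if confidence >= threshold:
            auto.append((i, label))
        else:
            ask.append(i)
    return auto, ask


# Hypothetical classifier outputs for four windows of inertial sensor data.
predictions = [
    {"walk": 0.97, "run": 0.02, "sit": 0.01},
    {"walk": 0.55, "run": 0.40, "sit": 0.05},
    {"walk": 0.03, "run": 0.01, "sit": 0.96},
    {"walk": 0.48, "run": 0.50, "sit": 0.02},
]
auto, ask = route_labels(predictions, threshold=0.9)
print(auto)  # confident predictions kept as training labels
print(ask)   # uncertain observations to be labelled by the user
```

Raising the threshold sends more observations to the user, which trades labelling effort against the risk of training on wrong predicted labels.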
Societal Issues in Machine Learning: When Learning from Data is Not Enough
        ES2019-6
Societal Issues in Machine Learning: When Learning from Data is Not Enough
Davide Bacciu, Battista Biggio, Paulo Lisboa, José D. Martín, Luca Oneto, Alfredo Vellido
            Abstract:
It has been argued that Artificial Intelligence (AI) is experiencing a fast process of commodification. Such characterization is in the interest of big IT companies, but it correctly reflects the current industrialization of AI. This phenomenon means that AI systems and products are reaching society at large and, therefore, that societal issues related to the use of AI and Machine Learning (ML) can no longer be ignored. Designing ML models from this human-centered perspective means incorporating human-relevant requirements such as safety, fairness, privacy, and interpretability, but also considering broad societal issues such as ethics and legislation. These are essential aspects to foster the acceptance of ML-based technologies, as well as to ensure compliance with evolving legislation concerning the impact of digital technologies on ethically and privacy-sensitive matters. The ESANN special session for which this tutorial acts as an introduction aims to showcase the state of the art on these increasingly relevant topics among ML theoreticians and practitioners. For this purpose, we welcomed both solid contributions and preliminary relevant results showing the potential, the limitations and the challenges of new ideas, as well as refinements, or hybridizations among the different fields of research, ML and related approaches in facing real-world problems involving societal issues.
        
        ES2019-29
Privacy Preserving Synthetic Health Data
Andrew Yale, Saloni Dash, Ritik Dutta, Isabelle Guyon, Adrien Pavao, Kristin Bennett
            Abstract:
We examine the feasibility of using synthetic medical data generated by GANs in the classroom, to teach data science in health informatics. We present an end-to-end methodology to retain instructional utility, while preserving privacy to a level that meets regulatory requirements: (1) a GAN is trained by a certified medical-data security-aware agent, inside a secure environment; (2) the GAN is used outside of the secure environment by external users (instructors or researchers) to generate synthetic data. This second step facilitates data handling for external users, by avoiding de-identification, which may require special user training, be costly, and/or cause loss of data fidelity. We benchmark our proposed GAN versus various baseline methods using a novel set of metrics. At equal levels of privacy and utility, GANs provide small footprint models, meeting the desired specifications of our application domain. Data, code, and a challenge that we organized for educational purposes are available.
        
        ES2019-78
Fairness and Accountability of Machine Learning Models in Railway Market: are Applicable Railway Laws Up to Regulate Them?
Charlotte Ducuing, Luca Oneto, Canepa Renzo
            Abstract:
In this work we discuss whether the law is up to regulating the use of machine learning models in the context of the railway public transportation system. In particular, we deal with the problems of fairness and accountability of these models when exploited in the context of train traffic management. Sector-specific regulation of railways, as a network industry, serves here as a pilot. We show that, even where technological solutions are available, the law needs to keep up in order to support and accurately regulate their use, and we identify stumbling blocks in this regard.
        
        ES2019-134
Dynamic fairness - Breaking vicious cycles in automatic decision making
Benjamin Paaßen, Astrid Bunge, Carolin Hainke, Leon Sindelar, Matthias Vogelsang
            Abstract:
In recent years, machine learning techniques have been increasingly applied in sensitive decision making processes, raising fairness concerns. Past research has shown that machine learning may reproduce and even exacerbate human bias due to biased training data or flawed model assumptions, and thus may lead to discriminatory actions. To counteract such biased models, researchers have proposed multiple mathematical definitions of fairness according to which classifiers can be optimized. However, it has also been shown that the outcomes generated by some fairness notions may be unsatisfactory. In this contribution, we add to this research by considering decision making processes in time. We establish a theoretic model in which even perfectly accurate classifiers which adhere to almost all common fairness definitions lead to stable long-term inequalities due to vicious cycles. Only demographic parity, which enforces equal rates of positive decisions in all groups, avoids these effects and establishes instead a virtuous cycle leading to perfectly accurate and fair classification in the long term.
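Demographic parity, described above as equal rates of positive decisions in all groups, can be checked with a minimal sketch. The function names `positive_rates` and `parity_gap` and the example data are hypothetical; the sketch only measures the criterion and does not reproduce the paper's dynamic model.

```python
from collections import defaultdict


def positive_rates(decisions, groups):
    """Return the fraction of positive decisions (1) within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for decision, group in zip(decisions, groups):
        totals[group] += 1
        positives[group] += decision
    return {g: positives[g] / totals[g] for g in totals}


def parity_gap(decisions, groups):
    """Largest difference in positive-decision rate between any two groups."""
    rates = positive_rates(decisions, groups).values()
    return max(rates) - min(rates)


# Hypothetical binary decisions for two groups, "a" and "b".
decisions = [1, 0, 1, 1, 0, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(positive_rates(decisions, groups))
print(parity_gap(decisions, groups))
```

A gap of zero means the decisions satisfy demographic parity exactly on this sample; in practice a small tolerance would usually be allowed.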
        
        ES2019-120
Detecting Black-box Adversarial Examples through Nonlinear Dimensionality Reduction
Francesco Crecchi, Davide Bacciu, Battista Biggio
            Abstract:
Deep neural networks are vulnerable to adversarial examples, i.e., carefully perturbed input samples aimed at misleading classification. In this work, we propose a detection method based on t-SNE, a powerful nonlinear dimensionality reduction technique. Our empirical findings show that the proposed approach is able to effectively detect black-box adversarial examples, i.e., adversarial perturbations not carefully tuned to also bypass the detection method. While we believe that our method may also improve the robustness of deep nets against white-box adversarial examples, we leave a more detailed investigation of this issue to future work.
        
        ES2019-97
Deep RL for autonomous robots: limitations and safety challenges
Olov Andersson, Patrick Doherty
            Abstract:
With the rise of deep reinforcement learning, there has also been a string of successes on continuous control problems using physics simulators. This has led to some optimism regarding use in autonomous robots and vehicles. However, successfully applying such techniques to the real world requires a firm grasp of their limitations. As recent work has raised questions of how diverse these simulation benchmarks really are, we here instead analyze a popular deep RL approach on toy examples from robot obstacle avoidance. We find that these converge very slowly, if at all, to safe policies. We identify convergence issues on stochastic environments and local minima as problems that warrant more attention for safety-critical control applications.
        
        ES2019-124
Explaining classification systems using sparse dictionaries
Andrea Apicella, Francesco Isgro, Roberto Prevete, Andrea Sorrentino, Guglielmo Tamburrini
            Abstract:
A pressing research topic is to find ways to explain the decisions of machine learning systems to end users, data officers, and other stakeholders. These explanations must be understandable to human beings. Much work in this field focuses on image classification, as the required explanations can rely on images, therefore making communication relatively easy, and may take into account the image as a whole. Here, we propose to exploit the representational power of sparse dictionaries to determine image local properties that can be used as crucial ingredients of humanly understandable explanations of classification decisions.
Statistical physics of learning and inference
        ES2019-2
Statistical physics of learning and inference
Michael Biehl, Nestor Caticha, Manfred Opper, Thomas Villmann
        ES2019-72
Trust, law and ideology in a NN agent model of the US Appellate Courts
Nestor Caticha, Felippe Alves
            Abstract:
Interacting neural networks (NN) are used to model three-judge panels of the US Appellate Courts. Agents, whose initial states have three contributions derived from common knowledge of the law, political affiliation and personality, learn by exchange of opinions, updating their state and their trust in other agents. The model replicates data patterns only if initially the agents trust each other and are certain about their trust independently of party affiliation, showing evidence of ideological voting, dampening and amplification. Absence of the law or party contribution destroys the theoretical-empirical agreement. We identify quantitative signatures for different levels of the law, ideological or idiosyncratic contributions.
        
        ES2019-173
On-line learning dynamics of ReLU neural networks using statistical physics techniques
Michiel Straat, Michael Biehl
            Abstract:
We introduce exact macroscopic on-line learning dynamics of two-layer neural networks with ReLU units in the form of a system of differential equations, using techniques borrowed from statistical physics. For the first experiments, numerical solutions reveal similar behavior compared to sigmoidal activation researched in earlier work. In these experiments the theoretical results show good correspondence with simulations. In overrealizable and unrealizable learning scenarios, the learning behavior of ReLU networks shows distinctive characteristics compared to sigmoidal networks.
        
        ES2019-92
Noise helps optimization escape from saddle points in the neural dynamics
Fang Ying, Yu Zhaofei, Chen Feng
            Abstract:
Synaptic connectivity in the brain is thought to encode the long-term memory of an organism. But experimental data point to surprising ongoing fluctuations in synaptic activity. Assuming that brain computation and plasticity can be understood as probabilistic inference, one of the essential roles of noise is to efficiently improve the performance of optimization in the form of stochastic gradient descent. The strict saddle condition for synaptic plasticity is deduced, and under this condition noise can help escape from saddle points on high dimensional domains. The theoretical result explains the stochasticity of synapses and suggests how to make use of noise. Our simulation results show that in the learning and test phase, the accuracy of synaptic sampling is almost 20% higher than that without noise.
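The claim that noise helps gradient descent leave a strict saddle point can be illustrated on a toy function. The sketch below is not the authors' synaptic sampling model: the function f(x, y) = x^2 + y^4/4 - y^2/2, the step size, and the noise level are assumptions chosen for illustration. The function has a strict saddle at (0, 0) and minima at (0, 1) and (0, -1).

```python
import numpy as np

def f(theta):
    """Toy objective with a strict saddle at (0, 0) and minima at (0, +-1)."""
    x, y = theta
    return x**2 + 0.25 * y**4 - 0.5 * y**2

def grad_f(theta):
    """Analytic gradient of f."""
    x, y = theta
    return np.array([2.0 * x, y**3 - y])

def descend(theta0, lr=0.1, noise_std=0.0, n_steps=500, seed=0):
    """Gradient descent with optional isotropic Gaussian noise on each update."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    for _ in range(n_steps):
        theta = theta - lr * grad_f(theta) + noise_std * rng.standard_normal(2)
    return theta

# Start on the line y = 0, which plain gradient descent never leaves.
plain = descend([1.0, 0.0])
noisy = descend([1.0, 0.0], noise_std=0.01)
print("plain:", plain, f(plain))
print("noisy:", noisy, f(noisy))
```

Without noise the iterate converges to the saddle, because the gradient has no component along y on the starting line. With a small amount of noise the iterate is pushed off that line and settles near one of the two minima, where the objective is lower.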
Image processing and transfer learning
        ES2019-169
Deep hybrid approach for 3D plane segmentation
Felipe Gomez Marulanda, Pieter Libin, Timothy Verstraeten, Ann Nowe
            Abstract:
We address the limitations of deep learning (DL) models for 3D geometry segmentation by using conditional random fields (CRFs). We show that CRFs can take advantage of the neighbouring structure of point clouds to assist the learning of the DL models. Our hybrid PN-CRF model is able to learn better weights by taking advantage of equal-segmentation assignments to neighbouring points. As a result, it increases the robustness of the model, especially for segmentation tasks where correctly detecting the boundaries between segmentations is very important.
        
        ES2019-66
Visualizing image classification in Fourier domain
Florian Franzen, Chunrong Yuan
            Abstract:
Image classification is successfully done with Convolutional Neural Networks (CNN). Alternatively, it can be done in the Fourier domain, avoiding the convolution process. In this work, we develop several neural networks (NN) for classifying images in the Fourier domain. In order to understand and explain the behaviour of the built NNs, we visualize neuron activities and analyze the underlying patterns relevant for the learning and classification process. We have carried out a comparative study based on several datasets. By using images of objects with partial occlusion, we are able to find out the parts that are important for the classification of certain objects.
        
        ES2019-71
Blind-spot network for image anomaly detection: A new approach to diabetic retinopathy screening
Shaon Sutradhar, José Rouco, Marcos Ortega
            Abstract:
The development of computer-aided screening (CAS) systems is motivated by the high prevalence and severity of the target disease along with the time taken to manually assess each case. This is the case with diabetic retinopathy screening, which is based on the manual grading of retinography images. The development of CAS systems, however, usually involves data-driven approaches that require extensive and usually scarce manually labeled datasets. With this in mind, we propose the use of unsupervised anomaly detection methods for screening that can take advantage of the large amount of healthy cases available. Concretely, we focus on reconstruction-based anomaly detection methods, which are usually approached with autoencoders. We propose a new network architecture, the Blind-Spot Network, that, according to the presented experiments, improves the performance of autoencoders in this setting.
        
        ES2019-17
A document detection technique using convolutional neural networks for optical character recognition systems
Lorand Dobai, Mihai Teletin
            Abstract:
An important part of an optical character recognition pipeline is the preprocessing step, whose purpose is to enhance the conditions under which the text extraction is later performed. In this paper, we present a novel deep learning based preprocessing method to jointly detect and deskew documents in digital images. Our work intends to improve the optical recognition performance, especially on frames which are skewed (slightly rotated) or have cluttered backgrounds. The proposed method achieves good document detection and deskewing results on a dataset of photos of cash receipts.
        
        ES2019-100
Learning super-resolution 3D segmentation of plant root MRI images from few examples
Ali Oguz Uzman, Jannis Horn, Sven Behnke
            Abstract:
Analyzing plant roots is crucial to understanding plant performance in different soil environments. While magnetic resonance imaging (MRI) can be used to obtain 3D images of plant roots, extraction of the root structural model is challenging due to highly noisy soil environments and the low resolution of MRI images. To improve both contrast and resolution, we adapt the state-of-the-art method RefineNet for 3D segmentation of the plant root MRI images in super-resolution. The networks are trained from few manual segmentations that are augmented by geometric transformations, realistic noise, and other variabilities. The resulting segmentations contain most root structures including branches not extracted by human supervision.
        
        ES2019-175
Analyzing spatial dissimilarities in high-resolution geo-data: a case study of four European cities
Julien Randon-Furling, William Clark, Madalina Olteanu
            Abstract:
The analysis of spatial dissimilarities across cities often relies on pre-defined areal units, leading to problems of scale, interpretability and cross-comparisons. Furthermore, traditional measures of dissimilarities tend to be single-number indices that fail to capture the complexity of segregation patterns. We present in this paper a method that allows one to extract and analyze information on all scales, at every point in the city, through a stochastic sequential aggregation procedure based on high-resolution data. This method provides insightful visual representations, as well as mathematical characterizations of segregation phenomena.
        
        ES2019-21
Computerized tool for identification and enhanced visualization of Macular Edema regions using OCT scans
Iago Otero Coto, Plácido Francisco Lizancos Vidal, Joaquim de Moura, Jorge Novo, Marcos Ortega
            Abstract:
We propose a novel methodology using Optical Coherence Tomography (OCT) images to detect the 3 clinically defined types of Macular Edema, which is among the main causes of blindness: Diffuse Retinal Thickening (DRT), Cystoid Macular Edema (CME) and Serous Retinal Detachment (SRD). To perform this detection, we sample the images and train models to create an intuitive color map that represents the 3 pathologies to facilitate the clinical evaluation. The proposed method was tested using a dataset composed of 96 OCT images. The system provided satisfactory results with accuracy values of 90.49%, 93.23% and 88.87% for the CME, SRD and DRT detections, respectively.
        
        ES2019-201
A best-first branch-and-bound search for solving the transductive inference problem using support vector machines
Hygor Xavier Araújo, Raul Fonseca Neto, Saulo Moraes Villela
            Abstract:
In this paper we present a new method for solving the transductive inference problem, whose objective is predicting the binary labels of a subset of points of interest of an unknown decision function. We attempt to learn a decision boundary using SVM. To obtain the maximal-margin hypothesis over labeled and unlabeled samples, we employ an admissible best-first search based on margin values. Empirical evidence suggests that this globally optimal solution can obtain excellent results in the transduction problem. Due to the selection strategy used, the search algorithm explores only a small fraction of unlabeled samples, making it efficiently applicable to medium-sized datasets. We compare our results with those obtained from the TSVM, demonstrating better results in margin values.
        
        ES2019-46
LEAP nets for power grid perturbations
Benjamin Donnot, Balthazar Donon, Isabelle Guyon, Liu Zhengying, Antoine Marot, Patrick Panciatici, Marc Schoenauer
            Abstract:
We propose a novel neural network embedding approach to model power transmission grids, in which high voltage lines are disconnected and re-connected with one another from time to time, either accidentally or willfully. We call our architecture LEAP net, for Latent Encoding of Atypical Perturbation. Our method implements a form of transfer learning, making it possible to train on a few source domains and then generalize to new target domains without learning on any example of those domains. We evaluate the viability of this technique to rapidly assess curative actions that human operators take in emergency situations, using real historical data from the French high voltage power grid.
        
        ES2019-81
Active one-shot learning with Prototypical Networks
Rinu Boney, Alexander Ilin
            Abstract:
We consider the problem of active one-shot classification where a classifier needs to adapt to new tasks by requesting labels for one example per class from (potentially many) unlabeled examples. We propose a clustering approach to the problem. The features extracted with Prototypical Networks are clustered using K-means and the label for one representative sample from each cluster is requested to label the whole cluster. We demonstrate good performance of this simple active adaptation strategy using image data.
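The clustering strategy described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: synthetic 2D blobs stand in for features extracted with Prototypical Networks, the representative of each cluster is assumed to be the sample nearest the centroid, K-means is a plain NumPy version with a deterministic farthest-point initialization, and all function names are hypothetical.

```python
import numpy as np

def farthest_point_init(x, k):
    """Deterministic init: start at x[0], then repeatedly add the farthest point."""
    chosen = [0]
    for _ in range(k - 1):
        d = np.linalg.norm(x[:, None, :] - x[chosen][None, :, :], axis=2).min(axis=1)
        chosen.append(int(d.argmax()))
    return x[chosen].copy()

def assign_to_centroids(x, centroids):
    """Index of the nearest centroid for every sample."""
    return np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)

def kmeans(x, k, n_iter=50):
    """Plain Lloyd iterations; returns centroids and cluster assignments."""
    centroids = farthest_point_init(x, k)
    for _ in range(n_iter):
        assign = assign_to_centroids(x, centroids)
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = x[assign == j].mean(axis=0)
    return centroids, assign_to_centroids(x, centroids)

def label_by_cluster_representatives(features, k, oracle):
    """Request one label per cluster and propagate it to all cluster members."""
    centroids, assign = kmeans(features, k)
    predicted = np.empty(len(features), dtype=int)
    queried = []
    for j in range(k):
        members = np.flatnonzero(assign == j)
        if members.size == 0:
            continue
        dist = np.linalg.norm(features[members] - centroids[j], axis=1)
        rep = int(members[dist.argmin()])  # assumed choice: sample nearest the centroid
        queried.append(rep)
        predicted[members] = oracle(rep)
    return predicted, queried

# Synthetic stand-in for embedded features: three well-separated classes.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
true_labels = np.repeat(np.arange(3), 20)
features = centers[true_labels] + 0.5 * rng.standard_normal((60, 2))

predicted, queried = label_by_cluster_representatives(
    features, k=3, oracle=lambda i: int(true_labels[i])
)
accuracy = float((predicted == true_labels).mean())
print(len(queried), accuracy)
```

Only one label per cluster is requested from the oracle, so the labeling cost equals the number of clusters rather than the number of samples. How well this works on real data depends on how cleanly the embedding separates the classes.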
        
        ES2019-123
Transfer Learning for transferring machine-learning based models among hyperspectral sensors
Patrick Menz, Andreas Backhaus, Udo Seiffert
            Abstract:
Using previously generated machine learning models under changing sensor hardware with nearly the same performance is a desirable goal. This constitutes a model transfer problem. We compare a Radial Basis Function Network adapted for transfer learning to a classical data alignment approach. This approach to transferring machine-learning models is tested on a task of material classification using hyperspectral imaging recorded with different camera systems, with the aim of making camera systems interchangeable. The results show that a machine-learning based algorithm outperforms a state-of-the-art hyperspectral data alignment algorithm.
Time series and signal processing
        ES2019-126
Multiple-Kernel dictionary learning for reconstruction and clustering of unseen multivariate time-series
Babak Hosseini, Barbara Hammer
            Abstract:
There exist many approaches for description and recognition of unseen classes in datasets. Nevertheless, it becomes a challenging problem when we deal with multivariate time-series (MTS) (e.g., motion data), where we cannot apply the vectorial algorithms directly to the inputs. In this work, we propose a novel multiple-kernel dictionary learning (MKD) method which learns semantic attributes based on specific combinations of MTS dimensions in the feature space. Hence, MKD can fully or partially reconstruct the unseen classes based on the training data (seen classes). Furthermore, we obtain sparse encodings for unseen classes based on the learned MKD attributes, upon which we propose a simple but effective incremental clustering algorithm to categorize the unseen MTS classes in an unsupervised way. According to the empirical evaluation of our MKD framework on real benchmarks, it provides an interpretable reconstruction of unseen MTS data as well as high performance in their online clustering.
        
        ES2019-130
Tensor factorization to extract patterns in multimodal EEG data
Dounia Mulders, Cyril de Bodt, Nicolas Lejeune, John Lee, André Mouraux, Michel Verleysen
            Abstract:
Noisy multi-way data sets are ubiquitous in many domains. In neuroscience, electroencephalogram (EEG) data are recorded during periodic stimulation from different sensory modalities, leading to steady-state (SS) recordings with at least four ways: the channels, the time, the subjects and the modalities. Improving the signal-to-noise ratio (SNR) of the SS responses is crucial to enable their practical use. Supervised spatial filtering methods can be considered for this purpose to relevantly guide the extraction of specific activity patterns. Nevertheless, such approaches are difficult to validate with few subjects and can process at most two data ways simultaneously, the remaining ones being either averaged or considered independently despite their dependencies. This paper hence designs unsupervised tensor factorization models to enable identifying meaningful underlying structures characterized in all ways of multimodal SS data. We show on EEG recordings from 15 subjects that such factorizations faithfully reveal consistent spatial topographies, time courses with enhanced SNR and subject variations of the periodic brain activity.
        ES2019-119
Beyond Pham's algorithm for joint diagonalization
Pierre Ablin, Jean-François Cardoso, Alexandre Gramfort
            Abstract:
The approximate joint diagonalization of a set of matrices consists in finding a basis in which these matrices are as diagonal as possible. This problem naturally appears in several statistical learning tasks such as blind signal separation. We consider the diagonalization criterion studied in a seminal paper by Pham (2001), and propose a new quasi-Newton method for its optimization. Through numerical experiments on simulated and real datasets, we show that the proposed method outperforms Pham’s algorithm. An open source Python package is released.
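For orientation, the criterion usually associated with Pham (2001) can be written as below. This is a sketch under assumptions: the abstract does not state the formula, and the notation is introduced here only for illustration ($C_1, \dots, C_n$ are the symmetric positive definite matrices to be diagonalized, $B$ is the sought invertible matrix, and $\operatorname{diag}(\cdot)$ keeps only the diagonal of its argument).

```latex
\mathcal{L}(B) = \frac{1}{2n} \sum_{i=1}^{n}
  \Bigl[ \log\det \operatorname{diag}\bigl(B C_i B^{\top}\bigr)
       - \log\det \bigl(B C_i B^{\top}\bigr) \Bigr]
```

By Hadamard's inequality each summand is nonnegative for positive definite $C_i$, and it vanishes exactly when $B C_i B^{\top}$ is diagonal, so $\mathcal{L}(B) = 0$ only if $B$ diagonalizes all matrices simultaneously.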
        ES2019-50
Frequency Domain Transformer Networks for Video Prediction
Hafez Farazi, Sven Behnke
            Abstract:
The task of video prediction is forecasting the next frames given some previous frames. Despite much recent progress, this task is still challenging mainly due to high nonlinearity in the spatial domain. To address this issue, we propose a novel architecture, Frequency Domain Transformer Network (FDTN), which is an end-to-end learnable model that formulates the transformations of the signal in the frequency domain. Experimental evaluations show that this approach can outperform some widely used video prediction methods like Video Ladder Network (VLN) and Predictive Gated Pyramids (PGP).
        ES2019-184
Comparison between DeepESNs and gated RNNs on multivariate time-series prediction
Claudio Gallicchio, Alessio Micheli, Luca Pedrelli
            Abstract:
We propose an experimental comparison between Deep Echo State Networks (DeepESNs) and gated Recurrent Neural Networks (RNNs) on multivariate time-series prediction tasks. In particular, we compare reservoir and fully-trained RNNs able to represent signals featuring multiple time-scale dynamics. The analysis is performed in terms of efficiency and prediction accuracy on 4 polyphonic music tasks. Our results show that DeepESN is able to outperform ESN in terms of prediction accuracy and efficiency. Among fully-trained approaches, Gated Recurrent Units (GRU) outperform Long Short-Term Memory (LSTM) and simple RNN models in most cases. Overall, DeepESN turned out to be far more efficient than the other RNN approaches and the best solution in terms of prediction accuracy on 3 out of 4 tasks.
        ES2019-159
Autoregressive Convolutional Recurrent Neural Network for Univariate and Multivariate Time Series Prediction
Matteo Maggiolo, Gerasimos Spanakis
            Abstract:
Time series forecasting (univariate and multivariate) is a problem of high complexity due to the different patterns that have to be detected in the input, ranging from high-frequency to low-frequency ones. In this paper we propose a new model for time series prediction that utilizes convolutional layers for feature extraction, a recurrent encoder and a linear autoregressive component. We motivate the model and we test and compare it against a baseline of widely used existing architectures for univariate and multivariate time series. The proposed model appears to outperform the baselines in almost every case of the multivariate time series datasets, in some cases even with a 50% improvement, which shows the strengths of such a hybrid architecture in complex time series.
        ES2019-15
Using Deep Learning and Evolutionary Algorithms for Time Series Forecasting
Rafael Thomazi Gonzalez, Dante Augusto Couto Barone
            Abstract:
Deep Learning is one of the latest approaches in the field of artificial neural networks. Since they were first proposed, Deep Learning models have obtained state-of-the-art results in some problems related to classification and pattern recognition. However, such models have rarely been used in time series forecasting. This work aims to investigate the use of some of these architectures in this kind of problem. Another contribution is the use of an Evolutionary Algorithm to optimize the hyperparameters of these models. The advantage of the proposed method is shown on two artificial time series datasets and one electricity load demand dataset.
        ES2019-103
Lightweight autonomous Bayesian optimization of Echo-State Networks
Cerina Luca, Giuseppe Franco, Marco Domenico Santambrogio
            Abstract:
Echo State Networks (ESN) represent a good option to tackle non-linear, time-dependent problems without the training complexity of standard Recurrent Neural Networks (RNNs), thanks to intrinsic dynamics that arise from untrained sparse networks. However, the performance and stability of ESN are determined by their hyper-parameters, e.g. reservoir dimension and sparsity, and by the characteristics of the input; finding optimal hyper-parameter values requires time-consuming procedures. Here we propose an efficient automatic optimization framework for ESN based on Bayesian Optimization, given user-defined objectives and bounded ranges on hyper-parameters. Results show performance comparable with exhaustive grid-search optimization algorithms.
        ES2019-99
Time series modelling of market price in real-time bidding
Manxing Du, Christian Hammerschmidt, Georgios Varisteas, Radu State, Mats Brorsson, Zhu Zhang
            Abstract:
Real-Time-Bidding (RTB) is one of the most popular online advertisement selling mechanisms. Modeling the highly dynamic bidding environment is crucial for making good bids. Market prices of auctions fluctuate heavily within short time spans. State-of-the-art methods neglect the temporal dependencies of bidders' behaviors. In this paper, the bid requests are aggregated by time and the mean market price per aggregated segment is modeled as a time series. We show that the Long Short Term Memory (LSTM) neural network outperforms the state-of-the-art univariate time series models by capturing the nonlinear temporal dependencies in the market price. We further improve the predicting performance by adding a summary of exogenous features from bid requests.
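The aggregation step described in the abstract (grouping bid requests by time and taking the mean market price per segment) could be sketched as follows. The function name `mean_price_series`, the `(timestamp, price)` input layout and the 60-second window are illustrative assumptions, not details taken from the paper.

```python
from collections import defaultdict


def mean_price_series(bids, window_s):
    """Aggregate (timestamp_seconds, market_price) pairs into fixed windows.

    Returns a list of (window_start, mean_price) sorted by window start.
    Windows that contain no bid request are simply absent from the result.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, price in bids:
        start = (ts // window_s) * window_s  # left edge of the window
        sums[start] += price
        counts[start] += 1
    return [(start, sums[start] / counts[start]) for start in sorted(sums)]


# Hypothetical bid log: (seconds since start, winning price).
bids = [(3, 10.0), (42, 20.0), (61, 30.0), (119, 50.0), (185, 7.0)]
series = mean_price_series(bids, window_s=60)
```

The resulting sequence of per-window means is the kind of univariate series that a forecasting model such as an LSTM would then be trained on; how empty windows are filled is a separate modelling choice left open here.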
Dynamical systems and reinforcement learning
        ES2019-65
Short-term trajectory planning using reinforcement learning within a neuromorphic control architecture
Florian Mirus, Benjamin Zorn, Jörg Conradt
            Abstract:
In this paper, we present a first step towards neuromorphic vehicle control. We propose a modular and hierarchical system architecture entirely implemented in a spiking neuron substrate, which allows for adjustment of individual components through either supervised or reinforcement learning as well as future deployment on dedicated neuromorphic hardware. In a sample instantiation, we investigate automated training of a neuromorphic trajectory selection module using reinforcement learning to demonstrate the general feasibility of our approach. We evaluate our system using the open-source race car simulator TORCS.
        ES2019-129
Training networks separately on static and dynamic obstacles improves collision avoidance during indoor robot navigation
Viktor Schmuck, David Meredith
            Abstract:
Autonomous robot navigation and dynamic obstacle avoidance in complex, cluttered, indoor environments is a challenging task. A robust solution would allow robots to be deployed in hospitals, airports or shopping centres to serve as guides and fulfil other functions requiring safe human-robot interaction. Previous studies have explored various approaches to selecting sensor types, collecting data, and training models capable of safely avoiding unmapped, possibly dynamic obstacles in an indoor environment. In this paper we address the problem of recognizing and anticipating collisions, in order to determine when avoidance manoeuvres are required. We propose and compare two sensor-fusion and neural-network-based solutions, one in which models are trained separately on static and dynamic samples and another in which a model is trained on samples of collisions with both dynamic and static obstacles. The measured accuracies confirmed that the separately trained, ensemble models had better recognition performance, but were slower at calculation than the models trained without taking the obstacle types into account.
        ES2019-149
Human feedback in continuous actor-critic reinforcement learning
Cristian Millán, Bruno Fernandes, Francisco Cruz
            Abstract:
Reinforcement learning methods are used when an agent tries to learn from a changing environment. With continuous actions, the performance is significantly better, but the learning requires excessive time to find the proper policy. In this work, we focus on including human feedback in reinforcement learning with a continuous action space. We join the policy with the feedback to favor actions in regions of low density. We compare the performance of the feedback over a continuous actor-critic algorithm and evaluate it on the cart-pole balancing task. The obtained results show that our approach increases the accumulated reward and improves performance during the task.
        ES2019-76
Chasing the Echo State Property
Claudio Gallicchio
            Abstract:
Reservoir Computing (RC) provides an efficient way for designing dynamical recurrent neural models. While training is restricted to a simple output component, the recurrent connections are left untrained after initialization, subject to stability constraints specified by the Echo State Property (ESP). Literature conditions for the ESP typically fail to properly account for the effects of driving input signals, often limiting the potential of the RC approach. In this paper, we study the fundamental aspect of asymptotic stability of RC models in the presence of driving input, introducing an empirical ESP index that makes it easy to analyze the stability regimes of reservoirs. Results on two benchmark datasets reveal interesting insights on the dynamical properties of input-driven reservoirs, suggesting that the actual domain of ESP validity is much wider than that covered by the literature conditions commonly used in RC practice.
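The idea behind an empirical stability check can be illustrated with a small experiment: drive the same reservoir with the same input from different initial states and measure how far apart the final states remain. The sketch below is only an illustration of that idea and is not claimed to be the ESP index defined in the paper; the tanh update rule, the scalar input, the row-sum scaling and all function names are assumptions made for the example.

```python
import math
import random


def step(W, W_in, x, u_t):
    """One reservoir update x <- tanh(W x + W_in * u_t) for a scalar input u_t."""
    return [
        math.tanh(sum(w * xj for w, xj in zip(row, x)) + w_in * u_t)
        for row, w_in in zip(W, W_in)
    ]


def final_state(W, W_in, inputs, x0):
    """Run the reservoir over the whole input sequence starting from x0."""
    x = list(x0)
    for u_t in inputs:
        x = step(W, W_in, x, u_t)
    return x


def state_gap(W, W_in, inputs, n_trials=5, seed=0):
    """Mean max-abs distance between final states reached from random
    initial states and from the zero state, under the same input."""
    rng = random.Random(seed)
    n = len(W)
    ref = final_state(W, W_in, inputs, [0.0] * n)
    gaps = []
    for _ in range(n_trials):
        x0 = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        x = final_state(W, W_in, inputs, x0)
        gaps.append(max(abs(a - b) for a, b in zip(x, ref)))
    return sum(gaps) / n_trials


def random_reservoir(n, row_sum, seed=0):
    """Random dense reservoir whose rows are rescaled to a given absolute sum."""
    rng = random.Random(seed)
    W = []
    for _ in range(n):
        row = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        s = sum(abs(w) for w in row)
        W.append([w * row_sum / s for w in row])
    W_in = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return W, W_in


rng = random.Random(1)
inputs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
W, W_in = random_reservoir(20, row_sum=0.5)
gap = state_gap(W, W_in, inputs)
```

With absolute row sums of 0.5 the update is a contraction in the max norm (tanh is 1-Lipschitz), so the gap shrinks by at least half at every step and is numerically zero after 200 steps. This is the conservative kind of sufficient condition the abstract refers to; increasing `row_sum` and watching where the gap stops vanishing is one way to probe the input-dependent stability regime empirically.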