Bruges, Belgium, April 27-28-29
Content of the proceedings
Advances in Learning with Kernels: Theory and Practice in a World of growing Constraints
Regression and mathematical models
Indefinite proximity learning
Deep learning, text, image and signal processing
Machine learning for medical applications
Physics and Machine Learning: Emerging Paradigms
Incremental learning algorithms and applications
Classification
Deep learning
Clustering and feature selection
Information Visualisation and Machine Learning: Techniques, Validation and Integration
Robotics and reinforcement learning
Advances in Learning with Kernels: Theory and Practice in a World of growing Constraints
ES2016-21
Advances in Learning with Kernels: Theory and Practice in a World of growing Constraints
Luca Oneto, Nicolò Navarin, Michele Donini, Fabio Aiolli, Davide Anguita
Abstract:
Kernel methods have consistently outperformed previous generations of learning techniques. They provide a flexible and expressive learning framework that has been successfully applied to a wide range of real-world problems, but recently novel algorithms, such as Deep Neural Networks and Ensemble Methods, have become increasingly competitive with them. Due to the current growth of data in size, heterogeneity and structure, the new generation of algorithms is expected to solve increasingly challenging problems. This must be done under growing constraints such as computational resources, memory budget and energy consumption. For these reasons, new ideas have to emerge in the field of kernel learning, such as deeper kernels and novel algorithms, to close the gap that now exists with the most recent learning paradigms. The purpose of this special session is to highlight recent advances in learning with kernels. In particular, this session welcomes contributions that address the weaknesses (e.g. scalability, computational efficiency and overly shallow kernels) and improve the strengths (e.g. the ability to deal with structured data) of state-of-the-art kernel methods. We also encourage the submission of new theoretical results in the Statistical Learning Theory framework and innovative solutions to real-world problems.
ES2016-111
Kernel based collaborative filtering for very large scale top-N item recommendation
Fabio Aiolli, Mirko Polato
Abstract:
The increasing availability of implicit feedback datasets has raised interest in developing effective collaborative filtering techniques able to deal asymmetrically with unambiguous positive feedback and ambiguous negative feedback. In this paper, we propose a principled kernel-based collaborative filtering method for top-N item recommendation with implicit feedback. We present an efficient implementation using the linear kernel, and show how to generalize it to other kernels while preserving efficiency. We compare our method with the state-of-the-art algorithm on the Million Song Dataset, achieving an execution about five times faster while maintaining comparable effectiveness.
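The abstract does not spell out the scoring rule, so as a hedged illustration of the general idea only, the sketch below scores items for a user with an item-item linear kernel built from a binary implicit-feedback matrix; all names, sizes and the masking convention are invented for the example:

```python
import numpy as np

# Toy implicit-feedback matrix R (users x items), 1 = observed positive feedback.
rng = np.random.default_rng(0)
R = (rng.random((50, 20)) < 0.3).astype(float)

# Linear kernel between items: K[i, j] = <column i, column j> of R.
K = R.T @ R

def top_n(user_row, K, n=5):
    """Score items by accumulated similarity to the user's consumed items,
    mask out already-seen items, and return the top-N indices."""
    scores = K @ user_row
    scores[user_row > 0] = -np.inf  # never recommend seen items
    return np.argsort(scores)[::-1][:n]

recs = top_n(R[0], K)
```

Swapping `K` for another precomputed item kernel leaves the ranking code unchanged, which is the modularity the kernel view buys.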
ES2016-137
RNAsynth: constraints learning for RNA inverse folding.
Fabrizio Costa, Parastou Kohvaei, Robert Kleinkauf
Abstract:
RNA polymers are an important class of molecules: not only are they involved in a variety of biological functions, from coding to decoding and from regulation to expression of genes, but, crucially, they are nowadays easily synthesizable, opening interesting application scenarios in biotechnological and biomedical domains. Here we propose a constructive machine learning framework to aid the rational design of such polymers. Using a graph kernel approach in a supervised setting, we define a notion of importance over molecular parts. We then convert the set of most important parts into specific sequence and structure constraints. Finally, an inverse folding algorithm uses these constraints to compute the desired RNA sequence.
ES2016-150
Measuring the Expressivity of Graph Kernels through the Rademacher Complexity
Luca Oneto, Nicolò Navarin, Michele Donini, Alessandro Sperduti, Fabio Aiolli, Davide Anguita
Abstract:
Graph kernels are widely adopted in real-world applications that involve learning on graph data. Different graph kernels have been proposed in the literature, but no theoretical comparison among them is available. In this paper we provide a formal definition of the expressiveness of a graph kernel by means of the Rademacher Complexity, and analyze the differences among some state-of-the-art graph kernels. Results on real-world datasets confirm some known properties of graph kernels, showing that the Rademacher Complexity is indeed a suitable measure for this analysis.
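For a norm-bounded kernel class, the empirical Rademacher complexity takes the classical form (B/n) E_sigma[sqrt(sigma' K sigma)], which can be estimated by Monte Carlo over random sign vectors. The sketch below does this for a generic PSD kernel matrix; the RBF kernel and all sizes are placeholders standing in for the graph kernels studied in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 3))

# A PSD kernel matrix on 40 examples (RBF here; a graph kernel in the paper).
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / 2.0)

def rademacher_estimate(K, B=1.0, trials=2000, seed=1):
    """Monte Carlo estimate of (B/n) * E_sigma sqrt(sigma^T K sigma),
    the empirical Rademacher complexity of the norm-B ball in the RKHS."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    sigma = rng.choice([-1.0, 1.0], size=(trials, n))
    vals = np.sqrt(np.einsum('ti,ij,tj->t', sigma, K, sigma))
    return B * vals.mean() / n

rc = rademacher_estimate(K)
bound = np.sqrt(np.trace(K)) / K.shape[0]  # Jensen upper bound, B = 1
```

Comparing `rc` across different kernel matrices on the same data is the kind of expressiveness comparison the abstract describes.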
ES2016-172
A reservoir activation kernel for trees
Davide Bacciu, Claudio Gallicchio, Alessio Micheli
Abstract:
We introduce an efficient tree kernel for reservoir computing models exploiting the recursive encoding of the structure in the state activations of the untrained recurrent layer. We discuss how the contractive property of the reservoir induces a topographic organization of the state space that can be used to compute structural matches in terms of pairwise distances between points in the state space. The experimental analysis shows that the proposed kernel is capable of achieving competitive classification results by relying on very small reservoirs comprising as few as 10 sparsely connected recurrent neurons.
ES2016-32
Learning with hard constraints as a limit case of learning with soft constraints
Giorgio Gnecco, Marco Gori, Stefano Melacci, Marcello Sanguineti
Abstract:
We refer to the framework of learning with mixed hard/soft pointwise constraints considered in Gnecco et al., IEEE TNNLS, vol. 26, pp. 2019-2032, 2015. We show that the optimal solution to the learning problem with hard bilateral and linear pointwise constraints stated therein can be obtained as the limit of the sequence of optimal solutions to the related learning problems with soft bilateral and linear pointwise constraints, when the penalty parameter tends to infinity.
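The limit described in the abstract can be illustrated on a minimal quadratic example of our own (not the paper's general framework): a minimum-norm problem with linear pointwise constraints, solved in soft (penalised) form for growing penalty parameters and compared to the hard-constrained solution.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 6))   # 3 linear pointwise constraints x_i^T w = y_i
y = rng.standard_normal(3)

def soft_solution(C):
    """Minimizer of ||w||^2 + C * ||X w - y||^2 (soft constraints)."""
    d = X.shape[1]
    return np.linalg.solve(np.eye(d) + C * X.T @ X, C * X.T @ y)

# Hard constraints: minimum-norm w with X w = y exactly.
w_hard = X.T @ np.linalg.solve(X @ X.T, y)

# The soft solutions approach the hard one as the penalty parameter grows.
gaps = [np.linalg.norm(soft_solution(C) - w_hard) for C in (1e0, 1e3, 1e6)]
```

The shrinking `gaps` list is the finite-dimensional analogue of the convergence result stated above.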
ES2016-109
Gaussian process prediction for time series of structured data
Benjamin Paassen, Christina Göpfert, Barbara Hammer
Abstract:
Time series prediction constitutes a classic topic in machine learning with wide-ranging applications, but mostly restricted to the domain of vectorial sequence entries. In recent years, time series of structured data (such as sequences, trees or graph structures) have become more and more important, for example in social network analysis or intelligent tutoring systems. In this contribution, we propose an extension of time series models to structured data based on Gaussian processes and structure kernels. We also provide speedup techniques for predictions in linear time, and we evaluate our approach on theoretical as well as real data.
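The backbone of such an approach, GP prediction driven entirely by a precomputed kernel, fits in a few lines. In this sketch an RBF on scalars stands in for a tree or graph structure kernel, and the noise level is an arbitrary choice:

```python
import numpy as np

# Stand-in "structure kernel": any PSD similarity between objects. Here an
# RBF on scalar codes, but K could come from a tree or graph kernel.
def k(a, b, gamma=0.5):
    return np.exp(-gamma * (a - b) ** 2)

train_x = np.array([0.0, 1.0, 2.0, 3.0])
train_y = np.sin(train_x)
noise = 1e-4

K = k(train_x[:, None], train_x[None, :]) + noise * np.eye(len(train_x))
alpha = np.linalg.solve(K, train_y)   # (K + sigma^2 I)^{-1} y

def gp_mean(x_star):
    """GP predictive mean: k_*^T (K + sigma^2 I)^{-1} y."""
    return k(x_star, train_x) @ alpha

pred = gp_mean(1.0)
```

Only `k` touches the raw objects, which is why replacing it with a structure kernel extends the model to non-vectorial sequence entries.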
ES2016-115
Efficient low rank approximation via alternating least squares for scalable kernel learning
Piyush Bhardwaj, Harish Karnick
Abstract:
Kernel approximation is an effective way of dealing with the scalability challenges of computing, storing and learning with the kernel matrix. In this work, we propose an $O(|\Omega|r^2)$ time algorithm for rank $r$ approximation of the kernel matrix by computing $|\Omega|$ entries. The proposed algorithm solves a non-convex optimization problem by random sampling of the entries of the kernel matrix followed by a matrix completion step using alternating least squares (ALS). Empirically, our method shows better performance than other baseline and state-of-the-art kernel approximation methods on several standard real-life datasets. Theoretically, we extend the current guarantees of ALS for kernel approximation.
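A minimal version of the sampling-plus-ALS idea, under our own simplifications (an exactly low-rank kernel matrix, and arbitrary choices of sampling rate, regulariser and sweep count; the paper's algorithm and guarantees are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 50, 4

# Ground-truth kernel: linear kernel of rank-r features, so K is exactly rank r.
F = rng.standard_normal((n, r))
K = F @ F.T

# Observe a random subset Omega of entries (here ~50% of the matrix).
mask = rng.random((n, n)) < 0.5

U = rng.standard_normal((n, r))
V = rng.standard_normal((n, r))
lam = 1e-6
for _ in range(25):
    # Fix V, solve a small ridge system per row of U on its observed entries
    # (cost per sweep is O(|Omega| r^2)), then swap roles.
    for i in range(n):
        obs = mask[i]
        Vi = V[obs]
        U[i] = np.linalg.solve(Vi.T @ Vi + lam * np.eye(r), Vi.T @ K[i, obs])
    for i in range(n):
        obs = mask[:, i]
        Ui = U[obs]
        V[i] = np.linalg.solve(Ui.T @ Ui + lam * np.eye(r), Ui.T @ K[obs, i])

rel_err = np.linalg.norm(U @ V.T - K) / np.linalg.norm(K)
```

With roughly `n*n/2` observed entries against `2*n*r` degrees of freedom, the completion is well overdetermined, which is what makes the alternating sweeps converge in this toy regime.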
Regression and mathematical models
ES2016-169
Modelling of parameterized processes via regression in the model space
Witali Aswolinskiy, Rene Felix Reinhart, Jochen Steil
Abstract:
We consider the modelling of parameterized processes, where the goal is to model the process for new combinations of parameter values. We compare the classical regression approach to a modular approach based on regression in the model space: first, a model is learned for each process parametrization; second, a mapping from process parameters to model parameters is learned. We evaluate both approaches on a real and a synthetic dataset and show the advantages of regression in the model space.
ES2016-131
Auto-adaptive Laplacian Pyramids
Ángela Fernández, Neta Rabin, Dalia Fishelov, José R. Dorronsoro
Abstract:
An important challenge in Data Mining and Machine Learning is the proper analysis of a given dataset, especially for understanding and working with functions defined over it. In this paper we propose Auto-adaptive Laplacian Pyramids (ALP) for target function smoothing when the target function is defined on a high-dimensional dataset. The proposed algorithm automatically selects the optimal function resolution (stopping time) adapted to the data and its noise. We illustrate its application on a radiation forecasting example.
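A rough sketch of the pyramid idea, under our own simplifications (1-D toy data, Gaussian smoothers with halving bandwidths, a leave-one-out style error from a diagonal-free smoother as the stopping criterion; not the exact ALP scheme):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 120))
y = np.sin(4 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

def smooth(residual, x, eps):
    """One pyramid level: normalized Gaussian-kernel smoothing of the residual."""
    W = np.exp(-((x[:, None] - x[None, :]) ** 2) / eps)
    W = W / W.sum(axis=1, keepdims=True)
    return W @ residual

def loo_smooth(residual, x, eps):
    """Same smoother with the diagonal zeroed: each point is predicted
    from the others, giving a leave-one-out style error estimate."""
    W = np.exp(-((x[:, None] - x[None, :]) ** 2) / eps)
    np.fill_diagonal(W, 0.0)
    W = W / W.sum(axis=1, keepdims=True)
    return W @ residual

# Build the pyramid with halving bandwidths; the LOO error suggests when to stop.
fit, loo_fit = np.zeros_like(y), np.zeros_like(y)
eps, loo_errors = 1.0, []
for level in range(10):
    fit = fit + smooth(y - fit, x, eps)
    loo_fit = loo_fit + loo_smooth(y - loo_fit, x, eps)
    loo_errors.append(np.mean((y - loo_fit) ** 2))
    eps /= 2.0
best_level = int(np.argmin(loo_errors))
```

The coarsest level barely fits the oscillating target, so the error estimate drops over the first refinements; the argmin plays the role of the automatically selected stopping time.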
ES2016-187
Using Robust Extreme Learning Machines to Predict Cotton Yarn Strength and Hairiness
Diego Mesquita, Antonio Araujo Neto, Jose Queiroz Neto, Joao Gomes, Leonardo Rodrigues
Abstract:
Cotton yarn is often spun from a mixture of distinct cotton bales. Although many studies have presented efforts to predict hairiness and strength from cotton properties, the heterogeneity of this mixture and its influence on these values have been neglected so far. In this work, the properties of the cotton bale mixture are modeled as random variables, and a robust variant of the Extreme Learning Machine (ELM) is proposed to address the cotton quality prediction problem. A real-world dataset collected from the textile industry was used to compare the performance of the proposed model with a traditional ELM and a linear regression model. The results showed that the proposed method outperformed the benchmark methods in terms of Average Root Mean Square Error (ARMSE).
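The ELM backbone that the robust variant starts from, a random untrained hidden layer with a closed-form least-squares readout, can be sketched as follows (toy regression data; the paper's robust variant and the cotton dataset are not reproduced):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data standing in for the cotton-property inputs.
X = rng.uniform(-1, 1, (200, 4))
y = np.sin(X.sum(axis=1)) + 0.05 * rng.standard_normal(200)

# Extreme Learning Machine: random fixed hidden layer, trained readout only.
n_hidden = 50
W = rng.standard_normal((4, n_hidden))   # input weights, never trained
b = rng.standard_normal(n_hidden)

def hidden(X):
    return np.tanh(X @ W + b)

H = hidden(X)
beta, *_ = np.linalg.lstsq(H, y, rcond=None)   # output weights in closed form

rmse = np.sqrt(np.mean((hidden(X) @ beta - y) ** 2))
```

Robust variants typically replace the plain least-squares readout with a loss or weighting that downplays outlying samples; the random hidden layer stays the same.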
ES2016-176
RSS-based Robot Localization in Critical Environments using Reservoir Computing
Mauro Dragone, Claudio Gallicchio, Roberto Guzman, Alessio Micheli
Abstract:
Supporting both accurate and reliable localization in critical environments is key to increasing the potential of logistic mobile robots. This paper presents a system for indoor robot localization based on Reservoir Computing from noisy received signal strength indicator (RSSI) data generated by a network of sensors. The proposed approach is assessed under different conditions in a real-world hospital environment. Experimental results show that the resulting system represents a good trade-off between localization performance and deployment complexity, with the ability to recover from cases in which permanent changes in the environment affect its generalization performance.
ES2016-141
Interpretability of machine learning models and representations: an introduction
Adrien Bibal, Benoît Frénay
Abstract:
Interpretability is often a major concern in machine learning. Although many authors agree with this statement, interpretability is often tackled with intuitive arguments, distinct (yet related) terms and heuristic quantifications. This short survey aims to clarify the concepts related to interpretability and emphasises the distinction between interpreting models and representations, as well as heuristic-based and user-based approaches.
ES2016-179
Bayesian mixture of spatial spline regressions
Faicel Chamroukhi
Abstract:
We introduce a Bayesian mixture of spatial spline regressions with mixed-effects (BMSSR) for density estimation and model-based clustering of spatial functional data. The model, through its Bayesian formulation, allows the integration of possible prior knowledge on the data structure and constitutes a good alternative to a recent mixture of spatial spline regressions model estimated in a maximum likelihood framework via the expectation-maximization (EM) algorithm. Bayesian inference for the model is performed by Markov Chain Monte Carlo (MCMC) sampling. We derive a Gibbs sampler to infer the model and apply it to simulated surfaces and a real problem of handwritten digit recognition using the MNIST data.
ES2016-159
Comparison of three algorithms for parametric change-point detection
Cynthia Faure, Jean-Marc Bardet, Madalina Olteanu, Jérôme Lacaille
Abstract:
Numerous sensors on SNECMA's engines capture a considerable amount of data during tests or flights. In order to detect potentially crucial changes in characteristic variables, it is relevant to develop powerful statistical algorithms. This manuscript is devoted to offline change-point detection in piecewise linear models with an unknown number of change-points. In this context, three recent algorithms are considered, implemented and applied to simulated and real data.
ES2016-96
Differentiable piecewise-Bézier interpolation on Riemannian manifolds
Pierre-Antoine Absil, Pierre-Yves Gousenbourger, Paul Striewski, Benedikt Wirth
Abstract:
Given a set of manifold-valued data points associated to a regular 2D grid, we propose a method to interpolate them by means of a C1-Bézier surface. To this end, we generalize classical Euclidean piecewise-Bézier surfaces to manifolds. We then propose an efficient algorithm to compute the control points defining the surface based on the Euclidean concept of natural C2-splines and show examples on different manifolds.
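In the Euclidean case, evaluating such Bézier patches reduces to repeated linear interpolation (de Casteljau's algorithm); on a manifold, each interpolation step would be replaced by a point on the geodesic between the pair. A minimal Euclidean sketch (toy control points, not the paper's control-point construction):

```python
import numpy as np

def de_casteljau(control, t):
    """Evaluate a Bezier curve by repeated linear interpolation. On a manifold,
    each lerp would become a point on the geodesic joining the two points."""
    pts = np.asarray(control, dtype=float)
    while len(pts) > 1:
        pts = (1 - t) * pts[:-1] + t * pts[1:]
    return pts[0]

def bezier_surface(grid, u, v):
    """Tensor-product patch: de Casteljau along rows, then along the results."""
    rows = np.array([de_casteljau(row, u) for row in grid])
    return de_casteljau(rows, v)

# A cubic row of control points: endpoints are interpolated exactly.
ctrl = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, -1.0], [3.0, 0.0]])
p_start = de_casteljau(ctrl, 0.0)
p_mid = de_casteljau(ctrl, 0.5)
p_end = de_casteljau(ctrl, 1.0)

# A bilinear patch on a 2x2 control grid, evaluated at its centre.
grid = np.array([[[0.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [1.0, 1.0]]])
centre = bezier_surface(grid, 0.5, 0.5)
```

Because each step is a plain two-point interpolation, the whole construction transfers to manifolds once a geodesic "lerp" is available, which is the generalization route the abstract describes.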
ES2016-164
Extending a two-variable mean to a multi-variable mean
Estelle Massart, Julien Hendrickx, Pierre-Antoine Absil
Abstract:
We consider the problem of extending any two-variable mean M(·,·) to a multi-variable mean, using no other tool than M(·,·) itself. Pálfia proposed an iterative procedure that consists in evaluating successively two-variable means according to a cyclic pattern. We propose here a variant of his procedure to improve the convergence speed. Our approach consists in re-ordering the iterates after each iteration in order to speed up the transfer of information between successive iterates.
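The cyclic idea can be illustrated on scalars with the geometric mean as M(·,·). This is only a sketch of an iterative cyclic scheme, not Pálfia's exact procedure nor the reordered variant proposed in the paper:

```python
import numpy as np

def geometric(a, b):
    return np.sqrt(a * b)   # the given two-variable mean M(.,.)

def multi_mean(values, M, tol=1e-12, max_iter=1000):
    """Cyclic extension sketch: sweep through the tuple, replacing each entry
    by M of itself and its cyclic successor, until all entries agree."""
    x = np.array(values, dtype=float)
    for _ in range(max_iter):
        for i in range(len(x)):
            x[i] = M(x[i], x[(i + 1) % len(x)])
        if x.max() - x.min() < tol:
            break
    return x.mean()

m = multi_mean([1.0, 4.0, 16.0], geometric)
```

Each sweep strictly shrinks the spread of the tuple (every mean lies between its arguments), so the iterates collapse to a common value; the re-ordering proposed in the paper aims precisely at speeding up this collapse.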
ES2016-5
Neuro-percolation as a superposition of random-walks
Gaetano Aiello
Abstract:
Axons of pioneer neurons are actively directed towards their targets by signaling molecules. The result is a highly stereotyped axonal trajectory. The tip of the axon appears to proceed erratically, which has favored models of axon guidance as random-walk processes. In reality, axon guidance is basically a deterministic process, although largely unknown. Random-walk models assume noise as a representation of what is actually unknown. Wadsworth's work on guidance gives an experimental account of axonal bending as induced by the addition or subtraction of specific guidance agents. The axonal trajectory, however, is not a simple random-walk but a series of Wiener-Lévy stochastic processes.
Indefinite proximity learning
ES2016-22
Learning in indefinite proximity spaces - recent trends
Frank-Michael Schleif, Peter Tino, Yingyu Liang
Abstract:
Efficient learning of a data analysis task strongly depends on the data representation. Many methods rely on symmetric similarity or dissimilarity representations by means of metric inner products or distances, providing easy access to powerful mathematical formalisms like kernel approaches. Similarities and dissimilarities are, however, often naturally obtained by non-metric proximity measures which cannot easily be handled by classical learning algorithms. In recent years, major efforts have been undertaken to provide approaches which can either be used directly for such data or make standard methods available for these types of data. We provide an overview of recent achievements in the field of learning with indefinite proximities.
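Two standard ways to make an indefinite similarity matrix usable by classical kernel methods are eigen-spectrum corrections that clip or flip its negative eigenvalues. A small sketch (the example similarity matrix is made up):

```python
import numpy as np

def make_psd(S, mode="clip"):
    """Eigen-spectrum correction of an indefinite symmetric similarity matrix:
    'clip' zeroes negative eigenvalues, 'flip' takes their absolute value."""
    w, V = np.linalg.eigh((S + S.T) / 2.0)
    if mode == "clip":
        w = np.maximum(w, 0.0)
    elif mode == "flip":
        w = np.abs(w)
    return V @ np.diag(w) @ V.T

# A symmetric but indefinite similarity matrix (one negative eigenvalue).
S = np.array([[1.0, 0.9, 0.2],
              [0.9, 1.0, 0.9],
              [0.2, 0.9, 1.0]])
K = make_psd(S, "clip")
min_eig_before = np.linalg.eigvalsh(S).min()
min_eig_after = np.linalg.eigvalsh(K).min()
```

Such corrections alter the data representation, which is exactly the information-loss question several contributions in this session investigate; the alternative is to use algorithms that accept indefinite proximities directly.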
ES2016-186
Discriminative dimensionality reduction in kernel space
Alexander Schulz, Barbara Hammer
Abstract:
Modern nonlinear dimensionality reduction (\DR) techniques enable an efficient visual data inspection in the form of scatter plots, but they suffer from the fact that \DR\ is inherently ill-posed. Discriminative dimensionality reduction (\didi) offers one remedy, since it allows a practitioner to identify what is relevant and what should be regarded as noise by means of auxiliary information such as class labels. Powerful \didi\ methods exist, but they are restricted to vectorial data only. In this contribution, we extend one particularly promising approach to non-vectorial data characterised by a kernel. This enables us to apply discriminative dimensionality reduction to complex, possibly discrete or structured data.
ES2016-14
Study on the loss of information caused by the "positivation" of graph kernels for 3D shapes
Gaelle Loosli
Abstract:
In the presented experimental study, we compare the classification power of two variations of the same graph kernel. One variation is designed to produce positive semi-definite kernel matrices (Kmatching) and is an approximation of the other one, which is indefinite (Kmax). We show that, using tools adapted to deal with indefiniteness (KSVM), the original indefinite kernel outperforms its positive definite approximate version. We also propose a slight improvement of the KSVM method, which produces non-sparse solutions, by adding a fast post-processing step that yields a sparser solution.
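As an illustrative aside, the "positivation" of an indefinite kernel is often performed by eigenvalue correction. The following is a minimal sketch of spectrum clipping, which replaces negative eigenvalues with zero; this is a generic correction technique, not the paper's Kmatching construction:

```python
import numpy as np

def clip_spectrum(S):
    """Return the nearest positive semi-definite matrix by zeroing negative eigenvalues."""
    S = (S + S.T) / 2.0                    # symmetrize first
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.clip(w, 0.0, None)) @ V.T

# A toy indefinite similarity matrix (its determinant is negative,
# so it has a negative eigenvalue).
S = np.array([[1.0, 0.9, 0.2],
              [0.9, 1.0, -0.8],
              [0.2, -0.8, 1.0]])
K = clip_spectrum(S)                       # K is now a valid kernel matrix
```

An alternative correction flips the sign of negative eigenvalues instead of discarding them; methods such as KSVM instead work with the indefinite kernel directly.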
ES2016-100
Adaptive dissimilarity weighting for prototype-based classification optimizing mixtures of dissimilarities
Marika Kaden, David Nebel, Thomas Villmann
Abstract:
In this paper we propose an adaptive bilinear mixing of dissimilarities for better classification learning. In particular, we focus on prototype based learning like learning vector quantization. In this sense the learning of the mixture can be seen as a kind of dissimilarity learning as counterpart to dissimilarity selection in advance. We demonstrate this approach working for relational as well as median variants of prototype learning for proximity data.
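To illustrate the general idea of mixing dissimilarities (a loose analogy only, not the authors' LVQ-based formulation), the sketch below forms a convex combination D(a) = a*D1 + (1-a)*D2 of two toy dissimilarity matrices and selects the weight by minimizing leave-one-out 1-NN error; all data and names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: two classes, 20 points each.
X = rng.normal(size=(40, 2)) + np.repeat([[0.0, 0.0], [3.0, 3.0]], 20, axis=0)
y = np.repeat([0, 1], 20)

def pdist(Z):
    """All-pairs Euclidean dissimilarity matrix."""
    return np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)

D1 = pdist(X)                              # informative dissimilarity
D2 = pdist(rng.normal(size=(40, 2)))       # uninformative (pure noise)

def loo_1nn_error(D, y):
    """Leave-one-out 1-NN error for a dissimilarity matrix."""
    D = D + np.eye(len(y)) * 1e9           # exclude self-matches
    return float(np.mean(y[np.argmin(D, axis=1)] != y))

# Learn the mixture weight on the simplex by a simple grid search.
alphas = np.linspace(0.0, 1.0, 21)
errs = [loo_1nn_error(a * D1 + (1 - a) * D2, y) for a in alphas]
best = float(alphas[int(np.argmin(errs))])
```

The paper learns such weights by gradient-based optimization inside the prototype-learning cost rather than by grid search.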
Deep learning, text, image and signal processing
ES2016-87
Deep multi-task learning with evolving weights
Soufiane Belharbi, Romain HERAULT, Clément Chatelain, Sébastien Adam
Abstract:
Pre-training of deep neural networks has been largely abandoned in the last few years. The main reason is the difficulty of controlling over-fitting and of tuning the consequently increased number of hyper-parameters. In this paper we use a multi-task learning framework that gathers weighted supervised and unsupervised tasks. We propose to evolve the weights along the learning epochs in order to avoid the break in the sequential transfer learning used in the pre-training scheme. This framework allows the use of unlabeled data. Extensive experiments on MNIST showed interesting results.
ES2016-98
Deep neural network analysis of go games: which stones motivate a move?
Thomas Burwick, Luke Ewig
Abstract:
Recently, deep learning was used to construct deep convolution network models for move prediction in Go. Here, we develop methods to analyze the inner workings of the resulting deep architectures. Our example network is learned and tested using a database of over 83,000 expert games with over 17 million moves. We present ways of visualizing the learned features (“shapes”) and a method to derive aspects of the motivation behind the expert’s moves. The discussed methods are inspired by recent progress made in constructing saliency maps for image classification.
ES2016-105
Feature binding in deep convolution networks with recurrences, oscillations, and top-down modulated dynamics
Martin Mundt, Sebastian Blaes, Thomas Burwick
Abstract:
Deep convolution networks are extended with an oscillatory phase dynamics and recurrent couplings that are based on convolution and deconvolution. Moreover, top-down modulation is included that enforces the dynamical selection and grouping of features of the recognized object into assemblies based on temporal coherence. With respect to image processing, it is demonstrated how the combination of these mechanisms allows for the segmentation of the parts of an object that are relevant for its classification.
ES2016-170
Maximum likelihood learning of RBMs with Gaussian visible units on the Stiefel manifold
Ryo Karakida, Masato Okada, Shun-ichi Amari
Abstract:
The restricted Boltzmann machine (RBM) is a generative model widely used as an essential component of deep networks. However, it is hard to train RBMs by using maximum likelihood (ML) learning because many iterations of Gibbs sampling take too much computational time. In this study, we reveal that, if we consider RBMs with Gaussian visible units and constrain the weight matrix to the Stiefel manifold, we can easily compute analytical values of the likelihood and its gradients. The proposed algorithm on the Stiefel manifold achieves comparable performance to the standard learning algorithm.
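A common way to enforce a Stiefel-manifold constraint, which the sketch below illustrates under the assumption of a simple SVD-based retraction (not necessarily the authors' update rule), is to project the weight matrix onto the nearest matrix with orthonormal columns:

```python
import numpy as np

def project_stiefel(W):
    """Project W (n x p, n >= p) onto the Stiefel manifold {W : W^T W = I_p}
    via the SVD: the closest matrix with orthonormal columns is U V^T."""
    U, _, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ Vt

rng = np.random.default_rng(1)
W = rng.normal(size=(6, 3))
Q = project_stiefel(W)                     # Q^T Q = I_3 up to rounding
```

In a training loop one would apply such a retraction after each gradient step to keep the weights on the manifold.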
ES2016-161
Semi-Supervised Classification of Social Textual Data Using WiSARD
Fabio Rangel, Fabrício de Faria, Priscila Lima, Jonice Oliveira
Abstract:
Text categorization is a problem which can be addressed by a semi-supervised learning classifier, since the annotation process is costly and laborious. The semi-supervised approach is also adequate in the context of social network text categorization, due to its ability to adapt to changes in class distribution. This article presents a novel approach for semi-supervised learning based on the WiSARD classifier (SSW), and compares it to other established methods (S3VM and NB-EM) over three different datasets. The novel approach proved to be up to fifty times faster than S3VM and NB-EM, with competitive accuracy.
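For readers unfamiliar with WiSARD, the following is a minimal sketch of its RAM-based discriminators (binary inputs split into tuples that address per-class memories); it is an illustration only, not the paper's SSW semi-supervised variant:

```python
import random

class WiSARD:
    """Minimal RAM-based WiSARD discriminator (illustration only)."""
    def __init__(self, n_bits, tuple_size, seed=0):
        rng = random.Random(seed)
        order = list(range(n_bits))
        rng.shuffle(order)                  # random input-to-RAM mapping
        self.tuples = [order[i:i + tuple_size]
                       for i in range(0, n_bits, tuple_size)]
        self.rams = {}                      # class -> one set of addresses per RAM

    def _addresses(self, x):
        return [tuple(x[i] for i in t) for t in self.tuples]

    def fit(self, X, y):
        for xi, yi in zip(X, y):
            rams = self.rams.setdefault(yi, [set() for _ in self.tuples])
            for ram, addr in zip(rams, self._addresses(xi)):
                ram.add(addr)               # "write 1" at this address

    def predict(self, x):
        scores = {c: sum(a in ram for ram, a in zip(rams, self._addresses(x)))
                  for c, rams in self.rams.items()}
        return max(scores, key=scores.get)

# Toy binary patterns: class 0 is mostly zeros, class 1 mostly ones.
w = WiSARD(n_bits=12, tuple_size=3)
w.fit([[0] * 12, [0] * 11 + [1], [1] * 12, [1] * 11 + [0]], [0, 0, 1, 1])
```

A semi-supervised extension could, for example, pseudo-label unlabeled patterns with the current discriminators and write them back into the RAMs; the details of SSW differ.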
ES2016-162
On the equivalence between algorithms for Non-negative Matrix Factorization and Latent Dirichlet Allocation
Thiago de Paulo Faleiros, Alneu Lopes
Abstract:
LDA (Latent Dirichlet Allocation) and NMF (Non-negative Matrix Factorization) are two popular techniques for extracting topics from a textual document corpus. This paper shows that NMF with the Kullback-Leibler divergence approximates the LDA model under a uniform Dirichlet prior; this comparative analysis can therefore be useful to elucidate the implementation of the variational inference algorithm for LDA.
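The NMF side of this correspondence uses the generalized Kullback-Leibler divergence; a minimal sketch of the classical multiplicative updates on a toy term-document matrix (a standard algorithm, not the paper's derivation) looks like this:

```python
import numpy as np

def nmf_kl(V, k, iters=200, seed=0):
    """Multiplicative updates minimizing the generalized KL divergence D(V || WH)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + 1e-3
    H = rng.random((k, m)) + 1e-3
    for _ in range(iters):
        WH = W @ H + 1e-12
        H *= (W.T @ (V / WH)) / W.sum(axis=0, keepdims=True).T
        WH = W @ H + 1e-12
        W *= ((V / WH) @ H.T) / H.sum(axis=1, keepdims=True).T
    return W, H

# Toy term-document matrix with two visible "topics".
V = np.array([[4, 3, 0, 0],
              [5, 4, 0, 1],
              [0, 0, 3, 4],
              [1, 0, 4, 5]], dtype=float)
W, H = nmf_kl(V, k=2)                      # W: term-topic, H: topic-document
```

Normalizing the columns of W and rows of H gives the probability-like factors that the LDA comparison rests on.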
ES2016-121
Word Embeddings for Morphologically Rich Languages
Pyry Takala
Abstract:
Word-embedding models commonly treat words as unique symbols, for which a lower-dimensional embedding can be looked up. These representations generalize poorly for morphologically rich languages, as vectors for all possible inflections cannot be stored, and words with the same stem do not share a similar representation. We study alternative representations for words, including one subword model and two character-based models. Our methods outperform classical word embeddings for a morphologically rich language, Finnish, on tasks requiring sophisticated understanding of grammar and context. Our embeddings are easier to implement than previously proposed methods, and can be used to form word representations for any common language processing task.
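As a hedged illustration of the subword idea (in the spirit of character n-gram models, not necessarily the paper's exact architecture), a word vector can be built as the mean of hashed character trigram vectors, so inflected forms with a shared stem share components:

```python
import zlib
import numpy as np

DIM, BUCKETS = 16, 1000
rng = np.random.default_rng(0)
ngram_table = rng.normal(size=(BUCKETS, DIM))   # hashed n-gram embedding table

def char_ngrams(word, n=3):
    """Character n-grams of a word padded with boundary markers."""
    w = f"<{word}>"
    return [w[i:i + n] for i in range(len(w) - n + 1)]

def embed(word):
    """Word vector = mean of its (hashed) character trigram vectors."""
    idx = [zlib.crc32(g.encode()) % BUCKETS for g in char_ngrams(word)]
    return ngram_table[idx].mean(axis=0)

# "talossa" and "talon" (inflections of Finnish "talo", house) share the
# stem trigrams, so their vectors share components; "kirja" (book) shares none.
v_a, v_b = embed("talossa"), embed("talon")
```

Because the vector is composed from subwords, a representation exists even for inflections never seen during training.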
ES2016-79
Localized discriminative Gaussian process latent variable model for text-dependent speaker verification
Nooshin Maghsoodi, Hossein Sameti, Hossein Zeinali
Abstract:
The duration of utterances is one of the factors that most affects the performance of speaker verification systems. Text-dependent speaker verification suffers from both short durations and unmatched content between enrollment and test segments. In this paper, we use the Discriminative Gaussian Process Latent Variable Model (DGPLVM) to deal with the uncertainty caused by short durations. This is the first attempt to utilize Gaussian Processes for speaker verification. Also, to manage the unmatched content between enrollment and test segments, we propose the localized-DGPLVM, which trains a DGPLVM for each phrase in the dataset. Experiments show a relative improvement of 27.4% in EER on RSR2015.
ES2016-154
Multi-task learning for speech recognition: an overview
Gueorgui Pironkov, Stéphane Dupont, Thierry Dutoit
Abstract:
Generalization is a common issue for automatic speech recognition. A successful method used to improve recognition results consists of training a single system to solve multiple related tasks in parallel. This overview investigates which auxiliary tasks are helpful for speech recognition when multi-task learning is applied on a deep learning based acoustic model. The impact of multi-task learning on speech recognition related tasks, such as speaker adaptation, or robustness to noise, is also examined.
ES2016-62
Bayesian semi non-negative matrix factorisation
Albert Vilamala, Alfredo Vellido, Lluís A. Belanche
Abstract:
Non-negative Matrix Factorisation (NMF) has become a standard method for source identification when data, sources and mixing coefficients are constrained to be positive-valued. The method has recently been extended to allow for negative-valued data and sources in the form of Semi- and Convex-NMF. In this paper, we re-elaborate Semi-NMF within a full Bayesian framework. This provides solid foundations for parameter estimation and, importantly, a principled method to address the problem of choosing the most adequate number of sources to describe the observed data. The proposed Bayesian Semi-NMF is preliminarily evaluated here in a real neuro-oncology problem.
ES2016-81
An Immune-Inspired, Dependence-Based Approach to Blind Inversion of Wiener Systems
Stephanie Alvarez Fernandez, Romis Attux, Denis G. Fantinato, Jugurta Montalvão, Daniel Silva
Abstract:
In this work, we present a comparative analysis of two methods --- based on the autocorrelation and autocorrentropy functions --- for representing the time structure of a given signal in the context of the unsupervised inversion of Wiener systems by Hammerstein systems. Linear stages with and without feedback are considered and an immune-inspired algorithm is used to allow parameter optimization without the need for manipulating the cost function, and also with a significant probability of global convergence. The results indicate that both functions provide effective means for system inversion and also illustrate the effect of linear feedback on the overall system performance.
ES2016-122
A new penalisation term for image retrieval in clique neural networks
Romain Huet, Nicolas Courty, Sébastien Lefèvre
Abstract:
Neural networks that are able to store and retrieve information constitute an old but still active area of research. Among the different existing architectures, recurrent networks that combine associative memory with error-correcting properties based on cliques have recently shown good performance in storing arbitrary random messages. However, they fail to scale up to high-dimensional data such as images, mostly because the distribution of activated neurons is not uniform in the network. We propose in this paper a new penalization term that alleviates this problem, and show its efficiency on a partially erased image reconstruction problem.
ES2016-139
Gesture Recognition with a Convolutional Long Short-Term Memory Recurrent Neural Network
Eleni Tsironi, Pablo Barros, Stefan Wermter
Abstract:
Inspired by the adequacy of convolutional neural networks in implicit extraction of visual features and the efficiency of Long Short-Term Memory Recurrent Neural Networks in dealing with long-range temporal dependencies, we propose a Convolutional Long Short-Term Memory Recurrent Neural Network (CNNLSTM) for the problem of dynamic gesture recognition. The model is able to successfully learn gestures varying in duration and complexity and proves to be a significant base for further development. Finally, the new gesture command TsironiGR-dataset for human-robot interaction is presented for the evaluation of CNNLSTM.
Machine learning for medical applications
ES2016-17
Machine learning for medical applications
Veronica Bolon-Canedo, Beatriz Remeseiro, Amparo Alonso-Betanzos, Aurélio Campilho
Abstract:
Machine learning has been well applied and recognized as an effective tool to handle a wide range of real situations, including medical applications. In this scenario, it can help to alleviate problems typically suffered by researchers in this field, such as saving time for practitioners and providing unbiased results. This tutorial is concerned with the use of machine learning techniques to solve different medical problems. We provide a survey of recent methods developed or applied to this context, together with a review of novel contributions to the ESANN 2016 special session on Machine learning for medical applications.
ES2016-138
Automatic detection of EEG arousals
Isaac Fernández-Varela, Elena Hernández-Pereira, Diego Álvarez-Estévez, Vicente Moret-Bonillo
Abstract:
Fragmented sleep is commonly caused by arousals that can be detected through observation of electroencephalographic (EEG) signals. As this is a time-consuming task, automated processing is required. A method for arousal detection using signal processing and machine learning models is presented. During the signal processing phase, relevant events are identified in the EEG and electromyography signals. After discarding those events that do not meet the required characteristics, the resulting set is used to extract multiple parameters. Several machine learning models (Fisher's Linear Discriminant, Artificial Neural Networks and Support Vector Machines) are fed with these parameters. The final proposed model, a combination of the individual models, was used to conduct experiments on 26 patients, reporting a sensitivity of 0.72 and a specificity of 0.89, while achieving an error of 0.13 in arousal event detection.
ES2016-8
Stacked denoising autoencoders for the automatic recognition of microglial cells’ state
Sofia Fernandes, Ricardo Sousa, Renato Socodato, Luis Silva
Abstract:
We present the first study on the automatic recognition of microglial cells' state using stacked denoising autoencoders. Microglia have a pivotal role as sentinels of neuronal disease, where their state (resting, transition or active) is indicative of what is occurring in the Central Nervous System. In this work we explore different strategies to best train a stacked denoising autoencoder for that purpose and show that the transition state is the hardest to recognize, while an accuracy of approximately 64% is obtained with a dataset of 45 images.
ES2016-82
A machine learning pipeline for supporting differentiation of glioblastomas from single brain metastases
Victor Mocioiu, Nuno Miguel Pedrosa de Barros, Sandra Ortega-Martorell, Johannes Slotboom, Urspeter Knecht, Carles Arús, Alfredo Vellido, Margarida Julià-Sapé
Abstract:
Machine learning has provided, over the last decades, tools for knowledge extraction in complex medical domains. Most of these tools, though, are ad hoc solutions and lack the systematic approach that would be required to become mainstream in medical practice. In this brief paper, we define a machine learning-based analysis pipeline for a difficult problem in the field of neuro-oncology, namely the discrimination of brain glioblastomas from single brain metastases. This pipeline involves source extraction using k-Means-initialized Convex Non-negative Matrix Factorization and a collection of classifiers, including Logistic Regression, Linear Discriminant Analysis, AdaBoost, and Random Forests.
ES2016-142
Feature definition, analysis and selection for lung nodule classification in chest computerized tomography images
Luis Gonçalves, Jorge Novo, Aurélio Campilho
Abstract:
This work presents the results of the characterization of lung nodules in chest Computerized Tomography for benign/malignant classification. A set of image features was used in the Computer-aided Diagnosis system to distinguish benign from malignant nodules and, therefore, diagnose lung cancer. A filter-based feature selection approach was used to define an optimal subset with higher accuracy. A large and heterogeneous set of 293 features was defined, including shape, intensity and texture features. We used different KNN and SVM classifiers to evaluate the feature subsets. The results were tested on a dataset annotated by radiologists. Promising results were obtained with an area under the Receiver Operating Characteristic curve (AUC value) of 96.2 ± 0.5 using SVM.
ES2016-39
Bag-of-Steps: predicting lower-limb fracture rehabilitation length
Albert Pla, Beatriz López, Cristofor Nogueira, Natalia Mordvaniuk, Taco J. Blokhuis, Herman R Holtslag
Abstract:
This paper presents bag-of-steps, a new methodology to predict a patient's rehabilitation length by monitoring the weight borne on the injured leg and using a predictive model based on the bag-of-words technique. A force sensor is used to monitor and characterize the patient's gait, obtaining a set of step descriptors. These are later used to define a vocabulary of steps that can be used to describe rehabilitation sessions. Sessions are finally fed to a support vector machine classifier that performs the final rehabilitation estimate.
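The vocabulary-plus-histogram pipeline can be sketched in a few lines; the step descriptors, codebook size and 2-D features below are all hypothetical, and a plain k-means stands in for whatever clustering the authors use:

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=20):
    """Plain k-means to build the step vocabulary (codebook)."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def session_histogram(steps, centers):
    """Encode one session as a normalized bag-of-steps histogram."""
    labels = np.argmin(((steps[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    hist = np.bincount(labels, minlength=len(centers)).astype(float)
    return hist / hist.sum()

# Hypothetical 2-D step descriptors (e.g. peak force, stance duration).
steps_all = rng.normal(size=(200, 2))
codebook = kmeans(steps_all, k=8)
session = session_histogram(steps_all[:30], codebook)
```

The resulting fixed-length histograms would then serve as inputs to the SVM classifier.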
ES2016-152
Initializing nonnegative matrix factorization using the successive projection algorithm for multi-parametric medical image segmentation
Nicolas Sauwen, Marjan Acou, Halandur Nagaraja Bharath, Diana Sima, Jelle Veraart, Frederik Maes, Uwe Himmelreich, Eric Achten, Sabine Van Huffel
Abstract:
As nonnegative matrix factorization (NMF) represents a non-convex problem, the quality of its solution will depend on the initialization of the factor matrices. This study proposes the Successive Projection Algorithm (SPA) as a feasible NMF initialization method. SPA is applied to a multi-parametric MRI dataset for automated NMF brain tumor segmentation. SPA provides fast and reproducible estimates of the tissue sources, and segmentation quality is found to be similar compared to repetitive random initialization.
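SPA itself is a short greedy procedure: repeatedly pick the column of largest residual norm and project it out. A minimal sketch (not the authors' exact implementation):

```python
import numpy as np

def spa(X, r):
    """Successive Projection Algorithm: greedily select r columns of X
    with maximal residual norm, projecting out each choice. The chosen
    columns serve as initial source estimates for NMF."""
    R = np.array(X, dtype=float)
    selected = []
    for _ in range(r):
        j = int(np.argmax((R ** 2).sum(axis=0)))  # column of largest norm
        selected.append(j)
        u = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(u, u @ R)                   # remove the chosen direction
    return selected
```

Because the procedure is deterministic, it avoids the run-to-run variability of random initialization mentioned in the abstract.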
ES2016-49
On the analysis of feature selection techniques in a conjunctival hyperemia grading framework
Maria Luisa Sanchez Brea, Noelia Barreira Rodríguez, Noelia Sánchez Maroño, Antonio Mosquera González, Carlos García Resúa, Eva Yebra-Pimentel Vilar
Abstract:
Hyperemia is a parameter that describes the degree of redness in a tissue. When it affects the bulbar conjunctiva, it can serve as an early indicator for pathologies such as dry eye syndrome. Hyperemia is measured using scales, which are collections of images that show different severity levels. Features computed from the images can be used to develop an automatic grading system with the help of machine learning algorithms. In this work, we present a methodology that analyses the influence of each feature when determining the hyperemia level.
ES2016-133
Using a feature selection ensemble on DNA microarray datasets
Borja Seijo-Pardo, Veronica Bolon-Canedo, Amparo Alonso-Betanzos
Abstract:
DNA microarray data pose a difficult challenge for researchers due to the high number of gene expression levels measured and the small sample size. Therefore, feature selection has become an indispensable preprocessing step. In this paper we propose an ensemble for feature selection based on combining rankings of features. The individual rankings are combined with different aggregation methods, and a practical subset of features is selected according to a data complexity measure (the inverse of the Fisher discriminant ratio). The proposed ensemble, tested on seven different DNA microarray datasets using a Support Vector Machine as classifier, was able to obtain the best results in different scenarios.
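The rank-combination idea can be sketched generically; the aggregation function here (mean rank) is just one of the options the paper compares:

```python
import numpy as np

def aggregate_rankings(rankings, agg=np.mean):
    """Combine several feature rankings into one ordering.
    Each ranking assigns a rank to every feature (position = feature
    index, value = rank, lower is better); `agg` merges the ranks
    per feature and the result lists feature indices, best first."""
    scores = agg(np.asarray(rankings, dtype=float), axis=0)
    return list(np.argsort(scores))
```

Swapping `agg` for `np.min` or `np.median` gives other standard aggregation rules; the selected prefix of the combined ordering would then be cut off by the data complexity criterion.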
ES2016-67
A fast learning algorithm for high dimensional problems: an application to microarrays
Oscar Fontenla-Romero, Bertha Guijarro-Berdiñas, Beatriz Pérez-Sánchez, Diego Rego-Fernández, David Martínez-Rego
Abstract:
In this work, a new learning method for one-layer neural networks based on a singular value decomposition is presented. The optimal parameters of the model can be obtained by means of a system of linear equations whose complexity depends on the number of samples. This approach provides a fast learning algorithm for high-dimensional problems where the number of inputs is larger than the number of data points. This kind of situation appears, for example, in DNA microarray scenarios. An experimental study over eleven microarray datasets shows that the proposed method is able to outperform other representative classifiers in terms of CPU time, without significant loss of accuracy.
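A pseudoinverse-based sketch of the idea (not the authors' exact formulation): the SVD underlying `pinv` operates on the smaller of the two dimensions, which is what makes the samples-much-smaller-than-inputs regime cheap.

```python
import numpy as np

def fit_linear_layer(X, Y):
    """Closed-form weights for a one-layer linear network via the
    SVD-based pseudoinverse; cost is driven by min(samples, inputs)."""
    return np.linalg.pinv(X) @ Y
```

Predictions are then simply `X @ W` for the fitted weights `W`.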
ES2016-134
Data complexity measures for analyzing the effect of SMOTE over microarrays
Laura Morán-Fernández, Veronica Bolon-Canedo, Amparo Alonso-Betanzos
Abstract:
Microarray classification is a challenging issue for machine learning researchers, mainly due to the mismatch between gene dimension and sample size. Besides, this type of data has other properties that can complicate the classification task, such as class imbalance. A common approach to deal with imbalanced datasets is a preprocessing step that tries to cope with this imbalance. In this work we analyze the usefulness of data complexity measures to evaluate the behavior of the SMOTE algorithm before and after applying gene selection.
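For reference, the core of SMOTE fits in a few lines: each synthetic minority sample is a random interpolation between a minority point and one of its nearest minority neighbours. This is a minimal sketch, not the exact variant evaluated in the paper:

```python
import numpy as np

def smote(X_min, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples by interpolating
    each chosen minority point with one of its k nearest minority
    neighbours (minimal SMOTE sketch)."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]   # skip the point itself
        j = rng.choice(neighbours)
        lam = rng.random()                    # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synthetic)
```

Since every synthetic point lies on a segment between two real minority samples, oversampling can change data complexity measures without introducing points outside the minority region, which is what the paper's analysis examines.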
ES2016-60
Multi-step strategy for mortality assessment in cardiovascular risk patients with imbalanced data
Fernando Mateo, Emilio Soria-Olivas, Marcelino Martínez-Sober, Maria Tellez-Plaza, Juan Gómez-Sanchis, Josep Redon
Abstract:
The assessment of mortality in patients with cardiovascular disease (CVD) risk factors is typically a challenging task, given the large number of collected variables and the imbalance between classes. This is the case of the ESCARVAL-RISK dataset, a large cardiovascular follow-up record spanning 4 years. This study intends to give insight into: a) the performance of variable selection methods, b) the best class balancing method, and c) the choice of an adequate classifier to predict mortality. We conclude that ADASYN combined with SVM classifiers, with and without AUC score-based feature selection, and RUSBoost combined with boosted tree ensembles are the most suitable methodologies among those tested.
ES2016-145
Spatiotemporal ICA improves the selection of differentially expressed genes
Emilie Renard, Andrew E. Teschendorff, Pierre-Antoine Absil
Abstract:
Selecting differentially expressed genes with respect to some phenotype of interest is a difficult task, especially in the presence of confounding factors. We propose to use a spatiotemporal independent component analysis to model those factors, and to combine information from different spatiotemporal parameter values to improve the set of selected genes. We show on real datasets that the proposed method significantly increases the proportion of genes related to the phenotype of interest in the final selection.
ES2016-167
Unsupervised Cross-Subject BCI Learning and Classification using Riemannian Geometry
Samaneh Nasiri Ghosheh Bolagh, Mohammad Bagher SHAMSOLLAHI, Christian Jutten, Marco Congedo
Abstract:
Inter-subject variability poses a challenge in cross-subject Brain-Computer Interface learning and classification: not all available subjects may improve the performance on a test subject. To address this problem we propose a subject selection algorithm and investigate its use within the Riemannian geometry classification framework. We demonstrate that this new approach can significantly improve cross-subject learning without the need for any labeled data from test subjects.
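The Riemannian framework referred to here classifies EEG trials via their covariance matrices, compared with the affine-invariant Riemannian metric on symmetric positive-definite matrices. A minimal sketch of that distance:

```python
import numpy as np

def riemann_distance(A, B):
    """Affine-invariant Riemannian distance between SPD covariance
    matrices: the Euclidean norm of the log generalized eigenvalues
    of the pencil (A, B)."""
    w = np.real(np.linalg.eigvals(np.linalg.solve(A, B)))
    return float(np.sqrt((np.log(w) ** 2).sum()))
```

A standard classifier in this framework (e.g. minimum distance to Riemannian mean) assigns a trial to the class whose mean covariance is closest under this metric.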
ES2016-11
Assessment of diabetic retinopathy risk with random forests
Silvia Sanromà, Antonio Moreno, Aida Valls, Pedro Romero, Sofia De La Riva, Ramon Sagarra
Abstract:
Diabetic retinopathy is one of the most common morbidities associated with diabetes. Its appropriate control requires the implementation of expensive screening programs. This paper reports the use of Random Forests to build a classifier which may determine, with sensitivity and specificity levels over 80%, whether a diabetic person is likely to develop retinopathy. The use of this model in a decision support tool may help doctors to determine the best screening periodicity for each person, so that appropriate care is provided and human, material and economic resources are employed more efficiently.
Physics and Machine Learning: Emerging Paradigms
ES2016-20
Physics and Machine Learning: Emerging Paradigms
José D. Martín-Guerrero, Paulo J. G. Lisboa, Alfredo Vellido
ES2016-12
Controlling adaptive quantum-phase estimation with scalable reinforcement learning
Pantita Palittapongarnpim, Peter Wittek, Barry C. Sanders
Abstract:
We develop a reinforcement learning algorithm to construct a feedback policy that delivers quantum-enhanced interferometric phase estimation up to 100 photons in a noisy environment. We ensure scalability of the calculations by distributing the workload in a cluster and by vectorizing time-critical operations. We also improve running time by introducing accept-reject criteria to terminate calculation when a successful result is reached. Furthermore, we make the learning algorithm robust to noise by fine-tuning how the objective function is evaluated. The results show the importance and relevance of well-designed classical machine learning algorithms in quantum physics problems.
ES2016-181
How machine learning won the Higgs boson challenge
Claire Adam-Bourdarios, Glen Cowan, Cecile Germain, Isabelle Guyon, Balazs Kegl, David Rousseau
Abstract:
In 2014 we ran a very successful machine learning challenge in High Energy Physics attracting 1785 teams, which exposed the machine learning community for the first time to the problem of "learning to discover" (www.kaggle.com/c/higgs-boson). While physicists had the opportunity to improve on the state of the art using "feature engineering" based on physics principles, this was not the determining factor in winning the challenge. Rather, the challenge revealed that the central difficulty of the problem is to develop a strategy to directly optimize the Approximate Median Significance (AMS) objective function, which is a particularly challenging and novel problem. This objective function aims at increasing the power of a statistical test. The top-ranking learning machines span a variety of techniques, including deep learning and gradient tree boosting. This paper presents the problem setting and analyzes the results.
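For reference, the AMS objective used in the challenge scores a selection region by its expected signal count s and background count b, with a regularization term b_reg (set to 10 in the challenge):

```python
import numpy as np

def ams(s, b, b_reg=10.0):
    """Approximate Median Significance from the HiggsML challenge:
    AMS = sqrt(2 * ((s + b + b_reg) * ln(1 + s / (b + b_reg)) - s))."""
    return np.sqrt(2.0 * ((s + b + b_reg) * np.log1p(s / (b + b_reg)) - s))
```

For s much smaller than b, this behaves approximately like the familiar significance s / sqrt(b + b_reg), which is why optimizing it directly differs from optimizing classification accuracy.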
ES2016-171
Performance assessment of quantum clustering in non-spherical data distributions
Raul V. Casaña-Eslava, José D. Martín-Guerrero, Ian H. Jarman, Paulo J. G. Lisboa
Abstract:
This work assesses the performance of Quantum Clustering (QC) when applied to non-spherically distributed data sets; in particular, QC outperforms K-Means on a data set that contains information on different olive oil production areas. The Jaccard score depends on the QC parameters; tuning them makes it possible to find local maxima, thus revealing the underlying data structure. In conclusion, QC appears to be a promising solution for non-spherical data distributions; however, some improvements are still needed, for example a way to detect the appropriate number of clusters for a given data set.
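In Quantum Clustering, a Parzen wavefunction psi is built from the data and cluster centres are sought as local minima of the associated potential V. A sketch of evaluating that potential at a point, up to an additive constant (following the standard Horn-Gottlieb formulation, not necessarily this paper's exact variant):

```python
import numpy as np

def qc_potential(x, data, sigma):
    """Quantum Clustering potential (up to an additive constant) from
    psi(x) = sum_i exp(-||x - x_i||^2 / (2 sigma^2)); cluster centres
    appear as local minima of V."""
    d2 = ((np.asarray(data, float) - np.asarray(x, float)) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2 * sigma ** 2))
    return float(d2 @ w / (2 * sigma ** 2 * w.sum()))
```

Gradient descent on V (rather than on psi) is what lets QC follow non-spherical cluster shapes.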
ES2016-47
Supervised quantum gate "teaching" for quantum hardware design
Leonardo Banchi, Nicola Pancotti, Sougato Bose
Abstract:
We show how to train a quantum network of pairwise interacting qubits such that its evolution implements a target quantum algorithm into a given network subset. Our strategy is inspired by supervised learning and is designed to help the physical construction of a quantum computer which operates with minimal external classical control.
ES2016-7
Enhanced learning for agents in quantum-accessible environments
Jacob Taylor, Hans Briegel, Vedran Dunjko
Abstract:
In this paper we provide a broad framework for describing learning agents in general quantum environments. We analyze the types of classically specified environments which allow for quantum enhancements in learning, by contrasting environments to quantum oracles. We show that whether or not quantum improvements are at all possible depends on the internal structure of the quantum environment. If the environments have an appropriate structure, we show that near-generic improvements in learning times are possible in a broad range of scenarios.
Incremental learning algorithms and applications
ES2016-19
Incremental learning algorithms and applications
Alexander Gepperth, Barbara Hammer
Abstract:
Incremental learning refers to learning from streaming data, which arrive over time, with limited memory resources and, ideally, without sacrificing model accuracy. This setting fits different application scenarios where lifelong learning is relevant, e.g. due to changing environments, and it offers an elegant scheme for big data processing by means of its sequential treatment. In this contribution, we formalise the concept of incremental learning, discuss particular challenges which arise in this setting, and give an overview of popular approaches, their theoretical foundations, and applications which have emerged in recent years.
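The constant-memory constraint is easiest to see in a concrete toy learner. This nearest-class-mean classifier (an illustrative example, not taken from the tutorial) stores only one running mean and one count per class, regardless of how many samples stream past:

```python
import numpy as np

class IncrementalNCM:
    """Nearest-class-mean classifier with constant memory: each class
    keeps a running mean and a count, updated one sample at a time."""
    def __init__(self):
        self.means, self.counts = {}, {}

    def partial_fit(self, x, y):
        x = np.asarray(x, dtype=float)
        if y not in self.means:
            self.means[y], self.counts[y] = np.zeros_like(x), 0
        self.counts[y] += 1
        # incremental mean update: m += (x - m) / n
        self.means[y] += (x - self.means[y]) / self.counts[y]

    def predict(self, x):
        x = np.asarray(x, dtype=float)
        return min(self.means, key=lambda y: np.linalg.norm(x - self.means[y]))
```

Real incremental methods add mechanisms on top of this (forgetting, drift detection, model growth), but they face the same trade-off between bounded memory and accuracy.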
ES2016-71
Choosing the best algorithm for an incremental on-line learning task
Viktor Losing, Barbara Hammer, Heiko Wersing
Abstract:
Recently, incremental and on-line learning have gained attention, especially in the context of big data and learning from data streams, which conflict with the traditional assumption of complete data availability. Even though a variety of different methods are available, it often remains unclear which of them is suitable for a specific task and how they perform in comparison to each other. We analyze the key properties of seven incremental methods representing different algorithm classes. Our extensive evaluation on data sets with different characteristics gives an overview of the performance with respect to accuracy as well as model complexity, facilitating the choice of the best method for a given application.
ES2016-9
Distributed learning algorithm for feedforward neural networks
Oscar Fontenla-Romero, Beatriz Pérez-Sánchez, Bertha Guijarro-Berdiñas, Diego Rego-Fernández
Abstract:
With the appearance of huge data sets, new challenges have arisen regarding the scalability and efficiency of machine learning algorithms, and both distributed computing and randomized algorithms have become effective ways to handle them. Taking advantage of these two approaches, a distributed learning algorithm for two-layer neural networks is proposed. Results demonstrate a similar accuracy when compared to an equivalent non-distributed approach, whilst providing some advantages that make it especially well-suited for Big Data sets: over 50% savings in computational time; low communication and storage cost; no hyperparameters to be tuned; it allows online learning and is privacy-preserving.
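One generic way to obtain the low-communication, privacy-preserving properties described above is for each node to share only sufficient statistics of its data shard. This sketch for a linear least-squares layer illustrates the pattern; it is not the authors' exact algorithm:

```python
import numpy as np

def local_stats(X, y):
    """Each node summarizes its shard as (X^T X, X^T y); the raw data
    never leave the node, keeping communication low and data private."""
    return X.T @ X, X.T @ y

def combine_and_solve(stats, ridge=1e-8):
    """Sum the per-node statistics and solve one small linear system
    (ridge term only for numerical stability)."""
    A = sum(s[0] for s in stats)
    b = sum(s[1] for s in stats)
    return np.linalg.solve(A + ridge * np.eye(len(A)), b)
```

Because the statistics are additive, the same update also supports online learning: a new batch simply adds its (X^T X, X^T y) contribution.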
ES2016-91
Watch, Ask, Learn, and Improve: a lifelong learning cycle for visual recognition
Christoph Käding, Erik Rodner, Alexander Freytag, Joachim Denzler
Abstract:
We present WALI, a prototypical system that learns object categories over time by continuously watching online videos. WALI actively asks questions to a human annotator about the visual content of observed video frames. Thereby, WALI is able to receive information about new categories and to simultaneously improve its generalization abilities. The functionality of WALI is driven by scalable active learning, efficient incremental learning, as well as state-of-the-art visual descriptors. In our experiments, we show qualitative and quantitative statistics about WALI's learning process. WALI runs continuously and regularly asks questions.
ES2016-97
Memory management for data streams subject to concept drift
Pierre-Xavier Loeffel, Christophe Marsala, Marcin Detyniecki
Abstract:
Learning on data streams subject to concept drifts is a challenging task. A successful algorithm must be able to keep memory consumption constant regardless of the amount of data processed and, at the same time, retain good adaptation and prediction capabilities by effectively selecting which observations should be stored in memory. We claim that, instead of using a temporal window to discard observations by time stamp, it is better to retain the observations that minimize the change in the output predictions and the learned rule with respect to the full-memory case. Experimental results for the Droplets algorithm on 6 artificial and semi-artificial datasets reproducing various types of drifts back this claim.
ES2016-160
Towards incremental deep learning: multi-level change detection in a hierarchical visual recognition architecture
Thomas Hecht, Alexander Gepperth
Abstract:
We present a hierarchical recognition architecture capable of detecting newness (or outliers) at all hierarchical levels. As the ability to detect newness is an important prerequisite for incremental learning, this contribution paves the way for deep neural architectures that are able to learn in an incremental fashion. We verify the ability to detect newness by conducting experiments on the MNIST database, where we introduce either localized changes, by adding noise to a small patch of the input, or global changes, by combining the left and right halves of two different samples, which is not detectable at the local but only at the global level.
Classification
ES2016-144
Boosting face recognition via neural Super-Resolution
Guillaume Berger, Clément Peyrard, Moez Baccouche
Abstract:
We propose a two-step neural approach for face Super-Resolution (SR) to improve face recognition performance. It consists of first performing generic SR on the entire image, based on Convolutional Neural Networks, followed by a specific local SR step for each facial component, using neural autoencoders. Results obtained on the LFW dataset for a ×4 upscaling factor demonstrate that the method improves both image reconstruction (+2.80 dB in PSNR) and recognition performance (+3.94 points in mean accuracy) compared with ×4 bicubic interpolation.
ES2016-116
Parallelized rotation and flipping INvariant Kohonen maps (PINK) on GPUs
Kai Lars Polsterer, Fabian Gieseke, Christian Igel, Bernd Doser, Nikolaos Gianniotis
Abstract:
Morphological classification is one of the most demanding challenges in astronomy. With the advent of all-sky surveys, an enormous amount of imaging data is publicly available. These data are typically analyzed by experts or encouraged amateur volunteers. For upcoming surveys with billions of objects, however, such an approach is not feasible anymore. In this work, we present a simple yet effective variant of a rotation-invariant self-organizing map that is suitable for many analysis tasks in astronomy. We show how to reduce the computational complexity via modern GPUs and apply the resulting framework to galaxy data for morphological analysis.
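The rotation- and flip-invariant matching underlying such a self-organizing map can be sketched as follows. This minimal version only considers the 8 dihedral transformations (90-degree rotations plus mirroring); the PINK implementation samples rotations far more finely and parallelizes the search on the GPU, so treat this purely as an illustration of the invariant distance:

```python
import numpy as np

def invariant_distance(patch, prototype):
    """Distance between an image patch and a SOM prototype, taken as the
    minimum over 4 rotations x optional horizontal flip. A coarse sketch of
    the invariance idea, not the paper's GPU algorithm."""
    best = np.inf
    for flip in (False, True):
        p = np.fliplr(patch) if flip else patch
        for k in range(4):  # rotations by 0, 90, 180, 270 degrees
            d = np.linalg.norm(np.rot90(p, k) - prototype)
            best = min(best, d)
    return best
```

During training, the best-matching transformation found this way would also be used when updating the winning prototype, so that all prototypes are learned in a canonical orientation.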
ES2016-53
Sparse Least Squares Support Vector Machines via Multiresponse Sparse Regression
David Vieira, Ajalmar Rocha Neto, Antonio Wendell Rodrigues
Abstract:
Least Squares Support Vector Machines (LSSVMs) are an alternative to SVMs because training an LSSVM amounts to solving a linear equation system, whereas training an SVM requires solving a quadratic programming optimization problem. Although solving a linear system is easier than solving a quadratic program, the absence of sparsity in the Lagrange multiplier vector obtained after training an LSSVM model is an important drawback. To overcome this drawback, we present a new approach for sparse LSSVMs called Optimally Pruned LSSVM (OP-LSSVM). Our proposal is based on a ranking method, named Multiresponse Sparse Regression (MRSR), which is used to sort the patterns in terms of relevance; the leave-one-out (LOO) criterion is then used to select an appropriate number of support vectors. Our proposal was inspired by a recent methodology called OP-ELM, which prunes hidden neurons of Extreme Learning Machines. In this paper, we therefore put LSSVM and MRSR to work together to achieve sparse classifiers, obtaining equivalent (or even superior) performance on real-world classification tasks.
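The linear-system training that makes LSSVMs attractive, and the dense multiplier vector that motivates pruning, can both be seen in a few lines. This is a sketch of the standard LSSVM dual system (Suykens' function-estimation form) with an RBF kernel; the parameter names `gamma` and `sigma` are illustrative, and no pruning is performed here:

```python
import numpy as np

def train_lssvm(X, y, gamma=1.0, sigma=1.0):
    """Train an LSSVM classifier by solving a single linear system.
    Labels y are in {-1, +1}. Note that the returned alpha is typically
    dense: every training point becomes a 'support vector', which is the
    drawback OP-LSSVM addresses by ranking and pruning patterns."""
    n = len(y)
    sq = np.sum(X**2, axis=1)
    K = np.exp(-(sq[:, None] + sq[None, :] - 2 * X @ X.T) / (2 * sigma**2))
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0], K   # alpha, bias b, train kernel

def lssvm_predict(alpha, b, K_test):
    """Decision: sign(sum_i alpha_i K(x_i, x) + b)."""
    return np.sign(K_test @ alpha + b)
```

A pruning scheme in the spirit of the paper would rank the columns of `K` (e.g. with MRSR), drop low-ranked patterns, and pick the cut-off via the LOO criterion.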
ES2016-104
Anomaly detection on spectrograms using data-driven and fixed dictionary representations
Mina ABDEL-SAYED, Daniel Duclos, Gilles Faÿ, Jérôme Lacaille, Mathilde Mougeot
Abstract:
Spectrograms provide a visual representation of the vibrations of civil aircraft engines. These vibrations contain information about damage in the engine, if any. The representation is noisy and high-dimensional, and the signatures relevant to damage concern only a small part of the spectrogram, all of which makes it difficult to detect anomalies in the spectrogram automatically. Adequate lower-dimensional representations of the spectrograms are needed. In this paper, we study two types of dictionary representations, a data-driven one and a non-adaptive one, and show their benefits for automatic anomaly detection.
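The core of a dictionary representation, whether the atoms are learned from data or fixed in advance, is expressing each spectrogram slice as coefficients over the dictionary columns and flagging anomalies by reconstruction error. A minimal least-squares sketch (the paper's dictionaries and coding scheme may differ; e.g. a fixed dictionary could hold wavelet atoms, a data-driven one could come from NMF or K-SVD):

```python
import numpy as np

def dictionary_code(x, D):
    """Coefficients representing a (flattened) spectrogram slice x over the
    columns of dictionary D, via least squares. Sparse coding would add a
    sparsity penalty here; plain least squares keeps the sketch minimal."""
    coeffs, *_ = np.linalg.lstsq(D, x, rcond=None)
    return coeffs

def reconstruction_error(x, D):
    """Residual norm; slices poorly explained by the dictionary (i.e. with a
    large residual) are candidate anomalies."""
    return float(np.linalg.norm(x - D @ dictionary_code(x, D)))
```

In an anomaly detector, a threshold on `reconstruction_error` (calibrated on healthy-engine data) would separate normal slices from damage signatures.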
ES2016-174
Using semantic similarity for multi-label zero-shot classification of text documents
Prateek Veeranna Sappadla, Jinseok Nam, Eneldo Loza Mencía, Johannes Fürnkranz
Abstract:
In this paper, we examine a simple approach to zero-shot multi-label text classification, i.e., to the problem of predicting multiple, possibly previously unseen labels for a document. In particular, we propose to use a semantic embedding of label and document words and base the prediction of previously unseen labels on the similarity between the label name and the document words in this embedding. Experiments on three textual datasets across various domains show that even such a simple technique yields considerable performance improvements over a simple uninformed baseline.
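The similarity-based prediction described above can be sketched in a few lines. The max-over-words cosine aggregation and the `embed` lookup are illustrative choices, not necessarily the exact variant evaluated in the paper:

```python
import numpy as np

def zero_shot_scores(doc_words, label_names, embed):
    """Score each (possibly previously unseen) label by the similarity
    between its name embedding and the document's word embeddings.
    `embed` maps a word to a vector (e.g. from word2vec or GloVe)."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    scores = {}
    for label in label_names:
        lv = embed(label)
        # aggregate word-level similarities; max is one simple choice
        scores[label] = max(cos(embed(w), lv) for w in doc_words)
    return scores
```

Multi-label prediction would then threshold these scores, assigning every label whose score exceeds the cut-off, whether or not it was seen during training.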
ES2016-99
Active transfer learning for activity recognition
Tom Diethe, Niall Twomey, Peter Flach
Abstract:
We examine activity recognition from accelerometers, which provides at least two major challenges for machine learning. Firstly, the deployment context is likely to differ from the learning context. Secondly, accurate labelling of training data is time-consuming and error-prone. This calls for a combination of active and transfer learning. We derive a hierarchical Bayesian model that is a natural fit to such problems, and provide empirical validation on synthetic and publicly available datasets. The results show that by combining active and transfer learning, we can achieve faster learning with fewer labels on a target domain than by either alone.
ES2016-46
Tuning the Distribution Dependent Prior in the PAC-Bayes Framework based on Empirical Data
Luca Oneto, Sandro Ridella, Davide Anguita
Abstract:
In this paper we further develop the idea that the PAC-Bayes prior can be defined based on the data-generating distribution. In particular, following Catoni, we refine some recent generalisation bounds on the risk of the Gibbs Classifier, when the prior is defined in terms of the data generating distribution, and the posterior is defined in terms of the observed one. Moreover we show that the prior and the posterior distributions can be tuned based on the observed samples without worsening the convergence rate of the bounds and with a marginal impact on their constants.
ES2016-48
Random Forests Model Selection
Ilenia Orlandi, Luca Oneto, Davide Anguita
Abstract:
Random Forests (RF) of tree classifiers are a popular ensemble method for classification. RF have been shown to be effective in many different real-world classification problems and are nowadays considered one of the best learning algorithms in this context. In this paper we discuss the effect of the hyperparameters of RF on the accuracy of the final model, with particular reference to different theoretically grounded weighting strategies for the trees in the forest. In this way we go against the common misconception that considers RF a hyperparameter-free learning algorithm. Results on a series of benchmark datasets show that performing an accurate Model Selection procedure can greatly improve the accuracy of the final RF classifier.
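Tree weighting, one of the hyperparameter choices discussed above, amounts to replacing the plain majority vote with a weighted one. A minimal sketch, where the weighting strategy itself (e.g. derived from out-of-bag errors) is left as an input; uniform weights recover the standard Random Forest vote:

```python
def weighted_forest_vote(tree_predictions, weights):
    """Combine the class predictions of the trees in a forest using one
    weight per tree. The weights would come from a theoretically grounded
    strategy (e.g. based on each tree's out-of-bag error); here they are
    simply given."""
    totals = {}
    for pred, w in zip(tree_predictions, weights):
        totals[pred] = totals.get(pred, 0.0) + w
    # class with the largest total weight wins
    return max(totals, key=totals.get)
```

A model-selection procedure in the spirit of the paper would compare such weighting strategies (together with the other RF hyperparameters) on held-out data rather than fixing them a priori.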
ES2016-63
The WiSARD Classifier
Massimo De Gregorio, Maurizio Giordano
Abstract:
WiSARD is a weightless neural model which essentially uses look-up tables to store the function computed by each neuron, rather than storing it in the weights of neuron connections. Although WiSARD was originally conceived as a pattern recognition device mainly focused on image processing, in this work we show how to build a multi-class classification method in the Machine Learning (ML) domain based on WiSARD that performs on par with state-of-the-art ML methods.
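The look-up-table mechanism can be sketched with one discriminator per class, each storing the n-tuples of the binary patterns it was trained on. This is a minimal illustration: real WiSARD uses a fixed random mapping of input bits to tuples, whereas here consecutive bits are grouped:

```python
def wisard_train(discriminator, bits, n_tuple=2):
    """Store each n-tuple of a binary pattern in the RAM (here a set) for
    that tuple position; no weights are adjusted, hence 'weightless'."""
    for i in range(0, len(bits), n_tuple):
        address = tuple(bits[i:i + n_tuple])
        discriminator.setdefault(i, set()).add(address)

def wisard_score(discriminator, bits, n_tuple=2):
    """Number of RAMs that recognize the corresponding tuple of the pattern."""
    return sum(1 for i in range(0, len(bits), n_tuple)
               if tuple(bits[i:i + n_tuple]) in discriminator.get(i, set()))

def wisard_classify(discriminators, bits):
    """Multi-class decision: the class whose discriminator fires most."""
    return max(discriminators, key=lambda c: wisard_score(discriminators[c], bits))
```

Because training only fills tables, both training and classification are a single pass over the pattern's tuples.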
ES2016-102
Policy-gradient methods for Decision Trees
Aurélia Léon, Ludovic Denoyer
Abstract:
We propose a new type of decision tree able to learn, at the same time, how inputs fall through the tree and which predictions are associated with the leaves. The main advantage of this approach is that it is based on the optimization of a global loss function instead of heuristic-based greedy techniques, while keeping the good characteristics of decision trees. The learning algorithm is inspired by reinforcement learning and based on gradient-descent methods, allowing fast optimization. Moreover, the algorithm is not limited to (mono-label) classification tasks and can be used for any predictive problem as long as a differentiable loss function exists. Experimental results show the effectiveness of the method w.r.t. baselines.
ES2016-113
Multicriteria optimized MLP for imbalanced learning
Paavo Nieminen, Tommi Karkkainen
Abstract:
Classifier construction for data with imbalanced class frequencies needs special attention if good classification accuracy for all the classes is sought. When the classes are not separable, i.e., when the distributions of observations in the classes overlap, it is impossible to achieve ideal accuracy for all the classes at once. We suggest a versatile multicriteria optimization formulation for imbalanced classification and demonstrate its applicability using a single hidden layer perceptron as the classifier model.
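The multicriteria view can be made concrete: instead of a single error rate, the objective is a vector with one error rate per class, which a scalarization (here a simple weighted sum, one of several possible trade-offs; the paper's exact formulation may differ) turns into a trainable loss:

```python
def per_class_errors(y_true, y_pred, classes):
    """One error rate per class: the objective vector of a multicriteria
    formulation for imbalanced classification."""
    errs = []
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        wrong = sum(1 for i in idx if y_pred[i] != c)
        errs.append(wrong / len(idx))
    return errs

def scalarize(errors, weights):
    """Weighted-sum scalarization of the per-class error vector."""
    return sum(w * e for w, e in zip(weights, errors))
```

On imbalanced data the difference to overall accuracy is stark: a classifier that always predicts the majority class can score high overall accuracy while one criterion (the minority-class error) is at its worst.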
ES2016-120
Activity recognition with echo state networks using 3D body joints and objects category
Luiza Mici, Xavier Hinaut, Stefan Wermter
Abstract:
In this paper we present our experiments with an echo state network (ESN) for the task of classifying high-level human activities from video data. ESNs are recurrent neural networks which are biologically plausible, fast to train, and perform well in processing arbitrary sequential data. We focus on the integration of body motion with information on the objects manipulated during the activity, in order to overcome the visual ambiguities introduced by the processing of articulated body motion. We investigate the outputs learned and the classification accuracy obtained with ESNs on a challenging dataset of long high-level activities, and we finally report the results achieved on this dataset.
ES2016-126
From User-independent to Personal Human Activity Recognition Models Using Smartphone Sensors
Pekka Siirtola, Heli Koskimäki, Juha Röning
Abstract:
In this study, a novel method to unobtrusively obtain user-dependent human activity recognition models using the sensors of a smartphone is presented. The recognition consists of two models: a sensor fusion-based user-independent model for data labeling and a single sensor-based user-dependent model for final recognition. The functioning of the presented method is tested on a human activity dataset, including data from an accelerometer and a magnetometer, and with two classifiers. Comparison of the detection accuracies of the proposed method with a traditional user-independent model shows that the presented method has potential: in nine cases out of ten it is better than the traditional method. However, more experiments using different sensor combinations should be made to show the full potential of the method.
ES2016-136
One-class classification algorithm based on convex hull
Diego Fernandez-Francos, Oscar Fontenla-Romero, Amparo Alonso-Betanzos
Abstract:
A new version of a one-class classification algorithm is presented in this paper. In it, a convex hull (CH) is used to define the boundary of the target class that defines the one-class problem. Expansion and reduction of the CH prevent over-fitting. An approximation of the D-dimensional CH decision is made by using random projections and an ensemble of models in very low-dimensional spaces. A different method to obtain the expanded polytope is proposed in order to avoid some undesirable behavior detected in the original algorithm in certain situations. Moreover, this modification allows the use of a new parameter, the CH center, which provides even more flexibility to our proposal. Experimental results show that the new algorithm is significantly better, regarding accuracy, than the original one on a large number of datasets.
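The projection-ensemble idea can be illustrated in its simplest form: project the target data onto random directions, keep a slightly expanded interval per direction, and accept a test point only if it falls inside every interval. This 1-D sketch is a strong simplification of the paper's low-dimensional CH ensemble, and the expansion factor is an illustrative stand-in for the CH expansion mechanism:

```python
import random

def fit_projection_intervals(points, n_proj=50, expand=0.1, seed=0):
    """One-class model as an ensemble of random 1-D projections: per
    direction, store the (expanded) interval covered by the target class."""
    rng = random.Random(seed)
    dim = len(points[0])
    model = []
    for _ in range(n_proj):
        w = [rng.gauss(0, 1) for _ in range(dim)]
        proj = [sum(wi * xi for wi, xi in zip(w, p)) for p in points]
        lo, hi = min(proj), max(proj)
        margin = expand * (hi - lo)   # expansion guards against over-fitting
        model.append((w, lo - margin, hi + margin))
    return model

def is_target(model, x):
    """Accept x only if it lies inside every projected interval."""
    return all(lo <= sum(wi * xi for wi, xi in zip(w, x)) <= hi
               for w, lo, hi in model)
```

The intersection of the intervals is a convex region containing the (expanded) target data, which is the sense in which the ensemble approximates the high-dimensional CH decision.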
ES2016-158
Converting SVDD scores into probability estimates
Meriem El Azami, Carole Lartizien, Stéphane Canu
Abstract:
To enable post-processing, the output of a support vector data description (SVDD) should be a calibrated probability, as is done for SVMs. Standard SVDD does not provide such probabilities. To create them, we first generalize the SVDD model and propose two calibration functions: the first uses a sigmoid model, and the other is based on a generalized extreme value distribution model. To estimate the calibration parameters, we use the consistency property of the estimator associated with a single SVDD model. A synthetic dataset and datasets from the UCI repository are used to compare the performance against a robust kernel density estimator.
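The sigmoid variant is closely analogous to Platt scaling for SVMs. A minimal sketch, where the SVDD score is assumed to be a signed distance to the boundary (negative inside, positive outside) and the fitting procedure is plain gradient descent on the logistic loss rather than the paper's consistency-based estimation:

```python
import math

def sigmoid_calibration(score, a, b):
    """Map an SVDD decision score to an outlier probability via a sigmoid."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))

def fit_sigmoid(scores, labels, lr=0.1, steps=2000):
    """Fit (a, b) by gradient descent on the logistic loss; labels are 1
    for outliers and 0 for inliers. An illustrative fitting procedure, not
    the estimation scheme used in the paper."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(steps):
        ga = gb = 0.0
        for s, y in zip(scores, labels):
            p = sigmoid_calibration(s, a, b)
            ga += (p - y) * s
            gb += p - y
        a -= lr * ga / n
        b -= lr * gb / n
    return a, b
```

Once calibrated, the probabilistic output can be thresholded at task-dependent operating points or combined with other probabilistic models downstream.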
Deep learning
ES2016-23
Challenges in Deep Learning
Plamen Angelov, Alessandro Sperduti
ES2016-175
Deep Reservoir Computing: A Critical Analysis
Claudio Gallicchio, Alessio Micheli
Abstract:
In this paper we propose an empirical analysis of deep recurrent neural networks (RNNs) with stacked layers. The analysis aims at the study and proposal of approaches to develop and enhance multiple-timescale and hierarchical dynamics in deep recurrent architectures, within the efficient Reservoir Computing (RC) approach to RNN modeling. Results point out the actual relevance of layering and of RC parameter choices for the diversification of temporal representations in deep recurrent models.
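A stacked (deep) reservoir can be sketched as a chain of untrained recurrent layers, where each layer is driven by the states of the one below. The spectral-radius scaling and layer sizes here are illustrative RC parameters of the kind whose effect across layers such an analysis studies:

```python
import numpy as np

def stacked_esn_states(inputs, sizes, rho=0.9, seed=0):
    """Run a stack of tanh reservoirs over an input sequence.
    `inputs` has shape (T, d); layer l is driven by layer l-1's state
    (the first layer by the external input). Returns the concatenated
    states of all layers at every time step."""
    rng = np.random.default_rng(seed)
    layers, in_dim = [], inputs.shape[1]
    for n in sizes:
        W = rng.standard_normal((n, n))
        W *= rho / max(abs(np.linalg.eigvals(W)))   # set spectral radius
        Win = rng.uniform(-0.5, 0.5, (n, in_dim))
        layers.append([Win, W, np.zeros(n)])
        in_dim = n
    states = []
    for u in inputs:
        drive = u
        step = []
        for layer in layers:
            Win, W, x = layer
            layer[2] = np.tanh(Win @ drive + W @ x)  # reservoir update
            step.append(layer[2])
            drive = layer[2]
        states.append(np.concatenate(step))
    return np.array(states)
```

Only a linear readout on these concatenated states would be trained, which is what keeps the RC approach efficient even when layers are stacked.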
ES2016-112
Deep Learning Vector Quantization
Harm de Vries, Roland Memisevic, Aaron Courville
Abstract:
While deep neural networks (DNNs) achieve impressive performance on image recognition tasks, previous studies have reported that DNNs give high-confidence predictions for unrecognizable images. Motivated by the observation that such "fooling examples" might be caused by the extrapolating nature of the log-softmax, we propose to combine neural networks with Learning Vector Quantization (LVQ). Our proposed method, called Deep LVQ (DLVQ), achieves comparable performance on MNIST while being more robust against fooling and adversarial examples.
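The intuition for the robustness claim is that a distance-based output layer, unlike a log-softmax, does not produce ever-growing confidence far from the data. A minimal nearest-prototype sketch; in DLVQ the distances would be computed on learned deep features, whereas here raw inputs stand in for them:

```python
import numpy as np

def lvq_predict(x, prototypes, labels):
    """Classify by the nearest class prototype (the LVQ decision rule)."""
    d = np.linalg.norm(prototypes - x, axis=1)
    return labels[int(np.argmin(d))]

def lvq_confidence(x, prototypes):
    """A confidence that decays with distance to the closest prototype:
    far from all prototypes it tends to zero, instead of extrapolating to
    high confidence as a log-softmax can."""
    return float(np.exp(-np.min(np.linalg.norm(prototypes - x, axis=1))))
```

Training would additionally move prototypes toward same-class inputs and away from other-class inputs, here omitted for brevity.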
ES2016-118
Multispectral Pedestrian Detection using Deep Fusion Convolutional Neural Networks
Jörg Wagner, Volker Fischer, Michael Herman, Sven Behnke
Abstract:
Robust vision-based pedestrian detection is a crucial feature of future autonomous systems. Thermal cameras provide an additional input channel that helps solve this task, and deep convolutional networks are the currently leading approach for many pattern recognition problems, including object detection. In this paper, we explore the potential of deep models for multispectral pedestrian detection. We investigate two deep fusion architectures and analyze their performance on multispectral data. Our results show that a pre-trained late-fusion architecture significantly outperforms the current state-of-the-art ACF+T+THOG solution.
ES2016-74
Augmenting a convolutional neural network with local histograms - A case study in crop classification from high-resolution UAV imagery
Julien Rebetez, Héctor F. Satizábal, Matteo Mota, Dorothea Noll, Lucie Büchi, Marina Wendling, Bertrand Cannelle, Andres Perez-Uribe, Stéphane Burgos
Abstract:
The advent of affordable drones capable of taking high resolution images of agricultural fields creates new challenges and opportunities in aerial scene understanding. This paper tackles the problem of recognizing crop types from aerial imagery and proposes a new hybrid neural network architecture which combines histograms and convolutional units. We evaluate the performance of the proposed model on a 23-class classification task and compare it to other models. The result is an improvement of the classification performance.
ES2016-6
Stochastic gradient estimate variance in contrastive divergence and persistent contrastive divergence
Mathias Berglund
Abstract:
Contrastive Divergence (CD) and Persistent Contrastive Divergence (PCD) are popular methods for training Restricted Boltzmann Machines. However, both methods use an approximate method for sampling from the model distribution. As a side effect, these approximations yield significantly different biases and variances for stochastic gradient estimates of individual data points. It is well known that CD yields a biased gradient estimate. In this paper we however show empirically that CD has a lower stochastic gradient estimate variance than unbiased sampling, while the mean of subsequent PCD estimates has a higher variance than independent sampling. The results give one explanation to the finding that CD can be used with smaller minibatches or higher learning rates than PCD.
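The stochastic gradient estimates whose variance is compared above come from replacing the intractable model-distribution expectation with a short Gibbs chain. A CD-1 sketch for a Bernoulli RBM (biases omitted for brevity; PCD would differ only in starting the chain from a persistent state rather than from the data):

```python
import numpy as np

def cd1_gradient(v0, W, rng):
    """One CD-1 stochastic estimate of d log p / dW for a Bernoulli RBM:
    positive phase at the data v0 minus the phase after a single Gibbs
    step, instead of an exact sample from the model distribution."""
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))
    ph0 = sigmoid(v0 @ W)                       # p(h=1 | v0): positive phase
    h0 = (rng.random(ph0.shape) < ph0) * 1.0    # sample hidden units
    pv1 = sigmoid(h0 @ W.T)                     # reconstruct visibles
    v1 = (rng.random(pv1.shape) < pv1) * 1.0
    ph1 = sigmoid(v1 @ W)                       # negative phase after 1 step
    return np.outer(v0, ph0) - np.outer(v1, ph1)
```

The bias/variance comparison in the paper amounts to collecting many such per-example estimates (for CD, PCD, and unbiased sampling) and measuring their spread.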
ES2016-27
An Experiment in Pre-Emphasizing Diversified Deep Neural Classifiers
Ricardo Alvear-Sandoval, Anibal Figueiras-Vidal
Abstract:
We explore whether adding a pre-emphasis step to diversified deep auto-encoding based classifiers serves to further improve their performance with respect to using pre-emphasis or diversification separately. An experiment with the MNIST database, a well-known benchmark problem for this type of design, shows that further improvement does appear, the main condition being to select sufficiently general and flexible pre-emphasis forms. Other ways of combining diversity and pre-emphasis require more research effort, as does investigating whether other deep architectures can also benefit from these ideas.
ES2016-45
Comparison of Four- and Six-Layered Configurations for Deep Network Pretraining
Tommi Karkkainen, Jan Hanninen
Abstract:
Using simpler building blocks to initially construct a deep network, followed by fine-tuning of the full architecture, is known to improve the deep learning process. However, in many cases the pretrained networks are obtained using training algorithms different from those used in their final combination. Here we introduce and compare four possible architectures to pretrain a deep feedforward network, using exactly the same formulation throughout. Based on the analytical formulations and experimental results, we conclude that one of the tested configurations is the recommended approach for the initial phase of deep learning.
ES2016-103
Learning Embeddings for Completion and Prediction of Relational Multivariate Time-Series
Ali Ziat, Gabriella Contardo, Nicolas Baskiotis, Ludovic Denoyer
Abstract:
We focus on learning over multivariate and relational time-series where relations are modeled by a graph. We propose a model that is able to simultaneously fill in missing values and predict future ones. This approach is based on representation learning techniques, where temporal data are represented in a latent vector space so as to capture both the dynamics of the process and the relations between the different sources. Information completion (missing values) and prediction are performed simultaneously using a single formalism, whereas most often they are addressed separately using different methods.
ES2016-107
Spatial Chirp-Z Transformer Networks
Jonas Degrave, Sander Dieleman, Joni Dambre, Francis Wyffels
Abstract:
Convolutional Neural Networks are often used for computer vision solutions, because of their inherent modeling of the translation invariance in images. In this paper, we propose a new module to model rotation and scaling invariances in images. To do this, we rely on the chirp-Z transform to perform the desired translation, rotation and scaling in the frequency domain. This approach has the benefit that it scales well and that it is differentiable because of the computationally cheap sinc-interpolation.
Clustering and feature selection
ES2016-77
Fast Support Vector Clustering
Tung Pham, Trung Le, Hoang-Thai Le, Dat Tran
Abstract:
Support-based clustering has recently drawn plenty of attention because of its applications to difficult and diverse clustering and outlier detection problems. Support-based clustering methods proceed in two phases: finding the domain of novelty and performing the clustering assignment. To find the domain of novelty, the training time required by current solvers is typically quadratic in the training set size, which precludes the use of support-based clustering for large-scale datasets. In this paper, we propose applying the Stochastic Gradient Descent framework to the first phase of support-based clustering, finding the domain of novelty in the form of a half-space, together with a new strategy for the clustering assignment. We validate our proposed method on well-known clustering datasets to show that it offers clustering quality comparable to Support Vector Clustering while being considerably faster.
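The first phase, learning a half-space that contains most of the data by stochastic gradient descent, can be sketched with the linear one-class SVM primal. The synthetic data, step size and ν value below are illustrative assumptions, not the paper's algorithm or settings.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2)) + np.array([3.0, 3.0])  # synthetic cluster (assumption)

def sgd_one_class(X, nu=0.1, lr=0.05, epochs=20, rng=rng):
    """SGD on the linear one-class SVM primal: learn a half-space w.x >= rho
    that contains most of the data (a 'domain of novelty' in feature space)."""
    n, d = X.shape
    w = np.zeros(d)
    rho = 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):
            margin = w @ X[i] - rho
            # subgradients of (1/2)||w||^2 - rho + (1/nu) * hinge(rho - w.x)
            gw = w.copy()
            grho = -1.0
            if margin < 0:           # hinge is active for this point
                gw -= X[i] / nu
                grho += 1.0 / nu
            w -= lr * gw
            rho -= lr * grho
    return w, rho

w, rho = sgd_one_class(X)
inside = np.mean(X @ w - rho >= 0)   # fraction of points inside the half-space
```

At the optimum roughly a fraction ν of the points fall outside the half-space, which is why ν controls the trade-off between coverage and tightness.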
ES2016-72
Fast in-memory spectral clustering using a fixed-size approach
Rocco Langone, Raghvendra Mall, Vilen Jumutc, Johan A. K. Suykens
Abstract:
Spectral clustering represents a successful approach to data clustering. Despite its high performance in solving complex tasks, it is often disregarded in favor of the less accurate k-means algorithm because of its computational inefficiency. In this article we present a fast in-memory spectral clustering algorithm, which can handle millions of datapoints at a desktop PC scale. The proposed technique relies on a kernel-based formulation of the spectral clustering problem, also known as kernel spectral clustering. In particular, we use a fixed-size approach based on an approximation of the feature map via the Nyström method to solve the primal optimization problem. We experimented on several small and large scale real-world datasets to show the computational efficiency and clustering quality of the proposed algorithm.
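The fixed-size idea the abstract mentions can be sketched as a plain Nyström approximation of an RBF kernel matrix: pick m landmark points and build an explicit m-dimensional feature map. The data, kernel width and landmark count below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3))   # toy data (assumption)

def rbf(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Nystrom idea: pick m << n landmarks and approximate the n x n kernel
# matrix K by  K_nm  K_mm^+  K_mn.
m = 40
idx = rng.choice(len(X), size=m, replace=False)
K_mm = rbf(X[idx], X[idx])
K_nm = rbf(X, X[idx])

# Explicit approximate feature map: phi(x) = K_mm^{-1/2} k_m(x)
U, s, _ = np.linalg.svd(K_mm)
K_mm_inv_sqrt = U @ np.diag(1.0 / np.sqrt(np.maximum(s, 1e-12))) @ U.T
Phi = K_nm @ K_mm_inv_sqrt          # (n, m) finite-dimensional features

K_approx = Phi @ Phi.T
err = np.abs(K_approx - rbf(X, X)).mean()
```

With the explicit map `Phi` in hand, the clustering problem can be solved in the primal with cost linear in n rather than quadratic, which is what makes the in-memory approach scale.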
ES2016-78
Spectral clustering and discriminant analysis for unsupervised feature selection
Xiucai Ye, Kaiyang Ji, Tetsuya Sakurai
Abstract:
In this paper, we propose a novel method for unsupervised feature selection, which utilizes spectral clustering and discriminant analysis to learn the cluster labels of data. Feature selection is performed simultaneously with the learning of cluster labels. By imposing row sparsity on the transformation matrix, the proposed method selects the most discriminative features, which better capture both the global and local structure of the data. We develop an iterative algorithm to effectively solve the optimization problem in our method. Experimental results on different real-world data demonstrate the effectiveness of the proposed method.
ES2016-28
Clustering from two data sources using a kernel-based approach with weight coupling
Lynn Houthuys, Rocco Langone, Johan A. K. Suykens
Abstract:
In many clustering problems there are multiple data sources available. Although each one could individually be used for clustering, exploiting information from all data sources together can help find a clustering that is more accurate. Here a new model is proposed for clustering when two data sources are available. This model is called Binary View Kernel Spectral Clustering (BVKSC) and is based on a constrained optimization formulation typical of Least Squares Support Vector Machines (LS-SVM). The model includes a coupling term, where the weights of the two different data sources are coupled in the primal model. This coupling term makes it possible to exploit the additional information from the other data source. Experimental comparisons with a number of similar methods show that using two data sources can improve the clustering results and that the proposed method is competitive in performance with other state-of-the-art methods.
ES2016-37
Genetic Algorithm with Novel Crossover, Selection and Health Check for Clustering
Abul Hashem Beg, Md Zahidul Islam
Abstract:
We propose a genetic algorithm for clustering records, where the algorithm contains new approaches to various genetic operations including crossover and selection. We also propose a health check operation that finds sick chromosomes in a population and probabilistically replaces them with healthy chromosomes found in previous generations. The proposed approaches improve chromosome quality within a population, which then contributes to achieving a good clustering solution. We use fifteen datasets to compare our technique with five existing techniques in terms of two cluster evaluation criteria. The experimental results indicate a clear superiority of the proposed technique over the existing techniques.
ES2016-85
PSCEG: an unbiased parallel subspace clustering algorithm using exact grids
Bo Zhu, Alberto Mozo, Bruno Ordozgoiti
Abstract:
The quality of grid-based subspace clustering is highly dependent on the grid size and the positions of dense units, and many existing methods use sensitive global density thresholds that are difficult to set a priori. We propose PSCEG, a new approach that generates an exact grid without the need to specify its size based on the distribution of each dimension. In addition, we define an adaptive density estimator that avoids dimensionality bias. A parallel implementation of our algorithm using Resilient Distributed Datasets achieves a significant speedup w.r.t. the number of cores in high dimensional scenarios. Experimental results on synthetic and real datasets show PSCEG outperforms existing alternatives.
ES2016-93
Initialization of big data clustering using distributionally balanced folding
Joonas Hämäläinen, Tommi Karkkainen
Abstract:
We propose and test the use of distributionally balanced folding to speed up the initialization phase of the K-means++ clustering method, targeting big data applications. The approach is first described and then evaluated experimentally, focusing on the effects of the sampling method as the number of folds created is varied. In the tests, the quality of the final clustering results was assessed and the scalability of a distributed implementation was demonstrated. The experiments support the viability of the proposed approach.
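The initialization phase being sped up is standard K-means++ seeding, sketched below; the paper's distributionally balanced fold is stood in for here by a plain random subsample, which is an assumption of this illustration, not the paper's sampling scheme.

```python
import numpy as np

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(200, 2))
               for c in ([0, 0], [5, 0], [0, 5])])   # three toy clusters (assumption)

def kmeanspp_init(X, k, rng):
    """Standard K-means++ seeding: each new center is sampled with probability
    proportional to the squared distance to the nearest center chosen so far."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)

# Seed on a fold rather than the full data to cut the O(n*k) seeding cost
# (a random subsample stands in for a distributionally balanced fold here).
fold = X[rng.choice(len(X), size=len(X) // 3, replace=False)]
centers = kmeanspp_init(fold, k=3, rng=rng)
```

Because seeding cost grows with the number of points scanned per new center, running it on a representative fold is where the speed-up comes from.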
ES2016-124
RBClust: High quality class-specific clustering using rule-based classification
Michael Siers, Md Zahidul Islam
Abstract:
Within a class-labeled dataset, there are typically two or more possible class labels. Class-specific subsets of the dataset have the same class label for each record. Class-specific clusters are the groups of similar records within these subsets. Many machine learning techniques require class-specific clusters. We propose RBClust, a rule-based method for finding class-specific clusters. We demonstrate that, compared to traditional clustering methods, the proposed method achieves better cluster quality with significantly lower computation time.
ES2016-188
K-means for Datasets with Missing Attributes: Building Soft Constraints with Observed and Imputed Values
Diego Mesquita, Joao Gomes, Leonardo Rodrigues
Abstract:
Clustering methods have a wide range of applications. However, the presence of missing attribute values in a dataset may limit the use of clustering methods. Developing clustering methods that can deal with missing data has been a topic of interest among researchers in recent years. This work presents a variant of the well-known k-means algorithm that can handle missing data. The proposed algorithm uses one type of soft constraint for observed data and a second type for imputed data. Four public datasets were used in the experiments to compare the performance of the proposed model with a traditional k-means algorithm and an algorithm that uses soft constraints only for observed data. The results show that the proposed method outperformed the benchmark methods on all datasets considered in the experiments.
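A heavily simplified sketch of the idea of trusting observed and imputed values differently: here the paper's two types of soft constraints are reduced to per-entry confidence weights inside k-means. The data, missingness rate, weight values and deterministic initialization are all assumptions of this illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(loc=c, size=(100, 2)) for c in ([0, 0], [6, 6])])
mask = rng.random(X.shape) < 0.1          # ~10% of entries missing (assumption)
X_obs = np.where(mask, np.nan, X)

# Mean imputation, then a per-entry confidence weight: observed entries get
# weight 1, imputed entries a lower weight (weight values are assumptions).
col_means = np.nanmean(X_obs, axis=0)
X_imp = np.where(mask, col_means, X_obs)
W = np.where(mask, 0.3, 1.0)

def weighted_kmeans(X, W, k, init_idx, iters=20):
    """k-means where each entry contributes to distances and means
    in proportion to its confidence weight."""
    C = X[np.array(init_idx)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        d = np.stack([(W * (X - c) ** 2).sum(1) for c in C], axis=1)
        labels = d.argmin(1)
        for j in range(k):
            pts = labels == j
            if pts.any():
                C[j] = (W[pts] * X[pts]).sum(0) / W[pts].sum(0)
    return C, labels

# deterministic far-apart seeds (one per generated blob)
C, labels = weighted_kmeans(X_imp, W, k=2, init_idx=[0, 100])
```

Down-weighting imputed entries keeps a badly imputed value from dragging a centroid, which is the intuition behind giving imputed data a weaker constraint.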
ES2016-178
Instance and feature weighted k-nearest-neighbors algorithm
Gabriel Prat, Lluís A. Belanche
Abstract:
We present a novel method that aims at providing a more stable selection of feature subsets when variations in the training process occur. This is accomplished by using an instance-weighting process (assigning different degrees of importance to instances) as a preprocessing step to a feature weighting method that is independent of the learner, and then making use of both sets of computed weights in a standard nearest-neighbours classifier. We report extensive experimentation on well-known benchmark datasets as well as some challenging microarray gene expression problems. Our results show increases in feature subset selection (FSS) stability for most subset sizes and most problems, without compromising prediction accuracy.
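The final classifier described, a nearest-neighbours rule that uses feature weights in the distance and instance weights in the vote, can be sketched as follows. The synthetic data and the hand-set weights are hypothetical stand-ins for the weights the paper learns.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(120, 4))
y = (X[:, 0] + 0.1 * rng.normal(size=120) > 0).astype(int)  # only feature 0 informative

# Hypothetical weights standing in for the learned ones: feature weights
# stress informative dimensions, instance weights down-weight unreliable points.
feat_w = np.array([1.0, 0.1, 0.1, 0.1])
inst_w = np.ones(len(X))

def weighted_knn_predict(x, X, y, feat_w, inst_w, k=5):
    # feature-weighted distances, then instance-weighted votes among the k nearest
    d = np.sqrt((feat_w * (X - x) ** 2).sum(1))
    nn = np.argsort(d)[:k]
    votes = np.bincount(y[nn], weights=inst_w[nn], minlength=2)
    return votes.argmax()

# leave-one-out evaluation
preds = np.array([
    weighted_knn_predict(X[i], np.delete(X, i, axis=0), np.delete(y, i),
                         feat_w, np.delete(inst_w, i))
    for i in range(len(X))
])
accuracy = (preds == y).mean()
```

Since the class here depends only on feature 0, the feature weighting concentrates the distance on the informative dimension and the leave-one-out accuracy stays high despite the noise dimensions.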
ES2016-143
Spatio-temporal feature selection for black-box weather forecasting
Zahra Karevan, Johan A. K. Suykens
Abstract:
In this paper, a data-driven modeling technique is proposed for temperature forecasting. Due to the high dimensionality of the data, LASSO is used as the feature selection approach. Considering the spatio-temporal structure of the weather dataset, LASSO is first applied in a spatial and a temporal scenario independently. Next, a feature is included in the model if it is selected by both. Finally, Least Squares Support Vector Machine (LS-SVM) regression is used to learn the model. The experimental results show that spatio-temporal LASSO improves performance and is competitive with state-of-the-art methods. As a case study, the prediction of the temperature in Brussels is considered.
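The intersection step, keeping a feature only if two independent LASSO runs select it, can be sketched with a plain ISTA solver on synthetic data. The solver, penalty value and the two random data splits (standing in for the paper's spatial and temporal groupings) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 200, 30
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]           # 3 relevant features (assumption)
y = X @ beta_true + 0.1 * rng.normal(size=n)

def lasso_ista(X, y, lam=0.1, iters=500):
    """Plain ISTA solver for (1/2)||y - Xb||^2 + lam*n*||b||_1
    (a stand-in for whatever LASSO solver the paper uses)."""
    beta = np.zeros(X.shape[1])
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the gradient
    for _ in range(iters):
        grad = X.T @ (X @ beta - y)
        z = beta - grad / L
        beta = np.sign(z) * np.maximum(np.abs(z) - lam * len(y) / L, 0.0)
    return beta

# Run LASSO on two groupings and keep only features selected by both
# (two random halves stand in for the spatial and temporal scenarios).
sel_a = set(np.nonzero(np.abs(lasso_ista(X[:100], y[:100])) > 1e-6)[0])
sel_b = set(np.nonzero(np.abs(lasso_ista(X[100:], y[100:])) > 1e-6)[0])
selected = sorted(sel_a & sel_b)
```

Intersecting the two supports acts as a stability filter: a spuriously selected feature is unlikely to survive both runs, while genuinely relevant features do.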
ES2016-84
Parallelized unsupervised feature selection for large-scale network traffic analysis
Bruno Ordozgoiti, Sandra Gómez Canaval, Alberto Mozo
Abstract:
In certain domains, where model interpretability is highly valued, feature selection is often the only possible option for dimensionality reduction. However, two key problems arise. First, the size of data sets today makes it unfeasible to run centralized feature selection algorithms in reasonable amounts of time. Second, the impossibility of labeling data sets rules out supervised techniques. We propose an unsupervised feature selection algorithm based on a new formulation of the leverage scores. We derive an extremely efficient parallelized approach over the Resilient Distributed Datasets abstraction, making it applicable to the enormous data sets often present in network traffic analysis.
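Classical leverage scores, the quantity the paper reformulates, can be sketched for column (feature) selection: score each feature by its mass in the top-k right singular subspace of the data matrix. The toy data and the choice of k are assumptions; the paper's distributed reformulation is not shown here.

```python
import numpy as np

rng = np.random.default_rng(7)
# Toy data matrix: 1000 rows, 8 features, of which the first three dominate (assumption)
A = rng.normal(size=(1000, 8)) * np.array([5.0, 4.0, 3.0, 1, 1, 1, 1, 1])

def column_leverage_scores(A, k=3):
    """Leverage scores of the columns w.r.t. the top-k right singular subspace;
    high-scoring columns are natural candidates for unsupervised selection."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    return (Vt[:k] ** 2).sum(axis=0)

scores = column_leverage_scores(A)
top_features = np.argsort(scores)[::-1][:3]
```

Because the scores depend only on the SVD of the data, no labels are needed, which is what makes the approach suitable for unlabeled network traffic.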
Information Visualisation and Machine Learning: Techniques, Validation and Integration
ES2016-18
Information visualisation and machine learning: characteristics, convergence and perspective
Benoît Frénay, Bruno Dumas
Abstract:
This paper discusses how information visualisation and machine learning can cross-fertilise. On the one hand, the user-centric field of information visualisation can help machine learning better integrate users in the learning, assessment and interpretation processes. On the other hand, machine learning can provide powerful algorithms for clustering, dimensionality reduction, data cleansing, outlier detection, etc. Such inference tools are required to create efficient visualisations. This paper highlights opportunities for experts in both fields to collaborate.
ES2016-147
Enhancing a social science model-building workflow with interactive visualisation
Cagatay Turkay, Aidan Slingsby, Kaisa Lahtinen, Sarah Butt, Jason Dykes
Abstract:
Although automated methods are helping produce significantly better models for various phenomena in scientific research, they often reduce the ability for the scientist to inform the model building with their theoretical knowledge. Such incorporation of prior knowledge is crucial when scientists aim to understand and defend their models. In this paper, we report our ongoing studies as a team of computer and social scientists where we use interactive visualisation techniques to improve the efficiency of the model building workflow. We do this by designing methods to incorporate theory, interactively build models, and keep a record of the decisions made.
ES2016-123
Informative data projections: a framework and two examples
Tijl De Bie, Jefrey Lijffijt, Raul Santos-Rodriguez, Bo Kang
Abstract:
Projection Pursuit aims to facilitate visual exploration of high-dimensional data by identifying interesting low-dimensional projections. A major challenge in Projection Pursuit is the design of a projection index, a suitable quality measure to maximise. We introduce a strategy for tackling this problem based on quantifying the amount of information a projection conveys, given a user's prior beliefs about the data. The resulting projection index is a subjective quantity, explicitly dependent on the intended user. As an illustration, we develop this principle for two kinds of prior beliefs: the first leads to PCA, the second to a novel projection index, which we call t-PCA, that can be regarded as a robust PCA variant. We demonstrate t-PCA's usefulness in comparative experiments against PCA and FastICA, a popular Projection Pursuit method.
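The simplest instance of the framework, PCA viewed as Projection Pursuit with variance as the projection index, can be sketched directly; the anisotropic toy data and the particular comparison direction are assumptions of this illustration.

```python
import numpy as np

rng = np.random.default_rng(8)
# Anisotropic toy data: most variance along the first axis (assumption)
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])

def pca_projection(X, dim=1):
    """PCA as Projection Pursuit with variance as the projection index:
    return the projection directions maximising projected variance."""
    Xc = X - X.mean(0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:dim]

P = pca_projection(X)
proj_var = np.var(X @ P.T)                 # variance along the found direction
other = np.array([[np.cos(1.0), np.sin(1.0)]])   # an arbitrary unit direction
other_var = np.var(X @ other.T)
```

Swapping the variance index for an information measure relative to a user's prior beliefs gives indices like the paper's t-PCA while keeping the same maximise-the-index machinery.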
ES2016-166
Human-centered machine learning through interactive visualization: review and open challenges
Dominik Sacha, Michael Sedlmair, Leishi Zhang, John Lee, Daniel Weiskopf, Stephen North, Daniel Keim
Abstract:
The goal of visual analytics (VA) systems is to solve complex problems by integrating automated data analysis methods, such as machine learning (ML) algorithms, with interactive visualizations. We propose a conceptual framework that models human interactions with ML components in the VA process, and makes the crucial interplay between automated algorithms and interactive visualizations more concrete. The framework is illustrated through several examples. We derive three open research challenges at the intersection of ML and visualization research that will lead to more effective data analysis.
ES2016-41
A state-space model on interactive dimensionality reduction
Ignacio Diaz-Blanco, Abel Alberto Cuadrado-Vega, Michel Verleysen
Abstract:
In this work, we present a conceptual approach to the convergence dynamics of interactive dimensionality reduction (iDR) algorithms from the perspective of a well-established theoretical model, namely state-space theory. The expected benefits are twofold: 1) suggesting new ways to import well-known ideas from state-space theory that help in the characterisation and development of iDR algorithms, and 2) providing a conceptual model for user interaction in iDR algorithms that can be easily adopted for future interactive machine learning (iML) tools.
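The state-space view of iDR convergence can be illustrated with a minimal discrete-time linear model: when the update map is a contraction (spectral radius of the state matrix below 1), the state converges to a fixed point determined by the input. Here `x` could stand for embedding coordinates and `u` for a constant user interaction; `A` and `B` are illustrative placeholders, not taken from the paper.

```python
import numpy as np

# Discrete-time state-space iteration: x_{k+1} = A x_k + B u.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])     # stable: both eigenvalues inside the unit circle
B = np.array([[0.1],
              [0.2]])
u = 1.0                        # constant "user input"

x = np.zeros(2)
for _ in range(500):
    x = A @ x + B[:, 0] * u    # converges to x* = (I - A)^{-1} B u
```

For this choice of `A` and `B`, the fixed point works out to x* = (2, 1); the interactive setting corresponds to `u` changing over time as the user manipulates the embedding.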
ES2016-70
Visualizing stacked autoencoder language learning
Trevor Barron, Matthew Whitehead
Abstract:
Visualizing the features of unsupervised deep networks is an important part of understanding what a network has learned. In this paper, we present a method for visualizing a deep autoencoder's hidden layers when trained on natural language data. Our method provides researchers insight into the semantic language features the network has extracted from the dataset. It can also show a big picture view of what a network has learned and how the various features the network has extracted relate to one another in semantic hierarchies. We hope that these visualizations will aid human understanding of deep networks and can help guide future experiments.
ES2016-29
Incremental hierarchical indexing and visualisation of large image collections
Frédéric Rayar, Sabine Barrat, Fatma Bouali, Gilles Venturini
Abstract:
Ever-growing image collections are common in several fields such as health, digital humanities or social networks. However, there is a lack of visualisation tools for browsing such large image collections. In this work, the incremental indexing and the visualisation of large image collections are done jointly. The BIRCH algorithm is improved to incrementally yield a hierarchical indexing structure, and a custom web platform is presented to visualise the resulting structure. The proposed method is tested on two large image collections of up to one million images.
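What makes BIRCH suitable for incremental indexing is its clustering feature (CF): a triple (N, LS, SS) of point count, linear sum, and sum of squared norms. CFs are additive, so inserting a new image descriptor is a constant-time update. The sketch below is the textbook CF, not the authors' improved variant.

```python
import numpy as np

class CF:
    """BIRCH clustering feature: summarises a point set by (N, LS, SS)."""

    def __init__(self, dim):
        self.n = 0                 # number of points
        self.ls = np.zeros(dim)    # linear sum of points
        self.ss = 0.0              # sum of squared norms

    def insert(self, x):
        # Incremental, order-independent update — no stored points needed.
        self.n += 1
        self.ls += x
        self.ss += float(x @ x)

    def centroid(self):
        return self.ls / self.n

    def radius(self):
        # RMS distance of the summarised points to their centroid.
        c = self.centroid()
        return np.sqrt(max(self.ss / self.n - c @ c, 0.0))

cf = CF(2)
for p in [np.array([0.0, 0.0]), np.array([2.0, 0.0]), np.array([1.0, 3.0])]:
    cf.insert(p)
```

Because two CFs can be merged by summing their components, a hierarchy of CF nodes can be grown as images stream in, which is the structure the paper's web platform then visualises.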
Robotics and reinforcement learning
ES2016-54
Learning contextual affordances with an associative neural architecture
Francisco Cruz, German Parisi, Stefan Wermter
Abstract:
Affordances are an effective method to anticipate the effect of actions performed by an agent interacting with objects. In this work, we present a robotic cleaning task using contextual affordances, i.e. an extension of affordances that takes the current state into account. We implement an associative neural architecture for predicting the effect of actions performed with different objects, in order to avoid failed states. Experimental results in a simulated robot environment show that our associative memory is able to learn in a short time and predict future states with high accuracy.
ES2016-94
Neural fitted actor-critic
Matthieu Zimmer, Yann Boniface, Alain Dutech
Abstract:
A novel reinforcement learning algorithm that deals with both continuous state and action spaces is proposed. Domain knowledge requirements are kept minimal by using non-linear estimators and by not requiring prior trajectories or known goal states. The new actor-critic algorithm is on-policy, offline and model-free. It considers discrete time and stationary policies, and maximizes the discounted sum of rewards. Experimental results on two common environments, showing the good performance of the proposed algorithm, are presented.
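At the core of any actor-critic scheme is the critic's temporal-difference (TD) error, which a fitted method approximates with a non-linear estimator instead of a table. The tabular sketch below shows one TD(0) update; the states, reward, and learning rates are illustrative, not the paper's neural fitted algorithm.

```python
import numpy as np

gamma, alpha = 0.9, 0.5     # discount factor and critic learning rate
V = np.zeros(2)             # tabular value estimates for states 0 and 1

# One observed transition: state 0 --(reward 1.0)--> state 1.
s, r, s_next = 0, 1.0, 1
delta = r + gamma * V[s_next] - V[s]   # TD error: target minus prediction
V[s] += alpha * delta                  # move V(s) toward the TD target
```

In the actor-critic setting, the same TD error `delta` also drives the actor: it increases the preference for actions that led to positive surprise and decreases it otherwise.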
ES2016-114
Simultaneous estimation of rewards and dynamics from noisy expert demonstrations
Michael Herman, Tobias Gindele, Jörg Wagner, Felix Schmitt, Wolfram Burgard
Abstract:
Inverse Reinforcement Learning (IRL) describes the problem of learning an unknown reward function of a Markov Decision Process (MDP) from demonstrations of an expert. Current approaches typically require the system dynamics to be known, or additional demonstrations of state transitions to be available, in order to solve the inverse problem accurately. If these assumptions are not satisfied, heuristics can be used to compensate for the lack of a model of the system dynamics; however, such heuristics can bias the solution. To overcome this, we present a gradient-based approach that simultaneously estimates rewards, dynamics, and the parameterizable stochastic policy of an expert from demonstrations, where the stochastic policy is a function of the optimal Q-values.
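A common way to model a stochastic expert policy as a function of Q-values — the modelling choice the abstract refers to — is a Boltzmann (softmax) policy, pi(a|s) ∝ exp(beta · Q(s,a)). The sketch below is that generic construction; the temperature `beta` and the Q-values are illustrative assumptions, not the paper's estimated quantities.

```python
import numpy as np

def boltzmann_policy(q_values, beta=1.0):
    """Softmax over Q-values; higher beta means a more deterministic expert."""
    z = beta * (q_values - q_values.max())  # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Action probabilities for one state with three candidate actions.
pi = boltzmann_policy(np.array([1.0, 2.0, 3.0]), beta=1.0)
```

Because this policy is differentiable in the Q-values, gradients of the demonstration likelihood can flow back through it to both the reward and the dynamics parameters.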
ES2016-125
On the improvement of static force capacity of humanoid robots based on plants behavior
Juliano Pierezan, Roberto Zanetti Freire, Lucas Weihmann, Gilberto Reynoso-Meza, Leandro dos Santos Coelho
Abstract:
Humanoid robots need to interact with the environment and are constantly in rigid contact with objects. When a task must be performed, multiple contact points add a degree of complexity to the robot's control and, due to excessive effort in the joints, the durability of the components may be affected. This work applies a recently proposed metaheuristic, the Runner-Root Algorithm (RRA), to the static force capacity optimization of a humanoid robot. The algorithm's performance was evaluated and compared against four well-established methods, showing promising results for the RRA in this type of application.
ES2016-148
Grounding the experience of a visual field through sensorimotor contingencies
Alban Laflaquiere, Michael Garcia Ortiz, Ahmed Faraz Khan
Abstract:
Artificial perception is traditionally handled by hand-designing specific algorithms. However, a truly autonomous robot should develop perceptive abilities on its own by interacting with its environment. The sensorimotor contingencies theory proposes to ground those abilities in the way the agent can actively transform its sensory inputs. This work presents an application of this approach to the discovery of a visual field. It shows how an agent can capture regularities induced by its visual sensor in a sensorimotor predictive model. A formalism is proposed to address this problem and tested on a simulated system.
ES2016-168
Semantic Role Labelling for Robot Instructions using Echo State Networks
Johannes Twiefel, Xavier Hinaut, Stefan Wermter
Abstract:
To control a robot in a real-world scenario, a real-time parser is needed that creates interpretable semantic representations from natural language. The parser should be able to create hierarchical, tree-like representations without consulting external systems, in order to demonstrate its learning capabilities. We propose an efficient Echo State Network-based parser for robotic commands that relies only on the training data. The system generates a single semantic tree structure in real time, which can be executed by a robot arm manipulating objects. It outperforms four of six other approaches, which in most cases generate multiple trees and select one of them as the solution, achieving 64.2% tree accuracy on difficult unseen natural language (74.1% under the best conditions) on the same dataset.
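The reservoir at the heart of an Echo State Network is a fixed random recurrent layer whose state is updated as x(t+1) = tanh(W x(t) + W_in u(t)); only a linear readout on the states is trained. The sketch below shows this state update with the spectral-radius rescaling that gives the echo state property; sizes and the 0.9 scaling are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(42)
n_res, n_in = 50, 3

# Fixed random reservoir, rescaled so its spectral radius is below 1
# (a standard sufficient-in-practice condition for the echo state property).
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))
W_in = rng.normal(size=(n_res, n_in))

def run_reservoir(inputs):
    """Drive the reservoir with an input sequence; collect the states."""
    x = np.zeros(n_res)
    states = []
    for u in inputs:
        x = np.tanh(W @ x + W_in @ u)   # recurrent state update
        states.append(x.copy())
    return np.array(states)

states = run_reservoir(rng.normal(size=(10, n_in)))
```

For a parsing task, the inputs would be word representations presented in sequence, and the trained linear readout maps the collected states to semantic-role labels.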
ES2016-75
Human detection and classification of landing sites for search and rescue drones
Felipe Martins, Marc de Groot, Xeryus Stokkel, Marco Wiering
Abstract:
Search and rescue operations are often time- and labour-intensive. We present a system to be used in drones to make search and rescue operations more effective. The system uses the drone's downward-facing camera to detect people and to evaluate whether potential sites are safe for the drone to land. Histogram of Oriented Gradients (HOG) features are extracted and a Support Vector Machine (SVM) is used as the classifier. Our results show good performance both in classifying frames as containing people (sensitivity > 78%, specificity > 83%) and in distinguishing between safe and dangerous landing sites (sensitivity > 87%, specificity > 98%).
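The core of a HOG feature is a magnitude-weighted histogram of gradient orientations. The minimal sketch below computes one such histogram over a whole image; a full HOG descriptor (as used in the paper) additionally divides the image into cells and applies block normalisation, and the bin count and test image here are illustrative.

```python
import numpy as np

def orientation_histogram(img, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations."""
    gy, gx = np.gradient(img.astype(float))           # image gradients
    mag = np.hypot(gx, gy)                            # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0      # unsigned orientation
    bins = np.minimum((ang / (180.0 / n_bins)).astype(int), n_bins - 1)
    hist = np.zeros(n_bins)
    for b in range(n_bins):
        hist[b] = mag[bins == b].sum()                # magnitude-weighted votes
    return hist / (hist.sum() + 1e-9)                 # normalise to sum ~1

# A vertical edge: all gradient energy points horizontally (0 degrees),
# so the first orientation bin should dominate.
img = np.zeros((16, 16))
img[:, 8:] = 1.0
hist = orientation_histogram(img)
```

Feature vectors built from such histograms are what the linear or kernel SVM then separates into the people/no-people and safe/unsafe classes.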