Bruges, Belgium, April 24-25-26
Content of the proceedings
Machine Learning Methods for Processing and Analysis of Hyperspectral Data
Recurrent networks and modeling
Dimensionality reduction
Image, signal and time series analysis
Feature selection
Reinforcement learning, control and optimization
Machine Learning for multimedia applications
Clustering
Regression and forecasting
Developments in kernel design
Human Activity and Motion Disorder Recognition: towards smarter Interactive Cognitive Environments
Classification
Sparsity for interpretation and visualization in inference models
Machine Learning Methods for Processing and Analysis of Hyperspectral Data
ES2013-9
Processing Hyperspectral Data in Machine Learning
Thomas Villmann, Marika Kästner, Andreas Backhaus, Udo Seiffert
Abstract:
The adaptive and automated analysis of hyperspectral data is mandatory in many areas of research such as physics, astronomy and geophysics, chemistry, bioinformatics, medicine, biochemistry, engineering, and others. Hyperspectra differ from other spectral data in that a large frequency range is uniformly sampled. The resulting discretized spectra have a huge number of spectral bands and can be seen as good approximations of the underlying continuous spectra. The large dimensionality causes numerical difficulties in efficient data analysis. Another aspect to deal with is that the amount of data may range from several billion samples in geophysics to only a few in medical applications. In consequence, dedicated machine learning algorithms and approaches are required for precise yet efficient processing of hyperspectral data, which should also incorporate expert knowledge of the application domain as well as mathematical properties of the hyperspectral data.
ES2013-34
Multi-view feature extraction for hyperspectral image classification
Michele Volpi, Giona Matasci, Mikhaïl Kanevski, Devis Tuia
Abstract:
We study the multi-view feature extraction (MV-FE) framework for the classification of hyperspectral images acquired from airborne and spaceborne sensors. This type of data is naturally composed of distinct blocks of spectral channels, forming the hypercube. To reduce the dimensionality of the data by taking advantage of this particular structure, an unsupervised multi-view feature extraction method is applied prior to classification. First, a technique to automatically obtain the blocks, based on the global spectral correlation matrix, is applied. Then, kernel canonical correlation analysis is performed in a multi-view setting (MV-kCCA) to find projections of the data blocks in a correlated subspace, thus gaining discriminant power. Experiments using the linear discriminant classifier (LDA) show the appropriateness of adopting MV-FE prior to classification, which outperforms standard approaches.
ES2013-54
Regularization in relevance learning vector quantization using l1-norms
Martin Riedel, Fabrice Rossi, Marika Kästner, Thomas Villmann
Abstract:
We propose in this contribution a method for $l_{1}$-regularization in prototype-based relevance learning vector quantization (LVQ) for sparse relevance profiles. Sparse relevance profiles in hyperspectral data analysis fade out those spectral bands which are not necessary for classification. In particular, we consider the sparsity in the relevance profile enforced by LASSO optimization. The latter is obtained by a gradient learning scheme using a differentiable parametrized approximation of the $l_{1}$-norm, which has an upper error bound. We also extend this regularization idea to the matrix learning variant of LVQ as the natural generalization of relevance learning.
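As a rough illustration of what a differentiable $l_{1}$ surrogate with a bounded error can look like, the sketch below uses $\sqrt{w^2 + \epsilon^2}$ per component. This particular form, the names `smooth_l1`, `smooth_l1_grad` and `eps`, and the relevance values are assumptions chosen for illustration; the abstract does not state which approximation the paper uses.

```python
import math

def l1(weights):
    """Exact l1-norm (not differentiable at zero)."""
    return sum(abs(w) for w in weights)

def smooth_l1(weights, eps=1e-3):
    """Assumed smooth surrogate: sum_i sqrt(w_i^2 + eps^2)."""
    return sum(math.sqrt(w * w + eps * eps) for w in weights)

def smooth_l1_grad(weights, eps=1e-3):
    """Gradient of the surrogate; well defined everywhere, including w_i = 0."""
    return [w / math.sqrt(w * w + eps * eps) for w in weights]

relevances = [0.5, -0.2, 0.0, 0.3]  # hypothetical relevance profile
eps = 1e-3
gap = smooth_l1(relevances, eps) - l1(relevances)
# For this surrogate each component overestimates |w_i| by at most eps,
# so 0 <= gap <= len(relevances) * eps.
```

With such a surrogate the penalty can be added to a gradient-based cost function, and `eps` trades smoothness against closeness to the exact norm.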
Recurrent networks and modeling
ES2013-4
Mixed order associative networks for function approximation, optimisation and sampling
Kevin Swingler, Leslie Smith
Abstract:
A mixed order associative neural network with $n$ neurons and a modified Hebbian learning rule can learn any function $f:\{-1,1\}^n \to \mathbb{R}$ and reproduce its output as the network's energy function. The network weights are equal to Walsh coefficients, the fixed point attractors are local maxima in the function, and partial sums across the weights of the network calculate averages for hyperplanes through the function. If the network is trained on data sampled from a distribution, then marginal and conditional probability calculations may be made and samples from the distribution generated from the network. These qualities make the network ideal for optimisation fitness function modelling and make the relationships amongst variables explicit in a way that architectures such as the MLP do not.
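To make the link between weights and Walsh coefficients concrete, here is a minimal sketch that computes the Walsh expansion of a small function on $\{-1,1\}^n$ by exhaustive enumeration and reconstructs the function from it. It does not implement the modified Hebbian learning rule of the paper; the function `example_f` and the helper names are hypothetical.

```python
from itertools import combinations, product
from math import prod

def walsh_coefficients(f, n):
    """Return {index subset: coefficient} for f defined on {-1, 1}^n."""
    points = list(product((-1, 1), repeat=n))
    coeffs = {}
    for order in range(n + 1):
        for subset in combinations(range(n), order):
            # Average of f(x) times the product of the selected inputs.
            total = sum(f(x) * prod(x[i] for i in subset) for x in points)
            coeffs[subset] = total / len(points)
    return coeffs

def evaluate(coeffs, x):
    """Reconstruct f(x) as a weighted sum of products over index subsets."""
    return sum(w * prod(x[i] for i in subset) for subset, w in coeffs.items())

def example_f(x):
    # Hypothetical target with a bias, a first-order and a second-order term.
    return 2.0 * x[0] - 1.5 * x[1] * x[2] + 0.5

coeffs = walsh_coefficients(example_f, 3)
```

The coefficient of the empty subset is the mean of the function over the whole hypercube, which is the simplest instance of the averaging property mentioned in the abstract.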
ES2013-50
Auto-encoder pre-training of segmented-memory recurrent neural networks
Stefan Glüge, Ronald Böck, Andreas Wendemuth
Abstract:
The extended Backpropagation Through Time (eBPTT) learning algorithm for Segmented-Memory Recurrent Neural Networks (SMRNNs) still lacks the ability to reliably learn long-term dependencies. The alternative learning algorithm, extended Real-Time Recurrent Learning (eRTRL), does not suffer from this problem but is computationally very inefficient, such that it is impractical for the training of large networks. The positive results reported with the pre-training of deep neural networks give rise to the hope that SMRNNs could also benefit from a pre-training procedure. In this paper, we introduce a layer-local pre-training procedure for SMRNNs. Using the information latching problem as a benchmark task, the comparison of randomly initialised and pre-trained networks shows the beneficial effect of the unsupervised pre-training. It significantly improves the learning of long-term dependencies in the supervised eBPTT training.
ES2013-47
Error entropy criterion in echo state network training
Levy Boccato, Daniel G. Silva, Denis Fantinato, Kenji Nose Filho, Rafael Ferrari, Romis Attux, Aline Neves, Jugurta Montalvão, João Marcos T. Romano
Abstract:
Echo state networks offer a promising possibility for an effective use of recurrent structures, as the presence of feedback is accompanied by a relatively simple training process. However, such simplicity, which is obtained through the use of an adaptive linear readout that minimizes the mean-squared error, limits the capability of exploring the statistical information of the involved signals. In this work, we apply an information-theoretic learning framework, based on the error entropy criterion, to the ESN training in order to improve the performance of the neural model, whose advantages are analyzed in the context of a supervised channel equalization problem.
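A common way to estimate the error entropy in information-theoretic learning is Renyi's quadratic entropy with a Gaussian Parzen window; the sketch below follows that estimator. Whether the paper uses exactly this estimator, and the kernel width `sigma`, are assumptions; the error samples are made up for illustration.

```python
import math

def information_potential(errors, sigma=1.0):
    """Parzen estimate of the quadratic information potential V of the errors."""
    n = len(errors)
    var = 2.0 * sigma * sigma  # variance of the Gaussian after self-convolution
    norm = 1.0 / math.sqrt(2.0 * math.pi * var)
    total = 0.0
    for ei in errors:
        for ej in errors:
            total += norm * math.exp(-((ei - ej) ** 2) / (2.0 * var))
    return total / (n * n)

def quadratic_error_entropy(errors, sigma=1.0):
    """Renyi quadratic entropy estimate H2 = -log V; lower means more concentrated."""
    return -math.log(information_potential(errors, sigma))

concentrated = [0.01, -0.02, 0.00, 0.015]  # hypothetical errors of a good readout
spread = [1.0, -2.0, 0.5, 3.0]             # hypothetical errors of a poor readout
h_good = quadratic_error_entropy(concentrated)
h_poor = quadratic_error_entropy(spread)
```

Because the estimate depends only on the differences between error samples, it is insensitive to a constant offset in the errors, so a bias term usually has to be handled separately.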
ES2013-94
Perceptual grouping through competition in coupled oscillator networks
Martin Meier, Robert Haschke, Helge Ritter
Abstract:
In this paper we present a novel approach to model perceptual grouping based on synchronization in a network of coupled oscillators. To this end, the concept of excitatory and inhibitory connections between recurrent neurons is transferred from the Competitive Layer Model (CLM) to a network of Kuramoto oscillators, which realizes grouping by phase and frequency synchronization. While preserving the excellent grouping capabilities of the CLM, this approach boosts the computational performance (due to its simplicity), which is verified in several experiments.
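For readers unfamiliar with Kuramoto oscillators, the following sketch simulates the plain all-to-all model with a single positive (excitatory) coupling constant and measures synchrony with the usual order parameter. The excitatory/inhibitory coupling scheme of the paper is not reproduced here; the initial phases, frequencies, coupling strength and step size are assumptions.

```python
import cmath
import math

def order_parameter(phases):
    """Magnitude of the mean phase vector; 1.0 means full phase synchrony."""
    return abs(sum(cmath.exp(1j * p) for p in phases) / len(phases))

def kuramoto_step(phases, freqs, coupling, dt):
    """One explicit Euler step of the all-to-all Kuramoto model."""
    n = len(phases)
    new_phases = []
    for i in range(n):
        # Each oscillator is pulled toward the phases of all the others.
        pull = sum(math.sin(phases[j] - phases[i]) for j in range(n))
        new_phases.append(phases[i] + dt * (freqs[i] + coupling / n * pull))
    return new_phases

phases = [0.0, 0.5, 1.0, 1.5, 2.0]  # hypothetical initial phases (radians)
freqs = [1.0] * 5                   # identical natural frequencies
r_initial = order_parameter(phases)
for _ in range(2000):
    phases = kuramoto_step(phases, freqs, coupling=2.0, dt=0.01)
r_final = order_parameter(phases)
```

With identical frequencies and positive coupling the phases lock, so `r_final` approaches 1; a negative coupling between two oscillators would instead push their phases apart.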
ES2013-106
Using Wikipedia with associative networks for document classification
Niels Bloom, Mariet Theune, Franciska de Jong
Abstract:
We demonstrate a new technique for building associative networks based on Wikipedia, comparing them to WordNet-based associative networks that we used previously, and find that the Wikipedia-based networks perform better at document classification. Additionally, we compare the performance of associative networks to various other text classification techniques using the Reuters-21578 dataset, establishing that associative networks can achieve comparable results.
ES2013-7
Automated operational states detection for drilling systems control in critical conditions
Galina Veres, Zoheir Sabeur
Abstract:
Critical events in industrial drilling should be overcome by engineers while they maintain safety and achieve their operational drilling plans. Complex geophysical drilling requires maximum awareness of critical situations such as “Kicks”, “Fluid loss” or “Stuck pipe”. These may compromise safety and potentially halt operations, requiring rapid evacuation of staff from rigs. In this paper, a robust method for the detection of operational states is proposed. Specifically, Echo State Networks (ESNs) were benchmarked and tested rigorously despite the challenging training datasets, which exhibited imbalance issues. These issues were overcome, leading to good ESN performance.
ES2013-27
Analysis of Synaptic Weight Distribution in an Izhikevich Network
Li Guo, Zhijun Yang, Qingbao Zhu
Abstract:
The Izhikevich network is a relatively new neuronal network, which consists of cortical spiking model neurons with axonal conduction delays and spike-timing-dependent plasticity (STDP) with hard bound adaptation. In this work, we use uniform and Gaussian distributions, respectively, to initialize the weights of all excitatory neurons. After the network undergoes a few minutes of STDP adaptation, we can see that the weights of all synapses in the network, for both initial weight distributions, form a bimodal distribution, and numerically the established distribution presents dynamic stability.
ES2013-23
Percolation model of axon guidance
Gaetano Liborio Aiello, Valentino Romano
Abstract:
In the developing brain, neurons interconnect via the action of molecules that guide the axon to its targets, thus allowing the proper wiring scheme to emerge. It is not fully understood whether the underlying mechanism is wholly deterministic or not. The existence of “choice-points” and “decision-regions” suggests that options are available to the growth cone. The guidance mechanism is here simulated by equating the axonal trajectory to that of a trickle of ground water seeping through a bed of sand. Decision regions are implemented by assigning each site of the percolation lattice a set of probabilities ruling the possible moves.
ES2013-5
Efficient VLSI Architecture for Spike Sorting Based on Generalized Hebbian Algorithm
Wen-Jyi Hwang, Hao Chen
Abstract:
A novel hardware architecture for fast spike sorting is presented in this paper. The architecture is able to perform feature extraction based on the Generalized Hebbian Algorithm (GHA). The employment of GHA allows efficient computation of principal components for subsequent clustering and classification operations. The hardware implementation of GHA features high throughput, low power dissipation, and low area costs. The proposed architecture is implemented on a Field Programmable Gate Array (FPGA). It is embedded in a System-On-Programmable-Chip (SOPC) platform for performance measurement. Experimental results show that the proposed architecture is an efficient spike sorting design for attaining low hardware resource utilization and high speed computation.
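The GHA update itself (also known as Sanger's rule) is compact enough to sketch in software. The example below is a plain Python illustration, not the VLSI architecture of the paper; the toy data set, the learning rate, the number of passes and the helper names are all assumptions.

```python
import math

def gha_step(W, x, lr):
    """One Generalized Hebbian Algorithm (Sanger's rule) update of W, in place."""
    y = [sum(w * xj for w, xj in zip(row, x)) for row in W]
    residual = list(x)
    for i, row in enumerate(W):
        # Deflation: subtract what components 0..i already reconstruct.
        residual = [r - y[i] * w for r, w in zip(residual, row)]
        W[i] = [w + lr * y[i] * r for w, r in zip(row, residual)]
    return W

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

# Hypothetical zero-mean 2-D data: large variance along (1, 1), small along (1, -1).
s = 1.0 / math.sqrt(2.0)
data = [((a + b) * s, (a - b) * s)
        for a in (-2.0, -1.0, 0.0, 1.0, 2.0)
        for b in (-0.5, 0.0, 0.5)]

W = [[1.0, 0.0], [0.0, 1.0]]  # one row per extracted component
for _ in range(500):
    for x in data:
        gha_step(W, x, lr=0.01)
```

After training, the rows of `W` should be close to the principal directions of the data (up to sign), in order of decreasing variance. Each update needs only multiply-accumulate operations, which is one reason the rule is attractive for hardware.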
Dimensionality reduction
ES2013-46
Soft rank neighbor embeddings
Marc Strickert, Kerstin Bunte
Abstract:
Correlation-based multidimensional scaling is proposed for reconstructing pairwise dissimilarity or score relationships in a Euclidean space. Pearson correlation between pairs of objects in source and target space can be directly maximized by gradient methods, while gradient optimization of Spearman rank correlation profits from a numerically soft formulation introduced in this work. Scale and shift invariance properties of correlation help circumvent typical distance concentration problems.
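The scale and shift invariance mentioned above can be checked directly on the objective. The sketch below only evaluates the Pearson correlation between source and target pairwise distances; it performs no gradient optimization and does not implement the soft rank formulation. The point coordinates are made up for illustration.

```python
import math
from itertools import combinations

def pearson(u, v):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def pairwise_distances(points):
    """Euclidean distances for all unordered pairs, in a fixed order."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]

# Hypothetical source objects (3-D) and a candidate 2-D embedding.
source = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5), (0.0, 2.0, 1.0), (3.0, 1.0, 0.0)]
embedding = [(0.0, 0.0), (1.0, 0.1), (0.2, 2.0), (3.0, 0.9)]

d_src = pairwise_distances(source)
d_emb = pairwise_distances(embedding)
r = pearson(d_src, d_emb)
# Rescaling and shifting the source dissimilarities leaves the objective unchanged.
r_rescaled = pearson([5.0 * d + 7.0 for d in d_src], d_emb)
```

An embedding that maximizes this quantity therefore only has to match the pattern of the dissimilarities, not their absolute values.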
ES2013-99
Multiple Kernel Self-Organizing Maps
Madalina Olteanu, Nathalie Villa-Vialaneix, Christine Cierco-Ayrolles
Abstract:
In a number of real-life applications, the user is interested in analyzing several sources of information together: a graph together with additional information known on its nodes, numerical variables measured on individuals together with factors describing these individuals... The combination of all the sources of information can help the user to better understand the dataset as a whole. The present article focuses on this issue by using self-organizing maps. Using a kernel version of the algorithm makes it possible to combine various types of information (graph, numerical values, factors, strings...) and to find a good trade-off between all sources of data by using an automated procedure to tune the data combination. This approach is illustrated on several examples.
ES2013-38
Semi-Supervised Vector Quantization for proximity data
Xibin Zhu, Frank-Michael Schleif, Barbara Hammer
Abstract:
Semi-supervised learning (SSL) is focused on learning from labeled and unlabeled data by incorporating structural and statistical information of the available unlabeled data. The amount of data is dramatically increasing, but little of it is fully labeled, due to cost and time constraints. Even more challenging are non-vectorial, so-called proximity data, given by pairwise proximity values such as score values in sequence alignments, which have no regular vector-space representation. Only a few methods provide SSL for such data, and they are limited to positive semi-definite (psd) data. They also lack interpretable models, which is a relevant aspect in the life sciences, where most of these data are found. This paper provides a prototype-based SSL approach for proximity data.
ES2013-66
Sensitivity to parameter and data variations in dimensionality reduction techniques
Francisco J. García-Fernández, Michel Verleysen, John A. Lee, Ignacio Díaz
Abstract:
Dimensionality reduction techniques aim at representing high-dimensional data in a meaningful, lower-dimensional space, improving human comprehension and interpretation of the data. In recent years, newer nonlinear techniques have been proposed in order to address the limitations of linear techniques. This paper presents a study of the stability of some of these dimensionality reduction techniques, analyzing their behavior under changes in the parameters and the data. The performance of these techniques is investigated on artificial datasets. The paper presents these results by identifying the weaknesses of each technique, and suggests some data-processing tasks to improve the stability.
Image, signal and time series analysis
ES2013-112
A nuclear-norm based convex formulation for informed source separation
Augustin Lefèvre, François Glineur, P.A. Absil
Abstract:
We study the problem of separating audio sources from a single linear mixture. The goal is to find a decomposition of the single channel spectrogram into a sum of individual contributions associated to a certain number of sources. In this paper, we consider an informed source separation problem in which the input spectrogram is partly annotated. We propose a convex formulation that relies on a nuclear norm penalty to induce low rank for the contributions. We show experimentally that solving this model with a simple subgradient method outperforms a previously introduced nonnegative matrix factorization (NMF) technique, both in terms of source separation quality and computation time.
ES2013-56
Frequency-Dependent Peak-Over-Threshold algorithm for fault detection in the spectral domain
Aurélien Hazan, Kurosh Madani
Abstract:
An original novelty detection algorithm in the Fourier domain, using extreme value theory (EVT), is considered in this article. Periodograms may be considered as frequency-dependent random variables, and this can be taken into account when designing statistical tests. Frequency-Dependent Peak-Over-Threshold (FDPOT) puts special emphasis on the frequency dependence of extreme value statistics, thanks to Vector Generalized Additive Models (VGAM) estimation. An application is discussed in the field of mechanical vibrations. It is first shown that performance increases compared to POT detection. Then FDPOT is compared to state-of-the-art algorithms such as KPCA.
ES2013-82
Activity Date Estimation in Timestamped Interaction Networks
Fabrice Rossi, Pierre Latouche
Abstract:
We propose in this paper a new generative model for graphs that uses a latent space approach to explain timestamped interactions. The model is designed to provide global estimates of activity dates in historical networks where only the interaction dates between agents are known with reasonable precision. Experimental results show that the model provides better results than local averages in dense enough networks.
ES2013-60
Novelty detection in image recognition using IRF Neural Networks properties
Philippe Smagghe, Jean-Luc Buessler, Jean-Philippe Urban
Abstract:
Image Receptive Fields Neural Network (IRF-NN) is a variant of feedforward multi-layer perceptrons adapted to image recognition. It shows very fast training as well as robust and accurate results on supervised classification tasks. This paper presents another property of IRF-NN: responses of trained networks can be analysed to detect unknown images. Several discriminative and efficient novelty criteria are introduced and tested successfully on the ALOI image dataset. A combination of novelty detection and object recognition is illustrated with a robust, pose-invariant application of multi-object localization in various backgrounds.
ES2013-42
Non-Euclidean independent component analysis and Oja's learning
Mandy Lange, Michael Biehl, Thomas Villmann
Abstract:
In the present contribution we tackle the problem of nonlinear independent component analysis by non-Euclidean Hebbian-like learning. Independent component analysis (ICA) and blind source separation were originally introduced as tools for the linear unmixing of signals to detect the underlying sources. Hebbian methods became very popular and successful in this context. Many nonlinear ICA extensions are known. A promising strategy is the application of kernel mapping. Kernel mapping realizes a usually nonlinear but implicit mapping of the data into a reproducing kernel Hilbert space, where a linear demixing can then be carried out. However, explicit handling in this non-Euclidean kernel mapping space is impossible. We show in this paper an alternative using an isomorphic mapping space. In particular, we show that the idea of Hebbian-like learning of kernel ICA can be transferred to this non-Euclidean space, realizing a non-Euclidean ICA.
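As background for the Hebbian-like learning mentioned in this abstract, the following is a minimal sketch of the classical Euclidean Oja rule for a single linear neuron, which is known to converge to the leading principal direction of zero-mean data. It is not the non-Euclidean or kernel variant proposed in the paper; the function names, the learning rate, and the synthetic data are assumptions made for illustration only.

```python
import math
import random


def oja_first_component(samples, lr=0.005, seed=0):
    """Estimate the leading principal direction with Oja's rule (illustrative)."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # Small random initial weight vector.
    w = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
    for x in samples:
        y = sum(wi * xi for wi, xi in zip(w, x))  # neuron output y = w . x
        # Hebbian term y*x plus the decay term -y^2*w that keeps |w| near 1.
        w = [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]
    return w


def make_samples(n=5000, seed=1):
    """Zero-mean 2-D data whose variance is largest along (1, 1)/sqrt(2)."""
    rng = random.Random(seed)
    c = 1.0 / math.sqrt(2.0)
    samples = []
    for _ in range(n):
        s = rng.gauss(0.0, 2.0)  # strong component along (1, 1)
        t = rng.gauss(0.0, 0.5)  # weak component along (1, -1)
        samples.append((c * (s + t), c * (s - t)))
    return samples


samples = make_samples()
w = oja_first_component(samples)
norm = math.sqrt(sum(wi * wi for wi in w))
# Cosine of the angle between w and the true leading direction (sign ignored).
alignment = abs(w[0] + w[1]) / (math.sqrt(2.0) * norm)
```

After training, `alignment` should be close to 1 and `norm` close to 1, since the decay term in Oja's rule normalizes the weight vector implicitly.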
ES2013-48
Automatic Singular Spectrum Analysis for Time-Series Decomposition
Andres Marino Alvarez-Meza, Carlos Daniel Acosta-Medina, Germán Castellanos-Dominguez
Abstract:
An automatic methodology based on singular spectrum analysis (SSA) is proposed to decompose and reconstruct time series. We suggest a clustering-based procedure to decompose the main dynamics of the input signal. A subset of the orthogonal basis vectors computed from the input is selected using a power-based criterion. Then, the selected basis vectors are represented by a discrete Fourier transform to identify those encoding similar data structures, which are employed to infer the hidden components of the signal. Our approach is tested on synthetic and real-world datasets, showing that our algorithm is a good tool to interpret and decompose time series.
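The two bookkeeping steps that frame any SSA decomposition, embedding a series into a trajectory (Hankel) matrix and mapping a matrix back to a series by diagonal averaging, can be sketched as follows. This is only the generic SSA scaffolding; the basis selection, Fourier representation, and clustering described in the abstract are not reproduced, and the function names and the toy series are assumptions.

```python
def embed(series, window):
    """Build the L x K trajectory (Hankel) matrix, with K = N - L + 1."""
    n = len(series)
    if not 1 <= window <= n:
        raise ValueError("window must satisfy 1 <= window <= len(series)")
    k = n - window + 1
    return [[series[i + j] for j in range(k)] for i in range(window)]


def diagonal_average(matrix):
    """Average each anti-diagonal to map a matrix back to a series."""
    rows, cols = len(matrix), len(matrix[0])
    sums = [0.0] * (rows + cols - 1)
    counts = [0] * (rows + cols - 1)
    for i in range(rows):
        for j in range(cols):
            sums[i + j] += matrix[i][j]
            counts[i + j] += 1
    return [s / c for s, c in zip(sums, counts)]


series = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0]
trajectory = embed(series, window=3)
recovered = diagonal_average(trajectory)
```

Because the trajectory matrix is already Hankel, diagonal averaging recovers the original series exactly; in a full SSA pipeline the same averaging would be applied to each grouped low-rank component of the matrix.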
ES2013-64
Dimension reduction for individual ICA to decompose fMRI during real-world experiences: principal component analysis vs. canonical correlation analysis
Valeri Tsatsishvili, Fengyu Cong, Tuomas Puoliväli, Vinoo Alluri, Petri Toiviainen, Asoke K. Nandi, Elvira Brattico, Tapani Ristaniemi
Abstract:
Data analysis for functional magnetic resonance imaging (fMRI) data collected during real-world experiences is critical. Independent component analysis (ICA) has been used to extract desired spatial maps. Before ICA, dimension reduction is used to separate the signal and the noise subspaces. Recently, in addition to the widely used principal component analysis (PCA) and model order selection, canonical correlation analysis (CCA) has been exploited to find the correlated and uncorrelated subspaces between two datasets. This study compares CCA and PCA for dimension reduction for ICA to decompose very noisy fMRI elicited by natural and continuous music. We find that their performances are comparable.
ES2013-45
Machine Learning Techniques for Short-Term Electric Power Demand Prediction
Fernando Mateo, Juan J. Carrasco, Mónica Millán-Giraldo, Abderrahim Sellami, Pablo Escandell-Montero, José M. Martínez-Martínez, Emilio Soria-Olivas
Abstract:
For several years, power consumption forecasting has attracted considerable attention from the scientific community. Although several works deal with this issue, it remains open. The good management of energy consumption in HVAC (Heating, Ventilation and Air Conditioning) systems for large households and public buildings may benefit from a sustainable development in terms of economy and environmental preservation. In this paper, several Machine Learning techniques are evaluated and compared with a linear technique (Robust Multiple Linear Regression) and a naïve method. All methods have been applied to five buildings of the University of León (Spain); the results indicate that nonlinear techniques outperform the linear one in most scenarios.
ES2013-6
Unsupervised non-linear neural networks capture aspects of floral choice behaviour
Levente Orbán, Sylvain Chartier
Abstract:
Two unsupervised neural networks were tested to understand the extent to which they capture elements of bumblebees’ unlearned preferences towards flower-like visual properties. The networks, which are based on Independent Component Analysis (ICA) and Feature-Extracting Bidirectional Associative Memory, use images of test patterns that are identical to the ones used in behavioural studies. While both models show consistency with behavioural results, the ICA model matches behavioural results substantially better in terms of image reconstruction quality of radial and concentric patterns, and foliage background. Both models generated a novel prediction of an interaction between spatial frequency and symmetry. These results are interpreted to support the hypothesis that flower displays are adapted to pollinators’ information processing constraints.
Feature selection
ES2013-117
GA-KDE-Bayes: an evolutionary wrapper method based on non-parametric density estimation applied to bioinformatics problems
Maria Fernanda Wanderley, Vincent Gardeux, René Natowicz, Antônio Braga
Abstract:
This paper presents an evolutionary wrapper method for feature selection that uses a non-parametric density estimation method and a Bayesian classifier. Non-parametric methods are a good alternative for scarce and sparse data, as in bioinformatics problems, since they make no assumptions about the structure of the data and all the information comes from the data itself. Results show that local modeling provides small and relevant subsets of features compared to results available in the literature.
ES2013-77
Risk Estimation and Feature Selection
Gauthier Doquire, Benoît Frénay, Michel Verleysen
Abstract:
For classification problems, the risk is often the criterion to be eventually minimised. It can thus naturally be used to assess the quality of feature subsets in feature selection. However, in practice, the probability of error is often unknown and must be estimated. Also, mutual information is often used as a criterion to assess the quality of feature subsets, since it can be seen as an imperfect proxy for the risk and can be reliably estimated. In this paper, two different ways to estimate the risk using the Kozachenko-Leonenko probability density estimator are proposed. The resulting estimators are compared on feature selection problems with a mutual information estimator based on the same density estimator. In line with our previous works, experiments show that using an estimator of either the risk or the mutual information gives similar results.
ES2013-67
Random Brains: An ensemble method for feature selection with neural networks
Mark Embrechts, Jonathan Linton, Jorge Santos
Abstract:
The purpose of this paper is to introduce and validate Random Brains, a novel feature selection technique based on artificial neural networks. Feature selection is widely used on high-dimensional data; it aims at removing irrelevant or redundant data, providing faster predictors without a significant decrease in model performance. Random Brains, inspired by Breiman’s Random Forests, are bagged ensembles of predictive neural network models that use randomly selected subsets of features. This paper validates Random Brains on several classification and regression benchmark data sets by comparing its performance to similar models with features selected based on sensitivity analysis.
ES2013-41
A distributed wrapper approach for feature selection
Veronica Bolon-Canedo, Noelia Sánchez-Maroño, Amparo Alonso-Betanzos
Abstract:
In recent years, distributed learning has been the focus of much attention due to the proliferation of big databases, which are usually distributed. In this context, machine learning can take advantage of feature selection methods to deal with these high-dimensional datasets. However, the great majority of current feature selection algorithms are designed for centralized learning. To confront the problem of distributed feature selection, in this paper we propose a distributed wrapper approach. In this manner, the learning accuracy can be improved while memory requirements and execution time are reduced. Four representative datasets were selected to test the approach, paving the way to its application to extremely high-dimensional data, which previously prevented the use of wrapper approaches.
ES2013-52
Feature Selection for Footwear Shape Estimation
Fernando Mateo, Mónica Millán-Giraldo, Juan J. Carrasco, Enrique Montiel, Jose A. Bernabeu, José D. Martín-Guerrero
Abstract:
This study proposes feature selection techniques to obtain a set of significant foot anthropometric measurements that can assist customers in the choice of footwear size and width. The results given by a number of methods are averaged to provide a reliable set of features. Several machine learning methods are used to evaluate the classification (for the width) and regression (for the size) accuracies before and after feature selection. The results prove the benefits of carrying out feature selection, especially for the shoe width.
ES2013-116
Efficient prediction of x-axis intercepts of discrete impedance spectra
Thomas Schmid, Dorothee Günzel, Martin Bogdan
Abstract:
In impedance spectroscopy of epithelial cell layers, it is a common task to extrapolate discrete two-dimensional plots in order to determine electrical properties associated with axis intercepts. Here, we investigate how implicit properties of such curves can be used to predict the x-axis intercept where explicitly determined properties fail to do so. We perform feature extraction, algorithmic feature ranking and dimension reduction on model impedance spectra derived from a tissue-equivalent electric circuit. Selected feature subsets are assessed by training artificial neural networks to predict the intercept. Results show that subsets of three or fewer implicit features provide a reasonable basis for predictions.
ES2013-58
Evolutionary computation based system decomposition with neural networks
Robert Kaltenhaeuser, Erik Schaffernicht, Frank-Florian Steege, Horst-Michael Gross
Abstract:
We present an evolutionary approach to divide a complex control system into smaller sub-systems with the help of neural networks. To this end, measured channels are partitioned into several disjoint sets, representing possible sub-problems, while the networks are used to assess the quality of the resulting decomposition. We show that this approach is well suited to calculate correct decompositions of complex control systems. Furthermore, the obtained neural networks are used to predict important process factors with considerably better approximation quality than monolithic approaches that have to deal with all input channels in parallel.
Reinforcement learning, control and optimization
ES2013-100
Fast online adaptivity with policy gradient: example of the BCI "P300"-speller
Emmanuel Daucé, Timothée Proix, Liva Ralaivola
Abstract:
We tackle the problem of reward-based online learning of multiclass classifiers and consider a policy gradient ascent to solve this problem in the linear case. We apply it to the online adaptation of an EEG-based "P300"-speller. When applied from scratch, a robust classifier is obtained in a few steps.
ES2013-73
Locally Weighted Least Squares Temporal Difference Learning
Matthew Howard, Yoshihiko Nakamura
Abstract:
This paper introduces locally weighted temporal difference learning for evaluation of a class of policies whose value function is non-linear in the state. Least squares temporal difference learning is used for training local models according to a distance metric in state-space. Empirical evaluations are reported demonstrating learning performance on a number of strongly non-linear value functions, without the need for prior knowledge of features or a specific functional form.
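For readers unfamiliar with least squares temporal difference learning, the sketch below shows the standard global LSTD estimate, which solves A θ = b with A = Σ φ(s)(φ(s) − γ φ(s'))ᵀ and b = Σ φ(s) r. It does not include the locally weighted models introduced in the paper; the two-state chain, the one-hot features, and all function names are assumptions chosen so that the result can be checked by hand.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def lstd(transitions, features, gamma):
    """transitions: list of (s, r, s_next); features: state -> list of floats."""
    d = len(features(transitions[0][0]))
    a = [[0.0] * d for _ in range(d)]
    b = [0.0] * d
    for s, r, s_next in transitions:
        phi, phi_next = features(s), features(s_next)
        for i in range(d):
            b[i] += phi[i] * r
            for j in range(d):
                a[i][j] += phi[i] * (phi[j] - gamma * phi_next[j])
    return solve_linear(a, b)


def one_hot(state):
    """Tabular features for a hypothetical two-state chain."""
    return [1.0 if state == i else 0.0 for i in range(2)]


# Deterministic chain: 0 -> 1 with reward 0, 1 -> 0 with reward 1.
transitions = [(0, 0.0, 1), (1, 1.0, 0)] * 5
theta = lstd(transitions, one_hot, gamma=0.9)
```

With one-hot features the weights are the state values themselves, so `theta` should match the closed-form solution V(0) = γ / (1 − γ²) and V(1) = 1 / (1 − γ²).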
ES2013-26
Learning control under uncertainty: A probabilistic Value-Iteration approach
Bastian Bischoff, Duy Nguyen-Tuong, Heiner Markert, Alois Knoll
Abstract:
In this paper, we introduce a probabilistic version of the well-studied Value-Iteration approach, i.e. Probabilistic Value-Iteration (PVI). The PVI approach can handle continuous states and actions in an episodic Reinforcement Learning (RL) setting, while using Gaussian Processes to model the state uncertainties. We further show how the approach can be realized efficiently, making it suitable for learning with large data. The proposed PVI is evaluated on a benchmark problem, as well as on a real robot for learning a control task. A comparison of PVI with two state-of-the-art RL algorithms shows that the proposed approach is competitive in performance while being efficient in learning.
ES2013-93
Ensembles for Continuous Actions in Reinforcement Learning
Siegmund Duell, Steffen Udluft
Abstract:
Data-efficient reinforcement learning methods allow controllers (policies) for complex technical systems to be optimized in a data-driven manner. Still, there is the risk that such a policy, when run on the real system, performs considerably worse than expected. For policies with discrete actions, it has been shown that this risk can be reduced considerably when, instead of a single policy that might by chance be inferior, a whole ensemble of policies is used and the final policy is selected by an aggregation such as majority voting. In this paper we extend the applicability of the ensemble approach to vector-valued, continuous actions.
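To make the aggregation step concrete, here is a small sketch contrasting majority voting over discrete actions with one plausible aggregation for vector-valued continuous actions. The abstract does not state which continuous aggregation the authors use, so the component-wise median below is purely an assumption, as are the function names and the example proposals.

```python
from collections import Counter
from statistics import median


def majority_vote(actions):
    """Aggregate discrete actions: the most frequent proposal wins."""
    return Counter(actions).most_common(1)[0][0]


def componentwise_median(actions):
    """Aggregate vector-valued continuous actions, one median per dimension.

    Assumed aggregation for illustration; not taken from the paper.
    """
    return [median(dim_values) for dim_values in zip(*actions)]


# Three hypothetical ensemble members propose an action each.
discrete = majority_vote(["left", "right", "left"])
continuous = componentwise_median([[0.1, 1.0], [0.2, 5.0], [0.3, 1.2]])
```

A median is used here rather than a mean because it limits the influence of a single badly behaved member, such as the outlying 5.0 in the second dimension.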
ES2013-68
An empirical analysis of reinforcement learning using design of experiments
Christopher Gatti, Mark Embrechts, Jonathan Linton
Abstract:
This study uses a design of experiments approach to understand the behavior of a neural network to learn the mountain car domain using reinforcement learning. A large experiment is first performed to characterize the probability of empirical convergence based on three reinforcement learning algorithm parameters (λ, γ, ε), and a logistic regression model is fitted to this data. A detailed analysis of a subset of the parameter space finds that, upon convergence, algorithm parameters have significant effects on the convergence speed and mean performance, though performance differences are minimal.
ES2013-19
Hierarchical Reinforcement Learning for Robot Navigation
Bastian Bischoff, Duy Nguyen-Tuong, I-Hsuan Lee, Felix Streichert, Alois Knoll
Abstract:
For complex tasks, such as manipulation and robot navigation, reinforcement learning (RL) is well known to be difficult due to the curse of dimensionality. To overcome this complexity and make RL feasible, hierarchical RL (HRL) has been suggested. The basic idea of HRL is to divide the original task into elementary subtasks, which can be learned using RL. In this paper, we propose an HRL architecture for learning robot movements, e.g. robot navigation. The proposed HRL consists of two layers: (i) movement planning and (ii) movement execution. In the planning layer, e.g. generating navigation trajectories, discrete RL is employed while using movement primitives. Given the movement planning and corresponding primitives, the policy for the movement execution can be learned in the second layer using continuous RL. The proposed approach is implemented and evaluated on a mobile robot platform for a navigation task.
ES2013-2
Least-squares temporal difference learning based on extreme learning machine
Pablo Escandell-Montero, José M. Martínez-Martínez, José D. Martín-Guerrero, Emilio Soria-Olivas, Juan Gómez-Sanchis
Abstract:
This paper proposes a least-squares temporal difference (LSTD) algorithm based on the extreme learning machine, which uses a single-hidden-layer feedforward network to approximate the value function. While LSTD is typically combined with local function approximators, the proposed approach uses a global approximator that offers better scalability. The results of experiments carried out on four Markov decision processes show the usefulness of the proposed approach.
ES2013-91
Binary particle swarm optimisation with improved scaling behaviour
Denise Gorse
Abstract:
A Boolean particle swarm optimisation (PSO) algorithm is presented that builds on the strengths of earlier proposals but, by introducing a wholly random element into the search process, shows greatly improved performance in higher-dimensional search spaces, also in comparison to the binary PSO algorithm of Kennedy and Eberhart.
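For orientation, the baseline named in this abstract, the binary PSO of Kennedy and Eberhart, can be sketched as below. This is a minimal illustration of that baseline only, not the Boolean algorithm proposed in the paper. The OneMax fitness, the parameter values and all function names are assumptions chosen for the example.

```python
import math
import random


def onemax(bits):
    """Toy fitness (an assumption for this sketch): number of ones."""
    return sum(bits)


def binary_pso(fitness, n_bits, n_particles=20, n_iters=50,
               c1=2.0, c2=2.0, v_max=4.0, seed=0):
    """Maximise `fitness` over bit strings with a Kennedy-Eberhart style binary PSO."""
    rng = random.Random(seed)
    x = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    v = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [p[:] for p in x]
    pbest_fit = [fitness(p) for p in x]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_bits):
                # Real-valued velocity pulled towards personal and global bests.
                v[i][d] += (c1 * rng.random() * (pbest[i][d] - x[i][d])
                            + c2 * rng.random() * (gbest[d] - x[i][d]))
                v[i][d] = max(-v_max, min(v_max, v[i][d]))
                # The velocity is squashed into the probability that the bit is 1.
                prob_one = 1.0 / (1.0 + math.exp(-v[i][d]))
                x[i][d] = 1 if rng.random() < prob_one else 0
            f = fitness(x[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = x[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = x[i][:], f
    return gbest, gbest_fit


best, score = binary_pso(onemax, n_bits=12)
print(best, score)
```

The velocity clamp `v_max` keeps the bit-flip probability away from 0 and 1, so every bit retains some chance of changing at each step.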
ES2013-62
Dynamic Placement with Connectivity for RSNs based on a Primal-Dual Neural Network
Rafael Lima Carvalho, Lunlong Zhong, Felipe França, Félix Mora-Camino
Abstract:
The present work deals with the dynamic placement of a set of pursuers and a set of relay devices so that the mean distance to a set of moving targets is minimized over a given period of time. The relay devices are in charge of maintaining communication between the pursuers. Moving targets, relay devices and pursuers are limited in their movements from one period to the next. The periodic problem is formulated as a linear quadratic programming model, and a primal-dual neural network is proposed to solve the current optimization problem from one stage to the next. Moreover, the feasibility of the proposed approach is demonstrated through a numerical example.
Machine Learning for multimedia applications
ES2013-13
Machine Learning and Content-Based Multimedia Retrieval
Philippe-Henri Gosselin, David Picard
ES2013-109
Learning associative spatiotemporal features with non-negative sparse coding
Thomas Guthier, Steve Gerges, Volker Willert, Julian Eggert
Abstract:
Motion features based on optical flow are very powerful in tasks such as the recognition of human actions or gestures. Usually, they are combined with gradient information to form a set of spatiotemporal features. However, humans can recognize gestures and actions, and thus derive the implied motion, from static images alone. We model this associative recognition within a learned hierarchy of non-negative sparse coding layers. In the first stages, topology-preserving gradient and motion features are processed separately. Afterwards, they are projected onto a combined inner representation that is learned during the training phase. We show that, during recognition, the learned combined representation improves the recognition of human actions, even in the absence of explicit motion information.
ES2013-111
Content-based image retrieval with hierarchical Gaussian Process bandits with self-organizing maps
Ksenia Konyushkova, Dorota Glowacka
Abstract:
A content-based image retrieval system based on relevance feedback is proposed. The system relies on an interactive search paradigm where, at each round, a user is presented with k images and selects the one closest to her target. An approach based on hierarchical Gaussian Process bandits is used to trade off exploration and exploitation when presenting the images in each round. Experimental results show that the new approach compares favorably with previous work.
Clustering
ES2013-95
Clustering the Vélib’ origin-destinations flows by means of Poisson mixture models
Andry Randriamanamihaga, Etienne Côme, Latifa Oukhellou, Gérard Govaert
Abstract:
Studies based on human mobility, including Bicycle Sharing System (BSS) traffic analysis, have expanded over the past few years. They give insight into the underlying urban phenomena linked to city dynamics. This paper presents a generative count-series model using Poisson mixtures to automatically analyse and find temporal-based partitions over the Vélib’ origin-destination (OD) flow data. Such an approach may provide latent factors that reveal how regions of different usage interact over time. More generally, the proposed methodology can be used to cluster the edges of temporal valued graphs with respect to their temporal profiles.
ES2013-89
Delaunay simplices pruning based clustering
Octavio Razafindramanana, Gilles Venturini
Abstract:
In this paper we introduce a new clustering method that takes the Delaunay triangulation of a set of points as input. The proposed method is based on pruning away extra simplices of the triangulation according to a local heterogeneity measure which we introduce. This measure provides good clustering results as it leads to better detection of inter-cluster simplices. The measure is evaluated on 2-D shape data sets.
ES2013-69
Hierarchical and multiscale Mean Shift segmentation of population grids
Johanna Baro, Etienne Côme, Patrice Aknin, Olivier Bonin
Abstract:
The Mean Shift (MS) algorithm identifies clusters that are catchment areas of modes of a probability density function (pdf). We propose to use a multiscale and hierarchical implementation of the algorithm to process population grid data and automatically identify urban centers and their dependent sub-centers across scales. The multiscale structure is obtained by iteratively increasing the bandwidth of the kernel used to define the pdf on which the MS algorithm works. This induces a hierarchical structure over clusters, since modes merge together when the bandwidth parameter increases.
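The merging of modes under a growing bandwidth can be illustrated with a one-dimensional toy sketch. It assumes a Gaussian kernel and made-up sample points, whereas the paper works on population grids; the function name and the merge tolerance are likewise assumptions for the example.

```python
import math


def mean_shift_modes(points, bandwidth, n_iters=100, merge_tol=1e-3):
    """Return the distinct modes reached by Gaussian mean shift from each point."""
    modes = []
    for start in points:
        x = start
        for _ in range(n_iters):
            # Kernel weights of all samples as seen from the current position.
            weights = [math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points]
            # Mean shift step: move to the kernel-weighted mean of the samples.
            x = sum(w * p for w, p in zip(weights, points)) / sum(weights)
        # Points that converge to (numerically) the same location share a mode.
        if all(abs(x - m) > merge_tol for m in modes):
            modes.append(x)
    return sorted(modes)


samples = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
for h in (0.5, 10.0):
    print(h, mean_shift_modes(samples, h))
```

With the small bandwidth the two groups of samples keep separate modes; with the large one they collapse into a single mode, which is the mechanism that yields a hierarchy when the bandwidth is increased step by step.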
ES2013-87
Bayesian non parametric inference of discrete valued networks
Laetitia Nouedoui, Pierre Latouche
Abstract:
We present a nonparametric Bayesian inference strategy to automatically infer the number of classes during the clustering of a discrete-valued random network. Our methodology is related to Dirichlet process mixture models, and inference is performed using a blocked Gibbs sampling procedure. Using simulated data, we show that our approach improves over competitive variational inference clustering methods.
ES2013-20
ONP-MF: An Orthogonal Nonnegative Matrix Factorization Algorithm with Application to Clustering
Filippo Pompili, Nicolas Gillis, François Glineur, P.A. Absil
Abstract:
Given a nonnegative matrix M, the orthogonal nonnegative matrix factorization (ONMF) problem consists in finding a nonnegative matrix U and an orthogonal nonnegative matrix V such that the product UV is as close as possible to M in the sense of the Frobenius norm. The importance of ONMF comes from its tight connection with data clustering. In this paper, we propose a new ONMF method, called ONP-MF, and we show that it outperforms other clustering methods (including ONMF-based methods) in terms of accuracy on several datasets in text clustering and hyperspectral unmixing.
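Written out, the ONMF problem described in this abstract can be stated as below. The dimensions m and n and the factorization rank k are notation assumed here for illustration, since the abstract does not fix them, and "orthogonal" is read as V having orthonormal rows.

```latex
\min_{U \in \mathbb{R}^{m \times k},\; V \in \mathbb{R}^{k \times n}}
  \; \lVert M - UV \rVert_F^2
\quad \text{subject to} \quad
U \ge 0, \quad V \ge 0, \quad V V^{\top} = I_k
```

Because the rows of V are both nonnegative and mutually orthogonal, they have disjoint supports, so each column of V has at most one nonzero entry. Each column of M is therefore approximated by a scaled copy of a single column of U, which is the connection with clustering mentioned in the abstract.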
ES2013-113
Linear spectral hashing
Zalán Bodó, Lehel Csato
Abstract:
Spectral hashing assigns binary hash keys to data points. This is accomplished by thresholding the eigenvectors of the graph Laplacian to obtain binary codewords. While the calculation for inputs in the training set is straightforward, an intriguing and difficult problem is how to compute the hash codewords for unseen data. A second problem we address is the computational difficulty of using the Gaussian similarity measure in spectral hashing: for specific problems, mainly the processing of large text databases, we propose linear scalar products as similarity measures and analyze the performance of the algorithm. We implement the linear algorithm and provide an inductive (generative) formula that leads to a prediction method for a new data point similar to locality-sensitive hashing. Experiments on document retrieval show promising results.
ES2013-90
Normalized cuts clustering with prior knowledge and a pre-clustering stage
Diego Peluffo-Ordoñez, Andrés Eduardo Castro-Ospina, Diego Chavez-Chamorro, Carlos Daniel Acosta-Medina, Germán Castellanos-Dominguez
Abstract:
Clustering is of interest when data are insufficiently labeled and a prior training stage is infeasible. In particular, spectral clustering based on graph partitioning is of interest for problems with highly non-linearly separable classes. However, spectral methods, such as the well-known normalized cuts, involve the computation of eigenvectors, which is highly time-consuming for large data sets. In this work, we propose an alternative way to solve the normalized cuts problem for clustering, achieving the same results as conventional spectral methods but with less processing time. Our method consists of a heuristic search for the best binary cluster indicator matrix, in which the pairs of nodes with the greatest similarity values are grouped first and the remaining nodes are clustered by a heuristic algorithm that searches the similarity-based representation space. The proposed method is tested on a public-domain image data set. Results show that our method reaches comparable results at a lower computational cost.
ES2013-33
Network community detection with edge classifiers trained on LFR graphs
Twan van Laarhoven, Elena Marchiori
Abstract:
A popular method for generating graphs with known community structure is the Lancichinetti-Fortunato-Radicchi (LFR) model. This paper investigates the use of LFR graphs as training data for learning classifiers that discriminate between edges that are 'within' a community and edges that are 'between' network communities. We trained linear edge-wise weighted support vector machine classifiers on LFR graphs generated with different amounts of mixing between communities. Results of a comparative experimental analysis show that a classifier trained on a graph with more mixing also works well when tested on LFR benchmark graphs generated using less mixing, while it achieves mixed performance on real-life networks, with a tendency towards finding many communities.
Regression and forecasting
ES2013-81
Decoding stimulation intensity from evoked ECoG activity using support vector regression
Armin Walter, Georgios Naros, Martin Spüler, Alireza Gharabaghi, Wolfgang Rosenstiel, Martin Bogdan
Abstract:
One of the unsolved problems in the application of cortical stimulation for therapeutic purposes is the selection of optimal stimulation parameters. Using support vector regression, we demonstrate that the intensity of single-pulse electrical stimulation can be decoded from the waveform of the evoked electrocorticographic (ECoG) activity, even if the intensities used for training and testing the regression model are disjoint. This was most effective when stimulation was applied directly over the motor cortex, and less so for the pre-motor and sensory cortex. Thus, if the optimal shape of the evoked neural response to stimulation is known, a regression model trained on the responses to a small set of stimulation intensities could be sufficient to determine the optimal stimulation intensity.
ES2013-98
Neurally imprinted stable vector fields
Andre Lemme, Klaus Neumann, Felix Reinhart, Jochen Steil
Abstract:
We present a novel learning scheme to imprint stable vector fields into Extreme Learning Machines (ELMs). The networks represent movements, where asymptotic stability is incorporated through constraints derived from a Lyapunov function. We show that our approach successfully performs stable and smooth point-to-point movements learned from human handwriting movements.
ES2013-79
Ensembles of genetically trained artificial neural networks for survival analysis
Jonas Kalderstam, Patrik Edén, Mattias Ohlsson
Abstract:
We have developed a prognostic index model for survival data based on an ensemble of artificial neural networks that optimizes directly on the concordance index. Approximations of the c-index are avoided with the use of a genetic algorithm, which does not require gradient information. The model is compared with Cox proportional hazards (COX) and three support vector machine (SVM) models by Van Belle et al. on two clinical data sets, and only with COX on one artificial data set. Results indicate comparable performance to COX and SVM models on clinical data and superior performance compared to COX on non-linear data.
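The concordance index that this model optimizes can be computed directly by counting ordered pairs. The sketch below uses the common pairwise definition for right-censored data and, as a simplification, skips pairs with tied survival times. The function name, the convention that a higher score means higher risk, and the toy data are assumptions for illustration, not details taken from the paper.

```python
def concordance_index(times, events, risk):
    """Pairwise c-index: fraction of comparable pairs ranked correctly.

    times  -- observed survival or censoring times
    events -- 1 if the event was observed, 0 if censored
    risk   -- predicted risk score (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when the earlier time is an observed event.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # ties in the score count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
print(concordance_index(times, events, [4, 3, 2, 1]))  # perfectly ordered risks
```

The count is piecewise constant in the model outputs, so its gradient is zero almost everywhere. That is why a gradient-free optimizer such as the genetic algorithm mentioned in the abstract can work on it without a smooth approximation.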
ES2013-51
Optimization of Gaussian process hyperparameters using Rprop
Manuel Blum, Martin Riedmiller
Abstract:
Gaussian processes are a powerful tool for non-parametric regression. Training can be realized by maximizing the likelihood of the data given the model. We show that Rprop, a fast and accurate gradient-based optimization technique originally designed for neural network learning, can outperform more elaborate unconstrained optimization methods on real-world data sets, where it is able to converge more quickly and reliably to the optimal solution.
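The Rprop update uses only the sign of each partial derivative and adapts a separate step size per parameter. The sketch below implements one common variant, without weight backtracking, on a toy quadratic. The objective stands in for a negative log-likelihood and is purely an assumption of this example, as are the function names and the parameter values, which follow commonly used defaults rather than the paper's settings.

```python
def rprop_minimize(grad_fn, x0, n_iters=200, step_init=0.1,
                   eta_plus=1.2, eta_minus=0.5,
                   step_min=1e-9, step_max=1.0):
    """Minimise a function given its gradient, using sign-based Rprop steps."""
    x = list(x0)
    steps = [step_init] * len(x)
    prev_grad = [0.0] * len(x)
    for _ in range(n_iters):
        grad = list(grad_fn(x))
        for d in range(len(x)):
            agreement = grad[d] * prev_grad[d]
            if agreement > 0:
                # Same sign as last time: accelerate.
                steps[d] = min(steps[d] * eta_plus, step_max)
            elif agreement < 0:
                # Sign flipped (overshoot): shrink the step and skip this move.
                steps[d] = max(steps[d] * eta_minus, step_min)
                grad[d] = 0.0
            # Move against the gradient sign; the magnitude is ignored.
            if grad[d] > 0:
                x[d] -= steps[d]
            elif grad[d] < 0:
                x[d] += steps[d]
            prev_grad[d] = grad[d]
    return x


def toy_gradient(x):
    """Gradient of (x0 - 3)^2 + 10 * (x1 + 1)^2, a stand-in objective."""
    return [2.0 * (x[0] - 3.0), 20.0 * (x[1] + 1.0)]


print(rprop_minimize(toy_gradient, [0.0, 0.0]))
```

Because only gradient signs are used, the badly scaled second coordinate does not slow down the first one, which is one reason the method is robust to parameters of very different magnitudes.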
ES2013-35
Are Rosenblatt multilayer perceptrons more powerful than sigmoidal multilayer perceptrons? From a counter example to a general result
Jose Fonseca
Abstract:
In the eighties, the lack of an efficient algorithm to train multilayer Rosenblatt perceptrons was solved by sigmoidal neural networks and backpropagation. But should we still try to find an efficient algorithm to train multilayer hardlimit neural networks, a task known to be NP-complete? In this work we show that this would not be a waste of time, by means of a counterexample in which a two-layer Rosenblatt perceptron with 21 neurons showed much more computational power than a sigmoidal feedforward two-layer neural network with 300 neurons trained by backpropagation on the same classification problem. We show why the synthesis of logical functions with threshold gates or hardlimit perceptrons is an active research area in VLSI design and nanotechnology, and we review some of the methods to synthesize logical functions with a multilayer hardlimit perceptron. We propose, as a near-future objective, the search for an efficient method to synthesize any classification problem with analog inputs using a two-layer hardlimit perceptron. Nevertheless, we recognize that hardlimit multilayer perceptrons cannot approximate continuous functions as multilayer sigmoidal neural networks easily can; with multilayer hardlimit perceptrons we can solve only classification problems, though any of them, as we plan to demonstrate in the near future.
ES2013-92
Detection and quantification in real-time polymerase chain reaction
Abou KEITA, Romain HERAULT, Colas CALBRIX, Stéphane Canu
Abstract:
The estimation of the concentration of an infectious agent in the environment is a key step in triggering an alert when there is a biological threat. This concentration can be obtained through a quantitative polymerase chain reaction (qPCR). Nevertheless, standard real-time procedures do not address detection delay, which is a main concern in alert triggering. Therefore, we propose a method based on Lasso regression and CUSUM change detection to accurately estimate the concentration while minimizing the detection delay. We compare our results with those of a standard method (the threshold method) and obtain promising results.
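The CUSUM component can be illustrated with a generic one-sided detector for an upward shift in the mean. This is a textbook form of CUSUM under assumed parameters, not the authors' combination with Lasso regression; the function name, the drift and threshold values and the toy signal are hypothetical.

```python
def cusum_alarm(samples, target_mean, drift, threshold):
    """Return the index of the first CUSUM alarm, or None if none is raised."""
    s = 0.0
    for t, x in enumerate(samples):
        # Accumulate deviations above target_mean + drift; never go below zero.
        s = max(0.0, s + (x - target_mean - drift))
        if s > threshold:
            return t
    return None


# Toy signal: the mean jumps from 0 to 1 at index 5.
signal = [0.0] * 5 + [1.0] * 5
print(cusum_alarm(signal, target_mean=0.0, drift=0.25, threshold=2.0))
```

A lower threshold shortens the detection delay at the price of more false alarms, which is the trade-off behind the detection-delay concern raised in the abstract.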
ES2013-43
Temperature Forecast in Buildings Using Machine Learning Techniques
Fernando Mateo, Juan J. Carrasco, Mónica Millán-Giraldo, Abderrahim Sellami, Pablo Escandell-Montero, José M. Martínez-Martínez, Emilio Soria-Olivas
Abstract:
Energy efficiency in buildings requires good predictions of the variables that define the power consumption of the building. Temperature is the most relevant of these variables because it affects the operation of the cooling systems in summer and the heating systems in winter, while also being the main variable that defines comfort. This paper presents the application of classical methods of time series forecasting, such as Autoregressive (AR), Multiple Linear Regression (MLR) and Robust MLR (RMLR) models, along with others derived from more complex machine learning techniques, including Multilayer Perceptron with Non-linear Autoregressive Exogenous (MLP-NARX) and Extreme Learning Machine (ELM), to forecast temperature in buildings. The results obtained in predicting the temperature of several rooms of a building show that machine learning methods perform well compared to traditional approaches.
ES2013-121
Forecasting Financial Markets with Classified Tactical Signals
Patrick Kouontchou, Amaury Lendasse, Yoan Miché, Bertrand Maillet
Abstract:
The financial market dynamics can be characterized by macro-economic, micro-financial and market risk indicators, used as leading indicators by market professionals. In this article, we propose a method to identify market states integrating two classification algorithms: a Robust Kohonen Self-Organising Maps one and a CART one. After studying the market’s states separation using the former, we use the latter to characterize the economic conditions over time and to compute the conditional probabilities of related market states.
Developments in kernel design
ES2013-29
A quotient basis kernel for the prediction of mortality in severe sepsis patients
Vicent Ribas Ripoll, Enrique Romero, Juan Carlos Ruiz-Rodríguez, Alfredo Vellido
Abstract:
In this paper, we describe a novel kernel for multinomial distributions, namely the Quotient Basis Kernel (QBK), which is based on a suitable reparametrization of the input space through algebraic geometry and statistics. The QBK is used here for data transformation prior to classification in a medical problem concerning the prediction of mortality in patients suffering severe sepsis. This is a common clinical syndrome, often treated at the Intensive Care Unit (ICU) in a time-critical context. Mortality prediction results with Support Vector Machines using QBK compare favorably with those obtained using alternative kernels and standard clinical procedures.
ES2013-103
Synthetic over-sampling in the empirical feature space
María Pérez-Ortiz, Pedro A. Gutiérrez, César Hervás-Martínez
Abstract:
The imbalanced nature of some real-world data is one of the current challenges for machine learning, giving rise to different approaches to handling it. However, preprocessing methods operate in the original input space, presenting distortions when combined with the kernel classifiers, which make use of the feature space. This paper explores the notion of empirical feature space (a Euclidean space which is isomorphic to the feature space) to develop a kernel-based synthetic over-sampling technique, which maintains the main properties of the kernel mapping. The proposal achieves better results than the same oversampling method applied to the original input space.
ES2013-21
Multi-scale Support Vector Machine Optimization by Kernel Target-Alignment
María Pérez-Ortiz, Pedro A. Gutiérrez, Javier Sánchez-Monedero, César Hervás-Martínez
Abstract:
The problem considered is the optimization of a multi-scale kernel, where a different width is chosen for each feature. This idea has barely been studied in the literature, and only through evolutionary or gradient descent approaches, which explicitly train the learning machine and thereby incur high computational cost. To cope with this limitation, the problem is explored by making use of an analytical methodology known as kernel-target alignment, where the kernel is optimized by aligning it to the so-called ideal kernel matrix. The results show that the proposal leads to better performance and simpler models at limited computational cost when applying the binary Support Vector Machine (SVM) paradigm.
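The alignment criterion mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' optimization procedure: it only evaluates the standard kernel-target alignment, the normalized Frobenius inner product between a Gram matrix K and the ideal kernel y y^T, for a Gaussian kernel with one width per feature. The kernel parameterization, the function names and the toy data are assumptions.

```python
import math


def multiscale_rbf(x, z, widths):
    """Gaussian kernel with a separate width per feature (assumed parameterization)."""
    s = sum(((a - b) / w) ** 2 for a, b, w in zip(x, z, widths))
    return math.exp(-0.5 * s)


def gram(X, widths):
    """Gram matrix of the multi-scale kernel on the samples in X."""
    return [[multiscale_rbf(xi, xj, widths) for xj in X] for xi in X]


def frobenius(A, B):
    """Frobenius inner product of two equally sized matrices."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def alignment(K, y):
    """Alignment between K and the ideal kernel y y^T, with labels in {-1, +1}."""
    ideal = [[yi * yj for yj in y] for yi in y]
    return frobenius(K, ideal) / math.sqrt(frobenius(K, K) * frobenius(ideal, ideal))


# Toy data: feature 1 separates the classes, feature 2 is noise.
X = [[0.0, 5.0], [0.1, -3.0], [1.0, 4.0], [1.1, -4.0]]
y = [-1, -1, 1, 1]

a_good = alignment(gram(X, [0.2, 100.0]), y)  # noise feature effectively switched off
a_bad = alignment(gram(X, [100.0, 0.2]), y)   # informative feature effectively switched off
```

On this toy set the widths that suppress the noisy feature give the higher alignment, which is the signal an optimizer over the per-feature widths would follow without training a classifier at each step.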
ES2013-105
Handling missing values in kernel methods with application to microbiology data
Vladimer Kobayashi, Tomas Aluja, Lluís Belanche
Abstract:
We discuss several approaches that make it possible for kernel methods to deal with missing values. The first two are extended kernels able to handle missing values without data preprocessing methods. Another two methods are derived from a sophisticated multiple imputation technique involving logistic regression as the local model learner. The performance of these approaches is compared using a binary data set that arises typically in microbiology (the microbial source tracking problem). Our results show that the kernel extensions demonstrate competitive performance in comparison with multiple imputation in terms of predictive accuracy. However, these results are achieved with a simpler and deterministic methodology and entail a much lower computational effort.
Human Activity and Motion Disorder Recognition: towards smarter Interactive Cognitive Environments
ES2013-11
Human Activity and Motion Disorder Recognition: towards smarter Interactive Cognitive Environments
Jorge Luis Reyes-Ortiz, Alessandro Ghio, Xavier Parra, Davide Anguita, Joan Cabestany, Andreu Català
Abstract:
The rise of ubiquitous computing systems in our environment is creating a strong need for novel approaches to human-computer interaction, either to extend the existing range of possibilities and services available to people or to provide assistance to those with limited conditions. Human Activity Recognition (HAR) is playing a central role in this task by offering the input for the development of more interactive and cognitive environments. This has motivated the organization of the ESANN 2013 Special Session on Human Activity and Motion Disorder Recognition and the running of a competition in HAR. Here, a compilation of the most recent proposals in the area is presented, accompanied by the results of the contest, which called for innovative approaches to recognizing activities of daily living (ADL) from a recently published data set.
ES2013-57
A heterogeneous database for movement knowledge extraction in Parkinson’s disease
Albert Samà, Carlos Pérez-López, Daniel Rodríguez-Martín, Joan Cabestany, Juan Manuel Moreno-Arostegui, Alejandro Rodríguez-Molinero
Abstract:
This paper presents the design and methodology used to create a heterogeneous database for movement knowledge extraction in Parkinson's Disease. This database is being constructed as part of the REMPARK project and is composed of movement measurements acquired from inertial sensors, standard medical scales such as the Unified Parkinson's Disease Rating Scale, and other information obtained from 90 Parkinson's Disease patients. The signals obtained will be used to create movement disorder detection algorithms using supervised learning techniques. The different sources of information and the need for labelled data pose many challenges, which the methodology described in this paper addresses. Some preliminary data are presented.
ES2013-71
Long term analysis of daily activities in smart home
Labiba Gillani Fahad, Arshad Ali, Muttukrishnan Rajarajan
Abstract:
In this paper, we propose an approach to monitoring changes in the daily routine of a person using long-term analysis of the activities performed in a smart home. The proposed approach comprises two steps. The first is activity recognition, in which newly detected activity instances are labeled using a probabilistic neural network as the learning model. In the second step, the daily routine of the occupant in the smart home is analyzed by exploiting the group of activities of a day performed over a period of time. We apply K-means clustering to separate the normal routine from unusual and suspected routines. The proposed approach is validated on a publicly available dataset.
ES2013-76
Sensor Positioning for Activity Recognition Using Multiple Accelerometer-Based Sensors
Lei Gao, Alan Bourke, John Nelson
Abstract:
Physical activity has a positive impact on people’s well-being and it can decrease the occurrence of chronic disease. To date, a substantial number of research studies have focused on activity recognition using accelerometer and gyroscope-based sensors. However, the sensor position and sensor combination that give the best recognition performance with the minimum number of sensors have not been investigated enough. This study proposes a method that places multiple accelerometer-based sensors on different body locations to investigate this problem. The dataset was collected in a study conducted by the eCAALYX project. Eight subjects were recruited to perform eight normal scripted activities in different life scenarios, each repeated three times. Thus a total of 192 activities were recorded. The collected dataset was used to find the most suitable sensor-subset for recognizing Activities of Daily Living (ADLs).
ES2013-88
Multi-user Blood Alcohol Content estimation in a realistic simulator using Artificial Neural Networks and Support Vector Machines
Audrey Robinel, Didier Puzenat
Abstract:
We instrumented a realistic car simulator to extract low-level data related to the driver's use of the vehicle controls. After processing these data, we generated features that were fed to a Multi-Layer Perceptron (MLP) and Support Vector Machines (SVM) in order to determine whether the driver was over a blood alcohol content (BAC) threshold, and even estimate the BAC value. We discuss the results of the prototype using the MLP and SVM (or SVR) algorithms in both single-user and multi-user contexts.
ES2013-84
A Public Domain Dataset for Human Activity Recognition using Smartphones
Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra, Jorge Luis Reyes-Ortiz
Abstract:
Human-centered computing is an emerging research field that aims to understand human behavior and integrate users and their social context with computer systems. One of the most recent, challenging and appealing applications in this framework consists in sensing human body motion using smartphones to gather context information about people's actions. In this context, we describe in this work an Activity Recognition database, built from the recordings of 30 subjects doing Activities of Daily Living (ADL) while carrying a waist-mounted smartphone with embedded inertial sensors, which is released to the public domain on a well-known on-line repository. Results obtained on the dataset by exploiting a multiclass Support Vector Machine (SVM) are also reported.
ES2013-124
A One-Vs-One Classifier Ensemble With Majority Voting for Activity Recognition
Bernardino Romera-Paredes, M. S. H. Aung, Nadia Bianchi-Berthouze
Abstract:
A solution for the automated recognition of six full body motion activities is proposed. This problem is posed by the release of the Activity Recognition database and forms the basis for a classification competition at the European Symposium on Artificial Neural Networks 2013. The data-set consists of motion characteristics of thirty subjects captured using a single device delivering accelerometric and gyroscopic data. Included in the released data-set are 561 processed features in both the time and frequency domains. The proposed recognition framework consists of an ensemble of linear support vector machines each trained to discriminate a single motion activity against another single activity. A majority voting rule is used to determine the final outcome. For comparison, a six "winner take all" multiclass support vector machine ensemble and k-Nearest Neighbour models were also implemented. Results show that the system accuracy for the one versus one ensemble is 96.4% for the competition test set. Similarly, the multiclass SVM ensemble and k-Nearest Neighbour returned accuracies of 93.7% and 90.6% respectively. The outcomes of the one versus one method were submitted to the competition resulting in the winning solution.
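The one-versus-one scheme with majority voting described in the abstract can be sketched as below. To keep the example dependency-free, each pairwise model is a nearest-centroid rule standing in for the linear support vector machines used in the paper; the function names, the toy two-dimensional data and the class labels are assumptions for illustration only.

```python
from collections import Counter
from itertools import combinations


def centroid(points):
    """Component-wise mean of a list of equally long vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def train_ovo(X, y):
    """Build one binary model per unordered pair of classes.

    Each model is just the two class centroids (a stand-in for a linear SVM).
    """
    by_class = {}
    for x, label in zip(X, y):
        by_class.setdefault(label, []).append(x)
    cents = {c: centroid(pts) for c, pts in by_class.items()}
    return [(a, cents[a], b, cents[b]) for a, b in combinations(sorted(cents), 2)]


def predict_ovo(models, x):
    """Each pairwise model casts one vote; the most voted class wins.

    Ties are broken by the order in which classes first received a vote.
    """
    votes = Counter()
    for a, ca, b, cb in models:
        votes[a if sq_dist(x, ca) <= sq_dist(x, cb) else b] += 1
    return votes.most_common(1)[0][0]


# Toy data with three hypothetical activity classes.
X = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]]
y = ["walk", "walk", "sit", "sit", "stand", "stand"]

models = train_ovo(X, y)
label = predict_ovo(models, [0.2, 0.4])
```

With k classes the ensemble holds k(k-1)/2 pairwise models, so the six activities of the competition correspond to 15 binary classifiers.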
ES2013-123
A sparse kernelized matrix learning vector quantization model for human activity recognition
Marika Kästner, Marc Strickert, Thomas Villmann
Abstract:
The contribution describes the application of the 'Computational Intelligence Group' from the University of Applied Sciences Mittweida (Germany) to the ESANN'2013 Competition on 'Human Activity Recognition (HAR)' using Android-OS smartphone sensor signals. We applied a kernel variant of learning vector quantization with metric adaptation with only one prototype vector per class (sparse model). This model obtains very good accuracies and additionally provides class correlation information. Further, the model allows an optimized class visualization.
ES2013-122
A competitive approach for human activity recognition on smartphones
Attila Reiss, Gustaf Hendeby, Didier Stricker
Abstract:
This paper describes a competitive approach developed for an activity recognition challenge. The competition was defined on a new and publicly available dataset of human activities, recorded with smartphone sensors. This work investigates different feature sets for the activity recognition task of the competition. Moreover, the focus is also on the introduction of a new, confidence-based boosting algorithm called ConfAdaBoost.M1. Results show that the new classification method outperforms commonly used classifiers, such as decision trees or AdaBoost.M1.
Classification
ES2013-104
A dictionary learning based method for aCGH segmentation
Salvatore Masecchia, Saverio Salzo, Annalisa Barla, Alessandro Verri
Abstract:
The starting point of our work is to devise a model for segmentation of aCGH data. We propose an optimization method based on dictionary learning and regularization and we compare it with a state-of-the-art approach, presenting our experimental results on synthetic data.
ES2013-75
A Learning Machine with a Bit-Based Hypothesis Space
Davide Anguita, Alessandro Ghio, Luca Oneto, Sandro Ridella
Abstract:
We propose in this paper a bit-based classifier, picked from a hypothesis space described according to sparsity and locality principles: the complexity of the corresponding space of functions is controlled through the number of bits needed to represent it, so that it will include the classifiers that will be most likely chosen by the learning procedure. Through an introductory example, we show how the number of bits, the sparsity of the representation and the local definition approach affect the complexity of the space of functions from which the final classifier is selected.
ES2013-65
Optimization by Variational Bounding
Joe Staines, David Barber
Abstract:
We discuss a general technique that forms a differentiable bound on non-differentiable objective functions by bounding the function optimum by its expectation with respect to a parametric variational distribution. We describe sufficient conditions for the bound to be convex with respect to the variational parameters. As example applications we consider variants of sparse linear regression and SVM training.
ES2013-118
Support vector machine-based approach for multi-labelers problems
Santiago Murillo Rendón, Diego Peluffo-Ordoñez, Germán Castellanos-Dominguez
Abstract:
We propose a first approach to quantifying panelists' labeling by generalizing a soft-margin support vector machine classifier to multi-label analysis. This variation consists of formulating the optimization problem within a quadratic programming framework instead of using a heuristic search algorithm. The outcomes of our method are penalty or relevance values associated with each panelist, where a lower value indicates a better-performing labeler. For experiments, two databases are considered: first, the well-known Iris data set with multiple artificial labels; second, a multi-label speech database for detecting hypernasality. The penalty factors obtained are compared with both standard supervised and non-supervised measurements. The results are promising for assessing the concordance among panelists taking into account the structure of the data.
ES2013-115
Read classification for next generation sequencing
James Hogan, Peter Holland, Alex Holloway, Robert Petit, Timothy Read
Abstract:
Next Generation Sequencing (NGS) technologies have revolutionised molecular biology, allowing clinical sequencing to become a matter of routine, and a method of considerable diagnostic value. NGS data sets consist of short sequence reads obtained from the machine, given context and meaning through downstream assembly and annotation. For these techniques to operate successfully, it is necessary to ensure that the collected reads are consistent with the species or species group assumed, and not corrupted in some way. The bacterium Staphylococcus aureus is a common infectious agent in hospitals, causing severe and potentially life-threatening infections, with some strains exhibiting antibiotic resistance. In this paper, we apply a Support Vector Machine classifier to the important problem of distinguishing S. aureus sequencing projects from a range of alternatives, including other pathogens and closely related Staphylococci. Using a representation based on sequence k-mers of various lengths, we are able to make the correct prediction in over 95% of cases, while reporting almost no false positives, and implicating features with important functional associations in the bacterium.
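A k-mer representation of the kind mentioned in the abstract can be sketched as follows: every read is turned into counts of its overlapping substrings of length k, and counts for several values of k are pooled into one sparse feature map. The function names, the choice of k values and the example read are assumptions; the abstract does not specify the lengths or any normalization used.

```python
from collections import Counter


def kmer_counts(read, k):
    """Count all overlapping substrings of length k in a sequence read."""
    read = read.upper()
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))


def kmer_features(read, ks=(2, 3)):
    """Pool k-mer counts for several lengths into one sparse feature map.

    k-mers of different lengths are distinct strings, so their keys cannot collide.
    """
    features = Counter()
    for k in ks:
        features.update(kmer_counts(read, k))
    return features


counts = kmer_counts("ACGTACGT", 3)
features = kmer_features("ACGTACGT")
```

Such count maps could then be converted to fixed-length vectors over a chosen k-mer vocabulary before being passed to a classifier such as an SVM.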
ES2013-107
A new metric for dissimilarity data classification based on Support Vector Machines optimization
Agata Manolova, Anne Guerin-Dugue
Abstract:
Dissimilarities are extremely useful in many real-world pattern classification problems, where the data resides in a complex space and it can be very difficult, if not impossible, to find useful feature vector representations. In these cases a dissimilarity representation may be easier to come by. The goal of this work is to provide a new technique based on Support Vector Machines (SVM) optimization that can be a good alternative, in terms of accuracy, to known dissimilarity-based methods such as the k-nearest-neighbor (kNN) classifier, prototype-based dissimilarity classifiers and distance-kernel-based SVM classifiers.
ES2013-61
DYNG: Dynamic Online Growing Neural Gas for stream data classification
Oliver Beyer, Philipp Cimiano
Abstract:
In this paper we introduce Dynamic Online Growing Neural Gas (DYNG), a novel online stream data classification approach based on Online Growing Neural Gas (OGNG). DYNG exploits labelled data during processing to adapt the network structure as well as the speed of growth of the network to the requirements of the classification task. It thus speeds up learning for new classes/labels and dampens growth of the subnetwork representing the class once the class error converges. We show that this strategy is beneficial in life-long learning settings involving non-stationary data, giving DYNG an increased performance in highly non-stationary phases compared to OGNG.
ES2013-15
Prior knowledge in an end-user trainable machine vision framework
Klaas Dijkstra, Walter Jansen, Jaap van de Loosdrecht
Abstract:
The increasing popularity of machine vision based solutions in common applications calls for a structured approach for incorporating the end user's domain knowledge and limiting the solution's dependency on expert knowledge. We propose a framework facilitating optimized classification results and will show several approaches in which prior knowledge of the solution is captured in a neural network or in a geometric pattern matcher. The methodology is applied to disc print reading for antibiotic susceptibility testing by disc diffusion. Results show that increased prior knowledge produces better classifiers, and that more thorough optimization is required to increase the accuracy of classifiers which use less prior knowledge.
ES2013-59
Border sensitive fuzzy vector quantization in semi-supervised learning
Tina Geweniger, Marika Kästner, Thomas Villmann
Abstract:
We propose a semi-supervised fuzzy vector quantization method for the classification of incompletely labeled data. Since information contained within the structure of the data set should not be neglected, our method considers the whole data set during the learning process. In contrast to known methods, our approach uses neighborhood cooperativeness, known from Neural Gas, for stable prototype learning. Further improvement of the classification accuracy is achieved by including class border sensitivity inspired by Support Vector Machines, which is again improved by neighborhood learning.
ES2013-22
B-bleaching: Agile Overtraining Avoidance in the WiSARD Weightless Neural Classifier
Danilo Carvalho, Hugo Carneiro, Felipe França, Priscila Lima
Abstract:
Weightless neural networks constitute a still not fully explored Machine Learning paradigm, even if its first model, WiSARD, is considered. Bleaching, an improvement on WiSARD's learning mechanism, was recently proposed in order to avoid overtraining. Although presenting very good results in different application domains, the original sequential bleaching and its confidence modulation mechanisms still offer room for improvement. This paper presents a new variation of the bleaching mechanism and compares the performance of the three strategies on a complex domain, that of multilingual grammatical categorization. Experiments considered both number of iterations and accuracy. Results show that binary bleaching allows for a considerable improvement in the number of iterations without introducing a loss of accuracy.
ES2013-101
WIPS: the WiSARD Indoor Positioning System
D.O. Cardoso, J. Gama, Massimo De Gregorio, Felipe França, Maurizio Giordano, Priscila Lima
Abstract:
In this paper, we present a WiSARD-based system addressing the problem of Indoor Positioning (IP) by taking advantage of pervasively available infrastructure (WiFi Access Points, AP). The goal is to develop a system for positioning users in indoor environments such as museums, malls, factories and offshore platforms. Based on the fingerprint approach, we show how the proposed weightless neural system provides very good results in terms of performance and positioning resolution. Both the approach to the problem and the system will be presented through two correlated experiments.
ES2013-72
Cost-sensitive cascade graph neural networks
Van Tuc Nguyen, Ah Chung Tsoi, Markus Hagenbuchner
Abstract:
This paper introduces a novel cost sensitive approach to a cascade of Graph Neural Networks for learning from unbalanced data in the graph structured domain. The proposed method is shown to be very effective in addressing the undesirable effects of unbalanced data distribution on learning systems. The proposed idea is based on a weighting mechanism which forces the network to encode misclassified graphs (or nodes) more strongly. The idea is applied to Graph Neural Networks which are capable of encoding complex graph structured data. We evaluate the model through an application to a well known Web spam detection problem, and demonstrate that the general network performance is improved as a result.
Sparsity for interpretation and visualization in inference models
ES2013-14
Research directions in interpretable machine learning models
Vanya Van Belle, Paulo Lisboa
ES2013-3
Learning regression models with guaranteed error bounds
Clemens Otte
Abstract:
The combination of a symbolic regression model with a residual Gaussian Process is proposed for providing an interpretable model with improved accuracy. While the learned symbolic model is highly interpretable the residual model usually is not. However, by limiting the output of the residual model to a defined range a worst-case guarantee can be given in the sense that the maximal deviation from the symbolic model is always below a defined limit. When ranking the accuracy and interpretability of several different approaches on the SARCOS data benchmark the proposed combination yields the best result.
ES2013-63
Sparse approximations for kernel learning vector quantization
Daniela Hofmann, Barbara Hammer
Abstract:
Various prototype based learning techniques have recently been extended to similarity data by means of kernelization. While state-of-the-art classification results can be achieved this way, kernelization loses one important property of prototype-based techniques: a representation of the solution in terms of few characteristic prototypes which can directly be inspected by experts. In this contribution, we introduce several different ways to obtain sparse representations for kernel learning vector quantization and compare their efficiency and performance in connection to the underlying data characteristics in diverse benchmark scenarios.
ES2013-36
Robust cartogram visualization of outliers in manifold learning
Alessandra Tosi, Alfredo Vellido
Abstract:
Most real data sets contain atypical observations, often referred to as outliers. Their presence may have a negative impact on data modeling using machine learning. This is particularly the case in data density estimation approaches. Manifold learning techniques provide low-dimensional data representations, often oriented towards visualization. The visualization provided by density estimation manifold learning methods can be compromised by the presence of outliers. Recently, a cartogram-based representation of model-generated distortion was presented for nonlinear dimensionality reduction. Here, we investigate the impact of outliers on this visualization when using manifold learning techniques that behave robustly in their presence.
ES2013-44
ManiSonS: A New Visualization Tool for Manifold Clustering
José M. Martínez-Martínez, Pablo Escandell-Montero, José D. Martín-Guerrero, Joan Vila-Francés, Emilio Soria-Olivas
Abstract:
Manifold learning is an important theme in machine learning. This paper proposes a new visualization approach to manifold clustering. The method is based on pie charts in order to obtain meaningful visualizations of the clustering results when applying a manifold technique. In addition, the proposed approach extracts all the existing relationships among the attributes of the different clusters and finds the most important variables of the manifold in order to distinguish among the different clusters. The methodology is tested on one synthetic data set and one real data set. The results show the suitability and usefulness of the proposed approach.
ES2013-83
Visualizing pay-per-view television customers churn using cartograms and flow maps
David L. García, Angela Nebot, Alfredo Vellido
Abstract:
Media companies aggressively compete for their share of the pay-per-view television market. Such share can only be kept or improved by avoiding customer defection, or churn. The analysis of customers' data should provide insight into customers' behavior over time and help prevent churn. Data visualization can be part of this analysis. Here, a database of pay-per-view television customers is visualized using a nonlinear manifold learning model. This visualization is enhanced through, first, the reintroduction of the local nonlinear distortion using a cartogram technique and, second, the visualization of customer migrations using flow maps. Both techniques are inspired by geographical representation.
ES2013-86
Visualizing dependencies of spectral features using mutual information
Andrej Gisbrecht, Yoan Miché, Barbara Hammer, Amaury Lendasse
Abstract:
The curse of dimensionality leads to problems in machine learning when dealing with high-dimensional data. This aspect is particularly pronounced if intrinsically infinite dimensionality is faced, as is the case for spectral or functional data. Feature selection constitutes one possibility to deal with this problem. Often, it relies on mutual information as an evaluation tool for the feature importance; however, it might be overlaid by intrinsic biases such as a high correlation of neighboring function values for functional data. In this paper we propose to assess feature correlations of spectral data by an overlay of prior dependencies due to the functional nature and its similarity as measured by mutual information, enabling a quick overall assessment of the relationships between features. By integrating the Nyström approximation technique, the usually time-consuming step of computing all pairwise mutual information values can be reduced to only linear complexity in the number of features.