Current Research
Key words: Data mining, clustering, cluster validity, validity index, feature selection, decision support.
Cluster Validity
Clustering is probably the best-known example of unsupervised learning. The goal of clustering is to group data points that are similar according to a given similarity metric (Euclidean distance is the usual default). Clustering techniques have been applied in various domains such as text mining, color image segmentation, sensory time series and information exploration. In these fields, as in many others, the number of clusters is usually not known in advance.
Cluster validity is used both to estimate the quality of a clustering result and to determine the number of clusters underlying the data. Several indices exist in the literature, but each attempts to deal with only one issue of cluster validity. In addition, most of this work is relevant only for data sets that contain at least two clusters. My current work deals with a new, general index for cluster validity. It addresses issues such as unbalanced, overlapping and noisy clusters, and it has also been tested on sub-clusters and on perfectly separated clusters. Moreover, it was developed to suit multi-dimensional and noisy data sets. One of its distinguishing features is that it handles the single-cluster case.
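To make the idea of a validity index concrete, here is a minimal sketch in Python. It uses the well-known silhouette coefficient, not the index developed in my work, and the data points and candidate labelings are invented for illustration. It also shows the limitation mentioned above: the silhouette is undefined when all points belong to a single cluster.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def silhouette(points, labels):
    """Mean silhouette coefficient of a labeling (higher is better).

    Illustrative only: this is a classical index, not the index
    described in the text.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        # Typical limitation: the index needs at least two clusters.
        raise ValueError("silhouette is undefined for fewer than two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(euclidean(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
two_clusters = [0, 0, 0, 1, 1, 1]
three_clusters = [0, 0, 1, 2, 2, 2]

score_two = silhouette(points, two_clusters)
score_three = silhouette(points, three_clusters)
```

Comparing `score_two` and `score_three` is how such an index is used to choose the number of clusters: the labeling with the higher score is preferred. Calling `silhouette(points, [0] * 6)` raises an error, which is the gap a single-cluster-aware index is meant to close.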
For more information, you can see my publications.
Sensor Placement
Sensors are increasingly used worldwide for tasks such as fault diagnosis and automatic control. Sensor configuration has emerged recently as a research field, and research on sensor networks is developing in parallel. Interest in this field is illustrated by the 2004 special issue of Communications of the ACM on wireless sensor networks and by the launch of a new journal, ACM Transactions on Sensor Networks, in 2005. Research on managing these sensor networks is also progressing, mainly to satisfy ever-growing user needs; multi-sensor management is one example of such work.
In this work, configuring a measurement system (i.e. placing sensors) is considered to be a discrete combinatorial optimization problem. Indeed, the number of sensors to be used and the number of possible sensor locations are both discrete variables. The problem is combinatorial since each sensor can be placed at any candidate location, but only once. Greedy algorithms never accept a less attractive alternative in exchange for a better overall solution: at each step, the best immediate (local) choice is always selected. Although they find the globally optimal solution for some optimization problems, greedy algorithms may return non-optimal solutions for others. Unlike greedy algorithms, global search algorithms aim to find the best solution among all possible ones. The most popular global search algorithms are simulated annealing (SA) and genetic algorithms (GA). Probabilistic Global Search Lausanne (PGSL) is a powerful and easy-to-tune algorithm that can be used for sensor placement.
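The difference between a greedy choice and a globally optimal one can be sketched on a toy placement problem. Everything in this example is an assumption made for illustration: the candidate locations `A`, `B`, `C`, the sets of targets each location can observe, and the coverage objective. Exhaustive enumeration stands in for a global search here; it is not PGSL, SA or GA, and it is only feasible for very small problems.

```python
from itertools import combinations


def coverage(selection, covers):
    """Number of distinct targets observed by the selected locations."""
    covered = set()
    for loc in selection:
        covered |= covers[loc]
    return len(covered)


def greedy_placement(covers, k):
    """Pick k locations one at a time, always taking the best immediate gain."""
    chosen = []
    for _ in range(k):
        remaining = [loc for loc in covers if loc not in chosen]
        best = max(remaining, key=lambda loc: coverage(chosen + [loc], covers))
        chosen.append(best)
    return chosen


def exhaustive_placement(covers, k):
    """Evaluate every combination of k locations and keep the best one."""
    return list(max(combinations(covers, k), key=lambda sel: coverage(sel, covers)))


# Hypothetical problem: which targets (1..6) each candidate location observes.
covers = {
    "A": {1, 2, 3, 4},
    "B": {1, 2, 5},
    "C": {3, 4, 6},
}

greedy = greedy_placement(covers, 2)
best = exhaustive_placement(covers, 2)
greedy_score = coverage(greedy, covers)
best_score = coverage(best, covers)
```

With two sensors, the greedy strategy first takes `A` because it observes the most targets on its own, and can then reach only five targets in total. The combination of `B` and `C` observes all six, but it is never considered by the greedy strategy because neither location is the best first choice.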
For further details, see my publication page.
Feature Selection
Ongoing work...