Clustering problem

Published 1/22/2017 11:05:27 AM | Last update 11/7/2021 07:51:34 AM
Tags: cluster computing, cluster, cluster analysis

Clustering in data mining refers to the grouping of data points within a dataset based on their similar properties. Data points within a cluster are highly similar to each other and can be discriminated from data points within other clusters. Successful clustering, therefore, maximizes both the compactness of data points within a cluster and the discrimination between clusters.

Given a set of n objects X = {x₁, x₂,…,x_n}, let Θ = {U, V}, where cluster set V = {v₁, v₂,…,v_c} and partition matrix U = {u_ki}, k=1,…,c, i=1,…,n, be a partition of X such that X = ∪_{k=1}^{c} v_k and Σ_{k=1}^{c} u_ki = 1, i=1,…,n.

Each subset v_k of X is called a cluster, and u_ki is the membership degree of x_i in v_k. If Θ is a crisp partition, u_ki ∈ {0,1}; otherwise, u_ki ∈ [0,1]. The goal of cluster analysis is to assign objects to clusters such that objects in the same cluster are highly similar to each other while objects in different clusters are as dissimilar as possible. These two sub-goals give rise to the compactness and separation factors, which are used not only to model the clustering objectives but also to evaluate the clustering result. Both factors can be formulated mathematically in many different ways, which leads to numerous clustering models.
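To make the notation concrete, the following Python sketch builds a crisp partition matrix U for a toy dataset and computes one possible pair of compactness and separation values. Everything in it is an assumption made for illustration: the sample data, the function names, the check that each object's memberships sum to 1, and the particular formulas chosen (average membership-weighted distance to the cluster centre for compactness, smallest distance between cluster centres for separation). As noted above, many other formulations exist.

```python
import math

# Hypothetical toy data: n = 6 objects in a 2-dimensional space.
X = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
     (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]

# Crisp partition matrix U with c = 2 rows (clusters) and n = 6 columns (objects).
# U[k][i] = 1 means object i belongs to cluster k.
U = [[1, 1, 1, 0, 0, 0],
     [0, 0, 0, 1, 1, 1]]


def is_valid_partition(U, tol=1e-9):
    """Check that memberships lie in [0, 1] and that each object's memberships sum to 1."""
    n = len(U[0])
    in_range = all(0.0 <= u <= 1.0 for row in U for u in row)
    sums_to_one = all(abs(sum(row[i] for row in U) - 1.0) <= tol for i in range(n))
    return in_range and sums_to_one


def centroids(X, U):
    """Membership-weighted mean of the objects in each cluster (assumes no empty cluster)."""
    p = len(X[0])
    result = []
    for row in U:
        total = sum(row)
        result.append(tuple(sum(u * x[j] for u, x in zip(row, X)) / total
                            for j in range(p)))
    return result


def compactness(X, U, V):
    """Average membership-weighted distance of objects to their cluster centres (lower is tighter)."""
    n = len(X)
    return sum(u * math.dist(x, v)
               for row, v in zip(U, V)
               for u, x in zip(row, X)) / n


def separation(V):
    """Smallest distance between any two cluster centres (higher is better separated)."""
    return min(math.dist(V[a], V[b])
               for a in range(len(V))
               for b in range(a + 1, len(V)))


V = centroids(X, U)
print("valid partition:", is_valid_partition(U))
print("compactness:", round(compactness(X, U, V), 3))
print("separation:", round(separation(V), 3))
```

The same functions accept fuzzy memberships in [0,1], since the centres and the compactness value are weighted by u_ki rather than by a hard assignment.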

An object data matrix X
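As a small illustration of the two dataset representations discussed in this section, the sketch below stores a hypothetical object data matrix with n = 4 objects (rows) and p = 2 attributes (columns) and derives the pairwise distance matrix from it. The sample values, the function name `distance_matrix`, and the choice of Euclidean distance are assumptions; any other distance measure could be substituted.

```python
import math

# Hypothetical object data matrix X: n = 4 objects (rows), p = 2 attributes (columns).
X = [
    [0.0, 0.0],
    [3.0, 4.0],
    [6.0, 8.0],
    [0.0, 5.0],
]


def distance_matrix(X):
    """Return the n x n matrix D where D[i][j] is the Euclidean distance between rows i and j."""
    n = len(X)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(X[i], X[j])
            D[i][j] = D[j][i] = d  # distances are symmetric
    return D


D = distance_matrix(X)

# Translating every object by the same offset changes X but leaves D unchanged,
# which is one reason X cannot be fully recovered from D alone.
X_shifted = [[a + 10.0, b - 2.0] for a, b in X]
D_shifted = distance_matrix(X_shifted)

for row in D:
    print([round(d, 3) for d in row])
```

Because two different data matrices (here `X` and `X_shifted`) produce the same distances, the distance matrix carries strictly less information than the object data matrix.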

A dataset containing the objects to be clustered is usually represented in one of two formats: the object data matrix or the object distance matrix. In an object data matrix, the rows usually represent the objects and the columns represent the attributes of the objects in the context where the objects occur. The roles of the rows and columns can be interchanged, but this layout is preferred because the number of objects is typically much larger than the number of attributes. Assume we have n objects in a p-dimensional data space. The object data matrix X then has n rows and p columns, where x_ij is the value of object i in the j-th dimension and x_i is the vector (or object vector) of object i across the p dimensions.

The distance matrix contains the pairwise distances (or dissimilarities) between objects. Specifically, entry (i,j) of the distance matrix is the distance between objects i and j, 1≤i,j≤n. This distance can be computed from the object vectors x_i and x_j in the object data matrix using a chosen distance measure. However, the object data matrix cannot be fully recovered from the distance matrix, especially when the value of p is unknown.

REFERENCES

Thanh Le (2013) A Machine Learning approach for Gene Expression analysis and applications.