Data Mining - Grid-Based Clustering Method, Study notes of Data Mining

Detailed summary of cluster analysis: what cluster analysis is, types of data in cluster analysis, hierarchical methods, density-based methods, grid-based methods.

Typology: Study notes

2010/2011

Uploaded on 09/04/2011

amit-mohta

November 20, 2014 Data Mining: Concepts and Techniques
Chapter 7. Cluster Analysis
1. What is Cluster Analysis?
2. Types of Data in Cluster Analysis
3. A Categorization of Major Clustering Methods
4. Partitioning Methods
5. Hierarchical Methods
6. Density-Based Methods
7. Grid-Based Methods
8. Model-Based Methods
9. Clustering High-Dimensional Data
10. Constraint-Based Clustering
11. Outlier Analysis
12. Summary

Grid-Based Clustering Method

  • Using multi-resolution grid data structure
  • Basic grid-based algorithm
    1. Define a set of grid cells
    2. Assign objects to the appropriate grid cell and compute the density of each cell
    3. Eliminate cells whose density is below a certain threshold t
    4. Form clusters from contiguous (adjacent) groups of dense cells (usually minimizing a given objective function)
  • Several interesting methods
    • STING (a STatistical INformation Grid approach) by Wang, Yang and Muntz (1997)
    • WaveCluster by Sheikholeslami, Chatterjee, and Zhang (VLDB’98)
      • A multi-resolution clustering approach using wavelet method
    • CLIQUE: Agrawal, et al. (SIGMOD’98)
      • On high-dimensional data (thus put in the section on clustering high-dimensional data)
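
The four steps of the basic grid-based algorithm can be sketched in a few lines of Python. This is a minimal illustration, not any of the published methods listed above. The function name `grid_cluster`, the uniform `cell_size`, and the choice to treat diagonal cells as adjacent are all assumptions made for the example, and the optional objective function from step 4 is left out.

```python
from collections import Counter, deque
from itertools import product


def grid_cluster(points, cell_size, threshold):
    """Group adjacent dense grid cells into clusters.

    Returns a dict mapping each dense cell (a tuple of ints) to a cluster id.
    """
    # Steps 1-2: map every point to its cell and count points per cell.
    density = Counter(
        tuple(int(coord // cell_size) for coord in point) for point in points
    )
    # Step 3: eliminate cells whose density is below the threshold t.
    dense = {cell for cell, n in density.items() if n >= threshold}

    # Step 4: flood-fill over neighbouring dense cells.
    # Assumption: cells that touch diagonally also count as adjacent.
    labels = {}
    next_id = 0
    for start in sorted(dense):
        if start in labels:
            continue
        labels[start] = next_id
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for offset in product((-1, 0, 1), repeat=len(cell)):
                neighbour = tuple(c + o for c, o in zip(cell, offset))
                if neighbour in dense and neighbour not in labels:
                    labels[neighbour] = next_id
                    queue.append(neighbour)
        next_id += 1
    return labels
```

Because the flood fill only visits cells, the cost of step 4 depends on the number of dense cells rather than on the number of objects, which is the usual motivation for grid-based methods.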

The STING Clustering Method

  • Each cell at a high level is partitioned into a number of smaller cells at the next lower level
  • Statistical information for each cell is calculated and stored beforehand and is used to answer queries
  • Parameters of higher-level cells can be easily calculated from the parameters of lower-level cells
    • count, mean, s (standard deviation), min, max
    • type of distribution: normal, uniform, etc.
  • Use a top-down approach to answer spatial data queries
  • Start from a pre-selected layer, typically one with a small number of cells
  • For each cell in the current level, compute the confidence interval
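
To make concrete how a higher-level cell's parameters can be derived from its lower-level cells, here is a small sketch. The names `CellStats` and `merge_children` are hypothetical, and the code assumes that s is the population standard deviation; the distribution type is not handled here.

```python
import math
from dataclasses import dataclass


@dataclass
class CellStats:
    count: int
    mean: float
    std: float      # "s" in the notes; population standard deviation assumed
    minimum: float
    maximum: float


def merge_children(children):
    """Derive a parent cell's statistics from its child cells."""
    non_empty = [c for c in children if c.count > 0]
    n = sum(c.count for c in non_empty)
    if n == 0:
        # An empty parent: no values, so min/max are left at their identities.
        return CellStats(0, 0.0, 0.0, math.inf, -math.inf)
    # Parent mean is the count-weighted average of the child means.
    mean = sum(c.count * c.mean for c in non_empty) / n
    # Parent E[x^2] is the count-weighted average of each child's E[x^2],
    # where a child's E[x^2] equals std^2 + mean^2.
    mean_sq = sum(c.count * (c.std ** 2 + c.mean ** 2) for c in non_empty) / n
    std = math.sqrt(max(mean_sq - mean ** 2, 0.0))
    return CellStats(
        n,
        mean,
        std,
        min(c.minimum for c in non_empty),
        max(c.maximum for c in non_empty),
    )
```

No raw data points are touched: the parent is computed from the stored child summaries alone, which is what allows the statistics to be precomputed bottom-up.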

Comments on STING

  • Remove the irrelevant cells from further consideration
  • When finished examining the current layer, proceed to the next lower level
  • Repeat this process until the bottom layer is reached
  • Advantages:
    • Query-independent, easy to parallelize, incremental update
    • Query processing time is O(K), where K is the number of grid cells at the lowest level
  • Disadvantages:
    • All the cluster boundaries are either horizontal or vertical, and no diagonal boundary is detected
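
The top-down query procedure (start at a chosen layer, discard irrelevant cells, descend, repeat until the bottom layer) might be sketched as follows. The `Cell` class and the `is_relevant` callback are hypothetical; in STING the relevance decision would come from the confidence interval computed for each cell, which is replaced here by a caller-supplied predicate. The sketch also assumes every branch of the hierarchy has the same depth.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    name: str
    count: int
    children: list = field(default_factory=list)


def top_down_query(start_layer, is_relevant):
    """Return the bottom-layer cells reached through relevant ancestors only."""
    layer = list(start_layer)
    while True:
        # Remove irrelevant cells from further consideration.
        relevant = [cell for cell in layer if is_relevant(cell)]
        # Proceed to the next lower level, visiting only surviving cells.
        next_layer = [child for cell in relevant for child in cell.children]
        if not next_layer:
            # Bottom layer reached (or nothing relevant is left).
            return relevant
        layer = next_layer
```

Only the children of relevant cells are ever examined, so large irrelevant regions are pruned at a coarse level without visiting their lower-level cells.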