INTUITIVE DATA UNDERSTANDING: INTERACTIVE VISUALIZATIONS OF K-MEANS AND HIERARCHICAL CLUSTERING TECHNIQUES
By Aditi Rajesh Sharma
Research Article
ISSN: 3067-4395
DOI Prefix: 10.5281/zenodo.
Abstract
This study examines the structure of geothermal geochemistry data from India using k-means and hierarchical cluster analyses. The first attempt to list the hot springs of India was made by Schlagintweit in 1852. The Geological Survey of India (GSI) published a special publication titled 'Geothermal Atlas of India', and the Government of India constituted a 'Hot Spring Committee' to examine the possibility of developing geothermal plants for power generation and other uses. For the Puga valley and Parvati projects, estimates indicate that 5000 MWh of geothermal energy could be harnessed from the Puga valley, sufficient to sustain a 20 MWe power plant.
The GSI, the repository of most geological and related data in the country, included R&D item No. 7/WB-5 in its 1993-94 field season program for the development of a computerised geothermal database system referred to as GTHERMIS. The computational strategy of k-means involves assigning each data point to the cluster with the nearest center (or "centroid"), recalculating the centroids after each assignment, and repeating the procedure until the cluster assignments no longer change.
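The assign-and-update loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's own implementation: it uses the batch variant (centroids are recomputed after every full pass over the data rather than after each single assignment), Euclidean distance, and random data points as starting centroids. The function name kmeans and its parameters are hypothetical.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Batch k-means sketch: assign, recompute centroids, repeat until stable."""
    rng = random.Random(seed)
    # Assumption: start from k distinct data points picked at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Stop once no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

The result depends on the starting centroids, so in practice the procedure is often run from several random starts and the partition with the smallest within-cluster distances is kept.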
The k-means method, developed by MacQueen (1967), is one of the most widely used non-hierarchical clustering methods and is particularly suitable for large amounts of data. The data is scaled before cluster analysis, and the results are visualized using interactive graphics.
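Scaling matters because distance-based clustering is otherwise dominated by whichever variable has the largest numeric range. The text does not state which scaling was applied, so the sketch below assumes z-score standardization (each variable shifted to zero mean and divided by its standard deviation). The function name standardize is hypothetical.

```python
import statistics


def standardize(rows):
    """Z-score each column: subtract the column mean, divide by its std. dev."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation; a constant column is left unscaled (divisor 1).
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        [(x - m) / s for x, m, s in zip(row, means, stds)]
        for row in rows
    ]
```

After this step every variable contributes on a comparable scale to the Euclidean distances used by k-means and by hierarchical clustering.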