Unconventional application of k-means for distributed approximate similarity search

Ortega, Felipe; Algar, Maria Jesus; Martín de Diego, Isaac; Martínez Moguerza, Javier

doi:10.1016/j.ins.2022.11.024

dc.contributor.author	Ortega, Felipe
dc.contributor.author	Algar, Maria Jesus
dc.contributor.author	Martín de Diego, Isaac
dc.contributor.author	Martínez Moguerza, Javier
dc.date.accessioned	2023-09-22T11:25:58Z
dc.date.available	2023-09-22T11:25:58Z
dc.date.issued	2022
dc.identifier.citation	Felipe Ortega, Maria Jesus Algar, Isaac Martín de Diego, Javier M. Moguerza, Unconventional application of k-means for distributed approximate similarity search, Information Sciences, Volume 619, 2023, Pages 208-234, ISSN 0020-0255, https://doi.org/10.1016/j.ins.2022.11.024	es
dc.identifier.issn	0020-0255
dc.identifier.uri	https://hdl.handle.net/10115/24488
dc.description.abstract	Similarity search based on a distance function in metric spaces is a fundamental problem for many applications. Queries for similar objects lead to the well-known machine learning task of nearest-neighbours identification. Many data indexing strategies, collectively known as Metric Access Methods (MAM), have been proposed to speed up these queries. Moreover, since exact approaches to solving similarity queries can be complex and time-consuming, alternative options have emerged to reduce query execution time, such as returning approximate results or resorting to distributed computing platforms. In this paper, we introduce MASK (Multilevel Approximate Similarity search with k-means), an unconventional application of the k-means algorithm as the foundation of a multilevel index structure for approximate similarity search suitable for metric spaces. We show that this method leverages inherent properties of k-means for this purpose, like representing high-density data areas with fewer prototypes. An implementation of this new indexing procedure is evaluated using a synthetic dataset and two real-world datasets in high-dimensional and high-sparsity spaces. Experimental tests show that MASK performs better than alternative algorithms for approximate similarity search. Results are promising and underpin the applicability of this novel indexing method in multiple domains.	es
dc.language.iso	eng	es
dc.publisher	Elsevier	es
dc.rights	Attribution 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/	*
dc.subject	Data indexing	es
dc.subject	Approximate similarity search	es
dc.subject	Metric distance	es
dc.subject	Unsupervised learning	es
dc.subject	Distributed computing	es
dc.subject	k-means	es
dc.title	Unconventional application of k-means for distributed approximate similarity search	es
dc.type	info:eu-repo/semantics/article	es
dc.identifier.doi	10.1016/j.ins.2022.11.024	es
dc.rights.accessRights	info:eu-repo/semantics/openAccess	es
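The abstract above describes MASK only at a high level: k-means prototypes serve as the levels of an index, and a query descends through the closest prototypes instead of scanning the whole dataset. As a rough illustration of that general idea, here is a minimal single-machine sketch in Python. It is not the authors' MASK algorithm, which is distributed and specified in the paper. The class name `TwoLevelKMeansIndex`, the helper `kmeans`, the parameters `k_leaf`, `k_top`, `top_probe` and `leaf_probe`, the fixed two-level depth and the use of Euclidean distance are all hypothetical choices made for this example.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance (assumption: any other metric could be used here)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Point], k: int, iters: int = 20, seed: int = 0):
    """Plain Lloyd's k-means. Returns (centroids, label of each input point)."""
    rng = random.Random(seed)
    k = min(k, len(points))
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach every point to its closest centroid.
        labels = [min(range(k), key=lambda c: euclidean(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members
        # (a centroid with no members keeps its previous position).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


class TwoLevelKMeansIndex:
    """Hypothetical two-level prototype index (illustration only, not MASK)."""

    def __init__(self, points: List[Point], k_leaf: int = 8, k_top: int = 3, seed: int = 0):
        self.points = points
        # Lower level: k-means over the data; each prototype owns a bucket of point ids.
        self.leaf_centroids, leaf_labels = kmeans(points, k_leaf, seed=seed)
        self.buckets: Dict[int, List[int]] = {c: [] for c in range(len(self.leaf_centroids))}
        for i, leaf in enumerate(leaf_labels):
            self.buckets[leaf].append(i)
        # Upper level: k-means over the leaf prototypes themselves.
        self.top_centroids, top_labels = kmeans(self.leaf_centroids, k_top, seed=seed)
        self.children: Dict[int, List[int]] = {t: [] for t in range(len(self.top_centroids))}
        for leaf, top in enumerate(top_labels):
            self.children[top].append(leaf)

    def query(self, q: Point, n_neighbours: int = 1,
              top_probe: int = 1, leaf_probe: int = 2) -> List[int]:
        """Approximate k-NN: descend to the closest prototypes, then scan only their buckets."""
        tops = sorted(range(len(self.top_centroids)),
                      key=lambda t: euclidean(q, self.top_centroids[t]))[:top_probe]
        leaves = sorted((leaf for t in tops for leaf in self.children[t]),
                        key=lambda leaf: euclidean(q, self.leaf_centroids[leaf]))[:leaf_probe]
        candidates = [i for leaf in leaves for i in self.buckets[leaf]]
        candidates.sort(key=lambda i: euclidean(q, self.points[i]))
        return candidates[:n_neighbours]


if __name__ == "__main__":
    # Three well-separated synthetic groups of 9 points each.
    centres = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
    data = [(cx + dx, cy + dy) for cx, cy in centres
            for dx in (0, 0.5, 1.0) for dy in (0, 0.5, 1.0)]
    index = TwoLevelKMeansIndex(data, k_leaf=6, k_top=3)
    print(index.query((10.2, 10.3), n_neighbours=3))
```

Setting `top_probe` and `leaf_probe` high enough to cover every prototype turns the query into an exhaustive scan; smaller values examine fewer points at the risk of missing a true neighbour, which is the speed/accuracy trade-off that approximate similarity search accepts.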

Files in this item

Name: 1-s2.0-S0020025522013056-main.pdf
Size: 4.822 MB
Format: PDF

This item appears in the following Collection(s)

Artículos de Revista (Journal Articles) [3696]

Except where otherwise noted, this item's license is described as Attribution 4.0 International