SALSA algorithm
Stochastic Approach for Link-Structure Analysis (SALSA) is a web page ranking algorithm designed by R. Lempel and S. Moran to assign high scores to hub and authority web pages based on the number of hyperlinks among them.[1]
Origins
SALSA is inspired by two other link-based ranking algorithms, namely HITS and PageRank, in the following ways:
- like HITS, the algorithm assigns two scores to each web page: a hub score and an authority score. An authority is a page which is significantly more relevant to a given topic than other pages, whereas a hub is a page which contains many links to authorities;
- like HITS, SALSA also works on a focused subgraph which is topic-dependent. This focused subgraph is obtained by first finding a set of pages most relevant to a given topic (e.g. take the top-n pages returned by a text-based search algorithm) and then augmenting this set with web pages that link directly to it and with pages that are linked directly from it. Because of this selection process, the hub and authority scores are topic-dependent;
- like PageRank, the algorithm computes the scores by simulating a random walk through a Markov chain that represents the graph of web pages. SALSA, however, works with two different Markov chains: a chain of hubs and a chain of authorities, in which each step traverses one link forward and one link backward. This departs from HITS, which defines hubs and authorities through a mutually reinforcing relationship (good hubs point to good authorities, and good authorities are pointed to by good hubs).
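The construction in the list above can be sketched in Python. This is a minimal illustration under stated assumptions, not Lempel and Moran's implementation: `focused_subgraph` and `salsa_walk` are hypothetical names, the base-set expansion omits refinements that practical systems add (such as capping how many in-linking pages are taken per root page), and the scores are read off as the distributions of the two chains after a fixed number of power-iteration steps, each chain started from the uniform distribution over its own side. A hub step follows a random out-link to an authority and then a random in-link back to a hub; an authority step does the same two moves in the opposite order.

```python
from collections import defaultdict


def focused_subgraph(root, links):
    """Root set plus every page it links to and every page linking to it.

    Simplified (assumption): no cap on in-linking pages per root page.
    """
    base = set(root)
    for src, dst in links:
        if src in root:
            base.add(dst)
        if dst in root:
            base.add(src)
    # Keep only links whose endpoints both lie inside the expanded set.
    return {(s, d) for s, d in links if s in base and d in base}


def salsa_walk(edges, iterations=100):
    """Hub and authority scores as distributions of SALSA's two Markov chains."""
    out_nbrs, in_nbrs = defaultdict(list), defaultdict(list)
    for s, d in set(edges):
        out_nbrs[s].append(d)  # s acts as a hub
        in_nbrs[d].append(s)   # d acts as an authority
    # Each chain starts from the uniform distribution over its own side.
    hub = {i: 1.0 / len(out_nbrs) for i in out_nbrs}
    auth = {j: 1.0 / len(in_nbrs) for j in in_nbrs}
    for _ in range(iterations):
        # Hub chain: random out-link to an authority, then random in-link back to a hub.
        next_hub = dict.fromkeys(out_nbrs, 0.0)
        for i, p in hub.items():
            for j in out_nbrs[i]:
                for k in in_nbrs[j]:
                    next_hub[k] += p / len(out_nbrs[i]) / len(in_nbrs[j])
        # Authority chain: random in-link back to a hub, then random out-link forward.
        next_auth = dict.fromkeys(in_nbrs, 0.0)
        for j, p in auth.items():
            for i in in_nbrs[j]:
                for k in out_nbrs[i]:
                    next_auth[k] += p / len(in_nbrs[j]) / len(out_nbrs[i])
        hub, auth = next_hub, next_auth
    return hub, auth


# Hypothetical toy web graph; page names are illustrative only.
links = [("h1", "a1"), ("h1", "a2"), ("h2", "a1"), ("h3", "a1"), ("h3", "a3"),
         ("x", "y")]
edges = focused_subgraph({"h1", "a1"}, links)  # drops ("h3", "a3") and ("x", "y")
hub, auth = salsa_walk(edges)
print({p: round(v, 3) for p, v in sorted(hub.items())})   # {'h1': 0.5, 'h2': 0.25, 'h3': 0.25}
print({p: round(v, 3) for p, v in sorted(auth.items())})  # {'a1': 0.75, 'a2': 0.25}
```

On this toy graph, which is connected, the hub scores come out proportional to out-degree and the authority scores proportional to in-degree.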
Properties
SALSA can be seen as an improvement on HITS.
It is computationally lighter, since its ranking is equivalent to a weighted in-degree ranking for authorities and a weighted out-degree ranking for hubs. The computational cost of the algorithm is a crucial factor because HITS and SALSA are computed at query time and can therefore significantly affect the response time of a search engine. By contrast, query-independent algorithms such as PageRank can be computed offline.
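The degree equivalence can be made concrete. For the authority chain, a distribution proportional to in-degree is stationary within each connected component of the bipartite hub/authority graph (and likewise for the hub chain with out-degree), while a walk started uniformly over authorities keeps each component's share of mass equal to its fraction of all authorities. That gives a(j) = (|A_c| / |A|) · (in(j) / |E_c|), where c is the component containing authority j, A_c and E_c are that component's authorities and links, and A is the set of all authorities. The sketch below, using the hypothetical function `salsa_closed_form`, computes these scores with one union-find pass and no iteration; the component weighting assumes that uniform starting distribution.

```python
from collections import defaultdict


def salsa_closed_form(edges):
    """SALSA scores from degrees, weighted by connected component (no iteration)."""
    edges = set(edges)
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Union each link's hub side with its authority side. A page that both
    # links and is linked to appears twice: ("hub", p) and ("auth", p).
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for s, d in edges:
        out_deg[s] += 1
        in_deg[d] += 1
        parent[find(("hub", s))] = find(("auth", d))

    # Per component: number of hubs, number of authorities, number of links.
    n_hubs, n_auths, n_links = defaultdict(int), defaultdict(int), defaultdict(int)
    for s in out_deg:
        n_hubs[find(("hub", s))] += 1
    for d in in_deg:
        n_auths[find(("auth", d))] += 1
    for s, _ in edges:
        n_links[find(("hub", s))] += 1

    hubs, auths = {}, {}
    for s, k in out_deg.items():
        c = find(("hub", s))
        hubs[s] = (n_hubs[c] / len(out_deg)) * (k / n_links[c])
    for d, k in in_deg.items():
        c = find(("auth", d))
        auths[d] = (n_auths[c] / len(in_deg)) * (k / n_links[c])
    return hubs, auths


# Hypothetical graph with two components: {h1, h2, a1, a2} and {h3, a3}.
links = [("h1", "a1"), ("h1", "a2"), ("h2", "a1"), ("h3", "a3")]
hubs, auths = salsa_closed_form(links)
print({p: round(v, 3) for p, v in sorted(auths.items())})  # {'a1': 0.444, 'a2': 0.222, 'a3': 0.333}
```

Here a3 outranks a2 even though both have in-degree 1, because a3 is the only authority in its component; this is the "weighted" part of the degree ranking.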
SALSA is less vulnerable to the Tightly Knit Community (TKC) effect than HITS. A TKC is a topological structure within the Web that consists of a small set of highly interconnected pages. The presence of TKCs in a focused subgraph is known to negatively affect the detection of meaningful authorities by HITS.
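The TKC effect can be reproduced on a small hypothetical graph. Five hubs link to one popular page, `big`, one of them also links into a four-page community in which every page links to every other, and the sketch compares plain HITS (power iteration with L2 normalization) against SALSA's authority chain. The function names and the graph are illustrative assumptions. Because the community's mutual links give A^T A (A being the adjacency matrix) a larger principal eigenvalue than the single popular page does, HITS concentrates authority weight on the community, whereas SALSA, whose authority scores on this connected graph are proportional to in-degree, ranks `big` (in-degree 5) first.

```python
import math
from collections import defaultdict


def hits_authorities(edges, iterations=100):
    """HITS by power iteration: a = A^T h, then h = A a, each L2-normalized."""
    nodes = {p for edge in edges for p in edge}
    hub = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        auth = dict.fromkeys(nodes, 0.0)
        for s, d in edges:
            auth[d] += hub[s]
        norm = math.sqrt(sum(v * v for v in auth.values()))
        auth = {p: v / norm for p, v in auth.items()}
        hub = dict.fromkeys(nodes, 0.0)
        for s, d in edges:
            hub[s] += auth[d]
        norm = math.sqrt(sum(v * v for v in hub.values()))
        hub = {p: v / norm for p, v in hub.items()}
    return auth


def salsa_authorities(edges, iterations=200):
    """SALSA authority chain: back along a random in-link, forward along a random out-link."""
    out_nbrs, in_nbrs = defaultdict(list), defaultdict(list)
    for s, d in edges:
        out_nbrs[s].append(d)
        in_nbrs[d].append(s)
    auth = {j: 1.0 / len(in_nbrs) for j in in_nbrs}
    for _ in range(iterations):
        nxt = dict.fromkeys(in_nbrs, 0.0)
        for j, p in auth.items():
            for i in in_nbrs[j]:
                for k in out_nbrs[i]:
                    nxt[k] += p / len(in_nbrs[j]) / len(out_nbrs[i])
        auth = nxt
    return auth


# Hypothetical focused subgraph: hubs h1..h5 all link to a popular page "big",
# h1 also links to t1, and t1..t4 form a tightly knit community (all pairwise links).
tkc = ["t1", "t2", "t3", "t4"]
edges = [(f"h{n}", "big") for n in range(1, 6)] + [("h1", "t1")]
edges += [(s, d) for s in tkc for d in tkc if s != d]

hits = hits_authorities(edges)
salsa = salsa_authorities(edges)
print("HITS top authority: ", max(hits, key=hits.get))    # t1
print("SALSA top authority:", max(salsa, key=salsa.get))  # big
```

In this example HITS ranks `big` below every community page despite its higher in-degree, which is the distortion the TKC paragraph describes.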
The Twitter social network has used a SALSA-style algorithm in its who-to-follow service to suggest accounts to follow.[2]
References
- ^ Wang, Ziyang. "Improved Link-Based Algorithms for Ranking Web Pages" (PDF). cs.nyu.edu. New York University, Department of Computer Science. Retrieved 7 August 2023.
- ^ Gupta, Pankaj; Goel, Ashish; Lin, Jimmy; Sharma, Aneesh; Wang, Dong; Zadeh, Reza Bosagh (2013). "WTF: The Who to Follow Service at Twitter". Proceedings of the 22nd International Conference on World Wide Web.
- Lempel, R.; Moran, S. (April 2001). "SALSA: The Stochastic Approach for Link-Structure Analysis". ACM Transactions on Information Systems. 19 (2): 131–160. CiteSeerX 10.1.1.38.5859. doi:10.1145/382979.383041. S2CID 9607841.