Set intersection oracle
This article may be too technical for most readers to understand. (February 2015) |
A set intersection oracle (SIO) is a data structure which represents a collection of sets and can quickly answer queries about whether the set intersection of two given sets is non-empty.
The input to the problem is n finite sets. The sum of the sizes of all sets is N (which also means that there are at most N distinct elements). The SIO should quickly answer any query of the form:
- "Does the set Si intersect the set Sk"?
Minimum memory, maximum query time
Without any pre-processing, a query can be answered by inserting the elements of Si into a temporary hash table and then checking for each element of Sk whether it is in the hash table. The query time is .
Maximum memory, minimum query time
Alternatively, we can pre-process the sets and create an n-by-n table where the intersection information is already entered. Then the query time is , but the memory required is .
A compromise
Define a "large set" as a set with at least elements. Obviously there are at most such sets. Create a table of intersection data between every large set to every other large set. This requires memory. Additionally, for each large set, keep a hash table of all its elements. This requires additional memory.
Given two sets, there are three possible cases:
- Both sets are large. Then just read the answer to the intersection query from the table, in time .
- Both sets are small. Then insert the elements of one of them into a hash table and check the elements of the other one; because the sets are small, the required time is .
- One set is large and one set is small. Loop over all elements in the small set and check them against the hash table of the large set. The required time is again .
In general, if we define a "large set" as a set with at least elements, then the number of large set is at most so the memory required is , and the query time is .
Reduction to approximate distance oracle
The SIO problem can be reduced to the approximate distance oracle (DO) problem, in the following way.[1]
- Build an undirected bipartite graph where one part contains a node for each of the n sets, and the other part contains a node for each of the (at most) N elements contained in the sets.
- There is an edge between a set and an element, iff the set contains the element.
This graph has the following properties:
- If two sets intersect, the distance between them is 2 (from one set, to an element in the intersection, to the other set).
- If two sets do not intersect, the distance between them is at least 4.
So, with a DO whose approximation factor of less than 2, we can solve the SIO problem.
It is believed that the SIO problem does not have a non-trivial solution. I.e., it requires space to answer queries in time . If this conjecture is true, this implies that there is no DO with an approximation factor of less than 2 and a constant query time.[1]
References
- ^ a b Patrascu, M.; Roditty, L. (2010). Distance Oracles beyond the Thorup–Zwick Bound. 2010 IEEE 51st Annual Symposium on Foundations of Computer Science (FOCS). p. 815. doi:10.1109/FOCS.2010.83. ISBN 978-1-4244-8525-3.
Content Disclaimer
Informasi ini disarikan dari Wikipedia dan disajikan kembali untuk tujuan edukasi. Konten tersedia di bawah lisensi CC BY-SA 3.0. Kami tidak bertanggung jawab atas ketidakakuratan data yang bersumber dari kontribusi publik tersebut.
- The information displayed on this website is sourced in part or in whole from Wikipedia and has been adapted for the purpose of restating it. We strive to provide accurate and relevant information, however:
- There is no guarantee of absolute accuracy. Wikipedia is an open, collaborative project that can be edited by anyone, so information is subject to change.
- It is not intended to constitute professional advice. The content displayed is for informational and educational purposes only. For important decisions (e.g., medical, legal, or financial), please consult a professional.
- Content copyright. Wikipedia is licensed under the Creative Commons Attribution-ShareAlike License (CC BY-SA). This means that content may be reused with appropriate attribution and shared under a similar license.
- Responsible use. Any risk arising from the use of information from this website is entirely the responsibility of the user.