Efficient List Intersection Algorithm for Short Documents by Document Reordering
Abstrak
List intersection plays a pivotal role in various domains such as search engines, database systems, and social networks. Efficient indexes and query strategies can significantly enhance the efficiency of list intersection. Existing inverted index-based algorithms fail to utilize the length information of documents and require excessive list intersections, resulting in lower efficiency. To address this issue, in this paper, we propose the LDRpV (Length-based Document Reordering plus Verification) algorithm. LDRpV filters out documents that are unlikely to satisfy the intersection results by reordering documents based on their length, thereby reducing the number of candidates. Additionally, to minimize the number of list intersection operations, an intersection and verification strategy is designed, where only the first <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mi>m</mi></semantics></math></inline-formula> lists are intersected, and the resulting candidate set is directly verified. This approach effectively improves the efficiency of list intersection. Experimental results on four real datasets demonstrate that LDRpV can achieve a maximum efficiency improvement of 46.69% compared to the most competitive counterparts.
Topik & Kata Kunci
Penulis (4)
Lianyin Jia
Dongyang Li
Haihe Zhou
Fengling Xia
Akses Cepat
- Tahun Terbit
- 2024
- Sumber Database
- DOAJ
- DOI
- 10.3390/math12091328
- Akses
- Open Access ✓