KEYWORD SEARCH IN TEXT CUBE: FINDING TOP-K RELEVANT CELLS

KEYWORD SEARCH IN TEXT CUBE: FINDING TOP-K RELEVANT CELLS

BOLIN DING, YINTAO YU, BO ZHAO, CINDY XIDE LIN, JIAWEI HAN, AND CHENGXIANG ZHAI

Abstract. We study the problem of keyword search in a data cube with text-rich dimension(s) (so-called text cube). The text cube is built on a multidimensional text database, where each row is associated with some text data (e.g., a document) and other structural dimensions (attributes). A cell in the text cube aggregates a set of documents with matching attribute values in a subset of dimensions. A cell document is the concatenation of all documents in a cell. Given a keyword query, our goal is to find the top-k most relevant cells (ranked according to the relevance scores of cell documents w.r.t. the given query) in the text cube. We define a keyword-based query language and apply IR-style relevance model for scoring and ranking cell documents in the text cube. We propose two efficient approaches to find the top-k answers. The proposed approaches support a general class of IR-style relevance scoring formulas that satisfy certain basic and common properties. One of them uses more time for pre-processing and less time for answering online queries; and the other one is more efficient in pre-processing and consumes more time for online queries. Experimental studies on the ASRS dataset are conducted to verify the efficiency and effectiveness of the proposed approaches.

Data and Resources

Additional Info

Field Value
Maintainer Elizabeth Foughty
Last Updated February 19, 2025, 11:21 (UTC)
Created February 19, 2025, 11:21 (UTC)
accessLevel public
accrualPeriodicity irregular
bureauCode {026:00}
catalog_@context https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld
catalog_@id https://data.nasa.gov/data.json
catalog_conformsTo https://project-open-data.cio.gov/v1.1/schema
catalog_describedBy https://project-open-data.cio.gov/v1.1/schema/catalog.json
harvest_object_id b60e6d79-36fb-43fe-8a3e-3782aefcb167
harvest_source_id b37e5849-07d2-41cd-8bb6-c6e83fc98f2d
harvest_source_title DNG Legacy Data
identifier DASHLINK_234
issued 2010-10-13
landingPage https://c3.nasa.gov/dashlink/resources/234/
modified 2020-01-29
programCode {026:029}
publisher Dashlink
resource-type Dataset
source_datajson_identifier true
source_hash 0636f064938f96fbc6e9d42de39d0fd630000fc53c0a73332fe12abd58d4f687
source_schema_version 1.1