The details of the DBpedia indices used for generating the baseline runs of the DBpedia-Entity v2 collection (i.e., Table 2 of [1]) are described here.
The following files from the DBpedia 2015-10 dump are indexed:
- Entities without the following predicates are not indexed:
- All predicate values with URIs are resolved by replacing “_” with a space; e.g., the URI http://dbpedia.org/resource/As_We_May_Think becomes “as we may think”.
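The resolution step above can be sketched as follows. This is a minimal illustration, not the code used to build the indices: the function name `resolve_uri` and the constant `RESOURCE_PREFIX` are hypothetical, and stripping the namespace prefix and lowercasing the result are assumptions inferred from the example rather than rules stated in the text.

```python
# Hypothetical helper; prefix stripping and lowercasing are assumptions
# inferred from the "as we may think" example.
RESOURCE_PREFIX = "http://dbpedia.org/resource/"


def resolve_uri(uri: str) -> str:
    """Turn a DBpedia resource URI into a plain-text phrase."""
    # Drop surrounding whitespace and angle brackets, if any.
    uri = uri.strip().strip("<>")
    # Keep only the local name of the resource.
    if uri.startswith(RESOURCE_PREFIX):
        local_name = uri[len(RESOURCE_PREFIX):]
    else:
        local_name = uri.rsplit("/", 1)[-1]
    # Replace underscores with spaces, as described in the text.
    return local_name.replace("_", " ").lower()


print(resolve_uri("http://dbpedia.org/resource/As_We_May_Think"))
```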
The following fields are used for constructing the index, following the general approach outlined in [2]. All fields listed below contain unique phrases.
|Names||Names of the entity||
|Similar entity names||Entity name variants||
|Attributes||Literal attributes of entity||All
|Related entity names||URI relations of entity||Similar to Attributes field, but “o” should be a URI.|
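To make the distinction between the last two fields concrete, here is a minimal sketch that routes an entity's triples into the “attributes” or “related entity names” field depending on whether the object is a literal or a URI, keeping only unique phrases in each field. The triple representation (a tuple with an `obj_is_uri` flag), the function names, the simple URI-to-text conversion, and the sample triples are all hypothetical; predicate filtering for the other fields is not shown.

```python
# Illustrative sketch only; names and data layout are assumptions.
def uri_to_text(uri):
    """Simplified URI resolution: local name with "_" replaced by spaces."""
    return uri.rsplit("/", 1)[-1].replace("_", " ").lower()


def build_fields(triples):
    """Split (predicate, obj, obj_is_uri) triples of one entity into fields.

    Literal objects go to "attributes"; URI objects go to
    "related entity names". Dict keys keep each phrase unique while
    preserving first-seen order.
    """
    attributes = {}
    related = {}
    for _predicate, obj, obj_is_uri in triples:
        if obj_is_uri:
            related[uri_to_text(obj)] = None
        else:
            attributes[obj.strip()] = None
    return {
        "attributes": list(attributes),
        "related entity names": list(related),
    }


# Made-up triples for illustration, not taken from the dump.
sample = [
    ("p1", "http://dbpedia.org/resource/Vannevar_Bush", True),
    ("p2", "An essay", False),
    ("p3", "An essay", False),  # duplicate phrase, stored once
]
print(build_fields(sample))
```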
The baseline runs of the DBpedia-Entity v2 collection [1] are generated using two indices, denoted as index A and index B. These indices, built by two different research groups, share the above settings but differ in the following aspects.
- A new field called “catchall” is used; it encompasses the content of all other fields. Duplicate values are not removed in this field.
- Anchor texts (i.e., contents of the <dbo:wikiPageWikiLinkText> predicate) are added to both the “similar entity names” and “attributes” fields.
- Entity URIs are resolved differently for the “related entity names” field. Names for related entities are extracted in the same way as for the “names” field (see the predicates for “names” in the above table), but only one arbitrary name is used for each related entity.
- Category URIs are resolved using
- Predicate URIs are resolved using the infobox_property_definitions_en.ttl file. If no name is defined for a predicate, that predicate is omitted.
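Two points in the list above, the “catchall” field and the omission of unnamed predicates, can be sketched as follows. This is an illustration under stated assumptions only: the function names, the in-memory dictionaries, and the example label mapping are hypothetical, and parsing of infobox_property_definitions_en.ttl is not shown.

```python
# Illustrative sketch only; names and data layout are assumptions.
def build_catchall(fields):
    """Concatenate the values of all other fields, keeping duplicates."""
    catchall = []
    for name, values in fields.items():
        if name != "catchall":
            catchall.extend(values)
    return catchall


def resolve_predicates(predicate_uris, labels):
    """Map predicate URIs to names; predicates without a name are omitted."""
    return [labels[uri] for uri in predicate_uris if uri in labels]


# Made-up example data.
fields = {
    "names": ["as we may think"],
    "attributes": ["as we may think", "1945"],
}
print(build_catchall(fields))

labels = {"http://dbpedia.org/property/author": "author"}
print(resolve_predicates(
    ["http://dbpedia.org/property/author", "http://dbpedia.org/property/foo"],
    labels,
))
```

Unlike the other fields, which hold unique phrases, the catchall list here deliberately keeps repeated values.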
[1] Faegheh Hasibi, Fedor Nikolaev, Chenyan Xiong, Krisztian Balog, Svein Erik Bratsberg, Alexander Kotov, and Jamie Callan. 2017. “DBpedia-Entity v2: A Test Collection for Entity Search”. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’17). 1265–1268.
[2] Nikita Zhiltsov, Alexander Kotov, and Fedor Nikolaev. 2015. “Fielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of Data”. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’15). 253–262.