Deduplication in search results

Asked by Siegfried Schweizer

It seems to be the default behaviour of Goobi.Presentation searches that hit lists are deduplicated. That is, if a search term is found in two or more substructures of an item, the substructures seem to be removed from the list and the hit links only to the one superstructure, while the total number of hits is still shown for the user's information. This behaviour can be seen at http://digital.slub-dresden.de/: for example, a search for "Dresden" produces a result list with the following annotation:

"Die Suche ergab 5236 Treffer in 4459 Dokumenten.
Einträge 1 bis 25 von 4459."
(In English: "The search returned 5236 hits in 4459 documents. Entries 1 to 25 of 4459.")

Again, this is something we in Berlin do not necessarily deem the best choice for us; at least we didn't implement it that way in our "old" (current) presentation (http://digital.staatsbibliothek-berlin.de/dms/suche/). So, is there a way to switch the search behaviour in Goobi.Presentation to non-deduplication?

Question information

Language: English
Status: Answered
For: Goobi.Presentation

Matthias Ronge (matthias-ronge) said :
#1

I have come across this as well. To a viewer, it is not at all clear what the '5236 Treffer' means. Is it possible to easily remove or disable that line?

Sebastian Meyer (sebastian-meyer) said :
#2

"5236 hits in 4459 documents" means that there were a total of 5236 (already deduplicated) hits in the index, which belong to 4459 separate documents. Those multiple hits are not removed; instead they are aggregated under their respective parent document.

As you can see at http://digital.slub-dresden.de/, some entries of the result list show a little "Details einblenden" ("show details") link which reveals all the hits inside that document. Technically this is just a nested ordered list (at least as long as you use the default template for the listview plugin):

<ol>
 <li>FIRST DOCUMENT
  <ol>
   <li>FIRST HIT IN DOCUMENT</li>
   <li>SECOND HIT IN DOCUMENT</li>
  </ol>
 </li>
 <li>SECOND DOCUMENT</li>
</ol>
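
As a rough illustration of this aggregation (not Goobi.Presentation's actual code), the following Python sketch groups index hits under their parent document and derives both counts for the summary line. The `Hit` structure, its field names and the sample data are assumptions made purely for this example.

```python
from collections import OrderedDict
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical index hit: the parent document and the matching structure."""
    document: str
    structure: str


def aggregate(hits):
    """Group hits by parent document, keeping first-seen document order."""
    grouped = OrderedDict()
    for hit in hits:
        grouped.setdefault(hit.document, []).append(hit.structure)
    return grouped


def describe(grouped):
    """Build the summary line from the total hit count and the document count."""
    total_hits = sum(len(structures) for structures in grouped.values())
    return "%d hits in %d documents" % (total_hits, len(grouped))


# Sample data mirroring the nested list above (illustrative only).
hits = [
    Hit("FIRST DOCUMENT", "FIRST HIT IN DOCUMENT"),
    Hit("FIRST DOCUMENT", "SECOND HIT IN DOCUMENT"),
    Hit("SECOND DOCUMENT", "SECOND DOCUMENT"),
]
grouped = aggregate(hits)
summary = describe(grouped)
print(summary)
```

No hit is dropped here: each one stays reachable under its document, and only the top level of the list shrinks to one entry per document.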

If you want to remove the line, you can either remove the ###LISTDESCRIPTION### placeholder from your listview plugin's template or change the string localization for the term "hits" of your search plugin. The latter can be achieved via TypoScript (just change the strings to whatever you want):

plugin.tx_dlf_search._LOCAL_LANG.default.hits = %d hits found in %d documents.
plugin.tx_dlf_search._LOCAL_LANG.de.hits = Die Suche ergab %d Treffer in %d Dokumenten.
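
The two `%d` placeholders are presumably filled printf-style, first with the hit count and then with the document count; that order is inferred from the wording of the strings, not from the plugin source. A small Python sketch of the substitution, using the numbers from the question (the lookup table and function name are hypothetical):

```python
# Template strings copied from the TypoScript above; the dict itself is illustrative.
LOCAL_LANG = {
    "default": "%d hits found in %d documents.",
    "de": "Die Suche ergab %d Treffer in %d Dokumenten.",
}


def list_description(lang, hits, documents):
    """Pick the template for lang (falling back to default) and fill in both counts."""
    template = LOCAL_LANG.get(lang, LOCAL_LANG["default"])
    return template % (hits, documents)


message = list_description("de", 5236, 4459)
print(message)
```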
