Fulltext Searching

Asked by x-rayman on 2009-06-03

Would it be possible (and practical) to integrate fulltext searching of the files stored within Referencer, not just the metadata?

I'm not certain of the best way to do this. One option is to parse all the files and, where possible, pass the text into a database such as MySQL or SQLite (or both, as Bibus does) and then use its built-in searching capabilities. Another is to use some internal system, or something like swish-e.

I was thinking of trying to write a plugin based on a MySQL database approach. I see that there is already a plugin for searching, but it looks like its main goal is adding entries rather than finding them? I haven't looked, but it would also appear that search-test as a database source is located in the main code somewhere?

PS

I've written a very crude plugin which converts ACS and RSC default PDF filenames into DOI strings. I'd be happy to put it somewhere. It is very basic and not very sophisticated: it works on the premise that ACS publications save by default under the end string of their DOI. RSC publications do a similar thing now, but they seem to have used a different convention previously. The Python plugin works like genkey, but moves the title/filename to the DOI field after adding the correct journal prefix.
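The plugin itself isn't posted here, but the filename-to-DOI idea can be sketched roughly as below. This is only an illustration and not the actual plugin code: the function name `filename_to_doi` and both filename patterns are hypothetical, and the DOI prefixes (10.1021 for ACS, 10.1039 for RSC) are an assumption that should be checked against real downloads. It also leaves out the Referencer plugin hooks entirely.

```python
import os
import re

# Assumed publisher DOI prefixes (not stated in the post above).
ACS_PREFIX = "10.1021"
RSC_PREFIX = "10.1039"

# Hypothetical patterns for default download names; real files may differ,
# and older RSC downloads reportedly followed another convention.
ACS_PATTERN = re.compile(r"^[a-z]{2}\d{6,7}[a-z]?$")  # e.g. ja9012345
RSC_PATTERN = re.compile(r"^[a-z]\d{6}[a-z]$")        # e.g. b812345a


def filename_to_doi(filename):
    """Guess a DOI from a default ACS/RSC PDF filename, or return None."""
    # Drop the directory and the extension, then normalise the case.
    stem = os.path.splitext(os.path.basename(filename))[0].lower()
    if RSC_PATTERN.match(stem):
        return RSC_PREFIX + "/" + stem
    if ACS_PATTERN.match(stem):
        return ACS_PREFIX + "/" + stem
    # Unknown naming scheme: leave the DOI field alone.
    return None
```

For example, `filename_to_doi("ja9012345.pdf")` would give `10.1021/ja9012345`, while a file with an arbitrary name gives `None`, so nothing gets overwritten by a bad guess.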

Question information

Language:
English
Status:
Solved
For:
Referencer
Assignee:
No assignee
Solved by:
x-rayman
Solved:
2009-06-05
Last query:
2009-06-05
Last reply:
x-rayman (ya93hjdqalf9) said : #1

I've written a primitive interface to a hard-coded MySQL database in a plugin. For each PDF an MD5 checksum is created and pdftotext is run; both are loaded into the database, with the pdftotext output stored in a fulltext field.

Searching the fulltext field brings back results with appropriate scoring which could then be fed back into referencer.

For now I'm hijacking test-search to bring the "results" back.

Being rather lazy: how does Referencer identify each file? Just by filename, or do you have some internal method/unique ID?

Testing on my system, it imported 811 entries in 1 minute, which I was pleasantly surprised about!

x-rayman (ya93hjdqalf9) said : #2

Since this is a feature request, I've moved it to the bug list. It's not solved, but I guess I put this in the wrong place.