Extract PDF keywords
Hey, I just came across Referencer and I think it has the potential to be a really great organizational tool. I'm not in the sciences, but I'm interested in using it as a general-purpose organizational utility for a large archive of documents that have been scanned to PDF. I was wondering about the possibility of extracting the embedded Keywords field from the PDF metadata and using it to auto-generate tags. I was hoping that I could implement this as a Python plugin, but it doesn't look like the hooks you have implemented extend that far. I'm afraid I don't have the chops with C/C++ to hack that out myself, but just browsing through the code I notice that your PDF import code uses poppler. I'm reasonably certain poppler supports extracting the keyword data, which should make the whole thing fairly trivial to implement. Thanks in advance for any consideration you give this.
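To make the request concrete, here is a rough sketch of the idea in Python. It is not a Referencer plugin and does not use Referencer's hooks. It assumes poppler's `pdfinfo` command-line tool is installed and that it prints a `Keywords:` line when the field is set. The function names (`split_keywords`, `keywords_from_pdfinfo_output`, `pdf_tags`) are made up for illustration, and splitting on commas and semicolons is a guess at how the keywords are usually delimited.

```python
import re
import subprocess


def split_keywords(raw):
    """Split a raw Keywords string into tags, dropping blanks and duplicates.

    Assumption: keywords are separated by commas or semicolons.
    """
    tags = []
    seen = set()
    for part in re.split(r"[,;]", raw):
        tag = part.strip()
        key = tag.lower()  # compare case-insensitively, keep first spelling
        if tag and key not in seen:
            seen.add(key)
            tags.append(tag)
    return tags


def keywords_from_pdfinfo_output(output):
    """Find the 'Keywords:' line in pdfinfo-style output and split it into tags."""
    for line in output.splitlines():
        field, sep, value = line.partition(":")
        if sep and field.strip() == "Keywords":
            return split_keywords(value)
    return []  # no Keywords field present


def pdf_tags(path):
    """Run poppler's pdfinfo on a file and return its keywords as tags.

    Assumption: the `pdfinfo` binary is on PATH.
    """
    result = subprocess.run(
        ["pdfinfo", path], capture_output=True, text=True, check=True
    )
    return keywords_from_pdfinfo_output(result.stdout)
```

Something like `pdf_tags("scan-0001.pdf")` (a hypothetical file name) would then return a list of strings that could be attached to the document as tags, if the plugin interface exposed a way to set them.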
Question information
- Language: English
- Status: Answered
- For: Referencer
- Assignee: No assignee