PHP-readability to use DB of content definitions
Asked by
Dither
I think php-readability should use a database of main-content definitions (possibly just a plain PHP array) to extract page content from known sources, instead of relying only on statistical analysis as it does now, because that analysis fails in a lot of cases. The two approaches can be combined: php-readability would first look for a content definition in the database and fall back to page analysis only if no definition exists.
One possible way is to use a microformat consisting of pageElement and pageURL entries, where pageElement is an XPath expression for the main content and pageURL is a regular expression matching the pages where that content appears.
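To make the idea concrete, here is a rough sketch of how such a lookup could work, using only PHP's built-in DOM extension. The function names, the example.com entry, and the $fallback callable (standing in for the existing statistical analysis) are all made up for illustration; this is not php-readability's actual API.

```php
<?php
// Hypothetical definition list: each entry pairs a pageURL regular expression
// with a pageElement XPath expression. The example.com entry is invented.
$definitions = [
    [
        'pageURL'     => '#^https?://example\.com/articles/#',
        'pageElement' => '//div[@id="article-body"]',
    ],
];

/**
 * Return the main content HTML for $url using the first definition whose
 * pageURL matches and whose pageElement selects a node, or null otherwise.
 */
function extractByDefinition(string $url, string $html, array $definitions): ?string
{
    foreach ($definitions as $def) {
        if (preg_match($def['pageURL'], $url) !== 1) {
            continue; // this definition does not cover the page
        }

        $doc = new DOMDocument();
        // Real-world markup is rarely valid, so collect parser warnings silently.
        $previous = libxml_use_internal_errors(true);
        $doc->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        $xpath = new DOMXPath($doc);
        $nodes = $xpath->query($def['pageElement']);
        if ($nodes !== false && $nodes->length > 0) {
            return $doc->saveHTML($nodes->item(0));
        }
    }

    return null;
}

/**
 * Try the definition list first; if nothing matches, hand the page to
 * $fallback, a placeholder for the current statistical analysis.
 */
function extractContent(string $url, string $html, array $definitions, callable $fallback): string
{
    return extractByDefinition($url, $html, $definitions) ?? $fallback($html);
}
```

Keeping the definitions in a plain array means they could later be loaded from a file or a shared database without changing the lookup. A page whose URL matches a definition but whose XPath selects nothing still falls through to the analysis, so an outdated definition would not make things worse than they are now.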
Question information
- Language:
- English
- Status:
- Answered
- Assignee:
- No assignee