Only two full articles being shown

Asked by JuanLUX

Hello,

I have been playing around with the Full-Text RSS reader and have found a strange behaviour with some feed URLs used by certain Spanish newspapers, both on the free service provided by fivefilters.org and on my own installation on my host.

For instance, try the following URL:

http://www.canalalba.es/rss/index.xml

You will see full text only for the first two articles, while the rest show the publisher's copyright disclaimer instead.

Is this expected behaviour?

Thanks in advance

Question information

Language:
English
Status:
Solved
For:
Five Filters
Assignee:
No assignee
Solved by:
Keyvan
Keyvan (keyvan) said:
#1

Thanks for the report. This service relies on Readability to identify and extract content. Usually it works, but occasionally, especially when there's very little content in the article (as with the two articles in this case), it fails to identify the correct content block.

I usually suggest that users try Arc90's Readability bookmarklet (http://lab.arc90.com/experiments/readability/). Our PHP version is based on their code. If the bookmarklet works on pages that the Full-Text RSS service fails to extract content from, then I can look into it further. But if it also fails to extract or identify the correct content block, then there's not much we can do.

I just tried those two articles using the bookmarklet and it also identified the copyright notice instead of the content.
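To illustrate why a very short article can lose to a boilerplate block, here is a deliberately simplified Python sketch of Readability-style scoring. The rule used (one base point, one point per comma, plus one point per 100 characters capped at 3) is only loosely modelled on the kind of heuristic in Arc90's code; it is not the service's actual implementation, and the `Block`, `score` and `pick_content` names and the sample texts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A candidate content block: an identifier plus its visible text."""
    ident: str
    text: str


def score(block: Block) -> int:
    # Simplified heuristic (assumption, loosely modelled on Readability):
    # a base point, one point per comma, and one point per 100 characters
    # of text, capped at 3.
    commas = block.text.count(",")
    length_bonus = min(len(block.text) // 100, 3)
    return 1 + commas + length_bonus


def pick_content(blocks: list[Block]) -> Block:
    # The highest-scoring block is treated as the article body;
    # on a tie, max() keeps the earliest block.
    return max(blocks, key=score)


# Hypothetical page: a very short article next to a long copyright notice.
article = Block("article", "The town council approved the new budget today.")
notice = Block(
    "notice",
    "All content on this site is protected by copyright, and any reproduction, "
    "distribution, or public communication, in whole or in part, without the "
    "prior written permission of the publisher, is prohibited.",
)

chosen = pick_content([article, notice])
print(chosen.ident)
```

With only one sentence of article text, the longer, comma-heavy legal notice scores higher and is picked, which is the same failure mode seen with these two articles.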

For a future version, I'm considering some sort of override that would let you specify, for particular sites, identifiers that help the extractor pick out the correct content.
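A per-site override of this kind could look something like the following Python sketch. It assumes each page has already been split into candidate blocks keyed by an identifier such as an element id. `SITE_OVERRIDES`, `extract`, the hostname `news.example.com` and the `article-body` id are all hypothetical, and the length-based fallback is only a placeholder for real content scoring.

```python
from urllib.parse import urlparse

# Hypothetical per-site overrides: hostname -> identifier of the block that
# holds the article body. The entry below is illustrative, not real site data.
SITE_OVERRIDES: dict[str, str] = {
    "news.example.com": "article-body",
}


def extract(url: str, blocks: dict[str, str]) -> str:
    """Return the article text for `url`, given candidate blocks keyed by identifier."""
    host = urlparse(url).hostname or ""
    ident = SITE_OVERRIDES.get(host)
    if ident is not None and ident in blocks:
        # A configured identifier takes precedence over the heuristic.
        return blocks[ident]
    # Fallback (assumption): the longest block stands in for automatic scoring.
    return max(blocks.values(), key=len)


blocks = {
    "article-body": "Short news item.",
    "footer": "A much longer copyright notice from the publisher.",
}
print(extract("http://news.example.com/story.html", blocks))
print(extract("http://other.example.org/story.html", blocks))
```

For the configured host the override returns the short article even though the footer is longer; any other host falls back to the automatic choice.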

Thanks again for the report.

Keyvan

JuanLUX (juanlgr) said:
#2

Thanks, Keyvan, that solved my question.