RSS download error (encoding?)

Asked by tomlaf

When the Xibo client downloads a feed from SharePoint (an unprotected feed), the accented characters in the downloaded file are corrupted.

I already know that the server reports the wrong encoding on the first line, according to the W3C RSS validator, but that error does not seem to cause any problem for the RSS reader in IE.

My guess is that the bug, or the place for a correction or improvement, is somewhere in wc_OpenReadCompleted.

If I take the manually downloaded file and put it in place of the one Xibo downloaded, the Xibo client responds correctly and shows the content of the feed without any problems.

Thanks a lot for your help.

Question information

Language: English
Status: Answered
For: Xibo
Assignee: No assignee

This question was reopened

tomlaf (tomlaf) said :
#1

Since I cannot attach a file:
1. The XML downloaded and parsed by Xibo has corrupted characters (the original accented characters).
2. If I go directly to the feed (in IE or Chrome) and download it, the new file is OK. If I replace the file Xibo downloaded with that manually downloaded file, the client starts to show the feed in the layout without problems.

Alex Harrington (alexharrington) said :
#2

If you try a different Unicode-encoded RSS feed you'll see Xibo works fine. We specifically test this before each release.

Your feed is very broken. The accented characters are corrupted in Firefox and Chrome if I open that feed.

You need to fix the feed.

Alex

This email carries a disclaimer, a copy of which may be read at http://learning.longhill.org.uk/disclaimer

Alex Harrington (alexharrington) said :
#3

For example:
http://www.france24.com/fr/monde/rss

Works perfectly in my 1.0.7 client and the python client.

Alex

tomlaf (tomlaf) said :
#4

Hi,

I found my bug:

In wc_OpenReadCompleted in Rss.cs we have (lines 429 to 441):

                System.IO.StreamReader sr = new System.IO.StreamReader(data, wc.Encoding);
                rssContents = sr.ReadToEnd();

                StreamWriter sw = new StreamWriter(File.Open(_rssFilePath, FileMode.Create, FileAccess.Write, FileShare.Read), wc.Encoding);

                System.Diagnostics.Debug.WriteLine("Retrieved RSS - about to write it", "RSS - wc_OpenReadCompleted");

                sw.Write(rssContents);

                sr.Close();
                sw.Close();

                _rssReady = true;

The code reads my RSS feed, detects that it is not a UTF-8 file, and converts it to UTF-8 to build rssContents.
But, and this is what caused my bug, the StreamWriter converts it back when writing the .xml file. Removing wc.Encoding from the StreamWriter line stops the writer from messing with the text, and voilà, my RSS feed is shown on the page.

So, in conclusion, all I had to do was change line 432 from:
StreamWriter sw = new StreamWriter(File.Open(_rssFilePath, FileMode.Create, FileAccess.Write, FileShare.Read), wc.Encoding);

to

StreamWriter sw = new StreamWriter(File.Open(_rssFilePath, FileMode.Create, FileAccess.Write, FileShare.Read));
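
To illustrate the kind of round trip described above, here is a minimal C# sketch. It is not Xibo code: the class name DoubleEncodingDemo and the helper Transcode are hypothetical, and ISO-8859-1 only stands in for whatever encoding the reader happens to be given. It shows that decoding UTF-8 bytes with a single-byte encoding and writing the result back as UTF-8 doubles up each accented character, while reading and writing with the same single-byte encoding leaves the bytes unchanged.

```csharp
using System;
using System.Text;

public static class DoubleEncodingDemo
{
    // Hypothetical helper: decode raw bytes with one encoding,
    // then re-encode the resulting string with another.
    public static byte[] Transcode(byte[] raw, Encoding readAs, Encoding writeAs)
    {
        string text = readAs.GetString(raw);
        return writeAs.GetBytes(text);
    }

    public static void Main()
    {
        // "é" encoded as UTF-8 is the two bytes C3 A9.
        byte[] utf8 = Encoding.UTF8.GetBytes("\u00E9");

        // Assumption: ISO-8859-1 plays the role of the wrongly chosen reader encoding.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        // Wrong reader encoding, UTF-8 writer: each original byte is treated as
        // its own character and encoded again.
        byte[] doubled = Transcode(utf8, latin1, Encoding.UTF8);
        Console.WriteLine(BitConverter.ToString(doubled)); // C3-83-C2-A9

        // Same single-byte encoding on both sides: the bytes survive unchanged.
        byte[] same = Transcode(utf8, latin1, latin1);
        Console.WriteLine(BitConverter.ToString(same)); // C3-A9
    }
}
```

Whether this is exactly what happens with the SharePoint feed depends on what wc.Encoding holds at run time, which the sketch does not try to reproduce.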

Have a nice day

tomlaf (tomlaf) said :
#5

Forget it,

It worked for my RSS feed but broke others.

I will have to do an in-house correction.

Have a nice day

tomlaf (tomlaf) said :
#6

Hi again,

I'm not really sure why, but after removing both wc.Encoding arguments (for sw and sr) it works fine for my broken RSS feed and for the france24 RSS feed as well.

There seems to be a strange encoding/re-encoding problem somewhere and I'm not sure where, but it works...
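
One possible explanation, offered as an assumption rather than a diagnosis of the Xibo code: according to the .NET documentation, a StreamReader or StreamWriter constructed without an encoding argument uses UTF-8 (the writer without a byte order mark), so removing both arguments amounts to reading and writing UTF-8. The sketch below uses hypothetical names (DefaultStreamEncodingDemo, CopyWithDefaults) and in-memory streams in place of the download and the .xml file.

```csharp
using System;
using System.IO;
using System.Text;

public static class DefaultStreamEncodingDemo
{
    // Hypothetical helper: copy bytes through StreamReader/StreamWriter
    // without passing any encoding, as in the modified code.
    public static byte[] CopyWithDefaults(byte[] raw)
    {
        using (var input = new MemoryStream(raw))
        using (var output = new MemoryStream())
        {
            using (var sr = new StreamReader(input))   // defaults to UTF-8
            using (var sw = new StreamWriter(output))  // defaults to UTF-8, no BOM
            {
                sw.Write(sr.ReadToEnd());
            }
            // ToArray still works after the stream has been closed.
            return output.ToArray();
        }
    }

    public static void Main()
    {
        // A UTF-8 feed passes through byte for byte.
        byte[] utf8 = Encoding.UTF8.GetBytes("caf\u00E9");
        Console.WriteLine(BitConverter.ToString(CopyWithDefaults(utf8))); // 63-61-66-C3-A9

        // A feed in a single-byte encoding does not: the lone E9 byte is not
        // valid UTF-8, so the output no longer matches the input.
        byte[] latin1 = Encoding.GetEncoding("iso-8859-1").GetBytes("caf\u00E9");
        Console.WriteLine(BitConverter.ToString(CopyWithDefaults(latin1)));
    }
}
```

If that assumption holds, it would also explain why the change can fix UTF-8 feeds while damaging feeds served in another encoding.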

Have a nice day. Any help in understanding this will be appreciated.

Alex Harrington (alexharrington) said :
#7

We went through lots of iterations of that code to get what we have now. Don't forget it has to work in non-Latin locales too.

Alex

tomlaf (tomlaf) said :
#8
tomlaf (tomlaf) said :
#9

Hello,

I perfectly understand. It may be a Windows 7 x64 thing, or the fact that my system is French-localised, or some other local particularity.

I'm pretty sure the system is doing a double re-encoding on my "broken" feed.

I will try the Python client to see if it is a .NET thing.

As before, thank you for your time.

Alex Harrington (alexharrington) said :
#10

OK. Can you put a patch together then please, or better still push a new branch of the code to Launchpad and we'll run it against the test suite we have.

If it's a diff file please email it to <email address hidden> as you can't have attachments here.

Thanks for your help with this

Alex

tomlaf (tomlaf) said :
#11

I tried to push it to a new branch.

I'm not sure it was the right way to do it, but it seems to have worked.

Have fun

tomlaf (tomlaf) said :
#12

I have also taken the time to add a screen saver version of the client.

Have a good Sunday.

Alex Harrington (alexharrington) said :
#13

Thanks for that.

We can pick the changes for Rss.cs manually but I think the screensaver patch will be impossible for us to merge cleanly as you've changed a lot of other unnecessary stuff at the same time. The VS2010 changes are also somewhat unhelpful - but I appreciate that's perhaps unavoidable.

The reason the 1.2.0-rc1 version doesn't work against a 1.0.6 server is because we've changed the protocol version. All you needed to do was change the protocol version 2 back to a 1 to enable it to work for testing, and commit it back as a 2 again to prevent a conflict later on. Once 1.2.0 stable (or perhaps rc2) is released, you'll need the newer client to talk to it. That's because a lot of people insist on ignoring the release notes and running old clients against newer servers - which almost always causes problems and an email to support. This will physically prevent that happening in the future.

Also we don't generally commit binaries to the repo - you'll see that you added virtually everything in the bin directory as part of your commit. All that will need to be stripped out.

It's good to have the code changes, but I don't think we can merge it in its current state.

Best wishes

Alex

Dan Garner (dangarner) said :
#14

tomlaf,

We did a lot of work on that section of code (believe it or not ;-) and found that when we used the system default encoding (i.e. leaving out the encoding property) it did not work on non-English installations. I don't mean non-English RSS feeds, but non-English Windows installations. In particular we had a very long running support issue with a Thai locale Windows installation and a Thai locale feed/text.

I think before we do anything we will need to get someone to test your modifications on a Thai locale. I hope you can understand that this is necessary. Thank you for your efforts with this bug.

Does anyone using a Thai locale have the time to test out this (https://code.launchpad.net/~tomlaf/xibo/netclient-encoding) branch of the client?

-Dan

tomlaf (tomlaf) said :
#15

Hi,

Thanks a lot. I didn't think my revision was ready for real testing.

I will be cleaning up my branches (removing the unneeded stuff and separating the screen saver from the RSS feed fix).

It was my first push to an open source project, so sorry if it was messy; I'm learning how to do it.

For the client, the reason I moved back to 1.0.7 instead of going with 1.2.0 is that the 1.2.0 client refuses to register with the 1.2.0-rc1 server; the registration finishes with an error. (Do you need more on that, or is it a known limitation of 1.2.0-rc1?)

For the fix I proposed, I found some RSS feeds yesterday that didn't encode properly, so I'm back to checking what the problem is. I'm thinking that the automatic encoding detection is not 100% correct. The easy solution I see is to add a new field on the RSS settings page to let the user, if needed, manually select the encoding of the feed (something like an advanced settings button that gives the opportunity to set it manually).

I will check again what wc.Encoding really returns depending on the feed; maybe I will find an automatic solution that works 100% of the time.
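
One commonly used heuristic for this kind of automatic detection is sketched below in C#. It is not the Xibo implementation and has not been tested against the feeds discussed here. The class FeedDecoder and its Decode method are hypothetical names, and the fallback encoding is something the caller would have to supply (ISO-8859-1 is used purely as an example). The idea is to try strict UTF-8 first, since text in a legacy single-byte encoding that contains accented characters is rarely valid UTF-8, and to use the fallback only when strict decoding fails.

```csharp
using System;
using System.Text;

public static class FeedDecoder
{
    // Hypothetical helper: decode as strict UTF-8 when the bytes are valid UTF-8,
    // otherwise decode with the caller-supplied fallback encoding.
    public static string Decode(byte[] raw, Encoding fallback)
    {
        // Second argument true: throw on invalid bytes instead of substituting U+FFFD.
        var strictUtf8 = new UTF8Encoding(false, true);
        try
        {
            // A leading byte order mark decodes to U+FEFF, so strip it.
            return strictUtf8.GetString(raw).TrimStart('\uFEFF');
        }
        catch (DecoderFallbackException)
        {
            return fallback.GetString(raw);
        }
    }

    public static void Main()
    {
        // Assumption: ISO-8859-1 stands in for the feed's real legacy encoding.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        byte[] utf8Feed = Encoding.UTF8.GetBytes("<title>caf\u00E9</title>");
        byte[] latin1Feed = latin1.GetBytes("<title>caf\u00E9</title>");

        // Both feeds decode to the same text even though their bytes differ.
        Console.WriteLine(Decode(utf8Feed, latin1) == Decode(latin1Feed, latin1)); // True
    }
}
```

The heuristic still needs a sensible fallback for each feed, so it does not remove the need to test on non-Latin locales.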

Thanks

Alex Harrington (alexharrington) said :
#16

Thanks Tom

It's great to have contributions from you.

If there is a way to detect the encoding properly then great, but personally I'd say that if the webserver is returning the wrong encoding in its response then we've done as much as we can reasonably do. I wouldn't want to see a new option for this. It would be confusing for people to understand what it was for and almost impossible to document!

With regard to the 1.2 series client not registering: as I said before, all you need to do is change the webservice schema version from 2 back to 1 temporarily in the client options and it will then work. They are otherwise identical for now. That's a deliberate change to prevent old clients connecting to the new series server, as people seem unwilling to read the release notes with respect to which server and client versions are compatible.

Keep up the good work :)

Alex
