Can Bazaar diff two binary files like TortoiseSVN?

Asked by Timmie

In TortoiseSVN there is a tool that lets you compare two binary files, e.g. Office documents:

* Word Document management using SVN - http://newgeeks.blogspot.com/2006/08/word-document-management-using-svn.html
* Comparing Microsoft Word documents stored in a Subversion repository - http://nicolas.lehuen.com/index.php/post/2005/06/30/60-comparing-microsoft-word-documents-stored-in-a-subversion-repository

Is this implemented or planned for BZR as well?

Question information

Language: English
Status: Answered
For: Bazaar
Assignee: No assignee

Elliot Murphy (statik) said :
#1

On 01/08/2008 07:16 AM, Tim wrote:
> New question #21728 on Bazaar:
> https://answers.launchpad.net/bzr/+question/21728
>
> In TortoiseSVN there is a tool that lets you compare two binary files, e.g. Office documents:
>
> * Word Document management using SVN - http://newgeeks.blogspot.com/2006/08/word-document-management-using-svn.html
> * Comparing Microsoft Word documents stored in a Subversion repository - http://nicolas.lehuen.com/index.php/post/2005/06/30/60-comparing-microsoft-word-documents-stored-in-a-subversion-repository
>
> Is this implemented or planned for BZR as well?
>
The diff tool being used here is Microsoft Word itself. That is pretty
neat! Bazaar will allow you to specify a custom tool like this to
compare files. You would need to modify the Bazaar difftools plugin to
support this. It looks like there are interesting scripts to do this
with Open Office as well.

-elliot

Tony (tony-altaimiruniversity) said :
#2

Once upon a time, several versions ago, I used to work on the Microsoft Word Team.

And I can tell you that the Microsoft Word Revision Tracking System has a history of being very unreliable. It works fine for simple documents, but breaks on complex documents.

I would urge people to be very careful about using it for important documents, especially if those documents use tables or other advanced functionality.

Now, as I said, that was several versions ago and they may have rewritten the revision tracking after I left. But while I was there management did a study of the problem and concluded that it would be too costly to fix, so they made the decision to leave it as is for their next product cycle.

Microsoft has a strong inclination not to fix bugs unless they receive a very high number of complaints about a specific issue. Revision tracking is not used by very many people, so it does not figure very high on Microsoft's list of priorities, or at least that was the situation when I was there.

In my personal opinion as a former member of the Word Team, any solution that relies on Microsoft Word Revision Tracking is going to be very fragile.

Just one more good reason to use Open Office.

Andreas Sommer (andidog) said :
#3

Just stumbled over this question and thought I had to work around this missing feature. Diffing Word documents is really important, which is why it's a standard feature in TortoiseSVN. This patch is for Windows XP and works if TortoiseSVN is installed in the default path.

In "plugins\qbzr\lib\diffview.py", add the following lines at the top:

from __future__ import absolute_import
from subprocess import Popen
import os, tempfile, threading

Then around line 490, place the following under the code block that contains the string "[binary file]":

            # Only handle the case where both versions exist and both are .doc files.
            if all(present) and file_extension(paths[0]).lower() == ".doc" and file_extension(paths[1]).lower() == ".doc":
                # mkstemp returns (fd, filename); create one temp file per side of the diff.
                tmpFilesInfo = [tempfile.mkstemp(prefix="qbzr-doc-compare-tmp", suffix=".doc")
                                for i in xrange(2)]

                # Write both versions to disk so the external script can open them.
                for i, (fd, filename) in enumerate(tmpFilesInfo):
                    try:
                        os.write(fd, data[i])
                    finally:
                        os.close(fd)

                tmpFilenames = [filename for fd, filename in tmpFilesInfo]

                # There's a bug about Popen not being able to take special characters. Work around it by using tempfile
                # for both A and B - this usually creates ASCII filenames.
                process = Popen(["wscript", r"C:\Program Files\TortoiseSVN\Diff-Scripts\diff-doc.js"] + tmpFilenames)

                def waitThenRemoveFiles(process, filenames):
                    # Block until the diff script exits, then delete the temporary copies.
                    process.wait()
                    for filename in filenames:
                        os.remove(filename)

                # Wait in a background thread so the caller is not blocked while the script runs.
                threading.Thread(target=waitThenRemoveFiles, args=(process, tmpFilenames)).start()

Haven't tested this a lot, but it should work for .doc files.
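
The same idea (write both versions to temporary files, hand them to an external compare tool, delete them once the tool exits) can be sketched as a standalone Python 3 helper. This is only an illustration under assumptions, not part of qbzr or Bazaar: the names write_temp_copies and launch_external_diff are made up for this sketch, and the compare command is supplied by the caller instead of being hard-coded.

```python
import os
import subprocess
import sys
import tempfile
import threading


def write_temp_copies(blobs, suffix=".doc", prefix="doc-compare-tmp"):
    """Write each bytes blob to its own temp file and return the file paths.

    Hypothetical helper for this sketch; not a qbzr or Bazaar function.
    """
    paths = []
    for blob in blobs:
        fd, path = tempfile.mkstemp(prefix=prefix, suffix=suffix)
        # fdopen takes ownership of the descriptor and closes it on exit.
        with os.fdopen(fd, "wb") as handle:
            handle.write(blob)
        paths.append(path)
    return paths


def launch_external_diff(command, old_bytes, new_bytes, suffix=".doc"):
    """Run `command <old> <new>` and remove the temp files when it exits.

    Returns (cleanup_thread, paths) so the caller may join() the thread.
    """
    paths = write_temp_copies([old_bytes, new_bytes], suffix=suffix)
    try:
        process = subprocess.Popen(list(command) + paths)
    except OSError:
        # The tool could not be started, so clean up immediately.
        for path in paths:
            os.remove(path)
        raise

    def wait_then_remove():
        process.wait()
        for path in paths:
            try:
                os.remove(path)
            except OSError:
                pass  # e.g. another program still holds the file open

    worker = threading.Thread(target=wait_then_remove)
    worker.start()
    return worker, paths


if __name__ == "__main__":
    # Stand-in command that ignores its arguments and exits at once.
    worker, paths = launch_external_diff([sys.executable, "-c", "pass"], b"old", b"new")
    worker.join()
    print(paths)
```

To reproduce the patch above, the command would be ["wscript", r"C:\Program Files\TortoiseSVN\Diff-Scripts\diff-doc.js"], assuming TortoiseSVN is installed in its default path. Passing the command in as a parameter keeps the temp-file handling separate from the choice of compare tool.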

Martin Pool (mbp) said :
#4

Hi AndiDog,

People are working on this on the list now, you might want to join that thread.
