Wednesday, July 18, 2012

binary files diff and patch

I wanted to send a large data file from a server to a client multiple times.
Though the data changed only negligibly in every iteration, it was essential that the copy on the client be exactly the same as the one on the server.

Transferring the full file every time took far too long, and we had to cut that down. So only the changes made to the file were diff'ed, and this patch file was scp'ed over to the client.

At the client end, the old file then had to be patched, so that the client ended up with the exact same copy as the server.

Here's how we go about it:

1> Diff the file

diff old_file new_file > patch_file   

// old_file was created in the previous iteration and new_file in the current iteration.

2> copy the patch file to the client

scp patch_file client@client_IP:patch_file



3> AT CLIENT END :: patch the file on the client

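patch -o new_file old_file patch_file

// -o writes the patched result to new_file; without it, patch modifies old_file in place instead of printing the result to stdout.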

For more information, see man patch and man diff.
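Below is a minimal sketch of one full iteration of the three steps, run on a single machine so it can be tried safely. Assumptions: two temporary directories stand in for the server and the client, cp stands in for scp, and the file contents are made-up sample text.

```shell
#!/bin/sh
# Sketch: one diff -> copy -> patch iteration for a TEXT file, run locally.
# Assumption: the "server" and "client" directories stand in for the two
# machines, and cp stands in for the scp step.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/server" "$work/client"

# Previous iteration: both sides already hold the same old_file.
printf 'line 1\nline 2\nline 3\n' > "$work/server/old_file"
cp "$work/server/old_file" "$work/client/old_file"

# Current iteration: the server produces a slightly changed new_file.
printf 'line 1\nline 2 (changed)\nline 3\n' > "$work/server/new_file"

# 1> Diff the file. diff exits with 1 when the files differ, so only a
#    status above 1 (real trouble) stops the script here.
diff "$work/server/old_file" "$work/server/new_file" > "$work/server/patch_file" || [ $? -eq 1 ]

# 2> Copy the patch file to the client (stand-in for scp).
cp "$work/server/patch_file" "$work/client/patch_file"

# 3> At the client end: -o writes the patched result to new_file
#    instead of modifying old_file in place; -s keeps patch quiet.
patch -s -o "$work/client/new_file" "$work/client/old_file" "$work/client/patch_file"

# Compare the two copies by checksum ("CRC SIZE" when reading stdin).
server_sum=$(cksum < "$work/server/new_file")
client_sum=$(cksum < "$work/client/new_file")
echo "server: $server_sum"
echo "client: $client_sum"
```

Both lines printed at the end should be identical. On the real setup, replace the cp in step 2 with the scp command and run step 3 on the client.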

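diff and patch work line by line and are meant for text files; on binary files, diff only reports that the files differ. To do this for binary files, you need to install bsdiff.

For Ubuntu, use: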

sudo apt-get install bsdiff


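Then do ::

bsdiff old_file new_file patch_file
scp patch_file client@client_IP:patch_file
AT CLIENT END:
bspatch old_file new_file patch_file

// bspatch takes the output file as its second argument (bspatch oldfile newfile patchfile); it does not write the result to stdout.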
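The same iteration with bsdiff and bspatch can be sketched locally as well. Assumptions: temporary directories stand in for the server and the client, cp stands in for scp, and the tiny sample files (containing a NUL byte so they count as binary) are made up. If bsdiff is not installed, the script only reports "skipped".

```shell
#!/bin/sh
# Sketch: one bsdiff/bspatch iteration run locally on a small binary file.
# Assumptions: bsdiff and bspatch are on PATH (sudo apt-get install bsdiff);
# the "client" directory stands in for the remote machine, and cp stands in
# for the scp step.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/server" "$work/client"

# Made-up binary data: the NUL byte makes diff treat these files as binary.
printf 'header\000payload-v1\377' > "$work/server/old_file"
printf 'header\000payload-v2\377' > "$work/server/new_file"
cp "$work/server/old_file" "$work/client/old_file"

if command -v bsdiff >/dev/null 2>&1 && command -v bspatch >/dev/null 2>&1; then
  # Server: build the binary patch.
  bsdiff "$work/server/old_file" "$work/server/new_file" "$work/server/patch_file"
  # Transfer (stand-in for: scp patch_file client@client_IP:patch_file).
  cp "$work/server/patch_file" "$work/client/patch_file"
  # Client: bspatch takes the output path as its SECOND argument.
  bspatch "$work/client/old_file" "$work/client/new_file" "$work/client/patch_file"
  # Byte-by-byte comparison, possible here only because both copies are local.
  if cmp -s "$work/server/new_file" "$work/client/new_file"; then
    result=match
  else
    result=mismatch
  fi
else
  result=skipped   # bsdiff/bspatch not installed
fi
echo "bsdiff round trip: $result"
```

cmp works here only because both copies sit on the same machine; across two machines you compare checksums instead.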

NOTE :: Check that the checksum of new_file created on the client is the same as that of new_file on the server.
This can be done using the cksum command.
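Since the two files live on different machines, one way to do this check is to have the server send its checksum along with the patch. A minimal sketch, assuming the server writes the output of cksum to a file with the hypothetical name new_file.cksum and copies it over together with patch_file; the sample bytes are made up. When cksum reads standard input it prints only the CRC and the byte count, so the two strings can be compared directly.

```shell
#!/bin/sh
# Sketch: verify the client's rebuilt file against a checksum sent by the server.
# Assumptions: the server stores the output of "cksum < new_file" in a file
# (hypothetically named new_file.cksum) and sends it along with patch_file.
# The sample bytes below are made up.
set -eu

# Succeeds (exit status 0) when the cksum of file $1 equals the expected
# "CRC SIZE" string $2. Reading from stdin keeps the file name out of the output.
matches_checksum() {
  [ "$(cksum < "$1")" = "$2" ]
}

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Server side: record the checksum of the freshly generated new_file.
printf 'data\000v2' > "$work/server_new_file"
cksum < "$work/server_new_file" > "$work/new_file.cksum"

# Client side: one correctly patched copy and one that differs by a byte.
printf 'data\000v2' > "$work/client_good"
printf 'data\000v3' > "$work/client_bad"

expected=$(cat "$work/new_file.cksum")
if matches_checksum "$work/client_good" "$expected"; then good=yes; else good=no; fi
if matches_checksum "$work/client_bad" "$expected"; then bad=yes; else bad=no; fi
echo "good copy matches: $good"
echo "bad copy matches: $bad"
```

The checksum file is only a few bytes, so sending it with every patch adds almost nothing to the transfer.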

If they are not the same, you can use jdiff instead (source code available here :: http://jojodiff.sourceforge.net/).

Also, note that jdiff and jpatch, the two binaries that come with the source code, are compiled for a 32-bit kernel. (Use the file command to check the binary type.)

For kernel version 3 and above, this works fine. However, if you plan to use jdiff and jpatch on machines with a 2.6-series kernel, you will need to recompile them for 64-bit machines.

Recompiling regenerates jdiff and jpatch after fixing one or two trivial compile errors.
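The file command is the easiest way to see whether a binary is 32-bit or 64-bit (its output includes "ELF 32-bit" or "ELF 64-bit"). Where file is not installed, the same information sits in the ELF header itself: after the 4-byte magic, the byte at offset 4 (EI_CLASS) is 1 for 32-bit and 2 for 64-bit. The sketch below reads that byte with od; this is a swapped-in technique, not part of jojodiff, and the function name elf_class is hypothetical.

```shell
#!/bin/sh
# Sketch: report whether ELF binaries are 32-bit or 64-bit without `file`.
# Swapped-in technique: byte 4 of an ELF header (EI_CLASS) is 1 for 32-bit
# and 2 for 64-bit. Usage (hypothetical script name): sh elf_class.sh jdiff jpatch
set -eu

elf_class() {
  # The first four bytes of every ELF file are 0x7f 'E' 'L' 'F'.
  magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
  if [ "$magic" != "7f454c46" ]; then
    echo "not-elf"
    return
  fi
  # Read EI_CLASS as an unsigned decimal byte.
  class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')
  case "$class" in
    1) echo "32-bit" ;;
    2) echo "64-bit" ;;
    *) echo "unknown" ;;
  esac
}

# On PCs, uname -m prints x86_64 for a 64-bit kernel.
echo "kernel architecture: $(uname -m)"
for f in "$@"; do
  echo "$f: $(elf_class "$f")"
done
```

If the kernel reports x86_64 and jdiff reports 32-bit, that is the case where recompiling is needed.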

bsdiff or jdiff: which one is better?

bsdiff takes more time to generate a patch, but it compresses its output internally (with bzip2), so the patches it creates are small.

jdiff, on the other hand, takes less time and generates larger patches, but gives you the guarantee that the file produced by the patch is an exact replica of the original binary file (verified using cksum). Also, its prebuilt binaries are meant for a 32-bit OS.
