Delta update process on large files


#1

Hi All,

I’ve got a question about how effective the binary deltas are, when very large files are included in an image.

We’ve had great results when using the delta updates for small files, but we’ve recently experimented with larger files (really large: 90MB and up), and no matter how small the change is, the delta always seems to be the exact size of the large file that’s been changed.

We’ve tried using xdelta3 on the large file, and the delta it generates is really small - like less than a KB. Is there anything that might be causing large files to be ignored by the binary delta?

Thanks!


#2

Hello Corin, that sounds weird. What sort of changes did you try? Changing the file contents? Renaming the file or changing extended attributes? Is it possible to share with me a way to reproduce?

Our deltas are currently based on rsync, so I ran rsync over two ~120MB files that I modified with a binary editor in a few ways:

  • Insert a few bytes at random towards the beginning of the file
  • The same, but towards the end of the file
  • Appending the whole file contents to the end, essentially duplicating one of the files

Every time, rsync came up with tiny deltas, at most on the order of a few KB. rsync works in chunks, so you’ll always get slightly more than the absolute binary changes, but it should never be the whole file contents, or even a significant portion of them, if the files do not significantly differ.
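
To illustrate the "works in chunks" point, here is a rough Python sketch of rsync-style block matching. It is not the actual delta implementation: the function names, the 64-byte block size and the choice of BLAKE2 as the strong hash are assumptions made purely for illustration. It counts how many bytes of the new file would have to be sent literally because they fall outside any block already present in the old file.

```python
import hashlib

BLOCK = 64  # illustrative block size (an assumption, not rsync's default)


def weak_sum(data):
    """Adler-style weak checksum of one window, returned as (a, b)."""
    n = len(data)
    a = sum(data) & 0xFFFF
    b = sum((n - i) * x for i, x in enumerate(data)) & 0xFFFF
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide the window one byte to the right without re-summing it."""
    a = (a - out_byte + in_byte) & 0xFFFF
    b = (b - n * out_byte + a) & 0xFFFF
    return a, b


def strong_sum(data):
    """Strong hash used to confirm a weak-checksum hit."""
    return hashlib.blake2b(data, digest_size=16).digest()


def delta_literal_bytes(old, new, block=BLOCK):
    """Return the number of bytes of `new` not covered by a block of `old`."""
    # Index every full, block-aligned chunk of the old file.
    index = {}
    for off in range(0, len(old) - block + 1, block):
        chunk = old[off:off + block]
        index.setdefault(weak_sum(chunk), set()).add(strong_sum(chunk))

    literals = 0
    pos = 0
    state = None  # weak checksum of new[pos:pos + block], when known
    while pos + block <= len(new):
        if state is None:
            state = weak_sum(new[pos:pos + block])
        candidates = index.get(state)
        if candidates and strong_sum(new[pos:pos + block]) in candidates:
            pos += block  # whole block can be copied from the old file
            state = None
            continue
        # No match: this byte is sent literally, then the window slides by one.
        literals += 1
        if pos + block < len(new):
            state = roll(state[0], state[1], new[pos], new[pos + block], block)
        pos += 1
    # Whatever is left is shorter than a block and is sent literally.
    return literals + (len(new) - pos)


if __name__ == "__main__":
    # Build 64 KiB of pseudo-random data, then insert 3 bytes near the start.
    old = b"".join(hashlib.blake2b(str(i).encode()).digest() for i in range(1024))
    new = old[:100] + b"XYZ" + old[100:]
    print("literal bytes:", delta_literal_bytes(old, new), "of", len(new))
```

Because the window slides byte by byte after a miss, an insertion only costs the inserted bytes plus the remainder of the block it landed in; everything after it lines up with old blocks again. With an empty `old`, by contrast, nothing can match and every byte is literal.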


#3

Hi There,

The file is a docker image, which has been exported with docker save.

To recreate this:

  1. Take a docker image, export it:
    docker save IMAGE > docker_image.tar

  2. Compress it (in my case, the resulting file is about 82MB):
    gzip docker_image.tar

  3. Upload it to:
    http://some-web-server/docker_image.tar.gz

  4. Include it in a resin Dockerfile:
    RUN curl http://some-web-server/docker_image.tar.gz > /docker_image.tar.gz

  5. Push the project - obviously, the delta will be the new tar.gz file.

  6. Make a small change to the original docker image (e.g. change a text config in the image), and rebuild the image. Then repeat the steps above. In my case, the delta generation output was as follows:

[Info] Pulling old image for delta generation
[==================================================>] 100%
[Success] Cached image pulled in a few milliseconds
[Info] Generating delta for faster device updates
[Success] Delta generated in 21 seconds; size: 80.97 MB

As I mentioned, when I tried it with xdelta3, on the same two files, the delta was less than 1 KB.

As an aside: I’m doing it this way with large files, rather than pushing them via git, as I have the feeling that every time I push this kind of monster file into git, a unicorn loses its horn. For example, I did it yesterday straight into git, and 2 minutes later there was a builder outage. Coincidence? :wink:


#4

Ok, so I plucked up the courage and pushed the file into git… And the delta is pretty small - around 700KB (which is what you’d expect, given the other overhead in the generated resin docker image).

The question is, I have around 1.5GB that I want to be pushing in there - is that going to break your build server? :confounded: :cold_sweat:


#5

@cmoss glad you’ve got it working!

Pushing a 1.5GB repo definitely won’t be a problem for our service, and I think what you observed earlier was really just a coincidence. It can be slightly inconvenient, though: unless the build eventually succeeds, git won’t accept the push and roll the master branch forward, so you’d have to re-upload on the next push (git might be smarter than this, but that’s my feeling).

BTW, I didn’t get a chance to try the steps you outlined, but to me it looks more like the delta being given resin/scratch (i.e. “null”) as the source image due to some bug than a bug in the delta itself. I’ll investigate in the coming days. Thank you very much for this.