Difference between Buzhash and Rabin fingerprint chunker

I'd like to know which chunker is better for archives and encrypted files, and which is faster. Also, which chunker handles a huge number of files better?

I think for both archives and encrypted files you'd be better off with the default size splitter. Both encrypted files and compressed archives look mostly like random noise to a chunker, so content-defined chunking has little structure to latch onto. If you do want to figure out which is faster, I'd suggest testing a small sample with both.
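
If it helps, here's a rough timing harness against the chunker package go-ipfs uses (github.com/ipfs/go-ipfs-chunker). Treat it as a sketch: `sample.bin` is a placeholder for whatever sample you want to test, and I'm going from memory on the Splitter API (NewSizeSplitter / NewBuzhash / NewRabin, with NextBytes returning one chunk at a time), so double-check against the package docs.

```go
// Rough comparison of go-ipfs' three chunkers on one sample file.
// Sketch only: sample.bin is a placeholder, and the go-ipfs-chunker
// API names here are from memory.
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"os"
	"time"

	chunker "github.com/ipfs/go-ipfs-chunker"
)

func main() {
	data, err := os.ReadFile("sample.bin") // placeholder sample file
	if err != nil {
		log.Fatal(err)
	}

	// one constructor per chunker; a slice keeps the output order stable
	cases := []struct {
		name string
		mk   func(io.Reader) chunker.Splitter
	}{
		{"size-262144 (default)", func(r io.Reader) chunker.Splitter {
			return chunker.NewSizeSplitter(r, chunker.DefaultBlockSize)
		}},
		{"buzhash", func(r io.Reader) chunker.Splitter {
			return chunker.NewBuzhash(r)
		}},
		{"rabin", func(r io.Reader) chunker.Splitter {
			return chunker.NewRabin(r, uint64(chunker.DefaultBlockSize))
		}},
	}

	for _, c := range cases {
		spl := c.mk(bytes.NewReader(data))
		start := time.Now()
		chunks := 0
		for {
			_, err := spl.NextBytes() // next chunk, io.EOF when done
			if err == io.EOF {
				break
			}
			if err != nil {
				log.Fatal(err)
			}
			chunks++
		}
		fmt.Printf("%-22s %6d chunks in %v\n", c.name, chunks, time.Since(start))
	}
}
```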


@zacharywhitley is correct. For encrypted/compressed data, just use the default.
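
If it's useful for testing, the chunker can be picked per add with `ipfs add --chunker=...`; if I remember the flag syntax right, the default is `size-262144`, with `buzhash` and `rabin-<min>-<avg>-<max>` as the content-defined alternatives.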

Other than being different algorithms, the primary differences between go-ipfs’ chunkers are:

  • rabin should, in theory, produce slightly better chunk boundaries (we haven’t confirmed this), but our implementation is very inefficient and slow.
  • buzhash is much simpler and much faster at chunking data (see the sketch after this list for why the per-byte work is so cheap).
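
To make the second point concrete, here is a stripped-down buzhash-style rolling hash in Go. To be clear, this is not go-ipfs’ implementation: the window size, boundary mask, and randomly seeded substitution table below are assumptions picked for readability, and a real chunker would also enforce minimum/maximum chunk sizes.

```go
// Illustrative buzhash-style chunker. Window size, mask, and table
// are assumptions for the sketch, not go-ipfs' actual parameters.
package main

import (
	"fmt"
	"math/rand"
)

const (
	windowSize = 32        // bytes in the rolling window (assumed)
	mask       = 1<<17 - 1 // cut when hash&mask == 0: ~128 KiB average (assumed)
)

// fixed pseudo-random substitution table, one 32-bit value per byte
var table [256]uint32

func init() {
	rng := rand.New(rand.NewSource(1)) // deterministic table for the sketch
	for i := range table {
		table[i] = rng.Uint32()
	}
}

func rotl(x uint32, n uint) uint32 { return x<<n | x>>(32-n) }

// chunk returns cut points. The hash updates in O(1) per byte:
// rotate, XOR in the entering byte's table entry, XOR out the
// leaving byte's entry (rotated by the window size). That constant,
// branch-light loop is why buzhash chunks quickly.
func chunk(data []byte) []int {
	var cuts []int
	var h uint32
	for i, b := range data {
		h = rotl(h, 1) ^ table[b]
		if i >= windowSize {
			h ^= rotl(table[data[i-windowSize]], windowSize)
			if h&mask == 0 {
				cuts = append(cuts, i+1)
			}
		}
	}
	return cuts
}

func main() {
	data := make([]byte, 1<<20)
	rand.Read(data) // random noise, like encrypted/compressed input
	fmt.Println("cut points found:", len(chunk(data)))
}
```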

I seem to recall some discussion of using FastCDC. Any idea whether there was, and where it might have gone?