Difference between Buzhash and Rabin fingerprint chunker

mehulagg · September 1, 2021, 8:01am

I want to know which chunker is better for archives and encrypted files, and which is faster. Which chunker handles a huge number of files better?

zacharywhitley · September 8, 2021, 12:13pm

I think for both archives and encrypted files you’d be better off with the default size splitter. If you do want to figure out which is faster, I’d suggest testing a small sample with both. Encrypted files, and archives if they’re compressed, look mostly like random noise to a chunker, so there are few repeated patterns for a content-defined chunker to find.
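One rough way to run that comparison is to time `ipfs add` with each chunker on the same sample file. The sketch below assumes the `ipfs` binary is on your PATH; the helper names (`build_cmd`, `time_chunkers`) are made up for illustration. It uses `--only-hash` so nothing is written to the repo, and the wall-clock time includes process startup and hashing, so treat the numbers as a relative comparison only.

```python
import shutil
import subprocess
import time

# Chunker specs passed to `ipfs add --chunker`; size-262144 is the default.
CHUNKERS = ["size-262144", "rabin", "buzhash"]


def build_cmd(chunker, path):
    """Build an `ipfs add` command line that chunks and hashes without storing."""
    return ["ipfs", "add", "--only-hash", "--quieter", "--chunker=" + chunker, path]


def time_chunkers(path, chunkers=CHUNKERS):
    """Return {chunker: seconds} for one run of each chunker over `path`."""
    if shutil.which("ipfs") is None:
        raise RuntimeError("ipfs binary not found on PATH")
    results = {}
    for chunker in chunkers:
        start = time.perf_counter()
        subprocess.run(build_cmd(chunker, path), check=True, capture_output=True)
        results[chunker] = time.perf_counter() - start
    return results
```

Calling `time_chunkers("sample.bin")` on a file of a few hundred megabytes, a few times over, should be enough to see whether the difference matters for your data.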

stebalien · September 12, 2021, 11:22am

@zacharywhitley is correct. For encrypted/compressed data, just use the default.

Other than being different algorithms, the primary differences between go-ipfs’ chunkers are:

rabin is theoretically slightly better (unconfirmed) but our implementation is very inefficient and slow.
buzhash is much simpler and faster at chunking data.
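To illustrate why buzhash is cheap per byte, here is a minimal sketch of a buzhash-style content-defined chunker. This is not go-ipfs’ implementation: the window size, mask, size limits, and the randomly generated byte table are all assumptions picked for the example. Each step costs one rotate and two XORs, and cut points depend only on the last few bytes, so boundaries tend to realign after an insertion, whereas fixed-size chunks all shift.

```python
import random

# All parameters below are illustrative assumptions, not go-ipfs' values.
WINDOW = 32                    # rolling window length in bytes
MASK = (1 << 10) - 1           # cut when the low 10 bits of the hash are zero
MIN_SIZE, MAX_SIZE = 256, 8192

_rng = random.Random(0)
TABLE = [_rng.getrandbits(32) for _ in range(256)]  # one random word per byte value


def rotl(x, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def buzhash_chunks(data):
    """Split data at positions where the rolling hash matches the mask."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = rotl(h, 1) ^ TABLE[b]                       # bring the new byte in
        length = i - start + 1
        if length > WINDOW:
            h ^= rotl(TABLE[data[i - WINDOW]], WINDOW)  # drop the oldest byte
        if (length >= MIN_SIZE and (h & MASK) == 0) or length >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0                         # restart hash for next chunk
    if start < len(data):
        chunks.append(data[start:])                     # trailing partial chunk
    return chunks


def fixed_chunks(data, size=1024):
    """Split data into fixed-size pieces, like a size splitter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def shared(a, b):
    """Count distinct chunks that appear in both lists."""
    return len(set(a) & set(b))
```

Comparing `shared(buzhash_chunks(data), buzhash_chunks(b"X" + data))` with the same call using `fixed_chunks` shows the effect of a one-byte insertion. For encrypted or compressed input there is no repeated content for either scheme to find, which is why the default size splitter is the recommendation above.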

zacharywhitley · September 12, 2021, 3:07pm

I seem to recall some discussion of using FastCDC. Any idea if there was and where it might have gone?
