First thing to check is that your gzip has a working --rsyncable option. Some details here;
Next thing to check is that the tar files actually have any common data in the first place. You could do this by using rdiff or xdelta to calculate a patch between the uncompressed tar files. The size difference between the patch and the new file is how much duplicate data there is. Using xdelta will tell you the maximum amount of duplication that could be found (it finds pretty much optimal deltas). Playing with the block size for rdiff will show how much the duplicate detection depends on the block size; smaller blocks will find more/smaller identical sections, larger blocks will be cheaper to calculate. In particular, the largest block size that still finds some duplication (patch file smaller than new file) will tell you the size of the largest contiguous block of duplicate data. If there is no identical data in the raw tar files, then gzip --rsyncable will not help.
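To get a feel for how block size affects duplicate detection, here is a rough sketch in Python. It is a naive stand-in for the idea behind rdiff, not librsync's actual algorithm (no rolling checksum, so it is slow on large inputs), and the function name `duplicate_bytes` and the synthetic data are made up for illustration.

```python
import random

def duplicate_bytes(old: bytes, new: bytes, block_size: int) -> int:
    """Count bytes of `new` covered by block_size-byte blocks that also occur in `old`."""
    # Index every aligned block of the old data (roughly what a signature holds).
    known = {old[i:i + block_size]
             for i in range(0, len(old) - block_size + 1, block_size)}
    matched = 0
    i = 0
    while i + block_size <= len(new):
        if new[i:i + block_size] in known:
            matched += block_size   # whole block is duplicate, skip past it
            i += block_size
        else:
            i += 1                  # no match here, slide forward one byte
    return matched

# Synthetic demo: an 8-byte insertion shifts everything after it.
rng = random.Random(0)
old = bytes(rng.getrandbits(8) for _ in range(16384))
new = old[:5000] + b"INSERTED" + old[5000:]
for block_size in (256, 8192, 16384):
    print(block_size, duplicate_bytes(old, new, block_size))
```

In this toy case the small block size loses only the one block broken by the insertion, while the block size equal to the whole file finds nothing, which is the same trend you would look for when varying the rdiff block size on real tar files.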
After that, repeat the experiment on the tar files compressed using gzip --rsyncable. I checked the gzip-rsyncable patch and it appears to reset the compressor every input “chunksize” of min 4KB, avg 8KB, and apparently no maximum. Assuming a compression ratio of 50%, this means the output should have identical sections at least 2KB long, and 4KB long on average. This suggests that a librsync block size of 2KB (which is the default) should be the best compromise between finding duplicate data and computation cost. Larger librsync block sizes will not find as many small duplicates, but will be cheaper to calculate.
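The arithmetic behind those section sizes, as a tiny Python sketch. The 4KB/8KB figures are the ones read from the patch above; the compression ratios are assumptions, so plug in whatever your data actually achieves.

```python
def identical_section_bytes(input_chunk_bytes: int, compression_ratio: float) -> int:
    """Approximate compressed size of one input chunk between compressor resets."""
    return int(input_chunk_bytes * compression_ratio)

# min and avg input chunk sizes (bytes) at a few assumed compression ratios
for ratio in (0.25, 0.5, 0.75):
    print(ratio,
          identical_section_bytes(4096, ratio),
          identical_section_bytes(8192, ratio))
```

If your data compresses much better than 50%, the identical sections in the output get correspondingly shorter, and a smaller block size would be needed to catch them.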
Finally, assuming the above experiments showed the tar.gz files have any duplication at all, you need to tune your rabin chunk size to find it. The largest librsync block size that still finds duplicate data gives you the largest rabin chunk size that could ever find any duplication at all, and you want a chunk size less than half that size to have any chance of finding it. The smaller you set it, the more duplication it can find, but the more blocks you end up with. At some point the overheads of having more small blocks will outweigh the benefits of finding duplicates of those small blocks.
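For anyone unfamiliar with how content-defined chunking finds duplicates after data has shifted, here is a minimal Python sketch. It uses a simple gear-style rolling hash as a stand-in, not ipfs's actual Rabin fingerprint, and `cdc_chunks`, the table seed, and the sizes are all illustrative assumptions.

```python
import random

_rng = random.Random(42)
GEAR = [_rng.getrandbits(64) for _ in range(256)]  # random value per byte
MASK64 = (1 << 64) - 1

def cdc_chunks(data: bytes, min_size: int, avg_size: int, max_size: int):
    """Split data at content-defined boundaries; avg_size must be a power of two."""
    bits = avg_size.bit_length() - 1
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # After 64 bytes the hash depends only on the last 64 bytes seen.
        h = ((h << 1) + GEAR[byte]) & MASK64
        length = i - start + 1
        # Cut when the top `bits` bits are zero (about 1 in avg_size positions),
        # but never before min_size and always at max_size.
        if length >= max_size or (length >= min_size and h >> (64 - bits) == 0):
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks

# Synthetic demo: insert 8 bytes in the middle and see how many chunks survive.
rng = random.Random(1)
old = bytes(rng.getrandbits(8) for _ in range(256 * 1024))
new = old[:100000] + b"INSERTED" + old[100000:]
old_chunks = cdc_chunks(old, 256, 1024, 4096)
new_chunks = cdc_chunks(new, 256, 1024, 4096)
shared = len(set(old_chunks) & set(new_chunks))
print(f"{shared} of {len(new_chunks)} chunks in the new data already exist in the old data")
```

Because boundaries depend on content rather than offsets, chunking resynchronises shortly after the insertion and most chunks stay identical. A duplicate section shorter than a chunk or two gives the chunker no room to resynchronise, which is why the chunk size has to be well below the largest duplicate block.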
The ipfs default chunker is a static 256KB block size, and this is also the average rabin chunk size if not explicitly specified. The default is usually indicative of a sweet spot in terms of block size vs number of blocks, though for static blocks this sweet spot is usually the largest practical block size, so for rabin the sweet spot is almost certainly smaller, probably half that or less. If I had to guess, I’d say find the block size in librsync that gives you diminishing returns in terms of finding duplicates, and use an average rabin chunk size half that to find the most duplication. If that size is significantly less than say 16KB, you may find the number of blocks you end up with makes it not worth it. If the largest block size librsync can still find duplicates with is <16KB, then rabin chunking is probably not worth it.
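To put rough numbers on the block-count trade-off, a quick Python sketch. The 1GiB file size is just an assumed example, and for rabin chunking the count is only approximate since chunk sizes vary around the average.

```python
def block_count(total_bytes: int, avg_chunk_bytes: int) -> int:
    """Approximate number of blocks for a file, rounding up."""
    return -(-total_bytes // avg_chunk_bytes)  # ceiling division

GIB = 1024 ** 3
for avg_kb in (256, 16, 4):
    print(f"{avg_kb}KB average: ~{block_count(GIB, avg_kb * 1024)} blocks")
```

Going from 256KB to 16KB means 16 times as many blocks to store, index and announce, so the deduplication savings need to be large enough to pay for that.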
TLDR, check if the tar.gz files have any duplication worth finding. Then use --chunker=rabin-min-avg-max where avg is half the best librsync block size, min is 1/4 avg, and max is 4*avg, or just try --chunker=rabin-4096-16384-65536 and see if it helps.
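That rule of thumb as a small Python helper. The function name `rabin_chunker_arg` is made up, and the 32KB "best" librsync block size in the example is an assumption chosen because it reproduces the suggested values.

```python
def rabin_chunker_arg(best_librsync_block: int) -> str:
    """Build a rabin-min-avg-max chunker string from the best librsync block size."""
    avg = best_librsync_block // 2   # avg is half the best librsync block size
    return f"rabin-{avg // 4}-{avg}-{avg * 4}"  # min is 1/4 avg, max is 4*avg

print(rabin_chunker_arg(32 * 1024))
```

The resulting string is what you would pass as the value of the chunker option to ipfs add.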