Adding a directory with a lot of files

I added a directory via ipfs add -r <path> and it worked:

added Qmcwa4FAW74p3M5AsRLC41ifo5ytEouam7EL1Ad3MasNog output
 8.26 MiB / 8.26 MiB [==============================================================================================================================] 100.00%

It took about 30 minutes.

Then I tried to pin it on my dappnode (I had added it on my laptop, which is not online 100% of the time), and it just hung.

As I did not get any progress output there, I tried:

ipfs pin add QmZvCJBNKdKMohHE5u18vNgK6pA3RS5CkWu82M7HWZ84pA --progress

I ran this even on the same machine where the ipfs add had succeeded, to rule out connection problems.
It has been stuck on Fetched/Processed 0 nodes for days.
The problem is that one subdirectory contains a lot of small files (272745).
Is there anything I can do, or is this just not possible with IPFS?

If you want to reproduce it: I was trying to pin the directory output that is created in the build step of ethereum-lists/website (the source for the site lists.eth).
But I guess a simpler reproduction might be to use ethereum-lists/4bytes (a list of 4byte identifiers for EVM smart contract functions) directly, as I am pretty sure this is the culprit (I cannot test it right now, as I am on 3G with limited data).

Try ipfs block stat <cid>. If the result is larger than 1MB, you won’t be able to transfer it over the network (the actual limit is a bit higher than 1MB, but that’s all that’s guaranteed to work at the moment).
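As a rough sketch of that check, the helper below wraps ipfs block stat and compares the reported Size against a 1 MiB threshold. The function name block_fits and the LIMIT value are assumptions made for illustration; the reply above only says "1MB" and notes the real limit is somewhat higher.

```shell
# Hypothetical helper, not part of the ipfs CLI.
# LIMIT is an assumed reading of the "1MB" figure quoted above (1 MiB in bytes).
LIMIT=1048576

# Exit status 0 if the block behind the given CID is at or under LIMIT bytes.
block_fits() {
  # `ipfs block stat` prints "Key: <cid>" and "Size: <bytes>"; keep the byte count.
  size=$(ipfs block stat "$1" | awk '/^Size:/ { print $2 }')
  # Treat a missing size (e.g. the command failed) as "does not fit".
  [ -n "$size" ] && [ "$size" -le "$LIMIT" ]
}
```

It could be used as block_fits <cid> && echo "ok" || echo "too large, consider sharding". Run it on the CID of the large subdirectory itself, since that is the block that grows with the number of entries.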

The way to get around this is by using UnixFS sharded directories. Unfortunately, they’re not enabled by default at the moment and have some tradeoffs (see go-ipfs/ at master · ipfs/go-ipfs · GitHub).
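A minimal sketch of trying this out, assuming a go-ipfs version that still exposes the experimental Experimental.ShardingEnabled config flag (check the experimental features documentation for your version). The function name add_sharded is hypothetical.

```shell
# Hypothetical helper: enable experimental directory sharding, then re-add a directory.
# Assumes the Experimental.ShardingEnabled flag exists in the installed go-ipfs.
add_sharded() {
  ipfs config --json Experimental.ShardingEnabled true || return 1
  # A running daemon may need a restart before the new setting takes effect.
  ipfs add -r "$1"
}
```

Calling add_sharded ./output re-adds the directory. Because a sharded directory is encoded differently, expect the root CID to differ from the one produced by the unsharded add, so the new CID is the one to pin.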

Work on automatic sharding is being tracked in "Tracking issue for UnixFS automatic sharding" (Issue #8106, ipfs/go-ipfs) and is currently slated for go-ipfs v0.10.0.

Thanks @adin, the hint about the sharding feature is most helpful. I will try it out.
But I am not sure whether the result is larger than 1MB, as I am not sure about the unit here:

igi@komputing:~$ ipfs block stat Qmcwa4FAW74p3M5AsRLC41ifo5ytEouam7EL1Ad3MasNg
Key: Qmcwa4FAW74p3M5AsRLC41ifo5ytEouam7EL1Ad3MasNog
Size: 208

The units are bytes (as described in ipfs block stat --help), but you did ipfs block stat Qmcwa4FAW74p3M5AsRLC41ifo5ytEouam7EL1Ad3MasNg while asking about QmZvCJBNKdKMohHE5u18vNgK6pA3RS5CkWu82M7HWZ84pA.

To be honest, any time you’re attempting a solution that involves 100s of thousands of files in a single folder, you should expect that to fail (or have unusably slow performance) on most file systems that exist (Linux, Windows, etc.). It’s just a massive anti-pattern, i.e. trying to use a ‘folder’ as a “blob database”. It never works at scale.

If you need that much data stored, just add each blob/file as a separate thing and worry about storing all their CIDs (the index of them) as a completely separate task. That’s just my advice; people may disagree.
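One way this suggestion could look in practice is sketched below: each file is added on its own and its CID is appended to a plain-text index. The function name build_cid_index and the tab-separated index format are assumptions for illustration only. Invoking ipfs once per file will be slow for hundreds of thousands of files, so treat this as a starting point rather than a tuned solution.

```shell
# Hypothetical helper: add every regular file under DIR individually and
# write "<cid><TAB><path>" lines to INDEX (the index format is an assumption).
# Paths containing newlines are not handled by this sketch.
build_cid_index() {
  dir=$1
  index=$2
  : > "$index"   # start with an empty index file
  find "$dir" -type f | sort | while IFS= read -r file; do
    # -Q (--quieter) makes `ipfs add` print only the final hash.
    # stdin is redirected so the command cannot consume the file list.
    cid=$(ipfs add -Q "$file" < /dev/null) || continue
    printf '%s\t%s\n' "$cid" "$file" >> "$index"
  done
}
```

For example, build_cid_index ./output/signatures cids.tsv (both paths are placeholders) would leave one line per file in cids.tsv. Keep the index outside the directory being scanned so it is not added as well.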