Lack of reference counting for pins

I have a question about pinning.

IPFS objects form a DAG, so whenever I add or pin something I never know whether it is already used and pinned from somewhere else in my application. So how do I know when it is safe to unpin something?

I would have to keep track manually of how many times I have referenced the object from anywhere within my application to make sure I only unpin it once I really don’t need it anymore.

I would have hoped for the pinning system to do some kind of reference counting, so that it keeps track of how many times an object has been pinned. Without that, I don’t see how I can ever confidently unpin something, because it could still be used from somewhere else.

So how do I work around this?


Just did a quick experiment. Add a small file hierarchy:

tree1
|-a.txt // contains ‘a’
|-b.txt
tree2
|-b.txt
|-c.txt

Added tree1 and tree2 recursively:

$ ipfs add -r tree1
added Qmbvkmk9LFsGneteXk3G7YLqtLVME566ho6ibaQZZVHaC9 tree1/a.txt
added QmR9pC5uCF3UExca8RSrCVL8eKv7nHMpATzbEQkAHpXmVM tree1/b.txt
added QmSh9WkN3WgwQG8raYCgNY5LHn19V5UcHFkXm73CxAXp99 tree1
$ ipfs add -r tree2
added QmR9pC5uCF3UExca8RSrCVL8eKv7nHMpATzbEQkAHpXmVM tree2/b.txt
added QmetGxZTgo8tYAKQH1KLsY13MxqeVHbxYVmvzBzJAKU6Z7 tree2/c.txt
added QmZNFvNfWjtrj5iSjfs4Y6x4AR3vgzT1brnqCvw6MwEpJU tree2

Check pinning status for tree1, tree2, and b.txt (which is referenced from both tree1 and tree2):

$ ipfs pin ls QmSh9WkN3WgwQG8raYCgNY5LHn19V5UcHFkXm73CxAXp99
QmSh9WkN3WgwQG8raYCgNY5LHn19V5UcHFkXm73CxAXp99 recursive
$ ipfs pin ls QmZNFvNfWjtrj5iSjfs4Y6x4AR3vgzT1brnqCvw6MwEpJU
QmZNFvNfWjtrj5iSjfs4Y6x4AR3vgzT1brnqCvw6MwEpJU recursive
$ ipfs pin ls QmR9pC5uCF3UExca8RSrCVL8eKv7nHMpATzbEQkAHpXmVM
QmR9pC5uCF3UExca8RSrCVL8eKv7nHMpATzbEQkAHpXmVM indirect through QmSh9WkN3WgwQG8raYCgNY5LHn19V5UcHFkXm73CxAXp99

Recursively remove the pin for tree1 (which is what b.txt is currently reported as pinned through):

$ ipfs pin rm -r QmSh9WkN3WgwQG8raYCgNY5LHn19V5UcHFkXm73CxAXp99
unpinned QmSh9WkN3WgwQG8raYCgNY5LHn19V5UcHFkXm73CxAXp99

Check if b.txt is still pinned:

$ ipfs pin ls QmR9pC5uCF3UExca8RSrCVL8eKv7nHMpATzbEQkAHpXmVM
QmR9pC5uCF3UExca8RSrCVL8eKv7nHMpATzbEQkAHpXmVM indirect through QmZNFvNfWjtrj5iSjfs4Y6x4AR3vgzT1brnqCvw6MwEpJU

Yes it is, but now via QmZNFvNfWjtrj5iSjfs4Y6x4AR3vgzT1brnqCvw6MwEpJU aka tree2.

So the indirect pinning works exactly as expected. Not sure if it is via reference counting or via another mechanism, but I don’t really care.
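The behaviour observed above is consistent with a reachability check rather than a counter: an object counts as indirectly pinned as long as at least one recursive pin can still reach it. The following is only a toy model of that idea, not the actual IPFS implementation. The short names stand in for the real hashes, and `LINKS` and `pin_status` are made up for illustration.

```python
from typing import Dict, List, Optional, Set

# Toy DAG mirroring the experiment: each node maps to the nodes it links to.
# Short names are stand-ins for the real hashes (assumption for readability).
LINKS: Dict[str, List[str]] = {
    "tree1": ["a.txt", "b.txt"],
    "tree2": ["b.txt", "c.txt"],
    "a.txt": [],
    "b.txt": [],
    "c.txt": [],
}


def pin_status(target: str, recursive_pins: Set[str]) -> Optional[str]:
    """Return 'recursive', 'indirect through <root>', or None if unpinned."""
    if target in recursive_pins:
        return "recursive"
    # Walk down from every recursive pin and report the first root
    # (in sorted order) that can reach the target.
    for root in sorted(recursive_pins):
        stack = list(LINKS[root])
        seen: Set[str] = set()
        while stack:
            node = stack.pop()
            if node == target:
                return f"indirect through {root}"
            if node not in seen:
                seen.add(node)
                stack.extend(LINKS[node])
    return None


pins = {"tree1", "tree2"}
print(pin_status("b.txt", pins))  # indirect through tree1
pins.discard("tree1")             # models: ipfs pin rm -r <tree1>
print(pin_status("b.txt", pins))  # indirect through tree2
print(pin_status("a.txt", pins))  # None
```

In this model, removing the tree1 pin never needs to touch b.txt at all. Its status is derived from whichever roots remain, which matches what the experiment shows.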

The question now becomes:

  1. How can I get this “safe behaviour”, which takes into account that an object can be pinned from multiple places in a DAG, for things that I pin directly, so that I get more fine-grained control over what is pinned?

  2. Is there some detailed documentation of how the pinning system works, including its internal approach and performance characteristics, or is it just “use the source, Luke”?

I don’t think you should think of something being pinned as a reference count. You’re in a decentralized environment. The pinning only means that it’s pinned from your perspective. If you need it to be pinned, then you have to pin it. You can’t rely on the fact that someone else pinned something. That’s what it means to be decentralized.

Yes, I am very much aware of this. I am talking solely about a single node. Let’s say I have some piece of data which is very common. It might be linked from many places in my application (we are not talking about a blog or website here, but a pretty complex database application).

Now for whatever reason I don’t need one particular link anymore. How do I decide that it is “the last” link, and I have to unpin?

This is the same problem as handling pointers to objects on a heap. There are basically three approaches:

  1. The malloc approach
    You have to make sure that you delete (pin rm) an object on the heap only when there are no more references to it. If you mess up, all hell breaks loose.
  2. The smart pointer / reference counting approach
    There is a counter that keeps track of how many pointers point to a location, and you automatically delete (pin rm) once the last pointer is gone. This famously fails for cyclic references such as a doubly linked list, but cycles are impossible in IPFS anyway, since the graph is acyclic.
  3. The garbage collector approach
    You perform a global analysis of the heap and detect when an object is no longer reachable, directly or indirectly, from a “root”. Can be very fast, but is a very complex piece of software.

I was kind of expecting ipfs to follow the reference counting approach, since a DAG cannot have cycles, so this should work pretty well. But this seems not to be the case. It seems to be more like (3), which unfortunately in my case forces me to do (1).
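One way to avoid doing (1) by hand is to put approach (2) in the application, in front of the pinning commands. Below is a minimal sketch under stated assumptions: `RefCountedPins`, `acquire`, and `release` are hypothetical names, not part of any IPFS API, and the two callables are placeholders for whatever actually runs ipfs pin add and ipfs pin rm. The counts live only in memory here. A real application would have to persist them alongside its own data.

```python
from collections import Counter
from typing import Callable


class RefCountedPins:
    """Application-side reference counting in front of a pin backend.

    pin_add / pin_rm are supplied by the caller (hypothetical hooks); in a
    real setup they would invoke `ipfs pin add` and `ipfs pin rm`.
    """

    def __init__(self, pin_add: Callable[[str], None],
                 pin_rm: Callable[[str], None]) -> None:
        self._pin_add = pin_add
        self._pin_rm = pin_rm
        self._counts: Counter = Counter()

    def acquire(self, cid: str) -> None:
        # Pin only on the first reference; later references just count up.
        if self._counts[cid] == 0:
            self._pin_add(cid)
        self._counts[cid] += 1

    def release(self, cid: str) -> None:
        if self._counts[cid] == 0:
            raise KeyError(f"{cid} was never acquired")
        self._counts[cid] -= 1
        # Unpin only when the last reference is gone.
        if self._counts[cid] == 0:
            del self._counts[cid]
            self._pin_rm(cid)

    def count(self, cid: str) -> int:
        return self._counts[cid]


# Usage with an in-memory stand-in for the node's pin set.
pinned = set()
pins = RefCountedPins(pinned.add, pinned.discard)
pins.acquire("QmExample")  # placeholder, not a real hash
pins.acquire("QmExample")
pins.release("QmExample")
print("QmExample" in pinned)  # True: one reference is left
pins.release("QmExample")
print("QmExample" in pinned)  # False: last reference released
```

Because the DAG has no cycles, the usual weakness of reference counting does not apply. The remaining risk is the counter getting out of sync with the node's real pin set, for example after a crash between the two steps.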

My understanding is (1). Not an expert though.

In my current understanding it is more like (3), if you pin recursively. A recursive pin is like a GC root on a garbage collected heap. The problem is that this is very coarse. What if you want to have more fine-grained control over what is pinned? For example, you have some nested dag objects, and somewhere you got a dag link to this: Main Page

Obviously pinning everything recursively is not the answer, since it might not even fit on your device.

Sorry to be a noob, but please help me understand something. In your example above, you never do an ipfs pin add on the files. Are they pinned by virtue of being part of the folder that you added (but didn’t pin)? What happens in your example if you do ipfs add... and then ipfs pin add...? Does it matter?

The default behavior for ipfs pin add is to also pin the root hash. This can be overridden using the --pin option.

Doing ipfs add followed by ipfs pin add is redundant if you used the default ipfs add behavior. In either case the ipfs pin add should complete quickly, since the content should already be cached in your repo.

Is there a typo? What you said is confusing. @leerspace

@tjayrush I don’t see any typos in what I wrote. If there are specific things I wrote that don’t make sense to you I’d be happy to clarify. Otherwise if nothing I wrote made sense maybe someone else can put it more clearly.

edit: looking at the example above that I think you were referencing, I might not have understood what you’re asking. Please feel free to disregard what I wrote if it doesn’t apply. Apologies if I’ve only added noise.

I just meant that you said:

“The default behavior for ipfs pin add is to also pin the root hash.”

Which confused me a bit. Did you mean to say “The default behaviour for ipfs add is…”? In the original example, the OP doesn’t do ipfs pin add, they only do ipfs add -r. I’m just learning, so I’m not sure.

Yep, that’s a mistake. I meant to say what you wrote.
