A few weeks ago I installed IPFS Desktop on my Mac and added a simple PDF file. It worked great and I was able to come back over the next days and access the PDF. Now when I try to access the PDF file I get the error message “Error, Failed to load PDF document”. This is consistent, and neither refreshing nor restarting the Mac fixes the issue. Oddly, I can click on More, Inspect and then click on the “View on IPFS gateway” option and it displays correctly. Any thoughts would be appreciated.
I am wondering if this forum is inactive?
Your post doesn’t have nearly enough information, which is probably why no one picked it up.
How did you add the file? Did you pin it? Is the node running 24/7? Is the node reachable? When accessing the file, how do you go about it? Which gateway returns it? Etc. We need to know a lot more about your problem before we can even start thinking about what it might be.
P.S. also, giving us your node’s peerID and the PDF’s CID would allow us to do some tests
Thanks for responding. The file was added through IPFS Desktop. The file was pinned right away. The node is on my Mac so is not running 24/7. This is intentional as the application we are designing will have a similar environment. I have been trying to access the file through IPFS Desktop which generates the error message. Unfortunately I am not seeing any detail about the cause of the message. The file is retrievable through the IPFS gateway. This is the default gateway in IPFS Desktop. Peer ID and CID:
Peer ID: 12D3KooWNdGB2JdB75WaHP6HM33d4x1scBYgtQdibUBwwtHYAfq7
Any thoughts would be appreciated.
k, some good news: your node is reachable and has the content:
some bad news: the DHT doesn’t know about the file (as seen in that screenshot and in the following test on my node):
```
$ ipfs dht findprovs QmeTqcann5xWcA9pM6BeC69RQDo2yjXV57AyN7Uh1VssAY
$
```
(no output, meaning no providers were found)
A few things to understand:
- your node is the server providing your content on IPFS. if you shut it down, your content becomes unavailable until you bring the node back up
- your node advertises your content in the DHT every 12 hours, and those records survive for 24 hours. right now, the record for your PDF file is gone, which is why nothing can find it
- the gateway returns the file because it made a copy of it in its cache when you accessed it the first time around. every time you access it on the gateway, it simply gets it from its cache and refreshes the time-out on it. eventually, the file will be flushed from the cache and it will have to retrieve it again. if your node isn’t there to give it, it won’t be able to return it
- if you intend for your node not to be up 24/7, you need to re-pin your content on a node that is. there are services that do that (web3.storage, pinata.cloud, etc.)
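As a sketch of that last point: kubo (the node behind IPFS Desktop) has built-in support for remote pinning services. The service name, endpoint, and key below are placeholders; you get the real endpoint and key from whichever provider you sign up with.

```shell
# Register a remote pinning service with your node.
# "mypinner", the endpoint URL, and <YOUR_API_KEY> are placeholders --
# your provider's docs give you the real values.
ipfs pin remote service add mypinner https://example-pinning-service.example/psa <YOUR_API_KEY>

# Pin the PDF's CID on that service so it stays retrievable
# even while your own node is offline.
ipfs pin remote add --service=mypinner QmeTqcann5xWcA9pM6BeC69RQDo2yjXV57AyN7Uh1VssAY

# Check the pin's status on the remote service.
ipfs pin remote ls --service=mypinner
```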
I’m not sure why your node isn’t advertising correctly. one typical reason for it is that you have too many blocks on your node for the default client to be able to advertise all of them under 24 hours, so things fall out. in that case, you have to use the accelerated DHT client on your node, it will fix that problem.
Is the PDF viewer something that opens inside IPFS Desktop? Can you save the file from IPFS Desktop to your computer and then open it?
It sounds as if the PDF viewer isn’t able to read the file properly.
Thanks. A few thoughts:
It makes sense that the content is not available if the system is shut down (in the absence of third party pinning). The error is happening after the node has been up for some time.
So, is it the case that the DHT is not persistent in the local node even if the file is pinned?
Thanks for explaining about the gateway. I had actually assumed that other nodes would garbage collect and remove the file after a period of time. Just odd to me that my local IPFS Desktop could not retrieve the file.
Wow, it sounds like local IPFS nodes that are not constantly up are unreliable repositories for files, even if pinned, do I understand that correctly? Is a third party pinning service always recommended? And, a natural follow on question, are third party pinning services also likely to lose files if their servers are down for a while?
That PDF is the only file so I suspect it is not an issue of too many blocks.
Can you describe where to get the accelerated DHT?
I believe that the PDF viewer is integrated with IPFS Desktop. When I first added the file and then clicked on the name, it showed the first few pages of the PDF within the application. And I was able to save it to a local directory on the Mac and open it.
Yes, not sure about the issue with the PDF viewer. Interestingly, while I get the error when I click on the file I can still use the side menu bar to download the file. I guess this points to an issue with the PDF viewer?
The DHT isn’t kept on your own node (at least, not for your own files). Your node never loses anything (assuming it’s pinned or in mfs), but unless your node can refresh the DHT records in other DHT nodes more often than once every 24 hours, those DHT nodes will flush those records (which is why your node does that every 12 hours).
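For reference, the reprovide behavior is part of the node config; assuming a standard kubo install, you can inspect it like this:

```shell
# Show how often your node re-announces its content to the DHT
# (the 12-hour interval mentioned above is the default).
ipfs config Reprovider.Interval

# Show which blocks get announced: "all" (default), "pinned", or "roots".
ipfs config Reprovider.Strategy
```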
You turn the accelerated DHT client on with this command (restart your node after doing it):
```
ipfs config --json Experimental.AcceleratedDHTClient true
```
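After restarting, you can sanity-check that the flag took effect, and (in recent kubo versions) watch the reprovide system directly:

```shell
# Confirm the accelerated DHT client is enabled.
ipfs config Experimental.AcceleratedDHTClient

# Show reprovide statistics (total provides, average provide time,
# time of the last full reprovide run).
ipfs stats provide
```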
No, whatever problem there is providing this to the DHT, if your file is available locally then it should open just fine. It sounds like a problem with the PDF viewer inside Desktop. I am not sure how IPFS Desktop reads the file from ipfs and passes it to the PDF viewer, but there’s probably a problem there. You should open an issue in the IPFS Desktop repo.
Thanks for explaining this. I did not realize that the local node does not have DHT entries for local files. I guess that makes sense. I will incorporate the accelerated DHT module in our design.
Do you happen to know how quickly the DHT refresh is with the accelerated module? Or, what the interval is? It’s important to us as our application currently makes the assumption that others can access the file fairly soon after it is added to the local node. Maybe that is a bad assumption?
Thanks. I will open the issue with the IPFS Desktop team. You are right that the file is still available through the download option.
The accelerated DHT client uses the same interval (12 hours), but is able to do a reprovide run much faster. for example, my node has around 30,000 blocks in it. with the default client, it would take over 100 days (yes, days) to do a reprovide run, so everything would fall out of the DHT for 99 of those days. the accelerated client does a run for those 30,000 records in around 10 minutes.
When using the accelerated client, you will see a traffic spike every hour, that’s normal (it’s just rescanning the network for DHT nodes). The first time you run it, you should see the spike immediately after starting the node (it can last 10 mins), and then shortly after that, you should see the reprovide run. you should see it repeat at 12 hour intervals (you can change that value in the config, but 12 hours should be fine).
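If you do want to change that value, it’s the `Reprovider.Interval` config key; a hedged example (the 6h value is just for illustration, the 12h default is usually fine):

```shell
# Shorten the reprovide interval from the default 12h to 6h.
ipfs config Reprovider.Interval 6h

# Restart the node afterwards for the change to take effect.
```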
P.S. forgot to answer your other question: when adding blocks to your node, the node does an initial provide immediately, and then will do a reprovide every 12 hours. so, it’s near immediate.
Thanks for your help, much appreciated! Will have to do some re-design due to the latencies in the DHT updates.
Sorry, added a PS to my previous message about the other question
Thanks, that is important to us!