The “seed” server will have all the blocks.
So, the idea is to try findprovs on a bunch of blocks, see what peer IDs they have in common, and then use IPFS Check to talk to that peer and ask it about a bunch of the blocks. If it’s the seed, it will respond that it has them all. That’s the server you want to peer with.
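The "peer IDs in common" step could be sketched roughly like this in Python. It is only a sketch: it assumes you have already collected the provider peer IDs for each CID yourself (for example, by running findprovs by hand), and the function name and the sample CIDs and peer IDs are made up for illustration.

```python
from collections import Counter


def rank_common_providers(providers_by_cid):
    """Rank peer IDs by how many of the sampled CIDs they provide.

    providers_by_cid maps each CID string to the peer IDs that a
    findprovs lookup returned for it (collected separately).
    """
    counts = Counter()
    for peers in providers_by_cid.values():
        # Count each peer once per CID, even if it was listed twice.
        counts.update(set(peers))
    total = len(providers_by_cid)
    # Most frequent first; ties broken by peer ID for stable output.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(peer, seen, seen / total) for peer, seen in ranked]


if __name__ == "__main__":
    # Hypothetical lookup results; real CIDs and peer IDs will differ.
    sample = {
        "cid-1": ["peerA", "peerB"],
        "cid-2": ["peerA"],
        "cid-3": ["peerA", "peerC"],
    }
    for peer, seen, share in rank_common_providers(sample):
        print(f"{peer}: {seen}/{len(sample)} blocks ({share:.0%})")
```

A peer that shows up for all (or nearly all) of the sampled blocks is the candidate worth checking with IPFS Check.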
However, after a cursory look just now, their reprovide is dismal, so it’s super hard to find out anything, and that’s why things are super slow.
A lot of people have the misconception that they can use their node to “upload” a bunch of things to IPFS, and then they can just turn off their node and it will magically work. Their node is the seed; if they turn it off, all the content disappears until they turn it back on.
And, of course, if they are publishing a lot of stuff and are not using the Accelerated DHT client, their reprovide will never really work, and things will be super hard to find.
After lunch, I’ll try and poke at it a little bit more, see if I get lucky and find the seed for one of your two examples (assuming it’s even running).
The problem is, you now know a lot more about all this than the people who are publishing the content you are trying to get, and that says a lot.
Alright, I understand now. It’s a bit disappointing to find out it’s harder and harder to get this content faster.
I keep asking because I know there is something I can do, since there are other bots that are way faster than mine, so I know I will find a way to get this faster.
Anyway, don’t worry about it too much, I don’t want you to lose that much time on this little issue.
I guess I will just find a way to multiply the IP addresses that my multithreaded script uses so I can send more requests and get it faster that way.
Eventually I can run different parts of the code on different machines and combine all the results.
Let’s say there is still something I can do. I just thought that those other bots I see loading the IPFS links in less than 1 second had a special trick in their code or something. I guess that’s not the case though.
Thank you so much man
Nothing goes that fast in IPFS. If they are doing it that fast, something else is going on:
- They had a chance to download the info ahead of time and cached it in a local database, and that’s what they are serving
- They are pulling the data from somewhere else (not IPFS, some sort of API the publisher is providing or documents they published)
- or something else along those lines…
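The first possibility (download ahead of time, serve from a local database) might look something like the sketch below. It is a guess at the general pattern, not what those bots actually do. `fetch_from_ipfs` is a hypothetical placeholder for whatever slow fetch you already have, and the class and table names are made up.

```python
import sqlite3


class CachedFetcher:
    """Serve content from a local SQLite cache, falling back to a slow fetch."""

    def __init__(self, fetch_from_ipfs, db_path=":memory:"):
        # fetch_from_ipfs is a hypothetical callable: cid (str) -> bytes.
        self._fetch = fetch_from_ipfs
        self._db = sqlite3.connect(db_path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS blocks (cid TEXT PRIMARY KEY, data BLOB)"
        )

    def get(self, cid):
        row = self._db.execute(
            "SELECT data FROM blocks WHERE cid = ?", (cid,)
        ).fetchone()
        if row is not None:
            return row[0]  # cache hit: no network involved
        data = self._fetch(cid)  # cache miss: take the slow path once
        self._db.execute(
            "INSERT OR REPLACE INTO blocks (cid, data) VALUES (?, ?)", (cid, data)
        )
        self._db.commit()
        return data
```

Since a CID always refers to the same bytes, a cache like this never needs invalidating; only the first request for each CID pays the IPFS lookup cost.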
If we can find the seed nodes, it will go fast, but not that fast. It will probably still take 10-15 seconds to pull the whole list.
I mean, I’m guessing they just do what I do, but divided into multiple parts and done simultaneously. If I run my script to check the first 5000 on my PC and the other 5000 on another PC with another IP, I will theoretically get the whole list twice as fast. Well, I guess they are doing this process but with lots and lots of machines. I could calculate how many times I would need to divide my time to get theirs: to go from approximately 1 minute to 1 second, I would need 60 different machines/processes running on different IPs, each getting a different part of what I need.
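That back-of-the-envelope math can be written out as below. It is a sketch that assumes perfectly linear scaling (every worker is as fast as the single script and nothing is shared between them), and the function names are just illustrative.

```python
import math


def workers_needed(current_seconds, target_seconds):
    """Workers required to hit the target time, assuming ideal linear speedup."""
    return math.ceil(current_seconds / target_seconds)


def split_evenly(items, n_workers):
    """Split items into n_workers contiguous chunks of near-equal size."""
    base, extra = divmod(len(items), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        # The first `extra` chunks take one additional item each.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


if __name__ == "__main__":
    n = workers_needed(60, 1)
    chunks = split_evenly(list(range(10000)), n)
    print(n, "workers, chunk sizes from", len(chunks[-1]), "to", len(chunks[0]))
```

In practice the speedup will likely be less than linear: if the providers themselves are slow to find, each lookup stays slow no matter how many machines are asking.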