TL;DR: I would like to run an IPFS gateway that maps IPFS addresses to files hosted by a third party.
I’m writing to ask for help and advice.
I’m a research software engineer at the European Bioinformatics Institute (EMBL-EBI). One of the services offered at the EBI is the European Nucleotide Archive (ENA). This archive contains petabytes of public, open-access genetic data. The data is currently served over FTP and REST, with around six FTP mirrors worldwide.
There is obviously great value in addressing this data with the IPFS protocol: for example, computational results become reproducible because the input data is content-addressed.
I have “local” access to the data (a mounted drive, about as local as it gets). However, I don’t believe I can simply run an IPFS node on the EBI’s computational infrastructure and start serving files. I expect that strategy would meet a lot of resistance: which grant or budget would cover the infrastructure cost (particularly network I/O and file-system I/O)?
I could try to convince the ENA DevOps team to deploy IPFS, but I’ve heard they are very busy, and I doubt they would be keen to support a third protocol when there is no apparent motivation (for them, at this time) to do so.
I have considered hosting some of the data myself as a proof of concept. However, it has occurred to me that I could “serve” all of the ENA’s data over IPFS if I can redirect IPFS addresses to the corresponding ENA endpoints. In theory, all I would need to store locally is the mapping from IPFS addresses to ENA endpoints.
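To make the idea concrete, here is a minimal sketch of what the local component could look like: a pre-computed table mapping each content address to an existing ENA endpoint, and a gateway-style handler that answers an /ipfs/&lt;cid&gt; request with an HTTP redirect instead of serving the bytes itself. The CID and URL below are placeholders, not real ENA mappings, and the function names are my own invention for illustration.

```python
# Sketch: redirect IPFS content addresses to third-party (ENA) endpoints.
# Only the CID -> URL table is stored locally; no file bytes are hosted.

# Placeholder mapping; in practice this table would be built by hashing
# the ENA files once and recording the resulting CIDs.
CID_TO_ENA = {
    "QmPlaceholderCid": "https://ftp.sra.ebi.ac.uk/vol1/placeholder.fastq.gz",
}

def resolve(cid):
    """Look up the ENA URL that backs a given content address."""
    url = CID_TO_ENA.get(cid)
    if url is None:
        raise KeyError(f"unknown CID: {cid}")
    return url

def gateway_response(path):
    """Answer a gateway-style request path like /ipfs/<cid> with an
    HTTP status and headers, redirecting to the third-party host."""
    prefix = "/ipfs/"
    if not path.startswith(prefix):
        return 400, {}
    try:
        url = resolve(path[len(prefix):])
    except KeyError:
        return 404, {}
    return 302, {"Location": url}
```

A request for /ipfs/QmPlaceholderCid would then yield a 302 pointing at the ENA URL, while unknown CIDs get a 404. Whether a stock IPFS node can be made to answer retrieval requests this way, rather than from its own datastore, is exactly the part I'm unsure about.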
Does this sound like a reasonable solution and does it currently exist?
If it does not exist, I am happy to work towards implementing it within IPFS or as a hack.
Thank you for your attention.