IPFS for ETL operations?

7yl4r · January 26, 2018, 4:41pm

I have been thinking a lot lately about the value IPFS could provide to Extract, Transform, Load (ETL) operations. My thinking is that, if paired with a metadata database, IPFS could handle both the “extract” and “load” steps for simple cases where “extract” means “pull some files to the local processing server” and “load” means “push the resulting data product into a data lake”.

To me this seems huge because it would allow data analysis developers to focus entirely on the “transform” stage. I have been seriously considering attempting a test implementation, but am hesitant to commit a lot of time to such an experimental idea.

Here is a rough outline of my plan:

1. Put IPFS on all my machines.
2. Implement a database that maps product metadata to IPFS content hashes (SHA-256 multihashes by default) for “extracts”.
3. Replace the E & L operations in my Airflow pipeline with IPFS FUSE mount usage and `ipfs add`, respectively.
4. Set up an IPFS cluster to keep my data pinned across nodes.
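
To make the metadata-database idea a bit more concrete, here is a minimal sketch of what the “load” and “extract” steps might look like. Everything named here is hypothetical (the `ProductCatalog` class, the `products` table, the name/version key); it assumes the `ipfs` CLI is on the PATH with an initialised repo, and it uses `ipfs get` for the extract step as a simpler stand-in for the FUSE mount mentioned in the plan.

```python
import sqlite3
import subprocess
from typing import Callable, List


def run_ipfs(args: List[str]) -> str:
    """Run the ipfs CLI and return its stdout (assumes `ipfs` is installed)."""
    result = subprocess.run(
        ["ipfs", *args], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


class ProductCatalog:
    """Hypothetical metadata database mapping (name, version) to a content hash."""

    def __init__(self, db_path: str = ":memory:",
                 ipfs: Callable[[List[str]], str] = run_ipfs):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS products ("
            " name TEXT, version TEXT, cid TEXT NOT NULL,"
            " PRIMARY KEY (name, version))"
        )
        self.ipfs = ipfs  # injectable so the catalog can be tested without a node

    def load(self, name: str, version: str, local_path: str) -> str:
        """'L' step: add a file to IPFS and record its hash under the metadata key."""
        cid = self.ipfs(["add", "-Q", local_path])  # -Q prints only the final hash
        self.db.execute(
            "INSERT OR REPLACE INTO products VALUES (?, ?, ?)",
            (name, version, cid),
        )
        self.db.commit()
        return cid

    def extract(self, name: str, version: str, dest_path: str) -> str:
        """'E' step: look up the hash by metadata and fetch the file locally."""
        row = self.db.execute(
            "SELECT cid FROM products WHERE name = ? AND version = ?",
            (name, version),
        ).fetchone()
        if row is None:
            raise KeyError(f"no product recorded for {name!r} {version!r}")
        self.ipfs(["get", row[0], "-o", dest_path])
        return row[0]
```

A transform task would then only call `catalog.extract(...)` before its work and `catalog.load(...)` after it. With a FUSE mount, `extract` could instead just return the path `/ipfs/<hash>` and skip the copy.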

Has anyone tried something like this?

Are there challenges I am overlooking?
