Hey interplanetarians! I was sent here by some folks from the dat-project, who thought IPFS might be better suited to the use case I have in mind.
I've got a 1TB hard drive that I'd like to clean up and make useful. There are two problems with the way the data on that drive is structured. First, when backing up my data in a hurry because a computer was failing, I ended up backing up backups --- i.e. there is a degree of nesting in there, with a bunch of data duplicated at more than one level. Second, before I understood that deeply nested directories were a poor way of labeling data, I used them to try to divide up notes, PDFs, etc. by subject area --- essentially using the directory tree as a poor tagging system.
What I would like to do is write a program that examines every file on the drive. If a file's contents are novel to the program, I want them stored under some canonical path for later retrieval, and I want the string representing the file's path on the 1TB drive saved in a database. If the contents are NOT novel, I still want to save that path string, but associate it with the existing record.
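To make the idea concrete, here's a minimal sketch of the workflow I have in mind, assuming SQLite for the database and whole-file SHA-256 as the content fingerprint (the function and table names are just placeholders I made up, not anything from an existing tool):

```python
import hashlib
import os
import sqlite3

def file_digest(path, chunk_size=65536):
    """Hash a file's contents in chunks so large files never load fully into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def index_tree(root, db_path):
    """Walk `root`, recording each distinct content hash once and every path it appears at."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS contents (hash TEXT PRIMARY KEY)")
    con.execute(
        "CREATE TABLE IF NOT EXISTS paths "
        "(path TEXT PRIMARY KEY, hash TEXT REFERENCES contents(hash))"
    )
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            digest = file_digest(path)
            # INSERT OR IGNORE: novel contents get a new record;
            # duplicates just attach another path to the existing record.
            con.execute("INSERT OR IGNORE INTO contents VALUES (?)", (digest,))
            con.execute("INSERT OR REPLACE INTO paths VALUES (?, ?)", (path, digest))
    con.commit()
    return con
```

Whole-file hashing like this only deduplicates exact copies; a rolling-checksum / content-defined-chunking scheme (as in rsync or Dat) would additionally catch near-duplicates, which is the part I'd rather not implement myself.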
I learned about content-addressable hashing a couple of years ago and thought it might be the key to solving problems like this. But I'm also betting I don't have to implement rolling checksums myself, which is why I came here. Can someone point me toward any tools, libraries, or systems that could help with this project?