"ipfs add" and "ipfs files write" commands returns different hashes

From @lockedshadow on Tue Dec 13 2016 15:22:15 GMT+0000 (UTC)

Hello! First of all, I apologize for my bad English. I hope you can understand it.

I’m trying to add some files to mfs (files previously added via ipfs add), but the ipfs files write command produces different hashes than ipfs add.

For example:

$ echo "IPFS Files API is awesome!" > ipfs-files-api-test.txt
$ ipfs add ipfs-files-api-test.txt
added QmUZtQRZG58yB55k5NFPFeYBQ3FMTKydpuNAb66JnxDgup ipfs-files-api-test.txt

Next, let’s try to write this file to mfs:

$ ipfs files write --create /ipfs-files-test ipfs-files-api-test.txt
$ ipfs files stat /ipfs-files-test
QmYJnHQ8yMSursnCvJa2nKEaQKXXFbbLm5MLXqbuHKZdfe
Size: 27
CumulativeSize: 137
ChildBlocks: 2
Type: file

As we can see, the hashes are actually different. It seems that one string is now known as two different objects. If that is true, it turns out that deduplication is not performed in this case.

But the object returned by the ipfs files stat command has two child blocks. Maybe one of these blocks is the same object that was produced by the earlier ipfs add command?

$ ipfs object links QmYJnHQ8yMSursnCvJa2nKEaQKXXFbbLm5MLXqbuHKZdfe
QmejyB5JSYNMcJeQbXuPj4W1DM23DxsWrU42JQFqy3Z7Xe 8
QmPt4vGy69ENW5GJgVN8wSV5UAoG2SapjRYVUDQJVWbACR 35

No, neither of these is QmUZtQRZG58yB55k5NFPFeYBQ3FMTKydpuNAb66JnxDgup, which was produced by the earlier ipfs add. But one of them should definitely contain the source string:

$ ipfs get QmUZtQRZG58yB55k5NFPFeYBQ3FMTKydpuNAb66JnxDgup -o result-of-add.txt
$ ipfs get QmPt4vGy69ENW5GJgVN8wSV5UAoG2SapjRYVUDQJVWbACR -o result-of-files-write.txt
$ diff result-of-add.txt result-of-files-write.txt --report-identical-files
Files result-of-add.txt and result-of-files-write.txt are identical

Indeed, it’s the same string. But why are the hashes different? That is not exactly what I would like to get.

But OK, we can directly add some previously added hashes to mfs. For example:

$ ipfs add ipfs-files-api-test.txt
added QmUZtQRZG58yB55k5NFPFeYBQ3FMTKydpuNAb66JnxDgup ipfs-files-api-test.txt
$ ipfs files cp /ipfs/QmUZtQRZG58yB55k5NFPFeYBQ3FMTKydpuNAb66JnxDgup /ipfs-files-test-2
$ ipfs files stat /ipfs-files-test-2
QmUZtQRZG58yB55k5NFPFeYBQ3FMTKydpuNAb66JnxDgup #Finally, the same hash!
Size: 27
CumulativeSize: 35
ChildBlocks: 0
Type: file

(BTW, it’s not very clear that we can write any existing hash to mfs using ipfs files cp. I figured it out only after reading this: https://github.com/ipfs/go-ipfs/issues/2610#issuecomment-241066009)

But what if I now want to overwrite a file that already exists in mfs?

$ echo "IPFS Files API is really awesome!" | ipfs add
added QmS2YcaWxiprdGuXgvsNpqnKeRPeKbrDjTZcdw2qdv8yYa QmS2YcaWxiprdGuXgvsNpqnKeRPeKbrDjTZcdw2qdv8yYa
$ ipfs files cp /ipfs/QmS2YcaWxiprdGuXgvsNpqnKeRPeKbrDjTZcdw2qdv8yYa /ipfs-files-test-2
Error: directory already has entry by that name

Actually, I cannot do that. If I definitely want to overwrite a file, I have to execute ipfs files rm first; I cannot overwrite it directly, as ipfs files write does. But I don’t want to use ipfs files write, because for now it produces different hashes than ipfs add and doesn’t allow deduplication.
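
The remove-then-copy workaround described above can be wrapped in a small shell function. This is only a sketch: mfs_replace is a hypothetical name, not an ipfs command, and the two steps are not atomic, so the entry is briefly missing between them.

```shell
# Hypothetical helper: replace an MFS entry with an object that was
# already added via "ipfs add".
#   $1 = hash of the existing object, $2 = destination path in mfs
mfs_replace() {
    src_hash=$1
    dest=$2
    # Remove the old entry if there is one; ignore the error if it is absent.
    ipfs files rm "$dest" 2>/dev/null || true
    # Link the existing object into mfs under the same name.
    ipfs files cp "/ipfs/$src_hash" "$dest"
}

# Usage (not executed here):
#   mfs_replace QmS2YcaWxiprdGuXgvsNpqnKeRPeKbrDjTZcdw2qdv8yYa /ipfs-files-test-2
```

Because ipfs files cp only links an existing object, the entry should keep the hash that ipfs add returned.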

Tl;dr:

  1. ipfs add and ipfs files write should probably produce the same hashes, but they don’t.

  2. The documentation should explain more clearly that ipfs files cp can copy existing hashes into mfs, not only files already written to mfs.

  3. ipfs files cp should probably have an option to overwrite existing files, but it doesn’t.

    Copied from original issue: https://github.com/ipfs/support/issues/45

From @rddaz2013 on Tue Dec 27 2016 07:08:06 GMT+0000 (UTC)

> Actually, I cannot do that. If I definitely want to overwrite a file, I have to execute ipfs files rm first; I cannot overwrite it directly, as ipfs files write does. But I don’t want to use ipfs files write, because for now it produces different hashes than ipfs add and doesn’t allow deduplication.

Hmm… perhaps you get two hashes because of the underlying deduplication of a file with the same content but a different filename? Could it be that the second hash is only a link?

The underlying concept of ipfs makes it difficult to adopt the previous concepts for storing data easily.

From @Kubuxu on Tue Dec 27 2016 08:55:35 GMT+0000 (UTC)

re. 1

The hashes are different because ipfs files write uses a different linking structure than ipfs add. The linking structure of ipfs files write is optimized for random seeking and for writing after initial creation, while the ipfs add structure is optimized for reducing the link count.

The underlying data is still deduplicated, as they use the same chunking, AFAIK.
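
One way to check whether two objects actually share blocks is to compare the sets of hashes reachable from each root. A minimal sketch, assuming ipfs refs -r lists every hash linked below a root; shared_blocks is a hypothetical helper name, and both arguments are expected to be bare hashes:

```shell
# Hypothetical helper: print the hashes reachable from BOTH roots.
#   $1, $2 = root hashes to compare
shared_blocks() {
    refs_a=$(mktemp)
    refs_b=$(mktemp)
    # "ipfs refs -r" prints only the children, so add each root itself.
    { echo "$1"; ipfs refs -r "$1"; } | sort -u > "$refs_a"
    { echo "$2"; ipfs refs -r "$2"; } | sort -u > "$refs_b"
    # comm -12 keeps only the lines present in both sorted lists.
    comm -12 "$refs_a" "$refs_b"
    rm -f "$refs_a" "$refs_b"
}

# Usage (not executed here):
#   shared_blocks <hash from ipfs add> <hash from ipfs files stat>
```

An empty result would mean that the two objects have no block in common, so nothing is deduplicated between them.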

re. 2

I will try to improve that.

re. 3

From @rddaz2013 on Tue Dec 27 2016 09:17:46 GMT+0000 (UTC)

> re. 3

ipfs/go-ipfs#2074

that would be a nice step…

Hello again, and my apologies for the extremely late answer.

Yes, as the output of ipfs object links shows, the second hash is a node that links to two underlying objects. But QmPt4vGy69ENW5GJgVN8wSV5UAoG2SapjRYVUDQJVWbACR, which is a hash of exactly the same data as the first hash, is not the same as it! There are no further blocks under QmUZtQRZG58yB55k5NFPFeYBQ3FMTKydpuNAb66JnxDgup and QmPt4vGy69ENW5GJgVN8wSV5UAoG2SapjRYVUDQJVWbACR, are there?

Linking structure is about how the «sub-blocks» of large objects are composed, and chunking is the process of splitting a file into small blocks (256 KiB by default), right? Anyway, the reason for different hashes of the whole object (= the top node in the DAG) is quite clear if the blocks of these objects are different too. But why are the hashes of the underlying blocks also different, even though they have exactly the same size and contain exactly the same data? It seems that deduplication is still not performed in this case.
Let’s do a small experiment to see that!

I have two nodes, A and B, with an identical copy of the same file on each of them. On node A, I add this file with the ipfs add command. The returned hash is QmQskVDpjHWytARWPscdUrdXMvs5zWojjb6FeMCutkQGst. Please remember that!
On node B, I add this file with the ipfs files write --create command. The hash returned by ipfs files stat /path/to/file is… a secret for now.

The stats of both objects are identical.

$ ipfs object stat QmQskVDpjHWytARWPscdUrdXMvs5zWojjb6FeMCutkQGst
NumLinks: 2
BlockSize: 104
LinksSize: 90
DataSize: 14
CumulativeSize: 361604

$ ipfs object stat [secret hash from node B] 
NumLinks: 2
BlockSize: 104
LinksSize: 90
DataSize: 14
CumulativeSize: 361604

$ ipfs object links -v QmQskVDpjHWytARWPscdUrdXMvs5zWojjb6FeMCutkQGst
Hash                                           Size   Name
QmVww1W7GF8HB2u242Vn31WmbSguZ7mYsmFd9qTLWyZ5Nm 262158
QmPyaV2DhuSDgd3rxgsuVxAhKNx8T9zmK8R1rYQJtEHVvc 99342

$ ipfs object links -v [secret hash from node B]
Hash                                           Size   Name
[secret hash of first block]                   262158
[secret hash of second block]                  99342
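
The sizes reported above are consistent with CumulativeSize being the root's BlockSize plus the sizes of its two links, and with the first link holding one full 256 KiB chunk (262144 bytes) plus a few extra bytes. A quick check with shell arithmetic, using only the numbers from the output above (reading the 14 extra bytes as wrapper overhead is an assumption):

```shell
# Numbers copied from the "ipfs object stat" and "ipfs object links" output.
block_size=104
link_sizes="262158 99342"

# Root block plus both links should give the reported CumulativeSize.
total=$block_size
for s in $link_sizes; do
    total=$((total + s))
done
echo "$total"        # prints 361604

# Difference between the first link size and one 256 KiB chunk.
overhead=$((262158 - 262144))
echo "$overhead"     # prints 14
```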

And now node A goes offline.

And then, let’s try to get this file using the hash from node A and the blocks from node B. Please post it here if you get it! (But you probably cannot do that until node A comes back online.)