I’m reading the paper “Towards Peer-to-Peer Content Retrieval Markets: Enhancing IPFS with ICN”, published at ICN’19. It says IPFS sends a file without knowing whether the receiver already has it. In the authors’ experiments, this resulted in 9 times more duplicated chunks, causing congestion and low throughput.
More accurately, IPFS nodes currently request pieces of files from multiple
peers at once because they don’t know which peers actually have the
requested data. Every peer that has a piece sends it, so the same piece can
arrive more than once. In my tests, this usually leads to 2x overhead, not
9x, but it’s still terrible.
There is an engineer working full-time on solving this issue and those
changes should be landing by the end of the year.
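
To make the overhead concrete, here is a toy model of the behavior described above. It is only a sketch, not IPFS's actual protocol: the function name `fetch_all` and the peer layout are hypothetical, and it assumes every peer that holds a requested block sends it.

```python
def fetch_all(wanted, peers):
    """Ask every peer for every wanted block and count what comes back.

    `peers` is a list of sets, each holding the block ids that peer stores.
    Returns (blocks_received, unique_blocks).
    """
    received = 0
    unique = set()
    for block in wanted:
        # The requester does not know who has the block, so it asks everyone.
        for stored in peers:
            if block in stored:
                received += 1      # each holder sends its own copy
                unique.add(block)
    return received, len(unique)


# Hypothetical setup: two peers hold the whole 100-block file, one holds nothing.
peers = [set(range(100)), set(range(100)), set()]
received, unique = fetch_all(range(100), peers)
overhead = received / unique if unique else 0.0
print(f"received={received} unique={unique} overhead={overhead:.1f}x")
```

In this model the overhead equals the number of peers holding the data, so it grows with how widely a file is replicated among the peers being asked.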