This is assuming that we want more metadata (which I think we probably do) I think we should consider how it grows over time and support efficient streaming of large directories.
Indexing the files is interesting. Right now there there is a de-facto index on name and nothing else. (There is metadata for size but not sorted). However for different use cases different indexes are optimal.
- A user may wish to sort by name, size, last modified or any other attribute.
- A file manager may want to display name and size.
- A file open dialog may wish to filter by file type.
At the end of the day we have a table of metadata and we should consider the possible layouts. I think we want to have indexes for sure. Otherwise you can’t do streaming with a specific sort order. (For example show me the newest files, or biggest files)
There are two main approaches for any sort of table
Row-oriented layout
This is the most obvious solution. The main “directory” CID points at an object with links to the metadata entries as well as “indexes”.
entries:
- cid: QmQ8Qc8Y85YGrQ1zCQdn8DGZj1BP4bsupQsXBw6xh2D4PL
name: 2020
size: 308681
mime: inode/directory
- cid: QmUj45TFLeCjNzF9kWXv88LVzGvAJTqa9HYHTFEQMhLCKY
name: 2021
size: 11827
mime: inode/directory
- cid: QmXvBPGMU1Dk8EwBPL5Q629AuD1fVdRF5vNCpqrTahRTVQ
name: index.html
size: 1783
mime: text/html
indexes: # One index can be omitted by defining the sort order of `entries`
cid:
QmQ8Qc8Y85YGrQ1zCQdn8DGZj1BP4bsupQsXBw6xh2D4PL: 0
QmUj45TFLeCjNzF9kWXv88LVzGvAJTqa9HYHTFEQMhLCKY: 1
QmXvBPGMU1Dk8EwBPL5Q629AuD1fVdRF5vNCpqrTahRTVQ: 2
name:
2020: 0
2021: 1
index.html: 2
size:
1783: 2
11827: 1
308681: 0
mime:
inode/directory: 0
inode/directory: 1
text/html: 2
One basic optimization would be ordering entries
as map of name
to remaining metadata, much like sharded directories are today. This makes lookups fast. However you would also want to look up by index, otherwise you would need to have the index contain the filenames which could be expensive.
Column-oriented Layout
It would also be interesting to consider a “column oriented” table of metadata. This is similar to the row-oriented but instead of separating each enposixtry contiguously you split each column into a different “datastream”
columns:
cid:
- QmQ8Qc8Y85YGrQ1zCQdn8DGZj1BP4bsupQsXBw6xh2D4PL
- QmUj45TFLeCjNzF9kWXv88LVzGvAJTqa9HYHTFEQMhLCKY
- QmXvBPGMU1Dk8EwBPL5Q629AuD1fVdRF5vNCpqrTahRTVQ
name:
- 2020
- 2021
- index.html
size:
- 308681
- 11827
- 1783
mime:
- inode/directory
- inode/directory
- text/html
indexes:
# Same as row-oriented
This solution is fairly naive. It probably makes sense to group cid and name together as lookup is probably the most frequent operation.
Column-oriented also provides the possibility of a lot of compression options. As a lot of the values will be the same or similar. (for example a directory with everything modified at the same time, with the same few mime-types or HTTP headers.
Comparison
For this comparison I’ll assume that both are sorted/sharded by filename and the column-oriented interleaves the cid
and name
columns.
cat
These are fairly similar, however column oriented is slightly better because it pulls in less irrelevant information. (for example the file types aren’t pulled in) This means that if you are doing repeated cats your metadata will be hotter and you will get better efficiency.
stat (full)
This is a clear winner for row-oriented. It pulls in basically the minimum blocks and requires no “joining”. column-based pulls in a block per metadata requested leading to more data transfer and tail latency.
stat (partial)
This is unclear. It depends on how many metadata columns there are and how many you need. For repeated stats column-oriented will come out on top because you are puling in only the relevant types of data. However for cold lookups the row-based will likely access less blocks for any reasonable amount of metadata.
Furthermore most software assuming POSIX semantics will do a “full” stat which pulls in a lot of fields.
list (full)
This is a win for column-oriented as it only pulls in the metadata it needs. row-oriented will access more blocks.
list (partial)
When listing a small part of a directory row-oriented is better because it will access less blocks. Both are the same index usage (if required for sorting) however column-oriented will access N blocks if N bits of metadata are required (see stat) whereas row oriented will usually only need one block to get all of its metadata. As you list a higher portion of the directory this will improve and switch to column’s favour (assuming that you cache the blocks that you read).
Conclusion
It isn’t completely clear. The biggest problem with column-oriented is the “wide stat” which is assumed to be efficient by the POSIX APIs and a lot of software does have POSIX assumptions and will probably continue to for the foreseeable future. On the other hand consumers of HTTP APIs are getting used to selecting which fields they wish to have returned.
One interesting thing about column-oriented is that because metadata is separated into different blocks by type directories with a subset of metadata similar (for example different last-modified times) will be very similar, only that column and the root node will differ. This means that adding, removing or changing one type of metadata will not require a completely new set of blocks. However on the other hand adding or removing files will affect more blocks.
Either way I think it is clear that we should consider a solution for more metadata. Is there somewhere more targeted that we can discuss this?