I'm rephrasing your intention to clarify it. You have a largish number of file objects, each with some identifier (such as a full pathname.) You have a small, generally fixed, set of labels (sometimes called "tags") to be associated with the file objects. The relationship between file objects and labels is many-to-many. (In other words, a file can be associated with 0, 1, or more labels, and a label can be applied to an arbitrary number of files.) (Unstated before, but likely:) You want to avoid duplication of labels associated with any file object. The traditional way to express this in a relational database is to have 3 tables and a constraint: 1. A table with one row per file object, containing at least their identifiers. 2. A table with one row per label that _might_ be associated with files. 3. A linking table, each row of which signifies the association of of one label with one file object. In SQL-speak (or RDBMS-speak), there will be two foreign key columns, one referencing file table rows and the other referencing label table rows. 4. Usually, a linking table has a UNIQUE constraint applied to the combination of the two foreign keys. This precludes the above-mentioned duplication. Your concerns about speed will be addressed by creating suitable indexes upon the tables. The precise indexes needed will depend on the queries you want to support with speedy response. That response will have to be balanced against the impact of index updates during modification of the indexed tables.