tagmentation
A fragmented tagging system.
Tagmentation is a system for tagging stuff. It is similar in concept to software like TMSU, or websites like Last.FM or image Boorus. Unlike TMSU, Tagmentation is designed from the start to support tagging of arbitrary objects, not just local files. And unlike tagging websites, it stores data locally by default, while still allowing it to be shared.
Additional notable features of Tagmentation include its "fragmented" nature: its database is designed to support multiple collections coexisting simultaneously, and queries can optionally be run across all collections. You can subscribe to collections that others publish online, or share your own.
Features
This is a listing of features, both implemented and planned. Most features are in the "planned" category at the moment. These features are not listed in any particular order.
- Tag database backends:
- Lisp (saved as a file of s-expressions)
- SQLite
- Can be converted between different database backends.
- Tag features:
- Tag descriptions.
- Tag values.
- Meta-tags (for tagging/categorizing tags).
- Metadata.
- User interfaces:
- Lisp interface.
- Command line interface.
- GUIs.
- Web UI (REST-based).
- Browser extension.
- Mobile app.
- Tagging interface features:
- Prefix or flex completion
- Space or punctuation word boundaries
- Things that can be tagged:
- Local files.
- IPFS (hashes).
- Music (artists, albums, tracks).
- Websites/webpages.
- Tags (Yes, tags themselves can be tagged! This could be used, for example, for ad-hoc categorization/hierarchization).
- Anything your heart dreams of (as long as it can be identified by a string of text).
- Plugins:
- Functionality to allow detection of similar or "duplicate" files based on their "type", e.g. the ability to detect "duplicate" images even if they're not bit-for-bit identical, or are of differing dimensions.
- Duplicates can then be "squashed" (worse version deleted and its tags/etc moved to the better), or "related" (both kept, and tags/metadata added to link them to each other).
- Ability to mount the database as a FUSE filesystem to allow browsing it in programs that don't support Tagmentation directly.
- Filesystem watcher (e.g. via inotify or similar) to automatically track file additions, deletions, renames, etc.
- Networking:
- Collection importing - import tags from a dumped Tagmentation collection (e.g. if you download it from someone on Soulseek or similar). Imported tags are added to your database, and can be (de)prioritized and specific tags overridden by your own. Since collections are kept separate in the database, they are only mixed with your own when you want them to be, and it is always easy to fully remove a collection.
- Collection subscribing - download and import tag collections that have been published on the internet. Subscribed collections can be easily synced to re-import updated collections.
- Collection publishing - dump one or more of your tag collections to a file, which can then be published online, so others can subscribe to it.
- Meta-Collections - Public archives of tag collections uploaded by users. Thus you don't have to run your own web server to make your collection easily available.
- Integrations with other tag systems:
- TMSU
- Last.FM
- Gelbooru
- Graphical interface features:
- Thumbnails for everything possible (thumbnail functions are per-type and can be added via plugins)
- Audio
- Images
- Directories
- URLs
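The "prefix or flex completion" item under the tagging interface features can be illustrated with a small sketch. It assumes that "flex" completion means subsequence matching (the typed characters must appear in the candidate in order, but not necessarily adjacent), as in some editors' completion styles. The function names are hypothetical and not part of Tagmentation.

```python
def prefix_match(query: str, candidate: str) -> bool:
    """True if the candidate starts with the query (case-insensitive)."""
    return candidate.lower().startswith(query.lower())


def flex_match(query: str, candidate: str) -> bool:
    """True if the query's characters appear in the candidate in order.

    Assumption: this is what "flex" completion means here.
    """
    remaining = iter(candidate.lower())
    # `ch in remaining` advances the iterator until ch is found, so each
    # query character has to occur after the previous match.
    return all(ch in remaining for ch in query.lower())


def complete(query: str, tags: list[str], flex: bool = False) -> list[str]:
    """Return the tags matching the query, preserving their original order."""
    match = flex_match if flex else prefix_match
    return [tag for tag in tags if match(query, tag)]


known_tags = ["genre", "gender", "mood", "date"]
prefix_results = complete("gen", known_tags)
flex_results = complete("md", known_tags, flex=True)
```

Prefix matching is stricter and more predictable, while flex matching lets a few keystrokes pick out a long tag name at the cost of more candidates.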
Glossary
Tag - A string of text that describes an aspect of an object.
What Tagmentation is all about! The power of Tagmentation comes from being able to query the database of tags to quickly find items that have certain characteristics, or are related in one way or another.
Tag examples:
- Date - The creation or publishing date of the target.
- Genre - Descriptor of the style of the target.
- Mood - Descriptor of the emotional content of the target.
- Tool - What was used to create the target.
For a song, this could be an instrument; for source code, this could be the programming language it is written in.
- Anything else you can think of!
Tag metadata:
- Name - The tag's friendly name, which optionally differs from the name you type.
- Synopsis - A one-line summary of the meaning of the tag. Defaults to the first sentence of the description.
- Description - An extended explanation of the meaning of the tag, or any other arbitrary text the user wants to associate with the tag.
- Category - What kind of thing the tag describes (see "Tag examples" above).
- Publish? - Whether the tag should be included when publishing the collection or database it is a part of.
- Color - Optional color to highlight the tag with.
- Aliases - Alternative names for the tag. When tagging, all aliases will be coerced into the tag's canonical name.
- Birth - Date and time when the tag was created.
- Uses - Number of targets the tag has been applied to.
- Last usage - Most recent target the tag has been applied to, and the date and time it was applied.
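As a rough illustration of how aliases could be coerced into a tag's canonical name, here is a minimal sketch. The `Tag` class, `build_alias_index`, and `canonicalize` are hypothetical names chosen for this example, not Tagmentation's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    name: str  # canonical name
    aliases: set[str] = field(default_factory=set)


def build_alias_index(tags: list[Tag]) -> dict[str, str]:
    """Map every canonical name and alias to the canonical name."""
    index: dict[str, str] = {}
    for tag in tags:
        index[tag.name] = tag.name
        for alias in tag.aliases:
            index[alias] = tag.name
    return index


def canonicalize(name: str, index: dict[str, str]) -> str:
    """Coerce a typed name to its canonical form; unknown names pass through."""
    return index.get(name, name)


index = build_alias_index([Tag("hip-hop", {"hiphop", "rap"})])
```

With this index, typing `rap` or `hiphop` while tagging would both end up applying `hip-hop`.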
Target - Anything that can be tagged.
Targets are represented within Tagmentation as a type and an identifier. They are conceptually equivalent to uniform resource identifiers (URIs), but non-standardized URIs can be used (i.e. the user can define their own).
Similar to tags, targets also have metadata associated with them.
Unlike TMSU and other file-based tagging systems, Tagmentation allows anything to be tagged, as long as it can be represented as a string.
Target metadata:
- Friendly name - Human-readable text string naming the target.
For example, for a file, this would be the path to the file. For a music group, this would be the band name.
- Resolver - Function that generates a URL for accessing the object.
A target can have multiple resolvers; for example, a music artist can have a resolver that converts its identifier to a MusicBrainz.org URL, and another that converts its identifier to a local file path that contains the music that you have by the artist.
Users can define their own resolvers in their Tagmentation configuration file, but Tagmentation comes with a set of default resolvers that cover some of the example target types described below.
- Discovery - Date and time when the target was added to the database.
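The resolver idea described above can be sketched as a registry of functions keyed by target type. Everything here is an assumption made for illustration: the registry, the function names, the local directory layout, and the placeholder MBID. Only the `https://musicbrainz.org/artist/<MBID>` URL pattern comes from MusicBrainz itself.

```python
from typing import Callable

# A resolver turns an identifier into a URL for accessing the object.
Resolver = Callable[[str], str]

# Hypothetical registry: target type -> resolvers for that type.
RESOLVERS: dict[str, list[Resolver]] = {}


def register_resolver(target_type: str, resolver: Resolver) -> None:
    RESOLVERS.setdefault(target_type, []).append(resolver)


def resolve(target_type: str, identifier: str) -> list[str]:
    """Run every resolver registered for the type, in registration order."""
    return [resolver(identifier) for resolver in RESOLVERS.get(target_type, [])]


def musicbrainz_artist_url(mbid: str) -> str:
    return f"https://musicbrainz.org/artist/{mbid}"


def local_artist_directory(mbid: str) -> str:
    # Assumed layout: one local directory per artist, named by MBID.
    return f"file:///home/user/music/{mbid}"


register_resolver("music:artist", musicbrainz_artist_url)
register_resolver("music:artist", local_artist_directory)

PLACEHOLDER_MBID = "12345678-1234-5678-1234-567812345678"  # not a real artist
urls = resolve("music:artist", PLACEHOLDER_MBID)
```

Keeping resolvers as plain functions means a user-defined resolver is just one more entry in the registry.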
Type - The general category of the target.
This is similar to the "protocol" of a URI, but instead of being primarily used to specify how to access the resource, its primary use is to specify how to identify the target. The type specifies a namespace for identifiers, which must be unique within that namespace.
The idiomatic way to represent "sub-types" in Tagmentation is by separating them with colons, e.g. music:artist, music:album, music:track, etc.
Example target types:
- Local files - file
- URLs - url
- IPFS - ipfs
- Music artist - music:artist
- Music album - music:album
- Music track - music:track
- Listyrinth item - listy
- Person - person
Identifier - A text string that uniquely (within the namespace of the type) identifies a target.
Example target identifiers:
- Local files - The path to the file.
- URLs - The URL.
- IPFS - The content's IPFS content identifier ("CID").
- Music artist - The artist's MusicBrainz identifier ("MBID").
- Music album - The album's MusicBrainz identifier ("MBID").
- Music track - The track's MusicBrainz identifier ("MBID").
- Listyrinth item - A URL pointing to the item.
- Person - A URL pointing to a FOAF profile, a URL pointing to a listyrinth item, or anything you want.
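To make the type/identifier pairing concrete, the following sketch models a target as an immutable pair and treats colon-separated types as a hierarchy. The `Target` class and its `is_a` helper are hypothetical and only illustrate the concepts above; the identifiers used are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    type: str        # namespace for the identifier, e.g. "music:artist"
    identifier: str  # unique within that namespace

    @property
    def type_path(self) -> tuple[str, ...]:
        """Split the colon-separated type into its components."""
        return tuple(self.type.split(":"))

    def is_a(self, general_type: str) -> bool:
        """True if this target's type equals or is a sub-type of general_type."""
        general = tuple(general_type.split(":"))
        return self.type_path[: len(general)] == general


artist = Target("music:artist", "12345678-1234-5678-1234-567812345678")
page = Target("url", "https://example.org/")
```

Because the type is part of a target's identity, the same string can be used as an identifier under two different types without the targets colliding.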
Mapping - Application of a tag to a target.
Value - The "value" of the tag for this mapping.
Generally, values are most useful when the tag can only have one possible value for a given target. When a tag would need multiple values, each value might work better as its own tag.
For example, when tagging music, a date tag would only have one value, representing the date that the album or track was first published, or the date that the artist was born or the band was formed. Listing the members of a band in the value of a members tag, on the other hand, would probably be less useful than simply applying a tag for each member.
Collection - Set of mappings.
A user can create as many collections as they want, but, unless otherwise specified, all mappings are made in the default collection created when the user starts Tagmentation for the first time. Generally, collections are associated with a user or a "tag source". Collections can be seamlessly and non-destructively merged together when querying the tag database.
For example, any tags you manually create could be in the default collection. You could train a neural network on this tag collection, and have it "auto-tag" untagged targets into a separate collection. This way, you don't pollute your carefully-tagged collection with the (possibly-incorrect) results of the neural net. You continue tagging manually in your default collection, and later train a new version of the NN, and use its "guesses" as the source for a new collection.
Collections can also be imported from another user. Tags imported from a user are kept in their own collection, so they don't interfere with your own tags. You can set a "collection priority" so that your own tags have a higher priority than tags from other users (or vice-versa).
It's also possible to subscribe to a collection if it is published at a specific URL. When you subscribe to a collection, the sync command can be used to download the last-published version of one or more public collections.
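One way to picture collections, mappings, values, and collection priority together is the sketch below. It assumes that a higher priority number wins when two collections give the same tag different values for the same target. That rule, along with the `Collection` class and `merged_tags` function, is hypothetical and not taken from Tagmentation's implementation.

```python
from dataclasses import dataclass, field
from typing import Optional

TargetKey = tuple[str, str]  # (type, identifier)


@dataclass
class Collection:
    name: str
    priority: int = 0
    # (target, tag) -> optional value; each entry is one mapping.
    mappings: dict[tuple[TargetKey, str], Optional[str]] = field(default_factory=dict)

    def tag(self, target: TargetKey, tag: str, value: Optional[str] = None) -> None:
        self.mappings[(target, tag)] = value


def merged_tags(target: TargetKey, collections: list[Collection]) -> dict[str, Optional[str]]:
    """Merge tags for a target at query time without modifying any collection."""
    result: dict[str, Optional[str]] = {}
    # Visit low-priority collections first so higher priorities overwrite them.
    for collection in sorted(collections, key=lambda c: c.priority):
        for (mapped_target, tag), value in collection.mappings.items():
            if mapped_target == target:
                result[tag] = value
    return result


track = ("music:track", "placeholder-identifier")
mine = Collection("default", priority=10)
guesses = Collection("auto-tagger", priority=1)
mine.tag(track, "date", "1992")
guesses.tag(track, "date", "1993")
guesses.tag(track, "ambient")
```

Since the merge happens only when querying, dropping a collection from the list is enough to remove its influence; the other collections are never touched.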
Generator - Function that automatically generates mappings, metadata, or any other information.
Like resolvers (see above), generators can be user-defined, but some are also provided with Tagmentation. They can be run "eagerly" (e.g. to auto-tag your entire music collection) or "lazily" (i.e. only when the data is queried for a specific item).
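The difference between eager and lazy generators can be sketched as follows. The generator shown (tagging a file by its extension) and the names `extension_generator`, `run_eagerly`, and `LazyTags` are invented for this example; how Tagmentation schedules generators is not specified here.

```python
import os.path
from typing import Callable, Iterable

# Assumed shape of a generator: identifier -> set of tag names.
TagGenerator = Callable[[str], set[str]]


def extension_generator(path: str) -> set[str]:
    """Toy generator: tag a local file with its lowercase extension."""
    extension = os.path.splitext(path)[1].lstrip(".").lower()
    return {extension} if extension else set()


def run_eagerly(generator: TagGenerator, identifiers: Iterable[str]) -> dict[str, set[str]]:
    """Generate tags for every identifier up front."""
    return {identifier: generator(identifier) for identifier in identifiers}


class LazyTags:
    """Generate tags only when an identifier is first queried, then cache them."""

    def __init__(self, generator: TagGenerator) -> None:
        self._generator = generator
        self._cache: dict[str, set[str]] = {}
        self.calls = 0  # how many times the generator actually ran

    def tags_for(self, identifier: str) -> set[str]:
        if identifier not in self._cache:
            self.calls += 1
            self._cache[identifier] = self._generator(identifier)
        return self._cache[identifier]


eager = run_eagerly(extension_generator, ["/music/a.FLAC", "/music/b.mp3"])
lazy = LazyTags(extension_generator)
```

Eager runs pay the full cost once and make later queries cheap; lazy runs avoid work for items that are never queried.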
Implication - A relationship in which one tag implies another tag.
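A minimal sketch of how implications might be expanded at query time is shown below. It assumes that implications are transitive (if A implies B and B implies C, then A implies C), which is not stated above. The function name, the dictionary representation, and the example tags are all hypothetical.

```python
def expand_implications(tags: set[str], implications: dict[str, set[str]]) -> set[str]:
    """Return the given tags plus every tag they imply, directly or indirectly."""
    result = set(tags)
    pending = list(tags)
    while pending:
        tag = pending.pop()
        for implied in implications.get(tag, ()):
            if implied not in result:  # also guards against cycles
                result.add(implied)
                pending.append(implied)
    return result


implications = {
    "idm": {"electronic"},
    "electronic": {"music"},
}
expanded = expand_implications({"idm"}, implications)
```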
CID - IPFS content identifier.
IPFS is particularly relevant to Tagmentation because it already defines a robust system for uniquely identifying content by its content, rather than by its location (as standard HTTP URLs do). See the Multiformats website for more information.
MBID - MusicBrainz identifier.
MusicBrainz identifiers ("MBIDs" for short) are universally-unique identifiers permanently assigned to every artist, album, and track (and more) in MusicBrainz' database. They are a good choice for tagging music for several reasons:
- MusicBrainz is a completely free and open-source project. Unlike Discogs and RateYourMusic, there is no risk that MusicBrainz will one day decide to close their API or otherwise make their data inaccessible.
- While MusicBrainz does not have every artist, album, and track in its database, it is already a huge database and contains most of the most popular artists and works.
- Recordings that are already in MusicBrainz' database can be identified from their actual audio content via acoustic fingerprinting (for example, AcoustID); thus MBIDs can be determined even for untagged music.
- Using the name of the band/artist, album, or track as its identifier is a bad idea because there are many cases where completely unrelated bands share a name; for example, the experimental rock group Chrome, and the Memphis hip-hop artist Chrome Korleone, who has also released albums under the name Chrome.
- It's also a bad idea because many artists have used several different aliases over their careers. For example, Aphex Twin has released music under the names Aphex Twin, AFX, Richard D. James, Polygon Window, Power Pill, The Tuss, and many more.
For more information on MBIDs, see MusicBrainz' documentation on MusicBrainz Identifiers.
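Since MBIDs are UUIDs written in the usual 36-character hyphenated form, a tagging tool could sanity-check an identifier before storing it. The sketch below uses Python's standard `uuid` module; the function name `is_valid_mbid` is hypothetical, and the check only verifies the format, not whether the identifier actually exists in MusicBrainz.

```python
import uuid


def is_valid_mbid(text: str) -> bool:
    """True if text is a UUID in canonical hyphenated form."""
    try:
        # uuid.UUID also accepts variants such as un-hyphenated hex, so compare
        # against the canonical rendering to accept only the hyphenated form.
        return str(uuid.UUID(text)) == text.lower()
    except ValueError:
        return False


PLACEHOLDER_MBID = "12345678-1234-5678-1234-567812345678"  # not a real entity
```

A band name such as `Chrome` fails this check, which gives an early warning when a name is entered where an identifier is expected.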
Commands
Tagmentation's user interface has various commands that allow the user to query and interact with its database (and in some cases, things not yet in the database; for example, searching for files in a directory that have not yet been tagged).
config
Adjust Tagmentation settings.
copy
Create a copy of a database or collection.
delete
Delete a tag.
export
Dump a database or collection to a stream or file.
help
Show general Tagmentation help, or help on a specific command or concept.
imply
Create a tag implication.
import
Import an exported tag collection into a collection inside the local database.
info
Show information about the database or anything contained within it.
init
Create a database.
mount
Mount a database or collection as a virtual filesystem.
rename
Rename or merge a tag or value.
repair
Repair the database.
status
Show status of any ongoing actions.
subscribe
Add a public tag collection to the subscription list, to be automatically downloaded and (re)imported when sync is run.
sync
Download and import the most recent versions of any public tag collections.
tag
Apply a tag to a target.
tags
List tags matching a query.
targets
List targets matching a query.
unmount
Unmount a virtual filesystem that has been mounted with the mount command.
untag
Remove tags from targets.
untagged
List targets matching a query that have not been tagged yet. This is basically a convenience alias for the targets command.
values
List tag values matching a query.