The Tag System

In pybow, Bow and Limb objects can be tagged with arbitrary category identifiers, much like tweets have #hashtags. Since tagging is more commonly associated with social networks than with Python objects, let alone archery, I want to explain how I arrived at this idea.

The tag system – each Bow and Limb object having an attribute tags for arbitrary categorical metadata – evolved from a series of design decisions. My initial plan for representing bows as Python objects looked like this:

  • Bow
    • Name (str, mandatory)
    • Material (str, optional)
    • Artefact Length (float, generated from limb data)
    • Effective/Nock-to-Nock Length (float, generated from limb data)
    • Date (int or datetime.date, optional)
    • Source (str, mandatory – requiring at least an author name is a minimal inconvenience that contributes to the common good.)
    • Description (str, optional)
    • Limbs (up to 2)
      • Limb
        • Index/id (int, generated automatically/implicitly)
        • Name (str, optional)
        • Measurements (pd.DataFrame, mandatory)
        • Complete (bool, mandatory, default: True)
        • Upper/Lower (elegant implementation unsure, mandatory? optional?)

One idea from the beginning was to iterate over a bow to yield its limbs, and then to use comprehensions to filter by the desired metadata, like so:

>>> [limb for limb in bow if limb.complete]

to extract only those Limb objects that describe complete, unbroken limbs.
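A minimal sketch of how such iteration could work is shown below. The two classes are hypothetical stand-ins reduced to the fields needed here, not pybow's actual implementation; the only assumption is that Bow defines `__iter__` to hand out its limbs.

```python
from dataclasses import dataclass, field


@dataclass
class Limb:
    # Hypothetical stand-in for pybow's Limb (assumption, not the real class).
    name: str = ""
    complete: bool = True


@dataclass
class Bow:
    # Hypothetical stand-in for pybow's Bow (assumption, not the real class).
    name: str
    limbs: list = field(default_factory=list)

    def __iter__(self):
        # Iterating over a bow yields its limbs, one at a time.
        return iter(self.limbs)


bow = Bow("example", [Limb("a", complete=True), Limb("b", complete=False)])

# The comprehension from above keeps only the complete limbs.
complete_limbs = [limb for limb in bow if limb.complete]
print([limb.name for limb in complete_limbs])  # ['a']
```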

Because I had no good idea how to represent whether a limb is the top limb, the bottom limb, or of unknown/undecided position, I postponed implementing it. [*] If top/bottom were important for a project, one could easily make those the limbs’ names and then filter by limb.name == 'top' or limb.name == 'bottom'. This line of thinking (and its corresponding data model) was how I first implemented the Limb class.

Using the limb’s name as a way to store metadata, however, soon started to irk me. For one, it’s just not right. A name is a name, and descriptive categorical metadata is descriptive categorical metadata. And then, bow limbs don’t tend to get named. Bows might get names of some sort to tell them apart, but limbs usually get referred to by descriptors of their position, their shape, or whatever property is convenient in the current context.

Storing a descriptor as a limb’s name also introduces a follow-up problem: Limbs don’t necessarily have only one descriptor. Limbs can be top or bottom limbs (or their position is unknown). Limbs can be straight limbs or have character growth. Limbs can be the longer or shorter ones, the wider or narrower ones. Limbs can be straight, recurved, or decurved. Limbs can have idiosyncratic differences meaningful only in the context of this one bow. Broken limbs in a prehistoric artefact could have broken in use or in sediment. The possibilities are endless, and limb names could therefore devolve into a concatenation of multiple descriptors.

Merely renaming the name attribute to something more befitting its new role won’t address the problem: Storing multiple descriptors in one str is clumsy to handle, and also unsafe for checking the presence of a descriptor. With a growing vocabulary of possible descriptors, the chance for accidental substring-matches increases and becomes its own problem to keep track of.
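The substring problem is easy to reproduce. The snippet below is an illustration only; the descriptor string and its space-separated layout are made up for the example.

```python
# Several descriptors crammed into one string (hypothetical layout).
name = "unknown-position recurved"

# Substring checks match parts of other descriptors by accident.
false_hit_curved = "curved" in name  # True, found inside "recurved"
false_hit_known = "known" in name    # True, found inside "unknown-position"

# Splitting avoids this, but every check now depends on an agreed
# separator and on re-parsing the string.
descriptors = name.split()
exact_hit = "curved" in descriptors  # False

print(false_hit_curved, false_hit_known, exact_hit)  # True True False
```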

However, it was the thought of multiple user-chosen descriptors in one str that led me to what is now the tag system:

The ability to store an arbitrary number of user-chosen descriptors is very appealing. It is a much better design than supplying a number of attributes for possible descriptor categories and inevitably missing one for something someone will want to do. Handling and mismatch-safe checking for a descriptor’s presence need to be improved, though.

Python’s sets provide all of that: They are, like lists and tuples, collections of items whose presence you can check for:

>>> collection = {1, 2, 3}
>>> collection
{1, 2, 3}
>>> 2 in collection
True
>>> 4 in collection
False

Sets, like lists and unlike tuples, are mutable. That is, items can easily be added to them without needing to replace the whole thing:

>>> collection
{1, 2, 3}
>>> collection.add(4) # add one item
>>> collection
{1, 2, 3, 4}
>>> collection |= {5, 6, 7} # add a set
>>> collection
{1, 2, 3, 4, 5, 6, 7}

Sets also don’t allow duplicate entries, which isn’t a necessity, but is neat. Sets have no order, but that is of no concern for storing descriptors. Finally, checking whether an item is in a set with in is concise, readable syntax and also fast: a membership test on a set takes constant time on average, no matter how many items the set holds.
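Applied to descriptors, these properties might play out as in the following sketch. The tag strings are invented for illustration.

```python
# Descriptors stored as separate items of a set (example values).
tags = {"recurved", "unknown-position"}

# Membership compares whole items, so partial words no longer match.
has_recurved = "recurved" in tags  # True
has_curved = "curved" in tags      # False

# Adding a tag that is already present changes nothing.
tags.add("recurved")
n_tags = len(tags)                 # 2

# Several descriptors can be checked at once with a subset test.
has_both = {"recurved", "unknown-position"} <= tags  # True
has_top_recurved = {"top", "recurved"} <= tags       # False
```

The subset operator `<=` is a built-in set operation, so no extra helper is needed to ask for limbs that carry all of a given group of tags.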

Once Limb.name became Limb.tags, it was hard not to realise that being able to tag limbs, but not the entire bow, is unrealised potential. (And also unpleasantly asymmetric.)

And that is how bows and limbs got tags.
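To tie this back to the comprehension from the beginning, here is a rough sketch of filtering by tags on both levels. The classes are again hypothetical stand-ins reduced to what the example needs, and the bow names and tag values are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Limb:
    # Hypothetical stand-in for pybow's Limb, reduced to its tags.
    tags: set = field(default_factory=set)


@dataclass
class Bow:
    # Hypothetical stand-in for pybow's Bow.
    name: str
    limbs: list = field(default_factory=list)
    tags: set = field(default_factory=set)

    def __iter__(self):
        # Iterating over a bow yields its limbs.
        return iter(self.limbs)


bows = [
    Bow("A", [Limb({"top", "recurved"}), Limb({"bottom", "broken"})], {"yew"}),
    Bow("B", [Limb({"top"}), Limb({"bottom"})], {"elm", "replica"}),
]

# Filter limbs across all bows by a limb tag.
top_limbs = [limb for bow in bows for limb in bow if "top" in limb.tags]

# Filter bows by a bow tag.
yew_bows = [bow.name for bow in bows if "yew" in bow.tags]

# Filter the limbs of one bow by the absence of a tag.
intact = [limb for limb in bows[0] if "broken" not in limb.tags]
```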


[*] Setting top/bottom on one limb would need to update the other, preferably in a non-silent manner. The implementation should also store three states in a somewhat elegant and non-redundant way, and safeguard against combinations that make no sense, like top/top, bottom/bottom, unknown/top, and so on.