In my collection management system, I want to be able to see which objects the museum newly acquired on a given day, or on any day of a given month. Obviously.
In musdb, the former has been possible so far, if imperfectly. The latter was not possible at all. This is because data that should be in controlled fields with suitable data types in the underlying database (dates, value information, sizes) are stored as strings. On the one hand, this allows us to import inconsistent or otherwise unsuitable data into these fields (sizes in particular are often not available in numeric form). In other words, keeping these fields as free text is a way to prevent data loss for people switching systems.
On the other hand, keeping them as free text fields prevents us from making full use of them, especially when it comes to offering accurate search options such as larger / smaller searches. It also allows different workers at a museum to enter the same information in different, incompatible ways.
A Conservative Solution
In a perfect world, we would simply convert the field for entry dates into an actual date field in the database and mark it as such in the HTML of musdb (thus providing a date picker in modern browsers). Users would not expect to be able to enter incompatible data (“ca. 2010”) and imports would already contain well-formatted data.
The world is not perfect.
museum-digital hence needs to be able to do both: ensure consistency and enable the necessary search functionalities on the one hand, and allow for inconsistency at the expense of those search functionalities on the other. We can do that using two new features.
First, we added additional search indexes for the relevant fields, using the correct numeric or timestamp data types. Only appropriately formatted information can be entered into these indexes, and using them we can provide the search functionalities we sought.
Since only correct and consistently formatted data can be entered into these indexes, we try to make sense of the available information. Databases generally expect “0.01”, whereas German users would write the same number as “0,01”. This we can easily correct automatically. Dates are a more complicated case. 2022-08-25 may be written that way. But even within a given language there are often multiple ways to express a date. 2022.08.25. is Hungarian, 25.08.2022 is German, but so is “25 August 2022”. Fortunately, we already have a rather capable auto-correction function from auto-correcting new time entries in our controlled vocabularies.
Still, there are some values that cannot be made sense of. If you expect a date, and someone gives you “ca. 2020”, there’s nothing really to work with. The search indexes thus only contain a value for an object if we were actually able to understand what had been entered into the respective free text field.
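To illustrate the kind of normalization described above, here is a minimal sketch in Python. It is not the actual auto-correction function used in musdb. The function names, the list of accepted formats and the month table are assumptions made for this example only. Anything that cannot be understood yields None, meaning that no index entry would be written for that object.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical format list, covering only the spellings mentioned above.
_NUMERIC_DATE_PATTERNS = [
    re.compile(r"(?P<y>\d{4})-(?P<m>\d{1,2})-(?P<d>\d{1,2})"),       # 2022-08-25
    re.compile(r"(?P<y>\d{4})\.(?P<m>\d{1,2})\.(?P<d>\d{1,2})\.?"),  # 2022.08.25.
    re.compile(r"(?P<d>\d{1,2})\.(?P<m>\d{1,2})\.(?P<y>\d{4})"),     # 25.08.2022
]
# 25 August 2022 (an optional dot after the day is tolerated)
_NAMED_DATE_PATTERN = re.compile(r"(?P<d>\d{1,2})\.?\s+(?P<name>\w+)\s+(?P<y>\d{4})")
_GERMAN_MONTHS = {
    "januar": 1, "februar": 2, "märz": 3, "april": 4, "mai": 5, "juni": 6,
    "juli": 7, "august": 8, "september": 9, "oktober": 10, "november": 11,
    "dezember": 12,
}


def normalize_decimal(raw: str) -> Optional[float]:
    """Turn '0,01' or '0.01' into 0.01; return None for anything else."""
    text = raw.strip()
    if not re.fullmatch(r"-?\d+([.,]\d+)?", text):
        return None
    return float(text.replace(",", "."))


def _build_date(year: int, month: int, day: int) -> Optional[date]:
    """Return a date, or None if the parts do not form a real calendar date."""
    try:
        return date(year, month, day)
    except ValueError:
        return None


def parse_date(raw: str) -> Optional[date]:
    """Return a date if the free text is understood, otherwise None."""
    text = raw.strip()
    for pattern in _NUMERIC_DATE_PATTERNS:
        match = pattern.fullmatch(text)
        if match:
            return _build_date(int(match["y"]), int(match["m"]), int(match["d"]))
    match = _NAMED_DATE_PATTERN.fullmatch(text)
    if match:
        month = _GERMAN_MONTHS.get(match["name"].lower())
        if month is not None:
            return _build_date(int(match["y"]), month, int(match["d"]))
    return None  # e.g. "ca. 2020": nothing to work with, so no index entry
```

With this sketch, `parse_date("25.08.2022")` and `parse_date("2022.08.25.")` both yield the same date object, while `parse_date("ca. 2020")` yields None.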
For new museums that do not regularly use imports, we added a second new feature: strict modes. In the institution-wide settings, there are now options to enable strict modes for value statements, dates, and sizes. If the respective strict mode is enabled, the entry fields will be changed into the correct types. The browser’s date picker thus becomes available for date fields, and the browser will prevent users from entering non-dates. Similarly, value statements and sizes will be marked as numeric, making the browser prevent non-numeric inputs.
If the strict mode is enabled, one can be sure that extended search functionalities like smaller / larger or before / after searches cover all objects whose respective fields were filled in after the strict mode was enabled. On the downside, older values in those fields that do not fit the stricter type are no longer displayed, and upon submitting the form, they are lost.
Enabling the strict mode thus currently only makes sense for museums that are just starting to use musdb for collection management. We do, however, plan to eventually provide checking and migration scripts, so that museums can enable the strict mode later without risking the loss of data entered beforehand.
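The effect of a strict mode on the entry form can be sketched as follows. This is an illustration only, not the musdb code. The function name, the field kinds and the mapping table are assumptions for this example. The only thing that changes is the type attribute of the input element, and the browser does the rest.

```python
from html import escape

# Hypothetical mapping from field kind to the HTML input type used in strict mode.
STRICT_INPUT_TYPES = {"date": "date", "value": "number", "size": "number"}


def render_input(name: str, kind: str, value: str, strict: bool) -> str:
    """Render an entry field: free text by default, typed in strict mode."""
    input_type = STRICT_INPUT_TYPES[kind] if strict else "text"
    attributes = (
        f'type="{input_type}" '
        f'name="{escape(name, quote=True)}" '
        f'value="{escape(value, quote=True)}"'
    )
    if input_type == "number":
        # Allow decimal fractions such as 0.01 instead of whole numbers only.
        attributes += ' step="any"'
    return f"<input {attributes}>"
```

A date input only accepts values in the form yyyy-mm-dd. If the stored value is something like “ca. 2010”, the browser shows the field as empty, which is why older free text values disappear once the strict mode is switched on.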
We pushed the update last night and unfortunately ran into a problem when filling the updated search indexes in the case of certain malformed dates: our auto-correction script for getting sensible time information from strings did not, thus far, contain a safeguard against time entries like 2022-14-14, that is, entries in which the month was accepted as a two-digit numeric value although it is larger than 12 (in the example it would be the 14th month, which obviously does not exist).
Since the update required a full regeneration of the search indexes, the attempt to enter such malformed dates fully halted the update (this is intentional for internal scripts, to force us to fix such problems immediately). Some instances of musdb (mainly Saxony-Anhalt, the Rhineland, Berlin, Budapest and Baden-Württemberg) thus ran with incomplete search indexes for the object search in musdb from about 4 a.m. until noon. The error is now fixed.
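A safeguard of the kind that was missing might look like the following sketch. It is a hypothetical Python example, not the actual fix. The function name is made up for illustration. The point is that a string can look like a date and still not be one, so the month (and the day) has to be checked before the value reaches a timestamp column.

```python
import re
from datetime import date
from typing import Optional

# Matches strings that merely look like yyyy-mm-dd, e.g. also 2022-14-14.
_ISO_LIKE = re.compile(r"(\d{4})-(\d{2})-(\d{2})", re.ASCII)


def safe_iso_date(raw: str) -> Optional[date]:
    """Return a date for a valid yyyy-mm-dd string, otherwise None."""
    match = _ISO_LIKE.fullmatch(raw.strip())
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    if not 1 <= month <= 12:
        return None  # a 14th month does not exist
    try:
        # Also rejects impossible days such as 30 February.
        return date(year, month, day)
    except ValueError:
        return None
```

Here `safe_iso_date("2022-14-14")` returns None, so such a value would simply not be written to the index.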