Bringing back character-driven search for inventory numbers in musdb

If you search for “run”, you want to find entries (objects, blog posts, etc.), that mention “ran”. If you search for inventory numbers like “*1”, you want to find “0001”. These are fundamentally different categories of search. In the first case, you want to have a language-aware full-text search. In the latter case, you simply want to work with characters. In technical terms: Inventory numbers are strings (groups of signs or characters), but not common “text”.

When musdb and museum-digital’s frontend received their last large-scale update to their respective object search functions around 2021, enabling actually complex search requests across almost all of the fields and data types linkable to objects, this was – among others – made possible by our use of Manticore, a dedicated search server. Traditional relational databases excel at searches by indexes – pre-defined, known search parameters, that one prepares for searchability beforehand – while search servers like Manticore are reasonably good at that and excel at full-text searches.

In moving the search to Manticore, all searches in free text fields were defined as full-text searches. This was mostly the right decision: Full-text searches are the way to go with objects’ titles, descriptions, and the like. But two specific types of fields posed a challenge, because – as indicated above – they usually run on formalized strings that operate very differently from prose: objects’ locations and their inventory numbers. The software does not know about institution-specific and non-standardized rules of formalization, but users do. Hence, the preferred way for those specific types of fields is character-level searching.

For managing locations, we have in the meantime introduced spaces as a dedicated category capable of hierarchization as well as advanced features like the storage of sensor data. Object’s locations can now simply be expressed as a link to a space, which is by far the superior way when compared to the legacy free text field. If one does so, one can search for objects exactly in a given space, those that are located within it or its sub-spaces (e.g. a box in a depot room), etc. A migration tool from the legacy free-text field to the controlled spaces module is available through musdb’s dashboard. “Fixing” the issue of character-driven searches vs. full-text searches in locations is thus a least-priority issue – a better alternative is available anyway.

With inventory numbers on the other hand, there is no alternative to character-driven searching.

Laying the Foundations: From MSQL to Manticore and (Somewhat) Back

The basis for the expansion of search capabilities for objects was the introduction of a dedicated search server running Manticore. As the number of requests increased, this proved to be a blessing and – to some extent – a curse. Manticore offers more and better search options than a classic relational database, but it does not achieve the same level of stability. As long as queries remain index-bound and not concerned with text, the performance is roughly similar on our hardware: (both are about as quick, even with subqueries in MySQL; MySQL uses more resources, but is much more stable). If a query concerns a free-text field on the other hand, there is almost no comparison. Manticore offers a multitude of additional features at a great performance.

As stability had become an issue for a while, we adjusted the search to be able to use Manticore or MySQL as a backend, depending on which was more suitable in a specific context. In practice, this means that each search parameter is translated into a query string for Manticore and – if possible – for MySQL. If all search parameters have a MySQL equivalent, the search will be performed using the MySQL backend. Otherwise, Manticore will be used.

This simple way of negotiating which backend is more suitable works only as long as one of the alternatives (Manticore) supports all search options, while the other (MySQL) is preferrable in a subset of the search contexts. Which is to say, character-driven searches in inventory numbers break the negotiation logic – they work somewhat well in MySQL, but do not work in Manticore.

Breaking the Logic / Mitigating Confusion

Up to this weekend, all search options were compatible with each other:

If one searches for all objects one has acc ess to, both Manticore and MySQL can handle the query. MySQL will be used.
If one searches for all “helmets” (tag) from “Europe” (place), both Manticore and MySQL can handle the query. MySQL will be used.
If one searches for “helmets” (full-text) from “Europe” (place), MySQL can only sufficiently handle the search by place, while Manticore can meet both search requirements. Manticore will be used.

Character-driven searches by inventory numbers break that compatibility. If one were to search for objects for “helmets” (full-text) with inventory numbers starting with 1 (“1*”), the search parameter “helmets” could only be satisfied by Manticore, while the character-driven search by inventory numbers can only be satisfied by MySQL. Which is to say, the combined search cannot be executed.

Due to popular demand, we introduced character-driven searches for inventory numbers back into musdb. As there is no way to sensibly combine all search parameters anymore, given our circumstance, we had to handle reduce the resulting confusion. For this, there are theoretically two ways.

The theoretically cleaner way would have been to disable the extension of search queries by full-text-focused parameters once an inventory number had been searched. As a full-text search by inventory number is theorecally still possible, the opposite direction (setting a full-text search first, then searching by inventory number) might still have been acceptable, as it would not have led to visibly different results. The basic idea of this solution would have been to prevent users from performing combined searches that are not possible in the targetted way. But if users actually managed in some way, the confusion would have been major. Worse yet, it would have been hard to explain – or rather, it would have been hard to find an appropiate spot in the UI for an explanation -, why certain search options are suddenly disabled.

The alternative route we chose is to allow users to do the impossible combinations, perform the search as best as we can (by transforming the character-driven search by inventory number into a full-text search), and aggressively warn about the likelihood of unexpected results when trying to perform such combined searches. This solution looks unpolished, but it is transparent about the imperfections of the software, and it allows users to find their own solutions to actually perform the combined searches they want. The simplest such solution would be to first search by inventory number, move all the objects into a watch list, and then search by the watch list and combine that search with the full-text search.

This content is licensed under a Creative Commons Attribution 4.0 International license.

M	T	W	T	F	S	S
					1	2
3	4	5	6	7	8	9
10	11	12	13	14	15	16
17	18	19	20	21	22	23
24	25	26	27	28	29	30
31

Laying the Foundations: From MSQL to Manticore and (Somewhat) Back

Breaking the Logic / Mitigating Confusion

Ähnliche Beiträge: