<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Object search (frontend) | museum-digital: blog</title>
	<atom:link href="https://blog.museum-digital.org/tag/object-search-frontend/feed/" rel="self" type="application/rss+xml" />
	<link>https://blog.museum-digital.org</link>
	<description>A blog on museum-digital and the broader digitization of museum work.</description>
	<lastBuildDate>Tue, 25 Nov 2025 16:55:10 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	

<image>
	<url>https://blog.museum-digital.org/wp-content/uploads/2020/01/cropped-mdlogo-code-512px-32x32.png</url>
	<title>Object search (frontend) | museum-digital: blog</title>
	<link>https://blog.museum-digital.org</link>
	<width>32</width>
	<height>32</height>
</image> 
<atom:link rel="search" type="application/opensearchdescription+xml" title="Search museum-digital: blog" href="https://blog.museum-digital.org/wp-json/opensearch/1.1/document" />	<item>
		<title>State of Development, October 2025</title>
		<link>https://blog.museum-digital.org/2025/11/25/state-of-development-october-2025/</link>
					<comments>https://blog.museum-digital.org/2025/11/25/state-of-development-october-2025/#respond</comments>
		
		<dc:creator><![CDATA[Joshua Ramon Enslin]]></dc:creator>
		<pubDate>Tue, 25 Nov 2025 16:55:09 +0000</pubDate>
				<category><![CDATA[Community]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Dissemination]]></category>
		<category><![CDATA[Frontend]]></category>
		<category><![CDATA[musdb]]></category>
		<category><![CDATA[Presentations]]></category>
		<category><![CDATA[New Features]]></category>
		<category><![CDATA[Object search (frontend)]]></category>
		<category><![CDATA[User interface]]></category>
		<guid isPermaLink="false">https://blog.museum-digital.org/?p=4564</guid>

					<description><![CDATA[A summary of recent updates and development around museum-digital in October 2025.]]></description>
										<content:encoded><![CDATA[
<h2 class="wp-block-heading">Development</h2>



<h3 class="wp-block-heading"><a href="https://en.about.museum-digital.org/software/frontend/">Frontend</a></h3>



<ul class="wp-block-list">
<li>Significantly reworked the display of transcriptions on object pages
<ul class="wp-block-list">
<li>Titles of transcriptions are now displayed
<ul class="wp-block-list">
<li>If none is set, the type of the transcription (original or translation) is used as a replacement</li>
</ul>
</li>



<li>Transcriptions are sorted by their titles</li>



<li>Improved the display of transcriptions in tiles
<ul class="wp-block-list">
<li>Problems with vertical scrolling are now solved</li>



<li>If only one transcription has been recorded, it will be displayed across the full width of the page</li>



<li>If there are more than two transcriptions for an object, they are collapsed by default and can be expanded on demand</li>
</ul>
</li>
</ul>
</li>



<li>Batch export of object metadata via the API
<ul class="wp-block-list">
<li>Thus far available in JSON &amp; LIDO</li>



<li><a href="https://nat.museum-digital.de/swagger/#/object/jsonExportObjects">API documentation</a></li>



<li><a href="https://blog.museum-digital.org/2025/11/24/making-interoperability-easy/">See also</a></li>
</ul>
</li>



<li>In languages that require it, the dot used as the decimal separator in floating point numbers for object measurements is replaced with a comma</li>



<li>Collection-specific ISIL IDs are used in the LIDO API</li>
</ul>



<h3 class="wp-block-heading"><a href="https://en.about.museum-digital.org/software/musdb/">musdb</a></h3>



<ul class="wp-block-list">
<li>Added a field for recording titles / names of transcriptions</li>



<li>Added the option to set collection-specific ISIL IDs</li>



<li>Setting object type tags via the improvement suggestions now correctly classifies the resulting link between object and tag</li>



<li>Additional shapes are now available
<ul class="wp-block-list">
<li>E.g.: round, square</li>
</ul>
</li>



<li>Object groups can now be filtered by whether or not they have a superordinate group</li>
</ul>



<h3 class="wp-block-heading">Dissemination</h3>



<ul class="wp-block-list">
<li>2025-10-08: <a href="https://www.jrenslin.de/talks/interoperabilitaet-schaffen-geschichten-aus-1001-importen-herbsttagung/">Presentation</a> at the Autumn Conference of the Working Group Documentation of the German Museum Association: &#8220;Interoperabilität schaffen &#8211; Geschichten aus 1001 Importen&#8221;
<ul class="wp-block-list">
<li><a href="https://files.museum-digital.org/de/Praesentationen/2025-10-08_1001-Importe_Herbsttagung-FG-Doku_JRE.pdf">PDF</a></li>



<li><a href="https://files.museum-digital.org/de/Praesentationen/2025-10-08_1001-Importe_Herbsttagung-FG-Doku_JRE.odp">ODP</a></li>
</ul>
</li>



<li>2025-10-14: <a href="https://www.jrenslin.de/talks/civers-2025/">Talk</a> on a workshop of the project <a href="https://www.dainst.org/forschung/projekte/citation-of-versioned-web-pages-by-pid-civers/5926">CiVers (Citation of Versioned Web Pages by PID)</a>
<ul class="wp-block-list">
<li><a href="https://files.museum-digital.org/de/Praesentationen/2025-10-14_museum-digital_Civers_JRE.pdf">PDF</a></li>



<li><a href="https://files.museum-digital.org/de/Praesentationen/2025-10-14_museum-digital_Civers_JRE.odp">ODP</a></li>
</ul>
</li>



<li>2025-10-17: <a href="https://verein.museum-digital.de/museum-digital-usertagung-2025/">museum-digital Usertagung 2025</a></li>
</ul>



<div class="wp-block-cgb-cc-by message-body" style="background-color:white;color:black"><img decoding="async" src="https://blog.museum-digital.org/wp-content/plugins/creative-commons/includes/images/by.png" alt="CC" width="88" height="31"/><p><span class="cc-cgb-name">This content</span> is licensed under a <a href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International license.</a> <span class="cc-cgb-text"></span></p></div>
]]></content:encoded>
					
					<wfw:commentRss>https://blog.museum-digital.org/2025/11/25/state-of-development-october-2025/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-thumbnail><url>https://blog.museum-digital.org/wp-content/uploads/2025/11/AI-gen-blog-202511-state-of-2025-10.png-scaled.webp</url><width>600</width><height>467</height></post-thumbnail>	</item>
		<item>
		<title>Making Interoperability Easy</title>
		<link>https://blog.museum-digital.org/2025/11/24/making-interoperability-easy/</link>
					<comments>https://blog.museum-digital.org/2025/11/24/making-interoperability-easy/#respond</comments>
		
		<dc:creator><![CDATA[Joshua Ramon Enslin]]></dc:creator>
		<pubDate>Mon, 24 Nov 2025 15:56:37 +0000</pubDate>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Frontend]]></category>
		<category><![CDATA[Infrastructure]]></category>
		<category><![CDATA[API]]></category>
		<category><![CDATA[Interoperability]]></category>
		<category><![CDATA[New Features]]></category>
		<category><![CDATA[OAI-PMH]]></category>
		<category><![CDATA[Object search (frontend)]]></category>
		<guid isPermaLink="false">https://blog.museum-digital.org/?p=4538</guid>

					<description><![CDATA[Interoperability has been one of the focal issues around museum-digital practically since its inception. Offering different, simple ways to bring data into the system was a necessary requirement to even think of what we do. And offering simple ways to get the data out of the system again is just good practice &#8211; though all <a href="https://blog.museum-digital.org/2025/11/24/making-interoperability-easy/" class="more-link">...</a>]]></description>
										<content:encoded><![CDATA[
<p>Interoperability has been one of the focal issues around museum-digital practically since its inception. Offering different, simple ways to bring data into the system was a necessary requirement to even think of what we do. And offering simple ways to get the data out of the system again is just good practice &#8211; though all too often neglected.</p>



<p>To that end, there have traditionally been two primary ways to retrieve data. In musdb, one could run batch exports and receive a ZIP of XML files &#8211; one per object, with the objects matching the results of any given object search.</p>



<p>On the other hand, there is the public API. Using URL manipulation, one can access the (primary) contents of each page in a machine-readable way. To access the JSON representation of an object&#8217;s published metadata &#8211; take the object with ID 7141 in the Hesse instance of museum-digital (URL: <a href="https://hessen.museum-digital.de/object/7141"><code>https://hessen.museum-digital.de/object/7141</code></a>) &#8211; one simply has to insert <code>json</code> into the path: <code><a href="https://hessen.museum-digital.de/json/object/7141">https://hessen.museum-digital.de/json/object/7141</a></code>.</p>
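


<p>A minimal sketch of such a request in Python, using only the standard library (error handling is omitted, and that the response parses into a JSON object is an assumption here):</p>



<pre class="wp-block-code"><code>import json
import urllib.request

# The JSON representation lives at the same path as the HTML page,
# with "json" inserted as an additional path segment.
url = "https://hessen.museum-digital.de/json/object/7141"

with urllib.request.urlopen(url) as response:
    data = json.load(response)

# List the top-level fields of the returned object metadata.
print(sorted(data.keys()))</code></pre>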



<p>Next to the default JSON output, additional APIs are offered wherever suitable for a given data type. For objects, the primary additional output method is a <a href="http://lido-schema.org/">LIDO</a> API.</p>



<p>Thus far, the main limitation of the public API was that it only allowed one object (or institution, collection, etc.) to be queried at a time.</p>



<h2 class="wp-block-heading">Querying Object Metadata in Batches</h2>



<p>After a significant refactoring of the code that loads object data for object pages &#8211; primarily to improve caching and allow for parallelized requests to the database &#8211; we are now finally able to offer APIs for querying object metadata in batch. Thanks to grouped database queries, performance and resource usage scale nicely. Taking simply the most recent objects in the Germany-wide instance of museum-digital: loading the data of a single object and presenting it as JSON takes 0.0087 seconds, while loading and generating the JSON for the 100 most recent objects takes 0.197 seconds (or 0.00197 seconds per object). Note that not all queries for all aspects of an object&#8217;s metadata are grouped yet; performance may thus get even better over time. These numbers also do not account for the overhead of the many HTTP requests one would previously have needed to fetch each object&#8217;s metadata one by one &#8211; the real performance improvements are thus even greater.</p>



<p>Now, how to access object metadata in batches?</p>



<p>The batch access is linked with the search API and reuses its main query parameter (&#8220;s&#8221;). Say one is searching for objects related to Berlin (a.k.a. the place with ID 61); the URL of the respective search page would be <code>https://global.museum-digital.org/objects?s=place%3A61</code>. The corresponding API for retrieving all of the objects&#8217; published metadata would then be <code>https://global.museum-digital.org/export/json/place:61?limit=24&amp;offset=0</code>. Like the search page itself and its primary API (<code>/json/objects</code>), the full batch export API is paginated, with currently a maximum of 100 objects being returned per page.</p>
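


<p>A minimal sketch of paginating through such a batch export in Python (assuming the endpoint returns a plain JSON list per page, and that a page shorter than <code>limit</code> marks the end of the results):</p>



<pre class="wp-block-code"><code>import json
import urllib.request

BASE = "https://global.museum-digital.org/export/json/place:61"
LIMIT = 100  # Current maximum page size of the batch export API.

def fetch_all():
    """Collect all objects of the search result, page by page."""
    objects = []
    offset = 0
    while True:
        url = f"{BASE}?limit={LIMIT}&amp;offset={offset}"
        with urllib.request.urlopen(url) as response:
            page = json.load(response)
        objects.extend(page)
        if len(page) &lt; LIMIT:  # A short page marks the end of the results.
            break
        offset += LIMIT
    return objects

print(len(fetch_all()))</code></pre>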



<figure class="wp-block-image size-large"><img fetchpriority="high" decoding="async" width="1024" height="576" src="https://blog.museum-digital.org/wp-content/uploads/2025/11/20251124_15h36m15s_frontend-batch-export-object-metadata-ui-1024x576.webp" alt="Link to batch exporting search results in the frontend" class="wp-image-4540" srcset="https://blog.museum-digital.org/wp-content/uploads/2025/11/20251124_15h36m15s_frontend-batch-export-object-metadata-ui-1024x576.webp 1024w, https://blog.museum-digital.org/wp-content/uploads/2025/11/20251124_15h36m15s_frontend-batch-export-object-metadata-ui-300x169.webp 300w, https://blog.museum-digital.org/wp-content/uploads/2025/11/20251124_15h36m15s_frontend-batch-export-object-metadata-ui.webp 1292w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption">Additional to URL manipulation, the batch export API is linked in the menu of object search results pages.</figcaption></figure>



<p>Currently, the batch export of full object metadata is available for JSON and LIDO (XML) representations of the object data. More can rather easily be added later, should a demand arise.</p>



<h2 class="wp-block-heading">OAI</h2>



<p>Implementing a performant way to export full object metadata in bulk was one of the two main missing components for the long-awaited implementation of an <a href="https://www.openarchives.org/pmh/">OAI-PMH</a> API.</p>



<p>OAI-PMH is a standard tailored towards data harvesting. Say an external service like the German Digital Library or WorldCat wants to do something with data from diverse sources, e.g. display the objects or implement a search across the different collections and libraries to find which one holds which object or book. To be able to do so, it needs to access the respective data in some way &#8211; ideally using a common standard that describes how to query the data, helps to identify any data sets that need to be updated, added, or deleted, and presents a uniform way to access the data periodically. That, exactly, is OAI-PMH.</p>



<p>In a nutshell: OAI-PMH allows other services to copy all (published) data from another service in a machine-readable way and can thus significantly improve reuse in aggregation. Of course, this only applies to technical questions; legally, potential re-users need to comply with the metadata license applied by the initial data provider regardless of the (technical) means of access.</p>



<p>Since last week, museum-digital provides an OAI-PMH API at <code>/oai</code> under each subdomain, e.g. <code>https://hessen.museum-digital.de/oai</code>. As of now, the OAI-PMH API provides access to the objects&#8217; metadata using LIDO (XML) and the mandatory OAI-DC format.</p>



<p>Note that some caveats remain for now: First, the LIDO representation of object metadata is not (and by definition cannot be) as complete and fine-grained as the JSON API. It is also not identical to the LIDO returned by exports from musdb (one is generated natively in PHP, the other using XSLT, leading to divergent development paths). Also, the LIDO output lists identifiers different from the ones otherwise used by the OAI-PMH API and the OAI-DC representations.</p>



<p>Finally, the OAI-PMH API at museum-digital does not implement OAI-PMH sets to group collections. Instead, it follows the existing search logic (essentially providing a new endpoint per query). Example queries via OAI-PMH might thus look as follows:</p>



<ul class="wp-block-list">
<li>All objects from the Agrargeschichte instance of museum-digital, represented in OAI-DC: <a href="https://agrargeschichte.museum-digital.de/oai?verb=ListRecords&amp;metadataPrefix=oai_dc">https://agrargeschichte.museum-digital.de/oai?verb=ListRecords&amp;metadataPrefix=oai_dc</a></li>



<li>All objects linked to Berlin (place #61), in the Berlin instance of museum-digital, represented in LIDO: <a href="https://berlin.museum-digital.de/oai/place:61?verb=ListRecords&amp;metadataPrefix=lido">https://berlin.museum-digital.de/oai/place:61?verb=ListRecords&amp;metadataPrefix=lido</a></li>



<li>All objects of the Freies Deutsches Hochstift, Frankfurt am Main, represented in LIDO: <a href="https://hessen.museum-digital.de/oai/institution:1?verb=ListRecords&amp;metadataPrefix=lido">https://hessen.museum-digital.de/oai/institution:1?verb=ListRecords&amp;metadataPrefix=lido</a></li>



<li>All objects from Baranya with only their identifiers: <a href="https://ba.hu.museum-digital.org/oai?verb=ListIdentifiers&amp;metadataPrefix=lido">https://ba.hu.museum-digital.org/oai?verb=ListIdentifiers&amp;metadataPrefix=lido</a></li>
</ul>
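


<p>Harvesting such an endpoint boils down to following the <code>resumptionToken</code> that OAI-PMH returns until none is left. A minimal sketch of that loop in Python, using the first example endpoint above (a sketch only; namespace handling is kept to the minimum needed to iterate over records):</p>



<pre class="wp-block-code"><code>import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ENDPOINT = "https://agrargeschichte.museum-digital.de/oai"
OAI = "{http://www.openarchives.org/OAI/2.0/}"

def harvest(metadata_prefix="oai_dc"):
    """Yield all records, following resumption tokens until none is left."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        url = ENDPOINT + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as response:
            tree = ET.parse(response)
        yield from tree.iter(OAI + "record")
        token = tree.find(".//" + OAI + "resumptionToken")
        if token is None or not (token.text or "").strip():
            break  # No further pages left to harvest.
        # Per the spec, follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

print(sum(1 for _ in harvest()))</code></pre>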



<h2 class="wp-block-heading">Credits</h2>



<p>Credit where credit is due: the Städel Museum deserves praise for their implementation of OAI-PMH. For me personally, seeing the <a href="https://sammlung.staedelmuseum.de/de/oai/guide">Städel&#8217;s API</a> was the first time I saw a visibly stateless OAI-PMH implementation, enabled by the ingenious idea of using machine-readable, JSON-encoded resumption tokens rather than UIDs that have to be resolved on the server side. I spent weeks (or months?) making museum-digital&#8217;s frontend stateless. By following the Städel&#8217;s example, it can remain so even while offering an OAI-PMH API.</p>
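


<p>To illustrate the idea with a hypothetical sketch (not museum-digital&#8217;s or the Städel&#8217;s actual token format): instead of storing a harvesting session on the server and handing out an opaque ID, the whole cursor state can be encoded into the token itself, so the server can resume the harvest without keeping any state:</p>



<pre class="wp-block-code"><code>import base64
import json

def encode_token(query, offset, metadata_prefix):
    """Pack the complete cursor state into the resumption token itself."""
    state = {"q": query, "offset": offset, "prefix": metadata_prefix}
    return base64.urlsafe_b64encode(json.dumps(state).encode()).decode()

def decode_token(token):
    """Restore the cursor state from the token; no server-side lookup needed."""
    return json.loads(base64.urlsafe_b64decode(token.encode()).decode())

token = encode_token("place:61", 100, "lido")
print(decode_token(token))  # {'q': 'place:61', 'offset': 100, 'prefix': 'lido'}</code></pre>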



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<div class="wp-block-cgb-cc-by message-body" style="background-color:white;color:black"><img decoding="async" src="https://blog.museum-digital.org/wp-content/plugins/creative-commons/includes/images/by.png" alt="CC" width="88" height="31"/><p><span class="cc-cgb-name">This content</span> is licensed under a <a href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International license.</a> <span class="cc-cgb-text"></span></p></div>
]]></content:encoded>
					
					<wfw:commentRss>https://blog.museum-digital.org/2025/11/24/making-interoperability-easy/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-thumbnail><url>https://blog.museum-digital.org/wp-content/uploads/2025/11/20251124-absurdresmasterpiececircuitlight-poster-1024x1024.webp</url><width>600</width><height>600</height></post-thumbnail>	</item>
		<item>
		<title>Sort by Beauty</title>
		<link>https://blog.museum-digital.org/2025/03/06/sort-by-beauty/</link>
		
		<dc:creator><![CDATA[Joshua Ramon Enslin]]></dc:creator>
		<pubDate>Wed, 05 Mar 2025 23:25:18 +0000</pubDate>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Frontend]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[New Features]]></category>
		<category><![CDATA[Object search (frontend)]]></category>
		<category><![CDATA[Search]]></category>
		<guid isPermaLink="false">https://blog.museum-digital.org/?p=4333</guid>

					<description><![CDATA[Last month a new sort option appeared on museum-digital: "Aesthetics prediction". Thoughts on AI, beauty, and the discriminating nature of sorting.]]></description>
										<content:encoded><![CDATA[
<p><a href="https://blog.museum-digital.org/2025/02/14/state-of-dev-december-2024-january-2025/">Last month</a> a new sort option appeared on museum-digital: &#8220;Aesthetics prediction&#8221;. Based on the <a href="https://github.com/LAION-AI/aesthetic-predictor">LAION aesthetics predictor</a>, each published object&#8217;s main thumbnail aesthetics are scored. The objects can then be sorted according to this score.</p>



<h2 class="wp-block-heading">AI, Aesthetics and Discrimination</h2>



<p>When working with AI, it is common to criticize the inherent biases of the models used. Already underrepresented entities (people, viewpoints, etc.) are excluded because they are not sufficiently represented in the data the model was trained on. In turn, AI repeats what it learned &#8211; pre-existing biases in society are replayed and reinforced by AI.</p>



<p>A second, well-founded criticism against AI applications is their impreciseness, their tendency to &#8220;hallucinate&#8221; entirely wrong results, and the lack of reproducibility of results.</p>



<p>Both criticisms are entirely valid. And both hint at why AI might be a useful tool exactly for generating a sort order by aesthetics. Like AI, aesthetics are imprecise: ask 10 people to rank some pictures by their aesthetic appeal and you will get 10 different answers. But the general thrust of the answers will likely be more or less the same. There are societal rules &#8211; biases &#8211; for which pictures are more or less beautiful. But they are fuzzy and interpreted differently by each and every observer. A sense of aesthetics is about learning and reproducing those biases &#8211; learned from the inputs we encounter in everyday life &#8211; just as AI will be biased based on the data it is trained on.</p>



<p>Sorting, then, is in essence an exercise in discrimination. Sorting entries by ID / age in a database system will favor the newest entries and discriminate against older ones. Sorting entries alphabetically in ascending order will favor entries whose title starts with &#8220;A&#8221; while discriminating against those whose title starts with &#8220;Z&#8221;. Sorting based on an AI-generated rank is discriminating. Sorting by <em>beauty</em> or <em>aesthetics</em> is as well. AI allows sorting by beauty.</p>



<h3 class="wp-block-heading">What is discriminated against?</h3>



<p>Now, if everything in sorting is discriminating, it is all the more important to consider what is actually evaluated. In other words: on what basis does the discrimination take place when ranking digitized museum objects by the beauty of their digital reproductions? As stated above, the objects are scored based on the aesthetics score established for their main thumbnails.</p>



<p>Images have a range of aspects that influence their aesthetics: image composition, lighting, contrast, the motif, and many more (an art historian could likely list hundreds). For the common critique, it is essentially only the motif that matters: favoring images of <em>white</em> women over images of Asian men reproduces a range of societal problems that should not be reproduced. Transferred to object photography, the motif may be further differentiated into the object type and the actual motif (e.g. of a painting). And such a discrimination is noticeable in the results: paintings seem to be generally ranked slightly better than pictures of tools. A bias based on the displayed subjects of e.g. different paintings is not something anybody from the team has noticed, but it is to be assumed that one exists.</p>



<p>On the other hand, the influence of motif-focused biases is many times weaker than the actually useful discrimination based on the technical aspects of the images. Object images taken by a professional photographer with up-to-date equipment are ranked much better than images taken using a digital camera from the 1990s. Images without a timestamp or a watermark are ranked better than ones that feature one. Images taken with proper lighting and contrast settings are ranked much better than images taken in a dark room, presenting the objects in gray on black. Similarly, one of the most important facets contributing to the aesthetics score seems to be the composition: if an object is centered in the photo, it will score way above images featuring multiple objects at different corners of the image (a classic example would be images that show both the actual object and a photo bar).</p>



<p>To reiterate, these technical aspects are visibly the main contributors to the score. And discriminating based on them is actually useful: it allows museum-digital to present new users who are just browsing the published collections with objects recorded at a more consistently high visual quality.</p>



<h3 class="wp-block-heading">What are the alternatives, <em>or</em> who is the audience?</h3>



<p>museum-digital suffers from the old problem of lacking a specified target audience. A common user may be a hobbyist looking for other versions of the model train they just bought. They might be a person interested in what art of the 16th century looked like. Or in which museum to visit next. Or they might be a specialist, with a much better idea of what they are actually looking for. It is common for users to leave the page after less than a minute. But there is also an astonishing number of users who stay for hours. Accordingly, it is all the more important to offer (sort) options catering to different needs.</p>



<p>A new user who chanced upon the platform and randomly browses it with no further background on museums will benefit from the new sorting method. Sort orders like sorting by ID or title are linear and follow a consistent logic &#8211; but that logic may in essence just reflect its own type of randomness. Sorting newer objects over older ones is essentially a random sort order if the museum &#8211; as is usual &#8211; records objects as they are needed for an exhibition, as they newly enter the museum, or simply as they come up next on the shelf &#8211; all of which follow a very particular, not publicly comprehensible logic.</p>



<p>Similarly, sorting objects by their names or titles is linear. In contrast to the object entry&#8217;s age in the database, it is also immediately comprehensible to users. But museums are free to determine the object name as they see fit &#8211; and often this is necessary. As per the best practices around publication on museum-digital, an object name should be descriptive and usable to distinguish between different objects. As most objects simply are nameless by themselves, this often means that a colleague at the responsible museum <em>invented</em> the name. Seeing a green vase, there is a rather short list of names that come to mind for most people &#8211; &#8220;green vase&#8221;, &#8220;vase, green&#8221;, &#8220;vase&#8221;, &#8220;green-ish vase&#8221;, &#8220;vase in pale green&#8221; &#8211; making the selection of the actual name comprehensible. But &#8220;green vase&#8221; ends up in a very different position from &#8220;vase, green&#8221; when sorting objects alphabetically. One might thus involuntarily be sorting the objects by curator.</p>



<p>Other than an object&#8217;s title/name and institution, the one other facet users see when listing objects on an overview or search page is the object&#8217;s thumbnail. And as a sense of aesthetics &#8211; fuzzy as it certainly is &#8211; is roughly shared among most people (with globalization, even <em>most</em> people worldwide), sorting objects of diverse origins by the aesthetics of their thumbnails suddenly starts to look like a comparatively relatable and reasonable sort order. Nobody will agree 100%, but the rough order instinctively makes sense. That is more than can be said about the alternatives, unless one actually looks at the sort settings.</p>



<p>Presenting new users with a more consistently appealing and streamlined set of search results (at first glance at least), on the other hand, might encourage them to stay longer. And if they stay longer, they might eventually end up chancing upon more objects &#8211; including ones with less well-taken pictures.</p>



<p>For those users who are researchers and/or specialists, who need a linear, logical sorting, the aesthetics prediction sort option is obviously of much less merit. But it is safe to assume that this group of users is overrepresented in the above-mentioned group of people who browse the site for hours. And it is also rather safe to assume that they are generally more used to online databases and the existence of different sort options. Consequently, it can be assumed that they are able to change the sort settings to their needs &#8211; or at least have a higher likelihood of being able to do so.</p>



<p>As aesthetics-based sorting is very useful for the general public and less so for specialists, who can be assumed to be more skilled anyway, it is sensible to make it the default. There are thus three different default sort orders, depending on which objects one lists:</p>



<ul class="wp-block-list">
<li>If a user lists all objects of a given instance, the default sort order remains sorting by the age of the entries</li>



<li>If a fulltext search was performed, the results are sorted by how well the query string matched the entry by default</li>



<li>Otherwise &#8211; in case of ID-based search queries like a query for all objects created in Berlin &#8211; the search results are sorted using the new aesthetics prediction by default</li>
</ul>
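


<p>As a purely illustrative sketch of this decision logic (the function and value names are hypothetical, not museum-digital&#8217;s actual code):</p>



<pre class="wp-block-code"><code>def default_sort_order(lists_all_objects, is_fulltext_search):
    """Pick the default sort order for an object listing (illustrative only)."""
    if lists_all_objects:
        return "entry_age"         # Listing everything: newest entries first.
    if is_fulltext_search:
        return "match_quality"     # Fulltext search: best match first.
    return "aesthetics_score"      # ID-based queries: aesthetics prediction.

print(default_sort_order(False, False))  # aesthetics_score</code></pre>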



<h2 class="wp-block-heading">Operationalizing the Prediction</h2>



<p>On a side note, operationalizing the prediction proved a challenge in itself. All of museum-digital&#8217;s servers are built for traditional web hosting. Hence, they feature quite powerful CPUs, lots of RAM, and no GPU. In other words: they could hardly be less suitable for AI applications. And they are all the more unsuitable for scoring over a million thumbnails in bulk while remaining otherwise performant.</p>



<p>If the existing servers cannot be used, there are three reasonable alternatives. First, we could have simply rented another server. As a mostly volunteer-run project, this was not an option. Second, we could have used browser-based AI to calculate the score on users&#8217; machines &#8211; e.g. letting an uploading user&#8217;s machine calculate the score of a thumbnail whenever the user uploads an image. But users&#8217; PCs vary widely, and laptops and tablets (usually again without dedicated GPUs) have become more and more popular compared to workstations; such an approach would thus have made uploading images unbearably slow. It would also not have helped with calculating scores for the million-plus pre-existing objects and their main thumbnails. Again, this was not really an option.</p>



<p>Finally, we could use our private machines (specifically: mine). And that is the approach we chose. Based on a new search API parameter <code>aesthetics_score</code> for querying objects whose thumbnails have not been scored yet (<code>aesthetics_score:10001</code>), my PC downloads the as yet unevaluated object thumbnails and calculates a score for each. These scores are then stored locally in one SQLite database per instance of museum-digital and exported into a CSV file. The CSV file is then uploaded to the server and ingested back into the database to set the aesthetics score for the relevant images. The search index is automatically updated with the new scores following the upload.</p>
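


<p>A condensed, hypothetical sketch of the local part of this pipeline (the helper names are illustrative, and the actual LAION predictor is replaced by a stand-in):</p>



<pre class="wp-block-code"><code>import csv
import sqlite3

def predict_aesthetics(image_path):
    """Stand-in for the LAION aesthetics predictor, which returns 0..10."""
    return 5.0  # Dummy value; the real pipeline runs the model here.

def score_thumbnails(thumbnails, db_path, csv_path):
    """Score thumbnails, keep the scores in SQLite, and export them as CSV."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS scores "
        "(object_id INTEGER PRIMARY KEY, score INTEGER)"
    )
    for object_id, image_path in thumbnails:
        # Integer-encode the 0..10 float score as 0..10000 (10001 = unscored).
        score = round(predict_aesthetics(image_path) * 1000)
        con.execute("INSERT OR REPLACE INTO scores VALUES (?, ?)", (object_id, score))
    con.commit()
    with open(csv_path, "w", newline="") as fh:
        csv.writer(fh).writerows(con.execute("SELECT object_id, score FROM scores"))</code></pre>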



<p>This structure unfortunately also means that the scoring does not occur in real time. Depending on when I turn my PC on or off, the most recently published objects may remain unscored for a while. To be able to fairly represent such objects, they are assigned an impossibly high score that serves both to sort them above the already-scored objects and to mark them as not yet scored. Specifically, the aesthetics predictor returns a score between 0 and 10. As calculating with integers is generally much simpler than working with floating point numbers, the score is multiplied by a thousand and then rounded, leaving one with a score between 0 and 10000. An object not yet scored will hence be assigned a default score of 10001.</p>



<div class="wp-block-cgb-cc-by message-body" style="background-color:white;color:black"><img loading="lazy" decoding="async" src="https://blog.museum-digital.org/wp-content/plugins/creative-commons/includes/images/by.png" alt="CC" width="88" height="31"/><p><span class="cc-cgb-name">This content</span> is licensed under a <a href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International license.</a> <span class="cc-cgb-text"></span></p></div>
]]></content:encoded>
					
		
		
		<post-thumbnail><url>https://blog.museum-digital.org/wp-content/uploads/2025/03/landscape-ai.avif</url><width>600</width><height>336</height></post-thumbnail>	</item>
	</channel>
</rss>
