A blog on museum-digital and the broader digitization of museum work.
AI-generated image of a landscape at sunset

Last month a new sort option appeared on museum-digital: “Aesthetics prediction”. Based on the LAION aesthetics predictor, each published object’s main thumbnail is assigned an aesthetics score. The objects can then be sorted according to this score.

AI, Aesthetics and Discrimination

When working with AI it is common to criticize the inherent biases of the models used. Already underrepresented entities (people, viewpoints, etc.) are excluded, because they are not sufficiently represented in the data the model was trained on. In turn, AI repeats what it learned – pre-existing biases in society are replayed and reinforced by AI.

A second, well-founded criticism against AI applications is their impreciseness, their tendency to “hallucinate” entirely wrong results, and the lack of reproducibility of results.

Both criticisms are entirely valid. And both hint at why AI might be a useful tool exactly for generating a sort order by aesthetics. Like AI, aesthetics are imprecise: Ask 10 people to rank some pictures by their aesthetic appeal and you will get 10 different answers. But the general thrust of the answers will likely be more or less the same. There are societal rules – biases – for what pictures are more or less beautiful. But they are fuzzy and interpreted differently by each and every observer. A sense of aesthetics is about learning and reproducing those biases – learned from the inputs we encounter in everyday life – just as AI is biased by the data it was trained on.

Sorting then is, in essence, an exercise in discrimination. Sorting entries by ID or age in a database system will favor the newest entries and discriminate against older ones. Sorting entries alphabetically in ascending order will favor entries whose title starts with “A” while discriminating against those whose title starts with “Z”. Sorting based on an AI-generated rank is discriminating. Sorting by beauty or aesthetics is as well. AI allows sorting by beauty.

What is discriminated against?

Now, if everything in sorting is discriminating, it is all the more important to consider what is actually evaluated. In other words: on what basis does the discrimination take place when ranking digitized museum objects by the beauty of their digital reproductions? As stated above, the objects are scored based on the aesthetics score established for their main thumbnails.

Images have a range of aspects that influence their aesthetics: image composition, lighting, contrast, the motif, and many more (an art historian could likely list hundreds). For the common critique, it is essentially only the motif that matters: Favoring images of white women over images of Asian men reproduces a range of societal problems that should not be reproduced. Transferred to object photography, the motif may be further differentiated into the object type and the actual motif (e.g. of a painting). And such a discrimination is noticeable in the results: Paintings seem to be generally ranked slightly better than pictures of tools. A bias based on the depicted subjects of, say, different paintings is not something anybody from the team has noticed, but it is safe to assume that one exists.

On the other hand, the influence of motif-focused biases is many times weaker than the actually useful discrimination based on the technical aspects of the images. Object images taken by a professional photographer with up-to-date equipment are ranked much better than images taken with a digital camera from the 1990s. Images without a timestamp or a watermark are ranked better than ones that feature one. Images taken with proper lighting and contrast settings are ranked much better than images taken in a dark room, presenting the objects in gray on black. Similarly, one of the most important facets contributing to the aesthetics score the predictor returns seems to be the composition: If an object is centered in the photo, it will score way above images featuring multiple objects at different corners of the image (a classic example would be images that show both the actual object and a photo bar).

To reiterate, these technical aspects are visibly the main contributors to the score. And discriminating based on these aspects is actually useful: It allows museum-digital to present new users who are just browsing the published collections with objects recorded at a more consistently high visual quality.

What are the alternatives, or who is the audience?

museum-digital suffers from the old problem of lacking a specified target audience. A common user may be a hobbyist looking for other versions of the model train they just bought. They might be a person interested in what art of the 16th century looked like. Or in which museum to visit next. Or they might be a specialist, with a much better idea of what they are actually looking for. It is common for users to leave the page after less than a minute. But there is also an astonishing number of users who stay for hours. Accordingly, it is all the more important to offer (sort) options catering to different needs.

A new user who chanced upon the platform and randomly browses it with no further background on museums will benefit from the new sorting method. Sort orders like sorting by ID or title are linear and follow a consistent logic – but that logic may in essence just reflect its own type of randomness. Sorting newer objects over older ones is essentially a random sort order if the museum – as is usual – records objects as they are needed for an exhibition, as they newly enter the museum, or simply as they are the next object on the shelf – all of which follow a very particular logic that is not publicly comprehensible.

Similarly, sorting objects by their names or titles is linear. In contrast to the object entry’s age in the database, it is also immediately comprehensible to users. But museums are free to determine the object name freely – and often this is necessary. As per the best practices around publication on museum-digital, an object name should be descriptive and usable to distinguish between different objects. As most objects simply are nameless by themselves, this often means that a colleague at the responsible museum invented a name. Seeing a green vase, there is a rather short list of names that come to mind for most people – “green vase”, “vase, green”, “vase”, “green-ish vase”, “vase in pale green” – making the selection of the actual name comprehensible. But “green vase” ends up in a very different position from “vase, green” when sorting objects alphabetically. One might actually, involuntarily, be sorting the objects by curator.
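
A toy example makes the problem visible – the titles below are hypothetical, not entries from the actual database:

```python
# Alphabetical sorting scatters equivalent names: the same kind of object
# ends up at opposite ends of the list depending on how a curator
# happened to phrase its name.
titles = ["green vase", "vase, green", "amphora", "zither", "vase"]
print(sorted(titles))
# → ['amphora', 'green vase', 'vase', 'vase, green', 'zither']
```

“green vase” lands near the top while “vase, green” lands near the bottom, even though both describe the same object type.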

Other than an object’s title/name and institution, the one other facet users see when listing objects on an overview or search page is the object’s thumbnail. And as a sense of aesthetics – fuzzy as it certainly is – is roughly shared among most people (with globalization, arguably most people worldwide), sorting objects of diverse origins by the aesthetics of their thumbnails suddenly starts to look like a comparatively relatable and reasonable sort order. Nobody will agree 100%, but the rough order instinctively makes sense. That is more than can be said about the alternatives, unless one actually looks at the sort settings.

Presenting new users with a more consistently appealing and streamlined set of search results (at first glance, at least) might, on the other hand, encourage them to stay longer. And if they stay longer, they might eventually chance upon more objects – including ones with less well-taken pictures.

For those users who are researchers and/or specialists, who need a linear, logical sorting, the aesthetics prediction sort option is obviously of much less merit. But it is safe to assume that this group of users is overrepresented among the above-mentioned people who browse the site for hours. And it is also rather safe to assume that they are generally more used to online databases and the existence of different sort options. Consequently, it can be assumed that they are able to change the sort settings to their needs – or at least have a higher likelihood of being able to do so.

Since aesthetics-based sorting is very useful for the general public, and less so for specialists who can be assumed to be more skilled at handling sort settings anyway, it is sensible to make it the default. There are thus three different default sort orders, depending on which objects one lists:

  • If a user lists all objects of a given instance, the default sort order remains sorting by the age of the entries
  • If a fulltext search was performed, the results are sorted by how well the query string matched the entry by default
  • Otherwise – in case of ID-based search queries like a query for all objects created in Berlin – the search results are sorted using the new aesthetics prediction by default
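
The three cases could be sketched as follows – the function and the returned option names are illustrative, not museum-digital’s actual code:

```python
def default_sort_order(lists_all_objects: bool, is_fulltext_search: bool) -> str:
    """Pick a default sort order for an object listing.

    The returned identifiers are hypothetical labels for the three
    cases described above, not museum-digital's real option names.
    """
    if lists_all_objects:
        return "entry_age"    # browsing all objects of an instance
    if is_fulltext_search:
        return "relevance"    # rank by how well the query matched
    return "aesthetics"       # e.g. ID-based queries ("objects created in Berlin")
```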

Operationalizing the Prediction

On a side note, operationalizing the prediction proved a challenge in itself. All of museum-digital’s servers are built for traditional web hosting. Hence, they feature quite powerful CPUs, lots of RAM, and no GPU. In other words: they are the least suitable machines for AI applications. And they are all the more unsuitable for scoring over a million thumbnails in bulk while remaining otherwise performant.

If the existing servers cannot be used, there are three reasonable alternatives. First, we could have simply rented another server. As a mostly volunteer-run project, this was not an option. Second, we could have used browser-based AI to calculate the score on users’ machines – e.g., we might have let an uploading user’s machine calculate the score of a thumbnail whenever the user uploads an image. But users’ PCs vary widely, and laptops and tablets (again, usually without GPUs) have become more and more popular compared to workstations; such an approach would thus have made uploading images unbearably slow. It would also not have helped with calculating scores for the more than a million pre-existing objects and their main thumbnails. Again, this was not really an option.

Finally, we could use our private machines (specifically: mine). And that is the approach we chose. Based on a new search API parameter aesthetics_score for querying objects whose thumbnails have not been scored yet (aesthetics_score:10001), my PC downloads the yet unevaluated object thumbnails and computes a score for each. These scores are stored locally in one SQLite database per instance of museum-digital and exported into a CSV file. The CSV file is then uploaded to the server and ingested back into the database to set the aesthetics score for the relevant images. The search index is automatically updated with the new scores following the upload.
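
A minimal sketch of the local half of this pipeline – storing scores in a per-instance SQLite database and exporting them to CSV for upload – might look like this. The table, column, and file names are assumptions for illustration, not museum-digital’s actual schema:

```python
import csv
import sqlite3

def store_score(db_path: str, object_id: int, score: int) -> None:
    """Persist one object's aesthetics score in the local SQLite database."""
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS aesthetics_scores "
            "(object_id INTEGER PRIMARY KEY, score INTEGER)"
        )
        conn.execute(
            "INSERT OR REPLACE INTO aesthetics_scores VALUES (?, ?)",
            (object_id, score),
        )

def export_csv(db_path: str, csv_path: str) -> None:
    """Dump all stored scores to a CSV file for ingestion on the server."""
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(
            "SELECT object_id, score FROM aesthetics_scores ORDER BY object_id"
        ).fetchall()
    with open(csv_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["object_id", "score"])
        writer.writerows(rows)
```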

This structure unfortunately also means that the scoring does not occur in real time. Depending on when I turn on or shut down my PC, the most recently published objects may remain unscored for a while. To be able to fairly represent such objects, they are assigned an impossibly high score that serves both to sort them above the already-scored objects and to mark them as not yet scored. Specifically, the aesthetics predictor returns a score between 0 and 10. As calculating with integers is generally much simpler than working with floating point numbers, the score is multiplied by a thousand and then rounded, leaving one with a score between 0 and 10000. An object not yet scored will hence be assigned a default score of 10001.
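
The score handling boils down to a small conversion step – the function name and constant below are illustrative, not the actual implementation:

```python
UNSCORED = 10001  # impossible as a real score; sorts above all scored objects

def to_stored_score(raw_score=None):
    """Convert the predictor's 0-10 float score to the stored integer.

    Returns the raw score multiplied by 1000 and rounded (i.e. 0-10000),
    or the UNSCORED marker when no score has been computed yet.
    """
    if raw_score is None:
        return UNSCORED
    return round(raw_score * 1000)
```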