<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>OAI-PMH | museum-digital: blog</title>
	<atom:link href="https://blog.museum-digital.org/tag/oai-pmh/feed/" rel="self" type="application/rss+xml" />
	<link>https://blog.museum-digital.org</link>
	<description>A blog on museum-digital and the broader digitization of museum work.</description>
	<lastBuildDate>Mon, 12 Jan 2026 16:43:04 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	

<image>
	<url>https://blog.museum-digital.org/wp-content/uploads/2020/01/cropped-mdlogo-code-512px-32x32.png</url>
	<title>OAI-PMH | museum-digital: blog</title>
	<link>https://blog.museum-digital.org</link>
	<width>32</width>
	<height>32</height>
</image> 
<atom:link rel="search" type="application/opensearchdescription+xml" title="Search museum-digital: blog" href="https://blog.museum-digital.org/wp-json/opensearch/1.1/document" />	<item>
		<title>State of Development, November 2025</title>
		<link>https://blog.museum-digital.org/2025/12/03/state-of-development-november-2025/</link>
					<comments>https://blog.museum-digital.org/2025/12/03/state-of-development-november-2025/#respond</comments>
		
		<dc:creator><![CDATA[Joshua Ramon Enslin]]></dc:creator>
		<pubDate>Wed, 03 Dec 2025 01:53:58 +0000</pubDate>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Frontend]]></category>
		<category><![CDATA[Importer]]></category>
		<category><![CDATA[musdb]]></category>
		<category><![CDATA[API]]></category>
		<category><![CDATA[New Features]]></category>
		<category><![CDATA[OAI-PMH]]></category>
		<guid isPermaLink="false">https://blog.museum-digital.org/?p=4575</guid>

					<description><![CDATA[Frontend musdb Importer Core Parser]]></description>
										<content:encoded><![CDATA[
<h2 class="wp-block-heading"><a href="https://en.about.museum-digital.org/software/frontend/">Frontend</a></h2>



<ul class="wp-block-list">
<li>On source / reference pages, linked objects are now sorted by the position in the source work at which they are referenced or which they themselves reference</li>



<li>The target URL of the regular / unspecified search bar for objects now follows the new, prettier URL schema</li>



<li>Support for an <a href="https://www.openarchives.org/pmh/">OAI-PMH</a> API for object metadata
<ul class="wp-block-list">
<li>Standardized endpoint for aggregators seeking to retrieve data in batch</li>



<li>Metadata formats thus far supported:
<ul class="wp-block-list">
<li>LIDO</li>



<li>OAI-DC (mandatory)</li>
</ul>
</li>



<li>See also: <a href="https://blog.museum-digital.org/2025/11/24/making-interoperability-easy/">Blog</a></li>
</ul>
</li>



<li>PDFs are only generated for users with a browser set to a non-default language if load on the server is low
<ul class="wp-block-list">
<li>The resource use caused by AI bots scraping museum-digital has been growing and growing. Generally, we see bots as included in our mission to enable access to cultural heritage. On the other hand, nobody is served if the service is bogged down by bots. One functionality that is both commonly used by bots and resource-intensive is the generation of PDFs for object pages. The same information can be loaded from the object page itself and printed to a PDF using the browser&#8217;s print option. There are thus rather few downsides to limiting PDF generation to times when server load is low. So that&#8217;s what we did.</li>
</ul>
</li>



<li>Collection-specific ISIL identifiers are now also used in the LIDO API</li>



<li>Alternative numbers of an object can now be displayed on object pages
<ul class="wp-block-list">
<li>This includes tooltips for the types of alternative numbers, which the museum can set on the institution-wide settings pages of musdb</li>
</ul>
</li>
</ul>



<h2 class="wp-block-heading"><a href="https://en.about.museum-digital.org/software/musdb/">musdb</a></h2>



<ul class="wp-block-list">
<li>Search for objects
<ul class="wp-block-list">
<li>Type-ahead search for languages (of the object&#8217;s content)</li>



<li>Search by object&#8217;s revision status (open, read-only, archived, etc.)</li>
</ul>
</li>



<li>Batch editing of objects&#8217; revision status</li>



<li>Parameters of the full text search index were updated to improve the search of word compounds in German</li>
</ul>



<h3 class="wp-block-heading">Importer</h3>



<h4 class="wp-block-heading">Core</h4>



<ul class="wp-block-list">
<li>The dry-run mode no longer aborts an import when an unmapped value is encountered. Unmapped entries are collected and displayed together afterwards.
<ul class="wp-block-list">
<li>This means that unmapped entries can now be copied to and mapped in <a href="https://concordance.museum-digital.org/">concordance.museum-digital.org</a> much more easily</li>
</ul>
</li>



<li>Support for the import of alternative numbers (of objects)</li>



<li>Support for the import of space hierarchies</li>
</ul>



<h4 class="wp-block-heading">Parser</h4>



<ul class="wp-block-list">
<li><code>AdlibXml</code>
<ul class="wp-block-list">
<li>Support for importing objects&#8217; alternative numbers</li>
</ul>
</li>



<li><code>CsvXml</code>
<ul class="wp-block-list">
<li>Support for importing objects&#8217; alternative numbers</li>
</ul>
</li>



<li><code>CsvLocations</code>
<ul class="wp-block-list">
<li>New parser for csv-based imports of space hierarchies</li>
</ul>
</li>



<li><code>ImageByInvno</code>
<ul class="wp-block-list">
<li>New setting: append_chars (adds suffixes that exist in the inventory number but not in the file names)</li>
</ul>
</li>
</ul>



<div class="wp-block-cgb-cc-by message-body" style="background-color:white;color:black"><img decoding="async" src="https://blog.museum-digital.org/wp-content/plugins/creative-commons/includes/images/by.png" alt="CC" width="88" height="31"/><p><span class="cc-cgb-name">This content</span> is licensed under a <a href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International license.</a> <span class="cc-cgb-text"></span></p></div>
]]></content:encoded>
					
					<wfw:commentRss>https://blog.museum-digital.org/2025/12/03/state-of-development-november-2025/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-thumbnail><url>https://blog.museum-digital.org/wp-content/uploads/2025/12/winter.webp</url><width>600</width><height>467</height></post-thumbnail>	</item>
		<item>
		<title>Making Interoperability Easy</title>
		<link>https://blog.museum-digital.org/2025/11/24/making-interoperability-easy/</link>
					<comments>https://blog.museum-digital.org/2025/11/24/making-interoperability-easy/#respond</comments>
		
		<dc:creator><![CDATA[Joshua Ramon Enslin]]></dc:creator>
		<pubDate>Mon, 24 Nov 2025 15:56:37 +0000</pubDate>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Frontend]]></category>
		<category><![CDATA[Infrastructure]]></category>
		<category><![CDATA[API]]></category>
		<category><![CDATA[Interoperability]]></category>
		<category><![CDATA[New Features]]></category>
		<category><![CDATA[OAI-PMH]]></category>
		<category><![CDATA[Object search (frontend)]]></category>
		<guid isPermaLink="false">https://blog.museum-digital.org/?p=4538</guid>

					<description><![CDATA[Interoperability has been one of the focal issues around museum-digital practically since its inception. Offering different, simple ways to bring data into the system was a necessary requirement to even think of what we do. And offering simple ways to get the data out of the system again is just good practice &#8211; though all <a href="https://blog.museum-digital.org/2025/11/24/making-interoperability-easy/" class="more-link">...</a>]]></description>
										<content:encoded><![CDATA[
<p>Interoperability has been one of the focal issues around museum-digital practically since its inception. Offering different, simple ways to bring data into the system was a necessary requirement to even think of what we do. And offering simple ways to get the data out of the system again is just good practice &#8211; though all too often neglected.</p>



<p>To that end, there have traditionally been two primary ways for data retrieval. In musdb, one could run batch exports and receive a ZIP archive of XML files, one per object, covering the results of any given object search.</p>



<p>On the other hand, there is the public API. Using URL manipulation, one can access the (primary) contents of each page in a machine-readable way. To access the JSON representation of an object&#8217;s published metadata, where the object&#8217;s ID is 7141 in the Hesse instance of museum-digital (URL: <a href="https://hessen.museum-digital.de/object/7141"><code>https://hessen.museum-digital.de/object/7141</code></a>), one simply has to insert <code>json</code> into the path: <code><a href="https://hessen.museum-digital.de/json/object/7141">https://hessen.museum-digital.de/json/object/7141</a></code>.</p>
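<p>The URL manipulation described above can be sketched in a few lines of Python. This is only an illustration: the function name <code>to_json_url</code> is made up for this example, and it assumes that the JSON representation is always reached by prefixing the page path with <code>/json</code>, as in the object URL shown above.</p>

```python
from urllib.parse import urlsplit, urlunsplit


def to_json_url(page_url: str) -> str:
    """Insert "json" as the first path segment of a page URL (hypothetical helper)."""
    parts = urlsplit(page_url)
    # "/object/7141" becomes "/json/object/7141"
    json_path = "/json" + parts.path
    return urlunsplit((parts.scheme, parts.netloc, json_path, parts.query, parts.fragment))


print(to_json_url("https://hessen.museum-digital.de/object/7141"))
```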



<p>Next to the default JSON output, additional APIs are offered wherever suitable for a given data type. For objects, the primary additional output method is a <a href="http://lido-schema.org/">LIDO</a> API.</p>



<p>Thus far, the main limitation of the public API was that it only allowed one object (or institution, collection, etc.) to be queried at a time.</p>



<h2 class="wp-block-heading">Querying Object Metadata in Batches</h2>



<p>After a significant refactoring of the code to load object data for object pages &#8211; primarily to improve caching and allow for parallelized requests to the database &#8211; we are now finally able to offer APIs for querying object metadata in batch. Thanks to grouped database queries, performance and resource usage scale nicely. Taking simply the currently most recent objects in the Germany-wide instance of museum-digital: Loading all object data of one object and presenting it in JSON takes 0.0087 seconds, while loading and generating the JSON for the 100 most recent objects takes 0.197 seconds (or 0.00197 seconds per object). Note that not all queries for all aspects of an object&#8217;s metadata are grouped yet, so performance may get even better over time. This also does not yet account for the overhead of the many HTTP requests one would previously need to execute to get each object&#8217;s metadata one by one &#8211; real performance improvements are thus even greater.</p>
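<p>To put the two measurements into relation, the arithmetic can be written out as follows. The figures are the ones quoted above; the variable names are chosen for this example only, and the comparison ignores HTTP overhead, just like the measurements themselves.</p>

```python
single_request_s = 0.0087  # one object, loaded and serialized on its own
batch_request_s = 0.197    # the 100 most recent objects in one request
batch_size = 100

# Cost per object when loaded as part of a batch
per_object_s = batch_request_s / batch_size
# Cost of loading the same 100 objects one by one
sequential_s = single_request_s * batch_size
# How much faster the batch is than 100 single loads
speedup = sequential_s / batch_request_s

print(f"{per_object_s:.5f} s per object in a batch")
print(f"{sequential_s:.2f} s for {batch_size} single loads")
print(f"roughly {speedup:.1f} times faster, before HTTP overhead")
```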



<p>Now, how to access object metadata in batches?</p>



<p>The batch access is linked with the search API and reuses its main query parameter (&#8220;s&#8221;). Say one is searching for objects related to Berlin (the place with the ID 61): the URL of the respective search page would be <code>https://global.museum-digital.org/objects?s=place%3A61</code>. The corresponding API for retrieving all of the objects&#8217; published metadata would then be <code>https://global.museum-digital.org/export/json/place:61?limit=24&amp;offset=0</code>. Like the search page itself and its primary API (<code>/json/objects</code>), the full batch export API is paginated, with a maximum of currently 100 objects being returned per page.</p>
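<p>A minimal sketch of how a client might build these paginated URLs, assuming the URL pattern shown above. The helper names <code>batch_export_url</code> and <code>page_urls</code> are hypothetical, and so is the <code>total</code> argument: the number of matching objects has to come from somewhere else, for example the search results.</p>

```python
from typing import Iterator
from urllib.parse import urlencode

MAX_PAGE_SIZE = 100  # current per-page maximum stated in the post


def batch_export_url(base: str, query: str, limit: int = MAX_PAGE_SIZE, offset: int = 0) -> str:
    """Build one page URL of the JSON batch export (hypothetical helper)."""
    limit = min(limit, MAX_PAGE_SIZE)  # never ask for more than the maximum
    params = urlencode({"limit": limit, "offset": offset})
    return f"{base}/export/json/{query}?{params}"


def page_urls(base: str, query: str, total: int, page_size: int = MAX_PAGE_SIZE) -> Iterator[str]:
    """Yield one URL per page until `total` objects are covered."""
    for offset in range(0, total, page_size):
        yield batch_export_url(base, query, limit=page_size, offset=offset)


for url in page_urls("https://global.museum-digital.org", "place:61", total=250):
    print(url)
```

<p>Building the query string with <code>urlencode</code> rather than by hand keeps the example free of escaping mistakes.</p>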



<figure class="wp-block-image size-large"><img fetchpriority="high" decoding="async" width="1024" height="576" src="https://blog.museum-digital.org/wp-content/uploads/2025/11/20251124_15h36m15s_frontend-batch-export-object-metadata-ui-1024x576.webp" alt="Link to batch exporting search results in the frontend" class="wp-image-4540" srcset="https://blog.museum-digital.org/wp-content/uploads/2025/11/20251124_15h36m15s_frontend-batch-export-object-metadata-ui-1024x576.webp 1024w, https://blog.museum-digital.org/wp-content/uploads/2025/11/20251124_15h36m15s_frontend-batch-export-object-metadata-ui-300x169.webp 300w, https://blog.museum-digital.org/wp-content/uploads/2025/11/20251124_15h36m15s_frontend-batch-export-object-metadata-ui.webp 1292w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption">In addition to URL manipulation, the batch export API is linked in the menu of object search results pages.</figcaption></figure>



<p>Currently, the batch export of full object metadata is available for JSON and LIDO (XML) representations of the object data. More can rather easily be added later, should a demand arise.</p>



<h2 class="wp-block-heading">OAI</h2>



<p>Implementing a performant way to export full object metadata in bulk was one of the two main missing components for the long-missing implementation of an <a href="https://www.openarchives.org/pmh/">OAI-PMH</a> API.</p>



<p>OAI is a standard tailored towards data harvesting. Say, an external service like the German Digital Library or Worldcat wants to do something with external data from diverse sources, e.g. to also display objects or implement a search across the different collections / libraries to find which one has which object / book. To be able to do so, they need to be able to access the respective data in some way. Ideally, using a common standard that describes how to query data, helps to identify any data sets that need to be updated or added (or deleted), and finally presents a uniform way to access the data periodically. That, exactly, is OAI-PMH.</p>



<p>In a nutshell: OAI-PMH allows other services to copy all (published) data from another service in a machine-readable way and can thus significantly improve reuse in aggregation. Of course this only applies to technical questions; legally, potential re-users need to comply with the metadata license applied by the initial data provider regardless of the (technical) means of access.</p>



<p>Since last week, museum-digital provides an OAI-PMH API at <code>/oai</code>, relative to a given subdomain, e.g. <code>https://hessen.museum-digital.de/oai</code>. As of now, the OAI-PMH API provides access to the objects&#8217; metadata using LIDO (XML) and the mandatory OAI-DC format.</p>



<p>Note that there are some caveats remaining for now: First, the LIDO representation of object metadata is not (and can by definition not be) as complete and fine-grained as the JSON API. It is also not identical to the LIDO returned by exports from musdb (one is formed natively in PHP, the other using XSLT, leading to divergent development paths). Also, the LIDO output lists identifiers that differ from the ones used elsewhere by the OAI-PMH API and the OAI-DC representation.</p>



<p>Finally, the OAI-PMH API at museum-digital does not implement OAI-PMH sets to group collections. Instead, it follows the existing search logic (essentially providing a new endpoint per query). Example searches via OAI might thus look as follows:</p>



<ul class="wp-block-list">
<li>All objects from the Agrargeschichte instance of museum-digital, represented in OAI-DC: <a href="https://agrargeschichte.museum-digital.de/oai?verb=ListRecords&amp;metadataPrefix=oai_dc">https://agrargeschichte.museum-digital.de/oai?verb=ListRecords&amp;metadataPrefix=oai_dc</a></li>



<li>All objects linked to Berlin (place #61), in the Berlin instance of museum-digital, represented in LIDO: <a href="https://berlin.museum-digital.de/oai/place:61?verb=ListRecords&amp;metadataPrefix=lido">https://berlin.museum-digital.de/oai/place:61?verb=ListRecords&amp;metadataPrefix=lido</a></li>



<li>All objects of the Freies Deutsches Hochstift, Frankfurt am Main, represented in LIDO: <a href="https://hessen.museum-digital.de/oai/institution:1?verb=ListRecords&amp;metadataPrefix=lido">https://hessen.museum-digital.de/oai/institution:1?verb=ListRecords&amp;metadataPrefix=lido</a></li>



<li>All objects from Baranya with only their identifiers: <a href="https://ba.hu.museum-digital.org/oai?verb=ListIdentifiers&amp;metadataPrefix=lido">https://ba.hu.museum-digital.org/oai?verb=ListIdentifiers&amp;metadataPrefix=lido</a></li>
</ul>
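<p>For harvesters, requests like the ones listed above are usually only the first step: OAI-PMH responses are paginated, and the next page is requested by passing the <code>resumptionToken</code> from the previous response. The following is a minimal sketch of such a loop in Python. The namespace and the token handling follow the general OAI-PMH 2.0 protocol rather than anything specific to museum-digital; the function names <code>fetch</code> and <code>harvest</code> are made up for this example, and error responses are not handled.</p>

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"  # namespace of OAI-PMH 2.0 responses


def fetch(url: str) -> bytes:
    """Download one response; any other HTTP client can be swapped in."""
    with urlopen(url) as response:
        return response.read()


def harvest(endpoint: str, prefix: str, get: Callable[[str], bytes] = fetch) -> Iterator[ET.Element]:
    """Yield every record element of a ListRecords request, page by page."""
    params = {"verb": "ListRecords", "metadataPrefix": prefix}
    while True:
        root = ET.fromstring(get(endpoint + "?" + urlencode(params)))
        yield from root.iter(OAI_NS + "record")
        token = root.find(".//" + OAI_NS + "resumptionToken")
        if token is None or not (token.text or "").strip():
            return  # no token, or an empty one, marks the last page
        # The token is an exclusive argument: it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}


# Usage (performs network requests, hence commented out):
# for record in harvest("https://berlin.museum-digital.de/oai/place:61", "lido"):
#     print(record.findtext(OAI_NS + "header/" + OAI_NS + "identifier"))
```

<p>Passing the download function as an argument keeps the loop testable without network access.</p>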



<h2 class="wp-block-heading">Credits</h2>



<p>Credit where credit is due: the Städel Museum deserves praise for their implementation of OAI-PMH. For me personally, seeing the <a href="https://sammlung.staedelmuseum.de/de/oai/guide">Städel&#8217;s API</a> was the first time I saw a visibly stateless OAI-PMH implementation, enabled by the ingenious idea of using machine-readable, JSON-encoded resumption tokens over UIDs that have to be resolved on the server side. I spent weeks (or months?) making museum-digital&#8217;s frontend stateless. By following the Städel&#8217;s example, it can remain so even while offering an OAI-PMH API.</p>
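<p>To illustrate the idea of a stateless resumption token: instead of storing the position of a harvest on the server and handing out an opaque ID, the server can put everything it needs to continue into the token itself. The sketch below is an assumption-laden illustration, not the format used by the Städel or by museum-digital: the field names (<code>q</code>, <code>metadataPrefix</code>, <code>offset</code>) and the base64 wrapping are invented for this example.</p>

```python
import base64
import json


def encode_token(query: str, prefix: str, offset: int) -> str:
    """Pack the state needed for the next page into a URL-safe token."""
    state = {"q": query, "metadataPrefix": prefix, "offset": offset}  # hypothetical fields
    raw = json.dumps(state, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_token(token: str) -> dict:
    """Recover the state from a token; nothing is looked up on the server."""
    return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))


token = encode_token("place:61", "lido", 100)
print(token)
print(decode_token(token))
```

<p>Since clients can read and alter such a token, a server would still need to validate the decoded values before using them.</p>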



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<div class="wp-block-cgb-cc-by message-body" style="background-color:white;color:black"><img decoding="async" src="https://blog.museum-digital.org/wp-content/plugins/creative-commons/includes/images/by.png" alt="CC" width="88" height="31"/><p><span class="cc-cgb-name">This content</span> is licensed under a <a href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International license.</a> <span class="cc-cgb-text"></span></p></div>
]]></content:encoded>
					
					<wfw:commentRss>https://blog.museum-digital.org/2025/11/24/making-interoperability-easy/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-thumbnail><url>https://blog.museum-digital.org/wp-content/uploads/2025/11/20251124-absurdresmasterpiececircuitlight-poster-1024x1024.webp</url><width>600</width><height>600</height></post-thumbnail>	</item>
	</channel>
</rss>
