Structured Data on Wikimedia Commons report
In October 2021, the OpenRefine team continued working on structured data functionality, with a focus on the Wikimedia Commons Reconciliation Service. By the end of October, we had started testing the service in OpenRefine itself and were adding and improving further features, such as support for various formats of Commons file names and data extension with support for all datatypes. The Wikimedia Commons Reconciliation Service is also available for technical testing at the Reconciliation service test bench.
Why Wikimedia Commons reconciliation? How does it work?
A Wikimedia Commons reconciliation service is necessary groundwork to allow further editing of (structured data of) Wikimedia Commons files in OpenRefine. How does this work?
- The reconciliation service takes a list of file names on Wikimedia Commons that are entered in a column in OpenRefine. It then looks up the M-ids (identifiers) for these files. This process is called reconciliation.
- The magic happens in the next step, though... after reconciliation, the user can proceed to retrieve wikitext and existing structured data statements from these Commons files. As requested, the wikitext and the structured data for each file will be listed in consecutive (new) columns in OpenRefine. This process is called data extension.
- As a result, the user will be able to take this wikitext and existing structured data, modify and clean it further in OpenRefine, and convert wikitext to structured data (for instance, convert strings of names of photographers to their corresponding Wikidata items and add these as creators (P170) to the files' structured data). This step is not yet possible; the OpenRefine team will work on it during the upcoming months.
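The first two steps above can be sketched in a few lines of Python. This is a minimal illustration, not the service's actual code: the helper names (`normalize_file_name`, `build_queries`, `build_extend_request`, `m_id`) are hypothetical, and the normalization rules are assumptions about what "various formats of Commons file names" might cover. The request shapes follow the generic reconciliation API that OpenRefine uses (a batch of keyed queries, and a data extension request listing ids and properties); the M-id is the letter M followed by the numeric page ID of the file page.

```python
import json
from urllib.parse import unquote, urlparse


def normalize_file_name(raw: str) -> str:
    """Reduce a Commons file reference to a bare title such as 'Example.jpg'.

    Assumed input formats: bare title, 'File:' prefixed title, or full page URL.
    """
    text = raw.strip()
    if text.startswith(("http://", "https://")):
        # Keep only the last path segment, e.g. 'File:Example.jpg'
        text = unquote(urlparse(text).path.rsplit("/", 1)[-1])
    # Drop a leading 'File:' namespace prefix, case-insensitively
    if text.lower().startswith("file:"):
        text = text[len("file:"):]
    # MediaWiki treats underscores and spaces in titles as equivalent
    return text.replace("_", " ").strip()


def build_queries(file_names):
    """Build a reconciliation query batch: one keyed query per file name."""
    return {
        f"q{i}": {"query": normalize_file_name(name)}
        for i, name in enumerate(file_names)
    }


def m_id(page_id: int) -> str:
    """Form the M-id from the numeric page ID of the file page."""
    return f"M{page_id}"


def build_extend_request(m_ids, property_ids):
    """Build a data extension request for the given M-ids and properties."""
    return {"ids": list(m_ids), "properties": [{"id": p} for p in property_ids]}


def as_form_field(name: str, payload: dict) -> dict:
    """Serialize a payload as a JSON string under a single form field."""
    return {name: json.dumps(payload)}
```

A client would send `as_form_field("queries", build_queries(names))` to the service endpoint to reconcile, then `as_form_field("extend", build_extend_request(ids, ["P170"]))` to fetch existing creator statements. The endpoint URL is deliberately left out here, since it is not given in this report.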
The reconciliation service is not written for OpenRefine alone; it will also be usable in other tools that want to take existing information (wikitext and structured data) from Wikimedia Commons files and process it further.
OpenRefine at WikidataCon 2021
The OpenRefine team presented its ongoing work related to Structured Data on Commons to the Wikidata community at WikidataCon 2021. We also gave a general OpenRefine tutorial and participated in a panel discussion about Wikimedia tool sustainability. Slides of these sessions, where available, can be found at https://www.wikidata.org/wiki/Wikidata:WikidataCon_2021/Documentation/List_of_sessions.