Step in workflow
(some things to think about during this phase)
(selection of software that can be used in this phase)
Negotiations between a GLAM partner and Wikimedia community members
Both sides can get to know each other by starting with smaller activities (e.g. an edit-a-thon or internal Wikimedia course).
Agreements about the co-operation can be made explicit in a Memorandum of Understanding (MoU). (Guide on how to create a MoU)
Data and media files are made available for Wikimedia Commons and/or Wikidata.
Website scraping/ingest tools (if the data is available online but the partner can't produce data exports from its database)
Tabula - open source tool to extract tables from PDF files
PAWS - Python programming notebook environment on Wikimedia Tools Lab that can transfer records from an institution's API
Make sure that copyright of the data and media files is compatible with Wikimedia projects.
If permissions and licenses for copyrighted media files aren't published in a public place: make sure the permissions are clarified via an e-mail to OTRS, the platform used by the Wikimedia projects to manage and archive e-mail conversations. (Licensing images: when do I contact OTRS?)
Clean up the data to be consistent and compatible with Wikimedia Commons and/or Wikidata.
Look at similar media or data items on Wikimedia Commons or Wikidata for inspiration on how to model the data.
Wikidata's WikiProjects – the 'groups' where volunteers work together on common interests – often have recommendations on data modelling for specific subjects.
Spreadsheet software - allows non-programmers to run checks against existing Wikimedia content
OpenRefine (formerly Google Refine) - popular tool for advanced data cleaning, transformation and matching against Wikidata content. Its homepage includes video tutorials and a guide on how to use version 3.0 and higher for Wikidata manipulation and uploading.
PAWS and Pywikibot - for those with some programming experience, these allow large-scale querying and advanced actions.
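For those who prefer a script to a spreadsheet, the kind of consistency check described above can be sketched in a few lines of Python. This is only an illustration: the column names (`inventory_number`, `title`, `creator`) and the sample rows are hypothetical, and a real export will need its own rules.

```python
import csv
import io
import re


def clean_rows(rows):
    """Trim whitespace, collapse repeated spaces and drop fully empty rows."""
    cleaned = []
    for row in rows:
        new_row = {}
        for key, value in row.items():
            if key is None:  # extra cells beyond the header row are ignored
                continue
            new_row[key.strip()] = re.sub(r"\s+", " ", (value or "").strip())
        if any(new_row.values()):
            cleaned.append(new_row)
    return cleaned


def find_duplicates(rows, key):
    """Return values of `key` that occur more than once, in order of discovery."""
    seen, duplicates = set(), []
    for row in rows:
        value = row[key]
        if value in seen and value not in duplicates:
            duplicates.append(value)
        seen.add(value)
    return duplicates


# Hypothetical sample export; replace with the partner's own CSV file.
SAMPLE = """inventory_number,title,creator
A-001,  Portrait of a   Lady ,Jane Doe
A-002,Landscape,
A-001,Portrait of a Lady,Jane Doe
,,
"""

rows = clean_rows(csv.DictReader(io.StringIO(SAMPLE)))
duplicate_ids = find_duplicates(rows, "inventory_number")
```

Records flagged in `duplicate_ids` are worth checking by hand before any upload; the script only reports them and does not decide which copy to keep.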
Always check which data and media items are already present on Wikidata and Wikimedia Commons.
Volunteers have often already autonomously uploaded quite a few images from GLAM collections.
Wikidata will probably already contain quite a few data items about creative works, people and topics related to specific GLAM collections.
On Wikimedia Commons, it is considered good practice to upload higher-quality versions as new files. Don't overwrite existing files.
On Wikidata, duplicate items must be avoided and merged when they are discovered. It is OK (and even highly recommended) to add extra sources and statements to existing items though.
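One way to see what Wikidata already holds for a collection is a SPARQL query against the Wikidata Query Service, for example from a PAWS notebook. The sketch below assumes the items are linked to the institution with the "collection" property (P195) and carry an "inventory number" (P217); the item ID `Q0` is a placeholder that does not exist, and the User-Agent string is a made-up example that should be replaced with real contact details.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"


def build_collection_query(collection_qid, limit=100):
    """Build a SPARQL query listing items that are part of a collection."""
    if not (collection_qid.startswith("Q") and collection_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {collection_qid!r}")
    return f"""
SELECT ?item ?itemLabel ?inventoryNumber WHERE {{
  ?item wdt:P195 wd:{collection_qid} .
  OPTIONAL {{ ?item wdt:P217 ?inventoryNumber . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
LIMIT {int(limit)}
""".strip()


def run_query(query):
    """Send the query to the Wikidata Query Service and return the result rows."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        url,
        # Hypothetical User-Agent; use your own tool name and contact address.
        headers={"User-Agent": "glam-workflow-example/0.1 (contact: your e-mail here)"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


# Placeholder item ID; replace with the Wikidata item of the actual collection.
query = build_collection_query("Q0", limit=5)
```

Calling `run_query(query)` needs network access and is left out of the sketch on purpose. The returned rows can then be compared with the partner's own inventory numbers.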
Reconciliation is the step where data items from a source dataset are matched with their corresponding Wikidata items.
Be thorough during this step. Creating many duplicate Wikidata items must be avoided, as these cause a lot of cleanup work for the Wikidata community!
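To make the idea of reconciliation concrete, here is a minimal sketch that matches source labels against a list of already known Wikidata items by normalized label. It is an assumption-laden illustration, not how OpenRefine works internally: real reconciliation should also compare identifiers, dates and types, and the item IDs and names below are placeholders.

```python
import unicodedata


def normalize(label):
    """Lower-case, strip accents and collapse whitespace for comparison."""
    text = unicodedata.normalize("NFKD", label)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return " ".join(text.casefold().split())


def reconcile(source_labels, wikidata_items):
    """Split source labels into matched, ambiguous and unmatched groups.

    `wikidata_items` is a list of (item_id, label) pairs fetched beforehand.
    """
    index = {}
    for item_id, label in wikidata_items:
        index.setdefault(normalize(label), []).append(item_id)

    matched, ambiguous, unmatched = {}, {}, []
    for label in source_labels:
        candidates = index.get(normalize(label), [])
        if len(candidates) == 1:
            matched[label] = candidates[0]
        elif candidates:
            ambiguous[label] = candidates  # needs a human decision
        else:
            unmatched.append(label)  # candidate for a new item, after a manual check
    return matched, ambiguous, unmatched


# Placeholder data; the IDs are deliberately not real Wikidata item IDs.
known_items = [
    ("Q_EXAMPLE_1", "Émile Example"),
    ("Q_EXAMPLE_2", "John Sample"),
    ("Q_EXAMPLE_3", "john sample"),
]
source = ["Emile  Example", "John Sample", "Unknown Person"]

matched, ambiguous, unmatched = reconcile(source, known_items)
```

Only the `matched` group is safe to process automatically. The `ambiguous` and `unmatched` groups are where duplicates tend to come from, so review them by hand before creating anything new.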
Upload the new data items and/or media files to Wikidata and/or Commons.
Start with small test batches to check for structural errors.
Upload in manageable batches. Don't make your batches too large (hundreds rather than thousands) – correcting mistakes in thousands of data items or files at once is not fun.
Occasionally check uploads during the process, to prevent errors from propagating.
Wikimedia Commons:
Upload Wizard - for simple uploads of up to 50 files. Offers no options for refined metadata.
Pattypan - a user-friendly batch upload tool that works with spreadsheets and that allows for refined details in metadata.
GLAMwiki Toolset - an advanced upload tool for XML feeds of large file batches. Requires days of lead time and a request for permission to use the tool.
Wikidata:
QuickStatements - create or update Wikidata items using tab-delimited or CSV files.
OpenRefine (3.0+) - tool that has powerful upload functionality for Wikidata.
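The advice to start with a small test batch and then continue in batches of hundreds can be combined with QuickStatements' tab-delimited command format, roughly as follows. This is a sketch under assumptions: the records are hypothetical, only an English label and an inventory number (P217) are written, labels are assumed to contain no double quotes, and the batch sizes are examples rather than recommendations from the tool.

```python
def to_quickstatements(record):
    """Turn one record into QuickStatements (version 1 syntax) commands."""
    label = record["label"]
    lines = ["CREATE", f'LAST\tLen\t"{label}"']
    for prop, value in record["statements"]:
        lines.append(f"LAST\t{prop}\t{value}")
    return "\n".join(lines)


def make_batches(items, test_size=5, batch_size=200):
    """Return a small test batch first, then batches of at most batch_size."""
    batches = [items[:test_size]] if items else []
    for start in range(test_size, len(items), batch_size):
        batches.append(items[start:start + batch_size])
    return batches


# Hypothetical records; string values are wrapped in double quotes.
records = [
    {"label": f"Example object {n}", "statements": [("P217", f'"A-{n:03d}"')]}
    for n in range(1, 13)
]

batches = make_batches(records, test_size=2, batch_size=5)
first_batch_text = "\n".join(to_quickstatements(r) for r in batches[0])
```

Paste `first_batch_text` into QuickStatements, check the resulting items by hand, and only then continue with the remaining batches.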
Fix mistakes and omissions that were made during the upload.
Mistakes happen! Take responsibility for them, and make sure to correct and improve your own uploads.
Wikimedia Commons:
Cat-a-lot - a gadget on Wikimedia Commons to help with categorizing images by pointing and clicking. Activate in your Commons user preferences.
VisualFileChange.js - a gadget on Wikimedia Commons that allows you to do batch edits to groups of media files.
AutoWikiBrowser - a semi-automated editor.
Wikidata:
QuickStatements - create or update Wikidata items using tab-delimited or CSV files.
OpenRefine (3.0+) - tool that has powerful upload functionality for Wikidata.
EditGroups - allows you to undo faulty batch edits that were performed with QuickStatements and with OpenRefine.
PetScan - the advanced search and query tool for Wikimedia projects, also has (limited) editing functionalities for Wikidata items.
Work with Wikimedia communities to enhance and enrich the data and media.
Improvements can include:
More precise metadata (e.g. what are the places, objects, people depicted in a media file?)
Translations of metadata
Encourage use of the media and data in Wikimedia projects and beyond.
Campaigns can help a lot: Wikipedia article writing contests, photography events...
Think beyond Wikipedia; perhaps the media or data can be re-used on other platforms too.
Evaluate the impact of the media files and/or data by measuring improvements and (re-)use
Measurable aspects may include:
(Number of) people who worked on the data and media
Types of enrichment
Inclusions in Wikimedia projects
Pageviews of pages where data/media is used
Wikimedia Commons:
GLAMorous shows how often media files from a Commons category (or uploaded by a Commons user) are used in other Wikimedia projects.
BaGLAMa shows Wikimedia page views over time, for specific categories of media files on Wikimedia Commons. Get in touch with its maintainer, Magnus Manske, who can add your own category/ies.
GLAMorgan shows Wikimedia page views for a specific Wikimedia Commons category for a specific month.
Fae's GLAM Dashboard is a set of templates that show interesting data about a Commons category, including the most edited files and the most active volunteers who have contributed to them.