Step in workflow
|
💡 Tips
(some things to think about during this phase)
|
🛠 Tools
(selection of software that can be used in this phase)
|
---|
|
Negotiations between a GLAM partner and Wikimedia community members
- Both sides can get to know each other by starting with smaller activities (e.g. an edit-a-thon or internal Wikimedia course).
- Agreements about the co-operation can be made explicit in a Memorandum of Understanding. (Guide on how to create a MoU)
|
|
|
Data and media files are made available for Wikimedia Commons and/or Wikidata.
|
- Website scraping/ingest tools (if the data is available online but the partner can't produce data exports from its database)
- Tabula - open source tool to extract tables from PDF files
- PAWS - a hosted Jupyter notebook environment for Python on Wikimedia Cloud Services (formerly Tool Labs) that can be used to transfer records from an institution's API
|
|
---|
|
Make sure that copyright of the data and media files is compatible with Wikimedia projects.
|
If permissions and licenses for copyrighted media files aren't published in a public place: make sure the permissions are clarified via an e-mail to OTRS, the platform used by the Wikimedia projects to manage and archive e-mail conversations. (Licensing images: when do I contact OTRS?)
|
|
Clean up the data to be consistent and compatible with Wikimedia Commons and/or Wikidata.
- Look at similar media or data items on Wikimedia Commons or Wikidata for inspiration on how to model the data.
- Wikidata's WikiProjects – the 'groups' where volunteers work together on common interests – often have recommendations on data modelling for specific subjects.
|
- Spreadsheet software - allows non-programmers to run checks against existing Wikimedia content
- OpenRefine (formerly Google Refine) - popular tool for advanced data cleaning, transformation and matching against Wikidata content. Its homepage includes video tutorials and a guide on how to use version 3.0 and higher for Wikidata manipulation and uploading.
- PAWS and Pywikibot - for those with some programming experience, these allow large-scale querying and advanced actions.
|
|
Always check which data and media items are already present on Wikidata and Wikimedia Commons.
- Volunteers have often already autonomously uploaded quite a few images from GLAM collections.
- Wikidata will probably already contain quite a few data items about creative works, people and topics related to specific GLAM collections.
- On Wikimedia Commons, it is considered good practice to upload higher-quality versions as new media files. Don't overwrite existing files.
- On Wikidata, duplicate items must be avoided and merged when they are discovered. It is OK (and even highly recommended) to add extra sources and statements to existing items though.
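One way to check which items already exist is to look them up by an external identifier on the Wikidata Query Service. The sketch below only builds the SPARQL query text; it does not send it. The function name `build_identifier_query` and the identifier values are hypothetical, and P214 (VIAF ID) is used purely as an example property.

```python
def build_identifier_query(property_id, identifiers):
    """Build a SPARQL query that finds Wikidata items holding any of the given identifier values."""
    if not identifiers:
        raise ValueError("at least one identifier is required")
    # Escape backslashes and double quotes so each value is a valid SPARQL string literal.
    values = " ".join(
        '"{}"'.format(str(value).replace("\\", "\\\\").replace('"', '\\"'))
        for value in identifiers
    )
    return (
        "SELECT ?item ?id WHERE {\n"
        f"  VALUES ?id {{ {values} }}\n"
        f"  ?item wdt:{property_id} ?id .\n"
        "}"
    )


# Hypothetical identifier values, for illustration only.
query = build_identifier_query("P214", ["12345", "67890"])
print(query)
```

The resulting text can be pasted into the Wikidata Query Service; any identifier that returns no item is a candidate for a new item, after a manual check.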
|
|
|
Reconciliation is the step where data items from a source dataset are matched with their corresponding Wikidata items.
- Be thorough during this step. Creating many duplicate Wikidata items must be avoided, as these cause a lot of cleanup work for the Wikidata community!
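As a minimal sketch of reconciliation, assuming each source record carries a unique identifier and that a lookup table from identifier to Wikidata item ID has already been gathered (for example from a query), records can be split into matched and unmatched groups. The field name `inventory_number`, the function `reconcile` and the sample values are hypothetical. Only exact identifier matches are handled here; matching on names or titles needs manual review, for which a tool such as OpenRefine is better suited.

```python
def reconcile(records, existing, key="inventory_number"):
    """Split records into those matched to an existing Wikidata item and those without a match."""
    matched, unmatched = [], []
    for record in records:
        # Normalise the identifier before lookup; stray whitespace is a common cause of false misses.
        value = str(record.get(key, "")).strip()
        qid = existing.get(value)
        if qid:
            matched.append({**record, "qid": qid})
        else:
            unmatched.append(record)
    return matched, unmatched


# Hypothetical sample data, for illustration only.
records = [
    {"inventory_number": " A-001 ", "title": "Portrait of a woman"},
    {"inventory_number": "A-002", "title": "Landscape with river"},
]
existing = {"A-001": "Q100000001"}

matched, unmatched = reconcile(records, existing)
print(len(matched), "matched;", len(unmatched), "to review before creating new items")
```

Unmatched records should still be checked by hand before new items are created, since a missing identifier on Wikidata does not prove that the item is absent.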
|
|
|
---|
|
Upload the new data items and/or media files to Wikidata and/or Commons.
- Start with small test batches to check for structural errors.
- Upload in manageable batches. Don't make your batches too large (hundreds rather than thousands) – correcting mistakes in thousands of data items or files at once is not fun.
- Occasionally check uploads during the process, to prevent errors from propagating.
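The batching advice above can be sketched as a small helper that cuts a list of records into chunks of a few hundred. The function name `split_into_batches` and the default size of 200 are assumptions chosen for illustration, not a prescribed limit.

```python
def split_into_batches(items, batch_size=200):
    """Return the items as a list of consecutive batches of at most batch_size each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Hypothetical records, for illustration only.
records = [f"record-{n}" for n in range(450)]
batches = split_into_batches(records)

# Upload and inspect a small test batch first, then continue batch by batch.
test_batch = batches[0][:10]
print([len(batch) for batch in batches])
```

Checking the results after each batch keeps any mistake limited to a few hundred items.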
|
Wikimedia Commons:
- Upload Wizard for simple uploads of up to 50 files. Offers no options for refined metadata.
- Pattypan, a user-friendly batch upload tool that works with spreadsheets and that allows for refined details in metadata.
- GLAMwiki Toolset, an advanced upload tool for XML feeds of large file batches. Requires days of lead time and a request for permission to use the tool.
Wikidata:
- QuickStatements, create or update Wikidata items using tab-delimited or CSV files
- OpenRefine (3.0+), a tool with powerful upload functionality for Wikidata
For both:
|
|
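For QuickStatements, a spreadsheet row can be turned into tab-delimited commands. The sketch below generates commands in the version 1 syntax for creating one new item with an English label, an English description and an "instance of" (P31) statement. The function name `to_quickstatements` and the sample values are hypothetical, and the sketch assumes the label and description contain no double quotes or tabs.

```python
def to_quickstatements(label, description, instance_of):
    """Return tab-delimited QuickStatements (version 1) commands that create one new item."""
    rows = [
        ["CREATE"],                            # start a new item
        ["LAST", "Len", f'"{label}"'],         # English label on the item just created
        ["LAST", "Den", f'"{description}"'],   # English description
        ["LAST", "P31", instance_of],          # instance of
    ]
    return "\n".join("\t".join(row) for row in rows)


# Hypothetical sample values; Q3305213 is the Wikidata item for "painting".
commands = to_quickstatements("Landscape with river", "painting by an unknown artist", "Q3305213")
print(commands)
```

Run the generated commands on a handful of items first and inspect the resulting items before submitting a full batch.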
---|
|
Fix mistakes and omissions that were made during the upload.
- Mistakes happen! Take responsibility for them, and make sure to correct and improve your own uploads.
|
Wikimedia Commons:
- Cat-a-lot, a gadget on Wikimedia Commons to help with categorizing images by pointing and clicking. Activate in your Commons user preferences.
- VisualFileChange.js, a gadget on Wikimedia Commons that allows you to do batch edits to groups of media files
- AutoWikiBrowser, a semi-automated editor
Wikidata:
- QuickStatements, create or update Wikidata items using tab-delimited or CSV files
- OpenRefine (3.0+), a tool with powerful upload functionality for Wikidata
- EditGroups lets you undo faulty batch edits that were performed with QuickStatements or OpenRefine
- PetScan, the advanced search and query tool for Wikimedia projects, also has (limited) editing functionalities for Wikidata items.
|
|
Work with Wikimedia communities to enhance and enrich the data and media.
Improvements can include:
- More precise metadata (e.g. what are the places, objects, people depicted in a media file?)
- More references
- Translations of metadata
|
|
|
Encourage use of the media and data in Wikimedia projects and beyond.
- Campaigns can help a lot: Wikipedia article writing contests, photography events...
- Think beyond Wikipedia; perhaps the media or data can be re-used on other platforms too.
|
|
|
---|
|
Evaluate the impact of the media files and/or data by measuring improvements and (re-)use.
Measurable aspects may include:
- (Number of) people who worked on the data and media
- Types of enrichment
- Inclusions in Wikimedia projects
- Pageviews of pages where data/media is used
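Pageviews for an individual page can also be retrieved programmatically. As a sketch, the helper below builds a request URL for the per-article endpoint of the Wikimedia Pageviews REST API; it does not send the request. The function name `pageviews_url` and the sample article are hypothetical, and the choice to count only human traffic (`user`) across all access methods is an assumption.

```python
from urllib.parse import quote


def pageviews_url(project, article, start, end, granularity="daily"):
    """Build a per-article pageviews URL; start and end are dates in YYYYMMDD format."""
    # Page titles use underscores instead of spaces and must be percent-encoded.
    title = quote(article.replace(" ", "_"), safe="")
    return (
        "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
        f"{project}/all-access/user/{title}/{granularity}/{start}/{end}"
    )


# Hypothetical example: daily views of one English Wikipedia article in January 2024.
url = pageviews_url("en.wikipedia", "Mona Lisa", "20240101", "20240131")
print(url)
```

Summing the views for every page that uses a given media file gives a rough measure of its reach.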
|
Wikimedia Commons:
- GLAMorous shows how often media files from a Commons category (or uploaded by a specific Commons user) are used in other Wikimedia projects.
- BaGLAMa shows Wikimedia page views over time, for specific categories of media files on Wikimedia Commons. Get in touch with its maintainer, Magnus Manske, who can add your own category/ies.
- GLAMorgan shows Wikimedia page views for a specific Wikimedia Commons category for a specific month.
- Fae's GLAM Dashboard, a set of templates that show interesting data about a Commons category, including the most edited files and the most active volunteers who have contributed to them.
Wikidata:
|