GLAM/Newsletter/May 2021/Contents/Special story
Wikimedia Hackathon report: Upgrading GLAM tech tools and PAWS
As part of the recent Wikimedia Hackathon, a number of Wikimedians including User:Chicocvenancio, User:Fuzheado, Tony Hirst, User:Susannaanas, and User:Yuvipanda worked to better document and upgrade the PAWS computing environment on Wikimedia servers for the GLAM Wiki and greater Wikimedia community.
PAWS (https://paws.wmcloud.org) is an often-overlooked coding and development system geared towards those starting out with programming or automated wiki tasks. But it's not just for beginners: it is a full-fledged computing environment that dozens of folks use for large-scale GLAM tasks. Anyone with a Wikimedia login can log into PAWS to try running some basic bot code or just to familiarize themselves with coding tools.
PAWS itself is simply a Wikimedia-specific instance of Jupyter Notebooks, a popular "literate" or "interactive" programming environment useful for experimenting with code. The benefit of running on Wikimedia infrastructure is fast access to servers and data while being automatically authenticated for database actions after logging in.
Major upgrades
PAWS is also a general one-click computing container, so it can run a variety of different open source packages. The Hackathon produced several important innovations for GLAM Wiki users:
- OpenRefine - OpenRefine is used for reconciliation and data cleaning, primarily to see how a data set matches (or not) Wikidata's items. Of main interest to GLAM folks: you can now run OpenRefine on the Wikimedia cloud servers, via PAWS, instead of downloading it to a local computer and installing it. This is useful for folks who cannot install software on locked-down institutional computers, or those who want to run OpenRefine training and have been hampered by installation woes for individual users. Having folks log in to PAWS and run OpenRefine in the same turnkey environment and version is a huge plus. Another benefit of having this in the cloud: after running a reconciliation session in OpenRefine on PAWS, you can share your working data with others through a public link. If you are logged in to PAWS you can choose OpenRefine from the "New" menu, or you can access it with this link:
- SPARQL kernel - If you've ever run a series of Wikidata SPARQL queries but wanted to save them or show a progression of several queries, PAWS can now save SPARQL queries in a Jupyter notebook format. You can make a new SPARQL notebook from the New menu. A sample can be seen here:
- JupyterLab - PAWS now offers a more advanced JupyterLab mode, which handles multiple files and extensions better than the classic view. Thanks to the work at the Hackathon, this mode is there for anyone working with multiple files. PAWS still starts up in the classic "notebook" view by default, but you can visit this URL to run it in JupyterLab mode:
- PAWS notebooks as apps - A PAWS notebook can interactively show how code can make Wikipedia/Wikidata edits. But you can also develop a usable app with a point-and-click user interface in PAWS. Thanks to the work of User:Chicocvenancio and User:Yuvipanda, the extension Voilà is now available in PAWS, which allows you to execute a fully functioning standalone app on Wikimedia servers, run under your account.
- For an example of a project published with this system, see the USA report writeup for this month, where User:Fuzheado describes his Wikidata Graph Browser project. You can also click on this link to launch the tool using Voilà to see how it works.
- R and RStudio - PAWS was previously limited to notebooks using the Python programming language, but during the hackathon support for R was added. R is a programming language focused on statistical computing and graphics, which can be a powerful tool when one needs to do data analysis and visualization. In addition to using R in classic notebooks, PAWS now also comes with RStudio, an "integrated development environment" which lets you work with multiple files and offers tools to debug your code, built-in tutorials, and much more. To get started with R in RStudio, use the link below:
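To give a feel for the "progression of several queries" idea mentioned under the SPARQL kernel, here is a minimal sketch in Python, the language of ordinary PAWS notebooks. It is not the SPARQL kernel itself: it only shows how two related queries could be kept side by side and sent to the public Wikidata Query Service endpoint. The helper names (build_request, run_query), the two sample queries, and the User-Agent string are illustrative assumptions, not part of PAWS.

```python
import json
import urllib.parse
import urllib.request

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# A small progression of queries, as one might keep them in successive
# notebook cells. Both queries are illustrative examples only.
QUERIES = [
    # Step 1: ten items that are an instance of (P31) museum (Q33506).
    """
    SELECT ?item ?itemLabel WHERE {
      ?item wdt:P31 wd:Q33506 .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    } LIMIT 10
    """,
    # Step 2: the same, but only museums that have coordinates (P625).
    """
    SELECT ?item ?itemLabel ?coord WHERE {
      ?item wdt:P31 wd:Q33506 ;
            wdt:P625 ?coord .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    } LIMIT 10
    """,
]


def build_request(query, user_agent="paws-sparql-sketch/0.1 (example)"):
    """Build (but do not send) a GET request for one SPARQL query.

    The user_agent default is a hypothetical placeholder; replace it with
    something that identifies you or your tool.
    """
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return urllib.request.Request(
        WDQS_ENDPOINT + "?" + params,
        headers={"User-Agent": user_agent},
    )


def run_query(query):
    """Send the query and return the list of result rows (needs network)."""
    with urllib.request.urlopen(build_request(query), timeout=60) as response:
        return json.load(response)["results"]["bindings"]
```

In a notebook, calling run_query(QUERIES[0]) in one cell and run_query(QUERIES[1]) in the next keeps both steps, and their results, saved together in the same file.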
Why I wanted to learn to use Jupyter
My idea going to the hackathon was to be able to reconcile tens of thousands of geographic items to data already in Wikidata. I wanted to find items that are located in the same place and only after that start to compare their names in different languages and other properties. There are many tools that could be tweaked to do that, but the data in Wikidata for my items was so heterogeneous that I should be able to switch between different approaches. So I started to create a notebook for reconciling based on geographic data. It's just a start. The notebook is not working yet, but I have a sense that it will work! I am eager to learn myself and together with others, and create a library of recipes we can exchange with one another to do the things we need. The story continues... – Susanna Ånäs
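The two-step approach described above (first find items in the same place, then compare names) could be sketched roughly as follows. This is not the notebook mentioned in the text, only a minimal illustration under stated assumptions: each record is a plain dictionary with hypothetical keys ("lat", "lon", "names", "labels", "qid"), "same place" is taken to mean within a chosen distance, and names are compared by exact case-insensitive match.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def candidates_nearby(record, wikidata_items, max_km=1.0):
    """Step 1: keep only items within max_km of the record, nearest first."""
    hits = []
    for item in wikidata_items:
        distance = haversine_km(record["lat"], record["lon"],
                                item["lat"], item["lon"])
        if distance <= max_km:
            hits.append((distance, item))
    hits.sort(key=lambda pair: pair[0])
    return [item for _, item in hits]


def name_matches(record, item):
    """Step 2: true if any name of the record equals any label of the item,
    in any language, ignoring case."""
    names = {name.casefold() for name in record["names"].values()}
    labels = {label.casefold() for label in item["labels"].values()}
    return bool(names & labels)


def reconcile(record, wikidata_items, max_km=1.0):
    """Return the ids of nearby items whose labels also match by name."""
    nearby = candidates_nearby(record, wikidata_items, max_km)
    return [item["qid"] for item in nearby if name_matches(record, item)]
```

Keeping the distance filter and the name comparison in separate functions makes it easy to swap either one for a different approach, for instance a looser string comparison or a larger search radius for heterogeneous data.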