Commons talk:WMF support for Commons/Commons community calls
Priorities from the perspective of a frequent user and re-user (inside and outside Wikimedia projects)
Posting here just in case I miss tomorrow's call. I am very grateful for this opportunity; thank you for listening and considering <3!
My perspective: I very frequently edit Wikimedia Commons, with a focus on describing the media there as accurately and reliably as possible, and on making that media usable and re-usable by the world (not just Wikimedia projects), in full agreement with the Wikimedia movement strategy.
Professionally, I also currently lead a project at a government agency which frequently re-uses media from Wikimedia Commons (probably often media which is not used in Wikipedia at all). You can see some of the usage here. Besides this visible re-use, we also rely on searching and querying Wikimedia Commons and Wikidata to find more media and data, which is harder to track. As project manager, I can say that our usage and data retrieval runs into the tens to hundreds of thousands of Wikidata items and Commons files.
I have worked on media databases (broadly speaking) professionally since the early 2000s. My native language is Dutch and I am very aware that the majority of the world doesn't speak a word of English. We have the tools in Wikimedia projects to serve this majority of the world if we decide to leverage them.
High-level wishes from these perspectives:
- In terms of content organization, multilingual discoverability and ease of re-use, structured data is vastly superior. Part of the Wikimedia Commons community is very attached to Wikitext and categories, and I have heard that they matter for discoverability too; therefore I still use them, but mainly as duplicate work on top of adding structured data. I would be able to use my scarce volunteer time more efficiently if this were not needed. For re-using and searching, structured data is the way to go. Commons should be a structured database, just like any other contemporary digital asset management system.
- Commons is a knowledge platform, not a stock images website. If I want to find a free picture of a dog or a rainbow, I will use a generic search engine. The unique strength of Wikimedia Commons is that we describe and contextualize very specific things (a specific church at a certain point in time, a specific occurrence of an animal or plant in a specific location...). Generic search engines can't help with searching for such specific things at specific times and places, but we can build (and partly already have) the unique and very helpful infrastructure to achieve that. We should further develop search and browsing for discovering such specific content. For discovery of media related to general topics, IMO it's better to e.g. work with general-purpose search engines, perhaps focusing on mission-aligned ones (e.g. DuckDuckGo?), to make our general-scope media more discoverable there.
- Generally make structured data more visible to contributors so that there is more incentive to improve it.
- Design updates to SDC should encourage editors to edit with precision and accuracy (sourced, correct, not generic but specific).
More specific wishes and requests I'm currently thinking of:
- Remove authentication from WCQS so that Wikimedians, and cultural and other knowledge organizations around the world, can perform federated and shareable Wikimedia Commons queries.
- Improve MediaSearch so that it shows (structured) metadata of each file by default (not needing a click).
- Add faceted search to MediaSearch.
- Persistent faceted search results can become new-style galleries. It would be great to be able to have persistent URIs for specific faceted search results, multilingually ("Korenmolen de Distilleerketel in de 19de eeuw", i.e. "grain mill De Distilleerketel in the 19th century").
- Show structured data on file pages by default and more prominent than Wikitext (not in a separate tab anymore)
- In order to be able to re-use gadgets and scripts from Wikidata, and to provide a unified experience, make sure SDC has the generic Wikibase/Wikidata design (i.e. revert the decision to have Commons-specific UI for SDC).
Thanks! Spinster (talk) 09:39, 20 November 2024 (UTC)
- As someone who works on Wikidata scripts/gadgets a lot, the biggest problem for those (by far) is the lack of JavaScript hooks. I can adapt scripts to support different HTML structures, but they won't work if they don't run at the right time.
- Also, links to some relevant tickets:
- phab:T327076 - UI for structured data on Commons should have the same Javascript hooks as Wikidata
- phab:T341781 - Show structured data by default
- phab:T297995 - Remove authentication from Wikimedia Commons Query Services (WCQS)
- phab:T337106 - Faceted, structured data-based MediaSearch on Wikimedia Commons
- - Nikki (talk) 16:13, 20 November 2024 (UTC)
- "In terms of content organization, multilingual discoverability and ease of re-use, structured data is vastly superior.": Strongly disagree. It's basically redundant to categories and just duplicates the work. Most files do not have structured data, and those that do lack most of the major subjects, or have fewer things set than their categories do. Most of the SD that is set has been set using the categories. It's wishful thinking: SD is a resource sink without much need for it when it comes to subjects depicted. Moreover, categories can also be multilingual; this is just one of many cases where people think SD is needed or better when it's not. See Add machine translated category titles on WMC.
- "Improve MediaSearch so that it shows (structured) metadata of each file by default": Also strongly oppose. Instead, make it show the categories, which, unlike SD, are well maintained, usually fairly complete and not polluted with unrelated or vandal depicts data.
- "make structured data more visible to contributors so that there is more incentive to improve it": this just wastes precious, scarce volunteer time duplicating work that has already been done via file categories.
- "For discovery of media related to general topics, IMO it's better to e.g. work with general-purpose search engines, perhaps focusing on mission-aligned ones (e.g. DuckDuckGo?), to make our general-scope media more generally discoverable there.": People also search for relatively niche things with web search engines (e.g. a specific river seen from space in sunlight), and the problem is that WMC is not well indexed there. For example, videos do not show up in DuckDuckGo Videos at all. See Do something about Google & DuckDuckGo search not indexing media files and categories on Commons.
- Please accept the reality of structured data and categories. Prototyperspective (talk) 19:22, 20 November 2024 (UTC)
- The category system is broken in a lot of ways. It doesn't scale well to the size of Commons, and is causing stability issues. Tiny intersection categories ("Red apples with green spots sitting on blue plates in November 2024") are common but make actually using the category system to find every picture of a red apple difficult. All of this and more is solved by structured data, but migrating all of the existing category-based data to structured data absolutely is a challenge. The tools to work with structured data are often barely functional, and WCQS has been an afterthought since it was introduced. But that doesn't mean we should look backward instead of forward. AntiCompositeNumber (talk) 15:46, 21 November 2024 (UTC)
- It's not broken at all.
- For scaling you seem to be referring to phab:T343131 which can be addressed in various ways such as maybe better caching or removing redundant meta-categories (or moving these to SD since they are not about the content).
- "[overspecific intersection categories] make actually using the category system to find every picture of a red apple difficult": 1. This is not an issue of categories as such. 2. It is not addressed by structured data. 3. It is addressed by the Deepcat gadget, which would be greatly improved if deepcategory search operator issues like phab:T376440 were fixed and the operator were extended (e.g. to specify depth, or to exclude certain subcategories of Red apples such as "Red apples in fiction"), and by this highly supported wish.
- Those overspecific categories are, if anything, the problem; they often get upmerged, and if not, you could propose that. But there should also be a category the user would navigate to that contains more of these files, instead of many deep overspecific categories. Moreover, many of these by-date categories should become redundant once users can sort, search and/or filter (also see phab:T329961 & phab:T329961) by content in the {{Information}} template, such as the date= field. This is quite overdue, as there is so much useful metadata in there that it should be searchable and part of filters.
- "All of this and more is solved by structured data": that is denying reality and wishful thinking. None of these things have been solved, or solved to any notable degree.
- "that doesn't mean we should look backward instead of forward": Just because something is new doesn't make it better. When it comes to subjects depicted, categories are the way forward; structured data amounts to putting one's head in the sand and arguing for what one ideologically wishes were true.
- Prototyperspective (talk) 16:25, 21 November 2024 (UTC)
- ... everyone breathe :)
- an image gallery including subcats is a great bandaid.
- If we're redesigning things to make more sense: combination categories are a bit of a misuse of the theoretical concept of cats. "X in fiction" should be in categories "X" + "in fiction". Then <adjective> <adjective> <adjective> <noun> <in context> <in context> would be in six atomic categories, with a large number of possible combination categories. Then we need indexes and views that allow seeing all of the "red, decaying, food, on flatware" which will show red apples with green mold spots on blue plates.
- --SJ+ 14:40, 24 November 2024 (UTC)
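The atomic-category idea above can be sketched in a few lines. This is a hypothetical Python illustration only: the file names and category labels are made up, and it does not use any real Commons API or database. Each file carries only single-attribute categories, and a combination such as "red, food, on flatware" is computed on demand as a set intersection instead of being maintained as its own combination category.

```python
from collections import defaultdict

# Hypothetical data: each file is tagged only with "atomic" categories
# (one attribute each) instead of combination categories.
FILES = {
    "Apple_on_plate.jpg": {"red", "decaying", "apple", "food", "on flatware"},
    "Green_apple.jpg": {"green", "apple", "food"},
    "Red_tomato_on_plate.jpg": {"red", "tomato", "food", "on flatware"},
    "Red_car.jpg": {"red", "car"},
}


def build_index(files):
    """Invert file -> categories into category -> set of files."""
    index = defaultdict(set)
    for name, cats in files.items():
        for cat in cats:
            index[cat].add(name)
    return index


def view(index, *cats):
    """Compute a combination 'category' on demand as a set intersection."""
    if not cats:
        return set()
    # Copy the first set so the in-place intersection does not modify the index.
    result = set(index.get(cats[0], set()))
    for cat in cats[1:]:
        result &= index.get(cat, set())
    return result


index = build_index(FILES)
print(sorted(view(index, "red", "food", "on flatware")))
# prints ['Apple_on_plate.jpg', 'Red_tomato_on_plate.jpg']
```

Adding "decaying" to the query narrows the view to Apple_on_plate.jpg alone. In a real system the per-category file sets would presumably come from a database index rather than an in-memory dict; the sketch only shows that combination views need no hand-maintained intersection categories.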
Perennial needs
Commons:Requests for comment/Technical needs survey. RoyZuo (talk) 11:34, 20 November 2024 (UTC)
- @RoyZuo Thanks, we have already discussed the results of this survey internally, and we tried to include as many of its findings as possible in our roadmap. Sannita (WMF) (talk) 10:26, 25 November 2024 (UTC)
summary of calls
how did the two sessions yesterday go? Arlo James Barnes 20:17, 22 November 2024 (UTC)
- @Arlo Barnes Thanks for the question. The calls went well; we will publish the notes in the next few days. Please be a bit patient, because we need to give them some structure. Sannita (WMF) (talk) 18:02, 23 November 2024 (UTC)
- Do these meetings ever have etherpads or collective notes that the attendees can contribute to? That makes some community meetings easier to follow --SJ+ 14:40, 24 November 2024 (UTC)
- @Sj We collected feedback on an internal document, but I'll ask if we can move to Etherpad for the next calls. Sannita (WMF) (talk) 10:25, 25 November 2024 (UTC)
My comments based on Sandra's and Nikki's comments - Jane023
Quick reorganisation of Sandra’s and Nikki’s comments so that these issues can be referred to by number:
1) Remove authentication from WCQS so that Wikimedians, and cultural and other knowledge organizations around the world, can perform federated and shareable Wikimedia Commons queries. phab:T297995 - Remove authentication from Wikimedia Commons Query Services (WCQS)
2) Improve MediaSearch so that it shows (structured) metadata of each file by default (not needing a click).
3) Add faceted search to MediaSearch. phab:T337106 - Faceted, structured data-based MediaSearch on Wikimedia Commons
4) Persistent faceted search results can become new-style galleries. It would be great to be able to have persistent URIs for specific faceted search results, multilingually ("Korenmolen de Distilleerketel in de 19de eeuw"). This is a windmill that is popular today but was a ruin in the early 20th century; see nl:De Distilleerketel
5) Show structured data on file pages by default and more prominent than Wikitext (not in a separate tab anymore) phab:T341781 - Show structured data by default
6) In order to be able to re-use gadgets and scripts from Wikidata, and to provide a unified experience, make sure SDC has the generic Wikibase/Wikidata design (i.e. revert the decision to have Commons-specific UI for SDC). phab:T327076 - UI for structured data on Commons should have the same Javascript hooks as Wikidata
On categories: I am going to skip the category discussion. Though I love Commons (and Wikipedia) categories and use HotCat and Cat-a-lot quite a bit on Commons categories for heritage sites and artists, I have given up on the “category or item” discussion in favour of both, when and if possible. On 2), I feel that Commons categories are much less efficient for search than structured data, but because of all the restrictions on practical use of WCQS (it’s so well hidden!) I prefer Wikidata search. My main issue with categories these days is that when I track down painting files in some language Wikipedia I don’t speak or read, I am shocked that clicking on the file doesn’t take me to my “normal” Commons UI, but instead takes me by default to some UI that doesn’t show any Commons categories at all. My main comment on 1) is that this authentication requirement is the reason I don’t use WCQS at all. My main comment on 5) is that I occasionally get confused when I update the Wikidata item in a Commons file but the file still shows the data from the old Q number, and I have to go in and change the Q number there too. This hasn’t happened recently, so I have no idea whether it has been fixed. On 6), I agree, but I would like to keep my default settings across all Wikimedia projects, based on my default language version on each project (currently it seems I get the “not logged in” version as soon as I leave one project and enter another).
On copyright files: All of that being said, one thing I really like is the effort to improve multilingual copyright labels outside of the complicated templating that we have had since 2010. As a paintings enthusiast with a fondness for Dutch 17th-century art, I am happiest with high-resolution images of such paintings and always eager to see the best version in use. I am a big fan of detail images of paintings, and I recognise the challenges when all we have are details and we are missing an image of the whole painting. Occasionally I stray outside of my safe “PD-old-expired-100” territory, and at times I find it very confusing to see that we are using any image at all of a copyrighted painting (most weirdly, an image of its signature) with no indication of the reason.
On "Commons file gaps": As a member of a gender gap workgroup, I am also always surprised by the lack of any gender categories (though, given the various gender discussions, these are as problematic as ethnic or melanin-toned categories). In the case of missing paintings, I have looked for ways to show the gap, and of course this is currently only possible on Wikidata. For popular modern artists there is currently no way to show a Commons gallery of the paintings in a catalog except as a numbered series of File:Noimage.svg. Instead of regularly deleting such files when they get uploaded by misinformed or unsuspecting Commonists, it would be nice if copyrighted images automatically defaulted to the “no image” option based on the Wikidata item information, which could be passed through to Commons.
On "modern art on Commons": I do find it logical to look for modern art on Commons, even if we insiders know it’s not there. Most people will look for a painter or sculptor name without considering copyright at all. With a true “global sign-in” that gives me my customised UI, I could possibly use some structured data flag on a file held in some non-English Wikipedia and enable auto-deletion of copyrighted files based on the creator's death date (though possibly trumped by Freedom of Panorama), all using some “No image” structured data artefact. That way, if the artist gets reattributed or his/her death date passes the 70-year cutoff, the “undelete” would be semi-automatic, and if there was no previous upload, an auto-upload link could perhaps be added to the artefact.
On "ghost uploads": I think the precision of our artist death dates is one reason we don’t have more modern art on Commons. As Commonists come and go, their “ghost uploads” for modern art slowly snowball into huge black holes, especially as exhibitions that those Commonists attended join the other “great exhibitions we all forgot ever happened”. Jane023 (talk) 10:15, 27 November 2024 (UTC)
- @Jane023 Thanks, much appreciated! Sannita (WMF) (talk) 12:30, 27 November 2024 (UTC)