World Univ and Sch forking into WUaS Co ...
Regarding a cryptocurrency with a blockchain ledger and a Universal Basic Income, which I talk about in the video above (and in other recent WUaS videos - https://youtube.com/user/WorldUnivandSch), see, too:
Red panda: There is not enough kindness in the world (in my opinion). I am curious how best WUaS could facilitate a single worldwide cryptocurrency with a blockchain ledger backed by all ~200 nation states' central banks (think the Euro, 20 years or so in) ... and at a key moment of coding World Univ & Sch's new WUaS Miraheze MediaWiki as "front end" with Wikidata/Wikibase as "back end" - for wiki learners / teachers in all 7097 living languages, and indeed for all ~7.5 billion people on earth as end users / Universitians at WUaS, via a Universal Basic Income (UBI) ... https://wiki.worlduniversityandschool.org/wiki/You_at_World_University ... See, too, David re robotics and gaming - https://twitter.com/ValaAfshar/status/1000411890453970944 - which I retweeted at the non-profit 501(c)(3) - https://twitter.com/WorldUnivAndSch - and here at the newly forked (in 2017) for-profit general stock company, the WUaS Corp - https://twitter.com/WUaSPress ... Wow!
Wikidata office hours today re newly released lexicographical data - and related licensing information:
16:54:18 Hi Lydia_WMDE - in Wikidata's unfolding relationship with Google (eg Wikipedia/Wikidata is used by Google a lot) - and now potentially re lexicographical data, could you say a little about how you think something like GNMT / Google Translate will use lexicographical data in Wikidata / Wikipedia's 301 languages please - and re CC licensing too?
16:54:38 to throw some random numbers into the room – 100M lexemes before the end of 2020?
16:55:06 It would be amazing!
16:55:07 I agree. We should wait for the tools so that we will not have duplicate entries
16:55:26 Scott_WUaS: I don't know :D Anyone can use it for anything. That's why we do this, right? I hope that we will see a lot of new tools being developed by organisations that support small languages.
... AND ...
17:04:39 CC-0 licensing question: Is Wikicitation - and possibly re lexicographical data for translation -
17:04:53 Is WikiCite CC-0 licensed?
17:05:15 WikiCite is a project. The data they add to Wikidata is CC-0.
17:05:15 Thank you for the office hour!
17:05:26 Thank you, Lydia!
#startmeeting Wikidata office hour
16:00:26 Meeting started Tue May 29 16:00:26 2018 UTC and is due to finish in 60 minutes. The chair is Lydia_WMDE. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:26 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:26 The meeting name has been set to 'wikidata_office_hour'
16:00:27 Meeting started Tue May 29 16:00:26 2018 UTC and is due to finish in 60 minutes. The chair is Lydia_WMDE. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:27 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:27 The meeting name has been set to 'wikidata_office_hour'
16:00:52 Hello world!
16:00:56 Hello word :D
16:00:58 o/
16:01:02 hello, double meetbot :D
16:01:19 Auregann_WMDE: ah, nice one :)
16:01:38 Hello
16:02:05 So, we're going to start, as usual, with an overview of what happened in the dev team since the last office hour (end of January, time flies)
16:02:12 then we will have a time for questions
16:02:35 The second part of the meeting is dedicated to a special topic, and today the special topic is of course the release of lexicographical data on Wikidata :)
16:02:42 Who is here for the office hour?
16:03:02 o/
16:03:04 o/
16:03:20 o/
16:03:25 Yay my favourite people :)
16:03:36 Alright let's get this started then
16:04:10 I'll do an overview of what happened around the development. A lot has happened and I'm only going to concentrate on the most important things.
16:04:22 First of all: We have lexicographical data on Wikidata now! \o/
16:04:48 It took us a lot of time to get to this point but now the first version is finally out and we can talk about it more in the second part of the meeting.
16:05:25 We also did a lot of work on improving usage tracking and with that what kind of Wikidata changes are shown on Wikipedia watchlists and recent changes.
16:05:32 o/
16:06:13 Some of the biggest critizism from Wikipedians was that this has been pretty bad in the past and I hope that this is much better now. If you still see things that are not good please let us know and we can look into it more.
16:06:52 We also asked for input on how to improve our Lua functions to make it easier to create infoboxes with Wikidata.
16:07:03 You can still give input here: https://www.wikidata.org/wiki/Wikidata:New_convenience_functions_for_Lua
16:07:37 based on the Feedback we got we already made a few changes like a function that allows checking if an item is a subclass or instance of another item
16:07:48 and a function to test if an item ID is valid
16:08:34 Then we improved the constraints checks. Specifically we added a bunch of new constraints to be able to find even more errors: https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2018/05#New_constraint_types
16:08:54 And the constraints are now enabled for all logged in users to help make errors more visible for more people
16:09:47 The search is also much improved and is now running on elastic.
16:10:32 We also continued our efforts to make it easier to install and use Wikibase outside Wikimedia by offering for example Docker images that people can use to easily set up their own knowledge base
16:11:00 And last but not least a nice little tweak: images are now shown with a thumbnail instead of just a link to the commons page
16:11:18 Any questions so far about any of this?
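(A note from me: Lydia mentions above a new Lua convenience function for checking whether an item is a subclass or instance of another item. For readers outside Lua, here is a minimal sketch of the same check in Python against the Wikidata Query Service - P31 "instance of" and P279 "subclass of" are Wikidata's real properties, while the function name and the example items are my own illustration:)

    # Minimal sketch: "is item X an instance or subclass of class Y?"
    # via a SPARQL ASK query against the Wikidata Query Service.
    import requests

    WDQS = "https://query.wikidata.org/sparql"

    def is_instance_or_subclass_of(item_id, class_id):
        # wdt:P31 = instance of, wdt:P279 = subclass of
        query = """
        ASK {
          wd:%s (wdt:P31/wdt:P279*)|(wdt:P279+) wd:%s .
        }
        """ % (item_id, class_id)
        r = requests.get(WDQS, params={"query": query, "format": "json"})
        r.raise_for_status()
        return r.json()["boolean"]

    # Q937 (Albert Einstein) an instance of Q5 (human)?
    print(is_instance_or_subclass_of("Q937", "Q5"))  # True

(The ASK form returns a single boolean, so it stays light on the endpoint - in keeping with the "don't be too hard on the APIs" reminder later in the log.)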
16:11:52 Sweet then I'll jump to the next part: what's next
16:12:20 We'll continue to polish/build out/improve the support for lexicographical data
16:12:21 Excellent.
16:12:59 And we'll spend a bit more time on the constraints and then investigate how to best integrate shape expressions into Wikidata as another more powerful tool to help with data maintenance
16:13:35 nice!
16:13:43 And we'll spend time on showing labels, descriptions and aliases in all the languages on mobile (right now you only see your own language)
16:13:47 great!
16:14:12 I'll go more into the lexicographical data part later.
16:14:34 Any questions about those? Or should we hand it over to Auregann_WMDE?
16:15:53 Alright, so appart from development, a lot of cool stuff happened during the past 5 months
16:15:57 Is there any plan on having a proper way of doing lists from Wikidata in Wikipedia, Wikisource... ?
16:16:07 with something like "simple queries"
16:16:08 Since February, we got 5 new admins: Kostas20142, Putnik, Okkn, Pintoch, Addshore. Welcome or welcome back!
16:16:32 Tpt[m]: yes but not before the things I listed unless someone pushes for it
16:16:55 Let me see if there is a ticket to collect the ideas/plan
16:17:01 Items now contain an average of 9 statements https://grafana.wikimedia.org/dashboard/db/wikidata-datamodel-statements?refresh=30m&panelId=4&fullscreen&orgId=1&from=now-2y&to=now
16:17:23 Plenty of conferences, Wikidata workshops and events happened! Thank you all for making the Wikidataverse so active :) Top day was May 5th with 3 Wikidata workshops organized in different countries :p
16:17:34 Wikidata:Tools has been reorganized and updated, thanks to Pasleim! Feel free to help keeping this page up to date https://www.wikidata.org/wiki/Wikidata:Tools
16:17:44 The RFC about Privacy and Living People policy has been successfully closed https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Privacy_and_Living_People
16:17:47 Tpt[m]: https://phabricator.wikimedia.org/T67626 (though nothing too useful -.- I should spend some time expanding this)
16:18:04 As usual, a lot of new tools were created, updated or discovered:
16:18:14 EditGroups https://tools.wmflabs.org/editgroups/ is a new tool that lets you review, discuss and revert entire edit groups made by various tools. Try it and let feedback to Pintoch
16:18:22 The property explorer sorts and displays properties per category https://tools.wmflabs.org/prop-explorer/
16:18:28 ok! thanks!
16:18:31 A new version of Denelezh, a tool to monitor the gender gap in Wikidata, has been released, including a new methodology to produce the data, and an overview of the gender gap by Wikimedia project https://denelezh.dicare.org/gender-gap.php
16:18:37 You can try the new Drag&Drop gadget developed by Yarl and give feedback https://www.wikidata.org/wiki/Wikidata:Project_chat#Drag'n'drop_gadget_rewrite_%E2%80%93_feedback_welcomed
16:18:47 OpenRefine 3.0 beta was released. You can get an overview of the new Wikidata-related features with tutorials and videos https://www.wikidata.org/wiki/Wikidata:Tools/OpenRefine/Editing
16:19:11 Relator is providing the family tree of a person https://tools.wmflabs.org/wikidata-todo/relator
16:19:36 I want to give OpenRefine 3.0 a try soon. I saw a demo that looked impressive - it now allows to add data to Wikidata!
16:19:54 spinster: you should, it's awesome :)
16:19:59 We also selected a few articles that are worth a look
16:20:02 Yeah and even cooler: it gives you reports of potential issues before import!
\o/
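(Another note from me: Tpt[m]'s question above about "a proper way of doing lists from Wikidata" is, in the meantime, roughly what the SPARQL query service already gives you outside the wikis. A small sketch in Python - the example class and country items are just my illustration:)

    # Sketch of the kind of "simple query" list-making asked about above,
    # fetched from the Wikidata Query Service rather than from Lua.
    import requests

    query = """
    SELECT ?item ?itemLabel WHERE {
      ?item wdt:P31 wd:Q3918 .   # instance of: university
      ?item wdt:P17 wd:Q142 .    # country: France
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    LIMIT 10
    """
    r = requests.get("https://query.wikidata.org/sparql",
                     params={"query": query, "format": "json"})
    for row in r.json()["results"]["bindings"]:
        print(row["itemLabel"]["value"])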
16:20:31 Making women more visible online https://blog.wikimedia.org/2018/03/29/increasing-visibility-women-with-wikidata/
16:20:38 The work of Goran Milovanovic on the usage of Wikidata accross the Wikimedia projects https://blog.wikimedia.org/2018/01/29/from-the-life-of-wikidata/ + https://www.wikidata.org/wiki/Wikidata:Wikidata_Concepts_Monitor/WDCM_Journal
16:20:46 Discovering Types for Entity Disambiguation on OpenAI https://blog.openai.com/discovering-types-for-entity-disambiguation/
16:20:51 Some ways Wikidata can improve search and discovery http://blogs.bodleian.ox.ac.uk/digital/2018/02/14/some-ways-wikidata-can-improve-search-and-discovery/
16:20:57 Using Wikidata to build an authority list of Holocaust-era ghettos https://blog.ehri-project.eu/2018/02/12/using-wikidata/
16:21:04 Martin Poulter gave a TEDxBathUniversity talk about Wikidata https://www.youtube.com/watch?v=Wj8na1GFXMs
16:21:18 There have been also some scientific papers related to Wikidata
16:21:26 Practical Linked Data Access via SPARQL: The Case of Wikidata https://iccl.inf.tu-dresden.de/w/images/8/85/Wikidata-SPARQL-queries-Bielefeldt-Gonsior-Kroetzsch-LDOW-2018.pdf
16:21:32 Towards a Question Answering System over the Semantic Web https://arxiv.org/abs/1803.00832
16:21:38 Automatically Generating Wikipedia Info-boxes from Wikidata http://aidanhogan.com/docs/infobox-wikidata.pdf
16:21:44 Mind the (Language) Gap: Generation of Multilingual Wikipedia Summaries from Wikidata for ArticlePlaceholders https://2018.eswc-conferences.org/wp-content/uploads/2018/02/ESWC2018_paper_131.pdf
16:21:54 Yes I know, that's a lot to read ^^
16:22:18 Any further question before we focus on Lexemes?
16:22:33 there was one article I really liked about using Wikidata for authority control or something similar, a few weeks ago I think…
16:22:38 was that one of the ones you mentioned?
16:22:45 I can’t find the link right now unfortunately
16:23:06 No, thank you for the tools. I will test them.
16:24:08 Lucas_WMDE: I'll have a look
16:24:38 Alright, then...
16:24:48 ...we have lexicographical data on Wikidata \o/
16:25:00 Finally! :D
16:25:17 \o/
16:25:17 Oh yeah!
16:25:18 to read about the details, and the current status of the first release, I encourage you to have a look at the announcement https://www.wikidata.org/wiki/Wikidata_talk:Lexicographical_data#First_experiment_of_lexicographical_data_is_out
16:25:19 congrats
16:25:32 and all the discussions are happening on https://www.wikidata.org/wiki/Wikidata_talk:Lexicographical_data
16:25:48 You've been many to discuss on this page, everyone is very constructive, I love that :)
16:26:44 Lucas_WMDE http://swib.org/swib13/slides/steinmetz_swib13_106.pdf
16:26:54 Just as an idea: during the first 3 days after the release, 1111 Lexemes created and improved in 49 languages by 119 people!
16:27:34 A lot of people are playing with the data, discussing about the best way to organize it :)
16:27:58 And of course, people have been starting building tools on the top of it, mostly to help with the features that are not there yet (search, queries)
16:28:42 let's mention Ordia by Finn Nielsen, providing some search on the first lexemes https://tools.wmflabs.org/ordia/search?q=hus
16:29:07 Lucas also wrote a hack to make nice graphs appear :) https://lucaswerkmeister.github.io/wikidata-lexeme-graph-builder/?subjects=L88%2CL129&predicate=P5191
16:29:24 and I see some python scripts running here and there ;)
16:29:32 I really should have used some less stupid example lexemes for the demo link :D
16:29:59 :D
16:30:02 reminder: don't be too hard on the APIs right now, we're going to improve it in the future so it supports heavy queries :)
16:30:27 alright people, I need to leave now, I have to go to the dentist :o
16:30:39 cu Auregann_WMDE :)
16:30:39 have a nice evening and see you soon onwiki :)
16:30:57 That brings us to the what's next for lexicographical data on Wikidata
16:31:03 Good Bye
16:31:18 Obviously there are a lot of things missing still or not polished.
16:31:43 This includes things like showing the Lemma in recent changes/watchlist/AllPages etc
16:31:57 Some messages that are not really understandable for people
16:32:20 Fixing all these smaller and bigger things is one thing I want to concentrate on
16:32:49 Then we have Search, which is sorely missed. Stas is working on that at the moment.
16:33:22 Then we have querying. Tpt[m] was amazing and wrote a draft for the RDF mapping we need to support that. https://www.wikidata.org/wiki/Wikidata:Project_chat#Draft_for_the_RDF_mapping_of_Wikibase_Lexeme
16:33:47 If you have input on that please give it really soon so it can still be taken into account.
16:34:12 And then there is of course support for Senses which is needed to complete the base.
16:34:42 I'd love to hear from you what would be most important to you so we can make sure we prioritize right.
16:35:20 As a change of an Arabic diacritic can change the meaning of the word, we have to add diacritics to lexemes. The matter is that there is no database involving diacritized lexemes.
16:35:21 Also especially all the little annoying things that are not right yet: it would be super helpful to know about them.
16:36:06 Csisc: can you clarify? There is not existing other dictionary that does that? Or Wikidata doesn't do it? Or?
16:36:26 (Sorry for my ignorance as a non-speaker of Arabic)
16:38:25 Oh and if you're so inclined: check out all the ideas people aready wrote down for querying: https://www.wikidata.org/wiki/Wikidata:Lexicographical_data/Ideas_of_queries
16:38:41 and for tools to build on top of that data: https://www.wikidata.org/wiki/Wikidata:Lexicographical_data/Ideas_of_tools
16:38:52 Please add yours if you have additions
16:40:49 Thanks: Auregann_WMDE !
16:40:58 oooh, office hour
16:41:13 Arabs tend not to put diacritics of words when writing in Arabic. Arabic diacritics are the equivalent of vowels. https://en.m.wikipedia.org/wiki/Arabic_diacritics . A change of the quality of an Arabic diacritic can change a lexeme into another one.
16:41:54 Ok
16:42:12 So we'd cover them in different Lexemes in Wikidata I guess?
16:43:09 Or is that not a good idea for some reason?
16:43:14 Yes, of course. That is why we have to diacritize the lexemes we have before adding them to Wikidata
16:43:26 Ok. Makes sense.
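(For the curious, from me: even while lexeme search and SPARQL support are still pending, Lexemes already come back from the same wbgetentities action API as Items. A minimal sketch, using one of the lexeme IDs from Lucas's demo link above - I make no claim about what L88 actually is:)

    # Fetch a Lexeme entity via the Wikidata action API and print
    # its lemma(s), its language item, and its lexical-category item.
    import requests

    r = requests.get("https://www.wikidata.org/w/api.php",
                     params={"action": "wbgetentities", "ids": "L88",
                             "format": "json"})
    lexeme = r.json()["entities"]["L88"]
    for lang, lemma in lexeme["lemmas"].items():
        print(lang, lemma["value"])
    print("language item:", lexeme["language"])
    print("lexical category item:", lexeme["lexicalCategory"])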
16:43:43 Is there anything we should change/add in the software?
16:44:45 Add a Lua function to get lexemes lemma to be able to easily link them from wikitext
16:45:29 Tpt[m]: hah! Yes. I'll check later if we already have a ticket for that but I think not.
16:45:35 Will make one then.
16:46:01 Tpt[m]: Would you prioritize that over any of the things I mentioned above?
16:46:24 For example, I have added some lexemes as labels to Wikidata entities before the creation of Wikidata's Lexicographical Data. I ask if we can extract these labels and integrate them to the Lexicographical Data.
16:46:47 "what would be most important to you" → I love the order you've already followed when explaining the Lexicographical stuff :)
16:46:56 \o/
16:47:31 Csisc: hmmm good question. Is there an easy way we can find the ones that should be Lexemes? We don't want to create them en mass for people for example right?
16:48:31 Lydia: No, I believe that UI, search and SPARQL queries should go first
16:48:34 but we should not wait months
16:48:42 Yes, we can use statements like P31/P279 to check what are the labels to be added to the Lexicographical Data
16:48:49 Tpt[m]: heh alright. good to know
16:50:10 Csisc: ok. I guess it's a good idea to wait with that until we have search or queries to avoid creating a ton of duplicates
16:50:20 but not up to me at the end of course
16:50:38 Lydia: This kind of features could be easily added by volunteers if there is a consensus on a good function name
16:50:44 My suggestion though is to wait with masscreation until we at least have recent changes integration and search improved
16:51:15 Tpt[m]: sounds good! I don't have a good name idea right now but happy to brainstorm
16:51:32 great!
16:51:53 Tpt[m]: I'll create the ticket and we can collect suggestions there
16:52:14 thanks
16:52:41 About the masscreation... can we estimate how many lexemes Wikidata will have in the future?
16:52:58 abian: uhhhh good question!
16:53:01 any guesses?
16:53:37 If all proper names are accepted, as they're now, this could explode :)
16:53:51 depending on how enthusiastic the community is, I think they could easily overtake items in the future
16:53:55 Yeah not so sure if that's really useful but maybe
16:53:57 even without names
16:54:05 Lucas_WMDE: agreed
16:54:18 Hi Lydia_WMDE - in Wikidata's unfolding relationship with Google (eg Wikipedia/Wikidata is used by Google a lot) - and now potentially re lexicographical data, could you say a little about how you think something like GNMT / Google Translate will use lexicographical data in Wikidata / Wikipedia's 301 languages please - and re CC licensing too?
16:54:38 to throw some random numbers into the room – 100M lexemes before the end of 2020?
16:55:06 It would be amazing!
16:55:07 I agree. We should wait for the tools so that we will not have duplicate entries
16:55:26 Scott_WUaS: I don't know :D Anyone can use it for anything. That's why we do this, right? I hope that we will see a lot of new tools being developed by organisations that support small languages.
16:56:04 Thanks :)
16:56:11 Lucas_WMDE: :panic emoji:
16:56:12 :D
16:56:26 my impression is that Lexemes are much better than item for names (that are definitely closers to lexical element than to usual concepts like people or places...)
16:56:27 Bots will start to create lexemes soon with no statements, I guess :)
16:56:35 Or with a few
16:56:52 Yeah
16:57:16 algorithmic heaven :)
16:57:17 So the number will skyrocket
16:58:21 We can start by making guesses for how things will look like after 1 month and then see how far off we are :D
17:00:33 Just a question. I ask when we can add senses to the Wikidata's Lexicographical Data.
17:00:58 Thank you!
17:01:09 Csisc: We'll start the development next week. My best guess is 3 months at this point but that is a rough guess.
17:01:12 Mainly Q-Embedded senses.
17:03:06 Alright. Any remaining questions? Wishes? Thoughts?
17:04:35 If not then I think we can wrap it up and I'll go file a ticket for the Lua function ;-)
17:04:39 CC-0 licensing question: Is Wikicitation - and possibly re lexicographical data for translation -
17:04:53 Is WikiCite CC-0 licensed?
17:05:15 WikiCite is a project. The data they add to Wikidata is CC-0.
17:05:15 Thank you for the office hour!
17:05:26 Thank you, Lydia!
17:05:27 Thank you so much everyone for coming!
17:05:31 Thank you.
17:05:55 I'm still taking your best guess for how many lexemes we will have after 1 month by email ;-)
17:06:05 <3
== [~administr@wikimedia/heatherawalls] has quit [Ping timeout: 260 seconds]
Meeting ended Tue May 29 17:06:37 2018 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
== [Lucas_WMDE@nat/wmf/x-anqjaudrkobiuivf] has left ["Good Bye"]
Tpt[m]: https://phabricator.wikimedia.org/T195895 - let me know if that's totally not what you had in mind
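(Reflecting on Csisc's P31/P279 idea in the log above - finding existing item labels that might become Lexemes - a first pass at listing candidates could look roughly like this sketch. Q34698 ("adjective") is just my example class, and per Lydia's caution in the log this only lists candidates; any actual mass creation should wait for search and recent-changes integration:)

    # Sketch: use P31/P279 statements to list item labels that might
    # later become Lexemes. It creates nothing - listing only.
    import requests

    query = """
    SELECT ?item ?label WHERE {
      ?item wdt:P31/wdt:P279* wd:Q34698 .   # example class: adjective
      ?item rdfs:label ?label .
      FILTER(LANG(?label) = "ar")           # e.g. Arabic labels
    }
    LIMIT 20
    """
    r = requests.get("https://query.wikidata.org/sparql",
                     params={"query": query, "format": "json"})
    for row in r.json()["results"]["bindings"]:
        print(row["item"]["value"], "-", row["label"]["value"])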
Lexicographical developments in Wikidata https://t.co/jAD7uVK3EJ & related licensing information: Scott: "Lydia_WMDE -in Wikidata's unfolding relationship w Google where Wikipedia/Wikidata is used a lot-301 languages "WikiCite is a project. The data they add to Wikidata is CC-0" https://t.co/tvUOTQiyAn— Languages-World Univ (@sgkmacleod) May 30, 2018
Wikidata https://t.co/xOfpYM67VC devs & related licensing information: Scott: "Lydia_WMDE -in Wikidata's unfolding relationship w Google where Wikipedia/Wikidata is used a lot-301 languages "WikiCite is a project. The data they add to Wikidata is CC-0" https://t.co/Kk7tShaSIZ ~— WorldUnivandSch (@WorldUnivAndSch) May 30, 2018
I think Wikidata's WikiCite project, with books' data in 301 languages, will become the basis of WUaS Bookstores - e.g. of which these are the beginning ... - in each of all ~200 countries' official / main languages.
And I think that this new Wikidata lexicographical data will become the basis of the Academic Press at World University and School, planned with machine translation - http://worlduniversityandschool.org/AcademicPress.html (https://twitter.com/WUaSPress) - in all 7097 living languages, emerging from something like Google Translate / Google Neural Machine Translation ...
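(What that machine-translation step might look like in practice, very roughly - a sketch only, assuming Google's Cloud Translation Python client (google-cloud-translate) is installed and API credentials are configured; GNMT itself is not something you call directly as a library:)

    # Rough sketch of a first Academic Press translation step,
    # assuming the google-cloud-translate client and credentials.
    from google.cloud import translate_v2 as translate

    client = translate.Client()

    def translate_abstract(text, target_languages):
        # Return {language code: machine translation} for one abstract.
        return {lang: client.translate(text, target_language=lang)["translatedText"]
                for lang in target_languages}

    print(translate_abstract("World University and School", ["es", "zh", "ar"]))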