Friday, May 27, 2016

Semien Mountains, Ethiopia: Wikidata supports 358 languages (paper by Denny Vrandečić and Markus Krötzsch). The possible number of Wikidata languages (and inter-lingual developments) will emerge as these two projects develop: RFC: Per-language URLs for multilingual wiki pages (https://phabricator.wikimedia.org/T114662) and RFC: make Parser::getTargetLanguage aware of multilingual wikis (https://phabricator.wikimedia.org/T114640). How easy will it be to simply "populate" a list of all 8K languages in Wikidata/Wikibase for various artificial intelligence, machine learning and machine translation developments? Another aspect of the developing Wikidata/Wikibase language technologies, with important Phabricator links to RFCs (Requests for Comment), comes from Ryan Kaldari, who helped install WUaS in MediaWiki in English on Wed., January 6, 2016 at the Wikimedia Foundation in SF during the Wikimedia Developers Conference. His thread mentions the important Wiktionary ... Phabricator is helpful organizational software for developers, used by Wikitech folks in both of these instances above ...

Hi all,
Looking into [1], I read that Wikidata supports 358 languages. Is that still true? For example, I tried to add a label in the language coded as "nan" (defined in ISO 639-3) and it worked. However, it didn't work for, e.g., "arb", which is also part of the ISO 639-3 standard. So how many languages are supported?
Thanks
 Jan

[1] VRANDEČIĆ, Denny; KRÖTZSCH, Markus. Wikidata: A Free Collaborative Knowledgebase. Communications of the ACM. 2014-10, Vol. 57, No. 10, pp. 78-85. DOI 10.1145/2629489. http://cacm.acm.org/magazines/2014/10/178785-wikidata/fulltext

_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata

*
Hi Jan and Wikidatans, 

That sounds about right. 

I think the possible number of Wikidata languages (and inter-lingual developments) will emerge as these two projects develop a little further:

RFC: Per-language URLs for multilingual wiki pages - https://phabricator.wikimedia.org/T114662

RFC: make Parser::getTargetLanguage aware of multilingual wikis - https://phabricator.wikimedia.org/T114640

which I just learned about yesterday in the #wikimedia-office hour (Wednesdays at 2pm PT).

CC World University and School, which is like CC Wikipedia with CC MIT OCW in 7 languages and CC Yale OYC (and which is planning free, online, accredited university degrees built on the best STEM CC OCW, in ~204 countries' main and official languages), is also seeking to develop wiki schools for open teaching and learning in all 7,943+ languages, building in CC Wikidata/Wikibase if possible, as well as a universal translator. How easy will it be to simply "populate" a list of all ~8K languages in Wikidata/Wikibase for various artificial intelligence, machine learning and machine translation developments?

Best, 
Scott



*
https://commons.wikimedia.org/wiki/Template:LangSwitch

*
Some recent Wikimedia office hours with related language developments ...

May 18

[15:03] <TimStarling> #endmeeting
[15:03] == wm-labs-meetbot changed the topic of #wikimedia-office to: Wikimedia meeting channel | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/
[15:03] <wm-labs-meetbot> Meeting ended Wed May 18 22:03:45 2016 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
[15:04] <Scott_WUaS> :)


*
May 25

[14:36] <cscott> so in the unconference spirit the session got redirected to match what people actually wanted
[14:36] <Scott_WUaS> (gwicke: can you please share some URLs that currently focus "multi-lingual content should interact with caching & parsing"?)
[14:36] <gwicke> it's not specifically targeted at i18n currently
[14:37] <cscott> Scott_WUaS: that's daniel's RFCs
[14:37] <cscott> Scott_WUaS: T114662 and T114640
[14:37] <stashbot> T114662: RFC: Per-language URLs for multilingual wiki pages - https://phabricator.wikimedia.org/T114662
[14:37] <stashbot> T114640: RFC: make Parser::getTargetLanguage aware of multilingual wikis - https://phabricator.wikimedia.org/T114640
[14:37] <cscott> gwicke has commented on both of those, i believe, bringing up the caching issue
[14:37] <Scott_WUaS> Thnx:)




[14:59] <robla> #endmeeting
[14:59] == wm-labs-meetbot changed the topic of #wikimedia-office to: Wikimedia meeting channel | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/
[14:59] <wm-labs-meetbot> Meeting ended Wed May 25 21:59:38 2016 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
[14:59] == heatherw [~administr@wikimedia/heatherawalls] has quit [Quit: heatherw]

*
Here's another aspect of the developing Wikidata/Wikibase language technologies, with important Phabricator links to RFCs (Requests for Comment), from an email thread on the Wikitech mailing list begun by Ryan Kaldari (who studied at UC Berkeley). Ryan helped install WUaS in MediaWiki in English on Wed., January 6, 2016 at the Wikimedia Foundation in SF during the Wikimedia Developers Conference. His thread mentions the important Wiktionary (think wiki dictionary) aspect of WMF / Wikidata translation developments ...

[Wikitech-l] Should we switch the default category collation to uca-default?

Ryan Kaldari rkaldari@wikimedia.org

4:36 PM
to Wikimedia
There are currently 94 WMF wikis using UCA category collation rather than the default "uppercase" collation. The Unicode Collation Algorithm (UCA) is the official standard for how to sort Unicode characters, and generally follows how a human would typically alphabetize strings. For example, uppercase collation sorts Aztec, Ärsenik, Zoo, Aardvark as "Aardvark, Aztec, Zoo, Ärsenik", but uca-default collation sorts them as "Aardvark, Ärsenik, Aztec, Zoo". UCA collation also (optionally) supports natural numeric sorting so that 100, 1, 99 sorts as "1, 99, 100" rather than "1, 100, 99". The WMF Community Tech team has recently posted proposals on English Wikipedia and several Wiktionaries asking if these communities would support switching to UCA collation. The proposal on English Wikipedia has received unanimous support so far.[1] We thought that Wiktionaries would be more skeptical of the change, but so far we have received only positive responses.[2]

Since it seems that most wikis are receptive to switching to UCA, maybe we should just make it the default rather than waiting on all the wikis to request it separately. Of the large Wikipedias, French, Dutch, Polish, Portuguese, and Russian are already using UCA, and German is in the process of switching.[3] For non-Latin scripts, my understanding is that UCA will be a big improvement, especially if we switch them to language-specific implementations, like uca-ja, uca-zh, uca-ar, etc.

Three questions:
1. Does switching the default collation from "uppercase" to "uca-default" sound like a good idea?
2. Should this be proposed on meta or is it too technical?
3. Are there any wikis that would need to opt out of this for some reason? (I know there are issues with Kurdish,[4] but that's the only one I know about.)

1. https://en.wikipedia.org/wiki/Wikipedia_talk:Categorization#OK_to_switch_English_Wikipedia.27s_category_collation_to_uca-default.3F
2. https://phabricator.wikimedia.org/T128502
3. https://phabricator.wikimedia.org/T128806
4. https://phabricator.wikimedia.org/T48235
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
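
*
To make the difference between the two collations in Ryan's email a little more concrete, here is a minimal Python sketch that reproduces his two examples. It is an approximation under stated assumptions, not MediaWiki's code: the function names are hypothetical, `uppercase_key` only roughly mimics "uppercase" collation (uppercase, then compare raw code points), and `uca_like_key` is NOT the Unicode Collation Algorithm, just a stand-in that decomposes accented letters and drops the accents before comparing. Real UCA collation is normally provided by a library such as ICU.

```python
import re
import unicodedata


def uppercase_key(title: str) -> str:
    # Hypothetical approximation of "uppercase" collation: uppercase the
    # string and compare by raw code point, so "Ä" (U+00C4) sorts after "Z" (U+005A).
    return title.upper()


def uca_like_key(title: str) -> tuple:
    # NOT real UCA (assumption for illustration): decompose accented letters (NFD),
    # drop combining marks, and casefold, so "Ä" compares like a plain "a".
    # The original string is a tie-breaker so the order stays deterministic.
    decomposed = unicodedata.normalize("NFD", title)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), title)


def natural_key(title: str) -> list:
    # Natural numeric sorting: re.split with a capturing group alternates
    # text, digits, text, ..., so odd positions are digit runs compared as ints.
    parts = re.split(r"(\d+)", title)
    return [int(p) if i % 2 else p.casefold() for i, p in enumerate(parts)]


words = ["Aztec", "Ärsenik", "Zoo", "Aardvark"]
print(sorted(words, key=uppercase_key))  # ['Aardvark', 'Aztec', 'Zoo', 'Ärsenik']
print(sorted(words, key=uca_like_key))   # ['Aardvark', 'Ärsenik', 'Aztec', 'Zoo']

numbers = ["100", "1", "99"]
print(sorted(numbers))                   # ['1', '100', '99']
print(sorted(numbers, key=natural_key))  # ['1', '99', '100']
```

Stripping accents at the first comparison level is what puts "Ärsenik" between "Aardvark" and "Aztec"; language-specific collations such as uca-ja or uca-ar would need real locale-aware rules that this sketch does not attempt.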

*
Phabricator is helpful organizational software for developers, used by Wikitech folks in both of the instances above ...


*




...


