Engineered AI Still Matters for Question Answering

https://youtu.be/BvMZlC9LBC8

Slides:
http://cognitive-science.info/wp-content/uploads/2018/09/CSIG.BillMurdock.ACS_.2018-09-13.pdf

*

13 September 2018

J. William Murdock

“Engineered AI Still Matters for Question Answering”

IBM

http://cognitive-science.info/community/weekly-update/

*
See, too:

I liked a @YouTube video https://t.co/2UtZiXfcvC Engineered AI Still Matters for Question Answering
— WorldUnivandSch (@WorldUnivAndSch) September 15, 2018

https://twitter.com/WorldUnivAndSch/status/1041010695192686592

*

Friday, September 14, 2018

Hi Bill (Murdock), and Jim (Spohrer),

Thanks for sharing your very interesting thoughts re CC licensing and IBM, Bill, and for your edifying ISSIP CSIG IBM talk yesterday. I'd like to bring Jim (a friend and colleague - https://twitter.com/JimSpohrer/status/1040597121307697154 - and MIT alumnus too) in on this conversation as well. The commercial / non-commercial distinction in Creative Commons' licensing (and re copyright), for both for-profit companies and non-profit MIT-centric universities, is a fascinating unfolding legal development as A.I. and machine learning spread into so many areas of our lives (and create new legal & IP questions), potentially in all ~200 countries (per WUaS), where IBM is in something like 165 of them. (World University and School, of which I'm the founder and head, is like Wikipedia in 301 languages with CC-4 MIT OCW in its 5 languages, and seeks to offer online degrees in each of all ~200 countries' official languages, and wiki schools in all 7,097 living languages.) Legally, per MIT OCW Executive Director Cecilia d'Oliveira, WUaS can A) share and B) adapt the CC-4 MIT OpenCourseWare, but C) only non-commercially, and WUaS has to state these clarifications where we do so - http://worlduniversityandschool.org.

I'd also like to raise some hypothetical cases to come into conversation with your thinking re World University and School. 1) For WUaS, CC-4 MIT OCW, in its five languages, is a gold mine or motherlode, and WUaS seeks to offer it for free-to-students' online Bachelor, Ph.D., Law, M.D. and I.B. degrees in each of all ~200 countries' official languages, while seeking reimbursement from ministries of education in all ~200 countries per student per year. Hypothetically, if IBM has a history of sending its young professionals to MIT for a Ph.D. or to Harvard Law for a law degree, or similar, then the for-profit IBM would have a history of reimbursing these schools for non-profit MIT-centric education.

In terms of your talk, and hypothetically again, 2) in conceiving of developing avatar bots via Q&A machine learning, which/who could become avatar professors teaching CC-4 MIT OCW, I could see the benefits of coders, such as yourself, drawing data from this CC-4 MIT OCW (per your email below). In creating such avatar bot professors, I could also see the benefits of coders such as yourself drawing knowledge & data from a variety of books, such as might be contained in Wikipedia's / Wikimedia's CC-0 Wikicite as references (so this CC-0 commercial licensing recognizes book references as products, but only refers to citations presently), and potentially, eventually, coding with machine learning the text itself from such books - and in all ~300 languages in Wikipedia. If IBM could create hypothetical avatar bot professors (and MD surgeons too) in the future - and as kinds of products - would these avatar professor surgeons have benefitted from not having been coded with CC-4 MIT OCW - re IBM? I appreciate Jim's comments in your talk on the value of openness, and think both kinds of CC licensing we're emailing about facilitate such openness in different ways. I also think WUaS's creation of such avatar bot professors will seek to remain CC-4 open in a few ways.

Hypothetical case 3) (per a presentation yesterday after Stanford Law CodeX I attended):
A recent physics graduate from Stanford and an Argentinian lawyer living in Munich, Germany talked about developing Tutela machine learning legal AI software with a focus on Colombian human rights law cases, which make up something like 25% of Colombian law cases - and in Spanish. I wonder (from a hypothetical IBM perspective) how such legal documents in Colombia in Spanish should be licensed as datasets for machine learning and natural language processing, and for legal tech companies to develop products from? I'd think either CC-4 or CC-0, but I'll let the Colombian alumni of Stanford Law and Harvard Law (mentioned at this talk), for example, and their teams make this decision. (Also, WUaS seeks to hire, for example, graduate students and law students who are learning to become faculty, to teach Colombian law online as it develops online in Spanish, and, as one example, the licensing of such a Colombian legal corpus for use as datasets could be significant given what you write, since IBM may be in Colombia, for ex.) In the interest of serving the people of Colombia (re questions of justice as well), WUaS would in all likelihood see the benefits of licensing such legal documents as CC-4 or similar.

So, here's a coding challenge conceptually in these regards: how best to plan for, and code for, WUaS's wiki subject pages with their focus on offering CC-4 MIT OCW for free-to-students' degrees - eg https://wiki.worlduniversityandschool.org/wiki/Subjects at "front end" - newly with this SUBJECT_TEMPLATE - https://wiki.worlduniversityandschool.org/wiki/SUBJECT_TEMPLATE - with Wikidata/Wikibase as "back end" ... and in coding conceptually Q&A professor avatar bots drawing from CC-0 Wikicite (in its 300 languages) and CC-4 MIT OCW (in its 5 languages), as well as drawing on Wikimedia's Lexeme project with parts of words, which might much later interoperate with Google Translate, I wonder? (See some related info and contacts at Wikimedia below too - and how it is that WUaS is working with Wikidata - given the value of their ~300 languages).

Non-profit 501 c 3 World University and School is also seeking to collaborate with companies such as IBM, for example, in some of these questions, (and WUaS also has a parallel for-profit general stock company, both planned in 200 countries and in 7097 living languages around these 14 planned revenue streams - https://worlduniversityandschool.blogspot.com/2016/01/14-planned-wuas-revenue-streams.html - but where we haven't hired yet on either wing). IBM could conceivably benefit greatly from engaging MIT OCW in some of these regards as well.

Thanks again for your edifying, timely and topical talk yesterday, and it would be great to explore some of these questions further with time. Thank you, Bill.

Cheers, Scott

And thanks for this too, Dario - https://twitter.com/ReaderMeter/status/1037349669335126016 - re that "all Wikicite data is CC0 licensed," which, when I met you, Dario, in January 2017, and with Lydia, in the session at the WikiDev SF conference, I think you both had said was likely to become the case - ie that Wikicite would become CC-0 licensed, so commercial, and by way of comparison with CC-4 licensed MIT OCW in its 5 languages, which CC-4 attributes allow for 1) sharing 2) adapting but 3) non-commercially. And as you probably all know, World University and School is CC-4 MIT OCW-centric on our non-profit side, but for our WUaS planned bookstores (eventually in all ~200 countries' official / main languages, and indeed in all 7097 living languages), we are seeking to develop with CC-0 licensing - and hence in all ~300 of Wikidata's languages. (And WUaS donated itself to Wikidata in 2015 for co-development, and got our new "front end" Miraheze MediaWiki in 2017 - https://wiki.worlduniversityandschool.org/wiki/Subjects - probably as a consequence of our WUaS donation to Wikidata.)

Here by the way is the WUaS SUBJECT TEMPLATE as concept which WUaS seeks to develop with Wikidata/Wikibase ...
... https://wiki.worlduniversityandschool.org/wiki/SUBJECT_TEMPLATE ....

*

Hi Bill,

Thanks for your helpful talk this morning. Here's my question again: In your / IBM’s use of Wikipedia data, what role does Wikipedia’s Creative Commons’ licensing play here if any? For example, Wikimedia / Wikipedia engages both CC-0 - so commercial - licensing in Wikicite for bibliographic data, while Wikimedia / Wikipedia also engages other non-commercial CC licensing. (I ask with CC-4 MIT OCW-centric wiki World University and School in many languages in mind - where CC-4 licensing allows for A) sharing B) adapting but C) non-commercially).

And here's an idea at the top that emerged partly in conversation with your talk - https://scott-macleod.blogspot.com/2018/09/dodders-chat-bots-that-use-or-avatar.html.

Some Stanford Law CodeX presentations raised some further related questions.

Might you be heading in the direction of talking avatar bots, and even in a realistic virtual earth with realistic avatar bots - eg https://twitter.com/hashtag/RealisticVirtualEarth?src=hash and even in online medical schools with tele-robotic surgery - https://twitter.com/WorldUnivAndSch/status/1038822141880233984 ?

Thanks again for your talk, and nice to be in communication.

Regards, Scott

Tweeted about this blog post as well here -

https://twitter.com/scottmacleod/status/1040400869307539456

& here -

https://twitter.com/sgkmacleod/status/1040401851366072320

Scott MacLeod - https://twitter.com/scottmacleod

World Univ and Sch Twitter - http://twitter.com/WorldUnivandSch

Languages - World Univ - http://twitter.com/sgkmacleod

WUaS Press - https://twitter.com/WUaSPress

“Naked Harbin Ethnography” book (in Academic Press at WUaS) - http://twitter.com/HarbinBook

(OpenBand (Berkeley) - https://twitter.com/TheOpenBand )

*

Hi. My personal feeling is that IBM should not use any content that
is licensed only for "non-commercial" purposes. I think some people
see those license terms and believe that data like that can still be
used for internal pure research as long as it is not connected to a
specific product. However, since IBM is a business, I would be
inclined to assume that _anything_ IBM does is commercial; even pure
research is expected to eventually indirectly benefit shareholders,
which some people could argue makes it commercial. I've taken a quick
look at the CC BY-NC 4.0 (non-commercial 4.0), and my personal read of
it is that employees of any business would not even be legally allowed
to look at that material if there was a possibility that they could
learn anything from it that might be relevant to their work. With
that said, however, I see there is also a CC BY 4.0 (regular 4.0)
creative commons license that does not seem to have this restriction,
and it looks like it would be fine for use with IBM Watson. Since I
am not a lawyer, if we were considering ingesting one of these sources
into an IBM Watson product, we would be sure to get a more qualified
legal opinion from IBM's IP lawyers first. (But absent instructions
to the contrary, I would definitely avoid looking at anything labeled
"CC BY-NC 4.0" for any reason).

I will look at the other content you sent me, ASAP. Thanks!

On Thu, Sep 13, 2018 at 9:08 PM Scott MacLeod

*

Dodders: Chat bots that use - Or 'AVATAR BOTs that 'speak' from Wikipedia'? eg with Google VOICE? In connecting WUaS's "front end" eg with Wikidata/Wikibase as "backend" via WUaS SUBJECT TEMPLATE how best to plan for Q & A re books' content?, Heard a great / edifying IBM cognitive science open talk this morning with many implications for voice and chat bots, and asked the question below about licensing of data given all the interoperating systems, Ethical Artificial Intelligence?, Do you engage questions of licensing (I'm particularly interested in Creative Commons' licensing/law, both CC-4 re MIT OCW in 5 languages, as well as CC-0 re Wikicite for bibliographic data in ~301 languages) at all, or are such distinctions on the horizon in your work with various datasets

http://scott-macleod.blogspot.com/2018/09/dodders-chat-bots-that-use-or-avatar.html

*

Hi Bill, (and Jim and Larry),

I blogged a bit more about your talk and our email conversation here today - http://scott-macleod.blogspot.com/2018/09/sarcodes-engineered-ai-still-matters.html - as well as the other day - http://scott-macleod.blogspot.com/2018/09/dodders-chat-bots-that-use-or-avatar.html. FYI, my blog link from today has information about Wikicite, including the ReaderMeter Twitter of its head, Dario Taraborelli, as well as a video describing Wikicite - https://twitter.com/ReaderMeter/status/1040735449105367040 and https://twitter.com/WorldUnivAndSch/status/1041016001117253632. It turns out all of the data in Wikidata itself, and not just its sister project Wikicite, is CC-0 licensed - so commercial - so I think MIT won't be adding CC-4 MIT OCW to Wikidata / Wikibase. My blog link from today also has a link to Miraheze MediaWiki's Twitter - https://twitter.com/miraheze/status/1040848571233431552 - and World University and School is in WUaS Miraheze MediaWiki.

Regards, Scott

* *

Miraheze now has a new blog for announcing technical developements and other Miraheze-related news! Check out https://t.co/N1McrlNaHn
— Miraheze (@miraheze) September 15, 2018

https://twitter.com/miraheze/status/1040848571233431552

* *

A video of my #OpenSciRoadmap flashtalk introducing @Wikidata and @WikiCite is now available (thanks for producing these great little videos, y'all!) https://t.co/zEQGEkfwnn
— Dario Taraborelli (@ReaderMeter) September 14, 2018

https://twitter.com/ReaderMeter/status/1040735449105367040

*

I liked a @YouTube video https://t.co/1Bes3k7Hkx JROST Flashtalk: Wikidata & WikiCite
— WorldUnivandSch (@WorldUnivAndSch) September 15, 2018

https://twitter.com/WorldUnivAndSch/status/1041016001117253632

* * *
Am glad to have set up this daily newspaper for World University and School many months ago, which also contains the licensing clarifications that MIT asked WUaS to post when WUaS adds an MIT OCW link, or uses their name. Am unclear who picks the pictures for these newspaper Tweets (it isn't me), and am happy to riff on the idea that WUaS may be a unicorn company here ...

The latest The MIT OpenCourseWare Daily! https://t.co/QcoTLpDPzG Thanks to @coinconference @SaraDuane @familyonabike #learning #elearning
— WorldUnivandSch (@WorldUnivAndSch) September 15, 2018

https://twitter.com/WorldUnivAndSch/status/1040775652834783232

* *

Hi M,

How was your day, and how are you?

Curious about ways in which Karen Brennan at Harvard Graduate School of Education could help facilitate my getting a position in the Stanford GSE, for example. The Stanford GSE doesn't seem to be teaching teachers how to teach either programming or (Lego) robotics with programming, both of which I could help facilitate via ScratchEd Meetups and more.

Also World University and School and the WUaS Corporation could both be what Silicon Valley calls a unicorn company ... a billion or trillion dollar company ... which Stanford might embrace developing and benefit from too. Perhaps my email to Stanford GSE / Harvard GSE will simmer in a great way over the weekend while Karen and Janet et al communicate ...

L,
Scott