Thursday, October 1, 2015

Strata Keynotes

I am attending my first #StrataHadoop Conference in NYC.

The Wednesday morning keynotes were well done. There were lots of speakers and many sponsors, but each talk was short and about data in general (not just a sales pitch). The keynotes were all streamed and are available for viewing.

Here are a few of the highlights that caught my interest:

First up was Mike Olson from Cloudera. It was the expected "Recent accomplishments, Big announcements, Exciting Future" talk. The growth of Spark and Kafka was not surprising, and I was already aware of the new RecordService announcement. I still have to look up a few of the other partnerships mentioned, like CounterTack Sentinel and the work on healthcare ERM security, as well as new projects such as Ibis and Kudu.
He ended by pointing out that Hadoop is now 10 years old.

The second keynote was my favorite. AnnMarie Thomas (School of Engineering and Schulze School of Entrepreneurship, University of St. Thomas) talked about creative ways to encourage and teach STEM. While she did not point it out specifically, the talk had a very clear message about the benefits of diversity in teams - any team. Her students work with playdough to sculpt circuitry, experience circus training to learn higher math of physics, gain new perspectives by sharing knowledge with preschool children, and compare digital and human observations with cooking. All really cool projects. This presentation gave me some ideas to add to my search for non-programming STEM projects for youth that I wrote about a few months ago.

Next up was an amusing talk by Joseph Sirosh (Microsoft) discussing the How Old Robot. This was followed by Ron Kasabian (Intel) and Michael Draugelis (Penn Medicine) talking about the Trusted Analytics Platform and Penn Signals. I think I dozed off a bit during the Tim Howes (ClearStory Data) talk.

Joy Johnson (AudioCommon) talked about Music Science, followed by a related discussion of data in creative decisions by David Boyle (BBC Worldwide). These were interesting, just not in my primary focus. I just do not have enough brain cells for all the nifty research out there.

I enjoyed the talk by Jim McHugh (Cisco) on Data from the edge. Can I drive the race car next time? I did not realize that, with all the wearables and small device sensors available, the Tour de France still mostly tracked progress with a guy on the back of a motorbike and a chalkboard. Next year there will be GPS devices on all the bikes and in the support vehicles. From the support vehicles, the data gets uploaded "real time" to a helicopter and from there down to a central analytics truck. Teams and, more importantly, the press (and thereby fans) can get more accurate real-time data on the race progress.

DJ Patil (White House Office of Science and Technology Policy) talked about efforts to open data in a machine-readable format. He pointed out that machine-readable format does not mean PDF. He also asked that any training efforts make use of these open data sets. He went on to discuss some specific projects available and wrapped up with a plea to integrate data ethics into all programs and all training - not just as an add-on, afterthought, or separate requirement, but as a normal part of every step of every use of any data set.

Katherine Milkman (Wharton School at the University of Pennsylvania) discussed improving decisions. I liked the examples of temptation bundling and choice architecture, specifically the keyboard stairs in Stockholm. There was also a reference to the book Nudge.

The final presentation was by Jeff Jonas (IBM). He was the founder of another company acquired by IBM and has a background in fraud detection analytics. He discussed how context is important. His evil puzzle experiment is as fascinating as the space-time boxes for asteroid hunting.


