Divided Brain, Map & Terrain – Sounds Like ML to Me

Despite the guest, Dr. Iain McGilchrist, explicitly rejecting the metaphor that the brain is like a computer, I can’t help but think about the process of building and incorporating machine learning models.

Psychiatrist and author Iain McGilchrist talks about his book, The Master and His Emissary, with EconTalk host Russ Roberts. McGilchrist argues we have misunderstood the purpose and effect of the divided brain. The left side is focused, concrete, and confident, while the right side is about integration of ourselves with the complexity of the world around us. McGilchrist uses this distinction to analyze the history of Western civilization. This is a wide-ranging conversation that includes discussions of poetry, philosophy, and economics.

When Luck Is Your Strategy

M.R.D. Foot tells a story of an underground agent forced to transport a B-2 wireless set (radio) through a railway station in which German forces were conducting random checks of luggage and personnel. The radio which the agent was carrying was a distinct size and shape and thus easily recognizable to alert police forces. The underground operative, realizing the precariousness of his situation, initiated a cunning security measure he presumed would reduce his risks.

He reached a big terminus by train, carrying only his B2 in its little case; saw a boy of about twelve struggling with a big one; and said genially (in the local language) “Let’s change loads, shall we?” He took care to go through in front of the boy; there was no trouble. Round the first corner, they changed cases back. The boy said “It’s as well they didn’t stop you; mine’s full of revolvers.”

Foot, MRD, Resistance, (New York, McGraw-Hill Book Company, 1977), as quoted in: Underground Management: An Examination Of World War II Resistance Movements by Christian E. Christenson

MP3 and Entropy

If you’re like me, with a wife nursing in the other room and an infant distracted by any noise or movement, then your mind naturally drifts to the topic of entropy[1]. I saw something on “TV”[2] about music, and began wondering about whether different musical styles had different entropy.

Slashing around with a little Perl code, I hoped to look across my music collection and see what there was to see. Sadly, I don’t have the decoding libraries necessary[3], so instead I just looked at the mp3s themselves. While most genres showed a good range, they were also relatively distinct, and definitely distinct at the extremes.

mp3entropy

While I can’t draw any conclusions about how efficient mp3 compression is by genre, I can safely say that the further compressibility of the mp3 file is affected by the genre. I.e., there is room for genre-specific compression algorithms. Which only makes sense, right? If you know more about the structure, you should be able to make better choices.
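The Perl script itself isn't shown here, so the following is only a Python sketch of one plausible version of the measurement: the Shannon entropy of the raw bytes of a file, in bits per byte. The function names `byte_entropy` and `file_entropy` are made up for illustration, and treating "entropy of the mp3" as byte-frequency entropy is an assumption, not a description of the original code.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how often each byte value occurs
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def file_entropy(path: str) -> float:
    """Entropy of a file's raw bytes (assumption: no decoding of the mp3)."""
    with open(path, "rb") as handle:
        return byte_entropy(handle.read())
```

A value near 8.0 means the bytes look close to random, leaving little room for further compression; lower values mean more structure is left over. Grouping `file_entropy` results by genre tag would give the kind of comparison described above.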


  1. Entropy is the measure of disorder in a closed system, with the scale ranging from completely random to completely static. It is used as a model in information science for a number of things, including compression: entropy can be used as a measure of how much information is provided by the message, rather than the structure of the message.

    For example, the patterns of letters in words in the English language follow rules. U follows q, i before e…, etc. The structure defined by these imprecise rules, in English, dictates about half of the letters in any given word. Ths s wh y cn rd wrds wth mssng vwls.

    The thinking is, should you agree on what the rules are, you can communicate with just the symbols that convey additional meaning. Compression looks to take advantage of this by eliminating as much of the structure as possible, and keeping just the additional information provided.

    Mathematically, entropy is often expressed as the amount of information per symbol, averaged across structural and informational symbols. Measured relative to the maximum possible, English has an entropy around 0.5, so each letter conveys 1/2 a letter of information; the other 1/2 is structure[4]. ↩

  2. Probably Hulu?  ↩
  3. Darn you, Apple.  ↩
  4. Apparently, this makes English great for crossword puzzles. ↩

Red Hat is the Best IT Software Company!?

Friday night, Red Hat was honored as the “Best IT Software Company” along with luminaries from other industries including Elon Musk, LeVar Burton, and Jane Goodall by the World Technology Network.

Rodney Brooks, co-founder of iRobot and winner of the individual award for IT Hardware, made the joke that it was nice to have a hardware category, because we software guys never give hardware credit. I can think of a few jokes in the other direction, but he does have a point: none of us will be successful without each other. That’s why the evening was so much fun…

The room was full of customers and even some partners getting recognition for their work. These are the folks who have made us successful. I can’t wait to see what’s next!


 

November 15, 2014 at 12:57 PM

This award was also personally rewarding, as I had to leave my very jet-lagged wife at home with our very jet-lagged and cranky 4-month-old, after a week away with my mother. So, it’s probably good for me that we won.

I know you’re dying, so here’s a goofy photo of me in my tuxedo…

The Baby Measureur

R Code for Our Kid

Not too long ago, a tiny, screaming, pooping, extraordinarily amazing data manufacturing machine came into my life. Long accustomed to taking subtle cues from my wife, I was not surprised by his arrival, so I had plenty of time to prepare my optimal workflow for consuming baby data. Basically, I just installed Baby Connect apps on all of our devices[1].

Baby Connect syncs feeding, diaper, health and all sorts of other data across multiple devices. So, when I change a diaper, I record it and can get credit for it. They also provide a number of graphs so you can see changes in the input/output of your bundle. I wanted something that would point me to changes. Fortunately, in a stroke of genius, they also allow you to download the data in CSV format from their website[2]. So, with no sleep, a month of paternity leave[3], and ready access to data, I started putting together some R code looking for patterns through cluster analysis[4].

Feeding the Beast

For the month and a half this kid has been living with us, the model-based clustering identified five clusters of feedings, when measured across the datetime, time of day, and duration of feedings.
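As a rough illustration of the three measures named above, here is a Python sketch that turns feeding records into (days since first feeding, hour of day, duration in minutes) rows, which is the shape of input a clustering routine would take. It is not the R code from the repo: the column names `start` and `end`, the timestamp format, and the sample rows are all hypothetical, since the actual Baby Connect CSV layout isn't described here.

```python
import csv
import io
from datetime import datetime

# Hypothetical timestamp format; the real export may differ.
TIME_FORMAT = "%Y-%m-%d %H:%M"


def feeding_features(csv_text):
    """Return (days_since_first, hour_of_day, duration_minutes) per feeding."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    spans = [
        (datetime.strptime(r["start"], TIME_FORMAT),
         datetime.strptime(r["end"], TIME_FORMAT))
        for r in rows
    ]
    if not spans:
        return []
    first = min(start for start, _ in spans)
    features = []
    for start, end in spans:
        days = (start - first).total_seconds() / 86400.0   # position in time
        hour = start.hour + start.minute / 60.0            # time of day
        minutes = (end - start).total_seconds() / 60.0     # feeding duration
        features.append((days, hour, minutes))
    return features


# Made-up sample rows, for illustration only.
SAMPLE = """start,end
2014-08-01 04:50,2014-08-01 05:05
2014-08-01 13:30,2014-08-01 14:10
2014-08-02 04:50,2014-08-02 05:00
"""
features = feeding_features(SAMPLE)
```

Each tuple is one point in a three-dimensional space; a model-based clustering step would then group those points, with the number of clusters chosen by the model rather than fixed in advance.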

Feeding Duration
For the first week and a half, my kiddo was eating either long or short. For the next two weeks, the variation in duration of feedings came down enough to be considered a single cluster. For the next two-and-a-half weeks, the variation decreased further. The difference in feeding duration for the first fortnight is particularly noticeable in the graph below.
Feeding Duration

You’ll note I’ve discussed four clusters. The fifth has a single entry (Aug 19th, just before 5am). I have no idea what that’s about.

Long and short of it: if my kid’s like yours, you will definitely see changes to eating patterns over the first weeks.

Making Diaper Changing Cool Again

Running similar tests over the diaper data, I calculate three clusters, and again see them largely grouped chronologically.
Diaper Timing
In the first, my boy went any time he damn well pleased. In the second, for a week, there’s a noticeable dropoff in quantity of diapers. In the third, quantity picks up again, but we also see the introduction of a small kindness: fewer changes after 9pm. Yes, interested parties, my boy is thankfully starting to fall into sleep patterns as well as sleep more. But what’s going on in that middle cluster? For that, I look at the reasons for diaper changes.
Diaper Changes by Type
This graph requires explanation (and simplification). Aside from boredom and performance art, there are two main reasons I change diapers. These two reasons are often, but not always, concurrent. This graph looks at those two reasons, and tests whether they are concurrent: yes on top, no on the bottom. The Y-axis is otherwise irrelevant, and variation is in place only so the points are more readable by not all occurring in a boring line.

What we see here is my data producer had ~2/3 exclusive diapers in his first two weeks. Then mostly double diapers, for a week. And now, about an even split. Note the shift to longer feedings during the same week (second graph); this coincided with a growth spurt, not that I could tell except by looking at my calendar.
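The layout of that graph and the "exclusive" share can be sketched in Python roughly as follows. This is an illustration under assumptions, not the R code from the repo: each change is represented as a hypothetical `(day, reason_a, reason_b)` tuple of a day number and two booleans, and the jitter width and sample data are invented.

```python
import random


def concurrency_points(changes, jitter=0.3, seed=0):
    """Place each change near y=1 (both reasons) or y=0 (not both).

    The random vertical jitter carries no meaning; it only keeps the
    points from stacking on a single line.
    """
    rng = random.Random(seed)
    points = []
    for day, reason_a, reason_b in changes:
        concurrent = reason_a and reason_b
        base = 1.0 if concurrent else 0.0
        points.append((day, base + rng.uniform(-jitter, jitter), concurrent))
    return points


def exclusive_share(changes):
    """Fraction of changes that had exactly one of the two reasons."""
    if not changes:
        return 0.0
    exclusive = sum(1 for _, a, b in changes if a != b)
    return exclusive / len(changes)


# Made-up sample: two exclusive changes and one double.
sample = [(1, True, False), (1, False, True), (2, True, True)]
points = concurrency_points(sample)
share = exclusive_share(sample)
```

Computing `exclusive_share` separately for each chronological cluster would give the kind of per-period comparison described above.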

What’s Next

Please, jump in. Take a look at the code. Use the code. Provide ideas, patches, comments.

GitHub: babyconnectR


  1. Thank you, Gunnar
  2. I’d prefer a way to download it all at once, but by month isn’t so bad. 
  3. Thank you, Red Hat
  4. Most of the measures didn’t show me much, but I’ve added them all to the github repo as it could be an artifact of the data. 

NYC Taxi GPS

Chris Whong got a hold of a lot of GPS data for NYC taxis, using the cleverest hack of all: He asked nicely.

Not sure what I want to do with this yet, but I spent an evening putting the basics together, and drew this from a small subset of the taxi dropoff locations. I love how the slightly fuzzy data lines up with street locations, and looks like the skeletons of pressed leaves.

dropoff
(click for full size)

There’s also an impressive zoomable map with both pickup and dropoff data by @enf.

[ hat tip to: @j0el ]