MP3 and Entropy

If you’re like me, with a wife nursing in the other room and an infant distracted by any noise or movement, then your mind naturally drifts to the topic of entropy[1]. I saw something on “TV”[2] about music, and began wondering about whether different musical styles had different entropy.

Slashing around with a little Perl code, I hoped to look across my music collection and see what there was to see. Sadly, I don’t have the decoding libraries necessary[3], so instead I just measured the entropy of the MP3 files themselves. While most genres showed a good range, they were also relatively distinct from one another, and definitely distinct at the extremes.
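For the curious, here is a minimal sketch of the kind of measurement I mean. It is in Python rather than the original Perl, and it is an illustration, not the script I actually ran. It assumes the simplest possible model, where each byte of the file is an independent symbol, and the function names (byte_entropy, file_entropy) are made up for this example.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0).

    Assumption: bytes are treated as independent symbols, so any
    structure spanning several bytes is ignored.
    """
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how often each byte value occurs
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def file_entropy(path: str) -> float:
    """Entropy of a whole file, normalized to the range 0.0 to 1.0."""
    with open(path, "rb") as handle:
        return byte_entropy(handle.read()) / 8.0
```

Calling file_entropy on each track and grouping the results by the genre tag would give a per-genre range to compare. A value near 1.0 means the bytes look close to random under this model; lower values mean there is still byte-level structure left in the file.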


While I can’t draw any conclusions about how efficient MP3 compression is by genre, I can safely say that the further compressibility of an MP3 file is affected by its genre. I.e., there is room for genre-specific compression algorithms. Which only makes sense, right? If you know more about the structure, you should be able to make better choices.
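One rough way to check "further compressibility" directly is to run the file through a general-purpose compressor and see how much it shrinks. The sketch below uses Python's standard zlib module as a stand-in. This is an assumption for illustration, not what I measured above, and compression_ratio is a hypothetical helper name.

```python
import zlib


def compression_ratio(data: bytes, level: int = 9) -> float:
    """Compressed size divided by original size.

    Values near (or slightly above) 1.0 mean zlib found almost nothing
    left to squeeze out; smaller values mean more leftover structure.
    """
    if not data:
        return 1.0  # nothing to compress; treat as incompressible
    return len(zlib.compress(data, level)) / len(data)
```

A generic compressor like this knows nothing about audio, so it only gives a lower bound on the leftover structure. A genre-aware scheme could in principle do better.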

  1. Entropy is the measure of disorder in a closed system, with the scale ranging from completely random to completely static. It is used as a model in information science for a number of things, including compression: entropy can be used as a measure of how much information is provided by the message, rather than the structure of the message. For example, the patterns of letters in words in the English language follow rules. U follows q, i before e…, etc. The structure defined by these imprecise rules, in English, dictates about half of the letters in any given word. Ths s wh y cn rd wrds wth mssng vwls. The thinking is, should you agree to what the rules are, you can communicate with just the symbols that convey additional meaning. Compression looks to take advantage of this by eliminating as much of the structure as possible, and keeping just the additional information provided.

    Mathematically, the entropy is often referred to by the amount of information per symbol averaged across structural and informational symbols. English has an entropy around 0.5, so each letter conveys 1/2 a letter of information, the other 1/2 is structure[4]. ↩

  2. Probably Hulu?  ↩
  3. Darn you, Apple.  ↩
  4. Apparently, this makes English great for crossword puzzles. ↩

Red Hat is the Best IT Software Company!?

Friday night, Red Hat was honored as the “Best IT Software Company” along with luminaries from other industries including Elon Musk, LeVar Burton, and Jane Goodall by the World Technology Network.

Rodney Brooks, co-founder of iRobot and winner of the individual award for IT Hardware, made the joke that it was nice to have a hardware category, because we software guys never give hardware credit. I can think of a few jokes in the other direction, but he does have a point: none of us will be successful without each other. That’s why the evening was so much fun…

The room was full of customers and even some partners getting recognition for their work. These are the folks who have made us successful. I can’t wait to see what’s next!


November 15, 2014 at 12:57 PM

This award was also personally rewarding, as I had to leave my very jet-lagged wife at home with our very jet-lagged and cranky 4-month-old, after a week away with my mother. So, it’s probably good for me that we won.

I know you’re dying, so here’s a goofy photo of me in my tuxedo…


Chris Whong got a hold of a lot of GPS data for NYC taxis, using the cleverest hack of all: He asked nicely.

I’m not sure what I want to do with this yet, but I spent an evening putting the basics together, and drew this from a small subset of the taxi dropoff locations. I love how the slightly fuzzy data lines up with street locations, and looks like the skeletons of pressed leaves.


There’s also an impressive zoomable map with both pickup and dropoff data by @enf.

[ hat tip to: @j0el ]

13 Years of VC Connections in NYC

Anyone ever tell you that fundraising is a small world? Drawing the graph of connections between funders and NYC companies from 2001 through the present, we see a highly connected core with most outliers representing single deals. That level of connectedness is why reputation matters so very much.

(Also thrilled to see the NYC funding ecosystem grow over the years!)


[data from CrunchBase]


West Village on Georgetown


Person of Interest is filming in my neighborhood today, and the crew has changed some of the street signs.

I probably would not have noticed, except this happens to be the “famous” corner where Waverly intersects with itself. Remember Kramer getting lost in the West Village and stumbling upon the intersection of 1st and 1st?