Beautiful, Beautiful Data

Flip over at InfoChimps has put together a massive scrape of Twitter.  While trying to figure out how to process it all, I’ve drawn one day’s scrape* (20-Dec-08).  Many thanks to Flip and InfoChimps for the wow work, and doubly so for making it publicly available.

*It’s actually a subset of the day; I limited it to nodes with more than one friend.  Why the limitation?  We’re working in at least O(n²), and that was over half of the nodes.  Was it necessary?  No, but it sure made things quicker for something I just wanted to see the look of.

  • Philip (flip) Kromer December 30, 2008, 5:43 pm

    … and so small world networks beget small world networks: I know Alex Adai, one of the LGL guys, from when he was here at UT-Austin.

    http://bioinformatics.icmb.utexas.edu/lgl/ is under the weather, and the SF.net repo seems static. Can you say more about your version of LGL?

    Also, have you played with cytoscape or Pajek for doing this stuff?

  • erich December 31, 2008, 3:24 pm

    I am most interested in the impact of an individual’s local network on the individual and vice versa, and I have not found much literature on the topic, let alone tools. So, for analysis I mostly cobble software and algorithms together myself. I’ve poked at Pajek without much success, but cytoscape is new to me; I’ll definitely check it out, thanks! I think there’s potential in some of the economic heat map software out there too, for visualization and analysis, but I have yet to find the patience to figure out the file formats.

    A lot of what I’ve done with visualization, and LGL in particular, is usually just because I think the images are great, rather than for any analysis. Large-scale social networks start to look alike; one small world looks like another, I guess. For example, relationships between people in the news or SEC registered inside traders don’t look all that different until you start to see the combination of networks.

    My preference for LGL (I use the SF.net version with minor mods for higher contrast colors) is that it can work with large-ish data sets before I run out of memory. If I had one New Year’s wish for LGL it’d be to parallelize the root node selection. I think I’m using it for data sets at least an order of magnitude larger than they planned for in the bioinformatics department, and it’s still wonderful. Please thank Alex for me.