Category: Research
-
The Baby Measureur
R Code for Our Kid
Not too long ago, a tiny, screaming, pooping, extraordinarily amazing data manufacturing machine came into my life. Long accustomed to taking subtle cues from my wife, I was not surprised by his arrival, so I had plenty of time to prepare my optimal workflow for consuming baby data. Basically, I just…
-
Calculating Conditional Entropy in R
conditionalEntropy <- function( graph ) {
  # graph is a 2 or 3 column dataframe
  if (ncol(graph) == 2 ) {
    names(graph) <- c("from","to")
    graph$weight <- 1
  } else if (ncol(graph) == 3)
    names(graph) <- c("from","to","weight")
  max <- length(rle(paste(graph$from, graph$to))$values)
  …
-
The Lambert Effect – Subtleties in Cloud Modeling
After you’ve done all of the hard work of creating the perfect model that fits your data comes the hard part: does it make sense? Have you overfitted your data? Are the results confirming or surprising? If surprising, is that because there’s a real surprise or because your model is broken? Here’s an example: iterating on…
-
Determining Application Performance Profiles in the Cloud
I want to know how to characterize my workloads in the cloud. With that, I should be able to find systems both over-provisioned and resource starved to aid in right-sizing and capacity planning. CloudForms by Red Hat can do these at the system level, which is where you would most likely take any actions, but I want…
-
Analyzing Cloud Performance with CloudForms and R
CloudForms by Red Hat has extensive reporting and predictive analysis built into the product. But what if you already have a reporting engine? Or want to do analysis not already built into the system? This project was created as an example of using CloudForms with external reporting tools (our example uses R). Take special care that you can miss context to…
-
City Green as a Function of City Parks
I stumbled across the City Nature project at Stanford via some interesting interactive data visualizations they have created, such as the comparison between natural and social variables and the Naturehoods Explorer for 34 US cities. One of the comments on the comparison chart (first project link) was the lack of clear relationships between any of the provided variables. As I’m…
-
Load Volatility and Resource Planning for your Cloud
Having your own cloud does not mean you are out of the resource planning business, but it does make the job a lot easier. If you collect the right data and apply some well-understood statistical practices, you can break the work down into two different tasks: supporting workload volatility and resource planning.…
-
Measuring Load in the Cloud: Correcting for Seasonality
Usage is over threshold, unleash the kraken! Short-run peaks are perfect for automated elasticity: the unpredictable consumption that we stay up late worrying about fulfilling. But short-run peaks can be difficult to tease out from expected variation within the period: seasonality. Using the open source statistical package R, we can separate and look…
-
Communication Method, Scale, and Entropy
In a surprise to Marshall McLuhan, we see ad hoc conversations conducted through different electronic media demonstrating very similar scaling characteristics across number of nodes, number of edges, and number of unique edges. Looking at email lists, IRC, and long-term Twitter searches, we see more similarity than difference between the three media. However, when we look at the…
-
Life in a Networked Age
John Robb, who brought us the term “open source warfare,” wallops the concerns of governance of our increasingly global network: A global network is too large and complex for a bureaucracy to manage. It would be too slow, expensive, and inefficient to be of value. Further, even if one could be built, it would be impossible to apply…