Are MLB Games Getting Longer?

August 5, 2010 | Ryan

On July 29, 2010, I had a flight from Denver to Cincinnati. About an hour before boarding, I went to ESPN’s website and found a new article by Bill Simmons, a.k.a. The Sports Guy (@sportsguy33 on Twitter). The basic premise of this article is that a core group of ... [Read more...]

My Experience at Hadoop Summit 2010 #hadoopsummit

June 30, 2010 | Ryan

This week I had the opportunity to trek up north to Silicon Valley to attend Yahoo’s Hadoop Summit 2010. I love Silicon Valley. The few times I’ve been there the weather was perfect (often warmer than LA), little to no traffic, no road rage and people overall seem friendly ... [Read more...]

Some LaTeX Gems – Part 1: TikZ, Loops and more

April 23, 2010 | Ryan

This logo means that the blog post is about something I have found interesting, but does not apply directly to the exact purpose of this blog. Note: These commands have been tested in pdflatex. I am not sure if they work in other distributions. Over the past couple of months, ... [Read more...]

Anecdotal Evidence that Facebook Stores all Clicks?

April 11, 2010 | Ryan

This is not really news. A few months ago, news broke that Facebook recorded each user’s clicks and profile views in a database. Of course, I am not at all surprised. I would be more surprised if they didn’t store every single click. By now, most people have ... [Read more...]

Some Code for Dumping Data from Twitter Gardenhose

March 30, 2010 | Ryan

Gardenhose is a Streaming API feed that continuously sends a sample (roughly 15% according to Ryan Sarver at the 140tc in September 2009) of all tweets to feed recipients. This is some code for dumping the tweets to files named by date and hour. It is in PHP, which is not my ... [Read more...]

Lessons Learned from EC2

March 24, 2010 | Ryan

A week or so ago I had my first experience using someone else’s cluster on Amazon EC2. EC2 is the Amazon Elastic Compute Cloud. Users set up a virtual computing platform that runs on Amazon’s servers “in the cloud.” Amazon EC2 is not just another cluster. EC2 allows ... [Read more...]

Be Careful Searching Python Dictionaries!

February 27, 2010 | Ryan

For my talk on High Performance Computing in R (which I had to reschedule due to a nasty stomach bug), I used Wikipedia linking data, an adjacency list of articles and the articles to which they link. This data was linked from DataWrangling and was originally created by Henry Haselgrove. ... [Read more...]

Some Python Nooks and Crannies

January 31, 2010 | Ryan

I spent this weekend reading Learning Python (Second Edition for Python 2.3!) by Mark Lutz. Python is my favorite programming language, but my experience with it has been mostly anecdotal; I come up with my own solutions and functions and I Google whatever I do not know. I decided to spend ... [Read more...]

What to Expect?

January 22, 2010 | Ryan

In 2007, I was introduced to Twitter via the written qualifying exam towards my Ph.D. At first, I did not know what to do with it. After a good year or so (maybe even sooner) passed, I began to follow some very interesting people who share the same interests as ... [Read more...]


January 4, 2010 | Ryan

Welcome to my new blog, Byte Mining! Data is all around us, all the time. It flows in from places you would least expect it, and more times than not, it remains in its original form untouched by human and machine. When data simply flows in and out of our ... [Read more...]
