PyCon 2011: Handling ridiculous amounts of data with probabilistic data structures

C. Titus Brown

Part of my job as a scientist involves playing with rather large amounts of data (200 GB+). In doing so, we stumbled across some neat CS techniques that scale well, are easy to understand, and are trivial to implement. These techniques allow us to make some, perhaps many, types of data analysis map-reducible. I'll talk about interesting implementation details, fun science, and neat computer science.
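
The abstract does not name the specific data structures, so the following is only an illustrative sketch and not necessarily what the talk presents. It assumes a Bloom filter, one well-known probabilistic data structure, to show how such a structure can be small to implement and map-reducible: filters built independently over separate chunks of data can be combined with a bitwise OR. The class name `BloomFilter`, its parameters, and the sample strings are all hypothetical.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def merge(self, other):
        # The union of two filters with identical parameters is a bitwise OR.
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("filters must share num_bits and num_hashes")
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = self.bits | other.bits
        return merged


# "Map" step: build one filter per chunk of data, independently.
chunk_a = BloomFilter()
chunk_b = BloomFilter()
for item in ["ACGT", "CGTA"]:  # made-up sample strings
    chunk_a.add(item)
for item in ["GTAC", "TACG"]:  # made-up sample strings
    chunk_b.add(item)

# "Reduce" step: combine the per-chunk filters into one.
combined = chunk_a.merge(chunk_b)
```

Because the merge is associative and commutative, the per-chunk filters can be built on separate machines and combined in any order. A membership test on `combined` never misses an item that was added to either chunk, though it may occasionally report an item that was never added.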

