PyCon 2011: Handling ridiculous amounts of data with probabilistic data structures

C. Titus Brown

Part of my job as a scientist involves playing with rather large amounts of data (200 GB+). In doing so, we stumbled across some neat CS techniques that scale well, are easy to understand, and are trivial to implement. These techniques allow us to make some, and possibly many, types of data analysis map-reducible. I'll talk about interesting implementation details, fun science, and neat computer science.
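The abstract does not name the specific data structures covered in the talk. As a hedged illustration only, the sketch below uses a Bloom filter, one well-known probabilistic data structure, to show why such structures can be map-reducible: filters built independently on separate chunks of data can be combined with a bitwise OR. The class name `BloomFilter`, its parameters, and the sample items are assumptions made for this example and are not taken from the talk.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch: false positives possible, false negatives not."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Both sizes are illustrative assumptions, not tuned values.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def merge(self, other):
        # The union of two filters with identical parameters is a bitwise OR.
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("filters must share num_bits and num_hashes")
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = self.bits | other.bits
        return merged


# "Map" step: build one filter per chunk of data.
chunk_a = BloomFilter()
chunk_b = BloomFilter()
for item in ["alpha", "beta"]:
    chunk_a.add(item)
for item in ["gamma", "delta"]:
    chunk_b.add(item)

# "Reduce" step: combine the per-chunk filters.
combined = chunk_a.merge(chunk_b)

# For comparison, a single filter built over all of the data at once.
single = BloomFilter()
for item in ["alpha", "beta", "gamma", "delta"]:
    single.add(item)
```

Because the merge is a plain OR, the combined filter holds the same bits as one built over all the data in a single pass, so the chunks can be processed in any order or on different machines.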

PyCon US Videos - 2009, 2010, 2011

PyCon is an activity of the Python Software Foundation, a 501(c)(3) non-profit organization. To support future conferences, please donate to the Foundation at www.python.org/psf/donations. Video and audio material from PyCon is licensed under the Creative Commons CC-BY-NC-SA license. This means you can incorporate excerpts or entire recordings in your own non-commercial projects, as long as you credit the speaker and CC-license the finished project.