Surge 2011 ~ Building a Real-Time Cloud Analytics Service with Node.js

We present our experience designing, implementing, and deploying a Node.js-based distributed system for analyzing system and application performance across a datacenter. The system's design, and particularly the choice of programming environment, was driven by our goal of supporting real-time analysis of problems spanning hundreds of production systems, which requires handling large volumes of data with very low latency. We will briefly discuss these considerations and why we chose Node.js for the implementation. We will then present our actual experience building and deploying the software, including software development speed; availability of libraries and tools for development, testing, and verification; difficulties observing and debugging Node applications (especially post-mortem); packaging issues related to the lack of C++ binary compatibility; and other development and deployment issues. Finally, we will close with a demonstration of the facility itself and a discussion of the production pathologies that it has found, including the results of using the facility to analyze its own performance.