Supercharging visualization with Apache Arrow

Click for: original source

An article on KDnuggets™ about how Apache Arrow provides a new way to exchange and visualize data at unprecedented speed and scale, even though interactive visualization of large data sets on the web has traditionally been impractical.

Imagine a future where “Minority Report” style data visualizations run in every web browser.

The Apache Arrow ecosystem, including the first open source layers for improving JavaScript performance, is starting to do exactly that. One approach is remote rendering: the server sends geometry commands to the client, and the client turns them into viewable pixels using its standard web browser and local access to a client-side GPU.

Remote rendering experiences built on typical JSON-based web architectures hit two key bottlenecks:

  • Networking clogged by large file sizes
  • CPU and memory-intensive data serialization

A big win for file size is using a columnar format. Apache Arrow was designed to eliminate serialization overhead by providing a standard way of representing columnar data for in-memory processing. Follow the link to learn more.
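To make the two bottlenecks concrete, here is a minimal sketch that compares a row-oriented JSON payload with a simple columnar binary layout. It uses only the Python standard library and does not implement the actual Arrow format. The sample data, the variable names, and the header-less "ids, then xs, then ys" layout are all assumptions made for illustration.

```python
import json
import struct

# Hypothetical sample data: 10,000 points, each with an id and x/y coordinates.
n = 10_000
ids = list(range(n))
xs = [i * 0.5 for i in range(n)]
ys = [i * 0.25 for i in range(n)]

# Row-oriented JSON: every record repeats the field names, and numbers travel as text.
rows = [{"id": i, "x": x, "y": y} for i, x, y in zip(ids, xs, ys)]
json_payload = json.dumps(rows).encode("utf-8")

# Columnar binary: one contiguous, fixed-width buffer per column
# (4-byte ints and 4-byte floats, native byte order). Illustrative layout only.
id_col = struct.pack(f"={n}i", *ids)
x_col = struct.pack(f"={n}f", *xs)
y_col = struct.pack(f"={n}f", *ys)
columnar_payload = id_col + x_col + y_col

# Reading a column back needs no parsing step: view the bytes as typed values.
x_offset = len(id_col)
x_view = memoryview(columnar_payload)[x_offset:x_offset + len(x_col)].cast("f")

# Compare the two payload sizes in bytes.
print(len(json_payload), len(columnar_payload))
```

In this sketch the columnar payload is a fixed 12 bytes per point, while the JSON payload is several times larger because it repeats keys and encodes numbers as text. A value such as `x_view[3]` is read straight from the buffer without a parsing step, which is the kind of serialization saving the article attributes to Arrow.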

[Read More]

Tags big-data analytics data-science