Data Engineering

This is an important concept when considering data as an art material. Should the data take the lead in the work? Will it be the defining factor, rather than something plugged in afterwards? (This is an issue for my current thinking in constructing a physical work that can cater for various data streams)…

Data Engineering

Posted: April 1, 2013 | Author: Hilary Mason

Data engineering is when the architecture of your system is dependent on characteristics of the data flowing through that system.

It requires a different kind of engineering process than typical systems engineering, because you have to do some work upfront to understand the nature of the data before you can effectively begin to design the infrastructure. Most data engineering systems also transform the data as they process it.

Developing these types of systems requires an initial research phase, in which you do the necessary work to understand the characteristics of the data before you design the system. It may even require an active experimental process in which you try multiple infrastructure options in the wild before making a final decision. I’ve seen numerous people run straight into walls when they ignore this research requirement.

Forget Table is one example of a data engineering project from our work at bitly. It’s a database for storing non-stationary categorical distributions. We often see streams of data and want to understand what the distributions in that data look like, knowing that they drift over time. Forget Table is designed precisely for this use, allowing you to configure the rate of change in your particular dataset (check it out on GitHub).
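To make the idea of a drifting categorical distribution more concrete, here is a minimal sketch in Python. It is not Forget Table's actual implementation or API: the class name `DecayingCategoricalCounter`, its methods, and the choice of simple exponential decay are all assumptions made for illustration. The `decay_rate` parameter stands in for the configurable rate of change mentioned above.

```python
import math
from collections import defaultdict


class DecayingCategoricalCounter:
    """Hypothetical sketch: categorical counts that fade exponentially over time."""

    def __init__(self, decay_rate):
        # decay_rate (per unit of time) is the tunable "rate of change":
        # larger values forget old observations faster. This is an assumed knob.
        self.decay_rate = decay_rate
        self.counts = defaultdict(float)
        self.last_update = 0.0

    def _decay_to(self, now):
        # Scale every count by exp(-decay_rate * elapsed) since the last update.
        elapsed = now - self.last_update
        if elapsed > 0:
            factor = math.exp(-self.decay_rate * elapsed)
            for key in self.counts:
                self.counts[key] *= factor
            self.last_update = now

    def observe(self, category, now, weight=1.0):
        # Age the existing counts first, then record the new observation.
        self._decay_to(now)
        self.counts[category] += weight

    def distribution(self, now):
        # Normalise the decayed counts into probabilities.
        self._decay_to(now)
        total = sum(self.counts.values())
        if total == 0:
            return {}
        return {key: value / total for key, value in self.counts.items()}


# Usage: "red" dominates early, "blue" dominates later.
table = DecayingCategoricalCounter(decay_rate=0.5)
for t in range(10):
    table.observe("red", now=float(t))
for t in range(10, 20):
    table.observe("blue", now=float(t))
dist = table.distribution(now=20.0)
```

Because old counts shrink as time passes, `dist` weights the recent "blue" observations far more heavily than the earlier "red" ones. A smaller `decay_rate` would make the table remember "red" for longer, which is the kind of choice that depends on knowing how quickly your own data drifts.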

via Data Engineering, hilarymason.com.