A Taxonomy of Data Types

A Taxonomy of Data Types – Statistical Machine Learning and Visualization.

A table of data types, each paired with a probability distribution, for example:

type: categorical; atom example: a word in the English language; distribution example: Multinomial(1, theta)
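To make the row above concrete, here is a minimal Python sketch of drawing a single categorical atom from Multinomial(1, theta). The three-word vocabulary and the values of theta are made up for illustration only; a real English vocabulary would be far larger, and theta would be estimated from data.

```python
import random

# Hypothetical toy vocabulary and probabilities (illustrative assumptions only).
vocabulary = ["the", "cat", "sat"]
theta = [0.5, 0.3, 0.2]  # non-negative weights that sum to 1


def draw_multinomial_1(theta, rng):
    """Draw one sample from Multinomial(1, theta), returned as a one-hot list."""
    # Pick a single index with probability proportional to theta.
    index = rng.choices(range(len(theta)), weights=theta, k=1)[0]
    # Encode the chosen index as a vector with exactly one 1.
    return [1 if i == index else 0 for i in range(len(theta))]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
one_hot = draw_multinomial_1(theta, rng)
word = vocabulary[one_hot.index(1)]
print(one_hot, word)
```

The one-hot vector is the Multinomial(1, theta) sample itself; looking up the position of the 1 recovers the word, which is the categorical atom.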

This taxonomy sits at the technical end of the data type spectrum and is unlikely to be used by artists.

It is useful to separate machine learning and visualization techniques (k-NN, PCA, etc.) from specific data domains (text, images, etc.). We should be able to come up with a taxonomy of data types on the one hand and a library of techniques suitable for each data type on the other. Then, given a specific data domain, we can identify the appropriate data type and follow up with one or more appropriate analysis or visualization techniques.
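The two-step lookup described above (domain to data type, then data type to techniques) could be sketched in Python as a pair of dictionaries. Every entry here is an illustrative assumption rather than a worked-out taxonomy: the type names, the assignment of domains to types, and the techniques listed per type are all hypothetical placeholders.

```python
# Illustrative assumption: which data type each domain maps to.
DOMAIN_TO_TYPE = {
    "text": "categorical",
    "images": "real-valued vector",
}

# Illustrative assumption: which techniques suit each data type.
TYPE_TO_TECHNIQUES = {
    "categorical": ["Multinomial model"],
    "real-valued vector": ["k-NN", "PCA"],
}


def techniques_for(domain):
    """Return (data_type, techniques) for a data domain.

    Raises KeyError if the domain has not been assigned a data type.
    """
    data_type = DOMAIN_TO_TYPE[domain]
    # A data type with no registered techniques yields an empty list.
    return data_type, TYPE_TO_TECHNIQUES.get(data_type, [])


for domain in DOMAIN_TO_TYPE:
    data_type, techniques = techniques_for(domain)
    print(f"{domain}: {data_type} -> {', '.join(techniques)}")
```

Keeping the two mappings separate means a new data domain only needs one entry in the first dictionary to inherit every technique already registered for its data type.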

This is by no means completely satisfactory, as each data domain has its own peculiarities and any attempt to come up with a short taxonomy is bound to be a “lossy approximation”. But, like many approximations, it is one that I believe is useful and worthy of consideration.