A Concise Taxonomy for Describing Data as an Art Material v2.0 (a work in progress)

This taxonomy is designed for artists, curators, and consumers of any art which incorporates data as a material. It is a descriptive set of terms; that is, it trades some technical accuracy for classifications that are more commonly understood and easier to apply. The words form an informal conceptual system, in that the terms underlie more specific knowledge bases (such as the Getty Art & Architecture vocabulary and the Project Open Data metadata schema). It is a challenge to represent all aspects of data in a uniform way, so this taxonomy includes generic terms which guide the reader toward a richer understanding of the data.

We have aimed to create a concise set of terms which enable data to be described in an objective way. Its purpose is not to describe the subjective response of the viewer or listener; hence the taxonomy does not include terms for affective descriptions of the experience of the work, such as evocative or intimate. We have also avoided terms that describe the aesthetic that the data yields in the artwork itself, such as dynamic or abstract. Whilst also useful for categorising and grouping art, these terms are often personal and user-defined (by the artist, curator, audience, or critic), which makes a controlled vocabulary less effective and less relevant.

The material (data) is examined from a number of perspectives: delivery method, how it emerged, format of existence, which system it represents, the source or origin, and the licence. By comparison, when considering a traditional art material, we may ask: where was it made, who made it, where is it from, what is it made of, who owns it, how does it need to be stored, does it transform or degrade? Any number of the terms in the taxonomy may be relevant to any one artwork, and it should be used with this in mind. For example, Listening Post by Mark Hansen and Ben Rubin would be tagged personal, social, live, real-time, temporal, retrieved, processed, anecdata.

  • Of living: Biological; Environmental
  • Of non-living: Object
  • Of social context: Commercial; Personal; Social; State
  • Of licence: Closed; Open; Shared
  • Of time/space: Geospatial; Live; Real-time; Static; Temporal
  • Of type: Anecdata; Causal; Generated; Metadata; Processed; Retrieved; Streamed
  • Of disclosure: Anonymised; Identifiable; Unknown
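
As a minimal sketch of how the terms above might be applied as a controlled vocabulary, the Python below stores each facet with its terms and groups an artwork's tags by facet, rejecting anything outside the list. The facets and terms are copied from the list above; the dictionary layout and the names TAXONOMY and group_tags are assumptions made for illustration, not part of the taxonomy itself.

```python
# Facets and terms copied from the list above. The dict layout and the
# helper name group_tags are illustrative assumptions.
TAXONOMY = {
    "living": {"biological", "environmental"},
    "non-living": {"object"},
    "social context": {"commercial", "personal", "social", "state"},
    "licence": {"closed", "open", "shared"},
    "time/space": {"geospatial", "live", "real-time", "static", "temporal"},
    "type": {"anecdata", "causal", "generated", "metadata",
             "processed", "retrieved", "streamed"},
    "disclosure": {"anonymised", "identifiable", "unknown"},
}


def group_tags(tags):
    """Group an artwork's tags by facet; raise ValueError on unknown terms."""
    grouped = {}
    unknown = []
    for tag in tags:
        term = tag.strip().lower()
        for facet, terms in TAXONOMY.items():
            if term in terms:
                grouped.setdefault(facet, []).append(term)
                break
        else:
            # The inner loop found no facet containing this term.
            unknown.append(tag)
    if unknown:
        raise ValueError(f"Not in the taxonomy: {unknown}")
    return grouped


# The tags given for Listening Post in the text above.
listening_post = group_tags([
    "personal", "social", "live", "real-time",
    "temporal", "retrieved", "processed", "anecdata",
])
for facet, terms in listening_post.items():
    print(f"{facet}: {', '.join(terms)}")
```

A subjective term such as "evocative" would raise a ValueError here, which mirrors the decision to keep affective and aesthetic descriptions out of the vocabulary.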

DEFINITIONS

Of Living: Biological
Data whose origin is directly linked to something that is alive. Data that occurs without conscious origin (i.e. not from a human typing). Often from sensors. Examples: a) species migration reported by a sensor; b) quantified self data such as output from a heart-rate monitor; c) a bird-call.

Of Living: Environmental
Data whose origin is directly linked to the natural world. Often from sensors. Examples: a) ocean temperature; b) solar storm activity; c) seed bank information.

Of Non-Living: Object
Data whose origin is a physical object or device. Object data is often generated for machine-to-machine communication; however, the Internet of Things will see more communication from machines to (human) consumers. Examples: a) a fridge’s energy use; b) a CCTV camera; c) a smart watch.

Of Social Context: Commercial
Data produced by or about a corporate entity. Examples: a) 10 years of financial information about a company; b) the expiry date on a chocolate bar.

Of Social Context: Personal
Data produced by or about an individual. Certain types will have restricted access, and some legal and technical protections. Others will be accessible by some, if not all, of the general public. Examples: a) Google’s search analysis profile of a non-anonymised individual’s interests; b) international travel logs held at border controls; c) a recording of a private telephone conversation; d) family photos publicly tagged on Flickr; e) your social network feed.

Of Social Context: Social
Data produced by or about a social group or society. Examples: a) global number of births each day; b) voting preference in a London borough; c) immigration figures.

Of Social Context: State
Data produced by or about a government or ruling authority. Examples: a) the economy of the eurozone; b) legislation documents.

Of Licence: Closed
Closed data is generally only accessible to people within an organisation or to certain individuals. Examples: a) company personnel files; b) national security documents.

Of Licence: Open
Open data can be accessed, used, and shared by anyone. Examples: a) state-owned weather records; b) earthquake monitoring data.

Of Licence: Shared
Shared data is data available to a specific group of people for a specific purpose. Examples: a) the electoral register; b) anonymised supermarket shopping patterns.

Of Time/Space: Geospatial
Data that describes, is relevant to, or is derived from a space or geographic area. Examples: a) GPS coordinates from a cross-country walk; b) the number of people visiting the Tate Modern art gallery; c) the area of a baseball pitch; d) longitude and latitude.

Of Time/Space: Live
Data which is, or was, captured in real time. The recording does not necessarily get played back at the same rate, or in the same moment. Examples: a) a football match on TV; b) animal tracking data.

Of Time/Space: Real-time
Data that is created, captured and disseminated in an immediate* time-frame relative to the context of its use; it changes over time. Examples: a) a smart meter reporting electricity usage every 30 seconds (real-time data acquisition with a relevant-time display); b) feeds from sensors such as a webcam on a bird’s nest, a GPS location of a mobile phone, or a humidity reading in a gallery space.

Of Time/Space: Static
Data in which the items do not change once created, but the dataset can grow over time. Includes historical datasets and archive indexes. Examples: a) historical global population size; b) a recording in the sound archive at the British Library.

Of Time/Space: Temporal
Data which is time-based in its nature, is relevant to a specific time, or may only exist for a short time period (transient). Examples: a) the value of a kilogram of rice over time; b) your date of birth; c) the radio signals received from an exploding star.

Of Type: Anecdata
Anecdotal information gathered and then presented as evidence. Anecdata is often not precisely measurable, has no reliable provenance, is hard to compare, and/or cannot be disproven by the scientific method. Examples: a) a collection of comments on a product website; b) proverbs such as “Never look a gift horse in the mouth”.

Of Type: Causal
Data in which it is (or is made) obvious to the observer what its origin is. Example: a vocal recording.

Of Type: Generated
Data created by a software program. Examples: a) algorithmic music; b) a cellular automaton; c) a model of a galaxy exploding.

Of Type: Metadata
Data about data. Data which describes information about other data. Examples: a) the number of rows in a database; b) the time and date a phone call was made.

Of Type: Processed
Data which has been calculated, altered or processed in some way. Examples: a) a sonification of stock market figures; b) aggregated statistics; c) a colourful digital photograph reduced to black and white.

Of Type: Retrieved
Data made available on request by machine or user. Examples: a) a compilation of weather data from the past 24 hours as a single CSV file; b) the availability status of a library book.

Of Type: Streamed
The technical means of delivering real-time data as a continuous stream. The primary use cases are where there is no requirement for data storage, or where the datasets involved are too large to be manipulated in any other manner (for example, the entire Twitter back catalogue). Examples: a) real-time audio and video from a carnival procession; b) on-demand replay of a film from 1960; c) music playing from a digital radio.

Of Disclosure: Anonymised
Data that has had any identifiable information about a person, animal, or thing removed. Examples: a) CCTV camera footage containing people who have been blurred or obfuscated; b) all bicycle hire users across a city with user IDs and names removed.
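
The bicycle hire example can be sketched in a few lines of Python. The records, field names, and the helper strip_identifiers below are all hypothetical and invented for illustration. The sketch only removes directly identifying fields; the fields that remain (such as a start location) may still allow someone to be re-identified, in which case the Unknown term may be the more honest tag.

```python
# Hypothetical bicycle-hire records; every field name and value is invented.
hires = [
    {"user_id": "u-001", "name": "A. Rider", "start_dock": "Dock 12", "minutes": 18},
    {"user_id": "u-002", "name": "B. Cyclist", "start_dock": "Dock 3", "minutes": 42},
]

# Assumption: these are the only fields that directly identify a person.
IDENTIFYING_FIELDS = {"user_id", "name"}


def strip_identifiers(record, identifying=IDENTIFYING_FIELDS):
    """Return a copy of the record without the directly identifying fields."""
    return {key: value for key, value in record.items() if key not in identifying}


anonymised = [strip_identifiers(record) for record in hires]
print(anonymised)
```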

Of Disclosure: Identifiable
Data in which the direct source within it (person, animal, or thing) can be identified. Examples: a) a Facebook data export including friend names; b) a set of mobile phone numbers with owner address details.

Of Disclosure: Unknown
Data which contains information about a person, animal, or thing but in which it is not clear if it is adequately anonymised. Examples: a) a live Twitter feed containing some geolocated photos of people and animals; b) a sound recording from a public space that includes ambient conversation.

*Immediate is relative; it assumes some minimal latency, etc.

This taxonomy is over at GitHub – please contribute comments and suggestions there.