Data Engineering

Definitions

Of Living: Biological
Data whose origin is directly linked to something that is alive. Data that occurs without conscious origin (i.e. not from a human typing). Often from sensors. Examples: a) species migration reported by a sensor; b) quantified self data such as output from a heart-rate monitor; c) a bird-call.

Of Living: Environmental
Data whose origin is directly linked to the natural world. Often from sensors. Examples: a) ocean temperature; b) solar storm activity; c) seed bank information.

Of Non-Living: Object
Data whose origin is a physical object or device. Object data is often generated for machine-to-machine communication; however, the Internet of Things will bring more machine-to-(human) consumer communication. Examples: a) a fridge’s energy use; b) a CCTV camera; c) a smart watch.

Of Social Context: Commercial
Data produced by or about a corporate entity. Examples: a) 10 years of financial information about a company; b) the expiry date on a chocolate bar.

Of Social Context: Personal
Data produced by or about an individual. Certain types will have restricted access, and some legal and technical protections. Others will be accessible by some, if not all, of the general public. Examples: a) Google’s search analysis profile of a non-anonymised individual’s interests; b) international travel logs held at border controls; c) a recording of a private telephone conversation; d) family photos publicly tagged on Flickr; e) your social network feed.

Of Social Context: Social
Data produced by or about a social group or society. Examples: a) global number of births each day; b) voting preference in a London borough; c) immigration figures.

Of Social Context: State
Data produced by or about a government or ruling authority. Examples: a) the economy of the eurozone; b) legislation documents.

Of Licence: Closed
Closed data is generally only accessible to people within an organisation or to certain individuals. Examples: a) company personnel files; b) national security documents.

Of Licence: Open
Open data can be accessed, used, and shared by anyone. Examples: a) state-owned weather records; b) earthquake monitoring data.

Of Licence: Shared
Shared data is data available to a specific group of people for a specific purpose. Examples: a) the electoral register; b) anonymised supermarket shopping patterns.

Of Time/Space: Geospatial
Data that describes, is relevant to, or is derived from a space or geographic area. Examples: a) GPS coordinates from a cross-country walk; b) the number of people visiting the Tate Modern art gallery; c) the area of a baseball pitch; d) longitude and latitude.

Of Time/Space: Live
Data which is, or was, captured in real-time. The recording does not necessarily get played-back at the same rate, or in the same moment. Examples: a) a football match on TV; b) animal tracking data.

Of Time/Space: Real-time
Data that is created, captured and disseminated in an immediate (ish) time-frame relative to the context of its use; it changes over time. Examples: a) a smart meter reporting electricity usage every 30 seconds (real-time data acquisition with a relevant-time display); b) feeds from sensors such as a webcam on a bird’s nest, the GPS location of a mobile phone, or a humidity reading in a gallery space.

Of Time/Space: Static
Data in which the items do not change once created, but the dataset can grow over time. Includes historical datasets and archive indexes. Examples: a) historical global population size; b) a recording in the sound archive at the British Library.

Of Time/Space: Temporal
Data which is time-based in its nature, relevant to a specific time or which may only exist for a short time period (transient). Examples: a) the value of a kilogram of rice over time; b) your date of birth; c) the radio signals received from an exploding star.

Of Type: Anecdata
Anecdotal information gathered and then presented as evidence. Anecdata is often not precisely measurable, has no reliable provenance, is hard to compare, and/or cannot be disproven by the scientific method. Examples: a) a collection of comments on a product website; b) proverbs such as “Never look a gift horse in the mouth”.

Of Type: Causal
Data whose origin is (or is made) obvious to the observer. Example: a vocal recording.

Of Type: Generated
Data created by a software program. Examples: a) algorithmic music; b) a cellular automaton; c) a model of a galaxy exploding.

Of Type: Metadata
Data about data: data that describes other data. Examples: a) the number of rows in a database; b) the time and date a phone call was made.

Of Type: Processed
Data which has been calculated, altered or processed in some way. Examples: a) a sonification of stock market figures; b) aggregated statistics; c) a colourful digital photograph reduced to black and white.

Of Type: Retrieved
Data made available on request by machine or user. Examples: a) compilation of weather data from the past 24 hours as a single CSV file; b) availability status of a library book.

Of Type: Streamed
The technical means of delivering real-time data as a continuous stream. The primary use-cases are where there is no requirement for data storage, or where the datasets involved are too large to be manipulated in any other manner (e.g. the entire Twitter back catalogue). Examples: a) real-time audio and video from a carnival procession; b) on-demand replay of a film from 1960; c) music playing from a digital radio.

Of Disclosure: Anonymised
Data that has had any identifiable information about a person, animal, or thing removed. Examples: a) CCTV camera footage containing people who have been blurred or obfuscated; b) all bicycle hire users across a city with user IDs and names removed.

Of Disclosure: Identifiable
Data in which the direct source within it (person, animal, or thing) can be identified. Examples: a) a Facebook data export including friend names; b) a set of mobile phone numbers with owner address details.

Of Disclosure: Unknown
Data which contains information about a person, animal, or thing but in which it is not clear if it is adequately anonymised. Examples: a) a live Twitter feed containing some geolocated photos of people and animals; b) a sound recording from a public space that includes ambient conversation.
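As a rough illustration of how these definitions could be used to label a dataset, here is a minimal Python sketch. It covers only the Licence and Disclosure groups; the class and field names (`DatasetLabel`, `anyone_can_use`) are hypothetical, and the labels given to the two example datasets are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Licence(Enum):
    CLOSED = "closed"   # only people within an organisation, or certain individuals
    SHARED = "shared"   # a specific group of people, for a specific purpose
    OPEN = "open"       # can be accessed, used, and shared by anyone


class Disclosure(Enum):
    ANONYMISED = "anonymised"
    IDENTIFIABLE = "identifiable"
    UNKNOWN = "unknown"  # not clear whether it is adequately anonymised


@dataclass(frozen=True)
class DatasetLabel:
    """Hypothetical label attaching two of the definition groups to a dataset."""
    name: str
    licence: Licence
    disclosure: Disclosure

    def anyone_can_use(self) -> bool:
        # Follows the "Open" definition: accessible, usable, shareable by anyone.
        return self.licence is Licence.OPEN


# Example labels (the chosen values are assumptions, not part of the definitions).
earthquakes = DatasetLabel("earthquake monitoring data", Licence.OPEN, Disclosure.ANONYMISED)
personnel = DatasetLabel("company personnel files", Licence.CLOSED, Disclosure.IDENTIFIABLE)
```

The other groups (Of Living, Of Social Context, Of Time/Space, Of Type) could be added in the same way, probably as sets of tags, since a single dataset can be, for example, both Geospatial and Real-time.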


julie 2:59 pm on June 20, 2013 Permalink
Tags: data engineering

Data Engineering
This is an important concept when considering data as an art material. Should the data take the lead in the work and be the defining factor, rather than something plugged in afterwards? (This is an issue for my current thinking in constructing a physical work that can cater for various data streams.)

Data Engineering

Posted: April 1, 2013 | Author: Hilary Mason

Data engineering is when the architecture of your system is dependent on characteristics of the data flowing through that system.

It requires a different kind of engineering process than typical systems engineering, because you have to do some work upfront to understand the nature of the data before you can effectively begin to design the infrastructure. Most data engineering systems also transform the data as they process it.

Developing these types of systems requires an initial research phase, where you do the necessary work to understand the characteristics of the data, before you design the system (perhaps even an active experimental process where you try multiple infrastructure options in the wild before making a final decision). I’ve seen numerous people run straight into walls when they ignore this research requirement.

Forget Table is one example of a data engineering project from our work at bitly. It’s a database for storing non-stationary categorical distributions. We often see streams of data and want to understand what the distributions in that data look like, knowing that they drift over time. Forget Table is designed precisely for this use, allowing you to configure the rate of change in your particular dataset (check it out on github).

via » Data Engineering hilarymason.com.
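To make the idea of a drifting categorical distribution concrete, here is a small Python sketch. It is not Forget Table's actual implementation or API. It uses plain exponential decay as a stand-in technique, and the names (`DecayingDistribution`, `half_life`) are invented for illustration. The `half_life` parameter plays the role of the configurable rate of change: older observations count for less, so the estimated distribution follows the stream as it drifts.

```python
import math
from collections import defaultdict


class DecayingDistribution:
    """Categorical counts that fade exponentially with time (illustrative only)."""

    def __init__(self, half_life: float):
        if half_life <= 0:
            raise ValueError("half_life must be positive")
        # After `half_life` time units, an observation's weight has halved.
        self.decay_rate = math.log(2) / half_life
        self.weights = defaultdict(float)
        self.last_update = 0.0

    def _decay_to(self, now: float) -> None:
        # Scale every stored weight down by the time elapsed since the last update.
        elapsed = now - self.last_update
        if elapsed > 0:
            factor = math.exp(-self.decay_rate * elapsed)
            for category in self.weights:
                self.weights[category] *= factor
            self.last_update = now

    def observe(self, category: str, now: float) -> None:
        # Fade the old weights first, then add the new observation at full weight.
        self._decay_to(now)
        self.weights[category] += 1.0

    def distribution(self, now: float) -> dict:
        # Normalise the decayed weights into probabilities.
        self._decay_to(now)
        total = sum(self.weights.values())
        if total == 0:
            return {}
        return {c: w / total for c, w in self.weights.items()}


# Usage: "a" was seen one half-life before "b", so it carries half the weight.
d = DecayingDistribution(half_life=10.0)
d.observe("a", now=0.0)
d.observe("b", now=10.0)
dist = d.distribution(now=10.0)
```

A short half-life makes the distribution react quickly to change, while a long one makes it smoother. Choosing it depends on how fast the data drifts, which is the kind of upfront research the post describes.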

