Unsupervised Learning

Definitions

Of Living: Biological
Data whose origin is directly linked to something that is alive. Data that occurs without conscious origin (i.e. not from a human typing). Often from sensors. Examples: a) species migration reported by a sensor; b) quantified self data such as output from a heart-rate monitor; c) a bird-call.

Of Living: Environmental
Data whose origin is directly linked to the natural world. Often from sensors. Examples: a) ocean temperature; b) solar storm activity; c) seed bank information.

Of Non-Living: Object
Data whose origin is a physical object or device. Object data is often generated for machine-to-machine communication; however, the Internet of Things will bring more communication from machines to (human) consumers. Examples: a) a fridge’s energy use; b) a CCTV camera; c) a smart watch.

Of Social Context: Commercial
Data produced by or about a corporate entity. Examples: a) 10 years of financial information about a company; b) the expiry date on a chocolate bar.

Of Social Context: Personal
Data produced by or about an individual. Certain types will have restricted access, and some legal and technical protections. Others will be accessible by some, if not all, of the general public. Examples: a) Google’s search analysis profile of a non-anonymised individual’s interests; b) international travel logs held at border controls; c) a recording of a private telephone conversation; d) family photos publicly tagged on Flickr; e) your social network feed.

Of Social Context: Social
Data produced by or about a social group or society. Examples: a) global number of births each day; b) voting preference in a London borough; c) immigration figures.

Of Social Context: State
Data produced by or about a government or ruling authority. Examples: a) the economy of the eurozone; b) legislation documents.

Of Licence: Closed
Closed data is generally only accessible to people within an organisation or to certain individuals. Examples: a) company personnel files; b) national security documents.

Of Licence: Open
Open data can be accessed, used, and shared by anyone. Examples: a) state-owned weather records; b) earthquake monitoring data.

Of Licence: Shared
Shared data is data available to a specific group of people for a specific purpose. Examples: a) the electoral register; b) anonymized supermarket shopping patterns.

Of Time/Space: Geospatial
Data that describes, is relevant to, or is derived from a space or geographic area. Examples: a) GPS coordinates from a cross-country walk; b) the number of people visiting the Tate Modern art gallery; c) the area of a baseball pitch; d) longitude and latitude.

Of Time/Space: Live
Data which is, or was, captured in real-time. The recording does not necessarily get played-back at the same rate, or in the same moment. Examples: a) a football match on TV; b) animal tracking data.

Of Time/Space: Real-time
Data that is created, captured and disseminated in an immediate(ish) time-frame relative to the context of its use; it changes over time. Examples: a) a smart meter reporting electricity usage every 30 seconds (real-time data acquisition with a relevant-time display); b) feeds from sensors such as a webcam on a bird’s nest, a GPS location of a mobile phone, or a humidity reading in a gallery space.

Of Time/Space: Static
Data in which the items do not change once created, but the dataset can grow over time. Includes historical datasets and archive indexes. Examples: a) historical global population size; b) a recording in the sound archive at the British Library.

Of Time/Space: Temporal
Data which is time-based in its nature, relevant to a specific time or which may only exist for a short time period (transient). Examples: a) the value of a kilogram of rice over time; b) your date of birth; c) the radio signals received from an exploding star.

Of Type: Anecdata
Anecdotal information gathered and then presented as evidence. Anecdata is often not precisely measurable, has no reliable provenance, is hard to compare, and/or cannot be disproven by the scientific method. Examples: a) a collection of comments on a product website; b) proverbs such as “Never look a gift horse in the mouth”.

Of Type: Causal
Data in which it is (or is made) obvious to the observer what its origin is. Example: a vocal recording.

Of Type: Generated
Data created by a software program. Examples: a) algorithmic music; b) a cellular automaton; c) a model of a galaxy exploding.

Of Type: Metadata
Data about data. Data which describes information about other data. Examples: a) the number of rows in a database; b) the time and date a phone call was made.

Of Type: Processed
Data which has been calculated, altered or processed in some way. Examples: a) a sonification of stock market figures; b) aggregated statistics; c) a colourful digital photograph reduced to black and white.

Of Type: Retrieved
Data made available on request by machine or user. Examples: a) compilation of weather data from the past 24 hours as a single CSV file; b) availability status of a library book.

Of Type: Streamed
The technical means of delivering real-time data as a continuous stream. The primary use-cases are where there is no requirement for data storage, or where the data sets involved are too large to be manipulated in any other manner (for example, the entire Twitter back catalogue). Examples: a) real-time audio and video from a carnival procession; b) on-demand replay of a film from 1960; c) music playing from a digital radio.

Of Disclosure: Anonymised
Data that has had any identifying information about a person, animal, or thing removed. Examples: a) CCTV camera footage in which the people have been blurred or obfuscated; b) all bicycle hire users across a city with user IDs and names removed.

Of Disclosure: Identifiable
Data in which the direct source within it (person, animal, or thing) can be identified. Examples: a) a Facebook data export including friend names; b) a set of mobile phone numbers with owner address details.

Of Disclosure: Unknown
Data which contains information about a person, animal, or thing but in which it is not clear if it is adequately anonymised. Examples: a) a live Twitter feed containing some geolocated photos of people and animals; b) a sound recording from a public space that includes ambient conversation.


julie 3:33 pm on July 5, 2013 Permalink
Tags: machine learning ( 2 ), supervised learning, unsupervised learning

Unsupervised Learning
Unsupervised learning – Wikipedia, the free encyclopedia.

CHAT

J
A, is this true? “Machine learning techniques that crunch through very large data sets to learn and gain ‘meaning’ from them tend to work on popularity and frequency bias.”
A
a lot of the words there are ill defined..
A
popularity of what?
A
frequency of what?
A
bias towards what?
A
i’m also not sure you can learn ‘meaning’
A
also, not sure what ‘meaning’ is in this context
J
bias toward patterns in the data that occur frequently
A
what do you then mean by “to work on”?
A
do they “work” (as in they are useful) because they specifically “look for” patterns
A
often machine learning is called pattern recognition
J
but is it?
A
so saying that machine learning works because it recognises patterns is reasoning in circles a little bit i guess
A
it’s what they do, not why they work
J
i’m trying to assess if using ‘deep learning’ techniques will result in outliers and freaky data being dismissed
A
dismissed how?
J
not deemed important enough to highlight. in a google search for instance
A
i’m not sure deep learning does that, don’t know much about it
A
i think you probably need some more precisely defined terms to be able to say something about these algorithms
J
it seems to be (i’m sketchy on this) whether ML techniques are used to learn about the data without the use of ontologies, the neural nets learn what they need to learn based on the patterns they find
A
you should look into supervised vs. unsupervised learning
J
okay…
A
just the two terms
J
yes – unsupervised
A
supervised is when an algorithm is trained on data that is labeled, it can learn from examples, unsupervised is when the data isn’t labeled, it finds and labels patterns on its own, but they might not correspond to somebody else’s pattern->label matching
A
i would take ‘meaning’ out, and make it more concrete, and then specify what they do (input->output), not how they work. and specifically state unsupervised learning, not ML in general
J
i get that, so in unsupervised learning, when you are clustering patterns, I want to know what relevance (?) is given to the aspects of the data that don’t fit in to any cluster. Except the cluster of non-clustering things.
J
thank you, this is v helpful
A
it’s not certain that they don’t fit in any clusters
A
dividing a plane in half still leaves two infinite planes
J
that’s what i mean about the cluster of non-clustering things. there can’t be a no-cluster.
A
clustering algorithms take dimensions of the data into account, if they don’t take a dimension into account it doesn’t matter what it is
A
what are non-clustering things?
A
you can make a no-cluster, it’s just a cluster with the name no-cluster
J
so when my algorithms have run (are running), I have sets of clusters that are repeat patterns (eyes in a series of faces say). But what happens to the face with a set of freckles in the shape of Italy that only occurs once but is really quite special. How will the system learn about the specialness of unique things?
A
i wouldn’t call them patterns, just a cluster of data points.
A
a learning algorithm can only see what you give it. you’ll have to give it a dimension for freckles and a dimension for ethnicity, perhaps skin colour or hair colour. you can then ask for points that have not many similar points, but that isn’t necessarily machine learning
A
if you want to learn about uniqueness as a phenomenon you’ll have to start describing it and then find patterns in that i guess
A
unique things aren’t special
A
per se
J
yes i see that, so i guess unsupervised learning won’t necessarily do that as that would mean labeling some bits of the data (perhaps semi-supervised learning).
J
haha
J
some of them are
A
just point out there is a value judgement in there that machine learning just isn’t concerned with
J
ok, yes – loads of things for me to think about. I’ll let you get on…
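
A’s suggestion in the chat above, to ask for points that do not have many similar points, can be sketched in a few lines of Python. This is only an illustrative sketch and not something from the conversation: the function names, the two made-up dimensions per face, and the radius and min_neighbours thresholds are all assumptions. It simply counts neighbours within a fixed distance, which, as A notes, is not necessarily machine learning, and it puts the sparse points into a cluster that is just named “no-cluster”.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def neighbour_counts(points, radius):
    """For each point, count how many OTHER points lie within `radius`."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]


def label_points(points, radius, min_neighbours):
    """Label points with too few neighbours "no-cluster", the rest "clustered"."""
    counts = neighbour_counts(points, radius)
    return ["clustered" if c >= min_neighbours else "no-cluster" for c in counts]


# Hypothetical data: each face reduced to two invented dimensions.
# The algorithm can only "see" the dimensions it is given.
faces = [(0.10, 0.20), (0.15, 0.22), (0.12, 0.18), (0.11, 0.25), (0.90, 0.95)]

# Assumed thresholds, chosen only for this toy example.
labels = label_points(faces, radius=0.1, min_neighbours=2)

for face, label in zip(faces, labels):
    print(face, label)
```

With these made-up numbers the first four faces are labelled “clustered” and the last one “no-cluster”. Whether that lone point is special is a value judgement the code does not make.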
