Build your own Data Cloud – QMUL


MapReduce / Hadoop on Raspberry Pi

If your data is only going to be processed once, it may not be worth keeping it in HDFS (the Hadoop Distributed File System), since writing it in (with replication) is slow relative to a one-off job; worth checking for your use case.

Name Nodes store the metadata.
Data Nodes store the data; blocks are replicated a number of times (usually 3).
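
As a quick sketch of how data gets in and out once HDFS is up (run from the hadoop-2.6.0 directory; data.txt and /user/pi are just placeholder names):

  bin/hdfs dfs -mkdir -p /user/pi          # create a directory in HDFS
  bin/hdfs dfs -put data.txt /user/pi/     # copy a local file into HDFS (split into blocks across the Data Nodes)
  bin/hdfs dfs -ls /user/pi                # list what is stored
  bin/hdfs dfs -get /user/pi/data.txt .    # copy it back out to the local filesystem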

In hadoop-2.6.0/etc/hadoop/hadoop-env.sh (the configuration file; open it with nano), edit the Java implementation to use, changing
export JAVA_HOME=${JAVA_HOME}
to
export JAVA_HOME="/home/pi/ejdk1.8.0_33/linux_armv6_vfp_hflt/jre/" [or equivalent]
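
A quick check that Hadoop now picks up that JRE (assuming the hadoop-2.6.0 directory sits in /home/pi):

  cd /home/pi/hadoop-2.6.0
  bin/hadoop version    # should print the Hadoop version banner rather than a JAVA_HOME error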

Edit core-site.xml:

fs.defaultFS = hdfs://pi-0:9000 <--- pi-0 is the master node. You have to place the property inside the <configuration> element (a sketch of the file is below).
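
A minimal core-site.xml along those lines (pi-0 being whatever hostname your master node has):

  <configuration>
    <!-- default filesystem: the HDFS Name Node running on the master (pi-0), port 9000 -->
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://pi-0:9000</value>
    </property>
  </configuration>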

Edit hdfs-site.xml:
dfs.replication = 1
dfs.namenode.name.dir = /home/pi/had-hdfs/
dfs.datanode.data.dir = /home/pi/had-hdfs/
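
As a sketch, hdfs-site.xml with those values (single replica, Name Node and Data Node directories both under /home/pi/had-hdfs/):

  <configuration>
    <!-- one copy of each block is enough on a small Pi cluster -->
    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>
    <!-- where the Name Node keeps its metadata -->
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>/home/pi/had-hdfs/</value>
    </property>
    <!-- where the Data Node keeps its blocks -->
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>/home/pi/had-hdfs/</value>
    </property>
  </configuration>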

Edit mapred-site.xml <--- may already be correctly edited:
mapreduce.framework.name = yarn
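
A sketch of mapred-site.xml (in Hadoop 2.6.0 it may first need copying from mapred-site.xml.template in the same directory):

  <configuration>
    <!-- run MapReduce jobs on YARN rather than the local runner -->
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>
  </configuration>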

Yarn
– edit hadoop-2.6.0/etc/hadoop/yarn-site.xml

yarn.nodemanager.aux-services = mapreduce_shuffle
yarn.resourcemanager.hostname = localhost <---- localhost uses this machine as the master; for NME, ask ITS for the master IP on their Hadoop set-up (a sketch of the file is below).

You need to start the Hadoop (HDFS) processes from the master node, and you need to start the YARN processes from the master node [so both of these will be running at QMUL already]; see the start-up commands at the end.

This was a SHIT workshop.

To unzip tars on the command line: tar xzvf [filename]
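
For reference, a sketch of yarn-site.xml with the properties above (swap localhost for the master's hostname or IP on a multi-node set-up):

  <configuration>
    <!-- shuffle service so MapReduce jobs can run under YARN -->
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
    </property>
    <!-- which machine runs the ResourceManager (the master) -->
    <property>
      <name>yarn.resourcemanager.hostname</name>
      <value>localhost</value>
    </property>
  </configuration>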
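
And a sketch of bringing everything up from the master node, assuming Hadoop is unpacked at /home/pi/hadoop-2.6.0 (not needed on the QMUL set-up, where HDFS and YARN are already running):

  cd /home/pi/hadoop-2.6.0
  bin/hdfs namenode -format    # first run only: initialise the Name Node metadata directory
  sbin/start-dfs.sh            # start the Name Node and Data Node(s)
  sbin/start-yarn.sh           # start the ResourceManager and NodeManager(s)
  jps                          # list the running Java processes to check everything came up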