Walking Wales: The Data Challenge

Handling and analysing large amounts of data is no trivial task, especially when the data comes from diverse sources in a variety of formats and is subject to inconsistencies and errors. This project deals with such data, collected during a three-month walk around the perimeter of Wales in the UK. It details the difficulties of processing and ultimately making sense of real-world data, including GPS, ECG and free text, and shows how problems in the raw data were identified and resolved using open-source tools or special-purpose tools written by the author. The importance of understanding the data is emphasised, and assessing data quality is a major part of that understanding. Unstructured text from blog posts was analysed to extract a sentiment score, which involved creating a domain-specific sentiment dictionary. A small study highlighted some of the problems of assessing sentiment in this context. Finally, several multivariate visualisations were created to allow browsing and visual exploration of the data. These included a zoomable timeline showing the results of the sentiment analysis, GPS track, heart rate, elevation, acceleration and skin conductivity. A zoomable map was also created, showing the walked track with an indication of the sentiment score. The use of the visualisations to find interesting artefacts is demonstrated.

