So, there goes my first attempt at writing a catchy title for a light introduction to Smart Data.
I felt this overwhelming pressure to be witty in this first tech piece I’ve written in a long time.
Now, onward to Smart Data.
Data is smart when it has been linked or merged with enough contextual information to reveal its true meaning, yield actionable intelligence or become … ta-da … the best possible version of itself.
It’s also good to establish that smart data has an evil twin, dumb data, which exists in inconceivable volume. Let’s define that here as any data collected by sensors which have no clue what they are measuring, where they are located or what the values they collect mean, being oblivious even to the time of day or the day itself.
The same can be said for a lot of machine- and business-generated data which is never put in context, cleaned or linked, only generated and stored away so its owners can claim an imaginary BigData-trophy.
I felt the need to give you an “early out” after that title fiasco, you’re welcome!
The journey from bad to good - the data edition
Please allow me to put this in a personal context, hoping it’s more relatable that way, and then ask you to translate it into the business context most relevant to you.
Let’s say you have a watch that monitors your heartbeat and tries to accurately count your steps. In isolation this data is fairly stupid even though it does have normal boundaries for heart rate and pace.
As soon as you add your gender, weight and height into the watch you have made the data a bit smarter and at least provided clear upper and lower boundaries for health monitoring.
You go on a hike and aim for a nearby mountain top frequented by you and other mountaineers. This time the weather changes and, even though it’s not too cold, the wind has a chilling and taxing effect on you as you aren’t as prepared as you should have been.
The additional data we could have at this point:
Location, destination and path
Weather conditions, not just temperature
Accurate grade/slope and pace
Then we join it to some historical data:
The previous data points from this trip
The readings from your previous trips
The anonymous readings from others taking the same route
Historical data associated with known anomalies and health issues
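To make the idea of linking context to a reading a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the class names, the field names and the example values are hypothetical and do not describe any real watch or API. It only shows a raw reading being merged with its context into one record.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class RawReading:
    """What the watch measures on its own (the "dumb" part)."""
    heart_rate_bpm: int
    steps_per_min: int


@dataclass(frozen=True)
class Context:
    """Hypothetical contextual data linked to a reading."""
    timestamp: datetime
    latitude: float
    longitude: float
    temperature_c: float
    wind_speed_ms: float
    slope_percent: float


def enrich(reading: RawReading, context: Context) -> dict:
    """Merge a raw reading with its context into one flat record."""
    record = asdict(reading)
    record.update(asdict(context))
    # ISO 8601 keeps the timestamp portable across downstream systems.
    record["timestamp"] = context.timestamp.isoformat()
    return record


# Made-up example values, not real measurements.
reading = RawReading(heart_rate_bpm=148, steps_per_min=96)
context = Context(
    timestamp=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    latitude=64.25,
    longitude=-21.5,
    temperature_c=6.0,
    wind_speed_ms=12.5,
    slope_percent=14.0,
)
smart = enrich(reading, context)
```

The resulting record carries both the measurement and the conditions it was taken under, so anything that reads it later does not have to look the context up again.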
Timing is (almost) everything
Now that we have smarter data we need to understand the importance of it being available as close to the source, in space and time, as possible. Adding this contextual, linked, metadata further down the processing stream may not be as efficient.
With this data it’s reasonable to assume that your watch can detect and/or predict performance and health issues and even, in a worst-case scenario, automatically contact 911.
At least it should be obvious that we can get a lot more meaningful and actionable intelligence now that we have turned our dumb data into smart data.
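As a rough illustration of what "detecting an issue" could look like once historical readings are available, the sketch below compares the current heart rate with your own readings from comparable conditions. The function name, the threshold factor `k` and the numbers are assumptions made up for this example; it is a plain statistical comparison, not medical logic.

```python
from statistics import mean, stdev
from typing import Sequence


def is_unusual(current_bpm: float, history_bpm: Sequence[float], k: float = 2.0) -> bool:
    """Return True when the current heart rate is more than k standard
    deviations above the mean of comparable historical readings."""
    if len(history_bpm) < 2:
        # Not enough history to estimate the spread, so make no claim.
        return False
    threshold = mean(history_bpm) + k * stdev(history_bpm)
    return current_bpm > threshold


# Hypothetical readings from earlier trips on a similar slope.
history = [132, 138, 135, 140, 136]

calm = is_unusual(139, history)
alarming = is_unusual(150, history)
```

With dumb data the watch would have nothing to compare against; with the linked history the same number can be judged against what is normal for you on this route.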
We collect way-more-than-a-healthy amount of data every day and, in many cases, we do so without getting the benefits from it.
From the source the data travels through message queues and data pipelines on its way to data warehouses, data lakes or huge analytics clusters.
By enriching it close to the source we are not only delegating some of the processing to the devices but also making sure that all the downstream processes have access to the same smart data.
Data refinement and preparation
It can take time to refine data, link it to contextual metadata and make it “worthy” as input for data science. Unfortunately, a lot of that work is highly redundant, as data scientists, developers and database specialists clean the same kind of s***t and link to the same commonly used metadata.
I think that it’s time to rethink the data refinement and data lookup part and make some serious progress in simplifying common tasks and access to commonly used, contextual and linked, metadata.
Turn data into knowledge and appropriate actions
The idea of transforming dumb data into smart data is simple enough. It’s the data refinement, conformity and contextual metadata linking that is the tricky and time-consuming part, and it is often missing.
Insufficient understanding of, or respect for, the importance of this part of the process by the “business end” fosters unhealthy expectations towards data scientists and their ability to produce smart results from dumb data, which ends now, as before, in an all too familiar “garbage in - garbage out” kind of way.
When all of the preparation is done, the data scientists need, in many cases, to select the right features/properties to use as part of their training data set for machine learning etc., and that can be hard enough even with the smartest of data.
Turning vast amounts of data into knowledge and actionable intelligence is hard enough with smart(er) data. Try doing that efficiently with loads of dumb data and tell me how that works out.
A lot of business data and machine-generated data can, if turned smart, produce highly valuable insights and actionable intelligence.
I predict that once we make data smarter, closer to the source, extracting actionable intelligence and meaningful insights from it will be an (almost) basic demand, and the processing work will more often be delegated to the devices rather than all being done centrally in the cloud, as is now the case.
Trust and privacy
Moving this processing to the device is also a key part of protecting privacy and gaining control of one’s own data. A device fetching additional, anonymous metadata to process this could then a) do all the things you might expect and b) let you control what data you decide to share with the device manufacturers or other services.
I feel compelled to state very clearly that, even though the topic is Smart Data, which is meant to be connected, and the example I gave involves personal information, advocating for the Smart Data approach to maximizing the value of data is in no way, shape or form meant to justify or encourage the abuse of trust or the misuse of personal information.
Smart Metadata (of the contextual and linked variety) is a topic for another blog post but the short version is this.
I consider metadata, contextual and linked, to be smart when … :
… it is of good quality, is relevant, accurate and trusted.
… it is easily attained and directly contributes value to the data it’s associated with.
… it is aware of its state and can “check” if it’s outdated.
… the origin of it can be tracked and verified.
… it can be decentralized whilst respecting security and access control.
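Two of the properties in the list above, being able to “check” if the metadata is outdated and being able to track where it came from, can be sketched in a few lines of Python. The record layout, the field names and the example source name are hypothetical. Note also that comparing a SHA-256 digest only shows the payload has not changed since the digest was recorded; truly verifying the origin would need something stronger, such as a signature, which is left out here.

```python
import hashlib
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class MetadataRecord:
    payload: bytes          # the metadata itself, e.g. serialized JSON
    source: str             # where it was fetched from (hypothetical name)
    fetched_at: datetime    # when it was fetched
    max_age: timedelta      # how long it may be trusted without a refresh
    sha256_hex: str         # digest recorded when the payload was fetched

    def is_outdated(self, now: datetime) -> bool:
        """True when the record is older than its allowed age."""
        return now - self.fetched_at > self.max_age

    def is_intact(self) -> bool:
        """True when the payload still matches the recorded digest."""
        return hashlib.sha256(self.payload).hexdigest() == self.sha256_hex


# Made-up example values.
payload = b'{"trail": "example", "slope_percent": 14.0}'
fetched = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
record = MetadataRecord(
    payload=payload,
    source="example-trail-registry",
    fetched_at=fetched,
    max_age=timedelta(days=1),
    sha256_hex=hashlib.sha256(payload).hexdigest(),
)

fresh = not record.is_outdated(fetched + timedelta(hours=1))
stale = record.is_outdated(fetched + timedelta(days=2))
tampered = replace(record, payload=b"tampered")
```

Because the age limit and the digest travel with the record, a device can make these checks locally without asking a central service first.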
The road ahead. A personal footnote.
This article is the first in a series of short, technology-focused articles which I hope will be informative. They will also mark some milestones on the journey of building my next startup.
Today, being the first such milestone, marks the “birthday” of Universal Data Sources (Snjallgögn ehf.) which will own and operate quicklookup.com, globalidentifier.com and universalgraph.com … but more on that later.
Thank you for stopping by.
Also: Big Data can be smart but having warehouse-loads of dumb data is certainly not.