Are the phrases “data is the new gold” or “data is the new oil” just marketing slogans to entice businesses to spend more money on technology? Or are these metaphors describing a new paradigm that revolutionizes the world?
Let’s try to find out, and start with a few facts. Both oil and gold lay buried deep under the surface of the earth for millions of years. And both only became valuable to humans once we learned how to dig them out of the earth and process them: gold into jewelry, oil into fuel for light, heating, transportation and energy. Similarly, data has been around for ages, starting with early Sumerian and Chinese writing systems, continuing through centuries of printed books, and since the 20th century in digital form on our computer systems.
Also, data only has value when it is served in the right format so that it can be read or otherwise consumed, and, more importantly, made accessible to the people who need it, when they need it. That is why libraries were built to store books, allowing scholars to easily access the information they needed. For centuries, the use of data was at a stage comparable to using oil to light a petrol lamp.
This started to accelerate in the digital era. Since the first days of computers, methods have been developed to make information readable and accessible to people. The birth of the world wide web, with its hyperlinking capability, was another big step forward in creating value with data; it is also what enables you to read this article.
Ubiquitous Data
There is a big difference between data and gold or oil. While the available amounts of new gold and oil are shrinking, the amount of data is growing exponentially. As you can see on the graph below, the amount of digital data globally is doubling approximately every three years. The 74 zettabytes we have today stand for 74 billion terabytes, which is nearly 10 terabytes for each living human on earth!
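The per-capita figure is easy to verify with a quick sketch; note that the world-population estimate used here (roughly 7.8 billion) is an assumption, not a figure from the article:

```python
# Rough arithmetic behind the per-capita figure.
ZETTABYTE_IN_TERABYTES = 10**9      # 1 zettabyte = 1 billion terabytes

global_data_zb = 74                 # ~74 ZB of digital data worldwide
world_population = 7.8e9            # assumed world population, ~7.8 billion

total_tb = global_data_zb * ZETTABYTE_IN_TERABYTES
per_person_tb = total_tb / world_population
print(f"~{per_person_tb:.1f} TB per person")  # ~9.5 TB per person
```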
Another big difference is that this growing amount of data comes in more and more variations. While gold and oil look the same in 2021 as they did centuries ago, data has become so varied that it is impossible for humans to comprehend it all. At the same time, we all know that insight into facts is key to decision-making. A few decades ago, the information for decision-making came from a small set of fact tables that had to be neatly summarized and presented in a report (in those days Excel was a revolution!), and in many businesses that is still the core of the decision-support system.
Data Opportunities
Now here comes the great opportunity. The growing pile of data in our connected world also contains information about global economic trends and markets, geopolitical factors, supply chain events, consumer and customer behavior, and so on. And within the walls of the enterprise, the Industrial Internet of Things (IIoT), robotic process automation (RPA), and business process/workflow management (BPM) open up a whole host of new data streams.
It should be clear that enterprises that manage to capture a rich variety of data and transform it into a format suitable for decision-making have a huge advantage over companies that rely on traditional reporting alone. The former will be looking at facts over a much wider horizon and with far more color in the picture.
The Evolution of Data Technology
Looking at the evolution of the technology, the new data flows also become a feedback loop in increasingly automated processes; for example, the data from an automated workflow can be used to detect bottlenecks and automatically adjust the flow itself to balance the work.
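As a minimal sketch of such a feedback loop, a workflow engine could use its own timing data to spot the slowest stage and shift capacity toward it. The stage names, timings, and rebalancing rule below are invented for illustration:

```python
# Sketch of a workflow feedback loop: per-stage timings feed back into
# the flow to rebalance work. All names and numbers are hypothetical.
from statistics import mean

# Average processing time (minutes) recorded per workflow stage
stage_timings = {
    "intake":   [2.1, 1.9, 2.3],
    "review":   [8.5, 9.1, 10.2],   # noticeably slower than the rest
    "approval": [1.2, 1.4, 1.1],
}

def find_bottleneck(timings):
    """Return the stage with the highest mean processing time."""
    return max(timings, key=lambda stage: mean(timings[stage]))

def rebalance(workers, bottleneck, shift=1):
    """Move one worker from the least-loaded stage to the bottleneck."""
    donor = min(workers, key=workers.get)
    workers[donor] -= shift
    workers[bottleneck] += shift
    return workers

workers = {"intake": 3, "review": 3, "approval": 3}
slowest = find_bottleneck(stage_timings)
workers = rebalance(workers, slowest)
print(slowest, workers)
```

In a real system the adjustment would of course feed back into the workflow engine itself rather than a dictionary, but the loop structure (measure, detect, adjust, repeat) is the same.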
That brings us to another important aspect of modern data management technology. As mentioned, the huge volume and variety of data make it impossible for humans to find the information they need and transform it into a useful format. That is where algorithms come into play: smart pieces of program logic that can crawl through terabytes of data and detect patterns relevant to the decision process they support. We call the logic smart because it can ‘learn’ from previous results in a feedback loop and get better at its task with each iteration. Other algorithms specialize in correcting errors and gaps in the information (cleansing), while still others handle data harmonization and aggregation. All these techniques fall under the umbrella of what is called ‘Data Preparation’: making the data available for its consumers.
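To make the data preparation steps concrete, here is a toy pipeline with one function per step: cleansing fills or drops gaps, harmonization normalizes units, and aggregation summarizes. The record layout, field names, and exchange rate are all invented for this sketch:

```python
# Toy data-preparation pipeline: cleansing, harmonization, aggregation.
# Record layout and rules are hypothetical, for illustration only.
from collections import defaultdict

raw_records = [
    {"region": "EU", "revenue": "1200", "currency": "EUR"},
    {"region": "eu", "revenue": None,   "currency": "EUR"},   # gap to cleanse
    {"region": "US", "revenue": "900",  "currency": "USD"},
]

EUR_PER_USD = 0.85  # assumed fixed rate, for the sketch only

def cleanse(records):
    """Drop records with missing revenue; normalize region casing and types."""
    return [
        {**r, "region": r["region"].upper(), "revenue": float(r["revenue"])}
        for r in records
        if r["revenue"] is not None
    ]

def harmonize(records):
    """Convert every revenue figure to a single currency (EUR)."""
    out = []
    for r in records:
        factor = EUR_PER_USD if r["currency"] == "USD" else 1.0
        out.append({**r, "revenue": r["revenue"] * factor, "currency": "EUR"})
    return out

def aggregate(records):
    """Sum revenue per region."""
    totals = defaultdict(float)
    for r in records:
        totals[r["region"]] += r["revenue"]
    return dict(totals)

prepared = aggregate(harmonize(cleanse(raw_records)))
print(prepared)  # {'EU': 1200.0, 'US': 765.0}
```

Production pipelines use dedicated tooling rather than hand-written functions, but the shape is the same: each stage takes the previous stage’s output, so the steps compose into a single prepared dataset for consumers.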