There may be nothing in the universe more mysterious than a black hole, but the first photograph of one such formation in the Messier 87 galaxy made the phenomenon a little more familiar. In April, the image of a glowing red ring got its 15 minutes of fame.
A second photo featured Event Horizon Telescope computer scientist Katherine Bouman next to a table covered with hard drives, which held information from eight observatories that contributed to the black hole image. Some technology professionals responded with incredulity at this physical sign of the “old school” methods the team used for data transmission—namely, shipping via FedEx.
The reason for turning to air cargo in place of data cables is simple. Using internet connections to send five petabytes from Hawaii, the Chilean desert and Antarctica, among other locations, to the compilation centre at the Massachusetts Institute of Technology would take years. Shipping mirrored hard drives, however, takes mere hours.
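The arithmetic behind that comparison is easy to check. The following is a minimal sketch, not the project's own calculation: the link speeds are illustrative assumptions, and `transfer_days` is a hypothetical helper that treats the link as fully dedicated to the transfer.

```python
# Rough transfer-time estimate for the five petabytes mentioned above.
# Link speeds are illustrative assumptions, not figures from the project.

PETABYTE = 10**15          # bytes, decimal definition
SECONDS_PER_DAY = 86_400


def transfer_days(num_bytes: float, link_bits_per_second: float,
                  efficiency: float = 1.0) -> float:
    """Days needed to push num_bytes through a link at a sustained rate.

    efficiency is the fraction of the nominal rate actually achieved
    (1.0 means the full rate, with no protocol overhead or contention).
    """
    if link_bits_per_second <= 0 or not 0 < efficiency <= 1:
        raise ValueError("rate must be positive and efficiency in (0, 1]")
    seconds = (num_bytes * 8) / (link_bits_per_second * efficiency)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    payload = 5 * PETABYTE
    for label, bps in [("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9), ("10 Gbit/s", 10e9)]:
        print(f"{label:>10}: {transfer_days(payload, bps):,.0f} days")
```

Under these assumptions, a sustained 1 Gbit/s link would need about 463 days, and a 100 Mbit/s link more than 12 years, before any allowance for overhead or outages.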
Using aeroplanes for data transmission related to a groundbreaking scientific project is surprisingly emblematic of the new era the world is entering. Our growing ability to monitor and track nearly everything around us will soon overwhelm our resources for storing and sending all that data. Creative alternatives are urgently required.
Most of us have read the statistics: global data volumes are expected to reach 175 zettabytes by 2025. Sensors are ascendant, and devices will soon generate far more information than humans do with our documents, videos, texts, and other outputs. How we will manage the forthcoming data explosion is unclear. As with the black hole photo, we can see the outlines, but the detail remains out of focus.
The data transmission problem
A forecast 175 trillion gigabytes of data, and counting, will present two critical challenges for the technosphere. First of all, it is impossible to efficiently transmit such a massive amount of information over current or next-generation networks.
It’s not just the Event Horizon project having issues getting five petabytes from A to B. It’s the transportation industry dealing with the five terabytes of data each autonomous vehicle may generate daily. It’s also the biotech sector confronting the 500 terabytes one genome analysis can produce, and manufacturers preparing to collect countless status updates from nearly every consumer product imaginable, not to mention their own facilities and assembly lines.
There are cases where slow and steady data transmission will suffice. But an autonomous car must know instantly whether to brake, and reports of a fire at a remote factory must be addressed immediately. This brings technology to the edge.
One possible solution for dealing with the massive increase in data volumes is to move compute and storage closer to the point of data creation. Edge and fog computing promise to help achieve the low latency demanded by such applications as augmented reality, as well as to compile raw data into a more useful and condensed form before sending it to centralised data centres and the cloud.
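What condensing raw data at the edge might look like can be sketched in a few lines. This is an illustrative assumption rather than any particular edge platform's API: `WindowSummary` and `summarise_windows` are hypothetical names, and the idea is simply that a device forwards one small summary per window of readings instead of every reading.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, List


@dataclass
class WindowSummary:
    """Condensed stand-in for one window of raw sensor readings."""
    count: int
    minimum: float
    maximum: float
    average: float


def _summarise(window: List[float]) -> WindowSummary:
    return WindowSummary(len(window), min(window), max(window), mean(window))


def summarise_windows(readings: Iterable[float], window_size: int) -> List[WindowSummary]:
    """Collapse a stream of readings into one summary per fixed-size window."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    summaries: List[WindowSummary] = []
    window: List[float] = []
    for value in readings:
        window.append(value)
        if len(window) == window_size:
            summaries.append(_summarise(window))
            window = []
    if window:  # flush a final, partially filled window
        summaries.append(_summarise(window))
    return summaries


if __name__ == "__main__":
    raw = [20.1, 20.3, 20.2, 35.7, 20.4, 20.2]  # made-up temperature readings
    for summary in summarise_windows(raw, window_size=3):
        print(summary)
```

With a window of 1,000 readings, the device would send four numbers upstream in place of 1,000, at the cost of losing the individual values. Whether that trade is acceptable depends on the application.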
These technologies will together limit the pressures on our networks to some degree, but they will do so using architectures far different from those envisaged in the cloud-dominated prognostications of just a few years ago.
Storage of an expanding data universe
The second key problem with creating ever more data is storage. For one image, the Event Horizon Telescope generated the equivalent of 5,000 years’ worth of MP3s, which filled half a ton of hard drives. Housing the data the world will hold in 2025 would require 12.5 billion of today’s hard drives. As on-premises and cloud storage are backed primarily by drives of the spinning and flash variety, the future of this technology is a chief consideration.
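The drive count can be cross-checked against the 175-zettabyte forecast cited earlier. The sketch below assumes decimal units and uses hypothetical helper names. It shows only what drive size the two figures imply together, not how the original estimate was derived.

```python
# Cross-check: 175 ZB spread across 12.5 billion drives.
# Decimal units are an assumption; the article does not say which it uses.

ZETTABYTE = 10**21  # bytes
TERABYTE = 10**12   # bytes


def implied_capacity_tb(total_bytes: int, drive_count: int) -> float:
    """Average capacity per drive, in terabytes, implied by the two totals."""
    if drive_count <= 0:
        raise ValueError("drive_count must be positive")
    return total_bytes / drive_count / TERABYTE


def drives_needed(total_bytes: int, drive_capacity_bytes: int) -> int:
    """Whole drives required to hold total_bytes (integer ceiling division)."""
    if drive_capacity_bytes <= 0:
        raise ValueError("drive_capacity_bytes must be positive")
    return -(-total_bytes // drive_capacity_bytes)


if __name__ == "__main__":
    total = 175 * ZETTABYTE
    print(f"Implied drive size: {implied_capacity_tb(total, 12_500_000_000):.1f} TB")
    print(f"Drives at 14 TB each: {drives_needed(total, 14 * TERABYTE):,}")
```

The two figures are consistent with a drive of about 14 terabytes, with no allowance for redundancy or spare capacity.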
Herein lies the bad news. Drive-based storage has reached the superparamagnetic limit, a big word for the point at which the storage realm’s equivalent of Moore’s Law runs out of road. Just as chips have grown roughly twice as powerful every two years as manufacturers crammed more transistors into integrated circuits, storage capacity has grown by about 40 percent per year as OEMs shrank the magnetic grains coating disks. And just as there is a physical limit to how small transistors can be made and how closely they can be packed together in an Intel chip, there is a smallest practical size for magnetic grains, and hard drives have reached it.
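The two growth rates in that comparison are closer than they might look. As a quick check, assuming steady compound growth (a simplification, since real-world progress is lumpier), a 40 percent annual gain can be converted into a doubling time. The function names are hypothetical.

```python
import math


def doubling_time_years(annual_growth_rate: float) -> float:
    """Years for a quantity to double when compounding annually (0.40 = 40%)."""
    if annual_growth_rate <= 0:
        raise ValueError("annual_growth_rate must be positive")
    return math.log(2) / math.log(1 + annual_growth_rate)


def growth_factor(annual_growth_rate: float, years: float) -> float:
    """Total multiplication after compounding for the given number of years."""
    return (1 + annual_growth_rate) ** years


if __name__ == "__main__":
    rate = 0.40
    print(f"Doubling time at {rate:.0%} per year: {doubling_time_years(rate):.2f} years")
    print(f"Growth over two years: x{growth_factor(rate, 2):.2f}")
```

Under that assumption, 40 percent a year compounds to a factor of 1.96 over two years, a doubling roughly every 2.06 years, which is almost the same cadence as the two-year doubling attributed to processors.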
Why hasn’t this gotten much attention? So far, manufacturers have responded with more heads and platters to boost capacity in the same footprint, but the gains delivered through these means are slowing. There are technologies in the works to help, including heat-assisted and microwave-assisted magnetic recording, but they remain problematic and costly.
These engineering realities have experts looking in various directions for how best to house the record-setting amount of data humankind will soon produce. There are advocates for tape backup, which despite its latency offers energy efficiency and a superparamagnetic limit that has yet to be reached. Farther afield, researchers are exploring how to store information as DNA. In this scenario, all global data could fit into a coat closet, but because the polymer encoding would need to be rehydrated and fed into a sequencer to be read, instant access would not be a feature.
There will be good answers to the storage conundrum, but better IT equipment will not change the fact that voluminous stockpiles of data present cost, security and compliance issues, as well as challenges of usefulness. Enterprises will need to address this.
The importance of data management
In the coming years, organisations will inevitably throw hardware at the data explosion. This will often mean keeping existing storage systems on hand far longer than they may expect, as the budget available to transition archives to newer storage arrays will be limited by the concurrent need to add capacity for incoming data. This will affect procurement decisions, equipment lifecycle management, and hardware maintenance, support, and upgrade choices.
Ultimately, the data deluge is not solely a hardware problem. Much as with a family home bursting at the seams with clutter, the best, most cost-effective solution is not to buy a bigger house. The technology industry will need a Marie Kondo-esque dedication to tidying up. The choice of what data to keep and what to discard will become increasingly important if companies want to keep their heads above the IoT data floodwaters.
It will, therefore, be critical for organisations to develop strong, effective data management strategies that limit the information they retain to what is most valuable and impactful for their operations. The design of a data lifecycle road map, from the data’s original creation to its final destruction, must be on the agenda. Final destruction, for security, privacy, and compliance reasons, must be as permanent and irreversible as if the information fell into a gaping black hole.