It's Time to Think About Data Differently


Man, being a creature of habit, tends to incorporate “skeuomorphic” elements into evolutionary designs to provide a level of “comfort” as we climb the ladder. Think about it: why does your computer need files and folders? That paradigm was carried forward from the physical world, yet it has little relevance to the digital one it is applied to.

Now some may argue this is a requirement for acceptability, or better put “understandability,” by the masses, as the path from evolution to revolution is in fact sigmoidal, and we Homo sapiens tend to be a funny bunch about dragging our baggage with us. However, to make that final leap we do have to leave that “baggage” behind, as it is cumulative in nature, and this brings me to the idea of data.

A computer does one of two things: either it is a “reductionary” device, where it is given massive amounts of data (as in “Big Data”) and asked to reduce it, or a “creationary” one, where it is given something small (typically an algorithm) and creates something from it. In the latter case, the large random number sets used in Monte Carlo simulations are an example. Now, since “man” (in the species sense) created the computer, he also applied his skeuomorphic concept of “data” to the model both operations, creation and reduction, depend upon, and that concept is now the bottleneck.
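To make the “creationary” case concrete, here is a minimal sketch (the function name and figures are illustrative, not from any particular system): a Monte Carlo estimate of pi, where a seed of a few bytes deterministically regenerates a million samples, so the “data” is really just the recipe plus the seed.

```python
import random

# A "creationary" operation: a tiny recipe (algorithm + seed) stands in
# for megabytes of sample data. Anyone holding the same recipe can
# regenerate the identical stream without it ever being stored or sent.
def monte_carlo_pi(seed: int, samples: int) -> float:
    rng = random.Random(seed)          # the entire "data set" lives here
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:       # point falls inside the unit circle
            inside += 1
    return 4 * inside / samples        # ratio of areas approximates pi

print(monte_carlo_pi(seed=42, samples=1_000_000))  # ~3.14 from a few bytes of recipe
```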

As we all know, data is growing exponentially, and I will spare you a rehash of the details; yet the explanation above is important to understand, as it is why data is growing. The First Law of Thermodynamics offers an analogy: we can only shift the information in a system, not create or destroy it, so we are left with a bit of a conundrum.

Let's step back for a moment and look at data, and at how a computer uses it today, to better understand how our applied skeuomorphism is holding us back. Data exists on a hard disk as a series of ones and zeros pressed very tightly together in a small sequential space on a round spinning platter. (For the moment we will forget about SSDs, as the distinction doesn't matter to the concept.) The point is that, for the most part, all of your data exists in “whole” form: your Miley Cyrus songs sit in a whole state on your hard drive. Now say you want to share one of those songs (legally) with a friend, so you tell your computer to “copy” the data to their computer across the country. What happens?

Well, first off, your computer will spend a substantial number of bytes (relative to the size of the data you wish to move) just to find and establish a connection with your friend's computer. Next your computer will say, “I have a one, please create a one on your side,” and so on until the process is complete, yet both computers will have spent (wasted) significantly more bytes than the file itself warrants. Note that this discussion isn't about protocols and the like, as all forms of communication have a cost (the Second Law of Thermodynamics sees to that), so we accept a measure of overhead as a given. But what if we didn't have to do this handshake for each and every byte of information?
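A rough sketch of that overhead, using standard figures (a 20-byte IPv4 header plus a 20-byte TCP header per packet, carried in 1,500-byte Ethernet frames); the arithmetic, not the exact numbers, is the point:

```python
# Back-of-envelope cost of shipping a file across a network.
# Standard figures: 20-byte IPv4 header + 20-byte TCP header per packet,
# inside 1,500-byte Ethernet frames (leaving 1,460 payload bytes each).
FILE_SIZE = 5 * 1024 * 1024      # a ~5 MB song
MTU, HEADERS = 1500, 40
PAYLOAD = MTU - HEADERS

packets = -(-FILE_SIZE // PAYLOAD)       # ceiling division
overhead = packets * HEADERS
print(f"{packets} packets, {overhead:,} header bytes "
      f"({100 * overhead / FILE_SIZE:.1f}% overhead), "
      "before counting handshakes, ACKs, and retransmits")
```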

How to get around this, you ask? The answer is simple: math. The computer is a math machine; in fact, that is all it knows, yet we spend large amounts of time and money forcing it to understand our perceived mental models of “data.” With this said, what if we reduced, say, a terabyte of data to just one algorithm, and instead of sending those billions or trillions of bytes we sent one formula? Now you might say we have already started down the sigmoidal road toward this with zip files and WAN compression, yet those are only circumstantial to the greater idea of everything actually being a formula.
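Zip-style compression is the primitive version of the idea: when data has structure, the description can be far smaller than the bytes. A minimal sketch with Python's zlib (the exact figures will vary by input):

```python
import zlib

# One megabyte of highly structured "data"...
data = b"0123456789" * 100_000          # 1,000,000 bytes

compressed = zlib.compress(data, 9)     # maximum compression level
print(len(data), "->", len(compressed), "bytes")   # ~1 MB -> roughly a kilobyte

# ...and the true "formula" is smaller still: the one-line expression
# defining `data` regenerates all million bytes exactly, in a few
# dozen characters.
```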

In the past, the slow processing abilities of CPUs were a limiting factor: most of the data taken in by a computer has analog origins, and reducing it to a single “formula,” if you will, was not reasonably possible. Today those chains are quickly falling away, and new abilities are being created every day.

Evidence of this can be seen in the growth of regex (regular expressions), where linguistic patterns are distilled into mathematical expressions that can be rapidly applied across vast amounts of data. This is part of how spam filters work, as attempting a string-for-string match across the millions of mails passing through those servers would be impossible. It is also how the NSA looks at all the data it does; what people miss is that there is too much to read, so the goal is to mathematically mine for items of interest, and algorithms allow that to happen.
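As a minimal illustration (the pattern and messages here are made up for the example): one compiled pattern stands in for a whole family of literal string comparisons, and scans each message in a single pass.

```python
import re

# One compiled pattern covers a family of spammy spellings that a
# string-for-string match would need many separate literals to catch:
# "free", "fr33", "winner", "w1nner", "click here", "cl1ck h3re", ...
spam = re.compile(r"\b(fr[e3][e3]|w[i1]nn?er|cl[i1]ck\s+h[e3]re)\b", re.IGNORECASE)

messages = [
    "Congratulations, you are a W1NNER! Cl1ck h3re to claim your prize.",
    "Minutes from Tuesday's planning meeting are attached.",
]
for msg in messages:
    verdict = "spam?" if spam.search(msg) else "ok"
    print(f"{verdict:5}  {msg[:48]}")
```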

Still not convinced this is possible? Well, you have to look no further than yourself, as you are in fact nothing more than an extremely large hard drive. What do I mean? Every cell in your body was built from a single root data model named DNA, yes, deoxyribonucleic acid. Composed of only five elements (hydrogen, oxygen, nitrogen, carbon, and phosphorus), it forms only four building blocks, guanine, adenine, thymine, and cytosine, just two bits of information per base. These four building blocks recombine to form our complete being, consciousness included, from this one seemingly simple data model.
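The two-bits-per-base arithmetic can be made literal. Here is a hypothetical sketch (the encoding table is the standard one, but the function is mine) that packs a DNA string into a quarter of the bytes by giving each base a 2-bit code:

```python
# Four symbols need only two bits each, so four bases pack into one byte.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}

def pack(dna: str) -> bytes:
    out, buf, nbits = bytearray(), 0, 0
    for base in dna:
        buf = (buf << 2) | CODE[base]   # append the 2-bit code
        nbits += 2
        if nbits == 8:                  # a full byte: flush it
            out.append(buf)
            buf, nbits = 0, 0
    if nbits:                           # pad the final partial byte
        out.append(buf << (8 - nbits))
    return bytes(out)

seq = "GATTACA" * 4                     # 28 bases
print(len(seq), "bases ->", len(pack(seq)), "bytes")   # 28 -> 7
```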

To extend this back to the digital world for a moment: what if we could store that terabyte of information we spoke of earlier in just one formula? The implied capabilities and savings would be enormous. Simply look at the energy (typically electrical) spent storing all this data on spinning hard disks, or refreshing SSDs on every read cycle.
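A back-of-envelope sketch of that energy bill; the figures (roughly 8 W for a commodity 4 TB drive, powered around the clock) are assumptions for illustration, not measurements:

```python
# Assumed figures for illustration only: a commodity 4 TB drive
# drawing ~8 W whenever it is spinning, powered 24/7.
DRIVE_WATTS = 8
DRIVE_TB = 4
HOURS_PER_YEAR = 24 * 365

kwh_per_tb_year = DRIVE_WATTS * HOURS_PER_YEAR / 1000 / DRIVE_TB
print(f"~{kwh_per_tb_year:.0f} kWh per terabyte-year just to keep "
      "the platters spinning; a formula, by contrast, costs only "
      "the compute needed to regenerate the data on demand.")
```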

Keep in mind that while we like to think our data is “unique” to us, in the pictures we take and the music we record, this is really not the case: while the possibility for data to be infinite exists, the probability that it is finite is statistically overwhelming…
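That bet, that most data repeats, is exactly what deduplicating storage exploits: store each distinct block once, under a fingerprint of its content. A minimal sketch using SHA-256 (a toy store, not any particular product):

```python
import hashlib

store = {}                              # fingerprint -> the actual bytes

def put(blob: bytes) -> str:
    key = hashlib.sha256(blob).hexdigest()
    store.setdefault(key, blob)         # identical content is stored once
    return key                          # callers keep only the fingerprint

song = b"\x01\x02\x03" * 1_000_000
k1 = put(song)                          # your copy
k2 = put(song)                          # your friend's "copy"
print(k1 == k2, "-> one stored blob for", len(store), "key(s)")
```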