--- My Inspiration ---
My first blog post was meant to be a perfectly constructed, informative "how-to", a well-balanced opinion article with multiple sources, or at least something carefully thought through. But it's taken me so damn long to post anything that I thought I should just get the ball rolling by posting a comment I've just written on a ZDNet article (under my "jamfuse" handle).
The article is by Larry Dignan, whom I hold in pretty high regard. It covers an IDC report on global data storage and business. You can see it here http://www.zdnet.com/a-look-at-a-7235-exabyte-world-7000022200/
--- My Thoughts ---
There is some good stuff here, but I think it starts to pile conjecture upon conjecture, and the thinking seems somewhat static, perhaps not taking account of the fluidity we are seeing enter system design and function.
"Tapes... will be tossed..." - well, generally yes, but tape is an incredibly dense medium and doesn't suffer electrostatic decay in the way that offline hard drives can.
"...information will be viewed as a natural resource" - OK, are we talking about "information" or "raw data"? For the latter, as a basic example, you can pipe a loop to /dev/null all day; raw data is never going to run out, so it's not a natural resource. For the former, creating information requires CPU time, memory and access to raw data (which we know can be infinite), and *maybe* storage if we want to keep the information we have created. Storage is the only one of these that is not transient. So as long as we adhere to a formula that goes something along the lines of...
"the cost of creating/buying storage dense enough to hold the information is less than or equal to the financial value of having the information"
... (which may be the formula the NSA use to justify their data centres), we will always collect more information, and therefore information is not a natural resource. Possibly the only way either could be considered a natural resource is if you chain the dependencies of their production all the way back to the extreme, where you end up at electricity, which may indeed rely on a natural resource! Don't get me wrong, though: there are plenty of big companies, such as ISPs, that would love you to think that data is a natural resource, because that way they could justify charging ridiculous amounts for bandwidth caps! Do ISPs see a yearly or twice-yearly cycle of data floods and droughts?
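The storage rule of thumb quoted above can be written as a one-line inequality. The symbols are illustrative and not from the article: for a piece of information I, C(I) stands for the cost of creating or buying storage dense enough to hold it, and V(I) for the financial value of having it. On this reading, information keeps being collected whenever

```latex
C_{\text{storage}}(I) \le V(I)
```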
"Owning the data will be everything. The vendors that capture the most data win. Period." - No, definitely not "Period". That's a silo, and that's a problem. For me this is an example of thinking from my generation, or even the one before mine. Do kids these days silo their music or stream it? When they are older and designing the kinds of systems we are designing today, I suspect they will adopt a different mentality: stream it when you need it. If your systems get so bloated that they can't move quickly, that's a problem.
Web 1.0 was all about hypertext, basically just text for human consumption. Web 2.0 was about rich content (video, sound, etc.), again for human consumption. Web 3.0 is said to be "the internet of things": a network where one object can talk to another, with data available via APIs in a machine-readable form. In other words, data for machine consumption.
So although we could keep building silos as long as the storage costs no more than the value of what's stored, doing so will make our systems so bloated that we can't turn them around quickly; we can't be agile in an era in which we need to be agile. But alongside the option to store everything, we see more and more options to fetch things via more and more APIs. This brings about an era in which companies or databases each store one type of data, and our systems can subscribe to many sources and create the information they need when they need it. Store the fundamental elements; don't store the compound that is created from those elements and can be recreated from them.
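To make the "store the elements, not the compound" idea a little more concrete, here is a minimal Python sketch. Everything in it is hypothetical: the two sources (`fetch_prices`, `fetch_stock_levels`), their contents and the `stock_value` function are made up for illustration, and in a real system each fetch would be a call to a separate provider's API. The derived figure is computed whenever it is asked for and never stored.

```python
"""A minimal sketch of "store the elements, compute the compound on demand".

The two sources below are hypothetical stand-ins for external APIs;
in a real system each would be a network call to a separate provider.
"""
from typing import Callable, Dict


def fetch_prices() -> Dict[str, float]:
    """Hypothetical source A: one provider stores one kind of data (prices)."""
    return {"widget": 2.50, "gadget": 10.00}


def fetch_stock_levels() -> Dict[str, int]:
    """Hypothetical source B: another provider, another kind of data (stock)."""
    return {"widget": 40, "gadget": 3}


def stock_value(prices: Callable[[], Dict[str, float]],
                stock: Callable[[], Dict[str, int]]) -> Dict[str, float]:
    """Derive the 'compound' (value of stock per item) at request time.

    Nothing derived is persisted: each call re-fetches the elements,
    so the result reflects the sources' current state.
    """
    p, s = prices(), stock()
    # Only items known to both sources can be valued.
    return {item: p[item] * s[item] for item in p.keys() & s.keys()}


if __name__ == "__main__":
    print(stock_value(fetch_prices, fetch_stock_levels))
```

The trade-off in this sketch is that every request pays the cost of fetching from the sources; a short-lived cache could soften that without turning the derived data into yet another silo.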
If we all silo up to the max, we will duplicate everything many times over and collapse under our own bloat. If we keep the data or information that is key to our businesses and offer or fetch the rest via APIs, then we can stand on each other's shoulders and progress and prosper. The vendors that create the most understandable information from diverse raw data sources will win. (In my opinion.)
So I take it you don't like it when developers just dump an entire XML block into an NVARCHAR database cell :)
I 100% agree with you; however, you will always find that business requirements are (more often than not) dictated by non-technical users, who are in business to make money. Their objective is the bottom line, and if saving a few days of development time is on the cards... they are gonna take it. What the world needs is more efficient tools and programming APIs that help developers identify and isolate the data that matters. Most developers are inherently lazy... that's what makes them so good at their jobs. So giving them tools that make it easy to sort data will help make this go away. There are some tools already out there... but we need more!