Big Data == Big Variety & Big Variability

I read a decent amount about Big Data in the trade blogs.  There’s a near-universal assumption among tech authors that Big Data is new because the Volume and Velocity of incoming data were simply too big to handle previously.  That’s patently false.  I worked in Telecom for nearly 10 years, where data volumes, already arriving in structured form, exceeded, both in motion and at rest, what would classify as Big Data today.  Techniques were developed to deal very successfully with that Volume and Velocity, even providing detailed and quick reporting across Terabytes or Petabytes at rest.

That’s not to say that technology was easily accessible to everyone.  The legacy BI vendors will sell it to you, but the costs exceed what most enterprises can afford.  What Hadoop and others have done in the Big Data space is commoditize it to the point where it’s now accessible to companies well outside the Fortune 100.  Where all of these solutions fall down, until we get into true Big Data stores like Hadoop, Splunk, Cassandra, MongoDB, and the like, is in dealing with Variety and Variability.

The Achilles heel of legacy BI technologies is requiring that data be put into a structure at insert time.  The structure also needs to be consistent within each data type, and all data types must be known in advance.  This is the Variety problem.  We don’t know, when we begin to build our data warehouse, what interesting data we’ll find.  What we do know is that we want to stuff it somewhere so that it can later be added to the soup that is our intelligence gained from disparate data types.

What happens when that data format changes?  Again, this is the true differentiator for Big Data technologies.  Anyone who has worked in a legacy BI environment can tell you that simply adding a field to a report can take months, especially if that change requires going all the way through the architecture to add the field to every processing step from the raw data forward.  This is the Variability problem, and the ability to deal with it is simply non-existent in legacy BI environments.
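To make the contrast concrete, here is a minimal sketch, in Python with made-up order records, of the schema-on-read approach these Big Data stores take: structure is applied when the data is read, so a new field showing up in the feed requires no schema migration and no ETL change.

```python
import json

# Two generations of the same hypothetical feed; the second record
# carries a field that didn't exist when the pipeline was first built.
raw_events = [
    '{"order_id": 1, "total": 19.99}',
    '{"order_id": 2, "total": 5.00, "coupon_code": "SAVE10"}',
]

# Schema-on-read: we impose structure at query time, so the new field
# flows through, and older records simply report a default.
for line in raw_events:
    event = json.loads(line)
    coupon = event.get("coupon_code", "none")  # tolerate absent fields
    print(event["order_id"], event["total"], coupon)
```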

The Data Warehouse of the future won’t sit in a structured database.  Some sources of that data will remain structured, because it will make sense to maintain financials in an ACID-compliant RDBMS, but all of that data will be fed into an unstructured repository, likely HDFS, as files.  There, semi-structured data can simply be joined to other semi-structured data through query languages that are currently in their infancy or not yet written.  The concepts of rigid tables, OLAP cubes, and the like will become foreign, and the time sunk into writing ETL processes and managing environments full of copies of that structure to test migrating changes will be reclaimed for use in gaining intelligence about the Enterprise.
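As one hedged illustration using an engine from this emerging space, Spark SQL, here is roughly what such a join over raw files in HDFS could look like; the paths and field names are invented for the example.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("warehouse-sketch").getOrCreate()

# Semi-structured files landed in HDFS as-is; Spark infers a schema at read time.
orders = spark.read.json("hdfs:///landing/finance_export/")  # hypothetical path
clicks = spark.read.json("hdfs:///landing/clickstream/")     # hypothetical path

orders.createOrReplaceTempView("orders")
clicks.createOrReplaceTempView("clicks")

# A join across two semi-structured sources, with no tables, cubes,
# or ETL processes defined up front.
spark.sql("""
    SELECT o.customer_id, COUNT(c.page) AS pages_viewed
    FROM orders o
    JOIN clicks c ON c.customer_id = o.customer_id
    GROUP BY o.customer_id
""").show()
```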

The big promise of Big Data isn’t that we’re going to see so much more; the big promise is that we’ll suddenly be free of the shackles of legacy technology that’s been stretched well beyond its logical breaking point!  The man-decades of lost productivity industry-wide will suddenly be freed and unleashed like a tsunami.  Insight will abound, decisions will be informed, and IT will finally begin to deliver on promises made decades ago to deliver true, real value to their businesses.  All because we’re going to free the most valuable IT asset, the IT worker, now given the more glamorous title of Data Scientist, from the horrendously unproductive way we’ve been managing our data for the last 20 years.  Be prepared, the revolution is coming.

 


On living without interruption

Dave Winer is inspiring me to write smaller posts on things that come to mind.  I think it’s good for the soul.

One of the most telling things about changing jobs to a completely different career, from IT Operations to Marketing, is that there are some people who, while well paid, really get treated like shit.  All of you IT Operations professionals out there, I am truly sorry for how your management and employers mistreat you.  And for you individual contributors out there, I’m going to go one step further, and I’m going to sincerely apologize to your bosses, probably up to Director level, although some VPs probably get the shaft too.  Your middle management’s life sucks, and, assuming they do right by you, you should be thankful they’re willing to work 24 hours a day, 7 days a week, with no on-call rotation.

IT Operations people’s lives are constantly getting disrupted.  This is the case in many operational jobs, but none more so than IT Operations.  Sales Vice Presidents don’t get called once or twice every night to approve something or to be informed of something, and they certainly don’t work many weekends or miss vacations because some system decided out of the blue to go on the fritz.

Many times during my career people would tell me that there were other jobs out there that didn’t have the kind of time commitments, constant interruptions, and general downsides that IT Operations had.  I always laughed them off, told them I was sure they were right, but now that I’m on the other side I have to say that life is truly better when your phone doesn’t ring every night with a problem.  It’s like when you’re young and poor and you’re afraid to answer the phone because of some bill collector on the other end of the line.  Once you get older and start paying your bills on time, the phone ringing no longer evokes a visceral fear reaction, and it’s the same after you change jobs.  There is life after Operations, and while it isn’t for everyone, after 15 years of it, it was for me.


Customer Centricity is Evil, in IT

I was watching a video from Startup Lessons Learned in which a brief comment was made about customer centricity at IMVU, and it reminded me of a nagging thought I’ve had in recent months.  Customer Centricity is evil, in IT.  That’s a bold statement, so let me explain.  There are a lot of mechanisms for providing the customer what they need, and perhaps even what they want, without placing them at the center of your activities.

Being customer centric, as implemented in most places I’ve seen, means an unwavering attitude inside the organization that it will bend over backwards to provide the customer what they want.  This leads to some unintended consequences, such as a continual focus on the functional rather than the non-functional.  For those who spend little time doing software project, program, or requirements management: functional requirements are things like “I want the software to be able to calculate taxes and print them on a receipt,” while non-functional requirements are things like “I want the software to always work.”  Bending toward customer centricity and an attitude that the customer is always right, especially in the world of custom software, means the resources of the organization are constantly allocated to the functional while the non-functional is largely ignored until it’s too late.

This attitude can and will, at some point in the future, leave you in a situation where the software the IT organization is producing begins to collapse under the weight of its own scale and complexity.  The nimbleness once felt, perhaps as recently as a few years back, will give way to something cumbersome, as continuous iterations of “get it done, the customer needs it right away” lead to band-aids, patchwork engineering, and layer upon layer of complexity.  Time for things like application rationalization, engineering and architecture improvements, and infrastructure improvements is pushed aside to make room in the capacity model for more functional requirements for the customers.

The solution?  Simple IT.  A culture of simplicity is required around every change, to drive compromise between what the customer wants and needs and what IT can deliver and maintain.  Constant reinforcement of that culture is required to drive every individual contributor to ask, “is this additional complexity required?”  It will drive a move from monolithic to modular, from n to n-1 layers of abstraction, from customization to off-the-shelf.  No organization is perfect, so additional iterations of software should continually simplify and rationalize the changes that, in hindsight, complicated rather than simplified.  The customer demands the non-functional without ever asking for it, and simplicity is how they get it.

What does this mean for the customer?  The customer asks for 30 reports and gets 5, because that’s actually to their benefit.  The customer wants business rules that force agents to jump through hoops trying to upsell customers, which leads to ridiculous clicking and wasted agent time; IT pushes toward training instead, because that complexity hurts the users’ perception of the software and impacts their behavior negatively.  The customer asks for a different pricing model for every market; IT refuses to support it.  Will this lead to significant conflict and accusations that IT is not responsive to the business?  Absolutely!  Perhaps the company will go through three IT leadership teams before they realize their behavior is the problem, or perhaps they never will.  Either way, IT has a duty to the business to draw a line between what is possible and what is right, and IT is in a unique position to provide the business with facts and data about how decisions that drive complexity hurt not only IT, but the business itself.  Simplicity is, in reality, complicated to deliver; it’s the enemy of mediocrity; it’s the champion of forethought and design; and a culture centered on simplicity is the only way to deliver years of iterative software without eventually being forced to spend significant amounts of money, time, and resources redoing what customer centricity, and by extension complexity, created.


Time to start writing thoughts again

A couple of things have happened recently that have convinced me it’s time to start writing my thoughts publicly again.  First, I’m relatively certain people do not read this, at least not regularly.  Second, even though I have advanced significantly in my career and am now a very competent middle manager, I believe my happiness in my work life is suffering simply because we seem to defer to political correctness.  Making sure no one’s feelings are hurt, rather than running an organization as a meritocracy where the best ideas win, seems to be the lay of the land in American business these days.  I used to not be afraid of putting my thoughts out there; now I seem to be in constant fear that being liked is the most important aspect of career advancement.  While I have come very, very far in the last few years in learning to be effective at change in a large organization by being well liked and respected, I have decided I will no longer place being liked ahead of stating what I think is in the best interest of our organization.  That does not mean I intend to go back to being the daft prick I have been in former years, but I will be firm in my thoughts on the direction our organization should be taking.  I want these ideas to stand on their own, so this will be immediately followed by another post on being too customer centric.


Down with P2P, Part 2

Mr. Cuban has another post up about P2P.  I want to refine his model a bit and propose one that I think would work far better.  First, there’s something fundamental that most people don’t get about the Internet business.  While the Internet was designed to be a P2P medium, with end-to-end connectivity between all nodes, it has largely become a publish-and-subscribe medium, more like television and less like the phone system.  Since people primarily want to consume the content that’s available “out there” and are less interested in sending things “out there,” the technology and the service offerings have been designed to offer bandwidth to the home user asymmetrically.  This means that instead of something like a T1, which offers 1.544 megabits per second symmetrically (meaning you can send and receive at the full rate, all the time), home Internet access is sold asymmetrically (for example, I have 8 megabits downstream and 2 megabits upstream).  At the provider level, however, bandwidth is sold symmetrically.  Providers are buying large pipes (OC-48 at 2.4 gigabits, OC-192 at 10 gigabits, etc.), which provide as much upstream as downstream, but since their customers buy asymmetrically, they generally have large amounts of upstream capacity available.
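Some back-of-the-envelope arithmetic, with invented but plausible numbers, shows where that upstream headroom comes from:

```python
# Illustrative numbers only: one OC-48 backbone link (symmetric 2.4 Gbps
# each way), subscribers on 8/2 Mbps asymmetric plans, and an assumed
# 20:1 downstream oversubscription ratio.
oc48_mbps = 2400            # capacity in each direction
down_mbps, up_mbps = 8, 2   # per-subscriber plan
oversub = 20                # assumed oversubscription ratio

subscribers = oc48_mbps * oversub // down_mbps   # how many 8 Mbps plans fit
peak_upstream = subscribers * up_mbps / oversub  # expected upstream at peak

print(subscribers)                    # 6000 subscribers
print(peak_upstream, "/", oc48_mbps)  # 600.0 / 2400 Mbps: upstream headroom
```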

The problem with the unlimited model is that people will use more on an unlimited plan than they normally would.  Think about the people who feel the need to gorge themselves at a buffet “to get their money’s worth.”  This isn’t necessarily a problem.  The company I work for sells unlimited wireless.  We can do this because there is a significant amount of cost that can be removed, as well as a significant amount of profit embedded in the wireless business that we eschew in favor of servicing an underserved customer base.  It’s working well for us now.  However, we don’t work in a business where any given customer can use 100 or 1000 times more of what we’re selling than another.  This makes for an incredibly difficult problem for ISPs to manage.

Mr. Cuban posits that it would be best to start charging for upstream bandwidth, which would limit the amount of seeding done by P2P users.  However, it’s not the seeding that’s slowing down the network, it’s the downstream; most protocols are set up to let you download more the more you seed.  So while his model would work, I think there’s a far simpler model that would work for everyone, although it would surely piss off the net neutrality folks.  Basically, the idea is to create two tiers of service.  The first is a metered model, which is what the providers would primarily be selling.  It would offer something like 100 to 200 gigabytes of transfer per month, which is far more than the average customer uses.  That’s enough to do some P2P transfers without blowing through your bucket, but it keeps the network abusers (the ones downloading terabytes a month) off this plan.  This would be the premier plan: in exchange for giving up unlimited transfer, you are placed into a QoS bucket with a lower drop priority than unlimited customers, so your packets are dropped last.  The second tier is the existing unlimited plan.  It could be priced above the metered plan or the same, and either way has pluses and minuses, but it would offer truly unlimited service: no letters from the ISP about abuse, etc.  The customer is made aware that they are offered the same maximum downstream and upstream rates, but that they are receiving a lower class of service; they are placed into the lowest QoS bucket.  Absent congestion, no one notices any difference.  During peak times, when the unlimited users are filling up the pipes, the metered users are still receiving high-quality, always-on Internet access, and the unmetered users still get to download to their heart’s content.
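A toy model of the congestion behavior, with invented subscriber names and numbers, shows why neither tier has anything to complain about off-peak, while the unlimited users absorb the squeeze at peak:

```python
def allocate(capacity_mbps, demands):
    """Serve tier 0 (metered/premier) first, then split what's left
    among tier 1 (unlimited). demands: list of (name, tier, demand_mbps)."""
    remaining = float(capacity_mbps)
    alloc = {}
    for tier in (0, 1):
        group = [d for d in demands if d[1] == tier]
        total = sum(d[2] for d in group)
        for name, _, demand in group:
            # full rate if the whole tier fits, otherwise a proportional share
            share = demand if total <= remaining else remaining * demand / total
            alloc[name] = round(share, 1)
        remaining = max(0.0, remaining - min(total, remaining))
    return alloc

# Off-peak: everyone gets their full demand.
print(allocate(100, [("alice", 0, 8), ("bob", 1, 8), ("carol", 1, 8)]))
# Peak congestion: the metered user still gets 8; unlimited users split the rest.
print(allocate(12, [("alice", 0, 8), ("bob", 1, 8), ("carol", 1, 8)]))
```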

This will require the same shaping devices the ISPs are already using to control inbound bandwidth, but rather than shaping at the protocol level, they will shape at the subscriber level.  The technology to accommodate this is already in place (we have a couple of devices from Cisco that will do exactly that).  The primary problem in implementing this strategy for most ISPs will be on the billing and provisioning side, but the software to do that is readily available.

The freeloaders will still get pissed off.  They think that if they’re paying for 10 megabits of downstream bandwidth, they should get it, all the time.  They don’t understand the technical problems with actually filling a pipe (TCP wasn’t designed for fat-pipe, high-latency networks), and they don’t understand that there’s no feasible business model for selling high-bandwidth connections such that everyone can light up at once and have it work.  Hell, not even the telephone network can accommodate that, which is why during emergencies people are asked to minimize their phone usage; the phone system can run into capacity issues too.  The average consumer might be upset as well, thinking they’re getting less for their money than they used to (“I used to have unlimited, now I’m metered”), but I think this can be solved by education and marketing (“For the same price you’ve always paid, you will now be a premium customer and always have access to all the bandwidth you want, so long as you’re willing to limit your monthly transfers.”)  And both groups are offered the alternative plan should they decide the downsides of the plan they’ve chosen outweigh the benefits of the other.  Everyone has options.
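The fat-pipe problem is just arithmetic: classic TCP throughput is capped at window size divided by round-trip time, so without window scaling a single connection can’t fill a big pipe over a long path.  A quick illustration with typical numbers:

```python
# Classic TCP without window scaling: throughput <= window / RTT.
window_bytes = 64 * 1024  # the traditional 64 KB maximum receive window
rtt_seconds = 0.1         # 100 ms round trip, e.g. a cross-country path

max_mbps = (window_bytes * 8) / rtt_seconds / 1_000_000
print(round(max_mbps, 1))  # ~5.2 Mbps, no matter how fat the pipe is
```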

This will piss off the net neutrality folks who think the network should always be best-effort, but it’s a pretty justifiable position.  The ISPs have a right to frame their service to their customers how they choose, and this does not affect how services on the Internet are delivered on a per-site basis, merely on a per-subscriber, per-plan basis.  It’s a legitimate business case that does not affect customers’ equal access to Internet resources.

In the end, I think it’s a compromise everyone can live with.  The technology is already in place, and I think the missing pieces would be relatively inexpensive to implement given the upsides to the business.  What do you think, Mark?


Down with P2P

Strangely, I find myself agreeing with Mark Cuban.  I’ve spent some time thinking back over what I’ve downloaded via P2P applications.  I’ve used BitTorrent and earlier P2P technologies to download many things over the years, but I can think of only one legitimate application: Blizzard using BitTorrent for WoW client distribution.  The potential there is immense; however, the only legitimate reason for Blizzard to use BitTorrent for distribution is to save on bandwidth costs on their end.  A company like Akamai could easily provide a similar or superior experience for most users, but it would cost Blizzard significantly more than their current distribution model.  I find it ironic that one of the most successful legitimate users of P2P, probably one of the most successful pay services on the Internet, is primarily using it to offload costs onto the ISPs.  The only thing I’d miss about losing the various P2P applications is the ability to download television seasons during the summer for viewing, mainly because there isn’t a suitable for-pay alternative.

Honestly, the striking fact is that 60% of Internet traffic is P2P, and that was from a report last year.  It’s certainly not going down; if anything, it’s increasing.  That means every bit of traffic normal users generate (web browsing, email, etc.) is fighting for bandwidth on networks that are largely congested, simply because as soon as the ISPs provision more bandwidth, the P2P users fill up the pipes.  We can get into the oversubscription arguments, but frankly, oversubscription is the only way the business model works.  If ISPs had to provision enough bandwidth for everyone to fully light up their last-mile pipe to the home, they’d go out of business.  What this means, and what I’ve specifically been noticing more in the past few weeks as I’ve traveled, is that my service is starting to suffer.  Every time I get to a hotel, the damn pipe is filled and I can barely VPN into work to get email.  Even back in Arkansas, I’m noticing that my mother-in-law’s Cox connection out in Greenwood appears to be slow.  It’s almost impossible to fairly diagnose exactly why these connections are slower than I expect without access to the network management systems of the various places I’ve been, but a safe bet would certainly be lack of upstream bandwidth (especially during peak hours) due to P2P users.

If I’m starting to get the feeling like my service is suffering, then shape all the damn P2P traffic down to 0.  Honestly, if I get better service, I’ll probably not lament the loss of my ability to make 250 TCP connections at once to pull down files in little increments at 10KB/sec per connection.  Maybe without the ability to go to the alternative and get the content for free, this will force consumers to start demanding acceptable for pay alternatives for the things they’re getting illegally currently.  I just don’t see anything getting much better in the current stalemate we’re in without some sort of drastic measures.  I just never thought I’d be siding with the providers on this particular issue.


My FireAnt Story

So, in case you hadn’t seen the news, FireAnt was acquired by Sonic Mountain (Odeo).  You can read recaps of the news on two of my favorite blog networks, NewTeeVee (run by Om Malik) and TechCrunch (by Mike Arrington).

I came to be involved in FireAnt through my connections to Jay Dedman and Josh Kinberg.  We had some discussions at Vloggercon in July of 2005, which extended into the following months, about my helping them get FireAnt off the ground.  I had started a project I was calling MediaFeedr, which would poll RSS feeds, examine any links, and then generate a new RSS 2.0 feed with enclosures for downloading into FireAnt.  The theory was that you could put any feed into MediaFeedr and come out with any linked content as enclosures.  In reality, it never really got out of testing, but the initial feedback was good and I was proud of the code and the idea.
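The MediaFeedr code itself is long gone, but a rough sketch of the core idea, assuming the feedparser library and with the helper name and media-type handling invented for illustration, would look something like this:

```python
import feedparser  # third-party: pip install feedparser
from xml.sax.saxutils import escape, quoteattr

MEDIA_EXTENSIONS = (".mp4", ".mov", ".avi", ".mp3")  # assumed filter

def links_to_enclosures(feed_url):
    """Poll a feed and wrap its media links as RSS 2.0 enclosure items."""
    parsed = feedparser.parse(feed_url)
    items = []
    for entry in parsed.entries:
        for link in entry.get("links", []):
            href = link.get("href", "")
            if href.lower().endswith(MEDIA_EXTENSIONS):
                # MIME type hard-coded for brevity; a real version would guess it.
                items.append(
                    '<item><title>%s</title><enclosure url=%s type="video/mp4"/></item>'
                    % (escape(entry.get("title", "")), quoteattr(href))
                )
    return '<rss version="2.0"><channel>%s</channel></rss>' % "".join(items)

# Hypothetical feed URL; the real service polled many feeds on a schedule.
print(links_to_enclosures("http://example.com/videoblog/feed.xml"))
```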

Jay and Josh were in need of a directory.  Josh had put together some rudimentary code to implement server-side components tying together the Mac and PC versions of FireAnt, but while Josh is an excellent visionary and a good leader, he is by his own admission a pretty poor coder.  I took the best of what I had and the best of what Josh had developed, and we built a videoblogging directory and some really innovative server-side features to go along with the video aggregation clients.  We spent months developing it, and we released it to the public on January 24th of 2006 (initial TechCrunch coverage can be found here).  Ironically, we were directly competing with Odeo at the time for one of the best directories available on the web.  It was developed with AJAX, which at the time was still fairly new and required a lot of hand-coding of JavaScript.

I was incredibly proud of the work I had done, but even by that point it was becoming obvious that the things we had thought were important weren’t what the market felt was important.  YouTube had become huge in the course of a year, and Flash-based web video was where the traffic and the money were.  The idea of aggregating different forms of video was falling by the wayside (Flash was incredibly hard to play in a PC-based client, and for the most part no sites supported RSS 2.0 with media enclosures).  After a successful launch, but with a limit to the amount of video content obtainable through podcasting, I left in March of 2006, shortly before Katie was born, to pursue other opportunities and to limit my work schedule so I could spend time with my newborn child.

What went wrong, then?  I’ve had over a year to reflect on this, and I think I can boil it down to a few choice areas where we went wrong:

  • Too much focus on the business and not enough focus on the technology
    • We brought in BizDev people very early in the process, in fact before I even officially joined the company.
    • Our BizDev people were unsuccessful at selling the technology.  The simple fact is, they were opportunists looking to make a quick buck who didn’t really believe in the company beyond thinking they had a gravy train to ride.  The early stages of a startup should focus on the technology first and the business second.
  • Poor initial design of the business and ownership structure
    • The initial design of the business was a 5-way partnership between two visionaries, two developers, and one business development guy.  First of all, equal partnerships never work.  There was no clear leader and far too many chiefs without enough Indians.  When I was brought in, the initial founders were reluctant to give up more ownership, since the structure was already fairly diluted as it was.
  • We bet wrong
    • We bet people wanted offline content and simple aggregation of feeds from many websites across the Internet.  The fact was, people wanted one destination in their web browser to view content.  YouTube won, we lost.

There were great people involved in the founding of the company, but there were just too many.  The next startup I do will have a clear leader and a core set of technology people, and we’ll worry about making money last.  There just isn’t enough of a small company to split it 7 ways; it should be split three ways, with a quarter left over for the rest to come.  The development people, the ones doing the work to get the technology off the ground, should come first.  I’m slightly bitter over the fact that I worked hundreds of hours and at the end of the whole story ended up with virtually none of the company.  The technology I developed was critical to the company’s initial success, and I felt from the beginning that even though my work was highly valued, the ownership percentage was never ponied up.  This is probably why I left early and didn’t stick with the project.  Had I stuck with it and not run out of personal funds, I think we could have been much more successful.  There were also numerous problems with the client development founders having to work day jobs.  Thanks to my severance from Cingular, I was at the time the best suited financially to work for no money, and I was rewarded the least.

While this may seem harsh to the people who were involved with the company, I want to point out that I feel no ill will toward the people I worked with.  Mistakes were made all around, and I have the highest respect for Josh, Jay, Daniel, and Erik, who were involved in the project during my tenure.  They are all excellent people, and I’d work with all of them again.  I note these things largely for my own reference, and I point them out so that if I ever team up with these people again, we can have an open and honest discussion of our mistakes and not repeat them.  This was a learning experience for all of us, and I hope that some time in the future I can find a way to work with these people again.

I’d especially like to point out Josh’s effort.  Josh stuck with FireAnt from the beginning to the end.  He sacrificed far more than any of the rest of us, even delaying his wedding so that he could see this through.  I consider Josh a close personal friend, and I’d jump at the chance to work with him again.  He is an excellent person of the highest moral caliber.  He has endured personal threats and personal hardship, and he saw this project through while the rest of us moved on.  I have the utmost respect for the sacrifices he made, and I tip my hat to the Sonic Mountain team: more than the technology we developed, they got the best part of FireAnt when they got Josh.

You can still see the technology I developed for FireAnt at getfireant.com.  Some of our more unscrupulous shareholders stole fireant.tv as part of a petty personal squabble, but at least the site is still available there.  To those of you shareholders who were involved in that: shame on you.  Being involved in a small company with no revenue is about sacrifice, dedication, and pursuing your vision, not about cashing out.  Stealing money, lying, and personal threats are no way to end a failed startup, and I hope you feel ashamed of your behavior.  You know who you are.

 Jay’s thoughts can be viewed here.  Josh’s thoughts can be viewed here.