AI Developer, A Job For Life.

Last year we wrote about the No Free Lunch Theorem (NFLT) and how it relates to AI (among other things). In this recent Wired article, that prediction seems to be coming true. Deep Learning, the technology that helped AI make significant leaps in performance, has limitations. These limitations, as reported in the article, cannot necessarily be overcome with more compute power.

As the NFLT states (paraphrased): an algorithm that is good at doing X cannot also be good at doing Not-X. Deep Learning models that have success in one area are not guaranteed success in other areas; in fact, the opposite tends to be true. This is the NFLT in action, and in many ways specialized instances of AI-based systems were an inevitable consequence of it.
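For the curious, here is a minimal sketch of the formal version, from Wolpert and Macready's 1997 paper, with the notation reproduced from memory (treat it as illustrative rather than definitive): averaged over every possible objective function, any two algorithms perform identically, so gains on one class of problems are paid for on another.

    % No Free Lunch (Wolpert & Macready, 1997), paraphrased:
    % for any two algorithms a_1 and a_2, summed over all objective
    % functions f, the probability of observing a given sequence of
    % cost values d_m^y after m evaluations is identical.
    \sum_{f} P(d_m^y \mid f, m, a_1) = \sum_{f} P(d_m^y \mid f, m, a_2)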

This has implications for the broader adoption of AI. For example, there can be no out-of-the-box AI "system". Implementing an AI solution based on the current state of the art is much like building a railway system: it needs to adapt to the local terrain. A firm can't take a system from another firm or an AI-solutions provider and hope it will be a turn-key operation. I guess it's in the name, "Deep Learning": the "Deep" refers to deep domain, i.e. a specific use case, and not necessarily deep thinking.

This is great news if you are an AI developer or have experience building AI systems. You are the house builder of the modern age, and your talents will always be in demand - unless someone automates AI-system implementation.

UPDATE: A16Z wrote this piece, which supports my thesis.

Beware AI Homogenization

Many firms (Amazon, Google, etc.) are touting their plug-and-play AI and machine learning toolkits as a quick way for companies to adopt these new technologies without having to invest resources in building their own.

It sounds like a good idea, but I challenge that. If data is going to drive the new economy, it will be a firm's analytics capabilities that give it a competitive advantage. In the short term, adopting a third-party framework for analytics will move a firm up the learning curve faster. Over time that competitive edge becomes blunter, as more firms in a sector start to use the same frameworks in the race to be "first".

This homogenization will be good for a sector as a whole, but firms competing in it will soon find themselves locked back into trench warfare with their competitors. Retail distribution is a good example: do retailers use a third-party distribution network, or do they buy and maintain their own fleet? Using a third-party distributor saves upfront capex, but it forfeits an area of competitive advantage. Building their own fleet, while more costly, gives a retailer optionality over growth and expansion plans.

The same is true in the rush for AI/ML capabilities. While the concepts of AI/ML will be the same for all firms, their integration and application have to vary from firm to firm to preserve their potential for providing lasting competitive advantage. The majority of firms we have spoken to are developing their own toolkits; they might use established infrastructure providers, but everything else is custom and proprietary. This seems to be the smart way to go.

Data Moats and Brand Growth



[Image: 19th-century print of Brontosaurus excelsus by Joseph Smit]

Once brands and companies embrace data, the biggest challenge they face for long-term growth, and survival, is ensuring the data they control has a broad scope - i.e. it allows them to look to the edges of their vertical, and beyond. When Warren Buffett chooses firms to invest in, he looks at what he describes as their moat, whether economic, IP-related, or technological. A firm's moat helps insulate it from attack and allows it to weather economic downturns. In the data age, building broad-scope data sources is an extra moat for firms.

Most companies in established verticals are either competing with Amazon or worried they will end up competing with Amazon. As with Google, Amazon gets a wide and detailed view of customer behaviors and trends from its many businesses. Just three of those business units - the online store, web services, and Amazon Video - provide a rich understanding of consumers and their preferences that Amazon can use to identify opportunities to launch new businesses.

Amazon is clearly ahead of the game, and if it doesn't make a misstep, the lack of real competition this early on will no doubt allow the power law of growth to take hold in many verticals. As Marc Andreessen wrote in 2011, "software is eating the world", and Amazon is doing just that. No doubt Google and Facebook are on a similar path and are equally hungry.

For the rest of the commercial world, the sage advice would be to build broad-scope data sources that you control, rather than just enhancing analytics capabilities. Data sources that provide insight into incremental audiences are crucial to audience and sales growth.

Firms should invest in creating new behavioral datasets that they can control (analyze, shape, run experiments with). Ultimately, future success will be determined by whether firms can create demand for their products by changing people's behaviors and creating incremental audiences. Looking beyond their verticals, and thinking about growing their categories, is key to this. It can only be achieved by committing to a data strategy that encompasses developing broad-scope, deep data sources as well as advanced analytics.

Building the Future of Machine Learning and Analytics. Right Here, Right Now.

TechCrunch recently published an article describing what I am building with the Knowledge Leaps platform (check out the website here).

Knowledge Leaps is a soup-to-nuts data management and analytics platform. With a focus on data engineering, the platform is aimed at helping people prepare data in readiness for predictive modeling.

The first step to incorporating AI into an analytics process is to build an application that automates the grunt work. The effort is in cleaning data, mapping it, and converting it to the right structure for further manipulation. It's time-consuming but can be systematized. The Knowledge Leaps application does this, right now. It seamlessly converts any data structure into user-level data using a simple interface, perfect for those who aren't data scientists.
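To make that concrete, here is a minimal sketch of the kind of reshaping involved - turning event-level records into one row per user. This is an illustration of the general technique, not the platform's actual code; the column names and the pandas-based approach are my assumptions.

    # Illustrative only: reshape event-level records into user-level data.
    # Column names ("user_id", "event", "value") are assumed for the example.
    import pandas as pd

    events = pd.DataFrame({
        "user_id": ["u1", "u1", "u2", "u2", "u2"],
        "event":   ["view", "purchase", "view", "view", "purchase"],
        "value":   [0.0, 25.0, 0.0, 0.0, 40.0],
    })

    # One row per user: count each event type, then add total spend.
    user_level = (
        events.pivot_table(index="user_id", columns="event",
                           values="value", aggfunc="count", fill_value=0)
        .join(events.groupby("user_id")["value"].sum().rename("total_spend"))
        .reset_index()
    )
    print(user_level)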

Any data can then be used in classification models, built with an unbiased algorithm and combined with k-fold cross-validation for rigorous, objective testing. This is just the tip of the iceberg of its current, and future, functionality.
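For readers unfamiliar with the testing approach, this is roughly what k-fold cross-validation looks like in practice - a sketch using scikit-learn with a placeholder model and dataset, not Knowledge Leaps' own implementation.

    # Illustrative k-fold cross-validation (placeholder model and data).
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import KFold, cross_val_score

    X, y = load_breast_cancer(return_X_y=True)
    model = LogisticRegression(max_iter=5000)

    # 5 folds: train on 4/5 of the data, test on the held-out 1/5, rotate,
    # and report the accuracy on each unseen fold.
    cv = KFold(n_splits=5, shuffle=True, random_state=0)
    scores = cross_val_score(model, X, y, cv=cv)
    print(f"fold accuracies: {scores.round(3)}, mean: {scores.mean():.3f}")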

Onward, to the future of analytics.

When Do We Start Working For Computers?

I have done some quick back-of-the-envelope calculations on the progress of AI, trying to estimate how much progress has been made vs. how many job-related functions and activities are left to automate.

On AngelList and Crunchbase there are a total of 4,830 AI start-ups listed (assuming the two lists contain zero duplicates). To figure out how many unique AI tools and capabilities there are, let's assume the following:

  1. All these companies have a working product,
  2. Their products are unique and have no competitors,
  3. They are all aimed at automating a specific job function, and
  4. These start-ups represent only 30% of the universe of AI-focused companies.

This gives us a pool of 16,100 unique, operational AI capabilities (4,830 ÷ 0.30). These capabilities will be in deep domains (where current AI technology is most successful), such as booking a meeting between two people via email.

If we compare this to the number of domain-specific activities in the world of work, we can see how far AI has come and how far it has to go before we are all working for the computers. Using US government data, there are 820 different occupations, and stock markets list 212 different industrial categories. If we make the following set of assumptions:

  1. 50% of all occupations exist in each industrial category, and
  2. Each occupation has 50 discrete activities.

This gives us a total of roughly 4.35 million different occupational activities that could be automated using AI (820 × 50% × 212 × 50 = 4,346,000). In other words, at its most optimistic, current AI tools and processes could automate 0.37% of our current job functions. We have come a long way, but there is still a long way to go before we are out of work. As William Gibson said, "The future is already here, it's just not evenly distributed."
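For transparency, here is the arithmetic behind these estimates as a short script; the inputs are the assumptions listed above, not verified data.

    # Back-of-the-envelope estimate from the assumptions above.
    listed_startups = 4830          # AngelList + Crunchbase, assumed no duplicates
    coverage = 0.30                 # assumed share of all AI companies listed
    ai_capabilities = listed_startups / coverage        # 16,100

    occupations = 820               # US government occupation count
    industries = 212                # stock-market industrial categories
    occupation_share = 0.50         # assumed share of occupations per industry
    activities_per_occupation = 50  # assumed discrete activities per occupation

    total_activities = (occupations * occupation_share
                        * industries * activities_per_occupation)
    print(f"AI capabilities:         {ai_capabilities:,.0f}")   # 16,100
    print(f"Occupational activities: {total_activities:,.0f}")  # 4,346,000
    print(f"Share automatable now:   {ai_capabilities / total_activities:.2%}")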

In AI, We Trust.

I think this article sums up the challenges facing the data science community and, by extension, all data analysts. While much of what we are doing isn't in the realms of AI, a lot of the algorithms being used are equally opaque and hard for the human brain to comprehend. There is an allure in the power of these techniques, but without easy comprehension I fear we are moving into an era of data distrust.