A Specialized Solution for General Use Cases

It doesn't take long to realize that data is agnostic to source. Web logs look similar to retail transaction data. Survey data and customer profile data can be handled in the same way.

When building a data platform, finding a niche is hard because if data is source-agnostic, so is the platform. The platform we have built is a generalist product; point it at any data source or stream and it will be useful. What links all the data sources together is another generic concept - customers.

Generalist products are hard to sell, primarily because it is hard to find audience insights that help contextualize the problems the product can solve.

Over the past five years of building the product we have found it easier to write code than to craft a succinct product proposition. In recent weeks some ideas have been crystallizing, and we landed on this:

Knowledge Leaps is a customer data platform for storage, engineering, and analytics of all types of customer data.

Knowledge Leaps is a cloud-based data management platform that allows for collaborative analytics, data management and data-workflow management.

Code-free Data Science

There will always be a plentiful supply of data scientists on hand to perform hand-cut, custom data science. For most business requirements, though, the typical data scientist is over-skilled. Only other data scientists can understand their work and, importantly, only other data scientists can check their work.

What businesses require for most tasks are people with the data-engineering skills of data scientists, not necessarily their statistical skills or their grounding in the scientific method of analysis.

Data engineering at scale is fraught with challenges. While Excel and Google Sheets can handle relatively large (~1mn row) datasets, there is no comparable software for easily visualizing and manipulating larger data sets. NoSQL and SQL databases can handle super-scale data engineering, but they demand super-user skills. As the 'data-is-the-new-oil' mantra makes its way into businesses, people will be exposed to a growing number of datasets that are beyond the realm of the software available to them and, potentially, their skill sets.
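To make the gap concrete, here is a minimal sketch, assuming Python with pandas and a hypothetical transactions.csv with customer_id and spend columns, of how a file far too large for a spreadsheet can still be summarized on an ordinary machine by streaming it in chunks:

    import pandas as pd

    # Hypothetical file and column names, for illustration only.
    SOURCE = "transactions.csv"

    # Stream the file in one-million-row chunks instead of loading it whole,
    # accumulating a per-customer spend total as we go.
    totals = {}
    for chunk in pd.read_csv(SOURCE, chunksize=1_000_000):
        for customer, spend in chunk.groupby("customer_id")["spend"].sum().items():
            totals[customer] = totals.get(customer, 0.0) + spend

    print(len(totals), "customers summarized")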

At Knowledge Leaps we are building a platform solution for this future audience and these future use-cases. At its core are two important features: Visual Data Engineering pipelines and Code-Free Data Science.

The applications of these features are endless: building a customer data lake, building a custom data pipeline for report generation, or creating simple-to-evaluate predictive models.
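By way of illustration (a hypothetical sketch, not the platform's actual internals), a pipeline assembled visually can reduce to an ordered list of named steps with parameters, which is then executed over the data:

    import random

    # Hypothetical sketch: a visually assembled pipeline becomes an
    # ordered list of named steps with parameters.
    pipeline = [
        {"op": "filter",  "column": "country", "equals": "US"},
        {"op": "relabel", "old": "cust_id", "new": "customer_id"},
        {"op": "sample",  "fraction": 0.1},
    ]

    def run(rows, steps):
        # Apply each step in order; rows are plain dicts.
        for s in steps:
            if s["op"] == "filter":
                rows = [r for r in rows if r.get(s["column"]) == s["equals"]]
            elif s["op"] == "relabel":
                rows = [{(s["new"] if k == s["old"] else k): v
                         for k, v in r.items()} for r in rows]
            elif s["op"] == "sample":
                rows = [r for r in rows if random.random() < s["fraction"]]
        return rows

The point of the representation is that a non-coder can build the step list through a UI, while the execution stays deterministic and checkable.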

Competing For Space vs. Competing For Resources

On a recent visit to Southwest Utah I saw lots of pygmy forests of pinyon pines and small oak trees. These forests are sparse, and the trees grow no more than 8-10 feet tall. The National Park literature says that these trees have adapted to low-water conditions. Contrast this with the Redwood forests of coastal California, where resources (water and sunlight) are abundant. In that environment the trees are more densely packed and grow much taller.

Replace trees with firms and resources with customers, and this paragraph could describe a business landscape. Being binary for a moment, a new firm gets to choose between entering a market where customers are scarce and entering one where customers are plentiful. Choosing a market with few customers makes it easier to differentiate your firm, but the odds of survival are worse. Choosing a market with many customers makes it harder to differentiate your firm, so the survival odds are also tough.

Unless, of course, your firm is first. In either case you get to choose the best position and consume all the available resources.

[Image: Giant Sequoia]

Building A Product. Lessons Learned.

Some thoughts on what I have learnt working at a new company that is building software: a lot of what you "should" do is the wrong thing to do. Here are some reflections on building a firm in San Francisco.

Prospects First

Speaking to prospect firms will get you further, faster, than speaking to venture capital firms. Firms that have pain points will pay for solutions, and they won't care much how many other firms share the same pain point. Venture capital firms are interested in the size of the market, the size of the outcome, the probability of success, and the experience of the team. Answering a VC's questions won't necessarily help you build a product and a business. If you can't afford to build the software that addresses the pain point you are targeting, work out what you can build and how you can bridge the gap by other means.

Perform The Process By Hand, Before Writing Code

The best business software is first cut by hand, like the first machine screw. If your software replaces a human business process and you can't afford to build it all, ask yourself: 'How much can my firm afford to build?'

Most processes have the same elements: Task Specification, Task Execution, and Presentation of Results. The most complex part is Task Execution, as it will require the most code and the most investment. As your company speaks to firms, work out whether humans could perform the complex Task Execution element. If so, build a software architecture and framework that lets humans do the hard work at first, as sketched below. This will help you refine the use-case and then build more effective and efficient code. This also wouldn't be the first time it has been done; see here and here for more background.
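One way to structure this, as a minimal sketch in Python with hypothetical class names: put Task Execution behind a single interface, so a human-powered implementation can later be swapped for code without touching the rest of the system.

    from abc import ABC, abstractmethod

    class TaskExecutor(ABC):
        """Task Execution behind one interface: start with people, swap in
        code later without touching Task Specification or the presentation
        of results."""

        @abstractmethod
        def execute(self, spec: dict) -> dict:
            ...

    class HumanExecutor(TaskExecutor):
        def execute(self, spec: dict) -> dict:
            # Early days: route the specified task to an analyst queue and
            # collect the hand-made result later.
            print(f"Queued for an analyst: {spec['task']}")
            return {"status": "pending_human"}

    class AutomatedExecutor(TaskExecutor):
        def execute(self, spec: dict) -> dict:
            # Once the use-case is refined, the same task runs as code.
            return {"status": "done", "result": f"computed {spec['task']}"}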

A useful piece of military wisdom is worth keeping in mind: no plan survives first contact with the enemy. While customers are certainly not the enemy, the sentiment still holds. It's not until you put your plan into action and have firms use your product that you discover its true strengths and weaknesses. Here begins the process of iterating product development.

"Speak to people, we might learn something"

This is what my business development lead says a lot. He also asks questions that get customers and prospects talking. In these moments you will learn about the firm, the buyer, the competition, and lots of other information that will make your product and service better.

"We are just starting out"

This is another useful mantra. In many ways we do not know where our journey will take us; it is part inspired by company vision, part by customer feedback. In his book The Origin of Wealth, Eric Beinhocker likens innovation to searching a technology-solution space, an innovation map, for high points (which correlate with company profits and growth). The important part of this search process is customer feedback. What your company does determines your starting point on the innovation map; how your firm reacts to customer and market feedback determines the direction you take, and will ultimately be a critical factor in its success.

Platforms In Data

Data-is-the-new-oil is a useful framework for describing one of the use-cases we are developing our platform for.

Rather than there being just one platform in the create-process-deliver-use data analytics pipeline, a number of different platforms are required. The reason we don't fill our cars with gasoline at the local oil rig is the same reason data distribution requires a number of different platforms.

[Image: Data Platforms]

The Knowledge Leaps platform is designed to take raw data from our providers, then process and merge these different data feeds before delivering them to our customers' internal data platforms. Just as an oil refinery produces the various distillates of crude oil, the Knowledge Leaps platform can produce many different data products from single or multiple data feeds.

Using a simple UI, we can customize the processing of raw data to maximize its value to providers as well as its usefulness to the users of the data products we produce.
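To push the refinery analogy into code: a minimal sketch (hypothetical file and column names) in which one raw feed yields two distinct data products.

    import csv
    from collections import defaultdict

    # One raw feed in, two "distillates" out.
    store_spend = defaultdict(float)  # product 1: spend by store
    item_units = defaultdict(int)     # product 2: units sold by item

    with open("raw_feed.csv", newline="") as f:
        for row in csv.DictReader(f):
            store_spend[row["store_id"]] += float(row["spend"])
            item_units[row["item_id"]] += int(row["units"])

    # Each product ships as its own file to its own audience.
    for name, product in (("store_spend", store_spend),
                          ("item_units", item_units)):
        with open(f"{name}.csv", "w", newline="") as out:
            csv.writer(out).writerows(product.items())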

Beware AI Homogenization

Many firms (Amazon, Google, etc.) are touting their plug-and-play AI and machine learning tool kits as a quick way for companies to adopt these new technologies without having to invest resources in building their own.

It sounds like a good idea, but I challenge that. If data is going to drive the new economy, it will be a firm's analytics capabilities that give it a competitive advantage. In the short term, adopting a third-party framework for analytics will move a firm up the learning curve faster. Over time that edge becomes blunter, as more firms in a sector start to use the same frameworks in the race to be "first".

This homogenization will be good for a sector, but the firms competing in it will soon be locked back into trench warfare with their competitors. Retail distribution is a good example: do retailers use a third-party distribution network, or do they buy and maintain their own fleet? Using a third-party distributor saves upfront capex, but it cedes an area of competitive advantage. Building their own fleet, while more costly, gives a retailer optionality over growth and expansion plans.

The same is true in the rush for AI/ML capabilities. While the concepts of AI/ML will be the same for all firms, their integration and application have to vary from firm to firm to preserve their potential for providing lasting competitive advantage. The majority of firms we have spoken to are developing their own tool kits; they might use established infrastructure providers, but everything else is custom and proprietary. This seems to be the smart way to go.

Data Engineering & Analytics Scripting Functions

We are expanding the operational functions that can be applied to data sets on the platform. This week we pushed out another product release incorporating some new functions that are helping us standardize data streams. Over the next few weeks we will continue to broaden out the data engineering capabilities of the platform. Below is a description of what each function does to data files.

We have also completed Exavault and AWS S3 integrations - we can now upload to, as well as download from, these two cloud providers.

@MAPPING - Map this var value to this new var value.
@FILTER - Keep rows where this var equals this value.
@ADVERTISED LIST - Specify date + item combinations.
@GROUP - Create a group of stores, items, or countries.
@COLUMN REDUCE - Keep only these columns.
@REPLACE - Replace this unicode character with this value.
@RELABEL - Change the name of a column from this to that.
@COLUMN ORDER - Put columns into this order prior to a merge.
@PRESENCE - Return a list of unique values in this column.
@SAMPLE - Keep between 0.1% and 99.9% of rows.
@FUNCTION - Apply this function to each row.
@FORMAT - Standardize the format of this column.
@DYNAMIC DATA - Implement an API.
@MASK - Encrypt this var salted with a value.
@COLUMN MERGE - Combine these columns into a new column.
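To give a flavour of two of these keywords, here is a hypothetical sketch in Python; the platform's own implementations will differ, and @MASK is shown here as a salted one-way hash rather than reversible encryption.

    import hashlib

    def mask(value: str, salt: str) -> str:
        # @MASK: hash a variable salted with a value, so records stay
        # linkable across files without exposing the raw identifier.
        return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()

    def presence(rows: list, column: str) -> list:
        # @PRESENCE: return the unique values found in a column.
        return sorted({row[column] for row in rows if column in row})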

New Feature: Productization of the Production of Data Products

As we work more closely with our partner company DecaData, we are building tools and features that help bring data products to market and then deliver them to customers. A lot of this is repetitive process work, making it ideal for automation. Furthermore, if data is the new oil, we need an oil rig, a refinery, and a pipeline to manage this new commodity.

Our new feature implements these operations. Users can now create automated, time-triggered pipelines that import new data files and then perform a set of customizable operations before delivering them to customers via SFTP or to an AWS S3 bucket.
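As a sketch of the delivery end (hypothetical names; assumes the boto3 AWS SDK with credentials configured in the environment):

    import boto3

    def deliver_to_s3(local_path: str, bucket: str, key: str) -> None:
        # Final pipeline step: drop the processed data product into the
        # customer's S3 bucket. An SFTP delivery step slots in the same way.
        boto3.client("s3").upload_file(local_path, bucket, key)

    # Time-triggering can be as simple as a cron entry that runs the pipeline:
    #   0 6 * * *  python run_pipeline.py --config customer_feed.json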