No Code Data Engineering #2

We are adding to our no-code data engineering use cases. Our new Collection Manager feature plugs data pipelines into databases without any code, using only a simple drag-and-drop interface.

This feature allows users with zero knowledge of databases and query languages to import data into a database. A second, equally simple UI then allows them to create queries, aggregations and extracts.

The UI can be set up to update the database with new data as it arrives from external sources. It will also automate extract creation as new data is added.
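For readers who think in code, the update-then-extract behavior can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Collection Manager is implemented: the table names (transactions, spend_by_customer), the column names and the ingest function are all hypothetical, and an in-memory SQLite database stands in for whatever database the pipeline is plugged into.

```python
import sqlite3


def ingest(conn, rows):
    """Append newly arrived rows, then rebuild the aggregate extract.

    Table and column names are hypothetical, chosen for illustration.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transactions "
        "(customer_id TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO transactions (customer_id, amount) VALUES (?, ?)", rows
    )
    # Rebuild the extract so it always reflects the latest data.
    conn.execute("DROP TABLE IF EXISTS spend_by_customer")
    conn.execute(
        "CREATE TABLE spend_by_customer AS "
        "SELECT customer_id, SUM(amount) AS total_spend, COUNT(*) AS n_orders "
        "FROM transactions GROUP BY customer_id"
    )
    conn.commit()


def read_extract(conn):
    """Return the current extract as a list of tuples, ordered by customer."""
    cur = conn.execute(
        "SELECT customer_id, total_spend, n_orders "
        "FROM spend_by_customer ORDER BY customer_id"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
ingest(conn, [("a", 10.0), ("b", 5.0)])  # first batch arrives
ingest(conn, [("a", 2.5)])               # a later batch arrives
extract = read_extract(conn)
```

Each call to ingest plays the role of a new batch arriving from an external source; the extract is refreshed as a side effect, so anything reading it sees up-to-date totals without the user writing a query.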

Example use-cases for this feature include creating data feeds for dashboards that auto-populate, or creating custom data products whose delivery is timed with a guaranteed delay. This feature will also drive our retail experimentation business: we can design and set up a data framework that captures and tags the results from test-and-learn activity.

A Specialized Solution for General Use Cases

It doesn’t take long to realize that data is agnostic to source. Web logs look similar to retail transaction data. Survey data and customer profile data can be handled in the same way.

When building a data platform, finding a niche is hard because if data is source-agnostic, so is the platform. The platform we have built is a generalist product; point it at any data source or stream and it will be useful. What links all the data sources together is another generic concept – customers.

Generalist products are hard to sell, primarily because it is hard to find audience insights that help contextualize the problems the product can solve.

Over the past five years of building the product, we have found it easier to write code than to create a succinct product proposition. In recent weeks some ideas have been crystallizing, and we landed on this:

Knowledge Leaps is a customer data platform for storage, engineering, and analytics of all types of customer data.

Knowledge Leaps is a cloud-based data management platform that allows for collaborative analytics, data management and data-workflow management.

Code-free Data Science

There will always be a plentiful supply of data scientists on hand to perform hand-cut custom data science. For most businesses’ requirements, the typical data scientist is over-skilled. Only other data scientists can understand their work and, importantly, only other data scientists can check their work.

What businesses require for most tasks are people with the data-engineering skills of data scientists, and not necessarily their statistical skills or their understanding of the scientific method of analysis.

Data engineering on a big scale is fraught with challenges. While Excel and Google Sheets can handle relatively large (~1mn row) datasets, there is not really a similar software solution that allows easy visualization and manipulation of larger data sets. NoSQL / SQL databases are required for super-scale data engineering, but these require the skills of a super-user. As the ‘data-is-the-new-oil’ mantra makes its way into businesses, people will become exposed to a growing number of datasets that are beyond the realm of the software available to them and, potentially, their skill sets.

At Knowledge Leaps we are building a platform solution for this future audience and these future use-cases. At the core of the platform are two important features: Visual Data Engineering pipelines and Code-Free Data Science.

The applications of these features are endless: building a customer data lake, building a custom data pipeline for report generation, or even creating simple-to-evaluate predictive models.

Competing For Space vs. Competing For Resources

On a recent visit to Southwest Utah I saw lots of pygmy forests containing pinyon pines and small oak trees. These forests are sparse and the trees are no more than 8-10 feet tall. The National Park literature says that these trees have adapted to low-water conditions. Contrast this with the Redwood forests of coastal California, where resources (water & sunlight) are abundant. In this environment the trees are more densely packed and grow much taller.

Replace trees with firms and resources with customers, and this paragraph could describe a business landscape. Being binary for a moment, a new firm gets to choose between entering a market where resources (customers) are slim and entering a market where there are lots of customers. Choosing a market with few customers makes it easier to differentiate your firm, but the odds of survival are worse. Choosing a market with more customers makes it harder to differentiate your firm, and therefore the survival odds are also tough.

Unless, of course, your firm is first. In both instances you get to choose the best position and consume all available resources.

Giant Sequoia

Building A Product. Lessons Learned.

Some thoughts on what I have learnt by working in a new company that is building software. A lot of what you “should” do is the wrong thing to do. Here are some reflections on building a firm in San Francisco.

Prospects First

Speaking to prospect firms will get you further, faster than speaking to venture capital firms. Firms that have pain points will pay for solutions and they won’t care so much how many other firms have the same pain point. Venture capital firms are interested in size of market, size of outcome, probability of success, experience of the team. Answering a VC’s questions won’t necessarily help you build a product and a business. If you can’t afford to build the software that will answer the pain point you are trying to solve, then work out what you can build and how you can bridge the gap using other means.

Perform The Process By Hand, Before Writing Code

The best business software is first cut by hand, like the first machine screw. If your software replaces a human business process and you can’t afford to build the software, ask yourself ‘how much can my firm afford to build?’

Most processes have the same elements: Task Specification, Task Execution, Present Results. The most complex part of this is Task Execution, as this will require a lot of code and a lot of investment. As your company speaks to firms, work out if it is possible to use humans to perform the complex Task Execution element. If you think it is, then you should build a software architecture and framework that allows humans to do the hard work at first. This will help you refine the use-case and build more effective and efficient code. This also wouldn’t be the first time this has been done; see here and here for more background.
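One way to picture such an architecture is as a pipeline in which the Task Execution step is a swappable component. The Python sketch below is a minimal illustration under assumptions of my own: TaskSpec, human_executor, coded_executor and the "average_basket" task are hypothetical names, and the "human" is simulated by a lookup of answers that a person would have worked out by hand.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TaskSpec:
    """Task Specification: what needs to be done, and on which inputs."""
    name: str
    inputs: List[float]


# Task Execution is anything that turns a spec into a result.
Executor = Callable[[TaskSpec], float]


def human_executor(answers: Dict[str, float]) -> Executor:
    """Stand-in for a person doing the work by hand.

    The answers dict simulates results an analyst has entered manually.
    """
    def run(spec: TaskSpec) -> float:
        return answers[spec.name]
    return run


def coded_executor(spec: TaskSpec) -> float:
    """Automated replacement, written once the use-case is understood."""
    return sum(spec.inputs) / len(spec.inputs)


def present(spec: TaskSpec, result: float) -> str:
    """Present Results: identical no matter who or what did the work."""
    return f"{spec.name}: {result:.2f}"


def run_task(spec: TaskSpec, execute: Executor) -> str:
    return present(spec, execute(spec))


spec = TaskSpec(name="average_basket", inputs=[10.0, 20.0, 30.0])
by_hand = run_task(spec, human_executor({"average_basket": 20.0}))
by_code = run_task(spec, coded_executor)
```

Because specification and presentation never change, the customer sees the same product whether a person or code performed the execution, and the hand-produced results double as a check on the code that eventually replaces them.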

A useful piece of military wisdom is worth keeping in mind: no plan survives first contact with the enemy. While customers are certainly not the enemy, the sentiment still holds. It’s not until you put your plan into action and have firms use your product that you realize its true strengths and weaknesses. Here begins the process of iterating product development.

“Speak to people, we might learn something”

This is what my business development lead says a lot. He also asks questions that get customers and prospects talking. In these moments you will learn about the firm, the buyer, the competition, and lots of other information that will make your product and service better.

“We are just starting out”

This is another useful mantra. In lots of ways we do not know where our journey will take us. It is partly inspired by company vision but also by customer feedback. In Eric Beinhocker’s book, The Origin of Wealth, he likens innovation to the process of searching a technology-solution space, an innovation map, looking for high points (that correlate with company profits and growth). The important part of this search process is customer feedback. What your company does determines your starting point on the innovation map; how your firm reacts to customer and market feedback determines which direction you will go in, and ultimately will be a critical factor in its success.

Platforms In Data

Data-is-the-new-oil is a useful framework for describing one of the use-cases we are developing our platform for.

Rather than there being just one platform in the create-process-deliver-use data analytics pipeline, a number of different platforms are required. The reason we don’t fill our cars up with gasoline at our local oil rig is the same reason why data distribution requires a number of different platforms.

Data Platforms

The Knowledge Leaps platform is designed to take raw data from our providers, then process and merge these different data feeds before delivering them to our customers’ internal data platforms. Just as an oil refinery produces the various distillates of crude oil, the Knowledge Leaps platform can produce many different data products from single or multiple data feeds.

Using a simple UI, we can customize the processing of raw data to maximize the value of the raw data to providers as well as its usefulness to users of the data products we produce.
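To make the refinery analogy concrete, here is a small Python sketch of merging two raw feeds and deriving two different data products from the merged result. It is an illustration only, not the platform's implementation: the feed contents, the customer_id key and the function names are hypothetical, and the merge assumes at most one record per customer in each feed.

```python
from collections import defaultdict


def merge_feeds(*feeds):
    """Merge records from several feeds into one profile per customer_id.

    Assumes each feed holds at most one record per customer; if two
    records share a field, the later one overwrites the earlier.
    """
    merged = defaultdict(dict)
    for feed in feeds:
        for record in feed:
            merged[record["customer_id"]].update(record)
    return dict(merged)


def product_spend(profiles):
    """Data product 1: spend per customer, for customers with spend data."""
    return {cid: p["spend"] for cid, p in profiles.items() if "spend" in p}


def product_by_region(profiles):
    """Data product 2: total spend per region, using fields from both feeds."""
    totals = defaultdict(float)
    for p in profiles.values():
        if "region" in p and "spend" in p:
            totals[p["region"]] += p["spend"]
    return dict(totals)


# Two hypothetical raw feeds from different providers.
transactions = [
    {"customer_id": "c1", "spend": 40.0},
    {"customer_id": "c2", "spend": 15.0},
]
crm = [
    {"customer_id": "c1", "region": "west"},
    {"customer_id": "c2", "region": "east"},
    {"customer_id": "c3", "region": "west"},
]

profiles = merge_feeds(transactions, crm)
spend = product_spend(profiles)
by_region = product_by_region(profiles)
```

The customer key is what links the feeds together, and each product function is a separate "distillate" of the same merged data, so new products can be added without touching the raw feeds.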

Beware AI Homogenization

Many firms (Amazon, Google, etc.) are touting their plug-and-play AI and Machine Learning tool kits as a quick way for firms to adopt these new technologies without having to invest resources in building their own.

It sounds like a good idea, but I challenge that. If data is going to drive the new economy, it will be a firm’s analytics capabilities that give it a competitive advantage. In the short term, adopting a third-party framework for analytics will move a firm up the learning curve faster. Over time this competitive edge becomes blunter, as more firms in a sector start to use the same frameworks in the race to be “first”.

This homogenization will be good for a sector, but pretty rapidly the firms competing in that sector will be locked back into trench warfare with their competitors. Retail distribution is a good example: do retailers use a 3rd party distribution network, or do they buy and maintain their own fleet? Using a 3rd party distributor saves upfront capex but it voids an area of competitive advantage. Building their own fleet, while more costly, gives a retailer optionality about growth and expansion plans.

The same is true in the rush for AI/ML capabilities. While the concepts of AI/ML will be the same for all firms, their integration and application have to vary from firm to firm to preserve their potential for providing lasting competitive advantage. The majority of firms we have spoken to are developing their own tool kit. They might use established infrastructure providers, but everything else is custom and proprietary. This seems to be the smart way to go.