There will always be a plentiful supply of data scientists on-hand to perform hand-cut custom data science. For what most businesses requirements, the typical data scientist is over-skilled. Only other data scientists can understand their work and, importantly, only other data scientists can check their work.
What businesses require for most tasks are people with the data-engineering skills of data scientists and not necessarily their statistical skills or their understanding of a scientific-method of analysis.
Data engineering on a big scale is fraught with challenges. While Excel and Google Sheets can handle relatively large (~1mn row) datasets there is not really a similar software solution that allows easy visualization and manipulation of larger data sets. NoSQL / SQL-databases are required for super-scale data engineering, but this requires skills of the super-user. As ‘data-is-the-new-oil’ mantra makes its way into businesses, people will become exposed to a growing number datasets that are beyond the realm of the software available to them and, potentially, their skill sets.
At Knowledge Leaps we are building a platform solution for this future audience and these future use-cases.The core of the platform are two important features: Visual Data Engineering pipelines and Code-Free Data Science.
The applications of these features are endless; from building a customer data lake, or building a custom-data-pipeline for report generation or even creating simple-to-evaluate predictive models.
The patent that has just been awarded to Knowledge Leaps is for our continuous learning technology. Whether it is survey data, purchase data or website traffic / usage data., the technology we have developed will automatically search these complex data spaces. The data spaces covers the price-demand space for packaged goods, or the attitudinal space of market research surveys and other data where there could be complex interactions. In each case, as more data is gathered – more people shopping, more people completing a survey, more people using an app or website – the application updates its predictions and builds a better understanding of the space.
In the use-case for the price-demand for packaged goods, the updated predictions then alter the recommendations about price changes that are made. This feedback loop allows the application to update its beliefs about how shoppers are reacting to prices and make improved recommendations based on this knowledge.
In the survey data use-case, the technology will create an alert when the data set becomes self-predicting. At this point capturing further data is unnecessary to understand the data set and carries an additional expense.
The majority of statistical tools enable analysts to identify the relationships in data. In the hands of a human, this is a brute-force approach and is prone to human biases and time-constraints. The Knowledge Leaps technology allows for more systematic and parallelized approach – avoiding human bias and reducing human effort.
I’ve been analyzing data for 30 years.
I’ve studied science and engineering.
I read about science.
I taught myself to code and created a data engineering and analytics web application (KnowledgeLeaps.com).
I have thought a lot about my work as I have developed my domain expertise.
One of the things I have come to realize is that if you are designing and running experiments to prove / disprove a hypothesis, then you are performing science. You may even call yourself a scientist. Testing a hypothesis requires evidence, usually in the form of objective data. If you are a scientist, you use data, no matter what the discipline. You can’t be a scientist without data. The term data in Data Science is redundant, like calling yourself a Religious Priest or an Oral Dentist.
In contrast, if you use data to look for a story or a correlation (causal or otherwise) and you aren’t testing a hypothesis. You aren’t a scientist, you are an analyst. In this case a qualifying noun is useful (data, systems, etc).
I suspect most people who call themselves data scientists are actually analysts that have taught themselves to write code. This isn’t science. Science is the method by which we create persistent state knowledge, code is just a tool we should use to process data to test a hypothesis.
Having worked within an industry (market research) for some time, I am intrigued how other occupations use data, particularly data scientists. After a conversation with a new friend – a data scientist – last week I had a revelation.
Data scientists have created a new words to talk about data analysis. The ones that stand out are features and feature sets. Quantitative market researchers talk about questions and surveys but never features. Essentially, they are the same thing; features are traits, attributes and behaviors of people that can be used to describe/predict specific outcomes. The big difference is that data scientists don’t care so much that the features are not human-readable (i.e. they can be read and understood like a book), as long as they help make a prediction. For example, Random Forests make good predictors but aren’t easily understandable. The same is true of Support Vector Machines. Excellent predictors but in higher dimensions they are hard to explain.
In contrast, market researchers are fixated on the predictive features being human-readable. As data science has shown, a market researcher’s predictions, their stories, will always be weaker than those of a data scientist. This in-part explains the continued trend of story-telling in market research circles. Stories are popular, and contain some ambiguity, this ambiguity can allow people to take out from them what they wish. This is an expedient quality in the short term but damaging long term to the industry.
I think market researchers need to change, my aim with Knowledge Leaps is to try and bridge the gap between highly predictive features and human-readable stories.