New Feature: Handling Event-Type Data

I have spent a lot of time thinking about data and data structures. What I have learnt is that there are two broad types of data structure: data with only one row per user (e.g. survey data), and data with one row for each unique user event (e.g. click-stream data from an app or website) and therefore multiple rows per user.

Many web-based analytics platforms, like Amazon's own ML platform, only let their users upload data with a simple structure (one row per user, such as survey data and customer profile data). Very few platforms allow users to upload event-type data and engineer it into a simple form that can be used in predictive analytics.
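To make the difference between the two structures concrete, here is a minimal sketch of collapsing event-type data into one row per user. It is only an illustration of the idea, not how any particular platform does it: the field names (`user_id`, `event`, `value`), the sample events, and the `flatten_events` helper are all made up for this example.

```python
# Hypothetical click-stream data: one row per user event,
# so the same user appears on several rows.
events = [
    {"user_id": "u1", "event": "view", "value": 0.0},
    {"user_id": "u1", "event": "purchase", "value": 20.0},
    {"user_id": "u2", "event": "view", "value": 0.0},
    {"user_id": "u1", "event": "view", "value": 0.0},
    {"user_id": "u2", "event": "view", "value": 0.0},
]


def flatten_events(rows):
    """Collapse event rows into one summary row per user."""
    # Every user gets a count column for every event type seen in the data.
    event_types = sorted({row["event"] for row in rows})
    per_user = {}
    for row in rows:
        user_id = row["user_id"]
        if user_id not in per_user:
            summary = {"user_id": user_id, "n_events": 0, "total_value": 0.0}
            for event_type in event_types:
                summary["n_" + event_type] = 0
            per_user[user_id] = summary
        summary = per_user[user_id]
        summary["n_events"] += 1
        summary["n_" + row["event"]] += 1
        summary["total_value"] += row["value"]
    # One row per user, in a stable order.
    return [per_user[user_id] for user_id in sorted(per_user)]


user_rows = flatten_events(events)
for user_row in user_rows:
    print(user_row)
```

The output has the same shape as survey or customer profile data: one row per user, with the event history summarised as counts and totals. Which summaries are worth keeping depends on the question being asked; the ones here are arbitrary choices.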

Transforming event data requires data engineering, and this process can be daunting. To develop Knowledge Leaps further, we have spent a lot of time looking at a wide range of event-type data use cases. Our aim has been to create a systematic, easy-to-use (given the task) approach to simplifying the data engineering workflow. As with our models, we want our user interface and processes to be human-readable too.

In our latest release we are launching the Data Processor module. The design of this module has drawn heavily on our work with real-world event data. This new feature allows the platform to take in any data type and apply simple processing rules to create analytics-ready data sets in minutes.

Unbiased Analytics

When we excitedly tell people that the new version of Knowledge Leaps incorporates k-fold validation, their eyes glaze over. When we tell people about the benefits of this feature, we usually get the opposite response.

In simple terms, k-fold validation is like having a team of 10 PhDs working on your data, independently and simultaneously. The application doesn't produce just one prediction; it makes 10, each independent of the others. This approach produces more general models. These are closer to a rule of thumb and are consequently useful in more contexts. Another step toward human-centered analytics without the human bias.
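For readers who want to see the mechanics, here is a minimal sketch of standard k-fold validation with k = 10. It is not the Knowledge Leaps implementation: the toy data, the `k_fold_splits` helper, and the deliberately trivial "model" (predict the mean of the training rows) are assumptions chosen to keep the example short.

```python
def k_fold_splits(n_rows, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        yield train, test
        start += size


# Toy outcome values (hypothetical). Real data would normally be shuffled first.
y = [float(i % 5) for i in range(20)]
k = 10

fold_errors = []
for train, test in k_fold_splits(len(y), k):
    # "Fit" on the training rows only: here the model is just their mean.
    prediction = sum(y[i] for i in train) / len(train)
    # Score on the held-out rows the model never saw.
    error = sum(abs(y[i] - prediction) for i in test) / len(test)
    fold_errors.append(error)

mean_error = sum(fold_errors) / k
print(len(fold_errors), "models fitted; mean held-out error:", round(mean_error, 3))
```

Each of the 10 models is fitted without seeing the rows it is scored on, so the averaged error is a fairer guide to how a model will behave on new data than a single fit to everything.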