We have been building some product forecasting models using Monte Carlo methods. Sales distributions are often skewed right. Using normal approximations tends to overinflate forecast estimates, since in a right-skewed distribution the mean sits above the bulk of the observations. Furthermore, the standard deviation of a skewed distribution is inflated by its long tail, so estimates based on it come with very wide margins of error.
To overcome this, we use a Monte Carlo simulator that draws from the sales distribution at random. Creating a sample of many estimates not only gives a more accurate estimate, it also helps us calculate more realistic margins of error.
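As a rough illustration of the idea, and not the Knowledge Leaps implementation, the sketch below resamples a made-up, right-skewed sales history to simulate many possible 30-day totals, then reads the forecast and its margin of error off the percentiles of those totals. The function name, the sample data and the parameter values are all assumptions chosen for the example.

```python
import random
import statistics

def monte_carlo_forecast(sales_history, periods, n_sims=10_000, seed=0):
    """Simulate total sales over `periods` by resampling observed sales."""
    rng = random.Random(seed)
    # Each simulation draws `periods` values at random (with replacement)
    # from the observed history and sums them into one possible total.
    totals = sorted(
        sum(rng.choices(sales_history, k=periods))
        for _ in range(n_sims)
    )
    return {
        "median": statistics.median(totals),
        "p05": totals[int(0.05 * n_sims)],  # lower bound of a 90% interval
        "p95": totals[int(0.95 * n_sims)],  # upper bound of a 90% interval
    }

# Hypothetical right-skewed daily sales: many small days, a few very large ones.
history = [12, 15, 9, 14, 11, 13, 10, 16, 95, 12, 14, 240, 11, 13]
forecast = monte_carlo_forecast(history, periods=30)
print(forecast)
```

Because the interval comes from the simulated totals themselves, it can be asymmetric around the median, which a mean plus or minus a multiple of the standard deviation cannot be.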
New features rolled out this week:
- Apply filters and mapping files to other filters and mapping files. This feature helps create randomized lists and sub-filters based on new criteria. For example, extract a list of userIDs from a data file and apply gender from a lookup table. Then filter this list by gender to create a specific list of users. This new file can then be sampled randomly to create a new list of random userIDs that meet specific criteria.
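The userID example above can be sketched in a few lines of Python. This is only an illustration of the extract, map, filter and sample steps; the rows, the lookup table and the `sample_users` function are hypothetical and do not reflect how the platform stores or processes files.

```python
import random

# Hypothetical rows from a data file and a hypothetical gender lookup table.
rows = [
    {"userID": "u1", "spend": 20},
    {"userID": "u2", "spend": 35},
    {"userID": "u1", "spend": 5},
    {"userID": "u3", "spend": 50},
    {"userID": "u4", "spend": 12},
]
gender_lookup = {"u1": "F", "u2": "M", "u3": "F", "u4": "F"}

def sample_users(rows, lookup, gender, k, seed=0):
    # Step 1: extract the distinct userIDs, keeping first-seen order.
    user_ids = list(dict.fromkeys(row["userID"] for row in rows))
    # Step 2: map each userID to a gender via the lookup table.
    mapped = [(uid, lookup.get(uid)) for uid in user_ids]
    # Step 3: filter to the requested gender.
    matching = [uid for uid, g in mapped if g == gender]
    # Step 4: draw a random sample without replacement.
    rng = random.Random(seed)
    return rng.sample(matching, min(k, len(matching)))

sampled = sample_users(rows, gender_lookup, gender="F", k=2)
print(sampled)
```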
List of features/fixes in latest app release:
- Download file compression by default. When users download data to their local computers, the files are compressed by default.
- Merge data now runs in background: some users were struggling to combine multi-GB data files. We now merge large data sets in the background to avoid memory issues.
- Serverless charting: all charting has been pushed to a serverless environment.
We launched a new feature today. The Knowledge Leaps platform allows users to specify hundreds of charts with a few clicks. For example, a user can plot sales by date split by store ID using a simple flow. This can lead to thousands of charts being produced, each one derived from millions of lines of data.
When you are building data products and filtering data files, it is important to keep track of what you have combined to make a new data set and what you have removed. This feature has saved us countless hours.
From an audit perspective, we can build a complete history of a dataset: when it was added to the platform, how it was processed, and when, where and by whom it was delivered or downloaded. This removes a time-draining communication burden from our teams.
We can also add commentary and narratives to a data set. This helps us build transparency and persistent-state knowledge about data.
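One simple way to picture this kind of dataset history is an append-only log of events attached to each dataset. The sketch below is an assumption made for illustration, not a description of the platform's internals; the class names, fields and sample events are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class LineageEvent:
    action: str   # e.g. "added", "filtered", "merged", "downloaded", "comment"
    actor: str    # who performed the action
    detail: str   # free-text description or commentary
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

@dataclass
class DatasetHistory:
    name: str
    events: list = field(default_factory=list)

    def record(self, action, actor, detail=""):
        # Events are only ever appended, so the log doubles as an audit trail.
        self.events.append(LineageEvent(action, actor, detail))

    def report(self):
        return [
            f"{e.timestamp:%Y-%m-%d %H:%M} {e.actor} {e.action}: {e.detail}"
            for e in self.events
        ]

history_log = DatasetHistory("store_sales")
history_log.record("added", "alice", "uploaded raw sales file")
history_log.record("filtered", "bob", "removed rows with missing store ID")
history_log.record("comment", "alice", "week 12 includes a promotion")
print("\n".join(history_log.report()))
```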
Building a system that is 100% autonomous and makes its own decisions is both hard and high risk. Given that Amazon, with all its resources and smarts, uses human input for the low/no consequence AI built into Alexa, it is fairly safe to assume that *all* other firms making AI claims have a human involved in at least one critical step.
Until recently, I have wrestled with why people I knew growing up in a small village in the UK stayed in the village when there was a whole world of opportunities awaiting discovery. I have come to realize that life is a search process. A search for purpose, contentment and security. As with most search algorithms, some are better than others. Some people’s search algorithms stop when they discover a local maximum, such as village life in the UK. Other algorithms drive people to travel much further.
Software development follows similar principles to a search algorithm. While we might think that we are heading towards a peak when we start out building an application, we soon discover that the landscape we are searching is evolving. If we rush too quickly to a peak, we might find that we settle on a local rather than a global maximum. Facebook is a good example of the impact of search speed. The reason that Facebook prevailed is that the many social networking sites that came before it provided the company with a long list of technical and business mistakes to avoid. A major lesson was controlled growth, in other words a slow search algorithm: avoiding the strong temptation, especially where a social network is concerned, to grow very rapidly.
This is an example of a good search process and of why it has to be a slow one for long-term success. A slow search allows a process to find a stable solution. The simulated annealing algorithm is a good example of this. The random perturbations the search is willing to accept diminish over time as the solution gets closer to the optimum. The occasional randomness helps ensure the search doesn’t get stuck on a local solution.
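For readers who have not come across it, here is a minimal sketch of simulated annealing on a one-dimensional landscape with a lower peak and a higher peak. The landscape, the cooling schedule and every parameter value are assumptions picked for the example, and whether a given run ends on the higher peak depends on those choices and on the random seed.

```python
import math
import random

def simulated_annealing(score, start, bounds=(-6.0, 6.0), step=1.0,
                        t_start=1.0, cooling=0.999, n_iter=5_000, seed=0):
    """Maximize `score` over one real variable using geometric cooling."""
    rng = random.Random(seed)
    lo, hi = bounds
    current = best = start
    temperature = t_start
    for _ in range(n_iter):
        # Propose a random nearby point, clamped to the search bounds.
        candidate = min(max(current + rng.uniform(-step, step), lo), hi)
        delta = score(candidate) - score(current)
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature falls.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current = candidate
            if score(current) > score(best):
                best = current
        temperature *= cooling  # slow cooling means a slow, patient search
    return best

def landscape(x):
    # Hypothetical landscape: a local peak near x = -2, a higher one near x = 3.
    return math.exp(-(x + 2) ** 2) + 2 * math.exp(-(x - 3) ** 2)

best_x = simulated_annealing(landscape, start=-2.0)
print(best_x, landscape(best_x))
```

Starting the search on the lower peak makes the point of the analogy: a purely greedy climber would stay there, while the early willingness to accept worse positions gives the search a chance to cross the valley.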
We have also been running our own slow search algorithm as we build Knowledge Leaps. We have been at this for a long time. App development began five years ago, but we started its design at least eight years ago. While we wanted to go faster, we have been resource constrained. The advantage of this is that we have built a resilient and fault-tolerant application. The slow development process has also helped foster our design philosophy. When we began, we wanted to be super-technical and build a new scripting language for data engineering. Over time our beliefs have mellowed as we have seen the benefits of a No Code / Low Code solution. Incorporating this philosophy into Knowledge Leaps has made the application that much more user friendly and stable.
If you want to see how good Alexa is at answering people’s questions, you should sign on to Alexa Answers and see the questions Alexa cannot answer. This site has gamified helping Alexa answer these questions. I spent a week doing this and figured out a pretty good workflow to stay in the top 10 of the leaderboard.
The winning strategy is to use Google. You copy the question into Google and paste Google’s answer back into the Alexa Answers website for it to be played back to the person who asked it. The clever thing is that since it is impossible to legally web-scrape Google.com at a commercially viable rate, Amazon have found a way of harnessing the power of Google without a) having to pay, b) violating Google.com’s TOS, and c) getting caught stealing Google’s IP.
After doing this for a week, the interesting thing to note is why Alexa could not answer these questions. Most of them are interpretation errors: Alexa misheard the question (e.g. connor virus or coronda virus instead of coronavirus). The remainder of the errors arise because the question assumes Alexa knows the context (e.g. Is fgtv dead? – he’s a YouTube star). Without the subject of the question being a known entity in Alexa’s knowledge graph, the results are ambiguous. Rather than be wrong, Alexa declines to answer.
Obviously this is where the amazing pattern matching abilities of the human brain come in. We can look at the subject of the question and the results and choose the most probable correct answer. Amazon can then augment Alexa’s knowledge graph using these results. This is probably in violation of Google’s IP if Amazon intentionally set out to do this.
Having a human being perform the hard task in a learning loop is something that we have also employed in building our platform. Knowledge Leaps can take behavioral data and tease out price sensitivity signals, using purchase data, as well as semantic signals in survey data.
On a demo of our application to a prospective customer, the instant feedback was “this looks easier to use than Alteryx”. We’ll take that sort of compliment any day of the week.
During an interview between Shane Parrish and Daniel Kahneman, one of the many interesting comments made was around how to make better decisions. Kahneman said that despite studying decision-making for many years, he was still prone to his own biases. Knowing about your biases doesn’t make them any easier to overcome.
His recommendation to avoid bias in your decision making is to devolve as many decisions as you can to an algorithm. Translating what he is saying to analytical and statistical jobs suggests that no matter how hard we try, we always approach analysis with biases that are hard to overcome. Sometimes our own personal biases are exaggerated by external incentive models. Whether you are evaluating your boss’s latest pet idea or writing a research report for a paying client, delivering the wrong message can be costly, even if it is the right thing to do.
Knowledge Leaps has an answer. We have built two useful tools to overcome human bias in analysis. The first is a narrative builder that can be applied to any dataset to identify the objective narrative captured in the data. Our toolkit can surface a narrative without introducing human biases.
The second tool we built reduces bias by using many pairs of eyes to analyze data and average out any potential analytical bias. Instead of a single human (i.e. a bias-prone analyst) looking at a data set, our tool lets lots of people look at it simultaneously and share their individual interpretations of the data. Across many analysts, this tool will remove bias through collaboration and transparency.
Get in touch to learn more. email@example.com.