How Big Data analysts reappropriate algorithms from evolution and warfare

By Olivia Solon 06 January 12

The amount of data we generate is exploding, and the ability to analyse large amounts of data is a key differentiator for businesses seeking to gain competitive advantage.

Opera Solutions is one Big Data company that is harnessing predictive analytics to inform the business decisions of its clients. We spoke to Jacob Spoelstra, vice president of analytics at Opera, about how the company is taking algorithms from electrical engineering, physics and maths and applying them to business data.

What Big Data analysts can learn from anti-aircraft guns
Kalman Filters were used in the 1960s to help shoot down aeroplanes using anti-aircraft guns. When targeting a fast-moving plane, you need to be able to predict where the plane is going. This means you need a model of the object's movement and an estimate of its state, i.e., its position, velocity and acceleration. You then keep an eye on the object by making observations to update your prediction about where the aircraft will go, calibrating the model against reality. Measurements and observations are inevitably a bit noisy thanks to imperfect sensors and measurement equipment, so the Kalman Filter must find the optimal way of using these noisy observations to minimise inaccuracy.
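The predict-then-update loop described above can be sketched in a few lines of Python. This is a minimal one-dimensional, constant-velocity tracker; the process-noise and measurement-noise values `q` and `r` are illustrative assumptions, not values from any real system.

```python
def kalman_track(measurements, dt=1.0, q=0.01, r=4.0):
    """Track a moving object from noisy position measurements.

    State is [position, velocity]; a constant-velocity motion model
    is assumed, with illustrative process noise q and measurement
    noise r."""
    x = [measurements[0], 0.0]            # initial state estimate
    P = [[1.0, 0.0], [0.0, 1.0]]          # estimate covariance
    estimates = []
    for z in measurements:
        # Predict: position advances by velocity * dt; uncertainty grows.
        x = [x[0] + dt * x[1], x[1]]
        P = [[P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + q,
              P[0][1] + dt * P[1][1]],
             [P[1][0] + dt * P[1][1],
              P[1][1] + q]]
        # Update: blend the prediction with the noisy observation.
        S = P[0][0] + r                    # innovation variance
        K = [P[0][0] / S, P[1][0] / S]     # Kalman gain
        y = z - x[0]                       # measurement residual
        x = [x[0] + K[0] * y, x[1] + K[1] * y]
        P = [[(1 - K[0]) * P[0][0], (1 - K[0]) * P[0][1]],
             [P[1][0] - K[1] * P[0][0], P[1][1] - K[1] * P[0][1]]]
        estimates.append(x[0])
    return estimates
```

Feeding it noisy positions of an object moving at a steady 5 units per step, the estimates settle onto the true track even though each individual measurement is off.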

How is this applied to a business setting? Opera Solutions uses it for a client to help work out how to price used vehicles that are sold at auction. Each vehicle has an estimated market value based on its exact specification; for example, people tend to pay more for a white car over a green one. Other factors include mileage, leather versus plastic interiors, etc. This information can be compared to old models that have actually been sold. Spoelstra explains: “Kalman comes in once you have priced the vehicle and it actually sells at auction. You use that observation to update the state-of-the-world model. This leads us to better prediction models. We can be 20-25 percent better than more standard approaches in rapidly adapting to change in a market.” The aim is to reduce the mean absolute error (a little bit like standard deviation). “It is not that hard to be accurate in predicting the mean price of a bunch of similar vehicles, but our goal is to be close to the actual price for every individual vehicle. Companies lose money when a vehicle is severely under- or over-priced,” added Spoelstra.
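The mean absolute error mentioned above is simply the average size of the pricing misses, regardless of direction. A quick sketch, with made-up vehicle prices:

```python
def mean_absolute_error(predicted, actual):
    """Average of |prediction - actual| across all vehicles."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical figures: model's price vs what each car fetched at auction.
predicted = [10200, 8900, 15400]
actual = [10000, 9400, 15000]
mae = mean_absolute_error(predicted, actual)  # (200 + 500 + 400) / 3
```

Unlike the mean error, over-pricing one car never cancels out under-pricing another, which matches the business goal of being close on every individual vehicle.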

What Big Data analysts can learn from genetics
It’s not just anti-aircraft targeting algorithms that Big Data can harness. It can also learn from nature. Genetic algorithms can be particularly useful when you have an optimisation problem with lots of potential solutions and no obvious way to find the right one. For example, a pay TV provider might be challenged to schedule its content across multiple channels. How do you keep the most people happy at any one time? What is the optimal schedule? Spoelstra explains: “If you have 100 items you want to put into 20 channels, there are billions of different combinations. It’s not always obvious what you need to change to make it better.”

A genetic algorithm would create a population where each “individual” has a genome: a string that encodes a potential schedule over the channels. You run simulations based on assumptions such as each show’s popularity and whether viewers who liked a show had something else on at the same time, in order to evaluate how many people would be satisfied by a particular combination. You keep the individuals which perform best. These can then be paired to produce scheduling “offspring”, or mutated by randomly swapping something out, to generate the next generation. This next generation tends to contain more individuals with better solutions. The process continues until, after a few hundred or so generations, you end up with a solution that is often close to optimal. This same technique has applications for scheduling physicians in hospital emergency rooms.

What Big Data analysts can learn from neural networks
The Restricted Boltzmann Machine is a learning algorithm derived from the Boltzmann machine developed by Geoffrey Hinton and Terry Sejnowski. The Boltzmann distribution is a probability measure over the states of a system, for example the energy levels you might expect to find in a gas. The same formulation can be applied to data wherever the state can be described as elements of zeros and ones. For example, you might have a large group of people who have each ranked a set of movies. The machine can learn the distribution of the data and find the patterns that occur, revealing what people prefer. So it might find that certain people will watch more comedy. By learning patterns across movies, the Restricted Boltzmann Machine can predict what movies people might like based on how they and others have ranked movies within a system such as Netflix. In fact, this approach was used as an entry for the Netflix Prize to create a better recommendation algorithm.
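A toy version of the idea can be sketched in plain Python. This is a heavily simplified Restricted Boltzmann Machine trained with one-step contrastive divergence; the four "movies", the binary liked/not-liked ratings, and all the hyperparameters are made up for illustration, and a real recommender would be far larger.

```python
import math
import random

random.seed(1)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyRBM:
    """Toy RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.lr = lr
        self.w = [[random.gauss(0, 0.1) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.vb = [0.0] * n_visible   # visible (movie) biases
        self.hb = [0.0] * n_hidden    # hidden (taste-pattern) biases

    def hidden_probs(self, v):
        return [sigmoid(self.hb[j] + sum(v[i] * self.w[i][j]
                for i in range(len(v)))) for j in range(len(self.hb))]

    def visible_probs(self, h):
        return [sigmoid(self.vb[i] + sum(h[j] * self.w[i][j]
                for j in range(len(h)))) for i in range(len(self.vb))]

    def train(self, data, epochs=200):
        for _ in range(epochs):
            for v0 in data:
                h0 = self.hidden_probs(v0)
                # One Gibbs step: sample hidden units, reconstruct visibles.
                h_sample = [1.0 if random.random() < p else 0.0 for p in h0]
                v1 = self.visible_probs(h_sample)
                h1 = self.hidden_probs(v1)
                for i in range(len(v0)):
                    for j in range(len(h0)):
                        self.w[i][j] += self.lr * (v0[i] * h0[j] - v1[i] * h1[j])
                    self.vb[i] += self.lr * (v0[i] - v1[i])
                for j in range(len(h0)):
                    self.hb[j] += self.lr * (h0[j] - h1[j])

# Binary "liked it" flags over four movies: two comedies, two dramas.
ratings = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
rbm = TinyRBM(n_visible=4, n_hidden=2)
rbm.train(ratings)

# A viewer who liked only the first comedy: the reconstruction should
# typically lean towards the second comedy more than the dramas.
recon = rbm.visible_probs(rbm.hidden_probs([1, 0, 0, 0]))
```

The hidden units end up acting like the taste patterns described above ("watches comedy", "watches drama"), and reconstructing a sparse rating vector through them fills in predicted preferences.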

This same technique can be used to analyse complicated private hospital bills and detect where charges for certain treatments may have been left off the human-assembled bills.

