Use Data to Tell the Future: Understanding Machine Learning


  • BY MARTIN HACK, SKYTREE 03.17.14 11:41 AM

When Amazon recommends a book you would like, Google predicts that you should leave now to get to your meeting on time, or Pandora magically creates your ideal playlist, you are seeing machine learning at work over a Big Data stream.

With Gartner projecting that Big Data will drive enterprise IT spending to $242 billion, Big Data is here to stay, and more businesses of every size are getting into the game. To many enterprise organizations, Big Data represents a strategic asset: it reflects the aggregate experience of the organization. Each customer, partner, or supplier response or non-response, transaction, defection, credit default, and complaint gives the enterprise experience from which to learn. From a consumer perspective, every action performed online, every sales process, product interaction, prescribed drug, and environmental anomaly is being tracked by various sources.

In recent years, companies have focused on how to store and manage this data. How should we best architect our enterprise stack to gain value from Big Data in terms of Hadoop, complex event processing, NoSQL, and traditional data warehouses? Should we host our data on-premises or in the cloud?

These are fair questions to ask, but they don’t get to the core of why Big Data is a big deal. Only with advanced analytics, and specifically machine learning, can companies truly tap into their rich vein of experience, mining it to automatically discover insights and generate predictive models that take advantage of all the data they are capturing. This advanced analytics technology means that instead of looking into the past to generate reports, businesses can predict what will happen in the future based on analysis of their existing data. The value of machine learning is rooted in its ability to create accurate models to guide future actions and to discover patterns that we’ve never seen before.

Defining Machine Learning

There is a lot of confusion about what machine learning is in the Big Data ecosystem. Many software vendors claim they do predictive analytics, deep learning, prescriptive analytics, and machine learning. It’s time we defined these terms, so that vendors and buyers know what to expect from a given software solution and where its value lies.

Machine learning is the modern science of finding patterns in data and making predictions from it, building on work in multivariate statistics, data mining, pattern recognition, and advanced/predictive analytics.

Machine learning methods are particularly effective in situations where deep and predictive insights need to be uncovered from data sets that are large, diverse, and fast-changing: in other words, Big Data. Across these types of data, machine learning easily outperforms traditional methods on accuracy, scale, and speed. For example, when detecting fraud in the millisecond it takes to swipe a credit card, machine learning relies not only on information associated with the transaction, such as value and location, but also on historical and social network data to evaluate potential fraud accurately.
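To make the fraud example concrete, here is a minimal sketch of a model that learns a fraud score from three kinds of input: the transaction itself, the customer's history, and linked accounts. Everything in it is an assumption made for illustration: the feature definitions, the toy data, and the choice of logistic regression trained by stochastic gradient descent. It is not a description of any vendor's production system.

```python
import math

# Hypothetical toy data, invented for illustration only. Each row holds:
#   (amount / customer's historical average amount,
#    distance from the customer's usual location, in thousands of km,
#    share of linked accounts previously flagged for fraud)
ROWS = [
    (0.8, 0.00, 0.0), (1.1, 0.01, 0.0), (1.3, 0.05, 0.1),
    (0.9, 0.02, 0.0), (1.5, 0.10, 0.0),                    # legitimate
    (4.0, 2.50, 0.5), (3.5, 1.80, 0.3),
    (5.0, 3.00, 0.6), (2.8, 2.20, 0.4),                    # fraudulent
]
LABELS = [0, 0, 0, 0, 0, 1, 1, 1, 1]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fraud_score(model, row):
    """Return the model's estimated probability that a transaction is fraud."""
    weights, bias = model
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))


def train(rows, labels, learning_rate=0.1, epochs=2000):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for row, label in zip(rows, labels):
            error = fraud_score((weights, bias), row) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, row)]
            bias -= learning_rate * error
    return weights, bias


model = train(ROWS, LABELS)

typical = (1.0, 0.01, 0.0)       # normal amount, near home, clean network
suspicious = (4.5, 2.75, 0.55)   # large amount, far away, risky network

print(f"typical:    {fraud_score(model, typical):.3f}")
print(f"suspicious: {fraud_score(model, suspicious):.3f}")
```

Once trained, scoring a new transaction is a handful of multiplications, which is why a model of this kind can plausibly answer within the time it takes to swipe a card.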

Machine learning methods are vastly superior at analyzing potential customer churn across data from multiple sources, such as transactional, social media, and CRM systems. High-performance machine learning can analyze all of a Big Data set rather than a sample of it. This scalability not only makes predictive solutions based on sophisticated algorithms more accurate, it also raises the importance of software speed: the software must interpret billions of rows and columns in real time and analyze live streaming data.
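One way to see how software can use every record of a stream instead of a sample is an incremental update, where each record adjusts a running result and is then discarded. The sketch below uses Welford's algorithm for a running mean and variance. It is a simple statistic rather than a full predictive model, and the class name, the generator, and the values it yields are hypothetical stand-ins for a live feed.

```python
class RunningStats:
    """Single-pass mean and sample variance using Welford's algorithm."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        """Fold one new record into the running statistics."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        """Sample variance; undefined (NaN) for fewer than two records."""
        if self.count < 2:
            return float("nan")
        return self._m2 / (self.count - 1)


def transaction_stream():
    """Hypothetical stand-in for a live feed of transaction amounts."""
    yield from [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]


stats = RunningStats()
for amount in transaction_stream():
    stats.update(amount)  # constant memory: no record is kept after this call

print(stats.count, stats.mean, stats.variance)
```

Because each update touches only three numbers, memory use stays constant however long the stream runs, and the result reflects every record seen rather than a subset.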

For those of us who are practicing and developing machine learning technology, it’s no longer sufficient to provide the most accurate, fast, and scalable predictive insights. Ultimately, for machine learning to impact the world around us in a truly meaningful way, we have to deliver machine learning in a smarter, more usable form. By enabling not only data scientists who have PhDs but also business users to tap into state-of-the-art machine learning technology, we will truly bring this technology to the masses and dramatically accelerate time-to-insight for organizations of all sizes.

Comparing Big Data Analytics Software

When looking into buying software for Big Data analytics, companies should keep three thoughts in mind:

  1. Best-in-class Machine Learning Software. Because of the size, variety, and speed of Big Data, many of the traditional techniques run into limitations. Analytic solutions based on machine learning are best suited to fast-changing data, a large variety of unstructured data, and the sheer scaling issues associated with Big Data.
  2. Machine Learning for Your Business. Typical enterprise organizations use machine learning software to develop predictive models that are used in multiple applications, such as churn analysis and prevention, real-time recommendation, and fraud analysis and prevention. As a result, the ability to easily integrate machine learning-based technology into the enterprise software environment is an important, if unglamorous, requirement.
  3. Accessible Interface. In a hypercompetitive world, enterprise organizations need to deploy advanced analytics solutions quickly. As a result, a machine learning-based analytics platform must both be easy to use for a diverse group of users and enable fast time-to-insight.

We live in an era of Big Data. While earlier paradigm shifts in business were powered by steam engines, carbon products, electrical power, semiconductors, computers, and the Internet, we are currently experiencing a boom driven by Big Data. We have a tremendous opportunity to discover insights that can lead to better and faster business decisions. In this age of Big Data, organizations that can realize value from their data assets faster through advanced analytics such as machine learning will become the winners, and the others will be left behind.

So the question is, do you just want to store your data, or do you want to put it to work creating real business value?

Martin Hack is co-founder and CEO of the machine learning company Skytree.

