Data Science is one of the hottest fields in technology right now and is seeing a lot of innovation in subdomains such as descriptive analytics, diagnostic analytics, and predictive analytics. Recent estimates by IBM research put the growth in Data Science related jobs at around 28%, among the highest of any field. According to the same IBM research, about 59% of these jobs will be in Finance and Insurance, Professional Services, and Information Technology. It is therefore quite natural to ask how Blockchains, another revolutionary FinTech technology, can work together with Data Science to further both fields. Here’s an in-depth look at both technologies.
Data Science – The Basics
Data Science refers to the use of scientific methods, processes, algorithms, and systems to extract knowledge or insights from data sets. Sometimes referred to as Big Data analysis, it surfaces insights that are easy to miss when merely looking at the data. These insights are gleaned using Linear Regression, Logistic Regression, Pattern Recognition, and other sophisticated mathematical techniques. Data Science has a wide range of real-world use cases, such as speech recognition, self-driving cars, and spam identification. One of its core goals is to use Big Data analysis to make machines more autonomous so that they can function without human intervention.
Blockchains and Data Science
Blockchains are primarily the “trust protocol” of the internet, meaning that they help keep sensitive information secure. However, that security comes with an overhead for storing data on the blockchain, and this overhead is prohibitive for Data Science, because Big Data analysis relies on analyzing huge quantities of data to come up with models. Instead, Blockchains can help by making data stored on other servers more secure through timestamps and proof-of-ownership systems like Factom. Additionally, Blockchains have the potential to improve several key elements of Data Science related to data collection, distributed computing, and predictive analytics.
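The idea of keeping bulky data off-chain while securing it with a timestamped fingerprint can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the `make_anchor` and `verify` helpers, and the owner name are hypothetical, and nothing here calls Factom or any real blockchain. Only the small anchor record would be published to a ledger, while the data itself stays on an ordinary server.

```python
import hashlib
import time


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def make_anchor(data: bytes, owner: str) -> dict:
    """Build the small record that would be written to a ledger.

    The field names are an assumption made for this sketch, not a real
    ledger format.
    """
    return {
        "digest": fingerprint(data),
        "owner": owner,
        "timestamp": int(time.time()),
    }


def verify(data: bytes, anchor: dict) -> bool:
    """Check that the off-chain data still matches the anchored digest."""
    return fingerprint(data) == anchor["digest"]


# Hypothetical driving-data sample stored on a regular server.
dataset = b"speed_kmh,steering_deg\n42.0,1.5\n43.1,1.7\n"
anchor = make_anchor(dataset, owner="fleet-operator-01")

print(verify(dataset, anchor))                  # data unchanged since anchoring
print(verify(dataset + b"99.9,0.0\n", anchor))  # data modified after anchoring
```

Because only the digest is anchored, the storage overhead stays constant no matter how large the data set grows, and any later change to the data shows up as a digest mismatch.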
- Data Collection – Data is the primary raw material of Data Science models. For example, to build a model for self-driving cars, Data Scientists require millions of hours’ worth of actual vehicle driving data. Collecting and organizing this data makes up a large share of the total work. Since these Artificial Intelligence models are subject to the principle of “Garbage In, Garbage Out,” it is essential to make sure that the data is authentic and untampered. This is one of the primary uses of Blockchains, as they can help bypass intermediary sources of error. Using data integrity services like Factom, we can make the vehicle driving data directly accessible to data scientists. This ensures the quality and authenticity of the data while greatly increasing speed and bringing down the overall cost of auditing.
- Distributed Computing – Once the relevant data is obtained and processed, it still has to be analyzed. Analyzing these vast quantities of data requires a huge amount of processing power. Individuals rarely have that much computing power at their disposal and therefore rely on expensive cloud computing platforms like Google Cloud and Amazon Web Services. The Ethereum-based Golem project is working on an implementation of the world’s largest decentralized supercomputer, which will let individuals purchase computing resources directly from people who have idle computers. Not only does decentralized computing bring the cost down for the individual, but it also makes the process more secure as there is no third party involved.
- Predictive Analysis – Predictive analytics is the use of data, statistical algorithms, and machine learning techniques to identify the likelihood of future outcomes based on historical data. Current machine learning methods have been shown to be inefficient when predicting complex social outcomes like election results. Predictive analytics that employs the wisdom of the crowds, by contrast, has been shown to be quite accurate at predicting social phenomena. Several studies related to predictive analysis have shown that since each individual in a crowd has their own unique bias, gathering a large sample from the crowd helps cancel out the competing individual biases, thus providing very reliable, accurate predictions. Augur and Gnosis are two blockchain-based projects that already have a working platform for making bets related to such phenomena.
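The bias-cancelling effect behind the wisdom of the crowds can be illustrated with a small simulation. The sketch below rests on a strong assumption: each individual's bias is independent and centered on zero. If the whole crowd shares a systematic bias, averaging will not remove it. The constants `TRUE_VALUE` and `BIAS_SPREAD` and the `crowd_estimate` helper are made up for the illustration and do not come from Augur, Gnosis, or any study.

```python
import random
import statistics

TRUE_VALUE = 100.0   # the quantity the crowd is trying to predict (made up)
BIAS_SPREAD = 20.0   # standard deviation of individual biases (made up)


def crowd_estimate(n: int, rng: random.Random) -> float:
    """Average the guesses of n individuals, each with an independent bias."""
    guesses = [TRUE_VALUE + rng.gauss(0.0, BIAS_SPREAD) for _ in range(n)]
    return statistics.mean(guesses)


rng = random.Random(0)  # fixed seed so the run is repeatable
for size in (1, 10, 100, 10_000):
    estimate = crowd_estimate(size, rng)
    error = abs(estimate - TRUE_VALUE)
    print(f"crowd of {size:>6}: estimate {estimate:8.2f}, error {error:6.2f}")
```

Under these assumptions the typical error of the averaged guess shrinks roughly with the square root of the crowd size, which is why a large crowd of individually unreliable guessers can still produce a reliable aggregate.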