What is Big Data, What is Data Science and the Difference Between Data Science and Big Data
OK, so let’s start by defining what Big Data is and what Data Science is.
What is Big Data, according to Wikipedia?
Big data is a field that treats ways to analyze, systematically extract information from, or otherwise deal with data sets that are too large or complex to be dealt with by traditional data-processing application software. Data with many cases (rows) offer greater statistical power, while data with higher complexity (more attributes or columns) may lead to a higher false discovery rate. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, information privacy and data source. Big data was originally associated with three key concepts: volume, variety, and velocity. When we handle big data, we may not sample but simply observe and track what happens. Therefore, big data often includes data with sizes that exceed the capacity of traditional software to process within an acceptable time and value.
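The remark that more columns can raise the false discovery rate has a simple arithmetic core. With k columns there are k(k−1)/2 pairwise relationships you could test, and if none of them is real, a significance threshold alpha still flags roughly alpha of them by chance. The Python sketch below shows this under two assumptions: the tests are independent, and every null hypothesis is true. It illustrates the multiple-comparisons effect behind the quoted remark; it does not compute the false discovery rate itself, and the function names are hypothetical.

```python
def pairwise_tests(num_columns: int) -> int:
    """Number of distinct column pairs, i.e. candidate relationships to test."""
    return num_columns * (num_columns - 1) // 2


def expected_false_positives(num_tests: int, alpha: float = 0.05) -> float:
    """Expected count of 'significant' results when every null hypothesis is true."""
    return num_tests * alpha


def prob_at_least_one_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent tests: 1 - (1 - alpha)^m."""
    return 1.0 - (1.0 - alpha) ** num_tests


for columns in (5, 20, 50):
    m = pairwise_tests(columns)
    print(
        f"{columns} columns -> {m} pairwise tests, "
        f"~{expected_false_positives(m):.1f} expected false positives, "
        f"P(at least one) = {prob_at_least_one_false_positive(m):.3f}"
    )
```

Going from 5 to 50 columns takes you from 10 candidate pairs to 1,225. That is why wide data sets need corrections for multiple testing.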
What is Data Science, according to Wikipedia?
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data. Data science is related to data mining, deep learning and big data. Data science is a "concept to unify statistics, data analysis, machine learning, domain knowledge and their related methods" in order to "understand and analyze actual phenomena" with data. It uses techniques and theories drawn from many fields within the context of mathematics, statistics, computer science, domain knowledge and information science.
I will alternate between writing about Big Data (BG) and Data Science (DS) so you can compare them.
“Big Data is essentially the data itself.” Big data exists whenever you have too much data to hold on a single personal computer, and it can take countless different forms. It might be an extensive image data set used to train a neural net, for example, or an immense video archive used to train a gesture-tagging algorithm. It could even be a time-stamped, year-by-year archive of the entire internet. Because there is too much data to store locally, a data scientist typically pulls files from the cloud to work on them, or samples the data randomly, simply because there is such an abundance available.
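Random sampling is only practical at this scale if it works even when the full data cannot be loaded at once. One standard technique for this is reservoir sampling (Algorithm R). It is not necessarily what any particular team uses. It keeps a uniform random sample of k items from a stream of unknown length and never holds more than k items in memory. Here is a minimal Python sketch. The function name `reservoir_sample` is hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return up to k items drawn uniformly at random from `stream` (Algorithm R).

    The stream is read once, and at most k items are kept in memory.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep item i with probability k / (i + 1), evicting a random slot.
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# A lazy range stands in for a stream too large to hold in memory.
print(reservoir_sample(range(1_000_000), k=5, seed=42))
```

The lazy `range` stands in for records read one at a time from a file or cloud object. Only the five sampled values ever live in memory.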
There are “dimensions” that distinguish data from BIG DATA, summarised as the “3 Vs” of data: Volume, Variety and Velocity. Hence, BIG DATA is not just “more” data. It is so much data, so mixed and unstructured, and accumulating so rapidly, that traditional techniques and methodologies, including “normal” software such as Excel, Crystal Reports or similar tools, do not really work. Gartner stated that in 2011 the global rate of data growth was around 59%. This means that almost 40% of all data ever created had been created in the previous year, and I am sure it is even more now.
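Here is how 59% growth turns into “almost 40%”. Normalise the total stock of data at the start of the year to 1. At the end of the year it is 1.59, so the share created during that year is 0.59 / 1.59 ≈ 0.371, or about 37%. The small Python check below assumes that the growth figure refers to the total stored stock and that no data is deleted. The function name is hypothetical.

```python
def share_created_in_latest_year(annual_growth_rate: float) -> float:
    """Fraction of today's total data that was created during the last year.

    Assumes the stock of data grew by `annual_growth_rate` over the year and
    that no data was deleted, so new data = growth and total = 1 + growth.
    """
    previous_total = 1.0  # normalise last year's total to 1
    new_data = previous_total * annual_growth_rate
    return new_data / (previous_total + new_data)


print(f"{share_created_in_latest_year(0.59):.1%}")  # → 37.1%
```

With a constant growth rate this share stays the same from year to year. It only rises if the growth rate itself increases.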
DS Data science is a combination of mathematics, statistics, programming, the context of the problem being solved, and ingenious ways of capturing data that may not be captured right now, plus the ability to look at things ‘differently’ and, of course, the significant and necessary activity of cleansing, preparing and aligning the data.
Take the strawberry industry. We might build models that tell us the optimal time to sell. That tells us when to harvest, which in turn tells us the combination of varieties to plant at various times to maximize overall yield. We might be short of consumer demand data. So perhaps we discover that demand goes up when strawberry recipes are published online or on television, and that Tweets and Instagram or Facebook likes provide an indicator of demand. Then we need to align that demand data with market prices to get the final insights, and perhaps to find a way to drive up demand by promoting certain social media activity.
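Aligning demand signals with market prices comes down to joining two series on a shared key (here, the week) and then measuring how they move together. The minimal Python sketch below uses only the standard library. The figures are invented toy numbers, not real strawberry data, and the function names are hypothetical. In practice a library such as pandas (for example `DataFrame.merge`) would usually handle the join. A strong correlation on its own says nothing about causality.

```python
from math import sqrt
from typing import Dict, List, Tuple


def align_by_key(left: Dict[str, float], right: Dict[str, float]) -> List[Tuple[str, float, float]]:
    """Inner-join two key -> value series, keeping only keys present in both."""
    shared = sorted(left.keys() & right.keys())
    return [(k, left[k], right[k]) for k in shared]


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical weekly numbers, invented purely for illustration.
recipe_mentions = {"2024-W01": 120, "2024-W02": 180, "2024-W03": 260, "2024-W04": 210}
market_price = {"2024-W02": 3.10, "2024-W03": 3.60, "2024-W04": 3.35, "2024-W05": 3.20}

rows = align_by_key(recipe_mentions, market_price)  # only weeks present in both series
mentions = [m for _, m, _ in rows]
prices = [p for _, _, p in rows]
print(rows)
print(round(pearson(mentions, prices), 3))
```

The inner join drops weeks that appear in only one series. With real data, deciding how to handle those gaps is part of the cleansing and aligning work described above.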
DS is a set of tools and techniques employed to analyze and model data. The primary goal of data scientists is to solve problems by figuring out the right questions to ask and using data and algorithms to discover the answers. Far from being one unified field, data science encompasses innumerable specialized domains, often markedly different from one another. “If you talk to any two data scientists and compare their knowledge bases, there will be an intersection between them. But that intersection might be much smaller than you’d expect.”
“Big data storage is usually distributed across many, many computers in some storage facility somewhere. You can only access bits and pieces of it, or you can access it in a cloud-based paradigm from your local computer. But you don’t actually pull all of the big data down to your local computer.” Until a few years ago, many companies were trying to store as much data as they possibly could in so-called data lakes. The data was mostly unstructured and disorganized, and frequently served no discernible purpose. “They often ended up paying millions of dollars per year to store data that they never figured out how to use, and maybe wasn’t even helpful. They were just sitting on it.”
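Because the data stays spread across many machines, big data tools typically send the computation to where the data lives and bring back only small summaries. The sketch below simulates that split-and-combine (MapReduce-style) pattern in a single Python process. The hypothetical “nodes” are just lists. Each node counts the words in its own records, and only the per-node counts are merged. Frameworks such as Hadoop MapReduce or Apache Spark do this across real machines. Nothing here reflects their actual APIs, and all names are hypothetical.

```python
import zlib
from collections import Counter
from typing import Iterable, List


def partition(records: Iterable[str], num_nodes: int) -> List[List[str]]:
    """Assign each record to one of `num_nodes` hypothetical storage nodes by hashing it."""
    nodes: List[List[str]] = [[] for _ in range(num_nodes)]
    for record in records:
        # crc32 is stable across runs, unlike Python's salted built-in str hash.
        nodes[zlib.crc32(record.encode("utf-8")) % num_nodes].append(record)
    return nodes


def local_count(node_records: List[str]) -> Counter:
    """'Map' step run where the data lives: count words in this node's records only."""
    counts: Counter = Counter()
    for record in node_records:
        counts.update(record.lower().split())
    return counts


def merge_counts(partials: Iterable[Counter]) -> Counter:
    """'Reduce' step: combine the small per-node summaries, never the raw records."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


posts = [
    "strawberry recipe",
    "new strawberry tart recipe",
    "weather forecast",
    "strawberry prices up",
]
nodes = partition(posts, num_nodes=3)
totals = merge_counts(local_count(n) for n in nodes)
print(totals.most_common(2))  # → [('strawberry', 3), ('recipe', 2)]
```

The design choice is that only `Counter` summaries cross the "network". The raw posts never leave their node, which mirrors not pulling all of the big data down to your local computer.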
Thus, BG (“BIG DATA”) can also be a summary term for a set of tools, methodologies and techniques for deriving new “insight” from extremely large, complex samples of data, and (most likely) from combining multiple extremely large, complex datasets. The potential here is that if we crunch true BIG DATA, we can attempt to establish patterns and correlations between seemingly random events in the world. Then, by establishing and testing hypotheses, we could understand causality, so that predictions and deep insights could be made.
BG is about using all of the available data to provide new insights into a problem. Every day, trillions upon trillions of bytes of data are generated by billions of devices connected to the Internet; so much that 90 percent of the data in the world has been generated in the past couple of years. A lot of data is generated every minute, and this massive volume comes from various sources: weather patterns, stock prices, machine sensors, posts to social media sites, digital pictures and videos, transaction records, payment histories, sensors used to gather climate information, and GPS signals, to name just a few. Collectively, this data is called Big Data. Analyzing and processing such a massive volume of both structured and unstructured data is almost impossible using traditional software techniques. It is not merely a matter of size. Big Data is an opportunity to find insights in new and emerging types of data and content that lead to better decisions and strategic business moves. The data is also liable to change at any time (e.g. a new source of social media data that turns out to be a great predictor of consumer demand).
DS Domain knowledge is extremely important. The kinds of data, models, techniques and results you can expect vary widely depending on the field you’re in. You won’t be doing the same things at a startup looking to revolutionize advertising as you would at a startup in the cryptoasset space.
BG Today, many more excellent tools, platforms and ideas exist for managing data well (not just BIG DATA). This creates an enormous and immediate potential for the Public Sector to make relevant and timely improvements in “small” data management, data integration and visualisation. Most importantly, it creates the potential to integrate “small” data into the real-time decision making of public servants and to make it useful. I think this is best achieved not by being distracted by fancy and fashionable titles such as BIG DATA, but by focusing on the boring (but essential) transformation of the Public Sector. Let’s have a “small” data (or just plain old “data”) conference. Less sexy, but more useful.
DS and BG Data scientists and Big Data specialists are in high demand in almost every industry today, especially finance and insurance, information technology, professional services, and scientific and technical services.
Sources:
https://en.wikipedia.org/wiki/Big_data
https://en.wikipedia.org/wiki/Data_science
https://blog.galvanize.com/what-is-the-difference-between-big-data-and-data-science/
https://www.mo-data.com/what-is-the-difference-between-data-analytics-data-analysis-data-mining-data-science-machine-learning-big-data-and-predictive-analytics/
http://www.differencebetween.net/technology/difference-between-iot-and-big-data/
https://digileaders.com/whats-difference-big-data-data/