It is true that we are constantly under data attack from everywhere & some spreadsheets take desperately long to open, but more rows & columns & a larger-than-usual file size do not by themselves amount to Big Data. I would rather treat it as just another buzzword: mostly introduced as new, but in fact a re-branding of an old yet fundamental concept we continually resort to. Here is why.
Big Data actually refers to a set of attributes that need to be present in the data of interest: (i) first, and perhaps most distinctively, it is unstructured (e.g. text to be mined from social media posts), as opposed to a data set firmly structured by fields & data generation rules; (ii) second, and in fact contributing to the first, it is continuously generated, mostly on a real-time basis, & extracted from a number of typically independent sources (e.g. web logs, IoT / M2M platforms of the Industry 4.0 era, etc.; let's keep these concepts for a later discussion); (iii) third, and mostly as a consequence of the second, yes, it turns out to be far bigger in size, making it impossible to handle in a spreadsheet & calling instead for special processing environments (several open-source & commercial packages & languages are used for this purpose).
In connection with Big Data, the phrase Data Science has also been popularized in the last decade, as if it were the practice of analyzing & using this new data format alongside the traditional one. In other words, since data is big enough now, its analysis & use also deserve to be coined with the word science, leading to the weird term Data Science. In parallel, we also witness an inflation of "Data Scientists"(!).
Sorry, but there is no such thing as Data Science, and I believe we deserve a bit more creativity, at least in the naming, instead of hype. (Advanced) analytics is another term used interchangeably with Data Science, and even though it is less weird, neither represents what is meant here. The truth is, the field of science to acknowledge here is Management Science, or Operations Research (OR/MS).
In a sense, those buzzwords are meant to refer to a key value of OR/MS, which is modeling. Data, whatever form or size it takes, is just the input of a model built to represent & support the real-world decision-making problem at hand. This modeling job & making better decisions through quantitative models, or model- & data-driven decision-making if you wish, has always been at the core of the promise of OR/MS. So, nothing new here really. But then, what is it all about?
"There is no such thing as Data Science, and the true field of science to acknowledge is Management Science, or Operations Research (OR/MS)."
The story of analytics began, in most companies, with a focus on customer analytics, or analytic customer relationship management (CRM). Efforts around analytic CRM generally focus on churn / attrition prediction & on tailoring promotional campaigns (e.g. up-sell / cross-sell) to maximize customer engagement & retention. In other words, predictive modeling was the first domain employed after the term (business) analytics started to spread in the last decade. Moreover, for those modeling attempts to be meaningful, the companies taking advantage of predictive modeling are predominantly from B2C industries with mass customer populations, like telecom & financial services. Methods employed in this domain range from time-series forecasting to causal methods, regression being one of the most commonly used, among other more advanced & sometimes hybrid techniques.
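To make the predictive side concrete, here is a minimal sketch of a churn model: a logistic regression fitted by plain gradient descent on synthetic customer data. Every feature name & coefficient below is invented for illustration; a real project would use a proper library, held-out validation & far richer features.

```python
import numpy as np

# Hypothetical illustration: predict churn from two synthetic customer
# features (monthly spend, support calls) via logistic regression.
rng = np.random.default_rng(0)
n = 500
spend = rng.normal(50, 15, n)              # monthly spend (assumed scale)
calls = rng.poisson(2, n).astype(float)    # support calls per month

# Synthetic ground truth: low spend & many calls raise churn odds.
true_logit = -0.05 * spend + 0.8 * calls - 0.5
churn = (rng.random(n) < 1 / (1 + np.exp(-true_logit))).astype(float)

X = np.column_stack([np.ones(n), spend, calls])  # add an intercept column
w = np.zeros(3)
for _ in range(2000):                      # plain batch gradient descent
    p = 1 / (1 + np.exp(-X @ w))           # predicted churn probability
    w -= 0.001 * (X.T @ (p - churn)) / n   # gradient of log-loss

p = 1 / (1 + np.exp(-X @ w))
accuracy = np.mean((p > 0.5) == churn)     # in-sample fit quality
print(f"training accuracy: {accuracy:.2f}")
```

The point is only the shape of the exercise: historical behavior in, a probability of a future event out, which is then fed into retention campaigns.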
But since even this is too complex for most mid-level managers, commercial applications of the so-called data science / analytics stepped back & started to focus on descriptive rather than predictive modeling, and on a much more naïve basis. Descriptive modeling involves methods ranging from advanced statistical analyses to several types of powerful simulation modeling that allow running what-if scenarios. Yet the widespread application in companies boiled down to creating executive dashboards for KPI monitoring & reporting, or even, in some cases, creating pivot tables on spreadsheets. Of course, this sounds more like data mining & information visualization, with the aim of discovering hidden patterns in data, extracting knowledge & acquiring actionable insights. Anyway, instead of predicting how customers will behave, as in predictive modeling applications (e.g. credit risk scoring), trying to understand who they are via several types of segmentation / classification-based descriptive analyses turned out to be much more popular.
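The segmentation idea can be sketched just as briefly: a bare-bones k-means (Lloyd's algorithm) grouping customers on two synthetic features. The features, group shapes & k = 2 are all assumptions for the sake of the example; in practice one would choose k & the features from the business question.

```python
import numpy as np

# Hypothetical illustration: descriptive segmentation of customers into
# k groups with a minimal k-means on two synthetic features
# (monthly spend, purchase frequency).
rng = np.random.default_rng(1)
group_a = rng.normal([80, 2], [10, 1], size=(100, 2))   # high spend, low freq
group_b = rng.normal([20, 10], [5, 2], size=(100, 2))   # low spend, high freq
X = np.vstack([group_a, group_b])

k = 2
centers = X[rng.choice(len(X), k, replace=False)]        # init from data points
for _ in range(20):                                      # Lloyd's algorithm
    dist = np.linalg.norm(X[:, None] - centers[None], axis=2)
    labels = dist.argmin(axis=1)                         # nearest-center step
    centers = np.array([
        X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
        for j in range(k)                                # keep center if empty
    ])

print("segment centers:\n", centers.round(1))
```

Unlike the churn model, nothing here predicts anything: the output is a description of who the customers are, which is exactly the descriptive / predictive distinction the paragraph above draws.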
Whatever insights descriptive & predictive models can provide, the key question at the end of the day for a senior executive is always: "what should I do?", asking for a recipe, an optimal policy or course of action. Here comes the ultimate of the three modeling domains: prescriptive modeling, or optimization / mathematical programming if you wish, which uses the outputs of the other two modeling approaches as its inputs. With Multi-Objective Dynamic Stochastic Mixed Integer Non-linear Programming (MODSMINLP) being the most advanced version of such a model, capable of representing virtually any real-world decision-making situation, there is a collection of algorithms & heuristics used to optimally or near-optimally solve & analyze prescriptive models based on their types. For the interested, the theory & business applications of optimization / mathematical programming date back almost a century. RIP Dantzig!
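A toy prescriptive model makes the "what should I do?" point tangible: a two-product production-mix problem with assumed profits, machine-hour requirements & demand caps. At this tiny scale, brute-force enumeration finds the optimum; real prescriptive models are solved with LP / MIP solvers (Dantzig's simplex & its descendants), not enumeration.

```python
# Hypothetical illustration: a production-mix decision.
# All numbers (profits, hours, capacity, demand) are assumed for the example.
profit = {"A": 30, "B": 50}     # profit per unit
hours = {"A": 1, "B": 2}        # machine-hours per unit
capacity = 100                  # machine-hours available
demand = {"A": 60, "B": 40}     # demand caps per product

# Maximize total profit subject to capacity & demand constraints.
best_plan, best_profit = None, -1
for a in range(demand["A"] + 1):
    for b in range(demand["B"] + 1):
        if hours["A"] * a + hours["B"] * b <= capacity:   # feasibility check
            p = profit["A"] * a + profit["B"] * b
            if p > best_profit:
                best_plan, best_profit = (a, b), p

print("optimal plan:", best_plan, "profit:", best_profit)
# → optimal plan: (60, 20) profit: 2800
```

Note how the answer is a course of action ("make 60 units of A & 20 of B"), not a forecast or a dashboard; a predictive model would typically supply the demand figures this model consumes as input.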
So even though the distinction & relationships among these three modeling domains are crystal clear, you can come across dozens of misinterpretations & misuses of descriptive, predictive & prescriptive modeling, sometimes even used interchangeably, which, at least to me, is terrifying. For instance, even a very advanced simulation model (like the digital twins of the IoT platforms of the Industry 4.0 era) cannot prescribe what to do to optimize business performance. Or: your dashboards can tell you what has been going on with the business, but they alone cannot tell you what is going to happen. Examples are numerous, and accurate & more complete definitions of these three modeling approaches can be found in the initial chapters of any introductory OR/MS textbook.
As for its value, relying on OR/MS modeling is essential to any business for making better, and in certain cases best, decisions. On the other hand, when the question is whether to incorporate big data into those models, a similar argument does not hold, since even traditional ERP data will be quite enough most of the time, provided you know what to do with it & how to do it. The problem is, most companies, in trying to catch up with the buzzwords, make huge investments in these concepts. Yet only a few have a clear answer to such key questions & achieve the expected ROI on their investments.
Finally, regarding the methods employed in OR/MS modeling: this is not a computer science, statistics or pure mathematics job, but rather calls for OR/MS expertise. Such a practitioner will be quite comfortable not only in developing the model required by the business context, but also in performing the pre-modeling tasks (e.g. data collection, analysis & manipulation) as well as developing the code for the algorithm that will address the model at hand. Post-solution tasks, including model validation & solution implementation, even require further social skills on top of the technical ones, interpersonal communication & change management being the most important of those. For this extended skill set, you can always trust an established OR/MS practitioner.