
Big Data 101 – NoSql, Hadoop, Analytic DS, RDBMS Differences for Business People

insightsoftware -
February 7, 2021

insightsoftware is the global provider of enterprise software solutions for the Office of the CFO to connect to & make sense of data in real time, driving financial intelligence across […]

The following post was coauthored by David Abramson, director of product management, Logi Analytics, and Steven Schneider, VP of sales and business development, Logi Analytics, and was originally published on Slinging Software.

Confusion Reigns – The basic differences between Hadoop, NoSql, Analytic Data Stores & traditional databases.

Organizations are now creating more data than ever before, and as a result a new set of tools and technologies is becoming popular for storing and retrieving this information in a timely and cost-effective manner. Many technologies attempt to address these challenges, and they take different (and often incompatible) approaches, each with positives and negatives depending on the use case.

While big data was initially synonymous with Hadoop, aggressive vendor marketing and leadership discussion have broadened the term to mean “a lot of data” and a wider set of data storage technologies. At a high level, there are four competing sets of data storage/access technologies that you are likely to hear about in relation to big data:

| | RDBMS | Analytic Data Stores | NoSql | Hadoop |
| --- | --- | --- | --- | --- |
| Description | Traditional row-column databases used for transactional systems, reporting, and archiving. | Optimized for data access (as opposed to writes); leverage columnar or in-memory technology to provide fast data access at the expense of write performance. | Designed for rapid access to “key-value” pair combinations. Useful for products like Facebook and Twitter where most information revolves around one key piece of data. | An open-source approach to storing data in a file system across a range of commodity hardware and processing it utilizing parallelism (multiple systems at once). |
| Examples | SQL Server, MySQL, Oracle, etc. | Vertica, Kognitio, ParAccel, Netezza, Infobright, Amazon Redshift | MongoDB, Cassandra | Hadoop implementations by Cloudera, Intel, Amazon, Hortonworks |
| Good for… | Reads & writes, “reasonable” data sets (< 1B rows) | Storing lots of information, great query/retrieval speeds | Storing information of a certain type, great retrieval speed based on a key, write performance | Inexpensive storage of mass data, structured & semi-structured |
| Not good for… | Massive data volumes, unstructured & semi-structured data | Unstructured & semi-structured data, writes (one at a time) | Grouping information across keys (such as for reporting) | Complex, code-based, incompatible approaches in market, writes (one at a time) |
| Notes | Challenging to “scale out” | Often viewed as an alternative to traditional RDBMS when read performance is important | Enables faster productivity when creating data-driven applications as there is less up-front design work needed | Strong bias to the open-source community & Java |
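
For readers who want to see the four approaches side by side, the following is a minimal Python sketch, not how any of the products above is implemented. It uses plain in-memory data structures to mimic a row layout, a columnar layout, a key-value lookup, and a split-then-merge calculation in the spirit of Hadoop. The sample orders and all function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: (order_id, customer, amount)
orders = [
    (1, "alice", 30.0),
    (2, "bob", 20.0),
    (3, "alice", 50.0),
    (4, "carol", 10.0),
]

# Row-oriented layout (RDBMS style): each record is kept together,
# so reading or writing one whole order is straightforward.
row_store = [{"order_id": o, "customer": c, "amount": a} for o, c, a in orders]

# Column-oriented layout (analytic data store style): each column is kept
# together, so a report over one column does not touch the others.
column_store = {
    "order_id": [o for o, _, _ in orders],
    "customer": [c for _, c, _ in orders],
    "amount": [a for _, _, a in orders],
}

# Key-value layout (NoSql style): one key leads directly to its value.
key_value_store = {row["order_id"]: row for row in row_store}


def total_amount_columnar(store):
    """Sum a single column without reading the other columns."""
    return sum(store["amount"])


def lookup_order(store, order_id):
    """Fetch one record directly by its key."""
    return store[order_id]


def map_reduce_totals(partitions):
    """Total per customer, computed per partition and then merged.

    This runs sequentially; it only illustrates the idea that each
    partition could be processed on a separate machine.
    """
    partials = []
    for part in partitions:  # "map" step: one subtotal table per partition
        subtotal = defaultdict(float)
        for _, customer, amount in part:
            subtotal[customer] += amount
        partials.append(subtotal)

    merged = defaultdict(float)
    for subtotal in partials:  # "reduce" step: merge the subtotals
        for customer, amount in subtotal.items():
            merged[customer] += amount
    return dict(merged)


# Pretend the data lives on two machines.
partitions = [orders[:2], orders[2:]]
```

Calling `lookup_order(key_value_store, 3)` goes straight to one record, while answering "how much did each customer spend?" from the key-value layout would mean visiting every key, which is the grouping weakness noted in the table.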

 

About the Author

David Abramson has more than 10 years of experience in full-lifecycle product development and management, from product inception through general availability. He has shepherded multiple analytics and business intelligence products and has worked with hundreds of customers, both enterprises and ISVs, to support data-driven application implementations.
