HIVE Guide for Beginners
Birth of a language called HIVE
HIVE was the brainchild of Facebook. When Facebook first launched, all of its back-end analytics data was stored in Oracle systems, which were loaded using Python scripts. As Facebook expanded and became more popular, its data volume grew exponentially, and the team realized that the Oracle systems could not handle the load.
That is when they decided to migrate to Hadoop, an open-source platform that was not yet widely used at the time. Fetching data from a Hadoop system relies on MapReduce, and writing the complex Java jobs this required was difficult. So they started the HIVE project to develop a SQL-like language that converts SQL scripts into MapReduce jobs.
What is HIVE?
- Hive is a data warehouse system for Hadoop. It runs SQL-like queries written in HQL (Hive Query Language), which are internally converted to MapReduce jobs.
- Hive supports Data Definition Language (DDL), Data Manipulation Language (DML), and user-defined functions.
- Hive's metastore persists schema information, i.e. table definitions (table name, columns, types), the location of table files, the row format of table files, and the storage format of files.
- Hive ships with built-in functions for manipulating dates and strings, along with other data-mining tools. It also supports user-defined functions (UDFs) that extend this set to handle use cases the built-in functions do not cover.
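To make the idea of a query being converted into map and reduce steps more concrete, here is a minimal Python sketch. It is only a conceptual illustration and not how Hive actually generates jobs. The table name `page_views`, its columns, the sample rows, and all function names are hypothetical. The sketch mimics what a query such as `SELECT page, COUNT(*) FROM page_views GROUP BY page` has to do: emit one pair per row (map), group the pairs by key (shuffle), and aggregate each group (reduce).

```python
from collections import defaultdict

# Hypothetical rows of a "page_views" table: (user, page).
ROWS = [
    ("alice", "home"),
    ("bob", "home"),
    ("alice", "profile"),
    ("carol", "home"),
]


def map_phase(rows):
    """Emit one (key, 1) pair per row, keyed by the GROUP BY column."""
    for _user, page in rows:
        yield page, 1


def shuffle(pairs):
    """Collect the emitted values under their key, as happens between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Aggregate the values of each key; summing the 1s gives COUNT(*)."""
    return {key: sum(values) for key, values in grouped.items()}


def count_by_page(rows):
    """Run the three steps in order, like a single GROUP BY query."""
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    print(count_by_page(ROWS))  # {'home': 3, 'profile': 1}
```

Even this one-line query needs three hand-written steps, which gives a feel for why writing such jobs by hand for every report was painful and why a SQL-like layer was attractive.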
What HIVE is not
- It is not designed for online transaction processing (OLTP).
- It is not built for low-latency queries; even fetching a small amount of data takes time.