Big data: Performance profiling of Meteorological and Oceanographic data on Hive

Abdullahi, A.U. and Ahmad, R. and Zakaria, N.M. (2016) Big data: Performance profiling of Meteorological and Oceanographic data on Hive. In: UNSPECIFIED.

Full text not available from this repository.
Official URL: https://www.scopus.com/inward/record.uri?eid=2-s2....

Abstract

The emergence and development of big data tools, techniques and systems motivate industries and organizations to embrace and explore research in big data, in order to circumvent the challenges of traditional database systems. However, the available benchmarks and workloads target specific segments of the Information Technology industry, whose data differ in nature and complexity from data obtained from other sources. Hence there is a need to use data from other domains to evaluate the performance and maturity of big data technologies. In this paper, the performance profiling of Meteorological and Oceanographic data on Hive is conducted. Hive, being the commonly used data warehouse analytical platform for big data, is chosen with a view to exposing the intricacies involved in formatting and loading the data. The response time for indexed and non-indexed retrievals is measured using three sets of queries frequently used in the area. The query types are Type 1, SELECT with WHERE clause; Type 2, SELECT with JOIN clause; and Type 3, SELECT with GROUP BY clause. The experimental results show that a good response time is achieved for both indexed and non-indexed tables. Indexed retrieval shows a significant decrease in response time for the Type 1 query at all data sizes and for the Type 3 query at data sizes of 100GB and less. It also shows additional overhead for the Type 2 query at all data sizes and for the Type 3 query at data sizes of 500GB and more. When the Meteorological and Oceanographic data is properly formatted, its analytics with Hive proved to be efficient compared to traditional database systems. The results of this study have the potential to attract oil and gas companies to adopt big data technologies for handling their exploration datasets. © 2016 IEEE.
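The three query types and the indexed versus non-indexed comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' setup: it uses Python's built-in sqlite3 module as a small stand-in for Hive, and the table names, column names, and generated values are all hypothetical, since the abstract does not give the schema or the query text. It only shows the shape of such a profiling run and makes no claim about the timings reported in the paper.

```python
import sqlite3
import time

# Hypothetical schema and queries: the abstract names only the query types,
# not the tables or columns, so everything below is illustrative.
QUERIES = {
    "Type 1 (SELECT with WHERE)":
        "SELECT station_id, obs_hour, wave_height_cm "
        "FROM observations WHERE station_id = 7",
    "Type 2 (SELECT with JOIN)":
        "SELECT s.region, o.wave_height_cm "
        "FROM observations o JOIN stations s ON o.station_id = s.station_id",
    "Type 3 (SELECT with GROUP BY)":
        "SELECT station_id, AVG(wave_height_cm) "
        "FROM observations GROUP BY station_id",
}


def build_database(indexed, n_stations=20, n_hours=500):
    """Create an in-memory SQLite database (a stand-in for Hive tables)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stations (station_id INTEGER, region TEXT)")
    conn.execute(
        "CREATE TABLE observations "
        "(station_id INTEGER, obs_hour INTEGER, wave_height_cm INTEGER)"
    )
    conn.executemany(
        "INSERT INTO stations VALUES (?, ?)",
        [(s, f"region-{s % 4}") for s in range(n_stations)],
    )
    # Deterministic synthetic readings, not real measurements.
    conn.executemany(
        "INSERT INTO observations VALUES (?, ?, ?)",
        [(s, h, (s * 31 + h * 17) % 100)
         for s in range(n_stations) for h in range(n_hours)],
    )
    if indexed:
        conn.execute(
            "CREATE INDEX idx_obs_station ON observations (station_id)"
        )
    conn.commit()
    return conn


def profile(conn):
    """Run each query type once; return {label: (seconds, sorted rows)}."""
    results = {}
    for label, sql in QUERIES.items():
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        elapsed = time.perf_counter() - start
        results[label] = (elapsed, sorted(rows))
    return results


def compare():
    """Profile the same queries on non-indexed and indexed tables."""
    plain = profile(build_database(indexed=False))
    indexed = profile(build_database(indexed=True))
    return plain, indexed


if __name__ == "__main__":
    plain, indexed = compare()
    for label in QUERIES:
        print(f"{label}: non-indexed {plain[label][0]:.6f}s, "
              f"indexed {indexed[label][0]:.6f}s")
```

Both runs return the same rows for each query type; only the elapsed time can differ. On a data set this small the timings are not meaningful, so repeated runs over much larger data would be needed before drawing any comparison.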

Item Type: Conference or Workshop Item (UNSPECIFIED)
Impact Factor: cited by 3
Uncontrolled Keywords: Benchmarking; Data warehouses; Gas industry; Information science; Public utilities; Query processing; Response time (computer systems); Data technologies; Data tools; Hadoop; Hive; Information technology industry; Meteorological and oceanographic data; Oil and gas companies; Query types; Big data
Depositing User: Ms Sharifah Fahimah Saiyed Yeop
Date Deposited: 25 Mar 2022 06:55
Last Modified: 25 Mar 2022 06:55
URI: http://scholars.utp.edu.my/id/eprint/30482
