RESEARCH ON DARK DATA ANALYSIS TO REDUCE DATA COMPLEXITY IN BIG DATA

Authors

  • Bansari Trivedi, M. Tech student, Information Technology Department, Parul Institute of Engineering and Technology, Waghodia, Vadodara, India.
  • Mr. Gokulnath K., HOD, I.T. Dept., Parul University

Keywords:

Dark Data, Chunking, Classification, Big Data

Abstract

Big data is a volume of data too large for traditional systems to handle; it requires new structures, algorithms and techniques. As data grows, dark data grows with it. Within a main data source there is a portion of data that is not in regular use but that can still support decision making and data retrieval. This portion is known as "dark data". Dark data generally sits in an idle state. The term "dark data" appears to have been first used and defined by the consulting company Gartner. Dark data is acquired through various operational sources but is not used in any way to derive insights or to make decisions. It is a subset of big data. A typical big data set consists, on average, of about 80% dark data. There are two ways to view the importance of dark data. One view is that unanalyzed data contains undiscovered, important insights and represents a lost opportunity. The other view is that unanalyzed data, if not handled well, can cause many problems, including legal and security problems. This phase of the work introduces a solution to the side effects of dark data on the data set as a whole. Dark data is an important part of big data, but because it is idle it can place load on systems and processes. It is therefore important to find a solution that leaves dark data unchanged while preventing it from affecting the rest of the data.


Published

15-05-2017

How to Cite

Bansari Trivedi, & Mr. Gokulnath K. (2017). RESEARCH ON DARK DATA ANALYSIS TO REDUCE DATA COMPLEXITY IN BIG DATA. International Education and Research Journal (IERJ), 3(5). Retrieved from https://ierj.in/journal/index.php/ierj/article/view/902