Aaditya Reji George (93613000)

WebLog Analyzer Using Big Data Technology



In today’s Internet world, log file analysis has become a necessary task: it is used to study customer behavior in order to improve advertising and sales, and in domains such as environmental monitoring, medicine, and banking it is important to analyze log data to extract the required knowledge. Web mining is the process of discovering knowledge from web data. Log files are generated very quickly, at rates of 1-10 MB/s per machine, so a single data center can produce tens of terabytes of log data in a day. Analyzing datasets of this size requires a parallel processing system and a reliable data storage mechanism. A virtual database system is an effective solution for integrating data, but it becomes inefficient for large datasets. The Hadoop framework provides reliable data storage through the Hadoop Distributed File System (HDFS) and parallel processing of large datasets through the MapReduce programming model. HDFS splits the input data and distributes blocks of it across several machines in the Hadoop cluster. This mechanism allows log data to be processed in parallel on all machines in the cluster, so results are computed efficiently.

The overall objective of this project is to analyze the system logs of an internal organization. A log file contains a list of activities, each a response to a request made to the system server or a hosted application, and may reside on that same server. Each individual request is listed on a separate line of the log file, called a log entry. The purpose of a log file is to keep track of what is happening on the system server or application. Analyzing these log files can yield many insights that help in understanding traffic patterns, user activity, security breaches, users’ interests, and more.
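The map-and-reduce idea described above can be illustrated with a minimal, self-contained sketch. This is not a real Hadoop job (which would read its input from HDFS and run the phases on separate cluster nodes); it simply mimics the two phases in plain Python on a few hypothetical log entries in Apache Common Log Format, counting requests per client IP:

```python
import re
from collections import defaultdict

# Hypothetical sample entries in Apache Common Log Format; a real
# Hadoop job would stream lines from files stored in HDFS instead.
LOG_LINES = [
    '192.168.1.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '192.168.1.10 - - [10/Oct/2023:13:55:40 +0000] "GET /login HTTP/1.1" 401 512',
    '10.0.0.7 - - [10/Oct/2023:13:56:02 +0000] "POST /api/data HTTP/1.1" 200 1045',
]

# Regex for one Common Log Format entry: IP, identity, user,
# timestamp, request line, status code, and response size.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+)'
)

def mapper(line):
    """Map phase: emit a (key, value) pair -- here (ip, 1) -- per entry."""
    match = LOG_PATTERN.match(line)
    if match:
        yield match.group("ip"), 1

def reducer(pairs):
    """Reduce phase: sum the values for each key (IP address)."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

hits_per_ip = reducer(pair for line in LOG_LINES for pair in mapper(line))
print(hits_per_ip)  # e.g. {'192.168.1.10': 2, '10.0.0.7': 1}
```

In a real cluster, Hadoop would shuffle the mapper output so that all pairs sharing a key reach the same reducer; swapping the key from IP address to request path or status code would yield traffic-pattern or error summaries from the same skeleton.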




Keywords: DBMS, Hadoop, Big Data