WebLog Analyzer Using Big Data Technology
Material type: Text
Subject(s):
Dissertation note: Master of Science in Computer Science & Information Security 2013-2015 EXT, "Mirox Cyber Security & Technology Pvt Ltd"
Summary: In today's Internet world, log file analysis has become a necessary task for analyzing customer behavior in order to improve advertising and sales; for datasets in domains such as environmental monitoring, medicine, and banking, it is equally important to analyze log data to extract the required knowledge. Web mining is the process of discovering knowledge from web data. Log files are generated very fast, at a rate of 1-10 MB/s per machine, and a single data center can generate tens of terabytes of log data in a day. To analyze such large datasets we need a parallel processing system and a reliable data storage mechanism. A virtual database system is an effective solution for integrating data, but it becomes inefficient for large datasets. The Hadoop framework provides reliable data storage through the Hadoop Distributed File System (HDFS) and parallel processing for large datasets through the MapReduce programming model. HDFS breaks up the input data and distributes blocks of it across several machines in the Hadoop cluster. This mechanism allows log data to be processed in parallel on all the machines in the cluster and computes results efficiently. The overall objective of this project is to analyze the system logs of an internal organization. A log file contains the list of activities generated in response to requests arriving at the system server or any hosted application; these log files may reside on the same server. Each individual request is listed on a separate line in the log file, called a log entry. The purpose of a log file is to keep track of what is going on with the system server or application.
Analyzing these log files yields insights that help in understanding traffic patterns, user activity, security breaches, user interests, and more.
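As a minimal illustration of the MapReduce model the summary describes (a hypothetical sketch, not the project's actual implementation), the classic map-then-reduce pattern can be applied to web-server log lines, e.g. counting requests per HTTP status code. The log format and field positions assumed here follow the Apache common log format:

```python
from collections import Counter

def map_phase(log_lines):
    """Map step: emit a (status_code, 1) pair for each log entry."""
    pairs = []
    for line in log_lines:
        parts = line.split()
        if len(parts) >= 9:          # Apache common log format has >= 10 fields
            status = parts[8]        # field 9 is the HTTP status code
            pairs.append((status, 1))
    return pairs

def reduce_phase(pairs):
    """Reduce step: sum the emitted counts for each status code."""
    totals = Counter()
    for status, count in pairs:
        totals[status] += count
    return dict(totals)

# Sample log entries (hypothetical data for illustration)
logs = [
    '10.0.0.1 - - [01/Jan/2015:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2015:10:00:01 +0000] "GET /admin HTTP/1.1" 403 128',
    '10.0.0.1 - - [01/Jan/2015:10:00:02 +0000] "GET /index.html HTTP/1.1" 200 512',
]
print(reduce_phase(map_phase(logs)))  # {'200': 2, '403': 1}
```

In a real Hadoop job the same two functions would run as Mapper and Reducer classes, with HDFS splitting the log files into blocks so that each machine maps its local block in parallel before the framework shuffles the pairs to the reducers.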
| Item type | Current library | Call number | Status | Date due | Barcode |
|---|---|---|---|---|---|
| Project Reports | Kerala University of Digital Sciences, Innovation and Technology Knowledge Centre | R-922 | Not for loan | | |
| Project Reports | Kerala University of Digital Sciences, Innovation and Technology Knowledge Centre | R-699 | Not for loan | | |
Master of Science in Computer Science & Information Security 2013-2015 EXT | Meraj Uddin, Rakesh Kumar R G | "Mirox Cyber Security & Technology Pvt Ltd"