Monday, January 25, 2016

Analyzing Apache Access Logs in Apache Hive

This post computes the following statistics from an Apache access log:
  1. A count of the response codes returned from the server.
  2. The content size of responses returned from the server to the host.
  3. The top ten most popular URLs in the Apache log.
  4. The average, min, and max content size of responses returned from the server.


The steps to process data with Apache Hive

Before proceeding with the steps below, install the Cloudera QuickStart VM 5.5 and VMware Player. Hadoop 2.6, Java 1.7, Eclipse Luna, Hive, HBase, Spark, and all required libraries are included in the Cloudera VM.
  1. Download the Apache log file from http://www.monitorware.com/en/logsamples/apache.php and unzip it.
  2. Create the input directory in HDFS:
     hadoop fs -mkdir -p /user/cloudera/hive/input
  3. Copy the log file from the local file system to that directory in HDFS:
     hadoop fs -put access_log /user/cloudera/hive/input/
  4. Create an appropriate table for storing the Apache logs.
  5. Load the access_log file into the table. The statement to use depends on where the file is located (local file system or HDFS).
  6. List the count of response codes returned from the server.
  7. List the top 10 most popular URLs in the Apache log.
  8. List the content size of responses returned from the server.
  9. List the average, min, and max content size of responses returned from the server.
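
To sanity-check the Hive output on a small sample, the same four statistics can be computed outside Hive. The block below is a minimal Python sketch under stated assumptions, not the Hive queries from this walkthrough. It assumes the log is in Apache Common Log Format, reads "content size of responses returned to host" as total bytes per client host, and treats a size of "-" as 0 bytes. The regex, the function names (parse_line, analyze), and the sample lines are illustrative and are not taken from the downloaded access_log.

```python
import re
from collections import Counter, defaultdict

# Assumed Apache Common Log Format:
#   host ident user [timestamp] "METHOD url PROTOCOL" status size
LOG_PATTERN = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (\S+) (\S+)" (\d{3}) (\d+|-)$'
)


def parse_line(line):
    """Return (host, url, status, size), or None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    host, _ident, _user, _ts, _method, url, _proto, status, size = match.groups()
    # Apache writes "-" when no response body was sent; count that as 0 bytes.
    return host, url, int(status), 0 if size == "-" else int(size)


def analyze(lines):
    """Compute the four statistics over an iterable of raw log lines."""
    records = [r for r in map(parse_line, lines) if r is not None]

    # 1. Count of each response code.
    status_counts = Counter(status for _, _, status, _ in records)

    # 2. Total content size per client host (an interpretation, see lead-in).
    bytes_per_host = defaultdict(int)
    for host, _, _, size in records:
        bytes_per_host[host] += size

    # 3. Ten most requested URLs.
    top_urls = Counter(url for _, url, _, _ in records).most_common(10)

    # 4. Average, min, and max content size; None when nothing parsed.
    sizes = [size for _, _, _, size in records]
    size_stats = (sum(sizes) / len(sizes), min(sizes), max(sizes)) if sizes else None

    return status_counts, dict(bytes_per_host), top_urls, size_stats


# Hypothetical sample lines, made up for illustration only.
SAMPLE = [
    '10.0.0.1 - - [07/Mar/2004:16:05:49 -0800] "GET /index.html HTTP/1.1" 200 1000',
    '10.0.0.2 - - [07/Mar/2004:16:06:51 -0800] "GET /about.html HTTP/1.1" 200 3000',
    '10.0.0.1 - - [07/Mar/2004:16:10:02 -0800] "GET /index.html HTTP/1.1" 304 -',
    '10.0.0.3 - - [07/Mar/2004:16:11:58 -0800] "GET /missing HTTP/1.1" 404 500',
]
```

Calling analyze(SAMPLE) returns the four results in the same order as the list of statistics at the top of the post. Because analyze accepts any iterable of lines, an open file handle for the unzipped access_log can be passed in place of SAMPLE.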