The following statistics are analyzed:
- A count of response code's returned from the server.
- The content size of responses returned from the server to host.
- The top ten most popular URL’s in the Apache log
- The average, min, and max content size of responses returned from the server.
The steps to process data with Apache Hive
Before proceed the below steps, we have to install the Cloudera Quickstart vm 5.5 and VMwareplayer. The Hadoop 2.6, Java 1.7, Eclipse Luna, Hive, Hbase, Spark, and all required libraries have been included in cloudera.
- Download the apache log file from http://www.monitorware.com/en/logsamples/ apache.php and unzip it.
- Create a loganalyzer/input directory named path in HDFS.
hadoop fs -mkdir -p /user/cloudera/hive/input
- Copy the log file from the local file system to directory within the HDFS.
hadoop fs -put access_log /user/cloudera/hive/input/
- Create appropriate table for string Apache logs.
- Load access_log file, depending location of file (local file system or HDFS) do on of followings
- List the count of response code's returned from the server
Result
- List the top 10 most popular URL’s in the Apache log
Result
- List the content size of responses returned from the server.
Result
- List the average, min, and max content size of responses returned from the server.
Result