AccessLog and ErrorLog Class (Apache)

This tested source code is useful for programmers who need to analyze Apache access logs and error logs with Python.

The output is a list of Python dictionaries, with status code details (access log) and error level details (error log) appended to each record.
Access log and error log records can also be joined on datetime, at one-second resolution.

Sample output is shown below.

– access log (common format)

[{'remote_host': '127.0.0.1', 'remote_logname': '-', 'remote_user': 'frank', 'time_received': '[10/Oct/2000:13:55:36 -0700]', 'time_received_datetimeobj': datetime.datetime(2000, 10, 10, 13, 55, 36), 'time_received_isoformat': '2000-10-10T13:55:36', 'time_received_tz_datetimeobj': datetime.datetime(2000, 10, 10, 13, 55, 36, tzinfo='0700'), 'time_received_tz_isoformat': '2000-10-10T13:55:36-07:00', 'time_received_utc_datetimeobj': datetime.datetime(2000, 10, 10, 20, 55, 36, tzinfo='0000'), 'time_received_utc_isoformat': '2000-10-10T20:55:36+00:00', 'request_first_line': 'GET /apache_pb.gif HTTP/1.0', 'request_method': 'GET', 'request_url': '/apache_pb.gif', 'request_http_ver': '1.0', 'request_url_scheme': '', 'request_url_netloc': '', 'request_url_path': '/apache_pb.gif', 'request_url_query': '', 'request_url_fragment': '', 'request_url_username': None, 'request_url_password': None, 'request_url_hostname': None, 'request_url_port': None, 'request_url_query_dict': {}, 'request_url_query_list': [], 'request_url_query_simple_dict': {}, 'status': '200', 'response_bytes_clf': '2326', 'time_received_datetimeobj_str': '2000-10-10 13:55:36.000000', 'status_code_name': 'OK', 'status_code_explanation': 'The 200 (OK) status code indicates that the request has succeeded. The payload sent in a 200 response depends on the request method. For the methods defined by this specification, the intended meaning of the payload can be summarized as:GET a representation of the target resource;HEAD the same representation as GET, but without the representation data; POST a representation of the status of, or results obtained from, the action;PUT, DELETE a representation of the status of the action;PUT、DELETE。OPTIONS a representation of the communications options;TRACE a representation of the request message as received by the end server.Aside from responses to CONNECT, a 200 response always has a payload, though an origin server MAY generate a payload body of zero length. If no payload is desired, an origin server ought to send 204 (No Content) instead. For CONNECT, no payload is allowed because the successful result is a tunnel, which begins immediately after the 200 response header section.', 'remote_host_ip': '127.0.0.1', 'remote_host_domainname': 'localhost'}, ...
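To show where a few of these keys can come from, here is a minimal standard-library sketch. It is not the AccessLog class sold on this page (which is built on apache_log_parser); the pattern `CLF_PATTERN`, the function `parse_common_line`, and the raw log line (reconstructed from the fields in the sample above) are assumptions made for illustration only.

```python
import re
from datetime import datetime, timezone

# Common Log Format: %h %l %u %t "%r" %>s %b
CLF_PATTERN = re.compile(
    r'(?P<remote_host>\S+) (?P<remote_logname>\S+) (?P<remote_user>\S+) '
    r'(?P<time_received>\[[^\]]+\]) "(?P<request_first_line>[^"]*)" '
    r'(?P<status>\d{3}) (?P<response_bytes_clf>\S+)'
)


def parse_common_line(line):
    """Return a dict for one Common Log Format line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # %z reads offsets such as -0700 and yields a timezone-aware datetime.
    # %b expects English month names, so this assumes the C/English locale.
    aware = datetime.strptime(
        record["time_received"].strip("[]"), "%d/%b/%Y:%H:%M:%S %z"
    )
    record["time_received_tz_isoformat"] = aware.isoformat()
    record["time_received_utc_isoformat"] = aware.astimezone(timezone.utc).isoformat()
    # Split the request line only when it has the usual three parts.
    parts = record["request_first_line"].split()
    if len(parts) == 3:
        record["request_method"] = parts[0]
        record["request_url"] = parts[1]
        record["request_http_ver"] = parts[2].partition("/")[2]
    return record


# Raw line reconstructed from the sample fields above (an assumption).
sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
record = parse_common_line(sample)
print(record["time_received_utc_isoformat"])  # 2000-10-10T20:55:36+00:00
```

Lines that do not match the pattern return None, so a caller can count or log them instead of stopping.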

– access log (combined format)

[{'remote_host': '127.0.0.1', 'remote_logname': '-', 'remote_user': '-', 'time_received': '[20/Oct/2018:19:51:19 +0900]', 'time_received_datetimeobj': datetime.datetime(2018, 10, 20, 19, 51, 19), 'time_received_isoformat': '2018-10-20T19:51:19', 'time_received_tz_datetimeobj': datetime.datetime(2018, 10, 20, 19, 51, 19, tzinfo='0900'), 'time_received_tz_isoformat': '2018-10-20T19:51:19+09:00', 'time_received_utc_datetimeobj': datetime.datetime(2018, 10, 20, 10, 51, 19, tzinfo='0000'), 'time_received_utc_isoformat': '2018-10-20T10:51:19+00:00', 'request_first_line': 'GET /xampp/itsfa/ HTTP/1.1', 'request_method': 'GET', 'request_url': '/xampp/itsfa/', 'request_http_ver': '1.1', 'request_url_scheme': '', 'request_url_netloc': '', 'request_url_path': '/xampp/itsfa/', 'request_url_query': '', 'request_url_fragment': '', 'request_url_username': None, 'request_url_password': None, 'request_url_hostname': None, 'request_url_port': None, 'request_url_query_dict': {}, 'request_url_query_list': [], 'request_url_query_simple_dict': {}, 'status': '404', 'response_bytes_clf': '1134', 'request_header_referer': '-', 'request_header_user_agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:62.0) Gecko/20100101 Firefox/62.0', 'request_header_user_agent__browser__family': 'Firefox', 'request_header_user_agent__browser__version_string': '62.0', 'request_header_user_agent__os__family': 'Windows', 'request_header_user_agent__os__version_string': '10', 'request_header_user_agent__is_mobile': False, 'time_received_datetimeobj_str': '2018-10-20 19:51:19.000000', 'status_code_name': 'Not Found', 'status_code_explanation': 'The 404 (Not Found) status code indicates that the origin server did not find a current representation for the target resource or is not willing to disclose that one exists. A 404 status code does not indicate whether this lack of representation is temporary or permanent; the 410 (Gone) status code is preferred over 404 if the origin server knows, presumably through some configurable means, that the condition is likely to be permanent.', 'remote_host_ip': '127.0.0.1', 'remote_host_domainname': 'localhost'}, {'remote_host': '183.79.135.206', 'remote_logname': '-', 'remote_user': '-', 'time_received': '[20/Oct/2018:19:51:24 +0900]', 'time_received_datetimeobj': datetime.datetime(2018, 10, 20, 19, 51, 24), 'time_received_isoformat': '2018-10-20T19:51:24', 'time_received_tz_datetimeobj': datetime.datetime(2018, 10, 20, 19, 51, 24, tzinfo='0900'), 'time_received_tz_isoformat': '2018-10-20T19:51:24+09:00', 'time_received_utc_datetimeobj': datetime.datetime(2018, 10, 20, 10, 51, 24, tzinfo='0000'), 'time_received_utc_isoformat': '2018-10-20T10:51:24+00:00', 'request_first_line': 'GET /xampp/ HTTP/1.1', 'request_method': 'GET', 'request_url': '/xampp/', 'request_http_ver': '1.1', 'request_url_scheme': '', 'request_url_netloc': '', 'request_url_path': '/xampp/', 'request_url_query': '', 'request_url_fragment': '', 'request_url_username': None, 'request_url_password': None, 'request_url_hostname': None, 'request_url_port': None, 'request_url_query_dict': {}, 'request_url_query_list': [], 'request_url_query_simple_dict': {}, 'status': '200', 'response_bytes_clf': '768', 'request_header_referer': '-', 'request_header_user_agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:62.0) Gecko/20100101 Firefox/62.0', 'request_header_user_agent__browser__family': 'Firefox', 'request_header_user_agent__browser__version_string': '62.0', 'request_header_user_agent__os__family': 'Windows', 'request_header_user_agent__os__version_string': '10', 'request_header_user_agent__is_mobile': False, 'time_received_datetimeobj_str': '2018-10-20 19:51:24.000000', 'status_code_name': 'OK', 'status_code_explanation': 'The 200 (OK) status code indicates that the request has succeeded. The payload sent in a 200 response depends on the request method. For the methods defined by this specification, the intended meaning of the payload can be summarized as:GET a representation of the target resource;HEAD the same representation as GET, but without the representation data; POST a representation of the status of, or results obtained from, the action;PUT, DELETE a representation of the status of the action;PUT、DELETE。OPTIONS a representation of the communications options;TRACE a representation of the request message as received by the end server.Aside from responses to CONNECT, a 200 response always has a payload, though an origin server MAY generate a payload body of zero length. If no payload is desired, an origin server ought to send 204 (No Content) instead. For CONNECT, no payload is allowed because the successful result is a tunnel, which begins immediately after the 200 response header section.', 'remote_host_ip': '183.79.135.206', 'remote_host_domainname': 'f1.top.vip.kks.yahoo.co.jp'}, ...
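The request_url_* keys in the samples look like the parts of a split URL. The following sketch shows how such keys could be produced with `urllib.parse` alone. The function name `split_request_url` is hypothetical, and the meaning given to `request_url_query_simple_dict` (first value per key) is an assumption, since both samples above have an empty query.

```python
from urllib.parse import parse_qs, parse_qsl, urlsplit


def split_request_url(url):
    """Break a request URL into the request_url_* style keys seen in the samples."""
    parts = urlsplit(url)
    query_dict = parse_qs(parts.query)  # every value is a list of strings
    return {
        "request_url_scheme": parts.scheme,
        "request_url_netloc": parts.netloc,
        "request_url_path": parts.path,
        "request_url_query": parts.query,
        "request_url_fragment": parts.fragment,
        "request_url_username": parts.username,
        "request_url_password": parts.password,
        "request_url_hostname": parts.hostname,
        "request_url_port": parts.port,
        "request_url_query_dict": query_dict,
        "request_url_query_list": parse_qsl(parts.query),
        # Assumption: the "simple" dict keeps only the first value of each key.
        "request_url_query_simple_dict": {k: v[0] for k, v in query_dict.items()},
    }


print(split_request_url("/xampp/itsfa/")["request_url_path"])  # /xampp/itsfa/
```

For a path-only URL such as '/xampp/itsfa/', the scheme and netloc are empty strings and the hostname and port are None, which matches the sample records.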

– error log

[{'dt': 'Wed Oct 11 14:32:52 2000', 'dtobj': datetime.datetime(2000, 10, 11, 14, 32, 52), 'dtobj_str': '2000-10-11 14:32:52', 'level': 'error', 'module': '', 'message': '[client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test', 'dtobj_sec': datetime.datetime(2000, 10, 11, 14, 32, 52), 'level_explanation': 'Error conditions'}, ...
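A record like the one above could be built from a raw error log line roughly as follows. This is only a sketch, not the ErrorLog class itself: `ERROR_PATTERN`, `parse_error_line`, and the level table are assumptions for illustration, and the raw line is reconstructed from the sample fields.

```python
import re
from datetime import datetime

# Short descriptions of the classic Apache log levels (emerg down to debug).
LEVEL_EXPLANATIONS = {
    "emerg": "Emergencies - system is unusable",
    "alert": "Action must be taken immediately",
    "crit": "Critical conditions",
    "error": "Error conditions",
    "warn": "Warning conditions",
    "notice": "Normal but significant condition",
    "info": "Informational",
    "debug": "Debug-level messages",
}

# Accepts both "[error]" and the newer "[module:error]" form.
ERROR_PATTERN = re.compile(
    r"\[(?P<dt>[^\]]+)\] \[(?:(?P<module>\w+):)?(?P<level>\w+)\] (?P<message>.*)"
)


def parse_error_line(line):
    """Return a dict for one error log line, or None if it cannot be parsed."""
    match = ERROR_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["module"] = record["module"] or ""
    # Try the timestamp with microseconds first, then without.
    for fmt in ("%a %b %d %H:%M:%S.%f %Y", "%a %b %d %H:%M:%S %Y"):
        try:
            dtobj = datetime.strptime(record["dt"], fmt)
            break
        except ValueError:
            continue
    else:
        return None
    record["dtobj"] = dtobj
    record["dtobj_sec"] = dtobj.replace(microsecond=0)  # truncated to the second
    record["level_explanation"] = LEVEL_EXPLANATIONS.get(record["level"], "")
    return record


line = ("[Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] "
        "client denied by server configuration: /export/home/live/ap/htdocs/test")
print(parse_error_line(line)["level_explanation"])  # Error conditions
```

Keeping a second-resolution datetime next to the full one makes it easy to match error records against access log records later.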

Specification

ID: STCD_0000000008
Language: Python
Steps: 120 (AccessLogClass), 67 (ErrorLogClass)
Purpose: Analyze Apache access logs and error logs.
Function: Analyze the access log.
          Analyze the error log.
          Join the access log and error log on datetime (to the second).
Environment: Anaconda3 (Python 3.9.7)
             IDE: Visual Studio Code
             apache_log_parser
Restriction: Free license.
             You may use your copy of the source code as its owner.
             You may customize and distribute it freely.
Price: 7 dollars or 700 yen (pay with PayPal)
References: https://github.com/amandasaurus/apache-log-parser
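The join on datetime (to the second) can be pictured with the following sketch. It is an illustration under stated assumptions, not the shipped implementation: `join_by_second` is a hypothetical helper, the key names are taken from the output samples, both logs are assumed to use the same local time, and the records at the bottom are made-up minimal data.

```python
from collections import defaultdict
from datetime import datetime


def join_by_second(access_records, error_records):
    """Pair each access record with the error records logged in the same second.

    Access records with no matching error are paired with None (a left join).
    """
    errors_by_second = defaultdict(list)
    for err in error_records:
        errors_by_second[err["dtobj_sec"]].append(err)

    joined = []
    for acc in access_records:
        key = acc["time_received_datetimeobj"].replace(microsecond=0)
        for err in errors_by_second.get(key, [None]):
            joined.append((acc, err))
    return joined


# Made-up minimal records, for illustration only.
access = [
    {"time_received_datetimeobj": datetime(2018, 10, 20, 19, 51, 19), "status": "404"},
    {"time_received_datetimeobj": datetime(2018, 10, 20, 19, 51, 24), "status": "200"},
]
errors = [{"dtobj_sec": datetime(2018, 10, 20, 19, 51, 19), "level": "error"}]

for acc, err in join_by_second(access, errors):
    print(acc["status"], err["level"] if err else "-")
```

Indexing the error records by second first keeps the join to one pass over each log instead of comparing every pair.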

Source Code

AccessLog Class

ErrorLog Class

Test Result

AccessLog Class

No.  Test case                                 Result
01   Analyze common log.                       OK
02   Analyze combined log.                     OK
03   Turn status code conversion on or off.    OK
04   Turn remote host conversion on or off.    OK
05   Set some status codes to skip.            OK
06   Join access log and error log.            OK
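Test cases 03 and 05 refer to options for status code conversion and for skipping status codes. The sketch below shows one way such options could behave. The function `filter_and_label` and its parameters `convert_status` and `skip_statuses` are hypothetical names, not the actual interface of the AccessLog class, and the status names come from Python's `http.HTTPStatus` rather than from the class itself.

```python
from http import HTTPStatus


def filter_and_label(records, convert_status=True, skip_statuses=()):
    """Drop records whose status is in skip_statuses and optionally add a status name."""
    skip = {str(s) for s in skip_statuses}
    result = []
    for rec in records:
        if rec["status"] in skip:
            continue
        rec = dict(rec)  # copy so the caller's record is left untouched
        if convert_status:
            try:
                rec["status_code_name"] = HTTPStatus(int(rec["status"])).phrase
            except ValueError:
                # Unknown or non-numeric status: leave the name empty.
                rec["status_code_name"] = ""
        result.append(rec)
    return result


records = [{"status": "200"}, {"status": "404"}, {"status": "304"}]
for rec in filter_and_label(records, skip_statuses=[304]):
    print(rec["status"], rec["status_code_name"])
```

With `convert_status=False` the records pass through unchanged apart from the skip filter, which is useful when only the raw fields are needed.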

ErrorLog Class

No.  Test case              Result
01   Analyze error log.     OK

Test Code

*) Modify “path” to match your PC environment.

AccessLog Class

ErrorLog Class

History

16/1/2023 created

Provider Profile

My nickname is “Dead Fish”. I work as an engineer in Japan.
I would be glad if you find my code useful.
Thanks!

Download

The zip file contains the following files and data.

├── AccessLogClass.py
├── ErrorLogClass.py
└── test_log
    ├── combined
    │   ├── access_combine.log
    │   ├── dxintel.net.access_log_20230106
    │   ├── dxintel.net.access_log_20230112
    │   ├── stcode.net.access_log
    │   └── test_combined.log
    ├── common
    │   └── test_common.log
    ├── error
    │   ├── error_1.log
    │   ├── error_2.log
    │   └── test_error.log
    └── join
        ├── access_combine.log
        ├── aceesslog_errorlog.csv
        └── error.log

Remarks

None

