This tested source code is for programmers who need to analyze Apache access logs and error logs with Python.
The output is a list of Python dictionaries, with a status-code explanation added to each access-log record and a log-level explanation added to each error-log record.
Access-log and error-log records can also be joined on their timestamps, truncated to the second.
Sample output follows.
– access log (common format)
[{'remote_host': '127.0.0.1', 'remote_logname': '-', 'remote_user': 'frank', 'time_received': '[10/Oct/2000:13:55:36 -0700]', 'time_received_datetimeobj': datetime.datetime(2000, 10, 10, 13, 55, 36), 'time_received_isoformat': '2000-10-10T13:55:36', 'time_received_tz_datetimeobj': datetime.datetime(2000, 10, 10, 13, 55, 36, tzinfo='0700'), 'time_received_tz_isoformat': '2000-10-10T13:55:36-07:00', 'time_received_utc_datetimeobj': datetime.datetime(2000, 10, 10, 20, 55, 36, tzinfo='0000'), 'time_received_utc_isoformat': '2000-10-10T20:55:36+00:00', 'request_first_line': 'GET /apache_pb.gif HTTP/1.0', 'request_method': 'GET', 'request_url': '/apache_pb.gif', 'request_http_ver': '1.0', 'request_url_scheme': '', 'request_url_netloc': '', 'request_url_path': '/apache_pb.gif', 'request_url_query': '', 'request_url_fragment': '', 'request_url_username': None, 'request_url_password': None, 'request_url_hostname': None, 'request_url_port': None, 'request_url_query_dict': {}, 'request_url_query_list': [], 'request_url_query_simple_dict': {}, 'status': '200', 'response_bytes_clf': '2326', 'time_received_datetimeobj_str': '2000-10-10 13:55:36.000000', 'status_code_name': 'OK', 'status_code_explanation': 'The 200 (OK) status code indicates that the request has succeeded. The payload sent in a 200 response depends on the request method. For the methods defined by this specification, the intended meaning of the payload can be summarized as:GET a representation of the target resource;HEAD the same representation as GET, but without the representation data; POST a representation of the status of, or results obtained from, the action;PUT, DELETE a representation of the status of the action;PUT、DELETE。OPTIONS a representation of the communications options;TRACE a representation of the request message as received by the end server.Aside from responses to CONNECT, a 200 response always has a payload, though an origin server MAY generate a payload body of zero length. If no payload is desired, an origin server ought to send 204 (No Content) instead. For CONNECT, no payload is allowed because the successful result is a tunnel, which begins immediately after the 200 response header section.', 'remote_host_ip': '127.0.0.1', 'remote_host_domainname': 'localhost'}, ...
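The timestamp fields in the sample above can be reproduced with the standard library alone. The following is a minimal sketch, not the packaged AccessLogClass code: the function name `parse_clf_time` is hypothetical, and only the dictionary keys are taken from the sample output.

```python
from datetime import datetime, timezone


def parse_clf_time(raw: str) -> dict:
    """Convert an Apache %t field such as '[10/Oct/2000:13:55:36 -0700]'.

    Hypothetical helper; key names follow the sample output above.
    """
    # %z accepts offsets like -0700 and yields a timezone-aware datetime.
    aware = datetime.strptime(raw.strip("[]"), "%d/%b/%Y:%H:%M:%S %z")
    return {
        "time_received": raw,
        # Local wall-clock time with the offset dropped.
        "time_received_isoformat": aware.replace(tzinfo=None).isoformat(),
        # Same instant, offset kept.
        "time_received_tz_isoformat": aware.isoformat(),
        # Same instant, converted to UTC.
        "time_received_utc_isoformat": aware.astimezone(timezone.utc).isoformat(),
    }


fields = parse_clf_time("[10/Oct/2000:13:55:36 -0700]")
print(fields["time_received_utc_isoformat"])
```

Note that `%b` matches English month abbreviations only under the default (C/English) locale.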
– access log (combined format)
[{'remote_host': '127.0.0.1', 'remote_logname': '-', 'remote_user': '-', 'time_received': '[20/Oct/2018:19:51:19 +0900]', 'time_received_datetimeobj': datetime.datetime(2018, 10, 20, 19, 51, 19), 'time_received_isoformat': '2018-10-20T19:51:19', 'time_received_tz_datetimeobj': datetime.datetime(2018, 10, 20, 19, 51, 19, tzinfo='0900'), 'time_received_tz_isoformat': '2018-10-20T19:51:19+09:00', 'time_received_utc_datetimeobj': datetime.datetime(2018, 10, 20, 10, 51, 19, tzinfo='0000'), 'time_received_utc_isoformat': '2018-10-20T10:51:19+00:00', 'request_first_line': 'GET /xampp/itsfa/ HTTP/1.1', 'request_method': 'GET', 'request_url': '/xampp/itsfa/', 'request_http_ver': '1.1', 'request_url_scheme': '', 'request_url_netloc': '', 'request_url_path': '/xampp/itsfa/', 'request_url_query': '', 'request_url_fragment': '', 'request_url_username': None, 'request_url_password': None, 'request_url_hostname': None, 'request_url_port': None, 'request_url_query_dict': {}, 'request_url_query_list': [], 'request_url_query_simple_dict': {}, 'status': '404', 'response_bytes_clf': '1134', 'request_header_referer': '-', 'request_header_user_agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:62.0) Gecko/20100101 Firefox/62.0', 'request_header_user_agent__browser__family': 'Firefox', 'request_header_user_agent__browser__version_string': '62.0', 'request_header_user_agent__os__family': 'Windows', 'request_header_user_agent__os__version_string': '10', 'request_header_user_agent__is_mobile': False, 'time_received_datetimeobj_str': '2018-10-20 19:51:19.000000', 'status_code_name': 'Not Found', 'status_code_explanation': 'The 404 (Not Found) status code indicates that the origin server did not find a current representation for the target resource or is not willing to disclose that one exists. A 404 status code does not indicate whether this lack of representation is temporary or permanent; the 410 (Gone) status code is preferred over 404 if the origin server knows, presumably through some configurable means, that the condition is likely to be permanent.', 'remote_host_ip': '127.0.0.1', 'remote_host_domainname': 'localhost'},
 {'remote_host': '183.79.135.206', 'remote_logname': '-', 'remote_user': '-', 'time_received': '[20/Oct/2018:19:51:24 +0900]', 'time_received_datetimeobj': datetime.datetime(2018, 10, 20, 19, 51, 24), 'time_received_isoformat': '2018-10-20T19:51:24', 'time_received_tz_datetimeobj': datetime.datetime(2018, 10, 20, 19, 51, 24, tzinfo='0900'), 'time_received_tz_isoformat': '2018-10-20T19:51:24+09:00', 'time_received_utc_datetimeobj': datetime.datetime(2018, 10, 20, 10, 51, 24, tzinfo='0000'), 'time_received_utc_isoformat': '2018-10-20T10:51:24+00:00', 'request_first_line': 'GET /xampp/ HTTP/1.1', 'request_method': 'GET', 'request_url': '/xampp/', 'request_http_ver': '1.1', 'request_url_scheme': '', 'request_url_netloc': '', 'request_url_path': '/xampp/', 'request_url_query': '', 'request_url_fragment': '', 'request_url_username': None, 'request_url_password': None, 'request_url_hostname': None, 'request_url_port': None, 'request_url_query_dict': {}, 'request_url_query_list': [], 'request_url_query_simple_dict': {}, 'status': '200', 'response_bytes_clf': '768', 'request_header_referer': '-', 'request_header_user_agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:62.0) Gecko/20100101 Firefox/62.0', 'request_header_user_agent__browser__family': 'Firefox', 'request_header_user_agent__browser__version_string': '62.0', 'request_header_user_agent__os__family': 'Windows', 'request_header_user_agent__os__version_string': '10', 'request_header_user_agent__is_mobile': False, 'time_received_datetimeobj_str': '2018-10-20 19:51:24.000000', 'status_code_name': 'OK', 'status_code_explanation': 'The 200 (OK) status code indicates that the request has succeeded. The payload sent in a 200 response depends on the request method. For the methods defined by this specification, the intended meaning of the payload can be summarized as:GET a representation of the target resource;HEAD the same representation as GET, but without the representation data; POST a representation of the status of, or results obtained from, the action;PUT, DELETE a representation of the status of the action;PUT、DELETE。OPTIONS a representation of the communications options;TRACE a representation of the request message as received by the end server.Aside from responses to CONNECT, a 200 response always has a payload, though an origin server MAY generate a payload body of zero length. If no payload is desired, an origin server ought to send 204 (No Content) instead. For CONNECT, no payload is allowed because the successful result is a tunnel, which begins immediately after the 200 response header section.', 'remote_host_ip': '183.79.135.206', 'remote_host_domainname': 'f1.top.vip.kks.yahoo.co.jp'}, ...
– error log
[{'dt': 'Wed Oct 11 14:32:52 2000', 'dtobj': datetime.datetime(2000, 10, 11, 14, 32, 52), 'dtobj_str': '2000-10-11 14:32:52', 'level': 'error', 'module': '', 'message': '[client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test', 'dtobj_sec': datetime.datetime(2000, 10, 11, 14, 32, 52), 'level_explanation': 'Error conditions'}, ...
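To illustrate how an error-log line can map onto a record like the one above, here is a rough sketch using only `re` and `datetime`. It is not the packaged ErrorLogClass code: the function name, the regular expression, and the partial `LEVELS` table are assumptions made for illustration, and only the dictionary keys come from the sample output.

```python
import re
from datetime import datetime

# Assumed line shape: "[timestamp] [level] message" or "[timestamp] [module:level] message".
LINE_RE = re.compile(
    r"^\[(?P<dt>[^\]]+)\] "
    r"\[(?:(?P<module>[^:\]]*):)?(?P<level>\w+)\] "
    r"(?P<message>.*)$"
)

# Partial, illustrative mapping of log levels to explanations.
LEVELS = {
    "crit": "Critical Conditions",
    "error": "Error conditions",
    "warn": "Warning conditions",
}


def parse_error_line(line):
    """Return a dict for one error-log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["module"] = rec["module"] or ""
    # Try the timestamp without, then with, fractional seconds.
    for fmt in ("%a %b %d %H:%M:%S %Y", "%a %b %d %H:%M:%S.%f %Y"):
        try:
            rec["dtobj"] = datetime.strptime(rec["dt"], fmt)
            break
        except ValueError:
            continue
    else:
        return None
    # Truncate to whole seconds so the value can serve as a join key.
    rec["dtobj_sec"] = rec["dtobj"].replace(microsecond=0)
    rec["level_explanation"] = LEVELS.get(rec["level"], "")
    return rec


sample = ("[Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] "
          "client denied by server configuration: /export/home/live/ap/htdocs/test")
rec = parse_error_line(sample)
print(rec["level"], rec["dtobj_sec"])
```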
Specification
ID | STCD_0000000008 |
Language | Python |
Steps | 120 (AccessLogClass), 67 (ErrorLogClass) |
Purpose | Analyze Apache access logs and error logs. |
Function | Analyze access logs. Analyze error logs. Join access-log and error-log records by datetime (to the second). |
Environment | Anaconda3 (Python 3.9.7); IDE: Visual Studio Code; library: apache_log_parser |
Restriction | Free license. You may use your copy of the source code as its owner, and you may customize and distribute it freely. |
Price | 7 dollars or 700 yen (pay with PayPal) |
References | https://github.com/amandasaurus/apache-log-parser |
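The join function listed in the table can be pictured as follows. This is only a sketch under stated assumptions: it performs an inner join (access records without a matching error are dropped), the function name `join_by_second` is hypothetical, and the key names `time_received_datetimeobj` and `dtobj_sec` are taken from the sample output. The packaged code may behave differently.

```python
from collections import defaultdict
from datetime import datetime


def join_by_second(access_records, error_records):
    """Pair each access record with every error record logged in the same second."""
    # Index error records by their timestamp truncated to the second.
    errors_by_sec = defaultdict(list)
    for err in error_records:
        errors_by_sec[err["dtobj_sec"]].append(err)

    joined = []
    for acc in access_records:
        key = acc["time_received_datetimeobj"].replace(microsecond=0)
        for err in errors_by_sec.get(key, []):
            # Merge both dicts; the key sets in the samples do not overlap.
            joined.append({**acc, **err})
    return joined


access = [
    {"time_received_datetimeobj": datetime(2018, 10, 20, 19, 51, 19), "status": "404"},
    {"time_received_datetimeobj": datetime(2018, 10, 20, 19, 51, 24), "status": "200"},
]
errors = [
    {"dtobj_sec": datetime(2018, 10, 20, 19, 51, 19), "level": "error"},
]
rows = join_by_second(access, errors)
print(len(rows), rows[0]["status"], rows[0]["level"])
```

Indexing the error records first keeps the join linear in the number of records instead of comparing every pair.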
Source Code
AccessLog Class
ErrorLog Class
Test Result
AccessLog Class
No. | Test case | Result |
01 | Analyze a common-format log. | OK |
02 | Analyze a combined-format log. | OK |
03 | Turn status-code conversion on or off. | OK |
04 | Turn remote-host conversion on or off. | OK |
05 | Skip records with specified status codes. | OK |
06 | Join access log and error log. | OK |
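Test cases 03 and 05 concern options for status-code handling. The sketch below shows one way such options could work, using the standard library's `http.HTTPStatus` for the status names. The function name and the parameters `convert` and `skip_codes` are hypothetical and are not the actual AccessLogClass interface.

```python
from http import HTTPStatus


def annotate_status(records, convert=True, skip_codes=()):
    """Drop records whose status is in skip_codes; optionally add the status name."""
    skip = {str(code) for code in skip_codes}
    result = []
    for rec in records:
        if rec["status"] in skip:
            continue
        rec = dict(rec)  # copy so the caller's record is not modified
        if convert:
            try:
                rec["status_code_name"] = HTTPStatus(int(rec["status"])).phrase
            except ValueError:
                # Non-numeric or unregistered status code.
                rec["status_code_name"] = ""
        result.append(rec)
    return result


records = [{"status": "200"}, {"status": "404"}, {"status": "304"}]
annotated = annotate_status(records, convert=True, skip_codes=[304])
print([(r["status"], r["status_code_name"]) for r in annotated])
```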
ErrorLog Class
No. | Test case | Result |
01 | Analyze an error log. | OK |
Test Code
*) Modify "path" to match your PC environment.
AccessLog Class
ErrorLog Class
History
16/1/2023 created
Provider Profile
My nickname is "Dead Fish". I work as an engineer in Japan.
I would be glad if you find my code useful.
Thanks!
Download
Get download password STCD_0000000008
The zip archive contains the following files and data.
├── AccessLogClass.py
├── ErrorLogClass.py
└── test_log
    ├── combined
    │   ├── access_combine.log
    │   ├── dxintel.net.access_log_20230106
    │   ├── dxintel.net.access_log_20230112
    │   ├── stcode.net.access_log
    │   └── test_combined.log
    ├── common
    │   └── test_common.log
    ├── error
    │   ├── error_1.log
    │   ├── error_2.log
    │   └── test_error.log
    └── join
        ├── access_combine.log
        ├── aceesslog_errorlog.csv
        └── error.log
Remarks
None