/*
This program produces web site statistics from log file records.
Various reports are produced as a function of criteria supplied by the user.
Depending on the time period specified by the user, one or more log files
must be accessed. There is always a current "access" log file and possibly
one or more archive log files. Archive files are gzipped and named to
reflect the date.
To speed processing, as little time as possible is spent reading log records.
Fields are read into work areas and later distributed and processed as needed.
Arrays are built as the various reports require. If an array fills
up, it is enlarged using "realloc." Arrays are sorted using quicksort once
they are fully populated.
Report 1: Top n clients accessing the site. Shows a count and the domain name.
Sorted by count, highest first.
Report 2: Top n files accessed. Shows count and file (e.g. a gif or an HTML page).
Sorted by count, highest first.
Report 3: Files containing the string "whatever." Shows count and client (IP).
Sorted by count, highest first.
Report 4: Period totals for site. Shows individual clients, all clients, hits,
page hits, KB transmitted. Individual clients = unique IP's; all clients
counts an IP as many times as days it occurs, and = the sum of the "clients"
column in report 5; hits is sum of "hits" column in report 5; page hits =
sum of "page hits" column in report 5; KB transmitted = sum of "KB" column
in report 5.
Report 5: Daily Totals for site. Shows date, clients, hits, page hits, KB transmitted.
Date is in form: weekday (3 char), e.g. Mon; month (3 char), day (no leading zero),
year (4 digits). KB is to two decimal places.
Report 6: Daily Averages. Shows same columns as report 5. Values = report 4 values
divided by number of days in period. Clients here is based on "all clients"
from report 4, rather than "individual clients."
Report 7: Hourly averages. Shows hour, hits, percentage, and KB transmitted.
Hour appears as e.g. "Midnight to 1 am" or "1 am to 2 am." Percentage is
nn.n %.
Report 8: Summary of HTTP errors. Shows count, error code (e.g. 404), text (e.g. Not Found).
Sorted by count, highest first.
Report 9: (Top 20) requests causing errors. Shows count and request (file).
Sorted by count, highest first.
Design notes:
Since this isn't Perl, and none of us can figure out how to redirect standard output
from a shell command back to this C program's standard in, the output of the cat or
gzcat is written to a work file.
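
One possible way around the work file, sketched under the assumption that the
POSIX popen() call is available: popen() runs a shell command and hands back a
stream from which the command's standard output can be read directly. The helper
name count_newlines_from_command is made up for this illustration and is not
part of this program.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

// Hypothetical helper: run cmd through the shell and count the newline
// characters it writes to standard output.
// Returns -1 if the command could not be started.
long count_newlines_from_command(const char *cmd)
{
    FILE *pipe_in = popen(cmd, "r");   // "r": we read what the command writes
    long count = 0;
    int c;

    if (pipe_in == NULL) {
        return -1;
    }
    while ((c = fgetc(pipe_in)) != EOF) {
        if (c == '\n') {
            count++;
        }
    }
    pclose(pipe_in);                   // also waits for the command to finish
    return count;
}
```

In principle the stream returned by popen() could be read with the same fscanf
calls that are applied to the work file today; that substitution is untested here.
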
For reports 4-9 we read in the log records one-by-one. But we only read each
work file once, extracting all of the data for all of those reports in one pass.
The processing for reports 1-3 is affected by the fact that a given user submission
of the stats engine can involve anywhere from a handful to millions of log records.
This precludes building an in-core array to hold information for, e.g., each ip
encountered without first having determined how many occur. And even if we built an
array sized from that count of unique ip's, we would have to search it for every log
record read to find the entry whose count to increment.
So, a hash table is a better data structure to use.
For these reports, we extract selected log fields from each relevant log file
(based on the user-supplied date range) using shell commands (via system calls)
and concatenate this output to a work file. This allows us to determine how many
hash table entries to allocate for the primary array. We also can perform a number
of useful operations on the combined log data.
For the purposes of report 1 we "cut" just the first field, which is the ip,
from each log record.* After cutting from all relevant files, this gives us a file of
all occurrences of ip's. We sort this and pipe it to "uniq", which gives us just the unique
ip's that occurred. A "wc" of that tells us how many ip's to take into account when
allocating the primary hash table structure.
We can now populate the hash table. Reading in a record from the file with all ip
occurrences, we directly hash to the right entry and increment its count. In case of
the entry already being populated by a different ip (a synonym situation), we chain a
linked list off the main entry so that each different ip mapping to that same primary
entry gets its own entry.
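
The counting scheme just described can be sketched as follows. This is a minimal
illustration, not the program's actual code: the names ip_entry, toy_hash and
count_ip, the 64-byte key and the simple multiplicative hash are all assumptions
made for the example. An empty primary slot is marked by an empty key, so the
table must start out zeroed, and keys are assumed to be shorter than 64 bytes.

```c
#include <stdlib.h>
#include <string.h>

// One slot of the table; synonyms are chained off next.
struct ip_entry {
    char key[64];
    long count;
    struct ip_entry *next;
};

// Illustrative string hash (not the program's hash_ip).
unsigned long toy_hash(const char *s, unsigned long table_size)
{
    unsigned long h = 5381;

    while (*s != '\0') {
        h = h * 33 + (unsigned char) *s++;
    }
    return h % table_size;
}

// Record one occurrence of ip.
// Returns the updated count for that ip, or -1 if memory runs out.
long count_ip(struct ip_entry *table, unsigned long table_size, const char *ip)
{
    struct ip_entry *e = &table[toy_hash(ip, table_size)];

    if (e->key[0] == '\0') {              // primary slot still unused
        strncpy(e->key, ip, sizeof e->key - 1);
        e->count = 1;
        return 1;
    }
    for (;;) {                            // walk the synonym chain
        if (strcmp(e->key, ip) == 0) {
            return ++e->count;
        }
        if (e->next == NULL) {
            break;
        }
        e = e->next;
    }
    e->next = calloc(1, sizeof *e->next); // new synonym at the end of the chain
    if (e->next == NULL) {
        return -1;
    }
    strncpy(e->next->key, ip, sizeof e->next->key - 1);
    e->next->count = 1;
    return 1;
}
```

With a table of one slot every ip is a synonym of every other, which makes the
chaining easy to watch in a debugger.
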
When we're done accumulating occurrences, we can produce report 1.
For report 2 we employ a strategy similar to that used for report 1. We extract the
"file" field instead of the ip field. A hash table is then built from the work
file that combines the "file" fields from all relevant log files.
For report 3 we read the log records as in reports 4-9, but proceed more like we do for
reports 1 and 2. We look for records whose "file" field contains the user-supplied string.
For those that do, we write a record to a file with just the ip. After all log files
have been read, we process the resulting ip file. We sort it and uniq it to get the
number of unique ip's. We read this in and allocate an array with that many entries.
We read the sorted ip file and accumulate the count of occurrences for each ip there.
We then write the ip's and counts to a file and sort it by count, descending. We read
that in and produce the report.
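
Because the ip file is sorted, equal addresses sit next to each other, so the
counts can be accumulated in one sequential pass. A small sketch of that idea,
working on an in-memory array of strings rather than on a file; the names
ip_count and count_sorted_runs are invented for the illustration.

```c
#include <string.h>

struct ip_count {
    const char *ip;
    long count;
};

// sorted: n strings already in sorted order.
// out:    room for at least n entries.
// Returns the number of distinct strings written to out.
size_t count_sorted_runs(const char *const *sorted, size_t n, struct ip_count *out)
{
    size_t used = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (used > 0 && strcmp(out[used - 1].ip, sorted[i]) == 0) {
            out[used - 1].count++;       // same ip as the previous record
        } else {
            out[used].ip = sorted[i];    // first record of a new ip
            out[used].count = 1;
            used++;
        }
    }
    return used;
}
```
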
*Since for report 5 we need ip information within date, we build a work file with both
ip and date. We use this later for report 1, cutting the ip column to create the ip file.
*/
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <math.h>
#include <stdlib.h>
#include <unistd.h> /* For getpid, execle commands */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <ctype.h>
/* Multiplied times number of unique ip's when malloc'ing hash table */
#define scaling_factor 3
char ip[8192]; /* Host (IP address) */
char ident[256]; /* Ident field */
char authuser[256]; /* Authuser */
char timestamp1[256]; /* Time Stamp part 1 */
char timestamp2[256]; /* Time Stamp part 2 */
char file1[256]; /* HTTP Request part 1 */
char file2[256]; /* HTTP Request part 2 */
char file3[256]; /* HTTP Request part 3 */
char status[256]; /* Status Code */
long bytes; /* Transfer Volume */
char *blank_date = "           "; /* 11 blanks */
char blank_ip[16];
char blank_string[256];
char *buff;
char buffer[1024];
char *buff2;
long bump1 = 1000;
long bump2 = 100;
long bump3 = 10;
char cat_string[256];
char cgipath[10] = "/tmp/";
/*char cgipath[44] = "/Space/Domains/stats.simplenet.net/cgi-bin/";*/
char char_oldest_day[4];
char char_oldest_year[5];
char char_youngest_day[4];
char char_youngest_year[5];
int code_red_found = 0;
char curr_file[100];
long curr_s1 = 0;
long curr_s2 = 0;
long curr_s3 = 0;
char *dashed_line = "\n\t\t\t---------------------------\n\n";
char date[12];
long date_bias;
char date_save[12];
char domain[256];
char end_date[12];
char end_day[3];
char end_month[3];
char end_year[5];
long endloop;
char enterprise_string[64];
char *enterprise_string1 = "format=%Ses->client.ip%";
char *enterprise_string2 = "%Req->srvhdrs.content-length%";
long entire_period = 0;
long e400_count = 0;
char *e400_lit = "Syntax error ";
long e401_count = 0;
char *e401_lit = "Unauthorized ";
long e402_count = 0;
char *e402_lit = "Payment required ";
long e403_count = 0;
char *e403_lit = "Forbidden ";
long e404_count = 0;
char *e404_lit = "Not found ";
long e405_count = 0;
char *e405_lit = "Method not allowed ";
long e406_count = 0;
char *e406_lit = "Not acceptable ";
long e410_count = 0;
char *e410_lit = "No longer available ";
long e500_count = 0;
char *e500_lit = "Internal server error";
long e501_count = 0;
char *e501_lit = "Not implemented ";
long e502_count = 0;
char *e502_lit = "Bad gateway ";
long e503_count = 0;
char *e503_lit = "Service unavailable ";
long e504_count = 0;
char *e504_lit = "Gateway timeout ";
char error_code_file_string[256] = "error_code_file_";
char file2_buffer[8192];
int file10_open = 0;
int file11_open = 0;
int file12_open = 0;
int file15_open = 0;
int file16_open = 0;
long file_date_status = 0;
char file_list_string[256] = "file_list_";
long fqdn_limit;
char header_work[256];
char header_work2[256];
long indexx;
char *input_parm_file;
long int_start_year;
long int_start_month;
long int_start_day;
long int_end_year;
long int_end_month;
long int_end_day;
long ip_date_file_removed = 0;
char ip_date_file_string[256] = "ip_date_file_";
char ip_save[256];
int null_found = 0;
long num_clients;
long num_dates;
long num_dates_used = 0;
long num_uniq_items;
long oldest_year;
long oldest_month;
long oldest_day;
char outfile_string[256] = "outfile_";
char parm_line[256];
char pid[256] = "\0";
char plural[2];
char random_suffix_char[2] = " ";
int rc = 0;
char rec_date[12];
char rec_day[3];
int rec_day_int;
long rec_hour;
char rec_month[4];
int rec_month_int;
long rec_too_new;
char rec_year[3];
int rec_year_int;
char rec_year_long[5];
long record_count = 0;
char report1_ip_file_string[256] = "report1_ip_file_";
char report2_file_file_string[256] = "report2_file_file_";
char report3_ip_file_string[256] = "report3_ip_file_";
char report9_file_file_string[256] = "report9_file_file_";
char report_file_string[256] = "report_file_";
long report_index;
char report_input_file[256];
long report_limit;
char report_lit[256];
long report_num;
char report_work[256];
char r1_exclude_string[16];
long r1_limit = 30;
char *r1_lit = "Top %ld Clients Accessing %s:\n\n";
char r2_exclude_string[256];
long r2_limit = 40; /* User supplied */
char *r2_lit = "Top %ld Files Accessed on %s:\n\n";
long r3_limit = 40;
char *r3_lit = "Clients accessing files containing the string \"%s\" in this period:\n\n";
char r3_string[256];
int r5sorted_ip_date_file_created = 0;
int r5sorted_ip_date_file_removed = 0;
char r5sorted_ip_date_file_string[256] = "r5sorted_ip_date_file_";
long r5_data_needed = 0;
long r9_limit = 20;
char *r9_lit = "Top %ld Requests Causing Errors:\n\n";
char sorted_error_code_file_string[256] = "sorted_error_code_file_";
char sorted_report_file_string[256] = "sorted_report_file_";
char sort_string[256];
char start_date[12];
char start_day[3];
char start_month[3];
char start_year[5];
char std_error_file_string[256] = "std_error_file_";
char std_error_file_string2[256] = "std_error_file2_";
long tot_clients;
long tot_hits;
float tot_kb;
long tot_page_hits;
char uniq_ip_count_string[256] = "uniq_ip_count_";
char uniq_ip_list_string[256] = "uniq_ip_list_";
char uniq_item_count_string[256] = "uniq_item_count_";
char uniq_item_list_string[256] = "uniq_item_list_";
char *val;
char work_bytes[256];
long work_count;
char work_date[12];
char work_day[4];
char work_daynum[3];
char work_file_string[256] = "work_file_";
char work_mon[4];
char work_string1[256];
char work_string2[256];
char work_string3[256];
char work_string4[256];
char work_string5[256];
char work_year[5];
long y;
long youngest_year;
long youngest_month;
long youngest_day;
typedef struct struc_1
{
char s1_string[256];
long s1_count;
struct struc_1 *s1_next_hash_entry;
}d1;
struct struc_1 *report_hash;
struct struc_1 *report_array; /* Really an array, not used as a hash table. See create_report1_file(). */
typedef struct struc_3
{
char s3_date[12];
long s3_clients;
long s3_hits;
long s3_page_hits;
float s3_kbytes;
struct struc_3 *s3_next_hash_entry;
}d3;
struct struc_3 *report5_hash;
typedef struct struc_4
{
char s4_time[21];
long s4_hits;
float s4_kbytes;
}d4;
struct struc_4 *report7_array;
struct tm *time_ptr;
time_t lt;
FILE *fptr1; /* log file list */
FILE *fptr2; /* cat'd log file */
FILE *fptr3; /* stats report file */
FILE *fptr4; /* count of unique ip's */
FILE *fptr5; /* ip from each log rec */
FILE *fptr6; /* report initial data as i/p */
FILE *fptr7; /* count of report ip's or files*/
FILE *fptr8; /* report data to sort */
FILE *fptr9; /* sorted report data */
FILE *fptr10; /* ip_date_file */
FILE *fptr11; /* report1_ip_file */
FILE *fptr12; /* report2_file_file */
FILE *fptr13; /* error code file */
FILE *fptr14; /* sorted error code file */
FILE *fptr15; /* report3 initial data as o/p */
FILE *fptr16; /* report9 initial data as o/p */
FILE *fptr17; /* input parameter file */
long check_date_range();
void check_file_date();
long check_for_match();
long check_for_wildcards();
void close_some_files();
int create_record_stream();
void create_report();
void create_report_file();
char *dayname();
void do_reports();
char *error_lit();
void estimate_num_dates_entire();
void estimate_num_dates_specific();
void get_count();
void get_dates();
void get_domain_name();
void get_form_input();
void get_ip_count();
void get_random_suffix();
void get_time();
char *get_value();
long hash_date();
long hash_file();
long hash_ip();
void init();
void init_report5_hash();
void init_report7_array();
void init_report_array();
void init_report_hash();
char *month_name();
long month_num();
void omit_header();
void open_report_input_file();
void open_some_files();
void output_report5();
long page_hit();
void parse_input();
void populate_report_hash();
char *prepare_filename_string(char *);
void prepare_filename_strings();
void prepare_filename_suffix();
void prepare_file_list();
void prepare_parm_input();
void print_glossary();
void print_header();
void print_report_hash();
void print_sorry();
void print_time();
void report();
void report_setup();
void reports1and4_setup();
void report1_setup();
void report2_setup();
void report3_ongoing();
void report3_setup();
void report4();
void report5();
void report5_init();
void report5_ongoing();
void report6();
void report7();
void report7_ongoing();
void report8();
void report9_setup();
void reports8and9_ongoing();
void set_num_dates();
void update_report5_hash();
void update_report5_totals();
void update_report_hash();
void update_strucs();
void wrapup();
void wrapup_hogs();
int main(int argc, char *argv[])
{
input_parm_file = argv[1];
init();
/* get name of next log file */
while (fscanf(fptr1,"%99s",curr_file) != EOF) /* curr_file holds 100 bytes */
{
if (strstr(curr_file,"%") != NULL) {continue;}
if (entire_period == 0) {check_file_date();}
if (file_date_status != 0) {continue;} /* too old */
rc = create_record_stream();
if (rc > 0) {continue;}
omit_header();
/* to find file clobbering buffer (when "not found" becomes email address) */
printf("curr_file = %s\n",curr_file);
printf("buffer = %s\n\n",buffer);
rec_too_new = 0;
while (fscanf(fptr2,"%s%s%s",ip,ident,authuser)
!= EOF && rec_too_new == 0)
{
/* IP field can contain an unlimited number of leading nulls.
Corruption or even coring can occur. Skip the record.
*/
if (ip[0] < '!') {null_found = 1;}
/* authuser can sometimes consist of multiple fields,
so read fields til definitely at timestamp1
*/
if (fscanf(fptr2,"%s",timestamp1) == EOF) {break;}
while (timestamp1[0] != '[') {if (fscanf(fptr2,"%s",timestamp1) == EOF) break;}
/* Ignore timestamp1 if it has been partially clobbered by new rec.
Use current ip rather than ip of new rec (hard to extract from
clobbered timestamp1 field).
*/
if (strlen(timestamp1) != 21 || strstr(timestamp1,".") != NULL)
{
if (fscanf(fptr2,"%s%s",ident,authuser) == EOF) {break;}
if (fscanf(fptr2,"%s",timestamp1) == EOF) {break;}
while (timestamp1[0] != '[') {if (fscanf(fptr2,"%s",timestamp1) == EOF) break;}
}
if (fscanf(fptr2,"%s",timestamp2) == EOF) {break;}
/* Read http fields til last char of field read is ".
If more than one, second is request. There can be as
few as one or two fields, as many as 3 or more.
URL *can* contain blanks + stuff, throwing off expected field sequence.
For example, "GET /cgi-bin/Stats/stats.cgi [ <a href= HTTP/1.0" shows up
in the log record, and has 3 extraneous "fields" before "HTTP".
*/
work_count = 0;
if (fscanf(fptr2,"%s",file2_buffer) == EOF) {break;}
while (file2_buffer[strlen(file2_buffer)-1] != '"')
{
work_count++;
if (work_count == 2)
{
if (strlen(file2_buffer) > 255)
{
memcpy(file2,file2_buffer,255);
memset(file2+255,'\0',1);
}
else {strcpy(file2,file2_buffer);}
}
if (fscanf(fptr2,"%s",file2_buffer) == EOF) {break;}
}
if (fscanf(fptr2,"%s%s",status,work_bytes) == EOF) {break;}
if (strcmp(work_bytes,"-") == 0)
{bytes = 0;}
else {bytes = atol(work_bytes);}
/*printf("%s\n",timestamp1);
*/
if (strstr(file2,"default.ida") != NULL) {code_red_found = 1;}
if (null_found == 0 && code_red_found == 0)
{
update_strucs();
}
else
{
null_found = 0;
code_red_found = 0;
}
}
strcpy(work_string1,"rm ");
strcat(work_string1,work_file_string);
system(work_string1);
}
/* strcpy(work_string1,"rm ");
strcat(work_string1,file_list_string);
system(work_string1);
*/
close_some_files();
do_reports();
wrapup();
/* print_time("ending");
*/
return (0);
}
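/*
The loop in main() reads a record one field at a time so that it can recover
from malformed input. For a well-formed record, the same fields could in
principle be pulled out with a single sscanf call. A minimal sketch, assuming
the usual layout
    host ident authuser [timestamp zone] "METHOD request PROTOCOL" status bytes
The struct log_fields and the function parse_log_line are made up for this
example, and the sample record in the usage note is invented.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct log_fields {
    char host[256];
    char ident[256];
    char authuser[256];
    char timestamp[64];    // text between [ and ]
    char method[16];
    char request[256];
    char protocol[16];
    int status;
    long bytes;            // a "-" in the log is stored as 0
};

// Returns 1 if line matched the expected layout, 0 otherwise.
int parse_log_line(const char *line, struct log_fields *f)
{
    char bytes_text[32];
    int matched;

    matched = sscanf(line,
        "%255s %255s %255s [%63[^]]] \"%15s %255s %15[^\"]\" %d %31s",
        f->host, f->ident, f->authuser, f->timestamp,
        f->method, f->request, f->protocol, &f->status, bytes_text);
    if (matched != 9) {
        return 0;          // blanks in the URL, missing fields, and so on
    }
    f->bytes = (strcmp(bytes_text, "-") == 0) ? 0 : atol(bytes_text);
    return 1;
}
```

A record such as
    10.0.0.1 - - [16/Sep/1998:10:11:12 -0700] "GET /index.html HTTP/1.0" 200 1234
parses cleanly; a request containing blanks makes the call return 0, which is
the case the field-by-field loop above is written to survive.
*/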
void init()
{
/* print_time("starting");
*/
prepare_parm_input();
prepare_filename_suffix();
prepare_filename_strings();
prepare_file_list();
open_some_files();
if (entire_period == 0) {get_dates();}
set_num_dates();
init_report5_hash();
init_report7_array();
memset(blank_string,' ',255);
memset(blank_string+255,'\0',1);
strcpy(report_work,blank_string);
memset(blank_ip,' ',15);
memset(blank_ip+15,'\0',1);
strcpy(domain,blank_string);
strcpy(domain,get_value("login"));
strcpy(r3_string,get_value("nameaccessfile"));
}
void get_time()
{
lt = time(NULL);
time_ptr = localtime(&lt);
}
void print_time(char * msg)
{
get_time();
printf("%s at %s<br>",msg,asctime(time_ptr));
}
void prepare_parm_input()
{
/*
Input parameters containing stats form fields and the path to the
stats data are contained in the input parm file provided by the
stats engine wrapper. Open the file, read all the records, and
concatenate the data into one long string. This string will be
equivalent to a query string that has been url-decoded.
The get_value subroutine will take this string and extract values
for specified input variables.
*/
if ((fptr17 = fopen(input_parm_file,"r")) == NULL )
{
printf("can't open %s\n",input_parm_file);
exit(1);
}
strcpy(buffer,"\0");
while (fscanf(fptr17,"%s",parm_line) != EOF)
{
strcat(buffer,parm_line);
strcat(buffer,"&");
}
if (strcmp(get_value("period"),"entire") == 0) {entire_period = 1;}
if (strcmp(get_value("usedailytotal"),"on") == 0 /* if report 5 requested */
|| strcmp(get_value("useperiodtotal"),"on") == 0 /* or report 4 requested */
|| strcmp(get_value("usedailyavg"),"on") == 0) /* or report 6 requested */
{r5_data_needed = 1;}
}
void prepare_filename_suffix()
{
strcat(pid,get_value("login")); /* use for unique filenames */
strcat(pid,"_");
get_random_suffix();
/* Shouldn't have more than one for an account at a time, but just in case */
strcat(pid,random_suffix_char); /* use for unique filenames */
}
void get_random_suffix()
{
int stime, random_suffix;
long ltime;
ltime = time(NULL);
stime = (unsigned) ltime/2;
srand(stime);
random_suffix = rand()%10;
/* random_suffix is in the range 0..9, so map it straight to its digit character */
random_suffix_char[0] = (char) ('0' + random_suffix);
random_suffix_char[1] = '\0';
}
void prepare_filename_strings()
{
strcpy(error_code_file_string,prepare_filename_string(error_code_file_string));
strcpy(file_list_string,prepare_filename_string(file_list_string));
strcpy(ip_date_file_string,prepare_filename_string(ip_date_file_string));
strcpy(outfile_string,prepare_filename_string(outfile_string));
strcpy(r5sorted_ip_date_file_string,prepare_filename_string(r5sorted_ip_date_file_string));
strcpy(report_file_string,prepare_filename_string(report_file_string));
strcpy(report1_ip_file_string,prepare_filename_string(report1_ip_file_string));
strcpy(report2_file_file_string,prepare_filename_string(report2_file_file_string));
strcpy(report3_ip_file_string,prepare_filename_string(report3_ip_file_string));
strcpy(report9_file_file_string,prepare_filename_string(report9_file_file_string));
strcpy(sorted_error_code_file_string,prepare_filename_string(sorted_error_code_file_string));
strcpy(sorted_report_file_string,prepare_filename_string(sorted_report_file_string));
strcpy(uniq_ip_count_string,prepare_filename_string(uniq_ip_count_string));
strcpy(uniq_ip_list_string,prepare_filename_string(uniq_ip_list_string));
strcpy(uniq_item_count_string,prepare_filename_string(uniq_item_count_string));
strcpy(uniq_item_list_string,prepare_filename_string(uniq_item_list_string));
strcpy(work_file_string,prepare_filename_string(work_file_string));
}
char *prepare_filename_string(char *filename_string)
{
strcpy(work_string1,cgipath);
strcat(work_string1,filename_string);
strcat(work_string1,pid);
return(work_string1);
}
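/*
The login name plus a single random digit keeps work file names apart most of
the time. Where a stronger guarantee is wanted, one option is to let the system
pick the name. A minimal sketch, assuming POSIX mkstemp() is available; the
helper make_unique_work_file is hypothetical and is not called anywhere in
this program.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

// Hypothetical helper: create a uniquely named, empty file whose name starts
// with dir followed by stem. On success the chosen name is left in name_out
// and 0 is returned; on failure -1 is returned.
int make_unique_work_file(const char *dir, const char *stem,
                          char *name_out, size_t name_size)
{
    int fd;
    int len = snprintf(name_out, name_size, "%s%sXXXXXX", dir, stem);

    if (len < 0 || (size_t) len >= name_size) {
        return -1;               // name would not fit in name_out
    }
    fd = mkstemp(name_out);      // replaces XXXXXX and creates the file
    if (fd < 0) {
        return -1;
    }
    close(fd);                   // callers reopen the file by name as needed
    return 0;
}
```

Two calls with the same dir and stem yield two different names, so two
concurrent runs for the same account would not share work files.
*/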
void prepare_file_list()
{
strcpy(work_string1,"ls ");
/* strcat(work_string1,get_value("path"));
*/ strcat(work_string1,"*access* > ");
strcat(work_string1,file_list_string);
system(work_string1);
}
void open_some_files()
{
if ((fptr1 = fopen(file_list_string,"r")) == NULL) {printf("can't open file_list\n");exit(1);}
if ((fptr3 = fopen(outfile_string,"w")) == NULL) {printf("can't open outfile\n");exit(1);}
if ((fptr15 = fopen(report3_ip_file_string,"w")) == NULL) {printf("can't open report3_ip_file\n");exit(1);}
if ((fptr16 = fopen(report9_file_file_string,"w")) == NULL) {printf("can't open report9_file_file\n");exit(1);}
if (r5_data_needed == 1)
{
if ((fptr10 = fopen(ip_date_file_string,"w")) == NULL)
{
printf("can't open ip_date_file\n");
exit(1);
}
file10_open = 1;
}
else if (strcmp(get_value("useclients"),"on") == 0) /* if report 1 requested */
{
if ((fptr11 = fopen(report1_ip_file_string,"w")) == NULL)
{
printf("can't open report1_ip_file\n");
exit(1);
}
file11_open = 1;
}
if (strcmp(get_value("usefiles"),"on") == 0) /* if report 2 requested */
{
if ((fptr12 = fopen(report2_file_file_string,"w")) == NULL)
{
printf("can't open report2_file_file\n");
exit(1);
}
file12_open = 1;
}
}
void set_num_dates()
{
if (entire_period == 0) {estimate_num_dates_specific();}
else {estimate_num_dates_entire();}
}
void estimate_num_dates_specific()
{
/*
6/98 - 6/98 -> 1 month
6/97 - 6/98 -> 13 months
7/97 - 6/98 -> 12 months
5/97 - 6/98 -> 14 months
*/
long work;
int year_diff, month_diff, startmonth, endmonth;
/* find number of months in range, multiply by 31 */
year_diff = atoi(end_year) - atoi(start_year);
startmonth = atoi(start_month);
endmonth = atoi(end_month);
month_diff = endmonth - startmonth;
num_dates = 31 * ((12 * year_diff) + month_diff + 1);
/* find date_bias (see comment in hash_date subroutine) */
memcpy(rec_year_long,start_year,4);
memset(rec_year_long + 4,'\0',1);
strcpy(rec_month,start_month);
strcpy(rec_day,"01");
work = (long) (372 * atoi(rec_year_long) + 31 * (atoi(rec_month) - 1) + atoi(rec_day));
date_bias = work % (long) num_dates;
}
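/*
The month arithmetic in estimate_num_dates_specific() can be checked against
the table in its opening comment. A small sketch that isolates the formula;
the function names months_in_range and estimated_dates are made up here, and
years are passed as four-digit integers.

```c
// Number of calendar months touched by the range, counting both ends.
int months_in_range(int start_year, int start_month, int end_year, int end_month)
{
    return 12 * (end_year - start_year) + (end_month - start_month) + 1;
}

// Upper bound on the number of dates: every month is treated as 31 days.
long estimated_dates(int start_year, int start_month, int end_year, int end_month)
{
    return 31L * months_in_range(start_year, start_month, end_year, end_month);
}
```

For 6/97 through 6/98 this gives 13 months and therefore 403 date slots.
*/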
long month_num(char *month_nam)
{
if (strcmp("Jan",month_nam) == 0) {return (1);}
if (strcmp("Feb",month_nam) == 0) {return (2);}
if (strcmp("Mar",month_nam) == 0) {return (3);}
if (strcmp("Apr",month_nam) == 0) {return (4);}
if (strcmp("May",month_nam) == 0) {return (5);}
if (strcmp("Jun",month_nam) == 0) {return (6);}
if (strcmp("Jul",month_nam) == 0) {return (7);}
if (strcmp("Aug",month_nam) == 0) {return (8);}
if (strcmp("Sep",month_nam) == 0) {return (9);}
if (strcmp("Oct",month_nam) == 0) {return (10);}
if (strcmp("Nov",month_nam) == 0) {return (11);}
if (strcmp("Dec",month_nam) == 0) {return (12);}
/* printf("%s %s bad month name = %s<br>",get_value("login"),timestamp1,month_nam);
*/
return (13);
}
char * month_name(long month_number)
{
if (month_number == 1) {return "Jan";}
if (month_number == 2) {return "Feb";}
if (month_number == 3) {return "Mar";}
if (month_number == 4) {return "Apr";}
if (month_number == 5) {return "May";}
if (month_number == 6) {return "Jun";}
if (month_number == 7) {return "Jul";}
if (month_number == 8) {return "Aug";}
if (month_number == 9) {return "Sep";}
if (month_number == 10) {return "Oct";}
if (month_number == 11) {return "Nov";}
if (month_number == 12) {return "Dec";}
/* printf("bad month number = %ld<br>",month_number);
*/
return "Jan";
}
void estimate_num_dates_entire()
{
/*
For "entire" period, allow for two full years of dates. Since the most
recent date is yesterday, the date range begins two years ago today.
To find the date_bias (see hash_date subroutine for comments), just find the
number of days so far this year.
Why is it so simple? Suppose today is 9/16/98. The normal ("specific")
date bias calculation would produce "work" = (365 * 98) + (31 * 9) + 16 = 36065,
which would be reduced to 295 via modulo 730 (730 is num_dates, 2 * 365).
295 just happens to be 31*9 + 16, the number of days so far this year.
So we don't need the year, nor do we need to modulo.
p.s. Don't sweat leap year.
*/
num_dates = 744; /* Since 372 used in hash_date, double it here, else bad things. */
/* num_dates = 730;
*/
get_time();
date_bias = (long) (31 * (time_ptr->tm_mon) + time_ptr->tm_mday);
/* date_bias = (long) (31 * (time_ptr->tm_mon+1) + time_ptr->tm_mday);
*/
}
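/*
The arithmetic in the comment inside estimate_num_dates_entire() can be
reproduced directly. This sketch uses the constants from that comment (365 and
730), not the 372 and 744 the code uses now, and the function names are made up
for the illustration. It also shows what the shortcut rests on: 365 * year is a
multiple of 730 only when the year is even, as 98 is. The same parity argument
applies to 372 and 744.

```c
// Day number as written in the comment: 365 per year, 31 per month.
long day_number_365(long year, long month, long day)
{
    return 365 * year + 31 * month + day;
}

// Bias over a two-year window of 730 slots.
long bias_730(long year, long month, long day)
{
    return day_number_365(year, month, day) % 730;
}
```

For 9/16/98 the day number is 36065 and the bias is 295, which equals
31 * 9 + 16. For an odd year such as 99 the bias comes out 365 higher.
*/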
void get_dates()
{
strcpy(start_day, get_value("startday"));
strcpy(start_month,get_value("startmonth"));
strcpy(start_year, get_value("startyear"));
strcpy(end_day, get_value("endday"));
strcpy(end_month, get_value("endmonth"));
strcpy(end_year, get_value("endyear"));
int_start_year = atoi(start_year);
int_start_month = atoi(start_month);
int_start_day = atoi(start_day);
int_end_year = atoi(end_year);
int_end_month = atoi(end_month);
int_end_day = atoi(end_day);
strcpy(start_date,month_name(int_start_month));
strcat(start_date," ");
strcat(start_date,start_day);
strcat(start_date," ");
strcat(start_date,start_year);
strcpy(end_date,month_name(int_end_month));
strcat(end_date," ");
strcat(end_date,end_day);
strcat(end_date," ");
strcat(end_date,end_year);
}
void init_report5_hash()
{
register long i;
if ((report5_hash = (struct struc_3 *) malloc(num_dates * sizeof(struct struc_3))) == NULL)
{
printf("error allocating report5 hash - aborting");
exit(1);
}
for (i = 0; i < num_dates; i++)
{
strcpy(report5_hash[i].s3_date,blank_date);
report5_hash[i].s3_clients = 0;
report5_hash[i].s3_hits = 0;
report5_hash[i].s3_page_hits = 0;
report5_hash[i].s3_kbytes = 0;
report5_hash[i].s3_next_hash_entry = NULL;
}
}
void init_report7_array()
{
long i;
if ((report7_array = (struct struc_4 *) malloc(24 * sizeof(struct struc_4))) == NULL)
{
printf("error allocating report7_array - aborting");
exit(1);
}
for (i = 0; i < 24; i++)
{
report7_array[i].s4_hits = 0;
report7_array[i].s4_kbytes = 0;
}
strcpy(report7_array[0].s4_time, "Midnight to 1 AM ");
strcpy(report7_array[1].s4_time, " 1 AM to 2 AM ");
strcpy(report7_array[2].s4_time, " 2 AM to 3 AM ");
strcpy(report7_array[3].s4_time, " 3 AM to 4 AM ");
strcpy(report7_array[4].s4_time, " 4 AM to 5 AM ");
strcpy(report7_array[5].s4_time, " 5 AM to 6 AM ");
strcpy(report7_array[6].s4_time, " 6 AM to 7 AM ");
strcpy(report7_array[7].s4_time, " 7 AM to 8 AM ");
strcpy(report7_array[8].s4_time, " 8 AM to 9 AM ");
strcpy(report7_array[9].s4_time, " 9 AM to 10 AM ");
strcpy(report7_array[10].s4_time," 10 AM to 11 AM ");
strcpy(report7_array[11].s4_time," 11 AM to 12 PM ");
strcpy(report7_array[12].s4_time," 12 PM to 1 PM ");
strcpy(report7_array[13].s4_time," 1 PM to 2 PM ");
strcpy(report7_array[14].s4_time," 2 PM to 3 PM ");
strcpy(report7_array[15].s4_time," 3 PM to 4 PM ");
strcpy(report7_array[16].s4_time," 4 PM to 5 PM ");
strcpy(report7_array[17].s4_time," 5 PM to 6 PM ");
strcpy(report7_array[18].s4_time," 6 PM to 7 PM ");
strcpy(report7_array[19].s4_time," 7 PM to 8 PM ");
strcpy(report7_array[20].s4_time," 8 PM to 9 PM ");
strcpy(report7_array[21].s4_time," 9 PM to 10 PM ");
strcpy(report7_array[22].s4_time," 10 PM to 11 PM ");
strcpy(report7_array[23].s4_time," 11 PM to Midnight");
}
void check_file_date()
{
char *date_pointer;
char file_year[3];
int file_year_int;
char file_month[3];
int file_month_int;
char file_day[3];
int file_day_int;
if ((date_pointer = strstr(curr_file,"-")) == NULL) /* can't tell */
{file_date_status = 0; return;}
memcpy(file_year,date_pointer+1,2);
memset(file_year+2,'\0',1);
file_year_int = atoi(file_year);
if (file_year_int > 90) {file_year_int += 1900;}
else {file_year_int += 2000;}
memcpy(file_month,date_pointer+3,2);
memset(file_month+2,'\0',1);
file_month_int = atoi(file_month);
memcpy(file_day,date_pointer+5,2);
memset(file_day+2,'\0',1);
file_day_int = atoi(file_day);
if (file_year_int < atoi(start_year)) {file_date_status = -1; return;}
if (file_year_int == atoi(start_year))
{
if ((file_month_int < atoi(start_month))
|| (file_month_int == atoi(start_month) && file_day_int < atoi(start_day)))
{ file_date_status = -1; return;}
}
file_date_status = 0;
}
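/*
check_file_date() takes the six characters after the first '-' in the file
name as YYMMDD. A self-contained sketch of that extraction; the helper name
parse_archive_date and the sample name access-980616.gz are assumptions made
for the example. Like the function above, it looks at the first '-' anywhere
in the name, so a '-' in a directory name would mislead it.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: read YYMMDD from the characters after the first '-'.
// Returns 0 on success, -1 if there is no '-' or too little text after it.
int parse_archive_date(const char *name, int *year, int *month, int *day)
{
    const char *dash = strchr(name, '-');
    char field[3] = {0};                   // two digits plus terminator

    if (dash == NULL || strlen(dash) < 7) {
        return -1;
    }
    memcpy(field, dash + 1, 2);
    *year = atoi(field);
    *year += (*year > 90) ? 1900 : 2000;   // same pivot as check_file_date()
    memcpy(field, dash + 3, 2);
    *month = atoi(field);
    memcpy(field, dash + 5, 2);
    *day = atoi(field);
    return 0;
}
```
*/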
int create_record_stream()
{
strcpy(cat_string,blank_string);
if (strstr(curr_file,".gz") != NULL)
{
strcpy(cat_string,"/usr/bin/gzcat ");
}
else
{
strcpy(cat_string,"cat ");
}
strcat(cat_string,curr_file);
strcat(cat_string," >");
strcat(cat_string,work_file_string); /* overwrite work file if it exists */
system(cat_string);
if ((fptr2 = fopen(work_file_string,"r")) == NULL)
{
printf("can't open work_file\n");
exit(1);
}
return 0;
}
void omit_header()
{
/* Enterprise server puts a one-line header at the start of a log file.
If it's there, omit it.
*/
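/* On an empty file fscanf returns EOF and leaves header_work unchanged,
so test the return value rather than the buffer contents.
*/
if (fscanf(fptr2,"%s",header_work) != 1) /* Empty file */
{return;}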
if (strstr(header_work,"format=%Ses->client.ip%") != NULL)
{
fscanf(fptr2,"%s%s%s%s%s%s",header_work,header_work,header_work,header_work,header_work,header_work);
}
else
{
fscanf(fptr2,"%s",header_work2); /* Don't know why it would, but...*/
if (strcmp(header_work2,"-") != 0) /* If *2nd* fscanf got ip field */
{
strcpy(ip,header_work2);
fscanf(fptr2,"%s",ident);
}
else
{
strcpy(ip,header_work);
strcpy(ident,header_work2);
}
fscanf(fptr2,"%s",authuser);
/* authuser can sometimes consist of multiple fields,
so read fields til definitely at timestamp1
*/
if (fscanf(fptr2,"%s",timestamp1) == EOF) {return;}
while (timestamp1[0] != '[')
{ if (fscanf(fptr2,"%s",timestamp1) == EOF) {return;} } /* don't spin at end of file */
if (fscanf(fptr2,"%s",timestamp2) == EOF) {return;}
/* Read http fields til last char of field read is ".
If more than one, second is request. There can be as
few as one or two fields, as many as 3 or more.
URL *can* contain blanks + stuff, throwing off expected field sequence.
For example, "GET /cgi-bin/Stats/stats.cgi [ <a href= HTTP/1.0" shows up
in the log record, and has 3 extraneous "fields" before "HTTP".
*/
work_count = 0;
fscanf(fptr2,"%s",file2_buffer);
while (file2_buffer[strlen(file2_buffer)-1] != '"')
{
work_count++;
if (work_count == 2)
{
if (strlen(file2_buffer) > 255)
{
memcpy(file2,file2_buffer,255);
memset(file2+255,'\0',1);
}
else {strcpy(file2,file2_buffer);}
}
if (fscanf(fptr2,"%s",file2_buffer) == EOF) {return;} /* don't spin at end of file */
}
fscanf(fptr2,"%s%s",status,work_bytes);
if (strcmp(work_bytes,"-") == 0)
{bytes = 0;}
else {bytes = atol(work_bytes);}
if (strstr(file2,"default.ida") != NULL) {code_red_found = 1;}
if (null_found == 0 && code_red_found == 0)
{
update_strucs();
}
else
{
null_found = 0;
code_red_found = 0;
}
}
}
void update_strucs()
{
long i;
parse_input();
if (entire_period == 0) {i = check_date_range();}
else {i = 0;}
if (i < 0) {return;} /* rec too old */
else if (i > 0) {rec_too_new = 1; return;} /* rec too new, done with file */
if (atoi(status) > 399) {reports8and9_ongoing(); return;}
record_count++;
/* if (record_count > 1000000)
{
wrapup_hogs();*/ /* shunt request to hogs queue */
/* printf("hog found -- %s\n",get_value("login"));
exit(10);
}
*/
/* cut recs for report input files, if reports requested */
if (r5_data_needed == 1)
{
fprintf(fptr10,"%s %s\n",ip,rec_date);
report5_ongoing(); /* needed if r4, r5, or r6 requested */
}
else if (strcmp(get_value("useclients"),"on") == 0) /* if report 1 requested */
{
fprintf(fptr11,"%s\n",ip);
}
if (strcmp(get_value("usefiles"),"on") == 0) /* if report 2 requested */
{
fprintf(fptr12,"%s\n",file2);
}
if (strcmp(get_value("usehourly"),"on") == 0) {report7_ongoing();}
if (strcmp(get_value("useclientaccess"),"on") == 0) {report3_ongoing();}
}
long check_date_range()
{
/* If log rec date before start date */
if ((rec_year_int < int_start_year)
|| (rec_year_int == int_start_year && rec_month_int < int_start_month)
|| (rec_year_int == int_start_year && rec_month_int == int_start_month && rec_day_int < int_start_day))
{
return -1;
}
/* If log rec date after end date */
else if ((rec_year_int > int_end_year)
|| (rec_year_int == int_end_year && rec_month_int > int_end_month)
|| (rec_year_int == int_end_year && rec_month_int == int_end_month && rec_day_int > int_end_day))
{
return 1;
}
else return 0;
}
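/*
Illustrative sketch only, not called by the program: the three-part
year/month/day tests in check_date_range() could also be written by packing
each date into one sortable number and comparing once. The names
sketch_date_key and sketch_check_range are hypothetical, and the sketch
assumes month and day are small positive integers (below 100).
*/
static long sketch_date_key(long year, long month, long day)
{
    return year * 10000L + month * 100L + day; /* 2001, 7, 4 becomes 20010704 */
}
/* Same return convention as check_date_range(): -1 too old, 1 too new, 0 in range. */
static long sketch_check_range(long rec_key, long start_key, long end_key)
{
    if (rec_key < start_key) {return -1;}
    if (rec_key > end_key) {return 1;}
    return 0;
}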
void parse_input()
{
char work_hour[3];
memcpy(rec_date,timestamp1+1,11);
memset(rec_date + 11,'\0',1);
memcpy(work_hour,timestamp1+13,2);
memset(work_hour + 2,'\0',1);
rec_hour = atoi(work_hour);
memcpy(rec_year_long,rec_date+7,4);
memset(rec_year_long+4,'\0',1);
rec_year_int = atoi(rec_year_long);
memcpy(rec_month,rec_date+3,3);
memset(rec_month+3,'\0',1);
rec_month_int = month_num(rec_month);
memcpy(rec_day,rec_date,2);
memset(rec_day+2,'\0',1);
rec_day_int = atoi(rec_day);
if (record_count > 0)
{
if (rec_year_int < oldest_year
|| (rec_year_int == oldest_year && rec_month_int < oldest_month)
|| (rec_year_int == oldest_year && rec_month_int == oldest_month && rec_day_int < oldest_day))
{
oldest_year = rec_year_int;
oldest_month = rec_month_int;
oldest_day = rec_day_int;
strcpy(char_oldest_year,rec_year_long);
strcpy(char_oldest_day,rec_day);
}
if (rec_year_int > youngest_year
|| (rec_year_int == youngest_year && rec_month_int > youngest_month)
|| (rec_year_int == youngest_year && rec_month_int == youngest_month && rec_day_int > youngest_day))
{
youngest_year = rec_year_int;
youngest_month = rec_month_int;
youngest_day = rec_day_int;
strcpy(char_youngest_year,rec_year_long);
strcpy(char_youngest_day,rec_day);
}
}
else
{
oldest_year = rec_year_int;
oldest_month = rec_month_int;
oldest_day = rec_day_int;
strcpy(char_oldest_year,rec_year_long);
strcpy(char_oldest_day,rec_day);
youngest_year = rec_year_int;
youngest_month = rec_month_int;
youngest_day = rec_day_int;
strcpy(char_youngest_year,rec_year_long);
strcpy(char_youngest_day,rec_day);
}
}
void close_some_files()
{
if (file10_open) {fclose(fptr10);}
if (file11_open) {fclose(fptr11);}
if (file12_open && fptr12 != fptr11) {fclose(fptr12);}
fclose(fptr15);
if (fptr16 != fptr6) {fclose(fptr16);}
}
void do_reports()
{
print_header();
if (record_count == 0) {print_sorry(); return;}
print_time("starting report 5");
if (r5_data_needed == 1) {report5();}
if (strcmp(get_value("useperiodtotal"),"on") == 0 /* report 4 requested */
|| strcmp(get_value("useclients"),"on") == 0) /* report 1 requested */
{reports1and4_setup();}
print_time("starting report 4");
if (strcmp(get_value("useperiodtotal"),"on") == 0) {report4();}
print_time("starting report 6");
if (strcmp(get_value("usedailyavg"),"on") == 0) {report6();}
print_time("starting report 7");
if (strcmp(get_value("usehourly"),"on") == 0) {report7();}
print_time("starting report 1");
if (strcmp(get_value("useclients"),"on") == 0) {report_num = 1; report();}
print_time("starting report 2");
if (strcmp(get_value("usefiles"),"on") == 0) {report_num = 2; report();}
print_time("starting report 3");
if (strcmp(get_value("useclientaccess"),"on") == 0) {report_num = 3; report();}
print_time("starting report 8");
{report8();}
print_time("starting report 9");
{report_num = 9; report();}
{print_glossary();}
}
void print_sorry()
{
fprintf(fptr3,"\nWe're sorry, but no records fall in the date range specified.\n");
}
void print_header()
{
fprintf(fptr3,"From: statsmaster@simplenet.com\n");
fprintf(fptr3,"Subject: Your Statistics Report\n");
fprintf(fptr3,"Reply-to: statsmaster@simplenet.com\n");
/* get_time();
fprintf(fptr3,"Date: %s\n\n",asctime(time_ptr));
*/
fprintf(fptr3,"\t\t\t %s\n","SimpleNet Statistics");
fprintf(fptr3,"%s",dashed_line);
if (entire_period == 1 && record_count > 0)
{
strcpy(start_date,month_name(oldest_month));
strcat(start_date," ");
strcat(start_date,char_oldest_day);
strcat(start_date," ");
strcat(start_date,char_oldest_year);
strcpy(end_date,month_name(youngest_month));
strcat(end_date," ");
strcat(end_date,char_youngest_day);
strcat(end_date," ");
strcat(end_date,char_youngest_year);
}
if (entire_period == 0 || record_count > 0)
{
fprintf(fptr3,"Access Report for %s to %s\n\n",start_date,end_date);
}
else
{
fprintf(fptr3,"Access Report for entire period\n\n");
}
}
void reports1and4_setup()
{
/* If report 5 data not needed, report1_ip_file was created earlier. */
strcpy(work_string1,"cut -f1 -d\" \" < ");
strcat(work_string1,r5sorted_ip_date_file_string);
strcat(work_string1," > ");
strcat(work_string1,report1_ip_file_string);
strcpy(work_string2,"sort ");
strcat(work_string2,report1_ip_file_string);
strcat(work_string2," | uniq > ");
strcat(work_string2,uniq_ip_list_string);
strcpy(work_string3,"wc -l ");
strcat(work_string3,uniq_ip_list_string);
strcat(work_string3," > ");
strcat(work_string3,uniq_ip_count_string);
strcpy(work_string4,"rm ");
strcat(work_string4,uniq_ip_list_string);
strcpy(work_string5,"rm ");
strcat(work_string5,r5sorted_ip_date_file_string);
if (r5_data_needed == 1) {system(work_string1);}
system(work_string2);
system(work_string3);
system(work_string4);
get_ip_count();
if (r5_data_needed == 1)
{
system(work_string5);
r5sorted_ip_date_file_removed = 1;
}
}
void get_ip_count()
{
strcpy(work_string1,"rm ");
strcat(work_string1,uniq_ip_count_string);
if ((fptr4 = fopen(uniq_ip_count_string,"r")) == NULL)
{
printf("can't open uniq_ip_count\n");
exit(1);
}
fscanf(fptr4,"%ld",&num_clients);
system(work_string1);
}
/**********************************************************************************************/
/**********************************************************************************************/
/**********************************************************************************************/
/**********************************************************************************************/
/**********************************************************************************************/
void report3_ongoing()
{
if (check_for_match(file2,r3_string,"any"))
{
fprintf(fptr15,"%s\n",ip);
}
}
/**********************************************************************************************/
/**********************************************************************************************/
/**********************************************************************************************/
/**********************************************************************************************/
/**********************************************************************************************/
void report5_ongoing()
{
update_report5_hash(rec_date);
}
void update_report5_hash(char *work_date)
{
struct struc_3 *old, *start;
long done;
indexx = hash_date(work_date);
if (strcmp(report5_hash[indexx].s3_date,blank_date) == 0) /* If hash table entry is unoccupied */
{
strcpy(report5_hash[indexx].s3_date,work_date);
report5_hash[indexx].s3_hits++;
if (page_hit(file2) == 1)
{
report5_hash[indexx].s3_page_hits++;
}
report5_hash[indexx].s3_kbytes += (float)bytes / 1000;
}
else
if (strcmp(report5_hash[indexx].s3_date,work_date) == 0)/* Entry occupied, input date matches entry's */
{
report5_hash[indexx].s3_hits++;
if (page_hit(file2) == 1)
{
report5_hash[indexx].s3_page_hits++;
}
report5_hash[indexx].s3_kbytes += (float)bytes / 1000;
}
else /* Search chain for match -- if none, add entry to end of chain */
{
old = &report5_hash[indexx];
start = report5_hash[indexx].s3_next_hash_entry;
done = 0;
while (start != NULL && !done)
{
if (strcmp(start->s3_date,work_date) == 0)
{
start->s3_hits++;
if (page_hit(file2) == 1)
{
start->s3_page_hits++;
}
start->s3_kbytes += (float)bytes / 1000;
done = 1;
}
else
{
old = start;
start = start->s3_next_hash_entry;
}
}
if (!done)
{
start = (struct struc_3 *) malloc (sizeof(struct struc_3));
if (!start)
{
printf("out of memory\n");
return;
}
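strcpy(start->s3_date,work_date);
/* malloc does not zero memory, so every counter is set explicitly */
start->s3_clients = 0;
start->s3_hits = 1;
start->s3_page_hits = (page_hit(file2) == 1) ? 1 : 0; /* same test as the other branches */
start->s3_kbytes = (float)bytes / 1000;
start->s3_next_hash_entry = NULL;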
old->s3_next_hash_entry = start;
}
}
}
long page_hit(char *field)
{
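long i;
long len = (long) strlen(field);
char *ptr = field + len; /* suffix must be at end of filename */
char ptr2[5]; /* last 4 chars of name, lower-cased, + null */
for (i = 0; i < 4; i++)
{
/* Names shorter than 4 chars are left-padded with nulls so we never
read before the start of the string. */
if (4 - i > len) {ptr2[i] = '\0';}
else {ptr2[i] = (char) tolower((unsigned char) *(ptr - 4 + i));}
}
ptr2[4] = '\0';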
if (strcmp((char *)(ptr2 + 1),"htm") == 0 /* includes shtm */
|| strcmp((char *)(ptr2),"html") == 0 /* includes shtml */
|| strcmp((char *)(ptr2),".hts") == 0
|| strcmp((char *)(ptr2 + 1),".mv") == 0
|| strcmp((char *)(ptr2 + 3),"/") == 0)
return 1;
else return 0;
}
/**********************************************************************************************/
/**********************************************************************************************/
/**********************************************************************************************/
/**********************************************************************************************/
/**********************************************************************************************/
void report7_ongoing()
{
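if (rec_hour < 0 || rec_hour > 23) {return;} /* malformed timestamp: stay inside the 24 hourly slots */
report7_array[rec_hour].s4_hits++;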
report7_array[rec_hour].s4_kbytes += (float) bytes / 1000;
}
/**********************************************************************************************/
/**********************************************************************************************/
/**********************************************************************************************/
/**********************************************************************************************/
/**********************************************************************************************/
void reports8and9_ongoing()
{
/* Count the error by status code. Codes not listed here are ignored. */
switch (atoi(status))
{
case 400: e400_count++; break;
case 401: e401_count++; break;
case 402: e402_count++; break;
case 403: e403_count++; break;
case 404: e404_count++; break;
case 405: e405_count++; break;
case 406: e406_count++; break;
case 410: e410_count++; break;
case 500: e500_count++; break;
case 501: e501_count++; break;
case 502: e502_count++; break;
case 503: e503_count++; break;
case 504: e504_count++; break;
default: return;
}
fprintf(fptr16,"%s\n",file2); /* log the failing request once */
}
/**********************************************************************************************/
/**********************************************************************************************/
/**********************************************************************************************/
/**********************************************************************************************/
/**********************************************************************************************/
void report()
{
/* long i;
*/
/*
open uniq_item_count file (based on all items that occurred, with no duplicates)
create hash table, sized as function of number of unique items
open item file (all items that occurred, with duplicates)
for each item,
map it into hash table
if it's already there, update the count (may have to traverse chain)
else add entry with hash_value, item, count=1, next_ptr
endfor
*/
report_setup();
init_report_hash();
init_report_array();
open_report_input_file();
populate_report_hash();
/* for (i = 0; i < scaling_factor * num_uniq_items; i++)
{
print_report_hash(report_hash[i],i);
}
*/
create_report_file();
create_report();
free(report_hash);
free(report_array);
}
void report_setup()
{
if (report_num != 1)
{
strcpy(sort_string,"sort < ");
}
report_limit = 0;
switch (report_num)
{
case 1: report1_setup(); break;
case 2: report2_setup(); break;
case 3: report3_setup(); break;
case 9: report9_setup(); break;
default: break;
}
if (report_num != 1) /* For r1 we've already done this. */
{
strcat(sort_string,report_input_file);
strcat(sort_string," | uniq > ");
strcat(sort_string,uniq_item_list_string);
system(sort_string);
strcpy(work_string1,"wc -l ");
strcat(work_string1,uniq_item_list_string);
strcat(work_string1," > ");
strcat(work_string1,uniq_item_count_string);
system(work_string1);
get_count();
strcpy(work_string1,"rm ");
strcat(work_string1,uniq_item_list_string);
system(work_string1);
}
}
void report1_setup()
{
char x[16] = "\0";
fqdn_limit = 40;
strncpy(x,get_value("numclients"),sizeof(x) - 1); /* form input: never copy more than x can hold */
x[sizeof(x) - 1] = '\0';
if (strcmp(x,"") == 0) {r1_limit = 1000;}
else {r1_limit = atoi(x);}
report_limit = r1_limit;
num_uniq_items = num_clients;
if (strcmp(get_value("useexclude"),"on") == 0)
{
strcpy(r1_exclude_string,get_value("clientexclude"));
}
strcpy(enterprise_string,enterprise_string1);
strcpy(report_lit,r1_lit);
strcpy(report_input_file,report1_ip_file_string);
}
void report2_setup()
{
char x[256] = "\0";
fqdn_limit = 0;
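strncpy(x,get_value("numfiles"),sizeof(x) - 1); /* form input: never copy more than x can hold */
x[sizeof(x) - 1] = '\0';
if (strcmp(x,"") == 0) {r2_limit = 1000;}
else if (strlen(x) > 4) {r2_limit = 32766;} /* 5 or more digits: clamp so numfiles stays below 32767 */
else {r2_limit = atoi(x);}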
report_limit = r2_limit;
if (strcmp(get_value("useexcludefile"),"on") == 0)
{
strcpy(r2_exclude_string,get_value("nameexcludefile"));
}
strcpy(enterprise_string,enterprise_string2);
strcpy(report_lit,r2_lit);
strcpy(report_input_file,report2_file_file_string);
}
void report3_setup()
{
fqdn_limit = 40;
report_limit = r3_limit;
tot_hits = 0;
strcpy(enterprise_string,enterprise_string1);
strcpy(report_lit,r3_lit);
strcpy(report_input_file,report3_ip_file_string);
}
void report9_setup()
{
fqdn_limit = 0;
report_limit = r9_limit;
strcpy(enterprise_string,enterprise_string2);
strcpy(report_lit,r9_lit);
strcpy(report_input_file,report9_file_file_string);
}
void get_count()
{
strcpy(work_string1,"rm ");
strcat(work_string1,uniq_item_count_string);
if ((fptr6 = fopen(uniq_item_count_string,"r")) == NULL)
{
printf("can't open uniq_item_count\n");
exit(1);
}
fscanf(fptr6,"%ld",&num_uniq_items);
system(work_string1);
}
void init_report_hash()
{
register long i;
if ((report_hash = (struct struc_1 *)
malloc(scaling_factor * num_uniq_items * sizeof(struct struc_1))) == NULL)
{
printf("error allocating report_hash - aborting\n");
exit(1);
}
for (i = 0; i < scaling_factor * num_uniq_items; i++)
{
strcpy(report_hash[i].s1_string,blank_string);
report_hash[i].s1_count = 0;
report_hash[i].s1_next_hash_entry = NULL;
}
}
void init_report_array()
{
register long i;
if ((report_array = (struct struc_1 *)
malloc(num_uniq_items * sizeof(struct struc_1))) == NULL)
{
printf("error allocating report_array - aborting\n");
exit(1);
}
for (i = 0; i < num_uniq_items; i++)
{
strcpy(report_array[i].s1_string,blank_string);
report_array[i].s1_count = 0;
report_array[i].s1_next_hash_entry = NULL;
}
}
void open_report_input_file()
{
if ((fptr7 = fopen(report_input_file,"r")) == NULL)
{
printf("can't open %s\n",report_input_file);
exit(1);
}
}
void populate_report_hash()
{
long enterprise_seen = 0;
report_index = 0;
while (fscanf(fptr7,"%s",report_work) != EOF)
{
if (strstr(report_work,enterprise_string) == NULL) /* omit enterprise record */
{
if (report_num == 1
&& strcmp(get_value("useexclude"),"on") == 0
&& check_for_match(report_work,r1_exclude_string,"left")){;}
else if (report_num == 2
&& ((strcmp(get_value("useexcludefile"),"on") == 0
&& check_for_match(report_work,r2_exclude_string,"any"))
|| (strcmp(get_value("usehtmlonly"),"on") == 0
&& page_hit(report_work) == 0))) {;}
else update_report_hash(report_work);
}
else if (enterprise_seen == 0) /* only subtract once from count of unique items */
{
num_uniq_items--;
enterprise_seen = 1;
}
}
fclose(fptr7);
}
/*
See if string 1 meets any criteria from string 2. String 1 might be the IP field
or the file field from a log record, while string 2 might be a list of one or
more character strings, each of which could have one or more wild card aspects
indicated by an asterisk. List items are separated by commas. String 2 might
come in from a form field such as "exclude clients matching this IP address pattern"
or "show all clients accessing this file" or "exclude files with this pattern."
For example, suppose string 1 is a file such as "index.html" and string 2 is
"in*.htm*,*.gif". This subroutine must determine if string 1 satisfies "in*.htm*"
OR "*.gif". While checking against an individual string within the list in
string 2, the subroutine must determine if string 1 satisfies EVERY part of
that substring from string 2. So, given "in*.htm*" as the substring from string 2,
string 1 must contain "in" AND ".htm" in that order.
Since "index.html" satisfies the first substring from string 2 ("in*.htm*"), the
subroutine determines that there is a match and returns 1. If no match can be
found, 0 is returned.
The third input parameter, "type," is either "left" or "any." "Left" means the
match must occur on the leftmost part of string 1, while "any" means that the match
can occur starting anywhere in string 1. Left matching is important when working
with IP's. E.g. if the IP in the log record is 123.132.209.5, and string 2 is
"209*", without requiring a left-end match we would get a false positive, i.e.
that there is a match when there really isn't as far as the user is concerned.
Even if "left" is specified, if string 2 begins with an asterisk, e.g. *209,
then it doesn't matter where a match occurs. Also, a left match only matters
for the first subsubstring of the substring from string 2. A substring from
string 2 has multiple subsubstrings when an asterisk has content to both sides
within the substring. E.g. given the substring "abc*ghi", there are two
subsubstrings, "abc" and "ghi". Even in a left match situation, once "abc"
matches the leftmost part of string 1, then anything else in the substring
can match anywhere in string 1 (to the right of "abc"). So, if string 1 is
"abcdefghi" and string 2 is "abc*ghi,vw.x.yz*", the substring "abc*ghi" matches
string 1.
*/
long check_for_match(char * string1, char * string2, char * type)
{
char *curr_ptr;
char *new_ptr;
long length;
long rc;
curr_ptr = string2;
while ((new_ptr = strstr(curr_ptr,",")) != NULL)
{
length = new_ptr - curr_ptr;
memcpy(work_string1,curr_ptr,length);
memset(work_string1 + length,'\0',1);
rc = check_for_wildcards(string1,work_string1,type);
if (rc == 1) {return 1;}
curr_ptr = new_ptr + 1; /* go past comma to next substring */
}
rc = check_for_wildcards(string1,curr_ptr,type);
if (rc == 0) {return 0;}
else {return 1;}
}
long check_for_wildcards(char * string_1, char * string_2, char * type2)
{
char *new_s1ptr;
char *curr_s1ptr;
char *new_s2ptr;
char *curr_s2ptr;
long length2;
long first_or_only = 1;
curr_s1ptr = string_1;
curr_s2ptr = string_2;
while ((new_s2ptr = strstr(curr_s2ptr,"*")) != NULL)
{
length2 = new_s2ptr - curr_s2ptr;
memcpy(work_string2,curr_s2ptr,length2);
memset(work_string2 + length2,'\0',1);
if ((new_s1ptr = strstr(curr_s1ptr,work_string2)) == NULL) {return 0;}
if (strcmp(type2,"left") == 0
&& first_or_only == 1
&& string_2[0] != '*'
&& new_s1ptr != curr_s1ptr) {return 0;}
first_or_only = 0;
curr_s1ptr = new_s1ptr + length2;
curr_s2ptr = new_s2ptr + 1; /* go past asterisk to next subsubstring */
}
if ((new_s1ptr = strstr(curr_s1ptr,curr_s2ptr)) == NULL)
{return 0;}
else if (strcmp(type2,"left") == 0
&& first_or_only == 1
&& string_2[0] != '*'
&& new_s1ptr != curr_s1ptr)
{return 0;}
else {return 1;}
}
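/*
Illustrative sketch only, not called by the program: one way the matching
rules described above check_for_match() could be written without the global
work_string1/work_string2 scratch buffers. The names sketch_match_one and
sketch_match_list and the 256-byte item buffer are hypothetical. "left" is
passed as a flag (1 = left match, 0 = any). Unlike check_for_match(), this
sketch treats an empty list item as "no pattern" rather than as a pattern
that matches everything, and it skips items too long for the buffer.
*/
#include <string.h>
/* Match text against one pattern that contains no commas. */
static int sketch_match_one(const char *text, const char *pat, int left)
{
    int first = 1;
    for (;;)
    {
        size_t n = strcspn(pat, "*"); /* length of the next literal piece */
        if (n > 0)
        {
            const char *hit = text;
            if (first && left)
            {
                /* only the first piece is anchored to the left end */
                if (strncmp(text, pat, n) != 0) {return 0;}
            }
            else
            {
                /* later pieces may start anywhere to the right */
                while (strncmp(hit, pat, n) != 0)
                {
                    if (*hit == '\0') {return 0;}
                    hit++;
                }
            }
            text = hit + n;
        }
        first = 0;
        if (pat[n] == '\0') {return 1;} /* every piece was found, in order */
        pat += n + 1; /* go past the asterisk */
    }
}
/* Return 1 if text satisfies any item of the comma-separated list, else 0. */
static int sketch_match_list(const char *text, const char *list, int left)
{
    char item[256];
    while (*list != '\0')
    {
        size_t n = strcspn(list, ",");
        if (n > 0 && n < sizeof item)
        {
            memcpy(item, list, n);
            item[n] = '\0';
            if (sketch_match_one(text, item, left)) {return 1;}
        }
        list += n;
        if (*list == ',') {list++;}
    }
    return 0;
}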
void create_report_file()
{
long i;
strcpy(work_string1,"sort -rn -k1,1 "); /* highest count first; -k replaces the obsolete +0 -1 key syntax */
strcat(work_string1,report_file_string);
strcat(work_string1," > ");
strcat(work_string1,sorted_report_file_string);
strcpy(work_string2,"rm ");
strcat(work_string2,report_file_string);
if ((fptr8 = fopen(report_file_string,"w")) == NULL)
{
printf("can't open report_file\n");
exit(1);
}
/* Get count for each hash entry in use and write it to the report file.
Report array entry's next_hash_entry is pointer not to a next entry
in the report array, but to the *hash table* entry for this item.
We're just re-using the hash table structure. The count is unused.
*/
for (i = 0; i < report_index; i++)
{
fprintf(fptr8,"%ld %s\n",report_array[i].s1_next_hash_entry->s1_count,report_array[i].s1_string);
}
fclose(fptr8);
system(work_string1);
system(work_string2);
}
void create_report()
{
long i;
long report_count;
if ((fptr9 = fopen(sorted_report_file_string,"r")) == NULL)
{
printf("can't open sorted_report_file\n");
exit(1);
}
/* report_limit is the cutoff the USER set (e.g. top n clients) */
strcpy(work_string1,"rm ");
strcat(work_string1,sorted_report_file_string);
if (report_num == 1
|| report_num == 2) {fprintf(fptr3,report_lit,report_limit,domain);}
else if (report_num == 3) {fprintf(fptr3,report_lit,r3_string);}
else {fprintf(fptr3,report_lit,report_limit);}
strcpy(report_work,blank_string);
i = 0;
while (fscanf(fptr9,"%ld %s",&report_count,report_work) != EOF && i < report_limit)
{
fprintf(fptr3," %7ld",report_count);
if (i < fqdn_limit) /* Hard limit, else way too slow */
{
get_domain_name(report_work);
}
else
{
fprintf(fptr3,"\t%s\n",report_work);
}
i++;
if (report_num == 3) {tot_hits += report_count;}
strcpy(report_work,blank_string);
}
if (report_num == 3)
{
if (i != 1) {strcpy(plural,"s");} else {strcpy(plural,"");} /* agree with the client count printed below */
if (report_index == 0) {fprintf(fptr3,"\t\tNo match...\n");}
fprintf(fptr3,"\n\nTotalling %ld hits by %ld individual client%s\n",tot_hits,i,plural);
}
fprintf(fptr3,"%s",dashed_line);
fclose(fptr9); /* close the sorted report file before removing it */
system(work_string1);
}
void get_domain_name(char *input_ip)
{
u_int addr;
struct hostent *hp;
if ((int)(addr = inet_addr(input_ip)) == -1) /* not a dotted-quad address: print it as is */
{
(void) fprintf(fptr3,"\t%s\n",input_ip);
return;
}
hp = gethostbyaddr((char *)&addr, sizeof (addr), AF_INET);
if (hp == NULL) /* no reverse DNS entry: fall back to the raw IP */
{
(void) fprintf(fptr3,"\t%s\n",input_ip);
return;
}
/* Print the canonical host name exactly once, however many addresses
the resolver returned for this host. */
(void) fprintf(fptr3,"\t%s\n",hp->h_name);
}
void print_report_hash(struct struc_1 *h_e, long i)
{
printf("%ld %s %ld %p\n<br>",i,h_e->s1_string,h_e->s1_count,h_e->s1_next_hash_entry);
if (h_e->s1_next_hash_entry != NULL)
{
print_report_hash(h_e->s1_next_hash_entry,i);
}
}
void update_report_hash(char *work_string)
{
struct struc_1 *old, *start;
long done;
if (report_num == 1 || report_num == 3)
{indexx = hash_ip(work_string);}
else if (report_num == 2 || report_num == 9)
{indexx = hash_file(work_string);}
if (strcmp(report_hash[indexx].s1_string,blank_string) == 0) /* If hash table entry is unoccupied */
{
strcpy(report_hash[indexx].s1_string,work_string);
report_hash[indexx].s1_count = 1;
strcpy(report_array[report_index].s1_string,work_string);
report_array[report_index].s1_next_hash_entry = &report_hash[indexx];
report_index++;
}
else
if (strcmp(report_hash[indexx].s1_string,work_string) == 0)/* Entry occupied, input ip matches entry's */
{
report_hash[indexx].s1_count++;
}
else /* Search chain for match -- if none, add entry to end of chain */
{
old = &report_hash[indexx];
start = report_hash[indexx].s1_next_hash_entry;
done = 0;
while (start != NULL && !done)
{
if (strcmp(start->s1_string,work_string) == 0)
{
start->s1_count++;
done = 1;
}
else
{
old = start;
start = start->s1_next_hash_entry;
}
}
if (!done)
{
start = (struct struc_1 *) malloc (sizeof(struct struc_1));
if (!start)
{
printf("out of memory\n");
return;
}
strcpy(start->s1_string,work_string);
start->s1_count = 1;
start->s1_next_hash_entry = NULL;
old->s1_next_hash_entry = start;
strcpy(report_array[report_index].s1_string,work_string);
report_array[report_index].s1_next_hash_entry = start;
report_index++;
}
}
}
/**********************************************************************************************/
/**********************************************************************************************/
/**********************************************************************************************/
/**********************************************************************************************/
/**********************************************************************************************/
void report4()
{
fprintf(fptr3,"Period Totals for %s:\n\n",domain);
fprintf(fptr3," \t\t\t All\t\t\t\t\t Kilobytes\nIndividual Clients Clients Hits \t Page Hits Transmitted\n\n");
fprintf(fptr3,"%18ld%12ld%11ld%18ld%16.2f\n",num_clients,tot_clients,tot_hits,tot_page_hits,tot_kb);
fprintf(fptr3,"%s",dashed_line);
}
/**********************************************************************************************/
/**********************************************************************************************/
/**********************************************************************************************/
/**********************************************************************************************/
/**********************************************************************************************/
void report5()
{
long i;
long count;
report5_init();
/* read r5sorted_ip_date_file
for each unique date
while more recs for this date
while next ip = saved ip, next rec
else bump count in report5 array of ip's seen
end
end
output report 5 if r5 requested (r4 and r6 need the rest even if r5 not requested)
*/
fscanf(fptr8,"%s %s",ip,date);
strcpy(date_save,date);
strcpy(ip_save,ip);
count = 1;
while (fscanf(fptr8,"%s %s",ip,date) != EOF)
{
if (strstr(ip_save,"format") != NULL)
{
strcpy(date_save,date);
strcpy(ip_save,ip);
continue;
}
if (strcmp(date,date_save) == 0)
{
if (strcmp(ip,ip_save) != 0)
{
count++;
strcpy(ip_save,ip);
}
}
else
{
update_report5_totals(count);
count = 1;
}
}
/* We still need sorted_ip_date_file for reports 1 and/or 4. */
/* Do last guy if not done already. Wouldn't be done already unless last record
in sorted_ip_date_file's date was different from previous record's. */
if (strcmp(date,date_save) == 0) {update_report5_totals(count);}
if (strcmp(get_value("usedailytotal"),"on") == 0) {output_report5();}
else
{
for (i=0; i < num_dates; i++) /* Used in report 6 */
{
if (strcmp(report5_hash[i].s3_date,blank_date) != 0){num_dates_used++;}
}
}
fclose(fptr8); /* done reading r5sorted_ip_date_file; fptr8 is reused later */
free(report5_hash);
}
void report5_init()
{
strcpy(work_string1,"sort -k2,2 -k1,1 "); /* by date, then IP; -k replaces the obsolete +1 -2 +0 -1 syntax */
strcat(work_string1,ip_date_file_string);
strcat(work_string1," > ");
strcat(work_string1,r5sorted_ip_date_file_string);
strcpy(work_string2,"rm ");
strcat(work_string2,ip_date_file_string);
system(work_string1);
r5sorted_ip_date_file_created = 1;
system(work_string2);
ip_date_file_removed = 1;
if ((fptr8 = fopen(r5sorted_ip_date_file_string,"r")) == NULL)
{
printf("can't open r5sorted_ip_date_file\n");
exit(1);
}
tot_clients = 0;
tot_hits = 0;
tot_page_hits = 0;
tot_kb = 0;
}
void update_report5_totals(long count)
{
long i;
i = hash_date(date_save);
report5_hash[i].s3_clients = count;
tot_clients += report5_hash[i].s3_clients;
tot_hits += report5_hash[i].s3_hits;
tot_page_hits += report5_hash[i].s3_page_hits;
tot_kb += report5_hash[i].s3_kbytes;
strcpy(date_save,date);
strcpy(ip_save,ip);
}
void output_report5()
{
/* Had to move work var's to global var's, since they were breaking here, though OK
before this subroutine was removed from main report5 subrtn. */
long i;
fprintf(fptr3,"Daily Totals for %s:\n",domain);
fprintf(fptr3,"\t\t\t\t\t\t\t\t Kilobytes\nDate\t\t Clients Hits \t Page Hits Transmitted\n\n");
for (i=0; i < num_dates; i++)
{
if (strcmp(report5_hash[i].s3_date,blank_date) != 0)
{
strcpy(work_date,report5_hash[i].s3_date);
memcpy(work_mon,work_date+3,3);
memset(work_mon + 3,'\0',1);
memcpy(work_daynum,work_date,2);
memset(work_daynum + 2,'\0',1);
memcpy(work_year,work_date+7,4);
memset(work_year + 4,'\0',1);
if (month_num(work_mon) == 13) {continue;}
/* dayname() takes longs; atoi() returns int, so convert explicitly */
strcpy(work_day,dayname((long)month_num(work_mon),(long)atoi(work_daynum),(long)atoi(work_year)));
fprintf(fptr3,"%s %s %.2s %.4s\t\t %5ld %10ld\t %10ld\t %10.2f\n",
work_day, work_mon, work_daynum, work_year,
report5_hash[i].s3_clients,
report5_hash[i].s3_hits,
report5_hash[i].s3_page_hits,
report5_hash[i].s3_kbytes);
num_dates_used++; /* Used in report 6 */
}
}
fprintf(fptr3,"%s",dashed_line);
}
char * dayname(long m, long d, long y)
{
long val;
long dd[12] = {0,3,2,5,0,3,5,1,4,6,2,4};
if (m < 3) {y--;}
val = (y+(int)(y/4)-(int)(y/100)+(int)(y/400)+dd[m-1]+d) % 7;
switch (val)
{
case 0: return("Sun");
case 1: return("Mon");
case 2: return("Tue");
case 3: return("Wed");
case 4: return("Thu");
case 5: return("Fri");
case 6: return("Sat");
}
return ("");
}
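/*
Illustrative sketch only, not called by the program: dayname() uses the
table-driven day-of-week formula commonly attributed to Tomohiko Sakamoto.
The hypothetical helper sketch_weekday below repeats that arithmetic for
Gregorian dates, returns the numeric result (0 = Sun ... 6 = Sat), and adds
a month range check, which dayname() leaves to its caller.
Worked example for 1 Jan 2024: the month is before March, so y becomes 2023;
2023 + 505 - 20 + 5 + 0 + 1 = 2514, and 2514 % 7 = 1, i.e. Mon.
*/
static long sketch_weekday(long y, long m, long d)
{
    static const long offsets[12] = {0,3,2,5,0,3,5,1,4,6,2,4};
    if (m < 1 || m > 12) {return -1;} /* the table has no entry for this month */
    if (m < 3) {y--;} /* Jan and Feb count as part of the previous year */
    return (y + y/4 - y/100 + y/400 + offsets[m-1] + d) % 7;
}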
/**********************************************************************************************/
/**********************************************************************************************/
/**********************************************************************************************/
/**********************************************************************************************/
/**********************************************************************************************/
void report6()
{
if (num_dates_used == 0) {return;} /* no dated records: avoid dividing by zero below */
fprintf(fptr3,"Daily Averages:\n\n");
fprintf(fptr3," \t\t\t\t\t\t\t\t Kilobytes\n\t\t Clients Hits\t Page Hits Transmitted\n\n");
fprintf(fptr3,"%30ld%11ld%18ld%16.2f\n",
tot_clients / num_dates_used,
tot_hits / num_dates_used,
tot_page_hits / num_dates_used,
tot_kb / num_dates_used);
fprintf(fptr3,"%s",dashed_line);
}
/**********************************************************************************************/
/**********************************************************************************************/
/**********************************************************************************************/
/**********************************************************************************************/
/**********************************************************************************************/
void report7()
{
long i;
float tot_hits = 0;
float tot_kbytes = 0;
fprintf(fptr3,"Hourly Averages (Pacific Time):\n\n");
fprintf(fptr3," \t\t\t\t\t\t\t\t Kilobytes\n\t\t Time\t Hits\t Percentage\t Transmitted\n\n");
for (i = 0; i < 24; i++)
{
tot_hits += report7_array[i].s4_hits;
tot_kbytes += report7_array[i].s4_kbytes;
}
for (i = 0; i < 24; i++)
{
fprintf(fptr3,"\t%s\t%7ld\t\t%5.1f %%\t\t %10.2f\n",
report7_array[i].s4_time,
report7_array[i].s4_hits,
tot_hits > 0 ? 100*(float)report7_array[i].s4_hits/tot_hits : 0.0, /* guard against 0/0 */
tot_kbytes > 0 ? 100*report7_array[i].s4_kbytes/tot_kbytes : 0.0);
}
fprintf(fptr3,"%s",dashed_line);
free(report7_array);
}
/**********************************************************************************************/
/**********************************************************************************************/
/**********************************************************************************************/
/**********************************************************************************************/
/**********************************************************************************************/
void report8()
{
long work_count;
char work_status[4];
strcpy(work_string1,"sort -rn -k1,1 -k2,2 < "); /* -k replaces the obsolete +0 -1 +1 -2 key syntax */
strcat(work_string1,error_code_file_string);
strcat(work_string1," > ");
strcat(work_string1,sorted_error_code_file_string);
strcpy(work_string2,"rm ");
strcat(work_string2,error_code_file_string);
strcpy(work_string3,"rm ");
strcat(work_string3,sorted_error_code_file_string);
fprintf(fptr3,"Summary of HTTP errors:\n\n");
if ((fptr13 = fopen(error_code_file_string,"w")) == NULL)
{
printf("can't open error_code_file\n");
exit(1);
}
/* Don't write lit's to file 'cause they contain blanks, which throw off fscanf. */
if (e400_count > 0) {fprintf(fptr13,"%ld %s\n",e400_count,"400");}
if (e401_count > 0) {fprintf(fptr13,"%ld %s\n",e401_count,"401");}
if (e402_count > 0) {fprintf(fptr13,"%ld %s\n",e402_count,"402");}
if (e403_count > 0) {fprintf(fptr13,"%ld %s\n",e403_count,"403");}
if (e404_count > 0) {fprintf(fptr13,"%ld %s\n",e404_count,"404");}
if (e405_count > 0) {fprintf(fptr13,"%ld %s\n",e405_count,"405");}
if (e406_count > 0) {fprintf(fptr13,"%ld %s\n",e406_count,"406");}
if (e410_count > 0) {fprintf(fptr13,"%ld %s\n",e410_count,"410");}
if (e500_count > 0) {fprintf(fptr13,"%ld %s\n",e500_count,"500");}
if (e501_count > 0) {fprintf(fptr13,"%ld %s\n",e501_count,"501");}
if (e502_count > 0) {fprintf(fptr13,"%ld %s\n",e502_count,"502");}
if (e503_count > 0) {fprintf(fptr13,"%ld %s\n",e503_count,"503");}
if (e504_count > 0) {fprintf(fptr13,"%ld %s\n",e504_count,"504");}
fclose(fptr13);
system(work_string1);
system(work_string2);
if ((fptr14 = fopen(sorted_error_code_file_string,"r")) == NULL)
{
printf("can't open sorted_error_code_file\n");
exit(1);
}
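/* Limit %s to 3 characters so work_status[4] cannot overflow; stop on any short read. */
while (fscanf(fptr14,"%ld %3s",&work_count,work_status) == 2)
{
fprintf(fptr3," %7ld\t%s\t%s\n",work_count,work_status,error_lit(work_status));
}
fclose(fptr14);
system(work_string3);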
fprintf(fptr3,"%s",dashed_line);
}
char * error_lit(char *err_code)
{
if (strcmp(err_code,"400") == 0) {return (e400_lit);}
if (strcmp(err_code,"401") == 0) {return (e401_lit);}
if (strcmp(err_code,"402") == 0) {return (e402_lit);}
if (strcmp(err_code,"403") == 0) {return (e403_lit);}
if (strcmp(err_code,"404") == 0) {return (e404_lit);}
if (strcmp(err_code,"405") == 0) {return (e405_lit);}
if (strcmp(err_code,"406") == 0) {return (e406_lit);}
if (strcmp(err_code,"410") == 0) {return (e410_lit);}
if (strcmp(err_code,"500") == 0) {return (e500_lit);}
if (strcmp(err_code,"501") == 0) {return (e501_lit);}
if (strcmp(err_code,"502") == 0) {return (e502_lit);}
if (strcmp(err_code,"503") == 0) {return (e503_lit);}
if (strcmp(err_code,"504") == 0) {return (e504_lit);}
return ("");
}
/**********************************************************************************************/
/**********************************************************************************************/
/**********************************************************************************************/
/**********************************************************************************************/
/**********************************************************************************************/
long hash_date(char *input_date)
{
long work;
/* Calculate hash value. */
/* Compute a number as a function of the year, month, and day.
pre-bias-Num = 372 * yr + 31 * (mon - 1) + day
Num = pre-bias-Num - date_bias.
We use 372 (= 12 * 31) as the year multiplier since we treat all months as having 31 days.
This way 1/1 always gets a larger Num than the preceding 12/31.
Date_bias is used so that all dates are in sequence within the hash table. It is calculated by
treating the first possible date in the date range as a data value and hashing it into the
date range before any bias is applied. This gives a remainder, or offset, for the first
possible date within the hashed date range. We want the first possible date to line up with
the first possible hashed value, so that all hashed values are in date sequence. Otherwise,
if the first possible date falls somewhere inside the range of hashed values, dates later than
that date can hash in either after it or *before* it in the hash range, causing a date sequence
problem. So, we subtract the bias from the first possible date's pre-bias Num so that date
maps to the beginning of the hash value range. We adjust all other dates the same way.
To clarify, suppose there are 100 dates in the date range, and that hashing the first possible date,
say October 1, puts October 1's entry at offset 30 in the hash table. The first 69 dates
after October 1 are OK, since they hash in after it in the hash range, but the next 30 dates
after *those* hash in *before* October 1's entry. So, printing entries out in hash table sequence
would show a date wraparound rather than a strictly ascending date sequence. To fix this, we
subtract 30 from each pre-bias Num that is calculated, so 30 becomes 0, etc.
*/
/* input_date is in the form dd/Mon/yyyy. */
memcpy(rec_year_long,input_date + 7,4);
rec_year_long[4] = '\0';
memcpy(rec_month,input_date + 3,3);
rec_month[3] = '\0';
memcpy(rec_day,input_date,2);
rec_day[2] = '\0';
work = (long) (372 * atoi(rec_year_long) + 31 * (month_num(rec_month) - 1) + atoi(rec_day) - date_bias);
return
(
work % (long) num_dates
);
}
long hash_file(char *input_file)
{
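long work = 0;
register long i;
long len = (long) strlen(input_file); /* Taken once, rather than on every pass of the loop. */
/* Calculate hash value: sum the characters, each offset by 128 so a signed char cannot go negative. */
for (i = 0; i < len; i++)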
{
work += input_file[i] + 128;
}
return ((long)(work % (scaling_factor * num_uniq_items)));
}
long hash_ip(char *input_ip)
{
char work[16];
register long i,j,k,l;
strcpy(work,blank_ip);
/* Calculate hash value. */
/* Convert ip string; take its last 9 characters in reverse order (so more random).
Digits are copied as is, and each dot is turned into a digit (see below) rather than dropped.
So, e.g. 123.45.189.67 becomes 768981854.
*/
/*printf("%s\n",input_ip);
*/
/* backwards ip often bigger than max long (2,147,483,647), so stop after 9 digits */
j = 0;
k = strlen(input_ip)-1;
if (k > 8) {l = k - 8;} else {l = 0;}
for (i = k; i >= l; i--)
{
if (isdigit((unsigned char) input_ip[i]))
{
work[j] = input_ip[i];
}
else
{
/* Mask 9 keeps bits 0 and 3 only, giving 0, 1, 8 or 9; OR with 48 ('0') makes it an
ASCII digit. A dot (0x2E) therefore becomes '8'. */
work[j] = (input_ip[i] & 9) | 48;
}
j++;
}
indexx = atol(work);
indexx = indexx % (scaling_factor * num_uniq_items);
if (indexx < 0) {return (-1 * indexx);}
else {return (indexx);}
}
void wrapup()
{
char mail_string[256];
strcpy(mail_string,"/usr/lib/sendmail ");
strcat(mail_string,"statsmaster@simplenet.com");
/* strcat(mail_string,get_value("address")); */
strcat(mail_string," < ");
strcat(mail_string,outfile_string);
strcpy(work_string1,"rm ");
strcat(work_string1,ip_date_file_string);
strcpy(work_string2,"rm ");
strcat(work_string2,cgipath);
strcat(work_string2,"report*");
strcat(work_string2,pid);
strcpy(work_string3,"rm ");
strcat(work_string3,outfile_string);
fclose(fptr3);
if (ip_date_file_removed != 1 && r5_data_needed == 1) {system(work_string1);}
system(work_string2);
system(mail_string);
system(work_string3);
strcpy(work_string5,"rm ");
strcat(work_string5,r5sorted_ip_date_file_string);
if (r5sorted_ip_date_file_created == 1
&& r5sorted_ip_date_file_removed != 1) {system(work_string5);}
}
void wrapup_hogs()
{
strcpy(work_string1,"rm ");
strcat(work_string1,file_list_string);
system(work_string1);
strcpy(work_string1,"rm ");
strcat(work_string1,work_file_string);
system(work_string1);
strcpy(work_string1,"rm ");
strcat(work_string1,ip_date_file_string);
if (ip_date_file_removed != 1 && r5_data_needed == 1) {system(work_string1);}
strcpy(work_string1,"rm ");
strcat(work_string1,cgipath);
strcat(work_string1,"report*");
strcat(work_string1,pid);
system(work_string1);
fclose(fptr3);
strcpy(work_string1,"rm ");
strcat(work_string1,outfile_string);
system(work_string1);
strcpy(work_string5,"rm ");
strcat(work_string5,r5sorted_ip_date_file_string);
if (r5sorted_ip_date_file_created == 1
&& r5sorted_ip_date_file_removed != 1) {system(work_string5);}
}
void get_form_input()
{
/* This subroutine gets the form variables and values as one big string, "buffer".
You need a global variable called "buffer", e.g. char buffer[1024];
You also need a global variable called "val" declared e.g. char *val.
You also need the other subroutine, "get_value", to access the variables. */
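char *r_method;
char *c_length;
char *query;
char work_buffer[1024];
long content_length;
long len;
long i = 0;
long j = 0;
register char digit;
work_buffer[0] = '\0'; /* So the decode loop sees an empty string if no input is found. */
r_method = getenv("REQUEST_METHOD");
if (r_method != NULL)
{
if (strcmp(r_method,"POST") == 0)
{
c_length = getenv("CONTENT_LENGTH");
if (c_length != NULL)
{
/* Clamp the length to the work area; fgets stores at most size - 1 characters. */
content_length = atol(c_length);
if (content_length < 0) {content_length = 0;}
if (content_length > (long) sizeof(work_buffer) - 1) {content_length = (long) sizeof(work_buffer) - 1;}
if (fgets(work_buffer,(int) content_length + 1,stdin) == NULL) {work_buffer[0] = '\0';}
}
}
else if ((query = getenv("QUERY_STRING")) != NULL)
{
strncpy(work_buffer,query,sizeof(work_buffer) - 1);
work_buffer[sizeof(work_buffer) - 1] = '\0';
}
}
/* Convert from urlencoding. The decoded text is never longer than the input. */
len = (long) strlen(work_buffer);
while (i < len)
{
/* Only decode a '%' that is followed by two more characters. */
if (work_buffer[i] == '%' && i + 2 < len)
{
if (work_buffer[i+1] >= 'A') {digit = ((work_buffer[i+1] & 0xdf) - 'A') + 10;}
else {digit = work_buffer[i+1] - '0';}
digit *= 16;
if (work_buffer[i+2] >= 'A') {digit += ((work_buffer[i+2] & 0xdf) - 'A') + 10;}
else {digit += (work_buffer[i+2] - '0');}
buffer[j] = digit;
i += 3;
}
else if (work_buffer[i] == '+') {buffer[j] = ' '; i++;}
else {buffer[j] = work_buffer[i]; i++;}
j++;
}
buffer[j] = '\0'; /* strcat(buffer,"\0") appends nothing, so terminate explicitly. */
val = NULL; /* Init here, so can check on automatically freeing it on entry to get_value. */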
}
char * get_value(char varname[256])
{
/* This subroutine extracts the value of a form variable. The name of the variable
must be passed as a string. E.g. to get the value of "form_city", call as follows:
get_value("form_city"); The value is returned as a string. You can use "val" or assign
the value to a variable, e.g. strcpy(city,get_value("form_city")); Make sure you
allocate space for the receiving variable, e.g. char city[256]; rather than char *city;
*/
char *name_start;
char *val_start;
char *parm_end;
char name[100];
long val_length, name_length;
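/* Bounded copy: name[100] must also hold the trailing "=" and the terminator. */
strncpy(name,varname,sizeof(name) - 2);
name[sizeof(name) - 2] = '\0';
strcat(name,"=");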
name_length = strlen(name);
if (val != NULL) {free(val);}
if (strstr(buffer,name) != NULL)
{
name_start = strstr(buffer,name);
val_start = name_start + name_length;
if (strstr(name_start,"&") != NULL)
{
parm_end = strstr(name_start,"&");
val_length = parm_end - val_start;
}
else {val_length = buffer + strlen(buffer) - val_start;}
val = (char *) malloc (val_length + 1);
memcpy(val,name_start + name_length,val_length);
memset(val + val_length,'\0',1);
}
else
{
val = (char *) malloc (10);
strcpy(val,"not found");
}
return(val);
}
void print_glossary()
{
/* Adjacent string literals are used because a literal may not span source lines in standard C. */
fputs(
"\n"
"Definitions\n\n"
"Hit: A request for any object that is on your site. Each element of a\n"
"requested page (a graphic, a sound file, or the page itself) is counted\n"
"as a hit. A page on your site that contains five graphics generates six\n"
"hits - the five images and the original request made for the page.\n\n"
"Page Hit: The total number of times any of your pages are visited. The\n"
"same user counts as a page hit each time he loads one of your pages.\n\n"
"Client: A unique IP number that has accessed your site (i.e. person).\n\n"
"Individual Clients: If a client accesses more than once on different\n"
"days within your report period, this counts only once, whereas the Clients\n"
"and All Clients fields count the client again.\n\n"
"Kilobytes transmitted: The number of Kilobytes of data transmitted from\n"
"your site.\n",
fptr3);
}