/*     This program produces web site statistics from log file records.

       Various reports are produced as a function of criteria supplied by the user.

       Depending on the time period specified by the user, one or more log files

       must be accessed. There is always a current "access" log file and possibly

       one or more archive log files. Archive files are gzipped and named to

       reflect the date.

       To speed processing, as little time as possible is spent reading log records.

       Fields are read into work areas and later distributed and processed as needed.

       Arrays are built driven by the needs of the various reports. If an array fills

       up, it is enlarged using "realloc." Arrays are sorted using quicksort after

       they are finished being populated.

       Report 1: Top n clients accessing the site. Shows a count and the domain name.

              Sorted by count, highest first.

       Report 2: Top n files accessed. Shows count and file (e.g. a gif or an HTML page).

              Sorted by count, highest first.

       Report 3: Files containing the string "whatever." Shows count and client (IP).

              Sorted by count, highest first.

       Report 4: Period totals for site. Shows individual clients, all clients, hits,

              page hits, KB transmitted. Individual clients = unique IP's; all clients

              counts an IP as many times as days it occurs, and = the sum of the "clients"

              column in report 5; hits is sum of "hits" column in report 5; page hits =

              sum of "page hits" column in report 5; KB transmitted = sum of "KB" column

              in report 5.

       Report 5: Daily Totals for site. Shows date, clients, hits, page hits, KB transmitted.

              Date is in form: weekday (3 char), e.g. Mon; month (3 char), day (no leading zero),

              year (4 digits). KB is to two decimal places.

       Report 6: Daily Averages. Shows same columns as report 5. Values = report 4 values

              divided by number of days in period. Clients here is based on "all clients"

              from report 4, rather than "individual clients."

       Report 7: Hourly averages. Shows hour, hits, percentage, and KB transmitted.

              Hour appears as e.g. "Midnight to 1 am" or "1 am to 2 am." Percentage is

              nn.n %.

       Report 8: Summary of HTTP errors. Shows count, error code (e.g. 404), text (e.g. Not Found).

              Sorted by count, highest first.

       Report 9: (Top 20) requests causing errors. Shows count and request (file).

              Sorted by count, highest first.

       Design notes:

              Since this isn't Perl, and none of us can figure out how to redirect standard output

              from a shell command back to this C program's standard input, the output of the cat or

              gzcat is written to a work file.

              For reports 4-9 we read in the log records one-by-one. But we only read each

              work file once, extracting all of the data for all of those reports in one pass.

              The processing for reports 1-3 is affected by the fact that there can be from a handful

              to millions of log records to be processed for a given user submission of the stats

              engine. This precludes building an array in-core to hold information for each ip

              encountered, e.g., without first having determined how many occur. And even if we

              built an array based on that count of unique ip's, if we read each log record we have

              to search the array each time to find the right entry to increment the count for.

              So, a hash table is a better data structure to use.

              For these reports, we extract selected log fields from each relevant log file

              (based on the user-supplied date range) using shell commands (via system calls)

              and concatenate this output to a work file. This allows us to determine how many

              hash table entries to allocate for the primary array. We also can perform a number

              of useful operations on the combined log data.

              For the purposes of report 1 we "cut" (extract) just the first field, which is the ip,

              from each log record.* After cutting from all relevant files, this gives us a file of

              all occurrences of ip's. We sort this and pipe to "uniq" which gives us just the unique

              ip's that occurred. A "wc" of that tells us how many ip's to take into account when

              allocating the primary hash table structure.

              We can now populate the hash table. Reading in a record from the file with all ip

              occurrences, we directly hash to the right entry and increment its count. In case of

              the entry already being populated by a different ip (a synonym situation), we chain a

              linked list off the main entry so that each different ip mapping to that same primary

              entry gets its own entry.

              When we're done accumulating occurrences, we can produce report 1.

              For report 2 we employ a similar strategy to that used for report 1. We extract the

              "file" field instead of the ip field. A hash table is eventually built based on the

              work file that combines the "file" fields from all relevant log files.

              For report 3 we read the log records as in reports 4-9, but proceed more like we do for

              reports 1 and 2. We look for records whose "file" field contains the user-supplied string.

              For those that do, we write a record to a file with just the ip. After all log files

              have been read, we process the resulting ip file. We sort it and uniq it to get the

              number of unique ip's. We read this in and allocate an array with that many entries.

              We read the sorted ip file and accumulate the count of occurrences for each ip there.

              We then write the ip's and counts to a file and sort it by count, descending. We read

              that in and produce the report.

              *Since for report 5 we need ip information within date, we build a work file with both

              ip and date. We use this later for report 1, cutting the ip column to create the ip file.

*/
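/*
       Illustration only, not part of the original program. The design notes above
       describe a hash table whose primary entries chain a linked list for synonyms
       (different keys that hash to the same slot). The sketch below shows one way
       that counting scheme can work. The names sketch_entry, sketch_hash and
       sketch_count are hypothetical, and the hash function is a generic string hash
       chosen for the example; the program's own hash_ip() may work differently.
       Keys are assumed to be non-empty, and the table is assumed to be zero-filled
       (e.g. allocated with calloc).
*/

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry layout, loosely modeled on struc_1 in this file. */
struct sketch_entry {
    char   key[256];
    long   count;
    struct sketch_entry *next;      /* chain of synonyms for this slot */
};

/* Generic string hash (an assumption for this sketch), reduced to a slot index. */
static unsigned long sketch_hash(const char *s, unsigned long table_size)
{
    unsigned long h = 5381;

    assert(table_size > 0);
    while (*s != '\0') { h = h * 33 + (unsigned char) *s++; }
    return h % table_size;
}

/* Count one occurrence of key. Returns the updated count, or -1 if out of memory. */
static long sketch_count(struct sketch_entry *table, unsigned long table_size,
                         const char *key)
{
    struct sketch_entry *e = &table[sketch_hash(key, table_size)];

    if (e->key[0] == '\0') {        /* primary entry unused: claim it */
        strncpy(e->key, key, sizeof(e->key) - 1);
        return e->count = 1;
    }
    for (;;) {                      /* walk the primary entry and its chain */
        if (strncmp(e->key, key, sizeof(e->key) - 1) == 0) { return ++e->count; }
        if (e->next == NULL) { break; }
        e = e->next;
    }
    e->next = calloc(1, sizeof(*e->next));   /* synonym: chain a new entry */
    if (e->next == NULL) { return -1; }
    strncpy(e->next->key, key, sizeof(e->next->key) - 1);
    return e->next->count = 1;
}
```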

#include <stdio.h>

#include <string.h>

#include <time.h>

#include <math.h>

#include <stdlib.h>

#include <unistd.h>       /* For getpid, execle commands */

#include <sys/types.h>

#include <sys/socket.h>

#include <netinet/in.h>

#include <arpa/inet.h>

#include <netdb.h>

#include <ctype.h>

/* Multiplied times number of unique ip's when malloc'ing hash table */

#define scaling_factor 3

char ip[8192];            /* Host   (IP address)       */

char ident[256];       /* Ident field          */

char authuser[256];      /* Authuser             */

char timestamp1[256];    /* Time Stamp part 1 */

char timestamp2[256];    /* Time Stamp part 2 */

char file1[256];       /* HTTP Request       part 1       */

char file2[256];       /* HTTP Request       part 2       */

char file3[256];       /* HTTP Request       part 3       */

char status[256];       /* Status Code          */

long bytes;          /* Transfer Volume */

char *blank_date = "           ";     /* 11 blanks */

char blank_ip[16];

char blank_string[256];

char *buff;

char buffer[1024];

char *buff2;

long bump1 = 1000;

long bump2 = 100;

long bump3 = 10;

char cat_string[256];

char cgipath[10] = "/tmp/";

/*char cgipath[44] = "/Space/Domains/stats.simplenet.net/cgi-bin/";*/

char char_oldest_day[4];

char char_oldest_year[5];

char char_youngest_day[4];

char char_youngest_year[5];

int   code_red_found = 0;

char curr_file[100];

long curr_s1 = 0;

long curr_s2 = 0;

long curr_s3 = 0;

char *dashed_line = "\n\t\t\t---------------------------\n\n";

char date[12];

long date_bias;

char date_save[12];

char domain[256];

char end_date[12];

char end_day[3];

char end_month[3];

char end_year[5];

long endloop;

char enterprise_string[64];

char *enterprise_string1 = "format=%Ses->client.ip%";

char *enterprise_string2 = "%Req->srvhdrs.content-length%";

long entire_period = 0;

long e400_count = 0;

char *e400_lit = "Syntax error         ";

long e401_count = 0;

char *e401_lit = "Unauthorized         ";

long e402_count = 0;

char *e402_lit = "Payment required     ";

long e403_count = 0;

char *e403_lit = "Forbidden            ";

long e404_count = 0;

char *e404_lit = "Not found            ";

long e405_count = 0;

char *e405_lit = "Method not allowed   ";

long e406_count = 0;

char *e406_lit = "Not acceptable       ";

long e410_count = 0;

char *e410_lit = "No longer available ";

long e500_count = 0;

char *e500_lit = "Internal server error";

long e501_count = 0;

char *e501_lit = "Not implemented      ";

long e502_count = 0;

char *e502_lit = "Bad gateway          ";

long e503_count = 0;

char *e503_lit = "Service unavailable ";

long e504_count = 0;

char *e504_lit = "Gateway timeout      ";

char error_code_file_string[256] = "error_code_file_";

char file2_buffer[8192];

int   file10_open = 0;

int   file11_open = 0;

int   file12_open = 0;

int   file15_open = 0;

int   file16_open = 0;

long file_date_status = 0;

char file_list_string[256] = "file_list_";

long fqdn_limit;

char header_work[256];

char header_work2[256];

long indexx;

char *input_parm_file;

long int_start_year;

long int_start_month;

long int_start_day;

long int_end_year;

long int_end_month;

long int_end_day;

long ip_date_file_removed = 0;

char ip_date_file_string[256] = "ip_date_file_";

char ip_save[256];

int   null_found = 0;

long num_clients;

long num_dates;

long num_dates_used = 0;

long num_uniq_items;

long oldest_year;

long oldest_month;

long oldest_day;

char outfile_string[256] = "outfile_";

char parm_line[256];

char pid[256] = "\0";

char plural[2];

char random_suffix_char[2] = " ";

int   rc = 0;

char rec_date[12];

char rec_day[3];

int   rec_day_int;

long rec_hour;

char rec_month[4];

int   rec_month_int;

long rec_too_new;

char rec_year[3];

int   rec_year_int;

char rec_year_long[5];

long record_count = 0;

char report1_ip_file_string[256] = "report1_ip_file_";

char report2_file_file_string[256] = "report2_file_file_";

char report3_ip_file_string[256] = "report3_ip_file_";

char report9_file_file_string[256] = "report9_file_file_";

char report_file_string[256] = "report_file_";

long report_index;

char report_input_file[256];

long report_limit;

char report_lit[256];

long report_num;

char report_work[256];

char r1_exclude_string[16];

long r1_limit = 30;

char *r1_lit = "Top %ld Clients Accessing %s:\n\n";

char r2_exclude_string[256];

long r2_limit = 40;                  /* User supplied */

char *r2_lit = "Top %ld Files Accessed on %s:\n\n";

long r3_limit = 40;

char *r3_lit = "Clients accessing files containing the string \"%s\" in this period:\n\n";

char r3_string[256];

int   r5sorted_ip_date_file_created = 0;

int   r5sorted_ip_date_file_removed = 0;

char r5sorted_ip_date_file_string[256] = "r5sorted_ip_date_file_";

long r5_data_needed = 0;

long r9_limit = 20;

char *r9_lit = "Top %ld Requests Causing Errors:\n\n";

char sorted_error_code_file_string[256] = "sorted_error_code_file_";

char sorted_report_file_string[256] = "sorted_report_file_";

char sort_string[256];

char start_date[12];

char start_day[3];

char start_month[3];

char start_year[5];

char std_error_file_string[256] = "std_error_file_";

char std_error_file_string2[256] = "std_error_file2_";

long tot_clients;

long tot_hits;

float tot_kb;

long tot_page_hits;

char uniq_ip_count_string[256] = "uniq_ip_count_";

char uniq_ip_list_string[256] = "uniq_ip_list_";

char uniq_item_count_string[256] = "uniq_item_count_";

char uniq_item_list_string[256] = "uniq_item_list_";

char *val;

char work_bytes[256];

long work_count;

char work_date[12];

char work_day[4];

char work_daynum[3];

char work_file_string[256] = "work_file_";

char work_mon[4];

char work_string1[256];

char work_string2[256];

char work_string3[256];

char work_string4[256];

char work_string5[256];

char work_year[5];

long y;

long youngest_year;

long youngest_month;

long youngest_day;

typedef struct struc_1

{

       char   s1_string[256];

       long   s1_count;

       struct struc_1 *s1_next_hash_entry;

}d1;

struct struc_1 *report_hash;

struct struc_1 *report_array;      /* Really an array, not used as a hash table. See create_report1_file(). */

typedef struct struc_3

{

       char   s3_date[12];

       long   s3_clients;

       long   s3_hits;

       long   s3_page_hits;

       float s3_kbytes;

       struct struc_3 *s3_next_hash_entry;

}d3;

struct struc_3 *report5_hash;

typedef struct struc_4

{

       char s4_time[21];

       long s4_hits;

       float s4_kbytes;

}d4;

struct struc_4 *report7_array;

struct tm *time_ptr;

time_t lt;

FILE *fptr1;       /* log file list            */

FILE *fptr2;       /* cat'd log file          */

FILE *fptr3;       /* stats report file          */

FILE *fptr4;       /* count of unique ip's          */

FILE *fptr5;       /* ip from each log rec    */

FILE *fptr6;       /* report initial data as i/p */

FILE *fptr7;       /* count of report ip's or files*/

FILE *fptr8;       /* report data to sort        */

FILE *fptr9;       /* sorted report data          */

FILE *fptr10;       /* ip_date_file               */

FILE *fptr11;       /* report1_ip_file            */

FILE *fptr12;       /* report2_file_file          */

FILE *fptr13;       /* error code file          */

FILE *fptr14;       /* sorted error code file       */

FILE *fptr15;       /* report3 initial data as o/p */

FILE *fptr16;       /* report9 initial data as o/p */

FILE *fptr17;       /* input parameter file          */

long check_date_range();

void check_file_date();

long check_for_match();

long check_for_wildcards();

void close_some_files();

int   create_record_stream();

void create_report();

void create_report_file();

char *dayname();

void do_reports();

char *error_lit();

void estimate_num_dates_entire();

void estimate_num_dates_specific();

void get_count();

void get_dates();

void get_domain_name();

void get_form_input();

void get_ip_count();

void get_random_suffix();

void get_time();

char *get_value();

long hash_date();

long hash_file();

long hash_ip();

void init();

void init_report5_hash();

void init_report7_array();

void init_report_array();

void init_report_hash();

char *month_name();

long month_num();

void omit_header();

void open_report_input_file();

void open_some_files();

void output_report5();

long page_hit();

void parse_input();

void populate_report_hash();

char *prepare_filename_string(char *);

void prepare_filename_strings();

void prepare_filename_suffix();

void prepare_file_list();

void prepare_parm_input();

void print_glossary();

void print_header();

void print_report_hash();

void print_sorry();

void print_time();

void report();

void report_setup();

void reports1and4_setup();

void report1_setup();

void report2_setup();

void report3_ongoing();

void report3_setup();

void report4();

void report5();

void report5_init();

void report5_ongoing();

void report6();

void report7();

void report7_ongoing();

void report8();

void report9_setup();

void reports8and9_ongoing();

void set_num_dates();

void update_report5_hash();

void update_report5_totals();

void update_report_hash();

void update_strucs();

void wrapup();

void wrapup_hogs();

int main(int argc, char *argv[])

{

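       /* The input parm file is required; argv[1] is not valid without it. */

       if (argc < 2)

       {

              printf("missing input parm file argument\n");

              exit(1);

       }

       input_parm_file = argv[1];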

       init();

       /* get name of next log file */

       while (fscanf(fptr1,"%s",curr_file) != EOF)

       {

              if (strstr(curr_file,"%") != NULL)       {continue;}

              if (entire_period == 0)                     {check_file_date();}

              if (file_date_status != 0)           {continue;} /* too old */

              rc = create_record_stream();

              if (rc > 0) {continue;}

              omit_header();

/* to find file clobbering buffer (when "not found" becomes email address) */

printf("curr_file = %s\n",curr_file);

printf("buffer = %s\n\n",buffer);

              rec_too_new = 0;

              while (fscanf(fptr2,"%8191s%255s%255s",ip,ident,authuser)

                     != EOF && rec_too_new == 0)

              {

                     /* IP field can contain an unlimited number of leading nulls.

                        Corruption or even coring can occur. Skip the record.

                     */

                     if (ip[0] < '!') {null_found = 1;}



                     /* authuser can sometimes consist of multiple fields,

                        so read fields til definitely at timestamp1

                     */

                     if (fscanf(fptr2,"%255s",timestamp1) == EOF) {break;}

                     while (timestamp1[0] != '[') {if (fscanf(fptr2,"%255s",timestamp1) == EOF) break;}

                     /* Ignore timestamp1 if it has been partially clobbered by new rec.

                        Use current ip rather than ip of new rec (hard to extract from

                        clobbered timestamp1 field).

                     */

                     if (strlen(timestamp1) != 21 || strstr(timestamp1,".") != NULL)

                     {

                            if (fscanf(fptr2,"%255s%255s",ident,authuser) == EOF) {break;}

                            if (fscanf(fptr2,"%255s",timestamp1) == EOF) {break;}

                            while (timestamp1[0] != '[') {if (fscanf(fptr2,"%255s",timestamp1) == EOF) break;}

                     }

                     if (fscanf(fptr2,"%255s",timestamp2) == EOF) {break;}

                     /* Read http fields til last char of field read is ".

                        If more than one, second is request. There can be as

                        few as one or two fields, as many as 3 or more.

                        URL *can* contain blanks + stuff, throwing off expected field sequence.

                        For example, "GET /cgi-bin/Stats/stats.cgi [ <a href= HTTP/1.0" shows up

                        in the log record, and has 3 extraneous "fields" before "HTTP".

                     */

                     work_count = 0;

                     if (fscanf(fptr2,"%8191s",file2_buffer) == EOF) {break;}

                     while (file2_buffer[strlen(file2_buffer)-1] != '"')

                     {

                            work_count++;

                            if (work_count == 2)

                            {

                                   if (strlen(file2_buffer) > 255)

                                   {

                                          memcpy(file2,file2_buffer,255);

                                          memset(file2+255,'\0',1);

                                   }

                                   else {strcpy(file2,file2_buffer);}

                            }

                            if (fscanf(fptr2,"%8191s",file2_buffer) == EOF) {break;}

                     }

                     if (fscanf(fptr2,"%255s%255s",status,work_bytes) == EOF) {break;}

                     if   (strcmp(work_bytes,"-") == 0)

                            {bytes = 0;}

                     else       {bytes = atol(work_bytes);}

/*printf("%s\n",timestamp1);

*/

                     if       (strstr(file2,"default.ida") != NULL)       {code_red_found = 1;}

                     if       (null_found == 0 && code_red_found == 0)

                     {

                            update_strucs();

                     }

                     else

                     {

                            null_found = 0;

                            code_red_found = 0;

                     }

              }

              /* Delete the work file directly rather than shelling out to rm. */

              remove(work_file_string);

       }

/*       strcpy(work_string1,"rm ");

       strcat(work_string1,file_list_string);

       system(work_string1);

*/

       close_some_files();

       do_reports();

       wrapup();

/*       print_time("ending");

*/

       return (0);

}
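/*
       Illustration only, not part of the original program. The comments in main()
       note that a URL can contain blanks, which throws off field-by-field reading
       of the request. One alternative, sketched here, is to read a whole log line
       and take everything between the first and last double quote as the request.
       The name sketch_extract_request is hypothetical, and the sketch assumes the
       line carries exactly one quoted field (the request), as in the records
       described above; logs with additional quoted fields would need more care.
*/

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the text between the first and last double quote of line into out,
   truncating to fit. Returns 1 on success, 0 if no quoted field is present
   (out is then set to the empty string). */
static int sketch_extract_request(const char *line, char *out, size_t outlen)
{
    const char *open_q  = strchr(line, '"');
    const char *close_q = strrchr(line, '"');
    size_t len;

    assert(outlen > 0);
    out[0] = '\0';
    if (open_q == NULL || close_q == open_q) { return 0; }

    len = (size_t) (close_q - open_q - 1);
    if (len >= outlen) { len = outlen - 1; }
    memcpy(out, open_q + 1, len);
    out[len] = '\0';
    return 1;
}
```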

void init()

{

/*       print_time("starting");

*/

       prepare_parm_input();

       prepare_filename_suffix();

       prepare_filename_strings();

       prepare_file_list();

       open_some_files();

       if (entire_period == 0) {get_dates();}

       set_num_dates();

       init_report5_hash();

       init_report7_array();

       memset(blank_string,' ',255);

       memset(blank_string+255,'\0',1);

       strcpy(report_work,blank_string);

       memset(blank_ip,' ',15);

       memset(blank_ip+15,'\0',1);

       strcpy(domain,blank_string);

       strcpy(domain,get_value("login"));

       strcpy(r3_string,get_value("nameaccessfile"));

}

void get_time()

{

       lt = time(NULL);

       time_ptr = localtime(&lt);

}

void print_time(char * msg)

{

       get_time();

       printf("%s at %s<br>",msg,asctime(time_ptr));

}

void prepare_parm_input()

{

       /*

              Input parameters containing stats form fields and the path to the

              stats data are contained in the input parm file provided by the

              stats engine wrapper. Open the file, read all the records, and

              concatenate the data into one long string. This string will be

              equivalent to a query string that has been url-decoded.

              The get_value subroutine will take this string and extract values

              for specified input variables.

       */

       if ((fptr17 = fopen(input_parm_file,"r")) == NULL )

       {

              printf("can't open %s\n",input_parm_file);

              exit(1);

       }

       strcpy(buffer,"\0");

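       while (fscanf(fptr17,"%255s",parm_line) != EOF)

       {

              /* Stop before the combined string would overflow buffer. */

              if (strlen(buffer) + strlen(parm_line) + 2 > sizeof(buffer)) {break;}

              strcat(buffer,parm_line);

              strcat(buffer,"&");

       }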

       if (strcmp(get_value("period"),"entire") == 0)       {entire_period = 1;}

       if (strcmp(get_value("usedailytotal"),"on") == 0               /* if report 5 requested */

              || strcmp(get_value("useperiodtotal"),"on") == 0        /* or report 4 requested */

              || strcmp(get_value("usedailyavg"),"on") == 0)        /* or report 6 requested */

       {r5_data_needed = 1;}

}
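/*
       Illustration only, not part of the original program. prepare_parm_input()
       builds one long "name=value&name=value&" string, and get_value() (defined
       elsewhere in this file) extracts values from it. The sketch below shows one
       way such a lookup can be written. The name sketch_get_value and its
       signature are hypothetical and are not the program's get_value(). A key
       matches only at the start of a pair, so "period" does not match
       "useperiodtotal".
*/

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the value stored for key in a "name=value&name=value&" string into out,
   truncating to fit. Returns 1 if the key was found, 0 otherwise (out is then
   set to the empty string). */
static int sketch_get_value(const char *query, const char *key,
                            char *out, size_t outlen)
{
    size_t keylen = strlen(key);
    const char *p = query;

    assert(outlen > 0);
    out[0] = '\0';
    while (*p != '\0') {
        const char *end = strchr(p, '&');
        size_t pairlen = (end != NULL) ? (size_t) (end - p) : strlen(p);

        if (pairlen > keylen && strncmp(p, key, keylen) == 0 && p[keylen] == '=') {
            size_t vallen = pairlen - keylen - 1;
            if (vallen >= outlen) { vallen = outlen - 1; }
            memcpy(out, p + keylen + 1, vallen);
            out[vallen] = '\0';
            return 1;
        }
        if (end == NULL) { break; }
        p = end + 1;                /* move to the next name=value pair */
    }
    return 0;
}
```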

void prepare_filename_suffix()

{

       strcat(pid,get_value("login"));        /* use for unique filenames */

       strcat(pid,"_");

       get_random_suffix();

       /* Shouldn't have more than one for an account at a time, but just in case */

       strcat(pid,random_suffix_char);        /* use for unique filenames */

}

void get_random_suffix()

{

       int stime, random_suffix;

       long ltime;

       ltime = time(NULL);

       stime = (unsigned) ltime/2;

       srand(stime);

       random_suffix = rand()%10;

       /* random_suffix is 0..9; store it as a one-character string. */

       random_suffix_char[0] = (char) ('0' + random_suffix);

       random_suffix_char[1] = '\0';

}

void prepare_filename_strings()

{

       strcpy(error_code_file_string,prepare_filename_string(error_code_file_string));

       strcpy(file_list_string,prepare_filename_string(file_list_string));

       strcpy(ip_date_file_string,prepare_filename_string(ip_date_file_string));

       strcpy(outfile_string,prepare_filename_string(outfile_string));

       strcpy(r5sorted_ip_date_file_string,prepare_filename_string(r5sorted_ip_date_file_string));

       strcpy(report_file_string,prepare_filename_string(report_file_string));

       strcpy(report1_ip_file_string,prepare_filename_string(report1_ip_file_string));

       strcpy(report2_file_file_string,prepare_filename_string(report2_file_file_string));

       strcpy(report3_ip_file_string,prepare_filename_string(report3_ip_file_string));

       strcpy(report9_file_file_string,prepare_filename_string(report9_file_file_string));

       strcpy(sorted_error_code_file_string,prepare_filename_string(sorted_error_code_file_string));

       strcpy(sorted_report_file_string,prepare_filename_string(sorted_report_file_string));

       strcpy(uniq_ip_count_string,prepare_filename_string(uniq_ip_count_string));

       strcpy(uniq_ip_list_string,prepare_filename_string(uniq_ip_list_string));

       strcpy(uniq_item_count_string,prepare_filename_string(uniq_item_count_string));

       strcpy(uniq_item_list_string,prepare_filename_string(uniq_item_list_string));

       strcpy(work_file_string,prepare_filename_string(work_file_string));

}

char *prepare_filename_string(char *filename_string)

{

       strcpy(work_string1,cgipath);

       strcat(work_string1,filename_string);

       strcat(work_string1,pid);

       return(work_string1);

}

void prepare_file_list()

{

       strcpy(work_string1,"ls ");

/*       strcat(work_string1,get_value("path"));

*/       strcat(work_string1,"*access* > ");

       strcat(work_string1,file_list_string);

       system(work_string1);

}
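/*
       Illustration only, not part of the original program. The design notes say
       that shell command output is sent to a work file because no way was found
       to read it back directly. On POSIX systems, popen() returns a stream
       connected to a command's standard output, which is one possible
       alternative. The sketch below counts the lines a command prints. The name
       sketch_count_output_lines is hypothetical, and the sketch assumes a POSIX
       environment with a shell available.
*/

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>

/* Run command through the shell and count the newline-terminated lines it
   writes to standard output. Returns the count, or -1 if the command could
   not be started. */
static long sketch_count_output_lines(const char *command)
{
    FILE *pipe_in;
    long lines = 0;
    int c;

    assert(command != NULL);
    pipe_in = popen(command, "r");
    if (pipe_in == NULL) { return -1; }

    while ((c = getc(pipe_in)) != EOF) {
        if (c == '\n') { lines++; }
    }
    pclose(pipe_in);
    return lines;
}
```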

void open_some_files()

{

       if ((fptr1 = fopen(file_list_string,"r")) == NULL)          {printf("can't open file_list\n");exit(1);}

       if ((fptr3 = fopen(outfile_string,"w")) == NULL)             {printf("can't open outfile\n");exit(1);}

       if ((fptr15 = fopen(report3_ip_file_string,"w")) == NULL)       {printf("can't open report3_ip_file\n");exit(1);}

       if ((fptr16 = fopen(report9_file_file_string,"w")) == NULL)       {printf("can't open report9_file_file\n");exit(1);}

       if (r5_data_needed == 1)

       {

              if ((fptr10 = fopen(ip_date_file_string,"w")) == NULL)

              {

                     printf("can't open ip_date_file\n");

                      exit(1);

              }

              file10_open = 1;

       }

       else if (strcmp(get_value("useclients"),"on") == 0)        /* if report 1 requested */

       {

              if ((fptr11 = fopen(report1_ip_file_string,"w")) == NULL)

              {

                     printf("can't open report1_ip_file\n");

                      exit(1);

              }

              file11_open = 1;

       }

       if (strcmp(get_value("usefiles"),"on") == 0)            /* if report 2 requested */

       {

              if ((fptr12 = fopen(report2_file_file_string,"w")) == NULL)

              {

                     printf("can't open report2_file_file\n");

                      exit(1);

              }

              file12_open = 1;

       }

}

void set_num_dates()

{

       if       (entire_period == 0)       {estimate_num_dates_specific();}

       else                        {estimate_num_dates_entire();}

}

void estimate_num_dates_specific()

{

       /*

              6/98 - 6/98 -> 1 month

              6/97 - 6/98 -> 13 months

              7/97 - 6/98 -> 12 months

              5/97 - 6/98 -> 14 months

       */

       long work;

       int year_diff, month_diff, startmonth, endmonth;

       /* find number of months in range, multiply by 31 */

       year_diff = atoi(end_year) - atoi(start_year);

       startmonth = atoi(start_month);

       endmonth   = atoi(end_month);

       month_diff = endmonth - startmonth;

       num_dates = 31 * ((12 * year_diff) + month_diff + 1);

       /* find date_bias (see comment in hash_date subroutine) */

       memcpy(rec_year_long,start_year,4);

       memset(rec_year_long + 4,'\0',1);

       strcpy(rec_month,start_month);

       strcpy(rec_day,"01");

       work = (long) (372 * atoi(rec_year_long) + 31 * (atoi(rec_month) - 1) + atoi(rec_day));

       date_bias = work % (long) num_dates;

}
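/*
       Illustration only, not part of the original program. The sketch below
       restates the arithmetic of estimate_num_dates_specific() as small pure
       functions so the month examples in its comment can be checked. Every month
       is treated as 31 days and every year as 372 (12 x 31) days, as in the code
       above. The names sketch_months_in_range, sketch_num_dates and
       sketch_day_number are hypothetical.
*/

```c
#include <assert.h>

/* Months covered by an inclusive month range, e.g. 6/1997 through 6/1998 is 13. */
static long sketch_months_in_range(int start_year, int start_month,
                                   int end_year, int end_month)
{
    assert(start_month >= 1 && start_month <= 12);
    assert(end_month >= 1 && end_month <= 12);
    return 12L * (end_year - start_year) + (end_month - start_month) + 1;
}

/* Number of date slots to allocate: 31 per month in the range. */
static long sketch_num_dates(int start_year, int start_month,
                             int end_year, int end_month)
{
    return 31L * sketch_months_in_range(start_year, start_month,
                                        end_year, end_month);
}

/* Day number on the 372-day calendar; taking this for the first day of the
   period modulo num_dates gives the date bias computed above. */
static long sketch_day_number(int year, int month, int day)
{
    assert(month >= 1 && month <= 12);
    return 372L * year + 31L * (month - 1) + day;
}
```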

long month_num(char *month_nam)

{

       if (strcmp("Jan",month_nam) == 0) {return (1);}

       if (strcmp("Feb",month_nam) == 0) {return (2);}

       if (strcmp("Mar",month_nam) == 0) {return (3);}

       if (strcmp("Apr",month_nam) == 0) {return (4);}

       if (strcmp("May",month_nam) == 0) {return (5);}

       if (strcmp("Jun",month_nam) == 0) {return (6);}

       if (strcmp("Jul",month_nam) == 0) {return (7);}

       if (strcmp("Aug",month_nam) == 0) {return (8);}

       if (strcmp("Sep",month_nam) == 0) {return (9);}

       if (strcmp("Oct",month_nam) == 0) {return (10);}

       if (strcmp("Nov",month_nam) == 0) {return (11);}

       if (strcmp("Dec",month_nam) == 0) {return (12);}

/*       printf("%s %s bad month name = %s<br>",get_value("login"),timestamp1,month_nam);

*/     return (13);

}

char * month_name(long month_number)

{

       if (month_number == 1) {return "Jan";}

       if (month_number == 2) {return "Feb";}

       if (month_number == 3) {return "Mar";}

       if (month_number == 4) {return "Apr";}

       if (month_number == 5) {return "May";}

       if (month_number == 6) {return "Jun";}

       if (month_number == 7) {return "Jul";}

       if (month_number == 8) {return "Aug";}

       if (month_number == 9) {return "Sep";}

       if (month_number == 10) {return "Oct";}

       if (month_number == 11) {return "Nov";}

       if (month_number == 12) {return "Dec";}

/*       printf("bad month number = %ld<br>",month_number);

*/     return "Jan";

}

void estimate_num_dates_entire()

{

       /*

       For "entire" period, allow for two full years of dates. Since the most

       recent date is yesterday, the date range begins two years ago today.

       To find the date_bias (see hash_date subroutine for comments), just find the

       number of days so far this year.

       Why is it so simple? Suppose today is 9/16/98. The normal ("specific")

       date bias calculation would produce "work" = (372 * 98) + (31 * 8) + 16 = 36720,

       which would be reduced to 264 via modulo 744 (744 is num_dates, 2 * 372).

       264 just happens to be 31*8 + 16, the number of days so far this year (counting every month as 31 days).

       So we don't need the year, nor do we need to modulo.

       p.s. Don't sweat leap year.

       */

       num_dates = 744; /* Since 372 used in hash_date, double it here, else bad things. */

/*       num_dates = 730;

*/

       get_time();

       date_bias = (long) (31 * (time_ptr->tm_mon) + time_ptr->tm_mday);

/*       date_bias = (long) (31 * (time_ptr->tm_mon+1) + time_ptr->tm_mday);

*/

}

void get_dates()

{

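       /* Bounded copies: form values longer than the fields are truncated. */

       snprintf(start_day,  sizeof(start_day),  "%s",get_value("startday"));

       snprintf(start_month,sizeof(start_month),"%s",get_value("startmonth"));

       snprintf(start_year, sizeof(start_year), "%s",get_value("startyear"));

       snprintf(end_day,    sizeof(end_day),    "%s",get_value("endday"));

       snprintf(end_month,  sizeof(end_month),  "%s",get_value("endmonth"));

       snprintf(end_year,   sizeof(end_year),   "%s",get_value("endyear"));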

       int_start_year       = atoi(start_year);

       int_start_month       = atoi(start_month);

       int_start_day       = atoi(start_day);

       int_end_year = atoi(end_year);

       int_end_month       = atoi(end_month);

       int_end_day = atoi(end_day);

       strcpy(start_date,month_name(int_start_month));

       strcat(start_date," ");

       strcat(start_date,start_day);

       strcat(start_date," ");

       strcat(start_date,start_year);

       strcpy(end_date,month_name(int_end_month));

       strcat(end_date," ");

       strcat(end_date,end_day);

       strcat(end_date," ");

       strcat(end_date,end_year);

}

void init_report5_hash()

{

       register long i;

       if ((report5_hash = (struct struc_3 *) malloc(num_dates * sizeof(struct struc_3))) == NULL)

       {

              printf("error allocating report5 hash - aborting\n");

              exit(1);

       }

       for (i = 0; i < num_dates; i++)

       {

              strcpy(report5_hash[i].s3_date,blank_date);

              report5_hash[i].s3_clients              = 0;

              report5_hash[i].s3_hits                 = 0;

              report5_hash[i].s3_page_hits            = 0;

              report5_hash[i].s3_kbytes        = 0;

              report5_hash[i].s3_next_hash_entry     = NULL;

       }

}

void init_report7_array()

{

       long i;

       if ((report7_array = (struct struc_4 *) malloc(24 * sizeof(struct struc_4))) == NULL)

       {

              printf("error allocating report7_array - aborting\n");

              exit(1);

       }

       for (i = 0; i < 24; i++)

       {

              report7_array[i].s4_hits = 0;

              report7_array[i].s4_kbytes = 0;

       }

       strcpy(report7_array[0].s4_time, "Midnight to   1 AM ");

       strcpy(report7_array[1].s4_time, " 1 AM   to   2 AM ");

       strcpy(report7_array[2].s4_time, " 2 AM   to   3 AM ");

       strcpy(report7_array[3].s4_time, " 3 AM   to   4 AM ");

       strcpy(report7_array[4].s4_time, " 4 AM   to   5 AM ");

       strcpy(report7_array[5].s4_time, " 5 AM   to   6 AM ");

       strcpy(report7_array[6].s4_time, " 6 AM   to   7 AM ");

       strcpy(report7_array[7].s4_time, " 7 AM   to   8 AM ");

       strcpy(report7_array[8].s4_time, " 8 AM   to   9 AM ");

       strcpy(report7_array[9].s4_time, " 9 AM   to 10 AM ");

       strcpy(report7_array[10].s4_time," 10 AM   to 11 AM ");

       strcpy(report7_array[11].s4_time," 11 AM   to 12 PM ");

       strcpy(report7_array[12].s4_time," 12 PM   to   1 PM ");

       strcpy(report7_array[13].s4_time," 1 PM   to   2 PM ");

       strcpy(report7_array[14].s4_time," 2 PM   to   3 PM ");

       strcpy(report7_array[15].s4_time," 3 PM   to   4 PM ");

       strcpy(report7_array[16].s4_time," 4 PM   to   5 PM ");

       strcpy(report7_array[17].s4_time," 5 PM   to   6 PM ");

       strcpy(report7_array[18].s4_time," 6 PM   to   7 PM ");

       strcpy(report7_array[19].s4_time," 7 PM   to   8 PM ");

       strcpy(report7_array[20].s4_time," 8 PM   to   9 PM ");

       strcpy(report7_array[21].s4_time," 9 PM   to 10 PM ");

       strcpy(report7_array[22].s4_time," 10 PM   to 11 PM ");

       strcpy(report7_array[23].s4_time," 11 PM   to Midnight");

}

void check_file_date()

{

       char *date_pointer;

       char file_year[3];

       int   file_year_int;

       char file_month[3];

       int   file_month_int;

       char file_day[3];

       int   file_day_int;

       if ((date_pointer = strstr(curr_file,"-")) == NULL       /* can't tell */

       ||  strlen(date_pointer) < 7)                     /* too short to hold -YYMMDD */

       {file_date_status = 0; return;}

       memcpy(file_year,date_pointer+1,2);

       memset(file_year+2,'\0',1);

       file_year_int = atoi(file_year);

       if        (file_year_int > 90)       {file_year_int += 1900;}

       else                        {file_year_int += 2000;}

       memcpy(file_month,date_pointer+3,2);

       memset(file_month+2,'\0',1);

       file_month_int = atoi(file_month);

       memcpy(file_day,date_pointer+5,2);

       memset(file_day+2,'\0',1);

       file_day_int = atoi(file_day);

       if (file_year_int < atoi(start_year)) {file_date_status = -1; return;}

       if (file_year_int == atoi(start_year))

       {

              if ((file_month_int < atoi(start_month))

              || (file_month_int == atoi(start_month) && file_day_int < atoi(start_day)))

              { file_date_status = -1; return;}

       }

       file_date_status = 0;

}

int create_record_stream()

{

       strcpy(cat_string,blank_string);

       if        (strstr(curr_file,".gz") != NULL)

       {

              strcpy(cat_string,"/usr/bin/gzcat ");

       }

       else

       {

              strcpy(cat_string,"cat ");

       }

       strcat(cat_string,curr_file);

       strcat(cat_string," >");

       strcat(cat_string,work_file_string);   /* overwrite work file if it exists */

       system(cat_string);

       if ((fptr2 = fopen(work_file_string,"r")) == NULL)

       {

              printf("can't open work_file\n");

              exit(1);

       }

       return 0;

}

void omit_header()

{

       /* Enterprise server puts a one-line header at the start of a log file.

          If it's there, omit it.

       */

       if (fscanf(fptr2,"%s",header_work) != 1) /* Empty file: nothing was read */

       {return;}

       if(strcmp(header_work,"") == 0)

       {return;}

       if (strstr(header_work,"format=%Ses->client.ip%") != NULL)

       {

              fscanf(fptr2,"%s%s%s%s%s%s",header_work,header_work,header_work,header_work,header_work,header_work);

       }

       else

       {

              fscanf(fptr2,"%s",header_work2);       /* Don't know why it would, but...*/

              if (strcmp(header_work2,"-") != 0) /* If *2nd* fscanf got ip field */

              {

                     strcpy(ip,header_work2);

                     fscanf(fptr2,"%s",ident);

              }

              else

              {

                     strcpy(ip,header_work);

                     strcpy(ident,header_work2);

              }

              fscanf(fptr2,"%s",authuser);

              /* authuser can sometimes consist of multiple fields,

                 so read fields til definitely at timestamp1

              */

              if (fscanf(fptr2,"%s",timestamp1) != 1)       {return;}       /* stop at end of file */

              while (timestamp1[0] != '[')

              { if (fscanf(fptr2,"%s",timestamp1) != 1)       {return;} }       /* don't loop forever at EOF */

              fscanf(fptr2,"%s",timestamp2);

              /* Read http fields til last char of field read is ".

                 If more than one, second is request. There can be as

                 few as one or two fields, as many as 3 or more.

                 URL *can* contain blanks + stuff, throwing off expected field sequence.

                 For example, "GET /cgi-bin/Stats/stats.cgi [ <a href= HTTP/1.0" shows up

                 in the log record, and has 3 extraneous "fields" before "HTTP".

              */

              work_count = 0;

              if (fscanf(fptr2,"%s",file2_buffer) != 1)       {return;}       /* nothing read: buffer contents are not valid */



              while (file2_buffer[strlen(file2_buffer)-1] != '"')

              {

                     work_count++;

                     if (work_count == 2)

                     {

                            if (strlen(file2_buffer) > 255)

                            {

                                   memcpy(file2,file2_buffer,255);

                                   memset(file2+255,'\0',1);

                            }

                            else {strcpy(file2,file2_buffer);}

                     }

                     if (fscanf(fptr2,"%s",file2_buffer) != 1)       {return;}       /* EOF before closing quote */

              }

              fscanf(fptr2,"%s%s",status,work_bytes);

              if   (strcmp(work_bytes,"-") == 0)

                     {bytes = 0;}

              else       {bytes = atol(work_bytes);}

              if       (strstr(file2,"default.ida") != NULL)       {code_red_found = 1;}

              if       (null_found == 0 && code_red_found == 0)

              {

                     update_strucs();

              }

              else

              {

                     null_found = 0;

                     code_red_found = 0;

              }

       }

}

void update_strucs()

{

       long i;

       parse_input();

       if        (entire_period == 0)     {i = check_date_range();}

       else                        {i = 0;}

       if        (i < 0)       {return;}                  /* rec too old */

       else if     (i > 0)       {rec_too_new = 1; return;}       /* rec too new, done with file */

       if (atoi(status) > 399)       {reports8and9_ongoing(); return;}

       record_count++;

/*     if (record_count > 1000000)

       {

              wrapup_hogs();*/ /* shunt request to hogs queue */

/*            printf("hog found -- %s\n",get_value("login"));

              exit(10);

       }

*/

       /* cut recs for report input files, if reports requested */

       if (r5_data_needed == 1)

       {

              fprintf(fptr10,"%s %s\n",ip,rec_date);

              report5_ongoing();        /* needed if r4, r5, or r6 requested */

       }

       else if (strcmp(get_value("useclients"),"on") == 0) /* if report 1 requested */

       {

              fprintf(fptr11,"%s\n",ip);

       }

       if (strcmp(get_value("usefiles"),"on") == 0)            /* if report 2 requested */

       {

              fprintf(fptr12,"%s\n",file2);

       }

       if (strcmp(get_value("usehourly"),"on") == 0)            {report7_ongoing();}

       if (strcmp(get_value("useclientaccess"),"on") == 0)     {report3_ongoing();}

}

long check_date_range()

{

       /* If log rec date before start date */

       if ((rec_year_int < int_start_year)

       || (rec_year_int == int_start_year && rec_month_int < int_start_month)

       || (rec_year_int == int_start_year && rec_month_int == int_start_month && rec_day_int < int_start_day))

       {

              return -1;

       }

       /* If log rec date after end date */

       else if ((rec_year_int > int_end_year)

       || (rec_year_int == int_end_year && rec_month_int > int_end_month)

       || (rec_year_int == int_end_year && rec_month_int == int_end_month && rec_day_int > int_end_day))

       {

              return 1;

       }

       else return 0;

}

void parse_input()

{

       char work_hour[3];

       memcpy(rec_date,timestamp1+1,11);

       memset(rec_date + 11,'\0',1);

       memcpy(work_hour,timestamp1+13,2);

       memset(work_hour + 2,'\0',1);

       rec_hour = atoi(work_hour);

       memcpy(rec_year_long,rec_date+7,4);

       memset(rec_year_long+4,'\0',1);

       rec_year_int = atoi(rec_year_long);

       memcpy(rec_month,rec_date+3,3);

       memset(rec_month+3,'\0',1);

       rec_month_int = month_num(rec_month);

       memcpy(rec_day,rec_date,2);

       memset(rec_day+2,'\0',1);

       rec_day_int = atoi(rec_day);

       if (record_count > 0)

       {

              if (rec_year_int < oldest_year

              || (rec_year_int == oldest_year && rec_month_int < oldest_month)

              || (rec_year_int == oldest_year && rec_month_int == oldest_month && rec_day_int < oldest_day))

              {

                     oldest_year        = rec_year_int;

                     oldest_month        = rec_month_int;

                     oldest_day          = rec_day_int;

                     strcpy(char_oldest_year,rec_year_long);

                     strcpy(char_oldest_day,rec_day);

              }

              if (rec_year_int > youngest_year

              || (rec_year_int == youngest_year && rec_month_int > youngest_month)

              || (rec_year_int == youngest_year && rec_month_int == youngest_month && rec_day_int > youngest_day))

              {

                     youngest_year              = rec_year_int;

                     youngest_month             = rec_month_int;

                     youngest_day        = rec_day_int;

                     strcpy(char_youngest_year,rec_year_long);

                     strcpy(char_youngest_day,rec_day);

              }

       }

       else

       {

              oldest_year        = rec_year_int;

              oldest_month        = rec_month_int;

              oldest_day          = rec_day_int;

              strcpy(char_oldest_year,rec_year_long);

              strcpy(char_oldest_day,rec_day);

              youngest_year              = rec_year_int;

              youngest_month             = rec_month_int;

              youngest_day        = rec_day_int;

              strcpy(char_youngest_year,rec_year_long);

              strcpy(char_youngest_day,rec_day);

       }

}

void close_some_files()

{

       if (file10_open) {fclose(fptr10);}

       if (file11_open) {fclose(fptr11);}

       if (file12_open && fptr12 != fptr11) {fclose(fptr12);}

       fclose(fptr15);

       if (fptr16 != fptr6) {fclose(fptr16);}

}

void do_reports()

{

       print_header();

       if (record_count == 0)       {print_sorry(); return;}

       print_time("starting report 5");

       if (r5_data_needed == 1)                           {report5();}

       if (strcmp(get_value("useperiodtotal"),"on") == 0       /* report 4 requested */

       || strcmp(get_value("useclients"),"on") == 0)            /* report 1 requested */

                                                        {reports1and4_setup();}

       print_time("starting report 4");

       if (strcmp(get_value("useperiodtotal"),"on") == 0)       {report4();}

       print_time("starting report 6");

       if (strcmp(get_value("usedailyavg"),"on") == 0)        {report6();}

       print_time("starting report 7");

       if (strcmp(get_value("usehourly"),"on") == 0)            {report7();}

       print_time("starting report 1");

       if (strcmp(get_value("useclients"),"on") == 0)        {report_num = 1; report();}

       print_time("starting report 2");

       if (strcmp(get_value("usefiles"),"on") == 0)            {report_num = 2; report();}

       print_time("starting report 3");

       if (strcmp(get_value("useclientaccess"),"on") == 0)       {report_num = 3; report();}

       print_time("starting report 8");

                                                        {report8();}

       print_time("starting report 9");

                                                        {report_num = 9; report();}

                                                        {print_glossary();}

}

void print_sorry()

{

       fprintf(fptr3,"\nWe're sorry, but no records fall in the date range specified.\n");

}

void print_header()

{

       fprintf(fptr3,"From: statsmaster@simplenet.com\n");

       fprintf(fptr3,"Subject: Your Statistics Report\n");

       fprintf(fptr3,"Reply-to: statsmaster@simplenet.com\n");

/*       get_time();

       fprintf(fptr3,"Date: %s\n\n",asctime(time_ptr));

*/

       fprintf(fptr3,"\t\t\t   %s\n","SimpleNet Statistics");

       fprintf(fptr3,"%s",dashed_line);

       if (entire_period == 1 && record_count > 0)

       {

              strcpy(start_date,month_name(oldest_month));

              strcat(start_date," ");

              strcat(start_date,char_oldest_day);

              strcat(start_date," ");

              strcat(start_date,char_oldest_year);

              strcpy(end_date,month_name(youngest_month));

              strcat(end_date," ");

              strcat(end_date,char_youngest_day);

              strcat(end_date," ");

              strcat(end_date,char_youngest_year);

       }

       if (entire_period == 0 || record_count > 0)

       {

              fprintf(fptr3,"Access Report for %s to %s\n\n",start_date,end_date);

       }

       else

       {

              fprintf(fptr3,"Access Report for entire period\n\n");

       }

}

void reports1and4_setup()

{

       /* If report 5 data not needed, report1_ip_file was created earlier. */

       strcpy(work_string1,"cut -f1 -d\" \" < ");

       strcat(work_string1,r5sorted_ip_date_file_string);

       strcat(work_string1," > ");

       strcat(work_string1,report1_ip_file_string);

       strcpy(work_string2,"sort ");

       strcat(work_string2,report1_ip_file_string);

       strcat(work_string2," | uniq > ");

       strcat(work_string2,uniq_ip_list_string);

       strcpy(work_string3,"wc -l ");

       strcat(work_string3,uniq_ip_list_string);

       strcat(work_string3," > ");

       strcat(work_string3,uniq_ip_count_string);

       strcpy(work_string4,"rm ");

       strcat(work_string4,uniq_ip_list_string);

       strcpy(work_string5,"rm ");

       strcat(work_string5,r5sorted_ip_date_file_string);

       if (r5_data_needed == 1)       {system(work_string1);}

       system(work_string2);

       system(work_string3);

       system(work_string4);

       get_ip_count();

       if (r5_data_needed == 1)

       {

              system(work_string5);

              r5sorted_ip_date_file_removed = 1;

       }

}

void get_ip_count()

{

       strcpy(work_string1,"rm ");

       strcat(work_string1,uniq_ip_count_string);

       if ((fptr4 = fopen(uniq_ip_count_string,"r")) == NULL)

       {

              printf("can't open uniq_ip_count\n");

              exit(1);

       }

       fscanf(fptr4,"%ld",&num_clients);

       system(work_string1);

}

/**********************************************************************************************/

/**********************************************************************************************/

/**********************************************************************************************/

/**********************************************************************************************/

/**********************************************************************************************/

void report3_ongoing()

{

       if (check_for_match(file2,r3_string,"any"))

       {

              fprintf(fptr15,"%s\n",ip);

       }

}

/**********************************************************************************************/

/**********************************************************************************************/

/**********************************************************************************************/

/**********************************************************************************************/

/**********************************************************************************************/

void report5_ongoing()

{

       update_report5_hash(rec_date);

}

void update_report5_hash(char *work_date)

{

       struct struc_3 *old, *start;

       long done;

       indexx = hash_date(work_date);

       if (strcmp(report5_hash[indexx].s3_date,blank_date) == 0)     /* If hash table entry is unoccupied */

       {

              strcpy(report5_hash[indexx].s3_date,work_date);

              report5_hash[indexx].s3_hits++;

              if (page_hit(file2) == 1)

              {

                     report5_hash[indexx].s3_page_hits++;

              }

              report5_hash[indexx].s3_kbytes += (float)bytes / 1000;

       }

       else

       if (strcmp(report5_hash[indexx].s3_date,work_date) == 0)/* Entry occupied, input date matches entry's */

       {

              report5_hash[indexx].s3_hits++;

              if (page_hit(file2) == 1)

              {

                     report5_hash[indexx].s3_page_hits++;

              }

              report5_hash[indexx].s3_kbytes += (float)bytes / 1000;

       }

       else       /* Search chain for match -- if none, add entry to end of chain */

       {

              old   = &report5_hash[indexx];

              start = report5_hash[indexx].s3_next_hash_entry;

              done = 0;

              while (start != NULL && !done)

              {

                     if (strcmp(start->s3_date,work_date) == 0)

                     {

                            start->s3_hits++;

                            if (page_hit(file2) == 1)

                            {

                                   start->s3_page_hits++;

                            }

                            start->s3_kbytes += (float)bytes / 1000;

                            done = 1;

                     }

                     else

                     {

                            old = start;

                            start = start->s3_next_hash_entry;

                     }

              }

              if (!done)

              {

                     start = (struct struc_3 *) malloc (sizeof(struct struc_3));

                     if (!start)

                     {

                            printf("out of memory\n");

                            return;

                     }

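                     strcpy(start->s3_date,work_date);

                     /* malloc does not zero memory, so set every counter explicitly */

                     start->s3_clients   = 0;

                     start->s3_hits      = 1;

                     start->s3_page_hits = (page_hit(file2) == 1) ? 1 : 0;       /* same test as the other branches */

                     start->s3_kbytes    = (float)bytes / 1000;

                     start->s3_next_hash_entry = NULL;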

                     old->s3_next_hash_entry = start;

              }

       }

}

long page_hit(char *field)

{

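       long i;

       long len = (long) strlen(field);

       char *ptr = field + len;         /* string must be at end of filename */

       char ptr2[5];                    /* long enough to hold longest suffix + null */

       for (i = 0; i < 4; i++)

       {

              /* Names shorter than 4 chars are padded on the left with blanks,

                 so we never read before the start of the string.

              */

              if (i < 4 - len)       {ptr2[i] = ' ';}

              else                   {ptr2[i] = (char) tolower((unsigned char) *(ptr - 4 + i));}

       }

       ptr2[4] = '\0';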

       if (strcmp((char *)(ptr2 + 1),"htm")       == 0       /* includes shtm */

       || strcmp((char *)(ptr2),"html")    == 0       /* includes shtml */

       || strcmp((char *)(ptr2),".hts")    == 0

       || strcmp((char *)(ptr2 + 1),".mv")       == 0

       || strcmp((char *)(ptr2 + 3),"/")       == 0)

              return 1;

       else       return 0;

}

/**********************************************************************************************/

/**********************************************************************************************/

/**********************************************************************************************/

/**********************************************************************************************/

/**********************************************************************************************/

void report7_ongoing()

{

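       if (rec_hour < 0 || rec_hour > 23)       {return;}       /* malformed timestamp: stay inside the 24-entry array */

       report7_array[rec_hour].s4_hits++;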

       report7_array[rec_hour].s4_kbytes += (float) bytes / 1000;

}

/**********************************************************************************************/

/**********************************************************************************************/

/**********************************************************************************************/

/**********************************************************************************************/

/**********************************************************************************************/

void reports8and9_ongoing()

{

       if        (strcmp(status,"400") == 0)       {e400_count++; fprintf(fptr16,"%s\n",file2);}

       else if       (strcmp(status,"401") == 0)       {e401_count++; fprintf(fptr16,"%s\n",file2);}

       else if (strcmp(status,"402") == 0)       {e402_count++; fprintf(fptr16,"%s\n",file2);}

       else if (strcmp(status,"403") == 0)       {e403_count++; fprintf(fptr16,"%s\n",file2);}

       else if (strcmp(status,"404") == 0)       {e404_count++; fprintf(fptr16,"%s\n",file2);}

       else if (strcmp(status,"405") == 0)       {e405_count++; fprintf(fptr16,"%s\n",file2);}

       else if (strcmp(status,"406") == 0)       {e406_count++; fprintf(fptr16,"%s\n",file2);}

       else if (strcmp(status,"410") == 0)       {e410_count++; fprintf(fptr16,"%s\n",file2);}

       else if (strcmp(status,"500") == 0)       {e500_count++; fprintf(fptr16,"%s\n",file2);}

       else if (strcmp(status,"501") == 0)       {e501_count++; fprintf(fptr16,"%s\n",file2);}

       else if (strcmp(status,"502") == 0)       {e502_count++; fprintf(fptr16,"%s\n",file2);}

       else if (strcmp(status,"503") == 0)       {e503_count++; fprintf(fptr16,"%s\n",file2);}

       else if (strcmp(status,"504") == 0)       {e504_count++; fprintf(fptr16,"%s\n",file2);}

}

/**********************************************************************************************/

/**********************************************************************************************/

/**********************************************************************************************/

/**********************************************************************************************/

/**********************************************************************************************/

void report()

{

/*     long i;

*/

       /*

               open uniq_item_count file (based on all items that occurred, with no duplicates)

              create hash table, sized as function of number of unique items

              open item file (all items that occurred, with duplicates)

              for each item,

                     map it into hash table

                     if it's already there, update the count (may have to traverse chain)

                     else add entry with hash_value, item, count=1, next_ptr

              endfor

       */

       report_setup();

       init_report_hash();

       init_report_array();

       open_report_input_file();

       populate_report_hash();

/*     for (i = 0; i < scaling_factor * num_uniq_items; i++)

       {

              print_report_hash(report_hash[i],i);

       }

*/

       create_report_file();

       create_report();

       free(report_hash);

       free(report_array);

}

void report_setup()

{

       if (report_num != 1)

       {

              strcpy(sort_string,blank_string);

              strcpy(sort_string,"sort < ");

       }

       report_limit = 0;

       switch (report_num)

       {

              case 1: report1_setup();    break;

              case 2: report2_setup();    break;

              case 3: report3_setup();    break;

              case 9: report9_setup();    break;

              default:                   break;

       }

       if (report_num != 1)        /* For r1 we've already done this. */

       {

              strcat(sort_string,report_input_file);

              strcat(sort_string," | uniq > ");

              strcat(sort_string,uniq_item_list_string);

              system(sort_string);

              strcpy(work_string1,"wc -l ");

              strcat(work_string1,uniq_item_list_string);

              strcat(work_string1," > ");

              strcat(work_string1,uniq_item_count_string);

              system(work_string1);

              get_count();

              strcpy(work_string1,"rm ");

              strcat(work_string1,uniq_item_list_string);

              system(work_string1);

       }

}

void report1_setup()

{

       char x[16] = "\0";

       fqdn_limit   = 40;

       strncpy(x,get_value("numclients"),sizeof(x) - 1);       /* form input: never copy more than x can hold */

       x[sizeof(x) - 1] = '\0';

       if (strcmp(x,"") == 0)       {r1_limit = 1000;}

       else                 {r1_limit = atoi(x);}

       report_limit = r1_limit;

       num_uniq_items       = num_clients;

       if (strcmp(get_value("useexclude"),"on") == 0)

       {

              strcpy(r1_exclude_string,get_value("clientexclude"));

       }

       strcpy(enterprise_string,enterprise_string1);

       strcpy(report_lit,r1_lit);

       strcpy(report_input_file,report1_ip_file_string);

}

void report2_setup()

{

       char x[256] = "\0";

       fqdn_limit   = 0;

       strncpy(x,get_value("numfiles"),sizeof(x) - 1);       /* form input: never copy more than x can hold */

       x[sizeof(x) - 1] = '\0';

       if       (strcmp(x,"") == 0)       {r2_limit = 1000;}

       else if       (strlen(x) > 4)            {r2_limit = 32766;} /* 5 or more digits: cap numfiles just below 32767 */

       else                        {r2_limit = atoi(x);}



       report_limit = r2_limit;

       if (strcmp(get_value("useexcludefile"),"on") == 0)

       {

              strcpy(r2_exclude_string,get_value("nameexcludefile"));

       }

       strcpy(enterprise_string,enterprise_string2);

       strcpy(report_lit,r2_lit);

       strcpy(report_input_file,report2_file_file_string);

}

void report3_setup()

{

       fqdn_limit   = 40;

       report_limit = r3_limit;

       tot_hits     = 0;

       strcpy(enterprise_string,enterprise_string1);

       strcpy(report_lit,r3_lit);

       strcpy(report_input_file,report3_ip_file_string);

}

void report9_setup()

{

       fqdn_limit   = 0;

       report_limit = r9_limit;

       strcpy(enterprise_string,enterprise_string2);

       strcpy(report_lit,r9_lit);

       strcpy(report_input_file,report9_file_file_string);

}

void get_count()

{

       strcpy(work_string1,"rm ");

       strcat(work_string1,uniq_item_count_string);

       if ((fptr6 = fopen(uniq_item_count_string,"r")) == NULL)

       {

              printf("can't open uniq_item_count\n");

              exit(1);

       }

       fscanf(fptr6,"%ld",&num_uniq_items);

       system(work_string1);

}

void init_report_hash()

{

       register long i;

       if ((report_hash = (struct struc_1 *)

              malloc(scaling_factor * num_uniq_items * sizeof(struct struc_1))) == NULL)

       {

              printf("error allocating report_hash - aborting\n");

              exit(1);

       }

       for (i = 0; i < scaling_factor * num_uniq_items; i++)

       {

              strcpy(report_hash[i].s1_string,blank_string);

              report_hash[i].s1_count                 = 0;

              report_hash[i].s1_next_hash_entry      = NULL;

       }

}

void init_report_array()

{

       register long i;

       if ((report_array = (struct struc_1 *)

              malloc(num_uniq_items * sizeof(struct struc_1))) == NULL)

       {

              printf("error allocating report_array - aborting\n");

              exit(1);

       }

       for (i = 0; i < num_uniq_items; i++)

       {

              strcpy(report_array[i].s1_string,blank_string);

              report_array[i].s1_count        = 0;

              report_array[i].s1_next_hash_entry     = NULL;

       }

}

void open_report_input_file()

{

       if ((fptr7 = fopen(report_input_file,"r")) == NULL)

       {

              printf("can't open %s\n",report_input_file);

              exit(1);

       }

}

void populate_report_hash()

{

       long enterprise_seen = 0;

       report_index = 0;

       while (fscanf(fptr7,"%s",report_work) != EOF)

       {

              if (strstr(report_work,enterprise_string) == NULL)       /* omit enterprise record */

              {

                     if       (report_num == 1

                            && strcmp(get_value("useexclude"),"on") == 0

                            && check_for_match(report_work,r1_exclude_string,"left")){;}

                     else if (report_num == 2

                            && ((strcmp(get_value("useexcludefile"),"on") == 0

                                  && check_for_match(report_work,r2_exclude_string,"any"))

                               || (strcmp(get_value("usehtmlonly"),"on") == 0

                                  && page_hit(report_work) == 0)))                 {;}

                     else       update_report_hash(report_work);

              }

              else if (enterprise_seen == 0)       /* only subtract once from count of unique items */

              {

                     num_uniq_items--;

                     enterprise_seen = 1;

              }

       }

       fclose(fptr7);

}

/*

       See if string 1 meets any criteria from string 2. String 1 might be the IP field

       or the file field from a log record, while string 2 might be a list of one or

       more character strings, each of which could have one or more wild card aspects

       indicated by an asterisk. List items are separated by commas. String 2 might

       come in from a form field such as "exclude clients matching this IP address pattern"

       or "show all clients accessing this file" or "exclude files with this pattern."

       For example, suppose string 1 is a file such as "index.html" and string 2 is

       "in*.htm*,*.gif". This subroutine must determine if string 1 satisfies "in*.htm*"

       OR "*.gif". While checking against an individual string within the list in

       string 2, the subroutine must determine if string 1 satisfies EVERY part of

       that substring from string 2. So, given "in*.htm*" as the substring from string 2,

       string 1 must contain "in" AND ".htm" in that order.

       Since "index.html" satisfies the first substring from string 2 ("in*.htm*"), the

       subroutine determines that there is a match and returns 1. If no match can be

       found, 0 is returned.

       The third input parameter, "type," is either "left" or "any." "Left" means the

       match must occur on the leftmost part of string 1, while "any" means that the match

       can occur starting anywhere in string 1. Left matching is important when working

       with IP's. E.g. if the IP in the log record is 123.132.209.5, and string 2 is

       "209*", without requiring a left-end match we would get a false positive, i.e.

       that there is a match when there really isn't as far as the user is concerned.

       Even if "left" is specified, if string 2 begins with an asterisk, e.g. *209,

       then it doesn't matter where a match occurs. Also, a left match only matters

       for the first subsubstring of the substring from string 2. A substring from

       string 2 has multiple subsubstrings when an asterisk has content to both sides

       within the substring. E.g. given the substring "abc*ghi", there are two

       subsubstrings, "abc" and "ghi". Even in a left match situation, once "abc"

       matches the leftmost part of string 1, then anything else in the substring

       can match anywhere in string 1 (to the right of "abc"). So, if string 1 is

       "abcdefghi" and string 2 is "abc*ghi,vw.x.yz*", the substring "abc*ghi" matches

       string 1.

*/

long check_for_match(char * string1, char * string2, char * type)

{

       char *curr_ptr;

       char *new_ptr;

       long length;

       long rc;

       curr_ptr = string2;

       while ((new_ptr = strstr(curr_ptr,",")) != NULL)

       {

              length = new_ptr - curr_ptr;

              memcpy(work_string1,curr_ptr,length);

              memset(work_string1 + length,'\0',1);

              rc = check_for_wildcards(string1,work_string1,type);

              if (rc == 1)       {return 1;}

              curr_ptr = new_ptr + 1;   /* go past comma to next substring */

       }

       rc = check_for_wildcards(string1,curr_ptr,type);



       if        (rc == 0)       {return 0;}

       else                 {return 1;}

}

long check_for_wildcards(char * string_1, char * string_2, char * type2)

{

       char *new_s1ptr;

       char *curr_s1ptr;

       char *new_s2ptr;

       char *curr_s2ptr;

       long length2;

       long first_or_only = 1;

       curr_s1ptr = string_1;

       curr_s2ptr = string_2;

       while ((new_s2ptr = strstr(curr_s2ptr,"*")) != NULL)

       {

              length2 = new_s2ptr - curr_s2ptr;

              memcpy(work_string2,curr_s2ptr,length2);

              memset(work_string2 + length2,'\0',1);

              if ((new_s1ptr = strstr(curr_s1ptr,work_string2)) == NULL)       {return 0;}

              if (strcmp(type2,"left") == 0

              && first_or_only == 1

              && string_2[0] != '*'

              && new_s1ptr != curr_s1ptr)       {return 0;}

              first_or_only = 0;

              curr_s1ptr = new_s1ptr + length2;

              curr_s2ptr = new_s2ptr + 1;     /* go past asterisk to next subsubstring */

       }

       if ((new_s1ptr = strstr(curr_s1ptr,curr_s2ptr)) == NULL)

              {return 0;}

       else if (strcmp(type2,"left") == 0

              && first_or_only == 1

              && string_2[0] != '*'

              && new_s1ptr != curr_s1ptr)

              {return 0;}

       else        {return 1;}

}
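/*
       Illustration only, not part of the program. The matching rules described
       above check_for_match can be sketched as a self-contained pair of
       functions. The names match_one and match_list and the integer
       anchor_left flag are hypothetical; unlike the routines above, this
       sketch avoids the global work strings by comparing length-bounded
       pieces in place.

```c
#include <assert.h>
#include <string.h>

// Match text against ONE pattern of pat_len characters (no commas).
// The literal pieces between asterisks must occur in text in order.
// If anchor_left is nonzero, a non-empty first piece must start the text.
static int match_one(const char *text, const char *pat, size_t pat_len, int anchor_left)
{
    const char *t = text;
    size_t pos = 0;
    int first = 1;

    assert(text != NULL && pat != NULL);
    while (pos <= pat_len) {
        // Find the end of the current literal piece.
        size_t end = pos;
        while (end < pat_len && pat[end] != '*') end++;
        size_t piece_len = end - pos;

        if (piece_len > 0) {
            const char *hit = NULL;
            if (first && anchor_left) {
                // Left match: the piece must sit at the very start.
                if (strncmp(t, pat + pos, piece_len) == 0) hit = t;
            } else {
                // Otherwise the piece may start anywhere at or after t.
                for (const char *s = t; *s != '\0'; s++) {
                    if (strncmp(s, pat + pos, piece_len) == 0) { hit = s; break; }
                }
            }
            if (hit == NULL) return 0;
            t = hit + piece_len;   // later pieces must follow this one
        }
        first = 0;
        pos = end + 1;             // step past the asterisk
    }
    return 1;
}

// Return 1 if text satisfies ANY pattern in the comma-separated list.
int match_list(const char *text, const char *patterns, int anchor_left)
{
    const char *p = patterns;

    assert(text != NULL && patterns != NULL);
    for (;;) {
        const char *comma = strchr(p, ',');
        size_t len = (comma != NULL) ? (size_t)(comma - p) : strlen(p);
        if (match_one(text, p, len, anchor_left)) return 1;
        if (comma == NULL) return 0;
        p = comma + 1;             // go past comma to next pattern
    }
}
```

       With these definitions, match_list("index.html", "in*.htm*,*.gif", 0)
       reports a match, while match_list("123.132.209.5", "209*", 1) does not,
       because the left-anchored piece "209" is not at the start of the IP.
*/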

void create_report_file()

{

       long i;

       /* POSIX key syntax; the obsolescent "+0 -1" form is no longer in POSIX */

       strcpy(work_string1,"sort -k1,1 -rn ");

       strcat(work_string1,report_file_string);

       strcat(work_string1," > ");

       strcat(work_string1,sorted_report_file_string);

       strcpy(work_string2,"rm ");

       strcat(work_string2,report_file_string);

       if ((fptr8 = fopen(report_file_string,"w")) == NULL)

       {

              printf("can't open report_file\n");

               exit(1);

       }

       /* Get count for each hash entry in use and write it to the report file.

          Report array entry's next_hash_entry is pointer not to a next entry

          in the report array, but to the *hash table* entry for this item.

          We're just re-using the hash table structure. The count is unused.

       */

       for (i = 0; i < report_index; i++)

       {

              fprintf(fptr8,"%ld %s\n",report_array[i].s1_next_hash_entry->s1_count,report_array[i].s1_string);

       }

       fclose(fptr8);

       system(work_string1);

       system(work_string2);

}

void create_report()

{

       long i;

       long report_count;

       if ((fptr9 = fopen(sorted_report_file_string,"r")) == NULL)

       {

              printf("can't open sorted_report_file\n");

               exit(1);

       }

       /* report_limit is the cutoff the USER set (e.g. top n clients) */

       strcpy(work_string1,"rm ");

       strcat(work_string1,sorted_report_file_string);

       if       (report_num == 1

       ||   report_num == 2)       {fprintf(fptr3,report_lit,report_limit,domain);}

       else if (report_num == 3)       {fprintf(fptr3,report_lit,r3_string);}

       else                        {fprintf(fptr3,report_lit,report_limit);}

       strcpy(report_work,blank_string);

       i = 0;

       while (fscanf(fptr9,"%ld %s",&report_count,report_work) != EOF && i < report_limit)

       {

              fprintf(fptr3," %7ld",report_count);

              if (i < fqdn_limit)                      /* Hard limit, else way too slow */

              {

                     get_domain_name(report_work);

              }

              else

              {

                     fprintf(fptr3,"\t%s\n",report_work);

              }

              i++;

              if (report_num == 3) {tot_hits += report_count;}

              strcpy(report_work,blank_string);

       }

       if (report_num == 3)

       {

              if (report_index != 1)       {strcpy(plural,"s");} else {strcpy(plural,"");}

              if (report_index == 0)       {fprintf(fptr3,"\t\tNo match...\n");}

              fprintf(fptr3,"\n\nTotalling %ld hits by %ld individual client%s\n",tot_hits,i,plural);

       }

       fprintf(fptr3,"%s",dashed_line);

       fclose(fptr9);       /* close the sorted report file before removing it */

       system(work_string1);

}

void get_domain_name(char *input_ip)

{

       u_int addr;

       struct hostent *hp;

       /* Not a dotted-quad address: print the input unchanged. */

       if ((addr = inet_addr(input_ip)) == (u_int) INADDR_NONE)

       {

              (void) fprintf(fptr3,"\t%s\n",input_ip);

              return;

       }

       hp = gethostbyaddr((char *)&addr, sizeof (addr), AF_INET);

       /* Reverse lookup failed: fall back to printing the IP. */

       if (hp == NULL)

       {

              (void) fprintf(fptr3,"\t%s\n",input_ip);

              return;

       }

       /* Print the name once, even if the host entry lists several addresses. */

       (void) fprintf(fptr3,"\t%s\n",hp->h_name);

}

void print_report_hash(struct struc_1 *h_e, long i)

{

       printf("%ld %s %ld %p\n<br>",i,h_e->s1_string,h_e->s1_count,h_e->s1_next_hash_entry);

       if (h_e->s1_next_hash_entry != NULL)

       {

              print_report_hash(h_e->s1_next_hash_entry,i);

       }

}

void update_report_hash(char *work_string)

{

       struct struc_1 *old, *start;

       long done;

       if        (report_num == 1 || report_num == 3)

              {indexx = hash_ip(work_string);}

       else if (report_num == 2 || report_num == 9)

              {indexx = hash_file(work_string);}



       if (strcmp(report_hash[indexx].s1_string,blank_string) == 0) /* If hash table entry is unoccupied */

       {

              strcpy(report_hash[indexx].s1_string,work_string);

              report_hash[indexx].s1_count = 1;

              strcpy(report_array[report_index].s1_string,work_string);

              report_array[report_index].s1_next_hash_entry = &report_hash[indexx];

              report_index++;

       }

       else

       if (strcmp(report_hash[indexx].s1_string,work_string) == 0)/* Entry occupied, input ip matches entry's */

       {

              report_hash[indexx].s1_count++;

       }

       else       /* Search chain for match -- if none, add entry to end of chain */

       {

              old   = &report_hash[indexx];

              start = report_hash[indexx].s1_next_hash_entry;

              done = 0;

              while (start != NULL && !done)

              {

                     if (strcmp(start->s1_string,work_string) == 0)

                     {

                            start->s1_count++;

                            done = 1;

                     }

                     else

                     {

                            old = start;

                            start = start->s1_next_hash_entry;

                     }

              }

              if (!done)

              {

                     start = (struct struc_1 *) malloc (sizeof(struct struc_1));

                     if (!start)

                     {

                            printf("out of memory\n");

                            return;

                     }

                     strcpy(start->s1_string,work_string);

                     start->s1_count = 1;

                     start->s1_next_hash_entry = NULL;

                     old->s1_next_hash_entry = start;

                     strcpy(report_array[report_index].s1_string,work_string);

                     report_array[report_index].s1_next_hash_entry = start;

                     report_index++;

              }

       }

}

/**********************************************************************************************/

/**********************************************************************************************/

/**********************************************************************************************/

/**********************************************************************************************/

/**********************************************************************************************/

void report4()

{

       fprintf(fptr3,"Period Totals for %s:\n\n",domain);

       fprintf(fptr3,"    \t\t\t   All\t\t\t\t\t Kilobytes\nIndividual Clients     Clients       Hits \t Page Hits     Transmitted\n\n");

       fprintf(fptr3,"%18ld%12ld%11ld%18ld%16.2f\n",num_clients,tot_clients,tot_hits,tot_page_hits,tot_kb);

       fprintf(fptr3,"%s",dashed_line);

}

/**********************************************************************************************/

/**********************************************************************************************/

/**********************************************************************************************/

/**********************************************************************************************/

/**********************************************************************************************/

void report5()

{

       long i;

       long count;

       report5_init();

       /* read r5sorted_ip_date_file

          for each unique date

              while more recs for this date

                     while next ip = saved ip, next rec

                     else bump count in report5 array of ip's seen

              end

          end

          output report 5 if r5 requested (r4 and r6 need the rest even if r5 not requested)

       */

       fscanf(fptr8,"%s %s",ip,date);

       strcpy(date_save,date);

       strcpy(ip_save,ip);

       count = 1;

       while (fscanf(fptr8,"%s %s",ip,date) != EOF)

       {

              if (strstr(ip_save,"format") != NULL)

              {

                     strcpy(date_save,date);

                     strcpy(ip_save,ip);

                     continue;

              }

              if (strcmp(date,date_save) == 0)

              {

                     if (strcmp(ip,ip_save) != 0)

                     {

                            count++;

                            strcpy(ip_save,ip);

                     }

              }

              else

              {

                     update_report5_totals(count);

                     count = 1;

              }

       }

       /* We still need sorted_ip_date_file for reports 1 and/or 4. */

       /* Flush the totals for the last date group. The loop above only flushes a group

          when the next date begins, so the final group is still pending here. */

       if (strcmp(date,date_save) == 0)       {update_report5_totals(count);}

       if        (strcmp(get_value("usedailytotal"),"on") == 0)       {output_report5();}

       else

       {

              for (i=0; i < num_dates; i++)                        /* Used in report 6 */

              {

                     if (strcmp(report5_hash[i].s3_date,blank_date) != 0){num_dates_used++;}

              }

       }

       free(report5_hash);

}

void report5_init()

{

       /* Sort by date (field 2), then by ip (field 1), using POSIX key syntax. */

       strcpy(work_string1,"sort -k2,2 -k1,1 ");

       strcat(work_string1,ip_date_file_string);

       strcat(work_string1," > ");

       strcat(work_string1,r5sorted_ip_date_file_string);

       strcpy(work_string2,"rm ");

       strcat(work_string2,ip_date_file_string);

       system(work_string1);

       r5sorted_ip_date_file_created = 1;

       system(work_string2);

       ip_date_file_removed = 1;

       if ((fptr8 = fopen(r5sorted_ip_date_file_string,"r")) == NULL)

       {

              printf("can't open r5sorted_ip_date_file\n");

               exit(1);

       }

       tot_clients = 0;

       tot_hits     = 0;

       tot_page_hits        = 0;

       tot_kb               = 0;

}

void update_report5_totals(long count)

{

       long i;

       i = hash_date(date_save);

       report5_hash[i].s3_clients = count;

       tot_clients += report5_hash[i].s3_clients;

       tot_hits     += report5_hash[i].s3_hits;

       tot_page_hits        += report5_hash[i].s3_page_hits;

       tot_kb        += report5_hash[i].s3_kbytes;

       strcpy(date_save,date);

       strcpy(ip_save,ip);

}

void output_report5()

{

       /* Had to move work var's to global var's, since they were breaking here, though OK

          before this subroutine was removed from main report5 subrtn. */

       long i;

       fprintf(fptr3,"Daily Totals for %s:\n",domain);

       fprintf(fptr3,"\t\t\t\t\t\t\t\t Kilobytes\nDate\t\t       Clients       Hits \t Page Hits     Transmitted\n\n");

       for (i=0; i < num_dates; i++)

       {

              if (strcmp(report5_hash[i].s3_date,blank_date) != 0)

              {

                     strcpy(work_date,report5_hash[i].s3_date);

                     memcpy(work_mon,work_date+3,3);

                     memset(work_mon + 3,'\0',1);

                     memcpy(work_daynum,work_date,2);

                     memset(work_daynum + 2,'\0',1);

                     memcpy(work_year,work_date+7,4);

                     memset(work_year + 4,'\0',1);

                     if (month_num(work_mon) == 13) {continue;}

                     strcpy(work_day,dayname(month_num(work_mon),atoi(work_daynum),atoi(work_year)));

                     fprintf(fptr3,"%s %s %.2s %.4s\t\t %5ld %10ld\t %10ld\t %10.2f\n",

                            work_day, work_mon, work_daynum, work_year,

                            report5_hash[i].s3_clients,

                            report5_hash[i].s3_hits,

                            report5_hash[i].s3_page_hits,

                            report5_hash[i].s3_kbytes);

                     num_dates_used++;       /* Used in report 6 */

              }

       }

       fprintf(fptr3,"%s",dashed_line);

}

char * dayname(long m, long d, long y)

{

       long val;

       long dd[12] = {0,3,2,5,0,3,5,1,4,6,2,4};



       if (m < 3) {y--;}

       val = (y+(int)(y/4)-(int)(y/100)+(int)(y/400)+dd[m-1]+d) % 7;

       switch (val)

       {

              case 0: return("Sun");

              case 1: return("Mon");

              case 2: return("Tue");

              case 3: return("Wed");

              case 4: return("Thu");

              case 5: return("Fri");

              case 6: return("Sat");

       }

       return ("");

}
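/*
       Illustration only, not part of the program. The weekday calculation in
       dayname is a table-based formula widely known as Sakamoto's method. A
       minimal standalone sketch, assuming Gregorian dates and a month in the
       range 1 to 12 (the name day_of_week is hypothetical):

```c
#include <assert.h>

// Day of week for a Gregorian date: 0 = Sunday ... 6 = Saturday.
int day_of_week(int year, int month, int day)
{
    // Per-month offsets, the same table dayname uses.
    static const int offset[12] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};

    assert(month >= 1 && month <= 12);
    // January and February count as part of the previous year, so the
    // leap-day correction is applied only from March onward.
    if (month < 3) year--;
    return (year + year / 4 - year / 100 + year / 400
            + offset[month - 1] + day) % 7;
}
```

       For example, day_of_week(2000, 1, 1) yields 6 (Saturday) and
       day_of_week(1998, 10, 1) yields 4 (Thursday).
*/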

/**********************************************************************************************/

/**********************************************************************************************/

/**********************************************************************************************/

/**********************************************************************************************/

/**********************************************************************************************/

void report6()

{

       fprintf(fptr3,"Daily Averages:\n\n");

       fprintf(fptr3,"    \t\t\t\t\t\t\t\t Kilobytes\n\t\t       Clients       Hits\t Page Hits     Transmitted\n\n");

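       if (num_dates_used == 0)       /* no dates in the period: nothing to average */

       {

              fprintf(fptr3,"%s",dashed_line);

              return;

       }

       fprintf(fptr3,"%30ld%11ld%18ld%16.2f\n",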

              tot_clients / num_dates_used,

              tot_hits     / num_dates_used,

              tot_page_hits       / num_dates_used,

              tot_kb        / num_dates_used);

       fprintf(fptr3,"%s",dashed_line);

}

/**********************************************************************************************/

/**********************************************************************************************/

/**********************************************************************************************/

/**********************************************************************************************/

/**********************************************************************************************/

void report7()

{

       long   i;

       float tot_hits = 0;

       float tot_kbytes = 0;

       fprintf(fptr3,"Hourly Averages (Pacific Time):\n\n");

       fprintf(fptr3,"    \t\t\t\t\t\t\t\t Kilobytes\n\t\t      Time\t   Hits\t     Percentage\t        Transmitted\n\n");

       for (i = 0; i < 24; i++)

       {

              tot_hits   += report7_array[i].s4_hits;

              tot_kbytes += report7_array[i].s4_kbytes;

       }

       for (i = 0; i < 24; i++)

       {

              fprintf(fptr3,"\t%s\t%7ld\t\t%5.1f %%\t\t %10.2f\n",

                     report7_array[i].s4_time,

                     report7_array[i].s4_hits,

                     100*(float)report7_array[i].s4_hits/tot_hits,

                     100*report7_array[i].s4_kbytes/tot_kbytes);

       }

       fprintf(fptr3,"%s",dashed_line);

       free(report7_array);

}

/**********************************************************************************************/

/**********************************************************************************************/

/**********************************************************************************************/

/**********************************************************************************************/

/**********************************************************************************************/

void report8()

{

       long work_count;

       char work_status[4];

       /* Sort by count (field 1), then by status (field 2), both numeric and descending. */

       strcpy(work_string1,"sort -rn -k1,1 -k2,2 < ");

       strcat(work_string1,error_code_file_string);

       strcat(work_string1," > ");

       strcat(work_string1,sorted_error_code_file_string);

       strcpy(work_string2,"rm ");

       strcat(work_string2,error_code_file_string);

       strcpy(work_string3,"rm ");

       strcat(work_string3,sorted_error_code_file_string);

       fprintf(fptr3,"Summary of HTTP errors:\n\n");

       if ((fptr13 = fopen(error_code_file_string,"w")) == NULL)

       {

              printf("can't open error_code_file\n");

               exit(1);

       }

       /* Don't write lit's to file 'cause they contain blanks, which throw off fscanf. */

       if (e400_count > 0) {fprintf(fptr13,"%ld %s\n",e400_count,"400");}

       if (e401_count > 0) {fprintf(fptr13,"%ld %s\n",e401_count,"401");}

       if (e402_count > 0) {fprintf(fptr13,"%ld %s\n",e402_count,"402");}

       if (e403_count > 0) {fprintf(fptr13,"%ld %s\n",e403_count,"403");}

       if (e404_count > 0) {fprintf(fptr13,"%ld %s\n",e404_count,"404");}

       if (e405_count > 0) {fprintf(fptr13,"%ld %s\n",e405_count,"405");}

       if (e406_count > 0) {fprintf(fptr13,"%ld %s\n",e406_count,"406");}

       if (e410_count > 0) {fprintf(fptr13,"%ld %s\n",e410_count,"410");}

       if (e500_count > 0) {fprintf(fptr13,"%ld %s\n",e500_count,"500");}

       if (e501_count > 0) {fprintf(fptr13,"%ld %s\n",e501_count,"501");}

       if (e502_count > 0) {fprintf(fptr13,"%ld %s\n",e502_count,"502");}

       if (e503_count > 0) {fprintf(fptr13,"%ld %s\n",e503_count,"503");}

       if (e504_count > 0) {fprintf(fptr13,"%ld %s\n",e504_count,"504");}

       fclose(fptr13);

       system(work_string1);

       system(work_string2);

       if ((fptr14 = fopen(sorted_error_code_file_string,"r")) == NULL)

       {

              printf("can't open sorted_error_code_file\n");

               exit(1);

       }

       /* %3s keeps the status within work_status[4]; stop on any short read. */

       while (fscanf(fptr14,"%ld%3s",&work_count,work_status) == 2)

       {

              fprintf(fptr3," %7ld\t%s\t%s\n",work_count,work_status,error_lit(work_status));

       }

       fclose(fptr14);       /* close the sorted error file before removing it */

       system(work_string3);

       fprintf(fptr3,"%s",dashed_line);

}

char * error_lit(char * err_code)

{

       if (strcmp(err_code,"400") == 0) {return (e400_lit);}

       if (strcmp(err_code,"401") == 0) {return (e401_lit);}

       if (strcmp(err_code,"402") == 0) {return (e402_lit);}

       if (strcmp(err_code,"403") == 0) {return (e403_lit);}

       if (strcmp(err_code,"404") == 0) {return (e404_lit);}

       if (strcmp(err_code,"405") == 0) {return (e405_lit);}

       if (strcmp(err_code,"406") == 0) {return (e406_lit);}

       if (strcmp(err_code,"410") == 0) {return (e410_lit);}

       if (strcmp(err_code,"500") == 0) {return (e500_lit);}

       if (strcmp(err_code,"501") == 0) {return (e501_lit);}

       if (strcmp(err_code,"502") == 0) {return (e502_lit);}

       if (strcmp(err_code,"503") == 0) {return (e503_lit);}

       if (strcmp(err_code,"504") == 0) {return (e504_lit);}

       return ("");

}

/**********************************************************************************************/

/**********************************************************************************************/

/**********************************************************************************************/

/**********************************************************************************************/

/**********************************************************************************************/

long hash_date(char *input_date)

{

       long work;

       /* Calculate hash value. */

       /* Compute a number as a function of the year, month, and day.

          pre-bias-Num = 372 * yr + 31 * (mon - 1) + day

          Num = pre-bias-Num - date_bias.

          Date_bias is used so that all dates are in sequence within hash. It is calculated by

          treating the first possible date in the date range as a data value and hashing in with that

          to the date range before any bias is used. This gives a remainder or offset for the first

          possible date into the hashed date range. We want the first possible date to line up with

          the first possible hashed value, so all hashed values are in date sequence. Otherwise,

          if the first possible date is somewhere within the range of hashed values, dates later than

          that date can hash in either after it or *before* it in the hash range, causing a date sequence

          problem. So, we subtract the bias from the first possible date's pre-bias Num so that date

          will map to the beginning of the hash value range. We also adjust all other dates the same way.

          To clarify, suppose there are 100 dates in the date range, and that hashing the first possible date,

          say October 1, puts October 1's entry as the 30th entry in the hash table. The first 69 dates

          after October 1 are OK, since they hash in after it in the hash range, but the next 30 dates

          after *those* hash in *before* October 1's entry. So, printing entries out in hash table sequence

          will show a date wraparound rather than a strictly ascending date sequence. To fix this, we

          subtract 30 from each pre-bias Num that is calculated, so 30 becomes 0, etc.

          We use 372 as a multiplier since we treat all months as having 31 days. This way 1/1 > 12/31.

       */

       memcpy(rec_year_long,input_date + 7,4);

       memset(rec_year_long + 4,'\0',1);

       memcpy(rec_month,input_date+3,3);

       memset(rec_month + 3,'\0',1);

       memcpy(rec_day,input_date,2);

       memset(rec_day + 2,'\0',1);

       work = (long) (372 * atoi(rec_year_long) + 31 * (month_num(rec_month) - 1) + atoi(rec_day) - date_bias);



       return

       (

              work % (long) num_dates

       );

}
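/*
       Illustration only, not part of the program. A minimal sketch of the
       date numbering and bias described in hash_date. The names date_number,
       date_bias_for and date_slot are hypothetical, and the bias is assumed
       to be the first date's number modulo the table size, which is how the
       comment in hash_date reads.

```c
#include <assert.h>

// Number a date as if every month had 31 days (12 * 31 = 372 per year),
// so later dates always get larger numbers.
long date_number(long year, long month, long day)
{
    assert(month >= 1 && month <= 12 && day >= 1 && day <= 31);
    return 372 * year + 31 * (month - 1) + day;
}

// Offset of the first possible date inside the unbiased hash range.
long date_bias_for(long first_number, long num_slots)
{
    assert(num_slots > 0);
    return first_number % num_slots;
}

// Slot for a date once the bias is removed; the first date lands in slot 0.
long date_slot(long number, long bias, long num_slots)
{
    assert(num_slots > 0);
    return (number - bias) % num_slots;
}
```

       With 100 slots and October 1, 1998 as the first date, the date number
       is 743536, the bias is 36, October 1 maps to slot 0 and October 2 to
       slot 1. January 1, 1999 numbers exactly one more than December 31, 1998.
*/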

long hash_file(char *input_file)

{

       long work = 0;

       register long i;

       /* Calculate hash value. */

       for (i = 0; i < strlen(input_file); i++)

       {

              work += input_file[i] + 128;

       }

       return ((long)(work % (scaling_factor * num_uniq_items)));

}

long hash_ip(char *input_ip)

{

       char work[16];

       register long i,j,k,l;

       strcpy(work,blank_ip);

       /* Calculate hash value. */

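       /* Build a numeric key from the ip string: take up to its last 9 characters in

          reverse order (so more random). Digits are copied; any other character, such as

          a dot, is folded to a digit below. So, e.g. 123.45.189.67 becomes 768981854.

       */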



/*printf("%s\n",input_ip);

*/

       /* backwards ip often bigger than max long (2,147,483,647), so stop after 9 digits */

       j = 0;

       k = strlen(input_ip)-1;

       if (k > 8) {l = k - 8;} else {l = 0;}

       for (i = k; i >= l; i--)

       {

              if (isdigit(input_ip[i]))

              {

                     work[j] = input_ip[i];

              }

              else

              {

                     /* keep bits 0 and 3 only (0, 1, 8 or 9), then OR with 48 ('0') for an ASCII digit; '.' becomes '8' */

                     work[j] = (input_ip[i] & 9) | 48;

              }

              j++;

       }



       indexx = atol(work);

       indexx = indexx % (scaling_factor * num_uniq_items);

       if   (indexx < 0)       {return (-1 * indexx);}

       else                   {return (indexx);}

}
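/*
       Illustration only, not part of the program. A standalone sketch of the
       key that hash_ip builds, written without the global indexx and
       blank_ip. The names ip_key and ip_slot and the table_size parameter
       are hypothetical; table_size stands in for scaling_factor times
       num_uniq_items.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

// Build the numeric key: up to the last 9 characters of ip, reversed.
// Non-digits are folded to a digit with (c & 9) | '0', so '.' becomes '8'.
// Nine decimal digits always fit in a 32-bit signed long.
long ip_key(const char *ip)
{
    char digits[10];
    size_t len, take, i;

    assert(ip != NULL);
    len = strlen(ip);
    take = (len > 9) ? 9 : len;
    for (i = 0; i < take; i++) {
        unsigned char c = (unsigned char) ip[len - 1 - i];
        digits[i] = isdigit(c) ? (char) c : (char) ((c & 9) | '0');
    }
    digits[take] = '\0';
    return atol(digits);
}

// Reduce the key to a table index.
long ip_slot(const char *ip, long table_size)
{
    assert(table_size > 0);
    return ip_key(ip) % table_size;
}
```

       For example, ip_key("123.45.189.67") yields 768981854 and
       ip_key("1.2.3.4") yields 4838281.
*/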

void wrapup()

{

       char mail_string[256];

       strcpy(mail_string,blank_string);

       strcpy(mail_string,"/usr/lib/sendmail ");

       strcat(mail_string,"statsmaster@simplenet.com");

/*       strcat(mail_string,get_value("address"));

*/       strcat(mail_string," < ");

       strcat(mail_string,outfile_string);

       strcpy(work_string1,"rm ");

       strcat(work_string1,ip_date_file_string);

       strcpy(work_string2,"rm ");

       strcat(work_string2,cgipath);

       strcat(work_string2,"report*");

       strcat(work_string2,pid);

       strcpy(work_string3,"rm ");

       strcat(work_string3,outfile_string);

       fclose(fptr3);

       if (ip_date_file_removed != 1 && r5_data_needed == 1) {system(work_string1);}

       system(work_string2);

       system(mail_string);

       system(work_string3);

       strcpy(work_string5,"rm ");

       strcat(work_string5,r5sorted_ip_date_file_string);

       if (r5sorted_ip_date_file_created == 1

       && r5sorted_ip_date_file_removed != 1) {system(work_string5);}

}

void wrapup_hogs()

{

       strcpy(work_string1,"rm ");

       strcat(work_string1,file_list_string);

       system(work_string1);

       strcpy(work_string1,"rm ");

       strcat(work_string1,work_file_string);

       system(work_string1);

       strcpy(work_string1,"rm ");

       strcat(work_string1,ip_date_file_string);

       if (ip_date_file_removed != 1 && r5_data_needed == 1) {system(work_string1);}

       strcpy(work_string1,"rm ");

       strcat(work_string1,cgipath);

       strcat(work_string1,"report*");

       strcat(work_string1,pid);

       system(work_string1);

       fclose(fptr3);

       strcpy(work_string1,"rm ");

       strcat(work_string1,outfile_string);

       system(work_string1);

       strcpy(work_string5,"rm ");

       strcat(work_string5,r5sorted_ip_date_file_string);

       if (r5sorted_ip_date_file_created == 1

       && r5sorted_ip_date_file_removed != 1) {system(work_string5);}

}

void get_form_input()

{

       /* This subroutine gets the form variables and values as one big string, "buffer".

          You need a global variable called "buffer", e.g. char buffer[1024];

          You also need a global variable called "val" declared e.g. char *val.

          You also need the other subroutine, "get_value", to access the variables. */

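       char *r_method;

       char *c_length;

       char *q_string;

       char work_buffer[1024];

       long content_length;

       long i = 0;

       long j = 0;

       register char digit;

       work_buffer[0] = '\0';       /* decode loop below sees an empty string if no input arrives */

       r_method = getenv("REQUEST_METHOD");

       if (r_method != NULL)

       {

              if (strcmp(r_method,"POST") == 0)

              {

                     c_length = getenv("CONTENT_LENGTH");

                     if (c_length != NULL)

                     {

                            /* Never read more than work_buffer can hold. */

                            content_length = atol(c_length);

                            if (content_length > (long) sizeof(work_buffer) - 1) {content_length = (long) sizeof(work_buffer) - 1;}

                            if (content_length < 0) {content_length = 0;}

                            if (fgets(work_buffer,(int) content_length + 1,stdin) == NULL) {work_buffer[0] = '\0';}

                     }

              }

              else if ((q_string = getenv("QUERY_STRING")) != NULL)

              {

                     strncpy(work_buffer,q_string,sizeof(work_buffer) - 1);

                     work_buffer[sizeof(work_buffer) - 1] = '\0';

              }

       }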

       /* Convert from urlencoding */

       while (i < strlen(work_buffer))

       {

              if        (work_buffer[i] == '%')

              {

                     if   (work_buffer[i+1] >= 'A')       {digit = ((work_buffer[i+1] & 0xdf) - 'A') + 10;}

                     else                        {digit = work_buffer[i+1] - '0';}

                     digit *= 16;

                     if   (work_buffer[i+2] >= 'A')       {digit += ((work_buffer[i+2] & 0xdf) - 'A') + 10;}

                     else                        {digit += (work_buffer[i+2] - '0');}

                     buffer[j] = digit;

                     i += 3;

              }

              else if (work_buffer[i] == '+')           {buffer[j] = ' '; i++;}

              else                               {buffer[j] = work_buffer[i]; i++;}

              j++;

       }

       buffer[j] = '\0';       /* terminate the decoded string */

       val = NULL; /* Init here, so can check on automatically freeing it on entry to get_value. */

}
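/*
       Illustration only, not part of the program. The urlencoding conversion
       in get_form_input can be sketched as a standalone function with an
       explicit output size. The names hex_value and url_decode are
       hypothetical. Unlike the loop above, this sketch copies a '%' that is
       not followed by two hex digits unchanged instead of decoding it.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

// Value of one hex digit, or -1 if c is not a hex digit.
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Decode src into dst (capacity dst_size, terminator included).
// "%XX" becomes the byte XX and '+' becomes a space.
// Returns the decoded length; output is truncated if dst is too small.
size_t url_decode(const char *src, char *dst, size_t dst_size)
{
    size_t i = 0, j = 0;

    assert(src != NULL && dst != NULL && dst_size > 0);
    while (src[i] != '\0' && j + 1 < dst_size) {
        int hi = -1, lo = -1;
        if (src[i] == '%') {
            hi = hex_value((unsigned char) src[i + 1]);
            // Only look at the second digit if the first one was valid,
            // so the terminator is never read past.
            if (hi >= 0) lo = hex_value((unsigned char) src[i + 2]);
        }
        if (hi >= 0 && lo >= 0) {
            dst[j++] = (char) (hi * 16 + lo);
            i += 3;
        } else if (src[i] == '+') {
            dst[j++] = ' ';
            i++;
        } else {
            dst[j++] = src[i++];
        }
    }
    dst[j] = '\0';
    return j;
}
```

       For example, decoding "a%2Fb+c%zz" into a 32-byte buffer gives
       "a/b c%zz" and a returned length of 8.
*/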

char * get_value(char varname[256])

{

       /* This subroutine extracts the value of a form variable. The name of the variable

          must be passed as a string. E.g. to get the value of "form_city", call as follows:

          get_value("form_city"); The value is returned as a string. You can use "val" or assign

          the value to a variable, e.g. strcpy(city,get_value("form_city")); Make sure you

          allocate space for the receiving variable, e.g. char city[256]; rather than char *city;

       */

       char *name_start;

       char *val_start;

       char *parm_end;

       char name[100];

       long val_length, name_length;

       strcpy(name,varname);

       strcat(name,"=");

       name_length = strlen(name);

       if (val != NULL) {free(val);}

       if        (strstr(buffer,name) != NULL)

       {

              name_start = strstr(buffer,name);

              val_start = name_start + name_length;

              if        (strstr(name_start,"&") != NULL)

              {

                     parm_end = strstr(name_start,"&");

                     val_length = parm_end - val_start;

              }

              else       {val_length = buffer + strlen(buffer) - val_start;}

              val = (char *) malloc (val_length + 1);

              memcpy(val,name_start + name_length,val_length);

              memset(val + val_length,'\0',1);

       }

       else

       {

              val = (char *) malloc (10);

              strcpy(val,"not found");

       }

       return(val);

}

void print_glossary()

{

       /* Adjacent string literals: a literal may not span source lines in standard C. */

       fprintf(fptr3,

              "\nDefinitions\n\n"

              "Hit: A request for any object that is on your site. Each element of a\n"

              "requested page (a graphic, a sound file, or the page itself) is counted\n"

              "as a hit. A page on your site that contains five graphics generates six\n"

              "hits - the five images and the original request made for the page.\n\n"

              "Page Hit: The total number of times any of your pages are visited. The\n"

              "same user counts as a page hit each time he loads one of your pages.\n\n"

              "Client: A unique IP number that has accessed your site (i.e. person).\n\n"

              "Individual Clients: If a client accesses more than once on different\n"

              "days within your report period, this counts only once, whereas the Clients\n"

              "and All Clients fields count the client again.\n\n"

              "Kilobytes transmitted: The number of Kilobytes of data transmitted from\n"

              "your site.\n");

}