Parsing HTTP GET - C


I'm trying to parse an HTTP request, and I have been doing so using strtok(), but I'm running into problems when trying to use strcpy().

I can parse the file path and file name fine, but I can't seem to parse the remote host's DNS name. Below is my code, which should tokenize the string, get the DNS name, and store it in a char[] called host.

    #include <stdio.h>
    #include <time.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <unistd.h>
    #include <stdlib.h>

    int main()
    {
            int c = 0, c2 = 0;
            char *tk, *tk2, *tk3, *tk4;
            char buf[64], buf2[64], buf3[64], buf4[64];
            char host[1024], path[64], file[64];

            strcpy(buf, "GET /~yourloginid/index.htm HTTP/1.1\r\nHost: remote.cba.csuohio.edu\r\n\r\n");

            tk = strtok(buf, "\r\n");
            while(tk != NULL)
            {
                    if(c == 1)
                    {
                            tk2 = strtok(tk, " ");
                            while(tk2 != NULL)
                            {
                                    if(c2 == 1)
                                    {
                                            printf("%s\n", tk2);
                                            strcpy(host, tk2);
                                    //      printf("%s\n", host);
                                    }
                                    ++c2;
                                    tk2 = strtok(NULL, " ");
                            }
                    }
                    ++c;
                    tk = strtok(NULL, "\r\n");
            }

            return 0;
    }

Bear with me, I'm a new C programmer, and my code may be ugly. Every time I try running the program, I get a "Segmentation fault (core dumped)" error, and I believe it has to do with strcpy(). I can print out the tokenized string just fine, but I can't seem to copy it to a char[].

Sorry, but the strtok(3) function does not parse HTTP at all. Despite this, I'll try to explain what's happening in your code.

  1. The first time you enter the loop, tk == "GET /~yourloginid/index.htm HTTP/1.1", and the buffer has been changed to "GET /~yourloginid/index.htm HTTP/1.1\0\nHost: ...". Since c == 0, you won't enter the if block; c is incremented and tk = strtok(NULL, "\r\n"); is called again for the second line.
  2. The second time you enter the loop, tk == "Host: remote.cba.csuohio.edu". strtok(3) jumped over the first \0 in the string, skipped the leftover \n character, and found the next token, putting a second \0 after it, so that part of the buffer now reads "Host: remote.cba.csuohio.edu\0\n...". c == 1 this time, so you go inside the if block and call strtok(tk, " ");. This makes strtok(3) forget the extent of the string it was parsing and begin a new parse on "Host: remote.cba.csuohio.edu" (as you passed a non-null first argument). It returns tk2 == "Host:", putting a \0 after "Host:". The second time you enter the inner loop, you copy the value into the host variable.
  3. The third time you reach the main loop test, tk == NULL, because the last call tk2 = strtok(NULL, " "); (in the inner loop) returned NULL, and strtok keeps returning NULL until you initialize it again by passing a non-null first argument.

strtok(3) operates on the string passed as its first parameter and modifies it, writing \0 terminators into it. Further, it keeps a hidden global variable marking where it is in the string being parsed, so that it can return NULL when it has finished. If you nest calls to strtok(3), the outer parse breaks: you lose the internal state of the function when you initialize it again by passing a non-null first parameter. This is why your code does not do what you expect.
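To see the lost state in isolation, here is a minimal sketch (the function names count_lines_nested and count_lines_plain are mine, not from your code). Both walk a request line by line with strtok(), but the first one also splits every line into words with a nested strtok() call, the way your inner loop does.

```c
#include <string.h>

/* Counts the lines the outer loop sees when every line is split
 * again with a nested strtok() call. buf is modified in place. */
static int count_lines_nested(char *buf)
{
    int lines = 0;
    char *line = strtok(buf, "\r\n");

    while (line != NULL) {
        ++lines;
        /* Passing a non-null pointer restarts strtok() on `line`
         * and discards the position it had saved inside `buf`. */
        char *word = strtok(line, " ");
        while (word != NULL)
            word = strtok(NULL, " ");
        /* The hidden state now belongs to the finished inner parse,
         * so this returns NULL instead of the next line. */
        line = strtok(NULL, "\r\n");
    }
    return lines;
}

/* Same outer loop without the nested call, for comparison. */
static int count_lines_plain(char *buf)
{
    int lines = 0;

    for (char *line = strtok(buf, "\r\n"); line != NULL;
         line = strtok(NULL, "\r\n"))
        ++lines;
    return lines;
}
```

Called on two separate copies of a three-line request (separate because strtok() writes into the buffer), the nested version stops after the first line while the plain one sees all three.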

Calling strtok(3) has numerous drawbacks, and calls cannot be nested in several loops, because it stores the state of the parse internally. Its use is discouraged. If you want something nestable, you have to switch to strtok_r(3) instead. That function has an extra parameter that lets you keep the parse state outside the function, so you can have several parses running at the same time.
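As a rough sketch of what that switch could look like for your two loops (the function name second_line_second_word and its return convention are my own choices, and strtok_r() is POSIX rather than ISO C), each loop gets its own saved-position variable:

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>

/* Copies the second word of the second line of `request` into `host`.
 * `request` is modified in place. Returns 0 on success, -1 if there is
 * no such word or it does not fit in `hostlen` bytes. */
static int second_line_second_word(char *request, char *host, size_t hostlen)
{
    char *line_state, *word_state;   /* one saved position per parse */
    int line_no = 0;

    for (char *line = strtok_r(request, "\r\n", &line_state);
         line != NULL;
         line = strtok_r(NULL, "\r\n", &line_state), ++line_no) {
        if (line_no != 1)
            continue;

        int word_no = 0;
        for (char *word = strtok_r(line, " ", &word_state);
             word != NULL;
             word = strtok_r(NULL, " ", &word_state), ++word_no) {
            if (word_no == 1) {
                size_t len = strlen(word);
                if (len >= hostlen)
                    return -1;               /* would overflow host */
                memcpy(host, word, len + 1); /* includes the '\0' */
                return 0;
            }
        }
    }
    return -1;
}
```

When you call it, declare the request as char request[] = "..."; so the compiler sizes the array. The request literal in the question is 70 characters long, which is more than a 64-byte buffer can hold once the terminating \0 is added, so strcpy() into buf[64] already writes past the end of the array.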

Further, strtok will parse "GET_/~yourlogin..." and "GET___/~yourlogin..." in the same way (I have used underscores to represent spaces, to show multiple spaces between the method name and the URI), and the latter is not permitted by HTTP. For the same reason, "Host:remote.cba.csuohio.edu" is a valid header field (however, its use is discouraged), and your code will not parse it correctly. Also, the Host: header field might not be the first header line in the HTTP request, so you can miss it if you are not careful.
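One way around the last two problems is to stop tokenizing on spaces and look for the header by name instead. The following is only a sketch under my own assumptions, not a conforming HTTP parser: find_host_header is a made-up name, the whole request is assumed to be in one NUL-terminated buffer with \r\n line endings, and strncasecmp() comes from POSIX <strings.h>.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Looks for a Host header on any line after the request line and copies
 * its value into `host`. Does not modify `request`. Returns 0 on success,
 * -1 if the header is missing, empty, or too long for `hostlen` bytes. */
static int find_host_header(const char *request, char *host, size_t hostlen)
{
    const char *line = strstr(request, "\r\n");   /* end of request line */

    while (line != NULL) {
        line += 2;                                /* start of the next line */
        if (line[0] == '\r' || line[0] == '\0')
            break;                                /* blank line: headers are over */

        if (strncasecmp(line, "Host:", 5) == 0) { /* header names ignore case */
            const char *value = line + 5;
            value += strspn(value, " \t");        /* whitespace after ':' is optional */
            size_t len = strcspn(value, "\r\n");
            while (len > 0 && (value[len - 1] == ' ' || value[len - 1] == '\t'))
                --len;                            /* drop trailing whitespace */
            if (len == 0 || len >= hostlen)
                return -1;
            memcpy(host, value, len);
            host[len] = '\0';
            return 0;
        }
        line = strstr(line, "\r\n");              /* end of this header line */
    }
    return -1;
}
```

With this, "Host:remote.cba.csuohio.edu" and "host:   remote.cba.csuohio.edu" give the same result, and the header is found even when other headers come before it.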

If you want to parse HTTP, as a first reading I can recommend RFC 2616, "Hypertext Transfer Protocol -- HTTP/1.1", which is the document implementors must comply with. Beware, it's a dense and large document.