Distributed Program Construction
Fall 1999
Lecture 6: WWW
COMP 413
The Web
Technologies
-
based on simple request-reply protocol
(HTTP)
-
on top of stream (TCP) sockets
-
server normally listens on port 80
-
default content type is HTML
-
other content types supported via MIME-like
content description facilities
-
content naming via Uniform Resource Locators (URL)
-
part of the more general class of Uniform Resource Identifiers
(URIs)
Components
-
Server
-
originator of Web content
-
Proxy
-
often connected with organization's firewall
-
may or may not cache content (on disk)
-
Client
-
browser (Netscape, IE, etc.)
-
communicate via Hypertext Transfer Protocol (HTTP)
HTTP Version 1.1 is specified
in RFC 2616.
-
can be manually operated by connecting
to port 80 of a server with a command like
telnet www.cs.rice.edu 80
(try it!)
Example exchange:
HTTP Request:
-
Request header line: <command> <url>
<protocol-id>
-
variable number of request header fields
(one per line) of the form <name>: <value>
-
blank line
-
optional entity body
HTTP Response:
-
Response header line: <protocol-id>
<result code> <reason phrase>
-
variable number of response header
fields (one per line) of the form <name>: <value>
-
blank line
-
optional entity body
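The message layout above can be sketched in a few lines of Python. This is a minimal illustration, not a full HTTP implementation: the function names `build_request` and `parse_response` are made up for this example, and the sketch assumes header fields are not folded across lines.

```python
def build_request(method, url, headers, body=b""):
    """Serialize a request: request line, header fields, blank line, entity body."""
    lines = [f"{method} {url} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"   # the blank line ends the header section
    return head.encode("ascii") + body


def parse_response(raw):
    """Split a raw response into (protocol, code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *field_lines = head.decode("iso-8859-1").split("\r\n")
    protocol, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in field_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # field names are case-insensitive
    return protocol, int(code), reason, headers, body
```

The bytes returned by `build_request("GET", "/", {"Host": "www.cs.rice.edu"})` are what one would type by hand in the telnet session.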
HTTP commands ("methods")
-
GET -- retrieve "resource" associated with URL
-
HEAD -- same as GET, but don't send entity body
-
POST -- post information associated with URL
-
others
HTTP result codes
-
200 OK
-
401 Unauthorized
-
404 Not found
Some request header fields:
-
Accept
-- tell server about client's capabilities
-
Authorization
-- provide server with authentication credentials
-
If-Modified-Since
-- send content only if modified
-
If-None-Match
-- send content only if Etag doesn't match
-
Range
-- send only subset (byte-range) of content
-
Referer
-- where did client find this URL
Some response header fields:
-
Age
-
Content-encoding
-
Content-length
-
Expires
-
Etag
-
Last-Modified
Content caching critical to Web performance
and scalability
-
Client caching
-
Proxy caching
Cache consistency:
-
Expiration mechanism -- seeks to
eliminate server HTTP requests
-
Expires response header (set by origin server)
-
Heuristic expiration (cache guesses suitable expiration)
-
Revalidation mechanism -- seeks to reduce need to transmit
full content responses
-
If-Modified-Since request header (modification date comparison)
-
If-None-Match request header
(Etag comparison)
-
weak ETag -- must change when entity changes in a
"semantically significant way"
-
strong ETag -- must change when entity changes in
any way
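A rough sketch of how a server might handle revalidation: it answers 304 (Not Modified, no entity body) when the client's cached copy is still valid, and 200 with the full content otherwise. The function name `revalidate` and its parameters are hypothetical, and weak versus strong ETag comparison is not modelled.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def revalidate(request_headers, etag, last_modified):
    """Return 304 if the cached copy is still valid, else 200.

    etag          -- current entity tag of the resource, e.g. '"v1"'
    last_modified -- timezone-aware datetime of the last change
    """
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # The header may carry several comma-separated entity tags.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if etag in candidates or "*" in candidates else 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        return 304 if last_modified <= since else 200

    return 200  # unconditional request: send the full content
```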
Server and Proxy logging
-
Typical log entry (Apache server):
inehou-pxy02.compaq.com - - [09/Sep/1999:08:17:35 -0500] "GET /~druschel/comp413/assignments/ HTTP/1.0" 200 6722
-
Server and proxy logs only available to server/proxy operators
-
Log-based accounting impairs caching: many servers disallow caching
so that every request reaches them and is counted
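The fields of such a log entry (client host, two unused identity fields, timestamp, request line, result code, bytes sent) can be pulled apart with a regular expression. This is a sketch assuming the common log format shown above; `LOG_PATTERN` and `parse_log_entry` are names chosen for this example.

```python
import re

# host ident user [timestamp] "request line" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_entry(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # A size of "-" means no body bytes were sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry
```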
HTTP access control
-
upon request for access controlled
resource without proper credentials, server responds with 401
Unauthorized.
-
WWW-Authenticate response
header field contains challenge string.
-
client may retry command with appropriate
credentials in the Authorization request header field.
Two authentication schemes (see RFC
2617)
-
Basic (only scheme currently supported
by browsers)
-
Digest
Basic Authentication example
-
WWW-Authenticate: Basic realm="Rice-CS"
-
Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
(base64 encoding of <username>:<password>)
-
weak security: username and password are effectively sent in clear text
(base64 is an encoding, not encryption)
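The Authorization value above can be reproduced, and just as easily reversed, with a few lines of Python. The helper names are made up for this sketch; the sample credentials are the ones that decode from the header shown above.

```python
import base64


def basic_credentials(username, password):
    """Build the value of the Authorization header for the Basic scheme."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def decode_basic(header_value):
    """Recover (username, password) from a Basic Authorization value."""
    scheme, _, token = header_value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic credential")
    decoded = base64.b64decode(token).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password
```

That `decode_basic` needs no secret at all is exactly why the scheme is considered weak: anyone who observes the header can read the password.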
Digest Authentication (simplified)
-
server includes a nonce value in its
challenge
-
client responds with credentials that
include the MD5 checksum of
-
username
-
password
-
nonce
-
HTTP command
-
requested URL
-
much stronger than BASIC (no cleartext
password), but still a password scheme
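A simplified sketch of the digest computation, following the basic (no qop) form described in RFC 2617. Note that the realm, which the simplified list above omits, is also hashed. The function names are hypothetical, and a real client would additionally handle the qop, cnonce, and nonce-count parameters.

```python
import hashlib


def md5_hex(text):
    """MD5 checksum of a string, as a lowercase hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, nonce, method, url):
    """Compute the 'response' value a client sends in its credentials."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # identifies the user
    ha2 = md5_hex(f"{method}:{url}")                 # identifies the request
    # The server's nonce ties the checksum to this particular challenge,
    # so a captured response cannot simply be replayed later.
    return md5_hex(f"{ha1}:{nonce}:{ha2}")
```

The server, which knows the password (or the first hash), repeats the computation and compares results; the password itself never crosses the network.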
HTTP is a stateless protocol
-
how to implement services that require state across invocations?
(e.g., shopping cart)
-
one solution: allow servers to place state on clients (cookies)
-
server can place a cookie on the client in its response to an HTTP request
-
cookie value sent to server during subsequent HTTP requests
-
Note: cookies also allow user tracking!
Netscape HTTP cookies
-
supported by IE as well
-
server sets cookie using the Set-Cookie response header
-
Set-Cookie: NAME=VALUE; expires=DATE; path=PATH; domain=DOMAIN_NAME; secure
-
NAME: name of cookie
-
VALUE: cookie value
-
expires: date after which the client should discard the cookie
-
client includes cookie in any HTTP requests with URLs such
that
-
there is a suffix match between the URL server domain name
and DOMAIN_NAME
-
there is a prefix match between the URL path and PATH
-
secure means cookie should only be returned over secure
(HTTPS) connections
-
client includes cookies in HTTP requests using the Cookie request header
-
Cookie: NAME1=VALUE1; NAME2=VALUE2 ...
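The matching rules above can be sketched as follows. This is an illustration of the plain suffix/prefix rule as stated here, not of any browser's actual implementation (browsers apply additional restrictions); the function names and the dict layout of a stored cookie are assumptions made for this example.

```python
def cookie_applies(cookie, url_host, url_path, is_https):
    """Decide whether a stored cookie should be sent with a request."""
    domain_ok = url_host.lower().endswith(cookie["domain"].lower())  # suffix match
    path_ok = url_path.startswith(cookie["path"])                    # prefix match
    secure_ok = is_https or not cookie["secure"]                     # HTTPS-only cookies
    return domain_ok and path_ok and secure_ok


def cookie_header(cookies, url_host, url_path, is_https):
    """Build the Cookie request header line, or None if no cookie matches."""
    pairs = [
        f"{c['name']}={c['value']}"
        for c in cookies
        if cookie_applies(c, url_host, url_path, is_https)
    ]
    return "Cookie: " + "; ".join(pairs) if pairs else None
```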
Stateless protocol, originally designed for file retrieval
-
no server state
-
no client-side computation
-
no server-side computation
Various fixes
-
Java applets
-- client-side computation
-
CGI
-- server-side computation
-
servlets
-- ditto
-
HTML forms -- fill-out
Web page forms
HTML applet tag
<APPLET
    CODEBASE = codebaseURL
    ARCHIVE = archiveList
    CODE = appletFile ...or... OBJECT = serializedApplet
    ALT = alternateText
    NAME = appletInstanceName
    WIDTH = pixels
    HEIGHT = pixels
    ALIGN = alignment
    VSPACE = pixels
    HSPACE = pixels
>
<PARAM NAME = appletAttribute1 VALUE = value>
<PARAM NAME = appletAttribute2 VALUE = value>
. . .
alternateHTML
</APPLET>
-
allow execution of Java code inside client browser
-
applet computation constrained by security policy in effect
at client browser
Common Gateway Interface (CGI)
-
allow execution of arbitrary programs at server in response
to HTTP requests (e.g., database queries)
-
URL refers to an executable program (CGI program)
-
when receiving an HTTP request for a CGI resource
-
server forks a process running the executable, passing the
request header in environment variables
-
server connects client TCP socket to CGI program's stdin,
stdout
-
CGI program reads HTTP request entity body from its stdin
-
CGI program performs arbitrary computation
-
CGI program sends HTTP response on its stdout
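A minimal sketch of the CGI program's side of this exchange. The environment variable names (REQUEST_METHOD, QUERY_STRING, CONTENT_LENGTH) are the standard CGI ones; the function name `handle_cgi` and the echoed output are made up for illustration. The streams are passed in as parameters so the logic can be exercised without a Web server.

```python
def handle_cgi(environ, stdin, stdout):
    """Read request data from the environment and stdin, write a response to stdout."""
    # Request header information arrives in environment variables.
    method = environ.get("REQUEST_METHOD", "GET")
    query = environ.get("QUERY_STRING", "")
    length = int(environ.get("CONTENT_LENGTH") or 0)

    # The entity body (if any) arrives on stdin.
    body = stdin.read(length) if length else ""

    # "Arbitrary computation" would go here; this sketch just echoes its input.
    stdout.write("Content-Type: text/plain\r\n\r\n")  # header fields, then blank line
    stdout.write(f"method={method}\nquery={query}\nbody={body}\n")
```

To run it as an actual CGI program, the script would end with a call such as `handle_cgi(os.environ, sys.stdin, sys.stdout)`. The server typically supplies the response status line itself.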
CGI programs
-
can be any executable program
-
for portability reasons, often written in Perl, Tcl, or Java
Efficiency
-
requires fork, exec of a process for each HTTP request
-
FastCGI: allow CGI process to persist across HTTP requests
(not state!)
-
NSAPI/ISAPI: allow linking of CGI programs with HTTP server
-
loss of protection between Web server and CGI program
-
Java servlets: execution of Java code inside a Java-based
Web server (Sun)
HTML Forms
HTML Form tag
-
user input via buttons, menus, text fields
-
submitted to the server once user clicks a "submit" button
-
handled on the server via a CGI program (typically)
Example:
<FORM action="http://somesite.com/prog/adduser" method="post">
  <P>
    <LABEL for="firstname">First name: </LABEL>
    <INPUT type="text" id="firstname" name="firstname"><BR>
    <LABEL for="lastname">Last name: </LABEL>
    <INPUT type="text" id="lastname" name="lastname"><BR>
    <LABEL for="email">email: </LABEL>
    <INPUT type="text" id="email" name="email"><BR>
    <INPUT type="radio" name="sex" value="Male"> Male<BR>
    <INPUT type="radio" name="sex" value="Female"> Female<BR>
    <INPUT type="submit" value="Send"> <INPUT type="reset">
  </P>
</FORM>
Client browser collects input data in form data set,
a string consisting of control-name/value pairs, separated by "&".
Two ways to submit the form data set to the server:
-
If method is GET, append "?", followed by the form
data set, to the action URL (spaces are transformed to "+")
-
if method is POST, send the form data set as the entity
body of the HTTP request
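The two submission methods can be sketched with Python's standard `urllib.parse.urlencode`, which produces the "&"-separated name/value string and turns spaces into "+". The function name `submit_target` and the sample field values are assumptions for this example.

```python
from urllib.parse import urlencode


def submit_target(action, method, form_data):
    """Return (url, entity_body) for submitting a form data set.

    form_data is a list of (control-name, value) pairs.
    """
    data_set = urlencode(form_data)  # e.g. "a=1&b=two+words"
    if method.upper() == "GET":
        # GET: the data set travels in the URL after "?"
        return action + "?" + data_set, b""
    # POST: the data set travels as the entity body
    return action, data_set.encode("ascii")
```

With GET the data ends up in the URL (and hence in server logs and Referer headers); with POST it does not.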
Many services require state across
HTTP requests (e.g., shopping)
-
cookies
-
applets
-
hidden fields in forms
-
place state into HTML fields not displayed
by the browser (e.g., comments)
-
server database in connection with
client-side state (cookies/applets/hidden form fields)
Secure transport via Secure Sockets Layer (HTTPS)
-
strong authentication (usually server
authenticates to client, not vice versa)
-
encrypted data transfer
Web shortcomings
-
originally designed for document retrieval
-
limited scale
-
non-commercial
-
many features added as an afterthought
-
interactive service support (CGI, Applets)
-
caching
-
security
-
no accounting (beyond simple server/proxy logging)
-
no support for distributed transactions
-
no automatic replication
-
structured documents