The Internet is becoming more and more crowded. Every day new websites emerge, new computers and computer networks join the Internet, and more information resources become available on it (Raunak et al., 2000). More people need access to the Internet, and even more regard it as an essential part of their work and life. Many companies and organisations are becoming aware that the Internet is very important for gaining an advantage, or even surviving, in their industry.
Unfortunately, as with many information technology resources, Internet access is not freely available to companies or even individuals. Internet connections are a limited resource even for big organisations (Raunak et al., 2000), for example the University of Wollongong, Australia (Rome, 1999). An Internet connection is a high-cost resource because bandwidth usage is priced highly, even when weighed against its benefits. Many organisations face this problem: connection bandwidth is limited, while demand for it is high and grows from day to day.
Another emerging problem is security. Connecting the organisation’s intranet or network to the Internet opens the door to interference by outside parties, who could reach and access the organisation’s intranet. Such access could be very harmful and could endanger the integrity, or even the existence, of the organisation’s information systems (Gralla, 1996).
One of the best solutions in such conditions is to install a web proxy server between the Internet and the organisation’s intranet (network). A web proxy server has several advantages: it can reduce the connection cost (Wallace, 1998), secure the organisation’s network (Yerxa, 1998), and, in the case of reverse proxying, balance the load on the server connected to the Internet (Luotonen, 1998). A web proxy server can be integrated with a firewall to provide better protection for the organisation’s intranet (Hughes, 1998), and it provides tools for monitoring and controlling network performance. Several products even integrate firewall, router, and proxy features in a single package, for example Winroute Pro version 4.1 (ICSA Labs, 1999) and Microsoft Internet Security and Acceleration Server (ICSA Labs, 2001).
Luotonen (1998, p. 18) describes the history of the web proxy server. Originally a proxy server was referred to as a gateway. It acts as the door between an organisation’s intranet and the Internet: all incoming and outgoing communication is directed through the gateway, which sometimes converts the protocol to suit the destination network. Later, the term proxy server came into use to distinguish internet/firewall gateways, which allow web-related communication to enter a secure intranet, from information gateways, which interface third-party information systems to the Internet.
Luotonen (1998, pp. 7-15) also describes two types of proxy server. The first is the application-level proxy server, a software program that understands the specific protocol or protocols it relays. Clients on the intranet make requests to the proxy server instead of connecting directly to a remote service, and the proxy server performs the actual request on behalf of the client. The second is the circuit-level proxy server, a software program that acts at the connection level. It establishes the connection for the application that requests it, but after that simply forwards the data in both directions, without interfering with the application-level protocol.
There are three general properties of proxy servers. The first is transparency on the client side: the end result on the user’s side is not affected by the web proxy server, aside from any filtering it performs. The second is that the client can determine whether or not to use a web proxy server. This is not the case when a single point of connection to the Internet is enforced within the intranet for monitoring and control purposes. The third is transparency on the server side: the destination server is unaffected by any intermediate web proxy servers and is often completely unaware of them (Luotonen, 1998).
The first generic internet/firewall proxy server was the CERN proxy server (CERN httpd), which combined a web server and a proxy server in a single product. Since then many web proxy server products have been produced, such as Squid Proxy Server (Pearson, 2000), Microsoft Proxy Server (Galioto, 1998) and ISA Server (ICSA Labs, 2001), and Netscape Proxy Server (Netscape Communication Inc., 1997), as well as packages bundled with dedicated hardware that functions only as a caching web proxy server, such as Cache Flow and Net App (Hohman, 1998). There are even independent institutions, such as ICSA, that test the web proxy server products on the market. The next part discusses what a web proxy server is and how it works.
A web proxy server is defined as an intermediary server that accepts requests from clients and either forwards them to other proxy servers or the origin server, or services them from its own cache. A proxy server acts as both a server and a client: it is a server to the clients connecting to it, and a client to the servers that it connects to (Luotonen, 1998). Sometimes users are unaware that they are connected to the Internet indirectly through web proxy servers. Figure 1 depicts this situation as the proxies-reality illusion.
Figure 1.
Proxies-reality Illusion
(Chapman & Zwicky, 1995, p. 191)
This means that users on the intranet are not directly connected to the Internet; the web proxy server represents them. Users, and in some cases even the client software, are not aware of the web proxy server’s presence. This condition is referred to as a transparent proxy (Luotonen, 1998): clients act as if they were directly connected to the Internet. It improves security, since all incoming and outgoing data must flow through the web proxy server. To achieve this, web proxy servers must be used in conjunction with some method of restricting IP-level traffic between the clients and the real servers (the Internet), such as a screening router or a dual-homed host that doesn’t route packets. If there is IP-level connectivity between the clients and the real servers, the clients can bypass the proxy system, and presumably so can someone from the outside (Chapman & Zwicky, 1995). This feature is known as Network Address Translation, or NAT (Yerxa, 1998). Figure 2 is an example of using a proxy service with a dual-homed bastion.
Figure 2.
Proxy service with a dual-homed bastion.
(Chapman & Zwicky, 1995, p. 62)
NAT hides the real network addresses of intranet users and uses its own IP address to request the documents its clients demand from the Internet (ICSA Labs, 1999). All traffic between the intranet and the Internet is concentrated on the web proxy servers and firewall. The advantage of this technique is that it simplifies the management of intranet and Internet traffic. Traffic rules need to be established only once, on the web proxy servers and firewall, and they affect all intranet-Internet communication. Every error or suspicious communication that could lead to unauthorised or malicious access can be monitored and prevented from a single place. NAT can be implemented with the structure shown in Figure 3.
Another important feature of web proxy servers is caching. It is their most important feature; in fact, many web proxy server products have the term cache in their names, such as Net Cache, Cache Flow, or Cache Engine (Hohman, 1998). Luotonen (1998, p. 157) describes caching as the process of storing copies of documents retrieved by the proxy server on local storage media (typically disk, but also main memory for short-term caching), from where they are readily available to anyone who subsequently requests the same document. Yeager and McGrath (1996, p. 195) illustrate the basic operation of a caching proxy server, as shown in Figure 4.
Figure 3.
Network Address Translating
(Tiny Software Inc., 2001).
Figure 4.
Web Document Caching
(Adapted from Yeager & McGrath, 1996, p. 195)
The intranet client does not receive the document directly from the Internet; it receives a cached copy from the local storage of the web proxy server. Later, when another client asks for the same document, the web proxy server only needs to send the cached document instead of requesting it again from the origin web server.
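The caching behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class CachingProxy and the function fake_origin are hypothetical names, and fake_origin stands in for a real HTTP request to the origin web server.

```python
from typing import Callable, Dict


class CachingProxy:
    """Toy caching proxy: repeated requests are served from local storage."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        # fetch_from_origin is a hypothetical stand-in for a real HTTP request.
        self._fetch = fetch_from_origin
        self._cache: Dict[str, bytes] = {}
        self.origin_requests = 0  # counts how often the origin was contacted

    def get(self, url: str) -> bytes:
        # Cache hit: answer from local storage without using the Internet link.
        if url in self._cache:
            return self._cache[url]
        # Cache miss: fetch once from the origin and keep a copy.
        self.origin_requests += 1
        document = self._fetch(url)
        self._cache[url] = document
        return document


def fake_origin(url: str) -> bytes:
    """Pretend origin web server (an assumption made for this example)."""
    return f"<html>content of {url}</html>".encode()


proxy = CachingProxy(fake_origin)
first = proxy.get("http://example.com/index.html")   # fetched from the origin
second = proxy.get("http://example.com/index.html")  # served from the cache
```

After the two requests the origin has been contacted only once, which is where the bandwidth saving comes from.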
Many web servers on the Internet serve dynamic content that sometimes changes rapidly, in some cases hourly or more often, as with CNN or other rapidly updated news web sites. In this case, a client could receive an older, stale version stored in the web proxy server’s cache rather than the newer version stored on the origin web server. This problem is addressed by the conditional request feature of a web proxy server (Luotonen, 1998, pp. 168-159). A conditional request is expressed as an HTTP conditional GET, which allows a document to be retrieved conditionally, based on whether it has been modified since the last access. With this feature, the web proxy server receives a “not modified” status if the document is unchanged and then uses the cached version; otherwise it receives the up-to-date version of the document from the origin web server. Furthermore, web proxy servers can allow cache expiration settings to be set on HTTP documents in order to reduce transactions with remote web servers. The cache expiration setting lets the web proxy server estimate whether an HTTP document needs an up-to-date check before it sends a request to the remote web server (Netscape Communication Inc., 1997, p. 119).
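The conditional GET exchange can be sketched as follows. The sketch makes several simplifying assumptions: Origin and RevalidatingProxy are hypothetical classes, timestamps are plain integers rather than HTTP date strings, and status codes 200 and 304 stand for a full response and a “not modified” response respectively.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Document:
    body: str
    last_modified: int  # simplified timestamp; real HTTP uses date strings


class Origin:
    """Hypothetical origin server that understands a conditional GET."""

    def __init__(self) -> None:
        self.documents: Dict[str, Document] = {}

    def get(
        self, url: str, if_modified_since: Optional[int] = None
    ) -> Tuple[int, Optional[str], int]:
        doc = self.documents[url]
        # Unchanged since the proxy's copy: answer 304 and send no body.
        if if_modified_since is not None and doc.last_modified <= if_modified_since:
            return 304, None, doc.last_modified
        return 200, doc.body, doc.last_modified


class RevalidatingProxy:
    """Toy proxy that revalidates its cached copy on every request."""

    def __init__(self, origin: Origin) -> None:
        self.origin = origin
        self.cache: Dict[str, Document] = {}
        self.bodies_transferred = 0  # full documents sent by the origin

    def get(self, url: str) -> str:
        cached = self.cache.get(url)
        since = cached.last_modified if cached else None
        status, body, last_modified = self.origin.get(url, since)
        if status == 304 and cached is not None:
            return cached.body  # cached copy is still current
        assert body is not None
        self.bodies_transferred += 1
        self.cache[url] = Document(body, last_modified)
        return body


origin = Origin()
origin.documents["/news"] = Document("morning edition", 100)
proxy = RevalidatingProxy(origin)
a = proxy.get("/news")  # first request: full document transferred
b = proxy.get("/news")  # unchanged: origin answers "not modified"
origin.documents["/news"] = Document("evening edition", 200)
c = proxy.get("/news")  # changed: new version transferred
```

A proxy that also honours cache expiration settings would skip the revalidation request entirely while the cached copy is still considered fresh; that refinement is left out here.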
Caching is the main feature that attracts many users to web proxy servers. The advantages of caching are that it improves the performance of the Internet connection, saves bandwidth, and reduces latency (Luotonen, 1998). Instead of every client on the intranet requesting the same document from the same origin web server, the web proxy server can provide them with the cached version of that document, which reduces the number of requests for it sent to the Internet. Internet bandwidth is used more efficiently because the web proxy server needs it only when requesting a document for the first time and when issuing conditional GETs to the origin web server (Luotonen, 1998).
According to Yerxa (1998, p. 134), there are two main caching protocols: CARP (Cache Array Routing Protocol) and ICP (Internet Cache Protocol). CARP is a protocol that allows web proxy servers to be added to and removed from a web proxy server array without relocating more than a single proxy’s share of documents. CARP can be used to implement load balancing (Luotonen, 1998, p. 318). ICP is a protocol for querying web proxy servers for cached documents. It is usually used by web proxy servers to query other web proxy servers’ caches, but clients can also use it to query a web proxy server’s cache.
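The property attributed to CARP, that removing one array member relocates only that member’s share of documents, can be illustrated with a hash-based selection scheme. The sketch below is not the CARP specification: it assumes a highest-score-wins rule and uses MD5 purely as a stand-in hash, and the proxy names and URLs are hypothetical.

```python
import hashlib
from typing import List


def score(proxy: str, url: str) -> int:
    """Combined hash of proxy name and URL (MD5 is a stand-in choice)."""
    digest = hashlib.md5(f"{proxy}|{url}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def choose_proxy(proxies: List[str], url: str) -> str:
    """Route the URL to the array member with the highest combined score."""
    return max(proxies, key=lambda proxy: score(proxy, url))


proxies = ["proxy-a", "proxy-b", "proxy-c", "proxy-d"]
urls = [f"http://example.com/page{i}" for i in range(1000)]

# Assignment with the full array, then with proxy-c removed.
before = {url: choose_proxy(proxies, url) for url in urls}
remaining = [p for p in proxies if p != "proxy-c"]
after = {url: choose_proxy(remaining, url) for url in urls}

# Only URLs that were held by the removed proxy change owner.
moved = [url for url in urls if before[url] != after[url]]
```

Because each URL goes to the member with the highest score, removing a member cannot change the winner for any URL that member did not already hold, so the other caches keep their contents.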
ICP and CARP can be used when there is more than one web proxy server inside an organisation’s intranet, or when a web proxy server is chained to other web proxy servers on the Internet. Examples are departmental web proxy servers (Luotonen, 1998) and the sibling-parent proxy features found in Squid (Pearson, 2000). With departmental web proxy servers, each departmental network has its own web proxy server, which is chained to the main web proxy servers that are directly connected to the Internet. Advanced web proxy server products can provide distributed caching, so that multiple web proxy servers act as a single logical web proxy server for load balancing and failover, and dynamic proxy routing, so that a web proxy server can query other caches to determine whether a document is available (Netscape Communication Inc., 1997, p. 22). A Squid proxy server can use other web proxy servers on the Internet (such as an ISP’s web proxy servers) as parent or sibling web proxy servers, forming a hierarchy of web proxy servers (Pearson, 2000).
Caches come in three different architectures, distinguished by how they store cached data, by the mapping mechanism that relates URLs to their cached copies, and by the format of the cached object content and its metadata (Luotonen, 1998, pp. 195-203). They are the CERN-style cache architecture used by the CERN httpd web proxy server, the Netscape-style cache architecture used by Netscape Proxy Server, and the Harvest-style cache architecture used by Squid.
Wallace (1998, pp. 47-48) argues that web proxy servers increase performance. One example is Sun Health, an ISP in Arizona, which claimed a cost reduction of up to US$1,200 per month from using web proxy servers instead of adding a new T1 connection to the Internet backbone to accelerate its Internet access. According to Raunak et al. (2000, p. 66), web proxy servers can reduce the average network traffic load by 50%, which improves network performance and reduces the latency of fulfilling users’ requests.
Caching can also be used on the web server side; this capability is called reverse proxying, or the reverse feature of web proxy servers. Reverse proxying refers to a setup in which the web proxy server is run in such a way that it appears to clients to be a normal web server (Luotonen, 1998). Galioto (1998, p. 1) describes how to set up and use the reverse proxy feature. It needs two web servers: one is required by the web proxy server and becomes the firewall computer, and the other is where the web site is published (the real web server). The web proxy server becomes the only computer connected to the Internet and receives requests directly from other computers. When a request comes from the Internet, the web proxy server serves it by fetching the document from the internal web server. The web proxy server can also use its caching feature to serve future requests for the same document. Figure 5 illustrates how the reverse feature of web proxy servers works.
Figure 5.
Reverse Proxying
(Adapted from Galioto, 1998, p. 2).
According to Luotonen (1998, pp. 326-328), the reasons to use reverse proxying are replication of content to geographically dispersed areas and replication of content for load balancing. Reverse web proxy servers can be used to establish several replicas of a single master server in geographically dispersed areas. This can also be a good way to protect the original web server against malicious attacks aimed at destroying it: the attack hits only the web proxy server, not the original web server. The second reason for reverse proxying is to balance the load of a heavily loaded web server. Client requests are distributed to multiple servers using one of the available load-balancing methods (such as DNS round robin, hash-function-based proxy selection, or hardware-based load balancing such as Cisco products). Figure 6 illustrates how load balancing is implemented using reverse web proxy servers and the DNS round robin method.
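The round robin idea can be sketched very simply. This is a simplified model rather than a DNS implementation: the function round_robin_resolver is hypothetical, the addresses are documentation-range placeholders, and the sketch hands out one address per lookup, whereas real DNS servers typically rotate the order of the address records they return.

```python
from itertools import cycle
from typing import Iterator, List


def round_robin_resolver(addresses: List[str]) -> Iterator[str]:
    """Hand out the reverse proxy addresses in turn, one per lookup."""
    return cycle(addresses)


# Three hypothetical reverse proxy replicas behind one host name.
resolver = round_robin_resolver(["192.0.2.10", "192.0.2.11", "192.0.2.12"])

# Six consecutive lookups are spread evenly over the three replicas.
answers = [next(resolver) for _ in range(6)]
```

The method distributes lookups evenly but takes no account of how busy each replica actually is.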
Figure 6.
DNS Round Robin Load Balancing
(Adapted from Netscape Communication Inc., 1997, p. 92).
The next feature of web proxy servers is filtering. A web proxy server is a good place to filter network traffic between the intranet and the Internet. It is the single point through which all requests pass, and the point of entry for all data entering the internal network (Luotonen, 1998). Luotonen (1998, pp. 213-224) describes several types of filtering that a web proxy server can perform. The first is URL filtering, in which the requested URL is matched against a set of patterns or checked against a precompiled list of known URLs, and the connection is rejected or granted accordingly. Another type is content filtering, in which the content of the requested document is checked against a precompiled list and handled accordingly. Filtering can be used to censor material that is not appropriate for the intranet environment or to block unwanted advertising. Content filtering can also be used to prevent computer viruses from entering the intranet (Netscape Communication Inc., 1997, p. 23).
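URL filtering against a set of patterns might look like the following sketch. The pattern list and the function is_allowed are hypothetical, and shell-style wildcards from Python’s fnmatch module are assumed as the pattern syntax; actual products define their own pattern formats.

```python
from fnmatch import fnmatchcase
from typing import Iterable

# Hypothetical block list using shell-style wildcards.
BLOCKED_PATTERNS = ["*.ads.example.com/*", "*/banner/*"]


def is_allowed(url: str, blocked_patterns: Iterable[str] = BLOCKED_PATTERNS) -> bool:
    """Grant the request unless the URL matches any blocked pattern."""
    return not any(fnmatchcase(url, pattern) for pattern in blocked_patterns)


ad_request = is_allowed("http://img.ads.example.com/x.gif")
banner_request = is_allowed("http://example.com/banner/top.gif")
page_request = is_allowed("http://example.com/index.html")
```

Here the first two requests are rejected and the third is granted. Content filtering would apply a similar check to the body of the response rather than to the requested URL.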
Another important role of web proxy servers is to monitor and control the network’s operation and performance. Large networks can use web proxy servers primarily to limit HTTP bandwidth across their Internet connection pipes, as well as to concentrate web client access through a single point. This allows for potential access control and lets accounting data be gathered for future analysis (Yerxa, 1998, p. 132). This role makes it possible to control and limit Internet bandwidth usage, as implemented at the University of Wollongong (Ah Chung Tsoi, 2000). Control is established by setting a quota that limits the amount of data downloaded through the web proxy servers. When a user has reached his or her quota, the web proxy servers deny further Internet access. Several commercial Internet service providers apply the same principle to charge an additional connection rate to customers who have consumed their entire quota and still wish to use the connection and download more data.
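A download quota of this kind can be sketched as a per-user byte counter. The class QuotaTracker, the user name, and the quota size are all assumptions made for illustration; they do not describe the University of Wollongong system.

```python
from collections import defaultdict
from typing import DefaultDict


class QuotaTracker:
    """Toy per-user download accounting for a web proxy server."""

    def __init__(self, quota_bytes: int) -> None:
        self.quota_bytes = quota_bytes
        self.used: DefaultDict[str, int] = defaultdict(int)

    def may_download(self, user: str) -> bool:
        # Access is denied once the user has reached the quota.
        return self.used[user] < self.quota_bytes

    def record(self, user: str, size_bytes: int) -> None:
        # Called by the proxy after each completed download.
        self.used[user] += size_bytes


tracker = QuotaTracker(quota_bytes=1000)
allowed_at_start = tracker.may_download("alice")
tracker.record("alice", 600)
allowed_after_first = tracker.may_download("alice")
tracker.record("alice", 500)
allowed_after_second = tracker.may_download("alice")
```

The same counters double as accounting data; a provider that charges for excess usage would bill on the recorded total instead of denying access.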
Another control feature is access control. Access control is needed when the organisation wants to ensure that particular intranet resources are accessed only by authorised persons, whether from inside or outside the organisation. Luotonen (1998, pp. 227-244) describes two kinds of access control. The first uses user authentication: users must authenticate themselves to the web proxy server before their requests are allowed to pass. An example is the University of Wollongong authentication mechanism (Rome, 1999). Wherever users are located, as long as they can provide the right password they can use the web proxy servers. The second method uses the client host address: requests are restricted based on the source address of the incoming request or the name of the requesting host.
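Access control by client host address can be sketched with Python’s standard ipaddress module. The allowed networks and the function client_allowed are hypothetical; a real proxy would read such rules from its configuration.

```python
import ipaddress

# Hypothetical list of networks whose clients may use the proxy.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.1.0/24"),
]


def client_allowed(client_address: str) -> bool:
    """Allow the request only if its source address is in an allowed network."""
    address = ipaddress.ip_address(client_address)
    return any(address in network for network in ALLOWED_NETWORKS)


inside_a = client_allowed("10.1.2.3")
inside_b = client_allowed("192.168.1.77")
outside = client_allowed("203.0.113.5")
```

Unlike password authentication, this check depends on where the request comes from rather than on who the user is, so the two methods are often combined.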
When an organisation wants to install and use web proxy servers, it has to determine the capacity, capability, and availability it requires of them. Taylor (2001) provides several questions that can be used as a guide to selecting and deploying web proxy server solutions.
The first question concerns the type of web proxy server needed. Taylor lists three types: proxy firewalls, which filter services at the application level; stateful packet inspection firewalls; and combinations of the two (Taylor, 2001).
The next group of questions concerns the scale and capacity of the web proxy servers. The first is the scale of the intranet that the web proxy servers will protect and serve, and the services needed. Organisations that require multiple firewalls managed from one location need an enterprise-class firewall. An enterprise firewall is a turnkey hardware or software device that has all components pre-installed and pre-configured as far as possible, and that manages a security policy for an entire enterprise. The second question in this group concerns the ability of the web proxy servers to remain highly available under abnormal circumstances. For example, if one server in a web proxy server array fails, how is its service delegated to the other web proxy servers in the network (Taylor, 2001)?
Luotonen (1998, pp. 291-313) provides further guidance on designing web proxy server solutions. The first consideration is the purpose of the web proxy servers, which can range from caching, security, and filtering to logging and monitoring. The second is their estimated load; the factors include the number of users, future growth, type of use, access times, and so on. The third is the proxy hierarchy: whether the structure is flat or hierarchical, and whether a single web proxy server or an array is used. The fourth is the hardware and software available on the market.
A web proxy server is a solution to the problems an intranet faces when connected to the Internet, such as limited and expensive Internet bandwidth, security issues, and access control. An intranet can benefit from using web proxy servers to connect to the Internet. Web proxy servers can improve the performance of the Internet connection by using caching to serve clients’ frequent requests for the same web documents, conserve expensive bandwidth, and reduce the latency and workload of the network (Raunak et al., 2000). Caching is combined with conditional GET to ensure that documents served to clients are up to date (Luotonen, 1998). Web proxy servers can protect intranet clients and resources from exposure to the Internet by requesting documents on behalf of clients or, in the case of reverse proxying, can protect web servers by serving requests from the Internet on their behalf (Luotonen, 1998).
REFERENCES
Ah Chung Tsoi. (2000) Internet Access Guidelines for Coursework Students, University of Wollongong, [Online], Available:
http://www.uow.edu.au/its/userguides/students/ip_accounting.html
Galioto, J. (1998) Publishing Your Web Site with IIS and Microsoft Proxy Server 2.0, Microsoft Web Builder, 3(10), pp. 1-4.
Gralla, P. (1996) How Intranets Work, Macmillan Computer Publishing, USA.
Hughes, L. (1998) Internet E-Mail: Protocols, Standards, and Implementation, Artech House, Boston.
Hohman, R.S. (1998) Cache Beats Back Bandwidth Blues, Network World, 15(46), p.45.
ICSA Labs. May (1999) Winroute Pro 4.1: Firewall Product Functional Summary, ICSA, [Online], Available:
_________, January (2001) Microsoft Internet Security and Acceleration Server: Firewall Product Functional Summary, ICSA, [Online], Available:
http://www.icsalabs.com/html/communities/firewalls/certification/vendors/microsoft/isas/pfd.pdf
Luotonen, A. (1998) Web Proxy Servers, Prentice Hall, New Jersey.
Netscape Communication Inc. (1997) Administrator’s Guide: Netscape Proxy Server Version 3.5, Netscape Communication Inc., [Online], Available:
http://help.netscape.com/products/server/proxy/index.html.
Pearson, O. (2000) Squid: A User’s Guide, Qualica Technologies, [Online], Available:
http://squid-docs.sourceforge.net/latest/html/book1.htm.
Raunak, S.M., P. Shenoy, P. Goyal, & K. Ramamritham. (2000) ‘Implication of Proxy Caching for Provisioning Networks and Servers’, Proceedings of The International Conference on Measurements and Modeling of Computer Systems, ACM, pp. 66-77.
Rome, D. (1999) Information Technology Services: Authentication, University of Wollongong, [Online], Available:
http://www.uow.edu.au/its/authen_hist.htm
Taylor, L. (2001) Considerations of Firewall: Part 1, ZDNet Asia, [Online], Available:
http://www.zdnetasia.com/biztech/security/story/0,2000010816,20192642,00.htm
Tiny Software Inc. (2000) Winroute Pro 4.1: Network Address Translating, Tiny Software Inc. [Online], Available:
http://www.tinysoftware.com/nat.php
Wallace, B. (1998) Web-caching Servers Cut Network Costs, Computerworld, 32(4), pp. 47-48.
Yeager, N.J. & R.B. McGrath (1996) Web Server Technology: The Advanced Guide for World Wide Web Information Providers, Morgan Kaufmann, San Francisco, pp. 177-220.
Yerxa, G. (1998) Problem Solving with Web Proxy Servers, Network Computing, 9(11), pp. 132-136.