
Smart Cache Manual
Chapter 7 - Advanced topics


7.1 Choosing a good cache size

How do you choose a good cache size? It depends on how you want to use the cache. Nowadays disk space is a cheap resource.

100 MB is the starting point: I never recommend a cache size smaller than 100 MB. If you want a smaller cache, you do not need Smart Cache; simply use the cache in your browser.

The next sections give some hints about choosing a cache size. What do you want to do?


7.1.1 Standard http-proxy

Standard mode of operation: leased-line Internet connection, no plans for offline browsing, proxy cache used to save bandwidth, about 50 computers connected to it.

In this case the cache should be large enough to hold about one month of traffic, with a 1 GB minimum. I do not recommend making it bigger, because in my tests this did not improve the cache hit rate. You can make it smaller, but do not go under 14 days of traffic unless you really need to. Never go under 7 days.
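
The sizing rule above can be turned into simple arithmetic. The following is only a sketch: it assumes that "one month" means 30 days and that 1 GB means 2^30 bytes, and the class and method names are made up for illustration (they are not part of Smart Cache).

```java
final class CacheSizeHint {
    /** 1 GB minimum from the text (assumed here to be 2^30 bytes). */
    static final long MIN_BYTES = 1L << 30;

    /** Bytes needed to hold the given number of days of traffic. */
    static long trafficBytes(long bytesPerDay, int days) {
        return bytesPerDay * days;
    }

    /** Recommended size: about 30 days of traffic, but never below 1 GB. */
    static long recommendedBytes(long bytesPerDay) {
        return Math.max(MIN_BYTES, trafficBytes(bytesPerDay, 30));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long perDay = 200 * mb; // hypothetical: 200 MB of proxy traffic per day
        System.out.println("recommended:  " + recommendedBytes(perDay) / mb + " MB");
        System.out.println("14-day floor: " + trafficBytes(perDay, 14) / mb + " MB");
        System.out.println("7-day floor:  " + trafficBytes(perDay, 7) / mb + " MB");
    }
}
```

The daily traffic figure has to come from your own measurements, for example from the proxy log files.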


7.1.2 Dial-up and offline browsing

Case #1: You do not browse much at home (you have a T1 line at work), you just want to see breaking news sometimes, and you are not mainly interested in offline browsing. A 200 MB cache is enough.

Case #2: If you like offline browsing, you need a bigger cache. Start with 400 MB if you do not browse very much. If you have some free disk space or you browse a lot, you can go up to 1 GB. In any case use Data compression support, Section 7.10.


7.1.3 Home network, sharing connection

Start with a cache size of 500 MB. You should also read the standard http-proxy section. Data compression support, Section 7.10 is your friend.


7.2 Configuring refresh patterns

You can configure how often Smart Cache checks whether a new version of a page is available. This is done with the keywords default_refresh_pattern and refresh_pattern in the configuration file. The only difference between them is that refresh_pattern takes a URL mask; the other arguments are the same.

If you want to experiment with them, setting trace_refresh yes and trace_url yes will give you more information about why a page is or is not being loaded. If you can read Java, open the file cacheobject.java and look at the function needRefresh().

Arguments of these commands are:

[default_]refresh_pattern [URL mask] /Reload_age/ /Min_age/ /Lastmod_factor/ /Max_age/ /Expire_age/ /Redirect_age/

These numbers are floating-point times in minutes, except /Lastmod_factor/, which is a fraction less than 1.

Smart Cache's refresh algorithm in plain English follows. Page age is the difference between the date when the page was loaded and today.

  1. If the browser requests a forced page reload and the page is older than /Reload_age/, reload it; otherwise return the old copy.
  2. If the page is older than /Max_age/, load it.
  3. If the page has an expiration date, has expired, and its age is bigger than /Expire_age/, load it.
  4. If the page is a redirect to another page and its age is bigger than /Redirect_age/, load it.
  5. If the page is younger than /Min_age/, return the cached copy.
  6. If the page does not have a last modified date, load it.
  7. Compute the last mod factor: lmf = page_age / (page_date - page_last_modified)
  8. If lmf > /Lastmod_factor/, load it; otherwise return the cached copy.
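
The steps above can be sketched in Java as follows. This is an illustration written from the list only, not the code of needRefresh() in cacheobject.java. The class, field and method names are hypothetical. It assumes that all times are minutes on a common clock, that page_date is the time the page was loaded, that a missing header is represented by NaN, and that a forced reload of a page younger than /Reload_age/ returns the cached copy.

```java
final class RefreshSketch {
    static final double ABSENT = Double.NaN; // marks a missing header (assumption)

    /** The six pattern arguments, in configuration file order. */
    static final class Pattern {
        final double reloadAge, minAge, lastmodFactor, maxAge, expireAge, redirectAge;
        Pattern(double reloadAge, double minAge, double lastmodFactor,
                double maxAge, double expireAge, double redirectAge) {
            this.reloadAge = reloadAge; this.minAge = minAge;
            this.lastmodFactor = lastmodFactor; this.maxAge = maxAge;
            this.expireAge = expireAge; this.redirectAge = redirectAge;
        }
    }

    /** Hypothetical view of a cached page. */
    static final class Page {
        final double loaded, lastModified, expires;
        final boolean redirect;
        Page(double loaded, double lastModified, double expires, boolean redirect) {
            this.loaded = loaded; this.lastModified = lastModified;
            this.expires = expires; this.redirect = redirect;
        }
    }

    /** Returns true if the page should be loaded from the origin server. */
    static boolean needRefresh(Pattern p, Page page, double now, boolean forcedReload) {
        double age = now - page.loaded;
        if (forcedReload) return age > p.reloadAge;                       // step 1
        if (age > p.maxAge) return true;                                  // step 2
        if (!Double.isNaN(page.expires) && now > page.expires
                && age > p.expireAge) return true;                        // step 3
        if (page.redirect && age > p.redirectAge) return true;            // step 4
        if (age < p.minAge) return false;                                 // step 5
        if (Double.isNaN(page.lastModified)) return true;                 // step 6
        double lmf = age / (page.loaded - page.lastModified);             // step 7
        return lmf > p.lastmodFactor;                                     // step 8
    }

    public static void main(String[] args) {
        Pattern p = new Pattern(1, 10, 0.2, 1000, 5, 60);
        // Loaded at minute 100000, last modified 1000 minutes earlier, no Expires header.
        Page page = new Page(100000, 99000, ABSENT, false);
        System.out.println(needRefresh(p, page, 100100, false)); // lmf = 0.1
        System.out.println(needRefresh(p, page, 100300, false)); // lmf = 0.3
    }
}
```

With /Lastmod_factor/ set to 0.2, a page that had been unchanged for 1000 minutes when it was loaded is served from the cache until it is 200 minutes old (unless another rule fires first).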

7.3 How garbage collection works

Smart Cache uses true LRU-based garbage collection. It remembers the last access time of every object in the cache. When GC runs, the last access time of every object is transformed into a score. For example, if an object has not been accessed for 2 days, its score is 2 points.

An object's score is then modified by various attributes of the object (for example size or expiration age); size rules and the best matching penalty rule are applied. The priority of penalty rules follows their order in the sample configuration file. The last step is to apply the first matching urlmask rule (if any). Rules can be fine-tuned in great detail in gc.cnf.

If the score is bigger than reference_age, or the object is bigger than maximum_object_size or smaller than minimum_object_size, the object is immediately removed from the cache, regardless of the cache size.

After all objects in the cache have been scanned and scored, they are sorted by score. When the cache size is bigger than the cache high mark, GC starts cleaning, removing objects with the highest score first until the size of the cache drops between the high and low marks. GC prefers to clean as much as possible so that another cache scan is not needed soon, but it never deletes files below the low mark.
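
The two GC phases described above can be sketched as follows. This is a simplified model, not the Smart Cache implementation: the class and method names are hypothetical, each entry already carries its final score (the gc.cnf rules are not modelled), and the sketch stops at the first object whose removal would take the cache below the low mark.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class GcSketch {
    /** Hypothetical cache entry: name, size in bytes, final score in days. */
    static final class Entry {
        final String name; final long size; final double score;
        Entry(String name, long size, double score) {
            this.name = name; this.size = size; this.score = score;
        }
    }

    /** Returns the names of the entries to delete, in deletion order. */
    static List<String> collect(List<Entry> cache, double referenceAge,
                                long minObjectSize, long maxObjectSize,
                                long highMark, long lowMark) {
        List<String> deleted = new ArrayList<>();
        List<Entry> kept = new ArrayList<>();
        long total = 0;
        // Phase 1: unconditional removal, independent of the cache size.
        for (Entry e : cache) {
            if (e.score > referenceAge || e.size > maxObjectSize || e.size < minObjectSize) {
                deleted.add(e.name);
            } else {
                kept.add(e);
                total += e.size;
            }
        }
        // Phase 2: only runs when the cache is above the high mark.
        if (total <= highMark) return deleted;
        kept.sort(Comparator.comparingDouble((Entry e) -> e.score).reversed());
        for (Entry e : kept) {
            if (total - e.size < lowMark) break; // never go below the low mark
            deleted.add(e.name);
            total -= e.size;
        }
        return deleted;
    }

    public static void main(String[] args) {
        List<Entry> cache = Arrays.asList(
                new Entry("a", 100, 1), new Entry("b", 100, 5), new Entry("c", 100, 3),
                new Entry("d", 100, 40), new Entry("e", 5, 2));
        System.out.println(collect(cache, 30, 10, 1000, 250, 150));
    }
}
```

In the example, "d" exceeds reference_age and "e" is below minimum_object_size, so both go in the first phase; the remaining 300 bytes are above the high mark of 250, so the highest-scored object "b" is removed as well.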


7.4 Cookies filtering

Some Web sites use so-called "cookies". These "cookies" are tags sent from the server to the browser, which enable the server to keep track of the sites that the user visits, and thus compromise the user's privacy.

At the request of many users, Smart Cache now has a built-in cookie filter. The filter has two working modes: incoming and outgoing. The mode is selected with allow_all_session_cookies.

In both modes browsers still detect incoming cookies, so they will continue to display warnings about them. Just turn these warnings off. If you use Netscape 3+, you can disable confirmation messages for any cookies sent to your browser by going into Options->Network_Preferences->Protocols and unchecking the box for Show an Alert before Accepting a cookie.


7.4.1 Outgoing cookie filter

This mode is set by allow_all_session_cookies false. In this mode all cookies sent by your browser are filtered out unless the domain name is allowed in cookies.cnf.

Benefits of this solution:

DANGER: When using the fake_cookie option (the cookie filter itself does no harm), you can CRASH a remote WWW site by sending back very long cookies (buffer overflow attack) or cookies with a known name but an unexpected value (for example text instead of numeric input). Some versions of Microsoft Internet Information Server will crash entirely (instead of just one HTTP child process, as in Apache), so no new users can access the server until IIS is restarted.


7.4.2 Incoming cookie filter

This mode is set by allow_all_session_cookies true. In this mode all session cookies are allowed from all sites, and persistent cookies are allowed only from sites listed in cookies.cnf. If a persistent cookie is not allowed, it is changed to a session cookie. This is enough to keep cookies-only WWW servers happy.
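
One common way to turn a persistent cookie into a session cookie is to drop the Expires and Max-Age attributes from the Set-Cookie header, because a cookie without them lives only until the browser is closed. The sketch below illustrates that idea; it is an assumption about how such a conversion can be done, not a description of Smart Cache internals, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class SessionCookieSketch {
    /** Removes Expires and Max-Age from the value of a Set-Cookie header. */
    static String toSessionCookie(String setCookieValue) {
        String[] parts = setCookieValue.split(";");
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < parts.length; i++) {
            String attr = parts[i].trim();
            if (attr.isEmpty()) continue;
            String lower = attr.toLowerCase(Locale.ROOT);
            // The first part is the cookie's own name=value pair: always keep it.
            boolean lifetime = i > 0
                    && (lower.startsWith("expires=") || lower.startsWith("max-age="));
            if (!lifetime) kept.add(attr);
        }
        return String.join("; ", kept);
    }

    public static void main(String[] args) {
        String header = "id=42; Expires=Wed, 09 Jun 2021 10:18:14 GMT; Path=/";
        System.out.println(toSessionCookie(header));
    }
}
```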

It is not a good idea to switch the filter from outgoing to incoming mode without first deleting all cookies stored in the browser.

Benefits of this solution:

Drawbacks:


7.5 Setting up logfiles

Smart Cache produces log files in common or combined format. The log type is selected with log_common.

Logs are selected by wildcard mask, so you can write to multiple logs depending on the URL requested. This was written especially for use in Web forwarding with Smart Cache, Section 9.7, but logs can be produced even if no forwarding is set up.

Usage: access_log mask filename


7.6 Importing files to Smart Cache

If you are using SC for offline browsing, you may sometimes find it useful to import files from outside sources (for example, some CDs have offline copies of some servers).

Place these files in a directory structure that has exactly the same names as the original URLs of the files. For example, in /tmp make the directory /tmp/www.javaworld.com/javaworld/jw-10-1999/ and copy the necessary files into it. You can create any number of directories (even for different servers).

After that, run java scache -import /tmp and the files will be imported.
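
The mapping from a URL to its place in the import directory can be sketched as below. The helper name is hypothetical and the file name index.html in the example is made up; how Smart Cache treats ports, query strings or directory index pages is not covered by this sketch.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

final class ImportPathSketch {
    /** Maps a URL to import-root/host/path, following the layout described above. */
    static Path importPath(Path importRoot, String url) {
        URI uri = URI.create(url);
        String path = uri.getPath();
        // Drop the leading slash so the path resolves below the host directory.
        String relative = path.startsWith("/") ? path.substring(1) : path;
        return importRoot.resolve(uri.getHost()).resolve(relative);
    }

    public static void main(String[] args) {
        System.out.println(importPath(Paths.get("/tmp"),
                "http://www.javaworld.com/javaworld/jw-10-1999/index.html"));
    }
}
```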

Smart Cache will check whether a newer version of the imported file is already available in the cache. If not, the file is moved (preserving its timestamp) to the cache. If the move fails because the file and the cache directory are on different filesystems (disks), SC copies the file instead. It is wise to place the files on the same filesystem as the SC main data directory and to clear any READ-ONLY attributes.


7.7 Exporting data from Smart Cache

Cached data can be easily exported from Smart Cache. Data is exported in a format that can be used in Importing files to Smart Cache, Section 7.6.

Smart Cache can export the most recently cached data in three modes:

  1. -export uses the file's last modified date.
  2. -fullexport uses the date when the file was last checked against the original server.
  3. -lruexport uses the file's last access date.

Command line syntax is -[lru|full]export <Directory> <Timedelta>

The time delta is written as a number followed by a unit, with no space between them (for example, "1w" is one week). Supported time units are: d/D days, w/W weeks, m minutes, M months, y/Y years, h/H hours.
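
As an illustration of the time delta format, the sketch below parses a delta and computes the cutoff date before which data would be considered too old. It is not Smart Cache's parser: the names are hypothetical, and it assumes that the number is a whole number and that months and years are calendar units.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

final class TimeDeltaSketch {
    /** Maps a unit letter to a time unit; note that m and M differ. */
    static ChronoUnit unitFor(char c) {
        switch (c) {
            case 'd': case 'D': return ChronoUnit.DAYS;
            case 'w': case 'W': return ChronoUnit.WEEKS;
            case 'm': return ChronoUnit.MINUTES;
            case 'M': return ChronoUnit.MONTHS;
            case 'y': case 'Y': return ChronoUnit.YEARS;
            case 'h': case 'H': return ChronoUnit.HOURS;
            default: throw new IllegalArgumentException("unknown time unit: " + c);
        }
    }

    /** Returns now minus the delta, e.g. "1w" gives the moment one week ago. */
    static LocalDateTime cutoff(String delta, LocalDateTime now) {
        if (delta.length() < 2) {
            throw new IllegalArgumentException("expected number and unit: " + delta);
        }
        int last = delta.length() - 1;
        long amount = Long.parseLong(delta.substring(0, last));
        return now.minus(amount, unitFor(delta.charAt(last)));
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.now();
        System.out.println("1w ago:  " + cutoff("1w", now));
        System.out.println("30m ago: " + cutoff("30m", now));
    }
}
```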


7.8 Using Smart Cache for non-HTTP protocols

When Smart Cache gets a request for an unknown protocol (for example ftp) and http_proxy is set, SC will forward the request to the parent proxy and cache the received result. See also Smart Cache limitations, Section 12.3.


7.9 Parent proxy authentication

A parent proxy login and password may be specified in the "http_proxy" configuration statement, after the parent proxy port, in the form login:password.

     Example:
     http_proxy my.cache.net 3128 mylogin:mypass
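
For background, HTTP proxies that use the Basic scheme expect the login:password pair Base64-encoded in a Proxy-Authorization request header. Whether Smart Cache uses the Basic scheme is an assumption here; the sketch only shows how such a header is formed, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class ProxyAuthSketch {
    static final String PREFIX = "Proxy-Authorization: Basic ";

    /** Builds a Basic proxy authentication header from "login:password". */
    static String proxyAuthorization(String credentials) {
        byte[] raw = credentials.getBytes(StandardCharsets.ISO_8859_1);
        return PREFIX + Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) {
        // Same credentials as in the http_proxy example above.
        System.out.println(proxyAuthorization("mylogin:mypass"));
    }
}
```

Base64 is an encoding, not encryption, so the password travels to the parent proxy in an easily readable form.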

7.10 Data compression support

Smart Cache can compress incoming text data, which saves a significant amount (about 50%) of disk space. To enable this, set auto_compress 1 in scache.cnf, which compresses files greater than 512 bytes. You can also set a file size limit instead of 1; for example, auto_compress 20000 will compress files bigger than 20k. It is recommended to use your filesystem block size as the limit for compressing files.
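
The size rule for auto_compress can be sketched as follows, together with a plain gzip helper from the Java standard library. The names are hypothetical, the sketch assumes that a value of 0 or less disables compression, and it does not model the check that the data is text.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

final class AutoCompressSketch {
    /** Applies the auto_compress setting to a file of the given size in bytes. */
    static boolean shouldCompress(long fileSize, long autoCompress) {
        if (autoCompress <= 0) return false;                 // assumed: disabled
        long limit = (autoCompress == 1) ? 512 : autoCompress;
        return fileSize > limit;
    }

    /** Compresses data with gzip, the format sent on to the browser. */
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(data);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        StringBuilder page = new StringBuilder();
        for (int i = 0; i < 200; i++) page.append("<p>some repetitive text</p>\n");
        byte[] raw = page.toString().getBytes("UTF-8");
        if (shouldCompress(raw.length, 1)) {
            System.out.println(raw.length + " -> " + gzip(raw).length + " bytes");
        }
    }
}
```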

This data will be sent to your browser in gzip-compressed form, so your browser must know how to decompress it. Existing data can be compressed with the Smart Cache repair utility, Chapter 11.

If you see garbage on the screen, your browser cannot handle compressed data. If you still want to use data compression, set auto_decompress 1. Smart Cache will then decompress data if your browser does not send an Accept-Encoding: gzip header. Some browsers do not send this header but still accept compressed data, so use auto_decompress only if necessary. If you want to ALWAYS decompress outgoing data, use the value 2.
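
The decision described above can be written down as a small function. This is a sketch with hypothetical names; it assumes that any other value of auto_decompress means "never decompress" and it does not handle quality values such as gzip;q=0.

```java
import java.util.Locale;

final class AutoDecompressSketch {
    /**
     * Decides whether compressed cached data must be decompressed before
     * it is sent to the browser. acceptEncoding may be null if the browser
     * did not send the header.
     */
    static boolean mustDecompress(int autoDecompress, String acceptEncoding) {
        if (autoDecompress == 2) return true;            // always decompress
        if (autoDecompress == 1) {                       // only for browsers without gzip
            return acceptEncoding == null
                    || !acceptEncoding.toLowerCase(Locale.ROOT).contains("gzip");
        }
        return false;                                    // assumed: send data as stored
    }

    public static void main(String[] args) {
        System.out.println(mustDecompress(1, "gzip, deflate"));
        System.out.println(mustDecompress(1, null));
        System.out.println(mustDecompress(2, "gzip"));
    }
}
```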

Any modern browser supports compressed HTML pages. Browsers reported to support compressed HTML pages:

Browsers which DO NOT SUPPORT compressed pages:

I have no information about browsers on other OSes (Win, Mac). If you have any, you can send it to me.



0.84
Radim Kolar hsn@cybermail.net