Specification

 

The Clipen engine is composed of the following parts:

  1. RDBMS - a third-party relational database management system that stores and manages the Clipen database. The management system also supports clickstream data post-processing within the Clipen database and output data extraction through appropriate database connectivity. The RDBMS is an essential part of Clipen technology and guarantees output data consistency using transaction processing. The Clipen engine's performance relies to a large degree upon the performance of the RDBMS.
  2. clipsnif - the Clipen sniffer creates snoop records of the captured network communication between client applications and the tracked website server(s). The records make up the input clickstream data that are processed by Clipen node(s) and stored in the Clipen database.
  3. clipsrv - the Clipen server controls the operation of the entire Clipen engine and provides tools for Clipen administration by a human user. The essential functionality of the Clipen server includes:
    • distribution and supervision of processing tasks
    • post-processing control
    • handling of the extraction window
    • engine updating
    • exception handling and error propagation
  4. clipnode - Clipen nodes perform parallel processing of input clickstream data. The data are divided into parallel tasks and processed in a task-per-node manner using approximately as many nodes as there are CPUs (or processor cores) available within the Clipen operating platform. The Clipen node is the "workhorse" of the engine.
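
The task-per-node scheme described for clipnode can be illustrated with a minimal Python sketch. It is not Clipen's implementation: the names split_into_tasks, process_task and run_nodes are hypothetical, worker processes stand in for Clipen nodes, and the per-task work is a placeholder that only counts records.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def split_into_tasks(records, node_count):
    """Divide records into at most node_count contiguous, near-equal tasks."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    size, extra = divmod(len(records), node_count)
    tasks, start = [], 0
    for i in range(node_count):
        # The first `extra` tasks take one additional record each.
        end = start + size + (1 if i < extra else 0)
        if end > start:
            tasks.append(records[start:end])
        start = end
    return tasks


def process_task(task):
    """Placeholder node work (assumption): count the records in one task."""
    return len(task)


def run_nodes(records, node_count=None):
    """Process each task in its own worker process, one task per node."""
    # Default to roughly one node per CPU or core, as the text describes.
    node_count = node_count or os.cpu_count() or 1
    tasks = split_into_tasks(records, node_count)
    if not tasks:
        return []
    with ProcessPoolExecutor(max_workers=len(tasks)) as pool:
        return list(pool.map(process_task, tasks))


if __name__ == "__main__":
    print(run_nodes(list(range(10)), node_count=4))
```

Splitting into contiguous, near-equal tasks keeps every node busy for about the same time; a real engine would more likely split on session or time boundaries so that related hits stay in one task.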

The clickstream data processing within Clipen provides the following main functions:

  • Session Tracking - Because HTTP is a stateless protocol, websites implement a session tracking mechanism to distinguish the user sessions of individual website users. Clipen currently supports session tracking using cookies in order to correctly identify sessions in the tracked website's clickstream data. If no session identifier is available within an HTTP request, Clipen assigns a pseudo-identifier to the request so that it can still be processed into a session. Such sessions are called unidentified sessions because they lack the regular session identifier used by the tracked website.
  • Authentication Tracking - HTTP authentication (see RFC 2617 for details) is a widely used method of website user authentication. Clipen currently supports the Basic Authentication Scheme in order to access the website user authentication name and verify the result of the authentication process.
  • User Tracking - Websites use persistent user identifiers to identify unique website users across multiple user sessions (to determine repeated website visits etc.). Clipen currently supports user tracking using persistent cookies, which are the most suitable way of carrying unique user identifiers.
  • Login Tracking - Most websites offer a user login to provide appropriate website content or services. Clipen provides a login tracking mechanism that, in conjunction with a custom Perl routine, tracks the login process in order to acquire the login name of the user once the login succeeds.
  • Content Tracking - Clipen provides the Generic Content Tracking™ mechanism that creates unique website content identifiers and ties them to the resource URLs received in HTTP requests sent to the tracked website. Using the ClipConf tool, Clipen users may import the content coding index and cross-reference its content descriptors to the generic content identifiers produced by Clipen (alternatively, Clipen users may benefit from content identifiers already embedded within website URLs by the user's content management application and access them using Clipen customization).
  • DNS Resolving - Clipen performs reverse IP address translations of client IP addresses acquired from the processed clickstream data. The results of the resolving are stored in the Clipen DNS cache within the Clipen database to prevent repeated translation (the cache is refreshed as required).
  • Robot Detection - Robot detection within the Clipen engine uses several techniques to detect automated agents and label them so that they can be treated separately from human users' clickstream data or ignored entirely. As a part of robot detection, Clipen also identifies the website user's browser from the "User-Agent" HTTP header value.
  • Page Recognition - As web analytics applications benefit largely from knowledge of so-called page views, Clipen performs page/non-page recognition for each hit to identify page content and so support that broad need (alternatively, Clipen users may use customization and specify other recognition algorithms).
  • Dwell Time Specification - Clipen uses its knowledge of page/non-page content, together with the true delivery time of the content, to determine the page view dwell time precisely. As pages usually contain embedded objects such as images and style sheets that are downloaded in separate subrequests of a given page, Clipen is able to determine accurately the time at which the complete page content came into view and consequently the page view dwell time (Clipen also indicates if the page content was not fully delivered or was not delivered at all).
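
The dwell time reasoning above can be sketched in Python under stated assumptions. This is only an illustration, not Clipen's algorithm: the Hit and PageView types and the page_views function are hypothetical, embedded objects are assumed to be the non-page hits that follow a page within one session, and dwell time is assumed to run from complete delivery of a page until the next page is requested.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    url: str
    is_page: bool                    # result of page/non-page recognition
    request_time: float              # seconds, when the request was seen
    delivered_time: Optional[float]  # when delivery finished, None if never


@dataclass
class PageView:
    url: str
    complete_time: Optional[float]   # when the last delivered part arrived
    fully_delivered: bool
    dwell_time: Optional[float]      # None when it cannot be determined


def page_views(hits: List[Hit]) -> List[PageView]:
    # Group each page hit with the non-page hits that follow it (assumption).
    groups: List[List[Hit]] = []
    for hit in sorted(hits, key=lambda h: h.request_time):
        if hit.is_page:
            groups.append([hit])
        elif groups:
            groups[-1].append(hit)
        # Non-page hits seen before any page are ignored in this sketch.

    views = []
    for i, group in enumerate(groups):
        times = [h.delivered_time for h in group if h.delivered_time is not None]
        complete = max(times) if times else None
        fully = len(times) == len(group)
        dwell = None
        if complete is not None and i + 1 < len(groups):
            # Dwell runs from complete delivery to the next page request.
            dwell = max(0.0, groups[i + 1][0].request_time - complete)
        views.append(PageView(group[0].url, complete, fully, dwell))
    return views
```

Measuring from the delivery of the last embedded object, rather than from the page request itself, is what keeps slow downloads from being counted as time spent viewing the page. The last page view of a session has no following request, so its dwell time is left undetermined here.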
 

This product includes software developed by Inprise Corporation.
This product includes software developed by the OpenSSL Project for use in the OpenSSL Toolkit. (http://www.openssl.org/).
This product includes cryptographic software written by Eric Young (eay@cryptsoft.com).
This product includes software developed by Eric Rescorla for RTFM, Inc.