Actionable Web Log Mining

Web sites around the world attract a vast amount of traffic every day, and each visit leaves behind traversal information in the form of Web server logs or query logs. Analysis of these logs can yield various kinds of knowledge that can be applied to improve the performance of Web services. A particularly useful kind is knowledge that can be applied immediately to the operation of the Web site. We call this actionable knowledge.
(Find an example of web log here)
In this blog post, I present three examples of actionable Web log mining.

The first method is to mine a Web log for Markov models that can be used to improve caching and pre-fetching of Web objects.
By analysing Web logs we can identify a user's behaviour and pre-fetch the Web objects that user is likely to request next. For example, if visitors to my Web site usually open a particular heavy application page after they finish with the dashboard or home page, I can pre-fetch that heavy page before they ask for it.
Steps involved in this method:
1. Data cleaning on Web log data:
First, identify each user by an individual IP address. Then break each user's long sequence of visits into user sessions. Session boundaries are estimated from the time interval between two successive visits: if the interval exceeds the average interval (or some fixed threshold), the two visits are treated as belonging to two different sessions. [LLLY 2002] provides a better method for finding user sessions.
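The session-splitting step can be sketched in a few lines of Python. This is only a minimal illustration of the fixed-threshold idea, not the method of [LLLY 2002]. The function name `split_sessions`, the `(ip, timestamp, url)` record layout and the 30-minute default gap are my own assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def split_sessions(records, gap=timedelta(minutes=30)):
    """Group (ip, timestamp, url) records into per-user sessions.

    Assumption: one IP address is one user, and a new session starts
    whenever two successive requests from that IP are more than `gap` apart.
    """
    by_ip = defaultdict(list)
    for ip, ts, url in records:
        by_ip[ip].append((ts, url))

    sessions = []
    for ip, visits in by_ip.items():
        visits.sort(key=lambda v: v[0])          # order each user's visits by time
        current = [visits[0][1]]
        for (prev_ts, _), (ts, url) in zip(visits, visits[1:]):
            if ts - prev_ts > gap:               # long pause: close the session
                sessions.append((ip, current))
                current = []
            current.append(url)
        sessions.append((ip, current))
    return sessions

# Hypothetical, hand-written records (not taken from a real log).
records = [
    ("10.0.0.1", datetime(2024, 1, 1, 10, 0), "/home"),
    ("10.0.0.1", datetime(2024, 1, 1, 10, 5), "/dashboard"),
    ("10.0.0.2", datetime(2024, 1, 1, 10, 1), "/home"),
    ("10.0.0.1", datetime(2024, 1, 1, 11, 30), "/home"),
]
sessions = split_sessions(records)
```

With these sample records the first IP ends up with two sessions, because its last request comes 85 minutes after the previous one.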
2. Mining Web Logs for Path Profiles
Now that we have separate visiting sessions, we can develop path profiles from them, because a user visiting a sequence of Web pages leaves a trail of the pages' URLs in the Web log. A path profile consists of the frequent subsequences found in the frequently occurring paths. [AZN 1999] designed a system to learn an Nth-order Markov model.
A path profile helps us predict the pages that are most likely to be requested next.
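To make the idea concrete, here is a rough sketch of a path profile as a simple count-based Nth-order Markov model. It is not the system described in [AZN 1999]. The names `build_path_profile` and `predict_next`, and the sample sessions, are assumptions made up for illustration.

```python
from collections import Counter, defaultdict

def build_path_profile(sessions, order=1):
    """Count which page follows each run of `order` consecutive pages."""
    profile = defaultdict(Counter)
    for pages in sessions:
        for i in range(len(pages) - order):
            context = tuple(pages[i:i + order])      # the last `order` pages seen
            profile[context][pages[i + order]] += 1  # the page that came next
    return profile

def predict_next(profile, recent_pages, order=1, k=1):
    """Return up to k likely next pages with their estimated probabilities."""
    context = tuple(recent_pages[-order:])
    counts = profile.get(context)
    if not counts:
        return []                                    # context never seen in the log
    total = sum(counts.values())
    return [(page, n / total) for page, n in counts.most_common(k)]

# Hypothetical sessions, each a list of page URLs in visiting order.
sessions = [
    ["/home", "/dashboard", "/reports"],
    ["/home", "/dashboard", "/reports"],
    ["/home", "/dashboard", "/settings"],
]
profile = build_path_profile(sessions, order=2)
prediction = predict_next(profile, ["/home", "/dashboard"], order=2)
```

Here `/reports` follows the path `/home`, `/dashboard` in two of the three sessions, so it is predicted with an estimated probability of 2/3. A higher `order` gives more specific predictions but needs more log data to see each context often enough.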
3. Pre-fetching Web Objects
Now that we know which page is likely to be requested next, we can pre-fetch that Web object. The original cache memory is partitioned into two parts: a cache buffer and a pre-fetching buffer. A pre-fetching agent (a script) keeps pre-loading the pre-fetching buffer with the documents predicted to be accessed next.
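The two-part cache can be sketched as below. This is a simplified illustration under my own assumptions: the class name `PrefetchingCache`, the `fetch` and `predict` callables, the buffer sizes and the eviction policies are all hypothetical, and pre-fetching runs synchronously here, whereas a real agent would load pages in the background.

```python
from collections import OrderedDict

class PrefetchingCache:
    def __init__(self, fetch, predict, cache_size=100, prefetch_size=20):
        self.fetch = fetch                # hypothetical callable: url -> document
        self.predict = predict            # hypothetical callable: url -> likely next urls
        self.cache = OrderedDict()        # cache buffer (least recently used is evicted)
        self.prefetched = OrderedDict()   # pre-fetching buffer (oldest entry is evicted)
        self.cache_size = cache_size
        self.prefetch_size = prefetch_size

    def _put(self, buffer, limit, url, doc):
        buffer[url] = doc
        buffer.move_to_end(url)           # mark as most recent
        while len(buffer) > limit:
            buffer.popitem(last=False)    # drop the oldest entry

    def get(self, url):
        if url in self.cache:
            doc = self.cache[url]
        elif url in self.prefetched:
            doc = self.prefetched.pop(url)   # prediction was right: promote it
        else:
            doc = self.fetch(url)            # miss in both buffers
        self._put(self.cache, self.cache_size, url, doc)
        # Pre-load the pages predicted to follow this one.
        for nxt in self.predict(url):
            if nxt not in self.cache and nxt not in self.prefetched:
                self._put(self.prefetched, self.prefetch_size, nxt, self.fetch(nxt))
        return doc

# Stand-ins for a real back end and a real path profile.
fetch_log = []
def fetch(url):
    fetch_log.append(url)
    return "doc:" + url

def predict(url):
    return {"/home": ["/dashboard"]}.get(url, [])

cache = PrefetchingCache(fetch, predict)
first = cache.get("/home")        # fetches /home and pre-fetches /dashboard
second = cache.get("/dashboard")  # served from the pre-fetching buffer
```

Keeping the two buffers separate means a run of wrong predictions can only push out other predictions, never pages that users have actually requested.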

The second method is to use the mined knowledge (the path profiles built in the previous method) to build better, adaptive user interfaces. The new user interface can adjust as user behaviour changes over time.
For example, I can provide a quick menu of links to the pages predicted by the path profile.
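A quick menu of this kind could look roughly like the sketch below. The functions `quick_menu` and `record_visit`, and the transition counts, are invented for illustration. The counts stand in for a first-order path profile mined from the log.

```python
from collections import Counter

def quick_menu(transitions, current_page, size=3):
    """Pick the `size` most frequent next pages as quick-menu links."""
    followers = transitions.get(current_page, Counter())
    return [page for page, _ in followers.most_common(size)]

def record_visit(transitions, from_page, to_page):
    """Update the counts so the menu adapts as behaviour changes."""
    transitions.setdefault(from_page, Counter())[to_page] += 1

# Hypothetical counts of which page users opened after /home.
transitions = {
    "/home": Counter({"/dashboard": 5, "/reports": 3, "/settings": 2, "/help": 1}),
}
menu_before = quick_menu(transitions, "/home")

# Users start opening /help much more often, so the menu changes.
for _ in range(5):
    record_visit(transitions, "/home", "/help")
menu_after = quick_menu(transitions, "/home")
```

At first the menu lists `/dashboard`, `/reports` and `/settings`. After the extra visits, `/help` moves to the top and `/settings` drops out.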

Finally, I present an example of applying Web query log knowledge to improve Web search for a search engine application.
Web query log mining involves the following steps: Mining Generalized Query Patterns, Bottom-Up Generalization Algorithms, A Hierarchy over Keywords, Flexible Generalization, Morphology Conversion and Synonym Conversion.

I know this short post will not be enough to let you start working on this right away, but you can dig deeper in the research paper Mining Web Logs for Actionable Knowledge by Qiang Yang, Charles X. Ling and Jianfeng Gao.

[LLLY 2002] Lou, W., Liu, G., Lu, H. and Yang, Q. (2002). Cut-and-Pick Transactions for Proxy Log Mining. In Proceedings of the 2002 Conference on Extending Database Technology, March 24-28, 2002, Prague.
[AZN 1999] Albrecht, D., Zukerman, I. and Nicholson, A. (1999). Pre-sending Documents on the WWW: A Comparative Study. In Proceedings of the 1999 International Conference on Artificial Intelligence, IJCAI99, pp. 1274-1279, Sweden.

Deependra Singh