(Contact Info: larry at larryblakeley dot com)
Hsinchun Chen, Ph.D.
Founder, Knowledge Computing Corporation
Director, Artificial Intelligence Lab
Director, Mark and Susan Hoffman E-Commerce Lab
Department of Management Information Systems
Eller College of Business and Public Administration
The University of Arizona
Tucson, Arizona
Trailblazing a Path Towards Knowledge and Transformation
E-Library, E-Government and E-Commerce
Knowledge Computing Corporation
Hsinchun Chen, Ph.D., 2002
Excerpts:
Foreword
This book is the follow-up to an earlier book in 2001 entitled: Knowledge Management Systems: A Text Mining Perspective. We wish to turn the readers' attention from this general knowledge management discussion to specific examples and case studies in digital library, digital government, and e-commerce applications. We want to demonstrate the potential of using IT as a catalyst for trailblazing a path towards knowledge and transformation in various organizations.
We stress the importance for researchers and practitioners to become agents of transformation for their organizations. We also discuss opportunities for researchers and practitioners to help convert data and information (overload) into knowledge (asset) for libraries, governments, and businesses. In particular, we believe advanced Internet-enabled information technologies, especially knowledge portals, data mining, text mining, recommender systems, and visualization, could become a catalyst for transforming libraries, governments, and businesses.
Introduction
The Internet is changing the way we live and do business. Since the first ARPANET node was installed at UCLA on September 1, 1969, and the first paper on the Internet was written by Vint Cerf and Bob Kahn on September 10, 1973 (Cerf, 2002), the Internet has evolved from supporting FTP file transfer, Gopher information services, and email exchange to supporting seamless multimedia content creation, access, and transactions over the World Wide Web.
Some researchers and practitioners believe that business, technology, and society in general are in a true "Digital Renaissance" (Fiorina, 2000). As Hewlett-Packard (HP) CEO Carly Fiorina put it:
Like the first Renaissance, which was the liberation of the inventive imagination, the Digital Renaissance is about the empowerment of the individual and the consumer. And if we can bridge the gap between business, science, and government so that we can all understand and foster the Digital Renaissance, then we have a chance to make this second Renaissance truly global and grassroots.
Using HP as an example, Fiorina suggested three emerging forces in the technology and business landscape: information appliances, always-on IT infrastructure, and e-services. Information appliances are anything with a chip inside that can connect to the Internet. The always-on IT infrastructure needs to be as available and reliable as tap water and electricity. E-services will take any process or any asset that can be digitized and deliver it over the Web.
The Internet offers a tremendous opportunity for many traditional institutions, such as libraries, governments, and businesses, to better deliver their content and services and to interact with their constituents: citizens, patrons, businesses, and other government partners. In addition to providing information, communication, and transaction services, an exciting and innovative transformation could occur with new technologies and practices. Data and information can begin to become knowledge assets. Digital library (e-library), digital government (e-government), and e-commerce research have many common threads, yet each faces some unique challenges and opportunities. We hope that by reviewing how these three fields have evolved over the past decade and examining the lessons learned, we will be in a better position to understand the power of technology and possibly identify a path to transformation and knowledge for a truly global Digital Renaissance.
We believe information technologies such as the Internet, WWW, data mining, knowledge portals, text mining, recommender systems, and visualization are best considered as the catalyst for creating a human-driven, system-assisted transformation process rather than as "silver bullets" for solving an institution's basic problems. IT cannot be effective if not implemented and utilized properly by its owners and users, nor can it succeed without considering its larger organizational and social context. Over the past decade, we have seen many excellent examples of fundamental transformation occur in many organizations with the help of new IT deployment, from e-commerce to digital library and digital government.
Knowledge Management Systems: A Text Mining Perspective
Knowledge Computing Corporation
Hsinchun Chen, Ph.D., 2001
Excerpts:
A high-level, although systematic, discussion of text mining is presented. Unlike search engines and data mining that have a longer history and are better understood, text mining is an emerging technical area that is relatively unknown to IT professionals.
Knowledge Management Systems
Background
Before discussing Knowledge Management, we need first to understand the unit of analysis, namely, knowledge.
It is generally agreed by IT practitioners that there exists a continuum of data, information, and knowledge (and even wisdom) within any enterprise. The concept of data and the systems to manage them began to be popular in the 1980s. Data are mostly structured, factual, and oftentimes numeric. They often exist as business transactions in database management systems (DBMS) such as Oracle, DB2, and MS SQL. Information, on the other hand, became a hot item for businesses in the 1990s, especially after the Internet web explosion and the successes of many search engines. Information is factual, but unstructured, and in many cases textual. Web pages and email are good examples of information that often exists in search engines, groupware, and document management systems. Knowledge is inferential, abstract, and is needed to support business decisions.
In addition to the IT view of the data-information-knowledge continuum, other researchers have taken a more academic view. According to these researchers, data consist of facts, images, or sounds. When data are combined with interpretation and meaning, information emerges. Information is formatted, filtered, and summarized data that, when combined with action and application, becomes knowledge. Knowledge exists in forms such as instincts, ideas, rules, and procedures that guide actions and decisions.
The concept of knowledge has become prevalent in many disciplines and business practices. For example, information scientists consider taxonomies, subject headings, and classification schemes as representations of knowledge. Artificial intelligence researchers have long been seeking ways to represent human knowledge, such as semantic nets, logic, production systems, and frames. Consulting firms have also been actively promoting practices and methodologies to capture corporate knowledge assets and organizational memory. Since the 1990s, knowledge management has become a popular term that appears in many applications, from digital library to search engine, and from data mining to text mining. Despite its apparent popularity, we believe the field is rather disjointed and new knowledge management technologies are relatively foreign to practitioners.
Definition
We adopt a layman's definition of knowledge management in this book. Knowledge management is the system and managerial approach to collecting, processing, and organizing enterprise-specific knowledge assets for business functions and decisions. Notice that we equally stress both the managerial (consulting) and also the system (technology) components.
It is our belief that where a managerial approach lacks a sound technical basis, we will see KM become another casualty of consulting faddism, much like Total Quality Management (TQM), which, in many cases, did not deliver sustainable value to customers. On the other hand, new KM technologies will fall into misuse or produce unintended consequences if they are not properly understood and administered in the proper organizational and business context.
In light of corporate turnover, information overload, and the difficulty of codifying knowledge, knowledge management faces daunting challenges in making high-value corporate information and knowledge assets easily available to support decision making at the lowest, broadest possible levels.
A significant (emerging) approach to knowledge management is represented by researchers and practitioners who attempt to codify and extract knowledge using automated, algorithmic, and data-driven techniques. We define systems that adopt such techniques as Knowledge Management Systems (KMS), a class of new software systems that have begun to contribute to KM practices. A KMS focuses on analysis and is the subject of our discussion in this book.
Two of the most relevant sub-fields within knowledge management are data mining and text mining. Data mining, which is better known within the IT community, performs various statistical and artificial intelligence analyses on structured and numeric data sets. Text mining, a newer field, performs various searching functions, linguistic analysis, and categorizations. KMS complements existing IT infrastructure and often requires being superimposed on such foundational systems as e-portals or search engines. Methodologies for practicing these new techniques must be developed if they are to be successful.
The focus of our discussion in this book will be on text mining. However, we will briefly introduce search engines (that focus on text search) and data mining (that focuses on data analysis) as two related and well-known "siblings" of text mining.
KM is clearly suited to capturing both internal (employees') and external (customers') knowledge.
The most significant KM implementation challenge is not due to lack of skill in KM techniques. The top four implementation challenges are nontechnical in nature: (1) employees have no time for KM, (2) the current culture does not encourage sharing, (3) lack of understanding of KM and its benefits, and (4) inability to measure financial benefits of KM. It seems clear that significant KM education, training, and cultural issues will have to be addressed in most organizations.
Because KM practices are still new to many organizations, it is not surprising that most of the techniques and systems adopted have been basic IT systems, rather than the newer data mining or text mining systems. The most widely used categories of KM software, in ranked order of budget allocation, are: enterprise information portal (e-portal), document management, groupware, workflow, data warehousing, search engine, web-based training, and messaging email.
Search engine companies have mostly evolved from their technical past to the current media focus that stresses content, service, advertising, and marketing. The majority have abandoned their historically technical and free-sharing roots. New technologies rarely emerge from these companies. Many have formed "keiretsu" with venture capitalists, ad agencies, old media companies, verticals, and even banks, e.g., Kleiner Perkins, AT&T, At Home, Excite, etc.
Some newer search-engine based companies have evolved into e-portals that aim to serve the information needs of a corporate Intranet, e.g., Autonomy and Northern Light. New functionalities such as chat rooms, bulletin boards, calendaring, content pushing, user personalization, publishing, and workflows are added to such a one-stop shopping site for enterprise users. Both internal content (e.g., best practices and communications) and external resources (e.g., industry reports, marketing intelligence) of various formats (e.g., email, power point files, Notes databases, etc.) are captured in such systems.
Despite significant technological advancement of search engines and e-portals, such systems are grounded in basic text processing technologies (the inverted index and the vector space model) developed in the 1970s. They lack advanced linguistic processing abilities (e.g., noun phrasing and entity extraction, that is, knowing who, what, where, when, etc. in text) and automatic categorization and clustering techniques.
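To make the two technologies named above concrete, the following is a minimal sketch of an inverted index with vector space (TF-IDF, cosine) ranking. It is an illustration only, not the implementation of any particular search engine: the toy documents, the whitespace tokenizer, and the function names (`build_inverted_index`, `search`) are assumptions introduced here.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy collection; the documents are illustrative only.
DOCS = {
    "d1": "text mining extracts knowledge from text",
    "d2": "data mining analyzes structured numeric data",
    "d3": "search engines index web pages",
}


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming.
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to a postings dict {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def idf(index, term, n_docs):
    # Inverse document frequency: rarer terms weigh more.
    return math.log(n_docs / len(index[term]))


def search(index, docs, query):
    """Rank documents by cosine similarity between TF-IDF vectors."""
    n = len(docs)
    # Squared length of each document vector, accumulated term by term.
    norms = defaultdict(float)
    for term, postings in index.items():
        w = idf(index, term, n)
        for doc_id, tf in postings.items():
            norms[doc_id] += (tf * w) ** 2

    scores = defaultdict(float)
    q_norm = 0.0
    # Only query terms present in the index can contribute.
    for term, qtf in Counter(t for t in tokenize(query) if t in index).items():
        w = idf(index, term, n)
        q_norm += (qtf * w) ** 2
        for doc_id, tf in index[term].items():
            scores[doc_id] += (qtf * w) * (tf * w)

    ranked = [
        (doc_id, s / math.sqrt(norms[doc_id] * q_norm))
        for doc_id, s in scores.items()
        if s > 0
    ]
    return sorted(ranked, key=lambda pair: -pair[1])


INDEX = build_inverted_index(DOCS)
RESULTS = search(INDEX, DOCS, "text mining")
```

In this toy collection the query "text mining" ranks d1 ahead of d2 and never touches d3. Note that the ranking depends purely on term overlap and term statistics; nothing in it identifies phrases, entities, or categories, which is the limitation the paragraph above points out.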
Unlike search engines, data mining projects are domain-specific, application-dependent, and often require significant business analysis, customization, and refinement.
Several areas distinguish text mining from data mining. First, unlike data mining, the data characteristics of text mining require significant linguistic processing or natural language processing abilities. Second, data mining often attempts to identify causal relationships through classification (or supervised learning) techniques (e.g., what employee demographic variables affect spending patterns?). Text mining, on the other hand, aims to create organizational knowledge maps or concept yellowpages, as described in the Gartner Group Knowledge Management report. Third, text mining applications deal with much more diverse and eclectic collections of systems and formats (email, Web pages, Notes databases, newsgroups).
Even in light of the highly technical nature of the data mining and text mining systems, Knowledge Management requires a balanced managerial and technical approach. Quality content management and data assurance, careful business and system requirement solicitation, effective implementation methodologies, and supportive organizational culture and reward systems are critical for its success.
The diverse multimedia content and the ubiquitous presence of the web make both commercial users and the general public see the potential for utilizing unstructured information assets in their everyday activities and business decisions.
It is estimated that 80% of the world's online content is based on text. We have developed an effective means to deal with structured, numeric content via database management systems (DBMS), but text processing and analysis is significantly more difficult. The status of knowledge management systems is much like that of DBMS twenty years ago. The real challenges, and the potential payoffs for an effective, universal text solution, are equally appealing. It is inevitable that whoever dominates this space will become the next Oracle (in text).
In the 1980s, there was an explosion of AI research activities, most notably in expert systems. Many research prototypes were created to emulate expert knowledge and problem solving in domains such as medical and car diagnosis, oil drilling, computer configuration, etc. However, the failure of many such systems in commercial arenas led many venture capitalists to back away from any ventures associated with AI.
Nevertheless, commercial expert systems have made both researchers and practitioners become realistic about the strengths and weaknesses of such systems. Expert systems may not be silver bullets but they have been shown to be suited for well-defined domains with willing experts.
In the 1990s, AI-based symbolic learning, neural-network, and genetic programming technologies have generated many significant and useful techniques for both scientific and business applications. The field of data mining is the result of significant research developed in this era. Many companies have since applied such techniques in successful fraud-detection, financial prediction, web-mining, and customer-behavioral analysis applications.
Both IR and AI research have contributed to a foundation for knowledge representation. For example, indexing, subject heading, dictionaries, thesauri, taxonomies, and classification schemes are some of the IR knowledge representations still widely used in various knowledge management practices. AI researchers, on the other hand, have developed knowledge representation schemes such as semantic nets, production systems, logic, frames, and scripts. With the continued expansion and popularity of web-based scientific, governmental, and e-commerce applications in the 2000s, we foresee active research leading to autonomous web agents with learning and data mining abilities. The field of web mining promises to continue to provide a challenging test bed for advancing new IR and AI research.
The Industry
As the Gartner Group report suggested, the new KMSs require a new set of knowledge retrieval (KR) functionalities, broadly defined in two areas: the semantic dimension (with the goal of creating concept yellowpages) and the collaboration dimension (with the goal of providing value recommendations from experts). Such functionalities can be used to create knowledge maps for an organization.
The IR vendors (e.g., Verity, OpenText), document management companies (e.g., PCDOC, Documentum), and groupware/email firms (e.g., Microsoft, Lotus, Netscape) are essential to knowledge management deployment but generally lack critical analytical and linguistic processing abilities.
New knowledge management companies excel in text mining ability but often lack execution and delivery abilities (e.g., Autonomy, Knowledge Computing Corporation, inXight, and Semio). Autonomy is probably the most successful company in this category so far.
Collection creation and processing ability and the retrieval and display ability both require significant system integration with an existing IT infrastructure. Text analysis and processing, however, are algorithmic in nature and are considered unique additions to the knowledge management functionalities. Such techniques are new, but they nevertheless exhibit overwhelming potential for business text mining.
Based on significant research in the IR and computational linguistics research communities (e.g., TREC and MUC Conferences sponsored by DARPA), it is generally agreed that phrasal-level analysis is most suited for coarse but scalable text mining applications. Word-level analysis is noisy and lacks precision. Sentence-level is too structured and lacks practical applications. Semantic analysis often requires a significant knowledge base or a domain lexicon creation effort and therefore is not suited for general-purpose text mining across a wide spectrum of domains. It is not coincidental that most of the subject headings and concept descriptors adopted in library classification schemes are noun phrases.
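The contrast between word-level and phrasal-level analysis can be illustrated with a small sketch. It does not reproduce any noun phraser discussed in the book; as a stand-in it uses a simple heuristic that treats maximal runs of non-stopwords as candidate phrases. The stopword list, the sample sentence, and the function names are all assumptions made for illustration.

```python
import re
from collections import Counter

# A deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "of", "and", "or", "in", "on",
             "for", "to", "is", "are", "from", "with", "by"}

# Hypothetical sample sentence.
SAMPLE = ("The knowledge map of the enterprise is built "
          "from text mining and data mining.")


def words(text):
    return re.findall(r"[a-z]+", text.lower())


def word_terms(text):
    """Word-level analysis: count individual content words."""
    return Counter(w for w in words(text) if w not in STOPWORDS)


def phrase_terms(text):
    """Phrase-level heuristic: keep runs of two or more content words
    that are not interrupted by a stopword or punctuation."""
    phrases = Counter()
    for clause in re.split(r"[.,;:!?()]", text.lower()):
        run = []
        # The trailing None flushes the last run of each clause.
        for w in words(clause) + [None]:
            if w is None or w in STOPWORDS:
                if len(run) >= 2:
                    phrases[" ".join(run)] += 1
                run = []
            else:
                run.append(w)
    return phrases


WORD_COUNTS = word_terms(SAMPLE)
PHRASE_COUNTS = phrase_terms(SAMPLE)
```

On the sample sentence, the word-level count lumps both uses of "mining" together, while the phrase-level pass keeps "text mining" and "data mining" apart and also yields "knowledge map". A real noun phraser would rely on part-of-speech information rather than a stopword list, so this heuristic should be read only as a rough approximation of the idea.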
OOHAY and Benefits
It is our belief that the old way of creating subject hierarchies or knowledge maps based on human efforts (such as Yahoo's directory structure) is not practical or scalable. The existing amount of business information, the speed of future information acquisition activities, and the amount of human effort involved make the manual approach obsolete. Only by leveraging various system-based computational techniques can we effectively organize and structure the ever-increasing textual (and maybe multimedia) content into a useful Object Oriented Hierarchical Automatic Yellowpage (OOHAY), a computational approach that we believe in and that is the reverse of Yahoo!
Implementing an OOHAY approach to knowledge management offers multiple benefits. First, a system-aided approach could help alleviate the "information overload" problem faced by practitioners and managers. Oftentimes, conventional search engines return too much instead of too little information. A dynamic, customizable analysis component based on text mining techniques can help alleviate such a search bottleneck. Second, the system-generated thesaurus can help resolve the "vocabulary differences" faced by searchers: user search terms are often different from a database's index terms, and thus return results of low relevance. Third, the system-aided approach helps to reduce the amount of time and effort required for creating enterprise-specific knowledge maps. Finally, the system-generated knowledge maps are often more fine-grained and precise than human-generated knowledge maps. Organizations could benefit from such an efficient, high-quality, and cost-effective method for retaining internal and external knowledge.
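One way a system-generated thesaurus might bridge the gap between a searcher's vocabulary and a database's index terms is co-occurrence analysis: terms that frequently appear in the same documents are linked to each other. The sketch below is a hedged illustration of that general idea, not the specific algorithm behind OOHAY. The toy documents, the asymmetric weight (co-occurrence count divided by the document frequency of the source term), and the function names are assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical mini-collection of index terms per document.
DOCS = [
    "car automobile repair",
    "automobile engine repair",
    "car automobile dealer",
    "quarterly sales report",
]


def build_thesaurus(docs):
    """Link terms that co-occur in a document.

    The weight from a to b is (documents containing both) divided by
    (documents containing a), so the relation is asymmetric.
    """
    df = Counter()   # document frequency of each term
    co = Counter()   # co-occurrence count of each unordered term pair
    for text in docs:
        terms = sorted(set(text.lower().split()))
        df.update(terms)
        co.update(combinations(terms, 2))

    thesaurus = defaultdict(dict)
    for (a, b), n in co.items():
        thesaurus[a][b] = n / df[a]
        thesaurus[b][a] = n / df[b]
    return thesaurus


def related(thesaurus, term, k=3):
    """Return up to k related terms, strongest link first."""
    neighbors = thesaurus.get(term, {})
    return sorted(neighbors, key=lambda t: (-neighbors[t], t))[:k]


THESAURUS = build_thesaurus(DOCS)
SUGGESTIONS = related(THESAURUS, "car")
```

Here a searcher who types "car" is pointed first to the index term "automobile", because every document containing "car" also contains "automobile". The reverse link is weaker, since "automobile" also appears without "car"; that asymmetry is a design choice of this sketch and could be replaced by a symmetric measure.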