Newsgroups: comp.ai,comp.ai.nat-lang,comp.ai.philosophy
Subject: Unofficial Cyc FAQ Version 2.0
Summary: Cyc Frequently Asked Questions
Keywords: CYC FAQ Info
Organization: NETCOM On-line Communication Services (408 261-4700 guest)
Date: Mon, 21 Apr 1997 15:36:28 GMT

The Unofficial, Unauthorized Cyc Frequently Asked Questions Information Sheet.

Written by David Whitten, with various input from other net citizens

If you think of questions that are appropriate for this FAQ, or would

*** Copyright:

Copyright (c) 1994,1995 by David Whitten. All rights reserved. Portions copyright (c) by MCC and Cycorp.

This FAQ may be freely redistributed in its entirety without modification provided that this copyright notice is not removed. It may not be sold for profit or incorporated in commercial documents (e.g., published for sale on CD-ROM, floppy disks, books, magazines, or other print form) without the prior written permission of the copyright holder. Permission is expressly granted for this document to be made available for file transfer from installations offering unrestricted anonymous file transfer on the Internet. Should this file access information would be appreciated.

This article is provided AS IS without any express or implied warranty.

*** Topics Covered:

[1] Introduction
   [1-1] What is Cyc ?
   [1-2] How do I find this FAQ ?
[2] Who is doing Cyc ?
   [2-1] Who is sponsoring it ?
   [2-2] Who is Douglas Lenat ?
   [2-3] Who else was previously or is currently working on Cyc ?
   [2-4] How do I contact the authors myself ?
[3] When did they start Cyc and when will it be completed ?
[4] Why are they creating Cyc ?
[5] Where can I get the source code ?
[6] What do they use to make it work ?
   [6-1] What programming language is it written in ?
   [6-2] What machines does it run on ?
   [6-3] How large is Cyc? How has the size changed over the project?
[7] How does it work ?
   [7-1] How do they store common sense in a computer ?
   [7-2] How do they input common sense ?
   [7-3] What theoretical foundation is behind Cyc ?
   [7-4] What is the difference between Cyc and an Expert System ?
   [7-5] What are they doing in Natural Language Processing ?
[8] What are Cyc's capabilities right now?
   [8-1] Can Cyc reason about data not stored in Cyc format databases?
   [8-2] If a robot had a radio link to Cyc, what could it do ?
   [8-3] When do they expect Cyc to be able to read ?
   [8-4] Are you able to converse with Cyc ?
   [8-5] Will Cyc pass the Turing Test any time soon ?
[9] Cyc standards with ANSI or ISO standards
   [9-1] What are the details of the functional interface to CycL ?
   [9-2] Can an SQL query be used to ask Cyc a question?
   [9-3] Can Cyc generate an SQL query to find information it needs ?
   [9-4] Can Cyc interact with HTML ?
   [9-5] Does Cyc have a KIF interface ?
[A] Acknowledgements
[B] Bibliography

Search for [#] to get to question number # quickly.

*** Recent changes:

;;; 1.0: 15-DEC-94 djw Initial release
;;; 2.0: 25-SEP-95 djw Update, include info about Cyc 10

----------------------------------------------------------------
Subject: [1] Introduction

Certain questions and topics come up frequently in the various artificial intelligence discussion groups about the Cyc program. This file/article is an attempt to gather these questions and their answers together as a convenient reference for AI researchers, students, hobbyists, and practitioners. I post it whenever I notice a newbie asking about Cyc on one of the newsgroups I read.
I hope this will cut down on network traffic, and increase the enjoyment of these newsgroups for the regular readers by eliminating the necessity to read and respond to the same questions over and over again. It may even answer some questions that readers may not have thought about yet, and hopefully will stimulate new discussions by increasing the total amount of information available. Currently this FAQ covers the obvious questions and answers, but I plan to add new questions and answers as they become common. ---------------------------------------------------------------- Subject: [1-1] What is Cyc ? Cyc is the name of a very large, multi-contextual knowledge base and inference engine, the development of which started at the Microelectronics and Computer Technology Corporation (MCC) in Austin, Texas during the early 1980s. Over the past eleven years the members of the Cyc team have added to the knowledge base a huge amount of fundamental human knowledge: facts, rules of thumb, and heuristics for reasoning about the objects and events of modern everyday life. Cyc is an attempt to do symbolic AI on a massive scale. It is not based on numerical methods such as statistical probabilities, nor is it based on neural networks or fuzzy logic. All of the knowledge in Cyc is represented declaratively in the form of logical assertions. Cyc presently contains approximately 400,000 significant assertions, which include simple statements of fact, rules about what conclusions to draw if certain statements of fact are satisfied (true), and rules about how to reason with certain types of facts and rules. New conclusions are derived by the inference engine using deductive reasoning. Its avowed purpose is to break the software brittleness bottleneck once and for all by constructing a foundation of basic "common sense" knowledge -- a sort of semantic substratum of terms, rules, and relations -- that will enable a variety of knowledge-intensive products and services. 
Cyc is intended to provide a "deep" layer of understanding that can be used by other programs (such as domain-specific expert systems) to make them more flexible. To date, Cyc has made possible ground-breaking pilot applications in the areas of heterogeneous database browsing and integration, captioned image retrieval, and natural language processing.

----------------------------------------------------------------
Subject: [1-2] How do I find this FAQ ?

This FAQ is posted semi-regularly on the comp.ai and comp.ai.nat-lang newsgroups. It is not currently available from archive sites, WWW sites, or ftp sites.

A previous version of this FAQ that has been HTML-ized is available at
   http://www.mcs.net/~jorn/html/ai/cycfaq.html
and it is anticipated that this version will be available there as well. A simple text listing of it is available at:
   http://www.mcs.com/~drt/software/cycfaq

----------------------------------------------------------------
Subject: [2] Who is doing Cyc ?

Much of the Cyc work has been done at the Microelectronics and Computer Technology Corporation in Austin, Texas. In January of 1995, a new independent company named Cycorp was created to further the work done on the Cyc project. Cycorp continues to be based in Austin, Texas.

----------------------------------------------------------------
Subject: [2-1] Who is sponsoring it ?

The development of Cyc has been supported by several organizations, including Apple, Bellcore, DEC, DoD, Interval, Kodak, and Microsoft.

----------------------------------------------------------------
Subject: [2-2] Who is Douglas Lenat ?

Doug Lenat is one of the world's leading computer scientists and is head of the Cyc Project at MCC and President of Cycorp. He has been a Professor of Computer Science at Carnegie-Mellon University and Stanford University.
He is a prolific author, whose publications include the books
   Knowledge Based Systems in Artificial Intelligence (1982, McGraw-Hill)
   Building Expert Systems (1983, Addison-Wesley)
   Knowledge Representation (1988, Addison-Wesley)
   Building Large Knowledge Based Systems (1989, Addison-Wesley)

His 1976 Stanford thesis earned him the biennial IJCAI Computers and Thought Award in 1977. He was named one of America's brightest scientists under the age of 40 in the December 1984 Science Digest. In 1986, he was elected as Councilor of AAAI.

----------------------------------------------------------------
Subject: [2-3] Who else was previously or is currently working on Cyc ?

Here is a complete list of everyone who has worked with Cyc for more than one continuous year since 1988, plus present employees. Many other people have made sporadic or episodic contributions to Cyc. Present employees of Cycorp -- full-time, part-time, occasional, or currently on leave -- are marked with *.

Paul Blair (1991-1993) is a graduate student in Philosophy who worked on Cyc as a knowledge enterer. His contributions to the project included writing a clear introduction to CycL, Cyc's representation language. Paul is currently continuing his graduate studies in New York City.

Judy Bowman (1986-1994) was the secretary for the Cyc project during most of its existence at MCC.

Rupert Brauch (1992-1994) holds a bachelor's degree in Philosophy and contributed to Cyc as a knowledge enterer. He is now pursuing a graduate degree in Computer Science at Stanford.

*Kathy Burns (1993-present) is a member of the Cycorp Technical Board, and directs Cycorp's natural language processing development effort. She holds a bachelor's degree in Linguistics from the University of Texas at Austin, and has done graduate study in Linguistics at McGill University.
In recent months Kathy (in conjunction with Keith Goolsbey) has completely rebuilt Cyc's NL facilities to make them compatible with the new version of the Cyc system (Cyc 10). She has made important contributions to Cyc's database browsing and retrieval application, has composed Cyc HTML interface and help pages, and has done a significant amount of knowledge entry. *Lisa Colvin (1995-present) holds a BA in philosophy from Tufts University, and an MA in Linguistics from the University of Texas at Austin. She works on Cycorp's natural language processing development effort and has done knowledge entry in a variety of domains. *Tony Davis (1995-present) holds a BA in Linguistics and Applied Math from U.C. San Diego, and will soon receive a PhD in Linguistics from Stanford. Before becoming a Cycorp employee he worked as a computational linguist for MITRE Corp., where he developed methods for ambiguity resolution. He works on Cycorp's natural language processing development effort and is especially interested in verb semantics. Mark Derthick (1988-1994) holds a PhD in Computer Science from Carnegie-Mellon University. He developed and maintained a variety of interface, browsing, and knowledge entry tools, including the Heuristic Level to Epistemological Level translator that proved to be a crucial addition to the earlier (pre-1991) frame-based version of Cyc. With Karen Pittman, he played a major role in developing the Cyccess pilot application, which demonstrated the value of using Cyc to integrate the data contained in disparate structured information sources such as database tables and spreadsheets. Mark is currently working on a data representation and visualization project at CMU. *David Gadbois (1990-present) is a member of the Cycorp Technical Board. He holds a BA in Plan II, a BS in Mathematics, and an MSCS in Computer Science from the University of Texas at Austin. 
He is presently finishing a PhD in Computer Science there with a dissertation on static analysis and compilation of rule system programs. He has written many of Cyc's system and network interfaces, manages Cycorp's computer systems, and helped design and implement Cyc database integration facilities. He is presently developing applications based on the database facilities. Lila Ghemri (1992-1995) holds a PhD in Computational Linguistics from the University of Bristol, England. She worked on the Cyc natural language processing effort, did a significant amount of knowledge entry, and played an important part in expanding Cyc's English language lexicon. In the Spring of 1995, she and her family moved to New York. *Keith Goolsbey (1990-present) is a member of Cycorp's Technical Board. He holds a Bachelor's degree in Electrical Engineering and a Master's degree in Computer Science, both from the University of Texas at Austin. He has done much knowledge entry (most notably regarding Cyc's treatment of quantities and scalar intervals), and has constructed practical HTML interfaces for Cyc, including a browsing/editing interface. In recent months Keith has played a dominant role in designing and implementing Cyc 10, including new versions of the Cyc inference engine (with Kenneth Murray), the Cyc Common Lisp to C translator, and the Cyc natural language processing facilities (with Kathy Burns). He has also implemented a distributed version of Cyc that allows the inference engine to derive conclusions using several, physically distinct KBs containing knowledge about different conceptual domains. Ramanathan V. Guha (1987-1994) holds an MS in Mechanical Engineering from UC-Berkeley, and a PhD in Computer Science from Stanford. From 1990 to 1994 he was the technical director of the Cyc project. He has written several important papers and technical reports, including (as co-author) the book Building Large Knowledge Based Systems (1989, Addison-Wesley). 
His most notable contributions included converting Cyc from a frame-based system to one in which the fundamental data objects are logical assertions, and the implementation of logical contexts (microtheories) as a method for structuring the knowledge in the Cyc knowledge base. Guha left the Cyc project at the end of 1994 and now works for Apple.

*Bill Jarrold (1990-present) holds a BS in Cognitive Science from MIT. He has done a great deal of knowledge entry work for Cyc, most notably in the domains of weather and naive spatial relations. He is involved in the design and implementation of test suites for the Cyc knowledge base, the inference engine, and the underlying source code. Bill is currently pursuing a PhD degree in Counseling Psychology at the University of Texas at Austin.

Kate Joly (1992-1994) holds a Bachelor's degree in Linguistics from UC-Santa Cruz. She made important contributions to Cyc's natural language processing effort, including writing most of the syntactic parsing templates for translating from English to CycL.

*Fritz Lehmann (1995-present) specializes in building "ontological bridges" between different knowledge bases, thesauri, and standards. He has concentrated on semantic integration of differently arranged databases. He edited the book "Semantic Networks in Artificial Intelligence" and has written articles on the mathematical structure of taxonomies as well as practical issues involved in giving an ontological basis to the fields and codes contained in existing data interchange standards. He does some outside consulting on ontology-based methods, and has done detailed work on names, addresses, and documents. He is working on deep health-related modelling issues.

Liz Lempert (1992-1994) holds a Bachelor's degree in Symbolic Systems from Stanford. She has done a great deal of knowledge entry for Cyc in a wide variety of domains. She is now pursuing graduate studies in Boston.
Bill MacCartney (1994-1996) holds a Bachelor's degree in Philosophy from Princeton. In addition to doing knowledge entry, he has helped with porting Cyc from Common Lisp to C, and has worked on Cyc interface development in C++ and HTML. In recent months, he played an important part (with Keith Goolsbey) in devising the algorithm for Cyc's approach to distributed inferencing. Alan McKendree (1991-1994) holds a BS in Mathematics from the University of Texas at Austin. He worked on Cyc as a knowledge enterer and a system support specialist. Kathy Mitchell (1991-1995) holds a Bachelor's degree in Computer Science from Texas A & M and an MS in Computer Science from the University of Texas at Austin. She did much knowledge entry for Cyc in just about every domain. She played an important part in testing and debugging the Cyc captioned image retrieval pilot application, and also worked on the Cyc natural language processing effort. In the Spring of 1995 she and her husband moved to Portland. *Kenneth Murray (1987-present) is a member of Cycorp's Technical Board. He holds a PhD in Computer Science from the University of Texas at Austin. He has contributed to Cyc as a knowledge enterer in a wide variety of domains. With Keith Goolsbey, he is responsible for maintaining the Cyc inference engine, and he has written several of the heuristic modules designed to make Cyc's inference engine more efficient. He has also implemented one component of a CycL to SQL translator, making a major contribution to the Cyc 10 database browsing and retrieval application. Deborah Nichols (1992-1994) did knowledge entry for Cyc in several domains. She is now pursuing a PhD in Philosophy at the University of Texas at Austin. *Karen Pittman (1987-present) is a member of Cycorp's Technical Board, and plays a prominent role in directing general knowledge entry work on Cyc, training new knowledge enterers, scoping out new knowledge domains, and planning application-specific knowledge entry tasks. 
She is responsible for a great deal of the knowledge presently in the Cyc knowledge base. She holds an MS in Botany from the University of Texas at Austin, and is pursuing an MS in Computer Science (also at UT). With Mark Derthick, she played a major role in developing the Cyccess pilot application, which demonstrated the value of using Cyc to integrate the data contained in disparate structured information sources such as database tables and spreadsheets. More recently, she has coordinated Cycorp's work on the Cyc 10 database browsing and retrieval application. *Dexter Pratt (1989-1994,1996-present) holds a BS in Chemistry from Yale. Now an independent software developer and consultant, he was a pioneer in the development of the Lisp machine workstation in the early 1980s, when he worked for LMI. He contributed to Cyc in many ways, including writing and maintaining interfaces, system software development and maintenance, helping to port Cyc from Common Lisp to C, and knowledge entry in a variety of domains. With Nick Siegel, he played a major role in developing Cyc's captioned image retrieval pilot application. Wanda Pratt (1990-1993) holds an MS in Computer Science from the University of Texas at Austin. She contributed to Cyc as a knowledge enterer and a system software developer and maintainer. Her knowledge entry work covered many different domains. She is now pursuing a PhD in Computer Science at Stanford. Wei-Min Shen (1989-1991) holds a PhD in Computer Science from Carnegie-Mellon University. He did knowledge entry for Cyc and explored his interest in machine learning. He is now pursuing a career in university teaching and research. *Mary Shepherd (1984-present) is a Sociologist by training. She has contributed to Cyc as a knowledge enterer, and for much of the past eleven years has dealt with the onerous administrative and personnel tasks necessary to keep the Cyc team functioning. She is presently the administrative manager of Cycorp. 
*Nick Siegel (1988-present) is a member of Cycorp's Technical Board. He holds a BA in History of Religions from Creighton University and an MA in Cultural Anthropology from the University of Texas at Austin. His contributions have included planning and directing knowledge entry tasks, training new knowledge enterers, designing tests for the Cyc system, and implementing and maintaining various HTML interface tools. With Dexter Pratt, he played a major role in developing Cyc's captioned image retrieval pilot application. More recently, he implemented a database "meta query" browser that allows naive users to quickly determine the types of knowledge contained in a set of data tables. *Kevin Smith (1995-present) holds a BS in Symbolic Systems from Stanford University, with a concentration in Natural Language. Before becoming a member of the Cycorp technical staff he worked in Japan as an English teacher and language learning specialist. He works on Cycorp's natural language processing development effort and has also worked as a knowledge enterer. He is especially interested in the interdependence between cultural knowledge and language. Srinija Srinivasan (1993-1995) holds a Bachelor's degree in Symbolic Systems from Stanford. She did a great deal of knowledge entry for Cyc in a wide variety of domains, most notably in the area of human emotional states. She now works for Yahoo. Jamie Stephens (1992-1994) worked on Cyc as a knowledge enterer and system support specialist. Dan Torosian (1990-1993) worked on Cyc as a knowledge enterer. A professional jazz musician (clarinet, saxophone), he is now pursuing his musical career full-time in Austin, Texas. Ginger Webb (1990-1991) holds a Master's degree in French Linguistics from the University of Texas at Austin. She worked on Cyc as a knowledge enterer, contributing to several different domains. 
Alan Kay, Michael Lesk, John McCarthy, Marvin Minsky, Tom Murphy, Bob Simpson, Pat Hayes, Marvin Weinberger, and Steve Chenoweth have all provided useful comments and met with the Cyc team at various times.

----------------------------------------------------------------
Subject: [2-4] How do I contact the authors myself ?

The Cyc group maintains a low profile on the Internet. As it would be easy to be deluged by email, they have chosen not to publicize their addresses. If you wish to discuss the ideas behind Cyc, or the philosophy of the Cyc project, you probably will get a faster response by posting to the comp.ai, the comp.ai.philosophy, or the comp.ai.nat-lang newsgroups. There are many talented people who read these groups, and it is possible that someone not affiliated with the Cyc project will be able to answer your questions.

----------------------------------------------------------------
Subject: [3] When did they start Cyc and when will it be completed ?

The Cyc project began as a dream to create a computerized encyclopedia. When Alan Kay, one of computing's legendary figures, was at Atari's research center, he asked Doug Lenat for something original to add to this project. After Atari hit financial difficulties, Doug Lenat relocated his idea to MCC. Initially, Cyc was based on a re-implementation of RLL (the frame-based language underlying Eurisko) which was similar to the simultaneously but independently developed KRL.

The original ten year funding period for the Cyc project at MCC was supposed to end in 1994, but was extended for one year to the close of 1995.

From the perspective of some people on the Cyc team, asking when Cyc will be completed is sort of like asking of a person, "When will s/he be finished?" A more pertinent question is, when will it be useful? The practical answer is, when it can solve the problems and perform the tasks its builders would like for it to perform.
The Cyc team believes that Cyc is now ready to be used in some interesting and useful applications. ---------------------------------------------------------------- Subject: [4] Why are they creating Cyc ? The Cyc team doesn't believe there is any shortcut toward being intelligent or creating an artificial intelligence based agent. Addressing the need for a large body of knowledge complete with content and context may only be done by manually organizing and collating information. This knowledge includes heuristic, rule of thumb problem solving strategies, as well as facts that can only be known to a machine if it is told. Much of the useful common sense knowledge needed for life is prescientific and has therefore not been analyzed in detail. Thus a large part of the work of the Cyc project is to formalize common relationships and fill in the gaps between the highly systematized knowledge used by specialists in the modern world. ---------------------------------------------------------------- Subject: [5] Where can I get the source code ? It is not free, nor is it freely available. If you or your company are willing to become a corporate sponsor of the Cyc development effort, you will be able to have the same access to the internal details and internal documentation available to the other sponsors (excluding, of course, information that is proprietary to particular sponsors). This is not an option for everyone, and Cycorp reserves the right to determine if they will accept sponsorship by your company. As the current sponsors have invested a considerable sum of money in developing Cyc, please do not pursue this option unless you or your company are willing to make a similar contribution. Serious inquiries regarding collaboration or sponsorship may be sent to: Doug Lenat Cycorp, Inc. 
3500 West Balcones Center Drive Austin, Texas 78759 While the intent is to make Cyc widely available (so that it will become the standard representation and reasoning system), Cycorp is committed to protecting the intellectual property rights of those who have invested in Cyc's development. ---------------------------------------------------------------- Subject: [6] What do they use to make it work ? The Cyc system itself (the knowledge base, inference engine, interface modules, etc.) is a fairly large, complex piece of software. However, it now runs on what is basically stock, off-the-shelf hardware. Cyc can be run in a networked mode (information provided on one machine is available to all the other machines) or on a stand-alone workstation serving several users at once. ---------------------------------------------------------------- Subject: [6-1] What programming language is it written in ? There are both Common Lisp and C versions of Cyc. Most development is currently done in Common Lisp running on Symbolics Lisp machines. Lisp source code is translated into C, using a Common Lisp to C translator developed by the Cyc team, to produce source code that can be compiled by a variety of standard ANSI C compilers. Using the standard HTML-based interface tools, it is virtually impossible for a user to tell whether a given Cyc image is running in Common Lisp or C. ---------------------------------------------------------------- Subject: [6-2] What machines does it run on ? The C version of Cyc is intended to run on any system that provides an ANSI C compiler, virtual memory with at least 150 Mb of swap space, and at least a 32-bit flat virtual address space. 
As of January 1995, C versions of Cyc have been compiled and tested on the following OS/hardware combinations:

   UNIX OS:
      Sun Sparc
      DEC Alpha
   Apple System 7 OS:
      Macintosh Powerbook
      Macintosh Quadra
      Power Macintosh

If there is sufficient demand from a sponsor company, the Cyc team will produce a C version of Cyc running under Microsoft Windows NT. The Common Lisp version runs on Symbolics Lisp machines, and under Lucid on Sparc 10s (with memory requirements similar to those for the C version).

----------------------------------------------------------------
Subject: [6-3] How large is Cyc? How has the size changed over the project?

This answer depends on several factors. The primary factors are the language used and the implementation of the language used to build the system. Other differences include the operating system support which is provided to the executable. Different versions of LISP or C may greatly influence these numbers.

The average size of an executable image is between 150 Mbytes and 200 Mbytes. This figure includes the inference engine and the knowledge base, but does not include temporary files or space used when importing external data.

The size of the Cyc knowledge base has fluctuated over the years, as the axioms and facts stored have changed. The knowledge base has decreased in size when axioms have been generalized over the years. This has resulted in fewer total axioms. Adding a new context or microtheory has increased the size, although a large amount of information needed when defining a new context is already available in other parts of the knowledge base.

As has been stated earlier, there are more than 400,000 significant assertions, of which fewer than 30,000 are rules (inference IF-THEN statements). There are currently over 500 microtheories (long lived contexts) defined within Cyc.
(see section [7])

Adequate partial solutions have been defined for representing and reasoning with the most commonly occurring situations that deal with Time, Belief, Substances, Causality and Possibility, etc. Much of this has been outlined in the book _Building Large Knowledge-Based Systems_.

A rough breakdown of the 30,000 Constant terms in the Cyc KB is:

40%   Categories
         0.5%  Categories of categories
               Categories of individuals
         3.0%     Categories of Intangible objects, Information Bearing
                  objects, Numbers, and Physical attributes, etc.
        18.5%     Categories of Tangible objects, Living Things, Artifacts
        18.0%     Categories of Script types
                     Actions by one person: Physiological Actions,
                        Problem Solving and Planning, Work/Hobby/etc Actions
                     Actions by more than one person: Communication,
                        Rites of Passage, Trade and Commerce, etc.
                     Actions of Natural Phenomena (Weather etc)
15%   Predicates and Functions
               Unary: see Categories and Attributes
        12.0%  Binary
         2.0%  Ternary
         1.0%  Quaternary or more
10%   Attributes
15%   Lexical objects (words, parts of speech, tense, number, gender)
15%   Proper Nouns (specific people, places, languages, events, etc.)
1.5%  Microtheories (long lived contexts)
3.5%  Misc. and Sundry

Another breakdown would be by the rules or formulae:

25%   Taxonomic Information (like type constraints on predicates)
35%   Partonomic relations
         5%  general relationships
        15%  what kind of parts (physical/anatomical/subEvents) might
             various types of objects have?
        15%  what kind of actors are involved in various script types?
5%    Information about specific people, places, etc.
10%   Lexical information
         8%  linguistic properties of different word senses
         2%  denotations of word senses
10%   More complex information interrelating script types, people and
      tangible objects.
10%   General topics (time, space, intentions, stuff, numbers, etc)
5%    Misc. and sundry formulae

----------------------------------------------------------------
Subject: [7] How does it work ?
In the old days (before 1991), Cyc's representation language (CycL) was primarily a frame-based language, the Cyc KB was thought of as a set of unit/slot/entry triples, and inferencing was done pretty much by inheritance. This led to a set of increasingly baroque add-ons and work-arounds, such as encoding higher-arity predicates as entries which were tuples, having variant forms of predicates (in which the only difference was the order of the arguments), and placing more and more stress on frame-oriented editing interfaces to navigate around in the knowledge base. The Cyc team now thinks of the Cyc KB as a "sea" of assertions, with each assertion being no more "about" its first argument than its last one. For example, if one says that Fred is Sally's father, this is now regarded as being just as much a statement "about" Sally as Fred. Inference has broadened out into general logical deduction, with AI's well-known named inference engines (such as inheritance, automatic classification, etc.) just special cases that might or might not get treated specially in any particular implementation of the Cyc system; but in any event the persons entering knowledge do not need to cater to that, or even know about it. So one way to visualize the Cyc KB is as a circle filled with assertions; a circular "assertion sea". Above this sea (or outside it, from a two-dimensional perspective) sit all the "constants". Attached to each constant is a bundle of thin wires or strings. The other ends are attached to all the assertions, in the sea, that mention that constant anywhere. Moreover, each of the assertions in the sea can itself be treated as a constant, if you want, and have its own wires reaching to other assertions which mention it. Inference rules in Cyc can now be thought of as ways of saying that if you have certain assertions in the sea (a set of them, that match a certain pattern) then you are justified in adding a particular new assertion. 
Each time an assertion is added, wires are automatically strung to all the constants that are mentioned anywhere inside the assertion, and "ripples" of its adding may cause yet other inferences to occur, yet other new assertions to get dumped into the sea, etc. Sometimes one of the new assertions is the answer someone was waiting for, for some problem; sometimes one of the inference procedures reaches a contradiction and has to cope with that. CycL, the Cyc representation language, is essentially a form of First Order Predicate Calculus (FOPC) with equality, augmentations for default reasoning, skolemization, and some second-order features (e.g., quantification over predicates is allowed in some circumstances). Like FOPC, CycL allows using ForAll (universal quantification), ThereExists (existential quantification), and LogImplication (material implication), as well as the other common ways of combining variables and logical expressions such as LogAnd (conjunction), LogOr (disjunction), and LogNot (negation). It uses a form of circumscription, includes the unique names assumption, and can make use of the closed world assumption where appropriate. Cyc currently does not store most of the information you would find in a dictionary, encyclopedia, or an almanac. For example, Cyc may not know that Birendra Bir Vikram Shah Dev is the current king of Nepal, or that Kathmandu is its capital city. It does know what the characteristics of a capital city are, and it knows the significance of being a head of state. ---------------------------------------------------------------- Subject: [7-1] How do they store common sense in a computer ? See sections [1-1] and [7], above. Each assertion in Cyc (a statement of fact or a "rule-of-thumb") is located in (or associated with) a specific microtheory or context. Each microtheory captures one "fairly adequate" solution to some knowledge representation area (knowledge domain). 
These solutions may address general areas like representing and reasoning about space, common devices, time, substances, agents, and causality or specific areas like weather, manufacturing a particular thing, and walking. Different areas may have several different microtheories, since the way an area is perceived or modeled may be different. Different points of view, different assumptions, different levels of granularity, and even what distinctions are important or not important may be significant enough to require creating a separate microtheory. A microtheory may be considered to be a smaller and more modular knowledge-base within Cyc, which is specialized on a particular topic. The important thing to realize is that neither the Cyc team, nor Cyc itself claims to have a unified theory of time, space, and the universe. Nor does it embody some great master Laws of Thought. What they do have is a suite of specialized microtheories whose union covers the most common cases. ---------------------------------------------------------------- Subject: [7-2] How do they input common sense ? The Cyc team's basic knowledge browsing and editing tools consist of an extensive and growing set of HTML pages. This scheme provides maximal standardization and portability across platforms. In effect, this means that anyone with access to a WWW browser, the correct URL, and security clearance can, from anywhere in the world, browse in or edit the Cyc knowledge base. These tools allow the user to view assertions in the knowledge base and perform a variety of operations, including adding assertions, removing assertions, creating new constants, killing constants, renaming constants, setting inference performance parameters (e.g., forward or backward propagation for rules), asking for conclusions to be derived (if possible), and viewing the inference chains that resulted in particular conclusions. As these tools are for inhouse development, there is no public World Wide Web site available. 
There currently is some discussion of providing a subset of the Cyc database on an example Web site. Should this happen, the address will be publicized. This may be available before the end of 1995.

There is also a variety of test suites that are run periodically to test the integrity of the knowledge base and the functioning of the inference engine. The Cyc team expects to give more emphasis to regular, automatic testing of the system now that product development has begun.

----------------------------------------------------------------
Subject: [7-3] What theoretical foundation is behind Cyc ?

Cyc is not a theoretical effort, although there has been a lot of theory used in its construction. The Cyc team prefers to think of the project as an engineering effort. The primary focus of the Cyc project is to actually start consolidating a cohesive knowledge bank. Any theoretical issues which have been addressed have been directly motivated by the requirements of solving specific problems.

The Cyc team believes that a hand-encoded effort using symbolic logic may express a significant fraction of the fundamental human knowledge typically shared by most people. This bootstrap process is greatly enhanced by the redundant nature of knowledge. Most knowledge uses and re-uses the same basic ideas and relationships in many different ways.

The day-to-day entering of knowledge is not based on ethereal definitions of elaborate Causality and Time-Space-Intelligence collections. Most data is as plebeian as 'living organisms have to eat to stay alive' or 'broom handles tend to be made of wood'.

It is hoped that as the Natural Language effort continues (see [7-5], below), more knowledge may be entered by persons typing in assertions in English, and eventually by having Cyc 'read' source materials for itself, bothering its human attendants only when disambiguation is required.
---------------------------------------------------------------- Subject: [7-4] What is the difference between Cyc and an Expert System ? The knowledge in Cyc is more densely interrelated, Cyc has more information about the common attributes of the world, and Cyc has a broader focus than any individual expert system. A typical expert system uses highly detailed knowledge about a single, tightly-focused domain. Cyc encodes general knowledge about many different domains, viewed from a variety of perspectives. Based on the bodies of information (microtheories) it uses in inferencing, Cyc may draw differing conclusions. Cyc may be thought of as a tool for building expert systems and other programs that use a rule-based knowledge representation. It supports and uses both forward and backward chaining and the dynamic creation of terms (Skolemization). Cyc has an integrated argumentation-based truth maintenance system to provide logical reasoning as well as supporting non-monotonicity. ---------------------------------------------------------------- Subject: [7-5] What are they doing in Natural Language Processing ? The Cyc NL system is unique in having access to a very large, declaratively represented common sense knowledge base. Cyc helps the natural language system handle word/phrase disambiguation, and also provides a target internal representation language (CycL) that can be used to do interesting things, such as inference. A substantial portion of the Cyc natural language processing system (the lexicon and many semantic rules) is actually represented in the Cyc knowledge base; Cyc "knows about" words just like it "knows about" cars or trees. Syntactic parsing is carried out by application of phrase-structure rules to an input string. Semantic rules are applied to the output of the syntax module. It is in the application of the semantic rules that the knowledge in the knowledge base is proving especially advantageous. 
Most of the Cyc pilot applications developed in the recent past have some NL component in their interfaces. The captioned image retrieval application, for example, accepts queries in English, and allows captioners to describe new images to the system using English sentences. The Cyc NL team is currently expanding the lexicon, extending the parser, and adding new semantic capabilities to the system.

----------------------------------------------------------------
Subject: [8] What are Cyc's capabilities right now?

The basic hypothesis behind symbolic Artificial Intelligence is that it is possible to simulate intelligence in a particular "microworld" by manipulating a set of symbols that represent that "microworld". The Cyc team believes this involves picking a particular task domain and then solving the problems encountered in that task domain by a combination of

  1. Defining appropriate symbol-manipulation techniques
  2. Building an adequate symbol set for representation
  3. Finding some general purpose reasoning mechanism
  4. Building a reasoner using that mechanism

Cyc's chosen "microworld" is more or less every particular task domain, but only down to a pre-expert level of detail. The symbol set and symbol manipulations should be biased by the special cases encountered in modeling the regularities in the world as encountered by the 'common man'.

----------------------------------------------------------------
Subject: [8-1] Can Cyc reason about data not stored in Cyc-format databases ?

Yes. In the previous version of Cyc (Cyc 9) a pilot application named Cyccess was developed. Through Cyccess, Cyc could interface with structured information sources (SIS) such as databases or spreadsheets. Cyccess used Cyc to understand the contents of structured sources, to retrieve information, and to pose queries that depended on a combination of Cyc knowledge and the data in the SIS.
After the information in a SIS was appropriately linked to assertions in Cyc, all the Cyc inferencing, guessing, and consistency checking capabilities were available. An interesting implication of this is that Cyc could use specific facts or time-sensitive information without duplicating it within the Cyc knowledge base. More recently, the Cyc team has built a production version of a database browsing and retrieval application for one of Cycorp's sponsors. ---------------------------------------------------------------- Subject: [8-2] If a robot had a radio link to Cyc, what could it do ? The robot would not have a sense of self in the same way that a human does. Cyc has predicates that refer to the symbol 'Cyc', but since it would not have a representation of itself in space, it couldn't be immediately usable as a control program for a robot. The CycL representation language is capable of representing many things about time and space, but few predicates are organized in the immediate form necessary for real time control. Another issue is that Cyc cannot sense itself in space. It does not currently embody any information about its location. Since Cyc does not have any sensors, it depends solely upon data input through knowledge enterers. The Cyc KB however, has the spatial knowledge necessary to allow the robot to navigate. It would be necessary to define the axioms the Cyc program would need to interpret input from sensors in terms of these spatial knowledge axioms. It is not part of the Cyc effort to build a real time planner which would be necessary in giving a robot autonomous control. ---------------------------------------------------------------- Subject: [8-3] When do they expect Cyc to be able to read? The subject of Natural Language or English input is complex. There is some implication in reading that there is a corresponding level of assimilation of new knowledge, as well as an understanding of what is read. 
Cyc does have some ability to convert English text into the language it uses to store knowledge. The English to CycL translator is not so robust that it can read news articles from a newsfeed and learn from them.

Kathy Burns is leading the effort of the NL group and has substantially rewritten many components of the existing system. The new version of the Natural Language system is anticipated to be incorporated into existing applications using Cyc by the end of this year. These include image retrieval from text requests and limited natural language query processing.

----------------------------------------------------------------
Subject: [8-4] Are you able to converse with Cyc ?

You cannot converse with Cyc as you would with a person. Generation of natural language text such as English is part of Cyc's continuing development plan, but it currently is not capable of conversation. A conversation involves many ellipses and false starts that people are able to process, but which complicate conversation immensely for a program.

This is not to say you cannot ask questions of Cyc. The English to CycL translator is developed sufficiently to be the basis of several Cyc applications. CycL to English generation is one of the focuses of the Natural Language group.

----------------------------------------------------------------
Subject: [8-5] Will Cyc pass the Turing Test any time soon ?

The Turing Test is a test developed by the computer scientist Alan Mathison Turing (1912-1954). It proposes a method to determine if a computer program should be classified as intelligent. Essentially, the test involves two input devices, one to a computer, and the other to a human. If an observer, using the input devices, cannot determine which device communicates with the machine, and which communicates with the human, the computer is said to have 'passed' the Turing test.

The Cyc Team is interested in creating an Artificial Intelligence based agent.
If, in the process of creating this agent, a program is created that can pass the Turing Test, that would be very satisfying. The type of common knowledge that is being put into Cyc would, in general, be useful to a program which attempted to pass the Turing Test. It is not expected that the Cyc program will pass the Turing Test in the near future.

----------------------------------------------------------------
Subject: [9] Cyc standards with ANSI or ISO standards

It is important to the Cyc team that Cyc applications and interfaces (eventually) provide support for any commonly accepted knowledge interchange and knowledge representation standards capable of expressing the kinds of assertions and heuristics found in Cyc. Some of Cyc's capabilities may be more complex than those encoded in existing standards.

One of the long term goals of Cyc's builders is to influence the shape of proposed knowledge interchange and knowledge representation standards. The Cyc team intends to make a major contribution to the development of a basic, "core" knowledge foundation that could be used by other projects and applications which need access to knowledge about consensus reality.

----------------------------------------------------------------
Subject: [9-1] What are the details of the functional interface to CycL ?

The Functional Interface (FI) of Cyc is a set of procedures that can be used by external programs to query Cyc for conclusions or general information.
It currently consists of twenty-seven operations, with the following having the most general utility:

FI-FIND
     given a name string
     return the Cyc constant having that name

FI-CREATE
     given a name string
     create and return a Cyc constant having that name

FI-KILL
     given a term
     return a modified knowledge base in which the term no longer exists and any assertions of which it was a component are no longer true

FI-RENAME
     given a constant
     given a string (new constant name)
     change the constant's name to the new name and return the new constant

FI-ASSERT
     given a logical formula,
     given a knowledge base subset
     produce a modified knowledge base with the formula as an axiom and return a status notification

FI-UNASSERT
     given a logical formula,
     given a knowledge base subset
     produce a modified knowledge base with the formula not an axiom (the formula may still be a theorem, i.e., follow from other axioms) and return a status notification

FI-JUSTIFY
     given a (concluded, cached) formula
     given a knowledge base subset
     return a list of the cached formula-subset pairs that together justify the previous derivation of the formula as a theorem

FI-ASK
     given a logical formula, possibly including free variables,
     given a knowledge base subset
     return a binding list of variables that will make the formula true (optional parameters which can be varied include whether or not to backchain, the desired number of bindings, the clock time allowed for the operation, and how deeply in the search tree to search)

FI-CONTINUE-LAST-ASK
     (no inputs)
     continue the last call to FI-ASK from the search state where it terminated, looking for (more, new) bindings, and return a binding list (optional parameters which can be varied are the same as those for FI-ASK). This is a new feature, possible because in Cyc 10, unlike Cyc 9, inference search state is explicitly maintained

FI-DENOTATION
     given a natural language string
     return a list of Cyc constants which (according to the Cyc lexicon) constitute possible denotations for the string

The present implementation of the FI includes an external telnet server. This means that applications (and users) can telnet directly to Cyc, evaluate FI operations, and get the results.

Currently, Cycorp has not chosen to publicize any port where this interaction may take place. Cycorp machines are privately owned. This telnet capability of the software is not an invitation to unauthorized access.

----------------------------------------------------------------
Subject: [9-2] Can an SQL query be used to ask Cyc a question?

No. Since the SQL model presumes a relational abstraction consisting of tables with rows and columns, it does not translate well into the Sea of Assertions abstraction (or even the older Frame based abstraction). The Cyc model is much richer than the SQL model, and thus too many distinctions within the knowledge base could probably not be expressed. In essence, an SQL query would be too coarse grained to pose an adequately meaningful question to the Cyc knowledge base. (However, it is possible that object-oriented versions of SQL would be capable of posing meaningful questions to Cyc).

----------------------------------------------------------------
Subject: [9-3] Can Cyc generate an SQL query to find information it needs ?

Yes. This is part of the capability that has been recently incorporated into Cyc to use knowledge stored externally to the Cyc knowledge base. To translate a CycL expression into an SQL database query, it is necessary to describe the database schema to Cyc using CycL. This procedure can be used to tell Cyc the meaning of the various rows and columns (fields) in any given SQL database.
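
As a rough illustration of the idea (not of Cyc's actual mechanism), the sketch below uses a Python dictionary in place of a CycL schema description and turns a one-predicate query into SQL. The predicate name capitalCity, the countries table, its column names, and the "?VAR" convention for variables are all assumptions invented for this example; Python's standard sqlite3 module stands in for the external database.

```python
import sqlite3

# Hypothetical schema description: which table and columns hold the
# arguments of each binary predicate. In Cyc this would be stated in CycL.
SCHEMA = {
    "capitalCity": {"table": "countries", "args": ("name", "capital")},
}


def is_variable(term):
    """Treat strings starting with '?' as variables (an assumed convention)."""
    return isinstance(term, str) and term.startswith("?")


def to_sql(query):
    """Translate (predicate arg1 arg2) into a SELECT statement and parameters."""
    predicate, *args = query
    mapping = SCHEMA[predicate]
    pairs = list(zip(mapping["args"], args))
    wanted = [col for col, arg in pairs if is_variable(arg)]
    fixed = [(col, arg) for col, arg in pairs if not is_variable(arg)]
    sql = f"SELECT {', '.join(wanted) or '1'} FROM {mapping['table']}"
    if fixed:
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col, _ in fixed)
    return sql, [value for _, value in fixed]


# A stand-in external database holding one specific fact.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE countries (name TEXT, capital TEXT)")
db.execute("INSERT INTO countries VALUES ('Nepal', 'Kathmandu')")

sql, params = to_sql(("capitalCity", "Nepal", "?CITY"))
rows = db.execute(sql, params).fetchall()
```

The point of the design is that the fact itself stays in the external table; only the description of what the columns mean needs to live on the knowledge base side.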
After this has been done, Cyc can use both its own internal rules of thumb and the data contained in the external database to derive new conclusions.

----------------------------------------------------------------
Subject: [9-4] Can Cyc interact with HTML ?

Yes. There are existing tools to query Cyc and edit knowledge in Cyc that are HTML based. Cyc dynamically generates HTML pages in response to user queries. (See sections [6-1] and [7-2])

Cyc does not yet 'web-surf' or access gopher clients to learn about the world by using available Internet resources. Many of these resources are not in a form that Cyc can currently interpret, but Cyc's builders hope that as its natural language understanding abilities improve it will be able to assimilate knowledge from sources on the Internet.

----------------------------------------------------------------
Subject: [9-5] Does Cyc have a KIF interface ?

KIF stands for Knowledge Interchange Format. It, as well as other standards such as Conceptual Graphs, is intended to be a means of transmitting rules and facts which are stored in an Expert System Shell, or other knowledge base system. This format is intended to be a linear form of the logical assertions stored internally using symbols and other complex data structures.

Since knowledge in Cyc is stored in this assertion format and CycL is also a form of linearly expressing assertions of a knowledge base, there are no theoretical restrictions that would prevent CycL from being expressed as KIF assertions.

KIF currently is not a published standard, although there is a draft standard available, under the auspices of the American National Standards Institute (ANSI).

The Cyc team is interested in this area, yet there is currently no official interface developed in Cyc to allow Cyc to generate KIF or interpret KIF as input.
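
To show why such a translation is plausible in principle, here is a minimal Python sketch that rewrites the CycL connective names given in section [7] into the corresponding KIF operators (forall, exists, =>, and, or, not). It is not an existing Cyc interface. The nested-tuple encoding of formulae, the Dog and Mammal predicates, and the parenthesized variable list after the quantifier are assumptions made for this example.

```python
# Connective names as listed in section [7], mapped to KIF operators.
CYCL_TO_KIF = {
    "ForAll": "forall",
    "ThereExists": "exists",
    "LogImplication": "=>",
    "LogAnd": "and",
    "LogOr": "or",
    "LogNot": "not",
}


def to_kif(formula):
    """Render a nested-tuple formula as a KIF-style parenthesized string."""
    if isinstance(formula, str):
        # Rename known connectives; leave predicates and variables untouched.
        return CYCL_TO_KIF.get(formula, formula)
    return "(" + " ".join(to_kif(part) for part in formula) + ")"


# Hypothetical rule: everything that is a Dog is a Mammal.
rule = ("ForAll", ("?x",),
        ("LogImplication", ("Dog", "?x"), ("Mammal", "?x")))
kif_text = to_kif(rule)
```

A real interface would also have to deal with microtheories, default reasoning, and the second-order features of CycL, none of which this renaming touches.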
----------------------------------------------------------------
Subject: [A] Acknowledgements

This FAQ has been based on magazine articles and books published by the Cyc team, notably Doug Lenat and R.V. Guha, and personal communication with Doug Lenat and Nick Siegel. Any mistakes in it are my sole responsibility, although this is not a warranty. I would appreciate a note about any inaccuracies or misrepresentations herein.

This FAQ could not be created without the generosity of the Cyc team in sharing information with the computing community about the methods and philosophy they have been using.

----------------------------------------------------------------
Subject: [B] Bibliography of Expert Systems books, introductions, documentation, periodicals, and conference proceedings.

Davidson, Clive. Common Sense and the Computer. New Scientist, April 2, 1994.

Guha, R.V., Lenat, D.B., Pittman, K., Pratt, D., and Shepherd, M. Cyc: a midterm report. Communications of the ACM, August 1990, Vol. 33, No. 8.

Guha, R.V. and Lenat, D.B. Cyc: a midterm report. A.I. Magazine, Fall 1990.

Lenat, D.B. and Guha, R.V. Building Large Knowledge Based Systems. Addison Wesley, Reading, Mass., 1990.

Lenat, D.B. and Guha, R.V. Enabling Agents to Work Together. Communications of the ACM, July 1994, Vol. 37, No. 7.

Lenat, D.B. Steps to Sharing Knowledge. In Toward Very Large Knowledge Bases, edited by N.J.I. Mars. IOS Press, 1995.

Lenat, D.B. Artificial Intelligence. Scientific American, September 1995.

----------------------------------------------------------------