Last-Modified: Feb 12, 1995 18:00 EST
Posting-Frequency: Monthly
Version: 0.03
Archive-Name: natural-lang-processing-faq

This is the new draft of a FAQ (frequently asked questions and answers)
list for the comp.ai.nat-lang newsgroup. The main reason for posting it
now is for me to get as much feedback as possible before I go any
further. Please don't hesitate to send me any comments, be they positive
or negative. There are many blank spots in the FAQ; please help fill
them.

Copyright (c) 1994 Dragomir R. Radev. All rights reserved.

Permission to distribute this FAQ by all volatile electronic means
(mailing lists, FTP, HTTP, Usenet news, etc.) is hereby given under the
restriction that the file is not modified and all disclaimers and
acknowledgements remain intact. This permission does NOT apply to
CD-ROMs and/or commercial printed publications. All requests for
republication in this case should be referred to the FAQ maintainer
(radev@cs.columbia.edu).

TABLE OF CONTENTS
=================

[1] General Information
[2] History
[3] Studies and Research
[4] Contacts
[5] How-to questions
[6] Literature
[7] Commercial Sites
[8] Corpora
[9] Miscellaneous

Disclaimers and Notes
---------------------

1. Please read this FAQ list before posting to comp.ai.nat-lang.

2. The FAQ is a collection of materials, rather than a complete
   reference. Some of the information may be out of date, so please be
   careful and take everything with a grain of salt. Unless an article
   contains explicit information about when it was last updated, it is
   older than February 1, 1994.

3. I don't assume any responsibility for wrong information.

4. I need suggestions as to what parts of the FAQ to keep, what parts
   to move to the ftp site and what new parts to include.

5. The maintainer of this list is Dragomir R. Radev
   (radev@cs.columbia.edu).

6. Any comments and corrections are more than welcome. So are
   contributions. Please help make the FAQ really helpful and
   interesting.

[1] General Information
-----------------------

[1-0] What is this FAQ all about

A: This is an attempt to put together a list of frequently (and not so
   frequently) asked questions about Natural Language Processing and
   their answers. This document is in no way perfect or complete or
   100% accurate. In no way should the maintainer be held responsible
   for damage resulting directly or indirectly from using information
   in this FAQ.

   The following questions and answers have been written by Mark
   Kantrowitz (mkant+ai-faq@cs.cmu.edu): 4-2, 4-4, 6-1

[1-1] What is NLP

A: Natural Language Processing - definitions coming up :)

[1-2] What is comp.ai.nat-lang

A: Here follows the original charter for comp.ai.nat-lang.

   Name: comp.ai.nat-lang

   Moderation: This group will be unmoderated.

   Purpose: To discuss issues relating to natural language, especially
   computer-related issues from an AI viewpoint. The topics that will
   be discussed in this group will concentrate on, but are not limited
   to, the following:

      * Natural Language Understanding
      * Natural Language Generation
      * Machine Translation
      * Dialogue and Discourse Systems
      * Natural Language Interfaces
      * Parsing
      * Computational Linguistics
      * Computer-Aided Language Learning

   This group will avoid discussing issues that are more properly
   covered by other newsgroups. For example, speech synthesis should be
   discussed in comp.speech. However, due to the interdisciplinary
   nature of the field, there may be overlap in material between other
   groups. To try to keep this to a minimum, topics should pertain to
   computer-related aspects of natural language.

   Rules of Decorum: Because of the unmoderated format, anyone with
   access to this newsgroup will be able to post without review. This
   is meant to encourage discussion of the topics. Please refrain from
   "flames" or unnecessary criticism of a person's viewpoints or
   personality in a harsh or insulting manner. Criticisms should be
   constructive and polite whenever possible.

   Intended Audience: The following is a repost of Terry Gaasterland's
   opinion on how this newsgroup would fit in with the alternative
   newsgroups and mailing lists.

[1-3] How to get this FAQ

A: This FAQ is currently available from the comp.ai.nat-lang
   newsgroup. The current copy can also be retrieved over HTTP from:

      http://www.cs.columbia.edu/~acl/nlpfaq.txt

   Soon, it will also be available from comp.answers and news.answers,
   as well as by anonymous ftp.

[2] History
-----------

[2-1] What are the major accomplishments of the field

Note: This section is in a very preliminary stage.

[2-1-2] Some important theses & systems:

   Overall:

      Woods (1967), Procedural semantics
      Thorne et al. and Woods (1968-70), ATNs
      Winograd (1970), SHRDLU
      Woods et al. (1972), LSNLIS / Lunar
      Charniak (1972), Frames and demons
      Grosz (1977), Focus in task-oriented dialogues
      Marcus (1977), Deterministic parsing
      Davey (1978)
      Cohen, Phil (1979), Planning speech acts
      Allen (1980), Understanding speech acts
      McDonald (1980)
      McKeown (1982), TEXT
      Appelt (1982)
      Pollack (1986), Plan inference

   Conceptual Dependency:

      Schank (1969), Conceptual Dependency
      Schank, Riesbeck, Rieger, Goldman (1975), MARGIE
      Cullingford (1979), SAM
      Wilensky (1979), PAM
      DeJong (1980), FRUMP
      Lebowitz (1980), IPP
      Dyer (1982), BORIS
      Lytinen (1986), MOPTRANS
      Hovy (1986), PAULINE
      Ram (1989), AQUA
      Dehn (1989), AUTHOR/STARSHIP

[3] Studies and Research
------------------------

[3-1] Which schools offer graduate programs in CL/NLP

A: This list is, *of course*, completely preliminary. Please send me
   information about other programs. I will try to get in touch with
   the editors of the ACL guide to Graduate Programs in CL for more
   information.

   Universities are given in alphabetical order. If a certain
   university is not included now and you feel it must be included,
   please send me some information about it.

   Australia:
      Melbourne, University of
      Microsoft Institute of Advanced Software Technology in
         association with Macquarie University

   Canada:
      Montreal, University of
      Toronto, University of
      Waterloo, University of

   Finland:
      Helsinki, University of

   France:
      Paris 7, Jussieu, University of

   Germany:
      Bonn, University of
      Karlsruhe, University of
      Koblenz-Landau, University of
      Saarlandes, University of the
      Stuttgart, University of
      Tuebingen, University of

   Italy:
      Pisa, University of
      Trento, University of

   Japan:
      Kyoto University

   Korea:
      Pohang University of Science and Technology, Pohang

   Netherlands:
      Amsterdam, University of
      Groningen, University of
      Nijmegen, University of
      Tilburg, University of
      Utrecht, University of

   Sweden:
      Stockholm, University of

   UK:
      Brighton, University of
      Cambridge, University of
      Durham, University of
      Edinburgh, University of
      Sussex, University of

   USA:
      Brown University
      Buffalo, SUNY at
      California at Berkeley, University of
      California at Los Angeles, University of
      Carnegie-Mellon University
      Columbia University
      Delaware, University of
      Duke University
      Georgetown University
      Georgia, University of
      Georgia Institute of Technology
      Harvard University
      Indiana University
      Johns Hopkins University
      Massachusetts at Amherst, University of
      Massachusetts Institute of Technology
      New Mexico State University
      New York University
      Pennsylvania, University of
      Rochester, University of
      Southern California, University of
      Stanford University
      SUNY, Buffalo
      Yale University

[3-2] How to apply to graduate school in CL/NLP

[3-2-1] How to apply to graduate school in CL/NLP in the USA

Usually, the best timetable is as follows (given that M is the month
when your studies would start, usually September):

   M - 24 : Try to clarify your interests: is it really NLP that you
            are interested in, what possible subfields might be of
            interest to you, etc.

            Remember: 5 years working in an area you are not
            interested in will be a very painful experience.

   M - 18 : Read publications in the area of your interest in order to
            discover the best places for you to apply in terms of
            research and professors.

            Remember: Unless you are familiar with the most current
            research, you will not be able to find the best place for
            you.

   M - 18 : Go to your local library and consult some of the available
            directories (see [3-3]). Write down as much information as
            you can about some 15-25 universities. These universities
            form your preliminary list.

            Remember: There are some 100 universities in the USA
            offering NLP/CL programs. Some of them will be more
            attractive to you than others.

   M - 18 : Talk to your advisers at school, talk to other students,
            post questions on the Internet. This way you will get
            advice on a few more universities that you might have
            skipped until this moment.

            Remember: Others have faced what you are going through.
            Use their experience.

   M - 15 : Send letters to the universities that you have on your
            preliminary list. Make sure you indicate when you want to
            start, what degree (MA, MS, Ph.D.) you are interested in,
            whether or not you will be applying for financial aid,
            whether you will need some special visa...

            Remember: Ask for all the information that you need, and
            give them all the information they'd need to satisfy your
            request.

   M - 12 : Read carefully the information that you have received from
            the universities. Shorten your list of places to the
            number that you will eventually apply to (usually 5-8 is a
            good number).

            Remember: Make sure you include both your best choice
            schools and some places where you are almost certain of
            getting accepted.

   M - 10 : Fill in all the forms that are sent to you, and ask your
            professors to send reference letters to the schools
            directly.

            Remember: Professors will probably be very busy at that
            time of the year (any time of the year...). Give them the
            reference forms as early as possible and make sure you
            specify a reasonable time for them to fill them in and
            send them out.

   M - 10 : (or earlier) Take the necessary tests (GRE, TOEFL, or
            others) that the schools want. Make sure you tell the
            testing service which universities you want them to send
            your scores to.

            Remember: Time yourself through several practice tests.
            The GRE General test, for example, is more about mastery
            of timing than knowledge.

   M - 9  : (approximately) Mail your forms to the schools, preferably
            2-3 weeks before the deadlines.

            Remember: You don't want your applications to get there at
            the same time as everyone else's. Give the admissions
            committee some extra time to review your application.

   M - 6  : Usually six months before the beginning of the semester
            that you are applying for, you will get a letter saying
            whether you have been accepted.

            Remember: Usually, thick letters, e-mails, and telegrams
            mean acceptance. Thin one-sheet letters will most likely
            be disappointing for you.

   M - 5  : Now, you have been accepted to a few schools. Go back to
            the same resources that you used when you were deciding
            where to apply (journals, catalogs, directories,
            professors, etc.). Ask the schools that accepted you to
            fly you in for a visit (many will do this).

            Remember: Don't forget non-academic factors such as
            location, financial aid, the atmosphere in the department,
            etc.

[3-3] Where to get information on graduate programs

A: The Peterson's Guide

A: The ACL Directory of Graduate Programs in Computational Linguistics

[3-4] Major non-academic research laboratories

   AT&T Bell Labs, Murray Hill, NJ
   BBN Systems and Technologies Corporation
   Bellcore, Morristown, NJ
   DFKI (German research center for AI)
   General Electric
   IRST, Italy
   IBM T.J. Watson Research, Yorktown Heights, NY
   Microsoft Research, Redmond, WA
   NEC Corporation
   SRI International, Menlo Park, CA
   SRI International, Cambridge, UK
   Xerox, Palo Alto, CA
   Xerox, Grenoble, France

[4] Contacts
------------

[4-1] What major publications exist in the field

   Computational Linguistics, published by ACL
   - Julia Hirschberg, Editor
      PUBLISHED QUARTERLY BY MIT PRESS JOURNALS: (617) 253-2889
      (PHONE), (617) 258-6779 (FAX), or JOURNALS-ORDERS@MIT.EDU.
      _Institutional_ Orders Only.

   Computer Speech and Language

   Journal of Natural Language Engineering
      _Natural Language Engineering_ is to be published four times a
      year in March, June, September and December. For more
      information or to submit a paper, please contact Roberto
      Garigliano, Laboratory for Natural Language Engineering,
      Computer Science Dept., University of Durham, South Road,
      Durham DH1 3LE, UK, Tel: +4 91 372639, Fax: +44 91 374 2560,
      <Roberto.Garigliano@durham.ac.uk>.

   Language
   - Sarah Thomason, Editor

   Linguistic Inquiry
   - Samuel Jay Keyser, Editor
      PUBLISHED QUARTERLY BY MIT PRESS JOURNALS: (617) 253-2889
      (PHONE), (617) 258-6779 (FAX), or JOURNALS-ORDERS@MIT.EDU.

   Machine Translation, published by Kluwer

   Natural Language and Linguistic Theory

   Speech Communication
      Elsevier Science B.V.
      Journals Department
      P.O. Box 211, 1000 AE Amsterdam

[4-2] Electronic mailing lists

   Michael Everson <everson@irlearn.ucd.ie> has updated his List of
   Language Lists. FTP LNGLST15.TXT from /everson on
   <colossus.ucd.ie>.

   Information Retrieval:
      irlist <ir-l%uccvma.bitnet@vm1.nodak.edu>

   Natural Language and Knowledge Representation (moderated):
      nl-kr@cs.rpi.edu (formerly nl-kr@cs.rochester.edu)
      Gatewayed to the newsgroup comp.ai.nlang-know-rep.

   Natural Language Generation:
      siggen@black.bgu.ac.il

   LFG (Lexical-Functional Grammar):
      majordomo@list.stanford.edu

   Parsing:
      sigparse@cs.cmu.edu

   Statistics, Natural Language, and Computing:
      empiricists@csli.stanford.edu

   Colibri (weekly update on Conferences, Seminars, Jobs and Shareware
   in NLP and speech):
      colibri-request@let.ruu.nl

   Dependency Grammar:
      dg@ai.uga.edu

   Prosody:
      listserv@purccvm.bitnet

   TEI:
      tei-l

   Text Analysis and Natural Language Applications:
      SCHOLAR@CUNYVM.BITNET

   Text Corpora:
      corpora-request@nora.hd.uib.no

   Speech production and perception:
      foNETiks <fonetiks@mailbase.ac.uk>

   LN:
      ln@frmop11.bitnet

   Linguist:
      linguist@tamvm1.tamu.edu

   ELSNET:
      elsnet-list@cogsci.ed.ac.uk

   Eastern (European) Language Engineering list:
      to join, send mail to poul_andersen@eurokom.ie

   Preprint archive mailing list:
      For further information about (among other topics) submission of
      papers to the server, subscribing or canceling your
      subscription, requesting full text of any of the papers above,
      retrieving macro files for these papers, searching past
      listings, or submitting comments to the server operators, send a
      message:

         To: CMP-LG@XXX.LANL.GOV
         Subject: help

[4-3] Newsgroups

   alt.usage.english         English grammar, word usages, and related
                             topics.
   comp.ai.nat-lang          Natural language processing by computers.
   comp.ai.nlang-know-rep    Natural Language and Knowledge
                             Representation. (Moderated)
   comp.speech               Research & applications in speech science
                             & technology.
   sci.lang                  Natural languages, communication, etc.
   alt.etext                 Electronic texts.
   comp.text.sgml            ISO 8879 SGML structured documents markup
                             languages

[4-4] Professional Organizations, Associations

   ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL)

      To get information about the ACL listserver, send mail to
      listserv@cs.columbia.edu with

         index acl-l

      in the message body. To get the membership form, include

         get acl-l 94membership.form

      in the message body. The ACL archive can also be accessed by
      anonymous ftp from cs.columbia.edu:/acl-l/.
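
      The listserver requests described above are ordinary mail
      messages whose body carries the command. As a minimal sketch
      (written in Python, which this FAQ does not otherwise use), the
      following composes such a message without sending it. The
      function name build_listserv_request, the sender address and the
      subject text are illustrative assumptions; only the listserver
      address and the command text come from the entry above.

```python
from email.message import EmailMessage


def build_listserv_request(listserv_address, commands, sender):
    """Compose a mail message with one listserver command per body line.

    The helper name and the subject line are hypothetical; the entry
    above only specifies what goes into the message body.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = listserv_address
    msg["Subject"] = "listserver request"  # assumed placeholder text
    msg.set_content("\n".join(commands) + "\n")
    return msg


# Ask the ACL listserver for its index, as described in the entry above.
request = build_listserv_request(
    "listserv@cs.columbia.edu",
    ["index acl-l"],
    sender="you@example.org",  # replace with your own address
)
print(request.as_string())
```

      The membership-form request would presumably be built the same
      way, with get acl-l 94membership.form as the command. Actually
      sending the message depends on the local mail setup and is left
      out of the sketch.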

   ASSOCIATION FOR MACHINE TRANSLATION IN THE AMERICAS (AMTA)
      655 Fifteenth Street, NW, Suite 310, Washington, DC 20005

   AMERICAN ASSOCIATION FOR ARTIFICIAL INTELLIGENCE (AAAI)

   COGNITIVE SCIENCE SOCIETY

[4-5] Conferences

   COLING  - last conference - Kyoto, Japan (August 94)
   ACL     - next conference - Cambridge, Massachusetts (Summer 1995)
   EACL    - next conference - Dublin, Ireland (Spring 1995)
   IJCAI   - next conference - Montreal, Canada (Summer 1995)
   PacLing - next conference - Brisbane, Australia (Spring 1995)

[4-6] Evaluation Competitions

   MUC - ARPA Message Understanding Conference
      Currently running MUC-6 (1994-95) using text articles from the
      Wall Street Journal Corpus. Systems compete in any or all of
      five categories: named entity categorisation, word sense
      disambiguation, mini-MUC (contents scanning, template filling),
      coreference identification, and predicate-argument
      identification.

   TREC - ARPA Text Retrieval Conference
      Information retrieval using NLP/statistical techniques.

[5] How-to questions
--------------------

[5-1] How to join a mailing list

A: Most often, you have to send mail to the listserver at the site
   where the mailing list resides, and put
   "subscribe <listname> <yourname>" in the body of the mail message.
   The underlined text is what you have to type in.

   Example:

      Mail listserv@tamvm1.tamu.edu
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      Subject: some text here
               ^^^^^^^^^^^^^^
      subscribe LINGUIST Dragomir R. Radev
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      .
      ^

[5-2] How to obtain files by anonymous ftp

A: There are many ways. The most common way, however, is using a
   local ftp client. Suppose you want to get the file
   /pub/editors/webster.tar.Z from ftp.uu.net. Here is a sample
   session. You type in whatever is underlined here.

      $ ftp ftp.uu.net
        ^^^^^^^^^^^^^^
      Connected to ftp.uu.net.
      220 ftp.UU.NET FTP server Thu Apr 14 15:45:10 EDT 1994) ready.
      Name (ftp.uu.net:radev): anonymous
                               ^^^^^^^^^
      331 Password required for anonymous.
      Password: radev@cs.columbia.edu
                ^^^^^^^^^^^^^^^^^^^^^
                (put your email address here)
      230 Guest login ok, access restrictions apply.
      ftp> cd pub/editors
           ^^^^^^^^^^^^^^
      ftp> binary
           ^^^^^^
      ftp> get webster.tar.Z
           ^^^^^^^^^^^^^^^^^
      200 PORT command successful.
      150 Opening BINARY mode data connection for webster.tar.Z
          (148579 bytes).
      226 Transfer complete.
      local: webster.tar.Z remote: webster.tar.Z
      148579 bytes received in 2.2 seconds (67 Kbytes/s)
      ftp> quit
           ^^^^
      $

[5-4] FTP repositories

A: Here follows a list of the most popular FTP sites that carry
   NLP-related materials (data, tools, etc.)

   * Consortium for Lexical Research (CLR)

     The Consortium for Lexical Research is designed to serve as a
     repository for software and resources of importance to the
     natural language processing research community. Sharable
     resources, and the task of centralizing lexical data and tools,
     are of foremost concern in lexical research and computational
     linguistics. It is our objective to help alleviate the repeated
     recreation of basic software tools, and to assist in making
     essential data sources more generally available.

     CLR maintains a public ftp site, and a separate library of
     materials only for members of CLR. Currently CLR has about 60
     members, mostly academic institutions, and almost every major
     natural language processing center in the U.S. belongs. Access
     to the members-only materials is strictly regulated by password
     and userid.

     Our catalog of current holdings is available by using anonymous
     ftp to clr.nmsu.edu (128.123.1.12). The file to 'get' is
     "catalog.ps" for a postscript version, or "catalog" for a simple
     ascii version.

   * Linguistic Data Consortium (LDC)

     To order LDC materials, send mail to ldc@unagi.cis.upenn.edu or
     fax your order to (215) 573-2175. If you require additional
     information before placing your order, please call
     (215) 898-0464.

   * Oxford Text Archive (OTA)

     ftp ota.ox.ac.uk
     ota/textarchive.list    the current catalogue

     There are two classes of texts available from this FTP server:

     (a) texts which are in TEI format and which we can make freely
         available (these all appear as category P texts in the
         shortlist)
     (b) texts which are available only under our standard conditions
         of use (these all appear as category U or A in the
         shortlist)

   * University of Michigan Linguistics Archive (UMICH)

     ftp linguistics.archive.umich.edu
     /linguistics
     moderator: John Lawler (jlawler@umich.edu)

   * others...

[5-6] Tools available on the Internet

[5-6-1] Parsers

[5-6-1-2] TAG - grammar of English
   site: linc.cis.upenn.edu
   directory: /pub/xtag

[5-6-2] Generators

[5-6-3] Machine Translation

[5-6-4] Speech

[5-6-4-1] OGI Speech Tools
   site: speech.cse.ogi.edu
   directory: /pub/tools

[5-6-5] Lexical Tools

[5-6-5-1] Wordnet
   site: clarity.princeton.edu
   directory: /pub

[5-7] Papers and Technical Reports

[5-7-1] Dissertations

[5-7-1-2] Finch, S.P. Finding Structure in Language
   site: scott.cogsci.ed.ac.uk
   directory: /pub/statling/Papers/phdThesis.ps.Z

[5-7-1-3]

[5-7-2] Technical reports

[5-7-3] Other on-line papers

   http://www.cc.gatech.edu/cogsci/cogsci.html
      Georgia Tech CogSci home page

   http://www.cc.gatech.edu/cogsci/nlr.html
      Georgia Tech Natural Language and Reasoning research group

[5-8] WWW and gopher Resources (in no specific order)

[5-8-1] Association for Computational Linguistics Home Page
   http://www.cs.columbia.edu/~acl/

[5-8-2] Dutch working group on Computational Linguistics
   http://tyr.let.rug.nl/~vannoord/clin/clin.html

[5-8-3] Colibri, Newsletter on Computational Linguistics
   http://colibri.let.ruu.nl

[5-8-4] Georgetown University Catalogue of Projects in Electronic Text
   gopher://gopher.georgetown.edu/11gopher_root%3a%5bcpet_projects_in_electronic_text%5d

[5-8-5] The English Server at Carnegie-Mellon University
   http://english-server.hss.cmu.edu/FrontDoor.html/

[5-8-6] Project for American and French Research of the Treasury of
        the French Language, University of Chicago (ARTFL)
   http://tuna.uchicago.edu/ARTFL

[5-8-7] Computational Phonology, University of Edinburgh
   http://ftp.cogsci.ed.ac.uk/phonology/CompPhon.html

[5-8-8] New Mexico State University Computing Research Laboratory (CRL)
   http://crl.nmsu.edu/Home.html

[5-8-9] The Carnegie-Mellon University AI repository
   http://www.cs.cmu.edu:8001/Web/Groups/AI/html/repository.html

[5-8-10] The Computation and Language E-Print Archive
   http://xxx.lanl.gov/cmp-lg/

[5-8-11] The Journal of Artificial Intelligence Research
   http://www.cs.washington.edu/research/jair/home.html

[5-8-12] ACL-94 On-line conference proceedings.
   http://xxx.lanl.gov/cmp-lg/ACL-94-proceedings.html

[5-8-13] The Natural Language Software Registry
   http://cl-www.dfki.uni-sb.de/cl/registry/draft.html

[5-8-14] Language and Linguistics Directory, Rice University
   gopher://chico.rice.edu/11/Subject/Language

[5-8-15] A text-to-speech demo, University of Twente
   http://www_tios.cs.utwente.nl:8001/say/

[5-8-16] SRI International Natural Language WWW Page
   http://www.ai.sri.com/aic/natural-language/natural-language.html

[5-8-17] Survey of Language Engineering Organisations in Central and
         Eastern Europe
   http://www.cogsci.ed.ac.uk/elsnet/survey/survey.html

[5-9] Other comprehensive documents

[5-9-1] Software Registries

[5-9-1-1] Natural Language Software Registry (from DFKI)
   site: ftp.dfki.uni-sb.de
   directory: pub/registry

[5-9-1-2] Natural Language Software Registry (from CRL)
   site: crlftp.nmsu.edu
   directory: pub/non-lexical/NL_Software_Registry

[5-9-1-3] Another list of NLP software
   site: ftphost.uni-koblenz.de
   directory: outgoing
   files: software_list.ps.Z

[6] Literature
--------------

[6-1] What are some important books in NLP

General:

   Gazdar, G. and Mellish, C., "Natural Language Processing in Lisp:
   An Introduction to Computational Linguistics", Addison-Wesley,
   Reading, Massachusetts, 1989.
   (There are three different editions of the book, one for Lisp, one
   for Prolog, and one for Pop-11.)

   Michael A. Covington, "Natural Language Processing for Prolog
   Programmers", Prentice-Hall, Englewood Cliffs, NJ, 1994.
   ISBN 0-13-629213-5.

   Grosz, Barbara J., Sparck-Jones, Karen, and Webber, Bonnie L.,
   "Readings in Natural Language Processing", Morgan Kaufmann
   Publishers, Los Altos, CA, 1986, 664 pages. ISBN 0-934613-11-7,
   $44.95.

   Robert C. Berwick, "Computational Linguistics", MIT Press,
   Cambridge, MA, 1989, ISBN 0262-02266-4.

   Brady, Michael, and Berwick, Robert C., "Computational Models of
   Discourse", MIT Press, Cambridge, MA, 1983.

   Allen, James F., "Natural Language Understanding", The
   Benjamin/Cummings Publishing Company, Menlo Park, California,
   (Addison-Wesley Publishing Company, Reading, Massachusetts), 1988,
   550 pages, ISBN 0-8053-0330-8. [A new edition came out in 1994]
   Code for the book is available from
   ftp.cs.cmu.edu:/user/ai/areas/nlp/bookcode/allen/

   Terry Winograd, "Language as a Cognitive Process", Addison-Wesley,
   Reading, MA, 1983.

   Schank, R. and Abelson, R. "Scripts, Plans, Goals, and
   Understanding," Lawrence Erlbaum Associates, Hillsdale, New Jersey,
   1977.

Terminology:

   David Crystal, "A Dictionary of Linguistics and Phonetics", 3rd
   Edition, Basil Blackwell Publishers, New York, 1991.

Parsing:

   Tomita, M. (Editor), "Current Issues in Parsing Technology", Kluwer
   Academic Publishers, Norwell, MA, 1991.

   Marcus, M. "A Theory of Syntactic Recognition for Natural
   Language," The MIT Press, Cambridge, MA, 1980.

   Pereira, F. and Shieber, S. "Prolog and Natural-Language Analysis,"
   Center for the Study of Language and Information, 1987.

Probabilistic Parsing:

   Ted Briscoe and John Carroll, "Generalised Probabilistic LR Parsing
   of Natural Language (Corpora) with Unification-based Grammars",
   University of Cambridge Computer Laboratory, Technical Report
   Number 224, 1991.

   Zhi Biao Wu, Loke Soo Hsu, and Chew Lim Tan, "A Survey of
   Statistical Approaches to Natural Language Processing", Technical
   report TRA4/92, Department of Information Systems and Computer
   Science, National University of Singapore, 1992.

Natural Language Understanding:

   Dyer, M. "In-Depth Understanding: A Computer Model of Integrated
   Processing for Narrative Comprehension," MIT Press, Cambridge, MA,
   1983.

   Aravind Joshi, Bonnie Webber and Ivan Sag, "Elements of Discourse
   Understanding", Cambridge University Press, New York, 1981.

   Cohen, P. R., Morgan, J. and Pollack, M., editors, "Intentions in
   Communication", MIT Press, Cambridge, MA, 1990.

Natural Language Interfaces:

   Raymond C. Perrault and Barbara J. Grosz, "Natural Language
   Interfaces", Annual Review of Computer Science, volume 1, J.F.
   Traub, editor, pages 435-452, Annual Reviews Inc., Palo Alto, CA,
   1986.

Natural Language Generation:

   McKeown, Kathleen R. and Swartout, William R., "Language Generation
   and Explanation", in Zock, M. and Sabah, G., editors, Advances in
   Natural Language Generation, Volume 1, Pages 1-51, Ablex Publishing
   Company, Norwood, NJ, 1988. (Overview of the state of the art in
   natural language generation.)

   There are several books published as a result of the international
   workshops on natural language generation.

Speech:

   John Allen, Sharon Hunnicutt and Dennis H. Klatt, "From Text to
   Speech: The MITalk System", Cambridge University Press, 1987.
   [Synthesis, precursor of DECtalk.]

   Frank Fallside and William A. Woods (editors), "Computer Speech
   Processing", Prentice Hall, Englewood Cliffs, NJ, 1985.

   X. D. Huang, Y. Ariki and M. A. Jack, "Hidden Markov Models for
   Speech Recognition", Edinburgh University Press, 1990. [Analysis]

   A. Nejat Ince (editor), "Digital Speech Processing: Speech Coding,
   Synthesis, and Recognition", Kluwer Academic Publishers, Boston,
   1992. [Analysis and Synthesis]

   Kai-Fu Lee, "Automatic Speech Recognition: The Development of the
   SPHINX System", Kluwer Academic Publishers, Boston, MA, 1989.
   [Analysis]

   Douglas O'Shaughnessy, "Speech Communication: Human and Machine",
   Addison-Wesley, MA, 1987. [Analysis and Synthesis]

   Lawrence R. Rabiner and Ronald W. Schafer, "Digital Processing of
   Speech Signals", Prentice Hall, Englewood Cliffs, NJ, 1978.
   [Analysis and Synthesis]

   Lawrence R. Rabiner and Biing-Hwang Juang, "Fundamentals of Speech
   Recognition", Prentice Hall, Englewood Cliffs, NJ, 1993.
   ISBN 0-13-015157-2. [Analysis]

   Ronald W. Schafer and John D. Markel (editors), "Speech Analysis",
   IEEE Press, New York, 1979. [Analysis]

   Alex Waibel and Kai-Fu Lee (editors), "Readings in Speech
   Recognition", Morgan Kaufmann Publishers, San Mateo, CA, 1990,
   680 pages. ISBN 1-55860-124-4, $49.95. [Analysis]

   Alex Waibel, "Prosody and Speech Recognition", Morgan Kaufmann
   Publishers, San Mateo, CA, 1988. [Analysis]

Machine Translation:

   W. John Hutchins and Harold L. Somers, "An Introduction to Machine
   Translation", Academic Press, San Diego, 1992. 362 pages,
   ISBN 0-123-62830-X.

   Bonnie J. Dorr, "Machine Translation: A View from the Lexicon",
   MIT Press, Cambridge, MA, 1993. 432 pages, ISBN 0-262-04138-3.

   Kenneth Goodman and Sergei Nirenburg, editors, "The KBMT Project: A
   Case Study in Knowledge-Based Machine Translation", Morgan Kaufmann
   Publishers, San Mateo, CA, 1991. 331 pages, ISBN 1-558-60129-5,
   $34.95.

   Arnold, D.J.; Balkan, L.; Lee Humphreys, R.; Meijer, S.; and
   Sadler, L. (1994). Machine Translation: An Introductory Guide.
   NCC Blackwell.

   The journal "Machine Translation" is the principal forum for
   current research.

   A review of MT systems on the market appeared in BYTE 18(1),
   January 1993.

Reversible Grammars:

   Tomek Strzalkowski, editor, "Reversible Grammar in Natural Language
   Processing", Kluwer Academic Publishers, 1993.

   Proceedings of the ACL Workshop on Reversible Grammar in Natural
   Language Processing, UC Berkeley, 1991. (See especially Remi
   Zajac's paper.)

Statistical Processing:

   Eugene Charniak, "Statistical Language Learning", MIT Press,
   Cambridge, Massachusetts, 1993, 170 pages.

Linguistics:

   Vivian J. Cook, "Chomsky's Universal Grammar: An Introduction",
   Basil Blackwell Publisher, New York, 1988, 201 pages.

   Victoria Fromkin and Robert Rodman, "An Introduction to Language",
   Holt, Rinehart, and Winston, New York, 4th edition, 1988,
   474 pages.

   Ralph Grishman, "Computational Linguistics: An Introduction",
   Cambridge University Press, New York, 1986, 193 pages.

   Liliane M.V. Haegeman, "Introduction to Government and Binding
   Theory", Basil Blackwell Publishers, Oxford, 1991, 618 pages.

   Michael A. K. Halliday, "An Introduction to Functional Grammar",
   Edward Arnold, London, 1985.

   Geoffrey C. Horrocks, "Generative Grammar", Longman, London, 1987,
   339 pages.

   Andrew Radford, "Transformational Grammar: A First Course",
   Cambridge University Press, New York, 1988, 625 pages.

Categorial Grammar (CG):

   M. Moortgat, "Categorial Investigations. Logical and Linguistic
   Aspects of the Lambek Calculus", Groningen-Amsterdam Studies in
   Semantics:9, Foris, Dordrecht, Holland, 1988.

   Richard T. Oehrle, Emmon Bach and Deirdre Wheeler, "Categorial
   Grammars and Natural Language Structures", Studies in Linguistics
   and Philosophy:32, D. Reidel Publishing Company, Dordrecht, 1988.

   Mary McGee Wood, "Categorial Grammars", Linguistic Theory Guides,
   Routledge, London, 1993.

Cognitive Grammar:

   Ronald W. Langacker, "Foundations of cognitive grammar", Stanford
   University Press, 1987.

Miscellaneous:

   _The Multilingual PC Directory_. By Ian Tresman. 254pp. Stamford
   CT: Knowledge Computing Ltd.

   Stefan Wermter, Hybrid connectionist natural language processing.
   Chapman & Hall Inc, 1995.

   Connectionist approaches to natural language processing. Edited by
   Ronan G. Reilly and Noel E. Sharkey. Earlsdale, 1992.
   ISBN 0-86377-179-3

   _Natural Language Processing_. Ed. Fernando C.N. Pereira and
   Barbara J. Grosz. A Bradford Book. Cambridge, MA, and London: The
   MIT Press, 1994. Rptd from _Artificial Intelligence: An
   International Journal_, Volume 63, Numbers 1-2 (1993).

   _Research in Humanities Computing 1: Selected Papers from the
   ALLC/ACH Conference, Toronto, June 1989_. Ed. Ian Lancashire.
   Oxford: Clarendon Press, 1991.

   Peter D. Smith, _An Introduction to Text Processing_. Cambridge MA
   and London: The MIT Press, 1990. ISBN 0-262-19299-3.

   Gilbert K. Krulee, Computer processing of natural language.
   Prentice Hall. ISBN 0-13-610299-3

[6-2] Encyclopedia of Artificial Intelligence

   A GUIDE TO COMPUTATIONAL LINGUISTICS ARTICLES IN THE
   ENCYCLOPEDIA OF ARTIFICIAL INTELLIGENCE, 2nd Edition
   Stuart C. Shapiro (editor)
   (John Wiley & Sons, 1992)

   compiled by:

   William J. Rapaport
   Department of Computer Science and Center for Cognitive Science
   State University of New York at Buffalo
   Buffalo, NY 14260
   rapaport@cs.buffalo.edu

AUTHOR                                              TITLE                                          PAGES

Volume 1:

Bookman, L. A., & Alterman, R.                      Analog Semantic Features                       27-28
Alvarado, S. J.                                     Argument Comprehension                         30-52
Kucera, H.                                          Brown Corpus                                   128-130
Srihari, S. N., & Hull, J. J.                       Character Recognition                          138-150
Ballard, B., & Jones, M.                            Computational Linguistics                      203-224
Hardt, S. L.                                        Conceptual Dependency                          259-265
Hindle, D.                                          Deep Structure                                 328-330
Ingria, R.; Boguraev, B.; & Pustejovsky, J.         Dictionary/Lexicon                             341-365
Scha, R.; Bruce, B. C.; & Polanyi, L.               Discourse Understanding                        365-379
Tennant, H.                                         Ellipsis                                       445-446
Novak, V.                                           Fuzzy Logic: Applications to Natural Language  515-521
Woods, W. A.                                        Grammar, Augmented Transition Network          552-563
Bruce, B., & Moser, M. G.                           Grammar, Case                                  563-570
Gazdar, G.                                          Grammar, Generalized Phrase Structure          570-573
Joshi, A. K.                                        Grammar, Phrase Structure                      573-580
Burton, R.                                          Grammar, Semantic                              580-583
Bateman, J. A.                                      Grammar, Systemic                              583-592
Mallery, J. C.; Hurwitz, R.; & Duffy, G.            Hermeneutics                                   596-611
Hill, J. C.                                         Language Acquisition                           761-772
Fass, D., & Pustejovsky, J.                         Lexical Decomposition                          806-812
Pustejovsky, J.                                     Lexical Semantics                              812-819

Volume 2:

Nagao, M.                                           Machine Translation                            898-902
Klavans, J. L., & Tzoukermann, E.                   Morphology                                     963-972
McDonald, D. D.                                     Natural-Language Generation                    983-997
Carbonell, J. G., & Hayes, P. J.                    Natural-Language Understanding                 997-1016
Petrick, S.                                         Parsing                                        1099-1109
Small, S. L.                                        Parsing, Word-Expert                           1109-1116
Wilks, Y., & Fass, D.                               Preference Semantics                           1183-1194
Cruse, D. A.                                        Presupposition                                 1194-1201
Dyer, M. G.; Cullingford, R. E.; & Alvarado, S. J.  Scripts                                        1443-1460
Sowa, J. F.                                         Semantic Networks                              1493-1511
Devlin, K. J.                                       Situation Theory and Situation Semantics       1541-1547
Briscoe, E. J.                                      Speech Recognition                             1553-1559
Norvig, P.                                          Story Analysis                                 1568-1576
Alterman, R.                                        Text Summarization                             1579-1587
Sparck Jones, K.                                    Thesaurus                                      1605-1613
Knight, K.                                          Unification                                    1630-1636

Additional articles from the 1st edition (1987):

Coelho, H.                                          Grammar, Definite Clause                       339-342
Berwick, R.                                         Grammar, Transformational                      353-361
Newmeyer, F. J.                                     Linguistics, Competence and Performance        503-508
Wilks, Y.                                           Machine Translation                            564-571
Tennant, H.                                         Menu-Based Natural Language                    594-597
Koskenniemi, K.                                     Morphology                                     619-620
Bates, M.                                           Natural-Language Interfaces                    655-660
Riesbeck, C. K.                                     Parsing, Expectation-Driven                    696-701
Keyser, S. J.                                       Phonemes                                       744-746
Webber, B.                                          Question Answering                             814-822
Smith, B. C.                                        Self-Reference                                 1005-1010
Hirst, G.                                           Semantics                                      1024-1029
Woods, W.                                           Semantics, Procedural                          1029-1031
Allen, J. F.                                        Speech Acts                                    1062-1065
Allen, J.                                           Speech Recognition                             1065-1070
Allen, J.                                           Speech Synthesis                               1070-1076
Briscoe, E. J.                                      Speech Understanding                           1076-1083
Lehnert, W. G.                                      Story Analysis                                 1090-1099

[7] Commercial sites
--------------------

[7-1] Machine Translation

   Globalink, Inc
   9302 Lee Highway
   Fairfax, VA, 22031, USA
   Tel: +1 703 273 5600
   Fax: +1 703 273 3866

   Archers Translation Services
   203-205 Desborough Road
   High Wycombe, Bucks., HP11 2QL, UK
   Tel: +44 494 537755
   Fax: +44 494 474001

[8] Corpora
-----------

Corpora are large sets of natural language documents (text or speech).
They can usually be obtained through ftp, or sometimes on CD-ROM.
Please read below for details.

0001 Word list
   site: gatekeeper.dec.com
   directory: pub/misc/stolfi-wordlists
   moderator: Jorge Stolfi

0002 Word list
   site: ftp.cs.vu.nl
   directory: /dictionaries

0003 CELEX
   To order LDC materials, send mail to ldc@unagi.cis.upenn.edu or
   fax your order to (215) 573-2175. If you require additional
   information before placing your order, please call (215) 898-0464.

0004 Word list
   site: wocket.vantage.gte.com
   directory: /pub/standard_dictionary

0005 Russian corpus
   mail: ingrid.maier@slaviska.uu.se

0006 Brown corpus
   no info for now...

[9] Miscellaneous
-----------------

[9-1] About this FAQ

   This FAQ is maintained by Dragomir R. Radev from Columbia
   University. Please send me all your comments, suggestions,
   corrections, additions, and such to my e-mail address:
   radev@cs.columbia.edu

[9-2] Large parts of sections 4-1, 4-4, and 6-1 come from Mark
      Kantrowitz's comp.ai FAQ.

[9-3] Partial list of contributors (in alphabetical order):

   Paul Buitelaar        paulb@zag.cs.brandeis.edu
   Russell Collingham    R.J.Collingham@durham.ac.uk
   Robert Dale           rdale@microsoft.com
   Joshua Goodman        goodman@das.harvard.edu
   Malcolm Grandis       Malcolm@celtic.demon.co.uk
   Graeme Hirst          gh@cs.toronto.ca
   Mark Kantrowitz       mkant+ai-faq@cs.cmu.edu
   Alberto Lavelli       lavelli@irst.it
   Ashwin Ram            ashwin@cc.gatech.edu
   William J. Rapaport   rapaport@cs.buffalo.edu
   Hinrich Schuetze      schuetze@Sante.Stanford.EDU
   Kevin Thomas          kevint@cdplus.com
   Gertjan van Noord     vannoord@let.rug.nl

--
Dragomir R.
Radev Graduate Research Assistant Natural Language Processing Group Columbia University CS Department Office: (212) 939-7121 Lab: (212) 939-7108 Home: (212) 749-9770 http://www.cs.columbia.edu/~radev