Last-Modified:  Feb 12, 1995 18:00 EST
Posting-Frequency: Monthly
Version: 0.03
Archive-Name: natural-lang-processing-faq


This is the new draft of a FAQ (frequently asked questions and answers)
list for the comp.ai.nat-lang newsgroup. The main reason for posting it now
is for me to get as much feedback as possible before I go any further.
Please don't hesitate to send me any comments, be they positive or negative.
There are many blank spots in the FAQ; please help fill them.


Copyright (c) 1994 Dragomir R. Radev. All rights reserved.

Permission to distribute this FAQ by all volatile electronic means
(mailing lists, FTP, HTTP, Usenet news, etc.) is hereby given under
the restriction that the file is not modified and all disclaimers and
acknowledgements remain intact.
This permission does NOT apply to CD-ROMS and/or commercial printed
publications. All requests for republication in this case should
be referred to the FAQ maintainer (radev@cs.columbia.edu).




    TABLE OF CONTENTS
    =================


[1] General Information
[2] History
[3] Studies and Research
[4] Contacts
[5] How-to questions
[6] Literature
[7] Commercial Sites
[8] Corpora
[9] Miscellaneous


Disclaimers and Notes
---------------------

 1. Please read this FAQ list before posting to comp.ai.nat-lang.
 2. The FAQ is a collection of materials, rather than a complete reference.
    Some of the information may be out of date, so please be careful and
    take everything with a grain of salt. Unless an article contains
    explicit information about when it was last updated, it is older than
    February 1, 1994.
 3. I don't assume any responsibility for wrong information.
 4. I need suggestions as to what parts of the FAQ to keep, what parts to
    move to the ftp site and what new parts to include.
 5. The maintainer of this list is Dragomir R. Radev (radev@cs.columbia.edu).
 6. Any comments and corrections are more than welcome. So are
    contributions.  Please help make the FAQ really helpful and interesting.

[1] General Information
-----------------------

 [1-0] What is this FAQ all about

 A: This is an attempt to put together a list of frequently (and not so
    frequently) asked questions about Natural Language Processing and their
    answers. This document is in no way perfect, complete, or 100% accurate.
    In no way should the maintainer be held responsible for damage resulting
    directly or indirectly from using information in this FAQ.

    The following questions and answers have been written by Mark Kantrowitz
    (mkant+ai-faq@cs.cmu.edu): 4-2, 4-4, 6-1

 [1-1] What is NLP

 A: Natural Language Processing - definitions coming up :)

 [1-2] What is comp.ai.nat-lang

 A: Here follows the original charter for comp.ai.nat-lang.

   Name:         comp.ai.nat-lang

   Moderation:   This group will be unmoderated.

   Purpose:      To discuss issues relating to natural language, especially
                 computer-related issues from an AI viewpoint.   The topics
                 that will be discussed in this group will concentrate on, but
                 are not limited to, the following:

                      *   Natural Language Understanding
                      *   Natural Language Generation
                      *   Machine Translation
                      *   Dialogue and Discourse Systems
                      *   Natural Language Interfaces
                      *   Parsing
                      *   Computational Linguistics
                      *   Computer-Aided Language Learning

                 This group will avoid discussing issues that are more properly
                 covered by other newsgroups.   For example, speech synthesis
                 should be discussed in comp.speech.   However, due to the
                 interdisciplinary nature of the field, there may be overlap in
                 material between other groups.    To try to keep this to a
                 minimum, topics should pertain to computer-related aspects
                 of natural language.

   Rules of Decorum:  Because of the unmoderated format, anyone with access to
                      this newsgroup will be able to post without review.
                      This is meant to encourage discussion of the topics.
                      Please refrain from "flames" or unnecessary criticism
                      of a person's viewpoints or personality in a harsh
                      or insulting manner.   Criticisms should be constructive
                      and polite whenever possible.

   Intended Audience:  The following is a repost of Terry Gaasterland's opinion
                       on how this newsgroup would fit in to the other
                       alternative  newsgroups and mailing lists.

 [1-3] How to get this FAQ

 A: This FAQ is currently available from the comp.ai.nat-lang newsgroup.
    The current copy can also be retrieved via HTTP from the following URL:
    http://www.cs.columbia.edu/~acl/nlpfaq.txt
    Soon, it will also be available from comp.answers and news.answers,
    as well as by anonymous ftp.


[2] History
-----------

 [2-1] What are the major accomplishments of the field

    Note: This section is in a very preliminary stage.

   [2-1-2] Some important theses & systems:

      Overall:

    Woods (1967), Procedural semantics
    Thorne et al. and Woods (1968-70), ATNs
    Winograd (1970), Shrdlu
    Woods et al. (1972), LSNLIS / Lunar
    Charniak (1972), Frames and demons
    Grosz (1977), Focus in task-oriented dialogues
    Marcus (1977), Deterministic parsing
    Davey (1978)
    Cohen, Phil (1979), Planning speech acts
    Allen (1980), Understanding speech acts
    McDonald (1980)
    McKeown (1982), TEXT
    Appelt (1982)
    Pollack (1986), Plan inference

      Conceptual Dependency:

    Schank (1969), Conceptual Dependency
    Schank, Riesbeck, Rieger, Goldman (1975), MARGIE
    Cullingford (1979), SAM
    Wilensky (1979), PAM
    DeJong (1980), FRUMP
    Lebowitz (1980), IPP
    Dyer (1982), BORIS
    Lytinen (1986), MOPTRANS
    Hovy (1986), PAULINE
    Ram (1989), AQUA
    Dehn (1989), AUTHOR/STARSHIP

[3] Studies and Research
------------------------

 [3-1] Which schools offer graduate programs in CL/NLP

 A: This list is, *of course*, completely preliminary. Please send me
    information about other programs. I will try and get in touch with the
    editors of the ACL guide to Graduate Programs in CL for more information.
    Universities are given in alphabetical order. If a certain university
    is not included now and you feel it must be included, please send me
    some information about it.

    Australia:

    Melbourne, University of
    Microsoft Institute of Advanced Software Technology in association with
            Macquarie University

    Canada:

    Montreal, University of
    Toronto, University of
    Waterloo, University of

    Finland:

    Helsinki, University of

    France:

    Paris 7, Jussieu, University of

    Germany:

    Bonn, University of
    Karlsruhe, University of
    Koblenz-Landau, University of
    Saarlandes, University of the
    Stuttgart, University of
    Tuebingen, University of

    Italy:

    Pisa, University of
    Trento, University of

    Japan:

    Kyoto University

    Korea:

    Pohang University of Science and Technology, Pohang

    Netherlands:

    Amsterdam, University of
    Groningen, University of
    Nijmegen, University of
    Tilburg, University of
    Utrecht, University of

    Sweden:

    Stockholm, University of

    UK:

    Brighton, University of
    Cambridge, University of
    Durham, University of
    Edinburgh, University of
    Sussex, University of

    USA:

    Brown University
    Buffalo, SUNY at
    California at Berkeley, University of
    California at Los Angeles, University of
    Carnegie-Mellon University
    Columbia University
    Delaware, University of
    Duke University
    Georgetown University
    Georgia, University of
    Georgia Institute of Technology
    Harvard University
    Indiana University
    Johns Hopkins University
    Massachusetts at Amherst, University of
    Massachusetts Institute of Technology
    New Mexico State University
    New York University
    Pennsylvania, University of
    Rochester, University of
    Southern California, University of
    Stanford University
    Yale University

  [3-2] How to apply to graduate school in CL/NLP

  [3-2-1] How to apply to graduate school in CL/NLP in the USA

  Usually, the best timetable is as follows (where M is the month
  in which your studies would start, usually September):

        M - 24 : Try to clarify your interests: is it really NLP
                 that you are interested in, what possible
                 subfields might be of interest to you, etc.
                 Remember: 5 years working in an area you are
                           not interested in will be a very painful
                           experience.
        M - 18 : Read publications in the area of your interest
                 in order to discover the best places for
                 you to apply in terms of research, and
                 professors.
                 Remember: Unless you are familiar with the most
                           current research, you will not be able
                           to find the best place for you.
        M - 18 : Go to your local library and consult some of the
                 available directories (see [3-3]) - write down
                 as much information as you can about some
                 15-25 universities. These universities form your
                 preliminary list.
                 Remember: There are some 100 universities in the
                           USA offering NLP/CL programs. Some of them
                           will be more attractive to you than others.
        M - 18 : Talk to your advisers at school, talk to other
                 students, post questions on the Internet.
                 This way you will get advice on a few more
                 universities that you might have skipped until this moment.
                 Remember: Others have faced what you are going
                           through. Use their experience.
        M - 15 : Send letters to the universities that you have
                 on your preliminary list. Make sure you indicate
                 when you want to start, what degree (MA, MS,
                 Ph.D.) you are interested in, whether or not
                 you will be applying for financial aid, whether
                 you will need some special visa...
                 Remember: Ask for all the information that you
                           need, give them all the information they'd
                           need to satisfy your request.
        M - 12 : Read carefully the information that you have
                 received from the universities. Shorten your list
                 of places to the number that you will eventually
                 apply to (usually 5-8 is a good number).
                 Remember: Make sure you include both your best choice
                           schools and some places where you are almost
                           certain of getting accepted.
        M - 10 : Fill in all the forms that are sent to you,
                 ask your professors to send reference letters to
                 the schools directly.
                 Remember: Professors will probably be very busy
                           at that time of the year (any time of
                           the year...) Give them the reference forms
                           as early as possible and make sure you
                           specify a reasonable time for them to fill
                           them in and send them out.
        M - 10 : (or earlier) - take the necessary tests (GRE,
                 TOEFL, or others) that the schools want. Make sure
                 you tell the testing service which universities
                 you want them to send your scores to.
                 Remember: Time yourself through several practice
                           tests. The GRE General test, for example,
                           is more about mastery of timing than knowledge.
        M -  9 : (approximately) - mail your forms to the schools,
                 preferably 2-3 weeks before the deadlines.
                 Remember: You don't want your applications to get there
                           at the same time as everyone else's. Give the
                           admissions committee some extra time to
                           review your application.
        M -  6 : usually six months before the beginning of the semester
                 that you are applying for, you will get a letter
                 saying whether you have been accepted.
                 Remember: Usually, thick letters, e-mails, and telegrams
                           mean acceptance. Thin one-sheet letters will
                           most likely be disappointing for you.
        M -  5 : now, you have been accepted to a few schools. Go back
                 to the same resources that you used when you were
                 deciding where to apply (journals, catalogs,
                 directories, professors, etc.). Ask the schools that
                 accepted you to fly you in for a visit (many will do this).
                 Remember: Don't forget non-academic factors such as
                           location, financial aid, the atmosphere in
                           the department, etc.
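
  The month offsets above can be turned into calendar dates with a
  little arithmetic. Here is a minimal Python sketch, assuming that
  each "M - n" step simply means n whole months before the start of
  studies. The function names and the sample start date are made up
  for illustration and are not part of any official procedure.

```python
from datetime import date


def months_before(start, months):
    """Return the first day of the month that is `months` before `start`."""
    # Count months since year 0, subtract the offset, then split again.
    index = start.year * 12 + (start.month - 1) - months
    return date(index // 12, index % 12 + 1, 1)


# Offsets (in months before M) taken from the timetable above.
OFFSETS = [24, 18, 15, 12, 10, 9, 6, 5]


def timetable(start):
    """Map each 'M - n' step to the calendar month it falls in."""
    return [(n, months_before(start, n)) for n in OFFSETS]


# Assumed start of studies: September 1996 (an arbitrary example).
for n, when in timetable(date(1996, 9, 1)):
    print(f"M - {n:2d}: {when:%B %Y}")
```

  Working in whole months avoids having to worry about months of
  different lengths; the deadlines themselves still have to come from
  each school.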


  [3-3] Where to get information on graduate programs

  A: The Peterson's Guide
  A: The ACL Directory of Graduate Programs in Computational Linguistics

  [3-4] Major non-academic research laboratories

   AT&T Bell Labs, Murray Hill, NJ
   BBN Systems and Technologies Corporation
   Bellcore, Morristown, NJ
   DFKI (German research center for AI)
   General Electric
   IRST, Italy
   IBM T.J. Watson Research, Yorktown Heights, NY
   Microsoft Research, Redmond, WA
   NEC Corporation
   SRI International, Menlo Park, CA
   SRI International, Cambridge, UK
   Xerox, Palo Alto, CA
   Xerox, Grenoble, France

[4] Contacts
------------

 [4-1] What major publications exist in the field

    Computational Linguistics, published by ACL - Julia Hirschberg, Editor

      PUBLISHED QUARTERLY BY MIT PRESS JOURNALS:
      (617) 253-2889 (PHONE), (617) 258-6779 (FAX), or
      JOURNALS-ORDERS@MIT.EDU.  _Institutional_ Orders Only.

    Computer Speech and Language

    Journal of Natural Language Engineering

      _Natural Language Engineering_ is to be published four
      times a year in March, June, September and December.
      For more information or to submit a paper, please contact
      Roberto Garigliano, Laboratory for Natural Language
      Engineering, Computer Science Dept., University of
      Durham, South Road, Durham DH1 3LE, UK, Tel: +44 91
      372639, Fax: +44 91 374 2560,
      <Roberto.Garigliano@durham.ac.uk>.

    Language - Sarah Thomason, Editor

    Linguistic Inquiry - Samuel Jay Keyser, Editor

      PUBLISHED QUARTERLY BY MIT PRESS JOURNALS:
      (617) 253-2889 (PHONE), (617) 258-6779 (FAX),
      or JOURNALS-ORDERS@MIT.EDU.

    Machine Translation, published by Kluwer

    Natural Language and Linguistic Theory

    Speech Communication

      Elsevier Science B.V.
      Journals Department
      P.O. Box 211, 1000 AE
      Amsterdam

 [4-2] Electronic mailing lists

   Michael Everson <everson@irlearn.ucd.ie> has updated his
   List of Language Lists.  FTP LNGLST15.TXT from /everson
   on <colossus.ucd.ie>.

   Information Retrieval:
      irlist <ir-l%uccvma.bitnet@vm1.nodak.edu>

   Natural Language and Knowledge Representation (moderated):
      nl-kr@cs.rpi.edu (formerly nl-kr@cs.rochester.edu)
      Gatewayed to the newsgroup comp.ai.nlang-know-rep.

   Natural Language Generation:
      siggen@black.bgu.ac.il

   LFG (Lexical-Functional Grammar):
      majordomo@list.stanford.edu

   Parsing:
      sigparse@cs.cmu.edu

   Statistics, Natural Language, and Computing:
      empiricists@csli.stanford.edu

   Colibri (weekly update on Conferences, Seminars, Jobs and Shareware in
   NLP and speech)
      colibri-request@let.ruu.nl

   Dependency Grammar
      dg@ai.uga.edu

   Prosody:
      listserv@purccvm.bitnet

   TEI:
      tei-l

   Text Analysis and Natural Language Applications:
      SCHOLAR@CUNYVM.BITNET

   Text Corpora:
      corpora-request@nora.hd.uib.no

   Speech production and perception:
      foNETiks <fonetiks@mailbase.ac.uk>

   LN:
      ln@frmop11.bitnet

   Linguist:
      linguist@tamvm1.tamu.edu

   ELSNET:
      elsnet-list@cogsci.ed.ac.uk

   Eastern (European) Language Engineering list:
      to join, send mail to poul_andersen@eurokom.ie

   Preprint archive mailing list

     For further information about (among other topics) submission of papers to
     the server, subscribing or canceling your subscription, requesting full
     text of any of the papers above, retrieving macro files for these papers,
     searching past listings, or submitting comments to the server operators,
     send a message:
        To: CMP-LG@XXX.LANL.GOV
        Subject: help

 [4-3] Newsgroups

   alt.usage.english       English grammar, word usages, and related topics.
   comp.ai.nat-lang        Natural language processing by computers.
   comp.ai.nlang-know-rep  Natural Language and Knowledge Representation.
                           (Moderated)
   comp.speech             Research & applications in speech science &
                           technology.
   sci.lang                Natural languages, communication, etc.
   alt.etext               Electronic texts.
   comp.text.sgml          ISO 8879 SGML structured documents markup languages


 [4-4] Professional Organizations, Associations

    ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL)

      To get information about the ACL listserver, send mail to
         listserv@cs.columbia.edu
      with
         index acl-l
      in the message body. To get the membership form, include
         get acl-l 94membership.form
      in the message body. The ACL archive can also be accessed by
      anonymous ftp from cs.columbia.edu:/acl-l/.

      ASSOCIATION FOR MACHINE TRANSLATION IN THE AMERICAS (AMTA)
      655 Fifteenth Street, NW, Suite 310, Washington, DC 20005


    AMERICAN ASSOCIATION FOR ARTIFICIAL INTELLIGENCE (AAAI)

    COGNITIVE SCIENCE SOCIETY


 [4-5] Conferences

    COLING - last conference - Kyoto, Japan (August 94)

    ACL - next conference - Cambridge, Massachusetts (Summer 1995)

    EACL - next conference - Dublin, Ireland (Spring 1995)

    IJCAI - next conference - Montreal, Canada (Summer 1995)

    PacLing - next conference - Brisbane, Australia (Spring 1995)

 [4-6] Evaluation Competitions

    MUC - ARPA Message Understanding Conference
    Currently running MUC-6 (1994-95) using text articles from the Wall Street
    Journal Corpus. Systems compete in any or all of five categories: named
    entity categorisation, word sense disambiguation, mini-MUC (contents
    scanning, template filling), coreference identification, and
    predicate-argument identification.

    TREC - ARPA Text Retrieval Conference
    Information retrieval using NLP/statistical techniques.


[5] How-to questions
--------------------

 [5-1] How to join a mailing list

   A: Most often, you have to send mail to the listserver at the site where
      the mailing list resides, and put "subscribe <listname> <yourname>" in
      the body of the mail message. The underlined text is what you have to
      type in.

      Example:

      Mail listserv@tamvm1.tamu.edu
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

      Subject: some text here
               ^^^^^^^^^^^^^^
      subscribe LINGUIST Dragomir R. Radev
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      .
      ^
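
      The same request can also be composed by a program. Here is a
      minimal Python sketch, assuming the list server accepts the
      "subscribe <listname> <yourname>" command described above. The
      function name and the sender address are made up for illustration,
      and the sketch only builds the message; it does not send it.

```python
from email.message import EmailMessage


def build_subscribe_message(listserver, listname, your_name, sender):
    """Build (but do not send) a subscription request for a list server.

    Assumes the server understands "subscribe <listname> <yourname>" in
    the message body, as in the example above.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = listserver
    # The example above uses arbitrary subject text, so any text will do.
    msg["Subject"] = "subscription request"
    msg.set_content(f"subscribe {listname} {your_name}\n")
    return msg


# Same request as in the example above; the sender address is a placeholder.
message = build_subscribe_message(
    "listserv@tamvm1.tamu.edu",
    "LINGUIST",
    "Dragomir R. Radev",
    "you@example.edu",
)
print(message)
```

      Individual list servers may expect a different command, so check
      the instructions for the list you want to join.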

 [5-2] How to obtain files by anonymous ftp

   A: There are many ways. The most common way, however, is using a local ftp
      client.
      Suppose you want to get the file /pub/editors/webster.tar.Z
      from ftp.uu.net

      Here is a sample session. You type in whatever is underlined here.

      $ftp ftp.uu.net
       ^^^^^^^^^^^^^^
      Connected to ftp.uu.net.
      220 ftp.UU.NET FTP server Thu Apr 14 15:45:10 EDT 1994) ready.
      Name (ftp.uu.net:radev): anonymous
                               ^^^^^^^^^

      331 Password required for  anonymous.
      Password: radev@cs.columbia.edu
                ^^^^^^^^^^^^^^^^^^^^^  (put your email address here)

      230 Guest login ok, access restrictions apply.
      ftp> cd pub/editors
           ^^^^^^^^^^^^^^
      ftp> binary
           ^^^^^^
      ftp> get webster.tar.Z
           ^^^^^^^^^^^^^^^^^
      200 PORT command successful.
      150 Opening BINARY mode data connection for webster.tar.Z (148579 bytes).
      226 Transfer complete.
      local: webster.tar.Z remote: webster.tar.Z
      148579 bytes received in 2.2 seconds (67 Kbytes/s)
      ftp> quit
           ^^^^
      $
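
      The session above can also be scripted. Here is a minimal sketch
      using Python's standard ftplib module, assuming the server allows
      anonymous logins with an e-mail address as the password. The
      function name and the ftp_class parameter are made up for
      illustration; ftp_class exists only so that the function can be
      tried without a network connection.

```python
import ftplib


def fetch_anonymous(host, directory, filename, email, ftp_class=ftplib.FTP):
    """Download one file by anonymous ftp, mirroring the session above."""
    ftp = ftp_class(host)                 # $ ftp ftp.uu.net
    try:
        ftp.login("anonymous", email)     # Name: anonymous / Password: e-mail
        ftp.cwd(directory)                # ftp> cd pub/editors
        with open(filename, "wb") as out:
            # retrbinary transfers in binary mode, so no separate
            # "binary" command is needed.
            ftp.retrbinary("RETR " + filename, out.write)
        ftp.quit()                        # ftp> quit
    finally:
        ftp.close()                       # harmless if quit() already ran
    return filename


# Example call (needs network access, so it is left commented out):
# fetch_anonymous("ftp.uu.net", "pub/editors", "webster.tar.Z",
#                 "radev@cs.columbia.edu")
```

      The file is written to the current local directory under the same
      name, just as "get" does in the interactive session.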

 [5-4] FTP repositories

   A: Here follows a list of the most popular FTP sites that carry NLP-related
      materials (data, tools, etc.)

   * Consortium for Lexical Research (CLR)

     The Consortium for Lexical Research is designed to serve as a
     repository for software and resources of importance to the natural
     language processing research community. Sharable resources, and the
     task of centralizing lexical data and tools, are of foremost
     concern in lexical research and computational linguistics. It
     is our objective to help alleviate the repeated recreation of
     basic software tools, and to assist in making essential data
     sources more generally available.

     CLR maintains a public ftp site, and a separate library of
     materials only for members of CLR. Currently CLR has about 60
     members, mostly academic institutions, and almost every major
     natural language processing center in the U.S. belongs. Access to
     the members-only materials is strictly regulated by password and
     userid.

     Our catalog of current holdings is available by using anonymous
     ftp to clr.nmsu.edu (128.123.1.12). The file to 'get' is
     "catalog.ps" for a postscript version, or "catalog" for a simple
     ascii version.

   * Linguistic Data Consortium (LDC)

     To order LDC materials, send mail to ldc@unagi.cis.upenn.edu
     or fax your order to (215) 573-2175. If you require additional
     information before placing your order, please call (215) 898-0464.

   * Oxford Text Archive (OTA)

     ftp ota.ox.ac.uk
     ota/textarchive.list         the current catalogue

     There are two classes of texts available from this FTP server:

     (a) texts which are in TEI format and which we can make freely
         available (these all appear as category P texts in the shortlist)

     (b) texts which are available only under our standard conditions of
         use (these all appear as category U or A in the shortlist)

   * University of Michigan Linguistics Archive (UMICH)

     ftp linguistics.archive.umich.edu
     /linguistics
     moderator: John Lawler (jlawler@umich.edu)

   * others...

 [5-6] Tools available on the Internet

 [5-6-1] Parsers

 [5-6-1-2] TAG - grammar of English

    site:      linc.cis.upenn.edu
    directory: /pub/xtag

 [5-6-2] Generators

 [5-6-3] Machine Translation

 [5-6-4] Speech

 [5-6-4-1] OGI Speech Tools

    site:      speech.cse.ogi.edu
    directory: /pub/tools

 [5-6-5] Lexical Tools

 [5-6-5-1] Wordnet

    site:      clarity.princeton.edu
    directory: /pub

 [5-7] Papers and Technical Reports

 [5-7-1] Dissertations

 [5-7-1-2] Finch, S.P. Finding Structure in Language

    site:      scott.cogsci.ed.ac.uk
    directory: /pub/statling/Papers/phdThesis.ps.Z

 [5-7-1-3]

 [5-7-2] Technical reports

 [5-7-3] Other on-line papers

    http://www.cc.gatech.edu/cogsci/cogsci.html  Georgia Tech CogSci home page
    http://www.cc.gatech.edu/cogsci/nlr.html     Georgia Tech Natural Language
                                                 and Reasoning research group

 [5-8] WWW and gopher Resources (in no specific order)

 [5-8-1] Association for Computational Linguistics Home Page

       http://www.cs.columbia.edu/~acl/

 [5-8-2] Dutch working group on Computational Linguistics

       http://tyr.let.rug.nl/~vannoord/clin/clin.html

 [5-8-3] Colibri, Newsletter on Computational Linguistics

       http://colibri.let.ruu.nl

 [5-8-4] Georgetown University Catalogue of Projects in Electronic Text

       gopher://gopher.georgetown.edu/11gopher_root%3a%5bcpet_projects_in_electronic_text%5d

 [5-8-5] The English Server at Carnegie-Mellon University

       http://english-server.hss.cmu.edu/FrontDoor.html/

 [5-8-6] Project for American and French Research of the Treasury of the
         French Language, University of Chicago (ARTFL)

       http://tuna.uchicago.edu/ARTFL

 [5-8-7] Computational Phonology, University of Edinburgh

       http://ftp.cogsci.ed.ac.uk/phonology/CompPhon.html

 [5-8-8] New Mexico State University Computing Research Laboratory (CRL)

       http://crl.nmsu.edu/Home.html

 [5-8-9] The Carnegie-Mellon University AI repository

       http://www.cs.cmu.edu:8001/Web/Groups/AI/html/repository.html

 [5-8-10] The Computation and Language E-Print Archive

       http://xxx.lanl.gov/cmp-lg/

 [5-8-11] The Journal of Artificial Intelligence Research

       http://www.cs.washington.edu/research/jair/home.html

 [5-8-12] ACL-94 On-line conference proceedings.

       http://xxx.lanl.gov/cmp-lg/ACL-94-proceedings.html

 [5-8-13] The Natural Language Software Registry

       http://cl-www.dfki.uni-sb.de/cl/registry/draft.html

 [5-8-14] Language and Linguistics Directory, Rice University

       gopher://chico.rice.edu/11/Subject/Language

 [5-8-15] A text-to-speech demo, University of Twente

       http://www_tios.cs.utwente.nl:8001/say/

 [5-8-16] SRI International Natural Language WWW Page

       http://www.ai.sri.com/aic/natural-language/natural-language.html

 [5-8-17] Survey of Language Engineering Organisations in Central and
          Eastern Europe

       http://www.cogsci.ed.ac.uk/elsnet/survey/survey.html

 [5-9] Other comprehensive documents

 [5-9-1] Software Registries

 [5-9-1-1] Natural Language Software Registry (from DFKI)

    site:      ftp.dfki.uni-sb.de
    directory: pub/registry

 [5-9-1-2] Natural Language Software Registry (from CRL)

    site:      crlftp.nmsu.edu
    directory: pub/non-lexical/NL_Software_Registry

 [5-9-1-3] Another list of NLP software

    site:      ftphost.uni-koblenz.de
    directory: outgoing
    files:     software_list.ps.Z

[6] Literature
--------------

 [6-1] What are some important books in NLP

General:

   Gazdar, G. and Mellish, C., "Natural Language Processing in Lisp:
   An Introduction to Computational Linguistics", Addison-Wesley,
   Reading, Massachusetts, 1989. (There are three different editions
   of the book, one for Lisp, one for Prolog, and one for Pop-11.)

   Michael A. Covington, "Natural Language Processing for Prolog
   Programmers", Prentice-Hall, Englewood Cliffs, NJ, 1994. ISBN
   0-13-629213-5.

   Grosz, Barbara J., Sparck-Jones, Karen, and Webber, Bonnie L.,
   "Readings in Natural Language Processing", Morgan Kaufmann
   Publishers, Los Altos, CA, 1986, 664 pages. ISBN 0-934613-11-7, $44.95.

   Robert C. Berwick, "Computational Linguistics", MIT Press,
   Cambridge, MA, 1989, ISBN 0-262-02266-4.

   Brady, Michael, and Berwick, Robert C., "Computational Models
   of Discourse", MIT Press, Cambridge, MA, 1983.

   Allen, James F., "Natural Language Understanding", The
   Benjamin/Cummings Publishing Company, Menlo Park, California,
   (Addison-Wesley Publishing Company, Reading, Massachusetts),
   1988, 550 pages, ISBN 0-8053-0330-8. [A new edition came out in 1994]
   Code for the book is available from
      ftp.cs.cmu.edu:/user/ai/areas/nlp/bookcode/allen/

   Terry Winograd, "Language as a Cognitive Process", Addison-Wesley,
   Reading, MA, 1983.

   Schank, R. and Abelson, R.  "Scripts, Plans, Goals, and Understanding,"
   Lawrence Erlbaum Associates, Hillsdale, New Jersey, 1977.

Terminology:

   David Crystal, "A Dictionary of Linguistics and Phonetics", 3rd Edition,
   Basil Blackwell Publishers, New York, 1991.

Parsing:

   Tomita, M. (Editor), "Current Issues in Parsing Technology",
   Kluwer Academic Publishers, Norwell, MA, 1991.

   Marcus, M.  "A Theory of Syntactic Recognition for Natural Language,"
   The MIT Press, Cambridge, MA, 1980.

   Pereira, F. and Shieber, S.  "Prolog and Natural-Language Analysis,"
   Center for the Study of Language and Information, 1987.

Probabilistic Parsing:

   Ted Briscoe and John Carroll, "Generalised Probabilistic LR Parsing of
   Natural Language (Corpora) with Unification-based Grammars",
   University of Cambridge Computer Laboratory, Technical Report Number
   224, 1991.

   Zhi Biao Wu, Loke Soo Hsu, and Chew Lim Tan, "A Survey of Statistical
   Approaches to Natural Language Processing", Technical report TRA4/92,
   Department of Information Systems and Computer Science, National
   University of Singapore, 1992

Natural Language Understanding:

   Dyer, M.  "In-Depth Understanding:  A Computer Model of Integrated
   Processing for Narrative Comprehension,"  MIT Press, Cambridge, MA, 1983.

   Aravind Joshi, Bonnie Webber and Ivan Sag, "Elements of Discourse
   Understanding", Cambridge University Press, New York, 1981.

   Cohen, P. R., Morgan, J. and Pollack, M., editors, "Intentions in
   Communication", MIT Press, Cambridge, MA, 1990.

Natural Language Interfaces:

   Raymond C. Perrault and Barbara J. Grosz, "Natural Language
   Interfaces", Annual Review of Computer Science, volume 1, J.F. Traub,
   editor, pages 435-452, Annual Reviews Inc., Palo Alto, CA, 1986.

Natural Language Generation:

   McKeown, Kathleen R. and Swartout, William R., "Language
   Generation and Explanation", in Zock, M. and Sabah, G.,
   editors, Advances in Natural Language Generation, Volume 1, Pages
   1-51, Ablex Publishing Company, Norwood, NJ, 1988. (Overview of
   the state of the art in natural language generation.)

   There are several books published as a result of the international
   workshops on natural language generation.

Speech:

   John Allen, Sharon Hunnicutt and Dennis H. Klatt, "From Text to Speech:
   The MITalk System", Cambridge University Press, 1987. [Synthesis,
   precursor of DECtalk.]

   Frank Fallside and William A. Woods (editors), "Computer Speech Processing"
   Prentice Hall, Englewood Cliffs, NJ, 1985.

   X. D. Huang, Y. Ariki and M. A. Jack, "Hidden Markov Models for Speech
   Recognition", Edinburgh University Press, 1990. [Analysis]

   A. Nejat Ince (editor), "Digital Speech Processing: Speech Coding,
   Synthesis, and Recognition", Kluwer Academic Publishers, Boston,
   1992. [Analysis and Synthesis]

   Kai-Fu Lee, "Automatic Speech Recognition: The Development of the
   SPHINX System", Kluwer Academic Publishers, Boston, MA, 1989. [Analysis]

   Douglas O'Shaughnessy, "Speech Communication: Human and Machine"
   Addison-Wesley, MA, 1987. [Analysis and Synthesis]

   Lawrence R. Rabiner and Ronald W. Schafer, "Digital Processing of
   Speech Signals", Prentice Hall, Englewood Cliffs, NJ, 1978.
   [Analysis and Synthesis]

   Lawrence R. Rabiner and Biing-Hwang Juang, "Fundamentals of Speech
   Recognition", Prentice Hall, Englewood Cliffs, NJ, 1993.
   ISBN 0-13-015157-2. [Analysis]

   Ronald W. Schafer and John D. Markel (editors), "Speech Analysis",
   IEEE Press, New York, 1979. [Analysis]

   Alex Waibel and Kai-Fu Lee (editors), "Readings in Speech Recognition"
   Morgan Kaufmann Publishers, San Mateo, CA, 1990, 680 pages.
   ISBN 1-55860-124-4, $49.95. [Analysis]

   Alex Waibel, "Prosody and Speech Recognition", Morgan Kaufmann
   Publishers, San Mateo, CA, 1988. [Analysis]

Machine Translation:

   W. John Hutchins and Harold L. Somers, "An Introduction to Machine
   Translation", Academic Press, San Diego, 1992. 362 pages, ISBN
   0-12-362830-X.

   Bonnie J. Dorr, "Machine Translation: A View from the Lexicon", MIT
   Press, Cambridge, MA, 1993. 432 pages, ISBN 0-262-04138-3.

   Kenneth Goodman and Sergei Nirenburg, editors, "The KBMT Project: A
   Case Study in Knowledge-Based Machine Translation", Morgan Kaufmann
   Publishers, San Mateo, CA, 1991. 331 pages, ISBN 1-55860-129-5, $34.95.

   Arnold, D.J.; Balkan, L.; Lee Humphreys, R.; Meijer, S.; and Sadler, L.
   (1994). Machine Translation: An Introductory Guide. NCC Blackwell.

   The journal "Machine Translation" is the principal forum for
   current research.

   A review of MT systems on the market appeared in BYTE 18(1), January 1993.

Reversible Grammars:

   Tomek Strzalkowski, editor, "Reversible Grammar in Natural Language
   Processing", Kluwer Academic Publishers, 1993.

   Proceedings of the ACL Workshop on Reversible Grammar in Natural
   Language Processing, UC Berkeley, 1991. (See especially Remi
   Zajac's paper.)

Statistical Processing:

   Eugene Charniak, "Statistical Language Learning", MIT Press, Cambridge,
   Massachusetts, 1993, 170 pages.

Linguistics:

   Vivian J. Cook, "Chomsky's Universal Grammar: An Introduction", Basil
   Blackwell Publisher, New York, 1988, 201 pages.

   Victoria Fromkin and Robert Rodman, "An Introduction to Language",
   Holt, Rinehart, and Winston, New York, 4th edition, 1988, 474 pages.

   Ralph Grishman, "Computational Linguistics: An Introduction",
   Cambridge University Press, New York, 1986, 193 pages.

   Liliane M.V. Haegeman, "Introduction to Government and Binding
   Theory", Basil Blackwell Publishers, Oxford, 1991, 618 pages.

   Michael A. K. Halliday, "An Introduction to Functional Grammar",
   Edward Arnold, London, 1985.

   Geoffrey C. Horrocks, "Generative Grammar", Longman, London, 1987,
   339 pages.

   Andrew Radford, "Transformational Grammar: A First Course", Cambridge
   University Press, New York, 1988, 625 pages.

Categorial Grammar (CG):

   M. Moortgat, "Categorial Investigations. Logical and Linguistic
   Aspects of the Lambek Calculus", Groningen-Amsterdam Studies in
   Semantics:9, Foris, Dordrecht, Holland, 1988.

   Richard T. Oehrle, Emmon Bach and Deirdre Wheeler, "Categorial
   Grammars and Natural Language Structures", Studies in Linguistics
   and Philosophy:32, D. Reidel Publishing Company, Dordrecht, 1988.

   Mary McGee Wood, "Categorial Grammars", Linguistic Theory Guides,
   Routledge, London, 1993.

Cognitive Grammar:

   Ronald W. Langacker, "Foundations of Cognitive Grammar", Stanford
   University Press, 1987.

Miscellaneous:

   _The Multilingual PC Directory_. By Ian Tresman. 254pp.
   Stamford CT: Knowledge Computing Ltd.

   Stefan Wermter, "Hybrid Connectionist Natural Language Processing",
   Chapman & Hall Inc, 1995.

   Connectionist approaches to natural language processing.
   Edited by Ronan G. Reilly and Noel E. Sharkey.
   Lawrence Erlbaum Associates, 1992. ISBN 0-86377-179-3

   _Natural Language Processing_.  Ed. Fernando C.N. Pereira and
   Barbara J. Grosz. A Bradford Book. Cambridge, MA, and London:
   The MIT Press, 1994. Reprinted from _Artificial Intelligence: An
   International Journal_, Volume 63, Numbers 1-2 (1993).

   _Research in Humanities Computing 1: Selected Papers
   from the ALLC/ACH Conference, Toronto, June 1989_.
   Ed. Ian Lancashire. Oxford: Clarendon Press, 1991.

   Peter D. Smith, _An Introduction to Text Processing_.
   Cambridge MA and London: The MIT Press, 1990.
   ISBN 0-262-19299-3.

   Gilbert K. Krulee, "Computer Processing of Natural Language",
   Prentice Hall. ISBN 0-13-610299-3.

 [6-2] Encyclopedia of Artificial Intelligence

                   A GUIDE TO COMPUTATIONAL LINGUISTICS ARTICLES IN
               THE ENCYCLOPEDIA OF ARTIFICIAL INTELLIGENCE, 2nd Edition

                  Stuart C. Shapiro (editor) (John Wiley & Sons, 1992)

                                    compiled by:

                                William J. Rapaport

                           Department of Computer Science
                          and Center for Cognitive Science
                       State University of New York at Buffalo
                                 Buffalo, NY 14260
                              rapaport@cs.buffalo.edu

AUTHOR                          TITLE                                     PAGES

                               Volume 1:

Bookman, L. A.,
  & Alterman, R.       Analog Semantic Features                           27-28
Alvarado, S. J.        Argument Comprehension                             30-52
Kucera, H.             Brown Corpus                                     128-130
Srihari, S. N.,
  & Hull, J. J.        Character Recognition                            138-150
Ballard, B.,
  & Jones, M.          Computational Linguistics                        203-224
Hardt, S. L.           Conceptual Dependency                            259-265
Hindle, D.             Deep Structure                                   328-330
Ingria, R.;
  Boguraev, B.;
  & Pustejovsky, J.    Dictionary/Lexicon                               341-365
Scha, R.;
  Bruce, B. C.;
  & Polanyi, L.        Discourse Understanding                          365-379
Tennant, H.            Ellipsis                                         445-446
Novak, V.              Fuzzy Logic: Applications to Natural Language    515-521
Woods, W. A.           Grammar, Augmented Transition Network            552-563
Bruce, B.,
  & Moser, M. G.       Grammar, Case                                    563-570
Gazdar, G.             Grammar, Generalized Phrase Structure            570-573
Joshi, A. K.           Grammar, Phrase Structure                        573-580
Burton, R.             Grammar, Semantic                                580-583
Bateman, J. A.         Grammar, Systemic                                583-592
Mallery, J. C.;
  Hurwitz, R.;
  & Duffy, G.          Hermeneutics                                     596-611
Hill, J. C.            Language Acquisition                             761-772
Fass, D.,
  & Pustejovsky, J.    Lexical Decomposition                            806-812
Pustejovsky, J.        Lexical Semantics                                812-819

                               Volume 2:

Nagao, M.              Machine Translation                              898-902
Klavans, J. L.,
  & Tzoukermann, E.    Morphology                                       963-972
McDonald, D. D.        Natural-Language Generation                      983-997
Carbonell, J. G.,
  & Hayes, P. J.       Natural-Language Understanding                  997-1016
Petrick, S.            Parsing                                        1099-1109
Small, S. L.           Parsing, Word-Expert                           1109-1116
Wilks, Y.,
  & Fass, D.           Preference Semantics                           1183-1194
Cruse, D. A.           Presupposition                                 1194-1201
Dyer, M. G.;
  Cullingford, R. E.;
  & Alvarado, S. J.    Scripts                                        1443-1460
Sowa, J. F.            Semantic Networks                              1493-1511
Devlin, K. J.          Situation Theory and Situation Semantics       1541-1547
Briscoe, E. J.         Speech Recognition                             1553-1559
Norvig, P.             Story Analysis                                 1568-1576
Alterman, R.           Text Summarization                             1579-1587
Sparck Jones, K.       Thesaurus                                      1605-1613
Knight, K.             Unification                                    1630-1636

                  Additional articles from the 1st edition (1987):

Coelho, H.             Grammar, Definite Clause                         339-342
Berwick, R.            Grammar, Transformational                        353-361
Newmeyer, F. J.        Linguistics, Competence and Performance          503-508
Wilks, Y.              Machine Translation                              564-571
Tennant, H.            Menu-Based Natural Language                      594-597
Koskenniemi, K.        Morphology                                       619-620
Bates, M.              Natural-Language Interfaces                      655-660
Riesbeck, C. K.        Parsing, Expectation-Driven                      696-701
Keyser, S. J.          Phonemes                                         744-746
Webber, B.             Question Answering                               814-822
Smith, B. C.           Self-Reference                                 1005-1010
Hirst, G.              Semantics                                      1024-1029
Woods, W.              Semantics, Procedural                          1029-1031
Allen, J. F.           Speech Acts                                    1062-1065
Allen, J.              Speech Recognition                             1065-1070
Allen, J.              Speech Synthesis                               1070-1076
Briscoe, E. J.         Speech Understanding                           1076-1083
Lehnert, W. G.         Story Analysis                                 1090-1099

[7] Commercial Sites
--------------------

  [7-1] Machine Translation

      Globalink, Inc
      9302 Lee Highway
      Fairfax, VA, 22031, USA
      Tel: +1 703 273 5600
      Fax: +1 703 273 3866

      Archers Translation Services
      203-205 Desborough Road
      High Wycombe, Bucks., HP11 2QL, UK
      Tel: +44 494 537755
      Fax: +44 494 474001


[8] Corpora
-----------

 Corpora are large collections of natural language material (text or
 speech). They can usually be obtained by ftp and sometimes on CD-ROM;
 see the entries below for details.

 0001 Word list

    site:      gatekeeper.dec.com
    directory: pub/misc/stolfi-wordlists
    moderator: Jorge Stolfi

 0002 Word list

    site:      ftp.cs.vu.nl
    directory: /dictionaries

 0003 CELEX

     To order LDC materials, send mail to ldc@unagi.cis.upenn.edu
     or fax your order to (215) 573-2175. If you require additional
     information before placing your order, please call (215) 898-0464.

 0004 Word list

     site:      wocket.vantage.gte.com
     directory: /pub/standard_dictionary

 0005 Russian corpus

     mail:     ingrid.maier@slaviska.uu.se

 0006 Brown corpus

     no info for now...
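
     The site/directory pairs in entries 0001, 0002 and 0004 above can
     be fetched by anonymous ftp. What follows is only a minimal sketch
     in Python, not a tested recipe for these particular servers. The
     names WORD_LISTS, ftp_url and list_directory are made up for
     illustration, and the hosts are the ones listed above, which may
     no longer answer.

```python
from ftplib import FTP

# (site, directory) pairs copied from entries 0001, 0002 and 0004.
# Assumption: these hosts are historical and may no longer be reachable.
WORD_LISTS = {
    "0001": ("gatekeeper.dec.com", "pub/misc/stolfi-wordlists"),
    "0002": ("ftp.cs.vu.nl", "/dictionaries"),
    "0004": ("wocket.vantage.gte.com", "/pub/standard_dictionary"),
}


def ftp_url(site, directory):
    """Build an ftp:// URL from a site and directory as listed in the FAQ."""
    return "ftp://{}/{}".format(site, directory.strip("/"))


def list_directory(site, directory, timeout=30):
    """Log in anonymously and return the file names found in `directory`.

    Needs network access and a server that still accepts anonymous ftp.
    """
    with FTP(site, timeout=timeout) as ftp:
        ftp.login()          # no arguments means an anonymous login
        ftp.cwd(directory)
        return ftp.nlst()


if __name__ == "__main__":
    # Only prints the URLs; no connection is made here.
    for entry, (site, directory) in sorted(WORD_LISTS.items()):
        print(entry, ftp_url(site, directory))
```

     Calling list_directory("ftp.cs.vu.nl", "/dictionaries") would
     return the file names in that directory, provided the server is
     still reachable.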

[9] Miscellaneous
-----------------

   [9-1] About this FAQ

    This FAQ is maintained by Dragomir R. Radev from Columbia University.
    Please send all comments, suggestions, corrections, and additions
    to my e-mail address:

    radev@cs.columbia.edu

   [9-2] Large parts of sections 4-1, 4-4, and 6-1 come from Mark Kantrowitz's
    comp.ai FAQ

   [9-3] Partial list of contributors (in alphabetical order):

    Paul Buitelaar        paulb@zag.cs.brandeis.edu
    Russell Collingham    R.J.Collingham@durham.ac.uk
    Robert Dale           rdale@microsoft.com
    Joshua Goodman        goodman@das.harvard.edu
    Malcolm Grandis       Malcolm@celtic.demon.co.uk
    Graeme Hirst          gh@cs.toronto.ca
    Mark Kantrowitz       mkant+ai-faq@cs.cmu.edu
    Alberto Lavelli       lavelli@irst.it
    Ashwin Ram            ashwin@cc.gatech.edu
    William J. Rapaport   rapaport@cs.buffalo.edu
    Hinrich Schuetze      schuetze@Sante.Stanford.EDU
    Kevin Thomas          kevint@cdplus.com
    Gertjan van Noord     vannoord@let.rug.nl

--
Dragomir R. Radev                                Graduate Research Assistant
Natural Language Processing Group          Columbia University CS Department
Office: (212) 939-7121     Lab: (212) 939-7108          Home: (212) 749-9770
                     http://www.cs.columbia.edu/~radev