Last-Modified: Feb 12, 1995 18:00 EST
Posting-Frequency: Monthly
Version: 0.03
Archive-Name: natural-lang-processing-faq
This is the new draft of a FAQ (frequently asked questions and answers)
list for the comp.ai.nat-lang newsgroup. The main reason for posting it now
is for me to get as much feedback as possible before I go any further.
Please don't hesitate to send me any comments, be they positive or negative.
There are many blank spots in the FAQ; please help fill them.
Copyright (c) 1994 Dragomir R. Radev. All rights reserved.
Permission to distribute this FAQ by all volatile electronic means
(mailing lists, FTP, HTTP, Usenet news, etc.) is hereby given under
the restriction that the file is not modified and all disclaimers and
acknowledgements remain intact.
This permission does NOT apply to CD-ROMs and/or commercial printed
publications. All requests for republication in these cases should
be referred to the FAQ maintainer (radev@cs.columbia.edu).
TABLE OF CONTENTS
=================
[1] General Information
[2] History
[3] Studies and Research
[4] Contacts
[5] How-to questions
[6] Literature
[7] Commercial Sites
[8] Corpora
[9] Miscellaneous
Disclaimers and Notes
---------------------
1. Please read this FAQ list before posting to comp.ai.nat-lang.
2. The FAQ is a collection of materials, rather than a complete reference.
Some of the information may be out of date, so please be careful and
take everything with a grain of salt. Unless an entry states explicitly
when it was last updated, assume that it predates
February 1, 1994.
3. I don't assume any responsibility for wrong information.
4. I need suggestions as to what parts of the FAQ to keep, what parts to
move to the ftp site and what new parts to include.
5. The maintainer of this list is Dragomir R. Radev (radev@cs.columbia.edu).
6. Any comments and corrections are more than welcome. So are
contributions. Please help make the FAQ really helpful and interesting.
[1] General Information
-----------------------
[1-0] What is this FAQ all about
A: This is an attempt to put together a list of frequently (and not so
frequently) asked questions about Natural Language Processing and their
answers. This document is in no way perfect, complete, or 100% accurate.
The maintainer is not responsible for any damage resulting
directly or indirectly from using information in this FAQ.
The following questions and answers have been written by Mark Kantrowitz
(mkant+ai-faq@cs.cmu.edu): 4-2, 4-4, 6-1
[1-1] What is NLP
A: Natural Language Processing - definitions coming up :)
[1-2] What is comp.ai.nat-lang
A: Here follows the original charter for comp.ai.nat-lang.
Name: comp.ai.nat-lang
Moderation: This group will be unmoderated.
Purpose: To discuss issues relating to natural language, especially
computer-related issues from an AI viewpoint. The topics
that will be discussed in this group will concentrate on, but
are not limited to, the following:
* Natural Language Understanding
* Natural Language Generation
* Machine Translation
* Dialogue and Discourse Systems
* Natural Language Interfaces
* Parsing
* Computational Linguistics
* Computer-Aided Language Learning
This group will avoid discussing issues that are more properly
covered by other newsgroups. For example, speech synthesis
should be discussed in comp.speech. However, due to the
interdisciplinary nature of the field, there may be overlap in
material between other groups. To try to keep this to a
minimum, topics should pertain to computer-related aspects
of natural language.
Rules of Decorum: Because of the unmoderated format, anyone with access to
this newsgroup will be able to post without review.
This is meant to encourage discussion of the topics.
Please refrain from "flames" or unnecessary criticism
of a person's viewpoints or personality in a harsh
or insulting manner. Criticisms should be constructive
and polite whenever possible.
Intended Audience: The following is a repost of Terry Gaasterland's opinion
on how this newsgroup would fit into the other
newsgroups and mailing lists.
[1-3] How to get this FAQ
A: This FAQ is currently available from the comp.ai.nat-lang newsgroup.
The current copy can also be retrieved over HTTP from:
http://www.cs.columbia.edu/~acl/nlpfaq.txt
It will soon also be available from comp.answers and news.answers,
as well as by anonymous ftp.
[2] History
-----------
[2-1] What are the major accomplishments of the field
Note: This section is in a very preliminary stage.
[2-1-2] Some important theses & systems:
Overall:
Woods (1967), Procedural semantics
Thorne et al. and Woods (1968-70), ATNs
Winograd (1970), SHRDLU
Woods et al. (1972), LSNLIS / Lunar
Charniak (1972), Frames and demons
Grosz (1977), Focus in task-oriented dialogues
Marcus (1977), Deterministic parsing
Davey (1978)
Cohen, Phil (1979), Planning speech acts
Allen (1980), Understanding speech acts
McDonald (1980)
McKeown (1982), TEXT
Appelt (1982)
Pollack (1986), Plan inference
Conceptual Dependency:
Schank (1969), Conceptual Dependency
Schank, Riesbeck, Rieger, Goldman (1975), MARGIE
Cullingford (1979), SAM
Wilensky (1979), PAM
DeJong (1980), FRUMP
Lebowitz (1980), IPP
Dyer (1982), BORIS
Lytinen (1986), MOPTRANS
Hovy (1986), PAULINE
Ram (1989), AQUA
Dehn (1989), AUTHOR/STARSHIP
[3] Studies and Research
------------------------
[3-1] Which schools offer graduate programs in CL/NLP
A: This list is, *of course*, completely preliminary. Please send me
information about other programs. I will try to get in touch with the
editors of the ACL guide to Graduate Programs in CL for more information.
Universities are given in alphabetical order. If a certain university
is not included now and you feel it must be included, please send me
some information about it.
Australia:
Melbourne, University of
Microsoft Institute of Advanced Software Technology in association with
Macquarie University
Canada:
Montreal, University of
Toronto, University of
Waterloo, University of
Finland:
Helsinki, University of
France:
Paris 7, Jussieu, University of
Germany:
Bonn, University of
Karlsruhe, University of
Koblenz-Landau, University of
Saarland, University of the
Stuttgart, University of
Tuebingen, University of
Italy:
Pisa, University of
Trento, University of
Japan:
Kyoto University
Korea:
Pohang University of Science and Technology, Pohang
Netherlands:
Amsterdam, University of
Groningen, University of
Nijmegen, University of
Tilburg, University of
Utrecht, University of
Sweden:
Stockholm, University of
UK:
Brighton, University of
Cambridge, University of
Durham, University of
Edinburgh, University of
Sussex, University of
USA:
Brown University
Buffalo, SUNY at
California at Berkeley, University of
California at Los Angeles, University of
Carnegie-Mellon University
Columbia University
Delaware, University of
Duke University
Georgetown University
Georgia, University of
Georgia Institute of Technology
Harvard University
Indiana University
Johns Hopkins University
Massachusetts at Amherst, University of
Massachusetts Institute of Technology
New Mexico State University
New York University
Pennsylvania, University of
Rochester, University of
Southern California, University of
Stanford University
Yale University
[3-2] How to apply to graduate school in CL/NLP
[3-2-1] How to apply to graduate school in CL/NLP in the USA
A good timetable is usually as follows, where M is the month in which
your studies would start (usually September):
M - 24 : Clarify your interests. Is it really NLP
that interests you? Which of its
subfields might interest you most?
Remember: 5 years working in an area you are
not interested in will be a very painful
experience.
M - 18 : Read publications in the area of your interest
to discover the places that suit you best
in terms of research and
professors.
Remember: Unless you are familiar with the most
current research, you will not be able
to find the best place for you.
M - 18 : Go to your local library and consult some of the
available directories (see [3-3]) - write down
as much information as you can about some
15-25 universities. These universities form your
preliminary list.
Remember: There are some 100 universities in the
USA offering NLP/CL programs. Some of them
will be more attractive to you than others.
M - 18 : Talk to your advisers at school, talk to other
students, post questions on the Internet.
This way you may hear about a few more universities
that you might otherwise have overlooked.
Remember: Others have faced what you are going
through. Use their experience.
M - 15 : Send letters to the universities that you have
on your preliminary list. Make sure you indicate
when you want to start, what degree (MA, MS,
Ph.D.) you are interested in, whether or not
you will be applying for financial aid, whether
you will need some special visa...
Remember: Ask for all the information that you
need, give them all the information they'd
need to satisfy your request.
M - 12 : Carefully read the information that you have
received from the universities. Shorten your list
of places to the number that you will eventually
apply to (usually 5-8 is a good number).
Remember: Make sure you include both your best choice
schools and some places where you are almost
certain of getting accepted.
M - 10 : Fill in all the forms that are sent to you,
ask your professors to send reference letters to
the schools directly.
Remember: Professors will probably be very busy
at that time of the year (any time of
the year...) Give them the reference forms
as early as possible and make sure you
specify a reasonable time for them to fill
them in and send them out.
M - 10 : (or earlier) - take the necessary tests (GRE,
TOEFL, or others) that the schools want. Make sure
you tell the testing service which universities
you want them to send your scores to.
Remember: Time yourself through several practice
tests. The GRE General test, for example,
is more about mastery of timing than knowledge.
M - 9 : (approximately) - mail your forms to the schools,
preferably 2-3 weeks before the deadlines.
Remember: You don't want your applications to get there
at the same time as everyone else. Give the
admissions committee some extra time to
review your application.
M - 6 : usually six months before the beginning of the semester
that you are applying for, you will get a letter
saying whether you have been accepted.
Remember: Usually, thick letters, e-mails, and telegrams
mean acceptance. Thin one-sheet letters will
most likely be disappointing for you.
M - 5 : now that you have been accepted to a few schools, go back
to the same resources that you used when you were
deciding where to apply (journals, catalogs,
directories, professors, etc.). Ask the schools that accepted
you to fly you in for a visit (many will do this).
Remember: Don't forget non-academic factors such as
location, financial aid, the atmosphere in
the department, etc.
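Because every step in the timetable is counted backwards from M, it can help to turn the offsets into calendar months. The following Python is a minimal sketch, assuming a start in a given month (typically September); the milestone labels abbreviate the entries above, and the names `MILESTONES`, `months_before` and `schedule` are illustrative, not part of any existing tool.

```python
from datetime import date

# (months before M, task): offsets copied from the timetable above;
# entries that share an offset are merged into one label.
MILESTONES = [
    (24, "Clarify your interests"),
    (18, "Read publications, consult directories, talk to advisers"),
    (15, "Write to the schools on your preliminary list"),
    (12, "Shorten the list to the 5-8 schools you will apply to"),
    (10, "Fill in forms, request references, take GRE/TOEFL"),
    (9, "Mail applications 2-3 weeks before the deadlines"),
    (6, "Expect admission decisions"),
    (5, "Compare offers and ask for visits"),
]


def months_before(start: date, months: int) -> date:
    """Return the first day of the month `months` months before `start`."""
    # Count months since year 0 so that year boundaries are handled
    # by integer division instead of special cases.
    index = start.year * 12 + (start.month - 1) - months
    return date(index // 12, index % 12 + 1, 1)


def schedule(start: date):
    """List (month, task) pairs for studies beginning in `start`'s month."""
    return [(months_before(start, m), task) for m, task in MILESTONES]
```

For example, `schedule(date(1996, 9, 1))` places the first milestone in September 1994 and the mailing of applications (M - 9) in December 1995.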
[3-3] Where to get information on graduate programs
A: Peterson's Guide
A: The ACL Directory of Graduate Programs in Computational Linguistics
[3-4] Major non-academic research laboratories
AT&T Bell Labs, Murray Hill, NJ
BBN Systems and Technologies Corporation
Bellcore, Morristown, NJ
DFKI (German research center for AI)
General Electric
IRST, Italy
IBM T.J. Watson Research, Yorktown Heights, NY
Microsoft Research, Redmond, WA
NEC Corporation
SRI International, Menlo Park, CA
SRI International, Cambridge, UK
Xerox, Palo Alto, CA
Xerox, Grenoble, France
[4] Contacts
------------
[4-1] What major publications exist in the field
Computational Linguistics, published by ACL - Julia Hirschberg, Editor
PUBLISHED QUARTERLY BY MIT PRESS JOURNALS:
(617) 253-2889 (PHONE), (617) 258-6779 (FAX), or
JOURNALS-ORDERS@MIT.EDU. _Institutional_ Orders Only.
Computer Speech and Language
Journal of Natural Language Engineering
_Natural Language Engineering_ is to be published four
times a year, in March, June, September and December.
For more information or to submit a paper, please contact
Roberto Garigliano, Laboratory for Natural Language
Engineering, Computer Science Dept., University of
Durham, South Road, Durham DH1 3LE, UK, Tel: +4 91
372639, Fax: +44 91 374 2560,
<Roberto.Garigliano@durham.ac.uk>.
Language - Sarah Thomason, Editor
Linguistic Inquiry - Samuel Jay Keyser, Editor
PUBLISHED QUARTERLY BY MIT PRESS JOURNALS:
(617) 253-2889 (PHONE), (617) 258-6779 (FAX),
or JOURNALS-ORDERS@MIT.EDU.
Machine Translation, published by Kluwer
Natural Language and Linguistic Theory
Speech Communication
Elsevier Science B.V.
Journals Department
P.O. Box 211, 1000 AE
Amsterdam
[4-2] Electronic mailing lists
Michael Everson <everson@irlearn.ucd.ie> has updated his
List of Language Lists. FTP LNGLST15.TXT from /everson
on <colossus.ucd.ie>.
Information Retrieval:
irlist <ir-l%uccvma.bitnet@vm1.nodak.edu>
Natural Language and Knowledge Representation (moderated):
nl-kr@cs.rpi.edu (formerly nl-kr@cs.rochester.edu)
Gatewayed to the newsgroup comp.ai.nlang-know-rep.
Natural Language Generation:
siggen@black.bgu.ac.il
LFG (Lexical-Functional Grammar):
majordomo@list.stanford.edu
Parsing:
sigparse@cs.cmu.edu
Statistics, Natural Language, and Computing:
empiricists@csli.stanford.edu
Colibri (weekly update on Conferences, Seminars, Jobs and Shareware in
NLP and speech)
colibri-request@let.ruu.nl
Dependency Grammar
dg@ai.uga.edu
Prosody:
listserv@purccvm.bitnet
TEI:
tei-l
Text Analysis and Natural Language Applications:
SCHOLAR@CUNYVM.BITNET
Text Corpora:
corpora-request@nora.hd.uib.no
Speech production and perception:
foNETiks <fonetiks@mailbase.ac.uk>
LN:
ln@frmop11.bitnet
Linguist:
linguist@tamvm1.tamu.edu
ELSNET:
elsnet-list@cogsci.ed.ac.uk
Eastern (European) Language Engineering list:
to join, send mail to poul_andersen@eurokom.ie
Preprint archive mailing list
For further information about (among other topics) submission of papers to
the server, subscribing or canceling your subscription, requesting full
text of any of the listed papers, retrieving macro files for these papers,
searching past listings, or submitting comments to the server operators,
send a message:
To: CMP-LG@XXX.LANL.GOV
Subject: help
[4-3] Newsgroups
alt.usage.english English grammar, word usages, and related topics.
comp.ai.nat-lang Natural language processing by computers.
comp.ai.nlang-know-rep Natural Language and Knowledge Representation.
(Moderated)
comp.speech Research & applications in speech science &
technology.
sci.lang Natural languages, communication, etc.
alt.etext Electronic texts.
comp.text.sgml ISO 8879 SGML structured documents markup languages
[4-4] Professional Organizations, Associations
ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL)
To get information about the ACL listserver, send mail to
listserv@cs.columbia.edu
with
index acl-l
in the message body. To get the membership form, include
get acl-l 94membership.form
in the message body. The ACL archive can also be accessed by
anonymous ftp from cs.columbia.edu:/acl-l/.
ASSOCIATION FOR MACHINE TRANSLATION IN THE AMERICAS (AMTA)
655 Fifteenth Street, NW, Suite 310, Washington, DC 20005
AMERICAN ASSOCIATION FOR ARTIFICIAL INTELLIGENCE (AAAI)
COGNITIVE SCIENCE SOCIETY
[4-5] Conferences
COLING - last conference - Kyoto, Japan (August 1994)
ACL - next conference - Cambridge, Massachusetts (Summer 1995)
EACL - next conference - Dublin, Ireland (Spring 1995)
IJCAI - next conference - Montreal, Canada (Summer 1995)
PacLing - next conference - Brisbane, Australia (Spring 1995)
[4-6] Evaluation Competitions
MUC - ARPA Message Understanding Conference
Currently running MUC-6 (1994-95), using text articles from the Wall Street
Journal corpus. Systems compete in any or all of five categories: named
entity categorisation, word sense disambiguation, mini-MUC (contents
scanning, template filling), coreference identification, and
predicate-argument identification.
TREC - ARPA Text Retrieval Conference
Information retrieval using NLP/statistical techniques.
[5] How-to questions
--------------------
[5-1] How to join a mailing list
A: Most often, you have to send mail to the listserver at the site where
the mailing list resides, and put "subscribe <listname> <yourname>" in the
body of the mail message. The underlined text is what you have to type in.
Example:
Mail listserv@tamvm1.tamu.edu
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Subject: some text here
^^^^^^^^^^^^^^
subscribe LINGUIST Dragomir R. Radev
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.
^
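The same request can be prepared programmatically. Below is a minimal sketch using Python's standard `email` package; the helper name `subscribe_request` and the sender address are illustrative assumptions. As in the example above, the subject is arbitrary text, since the list server reads the command from the body.

```python
from email.message import EmailMessage


def subscribe_request(listserv: str, listname: str, yourname: str,
                      sender: str) -> EmailMessage:
    """Build a list-server subscription request (hypothetical helper).

    The command "subscribe <listname> <yourname>" goes in the body;
    the subject line is free text, as in the FAQ's example.
    """
    msg = EmailMessage()
    msg["From"] = sender              # your own address (assumed)
    msg["To"] = listserv              # e.g. listserv@tamvm1.tamu.edu
    msg["Subject"] = "some text here"
    msg.set_content(f"subscribe {listname} {yourname}\n")
    return msg
```

To send it, one option, assuming an SMTP server is reachable on your own machine, is `smtplib.SMTP("localhost").send_message(msg)`.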
[5-2] How to obtain files by anonymous ftp
A: There are several ways; the most common is to use a local ftp
client.
Suppose you want to get the file /pub/editors/webster.tar.Z
from ftp.uu.net
Here is a sample session. You type in whatever is underlined here.
$ftp ftp.uu.net
^^^^^^^^^^^^^^
Connected to ftp.uu.net.
220 ftp.UU.NET FTP server Thu Apr 14 15:45:10 EDT 1994) ready.
Name (ftp.uu.net:radev): anonymous
^^^^^^^^^
331 Password required for anonymous.
Password: radev@cs.columbia.edu
^^^^^^^^^^^^^^^^^^^^^ (put your email address here)
230 Guest login ok, access restrictions apply.
ftp> cd pub/editors
^^^^^^^^^^^^^^
ftp> binary
^^^^^^
ftp> get webster.tar.Z
^^^^^^^^^^^^^^^^^
200 PORT command successful.
150 Opening BINARY mode data connection for webster.tar.Z (148579 bytes).
226 Transfer complete.
local: webster.tar.Z remote: webster.tar.Z
148579 bytes received in 2.2 seconds (67 Kbytes/s)
ftp> quit
^^^^
$
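The interactive session above can also be scripted. The sketch below uses Python's standard `ftplib` module and follows the same steps: log in as "anonymous" with your e-mail address as the password, change directory, transfer in binary mode, and quit. The function name `anonymous_get` and its `ftp_factory` parameter (which lets a stand-in object replace a real connection) are illustrative choices, not part of `ftplib`.

```python
from ftplib import FTP


def anonymous_get(host, directory, filename, email,
                  local_path=None, ftp_factory=FTP):
    """Fetch one file by anonymous FTP in binary mode (hypothetical helper).

    Returns the number of bytes written to local_path, which defaults
    to filename in the current directory.
    """
    local_path = local_path or filename
    ftp = ftp_factory(host)                       # like "ftp ftp.uu.net"
    try:
        ftp.login(user="anonymous", passwd=email)
        ftp.cwd(directory)                        # like "cd pub/editors"
        written = 0
        with open(local_path, "wb") as out:
            def sink(block):
                nonlocal written
                out.write(block)
                written += len(block)
            # retrbinary sends "TYPE I" before RETR, which is what the
            # interactive "binary" followed by "get" does.
            ftp.retrbinary("RETR " + filename, sink)
    finally:
        ftp.quit()                                # like "quit"
    return written
```

A call such as `anonymous_get("ftp.uu.net", "pub/editors", "webster.tar.Z", "you@your.site")` mirrors the session, assuming the server still carries that file. Binary mode matters here: a compressed `.Z` file would be corrupted by an ASCII-mode transfer.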
[5-4] FTP repositories
A: Here follows a list of the most popular FTP sites that carry NLP-related
materials (data, tools, etc.)
* Consortium for Lexical Research (CLR)
The Consortium for Lexical Research is designed to serve as a
repository for software and resources of importance to the natural
language processing research community. Sharable resources, and the
task of centralizing lexical data and tools, are of foremost
concern in lexical research and computational linguistics. It
is our objective to help alleviate the repeated recreation of
basic software tools, and to assist in making essential data
sources more generally available.
CLR maintains a public ftp site, and a separate library of
materials only for members of CLR. Currently CLR has about 60
members, mostly academic institutions, and almost every major
natural language processing center in the U.S. belongs. Access to
the members-only materials is strictly regulated by password and
userid.
Our catalog of current holdings is available by using anonymous
ftp to clr.nmsu.edu (128.123.1.12). The file to 'get' is
"catalog.ps" for a PostScript version, or "catalog" for a plain
ASCII version.
* Linguistic Data Consortium (LDC)
To order LDC materials, send mail to ldc@unagi.cis.upenn.edu
or fax your order to (215) 573-2175. If you require additional
information before placing your order, please call (215) 898-0464.
* Oxford Text Archive (OTA)
ftp ota.ox.ac.uk
ota/textarchive.list the current catalogue
There are two classes of texts available from this FTP server
(a) texts which are in TEI format and which we can make freely
available (these all appear as category P texts in the shortlist)
(b) texts which are available only under our standard conditions of
use (these all appear as category U or A in the shortlist)
* University of Michigan Linguistics Archive (UMICH)
ftp linguistics.archive.umich.edu
/linguistics
moderator: John Lawler (jlawler@umich.edu)
* others...
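Plain-text catalogues such as the CLR "catalog" or the OTA "ota/textarchive.list" can be fetched in ASCII mode, where the server sends CRLF line endings that Python's `ftplib.FTP.retrlines` strips line by line. A minimal sketch, assuming anonymous access and a path relative to the login directory; `fetch_text` and its `ftp_factory` parameter are illustrative names:

```python
from ftplib import FTP


def fetch_text(host, path, email, ftp_factory=FTP):
    """Return a text file fetched by anonymous FTP in ASCII mode.

    retrlines() sends "TYPE A" and calls the callback once per line,
    with the trailing CRLF already stripped; the lines are re-joined
    with newlines here.
    """
    ftp = ftp_factory(host)
    try:
        ftp.login(user="anonymous", passwd=email)
        lines = []
        ftp.retrlines("RETR " + path, lines.append)
    finally:
        ftp.quit()
    return "\n".join(lines) + "\n" if lines else ""
```

For instance, `print(fetch_text("clr.nmsu.edu", "catalog", "you@your.site"))` would display the CLR holdings, provided the server is still reachable.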
[5-6] Tools available on the Internet
[5-6-1] Parsers
[5-6-1-2] TAG - grammar of English
site: linc.cis.upenn.edu
directory: /pub/xtag
[5-6-2] Generators
[5-6-3] Machine Translation
[5-6-4] Speech
[5-6-4-1] OGI Speech Tools
site: speech.cse.ogi.edu
directory: /pub/tools
[5-6-5] Lexical Tools
[5-6-5-1] WordNet
site: clarity.princeton.edu
directory: /pub
[5-7] Papers and Technical Reports
[5-7-1] Dissertations
[5-7-1-2] Finch, S.P. Finding Structure in Language
site: scott.cogsci.ed.ac.uk
directory: /pub/statling/Papers/phdThesis.ps.Z
[5-7-1-3]
[5-7-2] Technical reports
[5-7-3] Other on-line papers
http://www.cc.gatech.edu/cogsci/cogsci.html Georgia Tech CogSci home page
http://www.cc.gatech.edu/cogsci/nlr.html Georgia Tech Natural Language
and Reasoning research group
[5-8] WWW and gopher Resources (in no specific order)
[5-8-1] Association for Computational Linguistics Home Page
http://www.cs.columbia.edu/~acl/
[5-8-2] Dutch working group on Computational Linguistics
http://tyr.let.rug.nl/~vannoord/clin/clin.html
[5-8-3] Colibri, Newsletter on Computational Linguistics
http://colibri.let.ruu.nl
[5-8-4] Georgetown University Catalogue of Projects in Electronic Text
gopher://gopher.georgetown.edu/11gopher_root%3a%5bcpet_projects_in_electronic_text%5d
[5-8-5] The English Server at Carnegie-Mellon University
http://english-server.hss.cmu.edu/FrontDoor.html/
[5-8-6] Project for American and French Research of the Treasury of the
French Language, University of Chicago (ARTFL)
http://tuna.uchicago.edu/ARTFL
[5-8-7] Computational Phonology, University of Edinburgh
http://ftp.cogsci.ed.ac.uk/phonology/CompPhon.html
[5-8-8] New Mexico State University Computing Research Laboratory (CRL)
http://crl.nmsu.edu/Home.html
[5-8-9] The Carnegie-Mellon University AI repository
http://www.cs.cmu.edu:8001/Web/Groups/AI/html/repository.html
[5-8-10] The Computation and Language E-Print Archive
http://xxx.lanl.gov/cmp-lg/
[5-8-11] The Journal of Artificial Intelligence Research
http://www.cs.washington.edu/research/jair/home.html
[5-8-12] ACL-94 On-line conference proceedings.
http://xxx.lanl.gov/cmp-lg/ACL-94-proceedings.html
[5-8-13] The Natural Language Software Registry
http://cl-www.dfki.uni-sb.de/cl/registry/draft.html
[5-8-14] Language and Linguistics Directory, Rice University
gopher://chico.rice.edu/11/Subject/Language
[5-8-15] A text-to-speech demo, University of Twente
http://www_tios.cs.utwente.nl:8001/say/
[5-8-16] SRI International Natural Language WWW Page
http://www.ai.sri.com/aic/natural-language/natural-language.html
[5-8-17] Survey of Language Engineering Organisations in Central and
Eastern Europe
http://www.cogsci.ed.ac.uk/elsnet/survey/survey.html
[5-9] Other comprehensive documents
[5-9-1] Software Registries
[5-9-1-1] Natural Language Software Registry (from DFKI)
site: ftp.dfki.uni-sb.de
directory: pub/registry
[5-9-1-2] Natural Language Software Registry (from CRL)
site: crlftp.nmsu.edu
directory: pub/non-lexical/NL_Software_Registry
[5-9-1-3] Another list of NLP software
site: ftphost.uni-koblenz.de
directory: outgoing
files: software_list.ps.Z
[6] Literature
--------------
[6-1] What are some important books in NLP
General:
Gazdar, G. and Mellish, C., "Natural Language Processing in Lisp:
An Introduction to Computational Linguistics", Addison-Wesley,
Reading, Massachusetts, 1989. (There are three different editions
of the book, one for Lisp, one for Prolog, and one for Pop-11.)
Michael A. Covington, "Natural Language Processing for Prolog
Programmers", Prentice-Hall, Englewood Cliffs, NJ, 1994. ISBN
0-13-629213-5.
Grosz, Barbara J., Sparck-Jones, Karen, and Webber, Bonnie L.,
"Readings in Natural Language Processing", Morgan Kaufmann
Publishers, Los Altos, CA, 1986, 664 pages. ISBN 0-934613-11-7, $44.95.
Robert C. Berwick, "Computational Linguistics", MIT Press,
Cambridge, MA, 1989, ISBN 0-262-02266-4.
Brady, Michael, and Berwick, Robert C., "Computational Models
of Discourse", MIT Press, Cambridge, MA, 1983.
Allen, James F., "Natural Language Understanding", The
Benjamin/Cummings Publishing Company, Menlo Park, California,
(Addison-Wesley Publishing Company, Reading, Massachusetts),
1988, 550 pages, ISBN 0-8053-0330-8. [A new edition came out in 1994]
Code for the book is available from
ftp.cs.cmu.edu:/user/ai/areas/nlp/bookcode/allen/
Terry Winograd, "Language as a Cognitive Process", Addison-Wesley,
Reading, MA, 1983.
Schank, R. and Abelson, R. "Scripts, Plans, Goals, and Understanding,"
Lawrence Erlbaum Associates, Hillsdale, New Jersey, 1977.
Terminology:
David Crystal, "A Dictionary of Linguistics and Phonetics", 3rd Edition,
Basil Blackwell Publishers, New York, 1991.
Parsing:
Tomita, M. (Editor), "Current Issues in Parsing Technology",
Kluwer Academic Publishers, Norwell, MA, 1991.
Marcus, M. "A Theory of Syntactic Recognition for Natural Language,"
The MIT Press, Cambridge, MA, 1980.
Pereira, F. and Shieber, S. "Prolog and Natural-Language Analysis,"
Center for the Study of Language and Information, 1987.
Probabilistic Parsing:
Ted Briscoe and John Carroll, "Generalised Probabilistic LR Parsing of
Natural Language (Corpora) with Unification-based Grammars",
University of Cambridge Computer Laboratory, Technical Report Number
224, 1991.
Zhi Biao Wu, Loke Soo Hsu, and Chew Lim Tan, "A Survey of Statistical
Approaches to Natural Language Processing", Technical report TRA4/92,
Department of Information Systems and Computer Science, National
University of Singapore, 1992
Natural Language Understanding:
Dyer, M. "In-Depth Understanding: A Computer Model of Integrated
Processing for Narrative Comprehension," MIT Press, Cambridge, MA, 1983.
Aravind Joshi, Bonnie Webber and Ivan Sag, "Elements of Discourse
Understanding", Cambridge University Press, New York, 1981.
Cohen, P. R., Morgan, J. and Pollack, M., editors, "Intentions in
Communication", MIT Press, Cambridge, MA, 1990.
Natural Language Interfaces:
Raymond C. Perrault and Barbara J. Grosz, "Natural Language
Interfaces", Annual Review of Computer Science, volume 1, J.F. Traub,
editor, pages 435-452, Annual Reviews Inc., Palo Alto, CA, 1986.
Natural Language Generation:
McKeown, Kathleen R. and Swartout, William R., "Language
Generation and Explanation", in Zock, M. and Sabah, G.,
editors, Advances in Natural Language Generation, Volume 1, Pages
1-51, Ablex Publishing Company, Norwood, NJ, 1988. (Overview of
the state of the art in natural language generation.)
There are several books published as a result of the international
workshops on natural language generation.
Speech:
John Allen, Sharon Hunnicutt and Dennis H. Klatt, "From Text to Speech:
The MITalk System", Cambridge University Press, 1987. [Synthesis,
precursor of DECtalk.]
Frank Fallside and William A. Woods (editors), "Computer Speech Processing",
Prentice Hall, Englewood Cliffs, NJ, 1985.
X. D. Huang, Y. Ariki and M. A. Jack, "Hidden Markov Models for Speech
Recognition", Edinburgh University Press, 1990. [Analysis]
A. Nejat Ince (editor), "Digital Speech Processing: Speech Coding,
Synthesis, and Recognition", Kluwer Academic Publishers, Boston,
1992. [Analysis and Synthesis]
Kai-Fu Lee, "Automatic Speech Recognition: The Development of the
SPHINX System", Kluwer Academic Publishers, Boston, MA, 1989. [Analysis]
Douglas O'Shaughnessy, "Speech Communication: Human and Machine",
Addison-Wesley, MA, 1987. [Analysis and Synthesis]
Lawrence R. Rabiner and Ronald W. Schafer, "Digital Processing of
Speech Signals", Prentice Hall, Englewood Cliffs, NJ, 1978.
[Analysis and Synthesis]
Lawrence R. Rabiner and Biing-Hwang Juang, "Fundamentals of Speech
Recognition", Prentice Hall, Englewood Cliffs, NJ, 1993.
ISBN 0-13-015157-2. [Analysis]
Ronald W. Schafer and John D. Markel (editors), "Speech Analysis",
IEEE Press, New York, 1979. [Analysis]
Alex Waibel and Kai-Fu Lee (editors), "Readings in Speech Recognition",
Morgan Kaufmann Publishers, San Mateo, CA, 1990, 680 pages.
ISBN 1-55860-124-4, $49.95. [Analysis]
Alex Waibel, "Prosody and Speech Recognition", Morgan Kaufmann
Publishers, San Mateo, CA, 1988. [Analysis]
Machine Translation:
W. John Hutchins and Harold L. Somers, "An Introduction to Machine
Translation", Academic Press, San Diego, 1992. 362 pages, ISBN
0-123-62830-X.
Bonnie J. Dorr, "Machine Translation: A View from the Lexicon", MIT
Press, Cambridge, MA, 1993. 432 pages, ISBN 0-262-04138-3.
Kenneth Goodman and Sergei Nirenburg, editors, "The KBMT Project: A
Case Study in Knowledge-Based Machine Translation", Morgan Kaufmann
Publishers, San Mateo, CA, 1991. 331 pages, ISBN 1-558-60129-5, $34.95.
Arnold, D.J.; Balkan, L.; Lee Humphreys, R.; Meijer, S.; and Sadler, L.
(1994). Machine Translation: An Introductory Guide. NCC Blackwell.
The journal "Machine Translation" is the principal forum for
current research.
A review of MT systems on the market appeared in BYTE 18(1), January 1993.
Reversible Grammars:
Tomek Strzalkowski, editor, "Reversible Grammar in Natural Language
Processing", Kluwer Academic Publishers, 1993.
Proceedings of the ACL Workshop on Reversible Grammar in Natural
Language Processing, UC Berkeley, 1991. (See especially Remi
Zajac's paper.)
Statistical Processing:
Eugene Charniak, "Statistical Language Learning", MIT Press, Cambridge,
Massachusetts, 1993, 170 pages.
Linguistics:
Vivian J. Cook, "Chomsky's Universal Grammar: An Introduction", Basil
Blackwell Publisher, New York, 1988, 201 pages.
Victoria Fromkin and Robert Rodman, "An Introduction to Language",
Holt, Rinehart, and Winston, New York, 4th edition, 1988, 474 pages.
Ralph Grishman, "Computational Linguistics: An Introduction",
Cambridge University Press, New York, 1986, 193 pages.
Liliane M.V. Haegeman, "Introduction to Government and Binding
Theory", Basil Blackwell Publishers, Oxford, 1991, 618 pages.
Michael A. K. Halliday, "An Introduction to Functional Grammar",
Edward Arnold, London, 1985.
Geoffrey C. Horrocks, "Generative Grammar", Longman, London, 1987,
339 pages.
Andrew Radford, "Transformational Grammar: A First Course", Cambridge
University Press, New York, 1988, 625 pages.
Categorial Grammar (CG):
M. Moortgat, "Categorial Investigations. Logical and Linguistic
Aspects of the Lambek Calculus", Groningen-Amsterdam Studies in
Semantics:9, Foris, Dordrecht, Holland, 1988.
Richard T. Oehrle, Emmon Bach and Deirdre Wheeler, "Categorial
Grammars and Natural Language Structures", Studies in Linguistics
and Philosophy:32, D. Reidel Publishing Company, Dordrecht, 1988.
Mary McGee Wood, "Categorial Grammars", Linguistic Theory Guides,
Routledge, London, 1993.
Cognitive Grammar:
Ronald W. Langacker, "Foundations of Cognitive Grammar", Stanford
University Press, 1987.
Miscellaneous:
_The Multilingual PC Directory_. By Ian Tresman. 254pp.
Stamford CT: Knowledge Computing Ltd.
Stefan Wermter, "Hybrid Connectionist Natural Language Processing",
Chapman & Hall Inc., 1995.
Connectionist Approaches to Natural Language Processing.
Edited by Ronan G. Reilly and Noel E. Sharkey.
Lawrence Erlbaum Associates, 1992, ISBN 0-86377-179-3.
_Natural Language Processing_. Ed. Fernando C.N. Pereira and
Barbara J. Grosz. A Bradford Book. Cambridge, MA, and London:
The MIT Press, 1994. Reprinted from _Artificial Intelligence: An
International Journal_, Volume 63, Numbers 1-2 (1993).
_Research in Humanities Computing 1: Selected Papers
from the ALLC/ACH Conference, Toronto, June 1989_.
Ed. Ian Lancashire. Oxford: Clarendon Press, 1991.
Peter D. Smith, _An Introduction to Text Processing_.
Cambridge MA and London: The MIT Press, 1990.
ISBN 0-262-19299-3.
Gilbert K. Krulee, "Computer Processing of Natural Language",
Prentice Hall, ISBN 0-13-610299-3.
[6-2] Encyclopedia of Artificial Intelligence
A GUIDE TO COMPUTATIONAL LINGUISTICS ARTICLES IN
THE ENCYCLOPEDIA OF ARTIFICIAL INTELLIGENCE, 2nd Edition
Stuart C. Shapiro (editor) (John Wiley & Sons, 1992)
compiled by:
William J. Rapaport
Department of Computer Science
and Center for Cognitive Science
State University of New York at Buffalo
Buffalo, NY 14260
rapaport@cs.buffalo.edu
AUTHOR TITLE PAGES
Volume 1:
Bookman, L. A.,
& Alterman, R. Analog Semantic Features 27-28
Alvarado, S. J. Argument Comprehension 30-52
Kucera, H. Brown Corpus 128-130
Srihari, S. N.,
& Hull, J. J. Character Recognition 138-150
Ballard, B.,
& Jones, M. Computational Linguistics 203-224
Hardt, S. L. Conceptual Dependency 259-265
Hindle, D. Deep Structure 328-330
Ingria, R.;
Boguraev, B.;
& Pustejovsky, J. Dictionary/Lexicon 341-365
Scha, R.;
Bruce, B. C.;
& Polanyi, L. Discourse Understanding 365-379
Tennant, H. Ellipsis 445-446
Novak, V. Fuzzy Logic: Applications to Natural Language 515-521
Woods, W. A. Grammar, Augmented Transition Network 552-563
Bruce, B.,
& Moser, M. G. Grammar, Case 563-570
Gazdar, G. Grammar, Generalized Phrase Structure 570-573
Joshi, A. K. Grammar, Phrase Structure 573-580
Burton, R. Grammar, Semantic 580-583
Bateman, J. A. Grammar, Systemic 583-592
Mallery, J. C.;
Hurwitz, R.;
& Duffy,G. Hermeneutics 596-611
Hill, J. C. Language Acquisition 761-772
Fass, D.,
& Pustejovsky, J. Lexical Decomposition 806-812
Pustejovsky, J. Lexical Semantics 812-819
Volume 2:
Nagao, M. Machine Translation 898-902
Klavans, J. L.,
& Tzoukermann, E. Morphology 963-972
McDonald, D. D. Natural-Language Generation 983-997
Carbonell, J. G.,
& Hayes, P. J. Natural-Language Understanding 997-1016
Petrick, S. Parsing 1099-1109
Small, S. L. Parsing, Word-Expert 1109-1116
Wilks, Y.,
& Fass, D. Preference Semantics 1183-1194
Cruse, D. A. Presupposition 1194-1201
Dyer, M. G.;
Cullingford, R. E.;
& Alvarado, S. J. Scripts 1443-1460
Sowa, J. F. Semantic Networks 1493-1511
Devlin, K. J. Situation Theory and Situation Semantics 1541-1547
Briscoe, E. J. Speech Recognition 1553-1559
Norvig, P. Story Analysis 1568-1576
Alterman, R. Text Summarization 1579-1587
Sparck Jones, K. Thesaurus 1605-1613
Knight, K. Unification 1630-1636
Additional articles from the 1st edition (1987):
Coelho, H. Grammar, Definite Clause 339-342
Berwick, R. Grammar, Transformational 353-361
Newmeyer, F. J. Linguistics, Competence and Performance 503-508
Wilks, Y. Machine Translation 564-571
Tennant, H. Menu-Based Natural Language 594-597
Koskenniemi, K. Morphology 619-620
Bates, M. Natural-Language Interfaces 655-660
Riesbeck, C. K. Parsing, Expectation-Driven 696-701
Keyser, S. J. Phonemes 744-746
Webber, B. Question Answering 814-822
Smith, B. C. Self-Reference 1005-1010
Hirst, G. Semantics 1024-1029
Woods, W. Semantics, Procedural 1029-1031
Allen, J. F. Speech Acts 1062-1065
Allen, J. Speech Recognition 1065-1070
Allen, J. Speech Synthesis 1070-1076
Briscoe, E. J. Speech Understanding 1076-1083
Lehnert, W. G. Story Analysis 1090-1099
[7] Commercial sites
--------------------
[7-1] Machine Translation
Globalink, Inc
9302 Lee Highway
Fairfax, VA, 22031, USA
Tel: +1 703 273 5600
Fax: +1 703 273 3866
Archers Translation Services
203-205 Desborough Road
High Wycombe, Bucks., HP11 2QL, UK
Tel: +44 494 537755
Fax: +44 494 474001
[8] Corpora
-----------
Corpora are large sets of natural language documents (text or speech).
They are usually available by ftp, and sometimes on CD-ROM. See the
entries below for details.
0001 Word list
site: gatekeeper.dec.com
directory: pub/misc/stolfi-wordlists
moderator: Jorge Stolfi
0002 Word list
site: ftp.cs.vu.nl
directory: /dictionaries
0003 CELEX
To order LDC (Linguistic Data Consortium) materials, send mail to ldc@unagi.cis.upenn.edu
or fax your order to (215) 573-2175. If you require additional
information before placing your order, please call (215) 898-0464.
0004 Word list
site: wocket.vantage.gte.com
directory: /pub/standard_dictionary
0005 Russian corpus
mail: ingrid.maier@slaviska.uu.se
0006 Brown corpus
No information available yet.
[9] Miscellaneous
-----------------
[9-1] About this FAQ
This FAQ is maintained by Dragomir R. Radev from Columbia University.
Please send me all your comments, suggestions, corrections, additions, and
such to my e-mail address:
radev@cs.columbia.edu
[9-2] Large parts of sections 4-1, 4-4, and 6-1 come from Mark Kantrowitz's
comp.ai FAQ.
[9-3] Partial list of contributors (in alphabetical order):
Paul Buitelaar paulb@zag.cs.brandeis.edu
Russell Collingham R.J.Collingham@durham.ac.uk
Robert Dale rdale@microsoft.com
Joshua Goodman goodman@das.harvard.edu
Malcolm Grandis Malcolm@celtic.demon.co.uk
Graeme Hirst gh@cs.toronto.edu
Mark Kantrowitz mkant+ai-faq@cs.cmu.edu
Alberto Lavelli lavelli@irst.it
Ashwin Ram ashwin@cc.gatech.edu
William J. Rapaport rapaport@cs.buffalo.edu
Hinrich Schuetze schuetze@Sante.Stanford.EDU
Kevin Thomas kevint@cdplus.com
Gertjan van Noord vannoord@let.rug.nl
--
Dragomir R. Radev Graduate Research Assistant
Natural Language Processing Group Columbia University CS Department
Office: (212) 939-7121 Lab: (212) 939-7108 Home: (212) 749-9770
http://www.cs.columbia.edu/~radev