From lingua at geez.org Sun Jul 3 02:31:07 2005
From: lingua at geez.org (Daniel Yacob)
Date: Sun Jul 3 02:31:08 2005
Subject: [am-nlp] Some Notes from ACL Semitic NLP Workshop
Message-ID:

Greetings All,

I thought I would send some notes here from the recent ACL conference. There were four of us at the conference with a focus on the Amharic language, which I think must be a new record for the ACL. It is also an indication that our subfield has grown in activity to a level where communication between participants is all the more important, something that I hope this mailing list will help support.

Our parent field, Semitic NLP, has also been a growing area, an assessment shared by participants at the Semitic NLP Workshop at the ACL. Accordingly, a special interest group (SIG) for Semitic NLP has formed. Joining the SIG is simply a matter of joining its email list, which can be found from the home page here:

  http://www.semitic.tk/

The SIG was announced by the organizers of the Semitic NLP Workshop. The workshop is held every two years at the ACL conference. There was some discussion of the idea that the workshop might break away from the ACL venue and be hosted somewhere geographically closer to where the languages are spoken.

It was a very nice occasion to meet people that I have communicated with via email and whose papers I have been reading over the years. Photos from ACL 2005:

  http://yacob.org/ACL2005/

In the same week I had to leave the ACL conference briefly to attend another nearby conference for the Perl language. My Perl conference presentation on Regular Expressions for Syllabaries:

  http://yacob.org/presentations/yapc-na-2005/

The presentation uses the required "S5" system, which is a single file that applies CSS and JavaScript to emulate PowerPoint (it is designed for 1024x768 and is best viewed in "full screen" browser mode).
Notes from the conference:

* The Arabic NLP Tutorial was a great idea but a little disappointing because it ended 4 hours earlier than I expected (at 2 PM instead of 6 PM). What I got out of it was an overview of the Arabic language, its writing, and the problems that occur (encoding, presentation) in computer environments. Very little time was spent on algorithms and analysis techniques, which is more of what I was expecting. As an introduction to the problems involved when you work with Arabic electronically it was a good tutorial; I just expected it to go a little further.

* Modern Standard Arabic (MSA) is used in international media for the Arab world, but it is not the first language of any region and it is not used in dialog. It seems to be a consensus standard rather than one set by a standards body or fixed authority. Each country sets its own local Arabic standard.

* Egypt's Arabic standard is the most respected. Egypt is also a cultural center, in part because it produces the most movies and music and has been the most successful in exporting them and, along with them, its culture.

* The Arabic keyboard layout is standardized.

* "Buckwalter" is the most used ASCII transliteration system for Arabic. The name "Buckwalter" also refers to a suite of tools for Arabic NLP.

* Arabic is most typically processed in transliterated form. The same is presently true of Amharic, but the barriers to processing Amharic in Ethiopic script are much fewer than for Arabic.

* Arabic writing gained diacritic marks 1,500 years ago when the Koran was written. Until then Arabic was written (and still is today, in advanced form) as a sequence of consonants that were understood in context. The diacritic marks, however, would disambiguate the writing, assuring that it would be read aloud in the same way universally. This sounds identical to the story of how written Ge'ez became a syllabary at the time the Bible was written.

* Arabic has a standard for punctuation, but no one uses it.

* Arabic has different spellings for foreign words.
* Overall, the methodologies applied to Arabic sound like they could be adapted readily for Amharic.

* Arabic NLP is a rapidly evolving field; presentations rarely referenced work prior to 2001.

* Links to various projects presented:

  http://www.research.ibm.com/UIMA/
  http://alma.oieau.fr/
  http://utool.sourceforge.net/
  http://www.mozart-oz.org/
  http://textmining.cryst.bbk.ac.uk/acl05/
  http://opennlp.sourceforge.net/

From lingua at geez.org Mon Jul 4 15:37:44 2005
From: lingua at geez.org (Daniel Yacob)
Date: Mon Jul 4 15:37:48 2005
Subject: [am-nlp] ICES XVI Postponed Until 2007
Message-ID:

Greetings,

I noticed that the timing of the WOCAL conference coincided with the expected timing for the 16th International Conference of Ethiopian Studies. I spoke this weekend with a friend who was involved with ICES XV, and he indicates that ICES XVI, previously scheduled for mid-July 2006 in Rome, has been canceled and will be held instead in Addis Ababa in 2007.

I consider this information reliable, but it would be reassuring if the conference committee made a formal announcement. I can't seem to find one.

Thanks,

/Daniel