[am-nlp] Workshop on Language Resources and Human Language
Technologies for Semitic Languages
Michael Gasser
gasser at cs.indiana.edu
Sat Feb 6 03:46:08 EAT 2010
ሰላም, everybody.
It would be great if papers on Amharic and Tigrinya were well
represented at this workshop in May. Please consider submitting
something.
Mike Gasser
> CALL FOR PAPERS
> Workshop on Language Resources (LRs) and Human Language Technologies
> (HLT) for Semitic Languages: Status, Updates, and Prospects
>
> To be held in conjunction with the 7th International Language
> Resources and Evaluation Conference (LREC 2010)
>
> 17 May 2010, Mediterranean Conference Centre, Valetta, Malta
> Deadline for submission: 22 February 2010
>
> This workshop serves as the 2010 meeting of the ACL SIG on
> Computational Approaches to Semitic Languages (http://semitic.tk).
>
> Description
>
> The Semitic family includes languages and dialects spoken by a large
> number of native speakers (around 300 million). Prominent members of
> this family are Arabic (and its varieties), Hebrew, Amharic, Tigrinya,
> Aramaic, Maltese and Syriac. Their shared ancestry is apparent through
> pervasive cognate sharing, a rich and productive pattern-based
> morphology, and similar syntactic constructions. In addition, there
> are several languages which are used in the same geographic area such
> as Amazigh or Coptic, which, while not Semitic, have common features
> with Semitic languages, such as borrowed vocabulary.
>
> The recent surge in computational work for processing Semitic
> languages, particularly Modern Standard Arabic (MSA) and Modern Hebrew
> (MH), has brought modest improvements in terms of actual empirical
> results for various language processing components (e.g.,
> morphological analyzers, parsers, named entity recognizers, audio
> transcriptions, etc.). Apparently, reusing existing approaches
> developed for English or French for processing Semitic language
> text/speech, e.g., Arabic parsing is not as straightforward as
> initially thought. Apart from the limited availability of suitable
> language resources, there is increasing evidence that Semitic
> languages demand modeling approaches and annotations that deviate from
> those found suitable for English/French. Issues such as the
> pattern-based morphology, the frequently head-initial syntactic
> structure, the importance of the interface between morphology and
> syntax, and the difference between spoken and written forms
> (especially in Colloquial Arabic(s)) exemplify the kind of challenges
> that may arise when processing Semitic languages. For language
> technologies, such as information retrieval and machine translation,
> these challenges are compounded by sparse data and often result in
> poorer performance than for other languages.
>
> This Workshop intends to follow on topics of paramount importance for
> Semitic-language NLP that were discussed at previous events (LREC,
> MEDAR/NEMLAR Conferences, the workshops of the ACL Special Interest
> Group for Semitic languages, etc.) and which are worth revisiting.
>
> The workshop will bring together people who are actively involved in
> Semitic language processing in a mono- or cross/multilingual context,
> and give them an opportunity to update the community through reports
> on completed or ongoing work as well as on the availability of LRs,
> evaluation protocols and campaigns, products and core technologies (in
> particular open source ones). We also invite authors to address other
> languages spoken in the Semitic language area (languages such as
> Amazigh, Coptic, etc.). This should enable participants to develop a
> common view on where we stand and to foster the discussion of the
> future of this research area. Particular attention will be paid to
> activities involving technologies such as Machine Translation and
> Cross-Lingual Information Retrieval/Extraction, Summarization, etc.
> Evaluation methodologies and resources for evaluation of HLT will be
> also a main focus.
>
> We expect to elaborate on the HLT state of the art, identify problems
> of common interest, and debate on a potential roadmap for the Semitic
> languages. Issues related to sharing of resources, tools, standards,
> sharing and dissemination of information and expertise, adoption of
> current best practices, setting up joint projects and technology
> transfer mechanisms will be an important part of the workshop.
>
> Topics of Interest
>
> This full-day workshop is not intended to be a mini-conference, but as
> a real workshop aiming at concrete results that should clarify the
> situation of Semitic languages with respect to Language Resources and
> Evaluation. We expect to launch at least two evaluation campaigns:
> Comparative evaluation of Morphology taggers and Named Entities
> Recognizers.
>
> Among the many issues to be addressed, below follow a few suggestions:
>
> * Issues in the design, the acquisition, creation, management,
> access, distribution, use of Language Resources, in particular in a
> bilingual/multilingual setting (Standard Arabic, Hebrew, Colloquial
> Arabic, Amazigh, Coptic, Maltese, etc.)
>
> * Impact on LR collections/processing and NLP of the crucial
> issues related to "code switching" between different dialects and
> languages
>
> * Specific issues related to the above-mentioned languages such
> as the role of morphology, named entities, corpus alignment, etc.
>
> * Multilinguality issues including relationship between
> Colloquial and Standard Arabic
>
> * Exploitation of LR in different types of applications
>
> * Industrial LR requirements and community's response
>
> * Benchmarking of systems and products; resources for
> benchmarking and evaluation for written and spoken language
> processing;
>
> * Focus on some key technologies such as MT (all approaches e.g.
> Statistical, Example-Based, etc.), Information Retrieval, Speech
> Recognition, Spoken Documents Retrieval, CLIR, Question-Answering,
> Summarization, etc.
>
> * Local, regional, and international activities and projects and
> needs, possibilities, forms, initiatives of/for regional and
> international cooperation.
>
> We invite submissions on computational approaches to processing
> text/speech in all Semitic and Semitic-area languages. The call is
> open for all kinds of computational work, e.g., work on computational
> linguistic processing components (e.g., analyzers, taggers, parsers),
> on state-of-the-art NLP applications and systems, on leveraging
> resource and tool creation for the Semitic language family, and on
> using computational tools to gain new linguistic insight. We
> especially welcome submissions on work that crosses individual
> language boundaries, heightens awareness amongst Semitic-language
> researchers of shared challenges and breakthroughs, and highlights
> issues and solutions common to any subset of the Semitic languages
> family.
>
>
> Workshop general chair:
> Khalid Choukri, ELRA/ELDA, Paris, France
>
> Workshop co-chairs:
> Owen Rambow, Columbia University, New York, USA
> Bente Maegaard , University of Copenhagen, Denmark
> Ibrahim A. Al-Kharashi, Computer and Electronics Research Institute,
> King Abdulaziz City for Science and Technology, Saudi Arabia
>
>
> Organizing Committee information
> Khalil Sima’an, Language and Computation, University of Amsterdam
> (The Netherlands).
> Mona Diab , Center for Computational Learning Systems,Columbia
> University (USA).
> Mike Rosner , Dept. Intelligent Computer Systems,University of Malta
> (Malta).
> Shuly Wintner , Computer Science Dept., Haifa University, (Israel).
> Christopher Cieri, Linguistic Data Consortium, Philadelphia, (USA)
> Paolo Rosso, Universidad Politécnica Valencia, (Spain)
>
>
> The Program and Scientific Committees will be listed on the web pages.
>
> Important Dates
>
> Deadline for abstract submissions: 26 February 2010
> Notification of acceptance: 15 March 2010
> Final version of accepted paper: 11 April 2010
> Workshop full-day: 17 May 2010
>
> Submission Details
>
> Submissions should comply with LREC standards (including the LREC Map
> initiative) and must be in English. Abstracts for workshop
> contributions should not exceed Four A4 pages (excluding references).
> An additional title page should state: the title; author(s);
> affiliation(s); and contact author's e-mail address, as well as postal
> address, telephone and fax numbers.
>
> Submission will use the LREC START facility. Expected deadline is 26
> February 2010.
>
> Submitted papers will be judged based on relevance to the workshop
> aims, as well as the novelty of the idea, technical quality, clarity
> of presentation, and expected impact on future research within the
> area of focus.
>
> Registration to LREC’2010 will be required for participation, so
> potential participants are invited to refer to the main conference
> website for all details not covered in the present call
> (http://www.lrec-conf.org/lrec2010/)
>
> Formatting instructions for the final full version of papers will be
> sent to authors after notification of acceptance and will be identical
> to LREC main conference instructions.
>
> When submitting a paper through the START page, authors will be kindly
> asked to provide relevant information about the resources that have
> been used for the work described in their paper or that are the
> outcome of their research. For further information on this new
> initiative, please refer to
> http://www.lrec-conf.org/lrec2010/?LREC2010-Map-of-Language-Resources.
> _______________________________________________
> Semitic mailing list
> Semitic at cs.haifa.ac.il
> https://cs.haifa.ac.il/mailman/listinfo/semitic
>
> The material posted is under the full responsibility of whoever
> posted it and under their sole responsibility and liability. The
> University takes no responsibility whatsoever for any material or
> other damage, direct or indirect, that may incur from publications
> in the forum and/or distribution list. Nor is it responsible for the
> authenticity of any data and material posted in the forum and/or
> distribution list, their legality, accuracy, credibility or their
> completeness
More information about the am-nlp
mailing list