From locales at geez.org Sun Apr 3 10:12:29 2005
From: locales at geez.org (Daniel Yacob)
Date: Sun Apr 3 10:12:34 2005
Subject: [ti-translate] Tigrigna Software Glossary
Message-ID:

Gents,

I've placed online a starting point for a "Tigrinya Software Glossary". It is a collection of mostly ISO data used in locales at the next level of localization beyond month and day names. The data is all in Amharic, which should be an easier starting point than English since a number of the words will be identical. If anyone has time to work on it, that would be excellent; this data gets reused quite a bit:

http://translate.tigrinya.org/docs/TigrinyaSoftwareGlossary.xls

thanks,

/Daniel

From B.Gebremichael at cs.ru.nl Mon Apr 25 02:47:12 2005
From: B.Gebremichael at cs.ru.nl (Biniam Gebremichael)
Date: Mon Apr 25 12:50:57 2005
Subject: [ti-translate] Geez crawler
In-Reply-To: <20050422135314.GA22375@borel.slu.edu>
Message-ID:

Dear all,

Following the work of Kevin P. Scannell on web crawling, I made a similar attempt for Ethiopic scripts. The site http://www.cs.ru.nl/geezcraw shows some preliminary results.

You can find, on the above page, a word list and a list of URLs containing Tigrigna/Amharic words.

I have made some efforts to keep the two languages separate. I keep two files, good.seed and bad.seed: a page that contains a word from good.seed and does not contain any word from bad.seed is considered a correct page.

Words that were misspelled in the original document may still show up in the word list. To fix this, only words that appear twice or more can be considered trusted.

I appreciate any suggestions on these issues and others.
Regards,
Biniam

From B.Gebremichael at cs.ru.nl Mon Apr 25 04:02:33 2005
From: B.Gebremichael at cs.ru.nl (Biniam Gebremichael)
Date: Mon Apr 25 14:08:00 2005
Subject: [ti-translate] Geez crawler
In-Reply-To:
References:
Message-ID: <200504251102.33765.B.Gebremichael@cs.ru.nl>

Sorry, the site is http://www.cs.ru.nl/~biniam/geez/

/Biniam

On Monday 25 April 2005 09:47, Biniam Gebremichael wrote:
> Following the work of Kevin P. Scannell on web crawling,
> I made similar attempt for Ethiopic scripts. The site
> http://www.cs.ru.nl/geezcraw shows some preliminary results.

From locales at geez.org Mon Apr 25 18:18:06 2005
From: locales at geez.org (Daniel Yacob)
Date: Mon Apr 25 18:18:09 2005
Subject: [ti-translate] Geez crawler
Message-ID:

Biniam,

Congratulations on the progress! It is wonderful to see someone working in this area for Tigrinya; please keep up the good work!

/Daniel
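The two filtering rules Biniam describes for the Geez crawler (accept a page only if it has a good.seed word and no bad.seed word; trust only words seen twice or more) can be sketched roughly as below. This is a minimal sketch, not the crawler's actual code. The tokenizer (runs of characters in the Ethiopic block U+1200 to U+135F, so Ethiopic punctuation such as the wordspace U+1361 acts as a separator), the function names, the sample seed words, and the choice to count total occurrences across accepted pages are all assumptions.

```python
from __future__ import annotations

import re
from collections import Counter
from typing import Iterable

# Assumption: a "word" is a run of Ethiopic syllables and combining marks
# (U+1200 to U+135F). Ethiopic punctuation (U+1360 to U+1368), e.g. the
# wordspace U+1361 and full stop U+1362, falls outside this range and so
# acts as a separator, just like spaces and ASCII punctuation.
ETHIOPIC_WORD = re.compile(r"[\u1200-\u135F]+")


def tokenize(text: str) -> list[str]:
    """Split page text into Ethiopic-script words (hypothetical helper)."""
    return ETHIOPIC_WORD.findall(text)


def is_target_page(words: set[str], good_seed: set[str], bad_seed: set[str]) -> bool:
    """Accept a page with at least one good.seed word and no bad.seed word."""
    return not words.isdisjoint(good_seed) and words.isdisjoint(bad_seed)


def trusted_words(
    pages: Iterable[str],
    good_seed: set[str],
    bad_seed: set[str],
    min_count: int = 2,
) -> set[str]:
    """Return words seen at least min_count times across accepted pages."""
    counts: Counter = Counter()
    for text in pages:
        tokens = tokenize(text)
        if is_target_page(set(tokens), good_seed, bad_seed):
            # Assumption: count every occurrence, not one per page.
            counts.update(tokens)
    return {word for word, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    good = {"ኣብ"}  # hypothetical good.seed entry (Tigrinya)
    bad = {"ነው"}   # hypothetical bad.seed entry (Amharic)
    pages = ["ሰላም ኣብ ዓዲ", "ሰላም ኣብ ቤት", "ሰላም ነው"]
    print(sorted(trusted_words(pages, good, bad)))  # prints ['ሰላም', 'ኣብ']
```

Counting distinct pages per word instead of raw occurrences would stop a misspelling repeated within a single page from being trusted; which of the two the crawler actually does is not stated in the thread.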