Wiktionary:Community Portal
Source: Wiktionary
Let's talk here... I gave Cabrilo bureaucrat status and Francis admin status (I don't think it makes sense for a non-native speaker to have bureaucrat rights, but if the two of you think differently, I'll give Francis bureaucrat permissions). --Millosh 19:21, 19 November 2007 (UTC)
BTW, do you have a plan, or did you just start by inserting words? --Millosh 19:21, 19 November 2007 (UTC)
- I have a large corpus of English to Serbo-Croatian (and vice versa, although still unchecked) translations. These are crude though: no part-of-speech tagging, likely many mistakes, etc. Francis has a lot of inflections and lexical information on words in different languages. He will tell you more about them. These are all very complete entries (except for the meaning). You can see some of the examples that we have already added in this fashion. Will you be in Zagreb, at the Open Translation conference? If yes, that would be a perfect opportunity to discuss how to merge all the info. The two of us could also meet in Belgrade sometime if that suits you better (but both Francis and I will be in Zagreb from 28th November). Do you have a plan for how to make MediaWiki definition-centric? This could also be a good point to discuss when (if) we meet. --Cabrilo 20:07, 19 November 2007 (UTC)
- As Dejan says, in Apertium we have quite a bit of information on inflection in quite a few languages... ranging from a few lemmata, to ~30,000 lemmata (in the case of Italian), or ~25,000 lemmata (in the case of Polish). These are being expanded by the day (well, some are). Basically, I'd like to make these useful, and Wiktionary seems like a good bet. - Francis Tyers 20:20, 19 November 2007 (UTC)
I am in Zagreb from Tuesday. (BTW, Dejan, do you want to come with Brane and me? The train ticket will be cheaper, and we may find a bed for you if you don't have one for Tuesday/Wednesday night.) So, I think that we should talk there about what we should do... There are a lot of interesting things to talk about, including the possibility of working together with young Serbian linguists... --Millosh 21:25, 19 November 2007 (UTC)
- Thanks for the offer, I'd love to, but I have to be in Belgrade on Tuesday for work and can't leave before Wednesday. If you have some time this coming weekend, we could meet for some "preliminary" (I love that word) chitchat. Well anyway, I'm in Belgrade full time now, so whenever... :( --Cabrilo 08:54, 20 November 2007 (UTC)
Generally, we should build a database first and then add words with a bot. Also, we should work on the localization of the interface, including conversion between scripts and standards... --Millosh 21:25, 19 November 2007 (UTC)
- Thinking about it now, I realized that we should use categories as meanings. Maybe we should ask for another "category" type, which would be "meaning" (and leave the current categories for other things). What do you think? (Actually, categories and meanings are the same in the way they are organized in MediaWiki [a category may have more than one supercategory].) --Millosh 21:30, 19 November 2007 (UTC)
- There are issues like this, mainly about user experience. We would definitely need to add a "supercategory", but we also need to think about how users would search through words, how to display results and how to allow future editors to browse through meanings when they are adding words (it would probably be very hard to find a predefined meaning, so we would end up with people adding differently worded meanings for the same thing, etc.). A few usage scenarios:
- A person searches for a word that has three different but very similar meanings (A, B and C) and wants English and Albanian translations of them. English has one corresponding meaning and one different meaning (C and D), while Albanian has two corresponding meanings (A and B). What's the best way to display these results? Ideally on one page, so the user can choose what suits them best with the fewest clicks.
- An editor is adding a word. He browses for the meaning he needs and can't or doesn't take the time to think up synonyms or words in other languages that would lead him to the meaning, so he doesn't find one. He adds another meaning, worded differently from the one already in the dictionary.
- These are just a couple of different things I don't even know how to begin thinking about and that need some serious brainstorming, at least on my part :) --Cabrilo 08:54, 20 November 2007 (UTC)
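The first scenario above can be made concrete with a small sketch. It assumes a hypothetical layout in which every meaning has an ID and lists the words that express it in each language; the IDs A to D follow the scenario, while the language codes and all word strings are placeholders rather than real entries.

```python
# Hypothetical data: meaning ID -> {language code -> words expressing it}.
# IDs A-D follow the scenario; every word string is a placeholder.
MEANINGS = {
    "A": {"sr": ["source_word"], "sq": ["sq_word_1"]},
    "B": {"sr": ["source_word"], "sq": ["sq_word_2"]},
    "C": {"sr": ["source_word"], "en": ["en_word_1"]},
    "D": {"en": ["en_word_1"]},
}


def translations_by_meaning(word, source_lang, target_langs):
    """Group translations of `word` by meaning so they fit on one page."""
    result = {}
    for meaning_id, words_by_lang in sorted(MEANINGS.items()):
        if word not in words_by_lang.get(source_lang, []):
            continue  # the searched word does not express this meaning
        result[meaning_id] = {
            lang: list(words_by_lang.get(lang, [])) for lang in target_langs
        }
    return result


page = translations_by_meaning("source_word", "sr", ["en", "sq"])
for meaning_id, per_lang in page.items():
    print(meaning_id, per_lang)
```

An empty list marks a meaning that has no word in that language, which is exactly the gap a results page would have to show or hide. Meaning D never appears, because the searched word does not express it.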
I read a couple of interesting papers at MT Summit this year on creating sense-distinguished lexicons from dictionaries and monolingual corpora; it would probably be worth looking them up:
- Marcus Sammer and Stephen Soderland (2007) "Building a Sense-Distinguished Multilingual Lexicon from Monolingual Corpora and Bilingual Lexicons". Proceedings of Machine Translation Summit XI, 2007
Actually, they seem to have them online here, so just take a look. - Francis Tyers 10:46, 20 November 2007 (UTC)
- I think that it is not so hard to implement relations inside something-similar-to-categories: from something like "augmentative" to just "connected". Also, some kind of algorithm that would calculate "the closest word in the target language" shouldn't be hard to implement either. With those implementations there would be no point in adding interwiki links to words, only to meanings. --Millosh 11:44, 20 November 2007 (UTC)
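The "closest word" algorithm is not specified in the comment above. One possible reading, sketched here, is a plain breadth-first search over typed relations between meanings, counting every relation as one step. The meaning IDs, the relation triples and the word lists are illustrative assumptions, not an existing Wiktionary or MediaWiki structure.

```python
from collections import deque

# Hypothetical typed relations between meaning IDs: (from, relation, to).
RELATIONS = [
    ("big_house", "augmentative", "house"),
    ("house", "connected", "home"),
]

# Hypothetical word lists: meaning ID -> {language code -> words}.
WORDS = {
    "big_house": {"sr": ["kućerina"]},
    "house": {"sr": ["kuća"], "en": ["house"]},
    "home": {"sr": ["dom"], "en": ["home"]},
}


def build_graph(relations):
    """Treat every relation as an undirected edge of length one."""
    graph = {}
    for a, _relation, b in relations:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def closest_words(meaning_id, target_lang):
    """Find the nearest meaning that has a word in target_lang.

    Returns (distance, meaning ID, words), or None if nothing is reachable.
    """
    graph = build_graph(RELATIONS)
    queue = deque([(meaning_id, 0)])
    seen = {meaning_id}
    while queue:
        current, distance = queue.popleft()
        words = WORDS.get(current, {}).get(target_lang)
        if words:
            return distance, current, words
        for neighbour in sorted(graph.get(current, ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None


print(closest_words("big_house", "en"))
```

Weighting relation types differently (for example, making "augmentative" closer than "connected") would need a weighted shortest-path search instead; that choice is left open here.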
- Wiktionary is a dictionary, not a translation engine. We should think about the best way to structure data, not about algorithm implementations. Here is one set of problems that a dictionary has to describe:
- Here are two sets of English words with translations in sr/hr/bs/sh: professor (profesor, profesorka/profesorica; plural profesori, profesorke/profesorice) and Estonian (only in the ethnonym meaning: Estonac, Estonka; plural Estonci, Estonke). Meanings are:
- general meaning: profesor, profesori, Estonci (professor, Estonian)
- general meaning only when gender is not known: Estonac (profesor, profesori, Estonci inherit it from the meaning above; professor, Estonian inherit it from general meaning)
- masculine meaning: profesor, profesori, Estonac, Estonci (professor, Estonian inherit it from general meaning)
- feminine meaning: profesorka/ica, profesorke/ice, Estonka, Estonke (professor, Estonian inherit it from general meaning)
- In all cases, it will be translated as "professor" or "Estonian". This means that we need a way to describe such cases. Of course, this is only an example of possibly more complex situations in other languages (I can imagine some problems in non-Indo-European languages). --Millosh 11:44, 20 November 2007 (UTC)
- So, given this situation, we have to have four meanings in all languages. However, if some language has strict rules for, say, a professor with a hat versus without a hat, and uses the first form in the general meaning, we should add that meaning in all languages, too. --Millosh 11:44, 20 November 2007 (UTC)
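The four meanings and their inheritance could be sketched roughly as follows. This is only an illustration under a simplifying assumption: a meaning uses its own words for a language if it has any and otherwise falls back to its parent. The `Meaning` class and the `words_in` method are hypothetical names, not part of MediaWiki.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Meaning:
    name: str
    parent: Optional["Meaning"] = None
    words: Dict[str, List[str]] = field(default_factory=dict)

    def words_in(self, lang: str) -> List[str]:
        """Own words if any, otherwise whatever the parent chain provides."""
        node: Optional[Meaning] = self
        while node is not None:
            if node.words.get(lang):
                return node.words[lang]
            node = node.parent
        return []


general = Meaning("general", words={
    "sr": ["profesor", "profesori", "Estonci"],
    "en": ["professor", "Estonian"],
})
gender_unknown = Meaning("general, gender not known", parent=general, words={
    # The discussion treats all but "Estonac" as inherited; they are listed
    # in full because this simplified fallback does not merge word lists.
    "sr": ["profesor", "profesori", "Estonac", "Estonci"],
})
masculine = Meaning("masculine", parent=general, words={
    "sr": ["profesor", "profesori", "Estonac", "Estonci"],
})
feminine = Meaning("feminine", parent=general, words={
    "sr": ["profesorka", "profesorica", "profesorke", "profesorice",
           "Estonka", "Estonke"],
})

for meaning in (general, gender_unknown, masculine, feminine):
    print(meaning.name, meaning.words_in("sr"), meaning.words_in("en"))
```

English defines words only on the general meaning, so every narrower meaning resolves to "professor" and "Estonian", matching the point that all cases translate the same way. Whether inherited and own words should instead be merged is a design question this sketch leaves open.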
- Francis, I looked at the paper. I think that it may be useful as an approach for automatic generation of the database, but first we have to think about how to structure our database. Also, we have a number of dictionaries which are partially tagged, and we should use them first. Also, there are partial translations of WordNet into Serbian (I'll find the URL and send it here). --Millosh 11:44, 20 November 2007 (UTC)
- BTW, what do you think about making a page of resources with links and descriptions? Maybe Wiktionary:Resources? --Millosh 11:44, 20 November 2007 (UTC)
- Also, the dictionary which I mentioned is free, but we should talk about that in Zagreb. --Millosh 11:44, 20 November 2007 (UTC)
- Good idea (Wiktionary:Resources). - Francis Tyers 13:47, 20 November 2007 (UTC)