Compound words in different languages


Re: Compound words in different languages

#21  Postby lpetrich » Apr 13, 2015 7:05 am

Here are some English attributive phrases turned into compounds:

tunnelboringmachine
reactioncontrolsystem
spaceshuttlemainengine
partialwaveexpansion
virtualmemorymanager
lowmemoryglobals

#22  Postby don't get me started » Apr 15, 2015 7:30 am

epepke wrote:
don't get me started wrote: I do, however, have another quibble with your example sentence concerning the mittens.
English has a system for piling up adjectives, and that order is violated in 'woolen red mittens'.


OK, this is true, and about 25 years ago I figured out a bit of this. I wrote to a linguist, who said that I was doing some classical linguistics in a good way, but that cognitive science hadn't caught up. Now it has.

Still, consider that "woolen red mittens" is still understandable. It's not as easy to understand as "red woolen mittens," but it's easier to understand than "baseball wooden bat." It might not be very noticeable; the difference in the time it takes an English speaker to understand the two utterances is at most on the order of tenths of a second, but there are a lot of other things like this, such as where you put commas.

Now, if you acknowledge this and stop thinking in anal-retentive Honky Chomsky terms, it leads to some catastrophic qualitative decisions. The one it has led me to can be put simply: the very idea of parsing is completely fucked. All of it. Parts of speech, too. The ideas are just stupid.

Which one could intuit by noticing that people speak languages without even knowing what "parse" means, and traditionally, "parsing" is an arcane exercise in language classes. You can apply it and get a mapping, but what of it? That doesn't mean it describes how things work in a brain.

All these "rules" and shit can be expressed in, and much better interpreted as, fuzzy heuristics in an overall system that is hardly structured at all, except in the sense that nearby words in an utterance usually have a lot more connection than far-away words. Structures beyond this, which we stupid rationalistic people can reinterpret as phrase structure or whatever, are just what happens when blobs of close-together words are put close together.

That's how I'm approaching natural language understanding, and it's working so much better than the classical approaches that I am constantly amazed and surprised. Of course, it's going to be a while before I get everything, but so far, I haven't met a single idiom or weird structure that doesn't fit easily into the paradigm.


Yep epepke, I hear you.
I especially liked your description of 'anal retentive Honky Chomsky terms'!!
TBH I can't make head nor tail of those tree diagram things that litter generative grammar reference books. They are usually the domain of people who spend all day in their offices thinking up concocted John and Mary sentences and then thinking up reasons why they are 'correct' or 'incorrect', as if there were a sharp, black-and-white division. The same people are (in my experience) often not very good interactants in actual, real-world situations.
The fact is that humans are inclined towards sense-making and will try really hard to understand any utterance, no matter how 'chaotic' or 'deviant' it may be when tested against formal rules.
The title of an article I read some time ago sums it up for me: "Meaning in context: Is there any other kind?"

The example of English adjective ordering interests me because of its underlying structure (along a gradient from subjective to objective), a structure that is largely invisible to users of the language. Clearly, there was never a committee that sat down and decided that it should be this way. The invisibility of the underlying system to casual introspection is more interesting to me than the actual pattern itself. And, as you say, 'violations' of this system are still perfectly understandable in real-world interactions, which are basically affiliative in nature.
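One way to make the idea of a soft ordering concrete is to score adjective sequences rather than accept or reject them. This is a minimal sketch, assuming the commonly taught category order (opinion, size, age, shape, color, origin, material, purpose) and a tiny hand-made lexicon; `CATEGORY_RANK`, `LEXICON`, and both functions are hypothetical, and the score is only a count of out-of-order neighbours, not a model of how hard a phrase is to understand.

```python
# Commonly taught English adjective order, expressed as ranks (an assumption for this sketch).
CATEGORY_RANK = {
    "opinion": 0, "size": 1, "age": 2, "shape": 3,
    "color": 4, "origin": 5, "material": 6, "purpose": 7,
}

# Hypothetical toy lexicon mapping adjectives to categories.
LEXICON = {
    "lovely": "opinion", "big": "size", "old": "age", "round": "shape",
    "red": "color", "Swedish": "origin", "woolen": "material", "wooden": "material",
}


def order_violations(adjectives):
    """Count adjacent pairs whose categories run against the usual order.

    A violation is a soft penalty, not a rejection: the phrase stays interpretable.
    """
    ranks = [CATEGORY_RANK[LEXICON[a]] for a in adjectives]
    return sum(1 for left, right in zip(ranks, ranks[1:]) if left > right)


def canonical_order(adjectives):
    """Reorder adjectives by category rank (sorted() is stable, so ties keep their order)."""
    return sorted(adjectives, key=lambda a: CATEGORY_RANK[LEXICON[a]])


print(order_violations(["red", "woolen"]))           # 0
print(order_violations(["woolen", "red"]))           # 1
print(canonical_order(["woolen", "red", "lovely"]))  # ['lovely', 'red', 'woolen']
```

"Woolen red" gets a nonzero penalty but still has a canonical reordering, which matches the point that such 'violations' remain perfectly understandable.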

Consider legal language, which is notoriously impenetrable, such language being based on the fact that the presupposition is one of disaffiliation between parties rather than the basic human stance towards interaction, which is trying one's best to make sense of what the other person actually means, and working together to co-construct meaning in the here and now of the unfolding discourse.

I feel we are far from where we started... apologies to the OP for the derail.

#23  Postby Newmark » Apr 15, 2015 9:16 am

In Swedish, we can construct arbitrarily long compound words and still be (more or less) understood. The longest word in the official dictionary (SAOL) is "realisationsvinstbeskattning", "capital gains taxation" in English. According to Guinness World Records, the longest Swedish word (that I presume has been used) is "nord-väster-sjö-kust-artilleri-flyg-spanings-simulator-anläggnings-materiel-underhålls-uppföljnings-system-diskussions-inläggs-förberedelse-arbeten", although it uses far more hyphenation than is generally accepted...

We do however have a rather curious problem when someone mistakenly inserts blanks into a compound word, thus changing its meaning, and we even have a (compound) word for it: "särskrivning" (roughly "written apart"). Often, it's easy to spot as the compounds don't make sense grammatically, but thanks to a plethora of homographs and generous rules for inflections, it can lead to a lot of fun:

"rökfritt": lit. "smoke-free", used to indicate a "No smoking"-area
"rök fritt": "(you are) free to smoke"

"Giftorm säljes!": "Poisonous snake for sale!"
"Gift orm säljes!": "Married snake for sale!"

"Grillad kycklinglever med kulpotatis": "Grilled chicken liver with potato-balls"
"Grillad kyckling lever med kul potatis": "Grilled chicken lives with fun potato"

"Jag är en mörkhårig sjuksköterska.": "I am a dark-haired nurse."
"Jag är en mörk hårig sjuk sköterska.": "I am a dark, hairy, sick nurse."

"Jag skriver min doktorsavhandling": "I'm writing my doctoral thesis"
"Jag skriver min doktors avhandling": "I'm writing my doctor's thesis"

"Postens kassaservice": "The Post Office's cashier service"
"Postens kassa service": "The Post Office's crummy service"
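The ambiguity behind these examples can be made visible with a lexicon-driven splitter that enumerates every way a string can be cut into known words. This is a minimal sketch, assuming a hypothetical toy `LEXICON` holding only the words needed here; it ignores inflection and Swedish linking elements such as the -s- in "doktorsavhandling", and it lists candidate readings without ranking them.

```python
from functools import lru_cache

# Hypothetical toy lexicon containing only the words used in the examples above.
LEXICON = {
    "gift", "orm", "giftorm",
    "kyckling", "lever", "kycklinglever",
    "kul", "potatis", "kulpotatis",
}


def segmentations(text):
    """Return every way to cut `text` into a sequence of LEXICON words."""

    @lru_cache(maxsize=None)
    def split_from(i):
        # All segmentations of text[i:], memoized by start position.
        if i == len(text):
            return [[]]
        results = []
        for j in range(i + 1, len(text) + 1):
            word = text[i:j]
            if word in LEXICON:
                for rest in split_from(j):
                    results.append([word] + rest)
        return results

    return [" ".join(parts) for parts in split_from(0)]


print(segmentations("giftorm"))        # ['gift orm', 'giftorm']
print(segmentations("kycklinglever"))  # ['kyckling lever', 'kycklinglever']
```

The splitter finds both the closed and the open reading but cannot choose between them; that takes grammar and context, which is why a stray blank is so easy to misread.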

#24  Postby Adco » Apr 15, 2015 10:21 am

Scot Dutchy wrote:
lpetrich wrote: So Dutch can be as obnoxious as German.


Why do you say that? Do you know the language? Do you speak and write it?


Afrikaans, my second language, is similar to Dutch/German. I have trouble reading some of the long words. Speaking it or listening is not a problem because you don't hear the lack of breaks or hyphens. I have found that I speak Afrikaans less and less as the years pass. I was fluent in Afrikaans for many years. Recently, there seems to be less Afrikaans spoken in South Africa. 99% of my business is conducted in English and most of my Afrikaans speaking friends speak English to me.

I love Dutch and prefer it in many ways to English, as it has many more subtle words. Making up compound words is part of the sport of the language.

Afrikaans also has words that cannot be translated directly into English. They often get used in our general conversations when we want to describe a particular situation. I can't think of any specific phrases right now, but they are usually gems.

#25  Postby Scot Dutchy » Apr 15, 2015 2:20 pm

Gezellig is a well-known example.

The home is gezellig. The man is gezellig. These mean completely different things.

#26  Postby epepke » Apr 15, 2015 3:48 pm

don't get me started wrote: Yep epepke, I hear you.
I especially liked your description of 'anal retentive Honky Chomsky terms'!!


Thanks. I have to credit Chuck Entress, my semi-son-out-law, for the term "Honky Chomsky."

TBH I can't make head nor tail of those tree diagram things that litter generative grammar reference books. They are usually the domain of people who spend all day in their offices thinking up concocted John and Mary sentences and then thinking up reasons why they are 'correct' or 'incorrect', as if there were a sharp, black-and-white division. The same people are (in my experience) often not very good interactants in actual, real-world situations.


It does reflect, I think, a passion of some people, especially geeks like me, to think that the solution to any problem is more and more precise rules. I try to fight that tendency.

The fact is that humans are inclined towards sense-making and will try really hard to understand any utterance, no matter how 'chaotic' or 'deviant' it may be when tested against formal rules.


I'm going to harp on something a bit, and I don't think it's much of a derail. What I find fascinating is that there are gradations of difficulty in understanding an utterance, and that they aren't always related to the ambiguity. That is, one can have easy and difficult unambiguous and ambiguous utterances. I could probably cook up some examples if need be.

Languages do have some heuristic mechanisms for understanding that are specific to the language. Compound words, which are so prominent in German, are one. Others are case, gender, definite and indefinite articles, verb conjugations, and so on. Most of these work in a fairly predictable manner. In German compounds, for example, some word parts are more likely to be used than other German word parts that mean the same thing.

I've been able to simulate this quite effectively in a program, which also is very good at understanding. I'm going to talk a bit about it. It might interest some, but talking about it also helps me understand.

I have several cognitive layers. An utterance usually passes down this chain, though each layer can pass parts back up for testing, which is quite important. An utterance can also go down and pass a question back up, remaining in place until the question is answered or until it is forgotten. So short dialogues for clarification are an inherent part of the system and do not need to be explicitly programmed. Still, understanding goes mainly down, and expression goes mainly up.

The important layers for language processing are as follows. There are more to get things done, but never mind.

processed: Words and punctuation and stuff like that

surface: A structure containing things like noun and prepositional phrases, verbs, adjectives and so on, but not exactly what the grammatical terms mean. A big problem here is coming up with words to describe these things. I like "adjectival," which is something that talks about a thing but is not necessarily an adjective. In any event, this represents how the words can be clumped together and uses only features of the language itself.

deep: Something vaguely like a head with "deep cases" that are like Fillmore's deep cases. But I'm planning to get rid of that and represent things more in terms of their roles in an action. I've already worked this out but I haven't implemented that part. At this level, it starts to have meaning. The objects in a deep structure aren't yet realized; they are abstract vague ideas that might map onto objects, although some may be realized by analogy to embodied metaphors. For example, "pick up the toad" will have an unrealized concept of a toad, but an implied realized concept of up, as in, to/with the hand.

concrete: At this point, all those abstract vague ideas have been realized, that is, they correspond to objects known to the cognitive system. (Unless, of course, they don't.)
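The alternatives-in, alternatives-out chain described above can be sketched as a list of functions. This is a hypothetical illustration, not the program in the post: the `Alternative` class, the toy `processed` and `surface` layers, their made-up likelihood factors, and the pruning `threshold` are all assumptions, and the upward passes (testing and clarification questions) are left out.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Alternative:
    content: object      # whatever a layer produces: tokens, clumps, roles, ...
    likelihood: float    # fuzzy grade between 0 and 1


# A layer turns one alternative into zero or more alternatives for the next layer.
Layer = Callable[[Alternative], List[Alternative]]


def run_down(utterance: str, layers: List[Layer], threshold: float = 0.1) -> List[Alternative]:
    """Pass every surviving alternative down the chain, pruning weak ones at each layer."""
    alternatives = [Alternative(utterance, 1.0)]
    for layer in layers:
        expanded = [new for alt in alternatives for new in layer(alt)]
        alternatives = [a for a in expanded if a.likelihood >= threshold]
    return sorted(alternatives, key=lambda a: a.likelihood, reverse=True)


def processed(alt: Alternative) -> List[Alternative]:
    """Toy 'processed' layer: split the utterance into words."""
    return [Alternative(alt.content.split(), alt.likelihood)]


def surface(alt: Alternative) -> List[Alternative]:
    """Toy 'surface' layer: offer two clumpings with made-up likelihood factors."""
    words = tuple(alt.content)
    return [
        Alternative(("verb-phrase", words), alt.likelihood * 0.9),
        Alternative(("noun-phrase", words), alt.likelihood * 0.05),
    ]


for alt in run_down("pick up the toad", [processed, surface]):
    print(alt.content, alt.likelihood)
# ('verb-phrase', ('pick', 'up', 'the', 'toad')) 0.9
```

Deep and concrete layers would slot into the same list; the only contract is one alternative in, a graded list out.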

It starts with a very rough kind of parsing, which I might eventually replace with a word search, but that's the way I started. It parses an utterance in all possible ways with almost no assumptions about what words mean or what kind of words they are, and the assumptions that do exist are heuristic. Even for fairly short utterances, this can easily involve 20,000 parsings. Fortunately, computers are fast. In this case dynamic programming typically reduces it to 500 attempts, many of which can be trivially rejected. The result is anywhere from one or a few to a few dozen parsings, each graded with a likelihood using fuzzy logic.
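Why dynamic programming helps so much can be seen in the simplest case: unlabeled binary bracketings of a word sequence. The sketch below is an assumption, not the post's parser; it memoizes on word spans in the manner of a CKY chart, so the number of distinct bracketings grows like the Catalan numbers while the number of spans actually evaluated grows only quadratically. The 8-word sentence is made up, and its figures are not the post's 20,000 and 500.

```python
from functools import lru_cache


def count_trees(words):
    """Count all binary bracketings of `words`, memoizing on (start, end) spans.

    Returns (number_of_bracketings, number_of_distinct_spans_evaluated).
    """
    n = len(words)
    spans_evaluated = set()

    @lru_cache(maxsize=None)
    def trees(i, j):
        # Runs once per span thanks to the cache; later calls reuse the result.
        spans_evaluated.add((i, j))
        if j - i == 1:
            return 1
        return sum(trees(i, k) * trees(k, j) for k in range(i + 1, j))

    total = trees(0, n)
    return total, len(spans_evaluated)


words = "pick up the toad from the old well".split()  # 8 words
print(count_trees(words))  # (429, 36)
```

Enumerating the 429 trees one by one is exponential in sentence length; the chart touches only 36 spans and shares each sub-result among every tree that contains it.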

The other layers are similar in that they take all the alternatives given and produce a new set of alternatives. The ultimate goal is to boil it down to one best alternative, though jokes often produce two or three.

Now, I've noticed a few things that happen with utterances that are more or less similar to canonical, "grammatical" utterances. And also in similar ways at other layers of cognition.

1) Canonical utterances produce a small number of high-likelihood alternatives, whereas less canonical ones produce a larger number of low-likelihood alternatives.

2) More of the interpretations of canonical utterances can be rejected outright at any layer, so fewer need to be passed on to deeper layers.

From 1 I can get a "confusion" metric. 2 makes "harder" utterances take longer to process. The ratio of timing is about the same as what happens in my brain when I process utterances, though the computer is faster overall for all kinds of utterances.
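The post doesn't say how the "confusion" metric is computed. One plausible sketch, assuming it should be low when a few high-likelihood readings dominate and high when many weak ones compete, is the Shannon entropy of the normalized likelihoods. The `confusion` function and the example likelihood lists are hypothetical.

```python
import math


def confusion(likelihoods):
    """Entropy, in bits, of the likelihoods after normalizing them to sum to 1.

    0 when a single reading survives; log2(n) when n readings are equally likely.
    """
    total = sum(likelihoods)
    if total <= 0:
        raise ValueError("need at least one positive likelihood")
    probs = [x / total for x in likelihoods if x > 0]
    return sum(-p * math.log2(p) for p in probs)


canonical = [0.9, 0.05]                        # one strong reading, one weak
less_canonical = [0.2, 0.2, 0.15, 0.15, 0.1]   # many weak readings

print(confusion([1.0]))       # 0.0
print(confusion([0.25] * 4))  # 2.0
print(confusion(canonical) < confusion(less_canonical))  # True
```

Entropy only looks at how the likelihood is spread, not how low it is overall; a fuller metric might also factor in the score of the best reading.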

This is very exciting to me, as perhaps it means I'm on the right track.

Consider legal language, which is notoriously impenetrable, such language being based on the fact that the presupposition is one of disaffiliation between parties rather than the basic human stance towards interaction, which is trying one's best to make sense of what the other person actually means, and working together to co-construct meaning in the here and now of the unfolding discourse.


This is interesting, as one of the things I've done throughout my life, leading to this sorry state, was a limited heuristic parser for a subset of legal language: renewal times for contracts. I had to do it because I had a spreadsheet where these were just typed in English, and I had to put them unambiguously into a database. So I wrote a script that at first got about 80% right, and I tweaked it until it got about 99% right. It was a small application, but it's kind of the same approach that I'm using now. I don't have many preconceived notions about what English should look like. I just try it and see. As a result, I've already captured many features of English that work, that native speakers know how to use, but that aren't clearly described in grammar textbooks. I even have a "this sounds right" metric. Again, I find this encouraging.
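The post doesn't give that script's rules, but a heuristic extractor of this kind might start as a couple of regular expressions that map free-text renewal terms to a number of months. Everything here is hypothetical: the phrasings covered, the `renewal_months` name, and the choice of months as the database unit. Text that matches no pattern returns `None`, which is the kind of case that incremental tweaking would then address.

```python
import re

# Hypothetical vocabulary; the original script's rules are not described in the post.
UNIT_MONTHS = {"month": 1, "quarter": 3, "year": 12}
ADVERB_MONTHS = {"monthly": 1, "quarterly": 3, "annually": 12, "yearly": 12}
WORD_NUMBERS = {"one": 1, "two": 2, "three": 3, "six": 6, "twelve": 12}

# "every 12 months", "every two years", ...
EVERY_N_UNITS = re.compile(r"\bevery\s+(?P<n>\d+|\w+)\s+(?P<unit>month|quarter|year)s?\b", re.I)
# "renews annually", "billed monthly", ...
ADVERB = re.compile(r"\b(?P<adverb>monthly|quarterly|annually|yearly)\b", re.I)


def renewal_months(text):
    """Return the renewal period in months, or None if no heuristic matches."""
    m = EVERY_N_UNITS.search(text)
    if m:
        n = m.group("n").lower()
        count = int(n) if n.isdigit() else WORD_NUMBERS.get(n)
        if count is not None:
            return count * UNIT_MONTHS[m.group("unit").lower()]
    m = ADVERB.search(text)
    if m:
        return ADVERB_MONTHS[m.group("adverb").lower()]
    return None


for term in ["Renews every 12 months", "auto-renews every two years",
             "Renewed annually on the start date", "Terminates on 31 March"]:
    print(term, "->", renewal_months(term))
```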

#27  Postby lpetrich » Apr 22, 2015 4:15 am

I recall from somewhere that the long names in organic-chemical nomenclature are a result of late-19th-century Germans doing a lot of work in this field, complete with big compound words. I think it was the Isaac Asimov science essay quoted here:

The singing organicker

"I was standing at the desk of a receptionist waiting ... to give her my name. ... She was a very pretty Irish receptionist. ... So I waited patiently and smiled at her; and then her patent Irish stirred that drumbeat memory in my mind, so that I sang in a soft voice [to the tune of "Irish Washerwoman"] . . . PA-ruh-dy-METH-il-a-MEE-noh-ben-ZAL-duhhide ... through several choruses.

"And the receptionist clapped her hands ... in delight and cried out, 'Oh, my, you know it in the original Gaelic!'"

Asimov, Isaac, "You, Too, Can Speak Gaelic," in "Adding A Dimension," Lancer Books, Inc., New York, 1969.

(source: News Scripts - Chemical & Engineering News Archive (ACS Publications))

Wikipedia mentions it: para-dimethylaminobenzaldehyde, complete with that Isaac Asimov reference. He traced the etymology of each part of its name: para-di-methyl-amino-benz-alde-hyde.

#28  Postby lpetrich » May 21, 2015 10:52 pm

Seems like English is the only present-day Germanic language with attributive phrases not turned into compounds. Here are some attributive phrases that I've recently used, turned into compounds:

dollmakerscreencaptures
portholepuncher

I recall something about the history of Chinese. It is first recorded as having mainly one-syllable words with lots of possible initial and final consonants, as determined by reconstruction. These were eroded down to fewer initial consonants and far fewer final ones, with tones emerging. Even with tones, Chinese has numerous homophones. More recent Chinese speakers have gotten around that problem by coining numerous two-syllable compounds, and I recall someone claiming that Chinese is on its way to becoming a multisyllabic language.