FIG. 018 — INTRODUCTION

Introduction to Constructed Languages

A guided overview with examples of some of the basics of conlanging.

This is a gentle and more narrative introduction to constructed languages (conlangs) and linguistics in general. This is for the absolute beginner. If you’re already familiar with conlangs or linguistics in general, you can safely explore onward.

Do you LOVE J.R.R. Tolkien?

J.R.R. Tolkien, aside from being an absolutely BRILLIANT author, was also a massive language nerd (philologist). He studied the origins of the English language in ways few had up to that point in time. This love of language and its history was one of the driving factors behind, and defining aspects of, The Lord of the Rings and all things Middle-earth in general. This site is dedicated to that work, making it fun and accessible as well as providing tools for authors, gamers, and other language nerds.

Quenya & Klingon & Na’vi, Oh My!

If you’re at this site, there is a good chance you’re already initiated into the (frankly pretty awesome) world of conlangs. If not, there are a lot of languages to choose from, and most have pretty devoted communities. While conlangs are popular in fantasy and science fiction, some also exist for practical purposes. Esperanto was created in 1887 as an international auxiliary language, basically so that everyone in Europe could learn a common language that had some features and vocabulary they were used to. While far from perfect, it was certainly a nice thought, and it is probably the most spoken conlang in the world. There is also Toki Pona, a minimalist language with only around 130 words. It is made for simplicity and to reflect how early language might have worked.

Beyond these examples, there are numerous authors, filmmakers, and game designers who have created conlangs. Some are better, some are worse, but all are fun or interesting in their own right. Language in general and linguistics specifically are broad, deep, and rich topics of discussion. If you’re world-building or just generally interested, there are a few levels of conlanging you might explore.

A Jabberwockyesque Getting Started Section

If you’re reading this, it means you probably read and speak English. While English rules of spelling and pronunciation are inconsistent, you can easily generate some constructed words in the style of Lewis Carroll in his famous poem Jabberwocky. Here is one to start: “aklorng”. Here is another: “shreptim”. From here, just make some simple definitions. What is “aklorng”? Is it a noun? A verb? An interjection? Let’s assume it is a thing, say a type of lodging used by the “Shreptim” people of the “Misok” mountains. In English, a Shreptim village is probably full of aklorngs. But then again, this isn’t necessarily English. Maybe the Shreptim people form plurals with an s as a prefix. Thus, one aklorng, many saklorng. In many languages, a group’s name is just that language’s word for people, so one shreptim but many… sshreptim? That is a little awkward, but maybe that is how your language works (Polish does much worse ;)). Or maybe in the Shreptim language, if a word already starts with an s or s-like sound, the plural is formed by prefixing su instead. Thus one shreptim, many sushreptim.

If you’ve never done an exercise like this, I highly encourage it. It is both fun and informative. Make some rules for nouns, verbs, and adjectives. Toss in a few pronouns, and decide if your language needs articles (“the”, “a”, “an”). Even if you haven’t formally studied English grammar, this will help you get started and will probably take you a long way while also giving you a solid base for world building.
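If you like to tinker, the plural rule described above can be written down as a tiny function. This is only a sketch: the function name pluralize and the list of spellings that count as "s-like" are assumptions made up for illustration, and you would swap in whatever your own language needs.

```python
# Spellings treated as "s-like" for this made-up rule (an assumption for illustration).
S_LIKE_STARTS = ("sh", "s", "z")


def pluralize(noun: str) -> str:
    """Form a Shreptim-style plural: prefix s-, or su- before an s-like sound."""
    if noun.lower().startswith(S_LIKE_STARTS):
        return "su" + noun
    return "s" + noun


for word in ["aklorng", "shreptim", "misok"]:
    print(word, "->", pluralize(word))
```

Writing the rule out this way forces you to decide the edge cases (does "z" count as s-like?) that are easy to skip over when you only work in your head.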

If you want a good, practical exercise, make up words for the Swadesh 100. These are 100 words representing (tentatively) universal concepts.

I (first person singular pronoun)
you (second person singular pronoun; thou & ye)
we (inclusive)
this
that
who?
what?
not
all (of a number)
many
one
two
big
long (not wide)
small
woman
man (adult male human)
person (individual human)
fish (noun)
bird
dog
louse
tree (not log)
seed (noun)
leaf (botanics)
root (botanics)
bark (of tree)
skin (person’s)
flesh (meat, flesh)
blood
bone
grease (fat, organic substance)
egg
horn (of bull etc.)
tail
feather (large, not down)
hair (on head of humans)
head (anatomic)
ear
eye
nose
mouth
tooth (front, rather than molar)
tongue (anatomical)
claw
foot (not leg)
knee
hand
belly (lower part of body, abdomen)
neck (not nape)
breast
heart
liver
drink (verb)
eat (verb)
bite (verb)
see (verb)
hear (verb)
know (facts)
sleep (verb)
die (verb)
kill (verb)
swim (verb)
fly (verb)
walk (verb)
come (verb)
lie (on side, recline)
sit (verb)
stand (verb)
give (verb)
say (verb)
sun
moon
star
water (noun)
rain (noun/verb)
stone
sand
earth (soil)
cloud (not fog)
smoke (noun, of fire)
fire
ash(es)
burn (verb intransitive)
path (road, trail; not street)
mountain (not hill)
red (color)
green (color)
yellow (color)
white (color)
black (color)
night
hot (adjective; warm, of weather)
cold (of weather)
full
new
good
round
dry (substance)
name

Some things to note: You’ll see that most of these words are just one syllable, very few are two syllables, and none are three or more. These likely represent very early concepts put into language and thus are simple utterances. Your language might seem a little odd if the word for “I” is something like “motraltu”. Additionally, because of the oddities of English, it might help to clarify pronunciation using relatively familiar conventions. For example, in English ton and gun rhyme but tone and gone do not. If you made the word “lonet”, did you want it pronounced “lawn-et” or “loan-et”?

An Initially Linguistic Approach

If you’re feeling a little more technically minded, you might think (or know) that there is a way to transcribe a word that is (fairly) consistent across all languages. This site uses the International Phonetic Alphabet (IPA). The IPA defines, with reasonable precision, how a sound is made. So, in the above example, we can say that “lonet” should be pronounced /ˌlɔnˈət/ as opposed to /ˈlɔnˌət/, /ˌloʊnˈət/, or /ˈloʊnˌət/, or even /ˈlɔnˌɛt/, /ˌlɔnˈɛt/, /ˈloʊnˌɛt/, or /ˌloʊnˈɛt/. Each of the items listed between the slashes is a unique pronunciation, and the beauty of the IPA is that anyone who understands it can pronounce these words exactly (or at least understand the pronunciation), even if they don’t know English. The study of how a language organizes its sounds is known as phonology.

The obvious first step is to familiarize yourself with the IPA. In the IPA, each letter written between the slashes (/) (except the ˌ and ˈ, which we’ll discuss later) represents a phoneme. A phoneme is the smallest unit of sound in a language that distinguishes one word from another. In the above example, /l/ is a phoneme, /ɔ/ is a phoneme, etc. In general, there are two main types of phonemes: consonants and vowels. Consonants are defined by three factors: where the sound is made in the mouth, how the sound is made, and whether it involves the vocal cords. These are called place, manner, and voicing, respectively. Let’s look at the example of /b/ and /p/. Again, if you speak English, these should be familiar. The /b/ is a bilabial plosive which is voiced. That is, the sound is generated at the lips, it is made by allowing a puff of air to “explode” from behind the lips, and the vocal cords are engaged. In contrast, the /p/ is a bilabial plosive which is unvoiced. It is made in exactly the same way as the /b/, but the vocal cords are not engaged. If you want, you can test this out. When you whisper the words “pub pup bub” to a person (and actually whisper, don’t just lower your voice), they will be unable to differentiate the words. Many phonemes, though not all, come in voiced and unvoiced pairs. Next, you can compare /b/ to /d/ and /g/. These are all voiced plosives, but /d/ is made with the tongue on the roof of your mouth, on the hard ridge just behind your upper teeth (the alveolar ridge). The /g/ is made with the back of your tongue against the soft part of your palate (the velum). If you know /d/ and /g/, you can probably guess their unvoiced counterparts. Besides plosives, there are also nasals (air directed through the nose), fricatives (air heavily restricted but not stopped), approximants (air flow restricted but not nearly as much as in fricatives), taps, trills, etc.
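One way to make place, manner, and voicing concrete is to store them as data. The sketch below is not a full IPA chart: it covers only a handful of consonants, and the names Consonant, CONSONANTS, and voicing_partner are made up for this example.

```python
from typing import NamedTuple, Optional


class Consonant(NamedTuple):
    place: str    # where in the mouth the sound is made
    manner: str   # how the airflow is shaped
    voiced: bool  # whether the vocal cords are engaged


# A deliberately tiny sample of consonants, not a complete inventory.
CONSONANTS = {
    "p": Consonant("bilabial", "plosive", False),
    "b": Consonant("bilabial", "plosive", True),
    "t": Consonant("alveolar", "plosive", False),
    "d": Consonant("alveolar", "plosive", True),
    "k": Consonant("velar", "plosive", False),
    "g": Consonant("velar", "plosive", True),
    "m": Consonant("bilabial", "nasal", True),
}


def voicing_partner(symbol: str) -> Optional[str]:
    """Return the consonant with the same place and manner but opposite voicing."""
    target = CONSONANTS[symbol]
    for other, features in CONSONANTS.items():
        same_spot = (features.place, features.manner) == (target.place, target.manner)
        if same_spot and features.voiced != target.voiced:
            return other
    return None  # no partner in this small table (for example /m/)


print(voicing_partner("d"))  # t
print(voicing_partner("g"))  # k
print(voicing_partner("m"))  # None
```

A table like this is also a handy starting point for a conlang: deleting or adding rows is a quick way to see what your consonant system looks like as a whole.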

Vowels are a little different. A vowel has a height, “backness”, and roundness. Vowel sounds are made by moving the tongue without really restricting airflow. How high or low the tongue is in the mouth (in relation to the most constricted air flow area) is height. How far forward or back the tongue is in the mouth is backness, and the rounding of your lips is roundness. If you make the “uh” sound in English, that is called a schwa, it is represented by the phoneme /ə/ and is right in the middle for height and backness. If you make the “uh” sound and slowly transition to an “ee” sound, you’ll be moving your tongue up and forward (in linguistics, close and front respectively). If you start from the “uh” and move to an “aw” (like when your doctor wants to look at the back of your throat), you’ll be moving your tongue down and back (or open and back, linguistically). Now you can go from “aw” to “ew” (as in “ew, that’s disgusting”) and you’ll be staying back but going from open to close.

Now that you can define a word very accurately, there are a couple of things to consider for a constructed language. The first is your language’s phonemic inventory. This is the full set of phonemes found in your language. In the actual field of linguistics, pinning this down can be tricky. English itself has multiple dialects, and the count depends on both the dialect and the analysis; Standard American English has something like 19-21 vowel phonemes and about 24 consonant phonemes. This idea gets more complicated when you start considering things like allophones (which we’ll discuss later), but for now, you can copy the English phonemic inventory, use one of the other provided inventories, or decide what you like. This inventory will form the basis of your conlang. However, if you want to really make your conlang work, you need more than phonemes; you also need some phonotactics.

Phonotactics are the general rules about how sounds combine. For many languages, this involves syllable definition and construction. In general, a syllable has two parts, an onset and a rhyme. The onset is one or more phonemes that start the syllable and is generally considered optional. The rhyme is generally considered required and has two parts of its own, a nucleus and a coda. The nucleus is a vowel or combination of vowels (known as diphthongs, triphthongs, etc.) and is required. The coda is one or more phonemes that close the syllable and is optional. So, /kat/ represents a syllable. The /k/ is the onset, and the /at/ is the rhyme. In the rhyme, the /a/ is the nucleus and the /t/ is the coda. The word /skoʊnz/ would have /sk/ as an onset, /oʊ/ for the nucleus, and /nz/ for a coda. As stated previously, the onset and coda are both optional (usually). So /soʊ/ is a valid syllable, /aks/ is a valid syllable, as is /ɑɪ/. When constructing a language, what you’ll want to do is define a maximal syllable. In English, the maximal syllable is CCCVCCCC, and an example of such a syllable is /stɹɛŋkθs/ “strengths”. Your language can be more complex, like Polish: CCCCVCCCC, or simpler, like Hawaiian: CVV. These rules will help determine how multisyllable words are constructed. There are also rules about phonemes: which ones are allowed, in which positions, which ones can cluster, etc. For example, the English “ng” sound /ŋ/ is not permitted in the onset. There are thus no English words that start with the ng sound. Additionally, English doesn’t allow consonant clusters with the same manner and place of articulation but different voicing. So you won’t find /sz/ or /pb/ together in an onset or coda.
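The onset, nucleus, and coda split described above can be sketched in a few lines. This is a rough illustration under some loud assumptions: every character counts as one phoneme (so multi-character sounds would be miscounted), the VOWELS string is a small hand-picked set, and the names split_syllable and fits_template are invented for this example.

```python
import re

# A small, hand-picked set of vowel symbols (an assumption, not the full IPA).
VOWELS = "aeiouɑɛɪɔʊə"

# Optional consonants, then one or more vowels, then optional consonants.
SYLLABLE = re.compile(rf"^([^{VOWELS}]*)([{VOWELS}]+)([^{VOWELS}]*)$")


def split_syllable(syllable: str) -> tuple:
    """Split a single syllable into (onset, nucleus, coda)."""
    match = SYLLABLE.match(syllable)
    if match is None:
        raise ValueError(f"not a single syllable: {syllable!r}")
    return match.groups()


def fits_template(syllable: str, max_onset: int, max_coda: int) -> bool:
    """Check a syllable against a maximal syllable such as CCCVCCCC (3 and 4)."""
    onset, _nucleus, coda = split_syllable(syllable)
    return len(onset) <= max_onset and len(coda) <= max_coda


print(split_syllable("kat"))                # ('k', 'a', 't')
print(split_syllable("skoʊnz"))             # ('sk', 'oʊ', 'nz')
print(fits_template("stɹɛŋkθs", 3, 4))      # True
print(fits_template("stɹɛŋkθs", 1, 0))      # False
```

A checker like this only covers the size of the onset and coda. Rules about which phonemes may appear where (such as no /ŋ/ in the onset) would need their own checks.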

Morphing into Word Construction

Once you have phonemes and phonotactics, you have the raw material to build words, but you don’t yet have a system for how words themselves are put together or how they change form. That is the job of morphology.

A morpheme is the smallest unit of meaning in a language. This is different from a phoneme, which is the smallest unit of sound. A word can be one morpheme or many. The English word “dog” is a single morpheme. The word “dogs” is two morphemes: “dog” (the animal) and “-s” (more than one). The word “unbreakable” is three morphemes: “un-” (not), “break” (to break), and “-able” (capable of). Each piece carries a chunk of meaning, and you can mix and match them to build new words.

Morphemes come in two main flavors. A free morpheme can stand alone as a word: “dog”, “break”, “run”, “good”. A bound morpheme cannot stand alone and must attach to something else: “-s”, “un-”, “-able”, “-ed”. The pieces that attach to other morphemes are called affixes. If they attach to the front, they are prefixes (like “un-”). If they attach to the back, they are suffixes (like “-s”). Some languages also have infixes (which sit inside another morpheme) and circumfixes (which wrap around another morpheme), though English doesn’t really use these in any productive way.
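To see free and bound morphemes as separate pieces, here is a toy segmenter. It is a sketch only: the ROOTS, PREFIXES, and SUFFIXES lists are tiny samples chosen for the examples in this section, the function name segment is made up, and real morphological analysis is far messier than matching a prefix and a suffix around a known root.

```python
# Tiny sample lists for illustration only.
ROOTS = {"dog", "break", "walk"}      # free morphemes
PREFIXES = ["un"]                     # bound morphemes attached at the front
SUFFIXES = ["s", "ed", "able"]        # bound morphemes attached at the back


def segment(word: str):
    """Split a word into [prefix-, root, -suffix] if the root is known, else None."""
    for prefix in [""] + PREFIXES:
        for suffix in [""] + SUFFIXES:
            if not (word.startswith(prefix) and word.endswith(suffix)):
                continue
            root = word[len(prefix):len(word) - len(suffix)]
            if root in ROOTS:
                pieces = [prefix + "-" if prefix else "", root, "-" + suffix if suffix else ""]
                return [piece for piece in pieces if piece]
    return None


print(segment("dog"))          # ['dog']
print(segment("dogs"))         # ['dog', '-s']
print(segment("unbreakable"))  # ['un-', 'break', '-able']
```

Requiring the leftover piece to be a known root keeps the toy from splitting words like "under" into "un-" plus nonsense.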

There is also a distinction between two kinds of work that affixes do. Derivational morphology builds new words from existing ones, often changing the word’s category. Adding “-er” to the verb “teach” gives you the noun “teacher”. Adding “-ness” to the adjective “happy” gives you the noun “happiness”. These create new lexical items that you’d list separately in a dictionary. Inflectional morphology, on the other hand, doesn’t create new words but rather adjusts an existing word to fit the grammar of a sentence. The “-s” in “dogs” doesn’t make a new word, it just marks the plural. The “-ed” in “walked” doesn’t make a new word, it just marks the past tense. You won’t find “dogs” and “walked” as separate dictionary entries; they’re just grammatical forms of “dog” and “walk”.

Languages vary wildly in how much morphology they use. Some languages, like Mandarin Chinese or Vietnamese, are highly analytic or isolating. In these languages, each word tends to be one morpheme, and grammatical relationships are shown by separate words and by word order. Other languages, like Turkish, Finnish, or Swahili, are agglutinative. They stack morpheme after morpheme onto a single word, with each affix doing one clear job. A single Turkish word can carry meaning that requires a full English sentence to translate. Still other languages, like Latin, Russian, or Spanish, are fusional. They use affixes too, but each affix can carry several grammatical meanings at once. The Spanish “-o” in “hablo” simultaneously marks first person, singular, present tense, and indicative mood, all in one morpheme. At the extreme end, some languages, like many Indigenous American languages, are polysynthetic. They can pack what would be an entire English sentence into a single word.

When you’re building your conlang, one of the early decisions is roughly where on this spectrum you want to land. This will shape almost everything about how your words look and how your sentences come together. If you want short, simple words and lots of helper words like English’s “the” and “of” and “will”, lean analytic. If you want big chunky words with lots of pieces glued together, lean agglutinative or polysynthetic. Many real languages are a mix, of course, but having a sense of the dominant tendency is a good starting point.

There are also categories of meaning that languages commonly mark on words, especially nouns and verbs, and you’ll want to decide which of these your language cares about. For nouns, common categories include number (singular, plural, sometimes dual or paucal), case (whether the noun is a subject, object, possessor, location, etc.), gender or noun class (masculine, feminine, neuter, or sometimes much more elaborate systems with dozens of classes), and definiteness (the difference between “a dog” and “the dog”). For verbs, common categories include tense (when something happens), aspect (how the action unfolds, whether it is completed, ongoing, habitual), mood (whether it’s a statement, command, hypothetical, etc.), voice (active versus passive, and others), and agreement with the subject and sometimes the object. English marks relatively little of this on the word itself. A language like Latin or Swahili marks a great deal.

A small example: let’s go back to the Shreptim word for house or lodging, “aklorng”. We decided that houses or lodgings should be “saklorng”. You might now decide that the locative case (meaning “in” or “at” something) is formed by suffixing “-i”, giving you “aklorngi” for “in the house” and “saklorngi” for “in the houses”. Or perhaps you like sticking with the prefixes and choose a locative prefix of “ki-”. You might then decide both can stack: “kisaklorng” meaning “in the houses”. Or maybe the plural prefix always comes first, making “skiaklorng”. Now you’ve made a small morphological system, and you can apply the same rules to every noun in your language. That consistency is what gives a language its character.
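The competing designs in this example are easy to compare side by side in code. The sketch below assumes the plain s- plural only (it ignores the su- variant from earlier), and the function names and the plural_first flag are invented for illustration.

```python
def locative_by_suffix(root: str, plural: bool = False) -> str:
    """Design 1: plural prefix s-, locative suffix -i."""
    stem = "s" + root if plural else root
    return stem + "i"


def locative_by_prefix(root: str, plural: bool = False, plural_first: bool = False) -> str:
    """Design 2: locative prefix ki-, stacked with the plural prefix s-."""
    prefixes = ["ki"]
    if plural:
        if plural_first:
            prefixes.insert(0, "s")   # plural always outermost: s + ki + root
        else:
            prefixes.append("s")      # locative outermost: ki + s + root
    return "".join(prefixes) + root


print(locative_by_suffix("aklorng"))                                   # aklorngi
print(locative_by_suffix("aklorng", plural=True))                      # saklorngi
print(locative_by_prefix("aklorng", plural=True))                      # kisaklorng
print(locative_by_prefix("aklorng", plural=True, plural_first=True))   # skiaklorng
```

Whichever design you pick, running every noun through the same function is a quick check that you are applying the rule consistently.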

Paying the Grammar “Tax”

Morphology tells you how individual words are built. Syntax tells you how words are arranged to form phrases and sentences. Even if your morphology is rich, you still need rules about word order and sentence structure, because the same set of words can mean very different things depending on how they’re arranged. “The dog bit the man” means something very different from “The man bit the dog”, even though the words are identical.

The most basic syntactic decision is your language’s default word order. Linguists usually describe this in terms of three core elements: the subject (S), the verb (V), and the object (O). English is an SVO language. We say “The dog (S) chased (V) the cat (O)”. Japanese is SOV: the verb comes at the end. Welsh and Classical Arabic are VSO: the verb comes first. There are six logically possible orders, and all six occur in real languages, though SOV and SVO are by far the most common. When you build your conlang, you’ll pick one of these as the basic order. Some languages are very strict about word order, like English. Other languages are very flexible because their morphology marks who is doing what to whom regardless of position, like Latin or Russian. If your morphology is rich enough to track grammatical roles, you can afford to have looser word order. If your morphology is sparse, like in English, word order has to do most of the heavy lifting.
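To get a feel for how the six orders read, you can shuffle the same three pieces around. This is a minimal sketch with a made-up function name, arrange, and it deliberately ignores everything else about the sentence (case marking, articles, agreement).

```python
# The six logically possible orders of subject (S), verb (V), and object (O).
ORDERS = ("SOV", "SVO", "VSO", "VOS", "OVS", "OSV")


def arrange(subject: str, verb: str, obj: str, order: str = "SVO") -> str:
    """Lay out subject, verb, and object in the requested basic word order."""
    if order not in ORDERS:
        raise ValueError(f"unknown word order: {order!r}")
    parts = {"S": subject, "V": verb, "O": obj}
    return " ".join(parts[role] for role in order)


for order in ORDERS:
    print(order, "->", arrange("the dog", "chased", "the cat", order))
```

Reading the output aloud is a surprisingly good way to decide which order feels right for your language.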

Beyond the basic word-order question, there are many smaller word-order decisions that tend to cluster together. Languages that put the verb at the end often put adjectives before nouns, possessors before the possessed, and use postpositions instead of prepositions. Languages that put the verb in the middle, like English, tend to use prepositions, while the placement of adjectives and possessors varies more from language to language. These patterns aren’t strict laws, but they describe strong tendencies. If you want your conlang to feel natural, picking a profile and sticking to it (mostly) is a good idea. If you want it to feel exotic or deliberately strange, breaking these patterns can do the job.

Sentences also have internal structure beyond just the order of words. A noun phrase is a group of words that act together as a single noun-like unit. “The big red dog” is a noun phrase, with “dog” as its core and “the”, “big”, and “red” all modifying it. A verb phrase is the verb plus the things that go with it. “Quickly ate the cookies” is a verb phrase. Sentences are usually built out of a noun phrase (the subject) and a verb phrase (which contains the verb and often other things, including other noun phrases). When you’re describing your conlang’s syntax, you’ll want to specify how each of these phrase types is built. Where does the adjective go relative to the noun? Where does the article go? Can you stack multiple adjectives, and if so, in what order? Where do numbers and demonstratives (“this”, “that”) fit in?

Then there is the matter of more complex sentence structures. How does your language ask questions? English uses two main strategies: it inverts the subject and verb (“Are you coming?”) and it uses special question words at the front (“Where are you going?”). Japanese, by contrast, just adds a particle “ka” to the end of a statement. Mandarin can simply add the particle “ma” or use intonation alone. How does your language make a sentence negative? English uses “not” or contractions like “don’t”. Many languages use a prefix or suffix on the verb. Some languages have two negative markers that wrap around the verb, like French “ne… pas”. How does your language combine multiple clauses into a single sentence? “I went to the store and I bought bread” versus “I went to the store because I needed bread” versus “The man who lives next door is a baker”. These all involve different ways of linking smaller sentences together, and every language has its own set of strategies.

You’ll also want to think about how your language handles the relationships between participants in a sentence. In English, we mostly figure this out from word order: whoever comes before the verb is the subject, and whoever comes after is the object. In a language with case marking, the noun itself wears a marker that announces its role. The Latin word “puella” means “girl” as a subject, but “puellam” means “girl” as an object, and “puellae” means “of the girl” or “to the girl” depending on context. If your language has case, you’ll need to decide which cases exist, what each one marks, and how they’re formed. If your language has no case, word order or adpositions will have to do this work instead.
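The Latin example can be written as a small lookup table. The sketch below covers only the singular forms mentioned above, and the names SINGULAR_ENDINGS, decline, and possible_cases are made up for illustration. It also shows why “puellae” needs context: two cases share the same ending.

```python
# Singular endings for a Latin noun like "puella" (only the cases discussed above).
SINGULAR_ENDINGS = {
    "nominative": "a",   # subject
    "accusative": "am",  # object
    "genitive": "ae",    # "of the girl"
    "dative": "ae",      # "to the girl"
}


def decline(stem: str, case: str) -> str:
    """Attach the ending for the given case to the stem."""
    return stem + SINGULAR_ENDINGS[case]


def possible_cases(form: str, stem: str) -> list:
    """List every case whose ending would produce this form."""
    return [case for case, ending in SINGULAR_ENDINGS.items() if stem + ending == form]


print(decline("puell", "accusative"))      # puellam
print(possible_cases("puellae", "puell"))  # ['genitive', 'dative']
```

For your own conlang, a table like this is where you decide which cases exist and whether any of them are allowed to look alike.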

Finally, there is the question of how subjects and objects are tracked across more than one sentence. Most languages have some way of avoiding repetition. In English, we use pronouns: “John walked to the store. He bought bread.” In some languages, you simply leave the subject out entirely once it has been established, and the verb’s agreement marking or the surrounding context is enough to tell you who’s doing what. Spanish, Italian, and Japanese all do this regularly (Japanese relies on context, since its verbs don’t agree with the subject). This is called pro-drop, and it’s another design choice for your conlang.

As with morphology, the goal at this stage is to pick a coherent set of rules and apply them consistently. You don’t need to decide every detail at once. Start with a basic word order, decide how noun phrases and verb phrases are built, decide how to form questions and negatives, and decide how to handle multiple clauses. Once you have those bones in place, you can flesh out the more complex structures as you encounter them in actual sentences you want to translate.

Where to Go From Here

Phonology, phonotactics, morphology, and syntax give you the four foundational layers of your conlang. With these in hand, you can generate words, inflect them, arrange them into phrases, and combine those phrases into sentences. You will have a functioning language.

There are still deeper layers to explore if you want to go further. Semantics looks at how meaning works, including the more abstract questions of how concepts get carved up differently in different languages. Pragmatics covers how language is used in context, including politeness, register, and implication. Sociolinguistics looks at how language varies across social groups and changes over time. Historical linguistics covers how languages evolve from earlier forms, and is especially useful if you want to build a family of related conlangs. Orthography covers how your language gets written down, with its own set of fascinating decisions about how spelling relates to pronunciation. Each of these is a rich topic in its own right, and any one of them can give your conlang a depth and texture that elevates it from a curiosity into something that feels truly lived-in.