மொழி புரியும் கணினிகள்

Computers That Understand Language

~12 min

Autocomplete finishes your sentences. Translation apps read Tamil text. AI writing tools produce entire paragraphs. But does the computer understand language — or is it doing something much simpler, and sometimes much less reliable?

By the end of this lesson you will be able to:

Explain what Natural Language Processing (NLP) means in simple terms
Give three examples of NLP tools you have used
Explain why language is difficult for computers (ambiguity, context, nuance)
Understand that AI language tools can produce confident-sounding but false text
Apply the CHECK habit to AI language output before trusting it

Let's Learn

What you will learn today

Understand how AI processes and generates human language — from word prediction to translation to chatbots.

🔁

Language Is Harder Than It Looks

Read these sentences:

1. 'I saw the man with the telescope.'
2. 'The bank was steep.'
3. 'Time flies like an arrow; fruit flies like a banana.'
4. 'She can't bear children.' (is she unable to have them, or does she dislike them?)

Each of these is genuinely ambiguous: the meaning depends on context that lies outside the sentence. Humans usually resolve such sentences instantly using their knowledge of the world. Computers find this extraordinarily difficult.
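To make the role of context concrete, here is a minimal Python sketch that guesses a meaning for 'bank' by looking for clue words in the rest of the sentence. The sense names and clue lists are invented for this example; real systems learn such clues from data rather than from a hand-written table.

```python
# Toy word-sense guesser. The senses and clue words are made up for
# illustration; they do not come from any real dictionary or library.
SENSES = {
    "river bank": {"river", "muddy", "water", "fishing"},
    "money bank": {"money", "loan", "account", "cash"},
}

def guess_sense(sentence):
    """Return the sense whose clue words overlap most with the sentence,
    or None when no clue word appears at all."""
    words = set(sentence.lower().replace(".", "").split())
    scores = {sense: len(words & clues) for sense, clues in SENSES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(guess_sense("We sat on the bank of the river."))  # river bank
print(guess_sense("She put money in the bank."))        # money bank
print(guess_sense("I walked to the bank."))             # None
```

The last sentence contains no clue, so the program cannot decide. A human reader would use knowledge from outside the sentence, which is exactly what the program lacks.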

How AI Turns Words Into Numbers

Computers cannot work with words directly — only numbers. So the first step in language AI is converting words to numbers, using a technique called word embeddings.

In a word embedding, each word is represented as a point in a high-dimensional space (imagine 300 dimensions instead of our 3D space). Words with similar meanings are close together in this space:

• 'King' and 'Queen' are close
• 'Dog' and 'Cat' are close
• 'King' – 'Man' + 'Woman' ≈ 'Queen' (mathematically!)

This lets the AI capture meaning relationships between words as distances between points.
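The idea can be tried out with a few hand-made vectors. The sketch below uses only 4 dimensions, and every number in it is an assumption chosen for illustration; real embeddings have hundreds of dimensions and are learned from text, not typed in by hand.

```python
import math

# Hand-made 4-dimensional "embeddings" (assumed values, for illustration).
# The dimensions loosely stand for: royalty, male, female, animal.
EMBEDDINGS = {
    "king":  [0.9, 0.8, 0.1, 0.0],
    "queen": [0.9, 0.1, 0.8, 0.0],
    "man":   [0.1, 0.8, 0.1, 0.0],
    "woman": [0.1, 0.1, 0.8, 0.0],
    "dog":   [0.0, 0.3, 0.3, 0.9],
    "cat":   [0.0, 0.2, 0.4, 0.9],
}

def cosine(a, b):
    """Cosine similarity: near 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(vector, exclude=()):
    """Return the known word whose vector is most similar to `vector`."""
    candidates = [w for w in EMBEDDINGS if w not in exclude]
    return max(candidates, key=lambda w: cosine(vector, EMBEDDINGS[w]))

def analogy(a, b, c):
    """Compute a - b + c and return the nearest word other than a, b, c."""
    target = [x - y + z for x, y, z in
              zip(EMBEDDINGS[a], EMBEDDINGS[b], EMBEDDINGS[c])]
    return nearest(target, exclude={a, b, c})

print(analogy("king", "man", "woman"))  # queen
print(cosine(EMBEDDINGS["dog"], EMBEDDINGS["cat"]) >
      cosine(EMBEDDINGS["dog"], EMBEDDINGS["king"]))  # True
```

Excluding the three input words from the search is a common convention for this kind of analogy test, since the result vector often stays close to one of its inputs.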

📐 The Autocomplete Trick — Predicting the Next Word

The core task behind most language AI is simple to describe: given the words so far, predict the most likely next word.

Example: 'The cat sat on the ___'
Possible next words and their probability:
• mat: 23%
• floor: 18%
• sofa: 15%
• table: 12%
• roof: 4%

By training on billions of web pages and books, AI learns the statistical patterns of how words follow each other. Doing this at massive scale (predicting the next word across billions of sentences) pushes the model to pick up grammar, facts, reasoning patterns, and even some common sense as side effects.
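Next-word prediction can be imitated in a few lines by counting. The sketch below is a simple counting model, not a neural network: it looks only at the last two words and uses a made-up four-sentence training text, so its percentages differ from the ones listed above.

```python
from collections import Counter, defaultdict

# A tiny made-up training text; real models train on billions of words.
CORPUS = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "the dog sat on the mat",
    "the dog slept on the floor",
]

def train(sentences, context_size=2):
    """For every run of `context_size` words, count which word came next."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i in range(context_size, len(words)):
            context = tuple(words[i - context_size:i])
            follows[context][words[i]] += 1
    return follows

def next_word_probabilities(follows, text, context_size=2):
    """Turn the counts for the last words of `text` into probabilities."""
    context = tuple(text.split()[-context_size:])
    counts = follows.get(context, Counter())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.most_common()}

model = train(CORPUS)
for word, p in next_word_probabilities(model, "the cat sat on the").items():
    print(f"{word}: {p:.0%}")
# mat: 50%
# sofa: 25%
# floor: 25%
```

A context the model has never seen returns an empty result. Large language models avoid this dead end by working with embeddings rather than exact word matches.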

Large Language Models (LLMs)

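ChatGPT, Gemini, and similar tools are called Large Language Models (LLMs). 'Large' refers to the enormous number of weights, the adjustable numbers inside the network. GPT-3 has 175 billion of them. OpenAI has not published the figure for GPT-4, though unofficial reports put it at roughly 1.76 trillion.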

How LLMs are built:
1. Pre-training: the model reads hundreds of billions of words from the internet, books, and code, and learns to predict the next word
2. Fine-tuning: the model is trained further on examples of helpful, safe conversations
3. RLHF (Reinforcement Learning from Human Feedback): human trainers rate responses, and the ratings are used to reinforce helpful and harmless outputs

The result: an AI that can write, summarise, translate, code, explain, and answer questions — all by very sophisticated next-word prediction.

Pre-training on vast text
Fine-tuning for helpfulness
RLHF — human feedback shapes behaviour

💡

LLMs Do Not Know Truth — They Know Patterns

On its own, an LLM does not look things up: it has no connection to a live database of facts. It generates text that is statistically likely to sound correct, based on patterns in its training data, whether or not that text is true.

This is why LLMs 'hallucinate', confidently stating false facts. If you ask an LLM about a book that does not exist, it may generate an author, publication date, and summary that sound completely plausible but are entirely made up.

Always verify factual claims from LLMs using reliable sources.
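A toy model shows how following patterns can produce fluent falsehoods. The sketch below is a deliberately exaggerated caricature: it remembers only which word followed which in three true sentences (made-up training data), whereas real LLMs use far more context. Even so, every sentence it learned from was true, and it can still generate false ones.

```python
from collections import defaultdict

# Three true sentences: the entire "training data" of this toy model.
FACTS = [
    "paris is the capital of france",
    "rome is the capital of italy",
    "tokyo is the capital of japan",
]

def train(sentences):
    """Record which words were seen directly after each word."""
    follows = defaultdict(set)
    for sentence in sentences:
        words = sentence.split()
        for current, nxt in zip(words, words[1:]):
            follows[current].add(nxt)
    return follows

def all_continuations(follows, start):
    """List every sentence the model can build from `start` by always
    picking a word it has seen after the previous word."""
    results = []

    def extend(words):
        options = follows.get(words[-1])
        if not options:  # nothing ever followed this word: sentence ends
            results.append(" ".join(words))
            return
        for nxt in sorted(options):
            extend(words + [nxt])

    extend([start])
    return results

model = train(FACTS)
for sentence in all_continuations(model, "paris"):
    label = "true" if sentence in FACTS else "FALSE but fluent"
    print(f"{sentence}  <- {label}")
```

The model has no way to tell its three outputs apart, because nothing in it records which statements are true. The check at the end is possible only because this program has the list of facts to compare against.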

Machine Translation

Google Translate and similar tools use a type of neural network called a Transformer to translate between languages.

The model is trained on millions of parallel texts, meaning the same document written in two languages (for example UN documents, which are published in the UN's six official languages). It learns which phrases in one language correspond to which phrases in another.
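The idea of learning correspondences from parallel text can be sketched with simple counting. This is not how a Transformer works: the sketch swaps in a much older technique, scoring word pairs by how often they appear in matching sentences (the Dice coefficient). The four sentence pairs are made up for illustration.

```python
from collections import Counter
from itertools import product

# Tiny made-up English-Spanish parallel corpus.
PARALLEL = [
    ("the cat", "el gato"),
    ("the dog", "el perro"),
    ("a cat", "un gato"),
    ("a dog", "un perro"),
]

def learn_dictionary(pairs):
    """Pair each source word with the target word it co-occurs with most,
    measured by the Dice coefficient: 2 * together / (count_s + count_t)."""
    src_count, tgt_count, together = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        together.update(product(src_words, tgt_words))

    dictionary = {}
    for s in src_count:
        def dice(t):
            return 2 * together[(s, t)] / (src_count[s] + tgt_count[t])
        dictionary[s] = max(tgt_count, key=dice)
    return dictionary

def translate(sentence, dictionary):
    """Word-by-word lookup; unknown words are left unchanged."""
    return " ".join(dictionary.get(w, w) for w in sentence.split())

learned = learn_dictionary(PARALLEL)
print(learned["cat"])                 # gato
print(translate("a dog", learned))    # un perro
```

Word-by-word lookup ignores word order and idioms, which is one reason modern systems translate whole sentences instead.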

Modern translation AI is remarkably good for common language pairs (English–French, English–Spanish) but still struggles with:
• Languages with less training data (many regional languages)
• Idioms and cultural references
• Very formal or very colloquial speech
• Languages with very different grammatical structures

🔍

Misconception: 'AI Understands What It Says'

When a chatbot gives a brilliant explanation of quantum physics, it is not understanding quantum physics. It has seen millions of texts where 'quantum physics' appears alongside certain other words, concepts, and explanations — and it generates statistically appropriate text.

Evidence: ask the same LLM a subtle logical trick question. It will often confidently give the wrong answer, because the pattern it is following leads to a plausible-sounding but incorrect response.

Understanding involves knowing when you are wrong. LLMs currently have limited ability to do this — they generate plausible text, whether or not that text is true.

⚡

Challenge Round

Testing an LLM's Limitations

Try these tasks with any AI chatbot available to you (ChatGPT, Gemini, etc.):

1. Ask it to count how many times the letter 'r' appears in 'strawberry'
2. Ask it to explain a Tamil proverb that you have made up yourself
3. Ask it what it had for breakfast

Notice: what does it get wrong? What does it say about things it cannot know? These failures reveal how it actually works.
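For comparison, an ordinary program counts letters exactly. The sketch below also hints at why a chatbot may slip on task 1: language models typically receive text as numbered sub-word pieces rather than as individual letters. The particular split and ID numbers shown are hypothetical; real tokenizers differ from model to model.

```python
def count_letter(word, letter):
    """Exact letter count using ordinary string handling."""
    return word.count(letter)

# ASSUMPTION: a hypothetical sub-word split with made-up ID numbers.
# Real tokenizers may split this word differently.
HYPOTHETICAL_PIECES = {"str": 101, "aw": 102, "berry": 103}

def as_model_input(pieces):
    """What a language model would receive: ID numbers, not letters."""
    return list(pieces.values())

print(count_letter("strawberry", "r"))      # 3
print(as_model_input(HYPOTHETICAL_PIECES))  # [101, 102, 103]
```

Nothing in the list of IDs says how many times 'r' occurs, so a model has to rely on patterns it has seen rather than on counting.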

Language AI — The Big Picture

Language AI works by turning words into numbers, then predicting the most likely next word at enormous scale. LLMs like ChatGPT are trained on hundreds of billions of words and generate text that is statistically plausible — not necessarily true. They are powerful tools but not understanding minds.

🌟

You understand how AI reads and generates language — and why 'sounds right' is not the same as 'is right'.

↪ Next lesson: how AI learns from YOU personally — recommendation systems and personalisation.

★

Key Points

✓ Word embeddings place words in mathematical space, so similar words are close together
✓ Language AI predicts the most likely next word based on patterns in training data
✓ AI can 'hallucinate', generating confident text that is completely false
✓ Translation AI uses patterns from millions of parallel texts, not grammar rules
✓ Sounding right and being right are very different things in language AI

Glossary

சொல் அகராதி

NLP (Natural Language Processing)

இயற்கை மொழி செயலாக்கம்

Ambiguity

இரு பொருள்

Context

சூழல்

Translation

மொழிபெயர்ப்பு

Hallucination (AI)

AI கட்டுக்கதை — AI generates confident but false information

Practice Activities

Quiz

Answer each question to check your understanding.

Question 1 of 3

What is 'hallucination' in AI language models?

Fill in the Blanks

Type the missing word and press Check or Enter.

An AI that confidently states a false fact that sounds plausible is said to be ___.

Word ___ are a technique that represents each word as a point in a high-dimensional space, so similar words are close together.

Large Language Models (LLMs) are trained to predict the most likely next ___ given what came before.
