மொழி புரியும் கணினிகள்
Computers That Understand Language
Siri hears your words. Google Translate reads Tamil text. Autocomplete finishes your sentences. Language is one of the hardest things for computers — and the most useful when it works.
Let's Learn
What you will learn today
Understand how AI processes and generates human language — from word prediction to translation to chatbots.
Language Is Harder Than It Looks
Read these sentences:
1. 'I saw the man with the telescope.'
2. 'The bank was steep.'
3. 'Time flies like an arrow; fruit flies like a banana.'
4. 'She can't bear children.' (is she unable to have them, or does she dislike them?)
Each of these is genuinely ambiguous: the meaning depends on context that is outside the sentence. Humans resolve these instantly using our knowledge of the world. Computers find this extraordinarily difficult.
How AI Turns Words Into Numbers
Computers cannot work with words directly, only with numbers. So the first step in language AI is converting words to numbers, using a technique called word embeddings. In a word embedding, each word is represented as a point in a high-dimensional space (imagine 300 dimensions instead of our 3D space). Words with similar meanings are close together in this space:
- 'King' and 'Queen' are close
- 'Dog' and 'Cat' are close
- 'King' – 'Man' + 'Woman' ≈ 'Queen' (mathematically!)
This lets the AI capture meaning relationships between words as distances between points.
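The 'King' – 'Man' + 'Woman' arithmetic can be tried on a toy example. The sketch below is only an illustration: the three-dimensional vectors are invented by hand (real embeddings are learned from text and have hundreds of dimensions), and the helper names `cosine_similarity` and `nearest_word` are hypothetical. Closeness is measured here with cosine similarity, which is one common choice, not the only one.

```python
import math

# Hypothetical 3-dimensional embeddings, invented for illustration only.
# The dimensions loosely stand for: (royalty, maleness, animal-ness).
embeddings = {
    "king":  [0.9, 0.9, 0.0],
    "queen": [0.9, 0.1, 0.0],
    "man":   [0.1, 0.9, 0.0],
    "woman": [0.1, 0.1, 0.0],
    "dog":   [0.0, 0.5, 0.9],
    "cat":   [0.0, 0.4, 0.9],
}

def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_word(vector, exclude=()):
    """Return the vocabulary word whose embedding is most similar to `vector`."""
    candidates = [w for w in embeddings if w not in exclude]
    return max(candidates, key=lambda w: cosine_similarity(vector, embeddings[w]))

# Compute king - man + woman, one dimension at a time.
target = [
    k - m + w
    for k, m, w in zip(embeddings["king"], embeddings["man"], embeddings["woman"])
]

# Exclude the three input words so the answer is a different word.
result = nearest_word(target, exclude=("king", "man", "woman"))
print(result)  # queen
```

With these made-up vectors, 'dog' and 'cat' also score as far more similar to each other than 'king' and 'dog' do, which matches the idea that related words sit close together.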
📐 The Autocomplete Trick — Predicting the Next Word
The core task behind most language AI is simple to describe: given the words so far, predict the most likely next word.
Example: 'The cat sat on the ___'
Possible next words, with illustrative probabilities (they do not add up to 100% because many other words share the remainder):
- mat: 23%
- floor: 18%
- sofa: 15%
- table: 12%
- roof: 4%
By training on billions of web pages and books, AI learns the statistical patterns of how words follow each other. Doing this at massive scale (predicting the next word across billions of sentences) forces the model to learn grammar, facts, reasoning patterns, and even some common sense as side effects.
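Next-word prediction can be sketched by simple counting. The example below is a minimal illustration, not how a modern LLM is built: it uses a made-up four-sentence corpus and looks only at the previous two words, whereas real models use neural networks and much longer context. The names `context_counts` and `next_word_probabilities` are hypothetical.

```python
from collections import Counter, defaultdict

# A tiny made-up corpus; real models train on billions of words.
corpus = [
    "the cat sat on the mat",
    "the cat sat on the floor",
    "the dog sat on the mat",
    "the dog slept on the sofa",
]

# For every pair of consecutive words, count which word comes next.
context_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for i in range(len(words) - 2):
        context = (words[i], words[i + 1])
        context_counts[context][words[i + 2]] += 1

def next_word_probabilities(text):
    """Estimate next-word probabilities from the last two words of `text`."""
    context = tuple(text.lower().split()[-2:])
    counts = context_counts.get(context, Counter())
    total = sum(counts.values())
    if total == 0:
        return {}  # this context never appeared in the corpus
    return {word: count / total for word, count in counts.most_common()}

probs = next_word_probabilities("The cat sat on the")
print(probs)  # {'mat': 0.5, 'floor': 0.25, 'sofa': 0.25}
```

The probabilities come purely from how often each word followed 'on the' in the corpus. Nothing in the code knows what a cat or a mat is.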
Large Language Models (LLMs)
ChatGPT, Gemini, and similar tools are called Large Language Models (LLMs). 'Large' refers to the enormous number of weights: GPT-3, for example, has 175 billion weight values, and the sizes of newer models such as GPT-4 have not been officially published (unofficial estimates run above a trillion).
How LLMs are built:
1. Pre-training: the model reads and learns from hundreds of billions of words from the internet, books, and code
2. Fine-tuning: the model is trained further on examples of helpful, safe conversations
3. RLHF (Reinforcement Learning from Human Feedback): human trainers rate responses, reinforcing helpful and harmless outputs
The result: an AI that can write, summarise, translate, code, explain, and answer questions, all by very sophisticated next-word prediction.
- Pre-training on vast text
- Fine-tuning for helpfulness
- RLHF — human feedback shapes behaviour
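The claim that an LLM writes whole answers by repeated next-word prediction can be sketched as a loop: predict one word, append it, and predict again. This is a simplified illustration under stated assumptions. The table `NEXT_WORD` is hypothetical and stands in for a trained model, it looks at only the previous word, and it always picks the single most likely word (real systems consider the whole conversation so far and usually sample with some randomness).

```python
# Hypothetical next-word table standing in for a trained model.
# Each entry maps the previous word to possible next words and probabilities.
NEXT_WORD = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "mat": 0.2},
    "cat": {"sat": 0.7, "slept": 0.3},
    "sat": {"down": 0.6, "<end>": 0.4},
    "down": {"<end>": 1.0},
}

def generate(max_words=10):
    """Build a sentence one word at a time by always taking the likeliest next word."""
    word = "<start>"
    output = []
    for _ in range(max_words):
        # Words missing from the table simply end the sentence.
        choices = NEXT_WORD.get(word, {"<end>": 1.0})
        word = max(choices, key=choices.get)
        if word == "<end>":
            break
        output.append(word)
    return " ".join(output)

print(generate())  # the cat sat down
```

Each step only answers the question 'what word comes next?', yet the loop as a whole produces a complete sentence.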
LLMs Do Not Know Truth — They Know Patterns
By itself, an LLM does not look things up. It has no built-in connection to a live database of facts. It generates text that is statistically likely given the patterns in its training data, which is not the same as text that is correct. This is why LLMs 'hallucinate': they confidently state false facts. If you ask an LLM about a book that does not exist, it may generate an author, publication date, and summary that sound completely plausible but are entirely made up. Always verify factual claims from LLMs using reliable sources.
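A toy model makes the difference between patterns and truth concrete. The sketch below is an illustration only, far simpler than a real LLM: it learns which word may follow which from two true sentences, then lists every sentence those patterns allow. The names `follows` and `all_sentences` are hypothetical.

```python
from collections import defaultdict

# Two true training sentences.
training_sentences = [
    "paris is the capital of france",
    "rome is the capital of italy",
]

# Learn only the pattern: which words have been seen following which.
follows = defaultdict(set)
starts = set()
for sentence in training_sentences:
    words = sentence.split()
    starts.add(words[0])
    for prev, nxt in zip(words, words[1:]):
        follows[prev].add(nxt)

def all_sentences(word):
    """Return every sentence the learned patterns can produce starting from `word`."""
    if word not in follows:  # no known continuation, so the sentence ends here
        return [word]
    return [
        word + " " + rest
        for nxt in sorted(follows[word])
        for rest in all_sentences(nxt)
    ]

generated = sorted(s for start in starts for s in all_sentences(start))
made_up = [s for s in generated if s not in training_sentences]

print(made_up)
# ['paris is the capital of italy', 'rome is the capital of france']
```

Both made-up sentences follow the learned word patterns perfectly and both are false. The model has no step that checks a sentence against the world.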
Machine Translation
Google Translate and similar tools use a neural network called a Transformer to translate between languages. The model is trained on millions of parallel texts: the same document in two languages (like UN documents, which are published in the UN's six official languages). It learns which phrases in one language correspond to which phrases in another. Modern translation AI is remarkably good for common language pairs (English–French, English–Spanish) but still struggles with:
- Languages with less training data (many regional languages)
- Idioms and cultural references
- Very formal or very colloquial speech
- Languages with very different grammatical structures
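The idea of learning correspondences from parallel texts can be sketched with plain counting. This is not a Transformer: it is a much simpler co-occurrence count, swapped in only to show how paired sentences reveal which words go together. The four-pair English–French corpus is made up, the names `cooccurrence`, `translate_word`, and `translate` are hypothetical, and the sketch assumes both languages use the same word order.

```python
from collections import Counter, defaultdict

# A tiny made-up parallel corpus: the same phrase in English and French.
parallel_corpus = [
    ("the cat", "le chat"),
    ("the dog", "le chien"),
    ("a cat", "un chat"),
    ("a dog", "un chien"),
]

# Count how often each English word appears alongside each French word.
cooccurrence = defaultdict(Counter)
for english, french in parallel_corpus:
    for en_word in english.split():
        for fr_word in french.split():
            cooccurrence[en_word][fr_word] += 1

def translate_word(en_word):
    """Pick the French word seen most often alongside `en_word`."""
    return cooccurrence[en_word].most_common(1)[0][0]

def translate(sentence):
    """Translate word by word, assuming identical word order in both languages."""
    return " ".join(translate_word(w) for w in sentence.split())

print(translate_word("cat"))  # chat
print(translate("the dog"))   # le chien
```

'cat' appears with 'chat' in two pairs but with 'le' and 'un' only once each, so the count singles out 'chat'. With less parallel text the counts become unreliable, which is one way to see why languages with little training data are harder.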
Misconception: 'AI Understands What It Says'
When a chatbot gives a brilliant explanation of quantum physics, it does not understand quantum physics. It has seen millions of texts where 'quantum physics' appears alongside certain other words, concepts, and explanations, and it generates statistically appropriate text. A test: ask the same LLM a subtle logical trick question. It will often confidently give the wrong answer, because the pattern it is following leads to the plausible-sounding but incorrect response. Understanding involves knowing when you are wrong. LLMs currently have limited ability to do this: they generate plausible text, whether or not that text is true.
Challenge Round
Testing an LLM's Limitations
Try these tasks with any AI chatbot available to you (ChatGPT, Gemini, etc.):
1. Ask it to count how many times the letter 'r' appears in 'strawberry'
2. Ask it to explain a genuinely made-up Tamil proverb you invent
3. Ask it what it had for breakfast
Notice: what does it get wrong? What does it say about things it cannot know? These failures reveal how it actually works.
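For task 1, a reference answer to compare against the chatbot's reply can be computed directly. This short sketch uses only Python's built-in string methods; the variable names are arbitrary.

```python
word = "strawberry"

# Count the letter exactly, character by character.
r_count = word.count("r")

# Record where each 'r' sits, counting positions from 1.
positions = [i + 1 for i, letter in enumerate(word) if letter == "r"]

print(r_count)    # 3
print(positions)  # [3, 8, 9]
```

A program that counts characters gets this right every time. A chatbot that predicts likely text may not.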
Language AI — The Big Picture
Language AI works by turning words into numbers, then predicting the most likely next word at enormous scale. LLMs like ChatGPT are trained on hundreds of billions of words and generate text that is statistically plausible — not necessarily true. They are powerful tools but not understanding minds.
You understand how AI reads and generates language — and why 'sounds right' is not the same as 'is right'.
↪ Next lesson: how AI learns from YOU personally — recommendation systems and personalisation.
Key Points
முக்கிய குறிப்புகள்
- ✓ NLP: teaching computers to understand human language
- ✓ Language is complex: the same word can mean different things in different contexts
- ✓ Applications: voice assistants, translation, autocomplete, spam detection
- ✓ NLP works by learning patterns from vast amounts of text
- ✓ Computers still struggle with sarcasm, regional dialects, and nuance
Glossary
சொல் அகராதி
NLP (Natural Language Processing)
இயற்கை மொழி செயலாக்கம்
Ambiguity
இரு பொருள்
Context
சூழல்
Translation
மொழிபெயர்ப்பு
Practice Activities
Quiz
வினாடி வினா
Answer each question to check your understanding.
What is 'hallucination' in AI language models?
Fill in the Blanks
இடைவெளி நிரப்புக
Type the missing word and press Check or Enter.