MCS-224 : Artificial Intelligence and Machine Learning
Complete Previous Year Questions Answered
IMPORTANCE LEGEND:
- 🔥 Most Important: High frequency (3+ times) OR critical foundational topics essential for the course.
- ⭐ Very Important: Medium frequency (2 times) OR key operational concepts/extensions.
- 📌 Important: Standard frequency (1 time) OR specific/niche topics.
BLOCK 1: ARTIFICIAL INTELLIGENCE - INTRODUCTION
UNIT 1: Introduction to Artificial Intelligence
Q1. 🔥 Compare Artificial Narrow Intelligence, Artificial General Intelligence and Artificial Super Intelligence
[ Jun 2025 Q1(a), Jun 2024 Q1(a), Jun 2022 Q1(a) | Frequency: 3 ]
Artificial Intelligence can be categorized into three types based on capability and scope:
| Aspect | Artificial Narrow Intelligence (ANI) | Artificial General Intelligence (AGI) | Artificial Super Intelligence (ASI) |
|---|---|---|---|
| Definition | AI designed for a specific task only | AI with human-level cognitive abilities across all domains | AI that surpasses human intelligence in all aspects |
| Capability | Single task or limited set of tasks | Can perform any intellectual task a human can | Exceeds best human minds in every field |
| Learning | Task-specific learning | Transfer learning across domains | Self-improving and evolving autonomously |
| Consciousness | No self-awareness | Potential self-awareness | Hypothetically fully conscious |
| Current Status | Exists today | Theoretical/Research stage | Hypothetical/Future concept |
| Flexibility | Cannot adapt beyond programmed scope | Highly flexible and adaptable | Unlimited adaptability |
| Examples | Siri, Chess engines, Spam filters, Recommendation systems | None exists yet (Depicted in movies like "Her") | None exists (Depicted in movies like "Terminator") |
| Also Known As | Weak AI | Strong AI / Human-level AI | Superintelligent AI |
Q2. 📌 Give suitable example for Narrow AI
[ Jun 2022 Q1(a) | Frequency: 1 ]
Narrow AI (ANI) performs specific tasks with high efficiency but cannot generalize beyond its designed purpose.
Examples of Narrow AI:
| Application | Description |
|---|---|
| Virtual Assistants | Siri, Alexa, Google Assistant - respond to voice commands for specific tasks |
| Recommendation Systems | Netflix, YouTube, Amazon - suggest content based on user behavior |
| Image Recognition | Google Photos face detection, medical imaging diagnosis |
| Spam Filters | Gmail spam detection - classifies emails as spam or not |
| Game Playing AI | IBM Deep Blue (Chess), AlphaGo (Go game) |
| Navigation Systems | Google Maps - optimal route calculation |
| Language Translation | Google Translate - translates text between languages |
Key Characteristic: Each system excels only at its designated task and cannot perform tasks outside its programming.
Q3. 📌 Give suitable example for General AI
[ Jun 2022 Q1(a) | Frequency: 1 ]
General AI (AGI) would possess human-like cognitive abilities to understand, learn, and apply knowledge across any domain.
Theoretical Examples & Representations:
| Category | Example |
|---|---|
| Science Fiction | JARVIS from Iron Man - assists in all tasks from research to combat |
| Movies | "Her" (2013) - Samantha, an AI with emotional intelligence |
| Conceptual | A robot that can cook, write poetry, solve math, and hold conversations |
| Research Goals | OpenAI's mission to develop safe AGI |
Expected Capabilities of AGI:
- Reasoning and problem-solving across domains
- Understanding natural language contextually
- Learning new skills without explicit programming
- Emotional understanding and social intelligence
- Creative thinking and abstract reasoning
Current Status: AGI does not exist yet. It remains a goal of AI research with estimated timelines ranging from decades to centuries.
Q4. 📌 Give suitable example for Super AI
[ Jun 2022 Q1(a) | Frequency: 1 ]
Super AI (ASI) would surpass the brightest human minds in every conceivable domain including creativity, problem-solving, and social intelligence.
Hypothetical Examples & Representations:
| Category | Example |
|---|---|
| Science Fiction | Skynet from Terminator - self-aware, self-improving AI |
| Movies | Ultron from Avengers - surpasses creators' intelligence |
| Literature | AM, the computer in "I Have No Mouth, and I Must Scream" |
| Theoretical | An AI that could solve climate change, cure all diseases, and advance physics beyond human capability |
Theoretical Capabilities of ASI:
- Self-improvement without human intervention
- Scientific discoveries beyond human comprehension
- Perfect decision-making capabilities
- Potential for technological singularity
- Complete mastery over all human knowledge
Current Status: Purely hypothetical. Raises significant ethical concerns about control, safety, and existential risks.
Q5. ⭐ Explain Turing test (with example/block diagram)
[ Dec 2023 Q1(b), Jun 2023 Q2(a) | Frequency: 2 ]
Definition: The Turing Test, proposed by Alan Turing in 1950 in his paper "Computing Machinery and Intelligence," is a test of a machine's ability to exhibit intelligent behavior indistinguishable from that of a human.
Original Name: Turing originally called it the "Imitation Game."
Procedure:
1. A human interrogator (C) is separated from two respondents
2. One respondent is a human (B), the other is a machine (A)
3. Communication occurs only through a text-based interface
4. The interrogator asks questions to determine which is human
5. If the interrogator cannot reliably distinguish, the machine passes the test
Example Conversation:
| Interrogator | Respondent (Unknown if Human/Machine) |
|---|---|
| "Do you like music?" | "Yes, I enjoy classical music, especially Beethoven." |
| "What is 15789 × 4673?" | "Let me think... I'm not great at mental math. Maybe around 73 million?" |
| "Write a short poem about rain." | "Drops fall soft on leaves, Nature's tears bring life renewed, Earth drinks and rejoices." |
Key Points:
- Tests behavioral intelligence, not internal processes
- Machine must exhibit human-like responses including errors
- Text-only communication prevents physical appearance bias
- Benchmark for conversational AI systems
Q6. ⭐ Explain Chinese room test as Criticism to Turing test (with example)
[ Jun 2024 Q1(b), Jun 2023 Q2(a) | Frequency: 2 ]
Definition: The Chinese Room Argument, proposed by philosopher John Searle in 1980, is a thought experiment that challenges the idea that passing the Turing Test demonstrates true understanding or intelligence.
The Thought Experiment:
Scenario Explained:
| Component | Description |
|---|---|
| The Room | A closed room with a slot for passing papers |
| The Person | An English-only speaker inside the room |
| The Rule Book | Detailed instructions for manipulating Chinese symbols |
| Input | Chinese questions passed through the slot |
| Output | Chinese responses generated by following rules |
How It Works:
1. A Chinese speaker sends questions in Chinese characters
2. The person inside matches symbols using the rule book
3. The person produces appropriate Chinese symbol responses
4. From outside, the responses appear perfectly fluent
The Argument:
- The person inside follows syntax (rules) perfectly
- BUT has zero semantic understanding (meaning)
- Similarly, computers manipulate symbols without understanding
- Passing the Turing Test ≠ True Intelligence or Understanding
Example:
| Chinese Input | Rule Book Says | Person's Action | Understanding? |
|---|---|---|---|
| 你好吗 (How are you?) | Symbol pattern A → respond with pattern B | Writes 我很好 (I'm fine) | NO |
Searle's Conclusion:
- Strong AI Claim (Rejected): Computers can truly understand and have minds
- Weak AI Claim (Accepted): Computers can simulate intelligent behavior
Significance: Distinguishes between simulation of intelligence and actual understanding/consciousness.
Q7. 🔥 Compare Artificial Intelligence, Machine Learning and Deep Learning
[ Dec 2024 Q1(b), Dec 2023 Q2(a), Dec 2022 Q1(a) | Frequency: 3 ]
AI, ML, and DL are related but distinct concepts with a hierarchical relationship:
Comprehensive Comparison:
| Aspect | Artificial Intelligence | Machine Learning | Deep Learning |
|---|---|---|---|
| Definition | Broad field of creating intelligent machines | Subset of AI where machines learn from data | Subset of ML using neural networks with many layers |
| Scope | Widest - includes all intelligent systems | Narrower - focuses on learning algorithms | Narrowest - focuses on neural networks |
| Approach | Rule-based OR learning-based | Statistical learning from data | Hierarchical feature learning |
| Human Intervention | High - requires explicit programming | Medium - requires feature engineering | Low - automatic feature extraction |
| Data Requirement | Varies | Moderate amounts | Large amounts (Big Data) |
| Hardware | Standard computing | Standard to moderate | High-end GPUs/TPUs required |
| Training Time | Depends on approach | Moderate | Very long |
| Interpretability | High (rule-based) | Moderate | Low (Black box) |
| Examples | Expert systems, Robotics, NLP | Spam detection, Recommendation systems | Image recognition, Voice assistants |
| Algorithms | Search, Logic, Planning | Decision Trees, SVM, Random Forest | CNN, RNN, Transformers |
Relationship Flow:
Key Distinctions:
- AI: Goal is to create intelligent behavior (any method)
- ML: Method to achieve AI through learning from experience
- DL: Advanced ML using deep neural networks for complex patterns
Q8. 📌 What do you understand by the term "Agents" in Artificial Intelligence
[ Jun 2022 Q2(a) | Frequency: 1 ]
Definition: An Agent is anything that can perceive its environment through sensors and act upon that environment through actuators.
Formal Definition: An agent is a system that:
- Perceives its environment
- Makes decisions
- Takes actions to achieve goals
Components of an Agent:
| Component | Function | Example (Robot) | Example (Software) |
|---|---|---|---|
| Sensors | Perceive environment | Cameras, Microphones | Keyboard input, File reads |
| Actuators | Perform actions | Motors, Speakers | Display output, File writes |
| Agent Function | Maps percepts to actions | Control program | Algorithm/Logic |
Types of Agents:
- Human Agent - Eyes, ears (sensors); Hands, legs (actuators)
- Robotic Agent - Cameras, sensors; Motors, grippers
- Software Agent - GUI inputs, APIs; Screen output, commands
Agent Function: The mathematical mapping from percept sequence to action:
- f: P* → A
- where P* is the set of all percept sequences and A is the set of actions
Q9. 📌 List the properties supposed to be possessed by agents
[ Jun 2022 Q2(a) | Frequency: 1 ]
An ideal AI agent should possess the following properties:
| Property | Description | Example |
|---|---|---|
| Autonomy | Operates without direct human intervention; has control over its actions | Self-driving car making decisions independently |
| Reactivity | Perceives environment and responds timely to changes | Thermostat adjusting temperature |
| Pro-activeness | Takes initiative; goal-directed behavior | Email filter learning new spam patterns |
| Social Ability | Interacts with other agents (human or artificial) | Chatbots conversing with users |
| Mobility | Ability to move around in the environment | Delivery robots navigating warehouse |
| Rationality | Makes decisions to maximize expected performance | Chess AI selecting optimal moves |
| Learning/Adaptability | Improves performance based on experience | Recommendation system learning preferences |
| Temporal Continuity | Persistent identity and state over time | Virtual assistant remembering past interactions |
| Benevolence | Does not have conflicting goals with user | AI assistant following user instructions |
| Veracity | Does not deliberately communicate false information | Honest reporting of data/results |
Q10. 📌 Compare the SR (simple reflex) agents with model based reflex agents
[ Jun 2022 Q2(a) | Frequency: 1 ]
Comparison Table:
| Aspect | Simple Reflex Agent | Model-Based Reflex Agent |
|---|---|---|
| Definition | Acts based only on current percept | Maintains internal model of world state |
| Memory | No memory of past percepts | Has memory; tracks world state |
| Decision Basis | Condition-Action rules on current input | Current percept + Internal state |
| Environment Handling | Only fully observable environments | Can handle partially observable environments |
| Internal State | None | Maintains internal state |
| Complexity | Simple | More complex |
| World Model | No model | Has model of how world works |
| Adaptability | Limited; fixed rules | Better; can infer unobserved aspects |
| Example | Thermostat (if temp > 25°C → AC on) | Robot tracking obstacles it cannot currently see |
(Diagrams of the Simple Reflex Agent and the Model-Based Reflex Agent are not reproduced here.)
Key Difference Summary:
- Simple Reflex: "See → Act" (stateless)
- Model-Based: "See → Think (using memory & model) → Act" (stateful)
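The stateless/stateful contrast can be sketched in Python. This is a minimal illustration only: the class names, rules, and percepts below are invented for this example, not part of the syllabus.

```python
class SimpleReflexAgent:
    """Stateless: maps the current percept directly to an action."""

    def act(self, temp_celsius):
        # Condition-action rule from the table: if temp > 25 °C → AC on
        return "AC_ON" if temp_celsius > 25 else "AC_OFF"


class ModelBasedReflexAgent:
    """Stateful: maintains an internal model built from past percepts."""

    def __init__(self):
        self.known_obstacles = set()          # internal state (memory)

    def act(self, visible_obstacles, target):
        self.known_obstacles.update(visible_obstacles)   # update world model
        # Decision uses memory, so it can react to obstacles it cannot currently see
        return "DETOUR" if target in self.known_obstacles else "ADVANCE"


reflex = SimpleReflexAgent()
print(reflex.act(30))                  # AC_ON (same percept always gives same action)

robot = ModelBasedReflexAgent()
robot.act(["crate"], target="shelf")   # crate observed once, stored in the model
print(robot.act([], target="crate"))   # DETOUR: crate remembered though unseen now
```

The second agent's `DETOUR` decision depends on a percept it received earlier, which is exactly what "can handle partially observable environments" means in the table.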
Q11. 📌 What are task environments in context of Intelligent Agents
[ Dec 2022 Q2(a) | Frequency: 1 ]
Definition: A Task Environment is the "problem" to which an intelligent agent is the "solution." It specifies the complete setting in which the agent operates.
Task Environment Components:
| Component | Description | Example (Self-driving car) |
|---|---|---|
| Performance Measure | Criteria for success | Safety, time, comfort, fuel efficiency |
| Environment | External surroundings | Roads, traffic, weather, pedestrians |
| Actuators | Means to affect environment | Steering, brakes, accelerator, signals |
| Sensors | Means to perceive environment | Cameras, LIDAR, GPS, speedometer |
This is known as PEAS description (Performance, Environment, Actuators, Sensors)
Properties of Task Environments:
| Property | Options | Description |
|---|---|---|
| Observability | Fully / Partially | Can agent see complete state? |
| Deterministic | Deterministic / Stochastic | Is next state fully determined by current state + action? |
| Episodic | Episodic / Sequential | Are decisions independent or dependent on past? |
| Static | Static / Dynamic | Does environment change while agent deliberates? |
| Discrete | Discrete / Continuous | Are states/actions finite or infinite? |
| Agents | Single / Multi-agent | One agent or multiple agents? |
| Known | Known / Unknown | Does agent know environment rules? |
Q12. 📌 Explain the standard set of measures for specifying task environment under PEAS
[ Dec 2022 Q2(a) | Frequency: 1 ]
PEAS is a framework for describing the task environment of an intelligent agent:
| Letter | Component | Description |
|---|---|---|
| P | Performance Measure | Criteria to evaluate agent's success |
| E | Environment | External conditions where agent operates |
| A | Actuators | Mechanisms through which agent acts |
| S | Sensors | Mechanisms through which agent perceives |
PEAS Examples:
| Agent Type | Performance | Environment | Actuators | Sensors |
|---|---|---|---|---|
| Taxi Driver | Safe, fast, legal, comfortable trip, maximize profits | Roads, traffic, pedestrians, customers | Steering, accelerator, brake, signal, horn, display | Cameras, sonar, speedometer, GPS, engine sensors |
| Medical Diagnosis System | Healthy patient, minimize costs, lawsuits | Patient, hospital, staff | Display for questions, tests, diagnoses, treatments | Keyboard entry of symptoms, findings, patient answers |
| Vacuum Cleaner | Cleanliness, efficiency, battery life | Room, floor types, furniture, dust | Wheels, brushes, suction | Dirt sensor, bump sensor, cliff sensor |
| Chess Program | Winning games | Chessboard, opponent, clock | Display moves | Keyboard/mouse input |
| Spam Filter | Correctly classify emails, minimize false positives | Email server, user inbox | Move email to spam/inbox | Email header, content, sender info |
Importance of PEAS:
- Provides structured approach to agent design
- Clarifies what success means (Performance)
- Identifies challenges (Environment)
- Determines required capabilities (Actuators & Sensors)
- Foundation for selecting appropriate agent architecture
UNIT 2: Problem Solving using Search
Q1. ⭐ What do you understand by state space in AI
[ Dec 2023 Q2(b), Jun 2023 Q3(a) | Frequency: 2 ]
Definition: A State Space is a mathematical representation of a problem that includes all possible states (configurations) that can be reached from the initial state through any sequence of actions.
Formal Definition: A state space can be represented as a 4-tuple: (S, A, Action(s), Result(s, a))
| Component | Description | Example (8-Puzzle) |
|---|---|---|
| S | Set of all possible states | All arrangements of tiles |
| A | Set of all possible actions | Move blank: Up, Down, Left, Right |
| Action(s) | Actions applicable in state s | Valid moves from current position |
| Result(s,a) | State resulting from action a in state s | New tile arrangement after move |
Key Elements:
- Initial State: Starting point of the problem
- Goal State: Desired end configuration
- State: A unique configuration of the problem
- Operators: Actions that transform one state to another
- Path: Sequence of states from initial to goal
- Path Cost: Sum of costs of individual actions
State Space as a Graph:
- Nodes represent states
- Edges represent actions/operators
- Root node is the initial state
- Goal nodes are target states
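Because any state space reduces to this graph view, one generic search routine can serve many different problems. A minimal breadth-first sketch (the function names and the toy problem are our own, chosen for illustration):

```python
from collections import deque


def bfs_state_space(initial, is_goal, successors):
    """Generic BFS over a state space: nodes are states, edges are actions.

    Returns a shortest path (fewest actions) from initial to a goal state.
    """
    frontier = deque([(initial, [initial])])
    visited = {initial}
    while frontier:
        state, path = frontier.popleft()
        if is_goal(state):
            return path                      # sequence of states, initial → goal
        for nxt in successors(state):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, path + [nxt]))
    return None                              # goal unreachable


# Toy state space: states are integers, actions are "+1" and "×2", goal is 7
path = bfs_state_space(0, lambda s: s == 7,
                       lambda s: [x for x in (s + 1, s * 2) if x <= 7])
print(path)                                  # [0, 1, 2, 3, 6, 7]
```

The same `bfs_state_space` function works unchanged for the 8-puzzle, water jug, or missionaries-and-cannibals problems once their successor functions are supplied, which is the "Generalization" benefit listed below.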
Q2. 📌 What is the utility of state space in AI
[ Dec 2023 Q2(b) | Frequency: 1 ]
State Space provides a structured framework for problem-solving in AI with the following utilities:
| Utility | Description |
|---|---|
| Problem Representation | Provides formal mathematical model for any problem |
| Search Foundation | Enables application of various search algorithms |
| Solution Path Identification | Helps find sequence of actions from start to goal |
| Complexity Analysis | Allows measurement of problem difficulty |
| Algorithm Comparison | Enables evaluation of different search strategies |
| Optimization | Facilitates finding optimal (least cost) solutions |
Practical Benefits:
- Abstraction: Converts real-world problems into computational form
- Generalization: Same search techniques work on different problems
- Visualization: Graph representation aids understanding
- Completeness Check: Ensures all possibilities are considered
- Debugging: Easy to trace and verify solution paths
Q3. 📌 Discuss the formulation of state space search (with example)
[ Jun 2022 Q1(b) | Frequency: 1 ]
State Space Search Formulation involves defining a problem in terms of states, actions, and goals to enable systematic exploration.
Formulation Steps:
| Step | Description |
|---|---|
| 1. Define State | Describe what information represents a configuration |
| 2. Initial State | Specify the starting configuration |
| 3. Goal Test | Define condition(s) that identify solution |
| 4. Actions/Operators | List possible actions in each state |
| 5. Transition Model | Define result of each action |
| 6. Path Cost | Assign costs to actions (if applicable) |
Example: 8-Puzzle Problem
Formulation for 8-Puzzle:
| Component | Description |
|---|---|
| State | A 3×3 array showing position of each tile (0-8) |
| Initial State | Any configuration of tiles |
| Goal State | Tiles arranged as 1,2,3,4,5,6,7,8,blank |
| Actions | Move blank: UP, DOWN, LEFT, RIGHT |
| Transition | Swap blank with adjacent tile |
| Path Cost | Number of moves (each move costs 1) |
| Goal Test | Current state matches goal configuration |
State Space Size: 9! = 362,880 possible states (only half are reachable from any given state)
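The Actions and Transition rows of the formulation can be written directly as a successor function. A sketch (the flat-tuple encoding is our choice; 0 denotes the blank):

```python
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)           # goal configuration, 0 = blank


def successors(state):
    """All states reachable by sliding the blank UP/DOWN/LEFT/RIGHT."""
    moves = []
    i = state.index(0)                        # blank position in the flat 3×3 array
    row, col = divmod(i, 3)
    for dr, dc, name in ((-1, 0, "UP"), (1, 0, "DOWN"),
                         (0, -1, "LEFT"), (0, 1, "RIGHT")):
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3:         # precondition: stay on the board
            j = r * 3 + c
            s = list(state)
            s[i], s[j] = s[j], s[i]           # transition: swap blank with tile
            moves.append((name, tuple(s)))
    return moves


# Branching factor varies with the blank's position:
# a corner blank has 2 legal moves, an edge 3, the centre 4.
print(len(successors(GOAL)))                  # 2 (blank is in a corner)
```

Each move has path cost 1, so plugging this successor function into a breadth-first search yields a minimum-move solution.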
Q4. 📌 What are the various factors for developing a state space representation
[ Dec 2024 Q5(b) | Frequency: 1 ]
When developing a state space representation, the following factors must be considered:
| Factor | Description | Consideration |
|---|---|---|
| State Description | What information defines a state | Should be complete yet minimal |
| Initial State | Starting configuration | Must be clearly defined |
| Goal State(s) | Target configuration(s) | May be single, multiple, or condition-based |
| Operators/Actions | Possible state transitions | Should be well-defined and finite |
| Operator Preconditions | When can an action be applied | Must specify valid conditions |
| Operator Effects | Result of applying action | Must produce valid successor states |
| State Space Size | Total number of possible states | Affects computational feasibility |
| Branching Factor | Average number of successors | Impacts search efficiency |
| Solution Depth | Steps from initial to goal | Determines search strategy choice |
| Path Cost Function | Cost assignment to actions | Required for optimal solutions |
Key Principles:
- Abstraction: Include only relevant details
- Efficiency: Minimize state representation size
- Completeness: Capture all necessary information
- Uniqueness: Each state should have unique representation
- Computability: States must be computationally manageable
Q5. ⭐ Write production rules for state space representation of water jug problem
[ Dec 2023 Q2(b), Jun 2023 Q3(a) | Frequency: 2 ]
Problem Statement: Given two jugs - one with capacity 4 gallons and another with capacity 3 gallons. Initially both are empty. Goal is to get exactly 2 gallons of water in the 4-gallon jug.
State Representation: (x, y) where:
- x = gallons in 4-gallon jug (0 ≤ x ≤ 4)
- y = gallons in 3-gallon jug (0 ≤ y ≤ 3)
Initial State: (0, 0)
Goal State: (2, n), where n can be any value from 0 to 3
Production Rules:
| Rule No. | Condition | Action | Result | Description |
|---|---|---|---|---|
| R1 | x < 4 | Fill 4-gallon jug | (4, y) | Fill 4-gal jug from tap |
| R2 | y < 3 | Fill 3-gallon jug | (x, 3) | Fill 3-gal jug from tap |
| R3 | x > 0 | Empty 4-gallon jug | (0, y) | Empty 4-gal jug on ground |
| R4 | y > 0 | Empty 3-gallon jug | (x, 0) | Empty 3-gal jug on ground |
| R5 | x + y ≥ 4, y > 0 | Pour 3-gal into 4-gal | (4, y-(4-x)) | Fill 4-gal from 3-gal |
| R6 | x + y ≥ 3, x > 0 | Pour 4-gal into 3-gal | (x-(3-y), 3) | Fill 3-gal from 4-gal |
| R7 | x + y ≤ 4, y > 0 | Pour all 3-gal into 4-gal | (x+y, 0) | Empty 3-gal into 4-gal |
| R8 | x + y ≤ 3, x > 0 | Pour all 4-gal into 3-gal | (0, x+y) | Empty 4-gal into 3-gal |
Solution Trace:
| Step | State | Rule Applied | Description |
|---|---|---|---|
| 0 | (0, 0) | - | Initial state |
| 1 | (0, 3) | R2 | Fill 3-gallon jug |
| 2 | (3, 0) | R7 | Pour 3-gal into 4-gal |
| 3 | (3, 3) | R2 | Fill 3-gallon jug |
| 4 | (4, 2) | R5 | Pour from 3-gal to fill 4-gal |
| 5 | (0, 2) | R3 | Empty 4-gallon jug |
| 6 | (2, 0) | R7 | Pour 3-gal into 4-gal ✓ |
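The production rules and the six-step trace can be checked mechanically. In the sketch below (the encoding is ours), R5/R7 collapse into one "pour 3-gal into 4-gal" operation and R6/R8 into its mirror; breadth-first search then finds a shortest rule sequence, which has the same length as the trace above even if it visits different states.

```python
from collections import deque


def jug_successors(state):
    """Successor states of (x, y): x in the 4-gal jug, y in the 3-gal jug."""
    x, y = state
    nxt = set()
    nxt.add((4, y))                           # R1: fill the 4-gal jug
    nxt.add((x, 3))                           # R2: fill the 3-gal jug
    nxt.add((0, y))                           # R3: empty the 4-gal jug
    nxt.add((x, 0))                           # R4: empty the 3-gal jug
    p = min(y, 4 - x)
    nxt.add((x + p, y - p))                   # R5/R7: pour 3-gal into 4-gal
    p = min(x, 3 - y)
    nxt.add((x - p, y + p))                   # R6/R8: pour 4-gal into 3-gal
    nxt.discard(state)                        # drop no-op rule applications
    return nxt


def solve_water_jug():
    frontier = deque([((0, 0), [(0, 0)])])
    seen = {(0, 0)}
    while frontier:
        s, path = frontier.popleft()
        if s[0] == 2:                         # goal: exactly 2 gallons in 4-gal jug
            return path
        for n in jug_successors(s):
            if n not in seen:
                seen.add(n)
                frontier.append((n, path + [n]))


print(solve_water_jug())                      # a shortest sequence: 6 rule applications
```

The search confirms that no sequence shorter than six rule applications reaches a state with x = 2.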
Q6. 📌 Formulate the missionaries and cannibal problem
[ Jun 2025 Q5(b)(i) | Frequency: 1 ]
Problem Statement: Three missionaries and three cannibals are on one side of a river with a boat that can hold at most two people. Find a way to get everyone to the other side, without ever leaving a group of missionaries outnumbered by cannibals (on either side).
State Representation: (M, C, B) where:
- M = Number of missionaries on left bank (0, 1, 2, or 3)
- C = Number of cannibals on left bank (0, 1, 2, or 3)
- B = Position of boat (L = Left, R = Right)
Formulation:
| Component | Description |
|---|---|
| Initial State | (3, 3, L) - All on left bank, boat on left |
| Goal State | (0, 0, R) - All on right bank, boat on right |
| Operators | Move 1 or 2 people in boat to other side |
| Constraints | Missionaries ≥ Cannibals on both sides (when M > 0) |
Possible Actions:
- Move 1 missionary
- Move 1 cannibal
- Move 2 missionaries
- Move 2 cannibals
- Move 1 missionary + 1 cannibal
Q7. 📌 Solve the missionaries and cannibal problem
[ Jun 2025 Q5(b)(i) | Frequency: 1 ]
Solution Steps:
| Step | Left Bank | Boat Move | Right Bank | State |
|---|---|---|---|---|
| 0 | 3M, 3C, ⛵ | - | - | (3,3,L) Initial |
| 1 | 3M, 1C | 2C → | 2C | (3,1,R) |
| 2 | 3M, 2C, ⛵ | ← 1C | 1C | (3,2,L) |
| 3 | 3M | 2C → | 3C | (3,0,R) |
| 4 | 3M, 1C, ⛵ | ← 1C | 2C | (3,1,L) |
| 5 | 1M, 1C | 2M → | 2M, 2C | (1,1,R) |
| 6 | 2M, 2C, ⛵ | ← 1M,1C | 1M, 1C | (2,2,L) |
| 7 | 2C | 2M → | 3M, 1C | (0,2,R) |
| 8 | 3C, ⛵ | ← 1C | 3M | (0,3,L) |
| 9 | 1C | 2C → | 3M, 2C | (0,1,R) |
| 10 | 2C, ⛵ | ← 1C | 3M, 1C | (0,2,L) |
| 11 | - | 2C → | 3M, 3C | (0,0,R) Goal |
Total Moves: 11 crossings
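The 11-crossing solution can be verified with a short breadth-first search over the (M, C, B) states from the formulation above (a sketch; the helper names are ours):

```python
from collections import deque


def safe(m, c):
    """Constraint: missionaries never outnumbered on either bank (where M > 0)."""
    return (m == 0 or m >= c) and (3 - m == 0 or 3 - m >= 3 - c)


def crossings(state):
    """Yield states reachable by one boat trip carrying 1 or 2 people."""
    m, c, b = state                           # missionaries/cannibals on LEFT, boat side
    d = -1 if b == 'L' else 1                 # boat carries people away from its bank
    for dm, dc in ((1, 0), (2, 0), (0, 1), (0, 2), (1, 1)):
        nm, nc = m + d * dm, c + d * dc
        if 0 <= nm <= 3 and 0 <= nc <= 3 and safe(nm, nc):
            yield (nm, nc, 'R' if b == 'L' else 'L')


def solve_mc():
    start, goal = (3, 3, 'L'), (0, 0, 'R')
    frontier, seen = deque([(start, [start])]), {start}
    while frontier:
        s, path = frontier.popleft()
        if s == goal:
            return path
        for n in crossings(s):
            if n not in seen:
                seen.add(n)
                frontier.append((n, path + [n]))


solution = solve_mc()
print(len(solution) - 1)                      # 11 crossings, matching the table above
```

Because the bounds check `0 ≤ nm ≤ 3` already guarantees enough people are on the boat's bank, only the safety constraint needs an explicit test.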
Q8. 📌 Draw the state-space search graph for solving missionaries and cannibal problem
[ Jun 2025 Q5(b)(ii) | Frequency: 1 ]
State Space Graph:
Legend:
- 🟡 Yellow: Initial State (3,3,L)
- 🟢 Green: Goal State (0,0,R)
- Blue edges: Solution path
- L states: Boat on left bank
- R states: Boat on right bank
Q9. ⭐ Discuss the Adversarial search briefly
[ Dec 2022 Q1(b), Jun 2022 Q3(a) | Frequency: 2 ]
Definition: Adversarial Search is a search strategy used in competitive environments where two or more agents have conflicting goals. It is primarily used in game-playing AI.
Characteristics:
| Feature | Description |
|---|---|
| Multi-agent | Two or more competing agents |
| Conflicting Goals | One agent's gain is another's loss |
| Turn-based | Agents take alternate turns |
| Zero-sum | Total payoff is constant (win/lose) |
| Opponent Modeling | Must anticipate opponent's moves |
Key Concepts:
| Concept | Description |
|---|---|
| MAX Player | Agent trying to maximize utility |
| MIN Player | Opponent trying to minimize MAX's utility |
| Game Tree | Tree of all possible game states |
| Terminal State | End of game (win/lose/draw) |
| Utility Function | Score at terminal states |
Applications:
- Chess, Checkers, Tic-Tac-Toe
- Go, Backgammon
- Card games
- Real-time strategy games
Q10. 📌 How Adversarial Search is different from the normal search
[ Jun 2022 Q3(a) | Frequency: 1 ]
Comparison Table:
| Aspect | Normal Search | Adversarial Search |
|---|---|---|
| Agents | Single agent | Two or more competing agents |
| Environment | Passive | Active (opponent acts) |
| Control | Full control over actions | Partial control (opponent's turn) |
| Predictability | Deterministic outcomes | Must consider opponent's moves |
| Goal | Find path to goal state | Win or maximize score |
| Opponent | None | Rational opponent assumed |
| Strategy | Optimal path finding | Game-theoretic optimal play |
| Time Constraint | Usually flexible | Often real-time |
| Utility | Single evaluation | Alternating max/min evaluation |
| Examples | 8-puzzle, Route finding | Chess, Tic-Tac-Toe |
Key Differences:
- Decision Making:
  - Normal: Choose best action for self
  - Adversarial: Choose best action considering opponent's response
- Tree Structure:
  - Normal: Single type of nodes
  - Adversarial: Alternating MAX and MIN nodes
- Evaluation:
  - Normal: Heuristic estimates distance to goal
  - Adversarial: Heuristic estimates game position quality
Q11. ⭐ Name/Discuss the techniques/types used for adversarial search
[ Dec 2022 Q1(b), Jun 2022 Q3(a) | Frequency: 2 ]
Major Adversarial Search Techniques:
| Technique | Description | Use Case |
|---|---|---|
| Minimax | Complete search assuming optimal opponent | Perfect information games |
| Alpha-Beta Pruning | Optimized Minimax with branch elimination | Large game trees |
| Expectiminimax | Handles chance elements (dice, cards) | Games with randomness |
| Monte Carlo Tree Search | Statistical sampling approach | Go, complex games |
| Iterative Deepening | Progressive depth search with time limit | Real-time games |
Technique Details:
1. Minimax Algorithm:
- Assumes both players play optimally
- MAX player maximizes score, MIN player minimizes score
- Complete and optimal for finite games
2. Alpha-Beta Pruning:
- Enhancement to Minimax
- Prunes branches that won't affect final decision
- α = best value for MAX, β = best value for MIN
3. Expectiminimax:
- Extension for games with chance (dice, cards)
- Adds "chance nodes" for random events
4. Monte Carlo Tree Search (MCTS):
- Uses random simulations
- Balances exploration and exploitation
- Used in Go (AlphaGo)
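Of these, alpha-beta pruning is the most examinable refinement. A runnable Python sketch over an abstract game tree (the nested-list encoding and parameter names are our own convention):

```python
import math


def alphabeta(node, alpha, beta, maximizing, children, evaluate):
    """Minimax with alpha-beta pruning; children/evaluate are game-specific."""
    kids = children(node)
    if not kids:                               # terminal node: score it
        return evaluate(node)
    if maximizing:
        value = -math.inf
        for child in kids:
            value = max(value, alphabeta(child, alpha, beta, False,
                                         children, evaluate))
            alpha = max(alpha, value)          # best value MAX can guarantee so far
            if alpha >= beta:                  # β cut-off: MIN would avoid this branch
                break
        return value
    value = math.inf
    for child in kids:
        value = min(value, alphabeta(child, alpha, beta, True,
                                     children, evaluate))
        beta = min(beta, value)                # best value MIN can guarantee so far
        if beta <= alpha:                      # α cut-off
            break
    return value


# MAX root with two MIN children; the leaves hold utilities.
tree = [[3, 5], [2, 9]]
best = alphabeta(tree, -math.inf, math.inf, True,
                 children=lambda n: n if isinstance(n, list) else [],
                 evaluate=lambda n: n)
print(best)                                    # 3 — and the leaf 9 is pruned
```

In this tree the left MIN node yields 3, so while scanning the right MIN node the first leaf (2) drives β to 2 ≤ α = 3 and the remaining leaf is never evaluated, which is exactly the "branch elimination" the table describes.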
Q12. 📌 What is Min-Max Search Strategy
[ Jun 2023 Q1(b) | Frequency: 1 ]
Definition: The Minimax Search Strategy is a recursive algorithm used in decision-making and game theory for finding the optimal move for a player, assuming that the opponent also plays optimally.
Core Principle:
- MAX Player: Tries to maximize the score (your move)
- MIN Player: Tries to minimize the score (opponent's move)
How It Works:
| Step | Description |
|---|---|
| 1 | Generate complete game tree to terminal states |
| 2 | Apply utility function to terminal states |
| 3 | Propagate values up: MAX nodes take maximum, MIN nodes take minimum |
| 4 | Select move leading to highest value at root |
Value Propagation:
- Terminal: (3,1), (5,6), (0,2), (4,1)
- MAX level: max(3,1)=3, max(5,6)=6, max(0,2)=2, max(4,1)=4
- MIN level: min(3,6)=3, min(2,4)=2
- ROOT (MAX): max(3,2)=3 → Choose left branch
Q13. 📌 Discuss the Min-Max Search Strategy briefly
[ Dec 2022 Q3(a) | Frequency: 1 ]
Minimax is a backtracking algorithm used in game theory and AI for decision-making in two-player zero-sum games.
Key Components:
| Component | Description |
|---|---|
| Players | MAX (us) and MIN (opponent) |
| Game Tree | All possible moves and responses |
| Terminal States | Game end positions with utility values |
| Backup Values | Propagated scores from leaves to root |
Properties:
- Complete: Yes, if tree is finite
- Optimal: Yes, against optimal opponent
- Time Complexity: O(b^m)
- Space Complexity: O(bm)
Where: b = branching factor, m = maximum depth
Assumptions:
- Both players play optimally
- Perfect information (both see full game state)
- Zero-sum game (one's gain = other's loss)
Q14. 📌 Write MINIMAX algorithm
[ Jun 2023 Q1(b) | Frequency: 1 ]
MINIMAX Algorithm (Pseudocode):
```
function MINIMAX(node, depth, isMaximizingPlayer):
    if depth == 0 OR node is terminal:
        return evaluate(node)
    if isMaximizingPlayer:
        maxEval = -∞
        for each child of node:
            eval = MINIMAX(child, depth - 1, FALSE)
            maxEval = max(maxEval, eval)
        return maxEval
    else:  // minimizing player
        minEval = +∞
        for each child of node:
            eval = MINIMAX(child, depth - 1, TRUE)
            minEval = min(minEval, eval)
        return minEval

// Initial call
bestMove = MINIMAX(rootNode, maxDepth, TRUE)
```
Alternative Representation:
```
function MINIMAX-VALUE(state):
    if TERMINAL-TEST(state):
        return UTILITY(state)
    if PLAYER(state) == MAX:
        return max(MINIMAX-VALUE(s) for s in SUCCESSORS(state))
    else:
        return min(MINIMAX-VALUE(s) for s in SUCCESSORS(state))
```
Function Descriptions:
| Function | Purpose |
|---|---|
| TERMINAL-TEST | Checks if game has ended |
| UTILITY | Returns score for terminal state |
| SUCCESSORS | Returns all possible next states |
| PLAYER | Returns whose turn it is |
| evaluate | Heuristic evaluation for non-terminal states |
Q15. ⭐ Explain Min-Max search algorithm (with example, properties, advantages, disadvantages)
[ Jun 2024 Q2(b) | Frequency: 1 ]
Definition: Minimax is a recursive backtracking algorithm that finds the optimal move for a player in a two-player zero-sum game by minimizing the maximum possible loss.
Example: Simple Game Tree
Diagram:
Step-by-Step Evaluation:
| Step | Node | Operation | Result |
|---|---|---|---|
| 1 | D, E | Terminal values | 3, 5 |
| 2 | F, G | Terminal values | 2, 9 |
| 3 | B (MIN) | min(3, 5) | 3 |
| 4 | C (MIN) | min(2, 9) | 2 |
| 5 | A (MAX) | max(3, 2) | 3 → Choose Left |
Properties:
| Property | Value |
|---|---|
| Complete | Yes (for finite game trees) |
| Optimal | Yes (against optimal opponent) |
| Time Complexity | O(b^m) |
| Space Complexity | O(bm) - DFS approach |
Where b = branching factor, m = maximum depth
Advantages:
| # | Advantage |
|---|---|
| 1 | Guarantees optimal play against rational opponent |
| 2 | Simple and intuitive recursive implementation |
| 3 | Complete - will find solution if exists |
| 4 | Based on solid game theory foundation |
| 5 | Foundation for advanced algorithms (Alpha-Beta) |
Disadvantages:
| # | Disadvantage |
|---|---|
| 1 | Exponential time complexity O(b^m) |
| 2 | Explores all nodes including unnecessary ones |
| 3 | Impractical for complex games (Chess, Go) |
| 4 | Cannot handle time constraints |
| 5 | Assumes opponent plays optimally |
Q16. ⭐ Give properties of MinMax search algorithm
[ Jun 2024 Q2(b), Dec 2022 Q3(a) | Frequency: 2 ]
Properties of Minimax Algorithm:
| Property | Description |
|---|---|
| Completeness | Yes - Always finds a solution if one exists (for finite trees) |
| Optimality | Yes - Finds optimal solution against an optimal opponent |
| Time Complexity | O(b^m) - Exponential in maximum depth |
| Space Complexity | O(bm) - Linear space using DFS |
Where:
- b = Branching factor (average number of legal moves)
- m = Maximum depth of game tree
Complexity Examples:
| Game | Branching Factor (b) | Depth (m) | Approximate States |
|---|---|---|---|
| Tic-Tac-Toe | ~4 | 9 | ~9! ≈ 362K |
| Chess | ~35 | ~100 | ~10^120 |
| Go | ~250 | ~150 | ~10^360 |
Additional Properties:
- Deterministic: Same input always produces same output
- Recursive: Natural recursive implementation
- Zero-sum assumption: One player's gain equals the other's loss
- Perfect information: Both players see the complete game state
Q17. ⭐ Write/Give advantages of MinMax search algorithm
[ Jun 2024 Q2(b), Dec 2022 Q3(a) | Frequency: 2 ]
Advantages of Minimax Algorithm:
| # | Advantage | Description |
|---|---|---|
| 1 | Optimal Solution | Guarantees best possible outcome against a rational opponent |
| 2 | Complete | Will always find a solution if one exists in finite games |
| 3 | Simple Logic | Easy to understand and implement using recursion |
| 4 | Theoretical Foundation | Based on solid game theory principles |
| 5 | Predictable | Deterministic behavior - same situation yields same decision |
| 6 | Foundation for Improvements | Basis for advanced algorithms like Alpha-Beta pruning |
| 7 | No Training Required | Works without learning or training data |
| 8 | Handles Any Two-Player Game | General algorithm applicable to various games |
| 9 | Perfect Play | Achieves perfect play in games with small state spaces |
| 10 | Clear Decision Making | Transparent reasoning for chosen moves |
Q18. ⭐ Write/Give disadvantages of MinMax search algorithm
[ Jun 2024 Q2(b), Dec 2022 Q3(a) | Frequency: 2 ]
Disadvantages of Minimax Algorithm:
| # | Disadvantage | Description |
|---|---|---|
| 1 | Exponential Time Complexity | O(b^m) makes it impractical for complex games |
| 2 | Explores All Nodes | Wastes time on branches that won't affect result |
| 3 | Memory Intensive | Large game trees require significant storage |
| 4 | No Time Limit Handling | Cannot guarantee move within time constraint |
| 5 | Assumes Optimal Opponent | May not exploit suboptimal opponent moves |
| 6 | Requires Complete Tree | Needs evaluation of all terminal states |
| 7 | Not Suitable for Real-time | Too slow for time-critical applications |
| 8 | Limited Depth in Practice | Must use heuristics for deep games |
| 9 | No Learning | Cannot improve from past games |
| 10 | Deterministic | Predictable; opponent can exploit patterns |
Solutions to Overcome:
| Problem | Solution |
|---|---|
| Explores all nodes | Alpha-Beta Pruning |
| Time constraints | Iterative Deepening |
| Deep trees | Heuristic Evaluation |
| Repeated positions | Transposition Tables |
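The first fix in the table, Alpha-Beta pruning, can be sketched in a few lines. This is a hedged illustration using the same dict-based tree encoding as the Minimax example (the encoding is an assumption, not from the course text); on the Q15 tree it returns the same answer while skipping leaf G.

```python
# Alpha-Beta pruning: Minimax plus two bounds. alpha = best value MAX can
# guarantee so far, beta = best value MIN can guarantee so far.
def alphabeta(node, depth, alpha, beta, is_max, children, values):
    if depth == 0 or node not in children:   # terminal / depth cut-off
        return values[node]
    if is_max:
        best = float('-inf')
        for c in children[node]:
            best = max(best, alphabeta(c, depth - 1, alpha, beta, False,
                                       children, values))
            alpha = max(alpha, best)
            if beta <= alpha:      # MIN would never let play reach here
                break              # prune the remaining children
        return best
    best = float('inf')
    for c in children[node]:
        best = min(best, alphabeta(c, depth - 1, alpha, beta, True,
                                   children, values))
        beta = min(beta, best)
        if beta <= alpha:          # MAX already has a better alternative
            break
    return best

children = {'A': ['B', 'C'], 'B': ['D', 'E'], 'C': ['F', 'G']}
values = {'D': 3, 'E': 5, 'F': 2, 'G': 9}
result = alphabeta('A', 2, float('-inf'), float('inf'), True, children, values)
print(result)  # 3, with leaf G never evaluated
```

After B backs up 3, alpha = 3 at the root; inside C, the first leaf F = 2 drives beta to 2 ≤ alpha, so G is pruned.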
UNIT 4: Predicate and Propositional Logic - Exam Answers
4.3-4.5 Introduction to Propositional Logic & Logical Connectives
Q1. 🔥 Differentiate between predicate and propositional logic.
[ Dec 2023 Q3(a) | Frequency: 1 ]
Propositional logic and predicate logic are two fundamental systems in mathematical logic used for knowledge representation in AI.
| Aspect | Propositional Logic | Predicate Logic |
|---|---|---|
| Definition | Deals with propositions (statements) that are either true or false | Extends propositional logic with predicates, quantifiers, and variables |
| Basic Unit | Propositions (atomic statements) | Predicates with arguments |
| Variables | No variables for objects | Uses variables to represent objects |
| Quantifiers | Not supported | Supports ∀ (for all) and ∃ (there exists) |
| Expressiveness | Limited - cannot express relationships between objects | Highly expressive - can express complex relationships |
| Complexity | Simple and decidable | More complex, semi-decidable |
| Example | P: "It is raining" | Raining(x): "x is raining", ∀x Human(x) → Mortal(x) |
| Connectives | AND (∧), OR (∨), NOT (¬), IMPLIES (→), IFF (↔) | Same as propositional + quantifiers |
Examples:
Propositional Logic:
- P = "Socrates is a man" (True/False)
- Q = "Socrates is mortal" (True/False)
- Expression: P → Q
Predicate Logic:
- Man(Socrates) - Socrates is a man
- Mortal(x) - x is mortal
- Expression: ∀x (Man(x) → Mortal(x))
Key Difference: Propositional logic treats statements as atomic units, while predicate logic can analyze the internal structure of statements using predicates and quantifiers.
4.7 Propositional Rules of Inference
Q2. ⭐ Describe the Modus Ponens as propositional rule of inference.
[ Jun 2023 Q1(d) | Frequency: 1 ]
Modus Ponens (Latin: "mode of affirming") is one of the most fundamental and widely used rules of inference in propositional logic.
Formal Definition:
Premise 1: P → Q (If P then Q)
Premise 2: P (P is true)
─────────────────────
Conclusion: Q (Therefore, Q is true)
Symbolic Representation: P → Q, P ⊢ Q
Explanation:
- If we know that "P implies Q" is true
- And we also know that "P" is true
- Then we can validly conclude that "Q" must be true
Example 1:
- Premise 1: If it rains, the ground gets wet (Rain → WetGround)
- Premise 2: It is raining (Rain)
- Conclusion: The ground is wet (WetGround)
Example 2:
- Premise 1: If a student studies hard, they will pass (Study → Pass)
- Premise 2: John studies hard (Study)
- Conclusion: John will pass (Pass)
Truth Table Verification:
| P | Q | P → Q | Valid when P=T and P→Q=T |
|---|---|---|---|
| T | T | T | Q = T ✓ |
| T | F | F | Invalid premise |
| F | T | T | P not satisfied |
| F | F | T | P not satisfied |
Importance in AI: Modus Ponens is the foundation of forward chaining in rule-based expert systems.
Q3. ⭐ Describe the Modus Tollens as propositional rule of inference.
[ Jun 2023 Q1(d) | Frequency: 1 ]
Modus Tollens (Latin: "mode of denying") is a rule of inference that allows us to derive a negation from a conditional statement and the negation of its consequent.
Formal Definition:
Premise 1: P → Q (If P then Q)
Premise 2: ¬Q (Q is false)
─────────────────────
Conclusion: ¬P (Therefore, P is false)
Symbolic Representation: P → Q, ¬Q ⊢ ¬P
Explanation:
- If we know that "P implies Q" is true
- And we also know that "Q" is false
- Then we can validly conclude that "P" must be false
Example 1:
- Premise 1: If it rains, the ground gets wet (Rain → WetGround)
- Premise 2: The ground is not wet (¬WetGround)
- Conclusion: It is not raining (¬Rain)
Example 2:
- Premise 1: If the battery is charged, the car will start (Charged → Start)
- Premise 2: The car does not start (¬Start)
- Conclusion: The battery is not charged (¬Charged)
Comparison of Modus Ponens and Modus Tollens:
| Aspect | Modus Ponens | Modus Tollens |
|---|---|---|
| Direction | Forward reasoning | Backward reasoning |
| Given | Antecedent is true | Consequent is false |
| Concludes | Consequent is true | Antecedent is false |
| Form | P→Q, P ⊢ Q | P→Q, ¬Q ⊢ ¬P |
| AI Application | Forward chaining | Backward chaining |
Importance in AI: Modus Tollens is fundamental for backward chaining and goal-driven reasoning in expert systems.
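The comparison above can be made concrete with two tiny one-step inference functions. This is an illustrative sketch only: the knowledge-base encoding (implications as atom pairs) is an assumption for the example, not the course's notation.

```python
# Modus Ponens and Modus Tollens as one-step inferences over a set of
# implications. rules: set of (P, Q) pairs, each meaning P -> Q.
def modus_ponens(rules, true_facts):
    """From P -> Q and P true, conclude Q (forward step)."""
    return {q for (p, q) in rules if p in true_facts}

def modus_tollens(rules, false_facts):
    """From P -> Q and Q false, conclude that P is false (backward step)."""
    return {p for (p, q) in rules if q in false_facts}

rules = {("Rain", "WetGround"), ("Charged", "Start")}
print(modus_ponens(rules, {"Rain"}))    # {'WetGround'}
print(modus_tollens(rules, {"Start"}))  # {'Charged'}, i.e. ¬Charged holds
```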
4.9 Validity and Satisfiability
Q4. 📌 Obtain Conjunctive Normal Form (CNF) for given formula.
[ Dec 2022 Q1(d) | Frequency: 1 ]
Conjunctive Normal Form (CNF): A formula is in CNF when it is a conjunction (AND) of clauses, where each clause is a disjunction (OR) of literals.
General Form: (L₁ ∨ L₂ ∨ ...) ∧ (L₃ ∨ L₄ ∨ ...) ∧ ...
Steps to Convert to CNF:
1. Eliminate implications (→): P → Q ≡ ¬P ∨ Q
2. Eliminate biconditionals (↔): P ↔ Q ≡ (P → Q) ∧ (Q → P)
3. Move negation inward (De Morgan's Laws):
   - ¬(P ∧ Q) ≡ ¬P ∨ ¬Q
   - ¬(P ∨ Q) ≡ ¬P ∧ ¬Q
   - ¬¬P ≡ P
4. Distribute OR over AND: P ∨ (Q ∧ R) ≡ (P ∨ Q) ∧ (P ∨ R)
Example: Convert (P → Q) ∧ (¬P → R) to CNF
Step 1: Eliminate implications
- P → Q ≡ ¬P ∨ Q
- ¬P → R ≡ ¬(¬P) ∨ R ≡ P ∨ R
Step 2: Combine with conjunction
- (¬P ∨ Q) ∧ (P ∨ R)
Result: (¬P ∨ Q) ∧ (P ∨ R) — Already in CNF
Example 2: Convert ¬(P ∧ Q) → R to CNF
Step 1: Eliminate implication
- ¬(P ∧ Q) → R ≡ ¬(¬(P ∧ Q)) ∨ R ≡ (P ∧ Q) ∨ R
Step 2: Distribute OR over AND
- (P ∧ Q) ∨ R ≡ (P ∨ R) ∧ (Q ∨ R)
Result: (P ∨ R) ∧ (Q ∨ R) — CNF achieved
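A CNF conversion can always be checked mechanically by comparing truth tables. The sketch below (standard library only) verifies the first worked example, (P → Q) ∧ (¬P → R) against its CNF (¬P ∨ Q) ∧ (P ∨ R), over all eight assignments.

```python
# Truth-table check that a formula and its CNF are logically equivalent.
from itertools import product

def implies(a, b):
    # Material implication: a -> b is (not a) or b
    return (not a) or b

for P, Q, R in product([True, False], repeat=3):
    original = implies(P, Q) and implies(not P, R)
    cnf = ((not P) or Q) and (P or R)
    assert original == cnf    # must agree on every assignment

print("Equivalent on all 8 assignments")
```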
Q5. 📌 Obtain Disjunctive Normal Form (DNF) for given well-formed formula.
[ Jun 2022 Q1(d) | Frequency: 1 ]
Disjunctive Normal Form (DNF): A formula is in DNF when it is a disjunction (OR) of clauses, where each clause is a conjunction (AND) of literals.
General Form: (L₁ ∧ L₂ ∧ ...) ∨ (L₃ ∧ L₄ ∧ ...) ∨ ...
Steps to Convert to DNF:
1. Eliminate implications (→): P → Q ≡ ¬P ∨ Q
2. Eliminate biconditionals (↔): P ↔ Q ≡ (P ∧ Q) ∨ (¬P ∧ ¬Q)
3. Move negation inward (De Morgan's Laws)
4. Distribute AND over OR: P ∧ (Q ∨ R) ≡ (P ∧ Q) ∨ (P ∧ R)
Example: Convert (P ∨ Q) ∧ (R ∨ S) to DNF
Step 1: Apply distribution of AND over OR
- (P ∨ Q) ∧ (R ∨ S)
- = (P ∧ (R ∨ S)) ∨ (Q ∧ (R ∨ S))
- = (P ∧ R) ∨ (P ∧ S) ∨ (Q ∧ R) ∨ (Q ∧ S)
Result: (P ∧ R) ∨ (P ∧ S) ∨ (Q ∧ R) ∨ (Q ∧ S) — DNF achieved
Example 2: Convert ¬(P → Q) to DNF
Step 1: Eliminate implication
- ¬(P → Q) ≡ ¬(¬P ∨ Q)
Step 2: Apply De Morgan's Law
- ¬(¬P ∨ Q) ≡ P ∧ ¬Q
Result: P ∧ ¬Q — Already in DNF (single term)
Comparison CNF vs DNF:
| Feature | CNF | DNF |
|---|---|---|
| Outer Connective | AND (∧) | OR (∨) |
| Inner Connective | OR (∨) | AND (∧) |
| Use in AI | Resolution theorem proving | Satisfiability checking |
| Example | (A ∨ B) ∧ (C ∨ D) | (A ∧ B) ∨ (C ∧ D) |
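The same truth-table technique verifies a DNF conversion; here it checks the worked example (P ∨ Q) ∧ (R ∨ S) against its four-term DNF over all sixteen assignments.

```python
# Truth-table check for the DNF worked example.
from itertools import product

for P, Q, R, S in product([True, False], repeat=4):
    original = (P or Q) and (R or S)
    dnf = (P and R) or (P and S) or (Q and R) or (Q and S)
    assert original == dnf    # must agree on every assignment

print("Equivalent on all 16 assignments")
```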
4.14 Propositional Resolution
Q6. 🔥 What do you understand by the term 'Resolution' in AI / Explain the concept.
[ Jun 2024 Q3(b), Dec 2023 Q3(b) | Frequency: 2 ]
Resolution is a powerful inference rule and proof technique used in automated theorem proving and AI systems for logical reasoning.
Definition: Resolution is a single inference rule that can be applied repeatedly to derive new clauses from existing ones, ultimately proving whether a statement is satisfiable or a consequence of given premises.
The Resolution Rule:
Clause 1: A ∨ B
Clause 2: ¬A ∨ C
─────────────────
Resolvent: B ∨ C
Formal Definition:

(L ∨ α), (¬L ∨ β) ⊢ (α ∨ β)

Where L is a literal, and α, β are disjunctions of literals.
Key Concepts:
- Complementary Literals: Two literals are complementary if one is the negation of the other (e.g., P and ¬P)
- Resolvent: The new clause produced by resolving two clauses
- Empty Clause (□): When resolution produces an empty clause, it indicates a contradiction
Resolution Procedure:
1. Convert all statements to CNF
2. Negate the conclusion and add it to the clause set
3. Repeatedly apply the resolution rule
4. If the empty clause (□) is derived → Original statement is proved
5. If no new clauses can be derived → Statement cannot be proved
Q7. 📌 Discuss the utility of Resolution mechanism in AI.
[ Jun 2024 Q3(b) | Frequency: 1 ]
Utility of Resolution Mechanism in AI:
| Application Area | Utility |
|---|---|
| Theorem Proving | Automated proof verification in mathematical systems |
| Expert Systems | Deriving conclusions from knowledge bases |
| Logic Programming | Foundation of Prolog language |
| Knowledge Representation | Verifying consistency of knowledge bases |
| Planning Systems | Proving achievability of goals |
Key Advantages:
- Completeness: Resolution is refutation-complete — if a contradiction exists, resolution will find it
- Uniform Procedure: Single rule applicable to all logical problems
- Mechanical Nature: Can be fully automated by computers
- Sound: Never produces incorrect conclusions
- Efficient for Certain Problems: Works well with Horn clauses
Practical Applications:
- Prolog: Uses SLD-resolution for query answering
- SAT Solvers: Use resolution-based techniques
- Verification Systems: Prove program correctness
- Database Query Optimization: Logical query processing
Q8. 📌 Discuss the relevance of resolution mechanism in AI briefly.
[ Jun 2022 Q1(e) | Frequency: 1 ]
Relevance of Resolution in AI:
1. Foundation of Logic Programming:
- Prolog and similar languages are built on resolution
- Enables the declarative programming paradigm
2. Automated Reasoning:
- Provides a systematic method for machine reasoning
- Eliminates the need for human intervention in proofs
3. Knowledge Base Validation:
- Checks consistency of expert system knowledge bases
- Identifies contradictory rules
4. Query Answering Systems:
- Derives answers from database facts and rules
- Powers intelligent information retrieval
5. AI Planning:
- Proves whether goals are achievable
- Validates plan correctness
Why Resolution is Preferred:
| Feature | Benefit |
|---|---|
| Single Rule | Simplifies implementation |
| Refutation Complete | Guarantees finding proofs if they exist |
| Works with CNF | Standard form simplifies processing |
| Suitable for Automation | No human insight needed |
Q9. 🔥 Apply Resolution to conclude statement from knowledge / with example.
[ Jun 2024 Q3(b), Dec 2023 Q3(b) | Frequency: 2 ]
Example Problem: Prove "Socrates is mortal" from the following knowledge base:
- Socrates is a man
- All men are mortal
Step 1: Represent in Predicate Logic
- Man(Socrates)
- ∀x (Man(x) → Mortal(x))
Step 2: Convert to Propositional Form (for simplicity)
- P: Socrates is a man
- Q: Socrates is mortal
- Knowledge: P, P → Q
- Goal: Prove Q
Step 3: Convert to CNF
- Clause 1: P (given)
- Clause 2: ¬P ∨ Q (from P → Q)
- Clause 3: ¬Q (negation of goal for refutation)
Step 4: Apply Resolution
Clause 1: P
Clause 2: ¬P ∨ Q
Clause 3: ¬Q
Resolution Step 1: Resolve Clause 1 and Clause 2
P
¬P ∨ Q
─────────
Q (Resolvent 1)
Resolution Step 2: Resolve Resolvent 1 and Clause 3
Q
¬Q
─────────
□ (Empty Clause - Contradiction!)
Conclusion: Since we derived the empty clause (□), the negation of our goal leads to a contradiction. Therefore, the original goal "Socrates is mortal" is PROVED.
Another Example: Prove R from {P, P→Q, Q→R}
| Step | Clauses Used | Resolvent |
|---|---|---|
| 1 | P, ¬P∨Q | Q |
| 2 | Q, ¬Q∨R | R |
| 3 | R, ¬R | □ |
Result: Goal R is proved by deriving empty clause.
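The refutation procedure above is short enough to automate. The sketch below is a minimal propositional resolution prover (clause and literal encoding are assumptions of this sketch: clauses are frozensets of string literals, with "~P" for ¬P); run on the Socrates clause set, it derives the empty clause.

```python
# Propositional resolution by refutation on the Socrates example.
def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    """All resolvents of two clauses (one per complementary literal pair)."""
    out = []
    for lit in c1:
        if negate(lit) in c2:
            out.append((c1 - {lit}) | (c2 - {negate(lit)}))
    return out

def refute(clauses):
    """True iff the empty clause is derivable, i.e. the goal is proved."""
    clauses = set(clauses)
    while True:
        new = set()
        for a in clauses:
            for b in clauses:
                if a == b:
                    continue
                for r in resolve(a, b):
                    if not r:
                        return True       # empty clause: contradiction found
                    new.add(frozenset(r))
        if new <= clauses:
            return False                  # nothing new: cannot prove goal
        clauses |= new

kb = [frozenset({"P"}),        # Socrates is a man
      frozenset({"~P", "Q"}),  # P -> Q in clause form (all men are mortal)
      frozenset({"~Q"})]       # negated goal
print(refute(kb))  # True -> "Socrates is mortal" is proved
```

The first pass resolves P with ¬P ∨ Q to get Q; the second resolves Q with ¬Q to produce □, matching the hand derivation.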
Summary Table: Unit 4 Key Concepts
| Topic | Key Points |
|---|---|
| Propositional vs Predicate Logic | Predicate adds quantifiers (∀,∃) and predicates |
| Modus Ponens | P→Q, P ⊢ Q (forward reasoning) |
| Modus Tollens | P→Q, ¬Q ⊢ ¬P (backward reasoning) |
| CNF | AND of ORs — used in resolution |
| DNF | OR of ANDs — used in satisfiability |
| Resolution | Single inference rule for automated theorem proving |
UNIT 5: First Order Logic - Exam Answers
5.2 Syntax of First Order Predicate Logic (FOPL)
Q1. 🔥 Symbolize "Every rational number is a real number" using predicate logic.
[ Dec 2024 Q1(d)(i), Dec 2023 Q3(a)(i) | Frequency: 2 ]
Given Statement: "Every rational number is a real number"
Step 1: Identify Predicates
- Let R(x) = "x is a rational number"
- Let E(x) = "x is a real number"
Step 2: Analyze the Statement
- "Every" indicates Universal Quantifier (∀)
- The statement means: For all x, if x is rational, then x is real
Step 3: Symbolic Representation

∀x (R(x) → E(x))

Interpretation: For all x, if x is a rational number, then x is a real number.
Alternative Notation: ∀x (Rational(x) → Real(x))
Q2. 🔥 Symbolize "Some real numbers are rational numbers" using predicate logic.
[ Dec 2024 Q1(d)(ii), Dec 2023 Q3(a)(ii) | Frequency: 2 ]
Given Statement: "Some real numbers are rational numbers"
Step 1: Identify Predicates
- Let R(x) = "x is a rational number"
- Let E(x) = "x is a real number"
Step 2: Analyze the Statement
- "Some" indicates Existential Quantifier (∃)
- The statement means: There exists at least one x that is both real AND rational
Step 3: Symbolic Representation

∃x (E(x) ∧ R(x))

Interpretation: There exists an x such that x is a real number AND x is a rational number.
Important Note: With existential quantifier, we use conjunction (∧) not implication (→).
Q3. 🔥 Symbolize "Not every real number is a rational number" using predicate logic.
[ Dec 2024 Q1(d)(iii), Dec 2023 Q3(a)(iii) | Frequency: 2 ]
Given Statement: "Not every real number is a rational number"
Step 1: Identify Predicates
- Let R(x) = "x is a rational number"
- Let E(x) = "x is a real number"
Step 2: Analyze the Statement
- "Not every" = negation of the universal quantifier
- Equivalent to: "There exists some real number that is not rational"
Step 3: Symbolic Representation
Method 1 (Direct Negation): ¬∀x (E(x) → R(x))
Method 2 (Equivalent Form using De Morgan's for Quantifiers): ∃x (E(x) ∧ ¬R(x))
Equivalence Explanation:
- ¬∀x P(x) ≡ ∃x ¬P(x)
- ¬(E(x) → R(x)) ≡ E(x) ∧ ¬R(x)
Interpretation: There exists an x such that x is a real number AND x is NOT a rational number (e.g., √2, π).
5.4 Semantics of Quantifiers
Q4. 📌 Translate "∃x C(x)" into English (car dealer context).
[ Jun 2024 Q3(a)(i) | Frequency: 1 ]
Given: ∃x C(x)
Where: C(x) = "x is a car dealer"
Translation: "There exists an x such that x is a car dealer."
Alternative Translations:
- "Some person is a car dealer"
- "At least one car dealer exists"
- "There is a car dealer"
Q5. 📌 Translate "∃x H(x)" into English (honest context).
[ Jun 2024 Q3(a)(ii) | Frequency: 1 ]
Given: ∃x H(x)
Where: H(x) = "x is honest"
Translation: "There exists an x such that x is honest."
Alternative Translations:
- "Some person is honest"
- "At least one honest person exists"
- "There is an honest person"
Q6. 🔥 Translate "∀x C(x)→∼H(x)" into English (car dealer/honest context).
[ Jun 2025 Q1(d)(i), Jun 2024 Q3(a)(iii) | Frequency: 2 ]
Given: ∀x (C(x) → ¬H(x))
Where:
- C(x) = "x is a car dealer"
- H(x) = "x is honest"
Translation: "For all x, if x is a car dealer, then x is not honest."
Alternative Translations:
- "Every car dealer is dishonest"
- "For every person, if they are a car dealer, then they are not honest"
- "No car dealer is honest"
Logical Breakdown:
| Component | Meaning |
|---|---|
| ∀x | For all x |
| C(x) | x is a car dealer |
| → | implies |
| ¬H(x) | x is not honest |
Q7. 🔥 Translate "∃x (C(x)∧H(x))" into English (car dealer/honest context).
[ Jun 2025 Q1(d)(ii), Jun 2024 Q3(a)(iv) | Frequency: 2 ]
Given: ∃x (C(x) ∧ H(x))
Where:
- C(x) = "x is a car dealer"
- H(x) = "x is honest"
Translation: "There exists an x such that x is a car dealer AND x is honest."
Alternative Translations:
- "There exists a car dealer who is honest"
- "At least one person is both a car dealer and honest"
- "There is an honest car dealer"
Note: The conjunction (∧) indicates that both properties must hold simultaneously for the same individual.
Q8. 🔥 Translate "∃x (H(x)→C(x))" into English (car dealer/honest context).
[ Jun 2025 Q1(d)(iii), Jun 2024 Q3(a)(v) | Frequency: 2 ]
Given: ∃x (H(x) → C(x))
Where:
- C(x) = "x is a car dealer"
- H(x) = "x is honest"
Translation: "There exists an x such that if x is honest, then x is a car dealer."
Logical Analysis:
Since P → Q ≡ ¬P ∨ Q, we have:
- H(x) → C(x) ≡ ¬H(x) ∨ C(x)
Equivalent Translations:
- "There exists someone who is either not honest or is a car dealer"
- "Some person being honest implies they are a car dealer"
Important Observation: This statement is TRUE if there exists anyone who is:
- Dishonest (regardless of being a car dealer), OR
- A car dealer (regardless of being honest)
5.6 Conversion to Clausal Form
Q9. ⭐ What is Prenex Normal Form (PNF)?
[ Jun 2023 Q1(e) | Frequency: 1 ]
Prenex Normal Form (PNF) is a standardized way of writing First Order Logic formulas where all quantifiers appear at the beginning (prefix) of the formula.
Definition: A formula is in PNF if it has the form:

Q₁x₁ Q₂x₂ ... Qₙxₙ M

Where:
- Q₁, Q₂, ..., Qₙ are quantifiers (∀ or ∃)
- x₁, x₂, ..., xₙ are variables
- M is a quantifier-free formula (called the matrix)
Structure:
| Part | Description |
|---|---|
| Prefix | String of quantifiers (∀x₁∃x₂∀x₃...) |
| Matrix | Quantifier-free formula containing predicates and connectives |
Example:
- Not in PNF: ∀x P(x) → ∃y Q(y)
- In PNF: ∃x ∃y (¬P(x) ∨ Q(y))
Importance in AI:
- Required for resolution-based theorem proving
- Standardizes formulas for automated processing
- Simplifies unification algorithms
Q10. ⭐ Write steps to transform FOPL to PNF.
[ Jun 2024 Q1(c) | Frequency: 1 ]
Steps to Convert FOPL Formula to Prenex Normal Form:
| Step | Operation | Example |
|---|---|---|
| 1 | Eliminate implications (→) and biconditionals (↔) | P→Q becomes ¬P∨Q |
| 2 | Move negations inward using De Morgan's laws | ¬(P∧Q) becomes ¬P∨¬Q |
| 3 | Standardize variables (rename if needed) | Ensure each quantifier uses unique variable |
| 4 | Move all quantifiers to the front | Pull quantifiers to prefix position |
Detailed Rules for Step 4:
¬∀x P(x) ≡ ∃x ¬P(x)
¬∃x P(x) ≡ ∀x ¬P(x)
(∀x P(x)) ∧ Q ≡ ∀x (P(x) ∧ Q) [if x not free in Q]
(∀x P(x)) ∨ Q ≡ ∀x (P(x) ∨ Q) [if x not free in Q]
(∃x P(x)) ∧ Q ≡ ∃x (P(x) ∧ Q) [if x not free in Q]
(∃x P(x)) ∨ Q ≡ ∃x (P(x) ∨ Q) [if x not free in Q]
Q11. ⭐ Apply the steps to transform given formula to PNF.
[ Jun 2024 Q1(c), Jun 2023 Q1(e) | Frequency: 2 ]
Example: Convert ∀x P(x) → ∃y Q(y) to PNF
Step 1: Eliminate Implication
- ¬∀x P(x) ∨ ∃y Q(y)
Step 2: Move Negation Inward
- ∃x ¬P(x) ∨ ∃y Q(y)
Step 3: Standardize Variables
- Variables x and y are already distinct ✓
Step 4: Move Quantifiers to Front
- ∃x ∃y (¬P(x) ∨ Q(y))
Final PNF: ∃x ∃y (¬P(x) ∨ Q(y))
Another Example: Convert ∀x(P(x) → ∀y Q(x,y)) to PNF
| Step | Transformation |
|---|---|
| Original | ∀x(P(x) → ∀y Q(x,y)) |
| Eliminate → | ∀x(¬P(x) ∨ ∀y Q(x,y)) |
| Move ∀y out | ∀x ∀y(¬P(x) ∨ Q(x,y)) |
Final PNF: ∀x ∀y (¬P(x) ∨ Q(x,y))
Q12. 🔥 What is Skolemization?
[ Dec 2023 Q1(c), Dec 2022 Q1(e) | Frequency: 2 ]
Skolemization is a technique for eliminating existential quantifiers (∃) from a formula while preserving satisfiability.
Definition: The process of replacing existentially quantified variables with Skolem constants or Skolem functions.
Why Skolemization?
- Resolution requires formulas in clausal form
- Clausal form cannot have existential quantifiers
- Skolemization removes ∃ while preserving satisfiability
Rules of Skolemization:
| Case | Rule | Example |
|---|---|---|
| ∃x not in scope of ∀ | Replace with Skolem constant | ∃x P(x) → P(a) |
| ∃x in scope of ∀y | Replace with Skolem function | ∀y∃x P(x,y) → ∀y P(f(y),y) |
Key Points:
- Skolem constants are new unique constants (a, b, c...)
- Skolem functions are new unique functions (f, g, h...)
- The resulting formula is satisfiability-equivalent (not logically equivalent)
Q13. ⭐ What is the utility of Skolemization?
[ Dec 2023 Q1(c) | Frequency: 1 ]
Utility of Skolemization in AI:
| Purpose | Description |
|---|---|
| Clause Form Conversion | Enables conversion to clausal form for resolution |
| Automated Theorem Proving | Essential step in resolution-based provers |
| Eliminate ∃ Quantifiers | Removes existential quantifiers systematically |
| Preserve Satisfiability | Maintains satisfiability for proof procedures |
| Simplify Processing | Reduces complexity of logical formulas |
Importance in Resolution:
- Resolution works only with universally quantified clauses
- Skolemization removes all ∃ quantifiers
- After Skolemization, all variables are implicitly universally quantified
- Enables the unification algorithm to work properly
Workflow:
Original Formula → PNF → Skolemization → CNF → Resolution
Q14. 🔥 Skolemize the given expression.
[ Dec 2023 Q1(c), Dec 2022 Q1(e) | Frequency: 2 ]
Example 1: Skolemize ∃x ∀y P(x, y)
Analysis:
- ∃x is NOT in the scope of any ∀
- Replace x with Skolem constant 'a'
Result: ∀y P(a, y)
Example 2: Skolemize ∀x ∃y P(x, y)
Analysis:
- ∃y IS in the scope of ∀x
- Replace y with Skolem function f(x)
Result: ∀x P(x, f(x))
Example 3: Skolemize ∀x ∀y ∃z P(x, y, z)
Analysis:
- ∃z is in the scope of both ∀x and ∀y
- Replace z with Skolem function f(x, y)
Result: ∀x ∀y P(x, y, f(x, y))
Summary Table:
| Original | Skolemized Form |
|---|---|
| ∃x P(x) | P(a) |
| ∀x ∃y R(x,y) | ∀x R(x, f(x)) |
| ∃x ∀y ∃z Q(x,y,z) | ∀y Q(a, y, g(y)) |
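The two Skolemization rules can be sketched for formulas already in PNF. Everything about the representation here is an assumption of this sketch (the prefix as a list of (quantifier, variable) pairs, the matrix as a plain string, and the generated names f1, a1, ...); the variable substitution is a naive string replace, fine for these small examples.

```python
# Skolemization sketch for PNF formulas: each existential variable becomes
# a Skolem constant (no enclosing foralls) or a Skolem function of all
# enclosing universally quantified variables.
def skolemize(prefix, matrix):
    universals, out_prefix, count = [], [], 0
    for quantifier, var in prefix:
        if quantifier == "forall":
            universals.append(var)
            out_prefix.append((quantifier, var))   # foralls are kept
        else:  # "exists" is eliminated
            count += 1
            if universals:
                term = f"f{count}({', '.join(universals)})"  # Skolem function
            else:
                term = f"a{count}"                           # Skolem constant
            matrix = matrix.replace(var, term)   # naive substitution
    return out_prefix, matrix

# ∀x ∃y P(x, y)  ->  ∀x P(x, f1(x))
print(skolemize([("forall", "x"), ("exists", "y")], "P(x, y)"))
# ∃x ∀y P(x, y)  ->  ∀y P(a1, y)
print(skolemize([("exists", "x"), ("forall", "y")], "P(x, y)"))
```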
5.7 Resolution & Unification
Q15. 🔥 Explain the concept of unification in AI.
[ Dec 2023 Q3(b), Jun 2022 Q1(e) | Frequency: 2 ]
Unification is a fundamental process in AI that finds substitutions to make two logical expressions identical.
Definition: Unification is the process of finding a substitution (called the Most General Unifier or MGU) that makes two expressions syntactically identical.
Key Concepts:
| Term | Definition |
|---|---|
| Substitution | Mapping of variables to terms {x/a, y/f(b)} |
| Unifier | A substitution that makes two expressions identical |
| MGU | Most General Unifier - the simplest unifier |
| Unifiable | Two expressions that can be made identical |
Unification Algorithm Steps:
1. If both expressions are constants, they unify if identical
2. If one is a variable, substitute it with the other term
3. If both are compound terms, unify functor and arguments recursively
4. Apply the occurs check to prevent infinite substitutions
Relevance in AI:
- Essential for resolution-based theorem proving
- Used in Prolog for pattern matching
- Enables logical inference in expert systems
Q16. ⭐ Explain unification in AI with suitable example.
[ Dec 2023 Q3(b) | Frequency: 1 ]
Unification Examples:
Example 1: Simple Unification
Unify: P(x, a) and P(b, a)
| Step | Action | Result |
|---|---|---|
| 1 | Compare functors | P = P ✓ |
| 2 | Compare first args | x and b → substitute {x/b} |
| 3 | Compare second args | a = a ✓ |
MGU: {x/b} Unified Expression: P(b, a)
Example 2: Complex Unification
Unify: P(x, f(y)) and P(a, f(b))
| Step | Expression 1 | Expression 2 | Substitution |
|---|---|---|---|
| 1 | P | P | Match ✓ |
| 2 | x | a | {x/a} |
| 3 | f(y) | f(b) | Unify recursively |
| 4 | y | b | {y/b} |
MGU: {x/a, y/b} Unified Expression: P(a, f(b))
Example 3: Unification Failure
Unify: P(x, x) and P(a, b)
| Step | Action | Result |
|---|---|---|
| 1 | x unifies with a | {x/a} |
| 2 | Apply {x/a} → P(a, a) | |
| 3 | Compare a with b | a ≠ b → FAIL |
Result: Cannot unify (no MGU exists)
Example 4: Occurs Check
Unify: P(x) and P(f(x))
| Step | Action | Result |
|---|---|---|
| 1 | Try {x/f(x)} | |
| 2 | Occurs check: x in f(x)? | YES → FAIL |
Result: Cannot unify (would create infinite term)
Summary of Unification Rules:
| Case | Result |
|---|---|
| Identical constants | Unify with {} |
| Different constants | FAIL |
| Variable with term (no occur) | Substitute |
| Variable in term (occurs) | FAIL |
| Same functor, same arity | Unify arguments |
| Different functors/arity | FAIL |
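The four worked examples above can all be reproduced by a compact unification algorithm with occurs check. The term representation is an assumption of this sketch: compounds are tuples ("f", arg1, ...), constants are strings, and variables are Var objects.

```python
# Unification with occurs check. Returns the MGU as a dict, or None on failure.
class Var:
    def __init__(self, name): self.name = name
    def __repr__(self): return self.name
    def __eq__(self, other): return isinstance(other, Var) and other.name == self.name
    def __hash__(self): return hash(self.name)

def walk(t, s):
    # Follow variable bindings in substitution s to their current value.
    while isinstance(t, Var) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    # Occurs check: does variable v appear inside term t?
    t = walk(t, s)
    if t == v:
        return True
    if isinstance(t, tuple):
        return any(occurs(v, a, s) for a in t[1:])
    return False

def unify(t1, t2, s=None):
    s = dict(s or {})
    t1, t2 = walk(t1, s), walk(t2, s)
    if t1 == t2:
        return s
    if isinstance(t1, Var):
        return None if occurs(t1, t2, s) else {**s, t1: t2}
    if isinstance(t2, Var):
        return unify(t2, t1, s)
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and len(t1) == len(t2) and t1[0] == t2[0]):
        for a, b in zip(t1[1:], t2[1:]):   # same functor, same arity
            s = unify(a, b, s)
            if s is None:
                return None
        return s
    return None    # different constants, functors, or arities

x, y = Var("x"), Var("y")
print(unify(("P", x, "a"), ("P", "b", "a")))              # Example 1: {x/b}
print(unify(("P", x, ("f", y)), ("P", "a", ("f", "b"))))  # Example 2: {x/a, y/b}
print(unify(("P", x, x), ("P", "a", "b")))                # Example 3: None
print(unify(("P", x), ("P", ("f", x))))                   # Example 4: None (occurs)
```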
Quick Reference: FOPL Symbolization Patterns
| English Pattern | FOPL Symbol |
|---|---|
| All A are B | ∀x (A(x) → B(x)) |
| Some A are B | ∃x (A(x) ∧ B(x)) |
| No A are B | ∀x (A(x) → ¬B(x)) |
| Not all A are B | ∃x (A(x) ∧ ¬B(x)) |
| Some A are not B | ∃x (A(x) ∧ ¬B(x)) |
UNIT 6: Rule Based Systems and Other Formalism
6.2 Rule Based Systems
Q1. 🔥 Explain Rule Based Systems in AI and its types.
[ Dec 2023 Q3(c), Jun 2022 Q5(a) | Frequency: 2 ]
Rule Based Systems (RBS): A Rule Based System is an AI system that uses a set of IF-THEN rules to represent knowledge and make inferences. It consists of a knowledge base containing rules and facts, and an inference engine that applies rules to derive new information or make decisions.
Components of Rule Based System:
| Component | Description |
|---|---|
| Knowledge Base | Contains domain-specific rules and facts |
| Inference Engine | Applies rules to facts to derive conclusions |
| Working Memory | Stores current facts and intermediate results |
| User Interface | Allows interaction with the system |
| Explanation Facility | Explains reasoning process to users |
Types of Rule Based Systems:
| Type | Description | Example |
|---|---|---|
| Forward Chaining (Data-Driven) | Starts with known facts and applies rules to derive new facts until goal is reached | Medical diagnosis systems |
| Backward Chaining (Goal-Driven) | Starts with goal and works backward to find supporting facts | Expert systems like MYCIN |
| Hybrid Systems | Combines both forward and backward chaining | Complex decision support systems |
Rule Format:
IF <condition/antecedent> THEN <action/consequent>
Example:
Rule 1: IF fever AND cough THEN suspect_flu
Rule 2: IF suspect_flu AND body_ache THEN diagnose_flu
Rule 3: IF diagnose_flu THEN prescribe_rest_and_fluids
Q2. ⭐ Give advantages of Rule Based Systems.
[ Dec 2023 Q3(c) | Frequency: 1 ]
Advantages of Rule Based Systems:
| # | Advantage | Description |
|---|---|---|
| 1 | Modularity | Rules can be added, modified, or deleted independently without affecting other rules |
| 2 | Transparency | Easy to understand how conclusions are reached; provides clear explanation trail |
| 3 | Separation of Knowledge | Knowledge (rules) is separated from control (inference engine) |
| 4 | Easy Maintenance | Simple to update and maintain as domain knowledge changes |
| 5 | Naturalness | IF-THEN rules closely resemble human reasoning patterns |
| 6 | Reusability | Same inference engine can be used with different knowledge bases |
| 7 | Consistency | Always produces consistent results for same inputs |
| 8 | Expert Knowledge Capture | Effectively captures and preserves domain expert knowledge |
| 9 | Incremental Development | System can be built and tested incrementally |
| 10 | Explanation Capability | Can explain its reasoning process to users |
Q3. ⭐ Give disadvantages of Rule Based Systems.
[ Dec 2023 Q3(c) | Frequency: 1 ]
Disadvantages of Rule Based Systems:
| # | Disadvantage | Description |
|---|---|---|
| 1 | Knowledge Acquisition Bottleneck | Difficult to extract and formalize expert knowledge into rules |
| 2 | Scalability Issues | Performance degrades as number of rules increases |
| 3 | Rule Conflicts | Multiple rules may fire simultaneously causing conflicts |
| 4 | Brittleness | Cannot handle situations not covered by existing rules |
| 5 | No Learning Capability | Cannot learn from experience; rules must be manually updated |
| 6 | Combinatorial Explosion | Complex domains require enormous number of rules |
| 7 | Maintenance Complexity | Large rule bases become difficult to maintain and debug |
| 8 | Limited Reasoning | Cannot handle uncertain or incomplete information well |
| 9 | Inefficiency | May check irrelevant rules repeatedly |
| 10 | Domain Dependency | Rules are specific to one domain; not easily transferable |
Q4. 📌 Give the two important sources of uncertainty in Rule Based Systems.
[ Dec 2023 Q3(c) | Frequency: 1 ]
Two Important Sources of Uncertainty in Rule Based Systems:
1. Uncertainty in Rules (Rule Uncertainty):
- Rules themselves may not be absolutely certain
- Expert knowledge is often approximate or probabilistic
- The relationship between conditions and conclusions may not be 100% reliable
Example:
IF patient_has_fever AND patient_has_cough
THEN patient_has_flu (Certainty Factor: 0.7)
The rule only suggests 70% confidence in the conclusion.
2. Uncertainty in Data/Facts (Data Uncertainty):
- Input data may be incomplete, imprecise, or unreliable
- Sensor readings may have errors
- User-provided information may be uncertain or vague
Example:
Fact: patient_has_fever (Confidence: 0.8)
The observation itself has only 80% reliability.
Handling Uncertainty:
- Certainty Factors (CF)
- Fuzzy Logic
- Bayesian Probability
- Dempster-Shafer Theory
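The two uncertainty sources above combine when a rule fires. A simplified MYCIN-style sketch (the min-of-premises rule and the values are illustrative assumptions, not from the course text): the conclusion's certainty is the rule's CF scaled by the weakest premise.

```python
# Simplified certainty-factor propagation: CF(conclusion) =
# CF(rule) * min(CF of the premises). Values are illustrative.
def propagate_cf(rule_cf, premise_cfs):
    return rule_cf * min(premise_cfs)

# Rule: fever AND cough -> flu (CF 0.7); fever observed with CF 0.8, cough 1.0
cf = propagate_cf(0.7, [0.8, 1.0])
print(round(cf, 2))  # 0.56
```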
6.2.1 Forward Chaining
Q5. 🔥 Explain Forward Chaining Systems with example.
[ Jun 2024 Q1(d), Dec 2022 Q5(a) | Frequency: 2 ]
Forward Chaining: Forward chaining is a data-driven reasoning strategy that starts with available facts and applies inference rules to extract more data until a goal is reached. It works from antecedent (IF part) to consequent (THEN part).
Characteristics:
- Also called bottom-up reasoning or data-driven reasoning
- Begins with known facts in working memory
- Applies all applicable rules to generate new facts
- Continues until the goal is achieved or no more rules can fire
Algorithm:
1. Start with initial facts in working memory
2. Find all rules whose conditions (IF part) match the facts
3. Fire the matched rules and add conclusions to working memory
4. Repeat steps 2-3 until the goal is reached or no rules fire
Diagram:
Example:
Knowledge Base (Rules):
R1: IF animal_has_feathers THEN animal_is_bird
R2: IF animal_is_bird AND animal_cannot_fly THEN animal_is_penguin
R3: IF animal_is_penguin THEN animal_lives_in_cold_regions
Initial Facts:
- animal_has_feathers
- animal_cannot_fly
Forward Chaining Process:
| Step | Working Memory | Rule Fired | New Fact Added |
|---|---|---|---|
| 1 | animal_has_feathers, animal_cannot_fly | R1 | animal_is_bird |
| 2 | animal_has_feathers, animal_cannot_fly, animal_is_bird | R2 | animal_is_penguin |
| 3 | All above + animal_is_penguin | R3 | animal_lives_in_cold_regions |
Applications:
- Planning systems
- Monitoring and control systems
- Configuration systems
- CLIPS, OPS5 expert systems
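The forward chaining process above (fire every rule whose conditions are in working memory until nothing new is added) can be sketched directly:

```python
# Minimal forward-chaining sketch for the penguin example above.
# Each rule is a (set_of_conditions, conclusion) pair; facts live in a
# working-memory set, and rules fire until no new fact is added.

rules = [
    ({"animal_has_feathers"}, "animal_is_bird"),                     # R1
    ({"animal_is_bird", "animal_cannot_fly"}, "animal_is_penguin"),  # R2
    ({"animal_is_penguin"}, "animal_lives_in_cold_regions"),         # R3
]

def forward_chain(facts, rules):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)   # fire the rule
                changed = True
    return facts

derived = forward_chain({"animal_has_feathers", "animal_cannot_fly"}, rules)
print("animal_lives_in_cold_regions" in derived)  # True
```

The loop reproduces the three steps of the table: R1, then R2, then R3 fire, each adding one new fact to working memory.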
6.2.2 Backward Chaining
Q6. 🔥 Explain Backward Chaining System with example.
[ Dec 2023 Q1(d), Jun 2023 Q5(a) | Frequency: 2 ]
Backward Chaining: Backward chaining is a goal-driven reasoning strategy that starts with a goal (hypothesis) and works backward to find facts that support the goal. It works from consequent (THEN part) to antecedent (IF part).
Characteristics:
- Also called top-down reasoning or goal-driven reasoning
- Begins with a goal to prove
- Finds rules whose conclusions match the goal
- Sets rule conditions as new subgoals
- Continues until all subgoals are proven by known facts
Algorithm:
1. Start with the goal to be proven
2. Find rules whose THEN part matches the goal
3. Set the IF conditions of matching rules as subgoals
4. Recursively try to prove each subgoal
5. If a subgoal matches a known fact, it is proven
6. The goal is proven when all subgoals are proven
Diagram:
Example:
Knowledge Base (Rules):
R1: IF has_feathers THEN is_bird
R2: IF is_bird AND cannot_fly THEN is_penguin
R3: IF is_penguin THEN lives_in_cold
Known Facts: has_feathers, cannot_fly
Goal: Prove "lives_in_cold"
Backward Chaining Process:
| Step | Current Goal | Matching Rule | Subgoals Generated |
|---|---|---|---|
| 1 | lives_in_cold | R3 | is_penguin |
| 2 | is_penguin | R2 | is_bird, cannot_fly |
| 3 | is_bird | R1 | has_feathers |
| 4 | has_feathers | — | Known fact ✓ |
| 5 | cannot_fly | — | Known fact ✓ |
Result: Goal "lives_in_cold" is PROVEN
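The recursive subgoaling traced in the table can be sketched in a few lines: a goal is proven if it is a known fact, or if some rule concludes it and all of that rule's conditions can themselves be proven.

```python
# Minimal backward-chaining sketch for the same knowledge base.

rules = [
    ({"has_feathers"}, "is_bird"),                 # R1
    ({"is_bird", "cannot_fly"}, "is_penguin"),     # R2
    ({"is_penguin"}, "lives_in_cold"),             # R3
]
facts = {"has_feathers", "cannot_fly"}

def prove(goal, rules, facts):
    if goal in facts:                   # subgoal matches a known fact
        return True
    for conditions, conclusion in rules:
        if conclusion == goal and all(prove(c, rules, facts) for c in conditions):
            return True                 # all subgoals of this rule proven
    return False

print(prove("lives_in_cold", rules, facts))  # True
```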
Applications:
- Diagnostic systems (MYCIN)
- Query answering systems
- Prolog programming language
- Theorem proving
Comparison:
| Aspect | Forward Chaining | Backward Chaining |
|---|---|---|
| Direction | Data → Goal | Goal → Data |
| Also called | Bottom-up | Top-down |
| Best for | Multiple goals possible | Specific goal to prove |
| Efficiency | May derive irrelevant facts | Focused on goal |
6.3 Semantic Nets
Q7. 🔥 Explain Semantic Nets with suitable example.
[ Jun 2024 Q3(c)(iii), Dec 2022 Q5(b) | Frequency: 2 ]
Semantic Networks: Semantic networks (Semantic Nets) are knowledge representation schemes that use a graph structure consisting of nodes and labeled edges (arcs) to represent objects, concepts, and relationships between them.
Components:
| Component | Representation | Description |
|---|---|---|
| Nodes | Circles/Boxes | Represent objects, concepts, or entities |
| Edges/Arcs | Labeled arrows | Represent relationships between nodes |
| Labels | Text on edges | Describe the type of relationship |
Common Relationship Types:
| Relationship | Meaning | Example |
|---|---|---|
| IS-A | Class membership/inheritance | Dog IS-A Animal |
| HAS-A | Part-whole/composition | Car HAS-A Engine |
| CAN | Capability | Bird CAN Fly |
| INSTANCE-OF | Specific instance of class | Tommy INSTANCE-OF Dog |
Example: Representing Animal Kingdom
Diagram:
Advantages of Semantic Networks:
- Visual and intuitive representation
- Supports inheritance through IS-A links
- Easy to understand relationships
- Natural for representing taxonomies
- Efficient for certain types of queries
Disadvantages:
- No standard definition of link types
- Difficult to represent complex relationships
- Limited expressiveness for logic
- Inheritance can be ambiguous
Applications:
- Natural Language Processing
- Knowledge bases
- Information retrieval
- Conceptual modeling
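A semantic net can be sketched as a set of (node, relation, node) triples; inheritance queries then just follow IS-A/INSTANCE-OF links upward (the triples below use the relationship examples from the table above):

```python
# A semantic net as (node, relation, node) triples.

triples = {
    ("Dog", "IS-A", "Animal"),
    ("Bird", "IS-A", "Animal"),
    ("Tommy", "INSTANCE-OF", "Dog"),
    ("Bird", "CAN", "Fly"),
    ("Car", "HAS-A", "Engine"),
}

def isa_chain(node, triples):
    """Follow IS-A / INSTANCE-OF links upward to collect all ancestors."""
    ancestors = set()
    frontier = [node]
    while frontier:
        current = frontier.pop()
        for s, rel, o in triples:
            if s == current and rel in ("IS-A", "INSTANCE-OF") and o not in ancestors:
                ancestors.add(o)
                frontier.append(o)
    return ancestors

print(sorted(isa_chain("Tommy", triples)))  # ['Animal', 'Dog']
```

This is exactly the inheritance benefit listed above: Tommy is a Dog, and through the IS-A link also an Animal.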
Q8. ⭐ Draw a semantic network for given English statement.
[ Dec 2024 Q5(a) | Frequency: 1 ]
Statement Example: "John owns a red car. The car has four wheels. John is a person who works at IBM."
Diagram:
General Steps to Draw Semantic Network:
| Step | Action |
|---|---|
| 1 | Identify all entities/objects (nouns) → Create nodes |
| 2 | Identify relationships (verbs, prepositions) → Create labeled edges |
| 3 | Identify properties/attributes → Create property nodes |
| 4 | Connect nodes with appropriate relationship labels |
| 5 | Use IS-A for class membership, INSTANCE-OF for specific objects |
6.4 Frames
Q9. 🔥 Explain Frames with suitable example.
[ Jun 2024 Q3(c)(i), Jun 2022 Q5(b) | Frequency: 2 ]
Frames: Frames are a structured knowledge representation scheme proposed by Marvin Minsky (1975). A frame represents a stereotyped situation or object as a collection of slots (attributes) and their values (fillers). Frames capture both declarative and procedural knowledge.
Structure of a Frame:
| Component | Description |
|---|---|
| Frame Name | Unique identifier for the frame |
| Slots | Attributes or properties of the frame |
| Facets | Meta-information about slots (type, default, constraints) |
| Fillers/Values | Actual values assigned to slots |
| Procedures | Actions triggered when slots are accessed/modified |
Slot Facets:
| Facet | Purpose |
|---|---|
| Value | Current value of the slot |
| Default | Value used if no specific value given |
| Type | Data type constraint |
| Range | Valid value range |
| If-needed | Procedure to compute value when needed |
| If-added | Procedure triggered when value is added |
| If-removed | Procedure triggered when value is removed |
Example: University Frame System
Frame: PERSON
Slots:
Name: [Type: String]
Age: [Type: Integer, Range: 0-150]
Gender: [Type: {Male, Female, Other}]
Frame: STUDENT (IS-A: PERSON)
Slots:
Student_ID: [Type: String]
Department: [Type: String]
GPA: [Type: Float, Range: 0.0-10.0, Default: 0.0]
Courses: [Type: List]
Advisor: [Type: PROFESSOR]
Frame: JOHN (INSTANCE-OF: STUDENT)
Slots:
Name: "John Smith"
Age: 21
Gender: Male
Student_ID: "CS2024001"
Department: "Computer Science"
GPA: 8.5
Courses: [AI, DBMS, Networks]
Diagram:
Features of Frames:
- Inheritance - Child frames inherit slots from parent frames
- Default Values - Provide typical values for slots
- Procedural Attachment - Demons/triggers for slot operations
- Multiple Inheritance - Can inherit from multiple parent frames
Advantages:
- Natural representation of objects and classes
- Supports inheritance and default reasoning
- Combines data and procedures
- Easy to understand and organize
Disadvantages:
- Complex inheritance can be confusing
- Difficult to handle exceptions
- No standard frame language
6.5 Scripts
Q10. 🔥 Explain Scripts with suitable example.
[ Jun 2024 Q3(c)(ii), Jun 2023 Q5(b) | Frequency: 2 ]
Scripts: Scripts are structured knowledge representation schemes proposed by Schank and Abelson (1977) for representing stereotyped sequences of events. A script captures the typical sequence of actions that occur in familiar situations, enabling understanding and prediction of events.
Components of a Script:
| Component | Description | Example (Restaurant Script) |
|---|---|---|
| Entry Conditions | Prerequisites for script activation | Customer is hungry, has money |
| Roles | Participants in the script | Customer, Waiter, Chef, Cashier |
| Props | Objects used in the script | Menu, Table, Food, Bill, Money |
| Scenes | Major subdivisions of the script | Entering, Ordering, Eating, Paying |
| Track | Specific variation of the script | Fast-food, Fine-dining, Cafeteria |
| Results | Outcomes after script execution | Customer not hungry, has less money |
Example: Restaurant Script
SCRIPT: RESTAURANT
Track: Sit-down Restaurant
ENTRY CONDITIONS:
- Customer is hungry
- Customer has money
- Restaurant is open
ROLES:
- Customer (C)
- Waiter (W)
- Chef (CH)
- Cashier (CA)
PROPS:
- Tables, Chairs, Menu
- Food, Plates, Utensils
- Bill, Money/Card
SCENES:
Scene 1: ENTERING
C enters restaurant
C waits to be seated
W leads C to table
C sits down
Scene 2: ORDERING
W brings menu
C reads menu
C decides on food
W takes order
W gives order to CH
Scene 3: EATING
CH prepares food
W brings food to C
C eats food
Scene 4: PAYING
C asks for bill
W brings bill
C pays CA
C leaves tip
C exits restaurant
RESULTS:
- Customer is not hungry
- Customer has less money
- Restaurant has more money
Diagram:
Applications of Scripts:
- Natural Language Understanding - Understanding stories and conversations
- Question Answering - Inferring unstated information
- Prediction - Anticipating what happens next
- Planning - Generating action sequences
Example Usage: Statement: "John went to a restaurant and ordered pizza."
Inferences from Restaurant Script:
- John was probably hungry
- John sat at a table
- A waiter took his order
- John will receive pizza
- John will pay a bill
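This kind of gap-filling inference can be sketched as: given the events a story mentions, infer the unmentioned script events that must have occurred between them. The event names below are a simplified, hypothetical flattening of the restaurant scenes above:

```python
# Script-based inference sketch: infer unstated events between the first
# and last mentioned event of a known event sequence.

restaurant_script = [
    "enter", "sit_down", "read_menu", "order", "eat", "pay", "leave",
]

def infer_events(mentioned, script):
    """Return script events implied between the first and last mentioned one."""
    idx = [script.index(e) for e in mentioned]
    return [e for e in script[min(idx):max(idx) + 1] if e not in mentioned]

# "John went to a restaurant and ordered pizza" mentions enter and order:
print(infer_events(["enter", "order"], restaurant_script))
# ['sit_down', 'read_menu']
```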
Advantages:
- Captures temporal sequences naturally
- Enables inference of unstated events
- Supports expectation-based understanding
- Good for stereotyped situations
Disadvantages:
- Limited to familiar, routine situations
- Cannot handle novel situations
- Difficult to modify or combine scripts
- Scripts may be culture-specific
Summary: Comparison of Knowledge Representation Schemes
| Scheme | Structure | Best For | Limitations |
|---|---|---|---|
| Rule-Based | IF-THEN rules | Decision making, diagnosis | No learning, scalability |
| Semantic Nets | Nodes + labeled edges | Taxonomies, relationships | Limited expressiveness |
| Frames | Slots + values + procedures | Object representation | Complex inheritance |
| Scripts | Event sequences | Stereotyped situations | Cannot handle novelty |
UNIT 7: Probabilistic Reasoning
7.3 Review of Probability Theory
Q1. 📌 Describe the event "The maximum score is 6" when a six-faced die is thrown twice.
[ Jun 2025 Q5(a)(i) | Frequency: 1 ]
Sample Space: When a six-faced die is thrown twice, the sample space S consists of all ordered pairs (i, j) where i, j ∈ {1, 2, 3, 4, 5, 6}.
Total outcomes = 6 × 6 = 36
Event Description: The event "Maximum score is 6" means at least one of the two throws shows a 6; since no throw can exceed 6, max(i, j) = 6 is equivalent to "at least one throw is 6".
Event A = {(i, j) : max(i, j) = 6}
This includes:
- First throw is 6, second is anything: (6,1), (6,2), (6,3), (6,4), (6,5), (6,6)
- Second throw is 6, first is anything except 6: (1,6), (2,6), (3,6), (4,6), (5,6)
Event Set:
A = {(6,1), (6,2), (6,3), (6,4), (6,5), (6,6),
(1,6), (2,6), (3,6), (4,6), (5,6)}
Number of favorable outcomes = 11
Probability = 11/36
Diagram:
Q2. 📌 Describe the event "The total score is 9" when a six-faced die is thrown twice.
[ Jun 2025 Q5(a)(ii) | Frequency: 1 ]
Sample Space: When a six-faced die is thrown twice, total outcomes = 6 × 6 = 36
Event Description: The event "Total score is 9" means the sum of both throws equals 9.
Event B = {(i, j) : i + j = 9}
Finding pairs where i + j = 9:
| First Throw (i) | Second Throw (j) | Sum |
|---|---|---|
| 3 | 6 | 9 ✓ |
| 4 | 5 | 9 ✓ |
| 5 | 4 | 9 ✓ |
| 6 | 3 | 9 ✓ |
Event Set:
B = {(3,6), (4,5), (5,4), (6,3)}
Number of favorable outcomes = 4
Probability = 4/36 = 1/9
Diagram:
Q3. 📌 Describe the event "Each throw results in an even score larger than 2" when die thrown twice.
[ Jun 2025 Q5(a)(iii) | Frequency: 1 ]
Sample Space: When a six-faced die is thrown twice, total outcomes = 6 × 6 = 36
Event Description: The event requires BOTH throws to satisfy:
- Even number (2, 4, or 6)
- Larger than 2 (3, 4, 5, or 6)
Intersection of conditions: Even AND > 2 → {4, 6}
Event C = {(i, j) : i ∈ {4, 6} and j ∈ {4, 6}}
Event Set:
C = {(4,4), (4,6), (6,4), (6,6)}
| First Throw | Second Throw | Valid? |
|---|---|---|
| 4 | 4 | ✓ |
| 4 | 6 | ✓ |
| 6 | 4 | ✓ |
| 6 | 6 | ✓ |
Number of favorable outcomes = 4
Probability = 4/36 = 1/9
Diagram:
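The three event counts above (Q1-Q3) can be verified by enumerating all 36 ordered outcomes of two throws:

```python
# Enumerate the sample space of two die throws and count each event.
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 ordered pairs (i, j)

max_is_6   = [(i, j) for i, j in outcomes if max(i, j) == 6]
total_is_9 = [(i, j) for i, j in outcomes if i + j == 9]
even_gt_2  = [(i, j) for i, j in outcomes
              if i in (4, 6) and j in (4, 6)]  # even AND > 2 on both throws

print(len(max_is_6), len(total_is_9), len(even_gt_2))  # 11 4 4
```

The counts match the results above: probabilities 11/36, 4/36 = 1/9, and 4/36 = 1/9.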
7.4 Introduction to Bayesian Theory
Q4. 🔥 Write Bayes' Theorem and explain each component.
[ Jun 2024 Q4(b), Jun 2022 Q1(g) | Frequency: 2 ]
Bayes' Theorem:

$P(H|E) = \dfrac{P(E|H) \cdot P(H)}{P(E)}$

Expanded Form (using Total Probability):

$P(H|E) = \dfrac{P(E|H) \cdot P(H)}{P(E|H) \cdot P(H) + P(E|\neg H) \cdot P(\neg H)}$
Components Explained:
| Component | Name | Description |
|---|---|---|
| P(H|E) | Posterior Probability | Probability of hypothesis H being true AFTER observing evidence E |
| P(H) | Prior Probability | Initial probability of hypothesis H BEFORE observing evidence |
| P(E|H) | Likelihood | Probability of observing evidence E IF hypothesis H is true |
| P(E) | Evidence/Marginal | Total probability of observing evidence E under all hypotheses |
| P(¬H) | Prior of negation | Probability that hypothesis H is false = 1 - P(H) |
Diagram:
Interpretation:
- Prior P(H): What we believed before seeing evidence
- Likelihood P(E|H): How well evidence supports hypothesis
- Posterior P(H|E): Updated belief after seeing evidence
Example:
Medical Diagnosis:
- H = Patient has disease
- E = Test result is positive
- P(H) = 0.01 (1% of population has disease)
- P(E|H) = 0.95 (test is 95% accurate for diseased)
- P(E|¬H) = 0.05 (5% false positive rate)
Even with positive test, only 16.1% chance of actually having the disease!
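The 16.1% figure follows directly from the expanded form of Bayes' theorem with the numbers above:

```python
# Posterior P(disease | positive test) via the expanded Bayes formula.

p_h = 0.01        # prior: P(disease)
p_e_h = 0.95      # likelihood: P(positive | disease)
p_e_not_h = 0.05  # false positive rate: P(positive | no disease)

p_e = p_e_h * p_h + p_e_not_h * (1 - p_h)   # total probability of a positive test
posterior = p_e_h * p_h / p_e               # P(disease | positive)
print(round(posterior, 3))  # 0.161
```

The small prior (1%) dominates: most positive tests come from the large healthy population's false positives.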
Applications:
- Medical diagnosis
- Spam filtering
- Machine learning classifiers
- Decision making under uncertainty
7.5 Bayes' Networks
Q5. ⭐ Explain Bayes' Networks.
[ Dec 2022 Q5(c) | Frequency: 1 ]
Bayesian Networks (Belief Networks): A Bayesian Network is a probabilistic graphical model that represents a set of random variables and their conditional dependencies using a Directed Acyclic Graph (DAG). It combines graph theory with probability theory for reasoning under uncertainty.
Components:
| Component | Description |
|---|---|
| Nodes | Represent random variables |
| Directed Edges | Represent causal/conditional dependencies |
| CPT (Conditional Probability Tables) | Specify probability of each node given its parents |
| DAG Structure | No cycles allowed in the graph |
Key Concepts:
| Term | Meaning |
|---|---|
| Parent Node | Node with outgoing edge to another node |
| Child Node | Node receiving edge from parent |
| Root Node | Node with no parents |
| Leaf Node | Node with no children |
| Markov Blanket | Parents, children, and children's parents |
Example: Burglary Alarm Network
Diagram:
Conditional Probability Tables:
P(Burglary):
| B | P(B) |
|---|---|
| T | 0.001 |
| F | 0.999 |
P(Alarm | Burglary, Earthquake):
| B | E | P(A=T) |
|---|---|---|
| T | T | 0.95 |
| T | F | 0.94 |
| F | T | 0.29 |
| F | F | 0.001 |
P(JohnCalls | Alarm):
| A | P(J=T) |
|---|---|
| T | 0.90 |
| F | 0.05 |
Joint Probability Calculation: $P(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid \text{Parents}(X_i))$
Properties:
- Conditional Independence: A node is independent of its non-descendants given its parents
- Compact Representation: Requires fewer parameters than the full joint distribution
- Inference: Can compute any probability query from the network
Advantages:
- Visual representation of dependencies
- Handles uncertainty naturally
- Supports both prediction and diagnosis
- Compact knowledge representation
Applications:
- Medical diagnosis systems
- Fault detection
- Risk assessment
- Decision support systems
7.9 Dempster-Shafer Theory
Q6. ⭐ What are the different types of evidences in D-S (Dempster-Shafer) theory? Explain them briefly.
[ Jun 2025 Q1(e) | Frequency: 1 ]
Dempster-Shafer Theory (also called Theory of Evidence) handles uncertainty by allowing belief to be assigned to sets of hypotheses rather than single hypotheses.
Types of Evidence in D-S Theory:
1. Vacuous Evidence:
- Provides no information about hypotheses
- Belief mass assigned entirely to the frame of discernment (Θ)
- m(Θ) = 1, m(A) = 0 for all proper subsets A
- Represents complete ignorance
2. Certain Evidence:
- Provides complete certainty about a single hypothesis
- All belief mass assigned to one singleton set
- m({h}) = 1 for some hypothesis h
- Represents absolute knowledge
3. Consonant Evidence:
- Focal elements are nested (one contains another)
- Forms a chain: A₁ ⊂ A₂ ⊂ ... ⊂ Aₙ
- Belief and Plausibility values are related simply
- Example: m({a}) = 0.3, m({a,b}) = 0.5, m({a,b,c}) = 0.2
4. Bayesian Evidence:
- All focal elements are singletons
- Equivalent to standard probability
- m assigns mass only to individual hypotheses
- Bel(A) = Pl(A) for all subsets A
5. Categorical/Dogmatic Evidence:
- Supports a specific proper subset with certainty
- m(A) = 1 for some A ⊂ Θ (where A ≠ Θ)
- No uncertainty about the supported set
Diagram:
| Type | Focal Elements | Characteristic |
|---|---|---|
| Vacuous | Only Θ | No information |
| Certain | One singleton | Complete certainty |
| Consonant | Nested sets | Hierarchical structure |
| Bayesian | All singletons | Traditional probability |
| Categorical | One proper subset | Dogmatic belief |
Q7. 🔥 Explain Dempster-Shafer Theory with example.
[ Jun 2025 Q4(b), Jun 2022 Q5(c) | Frequency: 2 ]
Dempster-Shafer Theory: The Dempster-Shafer (D-S) theory is a mathematical framework for representing and combining uncertain evidence. It generalizes Bayesian probability by allowing belief to be assigned to sets of hypotheses rather than requiring probability for each individual hypothesis.
Key Concepts:
| Concept | Symbol | Description |
|---|---|---|
| Frame of Discernment | Θ | Set of all mutually exclusive hypotheses |
| Power Set | 2^Θ | All possible subsets of Θ |
| Mass Function | m(A) | Basic probability assignment to subset A |
| Belief Function | Bel(A) | Total belief committed to A |
| Plausibility Function | Pl(A) | Maximum possible belief in A |
Mass Function Properties:
- m(∅) = 0 (no belief assigned to the empty set)
- Σ m(A) = 1, summing over all A ⊆ Θ
Belief and Plausibility:

$Bel(A) = \sum_{B \subseteq A} m(B)$ and $Pl(A) = \sum_{B \cap A \neq \emptyset} m(B)$

Relationship: Bel(A) ≤ P(A) ≤ Pl(A)
Dempster's Rule of Combination: For combining evidence from two independent sources:

$m_{12}(A) = \dfrac{1}{1-K} \sum_{B \cap C = A} m_1(B) \cdot m_2(C)$ for $A \neq \emptyset$

Where conflict: $K = \sum_{B \cap C = \emptyset} m_1(B) \cdot m_2(C)$
Example: Medical Diagnosis
Frame of Discernment: Θ = {Flu, Cold, Allergy}
Evidence from Source 1 (Doctor 1):
- m₁({Flu}) = 0.6
- m₁({Cold}) = 0.3
- m₁(Θ) = 0.1 (uncertainty)
Evidence from Source 2 (Doctor 2):
- m₂({Flu}) = 0.4
- m₂({Allergy}) = 0.4
- m₂(Θ) = 0.2
Combining Evidence:
Diagram:
Combination Calculation Table:
| m₁ × m₂ | {Flu} 0.4 | {Allergy} 0.4 | Θ 0.2 |
|---|---|---|---|
| {Flu} 0.6 | {Flu} 0.24 | ∅ 0.24 | {Flu} 0.12 |
| {Cold} 0.3 | ∅ 0.12 | ∅ 0.12 | {Cold} 0.06 |
| Θ 0.1 | {Flu} 0.04 | {Allergy} 0.04 | Θ 0.02 |
Conflict: K = 0.24 + 0.12 + 0.12 = 0.48
Normalization factor: 1 - K = 0.52
Combined masses:
- m₁₂({Flu}) = (0.24 + 0.12 + 0.04) / 0.52 = 0.40/0.52 ≈ 0.77
- m₁₂({Cold}) = 0.06 / 0.52 ≈ 0.12
- m₁₂({Allergy}) = 0.04 / 0.52 ≈ 0.08
- m₁₂(Θ) = 0.02 / 0.52 ≈ 0.04
Belief and Plausibility for Flu:
- Bel({Flu}) = m({Flu}) = 0.77
- Pl({Flu}) = m({Flu}) + m(Θ) = 0.77 + 0.04 = 0.81
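Dempster's rule as worked through above can be sketched directly, with subsets of the frame represented as frozensets:

```python
# Dempster's rule of combination for the two doctors' mass functions above.

THETA = frozenset({"Flu", "Cold", "Allergy"})
m1 = {frozenset({"Flu"}): 0.6, frozenset({"Cold"}): 0.3, THETA: 0.1}
m2 = {frozenset({"Flu"}): 0.4, frozenset({"Allergy"}): 0.4, THETA: 0.2}

def combine(m1, m2):
    combined, conflict = {}, 0.0
    for b, mb in m1.items():
        for c, mc in m2.items():
            inter = b & c
            if inter:
                combined[inter] = combined.get(inter, 0.0) + mb * mc
            else:
                conflict += mb * mc          # mass landing on the empty set
    # Normalize by 1 - K (Dempster's rule)
    return {a: v / (1 - conflict) for a, v in combined.items()}, conflict

m12, k = combine(m1, m2)
print(round(k, 2))                         # 0.48
print(round(m12[frozenset({"Flu"})], 2))   # 0.77
```

The computed conflict K = 0.48 and the combined mass for {Flu} match the calculation table above.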
Advantages of D-S Theory:
- Handles ignorance explicitly (mass to Θ)
- Does not require complete probability assignments
- Combines evidence from multiple sources
- Distinguishes between uncertainty and disbelief
Disadvantages:
- Computationally expensive (2^n subsets)
- Conflict handling can be problematic
- Requires independent evidence sources
- May produce counterintuitive results with high conflict
Applications:
- Expert systems
- Sensor fusion
- Target identification
- Medical diagnosis
- Risk assessment
Summary: Comparison of Uncertainty Handling Methods
| Method | Handles Ignorance | Combines Evidence | Complexity |
|---|---|---|---|
| Probability Theory | No (must sum to 1) | Bayes' Theorem | Low |
| Bayesian Networks | Limited | Message passing | Medium |
| Dempster-Shafer | Yes (mass to Θ) | Dempster's Rule | High (2^n) |
| Fuzzy Logic | Partial membership | Fuzzy operators | Medium |
UNIT 8: Fuzzy and Rough Set
8.2-8.4 Fuzzy Systems & Fuzzy Set Representation
Q1. 📌 Find fuzzy set A∪B for given fuzzy sets A and B
[ Dec 2024 Q1(e)(i) | Frequency: 1 ]
Fuzzy Set Union (A∪B):
The union of two fuzzy sets A and B is defined by taking the maximum of the membership values for each element.
Formula: $\mu_{A \cup B}(x) = \max[\mu_A(x), \mu_B(x)]$ for all $x \in X$
Example:
Let Universal Set X = {1, 2, 3, 4, 5}
| Element (x) | μ_A(x) | μ_B(x) | μ_A∪B(x) = max(μ_A, μ_B) |
|---|---|---|---|
| 1 | 0.2 | 0.6 | 0.6 |
| 2 | 0.5 | 0.3 | 0.5 |
| 3 | 0.8 | 0.7 | 0.8 |
| 4 | 0.4 | 0.9 | 0.9 |
| 5 | 0.1 | 0.2 | 0.2 |
Result:
- A = {0.2/1, 0.5/2, 0.8/3, 0.4/4, 0.1/5}
- B = {0.6/1, 0.3/2, 0.7/3, 0.9/4, 0.2/5}
- A∪B = {0.6/1, 0.5/2, 0.8/3, 0.9/4, 0.2/5}
Q2. 📌 Find fuzzy set A∩B for given fuzzy sets A and B
[ Dec 2024 Q1(e)(ii) | Frequency: 1 ]
Fuzzy Set Intersection (A∩B):
The intersection of two fuzzy sets A and B is defined by taking the minimum of the membership values for each element.
Formula: $\mu_{A \cap B}(x) = \min[\mu_A(x), \mu_B(x)]$ for all $x \in X$
Example:
Using the same fuzzy sets A and B:
| Element (x) | μ_A(x) | μ_B(x) | μ_A∩B(x) = min(μ_A, μ_B) |
|---|---|---|---|
| 1 | 0.2 | 0.6 | 0.2 |
| 2 | 0.5 | 0.3 | 0.3 |
| 3 | 0.8 | 0.7 | 0.7 |
| 4 | 0.4 | 0.9 | 0.4 |
| 5 | 0.1 | 0.2 | 0.1 |
Result:
- A∩B = {0.2/1, 0.3/2, 0.7/3, 0.4/4, 0.1/5}
Q3. 📌 Find fuzzy set (A∩B)' for given fuzzy sets A and B
[ Dec 2024 Q1(e)(iii) | Frequency: 1 ]
Complement of Fuzzy Set Intersection (A∩B)':
First, find A∩B (intersection), then calculate its complement.
Complement Formula: $\mu_{(A \cap B)'}(x) = 1 - \mu_{A \cap B}(x)$
Step 1: Find A∩B
From Q2: A∩B = {0.2/1, 0.3/2, 0.7/3, 0.4/4, 0.1/5}
Step 2: Find (A∩B)'
| Element (x) | μ_A∩B(x) | μ_(A∩B)'(x) = 1 - μ_A∩B(x) |
|---|---|---|
| 1 | 0.2 | 0.8 |
| 2 | 0.3 | 0.7 |
| 3 | 0.7 | 0.3 |
| 4 | 0.4 | 0.6 |
| 5 | 0.1 | 0.9 |
Result:
- (A∩B)' = {0.8/1, 0.7/2, 0.3/3, 0.6/4, 0.9/5}
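The three results above (Q1-Q3) are element-wise max, min, and 1 - x operations, and can be verified in a few lines:

```python
# Fuzzy union, intersection, and complement of the intersection for the
# sets A and B used in Q1-Q3 above.

A = {1: 0.2, 2: 0.5, 3: 0.8, 4: 0.4, 5: 0.1}
B = {1: 0.6, 2: 0.3, 3: 0.7, 4: 0.9, 5: 0.2}

union        = {x: max(A[x], B[x]) for x in A}             # A ∪ B
intersection = {x: min(A[x], B[x]) for x in A}             # A ∩ B
comp_inter   = {x: round(1 - intersection[x], 1) for x in A}  # (A ∩ B)'

print(union)         # {1: 0.6, 2: 0.5, 3: 0.8, 4: 0.9, 5: 0.2}
print(intersection)  # {1: 0.2, 2: 0.3, 3: 0.7, 4: 0.4, 5: 0.1}
print(comp_inter)    # {1: 0.8, 2: 0.7, 3: 0.3, 4: 0.6, 5: 0.9}
```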
Q4. ⭐ Prove Commutative property of fuzzy sets with suitable example
[ Dec 2024 Q4(b)(i) | Frequency: 1 ]
Commutative Property:
For fuzzy sets A and B:
- A ∪ B = B ∪ A (Union is commutative)
- A ∩ B = B ∩ A (Intersection is commutative)
Proof:

For Union: $\mu_{A \cup B}(x) = \max[\mu_A(x), \mu_B(x)] = \max[\mu_B(x), \mu_A(x)] = \mu_{B \cup A}(x)$

For Intersection: $\mu_{A \cap B}(x) = \min[\mu_A(x), \mu_B(x)] = \min[\mu_B(x), \mu_A(x)] = \mu_{B \cap A}(x)$
Example:
Let X = {a, b, c}
- A = {0.3/a, 0.7/b, 0.5/c}
- B = {0.6/a, 0.4/b, 0.8/c}
| Element | μ_A | μ_B | A∪B = max | B∪A = max | A∩B = min | B∩A = min |
|---|---|---|---|---|---|---|
| a | 0.3 | 0.6 | 0.6 | 0.6 | 0.3 | 0.3 |
| b | 0.7 | 0.4 | 0.7 | 0.7 | 0.4 | 0.4 |
| c | 0.5 | 0.8 | 0.8 | 0.8 | 0.5 | 0.5 |
Verified: A∪B = B∪A = {0.6/a, 0.7/b, 0.8/c} and A∩B = B∩A = {0.3/a, 0.4/b, 0.5/c}
Q5. ⭐ Prove Associative property of fuzzy sets with suitable example
[ Dec 2024 Q4(b)(ii) | Frequency: 1 ]
Associative Property:
For fuzzy sets A, B, and C:
- (A ∪ B) ∪ C = A ∪ (B ∪ C)
- (A ∩ B) ∩ C = A ∩ (B ∩ C)
Proof:

For Union: $\mu_{(A \cup B) \cup C}(x) = \max[\max[\mu_A(x), \mu_B(x)], \mu_C(x)] = \max[\mu_A(x), \max[\mu_B(x), \mu_C(x)]] = \mu_{A \cup (B \cup C)}(x)$ (and similarly for intersection with min)
Example:
Let X = {1, 2}
- A = {0.2/1, 0.5/2}
- B = {0.4/1, 0.3/2}
- C = {0.6/1, 0.8/2}
For Union:
| Element | μ_A | μ_B | μ_C | A∪B | (A∪B)∪C | B∪C | A∪(B∪C) |
|---|---|---|---|---|---|---|---|
| 1 | 0.2 | 0.4 | 0.6 | 0.4 | 0.6 | 0.6 | 0.6 |
| 2 | 0.5 | 0.3 | 0.8 | 0.5 | 0.8 | 0.8 | 0.8 |
Verified: (A∪B)∪C = A∪(B∪C) = {0.6/1, 0.8/2}
Q6. ⭐ Prove Distributive property of fuzzy sets with suitable example
[ Dec 2024 Q4(b)(iii) | Frequency: 1 ]
Distributive Property:
For fuzzy sets A, B, and C:
- A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
- A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
Proof: Since $\max[a, \min(b, c)] = \min[\max(a, b), \max(a, c)]$ and $\min[a, \max(b, c)] = \max[\min(a, b), \min(a, c)]$ for all $a, b, c \in [0, 1]$, both sides have identical membership functions.
Example:
Let X = {p, q}
- A = {0.3/p, 0.6/q}
- B = {0.5/p, 0.4/q}
- C = {0.7/p, 0.2/q}
| Element | μ_A | μ_B | μ_C | B∩C | A∪(B∩C) | A∪B | A∪C | (A∪B)∩(A∪C) |
|---|---|---|---|---|---|---|---|---|
| p | 0.3 | 0.5 | 0.7 | 0.5 | 0.5 | 0.5 | 0.7 | 0.5 |
| q | 0.6 | 0.4 | 0.2 | 0.2 | 0.6 | 0.6 | 0.6 | 0.6 |
Verified: A∪(B∩C) = (A∪B)∩(A∪C) = {0.5/p, 0.6/q}
Q7. 🔥 Prove De-Morgan's laws for fuzzy sets with suitable example
[ Dec 2024 Q4(b)(iv) | Frequency: 1 ]
De-Morgan's Laws for Fuzzy Sets:
- (A ∪ B)' = A' ∩ B'
- (A ∩ B)' = A' ∪ B'
Proof of First Law:
LHS: $\mu_{(A \cup B)'}(x) = 1 - \max[\mu_A(x), \mu_B(x)]$
RHS: $\mu_{A' \cap B'}(x) = \min[1-\mu_A(x), 1-\mu_B(x)]$
Since $1 - \max[a, b] = \min[1-a, 1-b]$, LHS = RHS
Example:
Let X = {1, 2, 3}
- A = {0.2/1, 0.7/2, 0.4/3}
- B = {0.5/1, 0.3/2, 0.6/3}
| Element | μ_A | μ_B | A∪B | (A∪B)' | A' | B' | A'∩B' |
|---|---|---|---|---|---|---|---|
| 1 | 0.2 | 0.5 | 0.5 | 0.5 | 0.8 | 0.5 | 0.5 |
| 2 | 0.7 | 0.3 | 0.7 | 0.3 | 0.3 | 0.7 | 0.3 |
| 3 | 0.4 | 0.6 | 0.6 | 0.4 | 0.6 | 0.4 | 0.4 |
Verified: (A∪B)' = A'∩B' = {0.5/1, 0.3/2, 0.4/3}
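The identity $1 - \max[a, b] = \min[1-a, 1-b]$ used in the proof can be checked element-wise on the sets from the table:

```python
# Element-wise check of De Morgan's first law: (A ∪ B)' = A' ∩ B'.

A = {1: 0.2, 2: 0.7, 3: 0.4}
B = {1: 0.5, 2: 0.3, 3: 0.6}

lhs = {x: 1 - max(A[x], B[x]) for x in A}        # (A ∪ B)'
rhs = {x: min(1 - A[x], 1 - B[x]) for x in A}    # A' ∩ B'

print(all(abs(lhs[x] - rhs[x]) < 1e-9 for x in A))  # True
```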
Diagram:
8.7 Rough Set Theory
Q8. 🔥 Explain Rough Set Theory
[ Dec 2022 Q5(d) | Frequency: 1 ]
Rough Set Theory:
Rough Set Theory, introduced by Zdzisław Pawlak in 1982, is a mathematical approach to deal with imprecision, vagueness, and uncertainty in data. Unlike fuzzy sets, rough sets work with indiscernibility based on available information.
Key Concepts:
| Concept | Description |
|---|---|
| Information System | A table with objects (rows) and attributes (columns) |
| Indiscernibility Relation | Objects that cannot be distinguished based on attributes |
| Equivalence Classes | Groups of indiscernible objects |
| Approximation Space | Pair (U, R) where U is universe and R is equivalence relation |
Approximations:
- Lower Approximation (R_X): objects that certainly belong to set X; contains objects whose equivalence class is completely contained in X
- Upper Approximation (R̄_X): objects that possibly belong to set X; contains objects whose equivalence class has a non-empty intersection with X
- Boundary Region: objects that cannot be classified with certainty; Boundary = Upper Approximation - Lower Approximation
Diagram:
Example:
Consider patients with symptoms and diagnosis:
| Patient | Headache | Fever | Flu |
|---|---|---|---|
| P1 | Yes | Yes | Yes |
| P2 | Yes | Yes | Yes |
| P3 | Yes | No | No |
| P4 | No | Yes | No |
| P5 | No | No | No |
For X = {Patients with Flu} = {P1, P2}
- Indiscernibility: P1 and P2 are indiscernible (same symptoms)
- Lower Approximation: {P1, P2} (certainly have Flu)
- Upper Approximation: {P1, P2} (possibly have Flu)
- Boundary: Empty (crisp classification is possible)
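The approximations for the patient table can be computed by grouping objects with identical attribute values into equivalence classes:

```python
# Lower/upper approximation sketch for the patient table above, using the
# (Headache, Fever) attributes to form equivalence classes.

patients = {
    "P1": ("Yes", "Yes"), "P2": ("Yes", "Yes"), "P3": ("Yes", "No"),
    "P4": ("No", "Yes"),  "P5": ("No", "No"),
}
X = {"P1", "P2"}   # patients with Flu

# Group indiscernible objects (identical attribute values) together
classes = {}
for p, attrs in patients.items():
    classes.setdefault(attrs, set()).add(p)

lower = set().union(*(c for c in classes.values() if c <= X))  # class ⊆ X
upper = set().union(*(c for c in classes.values() if c & X))   # class ∩ X ≠ ∅

print(sorted(lower), sorted(upper))  # ['P1', 'P2'] ['P1', 'P2']
print(upper - lower)                 # set() -> empty boundary
```

Lower and upper approximations coincide here, so the boundary is empty and the classification is crisp, as stated above.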
Properties of Rough Sets:
| Property | Formula |
|---|---|
| Lower ⊆ Set ⊆ Upper | R_X ⊆ X ⊆ R̄_X |
| Empty Set | R_∅ = R̄_∅ = ∅ |
| Universal Set | R_U = R̄_U = U |
| Complement | R_(X') = (R̄_X)' |
Applications:
- Data Mining: Feature selection and reduction
- Medical Diagnosis: Handling incomplete patient data
- Pattern Recognition: Classification with uncertain data
- Decision Support Systems: Rule extraction from data
Comparison: Fuzzy Sets vs Rough Sets:
| Aspect | Fuzzy Sets | Rough Sets |
|---|---|---|
| Uncertainty Type | Membership degree (0 to 1) | Boundary region |
| Knowledge | Subjective membership | Objective from data |
| Representation | μ_A(x) ∈ [0,1] | Lower & Upper approximations |
| Origin | Zadeh (1965) | Pawlak (1982) |
Diagram:
Summary: Fuzzy Set Operations Quick Reference
| Operation | Formula | Description |
|---|---|---|
| Union (A∪B) | max[μ_A(x), μ_B(x)] | Take maximum membership |
| Intersection (A∩B) | min[μ_A(x), μ_B(x)] | Take minimum membership |
| Complement (A') | 1 - μ_A(x) | Subtract from 1 |
Properties Summary
| Property | Union | Intersection |
|---|---|---|
| Commutative | A∪B = B∪A | A∩B = B∩A |
| Associative | (A∪B)∪C = A∪(B∪C) | (A∩B)∩C = A∩(B∩C) |
| Distributive | A∪(B∩C) = (A∪B)∩(A∪C) | A∩(B∪C) = (A∩B)∪(A∩C) |
| De-Morgan | (A∪B)' = A'∩B' | (A∩B)' = A'∪B' |
| Idempotent | A∪A = A | A∩A = A |
| Identity | A∪∅ = A | A∩U = A |
BLOCK 3: MACHINE LEARNING - I
UNIT 9: Introduction to Machine Learning Methods
Q1. 📌 State and discuss the basic steps of machine learning cycle.
[ Dec 2024 Q1(f) | Frequency: 1 ]
The Machine Learning Cycle consists of systematic steps to build and deploy ML models:
| Step | Description |
|---|---|
| 1. Data Collection | Gathering relevant data from various sources (databases, APIs, sensors) |
| 2. Data Preparation | Cleaning, transforming, and preprocessing raw data for analysis |
| 3. Feature Engineering | Selecting and creating meaningful features from data |
| 4. Model Selection | Choosing appropriate algorithm based on problem type |
| 5. Model Training | Feeding prepared data to train the model |
| 6. Model Evaluation | Testing model performance using metrics (accuracy, precision, recall) |
| 7. Model Tuning | Optimizing hyperparameters to improve performance |
| 8. Deployment | Integrating model into production environment |
| 9. Monitoring | Continuously tracking model performance and retraining if needed |
Q2. 🔥 Draw block diagram for the machine learning cycle.
[ Jun 2024 Q1(e), Jun 2022 Q2(b) | Frequency: 2 ]
Diagram:
Key Components:
- Data Collection → Preparation: Raw data is collected and cleaned
- Feature Engineering → Model Selection: Features are extracted and an algorithm is chosen
- Training → Evaluation → Tuning: Iterative process until optimal performance
- Deployment → Monitoring: Model goes live with continuous monitoring
Q3. 📌 List the steps involved in machine learning cycle.
[ Jun 2024 Q1(e) | Frequency: 1 ]
Steps in Machine Learning Cycle:
1. Problem Definition - Clearly define the objective and success criteria
2. Data Collection - Gather data from relevant sources
3. Data Preprocessing - Clean, normalize, and handle missing values
4. Exploratory Data Analysis (EDA) - Understand data patterns and distributions
5. Feature Engineering - Create and select important features
6. Data Splitting - Divide data into training, validation, and test sets
7. Model Selection - Choose an appropriate ML algorithm
8. Model Training - Train the model on the training dataset
9. Model Evaluation - Assess performance using appropriate metrics
10. Hyperparameter Tuning - Optimize model parameters
11. Model Deployment - Deploy the model to production
12. Monitoring & Maintenance - Track performance and update as needed
Q4. 📌 Discuss the role of each component of machine learning cycle briefly.
[ Jun 2022 Q2(b) | Frequency: 1 ]
| Component | Role |
|---|---|
| Data Collection | Acquiring sufficient, relevant, and quality data that represents the problem domain |
| Data Preprocessing | Transforming raw data into usable format by handling missing values, outliers, and noise |
| Feature Engineering | Extracting and selecting features that best represent the underlying patterns |
| Model Selection | Choosing algorithm that suits the problem type (classification, regression, clustering) |
| Training | Algorithm learns patterns from data by adjusting internal parameters |
| Evaluation | Measuring model performance using metrics like accuracy, F1-score, RMSE |
| Tuning | Adjusting hyperparameters to achieve optimal model performance |
| Deployment | Making the trained model available for real-world predictions |
| Monitoring | Tracking model behavior in production and detecting performance degradation |
Q5. 📌 Differentiate between Machine Learning and Data Mining.
[ Jun 2022 Q3(b) | Frequency: 1 ]
| Aspect | Machine Learning | Data Mining |
|---|---|---|
| Definition | Algorithms that learn from data to make predictions | Process of discovering patterns in large datasets |
| Primary Goal | Build predictive models | Extract hidden knowledge and patterns |
| Focus | Learning and generalizing from data | Finding interesting relationships in data |
| Approach | Uses known patterns to predict unknown outcomes | Discovers unknown patterns from existing data |
| Output | Trained model for predictions | Rules, patterns, associations, clusters |
| Data Requirement | Labeled/unlabeled data for training | Large historical databases |
| Human Intervention | Less (automated learning) | More (domain expertise needed) |
| Example | Spam email classifier | Market basket analysis |
| Techniques | Neural networks, SVM, Decision trees | Association rules, clustering, anomaly detection |
Q6. 📌 Give an example of Machine Learning.
[ Jun 2022 Q3(b) | Frequency: 1 ]
Example: Email Spam Detection
- Problem: Classify incoming emails as spam or not spam
- Data: Historical emails labeled as spam/not-spam
- Features: Word frequency, sender information, links, attachments
- Algorithm: Naive Bayes or Neural Network
- Training: Model learns patterns distinguishing spam from legitimate emails
- Prediction: New emails are automatically classified
Other Examples:
- Netflix movie recommendations
- Voice assistants (Siri, Alexa)
- Fraud detection in banking
- Self-driving cars
- Medical diagnosis systems
Q7. 📌 Give an example of Data Mining.
[ Jun 2022 Q3(b) | Frequency: 1 ]
Example: Market Basket Analysis in Retail
-
Problem: Discover which products are frequently bought together
-
Data: Transaction records from supermarket
-
Technique: Association Rule Mining (Apriori Algorithm)
-
Discovery: "Customers who buy bread also tend to buy butter (confidence: 85%)"
-
Application: Product placement, cross-selling strategies
Other Examples:
-
Customer segmentation in banking
-
Detecting fraudulent insurance claims
-
Identifying disease outbreaks from health records
-
Social network analysis
-
Web usage mining for website optimization
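The bread-and-butter rule above can be reproduced with a tiny support/confidence calculation (a minimal sketch over hypothetical transactions; the item names and numbers are illustrative, not from a real dataset):

```python
# Toy market-basket sketch: support and confidence for the rule
# "bread -> butter" over a handful of hypothetical transactions.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
]

def support(itemset, transactions):
    # Fraction of transactions that contain every item in the itemset.
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    # Estimated P(consequent | antecedent) = support(both) / support(antecedent).
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

print(confidence({"bread"}, {"butter"}, transactions))  # 2 of 3 bread baskets also contain butter
```

Apriori builds on exactly these two quantities, pruning itemsets whose support falls below a chosen threshold.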
Q8. 🔥 Discuss Supervised Learning with suitable example.
[ Dec 2024 Q1(a), Dec 2022 Q2(b)(ii) | Frequency: 2 ]
Supervised Learning is a type of machine learning where the model is trained on labeled data - data with known input-output pairs. The algorithm learns to map inputs to correct outputs.
Characteristics:
-
Requires labeled training data
-
Goal is to learn a mapping function: Y = f(X)
-
Used for classification and regression tasks
-
Model is "supervised" by correct answers
Working Process:
-
Provide training data with labels
-
Algorithm learns patterns from input-output relationships
-
Model is evaluated on test data
-
Trained model predicts outputs for new inputs
Example: House Price Prediction
-
Input Features: Size, location, bedrooms, age
-
Label: Actual house price
-
Training: Model learns relationship between features and prices
-
Prediction: Estimate price for new houses
Algorithms: Linear Regression, Decision Trees, SVM, Neural Networks, Random Forest
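The house-price example can be sketched as a one-feature least-squares fit; the sizes and prices below are hypothetical, and a real system would use more features and a library implementation:

```python
# Minimal supervised-learning sketch: fit price = w*size + b by
# ordinary least squares on hypothetical (size, price) pairs.
sizes  = [50.0, 80.0, 100.0, 120.0]   # input feature X
prices = [110.0, 170.0, 210.0, 250.0] # labels Y (the "supervision")

n = len(sizes)
mean_x = sum(sizes) / n
mean_y = sum(prices) / n
# Closed-form slope and intercept for simple linear regression.
w = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices)) \
    / sum((x - mean_x) ** 2 for x in sizes)
b = mean_y - w * mean_x

def predict(size):
    return w * size + b

print(predict(90.0))  # price estimate for an unseen 90-unit house -> 190.0
```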
Q9. 🔥 Discuss Unsupervised Learning with suitable example.
[ Dec 2024 Q1(a), Dec 2022 Q2(b)(iii) | Frequency: 2 ]
Unsupervised Learning is a type of machine learning where the model is trained on unlabeled data - data without predefined categories or outcomes. The algorithm discovers hidden patterns or structures.
Characteristics:
-
No labeled data required
-
Goal is to find hidden structure in data
-
Used for clustering, association, dimensionality reduction
-
Algorithm finds patterns on its own
Working Process:
-
Provide unlabeled data
-
Algorithm identifies patterns, similarities, or groupings
-
Output reveals data structure or clusters
-
Human interprets the discovered patterns
Example: Customer Segmentation
-
Input: Customer purchase history, demographics, browsing behavior
-
No Labels: Categories are unknown
-
Process: Algorithm groups similar customers together
-
Output: Customer segments (e.g., "budget shoppers," "premium customers")
-
Application: Targeted marketing strategies
Algorithms: K-Means Clustering, Hierarchical Clustering, PCA, Apriori Algorithm, DBSCAN
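Customer grouping as above can be sketched with a 1-D k-means (k = 2) on hypothetical spend values; no labels are given, and the two groups emerge from the data alone:

```python
# Minimal unsupervised-learning sketch: 1-D k-means with k=2.
points = [1.0, 1.5, 2.0, 10.0, 11.0, 12.0]   # hypothetical customer spend
centroids = [points[0], points[-1]]          # crude initialization

for _ in range(10):  # a few assignment/update iterations
    # Assignment step: attach each point to its nearest centroid.
    clusters = [[], []]
    for p in points:
        nearest = min(range(2), key=lambda i: abs(p - centroids[i]))
        clusters[nearest].append(p)
    # Update step: move each centroid to its cluster mean.
    centroids = [sum(c) / len(c) for c in clusters]

print(sorted(centroids))  # two discovered group centres -> [1.5, 11.0]
```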
Q10. ⭐ Compare Supervised Learning and Unsupervised Learning.
[ Jun 2022 Q1(f) | Frequency: 1 ]
| Aspect | Supervised Learning | Unsupervised Learning |
|---|---|---|
| Data Type | Labeled data | Unlabeled data |
| Goal | Predict outcomes | Discover patterns |
| Training | Uses input-output pairs | Uses only input data |
| Feedback | Direct feedback available | No feedback |
| Complexity | Less complex | More complex |
| Accuracy | Higher accuracy | Lower accuracy |
| Human Effort | More (for labeling) | Less |
| Applications | Classification, Regression | Clustering, Association |
| Algorithms | SVM, Decision Trees, Linear Regression | K-Means, PCA, Apriori |
| Example | Spam detection, Price prediction | Customer segmentation, Anomaly detection |
| Output | Predicted class/value | Groups/clusters/rules |
Q11. 📌 List the algorithms for Supervised Learning.
[ Jun 2022 Q1(f) | Frequency: 1 ]
Classification Algorithms:
-
Logistic Regression
-
Decision Trees
-
Random Forest
-
Support Vector Machine (SVM)
-
K-Nearest Neighbors (KNN)
-
Naive Bayes
-
Neural Networks
-
Gradient Boosting (XGBoost, AdaBoost)
Regression Algorithms:
-
Linear Regression
-
Polynomial Regression
-
Ridge Regression
-
Lasso Regression
-
Decision Tree Regression
-
Random Forest Regression
-
Support Vector Regression (SVR)
-
Neural Network Regression
Q12. 📌 List the algorithms for Unsupervised Learning.
[ Jun 2022 Q1(f) | Frequency: 1 ]
Clustering Algorithms:
-
K-Means Clustering
-
Hierarchical Clustering
-
DBSCAN (Density-Based)
-
Gaussian Mixture Models
-
Mean Shift Clustering
Association Rule Algorithms:
-
Apriori Algorithm
-
FP-Growth Algorithm
-
Eclat Algorithm
Dimensionality Reduction:
-
Principal Component Analysis (PCA)
-
t-SNE (t-Distributed Stochastic Neighbor Embedding)
-
Linear Discriminant Analysis (LDA)
-
Singular Value Decomposition (SVD)
Anomaly Detection:
-
Isolation Forest
-
One-Class SVM
-
Local Outlier Factor (LOF)
Q13. 📌 Discuss Rote Learning with suitable example.
[ Dec 2022 Q2(b)(i) | Frequency: 1 ]
Rote Learning is the simplest form of machine learning where the system memorizes training examples exactly and retrieves them during inference without generalization.
Characteristics:
-
Direct Memorization: Stores input-output pairs exactly
-
No Generalization: Cannot handle new, unseen examples
-
Fast Retrieval: Quick lookup from memory
-
Memory Intensive: Requires large storage for all examples
-
Exact Matching: Only works for previously seen inputs
Working Process:
-
Store all training examples in memory
-
When new input arrives, search for exact match
-
Return stored output for matched input
-
If no match found, cannot provide answer
Example: Lookup Table for Multiplication
| Input (A × B) | Output |
|---|---|
| 2 × 3 | 6 |
| 5 × 7 | 35 |
| 8 × 9 | 72 |
-
System memorizes all multiplication facts
-
Query: 5 × 7 → Retrieves stored answer: 35
-
Query: 11 × 13 → Cannot answer (not memorized)
Limitations:
-
Cannot generalize to new situations
-
Impractical for large problem spaces
-
No learning of underlying concepts
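The lookup-table behaviour described above can be sketched directly as a dictionary, which makes the "no generalization" limitation obvious:

```python
# Rote learning sketch: memorize exact input-output pairs,
# retrieve by exact match, fail on anything unseen.
memory = {}

def memorize(a, b, product):
    memory[(a, b)] = product

def recall(a, b):
    # Exact-match retrieval only; no generalization to new inputs.
    return memory.get((a, b))  # None when the pair was never memorized

memorize(2, 3, 6)
memorize(5, 7, 35)
memorize(8, 9, 72)

print(recall(5, 7))    # 35 (memorized)
print(recall(11, 13))  # None (never seen, cannot answer)
```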
Q14. 🔥 Compare/Discuss Descriptive, Predictive, and Prescriptive Analytics in Machine Learning.
[ Jun 2025 Q1(b), Jun 2024 Q2(a), Jun 2023 Q1(a) | Frequency: 3 ]
| Aspect | Descriptive Analytics | Predictive Analytics | Prescriptive Analytics |
|---|---|---|---|
| Question Answered | "What happened?" | "What will happen?" | "What should we do?" |
| Purpose | Summarize historical data | Forecast future outcomes | Recommend optimal actions |
| Time Focus | Past | Future | Future actions |
| Complexity | Low | Medium | High |
| Techniques | Statistics, visualization | ML algorithms, forecasting | Optimization, simulation |
| Output | Reports, dashboards | Predictions, probabilities | Recommendations, decisions |
| Data Used | Historical data | Historical + current data | All available data |
| Human Role | Interpret results | Validate predictions | Implement recommendations |
Examples:
| Analytics Type | Business Example |
|---|---|
| Descriptive | "Last quarter sales were $2M, 15% higher than previous quarter" |
| Predictive | "Next quarter sales predicted to be $2.3M based on trends" |
| Prescriptive | "Increase marketing budget by 20% in Region A to maximize sales" |
Q15. 📌 Discuss Rule-based Machine Learning with suitable example.
[ Jun 2023 Q2(b)(i) | Frequency: 1 ]
Rule-based Machine Learning uses a set of IF-THEN rules to make predictions or decisions. These rules can be created by domain experts or learned from data.
Characteristics:
-
Uses explicit IF-THEN rules
-
Highly interpretable and explainable
-
Easy to modify and maintain
-
Works well with structured knowledge
-
May not handle complex patterns
Structure of Rules:
IF condition1 AND condition2 THEN action/conclusion
Learning Approaches:
-
Expert-defined Rules: Domain experts create rules manually
-
Rule Induction: Algorithms learn rules from data (RIPPER, OneR)
-
Association Rule Mining: Discover rules from patterns (Apriori)
Example: Medical Diagnosis System
Rule 1: IF fever > 101°F AND cough = yes AND fatigue = yes
THEN diagnosis = "Flu"
Rule 2: IF fever > 101°F AND rash = yes AND joint_pain = yes
THEN diagnosis = "Dengue"
Rule 3: IF fever < 99°F AND runny_nose = yes AND sneezing = yes
THEN diagnosis = "Common Cold"
Advantages:
-
Transparent decision-making
-
Easy to audit and validate
-
Domain knowledge integration
Disadvantages:
-
Difficult to handle exceptions
-
May conflict or overlap
-
Doesn't generalize well
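The three rules above translate almost literally into code. This is a sketch only; the thresholds and field names follow the hypothetical rules, not a real medical system:

```python
# Rule-based sketch: the diagnosis rules encoded as ordered IF-THEN checks
# over a patient-feature dictionary.
def diagnose(p):
    if p["fever"] > 101 and p["cough"] and p["fatigue"]:
        return "Flu"
    if p["fever"] > 101 and p["rash"] and p["joint_pain"]:
        return "Dengue"
    if p["fever"] < 99 and p["runny_nose"] and p["sneezing"]:
        return "Common Cold"
    return "Unknown"  # no rule fired -- rules cannot cover every case

patient = {"fever": 102.4, "cough": True, "fatigue": True,
           "rash": False, "joint_pain": False,
           "runny_nose": False, "sneezing": False}
print(diagnose(patient))  # Flu
```

The fall-through "Unknown" branch illustrates the listed disadvantage: cases no rule anticipates get no answer.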
Q16. 📌 Discuss Bayesian Algorithms with suitable example.
[ Jun 2023 Q2(b)(ii) | Frequency: 1 ]
Bayesian Algorithms are based on Bayes' Theorem, which calculates the probability of a hypothesis given observed evidence. They provide probabilistic predictions with uncertainty quantification.
Bayes' Theorem:
P(H|E) = [P(E|H) × P(H)] / P(E)
Where:
-
P(H|E): Posterior probability (probability of hypothesis given evidence)
-
P(E|H): Likelihood (probability of evidence given hypothesis)
-
P(H): Prior probability (initial belief)
-
P(E): Evidence probability
Common Bayesian Algorithms:
-
Naive Bayes Classifier
-
Bayesian Networks
-
Bayesian Linear Regression
-
Gaussian Naive Bayes
Example: Email Spam Classification using Naive Bayes
Training Data Analysis:
-
P(Spam) = 0.3 (30% of emails are spam)
-
P("Free" | Spam) = 0.8
-
P("Free" | Not Spam) = 0.1
Prediction for a new email containing "Free":
P("Free") = P("Free"|Spam) × P(Spam) + P("Free"|Not Spam) × P(Not Spam) = 0.8 × 0.3 + 0.1 × 0.7 = 0.31
P(Spam | "Free") = P("Free"|Spam) × P(Spam) / P("Free") = (0.8 × 0.3) / 0.31 ≈ 0.77
High probability → Classify as SPAM
Advantages:
-
Handles uncertainty naturally
-
Works well with small datasets
-
Provides probability scores
-
Fast training and prediction
Applications: Spam filtering, sentiment analysis, medical diagnosis, recommendation systems
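The spam calculation above can be checked in a few lines, expanding P("Free") by the law of total probability:

```python
# Worked Bayes computation with the numbers from the example above.
p_spam = 0.3              # prior P(Spam)
p_free_given_spam = 0.8   # likelihood P("Free" | Spam)
p_free_given_ham = 0.1    # likelihood P("Free" | Not Spam)

# Evidence: P("Free") = P("Free"|Spam)P(Spam) + P("Free"|Not Spam)P(Not Spam)
p_free = p_free_given_spam * p_spam + p_free_given_ham * (1 - p_spam)
# Posterior via Bayes' theorem.
p_spam_given_free = p_free_given_spam * p_spam / p_free

print(round(p_spam_given_free, 3))  # ~0.774 -> classify as SPAM
```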
Q17. 📌 What is Reinforcement Learning?
[ Dec 2022 Q1(f) | Frequency: 1 ]
Reinforcement Learning (RL) is a type of machine learning where an agent learns to make decisions by interacting with an environment. The agent receives rewards or penalties based on its actions and learns to maximize cumulative rewards over time.
Key Concepts:
-
Agent: The learner/decision-maker
-
Environment: The world the agent interacts with
-
State: Current situation of the agent
-
Action: Possible moves the agent can make
-
Reward: Feedback signal (positive or negative)
-
Policy: Strategy for choosing actions
Learning Process:
-
Agent observes current state
-
Agent takes an action based on policy
-
Environment provides reward and new state
-
Agent updates policy to maximize future rewards
-
Process repeats until goal is achieved
Examples:
-
Game playing (Chess, Go, Video games)
-
Robot navigation
-
Autonomous vehicles
-
Stock trading algorithms
Q18. ⭐ Discuss Reinforcement Learning.
[ Dec 2024 Q1(a) | Frequency: 1 ]
Reinforcement Learning (RL) is a learning paradigm where an agent learns optimal behavior through trial and error interactions with the environment, guided by reward signals.
Core Components:
| Component | Description |
|---|---|
| Agent | Entity that takes actions and learns |
| Environment | External system agent interacts with |
| State (S) | Current situation representation |
| Action (A) | Set of possible moves |
| Reward (R) | Numerical feedback for actions |
| Policy (π) | Mapping from states to actions |
| Value Function | Expected cumulative reward from state |
Types of RL:
-
Positive Reinforcement: Reward for good actions
-
Negative Reinforcement: Penalty for bad actions
Key Characteristics:
-
No labeled data required
-
Learns from consequences of actions
-
Balances exploration vs exploitation
-
Goal is long-term reward maximization
Applications:
-
AlphaGo (game playing)
-
Self-driving cars
-
Robotics control
-
Resource management
-
Personalized recommendations
Q19. 🔥 Explain Reinforcement Learning with block diagram and component roles.
[ Dec 2023 Q4(a) | Frequency: 1 ]
Reinforcement Learning is a goal-oriented learning approach where an agent learns to achieve goals by interacting with an environment.
Diagram (agent-environment loop):
Agent --(Action At)--> Environment --(State St+1, Reward Rt+1)--> Agent (loop repeats)
Component Roles:
| Component | Role |
|---|---|
| Agent | Decision-maker that selects actions based on policy to maximize rewards |
| Environment | External system that responds to agent's actions with new states and rewards |
| State (St) | Complete description of current situation; basis for decision-making |
| Action (At) | Choice made by agent from available options; affects environment |
| Reward (Rt) | Immediate feedback signal; guides learning toward goal |
| Policy (π) | Strategy mapping states to actions; can be deterministic or probabilistic |
| Value Function | Estimates expected cumulative future reward from each state |
| Q-Function | Estimates value of taking specific action in specific state |
Learning Loop:
-
Agent observes state St
-
Agent selects action At using policy
-
Environment transitions to state St+1
-
Environment provides reward Rt+1
-
Agent updates policy based on experience
-
Repeat until convergence
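The learning loop above can be sketched with tabular Q-learning on a hypothetical 1-D corridor task (states 0-3, move left or right, reward only on reaching the goal state); this is a toy illustration, not a real environment:

```python
# Tabular Q-learning sketch on a 4-state corridor with goal at state 3.
import random

random.seed(0)
N_STATES, GOAL = 4, 3
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2
Q = {(s, a): 0.0 for s in range(N_STATES) for a in (-1, +1)}

def step(state, action):
    # Environment dynamics: move left/right, clipped to the corridor;
    # reward +1 only when the goal is reached.
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0)

for _ in range(200):  # episodes
    s = 0
    while s != GOAL:
        # Epsilon-greedy action selection (exploration vs exploitation).
        a = random.choice((-1, +1)) if random.random() < EPSILON \
            else max((-1, +1), key=lambda a: Q[(s, a)])
        s2, r = step(s, a)
        # Q-learning update: Q(s,a) += alpha*(r + gamma*max_a' Q(s',a') - Q(s,a))
        best_next = max(Q[(s2, -1)], Q[(s2, +1)])
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2

# After training, the greedy action in every non-goal state is "right" (+1).
policy = {s: max((-1, +1), key=lambda a: Q[(s, a)]) for s in range(GOAL)}
print(policy)
```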
Q20. 📌 Classify the various Reinforcement Learning algorithms.
[ Dec 2022 Q1(f) | Frequency: 1 ]
Classification Summary:
| Category | Subcategory | Algorithms |
|---|---|---|
| Model-Based | - | Dynamic Programming, Dyna-Q |
| Model-Free | Value-Based | Q-Learning, SARSA, DQN |
| Model-Free | Policy-Based | REINFORCE, Policy Gradient |
| Model-Free | Actor-Critic | A2C, A3C, PPO, SAC, DDPG |
| On-Policy | - | SARSA, A2C, PPO |
| Off-Policy | - | Q-Learning, DQN, DDPG |
Q21. 📌 Discuss Delayed-Reinforcement Learning with suitable example.
[ Dec 2022 Q2(b)(iv) | Frequency: 1 ]
Delayed Reinforcement Learning refers to situations where rewards are not received immediately after an action but are delayed until much later, often after a sequence of actions.
Characteristics:
-
Reward signal comes after many time steps
-
Difficult to attribute credit to specific actions
-
Requires temporal credit assignment
-
Common in real-world problems
Challenges:
-
Credit Assignment Problem: Which actions led to the reward?
-
Long-term Dependencies: Early actions affect late rewards
-
Exploration: Must try sequences to discover rewards
-
Sparse Rewards: Few feedback signals to learn from
Example: Chess Game
-
Agent plays hundreds of moves during a game
-
Reward: +1 for winning, -1 for losing, 0 for draw
-
Reward received only at game end
-
Challenge: Which moves were responsible for the outcome?
Timeline:
Move 1 → Move 2 → Move 3 → ... → Move 50 → Win (+1 Reward)
(Which of these moves was crucial for winning?)
Solutions:
-
Temporal Difference (TD) Learning: Estimate value at each step
-
Eligibility Traces: Track which states were visited
-
Monte Carlo Methods: Average rewards over complete episodes
-
Reward Shaping: Add intermediate rewards to guide learning
Other Examples:
-
Robot learning to walk (reward only when destination reached)
-
Stock trading (profit/loss realized later)
-
Drug treatment effects (outcomes visible after months)
Q22. ⭐ Compare Model-free Reinforcement Learning with Model-based Reinforcement Learning.
[ Jun 2024 Q4(a) | Frequency: 1 ]
| Aspect | Model-Free RL | Model-Based RL |
|---|---|---|
| Definition | Learns directly from experience without modeling environment | Learns a model of environment, then plans |
| Environment Model | Not required | Required (learned or given) |
| Sample Efficiency | Low (needs many interactions) | High (can plan with model) |
| Computation | Less during training | More (model learning + planning) |
| Flexibility | Adapts easily to changes | Model may become outdated |
| Accuracy | Limited by experience | Limited by model accuracy |
| Learning Speed | Slower | Faster with good model |
| Memory | Stores value/policy | Stores model of dynamics |
| Planning | No explicit planning | Can simulate and plan ahead |
| Examples | Q-Learning, SARSA, Policy Gradient | Dyna-Q, PILCO, World Models |
When to Use:
-
Model-Free: When environment is complex, hard to model
-
Model-Based: When sample efficiency is critical, model is simple
Q23. ⭐ Discuss the sub-classes of Model-free Reinforcement Learning.
[ Jun 2024 Q4(a) | Frequency: 1 ]
Model-Free RL has three main sub-classes:
1. Value-Based Methods:
-
Learn value function (V or Q)
-
Policy derived from value function
-
Algorithms: Q-Learning, SARSA, DQN
| Feature | Description |
|---|---|
| Learning | Estimates expected rewards for state-action pairs |
| Policy | Implicit (choose action with max value) |
| Best For | Discrete action spaces |
| Example | Q-Learning: Q(s,a) ← Q(s,a) + α[R + γmax Q(s',a') - Q(s,a)] |
2. Policy-Based Methods:
-
Learn policy directly (π: state → action)
-
No value function required
-
Algorithms: REINFORCE, Policy Gradient
| Feature | Description |
|---|---|
| Learning | Directly optimizes policy parameters |
| Output | Probability distribution over actions |
| Best For | Continuous action spaces, stochastic policies |
| Advantage | Can learn stochastic policies |
3. Actor-Critic Methods:
-
Combines value-based and policy-based
-
Actor: Learns policy
-
Critic: Evaluates actions using value function
-
Algorithms: A2C, A3C, PPO, SAC
| Feature | Description |
|---|---|
| Actor | Updates policy in direction suggested by critic |
| Critic | Estimates value function to guide actor |
| Advantage | Lower variance than pure policy gradient |
| Best For | Complex, high-dimensional problems |
Q24. ⭐ Discuss Ensemble Learning / What is Ensemble Learning.
[ Dec 2024 Q1(g), Jun 2023 Q1(f) | Frequency: 2 ]
Ensemble Learning is a machine learning technique that combines multiple models (base learners) to produce a more accurate and robust prediction than any individual model.
Core Principle: "Wisdom of the crowd" - combining diverse opinions often yields better results.
Key Ensemble Methods:
| Method | Description | Algorithms |
|---|---|---|
| Bagging | Train models on random subsets; aggregate by voting/averaging | Random Forest |
| Boosting | Train models sequentially; each corrects predecessor's errors | AdaBoost, XGBoost, Gradient Boosting |
| Stacking | Train meta-model on predictions of base models | Stacked Generalization |
| Voting | Combine predictions by majority vote (classification) or average (regression) | Voting Classifier |
Advantages:
-
Higher accuracy than individual models
-
Reduced overfitting
-
More robust predictions
-
Handles noise better
Example - Random Forest:
-
Combines multiple decision trees
-
Each tree trained on different data subset
-
Final prediction: majority vote of all trees
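Majority voting can be sketched with three toy rule-based "models" over a hypothetical feature vector; real base learners would be trained classifiers, but the aggregation step is the same:

```python
# Hard-voting ensemble sketch: three weak base learners, each looking
# at a different feature, combined by majority vote.
from collections import Counter

def model_a(x):  # looks only at feature 0
    return "spam" if x[0] > 0.5 else "ham"

def model_b(x):  # looks only at feature 1
    return "spam" if x[1] > 0.5 else "ham"

def model_c(x):  # looks only at feature 2
    return "spam" if x[2] > 0.5 else "ham"

def vote(x):
    # Final prediction = most common class among the base learners.
    preds = [model_a(x), model_b(x), model_c(x)]
    return Counter(preds).most_common(1)[0][0]

print(vote([0.9, 0.8, 0.1]))  # two of three say spam -> "spam"
```

Even though model_c is wrong on this input, the ensemble is right: the "wisdom of the crowd" principle in miniature.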
Q25. 📌 Explain Ensemble Methods.
[ Dec 2022 Q5(g) | Frequency: 1 ]
Ensemble Methods combine predictions from multiple machine learning models to improve overall performance.
Types of Ensemble Methods:
1. Bagging (Bootstrap Aggregating):
-
Creates multiple training sets by sampling with replacement
-
Trains independent models on each set
-
Aggregates predictions (voting/averaging)
-
Example: Random Forest
2. Boosting:
-
Trains models sequentially
-
Each model focuses on errors of previous models
-
Weighted combination of models
-
Examples: AdaBoost, Gradient Boosting, XGBoost
3. Stacking:
-
Trains diverse base models
-
Uses another model (meta-learner) to combine predictions
-
More complex but often more accurate
4. Voting:
-
Hard Voting: Majority class prediction
-
Soft Voting: Average of predicted probabilities
Comparison:
| Aspect | Bagging | Boosting | Stacking |
|---|---|---|---|
| Training | Parallel | Sequential | Two-level |
| Focus | Reduce variance | Reduce bias | Combine strengths |
| Overfitting | Low risk | Higher risk | Moderate |
| Complexity | Low | Medium | High |
Summary Table: Unit 9 - Machine Learning Methods
| Topic | Key Points |
|---|---|
| ML Cycle | Data Collection → Preparation → Training → Evaluation → Deployment |
| Supervised Learning | Labeled data, classification/regression |
| Unsupervised Learning | Unlabeled data, clustering/association |
| Reinforcement Learning | Agent-environment interaction, rewards |
| Analytics Types | Descriptive → Predictive → Prescriptive |
| Ensemble Methods | Bagging, Boosting, Stacking |
| Model-Free RL | Value-based, Policy-based, Actor-Critic |
UNIT 10: Classification
Q1. ⭐ Differentiate between Lazy Learners and Eager Learners in Classification.
[ Dec 2023 Q1(e), Dec 2022 Q3(b) | Frequency: 2 ]
| Aspect | Lazy Learners | Eager Learners |
|---|---|---|
| Definition | Defer processing until a query is made | Build classification model during training |
| Training Time | Minimal (just stores data) | High (learns from training data) |
| Prediction Time | Slow (processes at query time) | Fast (model already built) |
| Memory Usage | High (stores all training data) | Low (stores only model) |
| Model | No explicit model created | Creates explicit model |
| Adaptability | Easily adapts to new data | Requires retraining for new data |
| Also Called | Instance-based learning | Model-based learning |
| Generalization | Local approximation | Global approximation |
| Handling Changes | Better with dynamic data | Better with static data |
Key Difference: Lazy learners wait until prediction time to generalize, while eager learners generalize during the training phase itself.
Q2. ⭐ List/Give examples of algorithms used for Lazy Learners.
[ Dec 2023 Q1(e), Dec 2022 Q3(b) | Frequency: 2 ]
Lazy Learner Algorithms:
- K-Nearest Neighbors (KNN): Classifies a query by the majority class of its K nearest neighbors; stores all training instances.
- Case-Based Reasoning (CBR): Solves new problems using solutions from similar past cases; maintains a case library.
- Locally Weighted Learning (LWL): Builds local models around the query point; weights instances by their proximity to it.
- Radius Neighbors Classifier: Similar to KNN, but uses a fixed radius instead of K neighbors.
- Lazy Bayesian Rules: Constructs Bayesian rules only when a query arrives.
Q3. ⭐ List/Give examples of algorithms used for Eager Learners.
[ Dec 2023 Q1(e), Dec 2022 Q3(b) | Frequency: 2 ]
Eager Learner Algorithms:
- Decision Trees (ID3, C4.5, CART): Build the tree structure during training.
- Naive Bayes Classifier: Learns probability distributions from the training data.
- Support Vector Machine (SVM): Constructs the optimal hyperplane during training.
- Neural Networks: Learn weights through backpropagation.
- Logistic Regression: Learns model parameters during training.
- Random Forest: Builds an ensemble of decision trees.
- Linear Discriminant Analysis (LDA): Computes discriminant functions during training.
Q4. ⭐ Discuss the concept of Classification.
[ Jun 2025 Q1(f) | Frequency: 1 ]
Classification is a supervised machine learning technique that categorizes data into predefined classes or labels based on input features.
Key Characteristics:
-
Supervised Learning: Requires labeled training data
-
Discrete Output: Predicts categorical class labels
-
Pattern Recognition: Learns decision boundaries from data
Working Process:
-
Training Phase: Algorithm learns from labeled examples
-
Model Building: Creates decision boundaries
-
Prediction Phase: Classifies new, unseen instances
Types:
-
Binary Classification (2 classes)
-
Multi-class Classification (>2 classes)
-
Multi-label Classification (multiple labels per instance)
Applications:
- Spam detection, Medical diagnosis, Image recognition, Sentiment analysis
Q5. 🔥 Give/List the algorithms for Classification.
[ Jun 2025 Q1(f), Jun 2024 Q1(g), Dec 2023 Q1(a) | Frequency: 3 ]
Classification Algorithms:
| Category | Algorithms |
|---|---|
| Tree-Based | Decision Tree, Random Forest, Gradient Boosting, XGBoost |
| Probabilistic | Naive Bayes, Bayesian Networks |
| Linear | Logistic Regression, Linear Discriminant Analysis (LDA) |
| Instance-Based | K-Nearest Neighbors (KNN) |
| Kernel-Based | Support Vector Machine (SVM) |
| Neural Networks | Perceptron, Multi-layer Perceptron, Deep Neural Networks |
| Ensemble | AdaBoost, Bagging, Voting Classifier |
Most Commonly Used:
-
Logistic Regression - Simple, interpretable
-
Decision Trees - Easy to visualize and understand
-
Random Forest - High accuracy, handles overfitting
-
SVM - Effective in high-dimensional spaces
-
KNN - Simple, no training phase
-
Naive Bayes - Fast, works well with text data
-
Neural Networks - Complex pattern recognition
Q6. 📌 Give suitable example for Classification.
[ Jun 2024 Q1(g) | Frequency: 1 ]
Example 1: Email Spam Classification
| Feature | Value |
|---|---|
| Contains "Free" | Yes |
| Contains "Winner" | Yes |
| Sender in contacts | No |
| Has attachments | Yes |
| Class Label | SPAM |
Example 2: Medical Diagnosis
-
Input: Patient symptoms, test results, age, medical history
-
Output: Disease category (Diabetic / Non-Diabetic)
Example 3: Loan Approval
-
Features: Income, credit score, employment status, debt ratio
-
Classes: Approved / Rejected
Example 4: Image Classification
-
Input: Pixel values of an image
-
Output: Cat / Dog / Bird
Example 5: Sentiment Analysis
-
Input: Customer review text
-
Output: Positive / Negative / Neutral
Q7. ⭐ Compare/Differentiate Clustering and Classification.
[ Jun 2024 Q1(g), Dec 2023 Q1(a) | Frequency: 2 ]
| Aspect | Classification | Clustering |
|---|---|---|
| Learning Type | Supervised | Unsupervised |
| Labels | Requires labeled data | No labels required |
| Goal | Predict predefined classes | Discover natural groupings |
| Categories | Known beforehand | Unknown, discovered from data |
| Output | Class label for each instance | Cluster assignment |
| Training | Learns from examples | Finds patterns in data |
| Evaluation | Accuracy, Precision, Recall | Silhouette score, Inertia |
| Algorithms | SVM, Decision Trees, KNN | K-Means, DBSCAN, Hierarchical |
| Use Case | Spam detection | Customer segmentation |
| Process | Assign to existing categories | Create new categories |
Q8. 📌 What is Binary Classification?
[ Jun 2023 Q3(b) | Frequency: 1 ]
Binary Classification is a type of classification where the output has exactly two possible classes.
Characteristics:
-
Only two mutually exclusive categories
-
Most fundamental classification type
-
Output: Class 0 or Class 1 (Yes/No, True/False)
Examples:
| Problem | Class 0 | Class 1 |
|---|---|---|
| Email filtering | Not Spam | Spam |
| Medical test | Negative | Positive |
| Loan approval | Reject | Approve |
| Fraud detection | Legitimate | Fraudulent |
| Sentiment | Negative | Positive |
Common Algorithms:
-
Logistic Regression
-
Support Vector Machine (SVM)
-
Decision Trees
-
Neural Networks with sigmoid output
Evaluation Metrics:
-
Accuracy, Precision, Recall, F1-Score
-
ROC-AUC Curve
-
Confusion Matrix (2×2)
Q9. ⭐ Differentiate between Multi-class Classification and Multi-label Classification.
[ Jun 2022 Q3(b) | Frequency: 1 ]
| Aspect | Multi-class Classification | Multi-label Classification |
|---|---|---|
| Definition | Each instance belongs to exactly one class from multiple classes | Each instance can belong to multiple classes simultaneously |
| Classes per Instance | Exactly one | Zero, one, or more |
| Mutual Exclusivity | Classes are mutually exclusive | Classes are NOT mutually exclusive |
| Output Format | Single class label | Set of class labels |
| Example Problem | Fruit classification (Apple/Orange/Banana) | Movie genre (Action AND Comedy AND Romance) |
| Algorithms | One-vs-Rest, One-vs-One, Softmax | Binary Relevance, Classifier Chains |
| Evaluation | Standard accuracy, F1-score | Hamming loss, Subset accuracy |
Q10. 📌 Give an example of Multi-class Classification.
[ Jun 2022 Q3(b) | Frequency: 1 ]
Example: Handwritten Digit Recognition (MNIST)
-
Input: 28×28 pixel image of handwritten digit
-
Classes: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 (10 classes)
-
Output: Single digit class (mutually exclusive)
Other Examples:
| Problem | Classes |
|---|---|
| Iris Flower Classification | Setosa, Versicolor, Virginica |
| News Categorization | Sports, Politics, Entertainment, Technology |
| Animal Classification | Cat, Dog, Elephant, Lion, Tiger |
| Weather Prediction | Sunny, Rainy, Cloudy, Snowy |
| Customer Segmentation | Premium, Standard, Basic |
Key Point: Each instance is assigned to exactly ONE class from multiple available options.
Q11. 📌 Give an example of Multi-label Classification.
[ Jun 2022 Q3(b) | Frequency: 1 ]
Example: Movie Genre Classification
-
Input: Movie plot, cast, keywords
-
Possible Labels: Action, Comedy, Romance, Thriller, Horror, Sci-Fi
-
Output: Multiple genres can apply
| Movie | Labels Assigned |
|---|---|
| "Avengers" | Action, Sci-Fi, Adventure |
| "Titanic" | Romance, Drama |
| "Scary Movie" | Horror, Comedy |
Other Examples:
| Problem | Possible Labels |
|---|---|
| Document Tagging | Politics, Sports, Finance, Health (multiple tags) |
| Image Annotation | Beach, Sunset, People, Mountains |
| Music Classification | Pop, Rock, Electronic, Vocal |
| Protein Function | Multiple biological functions |
| Product Categorization | Electronics, Home, Office, Portable |
Key Point: Each instance can have zero, one, or multiple labels simultaneously.
Q12. ⭐ Can binary classification algorithms be altered for multi-class problems? Justify.
[ Jun 2023 Q3(b) | Frequency: 1 ]
Yes, binary classification algorithms can be extended for multi-class problems using decomposition strategies.
Justification:
Many powerful algorithms (SVM, Logistic Regression) are inherently binary. To handle multiple classes, we decompose the multi-class problem into multiple binary problems.
Two Main Approaches:
| Approach | Method | Binary Classifiers Needed |
|---|---|---|
| One-vs-Rest (OvR) | One classifier per class vs all others | N classifiers for N classes |
| One-vs-One (OvO) | One classifier per pair of classes | N(N-1)/2 classifiers |
Example with 4 Classes (A, B, C, D):
-
OvR: 4 classifiers (A vs Rest, B vs Rest, C vs Rest, D vs Rest)
-
OvO: 6 classifiers (A vs B, A vs C, A vs D, B vs C, B vs D, C vs D)
Algorithms Supporting Multi-class Natively:
-
Decision Trees
-
Random Forest
-
Naive Bayes
-
Neural Networks (with softmax)
Conclusion: Binary algorithms remain valuable through decomposition strategies, making them versatile for multi-class classification.
Q13. 📌 Discuss One-versus-the-Rest (OvR) Approach.
[ Jun 2023 Q3(b) | Frequency: 1 ]
One-vs-Rest (OvR), also called One-vs-All (OvA), is a strategy to extend binary classifiers for multi-class classification.
Working Principle:
-
For N classes, train N binary classifiers
-
Each classifier distinguishes one class from all other classes combined
-
During prediction, choose class with highest confidence score
Example with Classes {Cat, Dog, Bird}:
-
Classifier 1: Cat vs (Dog + Bird)
-
Classifier 2: Dog vs (Cat + Bird)
-
Classifier 3: Bird vs (Cat + Dog)
Advantages:
-
Computationally efficient (only N classifiers)
-
Interpretable results
-
Works with any binary classifier
Disadvantages:
-
Class imbalance (one class vs many)
-
Ambiguous regions possible
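OvR prediction can be sketched as follows, with toy distance-to-prototype scores standing in for trained binary classifiers (the classes and points here are hypothetical):

```python
# One-vs-Rest sketch: one scorer per class; the prediction is the class
# whose "this class vs rest" scorer is most confident.
PROTOTYPES = {"cat": (1.0, 1.0), "dog": (5.0, 1.0), "bird": (1.0, 5.0)}

def score(cls, x):
    # Stand-in for a binary classifier's confidence: higher score means
    # more confident that x belongs to cls rather than the rest.
    px, py = PROTOTYPES[cls]
    return -((x[0] - px) ** 2 + (x[1] - py) ** 2)

def predict_ovr(x):
    # Choose the class with the highest confidence score.
    return max(PROTOTYPES, key=lambda cls: score(cls, x))

print(predict_ovr((4.6, 1.2)))  # closest to the "dog" prototype -> "dog"
```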
Q14. 📌 Discuss One-versus-One (OvO) Approach.
[ Jun 2023 Q3(b) | Frequency: 1 ]
One-vs-One (OvO) is a multi-class strategy that trains a binary classifier for every pair of classes.
Working Principle:
-
For N classes, train N(N-1)/2 binary classifiers
-
Each classifier distinguishes between two specific classes
-
Prediction uses voting: each classifier votes for one class
-
Final prediction: class with most votes
Example with Classes {Cat, Dog, Bird}:
-
Classifier 1: Cat vs Dog
-
Classifier 2: Cat vs Bird
-
Classifier 3: Dog vs Bird
Number of Classifiers: N(N-1)/2 = 3(2)/2 = 3
Advantages:
-
Balanced training data for each classifier
-
Each classifier focuses on only two classes
-
Often more accurate than OvR
Disadvantages:
-
More classifiers needed (O(N²) complexity)
-
Higher computational cost for large N
Q15. ⭐ Compare Classification and Regression techniques of Supervised Learning.
[ Dec 2023 Q4(b), Dec 2022 Q3(b) | Frequency: 2 ]
| Aspect | Classification | Regression |
|---|---|---|
| Output Type | Discrete (categorical labels) | Continuous (numerical values) |
| Prediction | Predicts class/category | Predicts quantity/value |
| Goal | Assign to predefined categories | Estimate numerical relationship |
| Decision Boundary | Creates boundaries between classes | Fits a curve/line to data |
| Evaluation Metrics | Accuracy, Precision, Recall, F1 | MSE, RMSE, MAE, R² |
| Loss Function | Cross-entropy, Hinge loss | Mean Squared Error |
| Output Example | "Spam" or "Not Spam" | Price = $250,000 |
| Algorithms | SVM, Decision Trees, KNN | Linear Regression, SVR |
Examples:
| Classification | Regression |
|---|---|
| Disease diagnosis (Yes/No) | Temperature prediction (°C) |
| Email spam detection | House price estimation |
| Image recognition (Cat/Dog) | Stock price prediction |
| Credit approval (Accept/Reject) | Sales forecasting |
Q16. 📌 Give an example of Classification techniques.
[ Dec 2022 Q3(b) | Frequency: 1 ]
Example: Credit Card Fraud Detection
Problem: Identify whether a transaction is fraudulent or legitimate
Technique: Decision Tree Classification
Features (Input):
| Feature | Value |
|---|---|
| Transaction Amount | $5,000 |
| Location | Overseas |
| Time | 3:00 AM |
| Merchant Category | Electronics |
| Previous Fraud History | No |
Target Classes: Fraudulent / Legitimate
Decision Tree Logic:
IF Amount > $1000 AND Location = Overseas AND Time = Late Night
THEN Class = "Fraudulent"
ELSE
Class = "Legitimate"
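The rule above translates directly into code. This is an illustrative sketch of the single decision rule, not a trained tree:

```python
def classify_transaction(amount, location, time_of_day):
    # Direct rendering of the decision-tree rule from the example above
    if amount > 1000 and location == "Overseas" and time_of_day == "Late Night":
        return "Fraudulent"
    return "Legitimate"

print(classify_transaction(5000, "Overseas", "Late Night"))  # Fraudulent
print(classify_transaction(500, "Domestic", "Afternoon"))    # Legitimate
```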
Other Classification Examples:
-
Naive Bayes → Spam email detection
-
SVM → Text categorization
-
Random Forest → Disease prediction
-
KNN → Handwriting recognition
-
Logistic Regression → Customer churn prediction
Q17. ⭐ Explain the various metrics used for evaluating the classification model.
[ Dec 2023 Q4(b) | Frequency: 1 ]
Classification Evaluation Metrics:
1. Confusion Matrix Components:
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | True Positive (TP) | False Negative (FN) |
| Actual Negative | False Positive (FP) | True Negative (TN) |
2. Key Metrics:
| Metric | Formula | Interpretation |
|---|---|---|
| Accuracy | (TP + TN) / (TP + TN + FP + FN) | Overall correctness |
| Precision | TP / (TP + FP) | Correctness of positive predictions |
| Recall (Sensitivity) | TP / (TP + FN) | Coverage of actual positives |
| Specificity | TN / (TN + FP) | Coverage of actual negatives |
| F1-Score | 2 × (Precision × Recall) / (Precision + Recall) | Harmonic mean of precision and recall |
3. Advanced Metrics:
| Metric | Use Case |
|---|---|
| ROC-AUC | Overall model performance across thresholds |
| Log Loss | Probabilistic confidence of predictions |
| Cohen's Kappa | Agreement beyond chance |
| Matthews Correlation | Balanced measure for imbalanced data |
Choosing Metrics:
-
Balanced data → Accuracy
-
Imbalanced data → Precision, Recall, F1
-
Cost-sensitive → Custom weighted metrics
Q18. 🔥 Draw/Explain Confusion Matrix.
[ Jun 2023 Q1(g), Jun 2022 Q5(h) | Frequency: 2 ]
Confusion Matrix is a table that summarizes the performance of a classification model by comparing actual vs. predicted values.
Matrix Structure (Binary Classification):
| | Predicted: YES | Predicted: NO |
|---|---|---|
| Actual: YES | TP (Correct) | FN (Type II Error) |
| Actual: NO | FP (Type I Error) | TN (Correct) |
Components Explained:
| Term | Meaning | Example (Disease Detection) |
|---|---|---|
| True Positive (TP) | Correctly predicted positive | Sick patient correctly diagnosed |
| True Negative (TN) | Correctly predicted negative | Healthy patient correctly identified |
| False Positive (FP) | Incorrectly predicted positive | Healthy patient wrongly diagnosed sick |
| False Negative (FN) | Incorrectly predicted negative | Sick patient wrongly identified healthy |
Example:
| | Predicted Spam | Predicted Not Spam |
|---|---|---|
| Actual Spam | 85 (TP) | 15 (FN) |
| Actual Not Spam | 10 (FP) | 90 (TN) |
Derived Metrics:
-
Accuracy = (85+90)/(85+90+10+15) = 87.5%
-
Precision = 85/(85+10) = 89.5%
-
Recall = 85/(85+15) = 85%
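The derived metrics above follow mechanically from the four confusion-matrix counts; a minimal sketch using the spam example:

```python
def classification_metrics(tp, tn, fp, fn):
    # Standard metrics computed from confusion-matrix counts
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Spam example: TP=85, TN=90, FP=10, FN=15
acc, prec, rec, f1 = classification_metrics(85, 90, 10, 15)
print(round(acc, 3), round(prec, 3), round(rec, 2))  # 0.875 0.895 0.85
```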
Q19. 📌 Write formula for Accuracy.
[ Jun 2023 Q1(g) | Frequency: 1 ]
Accuracy measures the proportion of correct predictions among total predictions.
Formula:
$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$
Or equivalently:
$\text{Accuracy} = \frac{\text{Correct predictions}}{\text{Total predictions}}$
Where:
-
TP = True Positives
-
TN = True Negatives
-
FP = False Positives
-
FN = False Negatives
Example: If TP=80, TN=70, FP=10, FN=20, then Accuracy = (80+70)/(80+70+10+20) = 150/180 ≈ 83.3%
Note: Accuracy can be misleading for imbalanced datasets.
Q20. 📌 Write formula for Precision.
[ Jun 2023 Q1(g) | Frequency: 1 ]
Precision measures the accuracy of positive predictions (how many predicted positives are actually positive).
Formula:
$\text{Precision} = \frac{TP}{TP + FP}$
Where:
-
TP = True Positives
-
FP = False Positives
Example: If TP=80, FP=20, then Precision = 80/(80+20) = 80%
Interpretation: Of all instances predicted as positive, 80% were actually positive.
Use Case: Important when false positives are costly (e.g., spam detection - don't want to mark important emails as spam).
Q21. 📌 Write formula for Sensitivity (Recall).
[ Jun 2023 Q1(g) | Frequency: 1 ]
Sensitivity (also called Recall or True Positive Rate) measures the proportion of actual positives correctly identified.
Formula:
$\text{Recall} = \frac{TP}{TP + FN}$
Where:
-
TP = True Positives
-
FN = False Negatives
Example: If TP=80, FN=20, then Recall = 80/(80+20) = 80%
Interpretation: 80% of all actual positive cases were correctly identified.
Use Case: Critical when missing positives is costly (e.g., cancer detection - want to catch all cases).
Q22. 📌 Write formula for Specificity.
[ Jun 2023 Q1(g) | Frequency: 1 ]
Specificity (also called True Negative Rate) measures the proportion of actual negatives correctly identified.
Formula:
$\text{Specificity} = \frac{TN}{TN + FP}$
Where:
-
TN = True Negatives
-
FP = False Positives
Example: If TN=90, FP=10, then Specificity = 90/(90+10) = 90%
Interpretation: 90% of all actual negative cases were correctly identified.
Use Case: Important when false positives are problematic (e.g., drug testing - don't want to falsely accuse).
Relationship with Sensitivity:
-
High Sensitivity → Fewer false negatives
-
High Specificity → Fewer false positives
-
Trade-off often exists between the two
10.5.1 Naive Bayes
Q23. ⭐ Write Naive Bayes' Algorithm.
[ Jun 2024 Q4(b) | Frequency: 1 ]
Naive Bayes Algorithm is a probabilistic classifier based on Bayes' Theorem with the "naive" assumption of feature independence.
Bayes' Theorem:
$P(C|X) = \frac{P(X|C)\,P(C)}{P(X)}$
Algorithm Steps:
NAIVE BAYES ALGORITHM
Input: Training dataset D with features X and class labels C
New instance x = (x₁, x₂, ..., xₙ)
Step 1: CALCULATE PRIOR PROBABILITIES
For each class cᵢ:
P(cᵢ) = Count(cᵢ) / Total instances
Step 2: CALCULATE LIKELIHOOD PROBABILITIES
For each class cᵢ and each feature xⱼ:
P(xⱼ|cᵢ) = Count(xⱼ in cᵢ) / Count(cᵢ)
Step 3: APPLY NAIVE BAYES FORMULA
For each class cᵢ:
P(cᵢ|X) ∝ P(cᵢ) × ∏ P(xⱼ|cᵢ)
j=1 to n
Step 4: CLASSIFY
Predicted class = argmax P(cᵢ|X)
cᵢ
Output: Class with maximum posterior probability
Key Assumption: All features are conditionally independent given the class label.
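The four steps above can be sketched directly for categorical features. The toy dataset here is hypothetical (not from the course material), and no smoothing is applied, so a zero likelihood eliminates a class outright:

```python
import math
from collections import Counter

def naive_bayes_predict(data, labels, x):
    """Minimal Naive Bayes for categorical features (no Laplace smoothing)."""
    n = len(labels)
    scores = {}
    for c, count in Counter(labels).items():       # Step 1: priors
        score = math.log(count / n)
        rows = [r for r, y in zip(data, labels) if y == c]
        for j, xj in enumerate(x):                 # Step 2: likelihoods
            match = sum(1 for r in rows if r[j] == xj)
            if match == 0:
                score = float("-inf")              # zero-frequency problem
                break
            score += math.log(match / count)
        scores[c] = score                          # Step 3: log-posterior
    return max(scores, key=scores.get)             # Step 4: argmax

# Hypothetical toy data: (weather, wind) -> play
data = [("Sunny", "Weak"), ("Sunny", "Strong"), ("Rain", "Weak"), ("Rain", "Strong")]
labels = ["Yes", "No", "Yes", "No"]
print(naive_bayes_predict(data, labels, ("Sunny", "Weak")))  # Yes
```

Working in log space avoids multiplying many small probabilities into floating-point underflow.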
Q24. 🔥 Explain Naive Bayes' Algorithm with suitable example.
[ Jun 2024 Q4(b), Jun 2023 Q5(h) | Frequency: 2 ]
Naive Bayes Classifier is a supervised learning algorithm based on Bayes' Theorem that assumes all features are independent of each other given the class.
Bayes' Theorem:
$P(C|X) = \frac{P(X|C)\,P(C)}{P(X)}$
Working Principle: compute the posterior probability of each class given the observed features and predict the class with the highest posterior.
Example: Play Tennis Classification
Training Data:
| Outlook | Temperature | Humidity | Play Tennis |
|---|---|---|---|
| Sunny | Hot | High | No |
| Sunny | Hot | High | No |
| Overcast | Hot | High | Yes |
| Rain | Mild | High | Yes |
| Rain | Cool | Normal | Yes |
| Overcast | Cool | Normal | Yes |
| Sunny | Mild | High | No |
| Sunny | Cool | Normal | Yes |
| Rain | Mild | Normal | Yes |
| Sunny | Mild | Normal | Yes |
New Instance: Outlook=Sunny, Temperature=Cool, Humidity=High
Step 1: Calculate Prior Probabilities
-
P(Yes) = 7/10 = 0.7
-
P(No) = 3/10 = 0.3
Step 2: Calculate Likelihoods
| Feature | P(Feature\|Yes) | P(Feature\|No) |
|---|---|---|
| Sunny | 2/7 | 3/3 = 1 |
| Cool | 3/7 | 0/3 = 0 |
| High | 2/7 | 3/3 = 1 |
Step 3: Apply Naive Bayes
P(Yes|X) ∝ P(Yes) × P(Sunny|Yes) × P(Cool|Yes) × P(High|Yes) = 0.7 × (2/7) × (3/7) × (2/7) ≈ 0.7 × 0.035 = 0.0245
P(No|X) ∝ P(No) × P(Sunny|No) × P(Cool|No) × P(High|No) = 0.3 × 1 × 0 × 1 = 0
Step 4: Classification Since P(Yes|X) > P(No|X), Predicted Class = Yes
Note: P(Cool|No) = 0 forces the entire "No" posterior to zero (the zero-frequency problem); in practice Laplace smoothing is applied so that a single unseen feature value does not eliminate a class.
Advantages: Fast, works well with high-dimensional data, handles missing values Applications: Spam filtering, sentiment analysis, document classification
10.5.2 K-Nearest Neighbour (K-NN)
Q25. ⭐ Explain K-Nearest Neighbour.
[ Dec 2022 Q5(h) | Frequency: 1 ]
K-Nearest Neighbour (KNN) is a lazy learning algorithm that classifies a new instance based on the majority class of its K nearest neighbors.
Key Characteristics:
-
Instance-based learning (no explicit model)
-
Non-parametric (no assumptions about data distribution)
-
Lazy learner (stores training data, computes at prediction time)
Algorithm Steps:
-
Choose the value of K (number of neighbors)
-
Calculate distance from new point to all training points
-
Select K nearest neighbors based on calculated distances
-
Assign class by majority voting among K neighbors
Distance Metrics:
| Metric | Formula |
|---|---|
| Euclidean | $\sqrt{\sum(x_i - y_i)^2}$ |
| Manhattan | $\sum \lvert x_i - y_i \rvert$ |
| Minkowski | $\left(\sum \lvert x_i - y_i \rvert^p\right)^{1/p}$ |
Choosing K:
-
Small K → Sensitive to noise
-
Large K → Smoother boundaries but may miss patterns
-
Typically use odd K to avoid ties
Applications: Pattern recognition, image classification, recommendation systems
Q26. 🔥 Find/Apply KNN classification algorithm to classify class of given point with K=3 and Euclidean distance.
[ Jun 2025 Q3(b), Dec 2024 Q4(a) | Frequency: 2 ]
Problem: Classify a new point using KNN with K=3 and Euclidean distance.
Given Training Data:
| Point | X | Y | Class |
|---|---|---|---|
| P1 | 1 | 1 | A |
| P2 | 2 | 2 | A |
| P3 | 3 | 3 | A |
| P4 | 6 | 6 | B |
| P5 | 7 | 7 | B |
| P6 | 8 | 8 | B |
New Point to Classify: Q = (4, 4)
Step 1: Calculate Euclidean Distance
| Point | Distance from Q(4,4) | Class |
|---|---|---|
| P1 | $\sqrt{(4-1)^2 + (4-1)^2} = \sqrt{18} = 4.24$ | A |
| P2 | $\sqrt{(4-2)^2 + (4-2)^2} = \sqrt{8} = 2.83$ | A |
| P3 | $\sqrt{(4-3)^2 + (4-3)^2} = \sqrt{2} = 1.41$ | A |
| P4 | $\sqrt{(4-6)^2 + (4-6)^2} = \sqrt{8} = 2.83$ | B |
| P5 | $\sqrt{(4-7)^2 + (4-7)^2} = \sqrt{18} = 4.24$ | B |
| P6 | $\sqrt{(4-8)^2 + (4-8)^2} = \sqrt{32} = 5.66$ | B |
Step 2: Sort by Distance and Select K=3 Nearest
| Rank | Point | Distance | Class |
|---|---|---|---|
| 1 | P3 | 1.41 | A |
| 2 | P2 | 2.83 | A |
| 3 | P4 | 2.83 | B |
Step 3: Majority Voting
-
Class A: 2 votes
-
Class B: 1 vote
Step 4: Classification Result
Class A has the majority (2 of 3 votes), so the new point Q(4, 4) is classified as Class A.
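The worked example above can be verified with a short stdlib-only sketch:

```python
import math
from collections import Counter

train = [((1, 1), "A"), ((2, 2), "A"), ((3, 3), "A"),
         ((6, 6), "B"), ((7, 7), "B"), ((8, 8), "B")]

def knn_classify(query, train, k=3):
    # Rank training points by Euclidean distance to the query,
    # then take a majority vote among the k nearest
    dists = sorted((math.dist(p, query), label) for p, label in train)
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

print(knn_classify((4, 4), train))  # A (neighbours give A=2 votes, B=1)
```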
Q27. ⭐ Differentiate between K-NN Algorithm and K-Means Algorithm.
[ Dec 2023 Q4(c)(ii) | Frequency: 1 ]
| Aspect | K-NN (K-Nearest Neighbors) | K-Means Clustering |
|---|---|---|
| Type | Supervised Learning | Unsupervised Learning |
| Purpose | Classification/Regression | Clustering |
| K Meaning | Number of neighbors to consider | Number of clusters to form |
| Labels Required | Yes (labeled data) | No (unlabeled data) |
| Learning Type | Lazy learner (instance-based) | Eager learner |
| Training Phase | None (stores all data) | Iterative training |
| Output | Class label for new point | Cluster assignments |
| Algorithm | Distance-based voting | Centroid-based grouping |
| Prediction Time | Slow (computes at runtime) | Fast (uses learned centroids) |
| K Selection | Affects classification boundary | Affects number of groups |
| Use Case | Spam detection, image recognition | Customer segmentation |
Q28. 📌 Give example of K-NN Algorithm.
[ Dec 2023 Q4(c)(ii) | Frequency: 1 ]
Example: Fruit Classification
Scenario: Classify a fruit as Apple, Orange, or Banana based on weight and color intensity.
Training Data:
| Fruit | Weight (g) | Color Intensity | Class |
|---|---|---|---|
| F1 | 150 | 8 | Apple |
| F2 | 160 | 7 | Apple |
| F3 | 200 | 6 | Orange |
| F4 | 180 | 5 | Orange |
| F5 | 120 | 3 | Banana |
| F6 | 130 | 2 | Banana |
New Fruit: Weight = 155g, Color Intensity = 7
Using K=3:
-
Calculate distances to all fruits
-
Find 3 nearest neighbors (likely F1, F2, F3)
-
Majority voting: Apple=2, Orange=1
-
Classification: Apple
Other KNN Applications:
-
Recommendation Systems: Recommend products based on similar user preferences
-
Medical Diagnosis: Classify disease based on similar patient records
-
Handwriting Recognition: Match digits to similar training samples
-
Credit Scoring: Classify loan applications based on similar customers
10.5.3 Decision Trees
Q29. ⭐ Write ID3 Algorithm for creating decision tree for any training dataset.
[ Jun 2025 Q4(a) | Frequency: 1 ]
ID3 (Iterative Dichotomiser 3) is a decision tree algorithm that uses Information Gain to select the best attribute for splitting.
Key Concepts:
Entropy (Measure of Impurity):
$H(S) = -\sum_{i} p_i \log_2 p_i$
Information Gain:
$IG(S, A) = H(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} H(S_v)$
ID3 Algorithm:
ALGORITHM ID3(S, Attributes, Target)
Input: S = Training set
Attributes = Set of features
Target = Target attribute (class label)
Output: Decision Tree
BEGIN
1. Create root node for tree
2. IF all examples in S belong to same class c THEN
RETURN leaf node with label c
END IF
3. IF Attributes is empty THEN
RETURN leaf node with majority class in S
END IF
4. CALCULATE Entropy of S: H(S)
5. FOR each attribute A in Attributes DO
CALCULATE Information Gain: IG(S, A)
END FOR
6. SELECT attribute A_best with highest Information Gain
7. SET A_best as decision attribute for root
8. FOR each value v of A_best DO
8.1 Add branch from root for condition A_best = v
8.2 Let S_v = subset of S where A_best = v
8.3 IF S_v is empty THEN
Add leaf with majority class of S
ELSE
Add subtree: ID3(S_v, Attributes - {A_best}, Target)
END IF
END FOR
9. RETURN root node
END
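Steps 4–5 of the pseudocode rest on entropy and information gain; a stdlib-only sketch on a tiny dataset of my own (not from the course):

```python
import math
from collections import Counter

def entropy(labels):
    # H(S) = -sum p_i * log2(p_i)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    # IG(S, A) = H(S) - sum over v of (|S_v|/|S|) * H(S_v)
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [y for x, y in zip(values, labels) if x == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

labels = ["Yes", "Yes", "No", "No"]
print(entropy(labels))                                 # 1.0 (maximally impure)
print(information_gain(["a", "a", "b", "b"], labels))  # 1.0 (attribute splits perfectly)
```

ID3 evaluates `information_gain` for every remaining attribute and splits on the maximizer.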
Q30. ⭐ Discuss Decision Trees with suitable example.
[ Jun 2023 Q2(b)(iii) | Frequency: 1 ]
Decision Tree is a supervised learning algorithm that creates a tree-like model of decisions and their possible consequences.
Structure:
-
Root Node: Top node, first decision point
-
Internal Nodes: Decision points based on features
-
Branches: Outcomes of decisions
-
Leaf Nodes: Final classification/prediction
Example: Loan Approval System
Training Data:
| Age | Income | Credit Score | Loan Approved |
|---|---|---|---|
| Young | High | Good | Yes |
| Young | Low | Good | No |
| Middle | High | Good | Yes |
| Middle | Low | Bad | No |
| Old | High | Good | Yes |
| Old | Low | Good | Yes |
Decision Tree (one tree consistent with the training data):
- Credit Score = Bad → NO
- Credit Score = Good:
  - Income = High → YES
  - Income = Low:
    - Age = Old → YES
    - Age = Young → NO
Classification of New Instance:
-
Credit Score = Good, Income = Low, Age = Old
-
Path: Credit Score(Good) → Income(Low) → Age(Old) → YES (Approved)
Advantages:
-
Easy to interpret and visualize
-
Handles both numerical and categorical data
-
No feature scaling required
Algorithms: ID3, C4.5, CART, Random Forest
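The loan-approval logic can be rendered as nested ifs. This is an illustrative sketch of a tree consistent with the training table above; running ID3 on the full data could order the splits differently:

```python
def approve_loan(age, income, credit_score):
    """Hand-coded decision tree for the loan example (illustrative only)."""
    if credit_score == "Bad":
        return "No"          # bad credit is rejected outright
    if income == "High":
        return "Yes"         # good credit + high income is approved
    # good credit, low income: decision falls to age
    return "Yes" if age == "Old" else "No"

print(approve_loan("Old", "Low", "Good"))  # Yes (matches the path traced above)
```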
10.5.4 Logistic Regression
Q31. ⭐ What is Logistic Regression?
[ Dec 2024 Q1(c), Dec 2022 Q1(g) | Frequency: 2 ]
Logistic Regression is a supervised machine learning algorithm used for binary classification problems. Despite its name, it is a classification algorithm, not regression.
Key Characteristics:
-
Predicts probability of class membership
-
Output range: 0 to 1
-
Uses Sigmoid (Logistic) function
-
Linear decision boundary
Sigmoid Function:
$\sigma(z) = \frac{1}{1 + e^{-z}}$
Where: $z = w_0 + w_1x_1 + w_2x_2 + ... + w_nx_n$
Output Interpretation:
-
P(Y=1|X) ≥ 0.5 → Class 1
-
P(Y=1|X) < 0.5 → Class 0
Applications:
-
Spam detection
-
Disease diagnosis
-
Credit risk assessment
-
Customer churn prediction
Q32. 🔥 Explain Logistic Regression.
[ Jun 2023 Q5(g), Jun 2022 Q4(b)(iii) | Frequency: 2 ]
Logistic Regression is a statistical method for binary classification that models the probability of an outcome using the logistic (sigmoid) function.
Mathematical Foundation:
Linear Model: $z = \beta_0 + \beta_1x_1 + \beta_2x_2 + ... + \beta_nx_n$
Sigmoid Transformation: $P(Y=1|X) = \frac{1}{1 + e^{-z}}$
Diagram:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"width": 300,
"height": 200,
"title": "Sigmoid Function",
"data": {
"sequence": {
"start": -6,
"stop": 6,
"step": 0.1,
"as": "z"
}
},
"transform": [
{"calculate": "1/(1+exp(-datum.z))", "as": "sigmoid"}
],
"mark": "line",
"encoding": {
"x": {"field": "z", "type": "quantitative", "title": "z"},
"y": {"field": "sigmoid", "type": "quantitative", "title": "P(Y=1)"}
}
}
Working Process:
-
Input Features: Receive feature vector X
-
Linear Combination: Calculate weighted sum z
-
Apply Sigmoid: Transform z to probability [0,1]
-
Thresholding: Convert probability to class (typically 0.5 threshold)
Cost Function (Log Loss):
$J(\beta) = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i)\right]$
Example: Pass/Fail Prediction
| Hours Studied | Pass (1) / Fail (0) |
|---|---|
| 2 | 0 |
| 3 | 0 |
| 5 | 1 |
| 6 | 1 |
Model learns: $P(Pass) = \frac{1}{1 + e^{-(-4 + 1 \cdot hours)}}$
For 4 hours: $P(Pass) = \frac{1}{1 + e^0} = 0.5$ (boundary case)
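The fitted model above (weights −4 and 1, as given in the example) can be checked numerically:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def p_pass(hours, w0=-4.0, w1=1.0):
    # P(Pass) = sigmoid(w0 + w1 * hours), weights from the example model
    return sigmoid(w0 + w1 * hours)

print(p_pass(4))           # 0.5 exactly: z = -4 + 4 = 0 is the decision boundary
print(round(p_pass(6), 3)) # 0.881 -- two more study hours push well past 0.5
```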
Advantages: Interpretable, probabilistic output, efficient training
Q33. 🔥 Discuss the various types of Logistic Regressions briefly.
[ Dec 2024 Q1(c), Dec 2022 Q1(g) | Frequency: 2 ]
Types of Logistic Regression:
| Type | Classes | Output | Use Case |
|---|---|---|---|
| Binary | 2 | 0 or 1 | Spam/Not Spam |
| Multinomial | >2 (mutually exclusive) | One of K classes | Digit recognition (0-9) |
| Ordinal | >2 (ordered) | Ordered categories | Rating (Low/Medium/High) |
1. Binary Logistic Regression:
-
Two classes (0 and 1)
-
Uses single sigmoid function
-
Most common type
Example: Disease positive/negative
2. Multinomial Logistic Regression:
-
Multiple classes (>2)
-
Classes are mutually exclusive
-
Uses Softmax function
Example: Classify news as Sports/Politics/Entertainment
3. Ordinal Logistic Regression:
-
Multiple ordered categories
-
Preserves natural ordering
-
Uses cumulative probabilities
Example: Customer satisfaction (Very Poor < Poor < Average < Good < Excellent)
Q34. ⭐ Differentiate between Logistic Regression and Linear Regression.
[ Dec 2023 Q4(c)(i) | Frequency: 1 ]
| Aspect | Logistic Regression | Linear Regression |
|---|---|---|
| Purpose | Classification | Regression |
| Output Type | Discrete (categorical) | Continuous (numerical) |
| Output Range | 0 to 1 (probability) | -∞ to +∞ |
| Function | Sigmoid/Logistic | Linear |
| Prediction | Class labels | Numeric values |
| Cost Function | Log Loss (Cross-entropy) | Mean Squared Error |
| Decision Boundary | S-curve (sigmoid) | Straight line |
| Threshold | Typically 0.5 | Not applicable |
| Use Case | Spam detection | House price prediction |
| Equation | $P = \frac{1}{1+e^{-z}}$ | $y = mx + c$ |
Key Difference: Linear regression predicts quantity, logistic regression predicts probability of class membership.
Q35. 📌 Give example of Logistic Regression.
[ Dec 2023 Q4(c)(i) | Frequency: 1 ]
Example: Email Spam Detection
Problem: Classify emails as Spam (1) or Not Spam (0)
Features:
| Feature | Description |
|---|---|
| x₁ | Contains word "free" (0/1) |
| x₂ | Contains word "winner" (0/1) |
| x₃ | Number of exclamation marks |
| x₄ | Sender in contacts (0/1) |
Model: $z = -2 + 1.5x_1 + 2x_2 + 0.3x_3 - 3x_4$, with $P(Spam) = \frac{1}{1 + e^{-z}}$
Prediction Example:
-
Email with: "free"=1, "winner"=1, exclamations=5, in contacts=0
-
z = -2 + 1.5(1) + 2(1) + 0.3(5) - 3(0) = 3
-
P(Spam) = 1/(1 + e⁻³) = 0.95
Decision: P(Spam) = 0.95 > 0.5 → Classified as SPAM
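The calculation above, using the weights from the prediction example (z = −2 + 1.5x₁ + 2x₂ + 0.3x₃ − 3x₄), in code:

```python
import math

def spam_probability(has_free, has_winner, exclamations, in_contacts):
    # Weights from the illustrative model in the example above
    z = -2 + 1.5 * has_free + 2 * has_winner + 0.3 * exclamations - 3 * in_contacts
    return 1 / (1 + math.exp(-z))

p = spam_probability(1, 1, 5, 0)  # z = 3
print(round(p, 2), "SPAM" if p > 0.5 else "NOT SPAM")  # 0.95 SPAM
```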
Other Examples:
-
Credit card fraud detection
-
Disease diagnosis (Diabetic/Non-diabetic)
-
Customer churn prediction
-
Loan default prediction
10.5.5 Support Vector Machines
Q36. 🔥 Explain Support Vector Machines with example.
[ Jun 2024 Q4(c)(i), Jun 2022 Q5(g) | Frequency: 2 ]
Support Vector Machine (SVM) is a supervised learning algorithm that finds the optimal hyperplane to separate classes with maximum margin.
Key Concepts:
| Term | Description |
|---|---|
| Hyperplane | Decision boundary that separates classes |
| Support Vectors | Data points closest to the hyperplane |
| Margin | Distance between hyperplane and nearest points |
| Kernel | Function to transform data for non-linear separation |
Linear SVM:
-
Finds hyperplane: $w \cdot x + b = 0$
-
Maximizes margin: $\frac{2}{||w||}$
-
Classification: $f(x) = sign(w \cdot x + b)$
Kernel Trick for Non-linear Data:
| Kernel | Formula | Typical Use Case |
|---|---|---|
| Linear | $K(x, y) = x \cdot y$ | Linearly separable data |
| Polynomial | $K(x, y) = (x \cdot y + c)^d$ | Polynomial decision boundary |
| RBF / Gaussian | $K(x, y) = e^{-\gamma \lVert x - y \rVert^2}$ | Highly non-linear / complex boundaries |
Example: Handwritten Digit Classification
Problem: Classify digits as "1" or "7"
Features: Pixel intensities (784 features for 28×28 image)
Process:
-
Training data: Images of 1s and 7s with labels
-
SVM finds hyperplane in high-dimensional space
-
Support vectors: Images near decision boundary
-
New digit → Map to feature space → Classify based on hyperplane
Advantages:
-
Effective in high-dimensional spaces
-
Works well with clear margin of separation
-
Memory efficient (uses only support vectors)
Applications: Image classification, text categorization, bioinformatics, face detection
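Once trained, a linear SVM classifies with sign(w·x + b). A sketch with a hypothetical separating hyperplane (the weights are made up for illustration, not learned):

```python
def svm_predict(w, b, x):
    """Linear SVM decision rule: sign(w . x + b)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hypothetical hyperplane x1 + x2 - 9 = 0 separating two clusters
w, b = (1.0, 1.0), -9.0
print(svm_predict(w, b, (7, 7)))  # 1  (positive side of the hyperplane)
print(svm_predict(w, b, (2, 2)))  # -1 (negative side)
```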
Summary Table: Classification Algorithms
| Algorithm | Type | Best For | Key Feature |
|---|---|---|---|
| Naive Bayes | Probabilistic | Text classification | Fast, simple |
| KNN | Instance-based | Pattern recognition | No training |
| Decision Tree | Rule-based | Interpretable models | Visual, explainable |
| Logistic Regression | Statistical | Binary classification | Probabilistic output |
| SVM | Geometric | High-dimensional data | Maximum margin |
UNIT 11: Regression
Q1. ⭐ Discuss the concept of regression.
[ Jun 2025 Q1(f) | Frequency: 1 ]
Regression is a supervised machine learning technique used to predict continuous numerical values based on input features. It establishes a relationship between dependent (target) and independent (predictor) variables.
Key Concepts:
| Aspect | Description |
|---|---|
| Purpose | Predict continuous outcomes (e.g., price, temperature, sales) |
| Output | Real-valued numbers (not discrete classes) |
| Relationship | Maps input features to continuous output |
| Training | Uses labeled data with known target values |
Mathematical Representation:
$Y = f(X) + \epsilon$
Where:
-
Y = Dependent variable (target)
-
X = Independent variables (features)
-
f = Function learned from data
-
ε = Error term
Types of Regression:
-
Simple Regression - One independent variable
-
Multiple Regression - Multiple independent variables
-
Linear Regression - Linear relationship
-
Non-linear Regression - Curved relationships
Applications:
-
House price prediction
-
Stock market forecasting
-
Weather prediction
-
Sales forecasting
-
Risk assessment
Q2. 📌 Give the list of algorithms for regression.
[ Jun 2025 Q1(f) | Frequency: 1 ]
Regression Algorithms:
| Category | Algorithms |
|---|---|
| Linear Methods | Simple Linear Regression, Multiple Linear Regression, Ridge Regression, Lasso Regression, Elastic Net |
| Non-Linear Methods | Polynomial Regression, Exponential Regression, Logarithmic Regression |
| Tree-Based | Decision Tree Regression, Random Forest Regression, Gradient Boosting Regression, XGBoost |
| Support Vector | Support Vector Regression (SVR) |
| Neural Networks | Multi-layer Perceptron Regression, Deep Neural Networks |
| Instance-Based | K-Nearest Neighbors Regression |
| Bayesian | Bayesian Linear Regression, Gaussian Process Regression |
Selection Criteria:
-
Linear relationships → Linear Regression
-
Non-linear patterns → Polynomial/SVR
-
High dimensionality → Ridge/Lasso
-
Complex patterns → Random Forest/Neural Networks
Q3. 📌 Give an example of Regression techniques.
[ Dec 2022 Q3(b) | Frequency: 1 ]
Example: House Price Prediction using Linear Regression
Problem: Predict house prices based on area (sq ft)
Dataset:
| Area (sq ft) | Price (₹ Lakhs) |
|---|---|
| 1000 | 50 |
| 1500 | 70 |
| 2000 | 90 |
| 2500 | 110 |
| 3000 | 130 |
Model: Linear Regression
Learned Parameters:
-
Slope (m) = 0.04
-
Intercept (c) = 10
Prediction Equation:
Price (₹ Lakhs) = 0.04 × Area + 10
Prediction for 1800 sq ft:
Price = 0.04 × 1800 + 10 = 72 + 10 = ₹82 Lakhs
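A quick check of the learned parameters (slope 0.04, intercept 10) against the table:

```python
def predict_price(area_sqft, slope=0.04, intercept=10):
    # Price (lakhs) = 0.04 * Area + 10, parameters learned above
    return slope * area_sqft + intercept

print(round(predict_price(1000), 2))  # 50.0 -- matches the first training row
print(round(predict_price(1800), 2))  # 82.0 lakhs for the new query
```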
Q4. ⭐ How regression differs from classification?
[ Jun 2022 Q4(b) | Frequency: 1 ]
Differences between Regression and Classification:
| Aspect | Regression | Classification |
|---|---|---|
| Output Type | Continuous numerical values | Discrete categorical labels |
| Prediction | Predicts quantity | Predicts category/class |
| Example Output | 25.5, 100.75, -3.2 | "Yes/No", "Cat/Dog", "A/B/C" |
| Algorithms | Linear Regression, SVR, Polynomial | Logistic Regression, SVM, Decision Tree |
| Evaluation Metrics | MSE, RMSE, MAE, R² | Accuracy, Precision, Recall, F1-Score |
| Loss Function | Mean Squared Error | Cross-Entropy Loss |
| Decision Boundary | Fits a line/curve through data | Separates classes |
Examples:
| Regression Problems | Classification Problems |
|---|---|
| Predict house price | Predict spam/not spam |
| Forecast temperature | Detect disease presence |
| Estimate sales | Classify image objects |
| Predict stock prices | Sentiment analysis |
Q5. 📌 What is the similarity between regression and classification?
[ Jun 2022 Q4(b) | Frequency: 1 ]
Similarities between Regression and Classification:
| Similarity | Description |
|---|---|
| Supervised Learning | Both require labeled training data |
| Learning from Data | Both learn patterns from input-output pairs |
| Feature Input | Both use input features to make predictions |
| Model Training | Both optimize parameters to minimize error |
| Generalization | Both aim to perform well on unseen data |
| Overfitting Risk | Both susceptible to overfitting |
| Preprocessing | Both require data preprocessing (normalization, handling missing values) |
| Feature Selection | Both benefit from relevant feature selection |
| Cross-Validation | Both use cross-validation for model evaluation |
| Regularization | Both can use regularization techniques |
Common Workflow:
Data Collection → Preprocessing → Feature Engineering →
Model Training → Validation → Testing → Deployment
Shared Techniques:
-
Train-test split
-
K-fold cross-validation
-
Hyperparameter tuning
-
Feature scaling
-
Ensemble methods
Q6. 🔥 What is linear regression?
[ Jun 2023 Q4(b) | Frequency: 1 ]
Linear Regression is a fundamental supervised learning algorithm that models the relationship between dependent variable (Y) and one or more independent variables (X) using a linear equation.
Simple Linear Regression Equation:
$Y = \beta_0 + \beta_1 X + \epsilon$
Where:
-
Y = Dependent variable (target)
-
X = Independent variable (feature)
-
β₀ = Y-intercept (constant term)
-
β₁ = Slope (coefficient)
-
ε = Error term
Key Characteristics:
| Property | Description |
|---|---|
| Relationship | Assumes linear relationship between variables |
| Output | Continuous numerical prediction |
| Goal | Find best-fit line minimizing errors |
| Assumption | Errors are normally distributed |
Assumptions of Linear Regression:
-
Linearity - Linear relationship between X and Y
-
Independence - Observations are independent
-
Homoscedasticity - Constant variance of errors
-
Normality - Errors are normally distributed
-
No Multicollinearity - Independent variables not highly correlated
Q7. 📌 Discuss Linear regression briefly.
[ Jun 2022 Q4(b)(i) | Frequency: 1 ]
Linear Regression Overview:
Linear regression fits a straight line through data points to model the relationship between variables.
Types:
-
Simple Linear Regression - One independent variable
-
Multiple Linear Regression - Multiple independent variables
Working Principle:
Step 1: Initialize parameters (slope, intercept)
Step 2: Calculate predicted values
Step 3: Compute error (actual - predicted)
Step 4: Adjust parameters to minimize error
Step 5: Repeat until convergence
Best Fit Line:
The line that minimizes the sum of squared differences between actual and predicted values.
Advantages:
-
Simple and interpretable
-
Fast training and prediction
-
Works well for linearly separable data
Disadvantages:
-
Assumes linear relationship
-
Sensitive to outliers
-
Cannot capture complex patterns
Q8. 📌 Give example of Linear regression.
[ Dec 2023 Q4(c)(i) | Frequency: 1 ]
Example: Predicting Student Marks based on Study Hours
Problem Statement: Predict exam marks based on hours studied.
Training Data:
| Hours Studied (X) | Marks (Y) |
|---|---|
| 2 | 40 |
| 4 | 55 |
| 6 | 65 |
| 8 | 80 |
| 10 | 90 |
Step 1: Calculate Means
-
Mean of X (X̄) = (2+4+6+8+10)/5 = 6
-
Mean of Y (Ȳ) = (40+55+65+80+90)/5 = 66
Step 2: Calculate Slope (β₁)
| X | Y | X-X̄ | Y-Ȳ | (X-X̄)(Y-Ȳ) | (X-X̄)² |
|---|---|---|---|---|---|
| 2 | 40 | -4 | -26 | 104 | 16 |
| 4 | 55 | -2 | -11 | 22 | 4 |
| 6 | 65 | 0 | -1 | 0 | 0 |
| 8 | 80 | 2 | 14 | 28 | 4 |
| 10 | 90 | 4 | 24 | 96 | 16 |
| Sum | | | | 250 | 40 |
$\beta_1 = \frac{\sum(X-\bar{X})(Y-\bar{Y})}{\sum(X-\bar{X})^2} = \frac{250}{40} = 6.25$
Step 3: Calculate Intercept (β₀)
$\beta_0 = \bar{Y} - \beta_1\bar{X} = 66 - 6.25 \times 6 = 28.5$
Final Equation:
$Y = 28.5 + 6.25X$
Prediction: For 7 hours of study: $Y = 28.5 + 6.25 \times 7 = 72.25$ marks
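The same least-squares fit, computed in code from the study-hours table:

```python
def fit_line(xs, ys):
    # Ordinary least squares for one feature:
    # slope = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
    # intercept = ybar - slope * xbar
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
    return ybar - slope * xbar, slope

hours = [2, 4, 6, 8, 10]
marks = [40, 55, 65, 80, 90]
b0, b1 = fit_line(hours, marks)
print(b0, b1)       # 28.5 6.25
print(b0 + b1 * 7)  # 72.25 -- predicted marks for 7 hours of study
```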
Q9. 🔥 How linear regression is performed using least square method?
[ Jun 2023 Q4(b) | Frequency: 1 ]
Least Square Method finds the best-fit line by minimizing the sum of squared residuals (differences between actual and predicted values).
Objective Function:
$\min_{\beta_0, \beta_1} \; SSE = \sum_{i=1}^{n}\left(Y_i - \beta_0 - \beta_1 X_i\right)^2$
Derivation of Parameters:
For Slope (β₁):
$\beta_1 = \frac{\sum(X_i - \bar{X})(Y_i - \bar{Y})}{\sum(X_i - \bar{X})^2}$
Or equivalently:
$\beta_1 = \frac{n\sum X_iY_i - \sum X_i \sum Y_i}{n\sum X_i^2 - (\sum X_i)^2}$
For Intercept (β₀):
$\beta_0 = \bar{Y} - \beta_1\bar{X}$
Step-by-Step Process:
Step 1: Collect data points (X₁,Y₁), (X₂,Y₂),...,(Xₙ,Yₙ)
Step 2: Calculate required sums: ΣX, ΣY, ΣXY, ΣX²
Step 3: Compute means: X̄, Ȳ
Step 4: Calculate slope β₁ using formula
Step 5: Calculate intercept β₀ using formula
Step 6: Form regression equation: Y = β₀ + β₁X
Why "Least Squares"?
-
Squaring ensures all errors are positive
-
Larger errors are penalized more heavily
-
Provides unique optimal solution
-
Mathematically tractable (closed-form solution)
Q10. 🔥 Find the regression line for given data points.
[ Jun 2023 Q4(b) | Frequency: 1 ]
Example Problem: Find regression line for the following data:
| X | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Y | 2 | 4 | 5 | 4 | 5 |
Solution:
Step 1: Create Calculation Table
| X | Y | XY | X² |
|---|---|---|---|
| 1 | 2 | 2 | 1 |
| 2 | 4 | 8 | 4 |
| 3 | 5 | 15 | 9 |
| 4 | 4 | 16 | 16 |
| 5 | 5 | 25 | 25 |
| ΣX=15 | ΣY=20 | ΣXY=66 | ΣX²=55 |
Step 2: Calculate Parameters
n = 5 (number of data points)
Slope (β₁):
$\beta_1 = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} = \frac{5(66) - 15(20)}{5(55) - 15^2} = \frac{330 - 300}{275 - 225} = \frac{30}{50} = 0.6$
Means:
$\bar{X} = 15/5 = 3, \quad \bar{Y} = 20/5 = 4$
Intercept (β₀):
$\beta_0 = \bar{Y} - \beta_1\bar{X} = 4 - 0.6 \times 3 = 2.2$
Step 3: Form Regression Equation
$Y = 2.2 + 0.6X$
Verification (for X=3): $Y = 2.2 + 0.6 \times 3 = 4.0$
This matches our calculated Ȳ = 4 ✓
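The sum-based formulas can be checked in a few lines on the same data:

```python
def regression_line(xs, ys):
    # beta1 = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2); beta0 = ybar - beta1*xbar
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    beta1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    beta0 = sy / n - beta1 * sx / n
    return beta0, beta1

b0, b1 = regression_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(b0, 2), round(b1, 2))   # 2.2 0.6
print(round(b0 + b1 * 3, 1))        # 4.0 -- the line passes through (xbar, ybar)
```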
Q11. ⭐ Discuss the term 'mean squared error'.
[ Jun 2023 Q4(b) | Frequency: 1 ]
Mean Squared Error (MSE) is a metric used to evaluate the performance of regression models by measuring the average of squared differences between actual and predicted values.
Formula:
$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2$
Where:
-
n = Number of data points
-
Yᵢ = Actual value
-
Ŷᵢ = Predicted value
Characteristics:
| Property | Description |
|---|---|
| Range | 0 to ∞ (lower is better) |
| Unit | Squared unit of target variable |
| Sensitivity | Highly sensitive to outliers |
| Interpretation | Average squared prediction error |
Example Calculation:
| Actual (Y) | Predicted (Ŷ) | Error (Y-Ŷ) | Error² |
|---|---|---|---|
| 10 | 12 | -2 | 4 |
| 15 | 14 | 1 | 1 |
| 20 | 18 | 2 | 4 |
| 25 | 26 | -1 | 1 |
| Sum | | | 10 |
$MSE = \frac{10}{4} = 2.5$
Related Metrics:
| Metric | Formula | Advantage |
|---|---|---|
| RMSE | √MSE | Same unit as target |
| MAE | (1/n)Σ\|Y-Ŷ\| | Less sensitive to outliers |
| R² | 1 - (SSE/SST) | Normalized (0-1 scale) |
Advantages of MSE:
-
Differentiable (useful for optimization)
-
Penalizes larger errors more
-
Unique minimum solution
Disadvantages:
-
Sensitive to outliers
-
Unit is squared (less interpretable)
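MSE (and its square root, RMSE) on the example table above, in code:

```python
def mse(actual, predicted):
    # Mean Squared Error: average of squared residuals
    return sum((y - yhat) ** 2 for y, yhat in zip(actual, predicted)) / len(actual)

actual = [10, 15, 20, 25]
predicted = [12, 14, 18, 26]
print(mse(actual, predicted))                  # 2.5 (= 10 / 4, from the table)
print(round(mse(actual, predicted) ** 0.5, 3)) # RMSE ~ 1.581, in original units
```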
Q12. 📌 Discuss the term 'mean absolute errors'.
[ Jun 2023 Q4(b) | Frequency: 1 ]
Mean Absolute Error (MAE) measures the average magnitude of errors between actual and predicted values without considering direction.
Formula:
$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|Y_i - \hat{Y}_i\right|$
Where:
-
n = Number of data points
-
Yᵢ = Actual value
-
Ŷᵢ = Predicted value
-
|...| = Absolute value
Characteristics:
| Property | Description |
|---|---|
| Range | 0 to ∞ (lower is better) |
| Unit | Same as target variable |
| Sensitivity | Robust to outliers |
| Interpretation | Average prediction error in original units |
Example Calculation:
| Actual (Y) | Predicted (Ŷ) | \|Error\| |
|---|---|---|
| 10 | 12 | 2 |
| 15 | 14 | 1 |
| 20 | 18 | 2 |
| 25 | 26 | 1 |
| Sum | | 6 |
$MAE = \frac{6}{4} = 1.5$
Comparison: MAE vs MSE
| Aspect | MAE | MSE |
|---|---|---|
| Formula | Average of \|errors\| | Average of errors² |
| Outlier Sensitivity | Less sensitive | More sensitive |
| Differentiability | Not differentiable at 0 | Fully differentiable |
| Unit | Original unit | Squared unit |
| Interpretation | More intuitive | Mathematical convenience |
When to Use MAE:
-
When outliers should not dominate
-
When interpretability is important
-
When all errors are equally important
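MAE on the same example, plus a quick illustration of its robustness: appending a single large outlier (a hypothetical fifth point) raises MAE linearly, whereas MSE would grow with the square of the error.

```python
def mae(actual, predicted):
    # Mean Absolute Error: average magnitude of residuals
    return sum(abs(y - yhat) for y, yhat in zip(actual, predicted)) / len(actual)

actual = [10, 15, 20, 25]
predicted = [12, 14, 18, 26]
print(mae(actual, predicted))  # 1.5 (= 6 / 4, from the table above)

# One large outlier (hypothetical extra point) inflates the average only linearly:
actual2, predicted2 = actual + [100], predicted + [50]
print(mae(actual2, predicted2))  # 11.2
```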
Q13. 📌 Discuss Multiple linear regression briefly.
[ Jun 2022 Q4(b)(ii) | Frequency: 1 ]
Multiple Linear Regression extends simple linear regression to include multiple independent variables for predicting the dependent variable.
Equation:
Y = β₀ + β₁X₁ + β₂X₂ + ... + βₙXₙ + ε
Matrix Form:
Y = Xβ + ε
Example: House Price Prediction
Price = β₀ + β₁(Area) + β₂(Bedrooms) + β₃(Age of house)
Key Components:
| Component | Description |
|---|---|
| β₀ | Intercept (base value) |
| β₁, β₂...βₙ | Coefficients for each feature |
| X₁, X₂...Xₙ | Independent variables |
| Y | Dependent variable |
Assumptions:
-
Linear relationship between predictors and response
-
No multicollinearity among predictors
-
Homoscedasticity of residuals
-
Independence of observations
-
Normal distribution of errors
Coefficient Estimation (OLS):
β = (XᵀX)⁻¹ XᵀY
Applications:
-
Sales forecasting
-
Economic modeling
-
Medical research
-
Marketing analysis
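A minimal numpy sketch of multiple linear regression via the OLS normal equations; the house-price data below (area, bedrooms) is hypothetical, not from the text:

```python
import numpy as np

# Hypothetical data: 4 houses, 2 features (area in m², bedrooms).
X = np.array([[1.0,  50, 1],   # leading column of 1s absorbs the intercept β₀
              [1.0,  80, 2],
              [1.0, 120, 3],
              [1.0, 200, 4]])
y = np.array([30.0, 45.0, 70.0, 110.0])   # prices (hypothetical)

# OLS solution of β = (XᵀX)⁻¹Xᵀy; lstsq is numerically safer than an explicit inverse.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta                           # fitted prices
```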
Q14. 🔥 Differentiate between linear regression and polynomial regression techniques.
[ Jun 2025 Q1(c), Dec 2022 Q1(h) | Frequency: 2 ]
Comparison Table:
| Aspect | Linear Regression | Polynomial Regression |
|---|---|---|
| Equation | Y = β₀ + β₁X | Y = β₀ + β₁X + β₂X² + ... + βₙXⁿ |
| Relationship | Fits straight line | Fits curved line |
| Complexity | Simple model | More complex |
| Flexibility | Limited to linear patterns | Captures non-linear patterns |
| Parameters | 2 (slope, intercept) | n+1 (degree n polynomial) |
| Overfitting Risk | Low | High (with high degree) |
| Interpretability | Highly interpretable | Less interpretable |
| Computation | Fast | Slower with higher degrees |
| Use Case | Linear relationships | Curved relationships |
Example:
| Data Pattern | Best Choice |
|---|---|
| House price vs area (linear trend) | Linear Regression |
| Growth curve (exponential pattern) | Polynomial Regression |
| Seasonal sales data | Polynomial Regression |
| Simple proportional relationship | Linear Regression |
Key Insight: Polynomial regression is essentially linear regression with polynomial features (X, X², X³...).
Q15. 🔥 Explain Polynomial Regression with example.
[ Jun 2024 Q4(c)(iii), Jun 2022 Q4(b)(iv) | Frequency: 2 ]
Polynomial Regression fits a polynomial equation to the data, allowing modeling of non-linear relationships between variables.
General Equation (Degree n):
Y = β₀ + β₁X + β₂X² + ... + βₙXⁿ
Example: Quadratic Polynomial (Degree 2)
Y = β₀ + β₁X + β₂X²
Practical Example: Predicting Fuel Efficiency
| Speed (X) | Fuel Efficiency (Y) |
|---|---|
| 20 | 25 |
| 40 | 35 |
| 60 | 38 |
| 80 | 32 |
| 100 | 22 |
Note: Efficiency increases then decreases (parabolic pattern)
Solution using Degree 2 Polynomial:
Step 1: Create Feature Matrix
| X | X² | Y |
|---|---|---|
| 20 | 400 | 25 |
| 40 | 1600 | 35 |
| 60 | 3600 | 38 |
| 80 | 6400 | 32 |
| 100 | 10000 | 22 |
Step 2: Apply Linear Regression on [X, X²]
Using least squares: β₀ = 8.6, β₁ = 1.005, β₂ = -0.00875
Step 3: Final Equation
Y = 8.6 + 1.005X - 0.00875X²
Prediction for Speed = 70:
Y = 8.6 + 1.005(70) - 0.00875(70²) = 8.6 + 70.35 - 42.875 ≈ 36.1
Choosing Polynomial Degree:
-
Degree 1 - Linear (straight line)
-
Degree 2 - Quadratic (parabola)
-
Degree 3 - Cubic (S-curve)
-
Higher degrees - Risk of overfitting
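The degree-2 fit on the fuel-efficiency data can be reproduced with numpy's polyfit (which returns coefficients from the highest degree down); for this data the exact least-squares coefficients come out to β₀ = 8.6, β₁ = 1.005, β₂ = -0.00875:

```python
import numpy as np

speed = np.array([20, 40, 60, 80, 100], dtype=float)
mpg   = np.array([25, 35, 38, 32, 22], dtype=float)

# Fit Y = β₀ + β₁X + β₂X² by least squares (degree-2 polynomial).
b2, b1, b0 = np.polyfit(speed, mpg, deg=2)   # highest-degree coefficient first

def predict(x):
    return b0 + b1 * x + b2 * x ** 2

print(predict(70))   # ≈ 36.1 mpg at 70 km/h
```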
Q16. 🔥 Explain Support Vector Regression with example. Draw diagram. Give applications.
[ Jun 2024 Q4(c)(ii), Dec 2022 Q4(b) | Frequency: 2 ]
Support Vector Regression (SVR) applies Support Vector Machine concepts to regression problems by finding a function that deviates from actual targets by at most ε (epsilon) while being as flat as possible.
Key Concepts:
| Term | Description |
|---|---|
| ε-tube | Margin of tolerance around regression line |
| Support Vectors | Data points on or outside the tube boundary |
| Slack Variables | Allow some points outside ε-tube |
| Kernel | Transform data to higher dimensions |
Mathematical Formulation:
Objective: Minimize
(1/2)‖w‖² + C Σ (ξᵢ + ξᵢ*)
Subject to:
Yᵢ - (w·xᵢ + b) ≤ ε + ξᵢ
(w·xᵢ + b) - Yᵢ ≤ ε + ξᵢ*
ξᵢ, ξᵢ* ≥ 0
Where:
-
w = Weight vector
-
C = Regularization parameter
-
ξ, ξ* = Slack variables
-
ε = Epsilon (tube width)
Diagram:
SVR Visualization:
Y
| * (outlier, penalized)
| -------- Upper bound (f(x) + ε)
| * • *
| ========= Regression line f(x)
| • * •
| -------- Lower bound (f(x) - ε)
| * (outlier, penalized)
+-------------------------> X
• = Support Vectors (on boundary)
* = Data points
Example: Temperature Prediction
| Time (hrs) | Temperature (°C) |
|---|---|
| 6 | 15 |
| 9 | 22 |
| 12 | 28 |
| 15 | 30 |
| 18 | 25 |
SVR Parameters:
-
ε = 2 (tolerance of ±2°C)
-
C = 1 (regularization)
-
Kernel = RBF (for non-linear pattern)
Result: SVR finds a regression function where most predictions fall within ±2°C of actual values.
Kernel Types for SVR:
| Kernel | Formula | Use Case |
|---|---|---|
| Linear | K(x,y) = x·y | Linear relationships |
| Polynomial | K(x,y) = (x·y + c)^d | Polynomial patterns |
| RBF | K(x,y) = exp(-γ‖x-y‖²) | Complex non-linear |
| Sigmoid | K(x,y) = tanh(αx·y + c) | Neural network-like |
Applications of SVR:
| Domain | Application |
|---|---|
| Finance | Stock price prediction, Risk assessment |
| Weather | Temperature forecasting, Rainfall prediction |
| Healthcare | Drug response prediction, Disease progression |
| Engineering | Load forecasting, Quality prediction |
| Energy | Power consumption prediction, Solar output |
| Retail | Sales forecasting, Demand prediction |
Advantages:
-
Effective in high-dimensional spaces
-
Robust to outliers (ε-insensitive)
-
Kernel trick for non-linear patterns
-
Good generalization
Disadvantages:
-
Computationally intensive for large datasets
-
Sensitive to hyperparameter selection
-
Less interpretable than linear regression
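The ε-insensitive loss at the heart of SVR can be sketched directly; a minimal numpy version, applied to the temperature example with hypothetical model predictions:

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, eps=2.0):
    """SVR's ε-insensitive loss: errors within ±eps cost nothing;
    errors outside the tube cost only the amount by which they exceed it."""
    excess = np.abs(y_true - y_pred) - eps
    return np.maximum(0.0, excess)

y_true = np.array([15, 22, 28, 30, 25], dtype=float)   # temperatures from the example
y_pred = np.array([16, 21, 31, 29, 20], dtype=float)   # hypothetical SVR outputs

print(eps_insensitive_loss(y_true, y_pred))  # → [0. 0. 1. 0. 3.]
```

Only the two points that leave the ±2°C tube are penalized; those are the points that become support vectors.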
Unit 11 Summary
| Topic | Key Points |
|---|---|
| Regression Concept | Supervised learning for continuous prediction |
| Linear Regression | Y = β₀ + β₁X, Least squares method |
| Multiple Regression | Multiple features, matrix notation |
| Polynomial Regression | Non-linear using polynomial features |
| SVR | ε-tube, support vectors, kernel functions |
| Evaluation Metrics | MSE, MAE, RMSE, R² |
Important Formulas:
| Metric | Formula |
|---|---|
| MSE | (1/n)Σ(Y - Ŷ)² |
| MAE | (1/n)Σ\|Y - Ŷ\| |
| RMSE | √MSE |
| R² | 1 - (SSE/SST) |
| Slope | Σ(X-X̄)(Y-Ȳ) / Σ(X-X̄)² |
| Intercept | Ȳ - β₁X̄ |
UNIT 12: Neural Networks and Deep Learning
Q1. ⭐ What is a Neural Network?
[ Jun 2023 Q1(h) | Frequency: 1 ]
Neural Network is a computational model inspired by the structure and function of biological neurons in the human brain. It consists of interconnected nodes (neurons) organized in layers that process information to learn patterns from data.
Definition: A neural network is a machine learning algorithm that uses a network of artificial neurons to transform inputs into meaningful outputs through learned weights and activation functions.
Key Components:
| Component | Description |
|---|---|
| Input Layer | Receives raw data/features |
| Hidden Layer(s) | Processes information through weighted connections |
| Output Layer | Produces final prediction/classification |
| Weights | Learnable parameters connecting neurons |
| Bias | Additional parameter for flexibility |
| Activation Function | Introduces non-linearity |
Working Process:
-
Input data fed to input layer
-
Weighted sum computed at each neuron
-
Activation function applied
-
Output propagated to next layer
-
Final output generated
Applications:
-
Image recognition
-
Natural Language Processing
-
Speech recognition
-
Medical diagnosis
-
Autonomous vehicles
Q2. ⭐ How biological neuron relates to Artificial Neuron?
[ Jun 2023 Q1(h) | Frequency: 1 ]
The Artificial Neuron is designed to mimic the structure and function of a Biological Neuron in the human brain.
Biological Neuron Working:
-
Dendrites receive signals from other neurons
-
Signals are processed in the Cell Body (Soma)
-
If signal exceeds threshold, neuron fires
-
Signal travels through Axon to other neurons
-
Synapses transmit signals to next neurons
Artificial Neuron Working:
-
Inputs (x₁, x₂, ...) receive data values
-
Each input multiplied by weight (w)
-
Weighted sum calculated: Σ(wᵢ × xᵢ) + bias
-
Activation function determines output
-
Output transmitted to next layer
Mathematical Model of Artificial Neuron:
y = f( Σ (wᵢ × xᵢ) + b )
Where:
-
xᵢ = Input signals
-
wᵢ = Weights (synaptic strength)
-
b = Bias
-
f = Activation function
-
y = Output
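A single artificial neuron's forward pass as a minimal numpy sketch (the inputs, weights, and bias below are hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 0.2, 0.8])      # inputs xᵢ      (hypothetical)
w = np.array([0.4, -0.6, 0.9])     # weights wᵢ     (hypothetical)
b = 0.1                            # bias b

z = np.dot(w, x) + b               # weighted sum: Σ(wᵢ × xᵢ) + b
y = sigmoid(z)                     # activation function output
```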
Key Similarities:
| Aspect | Biological | Artificial |
|---|---|---|
| Signal Reception | Dendrites | Input connections |
| Processing | Cell body | Summation + activation |
| Signal Transmission | Axon | Output |
| Learning | Synaptic plasticity | Weight adjustment |
| Threshold | Action potential | Activation function |
Q3. 📌 Illustrate biological neuron with suitable diagram.
[ Jun 2023 Q1(h) | Frequency: 1 ]
Biological Neuron Structure:
A biological neuron is the fundamental unit of the nervous system that processes and transmits information through electrical and chemical signals.
Components of Biological Neuron:
| Component | Function |
|---|---|
| Dendrites | Tree-like structures that receive input signals from other neurons |
| Cell Body (Soma) | Contains nucleus; integrates incoming signals |
| Axon Hillock | Decision point where action potential is initiated |
| Axon | Long fiber that carries electrical impulses away from cell body |
| Myelin Sheath | Insulating layer that speeds up signal transmission |
| Axon Terminals | End points that form synapses with other neurons |
| Synapse | Junction where neurotransmitters are released |
Signal Transmission Process:
Dendrites → Cell Body → Axon Hillock → Axon → Synaptic Terminals → Next Neuron
(Input) (Processing) (Threshold) (Transmission) (Output)
Q4. 🔥 Create a table to map the components of Biological Neuron with Artificial Neuron.
[ Jun 2023 Q1(h) | Frequency: 1 ]
Mapping Table: Biological vs Artificial Neuron
| Biological Neuron | Artificial Neuron | Function |
|---|---|---|
| Dendrites | Input connections (xᵢ) | Receive incoming signals/data |
| Synapses | Weights (wᵢ) | Determine strength of connections |
| Cell Body (Soma) | Summation function (Σ) | Aggregate all weighted inputs |
| Axon Hillock | Activation function | Decide whether to fire/activate |
| Threshold potential | Bias (b) | Minimum activation requirement |
| Axon | Output connection | Transmit processed signal |
| Neurotransmitters | Output signal (y) | Information passed to next layer |
| Synaptic plasticity | Learning algorithm | Adjust connection strengths |
| Neural network | Artificial Neural Network | Interconnected processing units |
Detailed Functional Mapping:
| Aspect | Biological | Artificial | Mathematical |
|---|---|---|---|
| Input | Dendrite signals | x₁, x₂, ... xₙ | Input vector X |
| Connection Strength | Synaptic efficacy | w₁, w₂, ... wₙ | Weight vector W |
| Integration | Spatial-temporal summation | Σ(wᵢxᵢ) + b | Weighted sum |
| Threshold | -55mV to -70mV | Activation threshold | Bias term |
| Activation | All-or-none firing | Sigmoid, ReLU, etc. | f(z) |
| Learning | Hebbian learning | Backpropagation | Δw = η × error |
Q5. 🔥 Explain Convolutional Neural Networks with suitable example.
[ Dec 2023 Q5(e), Jun 2023 Q5(d) | Frequency: 2 ]
Convolutional Neural Network (CNN) is a specialized deep learning architecture designed primarily for processing structured grid data like images. It uses convolution operations to automatically learn spatial hierarchies of features.
Key Layers in CNN:
| Layer Type | Function |
|---|---|
| Convolutional Layer | Applies filters to extract features |
| Activation (ReLU) | Introduces non-linearity |
| Pooling Layer | Reduces spatial dimensions |
| Fully Connected | Makes final predictions |
| Softmax/Output | Produces class probabilities |
Convolution Operation:
FeatureMap(i, j) = Σₘ Σₙ Input(i+m, j+n) × Filter(m, n)
Example: Digit Recognition (MNIST)
Problem: Classify handwritten digits (0-9)
CNN Architecture (typical for MNIST):
Input (28×28) → Conv + ReLU → MaxPool → Conv + ReLU → MaxPool → Flatten → Fully Connected → Softmax (10 classes)
Key Concepts:
| Concept | Description |
|---|---|
| Stride | Step size for filter movement |
| Padding | Adding zeros around input borders |
| Feature Maps | Output of convolution operation |
| Parameter Sharing | Same filter applied across image |
| Local Connectivity | Neurons connect only to local regions |
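The convolution operation can be sketched as a naive sliding-window sum (technically a "valid" cross-correlation, which is what most deep-learning libraries compute; the input and filter values are illustrative):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image; each output cell is the
    elementwise-product sum of the kernel with the patch under it."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

img = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]], dtype=float)
edge = np.array([[1, -1],
                 [1, -1]], dtype=float)   # simple vertical-edge filter
print(conv2d_valid(img, edge))            # every position responds equally here
```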
Applications:
-
Image classification
-
Object detection
-
Face recognition
-
Medical image analysis
-
Self-driving cars
Q6. ⭐ Explain Recurrent Neural Networks.
[ Dec 2022 Q5(e) | Frequency: 1 ]
Recurrent Neural Network (RNN) is a type of neural network designed for sequential data processing. Unlike feedforward networks, RNNs have connections that form directed cycles, allowing them to maintain memory of previous inputs.
Key Characteristics:
| Feature | Description |
|---|---|
| Sequential Processing | Processes data one element at a time |
| Hidden State | Maintains memory across time steps |
| Weight Sharing | Same weights used across all time steps |
| Variable Length | Can handle sequences of different lengths |
Mathematical Formulation:
Hidden State Update:
hₜ = tanh(Wₕₕ hₜ₋₁ + Wₓₕ xₜ + bₕ)
Output Calculation:
yₜ = Wₕᵧ hₜ + bᵧ
Where:
-
hₜ = Hidden state at time t
-
xₜ = Input at time t
-
yₜ = Output at time t
-
W = Weight matrices
-
b = Bias vectors
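The hidden-state update can be sketched as a plain numpy loop over a sequence; the dimensions and random weights below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, T = 3, 4, 5

# The SAME weights are reused at every time step (weight sharing).
W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b_h  = np.zeros(hidden_dim)

xs = rng.normal(size=(T, input_dim))   # input sequence x₁..x_T
h  = np.zeros(hidden_dim)              # initial hidden state h₀

for x_t in xs:
    # hₜ = tanh(W_hh·hₜ₋₁ + W_xh·xₜ + b_h): memory of earlier inputs flows through h
    h = np.tanh(W_hh @ h + W_xh @ x_t + b_h)
```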
Types of RNN Architectures:
| Type | Input-Output | Example Use |
|---|---|---|
| One-to-One | Single → Single | Standard classification |
| One-to-Many | Single → Sequence | Image captioning |
| Many-to-One | Sequence → Single | Sentiment analysis |
| Many-to-Many | Sequence → Sequence | Machine translation |
Variants to Address Limitations:
-
LSTM (Long Short-Term Memory) - Handles long-term dependencies
-
GRU (Gated Recurrent Unit) - Simplified version of LSTM
Applications:
-
Speech recognition
-
Language modeling
-
Machine translation
-
Text generation
-
Time series prediction
Limitations:
-
Vanishing/Exploding gradient problem
-
Difficulty with long sequences
-
Sequential computation (slow training)
Q7. 📌 Explain Recursive Neural Networks.
[ Jun 2022 Q5(e) | Frequency: 1 ]
Recursive Neural Network is a type of neural network that applies the same set of weights recursively over a structured input (typically tree-structured data) to produce a fixed-size output.
Key Difference from RNN:
-
RNN: Processes sequential (linear) data
-
Recursive NN: Processes hierarchical (tree) structures
Mathematical Formulation:
For combining child nodes c₁ and c₂:
p = f(W[c₁; c₂])
Where:
-
p = Parent node representation
-
W = Shared weight matrix
-
[c₁; c₂] = Concatenation of child representations
-
f = Non-linear activation function
Example: Sentence Parsing
Sentence: "The cat sat on mat"
Tree Structure:
              S (Sentence)
             /            \
           NP              VP
          /  \            /  \
       The    cat      sat    PP
                             /  \
                           on    NP
                                 |
                                mat
Each node combines its children using the same neural network.
Applications:
| Domain | Application |
|---|---|
| NLP | Sentence parsing, Sentiment analysis |
| Computer Vision | Scene parsing, Image segmentation |
| Code Analysis | Abstract syntax tree processing |
| Knowledge Graphs | Hierarchical reasoning |
Advantages:
-
Captures hierarchical structure
-
Variable-size input handling
-
Parameter sharing reduces complexity
Limitations:
-
Requires pre-defined tree structure
-
Cannot model arbitrary graphs
-
Vanishing gradient in deep trees
Q8. 🔥 Explain Restricted Boltzmann Machines with suitable example.
[ Dec 2023 Q5(d), Dec 2022 Q5(f) | Frequency: 2 ]
Restricted Boltzmann Machine (RBM) is a generative stochastic neural network that can learn a probability distribution over its inputs. It consists of visible and hidden layers with no intra-layer connections.
Architecture:
| Component | Description |
|---|---|
| Visible Layer (v) | Represents input data |
| Hidden Layer (h) | Learns features/patterns |
| Weights (W) | Connections between layers |
| Biases (a, b) | For visible and hidden units |
"Restricted" Meaning:
-
No connections within the same layer
-
Only connections between visible and hidden layers
-
Makes computation tractable
Energy Function:
E(v, h) = - Σᵢ aᵢvᵢ - Σⱼ bⱼhⱼ - Σᵢ Σⱼ vᵢ Wᵢⱼ hⱼ
Probability Distribution:
P(v, h) = e^(-E(v,h)) / Z
Where Z is the partition function (normalization constant).
Training: Contrastive Divergence (CD)
Step 1: Positive Phase - Sample h given v (data)
Step 2: Negative Phase - Reconstruct v' from h, then h' from v'
Step 3: Update weights: ΔW = η(vh^T - v'h'^T)
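A minimal numpy sketch of one CD-1 weight update following the three steps above (tiny hypothetical sizes; bias terms omitted for brevity):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n_visible, n_hidden, lr = 4, 2, 0.1

W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
v = np.array([1.0, 0.0, 1.0, 1.0])        # one binary training vector

# Positive phase: sample hidden units given the data
p_h = sigmoid(v @ W)
h   = (rng.random(n_hidden) < p_h).astype(float)

# Negative phase: reconstruct visible units, then hidden probabilities again
p_v  = sigmoid(W @ h)                     # v' (reconstruction probabilities)
p_h2 = sigmoid(p_v @ W)                   # h' given the reconstruction

# Update: ΔW = η (v·hᵀ − v'·h'ᵀ)
W += lr * (np.outer(v, p_h) - np.outer(p_v, p_h2))
```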
Example: Movie Recommendation
Problem: Learn user preferences from movie ratings
| User | Movie1 | Movie2 | Movie3 | Movie4 |
|---|---|---|---|---|
| Visible Layer | 5 | 3 | 0 | 4 |
(0 = not watched)
RBM Learning:
-
Visible units: Movie ratings
-
Hidden units: Learn latent features (genre preferences)
-
After training: Can predict ratings for unwatched movies
Applications:
-
Recommendation systems
-
Dimensionality reduction
-
Feature learning
-
Collaborative filtering
-
Pre-training deep networks
Q9. 🔥 Explain Generative Adversarial Networks with suitable example.
[ Jun 2024 Q5(d), Jun 2022 Q5(f) | Frequency: 2 ]
Generative Adversarial Network (GAN) consists of two neural networks—Generator and Discriminator—that compete against each other in a game-theoretic framework to generate realistic synthetic data.
Architecture:
| Component | Role | Goal |
|---|---|---|
| Generator (G) | Creates fake data | Fool the discriminator |
| Discriminator (D) | Classifies real vs fake | Correctly identify fakes |
Training Process:
Minimax Game:
min_G max_D V(D,G) = E[log D(x)] + E[log(1 - D(G(z)))]
Step-by-Step Training:
| Step | Action |
|---|---|
| 1 | Sample random noise z |
| 2 | Generator creates fake samples G(z) |
| 3 | Discriminator evaluates real and fake samples |
| 4 | Calculate loss for D (maximize correct classification) |
| 5 | Calculate loss for G (minimize D's success on fakes) |
| 6 | Update both networks via backpropagation |
| 7 | Repeat until equilibrium |
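The minimax value V(D, G) can be evaluated directly for illustrative discriminator outputs (a sketch; the probabilities below are hypothetical):

```python
import numpy as np

# Hypothetical discriminator outputs on a small batch.
D_real = np.array([0.9, 0.8, 0.95])   # D(x) on real samples: D wants these near 1
D_fake = np.array([0.2, 0.1, 0.3])    # D(G(z)) on fakes:     D wants these near 0

# V(D,G) = E[log D(x)] + E[log(1 - D(G(z)))]
V = np.mean(np.log(D_real)) + np.mean(np.log(1.0 - D_fake))

# The discriminator updates to increase V; the generator updates to decrease it
# (in practice G often maximizes log D(G(z)) instead, for stronger gradients).
```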
Example: Face Generation
Problem: Generate realistic human faces
Setup:
-
Real Data: CelebA dataset (celebrity faces)
-
Generator Input: 100-dimensional random noise
-
Generator Output: 64×64×3 RGB image
-
Discriminator Output: Probability (real/fake)
Training Progression:
Epoch 1: [Noise] → Generator → [Blurry blob]
Epoch 100: [Noise] → Generator → [Face-like shape]
Epoch 1000:[Noise] → Generator → [Realistic face]
GAN Variants:
| Variant | Description |
|---|---|
| DCGAN | Uses convolutional layers |
| CGAN | Conditional generation |
| StyleGAN | Controls style attributes |
| CycleGAN | Image-to-image translation |
| WGAN | Wasserstein distance for stability |
Applications:
-
Image synthesis (faces, art)
-
Image super-resolution
-
Data augmentation
-
Video generation
-
Drug discovery
-
Text-to-image generation
Q10. 🔥 Explain Auto Encoders with suitable example.
[ Jun 2024 Q5(e), Jun 2023 Q5(e) | Frequency: 2 ]
Autoencoder is an unsupervised neural network that learns to compress data into a lower-dimensional representation (encoding) and then reconstruct the original data from this representation (decoding).
Architecture:
| Component | Function |
|---|---|
| Encoder | Compresses input to latent space |
| Latent Space (z) | Compressed representation (bottleneck) |
| Decoder | Reconstructs input from latent space |
Mathematical Formulation:
Encoder: $z = f_{encoder}(x)$
Decoder: $\hat{x} = f_{decoder}(z)$
Loss Function (Reconstruction Loss): $L(x, \hat{x}) = \|x - \hat{x}\|^2$
Example: Image Denoising
Problem: Remove noise from MNIST digit images
Architecture:
Input (28×28 noisy image)
↓
Encoder: 784 → 256 → 128 → 32
↓
Latent Space: 32 dimensions
↓
Decoder: 32 → 128 → 256 → 784
↓
Output (28×28 clean image)
Training:
-
Input: Noisy images (original + Gaussian noise)
-
Target: Clean original images
-
Loss: MSE between output and clean image
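The data flow through the encoder-decoder shapes listed above can be sketched with random (untrained) weights, just to show the dimensional bottleneck:

```python
import numpy as np

rng = np.random.default_rng(2)
dims = [784, 256, 128, 32, 128, 256, 784]   # encoder down to z=32, decoder back up

def relu(z):
    return np.maximum(0.0, z)

x = rng.random(784)                          # a flattened 28×28 noisy image
a = x
for d_in, d_out in zip(dims[:-1], dims[1:]):
    W = rng.normal(scale=0.05, size=(d_out, d_in))
    a = relu(W @ a)                          # each layer compresses or expands

x_hat = a                                    # reconstruction (784 values)
loss = np.mean((x - x_hat) ** 2)             # reconstruction MSE, the training target
```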
Variants of Autoencoders:
| Type | Description | Use Case |
|---|---|---|
| Vanilla AE | Basic encoder-decoder | Dimensionality reduction |
| Denoising AE | Trained on noisy data | Image denoising |
| Sparse AE | Sparse latent representation | Feature learning |
| Variational AE | Probabilistic latent space | Generative modeling |
| Convolutional AE | Uses CNN layers | Image compression |
Applications:
-
Dimensionality reduction
-
Anomaly detection
-
Image compression
-
Data denoising
-
Feature extraction
-
Generative models (VAE)
Q11. ⭐ Explain Transformers.
[ Jun 2023 Q5(f) | Frequency: 1 ]
Transformer is a deep learning architecture that uses self-attention mechanisms to process sequential data in parallel, eliminating the need for recurrence. Introduced in "Attention is All You Need" (2017).
Key Innovation: Self-Attention
Allows each element to attend to all other elements in the sequence simultaneously.
Architecture Components:
| Component | Function |
|---|---|
| Multi-Head Attention | Attends to different positions |
| Feed-Forward Network | Processes attention output |
| Layer Normalization | Stabilizes training |
| Positional Encoding | Adds position information |
Self-Attention Mechanism:
Attention(Q, K, V) = softmax(QKᵀ / √dₖ) V
Where:
-
Q = Query matrix
-
K = Key matrix
-
V = Value matrix
-
dₖ = Dimension of keys
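Scaled dot-product attention, softmax(QKᵀ/√dₖ)V, as a minimal numpy sketch (the matrix sizes are hypothetical):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(QKᵀ/√dₖ)V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # similarity of every query to every key
    weights = softmax(scores)          # each row is a distribution over positions
    return weights @ V, weights

rng = np.random.default_rng(3)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))   # 4 positions, dₖ = 8
out, w = attention(Q, K, V)
```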
Multi-Head Attention:
MultiHead(Q, K, V) = Concat(head₁, ..., headₕ) Wᴼ
Each head learns different attention patterns.
Complete Transformer (Encoder-Decoder): The encoder stack builds contextual representations of the input sequence; the decoder stack attends both to its own previous outputs and to the encoder's representations while generating the output sequence.
Applications:
-
BERT: Bidirectional understanding (NLU)
-
GPT: Text generation (NLG)
-
T5: Text-to-text tasks
-
Vision Transformer (ViT): Image classification
-
ChatGPT: Conversational AI
Q12. 📌 Explain Non-Monotonic Reasoning Systems.
[ Jun 2023 Q5(c) | Frequency: 1 ]
Non-Monotonic Reasoning is a form of logical reasoning where conclusions can be retracted when new information is added. Unlike classical (monotonic) logic, adding facts can invalidate previous conclusions.
Comparison:
| Monotonic Logic | Non-Monotonic Logic |
|---|---|
| New facts only add conclusions | New facts may retract conclusions |
| Conclusions never withdrawn | Conclusions can be revised |
| Complete information assumed | Handles incomplete information |
| Traditional deductive logic | Default reasoning |
Example:
Monotonic:
Fact: Tweety is a bird
Rule: Birds can fly
Conclusion: Tweety can fly ✓
Add: Tweety is a penguin
Conclusion: Tweety can fly ✓ (unchanged)
Non-Monotonic:
Fact: Tweety is a bird
Default: Birds typically fly
Conclusion: Tweety can fly ✓
Add: Tweety is a penguin
Conclusion: Tweety cannot fly ✗ (revised)
Types of Non-Monotonic Reasoning:
| Type | Description |
|---|---|
| Default Logic | Rules with exceptions |
| Circumscription | Minimize abnormalities |
| Autoepistemic Logic | Based on agent's beliefs |
| Abductive Reasoning | Best explanation inference |
Default Logic Notation:
A : B / C
"If A is provable and B is consistent, then conclude C"
Example Default Rule:
Bird(x) : Flies(x) / Flies(x)
"If x is a bird, and it's consistent that x can fly, conclude x can fly"
Applications:
-
Expert systems
-
Knowledge representation
-
Common-sense reasoning
-
Diagnosis systems
-
Planning under uncertainty
Q13. 📌 Explain Closed World Assumption.
[ Jun 2022 Q5(d) | Frequency: 1 ]
Closed World Assumption (CWA) is a reasoning principle stating that if a fact cannot be proven true from the knowledge base, it is assumed to be false.
Definition:
"What is not known to be true is assumed to be false."
Comparison with Open World Assumption:
| Closed World (CWA) | Open World (OWA) |
|---|---|
| Unknown = False | Unknown = Unknown |
| Complete knowledge assumed | Incomplete knowledge allowed |
| Used in databases | Used in Semantic Web |
| Definite conclusions | Uncertain conclusions |
Example:
Knowledge Base:
enrolled(John, CS101)
enrolled(John, MATH201)
enrolled(Mary, CS101)
Query: Is John enrolled in PHYSICS301?
| CWA Answer | OWA Answer |
|---|---|
| No (not stated, so false) | Unknown (might be true) |
Formal Definition:
For a knowledge base KB and ground atom A:
-
CWA: If KB ⊭ A, then infer ¬A
-
OWA: If KB ⊭ A, A's truth value is unknown
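The enrollment example can be sketched as a small Python knowledge base queried under both assumptions:

```python
# The enrollment facts from the example, as a set of ground atoms.
KB = {("John", "CS101"), ("John", "MATH201"), ("Mary", "CS101")}

def cwa_enrolled(student, course):
    """Closed World: anything not provable from KB is taken to be false."""
    return (student, course) in KB

def owa_enrolled(student, course):
    """Open World: facts absent from KB are simply unknown (None)."""
    return True if (student, course) in KB else None

print(cwa_enrolled("John", "PHYSICS301"))   # False
print(owa_enrolled("John", "PHYSICS301"))   # None
```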
Applications of CWA:
-
Relational databases (SQL)
-
Logic programming (Prolog)
-
Expert systems
-
Planning systems
Limitations:
-
Cannot handle incomplete information properly
-
May lead to incorrect conclusions
-
Requires complete knowledge assumption
Relation to Non-Monotonic Reasoning: CWA is a form of non-monotonic reasoning because adding new facts can change conclusions from false to true.
Unit 12 Summary
| Topic | Key Points |
|---|---|
| Neural Network | Interconnected neurons, layers, weights, activation |
| Biological vs Artificial | Dendrites→Inputs, Synapse→Weights, Soma→Summation |
| CNN | Convolution, pooling, feature extraction for images |
| RNN | Sequential processing, hidden state memory |
| Recursive NN | Tree-structured data processing |
| RBM | Visible-hidden layers, energy-based model |
| GAN | Generator vs Discriminator adversarial training |
| Autoencoder | Encoder-Decoder, latent space compression |
| Transformer | Self-attention, parallel processing, BERT/GPT |
| Non-Monotonic | Conclusions can be retracted |
| CWA | Unknown facts assumed false |
BLOCK - 4. Machine Learning - II
UNIT 13: Feature Selection and Extraction
Q1. 🔥 Explain the term 'Dimensionality Reduction'.
[ Dec 2023 Q1(f), Jun 2022 Q1(h) | Frequency: 2 ]
Dimensionality Reduction is a technique used in machine learning and data analysis to reduce the number of input variables (features) in a dataset while preserving as much meaningful information as possible.
Definition: The process of transforming data from a high-dimensional space to a lower-dimensional space such that the reduced representation retains the essential characteristics of the original data.
Key Concepts:
| Aspect | Description |
|---|---|
| High Dimensionality | Dataset with many features (columns) |
| Curse of Dimensionality | Problems arising when data has too many dimensions |
| Information Preservation | Retaining variance and patterns in reduced form |
| Computational Efficiency | Faster processing with fewer dimensions |
Why Dimensionality Reduction is Needed:
-
Curse of Dimensionality: As dimensions increase, data becomes sparse
-
Overfitting Prevention: Fewer features reduce model complexity
-
Visualization: Data can be plotted in 2D or 3D
-
Storage Reduction: Less memory required
-
Noise Removal: Eliminates irrelevant features
Example: An image dataset with 1000×1000 pixels (1,000,000 features) can be reduced to 100 principal components, retaining 95% of the variance while dramatically reducing computational requirements.
Q2. 🔥 Name the general techniques to perform dimensionality reduction.
[ Dec 2023 Q1(f), Jun 2022 Q1(h) | Frequency: 2 ]
Dimensionality reduction techniques are broadly classified into two main categories:
1. Feature Selection Methods: Select a subset of original features without transforming them.
| Technique | Description |
|---|---|
| Filter Methods | Use statistical measures (correlation, chi-square, mutual information) |
| Wrapper Methods | Use ML model performance (forward/backward selection, RFE) |
| Embedded Methods | Feature selection built into algorithm (LASSO, Ridge, Decision Trees) |
2. Feature Extraction Methods: Transform original features into new reduced set of features.
| Technique | Type | Description |
|---|---|---|
| PCA | Linear | Projects data onto principal components |
| LDA | Linear | Maximizes class separability |
| SVD | Linear | Singular Value Decomposition |
| t-SNE | Non-linear | For visualization, preserves local structure |
| UMAP | Non-linear | Uniform Manifold Approximation |
| Autoencoders | Non-linear | Neural network-based compression |
Q3. ⭐ Give merits of dimensionality reduction.
[ Dec 2023 Q1(f) | Frequency: 1 ]
Merits of Dimensionality Reduction:
| S.No | Merit | Explanation |
|---|---|---|
| 1 | Reduced Computation Time | Fewer features mean faster training and prediction |
| 2 | Reduced Storage Space | Less memory required to store data |
| 3 | Avoids Overfitting | Simpler models generalize better to new data |
| 4 | Removes Redundancy | Eliminates correlated and duplicate features |
| 5 | Noise Reduction | Filters out irrelevant features and noise |
| 6 | Better Visualization | Enables plotting in 2D/3D for analysis |
| 7 | Improved Model Performance | Often leads to better accuracy |
| 8 | Handles Multicollinearity | Removes feature dependencies |
| 9 | Easier Interpretation | Simpler models are more explainable |
| 10 | Mitigates Curse of Dimensionality | Works better with high-dimensional data |
Q4. ⭐ Give limitation of dimensionality reduction.
[ Dec 2023 Q1(f) | Frequency: 1 ]
Limitations of Dimensionality Reduction:
| S.No | Limitation | Explanation |
|---|---|---|
| 1 | Information Loss | Some important information may be discarded |
| 2 | Interpretability Issues | Transformed features may lack physical meaning |
| 3 | Computational Overhead | Initial reduction process can be expensive |
| 4 | Parameter Sensitivity | Results depend on hyperparameter choices |
| 5 | Not Always Beneficial | May hurt performance if done incorrectly |
| 6 | Difficulty in Choosing Dimensions | Optimal number of components is not always clear |
| 7 | Assumption Dependency | Linear methods assume linear relationships |
| 8 | May Miss Non-linear Patterns | Linear methods fail with complex data structures |
| 9 | Reversibility Issues | Original data cannot be fully reconstructed |
| 10 | Domain Knowledge Required | Understanding data is crucial for proper application |
Q5. ⭐ Discuss the advantages of dimensionality reduction.
[ Dec 2024 Q1(h) | Frequency: 1 ]
Advantages of Dimensionality Reduction:
1. Computational Efficiency:
-
Faster model training and inference
-
Reduced memory consumption
-
Enables processing of large-scale datasets
2. Model Performance Improvement:
-
Reduces overfitting by simplifying models
-
Eliminates noisy and redundant features
-
Improves generalization to unseen data
3. Data Visualization:
-
High-dimensional data can be visualized in 2D/3D
-
Helps identify patterns, clusters, and outliers
-
Enables better exploratory data analysis
4. Storage Optimization:
-
Compressed representation requires less storage
-
Efficient data transmission and sharing
5. Handling Curse of Dimensionality:
-
Addresses sparsity in high-dimensional spaces
-
Improves distance-based algorithm performance
Q6. ⭐ Discuss the disadvantages of dimensionality reduction.
[ Dec 2024 Q1(h) | Frequency: 1 ]
Disadvantages of Dimensionality Reduction:
1. Information Loss:
-
Reduction inherently discards some data
-
Important subtle patterns may be lost
-
Cannot guarantee preservation of all relevant information
2. Interpretability Challenges:
-
New features may be abstract combinations
-
Difficult to explain transformed features to stakeholders
-
Original feature importance becomes unclear
3. Computational Cost of Transformation:
-
Initial computation can be expensive for large datasets
-
Some methods (t-SNE, Autoencoders) are computationally intensive
4. Method Selection Complexity:
-
Many techniques available with different assumptions
-
Wrong choice can degrade model performance
-
Requires domain expertise
5. Hyperparameter Sensitivity:
-
Number of components affects results significantly
-
Optimal parameters vary across datasets
| Disadvantage | Impact | Mitigation |
|---|---|---|
| Information Loss | Reduced accuracy | Use explained variance ratio |
| Poor Interpretability | Hard to explain | Use feature selection instead |
| Computation Cost | Slow processing | Use incremental methods |
| Method Dependency | Inconsistent results | Cross-validate techniques |
Q7. ⭐ Discuss Dimensionality reduction with suitable example.
[ Jun 2023 Q2(b)(iv) | Frequency: 1 ]
Dimensionality Reduction - Detailed Example:
Concept: Reducing the number of features in a dataset while retaining maximum information.
Example: Image Compression using PCA
Consider a facial recognition system with 64×64 pixel images:
| Stage | Dimensions | Description |
|---|---|---|
| Original | 4096 features | Each pixel is a feature |
| After PCA | 100 components | Retains 95% variance |
| Reduction | 97.5% | Massive space savings |
Step-by-Step Process:
Original Image: 64 × 64 = 4096 dimensions
↓
Step 1: Standardize data (mean = 0, std = 1)
↓
Step 2: Compute covariance matrix
↓
Step 3: Find eigenvalues and eigenvectors
↓
Step 4: Select top k eigenvectors (principal components)
↓
Step 5: Project data onto k dimensions
↓
Reduced Data: 100 dimensions (retaining 95% information)
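Steps 1-5 above map directly onto a short numpy sketch (random data stands in for the images; the eigendecomposition of the covariance matrix yields the principal components):

```python
import numpy as np

def pca_reduce(X, k):
    """Steps 1-5: standardize, covariance matrix, eigendecomposition,
    keep the top-k eigenvectors, project the data onto them."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)    # Step 1: standardize
    cov = np.cov(Xs, rowvar=False)               # Step 2: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # Step 3: eigenvalues/eigenvectors
    order = np.argsort(eigvals)[::-1]            # sort by variance explained
    components = eigvecs[:, order[:k]]           # Step 4: top-k principal components
    return Xs @ components                       # Step 5: project to k dimensions

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 10))    # 50 samples, 10 features (hypothetical data)
Z = pca_reduce(X, k=3)           # reduced to 3 dimensions
```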
Practical Benefits:
-
Storage: 4096 → 100 values per image (97.5% reduction)
-
Processing: Face recognition 40× faster
-
Quality: Faces still recognizable
Another Example: Customer Segmentation
| Feature Set | Before | After PCA |
|---|---|---|
| Demographics | 15 features | 3 components |
| Purchase History | 50 features | 5 components |
| Behavior | 30 features | 4 components |
| Total | 95 features | 12 components |
Q8. 📌 Explain Feature Selection.
[ Jun 2023 Q5(i) | Frequency: 1 ]
Feature Selection is the process of selecting a subset of relevant features from the original dataset for use in model construction.
Definition: A technique that chooses the most important original features without transforming them, reducing dimensionality while maintaining interpretability.
Types of Feature Selection Methods:
| Method | Approach | Examples |
|---|---|---|
| Filter | Statistical measures independent of model | Correlation, Chi-square, Mutual Information, Variance Threshold |
| Wrapper | Uses model performance as selection criterion | Forward Selection, Backward Elimination, RFE |
| Embedded | Built into learning algorithm | LASSO (L1), Ridge (L2), Tree-based importance |
Comparison of Methods:
| Aspect | Filter | Wrapper | Embedded |
|---|---|---|---|
| Speed | Fast | Slow | Medium |
| Accuracy | Moderate | High | High |
| Model Dependency | No | Yes | Yes |
| Overfitting Risk | Low | High | Medium |
Advantages:
-
Maintains original feature interpretation
-
Reduces training time
-
Improves model accuracy
-
Prevents overfitting
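A filter-style selector can be sketched in a few lines of Python (a minimal illustration assuming numpy; the dataset here is synthetic): each feature is scored by its absolute Pearson correlation with the target, and the top-k indices are kept.

```python
import numpy as np

def filter_select(X, y, k):
    """Filter method: rank features by |Pearson correlation| with the target."""
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                       for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k]  # indices of the top-k features

# Synthetic data: feature 0 drives the target, features 1-2 are noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 3 * X[:, 0] + 0.1 * rng.normal(size=100)
top = filter_select(X, y, k=1)
```

Because the scoring is independent of any model, this runs fast but may miss feature interactions, matching the Filter column of the comparison table.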
Q9. 📌 What is the purpose of feature extraction in machine learning?
[ Jun 2025 Q1(g) | Frequency: 1 ]
Feature Extraction is the process of transforming raw data into a reduced set of features that effectively represent the original data while reducing dimensionality.
Purpose of Feature Extraction:
| Purpose | Description |
|---|---|
| Dimensionality Reduction | Transforms high-dimensional data to lower dimensions |
| Information Compression | Captures essential patterns in fewer features |
| Noise Reduction | Filters out irrelevant variations in data |
| Improved Performance | Enhances model accuracy and efficiency |
| Pattern Discovery | Reveals hidden structures in data |
Key Differences from Feature Selection:
| Aspect | Feature Selection | Feature Extraction |
|---|---|---|
| Transformation | No | Yes |
| Original Features | Preserved | Combined/Transformed |
| Interpretability | High | Low |
| Techniques | Filter, Wrapper | PCA, LDA, Autoencoders |
Applications:
-
Image Processing: Extract edges, textures, shapes
-
Text Mining: Word embeddings, TF-IDF
-
Signal Processing: Frequency components, wavelets
-
Biometrics: Facial features, fingerprint minutiae
Q10. 🔥 Explain Principal Component Analysis with suitable example.
[ Jun 2024 Q5(a), Dec 2022 Q5(i) | Frequency: 2 ]
Principal Component Analysis (PCA) is an unsupervised linear dimensionality reduction technique that transforms data into a new coordinate system where the greatest variance lies on the first coordinate (first principal component), second greatest variance on the second coordinate, and so on.
Key Concepts:
| Term | Description |
|---|---|
| Principal Components | New orthogonal axes that capture maximum variance |
| Eigenvalues | Indicate amount of variance explained by each PC |
| Eigenvectors | Direction of principal components |
| Explained Variance Ratio | Proportion of total variance captured |
Algorithm Steps:
Step 1: Standardize the data (mean = 0, variance = 1)
Step 2: Compute covariance matrix
Step 3: Calculate eigenvalues and eigenvectors
Step 4: Sort eigenvectors by eigenvalues (descending)
Step 5: Select top k eigenvectors
Step 6: Transform data to new k-dimensional space
Diagram - PCA Geometric Interpretation:
Mathematical Formulation:
-
Covariance Matrix: C = (1/n) XᵀX
-
Eigenvalue equation: Cv = λv
-
Transformed data: Y = X × V
Example: 2D to 1D Reduction
Consider 5 data points:
| Point | X₁ | X₂ |
|---|---|---|
| A | 2.5 | 2.4 |
| B | 0.5 | 0.7 |
| C | 2.2 | 2.9 |
| D | 1.9 | 2.2 |
| E | 3.1 | 3.0 |
Step 1: Calculate Mean
- Mean(X₁) = 2.04, Mean(X₂) = 2.24
Step 2: Standardize
- Subtract mean from each value
Step 3: Covariance Matrix
C = [0.938 0.841]
    [0.841 0.853]
Step 4: Eigenvalues and Eigenvectors
-
λ₁ = 1.737, v₁ = [0.725, 0.689]
-
λ₂ = 0.054, v₂ = [-0.689, 0.725]
Step 5: Explained Variance
PC1: 1.737 / (1.737 + 0.054) = 97.0%
PC2: 0.054 / (1.737 + 0.054) = 3.0%
Result: Data reduced from 2D to 1D with 97.0% variance retained.
Applications:
-
Face Recognition (Eigenfaces)
-
Image Compression
-
Gene Expression Analysis
-
Financial Portfolio Analysis
Q11. 📌 Calculate the principal component using PCA algorithm for given two-dimensional patterns.
[ Dec 2024 Q3(b) | Frequency: 1 ]
Problem: Calculate principal components for 2D data points.
Given Data Points:
| Point | X₁ | X₂ |
|---|---|---|
| 1 | 1 | 2 |
| 2 | 2 | 3 |
| 3 | 3 | 3 |
| 4 | 4 | 5 |
| 5 | 5 | 5 |
Step 1: Calculate Mean
μ₁ = (1+2+3+4+5)/5 = 15/5 = 3
μ₂ = (2+3+3+5+5)/5 = 18/5 = 3.6
Step 2: Center the Data (Subtract Mean)
| Point | X₁ - μ₁ | X₂ - μ₂ |
|---|---|---|
| 1 | -2 | -1.6 |
| 2 | -1 | -0.6 |
| 3 | 0 | -0.6 |
| 4 | 1 | 1.4 |
| 5 | 2 | 1.4 |
Step 3: Compute Covariance Matrix
Var(X₁) = [(-2)² + (-1)² + 0² + 1² + 2²] / 4 = 10/4 = 2.5
Var(X₂) = [(-1.6)² + (-0.6)² + (-0.6)² + 1.4² + 1.4²] / 4
= [2.56 + 0.36 + 0.36 + 1.96 + 1.96] / 4 = 7.2/4 = 1.8
Cov(X₁,X₂) = [(-2)(-1.6) + (-1)(-0.6) + (0)(-0.6) + (1)(1.4) + (2)(1.4)] / 4
= [3.2 + 0.6 + 0 + 1.4 + 2.8] / 4 = 8/4 = 2
Covariance Matrix:
C = [2.5 2.0]
[2.0 1.8]
Step 4: Calculate Eigenvalues
Using characteristic equation: det(C - λI) = 0
|2.5-λ 2.0 |
|2.0 1.8-λ | = 0
(2.5-λ)(1.8-λ) - (2.0)(2.0) = 0
4.5 - 2.5λ - 1.8λ + λ² - 4 = 0
λ² - 4.3λ + 0.5 = 0
Using quadratic formula:
λ = (4.3 ± √(18.49 - 2)) / 2
λ = (4.3 ± √16.49) / 2
λ = (4.3 ± 4.06) / 2
λ₁ = 4.18 (variance along PC1)
λ₂ = 0.12 (variance along PC2)
Step 5: Calculate Eigenvectors
For λ₁ = 4.18:
(C - λ₁I)v = 0
[-1.68 2.0 ][v₁] [0]
[2.0 -2.38][v₂] = [0]
-1.68v₁ + 2.0v₂ = 0
v₂ = 0.84v₁
Normalized: v₁ = [0.766, 0.643]
For λ₂ = 0.12:
Normalized: v₂ = [-0.643, 0.766]
Step 6: Principal Components Summary
| Component | Eigenvector | Eigenvalue | Variance Explained |
|---|---|---|---|
| PC1 | [0.766, 0.643] | 4.18 | 97.2% |
| PC2 | [-0.643, 0.766] | 0.12 | 2.8% |
Step 7: Transform Data onto PC1
Projection onto PC1:
Y = X_centered × v₁
Point 1: (-2)(0.766) + (-1.6)(0.643) = -2.56
Point 2: (-1)(0.766) + (-0.6)(0.643) = -1.15
Point 3: (0)(0.766) + (-0.6)(0.643) = -0.39
Point 4: (1)(0.766) + (1.4)(0.643) = 1.67
Point 5: (2)(0.766) + (1.4)(0.643) = 2.43
Result: Data successfully reduced from 2D to 1D with 97.2% variance retained.
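The hand calculation above can be verified with a short numpy sketch (covariance uses the $n-1$ denominator, matching Step 3; eigenvector signs may differ, which does not affect the result):

```python
import numpy as np

# The five 2-D patterns from the question
X = np.array([[1, 2], [2, 3], [3, 3], [4, 5], [5, 5]], dtype=float)

Xc = X - X.mean(axis=0)                 # Step 2: center the data
C = np.cov(Xc, rowvar=False)            # Step 3: covariance matrix (ddof = 1)
eigvals, eigvecs = np.linalg.eigh(C)    # Step 4: eigen-decomposition
order = np.argsort(eigvals)[::-1]       # sort by descending eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()     # Step 6: PC1 explains ~97.2%
Y = Xc @ eigvecs[:, :1]                 # Step 7: project onto PC1
```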
Q12. 🔥 Explain Linear Discriminant Analysis with suitable example.
[ Dec 2023 Q5(a), Jun 2022 Q5(i) | Frequency: 2 ]
Linear Discriminant Analysis (LDA) is a supervised linear dimensionality reduction technique that finds a linear combination of features that best separates two or more classes.
Key Difference from PCA:
| Aspect | PCA | LDA |
|---|---|---|
| Type | Unsupervised | Supervised |
| Objective | Maximize variance | Maximize class separability |
| Uses Labels | No | Yes |
| Output | Principal components | Discriminant functions |
LDA Objective: Maximize the ratio of between-class variance to within-class variance.
Fisher's Criterion:
J(w) = (μ₁ - μ₂)² / (s₁² + s₂²)
where:
μ₁, μ₂ = class means
s₁², s₂² = class variances
Algorithm Steps:
Mathematical Formulation:
Within-Class Scatter Matrix (Sw):
Sw = Σᵢ Σₓ∈Cᵢ (x - μᵢ)(x - μᵢ)ᵀ
Between-Class Scatter Matrix (Sb):
Sb = Σᵢ Nᵢ(μᵢ - μ)(μᵢ - μ)ᵀ
Optimal projection (maximizes Fisher's criterion):
w = Sw⁻¹(μ₁ - μ₂)
Example: Two-Class Classification
Given Data:
| Class | Points (X₁, X₂) |
|---|---|
| Class 0 (●) | (4,2), (2,4), (2,3), (3,6), (4,4) |
| Class 1 (○) | (9,10), (6,8), (9,5), (8,7), (10,8) |
Step 1: Calculate Class Means
μ₀ = [(4+2+2+3+4)/5, (2+4+3+6+4)/5] = [3, 3.8]
μ₁ = [(9+6+9+8+10)/5, (10+8+5+7+8)/5] = [8.4, 7.6]
Step 2: Within-Class Scatter Matrix
S₀ = Σ(xᵢ - μ₀)(xᵢ - μ₀)ᵀ for Class 0
S₁ = Σ(xᵢ - μ₁)(xᵢ - μ₁)ᵀ for Class 1
Sw = S₀ + S₁
Step 3: Between-Class Scatter Matrix
Sb = (μ₀ - μ₁)(μ₀ - μ₁)ᵀ
= ([3,3.8] - [8.4,7.6])([3,3.8] - [8.4,7.6])ᵀ
= [-5.4, -3.8][-5.4, -3.8]ᵀ
Step 4: Compute Projection Vector
w = Sw⁻¹(μ₁ - μ₀)
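Steps 1-4 above can be sketched with numpy (w is normalized only for readability; the data is taken from the example):

```python
import numpy as np

# Two-class data from the example
c0 = np.array([[4, 2], [2, 4], [2, 3], [3, 6], [4, 4]], dtype=float)
c1 = np.array([[9, 10], [6, 8], [9, 5], [8, 7], [10, 8]], dtype=float)

mu0, mu1 = c0.mean(axis=0), c1.mean(axis=0)   # class means [3, 3.8], [8.4, 7.6]
S0 = (c0 - mu0).T @ (c0 - mu0)                # scatter of class 0
S1 = (c1 - mu1).T @ (c1 - mu1)                # scatter of class 1
Sw = S0 + S1                                  # within-class scatter matrix

w = np.linalg.solve(Sw, mu1 - mu0)            # Fisher direction: Sw^{-1}(mu1 - mu0)
w /= np.linalg.norm(w)

p0, p1 = c0 @ w, c1 @ w                       # 1-D projections of both classes
```

Projecting onto w places every Class 0 point below every Class 1 point on the 1D axis, i.e. the two classes become linearly separable in one dimension.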
LDA Geometric Visualization:
1D Projection Result:
Key Properties of LDA:
| Property | Description |
|---|---|
| Maximum Components | min(c-1, d) where c=classes, d=features |
| Assumption | Classes are normally distributed |
| Covariance | Assumes equal covariance for all classes |
| Linearity | Finds linear decision boundaries |
Applications:
-
Face Recognition
-
Medical Diagnosis
-
Marketing (Customer Segmentation)
-
Pattern Recognition
-
Bankruptcy Prediction
Advantages vs Disadvantages:
| Advantages | Disadvantages |
|---|---|
| Uses class labels for better separation | Assumes normal distribution |
| Reduces dimensionality preserving class info | Limited to (c-1) components |
| Works well when classes are separable | Sensitive to outliers |
| Computationally efficient | Requires labeled data |
UNIT 14: Association Rules
Q1. ⭐ Explain Association Rules.
[ Dec 2022 Q5(j) | Frequency: 1 ]
Association Rules discover interesting relationships (co-occurrence patterns) among items in large transactional databases (e.g., market-basket data).
A rule is of the form:
- $X \rightarrow Y$, where $X$ and $Y$ are itemsets, and $X \cap Y = \varnothing$
Meaning: “If a transaction contains $X$, it is likely to contain $Y$.”
Key measures
Let $N$ = total number of transactions.
- Support of itemset $X$: $$\text{support}(X)=\frac{\#(X)}{N}$$ where $\#(X)$ = number of transactions containing $X$
- Support of rule $X \rightarrow Y$: $$\text{support}(X\rightarrow Y)=\text{support}(X\cup Y)$$
- Confidence of rule $X \rightarrow Y$ (conditional probability): $$\text{confidence}(X\rightarrow Y)=\frac{\text{support}(X\cup Y)}{\text{support}(X)}=P(Y\mid X)$$
- Lift (strength vs. independence): $$\text{lift}(X\rightarrow Y)=\frac{\text{confidence}(X\rightarrow Y)}{\text{support}(Y)}$$

Interpretation:
- Lift $> 1$: positive association
- Lift $= 1$: independent
- Lift $< 1$: negative association
Rule mining tasks
-
Frequent itemset generation: find all itemsets with support $\ge \text{minsup}$
-
Rule generation: from frequent itemsets, generate rules with confidence $\ge \text{minconf}$ (optionally filter by lift)
Example (conceptual): If many receipts contain $\{Bread, Milk\}$, a rule may be:
- $\{Bread\}\rightarrow\{Milk\}$ with high confidence (customers who buy bread often buy milk)
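These three measures can be sketched in a few lines of Python (a minimal illustration over a toy five-transaction basket):

```python
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
N = len(transactions)

def support(itemset):
    """Fraction of transactions containing every item of the itemset."""
    s = set(itemset)
    return sum(s <= t for t in transactions) / N

def confidence(X, Y):
    """P(Y | X) = support(X ∪ Y) / support(X)."""
    return support(set(X) | set(Y)) / support(X)

def lift(X, Y):
    """confidence(X → Y) / support(Y); > 1 means positive association."""
    return confidence(X, Y) / support(Y)
```

For example, `support({"Bread"})` is 0.8 and `lift({"Diaper"}, {"Beer"})` is 1.25, a positive association.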
Q2. 🔥 Explain Apriori Algorithm with suitable example.
[ Jun 2024 Q5(b) | Frequency: 1 ]
Apriori is a level-wise algorithm to find frequent itemsets using the Apriori property:
- Apriori property (anti-monotone):
If an itemset is frequent, then all of its non-empty subsets are frequent.
Equivalently, if an itemset is infrequent, all its supersets are infrequent.
This property allows pruning many candidates without counting their supports.
Algorithm idea (level-wise)
-
Find frequent 1-itemsets $L_1$
-
For $k=2,3,\dots$:
- Generate candidate $k$-itemsets $C_k$ by joining $L_{k-1}$ with itself
- Prune candidates whose any $(k-1)$-subset is not in $L_{k-1}$
- Scan database to count supports of $C_k$
-
Keep those meeting minsup as $L_k$
-
Stop when $L_k$ becomes empty
Pseudo-steps
- $L_1 = \{\text{frequent 1-itemsets}\}$
- For $k \ge 2$:
  - $C_k = \text{apriori\_gen}(L_{k-1})$
  - Count supports of $C_k$ by scanning the DB
  - $L_k = \{ c \in C_k \mid \text{support}(c) \ge \text{minsup} \}$
Small example (illustration of pruning)
If $\{A,B,C\}$ is a candidate 3-itemset, then Apriori requires:
- $\{A,B\}$, $\{A,C\}$, and $\{B,C\}$ must all be frequent (in $L_2$)
If even one is not frequent, $\{A,B,C\}$ is pruned without support counting.
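The level-wise loop with candidate generation and subset pruning can be sketched in Python (a minimal, unoptimized illustration; the transaction list is a toy market-basket example):

```python
from itertools import combinations

def apriori(transactions, minsup_count):
    """Level-wise frequent-itemset mining; returns {itemset: support count}."""
    # L1: frequent 1-itemsets
    counts = {}
    for t in transactions:
        for item in t:
            counts[frozenset([item])] = counts.get(frozenset([item]), 0) + 1
    Lk = {s: c for s, c in counts.items() if c >= minsup_count}
    frequent = dict(Lk)
    k = 2
    while Lk:
        items = sorted({i for s in Lk for i in s})
        # Candidate k-itemsets, pruned: every (k-1)-subset must be frequent
        Ck = [frozenset(c) for c in combinations(items, k)
              if all(frozenset(sub) in Lk for sub in combinations(c, k - 1))]
        counts = {c: sum(c <= set(t) for t in transactions) for c in Ck}
        Lk = {s: n for s, n in counts.items() if n >= minsup_count}
        frequent.update(Lk)
        k += 1
    return frequent

baskets = [{"Bread", "Milk"}, {"Bread", "Diaper", "Beer", "Eggs"},
           {"Milk", "Diaper", "Beer", "Coke"}, {"Bread", "Milk", "Diaper", "Beer"},
           {"Bread", "Milk", "Diaper", "Coke"}]
freq = apriori(baskets, minsup_count=2)
```

With a minimum support count of 2, this yields 5 frequent 1-itemsets, 8 pairs, and 4 triples; infrequent combinations such as {Bread, Coke} never survive to the next level.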
Q3. 🔥 Generate association rules using Apriori algorithm for a transaction dataset.
[ Jun 2025 Q3(a), Dec 2024 Q3(a) | Frequency: 2 ]
Below is a complete worked example (same steps apply to any given dataset).
Transaction database ($N=5$)
| TID | Items |
|---|---|
| T1 | Bread, Milk |
| T2 | Bread, Diaper, Beer, Eggs |
| T3 | Milk, Diaper, Beer, Coke |
| T4 | Bread, Milk, Diaper, Beer |
| T5 | Bread, Milk, Diaper, Coke |
Assume:
-
minsup = 40% $\Rightarrow$ minimum support count $=0.40\times 5=2$
-
minconf = 60%
(A) Frequent itemset generation (Apriori)
1-itemset counts ($C_1 \rightarrow L_1$)
| Item | Count | Support |
|---|---|---|
| Bread | 4 | $4/5=0.80$ |
| Milk | 4 | $4/5=0.80$ |
| Diaper | 4 | $4/5=0.80$ |
| Beer | 3 | $3/5=0.60$ |
| Coke | 2 | $2/5=0.40$ |
| Eggs | 1 | $1/5=0.20$ (drop) |
So,
- $L_1=\{Bread, Milk, Diaper, Beer, Coke\}$
2-itemset counts ($C_2 \rightarrow L_2$)
| 2-itemset | Count | Support | Frequent? ($\ge 2$) |
|---|---|---|---|
| Bread, Milk | 3 | $3/5=0.60$ | Yes |
| Bread, Diaper | 3 | $3/5=0.60$ | Yes |
| Bread, Beer | 2 | $2/5=0.40$ | Yes |
| Bread, Coke | 1 | $1/5=0.20$ | No |
| Milk, Diaper | 3 | $3/5=0.60$ | Yes |
| Milk, Beer | 2 | $2/5=0.40$ | Yes |
| Milk, Coke | 2 | $2/5=0.40$ | Yes |
| Diaper, Beer | 3 | $3/5=0.60$ | Yes |
| Diaper, Coke | 2 | $2/5=0.40$ | Yes |
| Beer, Coke | 1 | $1/5=0.20$ | No |
So, $L_2$ includes all “Yes” pairs above.
3-itemset counts ($C_3 \rightarrow L_3$)
(Generated from $L_2$ and pruned using Apriori property)
| 3-itemset | Count | Support | Frequent? ($\ge 2$) |
|---|---|---|---|
| Bread, Milk, Diaper | 2 | $2/5=0.40$ | Yes |
| Bread, Diaper, Beer | 2 | $2/5=0.40$ | Yes |
| Milk, Diaper, Beer | 2 | $2/5=0.40$ | Yes |
| Milk, Diaper, Coke | 2 | $2/5=0.40$ | Yes |
No 4-itemset is frequent (each occurs only once).
(B) Rule generation (minconf = 60%)
Generate rules from each frequent itemset of size $\ge 2$.
Rules from frequent 2-itemsets
For each pair $\{X,Y\}$: $X\rightarrow Y$ and $Y\rightarrow X$
| Rule | Support | Confidence |
|---|---|---|
| Bread $\rightarrow$ Milk | $3/5=0.60$ | $(3/5)/(4/5)=3/4=0.75$ |
| Milk $\rightarrow$ Bread | $0.60$ | $3/4=0.75$ |
| Bread $\rightarrow$ Diaper | $0.60$ | $3/4=0.75$ |
| Diaper $\rightarrow$ Bread | $0.60$ | $3/4=0.75$ |
| Milk $\rightarrow$ Diaper | $0.60$ | $3/4=0.75$ |
| Diaper $\rightarrow$ Milk | $0.60$ | $3/4=0.75$ |
| Beer $\rightarrow$ Bread | $2/5=0.40$ | $(2/5)/(3/5)=2/3\approx 0.667$ |
| Diaper $\rightarrow$ Beer | $3/5=0.60$ | $(3/5)/(4/5)=3/4=0.75$ |
| Beer $\rightarrow$ Diaper | $0.60$ | $(3/5)/(3/5)=1.00$ |
| Coke $\rightarrow$ Milk | $2/5=0.40$ | $(2/5)/(2/5)=1.00$ |
| Coke $\rightarrow$ Diaper | $0.40$ | $2/2=1.00$ |
(Rules dropped for low confidence: Bread$\rightarrow$Beer $=2/4=0.50$, Milk$\rightarrow$Beer $=2/4=0.50$, Milk$\rightarrow$Coke $=2/4=0.50$, Diaper$\rightarrow$Coke $=2/4=0.50$.)
Rules from frequent 3-itemsets
Example for $\{A,B,C\}$: test $2\rightarrow 1$ rules (and $1\rightarrow 2$ if needed)
1) From $\{Bread, Milk, Diaper\}$ (support $=2/5$)
- $\{Bread, Milk\}\rightarrow\{Diaper\}$: confidence $=(2/5)/(3/5)=2/3\approx 0.667$
- $\{Bread, Diaper\}\rightarrow\{Milk\}$: $(2/5)/(3/5)=2/3\approx 0.667$
- $\{Milk, Diaper\}\rightarrow\{Bread\}$: $(2/5)/(3/5)=2/3\approx 0.667$
2) From $\{Bread, Diaper, Beer\}$ (support $=2/5$)
- $\{Bread, Diaper\}\rightarrow\{Beer\}$: $(2/5)/(3/5)=2/3\approx 0.667$
- $\{Bread, Beer\}\rightarrow\{Diaper\}$: $(2/5)/(2/5)=1.00$
- $\{Diaper, Beer\}\rightarrow\{Bread\}$: $(2/5)/(3/5)=2/3\approx 0.667$
- $\{Beer\}\rightarrow\{Bread, Diaper\}$: $(2/5)/(3/5)=2/3\approx 0.667$
3) From $\{Milk, Diaper, Beer\}$ (support $=2/5$)
- $\{Milk, Diaper\}\rightarrow\{Beer\}$: $(2/5)/(3/5)=2/3\approx 0.667$
- $\{Milk, Beer\}\rightarrow\{Diaper\}$: $(2/5)/(2/5)=1.00$
- $\{Diaper, Beer\}\rightarrow\{Milk\}$: $(2/5)/(3/5)=2/3\approx 0.667$
- $\{Beer\}\rightarrow\{Milk, Diaper\}$: $(2/5)/(3/5)=2/3\approx 0.667$
4) From $\{Milk, Diaper, Coke\}$ (support $=2/5$)
- $\{Milk, Diaper\}\rightarrow\{Coke\}$: $(2/5)/(3/5)=2/3\approx 0.667$
- $\{Milk, Coke\}\rightarrow\{Diaper\}$: $(2/5)/(2/5)=1.00$
- $\{Diaper, Coke\}\rightarrow\{Milk\}$: $(2/5)/(2/5)=1.00$
- $\{Coke\}\rightarrow\{Milk, Diaper\}$: $(2/5)/(2/5)=1.00$
(Optional) Lift for a few strong rules
- Diaper $\rightarrow$ Beer: confidence $=0.75$, support(Beer) $=0.60$, so $\text{lift}=\frac{0.75}{0.60}=1.25$
- Beer $\rightarrow$ Diaper: confidence $=1.00$, support(Diaper) $=0.80$, so $\text{lift}=\frac{1.00}{0.80}=1.25$
- Coke $\rightarrow$ Milk: confidence $=1.00$, support(Milk) $=0.80$, so $\text{lift}=\frac{1.00}{0.80}=1.25$
Q4. 🔥 Explain the working of FP-Growth algorithm.
[ Jun 2024 Q1(f), Dec 2023 Q5(b) | Frequency: 2 ]
FP-Growth (Frequent Pattern Growth) mines frequent itemsets without candidate generation, using a compressed structure called the FP-tree.
Main idea
-
Compress transactions into an FP-tree using frequency ordering.
-
Mine the FP-tree by exploring conditional pattern bases and conditional FP-trees.
Step 1: First database scan (find frequent 1-itemsets)
-
Compute support count of each item.
-
Remove infrequent items ($< \text{minsup}$).
-
Sort remaining items in descending frequency (global order).
Step 2: Second scan (build FP-tree + header table)
For each transaction:
-
Remove infrequent items
-
Sort items by global frequency order
-
Insert into FP-tree:
- share common prefixes
-
increment counts along existing paths
-
Maintain header table linking all nodes of the same item (node-links)
(Diagram omitted: in the exam, draw the FP-tree for the given transaction set and ordering; actual node counts and shape depend on minsup and the chosen frequency order.)
Step 3: Mine frequent patterns (recursive)
For each item $i$ in header table (usually from least frequent to most):
- Conditional pattern base of $i$
-
collection of prefix paths ending in $i$ (with counts)
-
Build conditional FP-tree for $i$
-
Extract frequent itemsets that end with $i$
-
Repeat recursively until conditional tree becomes empty or single path
Why FP-growth is efficient
-
avoids generating large candidate sets (unlike Apriori)
-
uses compressed representation via shared prefixes
-
mines locally in conditional trees
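Steps 1-2 (the two scans and prefix-sharing insertion) can be sketched as below; the header table and the recursive mining of Step 3 are omitted for brevity, and the transaction set is a toy example:

```python
class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, minsup_count):
    """Scan 1: count items. Scan 2: insert frequency-ordered transactions."""
    counts = {}
    for t in transactions:
        for i in t:
            counts[i] = counts.get(i, 0) + 1
    keep = {i for i, c in counts.items() if c >= minsup_count}

    root = FPNode(None, None)
    for t in transactions:
        # drop infrequent items, sort by descending global count (ties: name)
        items = sorted((i for i in t if i in keep),
                       key=lambda i: (-counts[i], i))
        node = root
        for i in items:
            if i not in node.children:      # shared prefixes reuse nodes
                node.children[i] = FPNode(i, node)
            node = node.children[i]
            node.count += 1                 # increment along the path
    return root

baskets = [{"Bread", "Milk"}, {"Bread", "Diaper", "Beer", "Eggs"},
           {"Milk", "Diaper", "Beer", "Coke"}, {"Bread", "Milk", "Diaper", "Beer"},
           {"Bread", "Milk", "Diaper", "Coke"}]
tree = build_fp_tree(baskets, minsup_count=2)
```

Four of the five transactions start with Bread after ordering, so the root has only two children: a Bread branch with count 4 and a Diaper branch with count 1, illustrating prefix compression.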
Q5. ⭐ Give advantages of FP-Growth over Apriori Algorithm.
[ Jun 2024 Q1(f) | Frequency: 1 ]
Advantages of FP-Growth
- No candidate generation
-
Apriori generates $C_2, C_3, \dots$ which can explode combinatorially.
-
Fewer database scans
- FP-growth typically needs two full scans (count + build tree), then mines the tree.
-
Apriori needs one scan per level $k$.
-
Better performance on dense data / long patterns
-
When many items co-occur, Apriori produces huge candidates; FP-growth remains practical due to compression.
-
Compact storage
-
FP-tree compresses repeated transaction prefixes, saving memory and computation.
-
Localized mining
- Conditional FP-trees focus only on relevant portions of data, reducing work.
When Apriori may still be preferred
- Very sparse datasets with short patterns and very high minsup (small search space), due to its simplicity.
Q6. 📌 Explain Pincer Search.
[ Jun 2022 Q5(j) | Frequency: 1 ]
Pincer Search is a frequent itemset mining method designed to find maximal frequent itemsets efficiently by combining:
-
Bottom-up search (like Apriori)
-
Top-down pruning guided by maximal sets
Key terms
-
Maximal Frequent Itemset (MFI): a frequent itemset that has no frequent superset.
-
MFS: set of maximal frequent itemsets found so far.
-
MFCS (Maximal Frequent Candidate Set): a set of candidate maximal itemsets maintained during search.
Working (core idea)
-
Start with MFCS containing the set of all items (or large candidates).
-
At each level $k$:
- Generate and count candidates bottom-up (as in Apriori) to find $L_k$.
-
Simultaneously test members of MFCS for frequency.
-
When an itemset is found infrequent, update MFCS by splitting it (remove items) to ensure MFCS still covers all possible MFIs.
-
When a member of MFCS becomes frequent, it is a maximal frequent itemset and is moved to MFS.
-
Continue until no new frequent itemsets / MFIs can exist.
Outcome
- Efficiently discovers maximal frequent patterns, often faster than enumerating all frequent itemsets when long patterns exist.
Q7. ⭐ What is the advantage of using Pincer Search over Apriori algorithm?
[ Jun 2025 Q1(h) | Frequency: 1 ]
Advantages of Pincer Search over Apriori
- Targets maximal frequent itemsets directly
- Apriori may generate all frequent itemsets, which can be extremely large.
-
Pincer focuses on MFIs, reducing output size and computation.
-
Bidirectional pruning (top-down + bottom-up)
- Top-down testing of MFCS can quickly confirm large frequent sets.
-
Once a maximal frequent itemset is found, many smaller candidates become unnecessary.
-
Fewer candidate generations in dense datasets
- In dense data with long frequent patterns, Apriori’s candidate explosion is severe.
-
Pincer reduces explosion by shrinking MFCS using infrequent evidence.
-
Potentially fewer scans and less counting work
- By eliminating large portions of the search space early, support counting can be reduced compared to pure level-wise Apriori.
Note
- Benefit is strongest when maximal patterns are long and the dataset is dense; for very sparse data, simple Apriori may be competitive due to lower overhead.
UNIT 15: Clustering
Q1. ⭐ Discuss the concept of clustering.
[ Jun 2025 Q1(f) | Frequency: 1 ]
Clustering is an unsupervised learning technique that groups a set of objects such that objects in the same group (called a cluster) are more similar to each other than to those in other clusters.
Key Characteristics:
-
No predefined labels: Unlike classification, clustering does not use labeled training data
-
Similarity-based: Objects are grouped based on distance or similarity measures
-
Exploratory: Used to discover hidden patterns or structures in data
Formal Definition: Given a dataset $D = \{x_1, x_2, \ldots, x_n\}$, clustering partitions $D$ into $k$ clusters $C = \{C_1, C_2, \ldots, C_k\}$ such that:
-
$C_1 \cup C_2 \cup \ldots \cup C_k = D$
-
$C_i \cap C_j = \varnothing$ for $i \neq j$ (in hard clustering)
Goals of Clustering:
-
High intra-cluster similarity: Points within a cluster should be close
-
Low inter-cluster similarity: Points in different clusters should be far apart
Common Distance Measures:
| Distance | Formula |
|---|---|
| Euclidean | $d(x, y) = \sqrt{\sum_{i=1}^{n}(x_i - y_i)^2}$ |
| Manhattan | $d(x, y) = \sum_{i=1}^{n} \lvert x_i - y_i \rvert$ |
| Minkowski | $d(x, y) = \left(\sum_{i=1}^{n} \lvert x_i - y_i \rvert^p\right)^{1/p}$ |
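Since Manhattan and Euclidean are the $p=1$ and $p=2$ cases of Minkowski, all three reduce to one small Python function (a minimal sketch):

```python
import math

def minkowski(x, y, p):
    """Minkowski distance: p = 1 gives Manhattan, p = 2 gives Euclidean."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

x, y = (1.0, 2.0), (4.0, 6.0)   # coordinate differences are (3, 4)
```

Here `minkowski(x, y, 2)` returns 5.0 (Euclidean) and `minkowski(x, y, 1)` returns 7.0 (Manhattan).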
Applications:
-
Customer segmentation in marketing
-
Document categorization
-
Image segmentation
-
Anomaly detection
-
Gene expression analysis
Q2. 🔥 Give/List the algorithms for clustering.
[ Jun 2025 Q1(f), Jun 2024 Q1(g), Dec 2023 Q1(a) | Frequency: 3 ]
Major Clustering Algorithms by Category:
| Category | Algorithms | Key Idea |
|---|---|---|
| Partition-based | K-Means, K-Medoids (PAM), CLARA, CLARANS | Divide data into $k$ non-overlapping partitions |
| Hierarchical | Agglomerative (AGNES), Divisive (DIANA) | Build tree of clusters (dendrogram) |
| Density-based | DBSCAN, OPTICS, DENCLUE | Group dense regions, identify noise |
| Grid-based | STING, CLIQUE, WaveCluster | Quantize space into grid cells |
| Model-based | Gaussian Mixture Models (GMM), EM | Fit statistical models to data |
Brief Description of Key Algorithms:
-
K-Means: Partitions data into $k$ clusters by minimizing within-cluster variance; uses mean as centroid
-
K-Medoids (PAM): Similar to K-Means but uses actual data points as cluster centers; robust to outliers
-
Agglomerative Hierarchical: Bottom-up approach; starts with individual points and merges closest clusters iteratively
-
DBSCAN: Groups points based on density (eps, MinPts); can find arbitrarily shaped clusters and identifies outliers
-
Gaussian Mixture Models: Assumes data is generated from a mixture of Gaussian distributions; soft clustering
Q3. 📌 Give suitable example for clustering.
[ Jun 2024 Q1(g) | Frequency: 1 ]
Example: Customer Segmentation in Retail
A retail company wants to segment customers based on their purchasing behavior to create targeted marketing strategies.
Dataset Features:
-
Annual Income (in thousands)
-
Spending Score (1-100)
Sample Data:
| Customer | Income ($K) | Spending Score |
|---|---|---|
| C1 | 15 | 39 |
| C2 | 16 | 81 |
| C3 | 70 | 75 |
| C4 | 72 | 78 |
| C5 | 75 | 10 |
| C6 | 73 | 12 |
After K-Means Clustering ($k=3$):
| Cluster | Customers | Characteristics | Marketing Strategy |
|---|---|---|---|
| Cluster 1 | C1, C2 | Low income, varied spending | Budget-friendly offers |
| Cluster 2 | C3, C4 | High income, high spending | Premium products, loyalty rewards |
| Cluster 3 | C5, C6 | High income, low spending | Value-for-money promotions |
Other Clustering Examples:
-
Image segmentation: Grouping pixels by color/intensity
-
Document clustering: Organizing news articles by topic
-
Biological: Grouping genes with similar expression patterns
Q4. ⭐ Differentiate between Hierarchical clustering and Partition-based clustering.
[ Jun 2025 Q2(a) | Frequency: 1 ]
| Aspect | Hierarchical Clustering | Partition-based Clustering |
|---|---|---|
| Approach | Builds a tree of clusters (dendrogram) | Divides data into flat, non-overlapping partitions |
| Number of clusters ($k$) | Not required in advance; determined by cutting dendrogram | Must specify $k$ beforehand |
| Output | Dendrogram (nested hierarchy) | Flat partition of exactly $k$ clusters |
| Types | Agglomerative (bottom-up), Divisive (top-down) | K-Means, K-Medoids, CLARA |
| Flexibility | Can choose different $k$ from same dendrogram | Need to re-run algorithm for different $k$ |
| Time Complexity | $O(n^2 \log n)$ or $O(n^3)$ | $O(n \cdot k \cdot t)$ where $t$ = iterations |
| Space Complexity | $O(n^2)$ (distance matrix required) | $O(n + k)$ |
| Scalability | Poor for large datasets | Better for large datasets |
| Reversibility | Irreversible (once merged/split, cannot undo) | Iterative refinement possible |
| Cluster Shape | Can handle various shapes | Tends to find spherical clusters |
| Sensitivity | Less sensitive to initialization | Sensitive to initial centroid selection |
Q5. 📌 Give a suitable example of Hierarchical clustering.
[ Jun 2025 Q2(a) | Frequency: 1 ]
Example: Taxonomy of Animals
Hierarchical clustering naturally represents biological classification where organisms are grouped into nested categories.
Dataset: Animals with Features
| Animal | Has Fur | Warm-blooded | Lays Eggs | Aquatic |
|---|---|---|---|---|
| Dog | Yes | Yes | No | No |
| Cat | Yes | Yes | No | No |
| Eagle | No | Yes | Yes | No |
| Penguin | No | Yes | Yes | Yes |
| Salmon | No | No | Yes | Yes |
| Shark | No | No | No | Yes |
Agglomerative Clustering Process:
| Step | Action | Resulting Clusters |
|---|---|---|
| 0 | Initial | {Dog}, {Cat}, {Eagle}, {Penguin}, {Salmon}, {Shark} |
| 1 | Merge Dog, Cat (most similar) | {Dog, Cat}, {Eagle}, {Penguin}, {Salmon}, {Shark} |
| 2 | Merge Eagle, Penguin | {Dog, Cat}, {Eagle, Penguin}, {Salmon}, {Shark} |
| 3 | Merge Salmon, Shark | {Dog, Cat}, {Eagle, Penguin}, {Salmon, Shark} |
| 4 | Merge {Eagle, Penguin}, {Salmon, Shark} | {Dog, Cat}, {Eagle, Penguin, Salmon, Shark} |
| 5 | Final merge | {All Animals} |
Dendrogram Interpretation:
-
Cut at height 1: 6 clusters (individual animals)
-
Cut at height 2: 3 clusters (Mammals, Birds, Fish)
-
Cut at height 3: 2 clusters (Mammals vs Non-Mammals)
Q6. 📌 Give a suitable example of Partition-based clustering.
[ Jun 2025 Q2(a) | Frequency: 1 ]
Example: Grouping Cities by Climate
Cluster cities based on average temperature and annual rainfall using K-Means.
Dataset:
| City | Avg Temp (°C) | Rainfall (mm) |
|---|---|---|
| A (Dubai) | 35 | 50 |
| B (Delhi) | 33 | 45 |
| C (London) | 10 | 200 |
| D (Paris) | 12 | 180 |
| E (Mumbai) | 28 | 100 |
| F (Chennai) | 30 | 110 |
K-Means with $k=3$:
Step 1: Initialize centroids randomly
- $\mu_1 = A(35, 50)$, $\mu_2 = C(10, 200)$, $\mu_3 = E(28, 100)$
Step 2: Assign points to nearest centroid (Euclidean distance)
| City | $d(\mu_1)$ | $d(\mu_2)$ | $d(\mu_3)$ | Cluster |
|---|---|---|---|---|
| A | 0 | 152.1 | 50.5 | C1 |
| B | 5.4 | 156.7 | 55.2 | C1 |
| C | 152.1 | 0 | 101.6 | C2 |
| D | 132.0 | 20.1 | 81.6 | C2 |
| E | 50.5 | 101.6 | 0 | C3 |
| F | 60.2 | 92.2 | 10.2 | C3 |
Step 3: Update centroids
-
$\mu_1 = \frac{(35,50)+(33,45)}{2} = (34, 47.5)$ — Hot-Dry
-
$\mu_2 = \frac{(10,200)+(12,180)}{2} = (11, 190)$ — Cold-Wet
-
$\mu_3 = \frac{(28,100)+(30,110)}{2} = (29, 105)$ — Moderate
Final Result:
| Cluster | Cities | Climate Type |
|---|---|---|
| C1 | Dubai, Delhi | Hot and Dry (Desert) |
| C2 | London, Paris | Cold and Wet (Temperate) |
| C3 | Mumbai, Chennai | Warm and Moderate (Tropical) |
Q7. ⭐ Explain the working of partition-based clustering.
[ Dec 2023 Q1(g) | Frequency: 1 ]
Partition-based clustering divides a dataset of $n$ objects into $k$ non-overlapping clusters, where $k$ is specified in advance.
Objective Function: Minimize the sum of squared distances between each point and its cluster centroid:
$$J = \sum_{i=1}^{k} \sum_{x \in C_i} \|x - \mu_i\|^2$$
where $\mu_i$ is the centroid of cluster $C_i$.
Properties:
-
Each object belongs to exactly one cluster
-
Each cluster has at least one object
-
Clusters are mutually exclusive: $C_i \cap C_j = \varnothing$ for $i \neq j$
General Working Algorithm:
-
Initialization: Select $k$ initial cluster centers (randomly or using heuristics like K-Means++)
-
Assignment: Assign each data point to the nearest cluster center
$$C_i = \{x : \|x - \mu_i\| \leq \|x - \mu_j\|, \forall j \neq i\}$$ -
Update: Recalculate cluster centers
- For K-Means: $\mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x$ (mean)
-
For K-Medoids: Select actual data point that minimizes total distance
-
Convergence Check: Repeat steps 2-3 until:
- Centroids don't change significantly, OR
- Maximum iterations reached, OR
- No point changes cluster
Characteristics:
-
Produces globular (spherical) clusters
-
Sensitive to initialization and outliers
-
Converges to local optimum
-
Works best with numeric data
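Steps 1-4 can be sketched with numpy (a minimal Lloyd's-algorithm illustration; it assumes no cluster ever becomes empty). The data and initial centroids follow the city-climate example above:

```python
import numpy as np

def kmeans(X, centroids, iters=10):
    """Lloyd's algorithm: assign to nearest centroid, then recompute means."""
    for _ in range(iters):
        # Assignment step: distance of every point to every centroid
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: centroid = mean of its assigned points (assumes none empty)
        centroids = np.array([X[labels == j].mean(axis=0)
                              for j in range(len(centroids))])
    return labels, centroids

# City data (avg temp °C, rainfall mm); initial centroids: cities A, C, E
X = np.array([[35, 50], [33, 45], [10, 200], [12, 180], [28, 100], [30, 110]],
             dtype=float)
labels, centers = kmeans(X, X[[0, 2, 4]].copy())
```

The assignments converge after one pass, giving the clusters {A, B}, {C, D}, {E, F} and the updated centroids (34, 47.5), (11, 190), (29, 105).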
Q8. 📌 Mention any two methods used for partition-based clustering.
[ Dec 2023 Q1(g) | Frequency: 1 ]
1. K-Means Algorithm
| Aspect | Description |
|---|---|
| Center Type | Mean of cluster points (virtual point) |
| Distance | Typically Euclidean distance |
| Update Rule | $\mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x$ |
| Complexity | $O(nkt)$, where $n$ = number of points, $k$ = clusters, $t$ = iterations |
| Pros | Fast, simple, scalable to large datasets |
| Cons | Sensitive to outliers, requires $k$ in advance, finds only spherical clusters |
2. K-Medoids (PAM - Partitioning Around Medoids)
| Aspect | Description |
|---|---|
| Center Type | Actual data point (medoid) |
| Medoid | Point that minimizes sum of dissimilarities within cluster |
| Update Rule | Select $m_i$ such that $\sum_{x \in C_i} d(x, m_i)$ is minimized |
| Complexity | $O(k(n-k)^2)$ per iteration |
| Pros | More robust to outliers and noise, works with any distance metric |
| Cons | Computationally expensive for large datasets |
Comparison:
| Feature | K-Means | K-Medoids |
|---|---|---|
| Center representation | Mean (may not be actual point) | Actual data point |
| Outlier sensitivity | High | Low |
| Distance measures | Euclidean only | Any metric |
| Data types | Numeric only | Categorical also possible |
| Speed | Faster | Slower |
Q9. ⭐ Explain Hierarchical Clustering with suitable example.
[ Jun 2024 Q5(c) | Frequency: 1 ]
Hierarchical Clustering creates a tree-like structure (dendrogram) of nested clusters without requiring the number of clusters in advance.
Types:
-
Agglomerative (Bottom-up): Start with $n$ clusters (each point), merge until one cluster
-
Divisive (Top-down): Start with one cluster (all points), split until $n$ clusters
Agglomerative Algorithm Steps:
-
Treat each data point as a single cluster
-
Compute distance/similarity matrix between all clusters
-
Merge the two closest clusters
-
Update distance matrix
-
Repeat steps 3-4 until single cluster remains
-
Result: Dendrogram showing merge history
Example:
Data Points: A(1,1), B(1,2), C(4,4), D(5,4)
Step 1: Initial Distance Matrix (Euclidean)
| A | B | C | D | |
|---|---|---|---|---|
| A | 0 | 1.0 | 4.24 | 5.0 |
| B | 1.0 | 0 | 3.61 | 4.47 |
| C | 4.24 | 3.61 | 0 | 1.0 |
| D | 5.0 | 4.47 | 1.0 | 0 |
Step 2: Clustering Process (Single Linkage)
| Step | Minimum Distance | Action | Clusters |
|---|---|---|---|
| 0 | — | Initial | {A}, {B}, {C}, {D} |
| 1 | $d(A,B)=1.0$ | Merge A, B | {A,B}, {C}, {D} |
| 2 | $d(C,D)=1.0$ | Merge C, D | {A,B}, {C,D} |
| 3 | $d({A,B},{C,D})=3.61$ | Merge all | {A,B,C,D} |
Updated Distance Matrix after Step 1:
| | {A,B} | C | D |
|---|---|---|---|
| {A,B} | 0 | 3.61 | 4.47 |
| C | 3.61 | 0 | 1.0 |
| D | 4.47 | 1.0 | 0 |
(Single linkage: $d({A,B}, C) = \min(d(A,C), d(B,C)) = \min(4.24, 3.61) = 3.61$)
Cutting the Dendrogram:
-
Cut at height 2.0 → 2 clusters: {A,B} and {C,D}
-
Cut at height 0.5 → 4 clusters: {A}, {B}, {C}, {D}
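The A–D example above can be reproduced with SciPy (assuming `scipy` is installed): `linkage` builds the merge history and `fcluster` cuts the dendrogram at a chosen height:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# points A(1,1), B(1,2), C(4,4), D(5,4) from the worked example
X = np.array([[1, 1], [1, 2], [4, 4], [5, 4]], dtype=float)

# single-linkage agglomerative clustering; Z records the merge history
Z = linkage(X, method="single", metric="euclidean")

# cutting the dendrogram at height 2.0 yields two clusters: {A,B} and {C,D}
labels = fcluster(Z, t=2.0, criterion="distance")
print(labels)  # e.g. [1 1 2 2]
```

Changing `method` to `"complete"` or `"average"` switches the linkage criterion without any other code changes.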
Q10. 🔥 Explain Single linkage with example.
[ Dec 2024 Q2(b) | Frequency: 1 ]
Single Linkage (MIN / Nearest Neighbor)
Distance between two clusters equals the minimum distance between any pair of points from the two clusters.
Characteristics:
-
Tends to create elongated, chain-like clusters
-
Susceptible to chaining effect (noise can bridge clusters)
-
Can detect non-elliptical shapes
Example:
Data Points: P1(0,0), P2(1,0), P3(5,0), P4(6,0), P5(3,0)
Initial Distance Matrix:
| | P1 | P2 | P3 | P4 | P5 |
|---|---|---|---|---|---|
| P1 | 0 | 1 | 5 | 6 | 3 |
| P2 | 1 | 0 | 4 | 5 | 2 |
| P3 | 5 | 4 | 0 | 1 | 2 |
| P4 | 6 | 5 | 1 | 0 | 3 |
| P5 | 3 | 2 | 2 | 3 | 0 |
Clustering Steps:
| Step | Min Distance | Merge | Clusters |
|---|---|---|---|
| 1 | $d(P1,P2)=1$ | P1, P2 | {P1,P2}, {P3}, {P4}, {P5} |
| 2 | $d(P3,P4)=1$ | P3, P4 | {P1,P2}, {P3,P4}, {P5} |
| 3 | $d({P1,P2},P5)=2$ | {P1,P2}, P5 | {P1,P2,P5}, {P3,P4} |
| 4 | $d({P1,P2,P5},{P3,P4})=2$ | All | {P1,P2,P3,P4,P5} |
Calculation for Step 3:
$d(\{P1,P2\}, P5) = \min(d(P1,P5),\ d(P2,P5)) = \min(3, 2) = 2$
Calculation for Step 4:
$d(\{P1,P2,P5\}, \{P3,P4\}) = \min(5, 6, 4, 5, 2, 3) = 2$
Q11. 🔥 Explain Complete linkage with example.
[ Dec 2024 Q2(b) | Frequency: 1 ]
Complete Linkage (MAX / Farthest Neighbor)
Distance between two clusters equals the maximum distance between any pair of points from the two clusters.
Characteristics:
-
Produces compact, spherical clusters
-
Sensitive to outliers (one distant point affects merge)
-
Avoids chaining effect
Example:
Data Points: P1(0,0), P2(1,0), P3(5,0), P4(6,0), P5(3,0)
Using the same initial distance matrix as for Single Linkage.
Clustering Steps:
| Step | Min Distance | Merge | Clusters |
|---|---|---|---|
| 1 | $d(P1,P2)=1$ | P1, P2 | {P1,P2}, {P3}, {P4}, {P5} |
| 2 | $d(P3,P4)=1$ | P3, P4 | {P1,P2}, {P3,P4}, {P5} |
| 3 | $d(P5,{P3,P4})=3$ | P5, {P3,P4} | {P1,P2}, {P3,P4,P5} |
| 4 | $d({P1,P2},{P3,P4,P5})=6$ | All | {P1,P2,P3,P4,P5} |
Calculation for Step 3:
$d(P5, \{P3,P4\}) = \max(d(P5,P3),\ d(P5,P4)) = \max(2, 3) = 3$
(Tie with $d(P5, \{P1,P2\}) = \max(3, 2) = 3$; tie-breaking: choose {P3,P4} and P5)
Calculation for Step 4:
$d(\{P1,P2\}, \{P3,P4,P5\}) = \max(5, 6, 4, 5, 3, 2) = 6$
Comparison with Single Linkage:
-
Single linkage merged all at distance 2
-
Complete linkage merged all at distance 6
-
Complete linkage creates tighter clusters
Q12. 🔥 Explain Average linkage with example.
[ Dec 2024 Q2(b) | Frequency: 1 ]
Average Linkage (UPGMA - Unweighted Pair Group Method with Arithmetic Mean)
Distance between two clusters equals the average of all pairwise distances between points from the two clusters.
Characteristics:
-
Compromise between single and complete linkage
-
Less susceptible to noise and outliers
-
Computationally more expensive
Example:
Data Points: P1(0,0), P2(1,0), P3(5,0), P4(6,0)
Initial Distance Matrix:
| | P1 | P2 | P3 | P4 |
|---|---|---|---|---|
| P1 | 0 | 1 | 5 | 6 |
| P2 | 1 | 0 | 4 | 5 |
| P3 | 5 | 4 | 0 | 1 |
| P4 | 6 | 5 | 1 | 0 |
Clustering Steps:
| Step | Action | Clusters |
|---|---|---|
| 1 | Merge P1, P2 (dist=1) | {P1,P2}, {P3}, {P4} |
| 2 | Merge P3, P4 (dist=1) | {P1,P2}, {P3,P4} |
| 3 | Merge {P1,P2}, {P3,P4} | {P1,P2,P3,P4} |
Calculating Average Distance for Step 3:
$d(\{P1,P2\}, \{P3,P4\}) = \frac{d(P1,P3)+d(P1,P4)+d(P2,P3)+d(P2,P4)}{4} = \frac{5+6+4+5}{4} = 5$
Comparison of All Linkage Methods:
| Linkage | Distance between {P1,P2} and {P3,P4} | Formula Used |
|---|---|---|
| Single | $\min(5,6,4,5) = 4$ | Minimum |
| Complete | $\max(5,6,4,5) = 6$ | Maximum |
| Average | $\frac{5+6+4+5}{4} = 5$ | Mean |
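The three cluster-distance values in the comparison table can be checked with a short NumPy sketch over the full pairwise distance matrix between the two clusters:

```python
import numpy as np

A = np.array([[0.0, 0.0], [1.0, 0.0]])   # cluster {P1, P2}
B = np.array([[5.0, 0.0], [6.0, 0.0]])   # cluster {P3, P4}

# all pairwise distances between a point of A and a point of B
D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)

single = D.min()    # minimum pairwise distance -> 4.0
complete = D.max()  # maximum pairwise distance -> 6.0
average = D.mean()  # mean of all pairwise distances -> 5.0
```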
Q13. ⭐ Explain Density-based clustering with suitable example.
[ Dec 2023 Q5(c) | Frequency: 1 ]
Density-based clustering groups points that are densely packed together, marking points in low-density regions as outliers/noise.
Key Concept: A cluster is a dense region of points separated by low-density regions.
DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
Parameters:
-
$\varepsilon$ (eps): Radius of neighborhood
-
MinPts: Minimum points required to form a dense region
Definitions:
-
$\varepsilon$-neighborhood: $N_\varepsilon(p) = \{q \in D : d(p,q) \leq \varepsilon\}$
-
Core Point: Point $p$ where $|N_\varepsilon(p)| \geq \text{MinPts}$
-
Border Point: Not a core point but within $\varepsilon$ of a core point
-
Noise Point: Neither core nor border
Algorithm:
- For each unvisited point P:
- Mark P as visited
- Find all points within $\varepsilon$ distance ($N_\varepsilon(P)$)
- If $|N_\varepsilon(P)| \geq \text{MinPts}$: Create new cluster, expand it
- Else: Mark P as noise (may later become border point)
Example:
Dataset with $\varepsilon = 2$, MinPts = 3
| Point | Coordinates |
|---|---|
| A | (1, 1) |
| B | (1.5, 1.2) |
| C | (0.8, 0.8) |
| D | (8, 8) |
| E | (8.2, 8.5) |
| F | (7.8, 8.2) |
| G | (20, 20) |
Analysis:
| Point | Neighbors within $\varepsilon=2$ | Count (incl. self) | Type |
|---|---|---|---|
| A | B, C | 3 | Core |
| B | A, C | 3 | Core |
| C | A, B | 3 | Core |
| D | E, F | 3 | Core |
| E | D, F | 3 | Core |
| F | D, E | 3 | Core |
| G | None | 1 | Noise |
Result:
-
Cluster 1: {A, B, C} — bottom-left dense region
-
Cluster 2: {D, E, F} — top-right dense region
-
Noise: {G} — isolated point
Advantages of DBSCAN:
-
No need to specify number of clusters
-
Can find arbitrarily shaped clusters
-
Robust to outliers (identifies them as noise)
-
Works well with spatial data
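A short sketch using scikit-learn's `DBSCAN` (assuming `scikit-learn` is installed) reproduces the worked example; note that scikit-learn counts the point itself toward `min_samples`, matching the neighbor counts in the analysis table:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# the seven points from the worked example
X = np.array([[1, 1], [1.5, 1.2], [0.8, 0.8],   # dense region 1
              [8, 8], [8.2, 8.5], [7.8, 8.2],   # dense region 2
              [20, 20]], dtype=float)           # isolated point

# eps and min_samples correspond to the example's epsilon=2, MinPts=3
db = DBSCAN(eps=2, min_samples=3).fit(X)
print(db.labels_)  # noise points receive the label -1
```

The labels place A, B, C in one cluster, D, E, F in another, and mark G as noise (`-1`), exactly as in the hand analysis.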
Q14. 🔥 Explain K-means clustering algorithm with example.
[ Dec 2024 Q2(a), Jun 2023 Q5(j) | Frequency: 2 ]
K-Means is a partition-based algorithm that divides $n$ data points into $k$ clusters by minimizing within-cluster variance.
Objective Function:
$J = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2$, where $\mu_i$ is the centroid (mean) of cluster $C_i$.
Algorithm Steps:
-
Choose $k$ (number of clusters)
-
Initialize $k$ centroids randomly
-
Assign each point to nearest centroid
-
Update centroids as mean of assigned points
-
Repeat 3-4 until convergence
Example:
Data: P1(2,3), P2(3,3), P3(6,8), P4(7,8), P5(8,7), P6(7,9)
$k=2$, Initial centroids: $\mu_1 = P1(2,3)$, $\mu_2 = P3(6,8)$
Iteration 1 - Assignment (Euclidean Distance):
| Point | Coordinates | $d(\mu_1)$ | $d(\mu_2)$ | Cluster |
|---|---|---|---|---|
| P1 | (2,3) | $\sqrt{0+0}=0$ | $\sqrt{16+25}=6.40$ | C1 |
| P2 | (3,3) | $\sqrt{1+0}=1$ | $\sqrt{9+25}=5.83$ | C1 |
| P3 | (6,8) | $\sqrt{16+25}=6.40$ | $\sqrt{0+0}=0$ | C2 |
| P4 | (7,8) | $\sqrt{25+25}=7.07$ | $\sqrt{1+0}=1$ | C2 |
| P5 | (8,7) | $\sqrt{36+16}=7.21$ | $\sqrt{4+1}=2.24$ | C2 |
| P6 | (7,9) | $\sqrt{25+36}=7.81$ | $\sqrt{1+1}=1.41$ | C2 |
Clusters: $C1 = \{P1, P2\}$, $C2 = \{P3, P4, P5, P6\}$
Update Centroids:
$\mu_1 = \left(\frac{2+3}{2},\ \frac{3+3}{2}\right) = (2.5,\ 3)$, $\mu_2 = \left(\frac{6+7+8+7}{4},\ \frac{8+8+7+9}{4}\right) = (7,\ 8)$
Iteration 2 - Assignment:
| Point | $d(\mu_1=(2.5,3))$ | $d(\mu_2=(7,8))$ | Cluster |
|---|---|---|---|
| P1 | $\sqrt{0.25+0}=0.5$ | $\sqrt{25+25}=7.07$ | C1 |
| P2 | $\sqrt{0.25+0}=0.5$ | $\sqrt{16+25}=6.40$ | C1 |
| P3 | $\sqrt{12.25+25}=6.10$ | $\sqrt{1+0}=1$ | C2 |
| P4 | $\sqrt{20.25+25}=6.73$ | $\sqrt{0+0}=0$ | C2 |
| P5 | $\sqrt{30.25+16}=6.80$ | $\sqrt{1+1}=1.41$ | C2 |
| P6 | $\sqrt{20.25+36}=7.50$ | $\sqrt{0+1}=1$ | C2 |
No change in cluster membership → Converged!
Final Clusters:
-
C1: {P1(2,3), P2(3,3)} with centroid (2.5, 3)
-
C2: {P3(6,8), P4(7,8), P5(8,7), P6(7,9)} with centroid (7, 8)
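A minimal NumPy implementation of the algorithm steps above (a sketch; the function name `kmeans` is illustrative) reproduces this result:

```python
import numpy as np

def kmeans(X, centroids, max_iter=100):
    """Plain Lloyd's algorithm: assign to nearest centroid, re-average."""
    for _ in range(max_iter):
        # assignment step: index of the nearest centroid for each point
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        # update step: each centroid becomes the mean of its assigned points
        new = np.array([X[labels == k].mean(axis=0)
                        for k in range(len(centroids))])
        if np.allclose(new, centroids):  # converged: centroids stopped moving
            break
        centroids = new
    return labels, centroids

X = np.array([[2, 3], [3, 3], [6, 8], [7, 8], [8, 7], [7, 9]], dtype=float)
labels, centroids = kmeans(X, X[[0, 2]].copy())  # start from P1 and P3
print(centroids)  # converges to (2.5, 3) and (7, 8), as in the table
```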
Q15. 🔥 Calculate cluster heads using Manhattan distance and K-means algorithm.
[ Jun 2025 Q2(b) | Frequency: 1 ]
Manhattan Distance Formula:
$d(x,y) = \sum_{i=1}^{n} \lvert x_i - y_i \rvert$
Problem:
Data Points: A(1,2), B(2,1), C(4,3), D(5,4), E(6,5), F(7,6)
$k=2$, Initial centroids: $\mu_1 = A(1,2)$, $\mu_2 = D(5,4)$
Iteration 1 - Assignment:
| Point | Coordinates | $d_{Manhattan}(\mu_1)$ | $d_{Manhattan}(\mu_2)$ | Cluster |
|---|---|---|---|---|
| A | (1,2) | $|1-1|+|2-2|=0$ | $|1-5|+|2-4|=6$ | C1 |
| B | (2,1) | $|2-1|+|1-2|=2$ | $|2-5|+|1-4|=6$ | C1 |
| C | (4,3) | $|4-1|+|3-2|=4$ | $|4-5|+|3-4|=2$ | C2 |
| D | (5,4) | $|5-1|+|4-2|=6$ | $|5-5|+|4-4|=0$ | C2 |
| E | (6,5) | $|6-1|+|5-2|=8$ | $|6-5|+|5-4|=2$ | C2 |
| F | (7,6) | $|7-1|+|6-2|=10$ | $|7-5|+|6-4|=4$ | C2 |
Clusters after Iteration 1:
-
$C1 = \{A, B\}$
-
$C2 = \{C, D, E, F\}$
Update Centroids:
$\mu_1 = \left(\frac{1+2}{2},\ \frac{2+1}{2}\right) = (1.5,\ 1.5)$, $\mu_2 = \left(\frac{4+5+6+7}{4},\ \frac{3+4+5+6}{4}\right) = (5.5,\ 4.5)$
Iteration 2 - Assignment:
| Point | Coordinates | $d_{Manhattan}(\mu_1)$ | $d_{Manhattan}(\mu_2)$ | Cluster |
|---|---|---|---|---|
| A | (1,2) | $|1-1.5|+|2-1.5|=1$ | $|1-5.5|+|2-4.5|=7$ | C1 |
| B | (2,1) | $|2-1.5|+|1-1.5|=1$ | $|2-5.5|+|1-4.5|=7$ | C1 |
| C | (4,3) | $|4-1.5|+|3-1.5|=4$ | $|4-5.5|+|3-4.5|=3$ | C2 |
| D | (5,4) | $|5-1.5|+|4-1.5|=6$ | $|5-5.5|+|4-4.5|=1$ | C2 |
| E | (6,5) | $|6-1.5|+|5-1.5|=8$ | $|6-5.5|+|5-4.5|=1$ | C2 |
| F | (7,6) | $|7-1.5|+|6-1.5|=10$ | $|7-5.5|+|6-4.5|=3$ | C2 |
No change in cluster membership → Converged!
Final Results:
| Cluster | Members | Cluster Head (Centroid) |
|---|---|---|
| C1 | A(1,2), B(2,1) | (1.5, 1.5) |
| C2 | C(4,3), D(5,4), E(6,5), F(7,6) | (5.5, 4.5) |
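The same computation can be sketched in NumPy with Manhattan (L1) distance in the assignment step, keeping the centroid update as the coordinate-wise mean, as in the worked solution:

```python
import numpy as np

X = np.array([[1, 2], [2, 1], [4, 3], [5, 4], [6, 5], [7, 6]], dtype=float)
centroids = X[[0, 3]].copy()  # initial cluster heads A(1,2) and D(5,4)

for _ in range(10):
    # assignment step: Manhattan (L1) distance to each centroid
    d = np.abs(X[:, None, :] - centroids[None, :, :]).sum(axis=-1)
    labels = d.argmin(axis=1)
    # update step: coordinate-wise mean of each cluster's members
    new = np.array([X[labels == k].mean(axis=0) for k in range(2)])
    if np.allclose(new, centroids):  # membership stable -> converged
        break
    centroids = new

print(centroids)  # [[1.5 1.5] [5.5 4.5]]
```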
Q16. 📌 Give example of K-means algorithm.
[ Dec 2023 Q4(c)(ii) | Frequency: 1 ]
Problem: Cluster the following 2D points into 2 groups using K-means.
Dataset: P1(2,10), P2(2,5), P3(8,4), P4(5,8), P5(7,5), P6(6,4), P7(1,2), P8(4,9)
Parameters: $k=2$, Initial centroids: $\mu_1 = P1(2,10)$, $\mu_2 = P3(8,4)$
Iteration 1 - Assignment:
| Point | Coordinates | $d(\mu_1)$ | $d(\mu_2)$ | Cluster |
|---|---|---|---|---|
| P1 | (2,10) | $0$ | $\sqrt{36+36}=8.49$ | C1 |
| P2 | (2,5) | $\sqrt{0+25}=5$ | $\sqrt{36+1}=6.08$ | C1 |
| P3 | (8,4) | $8.49$ | $0$ | C2 |
| P4 | (5,8) | $\sqrt{9+4}=3.61$ | $\sqrt{9+16}=5$ | C1 |
| P5 | (7,5) | $\sqrt{25+25}=7.07$ | $\sqrt{1+1}=1.41$ | C2 |
| P6 | (6,4) | $\sqrt{16+36}=7.21$ | $\sqrt{4+0}=2$ | C2 |
| P7 | (1,2) | $\sqrt{1+64}=8.06$ | $\sqrt{49+4}=7.28$ | C2 |
| P8 | (4,9) | $\sqrt{4+1}=2.24$ | $\sqrt{16+25}=6.40$ | C1 |
Clusters: $C1=\{P1,P2,P4,P8\}$, $C2=\{P3,P5,P6,P7\}$
Update Centroids:
$\mu_1 = \left(\frac{2+2+5+4}{4},\ \frac{10+5+8+9}{4}\right) = (3.25,\ 8)$, $\mu_2 = \left(\frac{8+7+6+1}{4},\ \frac{4+5+4+2}{4}\right) = (5.5,\ 3.75)$
Iteration 2 - Assignment:
| Point | $d(\mu_1=(3.25,8))$ | $d(\mu_2=(5.5,3.75))$ | Cluster |
|---|---|---|---|
| P1 | $\sqrt{1.56+4}=2.36$ | $\sqrt{12.25+39.06}=7.16$ | C1 |
| P2 | $\sqrt{1.56+9}=3.25$ | $\sqrt{12.25+1.56}=3.72$ | C1 |
| P3 | $\sqrt{22.56+16}=6.21$ | $\sqrt{6.25+0.06}=2.51$ | C2 |
| P4 | $\sqrt{3.06+0}=1.75$ | $\sqrt{0.25+18.06}=4.28$ | C1 |
| P5 | $\sqrt{14.06+9}=4.80$ | $\sqrt{2.25+1.56}=1.95$ | C2 |
| P6 | $\sqrt{7.56+16}=4.86$ | $\sqrt{0.25+0.06}=0.56$ | C2 |
| P7 | $\sqrt{5.06+36}=6.41$ | $\sqrt{20.25+3.06}=4.83$ | C2 |
| P8 | $\sqrt{0.56+1}=1.25$ | $\sqrt{2.25+27.56}=5.46$ | C1 |
No change in cluster membership → Converged!
Final Results:
| Cluster | Members | Centroid | Interpretation |
|---|---|---|---|
| C1 | P1, P2, P4, P8 | (3.25, 8) | Upper-left region (high Y) |
| C2 | P3, P5, P6, P7 | (5.5, 3.75) | Lower-right region (low Y) |
Quick Revision: Key Clustering Formulas
| Concept | Formula |
|---|---|
| Euclidean Distance | $d(x,y) = \sqrt{\sum_{i=1}^{n}(x_i - y_i)^2}$ |
| Manhattan Distance | $d(x,y) = \sum_{i=1}^{n} \lvert x_i - y_i \rvert$ |
| K-Means Objective | $J = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2$ |
| Centroid Update | $\mu_i = \frac{1}{\lvert C_i \rvert} \sum_{x \in C_i} x$ |
| Single Linkage | $d(C_i, C_j) = \min_{x \in C_i,\, y \in C_j} d(x,y)$ |
| Complete Linkage | $d(C_i, C_j) = \max_{x \in C_i,\, y \in C_j} d(x,y)$ |
| Average Linkage | $d(C_i, C_j) = \frac{1}{\lvert C_i \rvert \lvert C_j \rvert} \sum_{x \in C_i} \sum_{y \in C_j} d(x,y)$ |
| DBSCAN Core Point | $\lvert N_\varepsilon(p) \rvert \geq \text{MinPts}$ |
UNIT 16: Machine Learning Programming using Python
(This unit covers practical implementations. Questions related to specific algorithms are mapped to their respective theoretical units above.)
Unit 16 Total: 0 families (all covered in respective algorithm units)