MCS-224 : Artificial Intelligence and Machine Learning
Study Material


Complete Previous Year Questions Answered

IMPORTANCE LEGEND:

  • 🔥 Most Important: High frequency (3+ times) OR critical foundational topics essential for the course.

  • ⭐ Very Important: Medium frequency (2 times) OR key operational concepts/extensions.

  • 📌 Important: Standard frequency (1 time) OR specific/niche topics.


BLOCK 1: ARTIFICIAL INTELLIGENCE - INTRODUCTION


UNIT 1: Introduction to Artificial Intelligence


Q1. 🔥 Compare Artificial Narrow Intelligence, Artificial General Intelligence and Artificial Super Intelligence

[ Jun 2025 Q1(a), Jun 2024 Q1(a), Jun 2022 Q1(a) | Frequency: 3 ]

Answer

Artificial Intelligence can be categorized into three types based on capability and scope:

Aspect Artificial Narrow Intelligence (ANI) Artificial General Intelligence (AGI) Artificial Super Intelligence (ASI)
Definition AI designed for a specific task only AI with human-level cognitive abilities across all domains AI that surpasses human intelligence in all aspects
Capability Single task or limited set of tasks Can perform any intellectual task a human can Exceeds best human minds in every field
Learning Task-specific learning Transfer learning across domains Self-improving and evolving autonomously
Consciousness No self-awareness Potential self-awareness Hypothetically fully conscious
Current Status Exists today Theoretical/Research stage Hypothetical/Future concept
Flexibility Cannot adapt beyond programmed scope Highly flexible and adaptable Unlimited adaptability
Examples Siri, Chess engines, Spam filters, Recommendation systems None exists yet (Depicted in movies like "Her") None exists (Depicted in movies like "Terminator")
Also Known As Weak AI Strong AI / Human-level AI Superintelligent AI

Diagram:

graph LR
  subgraph "AI Classification by Capability"
    A[Artificial Intelligence] --> B[ANI<br/>Narrow AI]
    A --> C[AGI<br/>General AI]
    A --> D[ASI<br/>Super AI]
    B --> B1[Task Specific<br/>EXISTS TODAY]
    C --> C1[Human-Level<br/>THEORETICAL]
    D --> D1[Beyond Human<br/>HYPOTHETICAL]
  end
  style B fill:#90EE90
  style C fill:#FFD700
  style D fill:#FF6B6B

Q2. 📌 Give suitable example for Narrow AI

[ Jun 2022 Q1(a) | Frequency: 1 ]

Answer

Narrow AI (ANI) performs specific tasks with high efficiency but cannot generalize beyond its designed purpose.

Examples of Narrow AI:

Application Description
Virtual Assistants Siri, Alexa, Google Assistant - respond to voice commands for specific tasks
Recommendation Systems Netflix, YouTube, Amazon - suggest content based on user behavior
Image Recognition Google Photos face detection, medical imaging diagnosis
Spam Filters Gmail spam detection - classifies emails as spam or not
Game Playing AI IBM Deep Blue (Chess), AlphaGo (Go game)
Navigation Systems Google Maps - optimal route calculation
Language Translation Google Translate - translates text between languages

Key Characteristic: Each system excels only at its designated task and cannot perform tasks outside its programming.


Q3. 📌 Give suitable example for General AI

[ Jun 2022 Q1(a) | Frequency: 1 ]

Answer

General AI (AGI) would possess human-like cognitive abilities to understand, learn, and apply knowledge across any domain.

Theoretical Examples & Representations:

Category Example
Science Fiction JARVIS from Iron Man - assists in all tasks from research to combat
Movies "Her" (2013) - Samantha, an AI with emotional intelligence
Conceptual A robot that can cook, write poetry, solve math, and hold conversations
Research Goals OpenAI's mission to develop safe AGI

Expected Capabilities of AGI:

  • Reasoning and problem-solving across domains

  • Understanding natural language contextually

  • Learning new skills without explicit programming

  • Emotional understanding and social intelligence

  • Creative thinking and abstract reasoning

Current Status: AGI does not exist yet. It remains a goal of AI research with estimated timelines ranging from decades to centuries.


Q4. 📌 Give suitable example for Super AI

[ Jun 2022 Q1(a) | Frequency: 1 ]

Answer

Super AI (ASI) would surpass the brightest human minds in every conceivable domain including creativity, problem-solving, and social intelligence.

Hypothetical Examples & Representations:

Category Example
Science Fiction Skynet from Terminator - self-aware, self-improving AI
Movies Ultron from Avengers - surpasses creators' intelligence
Literature AM, the supercomputer in "I Have No Mouth, and I Must Scream"
Theoretical An AI that could solve climate change, cure all diseases, and advance physics beyond human capability

Theoretical Capabilities of ASI:

  • Self-improvement without human intervention

  • Scientific discoveries beyond human comprehension

  • Perfect decision-making capabilities

  • Potential for technological singularity

  • Complete mastery over all human knowledge

Current Status: Purely hypothetical. Raises significant ethical concerns about control, safety, and existential risks.


Q5. ⭐ Explain Turing test (with example/block diagram)

[ Dec 2023 Q1(b), Jun 2023 Q2(a) | Frequency: 2 ]

Answer

Definition: The Turing Test, proposed by Alan Turing in 1950 in his paper "Computing Machinery and Intelligence," is a test of a machine's ability to exhibit intelligent behavior indistinguishable from that of a human.

Original Name: Turing originally called it the "Imitation Game."

Test Setup:

d2 diagram

Procedure:

  1. A human interrogator (C) is separated from two respondents

  2. One respondent is a human (B), the other is a machine (A)

  3. Communication occurs only through text-based interface

  4. Interrogator asks questions to determine which is human

  5. If interrogator cannot reliably distinguish, machine passes the test

Example Conversation:

Interrogator Respondent (Unknown if Human/Machine)
"Do you like music?" "Yes, I enjoy classical music, especially Beethoven."
"What is 15789 × 4673?" "Let me think... I'm not great at mental math. Maybe around 73 million?"
"Write a short poem about rain." "Drops fall soft on leaves, Nature's tears bring life renewed, Earth drinks and rejoices."

Key Points:

  • Tests behavioral intelligence, not internal processes

  • Machine must exhibit human-like responses including errors

  • Text-only communication prevents physical appearance bias

  • Benchmark for conversational AI systems


Q6. ⭐ Explain Chinese room test as Criticism to Turing test (with example)

[ Jun 2024 Q1(b), Jun 2023 Q2(a) | Frequency: 2 ]

Answer

Definition: The Chinese Room Argument, proposed by philosopher John Searle in 1980, is a thought experiment that challenges the idea that passing the Turing Test demonstrates true understanding or intelligence.

The Thought Experiment:

plantuml diagram

Scenario Explained:

Component Description
The Room A closed room with a slot for passing papers
The Person An English-only speaker inside the room
The Rule Book Detailed instructions for manipulating Chinese symbols
Input Chinese questions passed through the slot
Output Chinese responses generated by following rules

How It Works:

  1. Chinese speaker sends questions in Chinese characters

  2. Person inside matches symbols using the rule book

  3. Person produces appropriate Chinese symbol responses

  4. From outside, responses appear perfectly fluent

The Argument:

  • The person inside follows syntax (rules) perfectly

  • BUT has zero semantic understanding (meaning)

  • Similarly, computers manipulate symbols without understanding

  • Passing Turing Test ≠ True Intelligence or Understanding

Example:

Chinese Input Rule Book Says Person's Action Understanding?
你好吗 (How are you?) Symbol pattern A → respond with pattern B Writes 我很好 (I'm fine) NO

Searle's Conclusion:

  • Strong AI Claim (Rejected): Computers can truly understand and have minds

  • Weak AI Claim (Accepted): Computers can simulate intelligent behavior

Significance: Distinguishes between simulation of intelligence and actual understanding/consciousness.


Q7. 🔥 Compare Artificial Intelligence, Machine Learning and Deep Learning

[ Dec 2024 Q1(b), Dec 2023 Q2(a), Dec 2022 Q1(a) | Frequency: 3 ]

Answer

AI, ML, and DL are related but distinct concepts with a hierarchical relationship:

Diagram:

d2 diagram

Comprehensive Comparison:

Aspect Artificial Intelligence Machine Learning Deep Learning
Definition Broad field of creating intelligent machines Subset of AI where machines learn from data Subset of ML using neural networks with many layers
Scope Widest - includes all intelligent systems Narrower - focuses on learning algorithms Narrowest - focuses on neural networks
Approach Rule-based OR learning-based Statistical learning from data Hierarchical feature learning
Human Intervention High - requires explicit programming Medium - requires feature engineering Low - automatic feature extraction
Data Requirement Varies Moderate amounts Large amounts (Big Data)
Hardware Standard computing Standard to moderate High-end GPUs/TPUs required
Training Time Depends on approach Moderate Very long
Interpretability High (rule-based) Moderate Low (Black box)
Examples Expert systems, Robotics, NLP Spam detection, Recommendation systems Image recognition, Voice assistants
Algorithms Search, Logic, Planning Decision Trees, SVM, Random Forest CNN, RNN, Transformers

Relationship Flow:

flowchart LR
  A[AI] -->|"Learning approach"| B[ML]
  B -->|"Neural networks with multiple layers"| C[DL]
  A1["Expert Systems<br/>Rule-based Reasoning"] --> A
  B1["Linear Regression<br/>Decision Trees<br/>SVM"] --> B
  C1["CNN<br/>RNN<br/>LSTM<br/>GANs"] --> C

Key Distinctions:

  • AI: Goal is to create intelligent behavior (any method)

  • ML: Method to achieve AI through learning from experience

  • DL: Advanced ML using deep neural networks for complex patterns


Q8. 📌 What do you understand by the term "Agents" in Artificial Intelligence

[ Jun 2022 Q2(a) | Frequency: 1 ]

Answer

Definition: An Agent is anything that can perceive its environment through sensors and act upon that environment through actuators.

Formal Definition: An agent is a system that:

  • Perceives its environment

  • Makes decisions

  • Takes actions to achieve goals

Diagram:

ditaa diagram

Components of an Agent:

Component Function Example (Robot) Example (Software)
Sensors Perceive environment Cameras, Microphones Keyboard input, File reads
Actuators Perform actions Motors, Speakers Display output, File writes
Agent Function Maps percepts to actions Control program Algorithm/Logic

Types of Agents:

  1. Human Agent - Eyes, ears (sensors); Hands, legs (actuators)

  2. Robotic Agent - Cameras, sensors; Motors, grippers

  3. Software Agent - GUI inputs, APIs; Screen output, commands

Agent Function: The mathematical mapping from percept sequence to action:

  • f: P* → A

  • Where P* is the sequence of all percepts and A is the action


Q9. 📌 List the properties supposed to be possessed by agents

[ Jun 2022 Q2(a) | Frequency: 1 ]

Answer

An ideal AI agent should possess the following properties:

Property Description Example
Autonomy Operates without direct human intervention; has control over its actions Self-driving car making decisions independently
Reactivity Perceives environment and responds timely to changes Thermostat adjusting temperature
Pro-activeness Takes initiative; goal-directed behavior Email filter learning new spam patterns
Social Ability Interacts with other agents (human or artificial) Chatbots conversing with users
Mobility Ability to move around in the environment Delivery robots navigating warehouse
Rationality Makes decisions to maximize expected performance Chess AI selecting optimal moves
Learning/Adaptability Improves performance based on experience Recommendation system learning preferences
Temporal Continuity Persistent identity and state over time Virtual assistant remembering past interactions
Benevolence Does not have conflicting goals with user AI assistant following user instructions
Veracity Does not deliberately communicate false information Honest reporting of data/results

Diagram:

mindmap
  root((Agent Properties))
    Autonomy
      Self-governing
      Independent decisions
    Reactivity
      Environment perception
      Timely response
    Pro-activeness
      Goal-directed
      Initiative taking
    Social Ability
      Agent interaction
      Communication
    Rationality
      Optimal decisions
      Performance maximization
    Learning
      Experience-based improvement
      Adaptation

Q10. 📌 Compare the SR (simple reflex) agents with model based reflex agents

[ Jun 2022 Q2(a) | Frequency: 1 ]

Answer

Comparison Table:

Aspect Simple Reflex Agent Model-Based Reflex Agent
Definition Acts based only on current percept Maintains internal model of world state
Memory No memory of past percepts Has memory; tracks world state
Decision Basis Condition-Action rules on current input Current percept + Internal state
Environment Handling Only fully observable environments Can handle partially observable environments
Internal State None Maintains internal state
Complexity Simple More complex
World Model No model Has model of how world works
Adaptability Limited; fixed rules Better; can infer unobserved aspects
Example Thermostat (if temp > 25°C → AC on) Robot tracking obstacles it cannot currently see

Diagrams:

Simple Reflex Agent:

plantuml diagram

Model-Based Reflex Agent:

plantuml diagram

Key Difference Summary:

  • Simple Reflex: "See → Act" (stateless)

  • Model-Based: "See → Think (using memory & model) → Act" (stateful)
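This contrast can be sketched in Python. The class names, the thermostat setting, and the 23°C "pre-cool" rule below are invented for illustration; the point is only that the model-based agent keeps internal state (the last reading) while the simple reflex agent does not:

```python
class SimpleReflexAgent:
    """Stateless: acts only on the current percept (condition-action rule)."""
    def act(self, temp):
        return "AC_ON" if temp > 25 else "AC_OFF"


class ModelBasedAgent:
    """Stateful: remembers the previous reading to infer a rising trend."""
    def __init__(self):
        self.last_temp = None  # internal model of the world

    def act(self, temp):
        rising = self.last_temp is not None and temp > self.last_temp
        self.last_temp = temp  # update the internal state
        # pre-cool when the temperature is rising toward the threshold
        return "AC_ON" if temp > 25 or (rising and temp > 23) else "AC_OFF"
```

At 24°C the simple reflex agent stays off, while the model-based agent, having seen 22°C a moment earlier, infers a rising trend and switches on early.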


Q11. 📌 What are task environments in context of Intelligent Agents

[ Dec 2022 Q2(a) | Frequency: 1 ]

Answer

Definition: A Task Environment is the "problem" to which an intelligent agent is the "solution." It specifies the complete setting in which the agent operates.

Task Environment Components:

Component Description Example (Self-driving car)
Performance Measure Criteria for success Safety, time, comfort, fuel efficiency
Environment External surroundings Roads, traffic, weather, pedestrians
Actuators Means to affect environment Steering, brakes, accelerator, signals
Sensors Means to perceive environment Cameras, LIDAR, GPS, speedometer

This is known as PEAS description (Performance, Environment, Actuators, Sensors)

Properties of Task Environments:

Property Options Description
Observability Fully / Partially Can agent see complete state?
Deterministic Deterministic / Stochastic Is next state fully determined by current state + action?
Episodic Episodic / Sequential Are decisions independent or dependent on past?
Static Static / Dynamic Does environment change while agent deliberates?
Discrete Discrete / Continuous Are states/actions finite or infinite?
Agents Single / Multi-agent One agent or multiple agents?
Known Known / Unknown Does agent know environment rules?

Q12. 📌 Explain the standard set of measures for specifying task environment under PEAS

[ Dec 2022 Q2(a) | Frequency: 1 ]

Answer

PEAS is a framework for describing the task environment of an intelligent agent:

Letter Component Description
P Performance Measure Criteria to evaluate agent's success
E Environment External conditions where agent operates
A Actuators Mechanisms through which agent acts
S Sensors Mechanisms through which agent perceives

Diagram:

flowchart TB
  subgraph PEAS["PEAS Framework"]
    P[Performance Measure<br/>How success is measured]
    E[Environment<br/>Where agent operates]
    A[Actuators<br/>How agent acts]
    S[Sensors<br/>How agent perceives]
  end
  S --> Agent((AGENT))
  Agent --> A
  E <--> Agent
  P -.-> Agent

PEAS Examples:

Agent Type Performance Environment Actuators Sensors
Taxi Driver Safe, fast, legal, comfortable trip, maximize profits Roads, traffic, pedestrians, customers Steering, accelerator, brake, signal, horn, display Cameras, sonar, speedometer, GPS, engine sensors
Medical Diagnosis System Healthy patient, minimize costs, lawsuits Patient, hospital, staff Display for questions, tests, diagnoses, treatments Keyboard entry of symptoms, findings, patient answers
Vacuum Cleaner Cleanliness, efficiency, battery life Room, floor types, furniture, dust Wheels, brushes, suction Dirt sensor, bump sensor, cliff sensor
Chess Program Winning games Chessboard, opponent, clock Display moves Keyboard/mouse input
Spam Filter Correctly classify emails, minimize false positives Email server, user inbox Move email to spam/inbox Email header, content, sender info

Importance of PEAS:

  1. Provides structured approach to agent design

  2. Clarifies what success means (Performance)

  3. Identifies challenges (Environment)

  4. Determines required capabilities (Actuators & Sensors)

  5. Foundation for selecting appropriate agent architecture


Q1. ⭐ What do you understand by state space in AI

[ Dec 2023 Q2(b), Jun 2023 Q3(a) | Frequency: 2 ]

Answer

Definition: A State Space is a mathematical representation of a problem that includes all possible states (configurations) that can be reached from the initial state through any sequence of actions.

Formal Definition: A state space problem is represented as a 4-tuple: ⟨S, A, Action(s), Result(s, a)⟩

Component Description Example (8-Puzzle)
S Set of all possible states All arrangements of tiles
A Set of all possible actions Move blank: Up, Down, Left, Right
Action(s) Actions applicable in state s Valid moves from current position
Result(s,a) State resulting from action a in state s New tile arrangement after move

State Space Structure:

Diagram:

graphviz diagram

Key Elements:

  1. Initial State: Starting point of the problem

  2. Goal State: Desired end configuration

  3. State: A unique configuration of the problem

  4. Operators: Actions that transform one state to another

  5. Path: Sequence of states from initial to goal

  6. Path Cost: Sum of costs of individual actions

State Space as a Graph:

  • Nodes represent states

  • Edges represent actions/operators

  • Root node is the initial state

  • Goal nodes are target states


Q2. 📌 What is the utility of state space in AI

[ Dec 2023 Q2(b) | Frequency: 1 ]

Answer

State Space provides a structured framework for problem-solving in AI with the following utilities:

Utility Description
Problem Representation Provides formal mathematical model for any problem
Search Foundation Enables application of various search algorithms
Solution Path Identification Helps find sequence of actions from start to goal
Complexity Analysis Allows measurement of problem difficulty
Algorithm Comparison Enables evaluation of different search strategies
Optimization Facilitates finding optimal (least cost) solutions

Diagram:

blockdiag diagram

Practical Benefits:

  1. Abstraction: Converts real-world problems into computational form

  2. Generalization: Same search techniques work on different problems

  3. Visualization: Graph representation aids understanding

  4. Completeness Check: Ensures all possibilities are considered

  5. Debugging: Easy to trace and verify solution paths


Q3. 📌 Discuss the formulation of state space search (with example)

[ Jun 2022 Q1(b) | Frequency: 1 ]

Answer

State Space Search Formulation involves defining a problem in terms of states, actions, and goals to enable systematic exploration.

Formulation Steps:

Step Description
1. Define State Describe what information represents a configuration
2. Initial State Specify the starting configuration
3. Goal Test Define condition(s) that identify solution
4. Actions/Operators List possible actions in each state
5. Transition Model Define result of each action
6. Path Cost Assign costs to actions (if applicable)

Example: 8-Puzzle Problem

Diagram:

ditaa diagram

Formulation for 8-Puzzle:

Component Description
State A 3×3 array showing position of each tile (0-8)
Initial State Any configuration of tiles
Goal State Tiles arranged as 1,2,3,4,5,6,7,8,blank
Actions Move blank: UP, DOWN, LEFT, RIGHT
Transition Swap blank with adjacent tile
Path Cost Number of moves (each move costs 1)
Goal Test Current state matches goal configuration

State Space Size: 9! = 362,880 possible states (only half are reachable from any given state)
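The transition model above can be sketched as a Python successor function. The flat-tuple encoding (index = row×3 + col, 0 for the blank) and the function name are illustrative assumptions; actions name the direction the blank moves, as in the formulation:

```python
def successors(state):
    """Yield (action, new_state) pairs for the 8-puzzle.

    A state is a tuple of 9 ints; 0 marks the blank; index = row*3 + col.
    """
    i = state.index(0)                 # locate the blank
    row, col = divmod(i, 3)
    moves = {"UP": (-1, 0), "DOWN": (1, 0), "LEFT": (0, -1), "RIGHT": (0, 1)}
    for action, (dr, dc) in moves.items():
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3:  # precondition: stay on the board
            j = r * 3 + c
        else:
            continue
        s = list(state)
        s[i], s[j] = s[j], s[i]        # transition: swap blank with adjacent tile
        yield action, tuple(s)


goal = (1, 2, 3, 4, 5, 6, 7, 8, 0)
for action, nxt in successors(goal):
    print(action, nxt)
```

From the goal state the blank sits in a corner, so only two of the four operators satisfy their preconditions.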


Q4. 📌 What are the various factors for developing a state space representation

[ Dec 2024 Q5(b) | Frequency: 1 ]

Answer

When developing a state space representation, the following factors must be considered:

Factor Description Consideration
State Description What information defines a state Should be complete yet minimal
Initial State Starting configuration Must be clearly defined
Goal State(s) Target configuration(s) May be single, multiple, or condition-based
Operators/Actions Possible state transitions Should be well-defined and finite
Operator Preconditions When can an action be applied Must specify valid conditions
Operator Effects Result of applying action Must produce valid successor states
State Space Size Total number of possible states Affects computational feasibility
Branching Factor Average number of successors Impacts search efficiency
Solution Depth Steps from initial to goal Determines search strategy choice
Path Cost Function Cost assignment to actions Required for optimal solutions

Diagram:

plantuml diagram

Key Principles:

  1. Abstraction: Include only relevant details

  2. Efficiency: Minimize state representation size

  3. Completeness: Capture all necessary information

  4. Uniqueness: Each state should have unique representation

  5. Computability: States must be computationally manageable


Q5. ⭐ Write production rules for state space representation of water jug problem

[ Dec 2023 Q2(b), Jun 2023 Q3(a) | Frequency: 2 ]

Answer

Problem Statement: Given two jugs - one with capacity 4 gallons and another with capacity 3 gallons. Initially both are empty. Goal is to get exactly 2 gallons of water in the 4-gallon jug.

State Representation: (x, y) where:

  • x = gallons in 4-gallon jug (0 ≤ x ≤ 4)

  • y = gallons in 3-gallon jug (0 ≤ y ≤ 3)

Initial State: (0, 0)

Goal State: (2, n), where n can be any value from 0 to 3

Production Rules:

Rule No. Condition Action Result Description
R1 x < 4 Fill 4-gallon jug (4, y) Fill 4-gal jug from tap
R2 y < 3 Fill 3-gallon jug (x, 3) Fill 3-gal jug from tap
R3 x > 0 Empty 4-gallon jug (0, y) Empty 4-gal jug on ground
R4 y > 0 Empty 3-gallon jug (x, 0) Empty 3-gal jug on ground
R5 x + y ≥ 4, y > 0 Pour 3-gal into 4-gal (4, y-(4-x)) Fill 4-gal from 3-gal
R6 x + y ≥ 3, x > 0 Pour 4-gal into 3-gal (x-(3-y), 3) Fill 3-gal from 4-gal
R7 x + y ≤ 4, y > 0 Pour all 3-gal into 4-gal (x+y, 0) Empty 3-gal into 4-gal
R8 x + y ≤ 3, x > 0 Pour all 4-gal into 3-gal (0, x+y) Empty 4-gal into 3-gal

Solution Path:

Diagram:

graphviz diagram

Solution Trace:

Step State Rule Applied Description
0 (0, 0) - Initial state
1 (0, 3) R2 Fill 3-gallon jug
2 (3, 0) R7 Pour 3-gal into 4-gal
3 (3, 3) R2 Fill 3-gallon jug
4 (4, 2) R5 Pour from 3-gal to fill 4-gal
5 (0, 2) R3 Empty 4-gallon jug
6 (2, 0) R7 Pour 3-gal into 4-gal ✓
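The trace above can be verified with a short breadth-first search over rules R1-R8. This is a sketch, not the only possible encoding: the min/max pour formulas below merge R5 with R7 and R6 with R8 into single "pour" operations:

```python
from collections import deque

def water_jug_bfs(cap_a=4, cap_b=3, target=2):
    """BFS over states (x, y) = (gallons in 4-gal jug, gallons in 3-gal jug)."""
    start = (0, 0)
    parent = {start: None}
    queue = deque([start])
    while queue:
        x, y = state = queue.popleft()
        if x == target:                       # goal test: 2 gallons in 4-gal jug
            path = []
            while state is not None:          # reconstruct path back to (0, 0)
                path.append(state)
                state = parent[state]
            return path[::-1]
        successors = [
            (cap_a, y),                                       # R1: fill 4-gal
            (x, cap_b),                                       # R2: fill 3-gal
            (0, y),                                           # R3: empty 4-gal
            (x, 0),                                           # R4: empty 3-gal
            (min(x + y, cap_a), max(0, x + y - cap_a)),       # R5/R7: pour 3->4
            (max(0, x + y - cap_b), min(x + y, cap_b)),       # R6/R8: pour 4->3
        ]
        for nxt in successors:
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None

print(water_jug_bfs())
```

BFS returns a shortest solution: 6 rule applications (7 states), the same length as the trace above, though it may pick the symmetric path that fills the 4-gallon jug first.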

Q6. 📌 Formulate the missionaries and cannibal problem

[ Jun 2025 Q5(b)(i) | Frequency: 1 ]

Answer

Problem Statement: Three missionaries and three cannibals are on one side of a river with a boat that can hold at most two people. Find a way to get everyone to the other side, without ever leaving a group of missionaries outnumbered by cannibals (on either side).

State Representation: (M, C, B) where:

  • M = Number of missionaries on left bank (0, 1, 2, or 3)

  • C = Number of cannibals on left bank (0, 1, 2, or 3)

  • B = Position of boat (L = Left, R = Right)

Diagram:

ditaa diagram

Formulation:

Component Description
Initial State (3, 3, L) - All on left bank, boat on left
Goal State (0, 0, R) - All on right bank, boat on right
Operators Move 1 or 2 people in boat to other side
Constraints Missionaries ≥ Cannibals on both sides (when M > 0)

Possible Actions:

  • Move 1 missionary

  • Move 1 cannibal

  • Move 2 missionaries

  • Move 2 cannibals

  • Move 1 missionary + 1 cannibal


Q7. 📌 Solve the missionaries and cannibal problem

[ Jun 2025 Q5(b)(i) | Frequency: 1 ]

Answer

Solution Steps:

Step Left Bank Boat Move Right Bank State
0 3M, 3C, ⛵ - - (3,3,L) Initial
1 3M, 1C 2C → 2C (3,1,R)
2 3M, 2C, ⛵ ← 1C 1C (3,2,L)
3 3M 2C → 3C (3,0,R)
4 3M, 1C, ⛵ ← 1C 2C (3,1,L)
5 1M, 1C 2M → 2M, 2C (1,1,R)
6 2M, 2C, ⛵ ← 1M,1C 1M, 1C (2,2,L)
7 2C 2M → 3M, 1C (0,2,R)
8 3C, ⛵ ← 1C 3M (0,3,L)
9 1C 2C → 3M, 2C (0,1,R)
10 2C, ⛵ ← 1C 3M, 1C (0,2,L)
11 - 2C → 3M, 3C (0,0,R) Goal

Diagram:

actdiag diagram

Total Moves: 11 crossings
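A BFS solver for this formulation can be sketched as follows; as a simplification, the boat position B is encoded as 1 (left bank) / 0 (right bank) instead of L/R:

```python
from collections import deque

def safe(m, c):
    """A bank is safe if it has no missionaries or they are not outnumbered."""
    return m == 0 or m >= c

def mc_bfs():
    """BFS over states (M, C, B): missionaries/cannibals on the LEFT bank,
    B = 1 if the boat is on the left bank, 0 if on the right."""
    start, goal = (3, 3, 1), (0, 0, 0)
    loads = [(1, 0), (2, 0), (0, 1), (0, 2), (1, 1)]  # allowed boat loads
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            path = []
            while state is not None:          # reconstruct the crossing sequence
                path.append(state)
                state = parent[state]
            return path[::-1]
        m, c, b = state
        for dm, dc in loads:
            # the boat carries people away from the bank it is on
            nm, nc = (m - dm, c - dc) if b == 1 else (m + dm, c + dc)
            nxt = (nm, nc, 1 - b)
            if (0 <= nm <= 3 and 0 <= nc <= 3
                    and safe(nm, nc) and safe(3 - nm, 3 - nc)
                    and nxt not in parent):
                parent[nxt] = state
                queue.append(nxt)
    return None

solution = mc_bfs()
print(len(solution) - 1, "crossings")
```

The search confirms the table above: the shortest solution takes 11 crossings (12 states including the initial one).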


Q8. 📌 Draw the state-space search graph for solving missionaries and cannibal problem

[ Jun 2025 Q5(b)(ii) | Frequency: 1 ]

Answer

State Space Graph:

Diagram:

graphviz diagram

Legend:

  • 🟡 Yellow: Initial State (3,3,L)

  • 🟢 Green: Goal State (0,0,R)

  • Blue edges: Solution path

  • L states: Boat on left bank

  • R states: Boat on right bank


Q9. ⭐ Discuss the Adversarial search briefly

[ Dec 2022 Q1(b), Jun 2022 Q3(a) | Frequency: 2 ]

Answer

Definition: Adversarial Search is a search strategy used in competitive environments where two or more agents have conflicting goals. It is primarily used in game-playing AI.

Characteristics:

Feature Description
Multi-agent Two or more competing agents
Conflicting Goals One agent's gain is another's loss
Turn-based Agents take alternate turns
Zero-sum Total payoff is constant (win/lose)
Opponent Modeling Must anticipate opponent's moves

Diagram:

plantuml diagram

Key Concepts:

Concept Description
MAX Player Agent trying to maximize utility
MIN Player Opponent trying to minimize MAX's utility
Game Tree Tree of all possible game states
Terminal State End of game (win/lose/draw)
Utility Function Score at terminal states

Applications:

  • Chess, Checkers, Tic-Tac-Toe

  • Go, Backgammon

  • Card games

  • Real-time strategy games


Q10. 📌 Differentiate between Normal search and Adversarial search

[ Jun 2022 Q3(a) | Frequency: 1 ]

Answer

Comparison Table:

Aspect Normal Search Adversarial Search
Agents Single agent Two or more competing agents
Environment Passive Active (opponent acts)
Control Full control over actions Partial control (opponent's turn)
Predictability Deterministic outcomes Must consider opponent's moves
Goal Find path to goal state Win or maximize score
Opponent None Rational opponent assumed
Strategy Optimal path finding Game-theoretic optimal play
Time Constraint Usually flexible Often real-time
Utility Single evaluation Alternating max/min evaluation
Examples 8-puzzle, Route finding Chess, Tic-Tac-Toe

Diagram:

ditaa diagram

Key Differences:

  1. Decision Making:
     • Normal: Choose best action for self
     • Adversarial: Choose best action considering opponent's response

  2. Tree Structure:
     • Normal: Single type of nodes
     • Adversarial: Alternating MAX and MIN nodes

  3. Evaluation:
     • Normal: Heuristic estimates distance to goal
     • Adversarial: Heuristic estimates game position quality

Q11. ⭐ Discuss various Adversarial search techniques

[ Dec 2022 Q1(b), Jun 2022 Q3(a) | Frequency: 2 ]

Answer

Major Adversarial Search Techniques:

Technique Description Use Case
Minimax Complete search assuming optimal opponent Perfect information games
Alpha-Beta Pruning Optimized Minimax with branch elimination Large game trees
Expectiminimax Handles chance elements (dice, cards) Games with randomness
Monte Carlo Tree Search Statistical sampling approach Go, complex games
Iterative Deepening Progressive depth search with time limit Real-time games

Diagram:

plantuml diagram

Technique Details:

1. Minimax Algorithm:

  • Assumes both players play optimally

  • MAX player maximizes score, MIN player minimizes score

  • Complete and optimal for finite games

2. Alpha-Beta Pruning:

  • Enhancement to Minimax

  • Prunes branches that won't affect final decision

  • α = best value guaranteed so far for MAX, β = best value guaranteed so far for MIN

3. Expectiminimax:

  • Extension for games with chance (dice, cards)

  • Adds "chance nodes" for random events

4. Monte Carlo Tree Search (MCTS):

  • Uses random simulations

  • Balances exploration and exploitation

  • Used in Go (AlphaGo)
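Alpha-beta pruning can be sketched in Python; the game tree is represented here as nested lists whose leaves are utility values (an illustrative encoding, not part of the syllabus):

```python
import math

def alphabeta(node, alpha, beta, is_max):
    """Minimax with alpha-beta pruning over a nested-list game tree."""
    if not isinstance(node, list):       # leaf: return its utility
        return node
    if is_max:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:            # MIN above will never allow this branch
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True))
        beta = min(beta, value)
        if beta <= alpha:                # MAX above already has a better option
            break
    return value

# Root is MAX, its children are MIN nodes over the terminal values.
print(alphabeta([[3, 5], [2, 9]], -math.inf, math.inf, True))  # prints 3
```

On this tree the leaf 9 is pruned: once MIN's second node finds 2, which is below MAX's guaranteed 3, the remaining children cannot change the decision.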


Q12. 📌 What is Min-Max Search Strategy

[ Jun 2023 Q1(b) | Frequency: 1 ]

Answer

Definition: The Minimax Search Strategy is a recursive algorithm used in decision-making and game theory for finding the optimal move for a player, assuming that the opponent also plays optimally.

Core Principle:

  • MAX Player: Tries to maximize the score (your move)

  • MIN Player: Tries to minimize the score (opponent's move)

How It Works:

Step Description
1 Generate complete game tree to terminal states
2 Apply utility function to terminal states
3 Propagate values up: MAX nodes take maximum, MIN nodes take minimum
4 Select move leading to highest value at root

Diagram:

graphviz diagram

Value Propagation:

  • Terminal: (3,1), (5,6), (0,2), (4,1)

  • MAX level: max(3,1)=3, max(5,6)=6, max(0,2)=2, max(4,1)=4

  • MIN level: min(3,6)=3, min(2,4)=2

  • ROOT (MAX): max(3,2)=3 → Choose left branch


Q13. 📌 Discuss the Min-Max Search Strategy briefly

[ Dec 2022 Q3(a) | Frequency: 1 ]

Answer

Minimax is a backtracking algorithm used in game theory and AI for decision-making in two-player zero-sum games.

Key Components:

Component Description
Players MAX (us) and MIN (opponent)
Game Tree All possible moves and responses
Terminal States Game end positions with utility values
Backup Values Propagated scores from leaves to root

Algorithm Flow:

Diagram:

seqdiag diagram

Properties:

  • Complete: Yes, if tree is finite

  • Optimal: Yes, against optimal opponent

  • Time Complexity: O(b^m)

  • Space Complexity: O(bm)

Where: b = branching factor, m = maximum depth

Assumptions:

  1. Both players play optimally

  2. Perfect information (both see full game state)

  3. Zero-sum game (one's gain = other's loss)


Q14. 📌 Write MINIMAX algorithm

[ Jun 2023 Q1(b) | Frequency: 1 ]

Answer

MINIMAX Algorithm (Pseudocode):

function MINIMAX(node, depth, isMaximizingPlayer):

    if depth == 0 OR node is terminal:
        return evaluate(node)

    if isMaximizingPlayer:
        maxEval = -∞
        for each child of node:
            eval = MINIMAX(child, depth-1, FALSE)
            maxEval = max(maxEval, eval)
        return maxEval

    else:  // Minimizing player
        minEval = +∞
        for each child of node:
            eval = MINIMAX(child, depth-1, TRUE)
            minEval = min(minEval, eval)
        return minEval

// Initial call
bestMove = MINIMAX(rootNode, maxDepth, TRUE)

Alternative Representation:

function MINIMAX-VALUE(state):
    if TERMINAL-TEST(state):
        return UTILITY(state)

    if PLAYER(state) == MAX:
        return max(MINIMAX-VALUE(s) for s in SUCCESSORS(state))

    else:
        return min(MINIMAX-VALUE(s) for s in SUCCESSORS(state))

Function Descriptions:

Function Purpose
TERMINAL-TEST Checks if game has ended
UTILITY Returns score for terminal state
SUCCESSORS Returns all possible next states
PLAYER Returns whose turn it is
evaluate Heuristic evaluation for non-terminal

Flow Diagram:

graphviz diagram
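The pseudocode above translates directly into runnable Python. As a simplifying assumption for illustration, the game tree is given as nested lists whose leaves are utility values, and the depth cutoff is omitted since the toy tree is fully expanded:

```python
def minimax(node, is_max):
    """Minimax over a game tree given as nested lists; leaves are utilities."""
    if not isinstance(node, list):       # TERMINAL-TEST: leaf reached
        return node                      # UTILITY of the terminal state
    values = [minimax(child, not is_max) for child in node]  # SUCCESSORS
    return max(values) if is_max else min(values)


# Tree from the worked example: root A (MAX) -> B, C (MIN) -> terminals
tree = [[3, 5], [2, 9]]
best = minimax(tree, True)   # max(min(3, 5), min(2, 9))
print(best)                  # prints 3
```

The recursion mirrors the table in Q15: MIN nodes back up min(3,5)=3 and min(2,9)=2, and the MAX root chooses 3.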

Q15. ⭐ Explain Min-Max search algorithm (with example, properties, advantages, disadvantages)

[ Jun 2024 Q2(b) | Frequency: 1 ]

Answer

Definition: Minimax is a recursive backtracking algorithm that finds the optimal move for a player in a two-player zero-sum game by minimizing the maximum possible loss.

Example: Simple Game Tree

Diagram:

graphviz diagram

Step-by-Step Evaluation:

Step Node Operation Result
1 D, E Terminal values 3, 5
2 F, G Terminal values 2, 9
3 B (MIN) min(3, 5) 3
4 C (MIN) min(2, 9) 2
5 A (MAX) max(3, 2) 3 → Choose Left

Properties:

Property Value
Complete Yes (for finite game trees)
Optimal Yes (against optimal opponent)
Time Complexity O(b^m)
Space Complexity O(bm) - DFS approach

Where b = branching factor, m = maximum depth

Advantages:

# Advantage
1 Guarantees optimal play against rational opponent
2 Simple and intuitive recursive implementation
3 Complete - will find solution if exists
4 Based on solid game theory foundation
5 Foundation for advanced algorithms (Alpha-Beta)

Disadvantages:

| # | Disadvantage |
|---|---|
| 1 | Exponential time complexity O(b^m) |
| 2 | Explores all nodes including unnecessary ones |
| 3 | Impractical for complex games (Chess, Go) |
| 4 | Cannot handle time constraints |
| 5 | Assumes opponent plays optimally |

Q16. ⭐ Give properties of MinMax search algorithm

[ Jun 2024 Q2(b), Dec 2022 Q3(a) | Frequency: 2 ]

Answer

Properties of Minimax Algorithm:

| Property | Description |
|---|---|
| Completeness | Yes - Always finds a solution if one exists (for finite trees) |
| Optimality | Yes - Finds optimal solution against an optimal opponent |
| Time Complexity | O(b^m) - Exponential in maximum depth |
| Space Complexity | O(bm) - Linear space using DFS |

Where:

  • b = Branching factor (average number of legal moves)

  • m = Maximum depth of game tree


Complexity Examples:

| Game | Branching Factor (b) | Depth (m) | Approximate States |
|---|---|---|---|
| Tic-Tac-Toe | ~4 | 9 | ~9! ≈ 362K |
| Chess | ~35 | ~100 | ~10^120 |
| Go | ~250 | ~150 | ~10^360 |
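The explosive growth of b^m can be checked directly. The table's "approximate states" figures are rough game-tree estimates; this sketch simply computes the size of b^m for the listed b and m values:

```python
# Count the decimal digits of b^m for each game's rough branching factor and depth.
for game, b, m in [("Tic-Tac-Toe", 4, 9), ("Chess", 35, 100), ("Go", 250, 150)]:
    digits = len(str(b ** m))
    print(f"{game}: {b}^{m} has {digits} digits")
```

Tic-Tac-Toe's tree is tiny (a six-digit number), while Chess and Go reach numbers with hundreds of digits, which is why plain minimax is infeasible for them.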

Additional Properties:

  • Deterministic: Same input always produces same output

  • Recursive: Natural recursive implementation

  • Zero-sum assumption: One player's gain equals other's loss

  • Perfect information: Both players see complete game state


Q17. ⭐ Write/Give advantages of MinMax search algorithm

[ Jun 2024 Q2(b), Dec 2022 Q3(a) | Frequency: 2 ]

Answer

Advantages of Minimax Algorithm:

| # | Advantage | Description |
|---|---|---|
| 1 | Optimal Solution | Guarantees best possible outcome against a rational opponent |
| 2 | Complete | Will always find a solution if one exists in finite games |
| 3 | Simple Logic | Easy to understand and implement using recursion |
| 4 | Theoretical Foundation | Based on solid game theory principles |
| 5 | Predictable | Deterministic behavior - same situation yields same decision |
| 6 | Foundation for Improvements | Basis for advanced algorithms like Alpha-Beta pruning |
| 7 | No Training Required | Works without learning or training data |
| 8 | Handles Any Two-Player Game | General algorithm applicable to various games |
| 9 | Perfect Play | Achieves perfect play in games with small state spaces |
| 10 | Clear Decision Making | Transparent reasoning for chosen moves |


Q18. ⭐ Write/Give disadvantages of MinMax search algorithm

[ Jun 2024 Q2(b), Dec 2022 Q3(a) | Frequency: 2 ]

Answer

Disadvantages of Minimax Algorithm:

| # | Disadvantage | Description |
|---|---|---|
| 1 | Exponential Time Complexity | O(b^m) makes it impractical for complex games |
| 2 | Explores All Nodes | Wastes time on branches that won't affect result |
| 3 | Memory Intensive | Large game trees require significant storage |
| 4 | No Time Limit Handling | Cannot guarantee move within time constraint |
| 5 | Assumes Optimal Opponent | May not exploit suboptimal opponent moves |
| 6 | Requires Complete Tree | Needs evaluation of all terminal states |
| 7 | Not Suitable for Real-time | Too slow for time-critical applications |
| 8 | Limited Depth in Practice | Must use heuristics for deep games |
| 9 | No Learning | Cannot improve from past games |
| 10 | Deterministic | Predictable; opponent can exploit patterns |


Solutions to Overcome:

| Problem | Solution |
|---|---|
| Explores all nodes | Alpha-Beta Pruning |
| Time constraints | Iterative Deepening |
| Deep trees | Heuristic Evaluation |
| Repeated positions | Transposition Tables |

UNIT 4: Predicate and Propositional Logic - Exam Answers


4.3-4.5 Introduction to Propositional Logic & Logical Connectives


Q1. 🔥 Differentiate between predicate and propositional logic.

[ Dec 2023 Q3(a) | Frequency: 1 ]

Answer

Propositional logic and predicate logic are two fundamental systems in mathematical logic used for knowledge representation in AI.

| Aspect | Propositional Logic | Predicate Logic |
|---|---|---|
| Definition | Deals with propositions (statements) that are either true or false | Extends propositional logic with predicates, quantifiers, and variables |
| Basic Unit | Propositions (atomic statements) | Predicates with arguments |
| Variables | No variables for objects | Uses variables to represent objects |
| Quantifiers | Not supported | Supports ∀ (for all) and ∃ (there exists) |
| Expressiveness | Limited - cannot express relationships between objects | Highly expressive - can express complex relationships |
| Complexity | Simple and decidable | More complex, semi-decidable |
| Example | P: "It is raining" | Raining(x): "x is raining", ∀x Human(x) → Mortal(x) |
| Connectives | AND (∧), OR (∨), NOT (¬), IMPLIES (→), IFF (↔) | Same as propositional + quantifiers |

Examples:

Propositional Logic:

  • P = "Socrates is a man" (True/False)

  • Q = "Socrates is mortal" (True/False)

  • Expression: P → Q

Predicate Logic:

  • Man(Socrates) - Socrates is a man

  • Mortal(x) - x is mortal

  • Expression: ∀x (Man(x) → Mortal(x))

Key Difference: Propositional logic treats statements as atomic units, while predicate logic can analyze the internal structure of statements using predicates and quantifiers.


4.7 Propositional Rules of Inference


Q2. ⭐ Describe the Modus Ponens as propositional rule of inference.

[ Jun 2023 Q1(d) | Frequency: 1 ]

Answer

Modus Ponens (Latin: "mode of affirming") is one of the most fundamental and widely used rules of inference in propositional logic.

Formal Definition:

Premise 1:  P → Q    (If P then Q)
Premise 2:  P        (P is true)
─────────────────────
Conclusion: Q        (Therefore, Q is true)

Symbolic Representation:

$$\frac{P \rightarrow Q, \quad P}{Q}$$

Explanation:

  • If we know that "P implies Q" is true

  • And we also know that "P" is true

  • Then we can validly conclude that "Q" must be true

Example 1:

  • Premise 1: If it rains, the ground gets wet (Rain → WetGround)

  • Premise 2: It is raining (Rain)

  • Conclusion: The ground is wet (WetGround)

Example 2:

  • Premise 1: If a student studies hard, they will pass (Study → Pass)

  • Premise 2: John studies hard (Study)

  • Conclusion: John will pass (Pass)

Truth Table Verification:

| P | Q | P → Q | Valid when P=T and P→Q=T |
|---|---|---|---|
| T | T | T | Q = T ✓ |
| T | F | F | Invalid premise |
| F | T | T | P not satisfied |
| F | F | T | P not satisfied |

Importance in AI: Modus Ponens is the foundation of forward chaining in rule-based expert systems.
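The truth-table verification can be mechanized. This sketch enumerates all assignments and confirms that whenever both premises hold, the conclusion holds:

```python
from itertools import product

def implies(p, q):
    # Material implication: P -> Q is false only when P is true and Q is false.
    return (not p) or q

# Modus Ponens is valid iff: whenever P->Q and P are both true, Q is true.
valid = all(q for p, q in product([True, False], repeat=2)
            if implies(p, q) and p)
print(valid)  # True
```

Only the assignment P=T, Q=T satisfies both premises, and there Q is indeed true, so the rule is valid.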


Q3. ⭐ Describe the Modus Tollens as propositional rule of inference.

[ Jun 2023 Q1(d) | Frequency: 1 ]

Answer

Modus Tollens (Latin: "mode of denying") is a rule of inference that allows us to derive a negation from a conditional statement and the negation of its consequent.

Formal Definition:

Premise 1:  P → Q    (If P then Q)
Premise 2:  ¬Q       (Q is false)
─────────────────────
Conclusion: ¬P       (Therefore, P is false)

Symbolic Representation:

$$\frac{P \rightarrow Q, \quad \neg Q}{\neg P}$$

Explanation:

  • If we know that "P implies Q" is true

  • And we also know that "Q" is false

  • Then we can validly conclude that "P" must be false

Example 1:

  • Premise 1: If it rains, the ground gets wet (Rain → WetGround)

  • Premise 2: The ground is not wet (¬WetGround)

  • Conclusion: It is not raining (¬Rain)

Example 2:

  • Premise 1: If the battery is charged, the car will start (Charged → Start)

  • Premise 2: The car does not start (¬Start)

  • Conclusion: The battery is not charged (¬Charged)

Comparison of Modus Ponens and Modus Tollens:

| Aspect | Modus Ponens | Modus Tollens |
|---|---|---|
| Direction | Forward reasoning | Backward reasoning |
| Given | Antecedent is true | Consequent is false |
| Concludes | Consequent is true | Antecedent is false |
| Form | P→Q, P ⊢ Q | P→Q, ¬Q ⊢ ¬P |
| AI Application | Forward chaining | Backward chaining |

Importance in AI: Modus Tollens is fundamental for backward chaining and goal-driven reasoning in expert systems.
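Like Modus Ponens, this rule can be verified by enumerating all truth assignments; the sketch confirms that whenever P→Q holds and Q is false, P must be false:

```python
from itertools import product

def implies(p, q):
    # Material implication: P -> Q is false only when P is true and Q is false.
    return (not p) or q

# Modus Tollens is valid iff: whenever P->Q is true and Q is false, P is false.
valid = all(not p for p, q in product([True, False], repeat=2)
            if implies(p, q) and not q)
print(valid)  # True
```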


4.9 Validity and Satisfiability


Q4. 📌 Obtain Conjunctive Normal Form (CNF) for given formula.

[ Dec 2022 Q1(d) | Frequency: 1 ]

Answer

Conjunctive Normal Form (CNF): A formula is in CNF when it is a conjunction (AND) of clauses, where each clause is a disjunction (OR) of literals.

General Form: (L₁ ∨ L₂ ∨ ...) ∧ (L₃ ∨ L₄ ∨ ...) ∧ ...

Steps to Convert to CNF:

  1. Eliminate implications (→): P → Q ≡ ¬P ∨ Q

  2. Eliminate biconditionals (↔): P ↔ Q ≡ (P → Q) ∧ (Q → P)

  3. Move negation inward (De Morgan's Laws):

     • ¬(P ∧ Q) ≡ ¬P ∨ ¬Q
     • ¬(P ∨ Q) ≡ ¬P ∧ ¬Q
     • ¬¬P ≡ P

  4. Distribute OR over AND: P ∨ (Q ∧ R) ≡ (P ∨ Q) ∧ (P ∨ R)

Example: Convert (P → Q) ∧ (¬P → R) to CNF

Step 1: Eliminate implications

  • P → Q ≡ ¬P ∨ Q

  • ¬P → R ≡ ¬(¬P) ∨ R ≡ P ∨ R

Step 2: Combine with conjunction

  • (¬P ∨ Q) ∧ (P ∨ R)

Result: (¬P ∨ Q) ∧ (P ∨ R) — Already in CNF

Example 2: Convert ¬(P ∧ Q) → R to CNF

Step 1: Eliminate implication

  • ¬(P ∧ Q) → R ≡ ¬(¬(P ∧ Q)) ∨ R ≡ (P ∧ Q) ∨ R

Step 2: Distribute OR over AND

  • (P ∧ Q) ∨ R ≡ (P ∨ R) ∧ (Q ∨ R)

Result: (P ∨ R) ∧ (Q ∨ R) — CNF achieved
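Both conversions can be checked by brute-force truth tables; a CNF result must agree with the original formula on every assignment. A minimal sketch:

```python
from itertools import product

def implies(p, q):
    return (not p) or q

def equivalent(f, g, nvars=3):
    # Two formulas are equivalent iff they agree on all 2^nvars assignments.
    return all(f(*v) == g(*v) for v in product([True, False], repeat=nvars))

# Example 1: (P -> Q) AND (~P -> R)  ==  (~P OR Q) AND (P OR R)
ex1 = equivalent(lambda p, q, r: implies(p, q) and implies(not p, r),
                 lambda p, q, r: ((not p) or q) and (p or r))

# Example 2: ~(P AND Q) -> R  ==  (P OR R) AND (Q OR R)
ex2 = equivalent(lambda p, q, r: implies(not (p and q), r),
                 lambda p, q, r: (p or r) and (q or r))

print(ex1, ex2)  # True True
```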


Q5. 📌 Obtain Disjunctive Normal Form for given well form formula.

[ Jun 2022 Q1(d) | Frequency: 1 ]

Answer

Disjunctive Normal Form (DNF): A formula is in DNF when it is a disjunction (OR) of clauses, where each clause is a conjunction (AND) of literals.

General Form: (L₁ ∧ L₂ ∧ ...) ∨ (L₃ ∧ L₄ ∧ ...) ∨ ...

Steps to Convert to DNF:

  1. Eliminate implications (→): P → Q ≡ ¬P ∨ Q

  2. Eliminate biconditionals (↔): P ↔ Q ≡ (P ∧ Q) ∨ (¬P ∧ ¬Q)

  3. Move negation inward (De Morgan's Laws)

  4. Distribute AND over OR: P ∧ (Q ∨ R) ≡ (P ∧ Q) ∨ (P ∧ R)

Example: Convert (P ∨ Q) ∧ (R ∨ S) to DNF

Step 1: Apply distribution of AND over OR

  • (P ∨ Q) ∧ (R ∨ S)

  • = (P ∧ (R ∨ S)) ∨ (Q ∧ (R ∨ S))

  • = (P ∧ R) ∨ (P ∧ S) ∨ (Q ∧ R) ∨ (Q ∧ S)

Result: (P ∧ R) ∨ (P ∧ S) ∨ (Q ∧ R) ∨ (Q ∧ S) — DNF achieved

Example 2: Convert ¬(P → Q) to DNF

Step 1: Eliminate implication

  • ¬(P → Q) ≡ ¬(¬P ∨ Q)

Step 2: Apply De Morgan's Law

  • ¬(¬P ∨ Q) ≡ P ∧ ¬Q

Result: P ∧ ¬Q — Already in DNF (single term)
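As with CNF, the DNF conversions can be confirmed by exhaustive truth-table comparison:

```python
from itertools import product

def implies(p, q):
    return (not p) or q

def equivalent(f, g, nvars):
    # Two formulas are equivalent iff they agree on all 2^nvars assignments.
    return all(f(*v) == g(*v) for v in product([True, False], repeat=nvars))

# ~(P -> Q)  ==  P AND ~Q
neg_impl = equivalent(lambda p, q: not implies(p, q),
                      lambda p, q: p and not q, 2)

# (P OR Q) AND (R OR S)  ==  (P AND R) OR (P AND S) OR (Q AND R) OR (Q AND S)
distrib = equivalent(
    lambda p, q, r, s: (p or q) and (r or s),
    lambda p, q, r, s: (p and r) or (p and s) or (q and r) or (q and s), 4)

print(neg_impl, distrib)  # True True
```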

Comparison CNF vs DNF:

| Feature | CNF | DNF |
|---|---|---|
| Outer Connective | AND (∧) | OR (∨) |
| Inner Connective | OR (∨) | AND (∧) |
| Use in AI | Resolution theorem proving | Satisfiability checking |
| Example | (A ∨ B) ∧ (C ∨ D) | (A ∧ B) ∨ (C ∧ D) |

4.14 Propositional Resolution


Q6. 🔥 What do you understand by the term 'Resolution' in AI / Explain the concept.

[ Jun 2024 Q3(b), Dec 2023 Q3(b) | Frequency: 2 ]

Answer

Resolution is a powerful inference rule and proof technique used in automated theorem proving and AI systems for logical reasoning.

Definition: Resolution is a single inference rule that can be applied repeatedly to derive new clauses from existing ones, ultimately proving whether a statement is satisfiable or a consequence of given premises.

The Resolution Rule:

Clause 1:  A ∨ B
Clause 2:  ¬A ∨ C
─────────────────
Resolvent: B ∨ C

Formal Definition:

$$\frac{(L \vee \alpha), \quad (\neg L \vee \beta)}{\alpha \vee \beta}$$

Where L is a literal, and α, β are disjunctions of literals.

Key Concepts:

  1. Complementary Literals: Two literals are complementary if one is the negation of the other (e.g., P and ¬P)

  2. Resolvent: The new clause produced by resolving two clauses

  3. Empty Clause (□): When resolution produces an empty clause, it indicates a contradiction

Diagram:

flowchart TD
    A["Knowledge Base<br/>(Set of Clauses)"] --> B["Convert to CNF"]
    B --> C["Add Negation of Goal"]
    C --> D["Apply Resolution Rule"]
    D --> E{Empty Clause<br/>Derived?}
    E -->|Yes| F["Goal is PROVED<br/>(Contradiction Found)"]
    E -->|No| G{More Clauses<br/>to Resolve?}
    G -->|Yes| D
    G -->|No| H["Goal CANNOT<br/>be Proved"]

Resolution Procedure:

  1. Convert all statements to CNF

  2. Negate the conclusion and add to clause set

  3. Repeatedly apply resolution rule

  4. If empty clause (□) is derived → Original statement is proved

  5. If no new clauses can be derived → Statement cannot be proved
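The procedure above can be sketched as a small refutation prover. Clauses are frozensets of string literals with "~" marking negation; this encoding, and the helper names, are illustrative choices:

```python
def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolvents(c1, c2):
    """All clauses obtainable by resolving c1 with c2 on one complementary literal."""
    out = []
    for lit in c1:
        if negate(lit) in c2:
            out.append((c1 - {lit}) | (c2 - {negate(lit)}))
    return out

def resolution_prove(clauses, goal):
    """Refutation: add the negated goal, saturate with resolution, look for the empty clause."""
    clauses = set(clauses) | {frozenset([negate(goal)])}
    while True:
        new = set()
        for a in list(clauses):
            for b in list(clauses):
                if a == b:
                    continue
                for r in resolvents(a, b):
                    if not r:              # empty clause: contradiction found
                        return True
                    new.add(frozenset(r))
        if new <= clauses:                 # no new clauses: goal not provable
            return False
        clauses |= new

# KB = {P, ~P v Q}; prove Q (the "Socrates" example in propositional form).
kb = [frozenset(["P"]), frozenset(["~P", "Q"])]
print(resolution_prove(kb, "Q"))  # True
```

Resolving {P} with {~P, Q} yields {Q}, which then clashes with the added {~Q} to give the empty clause, so the goal is proved.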


Q7. 📌 Discuss the utility of Resolution mechanism in AI.

[ Jun 2024 Q3(b) | Frequency: 1 ]

Answer

Utility of Resolution Mechanism in AI:

| Application Area | Utility |
|---|---|
| Theorem Proving | Automated proof verification in mathematical systems |
| Expert Systems | Deriving conclusions from knowledge bases |
| Logic Programming | Foundation of Prolog language |
| Knowledge Representation | Verifying consistency of knowledge bases |
| Planning Systems | Proving achievability of goals |

Key Advantages:

  1. Completeness: Resolution is refutation-complete — if a contradiction exists, resolution will find it

  2. Uniform Procedure: Single rule applicable to all logical problems

  3. Mechanical Nature: Can be fully automated by computers

  4. Sound: Never produces incorrect conclusions

  5. Efficient for Certain Problems: Works well with Horn clauses

Practical Applications:

  • Prolog: Uses SLD-resolution for query answering

  • SAT Solvers: Use resolution-based techniques

  • Verification Systems: Prove program correctness

  • Database Query Optimization: Logical query processing


Q8. 📌 Discuss the relevance of resolution mechanism in AI briefly.

[ Jun 2022 Q1(e) | Frequency: 1 ]

Answer

Relevance of Resolution in AI:

1. Foundation of Logic Programming:

  • Prolog and similar languages are built on resolution

  • Enables declarative programming paradigm

2. Automated Reasoning:

  • Provides systematic method for machine reasoning

  • Eliminates need for human intervention in proofs

3. Knowledge Base Validation:

  • Checks consistency of expert system knowledge bases

  • Identifies contradictory rules

4. Query Answering Systems:

  • Derives answers from database facts and rules

  • Powers intelligent information retrieval

5. AI Planning:

  • Proves whether goals are achievable

  • Validates plan correctness

Why Resolution is Preferred:

| Feature | Benefit |
|---|---|
| Single Rule | Simplifies implementation |
| Refutation Complete | Guarantees finding proofs if they exist |
| Works with CNF | Standard form simplifies processing |
| Suitable for Automation | No human insight needed |

Q9. 🔥 Apply Resolution to conclude statement from knowledge / with example.

[ Jun 2024 Q3(b), Dec 2023 Q3(b) | Frequency: 2 ]

Answer

Example Problem: Prove "Socrates is mortal" from the following knowledge base:

  1. Socrates is a man

  2. All men are mortal

Step 1: Represent in Predicate Logic

  • Man(Socrates)

  • ∀x (Man(x) → Mortal(x))

Step 2: Convert to Propositional Form (for simplicity)

  • P: Socrates is a man

  • Q: Socrates is mortal

  • Knowledge: P, P → Q

  • Goal: Prove Q

Step 3: Convert to CNF

  • Clause 1: P (given)

  • Clause 2: ¬P ∨ Q (from P → Q)

  • Clause 3: ¬Q (negation of goal for refutation)

Step 4: Apply Resolution

Clause 1: P
Clause 2: ¬P ∨ Q
Clause 3: ¬Q

Resolution Step 1: Resolve Clause 1 and Clause 2
    P
    ¬P ∨ Q
    ─────────
    Q (Resolvent 1)

Resolution Step 2: Resolve Resolvent 1 and Clause 3
    Q
    ¬Q
    ─────────
    □ (Empty Clause - Contradiction!)

Diagram:

flowchart TD
    C1["Clause 1: P<br/>(Socrates is man)"]
    C2["Clause 2: ¬P ∨ Q<br/>(Man implies Mortal)"]
    C3["Clause 3: ¬Q<br/>(Negated Goal)"]
    C1 --> R1
    C2 --> R1
    R1["Resolvent: Q<br/>(Socrates is mortal)"]
    R1 --> R2
    C3 --> R2
    R2["□ Empty Clause<br/>(CONTRADICTION)"]
    R2 --> PROOF["✓ PROVED: Socrates is mortal"]

Conclusion: Since we derived the empty clause (□), the negation of our goal leads to a contradiction. Therefore, the original goal "Socrates is mortal" is PROVED.

Another Example: Prove R from {P, P→Q, Q→R}

| Step | Clauses Used | Resolvent |
|---|---|---|
| 1 | P, ¬P∨Q | Q |
| 2 | Q, ¬Q∨R | R |
| 3 | R, ¬R | □ (empty clause) |

Result: Goal R is proved by deriving empty clause.


Summary Table: Unit 4 Key Concepts

| Topic | Key Points |
|---|---|
| Propositional vs Predicate Logic | Predicate adds quantifiers (∀,∃) and predicates |
| Modus Ponens | P→Q, P ⊢ Q (forward reasoning) |
| Modus Tollens | P→Q, ¬Q ⊢ ¬P (backward reasoning) |
| CNF | AND of ORs — used in resolution |
| DNF | OR of ANDs — used in satisfiability |
| Resolution | Single inference rule for automated theorem proving |

UNIT 5: First Order Logic - Exam Answers


5.2 Syntax of First Order Predicate Logic (FOPL)


Q1. 🔥 Symbolize "Every rational number is a real number" using predicate logic.

[ Dec 2024 Q1(d)(i), Dec 2023 Q3(a)(i) | Frequency: 2 ]

Answer

Given Statement: "Every rational number is a real number"

Step 1: Identify Predicates

  • Let R(x) = "x is a rational number"

  • Let E(x) = "x is a real number"

Step 2: Analyze the Statement

  • "Every" indicates Universal Quantifier (∀)

  • The statement means: For all x, if x is rational, then x is real

Step 3: Symbolic Representation

$$\boxed{\forall x \, (R(x) \rightarrow E(x))}$$

Interpretation: For all x, if x is a rational number, then x is a real number.

Alternative Notation: ∀x (Rational(x) → Real(x))


Q2. 🔥 Symbolize "Some real numbers are rational numbers" using predicate logic.

[ Dec 2024 Q1(d)(ii), Dec 2023 Q3(a)(ii) | Frequency: 2 ]

Answer

Given Statement: "Some real numbers are rational numbers"

Step 1: Identify Predicates

  • Let R(x) = "x is a rational number"

  • Let E(x) = "x is a real number"

Step 2: Analyze the Statement

  • "Some" indicates Existential Quantifier (∃)

  • The statement means: There exists at least one x that is both real AND rational

Step 3: Symbolic Representation

$$\boxed{\exists x \, (E(x) \land R(x))}$$

Interpretation: There exists an x such that x is a real number AND x is a rational number.

Important Note: With existential quantifier, we use conjunction (∧) not implication (→).


Q3. 🔥 Symbolize "Not every real number is a rational number" using predicate logic.

[ Dec 2024 Q1(d)(iii), Dec 2023 Q3(a)(iii) | Frequency: 2 ]

Answer

Given Statement: "Not every real number is a rational number"

Step 1: Identify Predicates

  • Let R(x) = "x is a rational number"

  • Let E(x) = "x is a real number"

Step 2: Analyze the Statement

  • "Not every" = negation of universal quantifier

  • Equivalent to: "There exists some real number that is not rational"

Step 3: Symbolic Representation

Method 1 (Direct Negation):

$$\boxed{\neg \forall x \, (E(x) \rightarrow R(x))}$$

Method 2 (Equivalent Form using De Morgan's for Quantifiers):

$$\boxed{\exists x \, (E(x) \land \neg R(x))}$$

Equivalence Explanation:

  • ¬∀x P(x) ≡ ∃x ¬P(x)

  • ¬(E(x) → R(x)) ≡ E(x) ∧ ¬R(x)

Interpretation: There exists an x such that x is a real number AND x is NOT a rational number (e.g., √2, π).


5.4 Semantics of Quantifiers


Q4. 📌 Translate "∃x C(x)" into English (car dealer context).

[ Jun 2024 Q3(a)(i) | Frequency: 1 ]

Answer

Given: ∃x C(x)

Where: C(x) = "x is a car dealer"

Translation:

$$\boxed{\text{"There exists someone who is a car dealer"}}$$

Alternative Translations:

  • "Some person is a car dealer"

  • "At least one car dealer exists"

  • "There is a car dealer"


Q5. 📌 Translate "∃x H(x)" into English (honest context).

[ Jun 2024 Q3(a)(ii) | Frequency: 1 ]

Answer

Given: ∃x H(x)

Where: H(x) = "x is honest"

Translation:

$$\boxed{\text{"There exists someone who is honest"}}$$

Alternative Translations:

  • "Some person is honest"

  • "At least one honest person exists"

  • "There is an honest person"


Q6. 🔥 Translate "∀x C(x)→∼H(x)" into English (car dealer/honest context).

[ Jun 2025 Q1(d)(i), Jun 2024 Q3(a)(iii) | Frequency: 2 ]

Answer

Given: ∀x (C(x) → ¬H(x))

Where:

  • C(x) = "x is a car dealer"

  • H(x) = "x is honest"

Translation:

$$\boxed{\text{"All car dealers are not honest"}}$$

Alternative Translations:

  • "Every car dealer is dishonest"

  • "For every person, if they are a car dealer, then they are not honest"

  • "No car dealer is honest"

Logical Breakdown:

| Component | Meaning |
|---|---|
| ∀x | For all x |
| C(x) | x is a car dealer |
| → | implies |
| ¬H(x) | x is not honest |

Q7. 🔥 Translate "∃x (C(x)∧H(x))" into English (car dealer/honest context).

[ Jun 2025 Q1(d)(ii), Jun 2024 Q3(a)(iv) | Frequency: 2 ]

Answer

Given: ∃x (C(x) ∧ H(x))

Where:

  • C(x) = "x is a car dealer"

  • H(x) = "x is honest"

Translation:

$$\boxed{\text{"Some car dealer is honest"}}$$

Alternative Translations:

  • "There exists a car dealer who is honest"

  • "At least one person is both a car dealer and honest"

  • "There is an honest car dealer"

Note: The conjunction (∧) indicates that both properties must hold simultaneously for the same individual.


Q8. 🔥 Translate "∃x (H(x)→C(x))" into English (car dealer/honest context).

[ Jun 2025 Q1(d)(iii), Jun 2024 Q3(a)(v) | Frequency: 2 ]

Answer

Given: ∃x (H(x) → C(x))

Where:

  • C(x) = "x is a car dealer"

  • H(x) = "x is honest"

Translation:

$$\boxed{\text{"There exists someone such that if they are honest, then they are a car dealer"}}$$

Logical Analysis:

Since P → Q ≡ ¬P ∨ Q, we have:

  • H(x) → C(x) ≡ ¬H(x) ∨ C(x)

Equivalent Translations:

  • "There exists someone who is either not honest or is a car dealer"

  • "Some person being honest implies they are a car dealer"

Important Observation: This statement is TRUE if there exists anyone who is:

  • Dishonest (regardless of being a car dealer), OR

  • A car dealer (regardless of being honest)


5.6 Conversion to Clausal Form


Q9. ⭐ What is Prenex Normal Form (PNF)?

[ Jun 2023 Q1(e) | Frequency: 1 ]

Answer

Prenex Normal Form (PNF) is a standardized way of writing First Order Logic formulas where all quantifiers appear at the beginning (prefix) of the formula.

Definition: A formula is in PNF if it has the form:

$$\boxed{Q_1x_1 \, Q_2x_2 \, ... \, Q_nx_n \, (M)}$$

Where:

  • Q₁, Q₂, ..., Qₙ are quantifiers (∀ or ∃)

  • x₁, x₂, ..., xₙ are variables

  • M is a quantifier-free formula (called the matrix)

Structure:

| Part | Description |
|---|---|
| Prefix | String of quantifiers (∀x₁∃x₂∀x₃...) |
| Matrix | Quantifier-free formula containing predicates and connectives |

Example:

  • Not in PNF: ∀x P(x) → ∃y Q(y)

  • In PNF: ∃x ∃y (¬P(x) ∨ Q(y))

Importance in AI:

  • Required for resolution-based theorem proving

  • Standardizes formulas for automated processing

  • Simplifies unification algorithms


Q10. ⭐ Write steps to transform FOPL to PNF.

[ Jun 2024 Q1(c) | Frequency: 1 ]

Answer

Steps to Convert FOPL Formula to Prenex Normal Form:

| Step | Operation | Example |
|---|---|---|
| 1 | Eliminate implications (→) and biconditionals (↔) | P→Q becomes ¬P∨Q |
| 2 | Move negations inward using De Morgan's laws | ¬(P∧Q) becomes ¬P∨¬Q |
| 3 | Standardize variables (rename if needed) | Ensure each quantifier uses unique variable |
| 4 | Move all quantifiers to the front | Pull quantifiers to prefix position |

Detailed Rules for Step 4:

¬∀x P(x) ≡ ∃x ¬P(x)
¬∃x P(x) ≡ ∀x ¬P(x)

(∀x P(x)) ∧ Q ≡ ∀x (P(x) ∧ Q)    [if x not free in Q]
(∀x P(x)) ∨ Q ≡ ∀x (P(x) ∨ Q)    [if x not free in Q]
(∃x P(x)) ∧ Q ≡ ∃x (P(x) ∧ Q)    [if x not free in Q]
(∃x P(x)) ∨ Q ≡ ∃x (P(x) ∨ Q)    [if x not free in Q]

Diagram:

flowchart TD
    A["Original FOPL Formula"] --> B["Step 1: Eliminate → and ↔"]
    B --> C["Step 2: Move ¬ Inward<br/>(De Morgan's Laws)"]
    C --> D["Step 3: Standardize Variables<br/>(Rename duplicates)"]
    D --> E["Step 4: Move Quantifiers<br/>to Prefix"]
    E --> F["Prenex Normal Form<br/>(Prefix + Matrix)"]

Q11. ⭐ Apply the steps to transform given formula to PNF.

[ Jun 2024 Q1(c), Jun 2023 Q1(e) | Frequency: 2 ]

Answer

Example: Convert ∀x P(x) → ∃y Q(y) to PNF

Step 1: Eliminate Implication

$$\forall x \, P(x) \rightarrow \exists y \, Q(y)$$
$$\equiv \neg(\forall x \, P(x)) \lor \exists y \, Q(y)$$

Step 2: Move Negation Inward

$$\equiv \exists x \, \neg P(x) \lor \exists y \, Q(y)$$

Step 3: Standardize Variables

  • Variables x and y are already distinct ✓

Step 4: Move Quantifiers to Front

$$\equiv \exists x \, (\neg P(x) \lor \exists y \, Q(y))$$
$$\equiv \exists x \, \exists y \, (\neg P(x) \lor Q(y))$$

Final PNF:

$$\boxed{\exists x \, \exists y \, (\neg P(x) \lor Q(y))}$$

Another Example: Convert ∀x(P(x) → ∀y Q(x,y)) to PNF

| Step | Transformation |
|---|---|
| Original | ∀x(P(x) → ∀y Q(x,y)) |
| Eliminate → | ∀x(¬P(x) ∨ ∀y Q(x,y)) |
| Move ∀y out | ∀x ∀y(¬P(x) ∨ Q(x,y)) |

Final PNF: ∀x ∀y (¬P(x) ∨ Q(x,y))


Q12. 🔥 What is Skolemization?

[ Dec 2023 Q1(c), Dec 2022 Q1(e) | Frequency: 2 ]

Answer

Skolemization is a technique for eliminating existential quantifiers (∃) from a formula while preserving satisfiability.

Definition: The process of replacing existentially quantified variables with Skolem constants or Skolem functions.

Why Skolemization?

  • Resolution requires formulas in clausal form

  • Clausal form cannot have existential quantifiers

  • Skolemization removes ∃ while maintaining logical equivalence for satisfiability

Rules of Skolemization:

| Case | Rule | Example |
|---|---|---|
| ∃x not in scope of ∀ | Replace with Skolem constant | ∃x P(x) → P(a) |
| ∃x in scope of ∀y | Replace with Skolem function | ∀y∃x P(x,y) → ∀y P(f(y),y) |

Key Points:

  • Skolem constants are new unique constants (a, b, c...)

  • Skolem functions are new unique functions (f, g, h...)

  • The resulting formula is satisfiability-equivalent (not logically equivalent)


Q13. ⭐ What is the utility of Skolemization?

[ Dec 2023 Q1(c) | Frequency: 1 ]

Answer

Utility of Skolemization in AI:

| Purpose | Description |
|---|---|
| Clause Form Conversion | Enables conversion to clausal form for resolution |
| Automated Theorem Proving | Essential step in resolution-based provers |
| Eliminate ∃ Quantifiers | Removes existential quantifiers systematically |
| Preserve Satisfiability | Maintains satisfiability for proof procedures |
| Simplify Processing | Reduces complexity of logical formulas |

Importance in Resolution:

  1. Resolution works only with universally quantified clauses

  2. Skolemization removes all ∃ quantifiers

  3. After Skolemization, all variables are implicitly universally quantified

  4. Enables unification algorithm to work properly

Workflow:

Original Formula → PNF → Skolemization → CNF → Resolution


Q14. 🔥 Skolemize the given expression.

[ Dec 2023 Q1(c), Dec 2022 Q1(e) | Frequency: 2 ]

Answer

Example 1: Skolemize ∃x ∀y P(x, y)

Analysis:

  • ∃x is NOT in scope of any ∀

  • Replace x with Skolem constant 'a'

Result:

$$\exists x \, \forall y \, P(x, y) \Rightarrow \boxed{\forall y \, P(a, y)}$$


Example 2: Skolemize ∀x ∃y P(x, y)

Analysis:

  • ∃y IS in scope of ∀x

  • Replace y with Skolem function f(x)

Result:

$$\forall x \, \exists y \, P(x, y) \Rightarrow \boxed{\forall x \, P(x, f(x))}$$


Example 3: Skolemize ∀x ∀y ∃z P(x, y, z)

Analysis:

  • ∃z is in scope of both ∀x and ∀y

  • Replace z with Skolem function f(x, y)

Result:

$$\forall x \, \forall y \, \exists z \, P(x, y, z) \Rightarrow \boxed{\forall x \, \forall y \, P(x, y, f(x, y))}$$


Summary Table:

| Original | Skolemized Form |
|---|---|
| ∃x P(x) | P(a) |
| ∀x ∃y R(x,y) | ∀x R(x, f(x)) |
| ∃x ∀y ∃z Q(x,y,z) | ∀y Q(a, y, g(y)) |

5.7 Resolution & Unification


Q15. 🔥 Explain the concept of unification in AI.

[ Dec 2023 Q3(b), Jun 2022 Q1(e) | Frequency: 2 ]

Answer

Unification is a fundamental process in AI that finds substitutions to make two logical expressions identical.

Definition: Unification is the process of finding a substitution (called the Most General Unifier or MGU) that makes two expressions syntactically identical.

Key Concepts:

| Term | Definition |
|---|---|
| Substitution | Mapping of variables to terms, e.g. {x/a, y/f(b)} |
| Unifier | A substitution that makes two expressions identical |
| MGU | Most General Unifier - the simplest unifier |
| Unifiable | Two expressions that can be made identical |

Unification Algorithm Steps:

  1. If both expressions are constants, they unify if identical

  2. If one is a variable, substitute it with the other term

  3. If both are compound terms, unify functor and arguments recursively

  4. Apply occurs check to prevent infinite substitutions
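The four steps can be sketched in Python. Terms are tuples ("f", arg1, ...), variables are lowercase strings, and constants are uppercase strings; this encoding is an assumption made for illustration (note it differs from the doc's convention of lowercase constants a, b):

```python
def is_var(t):
    # Assumption: lowercase strings are variables, uppercase strings are constants.
    return isinstance(t, str) and t[:1].islower()

def occurs(v, t, s):
    """Occurs check: does variable v appear in term t under substitution s?"""
    if t == v:
        return True
    if is_var(t) and t in s:
        return occurs(v, s[t], s)
    if isinstance(t, tuple):
        return any(occurs(v, arg, s) for arg in t[1:])
    return False

def unify(x, y, s=None):
    """Return the MGU of x and y as a dict, or None on failure."""
    if s is None:
        s = {}
    if x == y:
        return s
    if is_var(x):
        if x in s:
            return unify(s[x], y, s)
        if occurs(x, y, s):
            return None                    # occurs check fails (infinite term)
        return {**s, x: y}
    if is_var(y):
        return unify(y, x, s)
    if isinstance(x, tuple) and isinstance(y, tuple) \
            and len(x) == len(y) and x[0] == y[0]:
        for xa, ya in zip(x[1:], y[1:]):   # unify arguments recursively
            s = unify(xa, ya, s)
            if s is None:
                return None
        return s
    return None                            # constant or functor mismatch

print(unify(("P", "x", "A"), ("P", "B", "A")))   # {'x': 'B'}
print(unify(("P", "x"), ("P", ("f", "x"))))      # None (occurs check)
```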

Diagram:

flowchart TD
    A["Input: Two Expressions<br/>E1 and E2"] --> B{Both are<br/>identical?}
    B -->|Yes| C["Return empty<br/>substitution {}"]
    B -->|No| D{Either is<br/>a variable?}
    D -->|Yes| E["Substitute variable<br/>with other term"]
    E --> F{Occurs<br/>Check Pass?}
    F -->|Yes| G["Return substitution"]
    F -->|No| H["FAIL:<br/>Cannot unify"]
    D -->|No| I{Both are<br/>compound?}
    I -->|Yes| J["Unify functors<br/>and arguments"]
    I -->|No| H
    J --> K{All parts<br/>unified?}
    K -->|Yes| L["Compose and<br/>Return MGU"]
    K -->|No| H

Relevance in AI:

  • Essential for resolution-based theorem proving

  • Used in Prolog for pattern matching

  • Enables logical inference in expert systems


Q16. ⭐ Explain unification in AI with suitable example.

[ Dec 2023 Q3(b) | Frequency: 1 ]

Answer

Unification Examples:

Example 1: Simple Unification

Unify: P(x, a) and P(b, a)

| Step | Action | Result |
|---|---|---|
| 1 | Compare functors | P = P ✓ |
| 2 | Compare first args | x and b → substitute {x/b} |
| 3 | Compare second args | a = a ✓ |

MGU: {x/b} Unified Expression: P(b, a)


Example 2: Complex Unification

Unify: P(x, f(y)) and P(a, f(b))

| Step | Expression 1 | Expression 2 | Substitution |
|---|---|---|---|
| 1 | P | P | Match ✓ |
| 2 | x | a | {x/a} |
| 3 | f(y) | f(b) | Unify recursively |
| 4 | y | b | {y/b} |

MGU: {x/a, y/b} Unified Expression: P(a, f(b))


Example 3: Unification Failure

Unify: P(x, x) and P(a, b)

| Step | Action | Result |
|---|---|---|
| 1 | x unifies with a | {x/a} |
| 2 | Apply {x/a} | → P(a, a) |
| 3 | Compare a with b | a ≠ b → FAIL |

Result: Cannot unify (no MGU exists)


Example 4: Occurs Check

Unify: P(x) and P(f(x))

| Step | Action | Result |
|---|---|---|
| 1 | Try {x/f(x)} | |
| 2 | Occurs check: x in f(x)? | YES → FAIL |

Result: Cannot unify (would create infinite term)


Summary of Unification Rules:

| Case | Result |
|---|---|
| Identical constants | Unify with {} |
| Different constants | FAIL |
| Variable with term (no occur) | Substitute |
| Variable in term (occurs) | FAIL |
| Same functor, same arity | Unify arguments |
| Different functors/arity | FAIL |

Quick Reference: FOPL Symbolization Patterns

| English Pattern | FOPL Symbol |
|---|---|
| All A are B | ∀x (A(x) → B(x)) |
| Some A are B | ∃x (A(x) ∧ B(x)) |
| No A are B | ∀x (A(x) → ¬B(x)) |
| Not all A are B | ∃x (A(x) ∧ ¬B(x)) |
| Some A are not B | ∃x (A(x) ∧ ¬B(x)) |

UNIT 6: Rule Based Systems and Other Formalism


6.2 Rule Based Systems


Q1. 🔥 Explain Rule Based Systems in AI and its types.

[ Dec 2023 Q3(c), Jun 2022 Q5(a) | Frequency: 2 ]

Answer

Rule Based Systems (RBS): A Rule Based System is an AI system that uses a set of IF-THEN rules to represent knowledge and make inferences. It consists of a knowledge base containing rules and facts, and an inference engine that applies rules to derive new information or make decisions.

Components of Rule Based System:

| Component | Description |
|---|---|
| Knowledge Base | Contains domain-specific rules and facts |
| Inference Engine | Applies rules to facts to derive conclusions |
| Working Memory | Stores current facts and intermediate results |
| User Interface | Allows interaction with the system |
| Explanation Facility | Explains reasoning process to users |


Types of Rule Based Systems:

| Type | Description | Example |
|---|---|---|
| Forward Chaining (Data-Driven) | Starts with known facts and applies rules to derive new facts until goal is reached | Medical diagnosis systems |
| Backward Chaining (Goal-Driven) | Starts with goal and works backward to find supporting facts | Expert systems like MYCIN |
| Hybrid Systems | Combines both forward and backward chaining | Complex decision support systems |

Rule Format:

IF <condition/antecedent> THEN <action/consequent>

Example:

Rule 1: IF fever AND cough THEN suspect_flu
Rule 2: IF suspect_flu AND body_ache THEN diagnose_flu
Rule 3: IF diagnose_flu THEN prescribe_rest_and_fluids
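The three rules above can be run through a minimal data-driven inference loop; the rule encoding as (condition set, conclusion) pairs is an illustrative sketch:

```python
# Each rule: (set of condition facts, fact to add when all conditions hold).
rules = [
    ({"fever", "cough"}, "suspect_flu"),
    ({"suspect_flu", "body_ache"}, "diagnose_flu"),
    ({"diagnose_flu"}, "prescribe_rest_and_fluids"),
]

def forward_chain(facts, rules):
    """Apply rules repeatedly until no new fact can be derived (data-driven)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conds, concl in rules:
            if conds <= facts and concl not in facts:
                facts.add(concl)       # rule fires: add the new fact
                changed = True
    return facts

print(forward_chain({"fever", "cough", "body_ache"}, rules))
```

Starting from the three observed symptoms, the loop fires Rule 1, then Rule 2, then Rule 3, ending with the prescription fact in working memory.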


Q2. ⭐ Give advantages of Rule Based Systems.

[ Dec 2023 Q3(c) | Frequency: 1 ]

Answer

Advantages of Rule Based Systems:

| # | Advantage | Description |
|---|---|---|
| 1 | Modularity | Rules can be added, modified, or deleted independently without affecting other rules |
| 2 | Transparency | Easy to understand how conclusions are reached; provides clear explanation trail |
| 3 | Separation of Knowledge | Knowledge (rules) is separated from control (inference engine) |
| 4 | Easy Maintenance | Simple to update and maintain as domain knowledge changes |
| 5 | Naturalness | IF-THEN rules closely resemble human reasoning patterns |
| 6 | Reusability | Same inference engine can be used with different knowledge bases |
| 7 | Consistency | Always produces consistent results for same inputs |
| 8 | Expert Knowledge Capture | Effectively captures and preserves domain expert knowledge |
| 9 | Incremental Development | System can be built and tested incrementally |
| 10 | Explanation Capability | Can explain its reasoning process to users |

Q3. ⭐ Give disadvantages of Rule Based Systems.

[ Dec 2023 Q3(c) | Frequency: 1 ]

Answer

Disadvantages of Rule Based Systems:

| # | Disadvantage | Description |
|---|---|---|
| 1 | Knowledge Acquisition Bottleneck | Difficult to extract and formalize expert knowledge into rules |
| 2 | Scalability Issues | Performance degrades as number of rules increases |
| 3 | Rule Conflicts | Multiple rules may fire simultaneously causing conflicts |
| 4 | Brittleness | Cannot handle situations not covered by existing rules |
| 5 | No Learning Capability | Cannot learn from experience; rules must be manually updated |
| 6 | Combinatorial Explosion | Complex domains require enormous number of rules |
| 7 | Maintenance Complexity | Large rule bases become difficult to maintain and debug |
| 8 | Limited Reasoning | Cannot handle uncertain or incomplete information well |
| 9 | Inefficiency | May check irrelevant rules repeatedly |
| 10 | Domain Dependency | Rules are specific to one domain; not easily transferable |

Q4. 📌 Give the two important sources of uncertainty in Rule Based Systems.

[ Dec 2023 Q3(c) | Frequency: 1 ]

Answer

Two Important Sources of Uncertainty in Rule Based Systems:

1. Uncertainty in Rules (Rule Uncertainty):

  • Rules themselves may not be absolutely certain

  • Expert knowledge is often approximate or probabilistic

  • The relationship between conditions and conclusions may not be 100% reliable

Example:

IF patient_has_fever AND patient_has_cough 
THEN patient_has_flu (Certainty Factor: 0.7)

The rule only suggests 70% confidence in the conclusion.

2. Uncertainty in Data/Facts (Data Uncertainty):

  • Input data may be incomplete, imprecise, or unreliable

  • Sensor readings may have errors

  • User-provided information may be uncertain or vague

Example:

Fact: patient_has_fever (Confidence: 0.8)

The observation itself has only 80% reliability.

Diagram:

graphviz diagram

Handling Uncertainty:

  • Certainty Factors (CF)

  • Fuzzy Logic

  • Bayesian Probability

  • Dempster-Shafer Theory
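Code Sketch (illustrative): Certainty factors, the first method listed above, can be propagated with a simple rule: the conclusion's CF is the rule's CF scaled by the weakest piece of ANDed evidence. The min-then-multiply convention below is one common (MYCIN-style) choice, assumed here for illustration.

```python
def conclusion_cf(rule_cf, *evidence_cfs):
    """CF of a conclusion: rule CF scaled by the weakest conjunct
    (min over ANDed evidence), as in MYCIN-style systems."""
    return rule_cf * min(evidence_cfs)

# Rule CF 0.7, fever observed with CF 0.8 and cough with CF 0.9:
print(round(conclusion_cf(0.7, 0.8, 0.9), 2))  # 0.56
```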


6.2.1 Forward Chaining


Q5. 🔥 Explain Forward Chaining Systems with example.

[ Jun 2024 Q1(d), Dec 2022 Q5(a) | Frequency: 2 ]

Answer

Forward Chaining: Forward chaining is a data-driven reasoning strategy that starts with available facts and applies inference rules to extract more data until a goal is reached. It works from antecedent (IF part) to consequent (THEN part).

Characteristics:

  • Also called bottom-up reasoning or data-driven reasoning

  • Begins with known facts in working memory

  • Applies all applicable rules to generate new facts

  • Continues until goal is achieved or no more rules can fire

Algorithm:

  1. Start with initial facts in working memory

  2. Find all rules whose conditions (IF part) match the facts

  3. Fire the matched rules and add conclusions to working memory

  4. Repeat steps 2-3 until goal is reached or no rules fire

Diagram:

flowchart TD A[Start with Initial Facts] --> B[Match Rules with Facts] B --> C{Any Rule<br/>Matches?} C -->|Yes| D[Fire Rule & Add<br/>New Facts to Memory] D --> E{Goal<br/>Reached?} E -->|No| B E -->|Yes| F[Stop - Goal Achieved] C -->|No| G[Stop - No Solution] style A fill:#90EE90 style F fill:#87CEEB style G fill:#FFB6C1

Example:

Knowledge Base (Rules):

R1: IF animal_has_feathers THEN animal_is_bird
R2: IF animal_is_bird AND animal_cannot_fly THEN animal_is_penguin
R3: IF animal_is_penguin THEN animal_lives_in_cold_regions

Initial Facts:

  • animal_has_feathers

  • animal_cannot_fly

Forward Chaining Process:

Step Working Memory Rule Fired New Fact Added
1 animal_has_feathers, animal_cannot_fly R1 animal_is_bird
2 animal_has_feathers, animal_cannot_fly, animal_is_bird R2 animal_is_penguin
3 All above + animal_is_penguin R3 animal_lives_in_cold_regions
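Code Sketch (illustrative): The forward-chaining process above can be written as a small Python loop. The rule encoding (a set of antecedent facts mapped to one consequent fact) is an assumption made for this sketch, not a prescribed format.

```python
# Rules from the example: antecedent fact-set -> consequent fact.
rules = [
    ({"animal_has_feathers"}, "animal_is_bird"),                     # R1
    ({"animal_is_bird", "animal_cannot_fly"}, "animal_is_penguin"),  # R2
    ({"animal_is_penguin"}, "animal_lives_in_cold_regions"),         # R3
]

def forward_chain(facts, rules):
    """Repeatedly fire any rule whose antecedents hold, until no new facts."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in rules:
            if antecedents <= facts and consequent not in facts:
                facts.add(consequent)
                changed = True
    return facts

derived = forward_chain({"animal_has_feathers", "animal_cannot_fly"}, rules)
print("animal_lives_in_cold_regions" in derived)  # True
```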

Applications:

  • Planning systems

  • Monitoring and control systems

  • Configuration systems

  • CLIPS, OPS5 expert systems


6.2.2 Backward Chaining


Q6. 🔥 Explain Backward Chaining System with example.

[ Dec 2023 Q1(d), Jun 2023 Q5(a) | Frequency: 2 ]

Answer

Backward Chaining: Backward chaining is a goal-driven reasoning strategy that starts with a goal (hypothesis) and works backward to find facts that support the goal. It works from consequent (THEN part) to antecedent (IF part).

Characteristics:

  • Also called top-down reasoning or goal-driven reasoning

  • Begins with a goal to prove

  • Finds rules whose conclusions match the goal

  • Sets rule conditions as new subgoals

  • Continues until all subgoals are proven by known facts

Algorithm:

  1. Start with the goal to be proven

  2. Find rules whose THEN part matches the goal

  3. Set the IF conditions of matching rules as subgoals

  4. Recursively try to prove each subgoal

  5. If subgoal matches a known fact, it is proven

  6. Goal is proven when all subgoals are proven

Diagram:

flowchart TD A[Start with Goal] --> B[Find Rules Matching Goal] B --> C{Rule<br/>Found?} C -->|Yes| D[Extract Conditions<br/>as Subgoals] D --> E{Subgoal is<br/>Known Fact?} E -->|Yes| F[Subgoal Proven] E -->|No| G[Set as New Goal] G --> B F --> H{All Subgoals<br/>Proven?} H -->|Yes| I[Goal Proven - Success] H -->|No| D C -->|No| J[Goal Cannot be Proven] style A fill:#FFD700 style I fill:#90EE90 style J fill:#FFB6C1

Example:

Knowledge Base (Rules):

R1: IF has_feathers THEN is_bird
R2: IF is_bird AND cannot_fly THEN is_penguin
R3: IF is_penguin THEN lives_in_cold

Known Facts: has_feathers, cannot_fly

Goal: Prove "lives_in_cold"

Backward Chaining Process:

Step Current Goal Matching Rule Subgoals Generated
1 lives_in_cold R3 is_penguin
2 is_penguin R2 is_bird, cannot_fly
3 is_bird R1 has_feathers
4 has_feathers Known fact ✓
5 cannot_fly Known fact ✓

Result: Goal "lives_in_cold" is PROVEN
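Code Sketch (illustrative): The same proof can be run as a recursive Python function. The rule encoding (consequent mapped to its set of antecedent subgoals) is an assumption for this sketch.

```python
# Rules from the example: goal (consequent) -> set of subgoals (antecedents).
rules = [
    ("is_bird", {"has_feathers"}),              # R1
    ("is_penguin", {"is_bird", "cannot_fly"}),  # R2
    ("lives_in_cold", {"is_penguin"}),          # R3
]
facts = {"has_feathers", "cannot_fly"}

def prove(goal, rules, facts):
    """Prove goal by recursively proving the subgoals of matching rules.
    (A production system would also track visited goals to avoid cycles.)"""
    if goal in facts:
        return True
    return any(head == goal and all(prove(sub, rules, facts) for sub in body)
               for head, body in rules)

print(prove("lives_in_cold", rules, facts))  # True
```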

Applications:

  • Diagnostic systems (MYCIN)

  • Query answering systems

  • Prolog programming language

  • Theorem proving

Comparison:

Aspect Forward Chaining Backward Chaining
Direction Data → Goal Goal → Data
Also called Bottom-up Top-down
Best for Multiple goals possible Specific goal to prove
Efficiency May derive irrelevant facts Focused on goal

6.3 Semantic Nets


Q7. 🔥 Explain Semantic Nets with suitable example.

[ Jun 2024 Q3(c)(iii), Dec 2022 Q5(b) | Frequency: 2 ]

Answer

Semantic Networks: Semantic networks (Semantic Nets) are knowledge representation schemes that use a graph structure consisting of nodes and labeled edges (arcs) to represent objects, concepts, and relationships between them.

Components:

Component Representation Description
Nodes Circles/Boxes Represent objects, concepts, or entities
Edges/Arcs Labeled arrows Represent relationships between nodes
Labels Text on edges Describe the type of relationship

Common Relationship Types:

Relationship Meaning Example
IS-A Class membership/inheritance Dog IS-A Animal
HAS-A Part-whole/composition Car HAS-A Engine
CAN Capability Bird CAN Fly
INSTANCE-OF Specific instance of class Tommy INSTANCE-OF Dog

Example: Representing Animal Kingdom

Diagram:

graphviz diagram

Advantages of Semantic Networks:

  1. Visual and intuitive representation

  2. Supports inheritance through IS-A links

  3. Easy to understand relationships

  4. Natural for representing taxonomies

  5. Efficient for certain types of queries

Disadvantages:

  1. No standard definition of link types

  2. Difficult to represent complex relationships

  3. Limited expressiveness for logic

  4. Inheritance can be ambiguous

Applications:

  • Natural Language Processing

  • Knowledge bases

  • Information retrieval

  • Conceptual modeling


Q8. ⭐ Draw a semantic network for given English statement.

[ Dec 2024 Q5(a) | Frequency: 1 ]

Answer

Statement Example: "John owns a red car. The car has four wheels. John is a person who works at IBM."

Diagram:

graphviz diagram

General Steps to Draw Semantic Network:

Step Action
1 Identify all entities/objects (nouns) → Create nodes
2 Identify relationships (verbs, prepositions) → Create labeled edges
3 Identify properties/attributes → Create property nodes
4 Connect nodes with appropriate relationship labels
5 Use IS-A for class membership, INSTANCE-OF for specific objects
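Code Sketch (illustrative): The steps above can be mirrored in code by storing the network as (subject, relation, object) triples. The node name "Car1" is a hypothetical identifier invented here for the specific car instance.

```python
# Semantic net for: "John owns a red car. The car has four wheels.
# John is a person who works at IBM."
net = {
    ("John", "INSTANCE-OF", "Person"),
    ("John", "WORKS-AT", "IBM"),
    ("John", "OWNS", "Car1"),
    ("Car1", "INSTANCE-OF", "Car"),
    ("Car1", "COLOR", "Red"),
    ("Car1", "HAS-A", "Wheels"),
    ("Wheels", "COUNT", "4"),
}

def query(net, subject, relation):
    """All objects linked to subject by the given relation."""
    return {o for s, r, o in net if s == subject and r == relation}

print(query(net, "John", "OWNS"))  # {'Car1'}
```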

6.4 Frames


Q9. 🔥 Explain Frames with suitable example.

[ Jun 2024 Q3(c)(i), Jun 2022 Q5(b) | Frequency: 2 ]

Answer

Frames: A frame is a structured knowledge representation scheme proposed by Marvin Minsky (1975). It represents a stereotyped situation or object as a collection of slots (attributes) and their fillers (values). Frames capture both declarative and procedural knowledge.

Structure of a Frame:

Component Description
Frame Name Unique identifier for the frame
Slots Attributes or properties of the frame
Facets Meta-information about slots (type, default, constraints)
Fillers/Values Actual values assigned to slots
Procedures Actions triggered when slots are accessed/modified

Slot Facets:

Facet Purpose
Value Current value of the slot
Default Value used if no specific value given
Type Data type constraint
Range Valid value range
If-needed Procedure to compute value when needed
If-added Procedure triggered when value is added
If-removed Procedure triggered when value is removed

Example: University Frame System

Frame: PERSON
  Slots:
    Name: [Type: String]
    Age: [Type: Integer, Range: 0-150]
    Gender: [Type: {Male, Female, Other}]

Frame: STUDENT (IS-A: PERSON)
  Slots:
    Student_ID: [Type: String]
    Department: [Type: String]
    GPA: [Type: Float, Range: 0.0-10.0, Default: 0.0]
    Courses: [Type: List]
    Advisor: [Type: PROFESSOR]

Frame: JOHN (INSTANCE-OF: STUDENT)
  Slots:
    Name: "John Smith"
    Age: 21
    Gender: Male
    Student_ID: "CS2024001"
    Department: "Computer Science"
    GPA: 8.5
    Courses: [AI, DBMS, Networks]

Diagram:

plantuml diagram
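Code Sketch (illustrative): Inheritance and default values, the first two frame features, can be demonstrated with frames stored as dictionaries linked by IS-A. Unlike the listing above, JOHN's Gender is deliberately left unset here, and the "Other" default on PERSON is invented, so that a default is visibly inherited.

```python
# Frames as dicts with an IS-A link; slot lookup walks up the hierarchy.
frames = {
    "PERSON":  {"isa": None, "slots": {"Gender": "Other"}},   # invented default
    "STUDENT": {"isa": "PERSON", "slots": {"GPA": 0.0}},      # default GPA
    "JOHN":    {"isa": "STUDENT",
                "slots": {"Name": "John Smith", "Age": 21, "GPA": 8.5}},
}

def get_slot(frame, slot):
    """Return a slot value, following IS-A links for inherited defaults."""
    while frame is not None:
        if slot in frames[frame]["slots"]:
            return frames[frame]["slots"][slot]
        frame = frames[frame]["isa"]
    return None

print(get_slot("JOHN", "GPA"))     # 8.5 (own value overrides the default)
print(get_slot("JOHN", "Gender"))  # Other (inherited from PERSON)
```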

Features of Frames:

  1. Inheritance - Child frames inherit slots from parent frames

  2. Default Values - Provide typical values for slots

  3. Procedural Attachment - Demons/triggers for slot operations

  4. Multiple Inheritance - Can inherit from multiple parent frames

Advantages:

  • Natural representation of objects and classes

  • Supports inheritance and default reasoning

  • Combines data and procedures

  • Easy to understand and organize

Disadvantages:

  • Complex inheritance can be confusing

  • Difficult to handle exceptions

  • No standard frame language


6.5 Scripts


Q10. 🔥 Explain Scripts with suitable example.

[ Jun 2024 Q3(c)(ii), Jun 2023 Q5(b) | Frequency: 2 ]

Answer

Scripts: Scripts are structured knowledge representation schemes proposed by Schank and Abelson (1977) for representing stereotyped sequences of events. A script captures the typical sequence of actions that occur in familiar situations, enabling understanding and prediction of events.

Components of a Script:

Component Description Example (Restaurant Script)
Entry Conditions Prerequisites for script activation Customer is hungry, has money
Roles Participants in the script Customer, Waiter, Chef, Cashier
Props Objects used in the script Menu, Table, Food, Bill, Money
Scenes Major subdivisions of the script Entering, Ordering, Eating, Paying
Track Specific variation of the script Fast-food, Fine-dining, Cafeteria
Results Outcomes after script execution Customer not hungry, has less money

Example: Restaurant Script

SCRIPT: RESTAURANT
Track: Sit-down Restaurant

ENTRY CONDITIONS:
  - Customer is hungry
  - Customer has money
  - Restaurant is open

ROLES:
  - Customer (C)
  - Waiter (W)
  - Chef (CH)
  - Cashier (CA)

PROPS:
  - Tables, Chairs, Menu
  - Food, Plates, Utensils
  - Bill, Money/Card

SCENES:

Scene 1: ENTERING
  C enters restaurant
  C waits to be seated
  W leads C to table
  C sits down

Scene 2: ORDERING
  W brings menu
  C reads menu
  C decides on food
  W takes order
  W gives order to CH

Scene 3: EATING
  CH prepares food
  W brings food to C
  C eats food

Scene 4: PAYING
  C asks for bill
  W brings bill
  C pays CA
  C leaves tip
  C exits restaurant

RESULTS:
  - Customer is not hungry
  - Customer has less money
  - Restaurant has more money

Diagram:

flowchart LR subgraph Entry["Entry Conditions"] E1[Hungry] E2[Has Money] E3[Restaurant Open] end subgraph S1["Scene 1: Entering"] A1[Enter] --> A2[Wait] --> A3[Seated] end subgraph S2["Scene 2: Ordering"] B1[Get Menu] --> B2[Read] --> B3[Order] end subgraph S3["Scene 3: Eating"] C1[Food Prepared] --> C2[Food Served] --> C3[Eat] end subgraph S4["Scene 4: Paying"] D1[Get Bill] --> D2[Pay] --> D3[Exit] end subgraph Results["Results"] R1[Not Hungry] R2[Less Money] end Entry --> S1 --> S2 --> S3 --> S4 --> Results style Entry fill:#90EE90 style Results fill:#87CEEB

Applications of Scripts:

  1. Natural Language Understanding - Understanding stories and conversations

  2. Question Answering - Inferring unstated information

  3. Prediction - Anticipating what happens next

  4. Planning - Generating action sequences

Example Usage: Statement: "John went to a restaurant and ordered pizza."

Inferences from Restaurant Script:

  • John was probably hungry

  • John sat at a table

  • A waiter took his order

  • John will receive pizza

  • John will pay a bill

Advantages:

  • Captures temporal sequences naturally

  • Enables inference of unstated events

  • Supports expectation-based understanding

  • Good for stereotyped situations

Disadvantages:

  • Limited to familiar, routine situations

  • Cannot handle novel situations

  • Difficult to modify or combine scripts

  • Scripts may be culture-specific


Summary: Comparison of Knowledge Representation Schemes

Scheme Structure Best For Limitations
Rule-Based IF-THEN rules Decision making, diagnosis No learning, scalability
Semantic Nets Nodes + labeled edges Taxonomies, relationships Limited expressiveness
Frames Slots + values + procedures Object representation Complex inheritance
Scripts Event sequences Stereotyped situations Cannot handle novelty

UNIT 7: Probabilistic Reasoning


7.3 Review of Probability Theory


Q1. 📌 Describe the event "The maximum score is 6" when a six-faced die is thrown twice.

[ Jun 2025 Q5(a)(i) | Frequency: 1 ]

Answer

Sample Space: When a six-faced die is thrown twice, the sample space S consists of all ordered pairs (i, j) where i, j ∈ {1, 2, 3, 4, 5, 6}.

Total outcomes = 6 × 6 = 36

Event Description: The event "Maximum score is 6" means max(i, j) = 6, i.e., at least one of the two throws shows a 6 (no throw can exceed 6, so that part of the condition is automatic).

Event A = {(i, j) : max(i, j) = 6}

This includes:

  • First throw is 6, second is anything: (6,1), (6,2), (6,3), (6,4), (6,5), (6,6)

  • Second throw is 6, first is anything except 6: (1,6), (2,6), (3,6), (4,6), (5,6)

Event Set:

A = {(6,1), (6,2), (6,3), (6,4), (6,5), (6,6), 
     (1,6), (2,6), (3,6), (4,6), (5,6)}

Number of favorable outcomes = 11

Probability = 11/36

Diagram:

ditaa diagram

Q2. 📌 Describe the event "The total score is 9" when a six-faced die is thrown twice.

[ Jun 2025 Q5(a)(ii) | Frequency: 1 ]

Answer

Sample Space: When a six-faced die is thrown twice, total outcomes = 6 × 6 = 36

Event Description: The event "Total score is 9" means the sum of both throws equals 9.

Event B = {(i, j) : i + j = 9}

Finding pairs where i + j = 9:

First Throw (i) Second Throw (j) Sum
3 6 9 ✓
4 5 9 ✓
5 4 9 ✓
6 3 9 ✓

Event Set:

B = {(3,6), (4,5), (5,4), (6,3)}

Number of favorable outcomes = 4

Probability = 4/36 = 1/9

Diagram:

ditaa diagram

Q3. 📌 Describe the event "Each throw results in an even score larger than 2" when die thrown twice.

[ Jun 2025 Q5(a)(iii) | Frequency: 1 ]

Answer

Sample Space: When a six-faced die is thrown twice, total outcomes = 6 × 6 = 36

Event Description: The event requires BOTH throws to satisfy:

  • Even number (2, 4, or 6)

  • Larger than 2 (3, 4, 5, or 6)

Intersection of conditions: Even AND > 2 → {4, 6}

Event C = {(i, j) : i ∈ {4, 6} and j ∈ {4, 6}}

Event Set:

C = {(4,4), (4,6), (6,4), (6,6)}

First Throw Second Throw Valid?
4 4 ✓
4 6 ✓
6 4 ✓
6 6 ✓

Number of favorable outcomes = 4

Probability = 4/36 = 1/9

Diagram:

graphviz diagram
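Code Sketch (illustrative): The three event counts from Q1-Q3 (11, 4 and 4 favorable outcomes) can be verified by enumerating all 36 outcomes:

```python
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # all 36 ordered pairs (i, j)

max_is_6   = [(i, j) for i, j in outcomes if max(i, j) == 6]    # Q1
total_is_9 = [(i, j) for i, j in outcomes if i + j == 9]        # Q2
even_gt_2  = [(i, j) for i, j in outcomes if {i, j} <= {4, 6}]  # Q3

print(len(max_is_6), len(total_is_9), len(even_gt_2))  # 11 4 4
```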

7.4 Introduction to Bayesian Theory


Q4. 🔥 Write Bayes' Theorem and explain each component.

[ Jun 2024 Q4(b), Jun 2022 Q1(g) | Frequency: 2 ]

Answer

Bayes' Theorem:

$$P(H|E) = \frac{P(E|H) \times P(H)}{P(E)}$$

Expanded Form (using Total Probability):

$$P(H|E) = \frac{P(E|H) \times P(H)}{P(E|H) \times P(H) + P(E|\neg H) \times P(\neg H)}$$

Components Explained:

Component Name Description
P(H|E) Posterior Probability Probability of hypothesis H being true AFTER observing evidence E
P(H) Prior Probability Initial probability of hypothesis H BEFORE observing evidence
P(E|H) Likelihood Probability of observing evidence E IF hypothesis H is true
P(E) Evidence/Marginal Total probability of observing evidence E under all hypotheses
P(¬H) Prior of negation Probability that hypothesis H is false = 1 - P(H)

Diagram:

plantuml diagram

Interpretation:

  • Prior P(H): What we believed before seeing evidence

  • Likelihood P(E|H): How well evidence supports hypothesis

  • Posterior P(H|E): Updated belief after seeing evidence

Example:

Medical Diagnosis:

  • H = Patient has disease

  • E = Test result is positive

  • P(H) = 0.01 (1% of population has disease)

  • P(E|H) = 0.95 (test is 95% accurate for diseased)

  • P(E|¬H) = 0.05 (5% false positive rate)

$$P(H|E) = \frac{0.95 \times 0.01}{0.95 \times 0.01 + 0.05 \times 0.99} = \frac{0.0095}{0.059} ≈ 0.161$$

Even with positive test, only 16.1% chance of actually having the disease!
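Code Sketch (illustrative): The medical-diagnosis calculation above can be checked with a few lines; the function name and parameter names are choices made for this sketch.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(H|E) via the expanded (total probability) form of Bayes' theorem."""
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence

p = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(round(p, 3))  # 0.161
```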

Applications:

  • Medical diagnosis

  • Spam filtering

  • Machine learning classifiers

  • Decision making under uncertainty


7.5 Bayes' Networks


Q5. ⭐ Explain Bayes' Networks.

[ Dec 2022 Q5(c) | Frequency: 1 ]

Answer

Bayesian Networks (Belief Networks): A Bayesian Network is a probabilistic graphical model that represents a set of random variables and their conditional dependencies using a Directed Acyclic Graph (DAG). It combines graph theory with probability theory for reasoning under uncertainty.

Components:

Component Description
Nodes Represent random variables
Directed Edges Represent causal/conditional dependencies
CPT (Conditional Probability Tables) Specify probability of each node given its parents
DAG Structure No cycles allowed in the graph

Key Concepts:

Term Meaning
Parent Node Node with outgoing edge to another node
Child Node Node receiving edge from parent
Root Node Node with no parents
Leaf Node Node with no children
Markov Blanket Parents, children, and children's parents

Example: Burglary Alarm Network

Diagram:

graphviz diagram

Conditional Probability Tables:

P(Burglary):

B P(B)
T 0.001
F 0.999

P(Alarm | Burglary, Earthquake):

B E P(A=T)
T T 0.95
T F 0.94
F T 0.29
F F 0.001

P(JohnCalls | Alarm):

A P(J=T)
T 0.90
F 0.05

Joint Probability Calculation:

$$P(B, E, A, J, M) = P(B) \times P(E) \times P(A|B,E) \times P(J|A) \times P(M|A)$$
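Code Sketch (illustrative): The factorization above can be evaluated directly from the CPTs. Note that this excerpt does not list P(E) or P(MaryCalls | Alarm); the values below for those two tables are assumed (standard textbook values) purely so the sketch runs end to end.

```python
# CPTs from the tables above. P(E) and P(M|A) are NOT given in this
# excerpt; the values below are assumed for illustration only.
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}   # assumed
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(A=T | B, E)
P_J = {True: 0.90, False: 0.05}     # P(J=T | A)
P_M = {True: 0.70, False: 0.01}     # assumed  P(M=T | A)

def joint(b, e, a, j, m):
    """P(B, E, A, J, M) using the chain-rule factorization above."""
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm

# e.g. no burglary, no earthquake, alarm rings, both John and Mary call:
print(f"{joint(False, False, True, True, True):.6f}")  # 0.000628
```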

Properties:

  1. Conditional Independence: A node is independent of non-descendants given its parents

  2. Compact Representation: Requires fewer parameters than full joint distribution

  3. Inference: Can compute any probability query from the network

Advantages:

  • Visual representation of dependencies

  • Handles uncertainty naturally

  • Supports both prediction and diagnosis

  • Compact knowledge representation

Applications:

  • Medical diagnosis systems

  • Fault detection

  • Risk assessment

  • Decision support systems


7.9 Dempster-Shafer Theory


Q6. ⭐ What are the different types of evidences in D-S (Dempster-Shafer) theory? Explain them briefly.

[ Jun 2025 Q1(e) | Frequency: 1 ]

Answer

Dempster-Shafer Theory (also called Theory of Evidence) handles uncertainty by allowing belief to be assigned to sets of hypotheses rather than single hypotheses.

Types of Evidence in D-S Theory:

1. Vacuous Evidence:

  • Provides no information about hypotheses

  • Belief mass assigned entirely to the frame of discernment (Θ)

  • m(Θ) = 1, m(A) = 0 for all proper subsets A

  • Represents complete ignorance

2. Certain Evidence:

  • Provides complete certainty about a single hypothesis

  • All belief mass assigned to one singleton set

  • m({h}) = 1 for some hypothesis h

  • Represents absolute knowledge

3. Consonant Evidence:

  • Focal elements are nested (one contains another)

  • Forms a chain: A₁ ⊂ A₂ ⊂ ... ⊂ Aₙ

  • Belief and Plausibility values are related simply

  • Example: m({a}) = 0.3, m({a,b}) = 0.5, m({a,b,c}) = 0.2

4. Bayesian Evidence:

  • All focal elements are singletons

  • Equivalent to standard probability

  • m assigns mass only to individual hypotheses

  • Bel(A) = Pl(A) for all subsets A

5. Categorical/Dogmatic Evidence:

  • Supports a specific proper subset with certainty

  • m(A) = 1 for some A ⊂ Θ (where A ≠ Θ)

  • No uncertainty about the supported set

Diagram:

plantuml diagram

Summary of Evidence Types:

Type Focal Elements Characteristic
Vacuous Only Θ No information
Certain One singleton Complete certainty
Consonant Nested sets Hierarchical structure
Bayesian All singletons Traditional probability
Categorical One proper subset Dogmatic belief

Q7. 🔥 Explain Dempster-Shafer Theory with example.

[ Jun 2025 Q4(b), Jun 2022 Q5(c) | Frequency: 2 ]

Answer

Dempster-Shafer Theory: The Dempster-Shafer (D-S) theory is a mathematical framework for representing and combining uncertain evidence. It generalizes Bayesian probability by allowing belief to be assigned to sets of hypotheses rather than requiring probability for each individual hypothesis.

Key Concepts:

Concept Symbol Description
Frame of Discernment Θ Set of all mutually exclusive hypotheses
Power Set 2^Θ All possible subsets of Θ
Mass Function m(A) Basic probability assignment to subset A
Belief Function Bel(A) Total belief committed to A
Plausibility Function Pl(A) Maximum possible belief in A

Mass Function Properties:

  1. m(∅) = 0 (no belief assigned to empty set)

  2. Σ m(A) = 1, where the sum is taken over all subsets A ⊆ Θ (total mass is 1)

Belief and Plausibility:

$$Bel(A) = \sum_{B \subseteq A} m(B)$$
$$Pl(A) = \sum_{B \cap A \neq \emptyset} m(B) = 1 - Bel(\bar{A})$$

Relationship: Bel(A) ≤ P(A) ≤ Pl(A)

Dempster's Rule of Combination: For combining evidence from two independent sources:

$$m_{12}(A) = \frac{\sum_{B \cap C = A} m_1(B) \cdot m_2(C)}{1 - K}$$

Where conflict: $K = \sum_{B \cap C = \emptyset} m_1(B) \cdot m_2(C)$

Example: Medical Diagnosis

Frame of Discernment: Θ = {Flu, Cold, Allergy}

Evidence from Source 1 (Doctor 1):

  • m₁({Flu}) = 0.6

  • m₁({Cold}) = 0.3

  • m₁(Θ) = 0.1 (uncertainty)

Evidence from Source 2 (Doctor 2):

  • m₂({Flu}) = 0.4

  • m₂({Allergy}) = 0.4

  • m₂(Θ) = 0.2

Combining Evidence:

Diagram:

graphviz diagram

Combination Calculation Table:

m₁ × m₂ {Flu} 0.4 {Allergy} 0.4 Θ 0.2
{Flu} 0.6 {Flu} 0.24 ∅ 0.24 {Flu} 0.12
{Cold} 0.3 ∅ 0.12 ∅ 0.12 {Cold} 0.06
Θ 0.1 {Flu} 0.04 {Allergy} 0.04 Θ 0.02

Conflict: K = 0.24 + 0.12 + 0.12 = 0.48

Normalization factor: 1 - K = 0.52

Combined masses:

  • m₁₂({Flu}) = (0.24 + 0.12 + 0.04) / 0.52 = 0.40/0.52 ≈ 0.77

  • m₁₂({Cold}) = 0.06 / 0.52 ≈ 0.12

  • m₁₂({Allergy}) = 0.04 / 0.52 ≈ 0.08

  • m₁₂(Θ) = 0.02 / 0.52 ≈ 0.04

Belief and Plausibility for Flu:

  • Bel({Flu}) = m({Flu}) = 0.77

  • Pl({Flu}) = m({Flu}) + m(Θ) = 0.77 + 0.04 = 0.81
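Code Sketch (illustrative): Dempster's rule as applied in the combination table above can be reproduced in a few lines, representing focal elements as frozensets.

```python
from itertools import product

THETA = frozenset({"Flu", "Cold", "Allergy"})
m1 = {frozenset({"Flu"}): 0.6, frozenset({"Cold"}): 0.3, THETA: 0.1}
m2 = {frozenset({"Flu"}): 0.4, frozenset({"Allergy"}): 0.4, THETA: 0.2}

def combine(m1, m2):
    """Dempster's rule: intersect focal elements, renormalize by 1 - K."""
    combined, K = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            K += wa * wb  # mass on empty intersections = conflict
    return {s: w / (1 - K) for s, w in combined.items()}, K

m12, K = combine(m1, m2)
print(round(K, 2), round(m12[frozenset({"Flu"})], 2))  # 0.48 0.77
```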

Advantages of D-S Theory:

  1. Handles ignorance explicitly (mass to Θ)

  2. Does not require complete probability assignments

  3. Combines evidence from multiple sources

  4. Distinguishes between uncertainty and disbelief

Disadvantages:

  1. Computationally expensive (2^n subsets)

  2. Conflict handling can be problematic

  3. Requires independent evidence sources

  4. May produce counterintuitive results with high conflict

Applications:

  • Expert systems

  • Sensor fusion

  • Target identification

  • Medical diagnosis

  • Risk assessment


Summary: Comparison of Uncertainty Handling Methods

Method Handles Ignorance Combines Evidence Complexity
Probability Theory No (must sum to 1) Bayes' Theorem Low
Bayesian Networks Limited Message passing Medium
Dempster-Shafer Yes (mass to Θ) Dempster's Rule High (2^n)
Fuzzy Logic Partial membership Fuzzy operators Medium

UNIT 8: Fuzzy and Rough Set


8.2-8.4 Fuzzy Systems & Fuzzy Set Representation


Q1. 📌 Find fuzzy set A∪B for given fuzzy sets A and B

[ Dec 2024 Q1(e)(i) | Frequency: 1 ]

Answer

Fuzzy Set Union (A∪B):

The union of two fuzzy sets A and B is defined by taking the maximum of the membership values for each element.

Formula:

$$\mu_{A \cup B}(x) = \max[\mu_A(x), \mu_B(x)]$$

Example:

Let Universal Set X = {1, 2, 3, 4, 5}

Element (x) μ_A(x) μ_B(x) μ_A∪B(x) = max(μ_A, μ_B)
1 0.2 0.6 0.6
2 0.5 0.3 0.5
3 0.8 0.7 0.8
4 0.4 0.9 0.9
5 0.1 0.2 0.2

Result:

  • A = {0.2/1, 0.5/2, 0.8/3, 0.4/4, 0.1/5}

  • B = {0.6/1, 0.3/2, 0.7/3, 0.9/4, 0.2/5}

  • A∪B = {0.6/1, 0.5/2, 0.8/3, 0.9/4, 0.2/5}


Q2. 📌 Find fuzzy set A∩B for given fuzzy sets A and B

[ Dec 2024 Q1(e)(ii) | Frequency: 1 ]

Answer

Fuzzy Set Intersection (A∩B):

The intersection of two fuzzy sets A and B is defined by taking the minimum of the membership values for each element.

Formula:

$$\mu_{A \cap B}(x) = \min[\mu_A(x), \mu_B(x)]$$

Example:

Using the same fuzzy sets A and B:

Element (x) μ_A(x) μ_B(x) μ_A∩B(x) = min(μ_A, μ_B)
1 0.2 0.6 0.2
2 0.5 0.3 0.3
3 0.8 0.7 0.7
4 0.4 0.9 0.4
5 0.1 0.2 0.1

Result:

  • A∩B = {0.2/1, 0.3/2, 0.7/3, 0.4/4, 0.1/5}

Q3. 📌 Find fuzzy set (A∩B)' for given fuzzy sets A and B

[ Dec 2024 Q1(e)(iii) | Frequency: 1 ]

Answer

Complement of Fuzzy Set Intersection (A∩B)':

First, find A∩B (intersection), then calculate its complement.

Complement Formula:

$$\mu_{A'}(x) = 1 - \mu_A(x)$$

Step 1: Find A∩B

From Q2: A∩B = {0.2/1, 0.3/2, 0.7/3, 0.4/4, 0.1/5}

Step 2: Find (A∩B)'

Element (x) μ_A∩B(x) μ_(A∩B)'(x) = 1 - μ_A∩B(x)
1 0.2 0.8
2 0.3 0.7
3 0.7 0.3
4 0.4 0.6
5 0.1 0.9

Result:

  • (A∩B)' = {0.8/1, 0.7/2, 0.3/3, 0.6/4, 0.9/5}
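Code Sketch (illustrative): The three operations from Q1-Q3 can be checked in a few lines, representing each fuzzy set as an element → membership dictionary.

```python
# Fuzzy sets A and B from the example above.
A = {1: 0.2, 2: 0.5, 3: 0.8, 4: 0.4, 5: 0.1}
B = {1: 0.6, 2: 0.3, 3: 0.7, 4: 0.9, 5: 0.2}

union        = {x: max(A[x], B[x]) for x in A}                # μ = max
intersection = {x: min(A[x], B[x]) for x in A}                # μ = min
complement   = {x: round(1 - intersection[x], 1) for x in A}  # (A∩B)'

print(union)         # {1: 0.6, 2: 0.5, 3: 0.8, 4: 0.9, 5: 0.2}
print(intersection)  # {1: 0.2, 2: 0.3, 3: 0.7, 4: 0.4, 5: 0.1}
print(complement)    # {1: 0.8, 2: 0.7, 3: 0.3, 4: 0.6, 5: 0.9}
```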

Q4. ⭐ Prove Commutative property of fuzzy sets with suitable example

[ Dec 2024 Q4(b)(i) | Frequency: 1 ]

Answer

Commutative Property:

For fuzzy sets A and B:

  • A ∪ B = B ∪ A (Union is commutative)

  • A ∩ B = B ∩ A (Intersection is commutative)

Proof:

For Union:

$$\mu_{A \cup B}(x) = \max[\mu_A(x), \mu_B(x)] = \max[\mu_B(x), \mu_A(x)] = \mu_{B \cup A}(x)$$

For Intersection:

$$\mu_{A \cap B}(x) = \min[\mu_A(x), \mu_B(x)] = \min[\mu_B(x), \mu_A(x)] = \mu_{B \cap A}(x)$$

Example:

Let X = {a, b, c}

  • A = {0.3/a, 0.7/b, 0.5/c}

  • B = {0.6/a, 0.4/b, 0.8/c}

Element μ_A μ_B A∪B = max B∪A = max A∩B = min B∩A = min
a 0.3 0.6 0.6 0.6 0.3 0.3
b 0.7 0.4 0.7 0.7 0.4 0.4
c 0.5 0.8 0.8 0.8 0.5 0.5

Verified: A∪B = B∪A = {0.6/a, 0.7/b, 0.8/c} and A∩B = B∩A = {0.3/a, 0.4/b, 0.5/c}


Q5. ⭐ Prove Associative property of fuzzy sets with suitable example

[ Dec 2024 Q4(b)(ii) | Frequency: 1 ]

Answer

Associative Property:

For fuzzy sets A, B, and C:

  • (A ∪ B) ∪ C = A ∪ (B ∪ C)

  • (A ∩ B) ∩ C = A ∩ (B ∩ C)

Proof:

For Union:

$$\mu_{(A \cup B) \cup C}(x) = \max[\max[\mu_A(x), \mu_B(x)], \mu_C(x)] = \max[\mu_A(x), \mu_B(x), \mu_C(x)]$$
$$\mu_{A \cup (B \cup C)}(x) = \max[\mu_A(x), \max[\mu_B(x), \mu_C(x)]] = \max[\mu_A(x), \mu_B(x), \mu_C(x)]$$

Example:

Let X = {1, 2}

  • A = {0.2/1, 0.5/2}

  • B = {0.4/1, 0.3/2}

  • C = {0.6/1, 0.8/2}

For Union:

Element μ_A μ_B μ_C A∪B (A∪B)∪C B∪C A∪(B∪C)
1 0.2 0.4 0.6 0.4 0.6 0.6 0.6
2 0.5 0.3 0.8 0.5 0.8 0.8 0.8

Verified: (A∪B)∪C = A∪(B∪C) = {0.6/1, 0.8/2}


Q6. ⭐ Prove Distributive property of fuzzy sets with suitable example

[ Dec 2024 Q4(b)(iii) | Frequency: 1 ]

Answer

Distributive Property:

For fuzzy sets A, B, and C:

  • A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)

  • A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)

Proof:

$$\mu_{A \cup (B \cap C)}(x) = \max[\mu_A(x), \min[\mu_B(x), \mu_C(x)]]$$
$$\mu_{(A \cup B) \cap (A \cup C)}(x) = \min[\max[\mu_A(x), \mu_B(x)], \max[\mu_A(x), \mu_C(x)]]$$

By properties of max and min functions, both expressions are equal.

Example:

Let X = {p, q}

  • A = {0.3/p, 0.6/q}

  • B = {0.5/p, 0.4/q}

  • C = {0.7/p, 0.2/q}

Element μ_A μ_B μ_C B∩C A∪(B∩C) A∪B A∪C (A∪B)∩(A∪C)
p 0.3 0.5 0.7 0.5 0.5 0.5 0.7 0.5
q 0.6 0.4 0.2 0.2 0.6 0.6 0.6 0.6

Verified: A∪(B∩C) = (A∪B)∩(A∪C) = {0.5/p, 0.6/q}


Q7. 🔥 Prove De-Morgan's laws for fuzzy sets with suitable example

[ Dec 2024 Q4(b)(iv) | Frequency: 1 ]

Answer

De-Morgan's Laws for Fuzzy Sets:

  • (A ∪ B)' = A' ∩ B'

  • (A ∩ B)' = A' ∪ B'

Proof of First Law:

LHS: $\mu_{(A \cup B)'}(x) = 1 - \max[\mu_A(x), \mu_B(x)]$

RHS: $\mu_{A' \cap B'}(x) = \min[1-\mu_A(x), 1-\mu_B(x)]$

Since $1 - \max[a, b] = \min[1-a, 1-b]$, LHS = RHS

Example:

Let X = {1, 2, 3}

  • A = {0.2/1, 0.7/2, 0.4/3}

  • B = {0.5/1, 0.3/2, 0.6/3}

Element μ_A μ_B A∪B (A∪B)' A' B' A'∩B'
1 0.2 0.5 0.5 0.5 0.8 0.5 0.5
2 0.7 0.3 0.7 0.3 0.3 0.7 0.3
3 0.4 0.6 0.6 0.4 0.6 0.4 0.4

Verified: (A∪B)' = A'∩B' = {0.5/1, 0.3/2, 0.4/3}

Diagram:

plantuml diagram
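Code Sketch (illustrative): De-Morgan's first law can be verified numerically on the example sets (rounding to one decimal to sidestep floating-point noise):

```python
A = {1: 0.2, 2: 0.7, 3: 0.4}
B = {1: 0.5, 2: 0.3, 3: 0.6}

lhs = {x: round(1 - max(A[x], B[x]), 1) for x in A}      # (A ∪ B)'
rhs = {x: round(min(1 - A[x], 1 - B[x]), 1) for x in A}  # A' ∩ B'
assert lhs == rhs
print(lhs)  # {1: 0.5, 2: 0.3, 3: 0.4}
```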

8.7 Rough Set Theory


Q8. 🔥 Explain Rough Set Theory

[ Dec 2022 Q5(d) | Frequency: 1 ]

Answer

Rough Set Theory:

Rough Set Theory, introduced by Zdzisław Pawlak in 1982, is a mathematical approach for dealing with imprecision, vagueness, and uncertainty in data. Unlike fuzzy sets, which assign graded membership, rough sets characterize a set through indiscernibility: what the available attributes can and cannot distinguish.

Key Concepts:

Concept Description
Information System A table with objects (rows) and attributes (columns)
Indiscernibility Relation Objects that cannot be distinguished based on attributes
Equivalence Classes Groups of indiscernible objects
Approximation Space Pair (U, R) where U is universe and R is equivalence relation

Approximations:

  1. Lower Approximation (R_X): Objects that certainly belong to set X; contains objects whose equivalence class is completely contained in X

  2. Upper Approximation (R̄_X): Objects that possibly belong to set X; contains objects whose equivalence class has a non-empty intersection with X

  3. Boundary Region: Objects that cannot be classified with certainty; Boundary = Upper Approximation − Lower Approximation

Diagram:

ditaa diagram

Example:

Consider patients with symptoms and diagnosis:

Patient Headache Fever Flu
P1 Yes Yes Yes
P2 Yes Yes Yes
P3 Yes No No
P4 No Yes No
P5 No No No

For X = {Patients with Flu} = {P1, P2}

  • Indiscernibility: P1 and P2 are indiscernible (same symptoms)

  • Lower Approximation: {P1, P2} (certainly have Flu)

  • Upper Approximation: {P1, P2} (possibly have Flu)

  • Boundary: Empty (crisp classification possible)
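Code Sketch (illustrative): The approximations for the patient example can be computed by grouping patients into equivalence classes on the condition attributes (Headache, Fever):

```python
# Patient table from the example: (Headache, Fever) condition attributes.
symptoms = {"P1": ("Yes", "Yes"), "P2": ("Yes", "Yes"),
            "P3": ("Yes", "No"),  "P4": ("No", "Yes"), "P5": ("No", "No")}
X = {"P1", "P2"}  # patients with Flu

# Equivalence classes of the indiscernibility relation.
classes = {}
for p, attrs in symptoms.items():
    classes.setdefault(attrs, set()).add(p)

lower = set().union(*(c for c in classes.values() if c <= X))  # certain
upper = set().union(*(c for c in classes.values() if c & X))   # possible
print(sorted(lower), sorted(upper), sorted(upper - lower))
# ['P1', 'P2'] ['P1', 'P2'] []
```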

Properties of Rough Sets:

Property Formula
Lower ⊆ Set ⊆ Upper R_X ⊆ X ⊆ R̄_X
Empty Set R_∅ = R̄_∅ = ∅
Universal Set R_U = R̄_U = U
Complement R_(X') = (R̄_X)'

Applications:

  • Data Mining: Feature selection and reduction

  • Medical Diagnosis: Handling incomplete patient data

  • Pattern Recognition: Classification with uncertain data

  • Decision Support Systems: Rule extraction from data

Comparison: Fuzzy Sets vs Rough Sets:

Aspect Fuzzy Sets Rough Sets
Uncertainty Type Membership degree (0 to 1) Boundary region
Knowledge Subjective membership Objective from data
Representation μ_A(x) ∈ [0,1] Lower & Upper approximations
Origin Zadeh (1965) Pawlak (1982)

Diagram:

flowchart TB subgraph RST["Rough Set Theory Components"] IS["Information System<br/>(Objects × Attributes)"] IR["Indiscernibility<br/>Relation"] EC["Equivalence<br/>Classes"] subgraph AP["Approximations"] LA["Lower Approximation<br/>(Certain Members)"] UA["Upper Approximation<br/>(Possible Members)"] BR["Boundary Region<br/>(Uncertain)"] end end IS --> IR IR --> EC EC --> LA EC --> UA UA --> BR LA --> BR style LA fill:#90EE90 style UA fill:#FFD700 style BR fill:#FFA07A

Summary: Fuzzy Set Operations Quick Reference

Operation Formula Description
Union (A∪B) max[μ_A(x), μ_B(x)] Take maximum membership
Intersection (A∩B) min[μ_A(x), μ_B(x)] Take minimum membership
Complement (A') 1 - μ_A(x) Subtract from 1

Properties Summary

Property Union Intersection
Commutative A∪B = B∪A A∩B = B∩A
Associative (A∪B)∪C = A∪(B∪C) (A∩B)∩C = A∩(B∩C)
Distributive A∪(B∩C) = (A∪B)∩(A∪C) A∩(B∪C) = (A∩B)∪(A∩C)
De-Morgan (A∪B)' = A'∩B' (A∩B)' = A'∪B'
Idempotent A∪A = A A∩A = A
Identity A∪∅ = A A∩U = A

BLOCK 3: MACHINE LEARNING - I

UNIT 9: Introduction to Machine Learning Methods


Q1. 📌 State and discuss the basic steps of machine learning cycle.

[ Dec 2024 Q1(f) | Frequency: 1 ]

Answer

The Machine Learning Cycle consists of systematic steps to build and deploy ML models:

| Step | Description |
|---|---|
| 1. Data Collection | Gathering relevant data from various sources (databases, APIs, sensors) |
| 2. Data Preparation | Cleaning, transforming, and preprocessing raw data for analysis |
| 3. Feature Engineering | Selecting and creating meaningful features from data |
| 4. Model Selection | Choosing appropriate algorithm based on problem type |
| 5. Model Training | Feeding prepared data to train the model |
| 6. Model Evaluation | Testing model performance using metrics (accuracy, precision, recall) |
| 7. Model Tuning | Optimizing hyperparameters to improve performance |
| 8. Deployment | Integrating model into production environment |
| 9. Monitoring | Continuously tracking model performance and retraining if needed |
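
A minimal end-to-end sketch of this cycle in plain Python, using an invented toy dataset and a trivial threshold model; a real project would use a library such as scikit-learn, but the stages are the same.

```python
import random

# 1. Data collection: toy labelled points (feature, label), where the
#    true (hidden) rule is label = 1 when x > 5.
random.seed(0)
data = [(x, int(x > 5)) for x in range(11)]

# 2. Data preparation / splitting: shuffle, then hold out a test set.
random.shuffle(data)
train, test = data[:8], data[8:]

# 3-5. Model selection and training: pick the threshold with the best
#      training accuracy (a deliberately trivial "model").
def train_threshold(samples):
    best_t, best_acc = 0, -1.0
    for t in range(12):
        acc = sum(int(x > t) == y for x, y in samples) / len(samples)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

threshold = train_threshold(train)

# 6. Evaluation: accuracy on the held-out test set.
accuracy = sum(int(x > threshold) == y for x, y in test) / len(test)
```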

Q2. 🔥 Draw block diagram for the machine learning cycle.

[ Jun 2024 Q1(e), Jun 2022 Q2(b) | Frequency: 2 ]

Answer

Diagram:

plantuml diagram

Key Components:

  • Data Collection → Preparation: Raw data is collected and cleaned

  • Feature Engineering → Model Selection: Features extracted and algorithm chosen

  • Training → Evaluation → Tuning: Iterative process until optimal performance

  • Deployment → Monitoring: Model goes live with continuous monitoring


Q3. 📌 List the steps involved in machine learning cycle.

[ Jun 2024 Q1(e) | Frequency: 1 ]

Answer

Steps in Machine Learning Cycle:

  1. Problem Definition - Clearly define the objective and success criteria

  2. Data Collection - Gather data from relevant sources

  3. Data Preprocessing - Clean, normalize, and handle missing values

  4. Exploratory Data Analysis (EDA) - Understand data patterns and distributions

  5. Feature Engineering - Create and select important features

  6. Data Splitting - Divide data into training, validation, and test sets

  7. Model Selection - Choose appropriate ML algorithm

  8. Model Training - Train model on training dataset

  9. Model Evaluation - Assess performance using appropriate metrics

  10. Hyperparameter Tuning - Optimize model parameters

  11. Model Deployment - Deploy model to production

  12. Monitoring & Maintenance - Track performance and update as needed


Q4. 📌 Discuss the role of each component of machine learning cycle briefly.

[ Jun 2022 Q2(b) | Frequency: 1 ]

Answer

| Component | Role |
|---|---|
| Data Collection | Acquiring sufficient, relevant, and quality data that represents the problem domain |
| Data Preprocessing | Transforming raw data into usable format by handling missing values, outliers, and noise |
| Feature Engineering | Extracting and selecting features that best represent the underlying patterns |
| Model Selection | Choosing algorithm that suits the problem type (classification, regression, clustering) |
| Training | Algorithm learns patterns from data by adjusting internal parameters |
| Evaluation | Measuring model performance using metrics like accuracy, F1-score, RMSE |
| Tuning | Adjusting hyperparameters to achieve optimal model performance |
| Deployment | Making the trained model available for real-world predictions |
| Monitoring | Tracking model behavior in production and detecting performance degradation |

Q5. 📌 Differentiate between Machine Learning and Data Mining.

[ Jun 2022 Q3(b) | Frequency: 1 ]

Answer

| Aspect | Machine Learning | Data Mining |
|---|---|---|
| Definition | Algorithms that learn from data to make predictions | Process of discovering patterns in large datasets |
| Primary Goal | Build predictive models | Extract hidden knowledge and patterns |
| Focus | Learning and generalizing from data | Finding interesting relationships in data |
| Approach | Uses known patterns to predict unknown outcomes | Discovers unknown patterns from existing data |
| Output | Trained model for predictions | Rules, patterns, associations, clusters |
| Data Requirement | Labeled/unlabeled data for training | Large historical databases |
| Human Intervention | Less (automated learning) | More (domain expertise needed) |
| Example | Spam email classifier | Market basket analysis |
| Techniques | Neural networks, SVM, Decision trees | Association rules, clustering, anomaly detection |

Q6. 📌 Give an example of Machine Learning.

[ Jun 2022 Q3(b) | Frequency: 1 ]

Answer

Example: Email Spam Detection

  • Problem: Classify incoming emails as spam or not spam

  • Data: Historical emails labeled as spam/not-spam

  • Features: Words frequency, sender information, links, attachments

  • Algorithm: Naive Bayes or Neural Network

  • Training: Model learns patterns distinguishing spam from legitimate emails

  • Prediction: New emails are automatically classified

Other Examples:

  • Netflix movie recommendations

  • Voice assistants (Siri, Alexa)

  • Fraud detection in banking

  • Self-driving cars

  • Medical diagnosis systems


Q7. 📌 Give an example of Data Mining.

[ Jun 2022 Q3(b) | Frequency: 1 ]

Answer

Example: Market Basket Analysis in Retail

  • Problem: Discover which products are frequently bought together

  • Data: Transaction records from supermarket

  • Technique: Association Rule Mining (Apriori Algorithm)

  • Discovery: "Customers who buy bread also tend to buy butter (confidence: 85%)"

  • Application: Product placement, cross-selling strategies

Other Examples:

  • Customer segmentation in banking

  • Detecting fraudulent insurance claims

  • Identifying disease outbreaks from health records

  • Social network analysis

  • Web usage mining for website optimization


Q8. 🔥 Discuss Supervised Learning with suitable example.

[ Dec 2024 Q1(a), Dec 2022 Q2(b)(ii) | Frequency: 2 ]

Answer

Supervised Learning is a type of machine learning where the model is trained on labeled data - data with known input-output pairs. The algorithm learns to map inputs to correct outputs.

Characteristics:

  • Requires labeled training data

  • Goal is to learn a mapping function: Y = f(X)

  • Used for classification and regression tasks

  • Model is "supervised" by correct answers

Working Process:

  1. Provide training data with labels

  2. Algorithm learns patterns from input-output relationships

  3. Model is evaluated on test data

  4. Trained model predicts outputs for new inputs

Diagram:

```mermaid
flowchart LR
  A[Labeled Training Data] --> B[Learning Algorithm]
  B --> C[Trained Model]
  D[New Input] --> C
  C --> E[Predicted Output]
  subgraph Training Phase
    A
    B
  end
  subgraph Prediction Phase
    D
    C
    E
  end
```

Example: House Price Prediction

  • Input Features: Size, location, bedrooms, age

  • Label: Actual house price

  • Training: Model learns relationship between features and prices

  • Prediction: Estimate price for new houses

Algorithms: Linear Regression, Decision Trees, SVM, Neural Networks, Random Forest
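
A minimal sketch of this example with a single feature (size) and invented prices that follow price = 2 × size + 10, fitted with the closed-form simple-linear-regression formulas.

```python
# Toy supervised example: fit price = w * size + b by least squares.
# (Sizes and prices are invented illustrative numbers.)
sizes = [50, 80, 100, 120, 150]        # square metres
prices = [110, 170, 210, 250, 310]     # generated as 2 * size + 10

n = len(sizes)
mean_x = sum(sizes) / n
mean_y = sum(prices) / n

# Closed-form slope and intercept for simple linear regression.
w = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices)) / \
    sum((x - mean_x) ** 2 for x in sizes)
b = mean_y - w * mean_x

predicted = w * 90 + b  # predict the price of a new 90 m² house
```

Because the toy prices lie exactly on a line, the fit recovers w = 2 and b = 10, and the 90 m² prediction is 190.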


Q9. 🔥 Discuss Unsupervised Learning with suitable example.

[ Dec 2024 Q1(a), Dec 2022 Q2(b)(iii) | Frequency: 2 ]

Answer

Unsupervised Learning is a type of machine learning where the model is trained on unlabeled data - data without predefined categories or outcomes. The algorithm discovers hidden patterns or structures.

Characteristics:

  • No labeled data required

  • Goal is to find hidden structure in data

  • Used for clustering, association, dimensionality reduction

  • Algorithm finds patterns on its own

Working Process:

  1. Provide unlabeled data

  2. Algorithm identifies patterns, similarities, or groupings

  3. Output reveals data structure or clusters

  4. Human interprets the discovered patterns

Diagram:

```mermaid
flowchart LR
  A[Unlabeled Data] --> B[Learning Algorithm]
  B --> C[Discovered Patterns]
  C --> D[Clusters/Groups]
  C --> E[Associations]
  C --> F[Reduced Dimensions]
```

Example: Customer Segmentation

  • Input: Customer purchase history, demographics, browsing behavior

  • No Labels: Categories are unknown

  • Process: Algorithm groups similar customers together

  • Output: Customer segments (e.g., "budget shoppers," "premium customers")

  • Application: Targeted marketing strategies

Algorithms: K-Means Clustering, Hierarchical Clustering, PCA, Apriori Algorithm, DBSCAN
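
A minimal 1-D k-means sketch of the customer-segmentation example; the spend values and initial centres are invented, and the fixed initialization keeps the run deterministic.

```python
# Minimal 1-D k-means: group customers by annual spend (invented data).
spend = [10, 12, 11, 95, 100, 98]    # annual spend per customer
centres = [10.0, 100.0]              # fixed initial centres for determinism

for _ in range(10):                  # a few assignment/update iterations
    # Assignment step: each point joins its nearest centre.
    clusters = [[] for _ in centres]
    for x in spend:
        nearest = min(range(len(centres)), key=lambda i: abs(x - centres[i]))
        clusters[nearest].append(x)
    # Update step: move each centre to the mean of its cluster.
    centres = [sum(c) / len(c) for c in clusters]
```

The centres converge to roughly 11 ("budget shoppers") and 97.7 ("premium customers").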


Q10. ⭐ Compare Supervised Learning and Unsupervised Learning.

[ Jun 2022 Q1(f) | Frequency: 1 ]

Answer
Aspect Supervised Learning Unsupervised Learning
Data Type Labeled data Unlabeled data
Goal Predict outcomes Discover patterns
Training Uses input-output pairs Uses only input data
Feedback Direct feedback available No feedback
Complexity Less complex More complex
Accuracy Higher accuracy Lower accuracy
Human Effort More (for labeling) Less
Applications Classification, Regression Clustering, Association
Algorithms SVM, Decision Trees, Linear Regression K-Means, PCA, Apriori
Example Spam detection, Price prediction Customer segmentation, Anomaly detection
Output Predicted class/value Groups/clusters/rules

Q11. 📌 List the algorithms for Supervised Learning.

[ Jun 2022 Q1(f) | Frequency: 1 ]

Answer

Classification Algorithms:

  1. Logistic Regression

  2. Decision Trees

  3. Random Forest

  4. Support Vector Machine (SVM)

  5. K-Nearest Neighbors (KNN)

  6. Naive Bayes

  7. Neural Networks

  8. Gradient Boosting (XGBoost, AdaBoost)

Regression Algorithms:

  1. Linear Regression

  2. Polynomial Regression

  3. Ridge Regression

  4. Lasso Regression

  5. Decision Tree Regression

  6. Random Forest Regression

  7. Support Vector Regression (SVR)

  8. Neural Network Regression


Q12. 📌 List the algorithms for Unsupervised Learning.

[ Jun 2022 Q1(f) | Frequency: 1 ]

Answer

Clustering Algorithms:

  1. K-Means Clustering

  2. Hierarchical Clustering

  3. DBSCAN (Density-Based)

  4. Gaussian Mixture Models

  5. Mean Shift Clustering

Association Rule Algorithms:

  1. Apriori Algorithm

  2. FP-Growth Algorithm

  3. Eclat Algorithm

Dimensionality Reduction:

  1. Principal Component Analysis (PCA)

  2. t-SNE (t-Distributed Stochastic Neighbor Embedding)

  3. Linear Discriminant Analysis (LDA)

  4. Singular Value Decomposition (SVD)

Anomaly Detection:

  1. Isolation Forest

  2. One-Class SVM

  3. Local Outlier Factor (LOF)


Q13. 📌 Discuss Rote Learning with suitable example.

[ Dec 2022 Q2(b)(i) | Frequency: 1 ]

Answer

Rote Learning is the simplest form of machine learning where the system memorizes training examples exactly and retrieves them during inference without generalization.

Characteristics:

  • Direct Memorization: Stores input-output pairs exactly

  • No Generalization: Cannot handle new, unseen examples

  • Fast Retrieval: Quick lookup from memory

  • Memory Intensive: Requires large storage for all examples

  • Exact Matching: Only works for previously seen inputs

Working Process:

  1. Store all training examples in memory

  2. When new input arrives, search for exact match

  3. Return stored output for matched input

  4. If no match found, cannot provide answer

Example: Lookup Table for Multiplication

| Input (A × B) | Output |
|---|---|
| 2 × 3 | 6 |
| 5 × 7 | 35 |
| 8 × 9 | 72 |

  • System memorizes all multiplication facts

  • Query: 5 × 7 → Retrieves stored answer: 35

  • Query: 11 × 13 → Cannot answer (not memorized)
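
The lookup-table behaviour can be sketched with a Python dictionary: exact-match retrieval only, with no generalization to unseen inputs.

```python
# Rote learning as a lookup table: memorise exact (input, output) pairs.
memory = {(2, 3): 6, (5, 7): 35, (8, 9): 72}

def rote_answer(a, b):
    # Exact-match retrieval; None means "never memorised".
    return memory.get((a, b))

seen = rote_answer(5, 7)      # memorised, so the stored answer is returned
unseen = rote_answer(11, 13)  # not memorised, so no answer is possible
```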

Limitations:

  • Cannot generalize to new situations

  • Impractical for large problem spaces

  • No learning of underlying concepts


Q14. 🔥 Compare/Discuss Descriptive, Predictive, and Prescriptive Analytics in Machine Learning.

[ Jun 2025 Q1(b), Jun 2024 Q2(a), Jun 2023 Q1(a) | Frequency: 3 ]

Answer

| Aspect | Descriptive Analytics | Predictive Analytics | Prescriptive Analytics |
|---|---|---|---|
| Question Answered | "What happened?" | "What will happen?" | "What should we do?" |
| Purpose | Summarize historical data | Forecast future outcomes | Recommend optimal actions |
| Time Focus | Past | Future | Future actions |
| Complexity | Low | Medium | High |
| Techniques | Statistics, visualization | ML algorithms, forecasting | Optimization, simulation |
| Output | Reports, dashboards | Predictions, probabilities | Recommendations, decisions |
| Data Used | Historical data | Historical + current data | All available data |
| Human Role | Interpret results | Validate predictions | Implement recommendations |

Diagram:

plantuml diagram

Examples:

| Analytics Type | Business Example |
|---|---|
| Descriptive | "Last quarter sales were $2M, 15% higher than previous quarter" |
| Predictive | "Next quarter sales predicted to be $2.3M based on trends" |
| Prescriptive | "Increase marketing budget by 20% in Region A to maximize sales" |

Q15. 📌 Discuss Rule-based Machine Learning with suitable example.

[ Jun 2023 Q2(b)(i) | Frequency: 1 ]

Answer

Rule-based Machine Learning uses a set of IF-THEN rules to make predictions or decisions. These rules can be created by domain experts or learned from data.

Characteristics:

  • Uses explicit IF-THEN rules

  • Highly interpretable and explainable

  • Easy to modify and maintain

  • Works well with structured knowledge

  • May not handle complex patterns

Structure of Rules:

IF condition1 AND condition2 THEN action/conclusion

Learning Approaches:

  1. Expert-defined Rules: Domain experts create rules manually

  2. Rule Induction: Algorithms learn rules from data (RIPPER, OneR)

  3. Association Rule Mining: Discover rules from patterns (Apriori)

Example: Medical Diagnosis System

Rule 1: IF fever > 101°F AND cough = yes AND fatigue = yes 
        THEN diagnosis = "Flu"

Rule 2: IF fever > 101°F AND rash = yes AND joint_pain = yes 
        THEN diagnosis = "Dengue"

Rule 3: IF fever < 99°F AND runny_nose = yes AND sneezing = yes 
        THEN diagnosis = "Common Cold"
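
The three rules can be encoded directly as conditionals; this is a sketch following the example rules above, not real medical guidance.

```python
# The three IF-THEN rules from the example, checked in order.
def diagnose(fever, cough=False, fatigue=False, rash=False,
             joint_pain=False, runny_nose=False, sneezing=False):
    if fever > 101 and cough and fatigue:
        return "Flu"
    if fever > 101 and rash and joint_pain:
        return "Dengue"
    if fever < 99 and runny_nose and sneezing:
        return "Common Cold"
    return "Unknown"   # no rule fired

result = diagnose(fever=102, cough=True, fatigue=True)  # matches Rule 1
```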

Advantages:

  • Transparent decision-making

  • Easy to audit and validate

  • Domain knowledge integration

Disadvantages:

  • Difficult to handle exceptions

  • May conflict or overlap

  • Doesn't generalize well


Q16. 📌 Discuss Bayesian Algorithms with suitable example.

[ Jun 2023 Q2(b)(ii) | Frequency: 1 ]

Answer

Bayesian Algorithms are based on Bayes' Theorem, which calculates the probability of a hypothesis given observed evidence. They provide probabilistic predictions with uncertainty quantification.

Bayes' Theorem:

$$P(H|E) = \frac{P(E|H) \times P(H)}{P(E)}$$

Where:

  • P(H|E): Posterior probability (probability of hypothesis given evidence)

  • P(E|H): Likelihood (probability of evidence given hypothesis)

  • P(H): Prior probability (initial belief)

  • P(E): Evidence probability

Common Bayesian Algorithms:

  1. Naive Bayes Classifier

  2. Bayesian Networks

  3. Bayesian Linear Regression

  4. Gaussian Naive Bayes

Example: Email Spam Classification using Naive Bayes

Training Data Analysis:

  • P(Spam) = 0.3 (30% of emails are spam)

  • P("Free" | Spam) = 0.8

  • P("Free" | Not Spam) = 0.1

Prediction for new email containing "Free":

P(Spam | "Free") = P("Free"|Spam) × P(Spam) / P("Free")
                 = (0.8 × 0.3) / P("Free")
                 = High probability → Classify as SPAM
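
Completing the calculation: P("Free") follows from the law of total probability, which gives a posterior of about 0.774, so the email is classified as spam.

```python
# Numbers from the training-data analysis above.
p_spam = 0.3
p_free_given_spam = 0.8
p_free_given_ham = 0.1

# Law of total probability: P("Free") = 0.8*0.3 + 0.1*0.7 = 0.31.
p_free = p_free_given_spam * p_spam + p_free_given_ham * (1 - p_spam)

# Bayes' theorem: P(Spam | "Free") = 0.24 / 0.31 ≈ 0.774.
posterior = p_free_given_spam * p_spam / p_free
is_spam = posterior > 0.5
```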

Advantages:

  • Handles uncertainty naturally

  • Works well with small datasets

  • Provides probability scores

  • Fast training and prediction

Applications: Spam filtering, sentiment analysis, medical diagnosis, recommendation systems


Q17. 📌 What is Reinforcement Learning?

[ Dec 2022 Q1(f) | Frequency: 1 ]

Answer

Reinforcement Learning (RL) is a type of machine learning where an agent learns to make decisions by interacting with an environment. The agent receives rewards or penalties based on its actions and learns to maximize cumulative rewards over time.

Key Concepts:

  • Agent: The learner/decision-maker

  • Environment: The world the agent interacts with

  • State: Current situation of the agent

  • Action: Possible moves the agent can make

  • Reward: Feedback signal (positive or negative)

  • Policy: Strategy for choosing actions

Learning Process:

  1. Agent observes current state

  2. Agent takes an action based on policy

  3. Environment provides reward and new state

  4. Agent updates policy to maximize future rewards

  5. Process repeats until goal is achieved

Examples:

  • Game playing (Chess, Go, Video games)

  • Robot navigation

  • Autonomous vehicles

  • Stock trading algorithms


Q18. ⭐ Discuss Reinforcement Learning.

[ Dec 2024 Q1(a) | Frequency: 1 ]

Answer

Reinforcement Learning (RL) is a learning paradigm where an agent learns optimal behavior through trial and error interactions with the environment, guided by reward signals.

Core Components:

| Component | Description |
|---|---|
| Agent | Entity that takes actions and learns |
| Environment | External system agent interacts with |
| State (S) | Current situation representation |
| Action (A) | Set of possible moves |
| Reward (R) | Numerical feedback for actions |
| Policy (π) | Mapping from states to actions |
| Value Function | Expected cumulative reward from state |

Types of RL:

  1. Positive Reinforcement: Reward for good actions

  2. Negative Reinforcement: Penalty for bad actions

Key Characteristics:

  • No labeled data required

  • Learns from consequences of actions

  • Balances exploration vs exploitation

  • Goal is long-term reward maximization

Applications:

  • AlphaGo (game playing)

  • Self-driving cars

  • Robotics control

  • Resource management

  • Personalized recommendations


Q19. 🔥 Explain Reinforcement Learning with block diagram and component roles.

[ Dec 2023 Q4(a) | Frequency: 1 ]

Answer

Reinforcement Learning is a goal-oriented learning approach where an agent learns to achieve goals by interacting with an environment.

Diagram:

plantuml diagram

Component Roles:

| Component | Role |
|---|---|
| Agent | Decision-maker that selects actions based on policy to maximize rewards |
| Environment | External system that responds to agent's actions with new states and rewards |
| State (St) | Complete description of current situation; basis for decision-making |
| Action (At) | Choice made by agent from available options; affects environment |
| Reward (Rt) | Immediate feedback signal; guides learning toward goal |
| Policy (π) | Strategy mapping states to actions; can be deterministic or probabilistic |
| Value Function | Estimates expected cumulative future reward from each state |
| Q-Function | Estimates value of taking specific action in specific state |

Learning Loop:

  1. Agent observes state St

  2. Agent selects action At using policy

  3. Environment transitions to state St+1

  4. Environment provides reward Rt+1

  5. Agent updates policy based on experience

  6. Repeat until convergence
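
The loop above can be sketched with tabular Q-learning on an invented 3-state chain (states 0 → 1 → 2, with +1 reward on reaching the goal state 2); exploration is omitted to keep the sketch short.

```python
# Tabular Q-learning on a toy chain environment (invented for illustration).
alpha, gamma = 0.5, 0.9
Q = {(s, a): 0.0 for s in (0, 1, 2) for a in ("right", "stay")}

def step(state, action):
    # "right" moves one state forward; reaching state 2 pays +1.
    if action == "right":
        nxt = min(state + 1, 2)
        return nxt, (1.0 if nxt == 2 else 0.0)
    return state, 0.0

for _ in range(200):               # many short episodes
    s = 0
    while s != 2:
        a = "right"                # fixed policy; real agents also explore
        s2, r = step(s, a)
        best_next = max(Q[(s2, b)] for b in ("right", "stay"))
        # Q-learning update: Q(s,a) += alpha * (R + gamma*max Q(s',a') - Q(s,a))
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2
```

The values converge to Q(1, right) = 1.0 and Q(0, right) = γ × 1.0 = 0.9, reflecting the discounted distance to the reward.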


Q20. 📌 Classify the various Reinforcement Learning algorithms.

[ Dec 2022 Q1(f) | Frequency: 1 ]

Answer

Diagram:

plantuml diagram

Classification Summary:

| Category | Subcategory | Algorithms |
|---|---|---|
| Model-Based | - | Dynamic Programming, Dyna-Q |
| Model-Free | Value-Based | Q-Learning, SARSA, DQN |
| Model-Free | Policy-Based | REINFORCE, Policy Gradient |
| Model-Free | Actor-Critic | A2C, A3C, PPO, SAC, DDPG |
| On-Policy | - | SARSA, A2C, PPO |
| Off-Policy | - | Q-Learning, DQN, DDPG |

Q21. 📌 Discuss Delayed-Reinforcement Learning with suitable example.

[ Dec 2022 Q2(b)(iv) | Frequency: 1 ]

Answer

Delayed Reinforcement Learning refers to situations where rewards are not received immediately after an action but are delayed until much later, often after a sequence of actions.

Characteristics:

  • Reward signal comes after many time steps

  • Difficult to attribute credit to specific actions

  • Requires temporal credit assignment

  • Common in real-world problems

Challenges:

  1. Credit Assignment Problem: Which actions led to the reward?

  2. Long-term Dependencies: Early actions affect late rewards

  3. Exploration: Must try sequences to discover rewards

  4. Sparse Rewards: Few feedback signals to learn from

Example: Chess Game

  • Agent plays hundreds of moves during a game

  • Reward: +1 for winning, -1 for losing, 0 for draw

  • Reward received only at game end

  • Challenge: Which moves were responsible for the outcome?

Timeline:

Move 1 → Move 2 → Move 3 → ... → Move 50 → Win (+1 Reward)
  ↑
Which move was crucial for winning?

Solutions:

  1. Temporal Difference (TD) Learning: Estimate value at each step

  2. Eligibility Traces: Track which states were visited

  3. Monte Carlo Methods: Average rewards over complete episodes

  4. Reward Shaping: Add intermediate rewards to guide learning
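
Monte Carlo-style credit assignment can be sketched by computing discounted returns, so earlier moves receive a smaller share of the single delayed reward (episode and γ invented for illustration).

```python
# Discounted return G_t = r_{t+1} + gamma*r_{t+2} + ... for an episode
# whose only reward is +1 at the end (delayed reinforcement).
gamma = 0.9
rewards = [0.0, 0.0, 0.0, 1.0]   # reward received after each of 4 moves

returns = []
g = 0.0
for r in reversed(rewards):      # accumulate from the end backwards
    g = r + gamma * g
    returns.append(g)
returns.reverse()                # returns[t] = discounted return from move t
```

The first move is credited 0.9³ = 0.729, the last move the full 1.0, which is how discounting spreads a delayed reward over the sequence.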

Other Examples:

  • Robot learning to walk (reward only when destination reached)

  • Stock trading (profit/loss realized later)

  • Drug treatment effects (outcomes visible after months)


Q22. ⭐ Compare Model-free Reinforcement Learning with Model-based Reinforcement Learning.

[ Jun 2024 Q4(a) | Frequency: 1 ]

Answer

| Aspect | Model-Free RL | Model-Based RL |
|---|---|---|
| Definition | Learns directly from experience without modeling environment | Learns a model of environment, then plans |
| Environment Model | Not required | Required (learned or given) |
| Sample Efficiency | Low (needs many interactions) | High (can plan with model) |
| Computation | Less during training | More (model learning + planning) |
| Flexibility | Adapts easily to changes | Model may become outdated |
| Accuracy | Limited by experience | Limited by model accuracy |
| Learning Speed | Slower | Faster with good model |
| Memory | Stores value/policy | Stores model of dynamics |
| Planning | No explicit planning | Can simulate and plan ahead |
| Examples | Q-Learning, SARSA, Policy Gradient | Dyna-Q, PILCO, World Models |

Diagram:

```mermaid
flowchart TB
  subgraph "Model-Free RL"
    A1[Agent] --> A2[Interact with Environment]
    A2 --> A3[Learn Policy/Value Directly]
    A3 --> A1
  end
  subgraph "Model-Based RL"
    B1[Agent] --> B2[Learn Environment Model]
    B2 --> B3[Plan Using Model]
    B3 --> B4[Execute Action]
    B4 --> B1
  end
```

When to Use:

  • Model-Free: When environment is complex, hard to model

  • Model-Based: When sample efficiency is critical, model is simple


Q23. ⭐ Discuss the sub-classes of Model-free Reinforcement Learning.

[ Jun 2024 Q4(a) | Frequency: 1 ]

Answer

Model-Free RL has three main sub-classes:

Diagram:

plantuml diagram

1. Value-Based Methods:

  • Learn value function (V or Q)

  • Policy derived from value function

  • Algorithms: Q-Learning, SARSA, DQN

| Feature | Description |
|---|---|
| Learning | Estimates expected rewards for state-action pairs |
| Policy | Implicit (choose action with max value) |
| Best For | Discrete action spaces |
| Example | Q-Learning: Q(s,a) ← Q(s,a) + α[R + γ max Q(s',a') - Q(s,a)] |

2. Policy-Based Methods:

  • Learn policy directly (π: state → action)

  • No value function required

  • Algorithms: REINFORCE, Policy Gradient

| Feature | Description |
|---|---|
| Learning | Directly optimizes policy parameters |
| Output | Probability distribution over actions |
| Best For | Continuous action spaces, stochastic policies |
| Advantage | Can learn stochastic policies |

3. Actor-Critic Methods:

  • Combines value-based and policy-based

  • Actor: Learns policy

  • Critic: Evaluates actions using value function

  • Algorithms: A2C, A3C, PPO, SAC

| Feature | Description |
|---|---|
| Actor | Updates policy in direction suggested by critic |
| Critic | Estimates value function to guide actor |
| Advantage | Lower variance than pure policy gradient |
| Best For | Complex, high-dimensional problems |

Q24. ⭐ Discuss Ensemble Learning / What is Ensemble Learning.

[ Dec 2024 Q1(g), Jun 2023 Q1(f) | Frequency: 2 ]

Answer

Ensemble Learning is a machine learning technique that combines multiple models (base learners) to produce a more accurate and robust prediction than any individual model.

Core Principle: "Wisdom of the crowd" - combining diverse opinions often yields better results.

Key Ensemble Methods:

| Method | Description | Algorithms |
|---|---|---|
| Bagging | Train models on random subsets; aggregate by voting/averaging | Random Forest |
| Boosting | Train models sequentially; each corrects predecessor's errors | AdaBoost, XGBoost, Gradient Boosting |
| Stacking | Train meta-model on predictions of base models | Stacked Generalization |
| Voting | Combine predictions by majority vote (classification) or average (regression) | Voting Classifier |

Diagram:

```mermaid
flowchart TB
  D[Training Data] --> M1[Model 1]
  D --> M2[Model 2]
  D --> M3[Model 3]
  D --> M4[Model N]
  M1 --> C[Combine Predictions]
  M2 --> C
  M3 --> C
  M4 --> C
  C --> F[Final Prediction]
```

Advantages:

  • Higher accuracy than individual models

  • Reduced overfitting

  • More robust predictions

  • Handles noise better

Example - Random Forest:

  • Combines multiple decision trees

  • Each tree trained on different data subset

  • Final prediction: majority vote of all trees
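
A hard-voting sketch with three invented base models: the ensemble prediction for each input is the most common of the individual predictions.

```python
from collections import Counter

# Predictions from three toy "models" on the same three inputs.
predictions = {
    "model_1": ["spam", "ham", "spam"],
    "model_2": ["spam", "spam", "spam"],
    "model_3": ["ham",  "ham", "spam"],
}

def majority_vote(preds_per_model, i):
    # Collect every model's vote for input i and return the most common.
    votes = [preds[i] for preds in preds_per_model.values()]
    return Counter(votes).most_common(1)[0][0]

ensemble = [majority_vote(predictions, i) for i in range(3)]
```

On the second input, model_2's "spam" is outvoted 2-to-1, illustrating how the ensemble can overrule an individual model's error.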


Q25. 📌 Explain Ensemble Methods.

[ Dec 2022 Q5(g) | Frequency: 1 ]

Answer

Ensemble Methods combine predictions from multiple machine learning models to improve overall performance.

Types of Ensemble Methods:

1. Bagging (Bootstrap Aggregating):

  • Creates multiple training sets by sampling with replacement

  • Trains independent models on each set

  • Aggregates predictions (voting/averaging)

  • Example: Random Forest

2. Boosting:

  • Trains models sequentially

  • Each model focuses on errors of previous models

  • Weighted combination of models

  • Examples: AdaBoost, Gradient Boosting, XGBoost

3. Stacking:

  • Trains diverse base models

  • Uses another model (meta-learner) to combine predictions

  • More complex but often more accurate

4. Voting:

  • Hard Voting: Majority class prediction

  • Soft Voting: Average of predicted probabilities

Diagram:

plantuml diagram

Comparison:

| Aspect | Bagging | Boosting | Stacking |
|---|---|---|---|
| Training | Parallel | Sequential | Two-level |
| Focus | Reduce variance | Reduce bias | Combine strengths |
| Overfitting | Low risk | Higher risk | Moderate |
| Complexity | Low | Medium | High |

Summary Table: Unit 9 - Machine Learning Methods

| Topic | Key Points |
|---|---|
| ML Cycle | Data Collection → Preparation → Training → Evaluation → Deployment |
| Supervised Learning | Labeled data, classification/regression |
| Unsupervised Learning | Unlabeled data, clustering/association |
| Reinforcement Learning | Agent-environment interaction, rewards |
| Analytics Types | Descriptive → Predictive → Prescriptive |
| Ensemble Methods | Bagging, Boosting, Stacking |
| Model-Free RL | Value-based, Policy-based, Actor-Critic |

UNIT 10: Classification


Q1. ⭐ Differentiate between Lazy Learners and Eager Learners in Classification.

[ Dec 2023 Q1(e), Dec 2022 Q3(b) | Frequency: 2 ]

Answer

| Aspect | Lazy Learners | Eager Learners |
|---|---|---|
| Definition | Defer processing until a query is made | Build classification model during training |
| Training Time | Minimal (just stores data) | High (learns from training data) |
| Prediction Time | Slow (processes at query time) | Fast (model already built) |
| Memory Usage | High (stores all training data) | Low (stores only model) |
| Model | No explicit model created | Creates explicit model |
| Adaptability | Easily adapts to new data | Requires retraining for new data |
| Also Called | Instance-based learning | Model-based learning |
| Generalization | Local approximation | Global approximation |
| Handling Changes | Better with dynamic data | Better with static data |

Key Difference: Lazy learners wait until prediction time to generalize, while eager learners generalize during the training phase itself.


Q2. ⭐ List/Give examples of algorithms used for Lazy Learners.

[ Dec 2023 Q1(e), Dec 2022 Q3(b) | Frequency: 2 ]

Answer

Lazy Learner Algorithms:

  1. K-Nearest Neighbors (KNN)
     • Classifies based on majority class of K nearest neighbors
     • Stores all training instances

  2. Case-Based Reasoning (CBR)
     • Solves new problems using solutions from similar past cases
     • Maintains case library

  3. Locally Weighted Learning (LWL)
     • Builds local models around query point
     • Weights instances by proximity

  4. Radius Neighbors Classifier
     • Similar to KNN but uses fixed radius instead of K neighbors

  5. Lazy Bayesian Rules
     • Constructs Bayesian rules only when needed
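
KNN, the canonical lazy learner, can be sketched in a few lines; note that "training" is nothing more than storing the (invented) data, and all distance computation happens at query time.

```python
from collections import Counter

# Lazy learning: store labelled 1-D points; defer all work to query time.
train = [(1.0, "A"), (1.5, "A"), (2.0, "A"),
         (8.0, "B"), (8.5, "B"), (9.0, "B")]

def knn_predict(query, k=3):
    # At query time: sort stored points by distance, take the k nearest,
    # and return the majority class among them.
    nearest = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

label = knn_predict(2.5)   # the three nearest points are all class "A"
```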

Q3. ⭐ List/Give examples of algorithms used for Eager Learners.

[ Dec 2023 Q1(e), Dec 2022 Q3(b) | Frequency: 2 ]

Answer

Eager Learner Algorithms:

  1. Decision Trees (ID3, C4.5, CART)
     • Builds tree structure during training

  2. Naive Bayes Classifier
     • Learns probability distributions from training data

  3. Support Vector Machine (SVM)
     • Constructs optimal hyperplane during training

  4. Neural Networks
     • Learns weights through backpropagation

  5. Logistic Regression
     • Learns model parameters during training

  6. Random Forest
     • Builds ensemble of decision trees

  7. Linear Discriminant Analysis (LDA)
     • Computes discriminant functions during training

Q4. ⭐ Discuss the concept of Classification.

[ Jun 2025 Q1(f) | Frequency: 1 ]

Answer

Classification is a supervised machine learning technique that categorizes data into predefined classes or labels based on input features.

Key Characteristics:

  • Supervised Learning: Requires labeled training data

  • Discrete Output: Predicts categorical class labels

  • Pattern Recognition: Learns decision boundaries from data

Working Process:

Diagram:

plantuml diagram

Components:

  1. Training Phase: Algorithm learns from labeled examples

  2. Model Building: Creates decision boundaries

  3. Prediction Phase: Classifies new, unseen instances

Types:

  • Binary Classification (2 classes)

  • Multi-class Classification (>2 classes)

  • Multi-label Classification (multiple labels per instance)

Applications:

  • Spam detection, Medical diagnosis, Image recognition, Sentiment analysis

Q5. 🔥 Give/List the algorithms for Classification.

[ Jun 2025 Q1(f), Jun 2024 Q1(g), Dec 2023 Q1(a) | Frequency: 3 ]

Answer

Classification Algorithms:

| Category | Algorithms |
|---|---|
| Tree-Based | Decision Tree, Random Forest, Gradient Boosting, XGBoost |
| Probabilistic | Naive Bayes, Bayesian Networks |
| Linear | Logistic Regression, Linear Discriminant Analysis (LDA) |
| Instance-Based | K-Nearest Neighbors (KNN) |
| Kernel-Based | Support Vector Machine (SVM) |
| Neural Networks | Perceptron, Multi-layer Perceptron, Deep Neural Networks |
| Ensemble | AdaBoost, Bagging, Voting Classifier |

Most Commonly Used:

  1. Logistic Regression - Simple, interpretable

  2. Decision Trees - Easy to visualize and understand

  3. Random Forest - High accuracy, handles overfitting

  4. SVM - Effective in high-dimensional spaces

  5. KNN - Simple, no training phase

  6. Naive Bayes - Fast, works well with text data

  7. Neural Networks - Complex pattern recognition


Q6. 📌 Give suitable example for Classification.

[ Jun 2024 Q1(g) | Frequency: 1 ]

Answer

Example 1: Email Spam Classification

| Feature | Value |
|---|---|
| Contains "Free" | Yes |
| Contains "Winner" | Yes |
| Sender in contacts | No |
| Has attachments | Yes |
| Class Label | SPAM |

Example 2: Medical Diagnosis

  • Input: Patient symptoms, test results, age, medical history

  • Output: Disease category (Diabetic / Non-Diabetic)

Example 3: Loan Approval

  • Features: Income, credit score, employment status, debt ratio

  • Classes: Approved / Rejected

Example 4: Image Classification

  • Input: Pixel values of an image

  • Output: Cat / Dog / Bird

Example 5: Sentiment Analysis

  • Input: Customer review text

  • Output: Positive / Negative / Neutral


Q7. ⭐ Compare/Differentiate Clustering and Classification.

[ Jun 2024 Q1(g), Dec 2023 Q1(a) | Frequency: 2 ]

Answer

| Aspect | Classification | Clustering |
|---|---|---|
| Learning Type | Supervised | Unsupervised |
| Labels | Requires labeled data | No labels required |
| Goal | Predict predefined classes | Discover natural groupings |
| Categories | Known beforehand | Unknown, discovered from data |
| Output | Class label for each instance | Cluster assignment |
| Training | Learns from examples | Finds patterns in data |
| Evaluation | Accuracy, Precision, Recall | Silhouette score, Inertia |
| Algorithms | SVM, Decision Trees, KNN | K-Means, DBSCAN, Hierarchical |
| Use Case | Spam detection | Customer segmentation |
| Process | Assign to existing categories | Create new categories |

Diagram:

```mermaid
flowchart LR
  subgraph Classification
    A1[Labeled Data] --> B1[Train Model]
    B1 --> C1[Predict Class]
  end
  subgraph Clustering
    A2[Unlabeled Data] --> B2[Find Patterns]
    B2 --> C2[Discover Groups]
  end
```

Q8. 📌 What is Binary Classification?

[ Jun 2023 Q3(b) | Frequency: 1 ]

Answer

Binary Classification is a type of classification where the output has exactly two possible classes.

Characteristics:

  • Only two mutually exclusive categories

  • Most fundamental classification type

  • Output: Class 0 or Class 1 (Yes/No, True/False)

Examples:

| Problem | Class 0 | Class 1 |
|---|---|---|
| Email filtering | Not Spam | Spam |
| Medical test | Negative | Positive |
| Loan approval | Reject | Approve |
| Fraud detection | Legitimate | Fraudulent |
| Sentiment | Negative | Positive |

Common Algorithms:

  • Logistic Regression

  • Support Vector Machine (SVM)

  • Decision Trees

  • Neural Networks with sigmoid output

Evaluation Metrics:

  • Accuracy, Precision, Recall, F1-Score

  • ROC-AUC Curve

  • Confusion Matrix (2×2)
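
The four confusion-matrix counts and the derived metrics can be computed directly; the label sequences below are invented for illustration.

```python
# Toy binary run: 1 = positive class, 0 = negative class.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 0, 1, 1, 0]

# The four cells of the 2x2 confusion matrix.
tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
tn = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))
fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))
fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))

accuracy  = (tp + tn) / len(actual)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1        = 2 * precision * recall / (precision + recall)
```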


Q9. ⭐ Differentiate between Multi-class Classification and Multi-label Classification.

[ Jun 2022 Q3(b) | Frequency: 1 ]

Answer
Aspect Multi-class Classification Multi-label Classification
Definition Each instance belongs to exactly one class from multiple classes Each instance can belong to multiple classes simultaneously
Classes per Instance Exactly one Zero, one, or more
Mutual Exclusivity Classes are mutually exclusive Classes are NOT mutually exclusive
Output Format Single class label Set of class labels
Example Problem Fruit classification (Apple/Orange/Banana) Movie genre (Action AND Comedy AND Romance)
Algorithms One-vs-Rest, One-vs-One, Softmax Binary Relevance, Classifier Chains
Evaluation Standard accuracy, F1-score Hamming loss, Subset accuracy


Q10. 📌 Give an example of Multi-class Classification.

[ Jun 2022 Q3(b) | Frequency: 1 ]

Answer

Example: Handwritten Digit Recognition (MNIST)

  • Input: 28×28 pixel image of handwritten digit

  • Classes: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 (10 classes)

  • Output: Single digit class (mutually exclusive)

Other Examples:

Problem Classes
Iris Flower Classification Setosa, Versicolor, Virginica
News Categorization Sports, Politics, Entertainment, Technology
Animal Classification Cat, Dog, Elephant, Lion, Tiger
Weather Prediction Sunny, Rainy, Cloudy, Snowy
Customer Segmentation Premium, Standard, Basic

Key Point: Each instance is assigned to exactly ONE class from multiple available options.


Q11. 📌 Give an example of Multi-label Classification.

[ Jun 2022 Q3(b) | Frequency: 1 ]

Answer

Example: Movie Genre Classification

  • Input: Movie plot, cast, keywords

  • Possible Labels: Action, Comedy, Romance, Thriller, Horror, Sci-Fi

  • Output: Multiple genres can apply

Movie Labels Assigned
"Avengers" Action, Sci-Fi, Adventure
"Titanic" Romance, Drama
"Scary Movie" Horror, Comedy

Other Examples:

Problem Possible Labels
Document Tagging Politics, Sports, Finance, Health (multiple tags)
Image Annotation Beach, Sunset, People, Mountains
Music Classification Pop, Rock, Electronic, Vocal
Protein Function Multiple biological functions
Product Categorization Electronics, Home, Office, Portable

Key Point: Each instance can have zero, one, or multiple labels simultaneously.
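Because each instance carries a set of labels, multi-label evaluation metrics such as Hamming loss (mentioned in Q9) compare label sets rather than single labels. A minimal sketch with hypothetical genre sets:

```python
# Hypothetical label space and label sets for one instance.
label_space = {"Action", "Comedy", "Romance", "Horror", "Sci-Fi"}

true_labels = {"Horror", "Comedy"}   # actual genres
pred_labels = {"Horror", "Action"}   # model's predicted genres

# Hamming loss: fraction of per-label decisions that disagree
# between truth and prediction.
wrong = true_labels ^ pred_labels    # symmetric difference: {"Comedy", "Action"}
hamming_loss = len(wrong) / len(label_space)
print(hamming_loss)  # 0.4
```

Here 2 of the 5 label decisions are wrong (Comedy missed, Action wrongly added), giving a Hamming loss of 0.4.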


Q12. ⭐ Can binary classification algorithms be altered for multi-class problems? Justify.

[ Jun 2023 Q3(b) | Frequency: 1 ]

Answer

Yes, binary classification algorithms can be extended for multi-class problems using decomposition strategies.

Justification:

Many powerful algorithms (SVM, Logistic Regression) are inherently binary. To handle multiple classes, we decompose the multi-class problem into multiple binary problems.

Two Main Approaches:

Approach Method Binary Classifiers Needed
One-vs-Rest (OvR) One classifier per class vs all others N classifiers for N classes
One-vs-One (OvO) One classifier per pair of classes N(N-1)/2 classifiers

Example with 4 Classes (A, B, C, D):

  • OvR: 4 classifiers (A vs Rest, B vs Rest, C vs Rest, D vs Rest)

  • OvO: 6 classifiers (A vs B, A vs C, A vs D, B vs C, B vs D, C vs D)

Algorithms Supporting Multi-class Natively:

  • Decision Trees

  • Random Forest

  • Naive Bayes

  • Neural Networks (with softmax)

Conclusion: Binary algorithms remain valuable through decomposition strategies, making them versatile for multi-class classification.
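The classifier counts for the two decomposition strategies can be checked directly; the 4-class example above works out as follows:

```python
from itertools import combinations

def ovr_count(n_classes: int) -> int:
    # One-vs-Rest: one binary classifier per class
    return n_classes

def ovo_count(n_classes: int) -> int:
    # One-vs-One: one binary classifier per unordered pair of classes
    return n_classes * (n_classes - 1) // 2

classes = ["A", "B", "C", "D"]
print(ovr_count(len(classes)))         # 4
print(ovo_count(len(classes)))         # 6
print(list(combinations(classes, 2)))  # the 6 OvO pairings
```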


Q13. 📌 Discuss One-versus-the-Rest (OvR) Approach.

[ Jun 2023 Q3(b) | Frequency: 1 ]

Answer

One-vs-Rest (OvR), also called One-vs-All (OvA), is a strategy to extend binary classifiers for multi-class classification.

Working Principle:

  • For N classes, train N binary classifiers

  • Each classifier distinguishes one class from all other classes combined

  • During prediction, choose class with highest confidence score

Diagram:

flowchart TB subgraph "Training (3 Classes: A, B, C)" C1[Classifier 1: A vs Not-A] C2[Classifier 2: B vs Not-B] C3[Classifier 3: C vs Not-C] end subgraph "Prediction" I[New Instance] --> C1 I --> C2 I --> C3 C1 --> D[Choose Highest\nConfidence] C2 --> D C3 --> D D --> O[Final Class] end

Example with Classes {Cat, Dog, Bird}:

  • Classifier 1: Cat vs (Dog + Bird)

  • Classifier 2: Dog vs (Cat + Bird)

  • Classifier 3: Bird vs (Cat + Dog)

Advantages:

  • Computationally efficient (only N classifiers)

  • Interpretable results

  • Works with any binary classifier

Disadvantages:

  • Class imbalance (one class vs many)

  • Ambiguous regions possible


Q14. 📌 Discuss One-versus-One (OvO) Approach.

[ Jun 2023 Q3(b) | Frequency: 1 ]

Answer

One-vs-One (OvO) is a multi-class strategy that trains a binary classifier for every pair of classes.

Working Principle:

  • For N classes, train N(N-1)/2 binary classifiers

  • Each classifier distinguishes between two specific classes

  • Prediction uses voting: each classifier votes for one class

  • Final prediction: class with most votes

Diagram:

flowchart TB subgraph "Training (3 Classes: A, B, C)" C1[Classifier 1: A vs B] C2[Classifier 2: A vs C] C3[Classifier 3: B vs C] end subgraph "Prediction" I[New Instance] --> C1 I --> C2 I --> C3 C1 --> V[Voting] C2 --> V C3 --> V V --> O["Final Class\n(Max Votes)"] end

Example with Classes {Cat, Dog, Bird}:

  • Classifier 1: Cat vs Dog

  • Classifier 2: Cat vs Bird

  • Classifier 3: Dog vs Bird

Number of Classifiers: N(N-1)/2 = 3(2)/2 = 3

Advantages:

  • Balanced training data for each classifier

  • Each classifier focuses on only two classes

  • Often more accurate than OvR

Disadvantages:

  • More classifiers needed (O(N²) complexity)

  • Higher computational cost for large N
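The OvO voting rule can be sketched as follows; the pairwise winners here are hypothetical classifier outputs for a single test instance:

```python
from itertools import combinations
from collections import Counter

classes = ["Cat", "Dog", "Bird"]
pairs = list(combinations(classes, 2))  # N(N-1)/2 = 3 pairwise classifiers

# Hypothetical winner reported by each pairwise classifier for one instance
pairwise_winner = {
    ("Cat", "Dog"): "Cat",
    ("Cat", "Bird"): "Cat",
    ("Dog", "Bird"): "Dog",
}

# Each classifier casts one vote; the class with the most votes wins
votes = Counter(pairwise_winner[p] for p in pairs)
prediction = votes.most_common(1)[0][0]
print(prediction)  # Cat (2 votes vs 1)
```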


Q15. ⭐ Compare Classification and Regression techniques of Supervised Learning.

[ Dec 2023 Q4(b), Dec 2022 Q3(b) | Frequency: 2 ]

Answer
Aspect Classification Regression
Output Type Discrete (categorical labels) Continuous (numerical values)
Prediction Predicts class/category Predicts quantity/value
Goal Assign to predefined categories Estimate numerical relationship
Decision Boundary Creates boundaries between classes Fits a curve/line to data
Evaluation Metrics Accuracy, Precision, Recall, F1 MSE, RMSE, MAE, R²
Loss Function Cross-entropy, Hinge loss Mean Squared Error
Output Example "Spam" or "Not Spam" Price = $250,000
Algorithms SVM, Decision Trees, KNN Linear Regression, SVR


Examples:

Classification Regression
Disease diagnosis (Yes/No) Temperature prediction (°C)
Email spam detection House price estimation
Image recognition (Cat/Dog) Stock price prediction
Credit approval (Accept/Reject) Sales forecasting

Q16. 📌 Give an example of Classification techniques.

[ Dec 2022 Q3(b) | Frequency: 1 ]

Answer

Example: Credit Card Fraud Detection

Problem: Identify whether a transaction is fraudulent or legitimate

Technique: Decision Tree Classification

Features (Input):

Feature Value
Transaction Amount $5,000
Location Overseas
Time 3:00 AM
Merchant Category Electronics
Previous Fraud History No

Target Classes: Fraudulent / Legitimate

Decision Tree Logic:

IF Amount > $1000 AND Location = Overseas AND Time = Late Night
    THEN Class = "Fraudulent"
ELSE
    Class = "Legitimate"

Other Classification Examples:

  1. Naive Bayes → Spam email detection

  2. SVM → Text categorization

  3. Random Forest → Disease prediction

  4. KNN → Handwriting recognition

  5. Logistic Regression → Customer churn prediction


Q17. ⭐ Explain the various metrics used for evaluating the classification model.

[ Dec 2023 Q4(b) | Frequency: 1 ]

Answer

Classification Evaluation Metrics:

1. Confusion Matrix Components:

Predicted Positive Predicted Negative
Actual Positive True Positive (TP) False Negative (FN)
Actual Negative False Positive (FP) True Negative (TN)

2. Key Metrics:

Metric Formula Interpretation
Accuracy (TP + TN) / (TP + TN + FP + FN) Overall correctness
Precision TP / (TP + FP) Correctness of positive predictions
Recall (Sensitivity) TP / (TP + FN) Coverage of actual positives
Specificity TN / (TN + FP) Coverage of actual negatives
F1-Score 2 × (Precision × Recall) / (Precision + Recall) Harmonic mean of precision and recall

3. Advanced Metrics:

Metric Use Case
ROC-AUC Overall model performance across thresholds
Log Loss Probabilistic confidence of predictions
Cohen's Kappa Agreement beyond chance
Matthews Correlation Balanced measure for imbalanced data

Choosing Metrics:

  • Balanced data → Accuracy

  • Imbalanced data → Precision, Recall, F1

  • Cost-sensitive → Custom weighted metrics


Q18. 🔥 Draw/Explain Confusion Matrix.

[ Jun 2023 Q1(g), Jun 2022 Q5(h) | Frequency: 2 ]

Answer

Confusion Matrix is a table that summarizes the performance of a classification model by comparing actual vs. predicted values.


Matrix Structure (Binary Classification):

Predicted: YES Predicted: NO
Actual: YES TP (Correct) FN (Type II Error)
Actual: NO FP (Type I Error) TN (Correct)

Components Explained:

Term Meaning Example (Disease Detection)
True Positive (TP) Correctly predicted positive Sick patient correctly diagnosed
True Negative (TN) Correctly predicted negative Healthy patient correctly identified
False Positive (FP) Incorrectly predicted positive Healthy patient wrongly diagnosed sick
False Negative (FN) Incorrectly predicted negative Sick patient wrongly identified healthy

Example:

Predicted Spam Predicted Not Spam
Actual Spam 85 (TP) 15 (FN)
Actual Not Spam 10 (FP) 90 (TN)

Derived Metrics:

  • Accuracy = (85+90)/(85+90+10+15) = 87.5%

  • Precision = 85/(85+10) = 89.5%

  • Recall = 85/(85+15) = 85%
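The derived metrics above can be computed directly from the four confusion-matrix counts; this sketch uses the spam example (TP=85, FN=15, FP=10, TN=90):

```python
def classification_metrics(tp, fn, fp, tn):
    """Derive standard metrics from a 2x2 confusion matrix."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "specificity": specificity, "f1": f1}

# Counts from the spam confusion matrix above
m = classification_metrics(tp=85, fn=15, fp=10, tn=90)
print(round(m["accuracy"], 3))   # 0.875
print(round(m["precision"], 3))  # 0.895
print(round(m["recall"], 3))     # 0.85
```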


Q19. 📌 Write formula for Accuracy.

[ Jun 2023 Q1(g) | Frequency: 1 ]

Answer

Accuracy measures the proportion of correct predictions among total predictions.

Formula:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Or equivalently:

$$\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}}$$

Where:

  • TP = True Positives

  • TN = True Negatives

  • FP = False Positives

  • FN = False Negatives

Example: If TP=80, TN=70, FP=10, FN=20

$$\text{Accuracy} = \frac{80 + 70}{80 + 70 + 10 + 20} = \frac{150}{180} = 0.833 = 83.3\%$$

Note: Accuracy can be misleading for imbalanced datasets.


Q20. 📌 Write formula for Precision.

[ Jun 2023 Q1(g) | Frequency: 1 ]

Answer

Precision measures the accuracy of positive predictions (how many predicted positives are actually positive).

Formula:

$$\text{Precision} = \frac{TP}{TP + FP}$$

Where:

  • TP = True Positives

  • FP = False Positives

Example: If TP=80, FP=20

$$\text{Precision} = \frac{80}{80 + 20} = \frac{80}{100} = 0.80 = 80\%$$

Interpretation: Of all instances predicted as positive, 80% were actually positive.

Use Case: Important when false positives are costly (e.g., spam detection - don't want to mark important emails as spam).


Q21. 📌 Write formula for Sensitivity (Recall).

[ Jun 2023 Q1(g) | Frequency: 1 ]

Answer

Sensitivity (also called Recall or True Positive Rate) measures the proportion of actual positives correctly identified.

Formula:

$$\text{Sensitivity (Recall)} = \frac{TP}{TP + FN}$$

Where:

  • TP = True Positives

  • FN = False Negatives

Example: If TP=80, FN=20

$$\text{Sensitivity} = \frac{80}{80 + 20} = \frac{80}{100} = 0.80 = 80\%$$

Interpretation: 80% of all actual positive cases were correctly identified.

Use Case: Critical when missing positives is costly (e.g., cancer detection - want to catch all cases).


Q22. 📌 Write formula for Specificity.

[ Jun 2023 Q1(g) | Frequency: 1 ]

Answer

Specificity (also called True Negative Rate) measures the proportion of actual negatives correctly identified.

Formula:

$$\text{Specificity} = \frac{TN}{TN + FP}$$

Where:

  • TN = True Negatives

  • FP = False Positives

Example: If TN=90, FP=10

$$\text{Specificity} = \frac{90}{90 + 10} = \frac{90}{100} = 0.90 = 90\%$$

Interpretation: 90% of all actual negative cases were correctly identified.

Use Case: Important when false positives are problematic (e.g., drug testing - don't want to falsely accuse).

Relationship with Sensitivity:

  • High Sensitivity → Fewer false negatives

  • High Specificity → Fewer false positives

  • Trade-off often exists between the two

10.5.1 Naive Bayes

Q23. ⭐ Write Naive Bayes' Algorithm.

[ Jun 2024 Q4(b) | Frequency: 1 ]

Answer

Naive Bayes Algorithm is a probabilistic classifier based on Bayes' Theorem with the "naive" assumption of feature independence.

Bayes' Theorem:

$$P(C|X) = \frac{P(X|C) \times P(C)}{P(X)}$$

Algorithm Steps:

NAIVE BAYES ALGORITHM

Input: Training dataset D with features X and class labels C
       New instance x = (x₁, x₂, ..., xₙ)

Step 1: CALCULATE PRIOR PROBABILITIES
        For each class cᵢ:
            P(cᵢ) = Count(cᵢ) / Total instances

Step 2: CALCULATE LIKELIHOOD PROBABILITIES
        For each class cᵢ and each feature xⱼ:
            P(xⱼ|cᵢ) = Count(xⱼ in cᵢ) / Count(cᵢ)

Step 3: APPLY NAIVE BAYES FORMULA
        For each class cᵢ:
            P(cᵢ|X) ∝ P(cᵢ) × ∏ P(xⱼ|cᵢ)
                              j=1 to n

Step 4: CLASSIFY
        Predicted class = argmax P(cᵢ|X)
                          cᵢ

Output: Class with maximum posterior probability

Key Assumption: All features are conditionally independent given the class label.
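The four steps above can be sketched as a counting-based classifier; the toy (outlook, temperature) dataset here is hypothetical, chosen only to keep the counts small:

```python
from collections import Counter, defaultdict

# Hypothetical toy dataset: (outlook, temperature) -> play
data = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rain", "mild"), "yes"),
    (("rain", "cool"), "yes"),
    (("overcast", "hot"), "yes"),
    (("overcast", "cool"), "yes"),
]

def train(rows):
    # Step 1: priors; Step 2: per-feature likelihood counts
    priors = Counter(label for _, label in rows)
    counts = defaultdict(Counter)  # counts[(feature_idx, value)][label]
    for features, label in rows:
        for i, value in enumerate(features):
            counts[(i, value)][label] += 1
    return priors, counts

def predict(x, priors, counts, total):
    # Steps 3-4: posterior score per class, then argmax
    scores = {}
    for label, n_label in priors.items():
        score = n_label / total
        for i, value in enumerate(x):
            score *= counts[(i, value)][label] / n_label
        scores[label] = score
    return max(scores, key=scores.get)

priors, counts = train(data)
print(predict(("rain", "hot"), priors, counts, len(data)))  # yes
```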


Q24. 🔥 Explain Naive Bayes' Algorithm with suitable example.

[ Jun 2024 Q4(b), Jun 2023 Q5(h) | Frequency: 2 ]

Answer

Naive Bayes Classifier is a supervised learning algorithm based on Bayes' Theorem that assumes all features are independent of each other given the class.

Bayes' Theorem:

$$P(Class|Features) = \frac{P(Features|Class) \times P(Class)}{P(Features)}$$


Example: Play Tennis Classification

Training Data:

Outlook Temperature Humidity Play Tennis
Sunny Hot High No
Sunny Hot High No
Overcast Hot High Yes
Rain Mild High Yes
Rain Cool Normal Yes
Overcast Cool Normal Yes
Sunny Mild High No
Sunny Cool Normal Yes
Rain Mild Normal Yes
Sunny Mild Normal Yes

New Instance: Outlook=Sunny, Temperature=Cool, Humidity=High

Step 1: Calculate Prior Probabilities

  • P(Yes) = 7/10 = 0.7

  • P(No) = 3/10 = 0.3

Step 2: Calculate Likelihoods

Feature P(Feature|Yes) P(Feature|No)
Sunny 2/7 3/3 = 1
Cool 3/7 0/3 = 0
High 2/7 3/3 = 1

Step 3: Apply Naive Bayes

P(Yes|X) ∝ P(Yes) × P(Sunny|Yes) × P(Cool|Yes) × P(High|Yes) = 0.7 × (2/7) × (3/7) × (2/7) = 0.7 × 0.035 = 0.0245

P(No|X) ∝ P(No) × P(Sunny|No) × P(Cool|No) × P(High|No) = 0.3 × 1 × 0 × 1 = 0

Step 4: Classification Since P(Yes|X) > P(No|X), Predicted Class = Yes

Note: P(Cool|No) = 0 forces the entire No score to zero — no training instance with Cool temperature has class No. In practice, Laplace (add-one) smoothing is applied to the likelihood counts so that a single unseen feature value cannot eliminate a class.

Advantages: Fast, works well with high-dimensional data, handles missing values Applications: Spam filtering, sentiment analysis, document classification


10.5.2 K-Nearest Neighbour (K-NN)


Q25. ⭐ Explain K-Nearest Neighbour.

[ Dec 2022 Q5(h) | Frequency: 1 ]

Answer

K-Nearest Neighbour (KNN) is a lazy learning algorithm that classifies a new instance based on the majority class of its K nearest neighbors.

Key Characteristics:

  • Instance-based learning (no explicit model)

  • Non-parametric (no assumptions about data distribution)

  • Lazy learner (stores training data, computes at prediction time)

Working Principle:

Diagram:

flowchart TB A[New Data Point] --> B[Calculate Distance to All Training Points] B --> C[Select K Nearest Neighbors] C --> D[Majority Voting] D --> E[Assign Class Label] subgraph "Distance Metrics" F[Euclidean] G[Manhattan] H[Minkowski] end

Algorithm Steps:

  1. Choose the value of K (number of neighbors)

  2. Calculate distance from new point to all training points

  3. Select K nearest neighbors based on calculated distances

  4. Assign class by majority voting among K neighbors

Distance Metrics:

Metric Formula
Euclidean $\sqrt{\sum(x_i - y_i)^2}$
Manhattan $\sum|x_i - y_i|$
Minkowski $(\sum|x_i - y_i|^p)^{1/p}$

Choosing K:

  • Small K → Sensitive to noise

  • Large K → Smoother boundaries but may miss patterns

  • Typically use odd K to avoid ties

Applications: Pattern recognition, image classification, recommendation systems


Q26. 🔥 Find/Apply KNN classification algorithm to classify class of given point with K=3 and Euclidean distance.

[ Jun 2025 Q3(b), Dec 2024 Q4(a) | Frequency: 2 ]

Answer

Problem: Classify a new point using KNN with K=3 and Euclidean distance.

Given Training Data:

Point X Y Class
P1 1 1 A
P2 2 2 A
P3 3 3 A
P4 6 6 B
P5 7 7 B
P6 8 8 B

New Point to Classify: Q = (4, 4)

Step 1: Calculate Euclidean Distance

$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$
Point Distance from Q(4,4) Class
P1 $\sqrt{(4-1)^2 + (4-1)^2} = \sqrt{18} = 4.24$ A
P2 $\sqrt{(4-2)^2 + (4-2)^2} = \sqrt{8} = 2.83$ A
P3 $\sqrt{(4-3)^2 + (4-3)^2} = \sqrt{2} = 1.41$ A
P4 $\sqrt{(4-6)^2 + (4-6)^2} = \sqrt{8} = 2.83$ B
P5 $\sqrt{(4-7)^2 + (4-7)^2} = \sqrt{18} = 4.24$ B
P6 $\sqrt{(4-8)^2 + (4-8)^2} = \sqrt{32} = 5.66$ B

Step 2: Sort by Distance and Select K=3 Nearest

Rank Point Distance Class
1 P3 1.41 A
2 P2 2.83 A
3 P4 2.83 B

Step 3: Majority Voting

  • Class A: 2 votes

  • Class B: 1 vote

Step 4: Classification Result

$$\boxed{\text{New Point Q(4,4) is classified as Class A}}$$

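The four steps above can be verified in a few lines of Python using the same six training points:

```python
import math
from collections import Counter

# Training points P1..P6 from the table above
train_pts = [((1, 1), "A"), ((2, 2), "A"), ((3, 3), "A"),
             ((6, 6), "B"), ((7, 7), "B"), ((8, 8), "B")]

def knn_classify(query, points, k=3):
    # Rank all points by Euclidean distance to the query,
    # then majority-vote among the k nearest
    ranked = sorted(points, key=lambda p: math.dist(query, p[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

print(knn_classify((4, 4), train_pts, k=3))  # A
```

The three nearest neighbours are P3 (1.41), P2 (2.83), and P4 (2.83), so Class A wins the vote 2-1, matching the worked solution.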

Q27. ⭐ Differentiate between K-NN Algorithm and K-Means Algorithm.

[ Dec 2023 Q4(c)(ii) | Frequency: 1 ]

Answer
Aspect K-NN (K-Nearest Neighbors) K-Means Clustering
Type Supervised Learning Unsupervised Learning
Purpose Classification/Regression Clustering
K Meaning Number of neighbors to consider Number of clusters to form
Labels Required Yes (labeled data) No (unlabeled data)
Learning Type Lazy learner (instance-based) Eager learner
Training Phase None (stores all data) Iterative training
Output Class label for new point Cluster assignments
Algorithm Distance-based voting Centroid-based grouping
Prediction Time Slow (computes at runtime) Fast (uses learned centroids)
K Selection Affects classification boundary Affects number of groups
Use Case Spam detection, image recognition Customer segmentation

Diagram:

flowchart LR subgraph "K-NN (Supervised)" A1[Labeled Data] --> B1[Store Data] C1[New Point] --> D1[Find K Nearest] D1 --> E1[Vote for Class] end subgraph "K-Means (Unsupervised)" A2[Unlabeled Data] --> B2[Initialize K Centroids] B2 --> C2[Assign Points to Clusters] C2 --> D2[Update Centroids] D2 --> C2 end

Q28. 📌 Give example of K-NN Algorithm.

[ Dec 2023 Q4(c)(ii) | Frequency: 1 ]

Answer

Example: Fruit Classification

Scenario: Classify a fruit as Apple, Orange, or Banana based on weight and color intensity.

Training Data:

Fruit Weight (g) Color Intensity Class
F1 150 8 Apple
F2 160 7 Apple
F3 200 6 Orange
F4 180 5 Orange
F5 120 3 Banana
F6 130 2 Banana

New Fruit: Weight = 155g, Color Intensity = 7

Using K=3:

  1. Calculate distances to all fruits

  2. Find 3 nearest neighbors (F2 ≈ 5.0, F1 ≈ 5.1, F4 ≈ 25.1)

  3. Majority voting: Apple=2, Orange=1

  4. Classification: Apple

Other KNN Applications:

  • Recommendation Systems: Recommend products based on similar user preferences

  • Medical Diagnosis: Classify disease based on similar patient records

  • Handwriting Recognition: Match digits to similar training samples

  • Credit Scoring: Classify loan applications based on similar customers


10.5.3 Decision Trees


Q29. ⭐ Write ID3 Algorithm for creating decision tree for any training dataset.

[ Jun 2025 Q4(a) | Frequency: 1 ]

Answer

ID3 (Iterative Dichotomiser 3) is a decision tree algorithm that uses Information Gain to select the best attribute for splitting.

Key Concepts:

Entropy (Measure of Impurity):

$$H(S) = -\sum_{i=1}^{c} p_i \log_2(p_i)$$

Information Gain:

$$IG(S, A) = H(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} H(S_v)$$

ID3 Algorithm:

ALGORITHM ID3(S, Attributes, Target)

Input:  S = Training set
        Attributes = Set of features
        Target = Target attribute (class label)

Output: Decision Tree

BEGIN
    1. Create root node for tree

    2. IF all examples in S belong to same class c THEN
           RETURN leaf node with label c
       END IF

    3. IF Attributes is empty THEN
           RETURN leaf node with majority class in S
       END IF

    4. CALCULATE Entropy of S: H(S)

    5. FOR each attribute A in Attributes DO
           CALCULATE Information Gain: IG(S, A)
       END FOR

    6. SELECT attribute A_best with highest Information Gain

    7. SET A_best as decision attribute for root

    8. FOR each value v of A_best DO
           8.1 Add branch from root for condition A_best = v
           8.2 Let S_v = subset of S where A_best = v

           8.3 IF S_v is empty THEN
                   Add leaf with majority class of S
               ELSE
                   Add subtree: ID3(S_v, Attributes - {A_best}, Target)
               END IF
       END FOR

    9. RETURN root node

END
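The entropy and information-gain computations at the heart of ID3 (Steps 4-6 above) can be sketched as follows; the small attribute/label arrays in the usage lines are hypothetical:

```python
import math
from collections import Counter

def entropy(labels):
    """H(S) = -sum over classes of p_i * log2(p_i)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """IG(S, A) = H(S) - sum over values v of |S_v|/|S| * H(S_v)."""
    subsets = {}
    for v, label in zip(values, labels):
        subsets.setdefault(v, []).append(label)
    remainder = sum(len(s) / len(labels) * entropy(s)
                    for s in subsets.values())
    return entropy(labels) - remainder

# Entropy of a 9-yes / 5-no split (classic play-tennis proportions)
print(round(entropy(["yes"] * 9 + ["no"] * 5), 4))  # 0.9403

# An attribute that perfectly separates the classes gains the full entropy
print(information_gain(["sunny", "sunny", "rain", "rain"],
                       ["no", "no", "yes", "yes"]))  # 1.0
```

ID3 evaluates `information_gain` for every remaining attribute and splits on the one with the highest value.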


Q30. ⭐ Discuss Decision Trees with suitable example.

[ Jun 2023 Q2(b)(iii) | Frequency: 1 ]

Answer

Decision Tree is a supervised learning algorithm that creates a tree-like model of decisions and their possible consequences.

Structure:

  • Root Node: Top node, first decision point

  • Internal Nodes: Decision points based on features

  • Branches: Outcomes of decisions

  • Leaf Nodes: Final classification/prediction

Example: Loan Approval System

Training Data:

Age Income Credit Score Loan Approved
Young High Good Yes
Young Low Good No
Middle High Good Yes
Middle Low Bad No
Old High Good Yes
Old Low Good Yes

Decision Tree (derived from the training data above):

Credit Score?
├── Bad → NO
└── Good → Income?
    ├── High → YES
    └── Low → Age?
        ├── Young → NO
        └── Old → YES

Classification of New Instance:

  • Credit Score = Good, Income = Low, Age = Old

  • Path: Credit Score(Good) → Income(Low) → Age(Old) → YES (Approved)

Advantages:

  • Easy to interpret and visualize

  • Handles both numerical and categorical data

  • No feature scaling required

Algorithms: ID3, C4.5, CART, Random Forest


10.5.4 Logistic Regression


Q31. ⭐ What is Logistic Regression?

[ Dec 2024 Q1(c), Dec 2022 Q1(g) | Frequency: 2 ]

Answer

Logistic Regression is a supervised machine learning algorithm used for binary classification problems. Despite its name, it is a classification algorithm, not regression.

Key Characteristics:

  • Predicts probability of class membership

  • Output range: 0 to 1

  • Uses Sigmoid (Logistic) function

  • Linear decision boundary

Sigmoid Function:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

Where: $z = w_0 + w_1x_1 + w_2x_2 + ... + w_nx_n$

Diagram:

flowchart LR A[Input Features] --> B[Linear Combination] B --> C[Sigmoid Function] C --> D{Threshold 0.5} D -->|≥ 0.5| E[Class 1] D -->|< 0.5| F[Class 0]

Output Interpretation:

  • P(Y=1|X) ≥ 0.5 → Class 1

  • P(Y=1|X) < 0.5 → Class 0

Applications:

  • Spam detection

  • Disease diagnosis

  • Credit risk assessment

  • Customer churn prediction


Q32. 🔥 Explain Logistic Regression.

[ Jun 2023 Q5(g), Jun 2022 Q4(b)(iii) | Frequency: 2 ]

Answer

Logistic Regression is a statistical method for binary classification that models the probability of an outcome using the logistic (sigmoid) function.

Mathematical Foundation:

Linear Model: $z = \beta_0 + \beta_1x_1 + \beta_2x_2 + ... + \beta_nx_n$

Sigmoid Transformation: $P(Y=1|X) = \frac{1}{1 + e^{-z}}$

Diagram:

{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "width": 300,
  "height": 200,
  "title": "Sigmoid Function",
  "data": {
    "sequence": {
      "start": -6,
      "stop": 6,
      "step": 0.1,
      "as": "z"
    }
  },
  "transform": [
    {"calculate": "1/(1+exp(-datum.z))", "as": "sigmoid"}
  ],
  "mark": "line",
  "encoding": {
    "x": {"field": "z", "type": "quantitative", "title": "z"},
    "y": {"field": "sigmoid", "type": "quantitative", "title": "P(Y=1)"}
  }
}

Working Process:

  1. Input Features: Receive feature vector X

  2. Linear Combination: Calculate weighted sum z

  3. Apply Sigmoid: Transform z to probability [0,1]

  4. Thresholding: Convert probability to class (typically 0.5 threshold)

Cost Function (Log Loss):

$$J(\theta) = -\frac{1}{m}\sum[y\log(h_\theta(x)) + (1-y)\log(1-h_\theta(x))]$$

Example: Pass/Fail Prediction

Hours Studied Pass (1) / Fail (0)
2 0
3 0
5 1
6 1

Model learns: $P(Pass) = \frac{1}{1 + e^{-(-4 + 1 \times hours)}}$

For 4 hours: $P(Pass) = \frac{1}{1 + e^0} = 0.5$ (boundary case)

Advantages: Interpretable, probabilistic output, efficient training
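The pass/fail model above can be sketched directly; the coefficients $b_0 = -4$ and $b_1 = 1$ are those from the example:

```python
import math

def sigmoid(z):
    # Maps any real z to a probability in (0, 1)
    return 1 / (1 + math.exp(-z))

def p_pass(hours, b0=-4.0, b1=1.0):
    # P(Pass) = sigmoid(b0 + b1 * hours), coefficients from the example above
    return sigmoid(b0 + b1 * hours)

print(p_pass(4))            # 0.5  (the boundary case from the text)
print(round(p_pass(6), 3))  # 0.881
```

At 4 hours the linear score is exactly 0, so the predicted probability sits on the 0.5 decision threshold, as stated above.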


Q33. 🔥 Discuss the various types of Logistic Regressions briefly.

[ Dec 2024 Q1(c), Dec 2022 Q1(g) | Frequency: 2 ]

Answer

Types of Logistic Regression:

Type Classes Output Use Case
Binary 2 0 or 1 Spam/Not Spam
Multinomial >2 (mutually exclusive) One of K classes Digit recognition (0-9)
Ordinal >2 (ordered) Ordered categories Rating (Low/Medium/High)

1. Binary Logistic Regression:

  • Two classes (0 and 1)

  • Uses single sigmoid function

  • Most common type

$$P(Y=1) = \frac{1}{1 + e^{-z}}$$

Example: Disease positive/negative

2. Multinomial Logistic Regression:

  • Multiple classes (>2)

  • Classes are mutually exclusive

  • Uses Softmax function

$$P(Y=k) = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}$$

Example: Classify news as Sports/Politics/Entertainment

3. Ordinal Logistic Regression:

  • Multiple ordered categories

  • Preserves natural ordering

  • Uses cumulative probabilities

$$P(Y \leq k) = \frac{1}{1 + e^{-(\alpha_k - z)}}$$

Example: Customer satisfaction (Very Poor < Poor < Average < Good < Excellent)


Q34. ⭐ Differentiate between Logistic Regression and Linear Regression.

[ Dec 2023 Q4(c)(i) | Frequency: 1 ]

Answer
Aspect Logistic Regression Linear Regression
Purpose Classification Regression
Output Type Discrete (categorical) Continuous (numerical)
Output Range 0 to 1 (probability) -∞ to +∞
Function Sigmoid/Logistic Linear
Prediction Class labels Numeric values
Cost Function Log Loss (Cross-entropy) Mean Squared Error
Decision Boundary S-curve (sigmoid) Straight line
Threshold Typically 0.5 Not applicable
Use Case Spam detection House price prediction
Equation $P = \frac{1}{1+e^{-z}}$ $y = mx + c$

Diagram:

flowchart TB subgraph LR["Linear Regression"] A1[Input X] --> B1[Y = mX + c] B1 --> C1[Continuous Value] end subgraph LG["Logistic Regression"] A2[Input X] --> B2[z = βX] B2 --> C2["Sigmoid: 1 / (1 + e⁻ᶻ)"] C2 --> D2[Probability 0–1] D2 --> E2[Class 0 or 1] end LR --> LG

Key Difference: Linear regression predicts quantity, logistic regression predicts probability of class membership.


Q35. 📌 Give example of Logistic Regression.

[ Dec 2023 Q4(c)(i) | Frequency: 1 ]

Answer

Example: Email Spam Detection

Problem: Classify emails as Spam (1) or Not Spam (0)

Features:

Feature Description
x₁ Contains word "free" (0/1)
x₂ Contains word "winner" (0/1)
x₃ Number of exclamation marks
x₄ Sender in contacts (0/1)

Model:

$$z = -2 + 1.5x_1 + 2x_2 + 0.3x_3 - 3x_4$$
$$P(Spam) = \frac{1}{1 + e^{-z}}$$

Prediction Example:

  • Email with: "free"=1, "winner"=1, exclamations=5, in contacts=0

  • z = -2 + 1.5(1) + 2(1) + 0.3(5) - 3(0) = 3

  • P(Spam) = 1/(1 + e⁻³) = 0.95

Decision: P(Spam) = 0.95 > 0.5 → Classified as SPAM
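The prediction above can be reproduced in a few lines; the weights are the hypothetical ones from the model equation:

```python
import math

def spam_probability(free, winner, exclaims, in_contacts):
    # Weights from the hypothetical model above
    z = -2 + 1.5 * free + 2 * winner + 0.3 * exclaims - 3 * in_contacts
    return z, 1 / (1 + math.exp(-z))

z, p = spam_probability(free=1, winner=1, exclaims=5, in_contacts=0)
print(z)            # 3.0
print(round(p, 2))  # 0.95
```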

Other Examples:

  • Credit card fraud detection

  • Disease diagnosis (Diabetic/Non-diabetic)

  • Customer churn prediction

  • Loan default prediction


10.5.5 Support Vector Machines


Q36. 🔥 Explain Support Vector Machines with example.

[ Jun 2024 Q4(c)(i), Jun 2022 Q5(g) | Frequency: 2 ]

Answer

Support Vector Machine (SVM) is a supervised learning algorithm that finds the optimal hyperplane to separate classes with maximum margin.

Key Concepts:

Term Description
Hyperplane Decision boundary that separates classes
Support Vectors Data points closest to the hyperplane
Margin Distance between hyperplane and nearest points
Kernel Function to transform data for non-linear separation


Linear SVM:

  • Finds hyperplane: $w \cdot x + b = 0$

  • Maximizes margin: $\frac{2}{||w||}$

  • Classification: $f(x) = sign(w \cdot x + b)$

Kernel Trick for Non-linear Data:

Kernel Formula Typical Use Case
Linear $K(x, y) = x \cdot y$ Linearly separable data
Polynomial $K(x, y) = (x \cdot y + c)^d$ Polynomial decision boundary
RBF / Gaussian $K(x, y) = e^{-\gamma \|x - y\|^2}$ Highly non-linear / complex boundaries

Example: Handwritten Digit Classification

Problem: Classify digits as "1" or "7"

Features: Pixel intensities (784 features for 28×28 image)

Process:

  1. Training data: Images of 1s and 7s with labels

  2. SVM finds hyperplane in high-dimensional space

  3. Support vectors: Images near decision boundary

  4. New digit → Map to feature space → Classify based on hyperplane

Diagram:

flowchart TB A[Training Images] --> B[Extract Features] B --> C[SVM Training] C --> D[Find Optimal Hyperplane] D --> E[Identify Support Vectors] F[New Image] --> G[Extract Features] G --> H[Apply Hyperplane] H --> I{Which Side?} I -->|Positive| J["Digit '1'"] I -->|Negative| K["Digit '7'"]

Advantages:

  • Effective in high-dimensional spaces

  • Works well with clear margin of separation

  • Memory efficient (uses only support vectors)

Applications: Image classification, text categorization, bioinformatics, face detection
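The classification rule $f(x) = \text{sign}(w \cdot x + b)$ can be sketched directly; the hyperplane weights here are hypothetical, chosen to separate two clusters:

```python
def linear_svm_predict(w, b, x):
    # f(x) = sign(w . x + b): which side of the hyperplane x falls on
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hypothetical learned hyperplane: x1 + x2 - 9 = 0
w, b = (1.0, 1.0), -9.0
print(linear_svm_predict(w, b, (2, 2)))  # -1 (negative side)
print(linear_svm_predict(w, b, (7, 7)))  # 1  (positive side)
```

Training an SVM consists of choosing `w` and `b` so that this decision rule separates the classes with the largest possible margin $\frac{2}{\|w\|}$.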


Summary Table: Classification Algorithms

Algorithm Type Best For Key Feature
Naive Bayes Probabilistic Text classification Fast, simple
KNN Instance-based Pattern recognition No training
Decision Tree Rule-based Interpretable models Visual, explainable
Logistic Regression Statistical Binary classification Probabilistic output
SVM Geometric High-dimensional data Maximum margin

UNIT 11: Regression


Q1. ⭐ Discuss the concept of regression.

[ Jun 2025 Q1(f) | Frequency: 1 ]

Answer

Regression is a supervised machine learning technique used to predict continuous numerical values based on input features. It establishes a relationship between dependent (target) and independent (predictor) variables.

Key Concepts:

Aspect Description
Purpose Predict continuous outcomes (e.g., price, temperature, sales)
Output Real-valued numbers (not discrete classes)
Relationship Maps input features to continuous output
Training Uses labeled data with known target values

Mathematical Representation:

$$Y = f(X) + \epsilon$$

Where:

  • Y = Dependent variable (target)

  • X = Independent variables (features)

  • f = Function learned from data

  • ε = Error term

Types of Regression:

  1. Simple Regression - One independent variable

  2. Multiple Regression - Multiple independent variables

  3. Linear Regression - Linear relationship

  4. Non-linear Regression - Curved relationships

Applications:

  • House price prediction

  • Stock market forecasting

  • Weather prediction

  • Sales forecasting

  • Risk assessment


Q2. 📌 Give the list of algorithms for regression.

[ Jun 2025 Q1(f) | Frequency: 1 ]

Answer

Regression Algorithms:

Category Algorithms
Linear Methods Simple Linear Regression, Multiple Linear Regression, Ridge Regression, Lasso Regression, Elastic Net
Non-Linear Methods Polynomial Regression, Exponential Regression, Logarithmic Regression
Tree-Based Decision Tree Regression, Random Forest Regression, Gradient Boosting Regression, XGBoost
Support Vector Support Vector Regression (SVR)
Neural Networks Multi-layer Perceptron Regression, Deep Neural Networks
Instance-Based K-Nearest Neighbors Regression
Bayesian Bayesian Linear Regression, Gaussian Process Regression

Selection Criteria:

  • Linear relationships → Linear Regression

  • Non-linear patterns → Polynomial/SVR

  • High dimensionality → Ridge/Lasso

  • Complex patterns → Random Forest/Neural Networks


Q3. 📌 Give an example of Regression techniques.

[ Dec 2022 Q3(b) | Frequency: 1 ]

Answer

Example: House Price Prediction using Linear Regression

Problem: Predict house prices based on area (sq ft)

Dataset:

Area (sq ft) Price (₹ Lakhs)
1000 50
1500 70
2000 90
2500 110
3000 130

Model: Linear Regression

$$Price = m \times Area + c$$

Learned Parameters:

  • Slope (m) = 0.04

  • Intercept (c) = 10

Prediction Equation:

$$Price = 0.04 \times Area + 10$$

Prediction for 1800 sq ft:

$$Price = 0.04 \times 1800 + 10 = 72 + 10 = \text{₹82 Lakhs}$$

Diagram:

graph LR A[Input: Area<br/>1800 sq ft] --> B[Linear Regression<br/>Model] B --> C[Output: Price<br/>₹82 Lakhs] D[Training Data] --> E[Learn Parameters<br/>m=0.04, c=10] E --> B
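The worked example above can be reproduced with a short least-squares fit in pure Python (a minimal sketch; the data and learned parameters match the table, no libraries assumed):

```python
# Fit Price = m*Area + c by least squares on the example data above.
areas = [1000, 1500, 2000, 2500, 3000]      # sq ft
prices = [50, 70, 90, 110, 130]             # ₹ lakhs

n = len(areas)
mean_x = sum(areas) / n
mean_y = sum(prices) / n

# m = Σ(x-x̄)(y-ȳ) / Σ(x-x̄)²,  c = ȳ - m·x̄
m = sum((x - mean_x) * (y - mean_y) for x, y in zip(areas, prices)) \
    / sum((x - mean_x) ** 2 for x in areas)
c = mean_y - m * mean_x

print(m, c)           # 0.04 10.0
print(m * 1800 + c)   # 82.0 (₹82 lakhs for 1800 sq ft)
```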

Q4. ⭐ How regression differs from classification?

[ Jun 2022 Q4(b) | Frequency: 1 ]

Answer

Differences between Regression and Classification:

Aspect Regression Classification
Output Type Continuous numerical values Discrete categorical labels
Prediction Predicts quantity Predicts category/class
Example Output 25.5, 100.75, -3.2 "Yes/No", "Cat/Dog", "A/B/C"
Algorithms Linear Regression, SVR, Polynomial Logistic Regression, SVM, Decision Tree
Evaluation Metrics MSE, RMSE, MAE, R² Accuracy, Precision, Recall, F1-Score
Loss Function Mean Squared Error Cross-Entropy Loss
Decision Boundary Fits a line/curve through data Separates classes

Examples:

Regression Problems Classification Problems
Predict house price Predict spam/not spam
Forecast temperature Detect disease presence
Estimate sales Classify image objects
Predict stock prices Sentiment analysis

Diagram:

flowchart TB %% First subgraph: Regression subgraph REG["Regression"] R1[Input Features] --> R2[Model] R2 --> R3["Continuous Output<br/>e.g., 45.6, 78.2"] end %% Second subgraph: Classification subgraph CLS["Classification"] C1[Input Features] --> C2[Model] C2 --> C3["Discrete Output<br/>e.g., Class A, B, C"] end %% Force vertical stacking of subgraphs REG --> CLS

Q5. 📌 What is the similarity between regression and classification?

[ Jun 2022 Q4(b) | Frequency: 1 ]

Answer

Similarities between Regression and Classification:

Similarity Description
Supervised Learning Both require labeled training data
Learning from Data Both learn patterns from input-output pairs
Feature Input Both use input features to make predictions
Model Training Both optimize parameters to minimize error
Generalization Both aim to perform well on unseen data
Overfitting Risk Both susceptible to overfitting
Preprocessing Both require data preprocessing (normalization, handling missing values)
Feature Selection Both benefit from relevant feature selection
Cross-Validation Both use cross-validation for model evaluation
Regularization Both can use regularization techniques

Common Workflow:

Data Collection → Preprocessing → Feature Engineering → 
Model Training → Validation → Testing → Deployment

Shared Techniques:

  • Train-test split

  • K-fold cross-validation

  • Hyperparameter tuning

  • Feature scaling

  • Ensemble methods


Q6. 🔥 What is linear regression?

[ Jun 2023 Q4(b) | Frequency: 1 ]

Answer

Linear Regression is a fundamental supervised learning algorithm that models the relationship between a dependent variable (Y) and one or more independent variables (X) using a linear equation.

Simple Linear Regression Equation:

$$Y = \beta_0 + \beta_1X + \epsilon$$

Where:

  • Y = Dependent variable (target)

  • X = Independent variable (feature)

  • β₀ = Y-intercept (constant term)

  • β₁ = Slope (coefficient)

  • ε = Error term

Key Characteristics:

Property Description
Relationship Assumes linear relationship between variables
Output Continuous numerical prediction
Goal Find best-fit line minimizing errors
Assumption Errors are normally distributed

Diagram:

graph LR A[Input X] --> B[Linear Model<br/>Y = β₀ + β₁X] B --> C[Predicted Y] D[Training Data<br/>X, Y pairs] --> E[Least Squares<br/>Optimization] E --> F[Learn β₀, β₁] F --> B

Assumptions of Linear Regression:

  1. Linearity - Linear relationship between X and Y

  2. Independence - Observations are independent

  3. Homoscedasticity - Constant variance of errors

  4. Normality - Errors are normally distributed

  5. No Multicollinearity - Independent variables not highly correlated


Q7. 📌 Discuss Linear regression briefly.

[ Jun 2022 Q4(b)(i) | Frequency: 1 ]

Answer

Linear Regression Overview:

Linear regression fits a straight line through data points to model the relationship between variables.

Types:

  1. Simple Linear Regression - One independent variable

  2. Multiple Linear Regression - Multiple independent variables

Working Principle:

Step 1: Initialize parameters (slope, intercept)
Step 2: Calculate predicted values
Step 3: Compute error (actual - predicted)
Step 4: Adjust parameters to minimize error
Step 5: Repeat until convergence

Best Fit Line:

The line that minimizes the sum of squared differences between actual and predicted values.


Advantages:

  • Simple and interpretable

  • Fast training and prediction

  • Works well for linearly separable data

Disadvantages:

  • Assumes linear relationship

  • Sensitive to outliers

  • Cannot capture complex patterns


Q8. 📌 Give example of Linear regression.

[ Dec 2023 Q4(c)(i) | Frequency: 1 ]

Answer

Example: Predicting Student Marks based on Study Hours

Problem Statement: Predict exam marks based on hours studied.

Training Data:

Hours Studied (X) Marks (Y)
2 40
4 55
6 65
8 80
10 90

Step 1: Calculate Means

  • Mean of X (X̄) = (2+4+6+8+10)/5 = 6

  • Mean of Y (Ȳ) = (40+55+65+80+90)/5 = 66

Step 2: Calculate Slope (β₁)

$$\beta_1 = \frac{\sum(X_i - \bar{X})(Y_i - \bar{Y})}{\sum(X_i - \bar{X})^2}$$

X Y X-X̄ Y-Ȳ (X-X̄)(Y-Ȳ) (X-X̄)²
2 40 -4 -26 104 16
4 55 -2 -11 22 4
6 65 0 -1 0 0
8 80 2 14 28 4
10 90 4 24 96 16
Sum 250 40
$$\beta_1 = \frac{250}{40} = 6.25$$

Step 3: Calculate Intercept (β₀)

$$\beta_0 = \bar{Y} - \beta_1\bar{X} = 66 - 6.25 \times 6 = 28.5$$

Final Equation:

$$Marks = 28.5 + 6.25 \times Hours$$

Prediction: For 7 hours of study:

$$Marks = 28.5 + 6.25 \times 7 = 72.25$$


Q9. 🔥 How linear regression is performed using least square method?

[ Jun 2023 Q4(b) | Frequency: 1 ]

Answer

Least Square Method finds the best-fit line by minimizing the sum of squared residuals (differences between actual and predicted values).

Objective Function:

$$\text{Minimize } J = \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n}(Y_i - (\beta_0 + \beta_1X_i))^2$$

Derivation of Parameters:

For Slope (β₁):

$$\beta_1 = \frac{n\sum X_iY_i - \sum X_i \sum Y_i}{n\sum X_i^2 - (\sum X_i)^2}$$

Or equivalently:

$$\beta_1 = \frac{\sum(X_i - \bar{X})(Y_i - \bar{Y})}{\sum(X_i - \bar{X})^2}$$

For Intercept (β₀):

$$\beta_0 = \bar{Y} - \beta_1\bar{X}$$

Step-by-Step Process:

Step 1: Collect data points (X₁,Y₁), (X₂,Y₂),...,(Xₙ,Yₙ)
Step 2: Calculate required sums: ΣX, ΣY, ΣXY, ΣX²
Step 3: Compute means: X̄, Ȳ
Step 4: Calculate slope β₁ using formula
Step 5: Calculate intercept β₀ using formula
Step 6: Form regression equation: Y = β₀ + β₁X

Diagram:

graph TD A["Data Points"] --> B["Calculate Sums:<br/>ΣX, ΣY, ΣXY, ΣX²"] B --> C["Calculate Means:<br/>X̄, Ȳ"] C --> D["Calculate Slope:<br/>β₁"] D --> E["Calculate Intercept:<br/>β₀"] E --> F["Regression Line:<br/>Y = β₀ + β₁X"] F --> G["Minimize SSE:<br/>Σ(Y − Ŷ)²"]

Why "Least Squares"?

  • Squaring ensures all errors are positive

  • Larger errors are penalized more heavily

  • Provides unique optimal solution

  • Mathematically tractable (closed-form solution)


Q10. 🔥 Find the regression line for given data points.

[ Jun 2023 Q4(b) | Frequency: 1 ]

Answer

Example Problem: Find regression line for the following data:

X 1 2 3 4 5
Y 2 4 5 4 5

Solution:

Step 1: Create Calculation Table

X Y XY X²
1 2 2 1
2 4 8 4
3 5 15 9
4 4 16 16
5 5 25 25
ΣX=15 ΣY=20 ΣXY=66 ΣX²=55

Step 2: Calculate Parameters

n = 5 (number of data points)

Slope (β₁):

$$\beta_1 = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}$$

$$\beta_1 = \frac{5(66) - (15)(20)}{5(55) - (15)^2} = \frac{330 - 300}{275 - 225} = \frac{30}{50} = 0.6$$

Means:

$$\bar{X} = \frac{15}{5} = 3, \quad \bar{Y} = \frac{20}{5} = 4$$

Intercept (β₀):

$$\beta_0 = \bar{Y} - \beta_1\bar{X} = 4 - 0.6(3) = 4 - 1.8 = 2.2$$

Step 3: Form Regression Equation

$$\boxed{Y = 2.2 + 0.6X}$$

Verification (for X=3):

$$\hat{Y} = 2.2 + 0.6(3) = 2.2 + 1.8 = 4$$

This matches our calculated Ȳ = 4 ✓
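The computation above can be checked mechanically with the same sums formula (pure Python):

```python
# Regression line via the sums formula for X = 1..5, Y = 2,4,5,4,5.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n = len(X)

sum_x, sum_y = sum(X), sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x * x for x in X)

# β₁ = (nΣXY − ΣXΣY) / (nΣX² − (ΣX)²),  β₀ = Ȳ − β₁X̄
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * (sum_x / n)
print(b1, round(b0, 4))   # 0.6 2.2
```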


Q11. ⭐ Discuss the term 'mean squared error'.

[ Jun 2023 Q4(b) | Frequency: 1 ]

Answer

Mean Squared Error (MSE) is a metric used to evaluate the performance of regression models by measuring the average of squared differences between actual and predicted values.

Formula:

$$MSE = \frac{1}{n}\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2$$

Where:

  • n = Number of data points

  • Yᵢ = Actual value

  • Ŷᵢ = Predicted value

Characteristics:

Property Description
Range 0 to ∞ (lower is better)
Unit Squared unit of target variable
Sensitivity Highly sensitive to outliers
Interpretation Average squared prediction error

Example Calculation:

Actual (Y) Predicted (Ŷ) Error (Y-Ŷ) Error²
10 12 -2 4
15 14 1 1
20 18 2 4
25 26 -1 1
Sum 10
$$MSE = \frac{10}{4} = 2.5$$

Related Metrics:

Metric Formula Advantage
RMSE √MSE Same unit as target
MAE (1/n)Σ|Y-Ŷ| Less sensitive to outliers
R² 1 - (SSE/SST) Normalized (0-1 scale)

Advantages of MSE:

  • Differentiable (useful for optimization)

  • Penalizes larger errors more

  • Unique minimum solution

Disadvantages:

  • Sensitive to outliers

  • Unit is squared (less interpretable)


Q12. 📌 Discuss the term 'mean absolute errors'.

[ Jun 2023 Q4(b) | Frequency: 1 ]

Answer

Mean Absolute Error (MAE) measures the average magnitude of errors between actual and predicted values without considering direction.

Formula:

$$MAE = \frac{1}{n}\sum_{i=1}^{n}|Y_i - \hat{Y}_i|$$

Where:

  • n = Number of data points

  • Yᵢ = Actual value

  • Ŷᵢ = Predicted value

  • |...| = Absolute value

Characteristics:

Property Description
Range 0 to ∞ (lower is better)
Unit Same as target variable
Sensitivity Robust to outliers
Interpretation Average prediction error in original units

Example Calculation:

Actual (Y) Predicted (Ŷ) |Error|
10 12 2
15 14 1
20 18 2
25 26 1
Sum 6
$$MAE = \frac{6}{4} = 1.5$$

Comparison: MAE vs MSE

Aspect MAE MSE
Formula Average of |errors| Average of errors²
Outlier Sensitivity Less sensitive More sensitive
Differentiability Not differentiable at 0 Fully differentiable
Unit Original unit Squared unit
Interpretation More intuitive Mathematical convenience

When to Use MAE:

  • When outliers should not dominate

  • When interpretability is important

  • When all errors are equally important
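The MSE (2.5) and MAE (1.5) worked examples above use the same actual/predicted values, so both metrics can be verified together in a few lines of Python:

```python
# MSE and MAE on the example data from the two tables above.
actual    = [10, 15, 20, 25]
predicted = [12, 14, 18, 26]
n = len(actual)

mse = sum((y - yh) ** 2 for y, yh in zip(actual, predicted)) / n
mae = sum(abs(y - yh) for y, yh in zip(actual, predicted)) / n
print(mse, mae)   # 2.5 1.5
```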


Q13. 📌 Discuss Multiple linear regression briefly.

[ Jun 2022 Q4(b)(ii) | Frequency: 1 ]

Answer

Multiple Linear Regression extends simple linear regression to include multiple independent variables for predicting the dependent variable.

Equation:

$$Y = \beta_0 + \beta_1X_1 + \beta_2X_2 + ... + \beta_nX_n + \epsilon$$

Matrix Form:

$$Y = X\beta + \epsilon$$

Example: House Price Prediction

$$Price = \beta_0 + \beta_1(Area) + \beta_2(Bedrooms) + \beta_3(Age) + \beta_4(Location)$$

Key Components:

Component Description
β₀ Intercept (base value)
β₁, β₂...βₙ Coefficients for each feature
X₁, X₂...Xₙ Independent variables
Y Dependent variable

Diagram:

graph LR X1[Feature 1<br/>Area] --> M[Multiple Linear<br/>Regression Model] X2[Feature 2<br/>Bedrooms] --> M X3[Feature 3<br/>Age] --> M X4[Feature 4<br/>Location] --> M M --> Y[Predicted<br/>Price]

Assumptions:

  1. Linear relationship between predictors and response

  2. No multicollinearity among predictors

  3. Homoscedasticity of residuals

  4. Independence of observations

  5. Normal distribution of errors

Coefficient Estimation (OLS):

$$\hat{\beta} = (X^TX)^{-1}X^TY$$

Applications:

  • Sales forecasting

  • Economic modeling

  • Medical research

  • Marketing analysis
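The OLS estimate β̂ = (XᵀX)⁻¹XᵀY can be sketched in pure Python. This is a minimal illustration, not a production solver: the toy data is generated from y = 5 + 2x₁ + 3x₂ (so the recovered coefficients are known), and the system (XᵀX)β = XᵀY is solved by Gaussian elimination rather than an explicit inverse:

```python
# OLS for y = β0 + β1*x1 + β2*x2 via the normal equations (XᵀX)β = XᵀY.
data = [(1, 1), (2, 1), (1, 3), (4, 2), (3, 5)]
y = [5 + 2 * x1 + 3 * x2 for x1, x2 in data]   # true model: (5, 2, 3)
X = [[1, x1, x2] for x1, x2 in data]           # leading 1 for the intercept

p = len(X[0])
XtX = [[sum(r[j] * r[k] for r in X) for k in range(p)] for j in range(p)]
Xty = [sum(r[j] * yi for r, yi in zip(X, y)) for j in range(p)]

def solve(A, b):
    """Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

beta = solve(XtX, Xty)
print([round(bj, 6) for bj in beta])   # [5.0, 2.0, 3.0]
```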


Q14. 🔥 Differentiate between linear regression and polynomial regression techniques.

[ Jun 2025 Q1(c), Dec 2022 Q1(h) | Frequency: 2 ]

Answer

Comparison Table:

Aspect Linear Regression Polynomial Regression
Equation Y = β₀ + β₁X Y = β₀ + β₁X + β₂X² + ... + βₙXⁿ
Relationship Fits straight line Fits curved line
Complexity Simple model More complex
Flexibility Limited to linear patterns Captures non-linear patterns
Parameters 2 (slope, intercept) n+1 (degree n polynomial)
Overfitting Risk Low High (with high degree)
Interpretability Highly interpretable Less interpretable
Computation Fast Slower with higher degrees
Use Case Linear relationships Curved relationships


Diagram:

graph TB subgraph Linear Regression L1[Y = β₀ + β₁X] L2[Straight Line Fit] L3[Best for Linear Data] end subgraph Polynomial Regression P1[Y = β₀ + β₁X + β₂X²...] P2[Curved Line Fit] P3[Best for Non-linear Data] end

Example:

Data Pattern Best Choice
House price vs area (linear trend) Linear Regression
Growth curve (exponential pattern) Polynomial Regression
Seasonal sales data Polynomial Regression
Simple proportional relationship Linear Regression

Key Insight: Polynomial regression is essentially linear regression with polynomial features (X, X², X³...).


Q15. 🔥 Explain Polynomial Regression with example.

[ Jun 2024 Q4(c)(iii), Jun 2022 Q4(b)(iv) | Frequency: 2 ]

Answer

Polynomial Regression fits a polynomial equation to the data, allowing modeling of non-linear relationships between variables.

General Equation (Degree n):

$$Y = \beta_0 + \beta_1X + \beta_2X^2 + \beta_3X^3 + ... + \beta_nX^n$$

Example: Quadratic Polynomial (Degree 2)

$$Y = \beta_0 + \beta_1X + \beta_2X^2$$

Practical Example: Predicting Fuel Efficiency

Speed (X) Fuel Efficiency (Y)
20 25
40 35
60 38
80 32
100 22

Note: Efficiency increases then decreases (parabolic pattern)

Solution using Degree 2 Polynomial:

Step 1: Create Feature Matrix

X X² Y
20 400 25
40 1600 35
60 3600 38
80 6400 32
100 10000 22

Step 2: Apply Linear Regression on [X, X²]

Using least squares: β₀ = 5, β₁ = 1.1, β₂ = -0.01

Step 3: Final Equation

$$Efficiency = 5 + 1.1(Speed) - 0.01(Speed)^2$$

Prediction for Speed = 70:

$$Y = 5 + 1.1(70) - 0.01(70)^2 = 5 + 77 - 49 = 33$$
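The feature-engineering view — polynomial regression is linear regression on [1, X, X²] — can be sketched by evaluating the fitted quadratic (coefficients taken from the worked example above):

```python
# Polynomial regression as linear regression on engineered features:
# each speed x becomes the row [1, x, x²]; prediction is the dot product
# with the example's coefficients (β0=5, β1=1.1, β2=-0.01).
beta = [5, 1.1, -0.01]

def predict(x):
    features = [1, x, x ** 2]
    return sum(b * f for b, f in zip(beta, features))

print(round(predict(70), 4))   # 33.0, matching the hand calculation
```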

Diagram:

graph TD A[Original Feature X] --> B[Feature Engineering] B --> C[X] B --> D[X²] B --> E[X³ if needed] C --> F[Multiple Linear<br/>Regression] D --> F E --> F F --> G[Polynomial<br/>Prediction]

Choosing Polynomial Degree:

  • Degree 1 - Linear (straight line)

  • Degree 2 - Quadratic (parabola)

  • Degree 3 - Cubic (S-curve)

  • Higher degrees - Risk of overfitting


Q16. 🔥 Explain Support Vector Regression with example. Draw diagram. Give applications.

[ Jun 2024 Q4(c)(ii), Dec 2022 Q4(b) | Frequency: 2 ]

Answer

Support Vector Regression (SVR) applies Support Vector Machine concepts to regression problems by finding a function that deviates from actual targets by at most ε (epsilon) while being as flat as possible.

Key Concepts:

Term Description
ε-tube Margin of tolerance around regression line
Support Vectors Data points on or outside the tube boundary
Slack Variables Allow some points outside ε-tube
Kernel Transform data to higher dimensions

Mathematical Formulation:

Objective: Minimize

$$\frac{1}{2}||w||^2 + C\sum_{i=1}^{n}(\xi_i + \xi_i^*)$$

Subject to:

$$Y_i - (w \cdot X_i + b) \leq \epsilon + \xi_i$$
$$(w \cdot X_i + b) - Y_i \leq \epsilon + \xi_i^*$$

Where:

  • w = Weight vector

  • C = Regularization parameter

  • ξ, ξ* = Slack variables

  • ε = Epsilon (tube width)


SVR Visualization:

    Y
    |    * (outlier, penalized)
    |  -------- Upper bound (f(x) + ε)
    |  *  •  *  
    |  ========= Regression line f(x)
    |    •  *  •
    |  -------- Lower bound (f(x) - ε)
    |       * (outlier, penalized)
    +-------------------------> X

    • = Support Vectors (on boundary)
    * = Data points

Example: Temperature Prediction

Time (hrs) Temperature (°C)
6 15
9 22
12 28
15 30
18 25

SVR Parameters:

  • ε = 2 (tolerance of ±2°C)

  • C = 1 (regularization)

  • Kernel = RBF (for non-linear pattern)

Result: SVR finds a regression function where most predictions fall within ±2°C of actual values.
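The ε-insensitive idea can be made concrete: only deviations beyond the tube contribute to the loss. A minimal sketch with ε = 2 as in the example (the prediction values are illustrative):

```python
# ε-insensitive loss: errors inside the ε-tube cost nothing; errors
# outside are penalized by the amount they exceed ε.
def eps_insensitive(y_true, y_pred, eps=2.0):
    return max(0.0, abs(y_true - y_pred) - eps)

print(eps_insensitive(28, 27.0))   # 0.0  (within the ±2°C tube)
print(eps_insensitive(28, 24.5))   # 1.5  (exceeds the tube by 1.5)
```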

Kernel Types for SVR:

Kernel Formula Use Case
Linear K(x,y) = x·y Linear relationships
Polynomial K(x,y) = (x·y + c)^d Polynomial patterns
RBF K(x,y) = exp(-γ||x-y||²) Complex non-linear
Sigmoid K(x,y) = tanh(αx·y + c) Neural network-like

Applications of SVR:

Domain Application
Finance Stock price prediction, Risk assessment
Weather Temperature forecasting, Rainfall prediction
Healthcare Drug response prediction, Disease progression
Engineering Load forecasting, Quality prediction
Energy Power consumption prediction, Solar output
Retail Sales forecasting, Demand prediction

Advantages:

  • Effective in high-dimensional spaces

  • Robust to outliers (ε-insensitive)

  • Kernel trick for non-linear patterns

  • Good generalization

Disadvantages:

  • Computationally intensive for large datasets

  • Sensitive to hyperparameter selection

  • Less interpretable than linear regression


Unit 11 Summary

Topic Key Points
Regression Concept Supervised learning for continuous prediction
Linear Regression Y = β₀ + β₁X, Least squares method
Multiple Regression Multiple features, matrix notation
Polynomial Regression Non-linear using polynomial features
SVR ε-tube, support vectors, kernel functions
Evaluation Metrics MSE, MAE, RMSE, R²

Important Formulas:

Metric Formula
MSE (1/n)Σ(Y - Ŷ)²
MAE (1/n)Σ|Y - Ŷ|
RMSE √MSE
R² 1 - (SSE/SST)
Slope Σ(X-X̄)(Y-Ȳ) / Σ(X-X̄)²
Intercept Ȳ - β₁X̄

UNIT 12: Neural Networks and Deep Learning


Q1. ⭐ What is a Neural Network?

[ Jun 2023 Q1(h) | Frequency: 1 ]

Answer

Neural Network is a computational model inspired by the structure and function of biological neurons in the human brain. It consists of interconnected nodes (neurons) organized in layers that process information to learn patterns from data.

Definition: A neural network is a machine learning algorithm that uses a network of artificial neurons to transform inputs into meaningful outputs through learned weights and activation functions.

Key Components:

Component Description
Input Layer Receives raw data/features
Hidden Layer(s) Processes information through weighted connections
Output Layer Produces final prediction/classification
Weights Learnable parameters connecting neurons
Bias Additional parameter for flexibility
Activation Function Introduces non-linearity

Diagram:

    Input Layer        Hidden Layer           Output Layer
      x1 ---\
      x2 ----->  [ h1   h2   h3 ]  ----->  [ y ]
      x3 ---/
    (features)   (weighted sums + activation)  (prediction)

Working Process:

  1. Input data fed to input layer

  2. Weighted sum computed at each neuron

  3. Activation function applied

  4. Output propagated to next layer

  5. Final output generated

Applications:

  • Image recognition

  • Natural Language Processing

  • Speech recognition

  • Medical diagnosis

  • Autonomous vehicles


Q2. ⭐ How biological neuron relates to Artificial Neuron?

[ Jun 2023 Q1(h) | Frequency: 1 ]

Answer

The Artificial Neuron is designed to mimic the structure and function of a Biological Neuron in the human brain.

Biological Neuron Working:

  1. Dendrites receive signals from other neurons

  2. Signals are processed in the Cell Body (Soma)

  3. If signal exceeds threshold, neuron fires

  4. Signal travels through Axon to other neurons

  5. Synapses transmit signals to next neurons

Artificial Neuron Working:

  1. Inputs (x₁, x₂, ...) receive data values

  2. Each input multiplied by weight (w)

  3. Weighted sum calculated: Σ(wᵢ × xᵢ) + bias

  4. Activation function determines output

  5. Output transmitted to next layer

Mathematical Model of Artificial Neuron:

$$y = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$$

Where:

  • xᵢ = Input signals

  • wᵢ = Weights (synaptic strength)

  • b = Bias

  • f = Activation function

  • y = Output

Key Similarities:

Aspect Biological Artificial
Signal Reception Dendrites Input connections
Processing Cell body Summation + activation
Signal Transmission Axon Output
Learning Synaptic plasticity Weight adjustment
Threshold Action potential Activation function
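The mathematical model above translates directly into code. A minimal sketch of a single neuron with a sigmoid activation (the inputs and weights are illustrative):

```python
import math

# One artificial neuron: y = f(Σ wᵢxᵢ + b) with a sigmoid activation.
def neuron(inputs, weights, bias):
    z = sum(w * x for w, x in zip(weights, inputs)) + bias  # weighted sum
    return 1 / (1 + math.exp(-z))                           # activation f

out = neuron([1.0, 2.0], [0.5, -0.25], 0.0)  # z = 0.5 - 0.5 + 0 = 0
print(out)                                   # 0.5 (sigmoid of 0)
```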

Q3. 📌 Illustrate biological neuron with suitable diagram.

[ Jun 2023 Q1(h) | Frequency: 1 ]

Answer

Biological Neuron Structure:

A biological neuron is the fundamental unit of the nervous system that processes and transmits information through electrical and chemical signals.

Diagram:

       \  |  /      Dendrites (receive input signals)
        \ | /
      (  Soma  )    Cell Body with nucleus (integrates signals)
          |
          |<------- Axon Hillock (fires if threshold exceeded)
      ====|====     Axon with Myelin Sheath (fast transmission)
          |
        / | \
       o  o  o      Axon Terminals / Synapses (output to next neurons)

Components of Biological Neuron:

Component Function
Dendrites Tree-like structures that receive input signals from other neurons
Cell Body (Soma) Contains nucleus; integrates incoming signals
Axon Hillock Decision point where action potential is initiated
Axon Long fiber that carries electrical impulses away from cell body
Myelin Sheath Insulating layer that speeds up signal transmission
Axon Terminals End points that form synapses with other neurons
Synapse Junction where neurotransmitters are released

Signal Transmission Process:

Dendrites → Cell Body → Axon Hillock → Axon → Synaptic Terminals → Next Neuron
  (Input)    (Processing)  (Threshold)  (Transmission)  (Output)


Q4. 🔥 Create a table to map the components of Biological Neuron with Artificial Neuron.

[ Jun 2023 Q1(h) | Frequency: 1 ]

Answer

Mapping Table: Biological vs Artificial Neuron

Biological Neuron Artificial Neuron Function
Dendrites Input connections (xᵢ) Receive incoming signals/data
Synapses Weights (wᵢ) Determine strength of connections
Cell Body (Soma) Summation function (Σ) Aggregate all weighted inputs
Axon Hillock Activation function Decide whether to fire/activate
Threshold potential Bias (b) Minimum activation requirement
Axon Output connection Transmit processed signal
Neurotransmitters Output signal (y) Information passed to next layer
Synaptic plasticity Learning algorithm Adjust connection strengths
Neural network Artificial Neural Network Interconnected processing units

Visual Comparison:

Diagram:

graph LR %% Biological Neuron (Left) subgraph BIO["Biological Neuron"] D[Dendrites] --> S[Soma] S --> AH[Axon Hillock] AH --> A[Axon] A --> T[Terminals] end %% Artificial Neuron (Right) subgraph ANN["Artificial Neuron"] I[Inputs xᵢ] --> W[Weights wᵢ] W --> SUM[Σ + bias] SUM --> AF[Activation f] AF --> O[Output y] end %% Invisible link to align subgraphs horizontally BIO -.-> ANN

Detailed Functional Mapping:

Aspect Biological Artificial Mathematical
Input Dendrite signals x₁, x₂, ... xₙ Input vector X
Connection Strength Synaptic efficacy w₁, w₂, ... wₙ Weight vector W
Integration Spatial-temporal summation Σ(wᵢxᵢ) + b Weighted sum
Threshold -55mV to -70mV Activation threshold Bias term
Activation All-or-none firing Sigmoid, ReLU, etc. f(z)
Learning Hebbian learning Backpropagation Δw = η × error

Q5. 🔥 Explain Convolutional Neural Networks with suitable example.

[ Dec 2023 Q5(e), Jun 2023 Q5(d) | Frequency: 2 ]

Answer

Convolutional Neural Network (CNN) is a specialized deep learning architecture designed primarily for processing structured grid data like images. It uses convolution operations to automatically learn spatial hierarchies of features.

Key Layers in CNN:

Layer Type Function
Convolutional Layer Applies filters to extract features
Activation (ReLU) Introduces non-linearity
Pooling Layer Reduces spatial dimensions
Fully Connected Makes final predictions
Softmax/Output Produces class probabilities

Diagram:

    Input Image → [Convolution] → [ReLU] → [Pooling] → (repeat) → [Fully Connected] → [Softmax] → Class Probabilities

Convolution Operation:

    Input (5×5)  *  Filter (3×3)  →  Feature Map (3×3)
    (the filter slides across the input; each position yields one weighted sum)

Example: Digit Recognition (MNIST)

Problem: Classify handwritten digits (0-9)

CNN Architecture:

    28×28 Input → Convolution → Pooling → Convolution → Pooling → Flatten → Fully Connected → Softmax (10 classes)

Key Concepts:

Concept Description
Stride Step size for filter movement
Padding Adding zeros around input borders
Feature Maps Output of convolution operation
Parameter Sharing Same filter applied across image
Local Connectivity Neurons connect only to local regions
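The convolution operation itself is a small nested loop: the filter slides over the image and each position produces one sum of elementwise products. A pure-Python sketch with valid padding and stride 1 (the 3×3 image and the edge filter are made up):

```python
# 2-D convolution (valid padding, stride 1): each output cell is the sum
# of elementwise products between the kernel and the patch it covers.
def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    oh = len(image) - kh + 1
    ow = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(ow)] for i in range(oh)]

image = [[1, 2, 0],
         [0, 1, 3],
         [4, 1, 0]]
edge = [[1, -1],
        [1, -1]]            # a tiny vertical-edge filter
print(conv2d(image, edge))  # [[-2, 0], [2, -1]]
```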

Applications:

  • Image classification

  • Object detection

  • Face recognition

  • Medical image analysis

  • Self-driving cars


Q6. ⭐ Explain Recurrent Neural Networks.

[ Dec 2022 Q5(e) | Frequency: 1 ]

Answer

Recurrent Neural Network (RNN) is a type of neural network designed for sequential data processing. Unlike feedforward networks, RNNs have connections that form directed cycles, allowing them to maintain memory of previous inputs.

Key Characteristics:

Feature Description
Sequential Processing Processes data one element at a time
Hidden State Maintains memory across time steps
Weight Sharing Same weights used across all time steps
Variable Length Can handle sequences of different lengths

Diagram:

graph LR subgraph "RNN Unrolled Through Time" X0[x₀] --> H0[h₀] X1[x₁] --> H1[h₁] X2[x₂] --> H2[h₂] X3[x₃] --> H3[h₃] H0 --> H1 H1 --> H2 H2 --> H3 H0 --> Y0[y₀] H1 --> Y1[y₁] H2 --> Y2[y₂] H3 --> Y3[y₃] end

Mathematical Formulation:

Hidden State Update:

$$h_t = \tanh(W_{hh} \cdot h_{t-1} + W_{xh} \cdot x_t + b_h)$$

Output Calculation:

$$y_t = W_{hy} \cdot h_t + b_y$$

Where:

  • hₜ = Hidden state at time t

  • xₜ = Input at time t

  • yₜ = Output at time t

  • W = Weight matrices

  • b = Bias vectors
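The hidden-state update can be sketched with scalar state and input — the same cell, with the same weights, processes every element of the sequence (weights are illustrative, not a trained model):

```python
import math

# One step of a vanilla RNN: h_t = tanh(w_hh*h_prev + w_xh*x_t + b_h).
def rnn_step(h_prev, x_t, w_hh=0.5, w_xh=1.0, b_h=0.0):
    return math.tanh(w_hh * h_prev + w_xh * x_t + b_h)

h = 0.0
for x in [1.0, 0.5, -1.0]:   # carry the hidden state across time steps
    h = rnn_step(h, x)
print(round(h, 4))
```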

Types of RNN Architectures:

Type Input-Output Example Use
One-to-One Single → Single Standard classification
One-to-Many Single → Sequence Image captioning
Many-to-One Sequence → Single Sentiment analysis
Many-to-Many Sequence → Sequence Machine translation

Variants to Address Limitations:

  • LSTM (Long Short-Term Memory) - Handles long-term dependencies

  • GRU (Gated Recurrent Unit) - Simplified version of LSTM

Applications:

  • Speech recognition

  • Language modeling

  • Machine translation

  • Text generation

  • Time series prediction

Limitations:

  • Vanishing/Exploding gradient problem

  • Difficulty with long sequences

  • Sequential computation (slow training)


Q7. 📌 Explain Recursive Neural Networks.

[ Jun 2022 Q5(e) | Frequency: 1 ]

Answer

Recursive Neural Network is a type of neural network that applies the same set of weights recursively over a structured input (typically tree-structured data) to produce a fixed-size output.

Key Difference from RNN:

  • RNN: Processes sequential (linear) data

  • Recursive NN: Processes hierarchical (tree) structures

Structure:


Mathematical Formulation:

For combining child nodes c₁ and c₂:

$$p = f(W \cdot [c_1; c_2] + b)$$

Where:

  • p = Parent node representation

  • W = Shared weight matrix

  • [c₁; c₂] = Concatenation of child representations

  • f = Non-linear activation function
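The recursive combination can be sketched with one scalar value per node and a single shared combiner applied bottom-up (all numbers are illustrative):

```python
import math

# Shared combiner p = tanh(w1*c1 + w2*c2 + b) reused at every internal
# node of a binary tree — the defining property of a Recursive NN.
def combine(c1, c2, w1=0.6, w2=0.4, b=0.0):
    return math.tanh(w1 * c1 + w2 * c2 + b)

left = combine(0.2, 0.8)     # e.g. (The, cat) with made-up leaf scores
right = combine(0.5, -0.3)   # e.g. (sat, mat)
root = combine(left, right)  # sentence-level representation
print(round(root, 4))
```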

Example: Sentence Parsing

Sentence: "The cat sat on mat"

Tree Structure:
           S (Sentence)
          / \
        NP   VP
       /    /  \
     The   sat  PP
     cat       /  \
              on  NP
                   |
                  mat

Each node combines its children using the same neural network.

Applications:

Domain Application
NLP Sentence parsing, Sentiment analysis
Computer Vision Scene parsing, Image segmentation
Code Analysis Abstract syntax tree processing
Knowledge Graphs Hierarchical reasoning

Advantages:

  • Captures hierarchical structure

  • Variable-size input handling

  • Parameter sharing reduces complexity

Limitations:

  • Requires pre-defined tree structure

  • Cannot model arbitrary graphs

  • Vanishing gradient in deep trees


Q8. 🔥 Explain Restricted Boltzmann Machines with suitable example.

[ Dec 2023 Q5(d), Dec 2022 Q5(f) | Frequency: 2 ]

Answer

Restricted Boltzmann Machine (RBM) is a generative stochastic neural network that can learn a probability distribution over its inputs. It consists of visible and hidden layers with no intra-layer connections.

Architecture:

Component Description
Visible Layer (v) Represents input data
Hidden Layer (h) Learns features/patterns
Weights (W) Connections between layers
Biases (a, b) For visible and hidden units

Diagram:

    Hidden layer:    (h1)   (h2)   (h3)
                       \   / | \   /
                        weights W (every vᵢ connects to every hⱼ)
                       /   \ | /   \
    Visible layer:  (v1)  (v2)  (v3)  (v4)

    (no connections within a layer — hence "restricted")

"Restricted" Meaning:

  • No connections within the same layer

  • Only connections between visible and hidden layers

  • Makes computation tractable

Energy Function:

$$E(v,h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i w_{ij} h_j$$

Probability Distribution:

$$P(v,h) = \frac{e^{-E(v,h)}}{Z}$$

Where Z is the partition function (normalization constant).

Training: Contrastive Divergence (CD)

Step 1: Positive Phase - Sample h given v (data)
Step 2: Negative Phase - Reconstruct v' from h, then h' from v'
Step 3: Update weights: ΔW = η(vh^T - v'h'^T)
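The energy function can be evaluated directly for a tiny configuration (the binary units, biases, and weight matrix below are illustrative, not trained):

```python
# Energy of an RBM configuration:
# E(v,h) = -Σ aᵢvᵢ - Σ bⱼhⱼ - Σ vᵢ wᵢⱼ hⱼ
def rbm_energy(v, h, a, b, W):
    term_v = sum(ai * vi for ai, vi in zip(a, v))
    term_h = sum(bj * hj for bj, hj in zip(b, h))
    term_w = sum(v[i] * W[i][j] * h[j]
                 for i in range(len(v)) for j in range(len(h)))
    return -(term_v + term_h + term_w)

v = [1, 0, 1]            # visible units (binary)
h = [1, 1]               # hidden units (binary)
a = [0.1, 0.2, 0.3]      # visible biases
b = [0.5, -0.5]          # hidden biases
W = [[1.0, 0.0],
     [0.0, 1.0],
     [0.5, 0.5]]         # weight between each vᵢ and hⱼ
E = rbm_energy(v, h, a, b, W)
print(round(E, 4))       # -2.4
```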

Example: Movie Recommendation

Problem: Learn user preferences from movie ratings

User Movie1 Movie2 Movie3 Movie4
Visible Layer 5 3 0 4
(0 = not watched)

RBM Learning:

  • Visible units: Movie ratings

  • Hidden units: Learn latent features (genre preferences)

  • After training: Can predict ratings for unwatched movies

Applications:

  • Recommendation systems

  • Dimensionality reduction

  • Feature learning

  • Collaborative filtering

  • Pre-training deep networks


Q9. 🔥 Explain Generative Adversarial Networks with suitable example.

[ Jun 2024 Q5(d), Jun 2022 Q5(f) | Frequency: 2 ]

Answer

Generative Adversarial Network (GAN) consists of two neural networks—Generator and Discriminator—that compete against each other in a game-theoretic framework to generate realistic synthetic data.

Architecture:

Component Role Goal
Generator (G) Creates fake data Fool the discriminator
Discriminator (D) Classifies real vs fake Correctly identify fakes

Diagram:

graph LR Z["Random Noise z ~ N(0,1)"] --> G[Generator<br/>Network] G --> F[Fake Data] R[Real Data] --> D[Discriminator<br/>Network] F --> D D --> O{Real or<br/>Fake?} O -->|Feedback| G O -->|Feedback| D

Training Process:

Minimax Game:

$$\min_G \max_D V(D,G) = \mathbb{E}_{x \sim p_{data}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$$

Step-by-Step Training:

Step Action
1 Sample random noise z
2 Generator creates fake samples G(z)
3 Discriminator evaluates real and fake samples
4 Calculate loss for D (maximize correct classification)
5 Calculate loss for G (minimize D's success on fakes)
6 Update both networks via backpropagation
7 Repeat until equilibrium
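The two loss terms can be illustrated for single samples; the discriminator outputs below are made-up probabilities, not values from a trained network:

```python
import math

# Discriminator loss -(log D(x) + log(1 - D(G(z)))): low when D labels
# real data near 1 and fake data near 0. Generator loss (original form)
# log(1 - D(G(z))): G wants this minimized, i.e. D(G(z)) pushed toward 1.
def d_loss(d_real, d_fake):
    return -(math.log(d_real) + math.log(1 - d_fake))

def g_loss(d_fake):
    return math.log(1 - d_fake)

print(round(d_loss(0.9, 0.1), 4))   # D confident and correct -> small loss
print(round(g_loss(0.1), 4))        # D spots the fake -> G must improve
```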

Example: Face Generation

Problem: Generate realistic human faces

Setup:

  • Real Data: CelebA dataset (celebrity faces)

  • Generator Input: 100-dimensional random noise

  • Generator Output: 64×64×3 RGB image

  • Discriminator Output: Probability (real/fake)

Training Progression:

Epoch 1:   [Noise] → Generator → [Blurry blob]
Epoch 100: [Noise] → Generator → [Face-like shape]
Epoch 1000:[Noise] → Generator → [Realistic face]

GAN Variants:

Variant Description
DCGAN Uses convolutional layers
CGAN Conditional generation
StyleGAN Controls style attributes
CycleGAN Image-to-image translation
WGAN Wasserstein distance for stability

Applications:

  • Image synthesis (faces, art)

  • Image super-resolution

  • Data augmentation

  • Video generation

  • Drug discovery

  • Text-to-image generation


Q10. 🔥 Explain Auto Encoders with suitable example.

[ Jun 2024 Q5(e), Jun 2023 Q5(e) | Frequency: 2 ]

Answer

Autoencoder is an unsupervised neural network that learns to compress data into a lower-dimensional representation (encoding) and then reconstruct the original data from this representation (decoding).

Architecture:

Component Function
Encoder Compresses input to latent space
Latent Space (z) Compressed representation (bottleneck)
Decoder Reconstructs input from latent space


Mathematical Formulation:

Encoder: $z = f_{encoder}(x)$

Decoder: $\hat{x} = f_{decoder}(z)$

Loss Function (Reconstruction Loss):

$$L = ||x - \hat{x}||^2 = \sum_i(x_i - \hat{x}_i)^2$$
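The reconstruction loss is just the squared difference between an input and its decoding (the vectors below are made up for illustration):

```python
# Reconstruction loss L = Σ(xᵢ - x̂ᵢ)² between input x and decoding x̂.
def reconstruction_loss(x, x_hat):
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))

x     = [1.0, 0.0, 0.5, 0.25]
x_hat = [0.9, 0.1, 0.6, 0.25]   # an imperfect reconstruction
print(round(reconstruction_loss(x, x_hat), 4))   # 0.03
```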

Example: Image Denoising

Problem: Remove noise from MNIST digit images

Architecture:

Input (28×28 noisy image)
    ↓
Encoder: 784 → 256 → 128 → 32
    ↓
Latent Space: 32 dimensions
    ↓
Decoder: 32 → 128 → 256 → 784
    ↓
Output (28×28 clean image)

Training:

  • Input: Noisy images (original + Gaussian noise)

  • Target: Clean original images

  • Loss: MSE between output and clean image

Variants of Autoencoders:

Type Description Use Case
Vanilla AE Basic encoder-decoder Dimensionality reduction
Denoising AE Trained on noisy data Image denoising
Sparse AE Sparse latent representation Feature learning
Variational AE Probabilistic latent space Generative modeling
Convolutional AE Uses CNN layers Image compression

Applications:

  • Dimensionality reduction

  • Anomaly detection

  • Image compression

  • Data denoising

  • Feature extraction

  • Generative models (VAE)


Q11. ⭐ Explain Transformers.

[ Jun 2023 Q5(f) | Frequency: 1 ]

Answer

Transformer is a deep learning architecture that uses self-attention mechanisms to process sequential data in parallel, eliminating the need for recurrence. It was introduced in the paper "Attention is All You Need" (Vaswani et al., 2017).

Key Innovation: Self-Attention

Allows each element to attend to all other elements in the sequence simultaneously.

Architecture Components:

| Component | Function |
| --- | --- |
| Multi-Head Attention | Attends to different positions |
| Feed-Forward Network | Processes attention output |
| Layer Normalization | Stabilizes training |
| Positional Encoding | Adds position information |

Diagram:

graph TB subgraph "Transformer Block" I[Input Embeddings] --> PE[+ Positional Encoding] PE --> MHA[Multi-Head<br/>Self-Attention] MHA --> AN1[Add & Norm] PE --> AN1 AN1 --> FF[Feed Forward<br/>Network] FF --> AN2[Add & Norm] AN1 --> AN2 AN2 --> O[Output] end

Self-Attention Mechanism:

$$Attention(Q, K, V) = softmax\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$

Where:

  • Q = Query matrix

  • K = Key matrix

  • V = Value matrix

  • dₖ = Dimension of keys
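
The scaled dot-product formula above can be sketched directly in NumPy (shapes and sizes here are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (n_q, n_k): query-key similarities
    weights = softmax(scores, axis=-1)  # each query's weights over all keys sum to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 query positions, d_k = 8
K = rng.normal(size=(6, 8))   # 6 key positions
V = rng.normal(size=(6, 16))  # values, d_v = 16
out, w = attention(Q, K, V)
print(out.shape)  # each of the 4 positions attends over all 6 keys
```

Multi-head attention simply runs several such maps in parallel on learned projections of Q, K, V and concatenates the results.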

Multi-Head Attention:

$$MultiHead(Q,K,V) = Concat(head_1,...,head_h)W^O$$

Each head learns different attention patterns.

Complete Transformer (Encoder-Decoder):

svgbob diagram

Applications:

  • BERT: Bidirectional understanding (NLU)

  • GPT: Text generation (NLG)

  • T5: Text-to-text tasks

  • Vision Transformer (ViT): Image classification

  • ChatGPT: Conversational AI


Q12. 📌 Explain Non-Monotonic Reasoning Systems.

[ Jun 2023 Q5(c) | Frequency: 1 ]

Answer

Non-Monotonic Reasoning is a form of logical reasoning where conclusions can be retracted when new information is added. Unlike classical (monotonic) logic, adding facts can invalidate previous conclusions.

Comparison:

| Monotonic Logic | Non-Monotonic Logic |
| --- | --- |
| New facts only add conclusions | New facts may retract conclusions |
| Conclusions never withdrawn | Conclusions can be revised |
| Complete information assumed | Handles incomplete information |
| Traditional deductive logic | Default reasoning |

Example:

Monotonic:

Fact: Tweety is a bird
Rule: Birds can fly
Conclusion: Tweety can fly ✓
Add: Tweety is a penguin
Conclusion: Tweety can fly ✓ (unchanged)

Non-Monotonic:

Fact: Tweety is a bird
Default: Birds typically fly
Conclusion: Tweety can fly ✓
Add: Tweety is a penguin
Conclusion: Tweety cannot fly ✗ (revised)

Types of Non-Monotonic Reasoning:

| Type | Description |
| --- | --- |
| Default Logic | Rules with exceptions |
| Circumscription | Minimize abnormalities |
| Autoepistemic Logic | Based on agent's beliefs |
| Abductive Reasoning | Best explanation inference |

Default Logic Notation:

$$\frac{A : B}{C}$$

"If A is provable and B is consistent, then conclude C"

Example Default Rule:

$$\frac{Bird(x) : CanFly(x)}{CanFly(x)}$$

"If x is a bird, and it's consistent that x can fly, conclude x can fly"
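
The Tweety example can be mimicked with a tiny Python sketch. The `can_fly` helper and its exception list are illustrative stand-ins, not a real default-logic engine:

```python
# Default rule Bird(x) : CanFly(x) / CanFly(x):
# "If x is a bird, and CanFly(x) is consistent with what we know, conclude CanFly(x)."

def can_fly(animal, facts, exceptions=frozenset({"penguin", "ostrich"})):
    """Apply the default rule; exceptions block the default conclusion."""
    if "bird" not in facts.get(animal, set()):
        return False  # the default rule does not apply at all
    # CanFly(x) is consistent unless a known exception is present.
    return facts[animal].isdisjoint(exceptions)

facts = {"tweety": {"bird"}}
print(can_fly("tweety", facts))  # True: default conclusion

facts["tweety"].add("penguin")   # new information arrives...
print(can_fly("tweety", facts))  # False: conclusion retracted (non-monotonic)
```

The second call shows the non-monotonic behavior: adding a fact withdrew an earlier conclusion.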

Applications:

  • Expert systems

  • Knowledge representation

  • Common-sense reasoning

  • Diagnosis systems

  • Planning under uncertainty


Q13. 📌 Explain Closed World Assumption.

[ Jun 2022 Q5(d) | Frequency: 1 ]

Answer

Closed World Assumption (CWA) is a reasoning principle stating that if a fact cannot be proven true from the knowledge base, it is assumed to be false.

Definition:

"What is not known to be true is assumed to be false."

Comparison with Open World Assumption:

| Closed World (CWA) | Open World (OWA) |
| --- | --- |
| Unknown = False | Unknown = Unknown |
| Complete knowledge assumed | Incomplete knowledge allowed |
| Used in databases | Used in Semantic Web |
| Definite conclusions | Uncertain conclusions |

Example:

Knowledge Base:

enrolled(John, CS101)
enrolled(John, MATH201)
enrolled(Mary, CS101)

Query: Is John enrolled in PHYSICS301?

| CWA Answer | OWA Answer |
| --- | --- |
| No (not stated, so false) | Unknown (might be true) |

Diagram:

graph TD Q[Query: enrolled<br/>John, PHYSICS301?] Q --> KB[Search<br/>Knowledge Base] KB --> F{Found?} F -->|Yes| T[True] F -->|No| CWA[Closed World:<br/>Assume FALSE] F -->|No| OWA[Open World:<br/>UNKNOWN]

Formal Definition:

For a knowledge base KB and ground atom A:

  • CWA: If KB ⊭ A, then infer ¬A

  • OWA: If KB ⊭ A, A's truth value is unknown
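
A small sketch contrasting the two assumptions on the enrollment facts above (the function names are illustrative):

```python
# Knowledge base as a set of ground enrolled(student, course) facts.
KB = {("John", "CS101"), ("John", "MATH201"), ("Mary", "CS101")}

def query_cwa(student, course):
    """Closed world: anything not provable from KB is False."""
    return (student, course) in KB

def query_owa(student, course):
    """Open world: absence from KB means the truth value is unknown (None)."""
    return True if (student, course) in KB else None

print(query_cwa("John", "PHYSICS301"))  # False
print(query_owa("John", "PHYSICS301"))  # None
```

This is exactly how SQL's `NOT EXISTS` behaves (CWA) versus how a Semantic Web reasoner treats missing triples (OWA).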

Applications of CWA:

  • Relational databases (SQL)

  • Logic programming (Prolog)

  • Expert systems

  • Planning systems

Limitations:

  • Cannot handle incomplete information properly

  • May lead to incorrect conclusions

  • Requires complete knowledge assumption

Relation to Non-Monotonic Reasoning: CWA is a form of non-monotonic reasoning because adding new facts can change conclusions from false to true.


Unit 12 Summary

| Topic | Key Points |
| --- | --- |
| Neural Network | Interconnected neurons, layers, weights, activation |
| Biological vs Artificial | Dendrites→Inputs, Synapse→Weights, Soma→Summation |
| CNN | Convolution, pooling, feature extraction for images |
| RNN | Sequential processing, hidden state memory |
| Recursive NN | Tree-structured data processing |
| RBM | Visible-hidden layers, energy-based model |
| GAN | Generator vs Discriminator adversarial training |
| Autoencoder | Encoder-Decoder, latent space compression |
| Transformer | Self-attention, parallel processing, BERT/GPT |
| Non-Monotonic | Conclusions can be retracted |
| CWA | Unknown facts assumed false |

BLOCK 4: MACHINE LEARNING - II

UNIT 13: Feature Selection and Extraction


Q1. 🔥 Explain the term 'Dimensionality Reduction'.

[ Dec 2023 Q1(f), Jun 2022 Q1(h) | Frequency: 2 ]

Answer

Dimensionality Reduction is a technique used in machine learning and data analysis to reduce the number of input variables (features) in a dataset while preserving as much meaningful information as possible.

Definition: The process of transforming data from a high-dimensional space to a lower-dimensional space such that the reduced representation retains the essential characteristics of the original data.

Key Concepts:

| Aspect | Description |
| --- | --- |
| High Dimensionality | Dataset with many features (columns) |
| Curse of Dimensionality | Problems arising when data has too many dimensions |
| Information Preservation | Retaining variance and patterns in reduced form |
| Computational Efficiency | Faster processing with fewer dimensions |

Why Dimensionality Reduction is Needed:

  1. Curse of Dimensionality: As dimensions increase, data becomes sparse

  2. Overfitting Prevention: Fewer features reduce model complexity

  3. Visualization: Data can be plotted in 2D or 3D

  4. Storage Reduction: Less memory required

  5. Noise Removal: Eliminates irrelevant features

Diagram:

ditaa diagram

Example: An image dataset with 1000×1000 pixels (1,000,000 features) can be reduced to 100 principal components, retaining 95% of the variance while dramatically reducing computational requirements.


Q2. 🔥 Name the general techniques to perform dimensionality reduction.

[ Dec 2023 Q1(f), Jun 2022 Q1(h) | Frequency: 2 ]

Answer

Dimensionality reduction techniques are broadly classified into two main categories:

1. Feature Selection Methods: Select a subset of original features without transforming them.

| Technique | Description |
| --- | --- |
| Filter Methods | Use statistical measures (correlation, chi-square, mutual information) |
| Wrapper Methods | Use ML model performance (forward/backward selection, RFE) |
| Embedded Methods | Feature selection built into the algorithm (LASSO, Ridge, Decision Trees) |

2. Feature Extraction Methods: Transform original features into new reduced set of features.

| Technique | Type | Description |
| --- | --- | --- |
| PCA | Linear | Projects data onto principal components |
| LDA | Linear | Maximizes class separability |
| SVD | Linear | Singular Value Decomposition |
| t-SNE | Non-linear | For visualization, preserves local structure |
| UMAP | Non-linear | Uniform Manifold Approximation |
| Autoencoders | Non-linear | Neural network-based compression |

Diagram:

plantuml diagram

Q3. ⭐ Give merits of dimensionality reduction.

[ Dec 2023 Q1(f) | Frequency: 1 ]

Answer

Merits of Dimensionality Reduction:

| S.No | Merit | Explanation |
| --- | --- | --- |
| 1 | Reduced Computation Time | Fewer features mean faster training and prediction |
| 2 | Reduced Storage Space | Less memory required to store data |
| 3 | Avoids Overfitting | Simpler models generalize better to new data |
| 4 | Removes Redundancy | Eliminates correlated and duplicate features |
| 5 | Noise Reduction | Filters out irrelevant features and noise |
| 6 | Better Visualization | Enables plotting in 2D/3D for analysis |
| 7 | Improved Model Performance | Often leads to better accuracy |
| 8 | Handles Multicollinearity | Removes feature dependencies |
| 9 | Easier Interpretation | Simpler models are more explainable |
| 10 | Mitigates Curse of Dimensionality | Works better with high-dimensional data |

Q4. ⭐ Give limitation of dimensionality reduction.

[ Dec 2023 Q1(f) | Frequency: 1 ]

Answer

Limitations of Dimensionality Reduction:

| S.No | Limitation | Explanation |
| --- | --- | --- |
| 1 | Information Loss | Some important information may be discarded |
| 2 | Interpretability Issues | Transformed features may lack physical meaning |
| 3 | Computational Overhead | Initial reduction process can be expensive |
| 4 | Parameter Sensitivity | Results depend on hyperparameter choices |
| 5 | Not Always Beneficial | May hurt performance if done incorrectly |
| 6 | Difficulty in Choosing Dimensions | Optimal number of components is not always clear |
| 7 | Assumption Dependency | Linear methods assume linear relationships |
| 8 | May Miss Non-linear Patterns | Linear methods fail with complex data structures |
| 9 | Reversibility Issues | Original data cannot be fully reconstructed |
| 10 | Domain Knowledge Required | Understanding data is crucial for proper application |

Q5. ⭐ Discuss the advantages of dimensionality reduction.

[ Dec 2024 Q1(h) | Frequency: 1 ]

Answer

Advantages of Dimensionality Reduction:

1. Computational Efficiency:

  • Faster model training and inference

  • Reduced memory consumption

  • Enables processing of large-scale datasets

2. Model Performance Improvement:

  • Reduces overfitting by simplifying models

  • Eliminates noisy and redundant features

  • Improves generalization to unseen data

3. Data Visualization:

  • High-dimensional data can be visualized in 2D/3D

  • Helps identify patterns, clusters, and outliers

  • Enables better exploratory data analysis

4. Storage Optimization:

  • Compressed representation requires less storage

  • Efficient data transmission and sharing

5. Handling Curse of Dimensionality:

  • Addresses sparsity in high-dimensional spaces

  • Improves distance-based algorithm performance

Diagram:

blockdiag diagram

Q6. ⭐ Discuss the disadvantages of dimensionality reduction.

[ Dec 2024 Q1(h) | Frequency: 1 ]

Answer

Disadvantages of Dimensionality Reduction:

1. Information Loss:

  • Reduction inherently discards some data

  • Important subtle patterns may be lost

  • Cannot guarantee preservation of all relevant information

2. Interpretability Challenges:

  • New features may be abstract combinations

  • Difficult to explain transformed features to stakeholders

  • Original feature importance becomes unclear

3. Computational Cost of Transformation:

  • Initial computation can be expensive for large datasets

  • Some methods (t-SNE, Autoencoders) are computationally intensive

4. Method Selection Complexity:

  • Many techniques available with different assumptions

  • Wrong choice can degrade model performance

  • Requires domain expertise

5. Hyperparameter Sensitivity:

  • Number of components affects results significantly

  • Optimal parameters vary across datasets

| Disadvantage | Impact | Mitigation |
| --- | --- | --- |
| Information Loss | Reduced accuracy | Use explained variance ratio |
| Poor Interpretability | Hard to explain | Use feature selection instead |
| Computation Cost | Slow processing | Use incremental methods |
| Method Dependency | Inconsistent results | Cross-validate techniques |

Q7. ⭐ Discuss Dimensionality reduction with suitable example.

[ Jun 2023 Q2(b)(iv) | Frequency: 1 ]

Answer

Dimensionality Reduction - Detailed Example:

Concept: Reducing the number of features in a dataset while retaining maximum information.

Example: Image Compression using PCA

Consider a facial recognition system with 64×64 pixel images:

| Stage | Dimensions | Description |
| --- | --- | --- |
| Original | 4096 features | Each pixel is a feature |
| After PCA | 100 components | Retains 95% variance |
| Reduction | 97.5% | Massive space savings |

Step-by-Step Process:

Original Image: 64 × 64 = 4096 dimensions
         ↓
Step 1: Standardize data (mean = 0, std = 1)
         ↓
Step 2: Compute covariance matrix
         ↓
Step 3: Find eigenvalues and eigenvectors
         ↓
Step 4: Select top k eigenvectors (principal components)
         ↓
Step 5: Project data onto k dimensions
         ↓
Reduced Data: 100 dimensions (retaining 95% information)

Diagram:

ditaa diagram

Practical Benefits:

  • Storage: 4096 → 100 values per image (97.5% reduction)

  • Processing: Face recognition 40× faster

  • Quality: Faces still recognizable

Another Example: Customer Segmentation

| Feature Set | Before | After PCA |
| --- | --- | --- |
| Demographics | 15 features | 3 components |
| Purchase History | 50 features | 5 components |
| Behavior | 30 features | 4 components |
| Total | 95 features | 12 components |

Q8. 📌 Explain Feature Selection.

[ Jun 2023 Q5(i) | Frequency: 1 ]

Answer

Feature Selection is the process of selecting a subset of relevant features from the original dataset for use in model construction.

Definition: A technique that chooses the most important original features without transforming them, reducing dimensionality while maintaining interpretability.

Types of Feature Selection Methods:

| Method | Approach | Examples |
| --- | --- | --- |
| Filter | Statistical measures independent of model | Correlation, Chi-square, Mutual Information, Variance Threshold |
| Wrapper | Uses model performance as selection criterion | Forward Selection, Backward Elimination, RFE |
| Embedded | Built into learning algorithm | LASSO (L1), Ridge (L2), Tree-based importance |

Diagram:

graphviz diagram

Comparison of Methods:

| Aspect | Filter | Wrapper | Embedded |
| --- | --- | --- | --- |
| Speed | Fast | Slow | Medium |
| Accuracy | Moderate | High | High |
| Model Dependency | No | Yes | Yes |
| Overfitting Risk | Low | High | Medium |

Advantages:

  • Maintains original feature interpretation

  • Reduces training time

  • Improves model accuracy

  • Prevents overfitting
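
The filter approach can be sketched on synthetic data: a variance threshold first drops (near-)constant columns, then the survivors are ranked by correlation with the target. Feature names and thresholds here are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x_useful = rng.normal(size=n)
x_noise = rng.normal(size=n)
x_constant = np.full(n, 3.0)                    # zero-variance feature
y = 2.0 * x_useful + 0.1 * rng.normal(size=n)   # target depends only on x_useful

X = np.column_stack([x_useful, x_noise, x_constant])
names = ["useful", "noise", "constant"]

# Filter 1: variance threshold drops (near-)constant features.
keep = X.var(axis=0) > 1e-8

# Filter 2: rank the remaining features by |correlation with target|.
scores = [abs(np.corrcoef(X[:, j], y)[0, 1]) if keep[j] else 0.0
          for j in range(X.shape[1])]
selected = [names[j] for j in np.argsort(scores)[::-1][:1]]
print(selected)  # the strongly correlated feature ranks first
```

Both filters are model-independent, which is what makes this the fast (if less accurate) column of the comparison table above.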


Q9. 📌 What is the purpose of feature extraction in machine learning?

[ Jun 2025 Q1(g) | Frequency: 1 ]

Answer

Feature Extraction is the process of transforming raw data into a reduced set of features that effectively represent the original data while reducing dimensionality.

Purpose of Feature Extraction:

| Purpose | Description |
| --- | --- |
| Dimensionality Reduction | Transforms high-dimensional data to lower dimensions |
| Information Compression | Captures essential patterns in fewer features |
| Noise Reduction | Filters out irrelevant variations in data |
| Improved Performance | Enhances model accuracy and efficiency |
| Pattern Discovery | Reveals hidden structures in data |

Key Differences from Feature Selection:

| Aspect | Feature Selection | Feature Extraction |
| --- | --- | --- |
| Transformation | No | Yes |
| Original Features | Preserved | Combined/Transformed |
| Interpretability | High | Low |
| Techniques | Filter, Wrapper | PCA, LDA, Autoencoders |

Diagram:

blockdiag diagram

Applications:

  • Image Processing: Extract edges, textures, shapes

  • Text Mining: Word embeddings, TF-IDF

  • Signal Processing: Frequency components, wavelets

  • Biometrics: Facial features, fingerprint minutiae


Q10. 🔥 Explain Principal Component Analysis with suitable example.

[ Jun 2024 Q5(a), Dec 2022 Q5(i) | Frequency: 2 ]

Answer

Principal Component Analysis (PCA) is an unsupervised linear dimensionality reduction technique that transforms data into a new coordinate system where the greatest variance lies on the first coordinate (first principal component), second greatest variance on the second coordinate, and so on.

Key Concepts:

| Term | Description |
| --- | --- |
| Principal Components | New orthogonal axes that capture maximum variance |
| Eigenvalues | Indicate amount of variance explained by each PC |
| Eigenvectors | Direction of principal components |
| Explained Variance Ratio | Proportion of total variance captured |

Algorithm Steps:

Step 1: Standardize the data (mean = 0, variance = 1)
Step 2: Compute covariance matrix
Step 3: Calculate eigenvalues and eigenvectors
Step 4: Sort eigenvectors by eigenvalues (descending)
Step 5: Select top k eigenvectors
Step 6: Transform data to new k-dimensional space

Diagram - PCA Geometric Interpretation:

pikchr diagram

Mathematical Formulation:

  • Covariance Matrix (for mean-centered X): C = (1/(n−1)) XᵀX

  • Eigenvalue equation: Cv = λv

  • Transformed data: Y = X × V

Example: 2D to 1D Reduction

Consider 5 data points:

| Point | X₁ | X₂ |
| --- | --- | --- |
| A | 2.5 | 2.4 |
| B | 0.5 | 0.7 |
| C | 2.2 | 2.9 |
| D | 1.9 | 2.2 |
| E | 3.1 | 3.0 |

Step 1: Calculate Mean

  • Mean(X₁) = 2.04, Mean(X₂) = 2.24

Step 2: Center the Data

  • Subtract the mean from each value

Step 3: Covariance Matrix (dividing by n − 1 = 4)

C = [0.938  0.841]
    [0.841  0.853]

Step 4: Eigenvalues and Eigenvectors

  • λ₁ = 1.737, v₁ = [0.725, 0.689]

  • λ₂ = 0.054, v₂ = [-0.689, 0.725]

Step 5: Explained Variance

PC1: 1.737 / (1.737 + 0.054) = 97.0%
PC2: 0.054 / (1.737 + 0.054) = 3.0%

Result: Data reduced from 2D to 1D with 97.0% variance retained.
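
These steps can be cross-checked numerically with NumPy for the five points in the table (`np.cov` uses the same n − 1 divisor):

```python
import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])

C = np.cov(X.T)                       # sample covariance matrix (divides by n-1)
eigvals, eigvecs = np.linalg.eigh(C)  # eigh returns ascending order for symmetric C
order = np.argsort(eigvals)[::-1]     # reorder so PC1 comes first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

ratio = eigvals / eigvals.sum()
print(np.round(eigvals, 3))  # eigenvalues of the covariance matrix
print(np.round(ratio, 3))    # explained variance ratio, PC1 first
```

Keeping only the eigenvector of the largest eigenvalue gives the 1-D projection.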

Applications:

  • Face Recognition (Eigenfaces)

  • Image Compression

  • Gene Expression Analysis

  • Financial Portfolio Analysis


Q11. 📌 Calculate the principal component using PCA algorithm for given two-dimensional patterns.

[ Dec 2024 Q3(b) | Frequency: 1 ]

Answer

Problem: Calculate principal components for 2D data points.

Given Data Points:

| Point | X₁ | X₂ |
| --- | --- | --- |
| 1 | 1 | 2 |
| 2 | 2 | 3 |
| 3 | 3 | 3 |
| 4 | 4 | 5 |
| 5 | 5 | 5 |

Step 1: Calculate Mean

μ₁ = (1+2+3+4+5)/5 = 15/5 = 3
μ₂ = (2+3+3+5+5)/5 = 18/5 = 3.6

Step 2: Center the Data (Subtract Mean)

| Point | X₁ − μ₁ | X₂ − μ₂ |
| --- | --- | --- |
| 1 | -2 | -1.6 |
| 2 | -1 | -0.6 |
| 3 | 0 | -0.6 |
| 4 | 1 | 1.4 |
| 5 | 2 | 1.4 |

Step 3: Compute Covariance Matrix

Var(X₁) = [(-2)² + (-1)² + 0² + 1² + 2²] / 4 = 10/4 = 2.5

Var(X₂) = [(-1.6)² + (-0.6)² + (-0.6)² + 1.4² + 1.4²] / 4 
        = [2.56 + 0.36 + 0.36 + 1.96 + 1.96] / 4 = 7.2/4 = 1.8

Cov(X₁,X₂) = [(-2)(-1.6) + (-1)(-0.6) + (0)(-0.6) + (1)(1.4) + (2)(1.4)] / 4
           = [3.2 + 0.6 + 0 + 1.4 + 2.8] / 4 = 8/4 = 2

Covariance Matrix:

C = [2.5  2.0]
    [2.0  1.8]

Step 4: Calculate Eigenvalues

Using characteristic equation: det(C - λI) = 0

|2.5-λ    2.0  |
|2.0    1.8-λ  | = 0

(2.5-λ)(1.8-λ) - (2.0)(2.0) = 0
4.5 - 2.5λ - 1.8λ + λ² - 4 = 0
λ² - 4.3λ + 0.5 = 0

Using quadratic formula:

λ = (4.3 ± √(18.49 - 2)) / 2
λ = (4.3 ± √16.49) / 2
λ = (4.3 ± 4.06) / 2

λ₁ = 4.18 (variance along the first principal component)
λ₂ = 0.12 (variance along the second principal component)

Step 5: Calculate Eigenvectors

For λ₁ = 4.18:

(C - λ₁I)v = 0
[-1.68  2.0 ][v₁]   [0]
[2.0   -2.38][v₂] = [0]

-1.68v₁ + 2.0v₂ = 0
v₂ = 0.84v₁

Normalized: v₁ = [0.766, 0.643]

For λ₂ = 0.12:

Normalized: v₂ = [-0.643, 0.766]

Step 6: Principal Components Summary

| Component | Eigenvector | Eigenvalue | Variance Explained |
| --- | --- | --- | --- |
| PC1 | [0.766, 0.643] | 4.18 | 97.2% |
| PC2 | [-0.643, 0.766] | 0.12 | 2.8% |

Scatter Plot Visualization:

vegalite diagram

Step 7: Transform Data onto PC1

Projection onto PC1:

Y = X_centered × v₁

Point 1: (-2)(0.766) + (-1.6)(0.643) = -2.56
Point 2: (-1)(0.766) + (-0.6)(0.643) = -1.15
Point 3: (0)(0.766) + (-0.6)(0.643) = -0.39
Point 4: (1)(0.766) + (1.4)(0.643) = 1.67
Point 5: (2)(0.766) + (1.4)(0.643) = 2.43

Result: Data successfully reduced from 2D to 1D with 97.2% variance retained.
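
The projection in Step 7 can be verified with NumPy. Since the sign of an eigenvector is arbitrary, it is normalized here so that its first component is positive, matching the hand computation:

```python
import numpy as np

X = np.array([[1, 2], [2, 3], [3, 3], [4, 5], [5, 5]], dtype=float)
Xc = X - X.mean(axis=0)              # center the data

C = np.cov(X.T)                      # [[2.5, 2.0], [2.0, 1.8]]
eigvals, eigvecs = np.linalg.eigh(C)
v1 = eigvecs[:, np.argmax(eigvals)]  # eigenvector of the largest eigenvalue
v1 = v1 * np.sign(v1[0])             # fix the sign convention

y = Xc @ v1                          # 1-D projection onto PC1
print(np.round(y, 2))                # matches the hand-computed projections above
```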


Q12. 🔥 Explain Linear Discriminant Analysis with suitable example.

[ Dec 2023 Q5(a), Jun 2022 Q5(i) | Frequency: 2 ]

Answer

Linear Discriminant Analysis (LDA) is a supervised linear dimensionality reduction technique that finds a linear combination of features that best separates two or more classes.

Key Difference from PCA:

| Aspect | PCA | LDA |
| --- | --- | --- |
| Type | Unsupervised | Supervised |
| Objective | Maximize variance | Maximize class separability |
| Uses Labels | No | Yes |
| Output | Principal components | Discriminant functions |

LDA Objective: Maximize the ratio of between-class variance to within-class variance.

Fisher's Criterion:

J(w) = (μ₁ - μ₂)² / (s₁² + s₂²)

where:
μ₁, μ₂ = class means
s₁², s₂² = class variances

Algorithm Steps:

plantuml diagram

Mathematical Formulation:

Within-Class Scatter Matrix (Sw):

Sw = Σᵢ Σₓ∈Cᵢ (x - μᵢ)(x - μᵢ)ᵀ

Between-Class Scatter Matrix (Sb):

Sb = Σᵢ Nᵢ(μᵢ - μ)(μᵢ - μ)ᵀ

Optimization: the direction maximizing Fisher's criterion is

w ∝ Sw⁻¹(μ₁ - μ₂)

Example: Two-Class Classification

Given Data:

| Class | Points (X₁, X₂) |
| --- | --- |
| Class 0 (●) | (4,2), (2,4), (2,3), (3,6), (4,4) |
| Class 1 (○) | (9,10), (6,8), (9,5), (8,7), (10,8) |

Step 1: Calculate Class Means

μ₀ = [(4+2+2+3+4)/5, (2+4+3+6+4)/5] = [3, 3.8]
μ₁ = [(9+6+9+8+10)/5, (10+8+5+7+8)/5] = [8.4, 7.6]

Step 2: Within-Class Scatter Matrix

S₀ = Σ(xᵢ - μ₀)(xᵢ - μ₀)ᵀ for Class 0
S₁ = Σ(xᵢ - μ₁)(xᵢ - μ₁)ᵀ for Class 1
Sw = S₀ + S₁

Step 3: Between-Class Scatter Matrix

Sb = (μ₀ - μ₁)(μ₀ - μ₁)ᵀ
   = ([3,3.8] - [8.4,7.6])([3,3.8] - [8.4,7.6])ᵀ
   = [-5.4, -3.8][-5.4, -3.8]ᵀ

Step 4: Compute Projection Vector

w = Sw⁻¹(μ₁ - μ₀)
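
Steps 1–4 can be reproduced numerically for the data above (`scatter` is a small helper written for this sketch, not a library call):

```python
import numpy as np

C0 = np.array([[4, 2], [2, 4], [2, 3], [3, 6], [4, 4]], dtype=float)
C1 = np.array([[9, 10], [6, 8], [9, 5], [8, 7], [10, 8]], dtype=float)

mu0, mu1 = C0.mean(axis=0), C1.mean(axis=0)  # [3, 3.8] and [8.4, 7.6]

def scatter(X, mu):
    """Class scatter matrix S = sum over x of (x - mu)(x - mu)^T."""
    D = X - mu
    return D.T @ D

Sw = scatter(C0, mu0) + scatter(C1, mu1)     # within-class scatter
w = np.linalg.inv(Sw) @ (mu1 - mu0)          # Fisher's optimal direction

# Project both classes onto w: they should separate along a single axis.
p0, p1 = C0 @ w, C1 @ w
print(p0.max() < p1.min())  # True: the classes do not overlap in 1-D
```

For these points the projection separates the two classes perfectly, which is exactly what maximizing between-class over within-class variance aims for.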

LDA Geometric Visualization:

pikchr diagram

1D Projection Result:

ditaa diagram

Key Properties of LDA:

| Property | Description |
| --- | --- |
| Maximum Components | min(c−1, d) where c = classes, d = features |
| Assumption | Classes are normally distributed |
| Covariance | Assumes equal covariance for all classes |
| Linearity | Finds linear decision boundaries |

Applications:

  • Face Recognition

  • Medical Diagnosis

  • Marketing (Customer Segmentation)

  • Pattern Recognition

  • Bankruptcy Prediction

Advantages vs Disadvantages:

| Advantages | Disadvantages |
| --- | --- |
| Uses class labels for better separation | Assumes normal distribution |
| Reduces dimensionality preserving class info | Limited to (c−1) components |
| Works well when classes are separable | Sensitive to outliers |
| Computationally efficient | Requires labeled data |

UNIT 14: Association Rules

Q1. ⭐ Explain Association Rules.

[ Dec 2022 Q5(j) | Frequency: 1 ]

Answer

Association Rules discover interesting relationships (co-occurrence patterns) among items in large transactional databases (e.g., market-basket data).
A rule is of the form:

  • $X \rightarrow Y$, where $X$ and $Y$ are itemsets, and $X \cap Y = \varnothing$
    Meaning: “If a transaction contains $X$, it is likely to contain $Y$.”

Key measures

Let $N$ = total number of transactions.

  • Support of itemset $X$

    $$\text{support}(X)=\frac{\#(X)}{N}$$
    where $\#(X)$ = number of transactions containing $X$.

  • Support of rule $X \rightarrow Y$

    $$\text{support}(X\rightarrow Y)=\text{support}(X\cup Y)$$

  • Confidence of rule $X \rightarrow Y$ (conditional probability)

    $$\text{confidence}(X\rightarrow Y)=\frac{\text{support}(X\cup Y)}{\text{support}(X)}=P(Y\mid X)$$

  • Lift (strength vs. independence)

    $$\text{lift}(X\rightarrow Y)=\frac{\text{confidence}(X\rightarrow Y)}{\text{support}(Y)}$$
    Interpretation:

  • Lift $> 1$: positive association
  • Lift $= 1$: independent
  • Lift $< 1$: negative association

Rule mining tasks

  1. Frequent itemset generation: find all itemsets with support $\ge \text{minsup}$

  2. Rule generation: from frequent itemsets, generate rules with confidence $\ge \text{minconf}$ (optionally filter by lift)

Example (conceptual): If many receipts contain $\{Bread, Milk\}$, a rule may be:

  • $\{Bread\}\rightarrow\{Milk\}$ with high confidence (customers who buy bread often buy milk)
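
The three measures can be written as small Python helpers; the toy basket list is illustrative:

```python
# Transactions are sets of items; itemsets X and Y are sets too.

def support(itemset, transactions):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(X, Y, transactions):
    """P(Y | X) = support(X ∪ Y) / support(X)."""
    return support(X | Y, transactions) / support(X, transactions)

def lift(X, Y, transactions):
    """confidence(X → Y) / support(Y); > 1 indicates positive association."""
    return confidence(X, Y, transactions) / support(Y, transactions)

tx = [{"Bread", "Milk"}, {"Bread", "Milk", "Diaper"}, {"Milk"}, {"Bread"}]
print(round(confidence({"Bread"}, {"Milk"}, tx), 4))  # 2 of 3 bread baskets have milk
```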

Q2. 🔥 Explain Apriori Algorithm with suitable example.

[ Jun 2024 Q5(b) | Frequency: 1 ]

Answer

Apriori is a level-wise algorithm to find frequent itemsets using the Apriori property:

  • Apriori property (anti-monotone):
    If an itemset is frequent, then all of its non-empty subsets are frequent.
    Equivalently, if an itemset is infrequent, all its supersets are infrequent.

This property allows pruning many candidates without counting their supports.

Algorithm idea (level-wise)

  1. Find frequent 1-itemsets $L_1$

  2. For $k=2,3,\dots$:

     • Generate candidate $k$-itemsets $C_k$ by joining $L_{k-1}$ with itself
     • Prune candidates in which any $(k-1)$-subset is not in $L_{k-1}$
     • Scan the database to count supports of $C_k$
     • Keep those meeting minsup as $L_k$

  3. Stop when $L_k$ becomes empty

Pseudo-steps

  • $L_1 = \{ \text{frequent 1-itemsets} \}$

  • For $k \ge 2$:
    $C_k = \text{apriori\_gen}(L_{k-1})$
    Count supports of $C_k$ by scanning the DB
    $L_k = \{ c \in C_k \mid \text{support}(c)\ge \text{minsup} \}$

Diagram:

flowchart TB A["Scan DB"] --> B["Find L1 (frequent 1-itemsets)"] B --> C{"Is L(k-1) empty?"} C -->|Yes| Z["Stop"] C -->|No| D["Join L(k-1) with itself to form Ck"] D --> E["Prune candidates in Ck using Apriori property"] E --> F["Scan DB to count support of Ck"] F --> G["Form Lk from Ck using minsup"] G --> C

Small example (illustration of pruning)
If $\{A,B,C\}$ is a candidate 3-itemset, then Apriori requires:

  • $\{A,B\}$, $\{A,C\}$, and $\{B,C\}$ must all be frequent (in $L_2$).
    If even one is not frequent, $\{A,B,C\}$ is pruned without support counting.
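
The join–prune–count loop can be sketched as a compact (unoptimized) Python function; the toy transactions are illustrative:

```python
from itertools import combinations

def apriori(transactions, minsup_count):
    """Level-wise frequent-itemset mining following the steps above."""
    items = {i for t in transactions for i in t}
    # L1: frequent 1-itemsets
    L = [{frozenset([i]) for i in items
          if sum(i in t for t in transactions) >= minsup_count}]
    all_frequent = {s: sum(s <= t for t in transactions) for s in L[0]}
    k = 2
    while L[-1]:
        prev = L[-1]
        # Join step: unions of frequent (k-1)-itemsets that have size k
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        # Count step: one database scan for the surviving candidates
        Lk = set()
        for c in candidates:
            count = sum(c <= t for t in transactions)
            if count >= minsup_count:
                Lk.add(c)
                all_frequent[c] = count
        L.append(Lk)
        k += 1
    return all_frequent

tx = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
freq = apriori(tx, minsup_count=3)
print(freq[frozenset({"A", "B"})])  # {A,B} appears in 3 of 5 transactions
```

Here $\{A,B,C\}$ occurs only twice, so it is counted at level 3 and rejected, and the loop terminates.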

Q3. 🔥 Generate association rules using Apriori algorithm for a transaction dataset.

[ Jun 2025 Q3(a), Dec 2024 Q3(a) | Frequency: 2 ]

Answer

Below is a complete worked example (same steps apply to any given dataset).

Transaction database ($N=5$)

| TID | Items |
| --- | --- |
| T1 | Bread, Milk |
| T2 | Bread, Diaper, Beer, Eggs |
| T3 | Milk, Diaper, Beer, Coke |
| T4 | Bread, Milk, Diaper, Beer |
| T5 | Bread, Milk, Diaper, Coke |

Assume:

  • minsup = 40% $\Rightarrow$ minimum support count $=0.40\times 5=2$

  • minconf = 60%


(A) Frequent itemset generation (Apriori)

1-itemset counts ($C_1 \rightarrow L_1$)

| Item | Count | Support |
| --- | --- | --- |
| Bread | 4 | $4/5=0.80$ |
| Milk | 4 | $4/5=0.80$ |
| Diaper | 4 | $4/5=0.80$ |
| Beer | 3 | $3/5=0.60$ |
| Coke | 2 | $2/5=0.40$ |
| Eggs | 1 | $1/5=0.20$ (drop) |

So,

  • $L_1=\{Bread, Milk, Diaper, Beer, Coke\}$

2-itemset counts ($C_2 \rightarrow L_2$)

| 2-itemset | Count | Support | Frequent? ($\ge 2$) |
| --- | --- | --- | --- |
| Bread, Milk | 3 | $3/5=0.60$ | Yes |
| Bread, Diaper | 3 | $3/5=0.60$ | Yes |
| Bread, Beer | 2 | $2/5=0.40$ | Yes |
| Bread, Coke | 1 | $1/5=0.20$ | No |
| Milk, Diaper | 3 | $3/5=0.60$ | Yes |
| Milk, Beer | 2 | $2/5=0.40$ | Yes |
| Milk, Coke | 2 | $2/5=0.40$ | Yes |
| Diaper, Beer | 3 | $3/5=0.60$ | Yes |
| Diaper, Coke | 2 | $2/5=0.40$ | Yes |
| Beer, Coke | 1 | $1/5=0.20$ | No |

So, $L_2$ includes all “Yes” pairs above.


3-itemset counts ($C_3 \rightarrow L_3$)
(Generated from $L_2$ and pruned using Apriori property)

| 3-itemset | Count | Support | Frequent? ($\ge 2$) |
| --- | --- | --- | --- |
| Bread, Milk, Diaper | 2 | $2/5=0.40$ | Yes |
| Bread, Diaper, Beer | 2 | $2/5=0.40$ | Yes |
| Milk, Diaper, Beer | 2 | $2/5=0.40$ | Yes |
| Milk, Diaper, Coke | 2 | $2/5=0.40$ | Yes |

No 4-itemset is frequent (each occurs only once).


(B) Rule generation (minconf = 60%)

Generate rules from each frequent itemset of size $\ge 2$.

Rules from frequent 2-itemsets

For each frequent pair $\{X,Y\}$, test both $X\rightarrow Y$ and $Y\rightarrow X$:

| Rule | Support | Confidence |
| --- | --- | --- |
| Bread $\rightarrow$ Milk | $3/5=0.60$ | $(3/5)/(4/5)=3/4=0.75$ |
| Milk $\rightarrow$ Bread | $0.60$ | $3/4=0.75$ |
| Bread $\rightarrow$ Diaper | $0.60$ | $3/4=0.75$ |
| Diaper $\rightarrow$ Bread | $0.60$ | $3/4=0.75$ |
| Beer $\rightarrow$ Bread | $2/5=0.40$ | $(2/5)/(3/5)=2/3\approx 0.6667$ |
| Diaper $\rightarrow$ Beer | $3/5=0.60$ | $(3/5)/(4/5)=3/4=0.75$ |
| Beer $\rightarrow$ Diaper | $0.60$ | $(3/5)/(3/5)=3/3=1.00$ |
| Coke $\rightarrow$ Milk | $2/5=0.40$ | $(2/5)/(2/5)=2/2=1.00$ |
| Coke $\rightarrow$ Diaper | $0.40$ | $2/2=1.00$ |

(Examples dropped due to low confidence: Bread$\rightarrow$Beer $=2/4=0.50$, Milk$\rightarrow$Beer $=2/4=0.50$, etc.)


Rules from frequent 3-itemsets
Example for $\{A,B,C\}$: test $2\rightarrow 1$ rules (and $1\rightarrow 2$ if needed)

1) From $\{Bread, Milk, Diaper\}$ (support $=2/5$)

  • $\{Bread, Milk\}\rightarrow\{Diaper\}$: confidence $=(2/5)/(3/5)=2/3\approx 0.6667$

  • $\{Bread, Diaper\}\rightarrow\{Milk\}$: $(2/5)/(3/5)=2/3\approx 0.6667$

  • $\{Milk, Diaper\}\rightarrow\{Bread\}$: $(2/5)/(3/5)=2/3\approx 0.6667$

2) From $\{Bread, Diaper, Beer\}$ (support $=2/5$)

  • $\{Bread, Diaper\}\rightarrow\{Beer\}$: $(2/5)/(3/5)=2/3\approx 0.6667$

  • $\{Bread, Beer\}\rightarrow\{Diaper\}$: $(2/5)/(2/5)=2/2=1.00$

  • $\{Diaper, Beer\}\rightarrow\{Bread\}$: $(2/5)/(3/5)=2/3\approx 0.6667$

  • $\{Beer\}\rightarrow\{Bread, Diaper\}$: $(2/5)/(3/5)=2/3\approx 0.6667$

3) From $\{Milk, Diaper, Beer\}$ (support $=2/5$)

  • $\{Milk, Diaper\}\rightarrow\{Beer\}$: $(2/5)/(3/5)=2/3\approx 0.6667$

  • $\{Milk, Beer\}\rightarrow\{Diaper\}$: $(2/5)/(2/5)=2/2=1.00$

  • $\{Diaper, Beer\}\rightarrow\{Milk\}$: $(2/5)/(3/5)=2/3\approx 0.6667$

  • $\{Beer\}\rightarrow\{Milk, Diaper\}$: $(2/5)/(3/5)=2/3\approx 0.6667$

4) From $\{Milk, Diaper, Coke\}$ (support $=2/5$)

  • $\{Milk, Diaper\}\rightarrow\{Coke\}$: $(2/5)/(3/5)=2/3\approx 0.6667$

  • $\{Milk, Coke\}\rightarrow\{Diaper\}$: $(2/5)/(2/5)=2/2=1.00$

  • $\{Diaper, Coke\}\rightarrow\{Milk\}$: $(2/5)/(2/5)=2/2=1.00$

  • $\{Coke\}\rightarrow\{Milk, Diaper\}$: $(2/5)/(2/5)=2/2=1.00$


(Optional) Lift for a few strong rules

  • Diaper $\rightarrow$ Beer: confidence $=0.75$, support(Beer)$=0.60$

    $$\text{lift}=\frac{0.75}{0.60}=1.25$$

  • Beer $\rightarrow$ Diaper: confidence $=1.00$, support(Diaper)$=0.80$

    $$\text{lift}=\frac{1.00}{0.80}=1.25$$

  • Coke $\rightarrow$ Milk: confidence $=1.00$, support(Milk)$=0.80$

    $$\text{lift}=\frac{1.00}{0.80}=1.25$$
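
A few of the support, confidence, and lift values derived above can be cross-checked in a handful of lines:

```python
# The same five transactions as in the worked example.
tx = [{"Bread", "Milk"},
      {"Bread", "Diaper", "Beer", "Eggs"},
      {"Milk", "Diaper", "Beer", "Coke"},
      {"Bread", "Milk", "Diaper", "Beer"},
      {"Bread", "Milk", "Diaper", "Coke"}]
N = len(tx)

def sup(items):
    """Support of an itemset as a fraction of all transactions."""
    return sum(set(items) <= t for t in tx) / N

conf_beer_diaper = sup({"Beer", "Diaper"}) / sup({"Beer"})
lift_diaper_beer = (sup({"Diaper", "Beer"}) / sup({"Diaper"})) / sup({"Beer"})
print(round(conf_beer_diaper, 2), round(lift_diaper_beer, 2))  # 1.0 1.25
```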


Q4. 🔥 Explain the working of FP-Growth algorithm.

[ Jun 2024 Q1(f), Dec 2023 Q5(b) | Frequency: 2 ]

Answer

FP-Growth (Frequent Pattern Growth) mines frequent itemsets without candidate generation, using a compressed structure called the FP-tree.

Main idea

  • Compress transactions into an FP-tree using frequency ordering.

  • Mine the FP-tree by exploring conditional pattern bases and conditional FP-trees.


Step 1: First database scan (find frequent 1-itemsets)

  • Compute support count of each item.

  • Remove infrequent items ($< \text{minsup}$).

  • Sort remaining items in descending frequency (global order).

Step 2: Second scan (build FP-tree + header table)

For each transaction:

  1. Remove infrequent items

  2. Sort the remaining items by the global frequency order

  3. Insert into the FP-tree: share common prefixes and increment counts along existing paths

  4. Maintain a header table linking all nodes of the same item (node-links)

Diagram:

graphviz diagram

(This diagram is illustrative: actual node counts/shape depend on minsup and chosen ordering. In exam, draw the FP-tree according to the given transaction set and ordering.)
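
The preprocessing performed by the two scans (before actual tree insertion, which is omitted here) can be sketched as follows, using the same transactions as the Apriori example:

```python
from collections import Counter

tx = [{"Bread", "Milk"},
      {"Bread", "Diaper", "Beer", "Eggs"},
      {"Milk", "Diaper", "Beer", "Coke"},
      {"Bread", "Milk", "Diaper", "Beer"},
      {"Bread", "Milk", "Diaper", "Coke"}]
minsup_count = 3

# Scan 1: count items, drop the infrequent ones, fix a global frequency order.
counts = Counter(i for t in tx for i in t)
order = [i for i, c in counts.most_common() if c >= minsup_count]
print(order)  # kept items in descending frequency (ties broken arbitrarily)

# Scan 2 (preprocessing): filter + reorder each transaction before insertion,
# so that transactions sharing frequent prefixes share FP-tree paths.
prepared = [[i for i in order if i in t] for t in tx]
print(prepared[1])  # Eggs (infrequent) is gone; the rest follow the global order
```

Inserting each `prepared` list into a trie-like structure with counts, plus the header-table node-links, yields the FP-tree of Step 2.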


Step 3: Mine frequent patterns (recursive)

For each item $i$ in the header table (usually from least frequent to most):

  1. Collect the conditional pattern base of $i$: the prefix paths ending in $i$ (with counts)

  2. Build the conditional FP-tree for $i$

  3. Extract frequent itemsets that end with $i$

  4. Repeat recursively until the conditional tree becomes empty or is a single path

Why FP-growth is efficient

  • avoids generating large candidate sets (unlike Apriori)

  • uses compressed representation via shared prefixes

  • mines locally in conditional trees


Q5. ⭐ Give advantages of FP-Growth over Apriori Algorithm.

[ Jun 2024 Q1(f) | Frequency: 1 ]

Answer

Advantages of FP-Growth

  1. No candidate generation: Apriori generates $C_2, C_3, \dots$, which can explode combinatorially; FP-Growth avoids this entirely.

  2. Fewer database scans: FP-Growth typically needs two full scans (count + build tree) and then mines the tree, whereas Apriori needs one scan per level $k$.

  3. Better performance on dense data / long patterns: when many items co-occur, Apriori produces huge candidate sets; FP-Growth remains practical due to compression.

  4. Compact storage: the FP-tree compresses repeated transaction prefixes, saving memory and computation.

  5. Localized mining: conditional FP-trees focus only on relevant portions of the data, reducing work.

When Apriori may still be preferred

  • Very sparse datasets with short patterns and very high minsup (small search space), due to its simplicity.

Q6. 📌 Explain Pincer Search.

[ Jun 2022 Q5(j) | Frequency: 1 ]

Answer

Pincer Search is a frequent itemset mining method designed to find maximal frequent itemsets efficiently by combining:

  • Bottom-up search (like Apriori)

  • Top-down pruning guided by maximal sets

Key terms

  • Maximal Frequent Itemset (MFI): a frequent itemset that has no frequent superset.

  • MFS: set of maximal frequent itemsets found so far.

  • MFCS (Maximal Frequent Candidate Set): a set of candidate maximal itemsets maintained during search.

Working (core idea)

  1. Start with MFCS containing the set of all items (or large candidates).

  2. At each level $k$:

     • Generate and count candidates bottom-up (as in Apriori) to find $L_k$.
     • Simultaneously test members of MFCS for frequency.

  3. When an itemset is found infrequent, update MFCS by splitting it (removing items) to ensure MFCS still covers all possible MFIs.

  4. When a member of MFCS becomes frequent, it is a maximal frequent itemset and is moved to MFS.

  5. Continue until no new frequent itemsets / MFIs can exist.

Diagram:

flowchart LR A[Start: MFCS initialized] --> B[Bottom-up level-wise counting] A --> C[Top-down test MFCS candidates] B --> D{Infrequent found?} D -- Yes --> E[Update MFCS by splitting<br/>to remove infrequent subsets' supersets] D -- No --> F[Update Lk] C --> G{Any MFCS itemset frequent?} G -- Yes --> H["Add to MFS (maximal frequent)"] G -- No --> I[Continue] E --> I F --> I H --> I I --> B

Outcome

  • Efficiently discovers maximal frequent patterns, often faster than enumerating all frequent itemsets when long patterns exist.

Q7. ⭐ What is the advantage of using Pincer Search over Apriori algorithm?

[ Jun 2025 Q1(h) | Frequency: 1 ]

Answer

Advantages of Pincer Search over Apriori

  1. Targets maximal frequent itemsets directly: Apriori may generate all frequent itemsets, which can be extremely large; Pincer focuses on MFIs, reducing output size and computation.

  2. Bidirectional pruning (top-down + bottom-up): top-down testing of MFCS can quickly confirm large frequent sets, and once a maximal frequent itemset is found, many smaller candidates become unnecessary.

  3. Fewer candidate generations in dense datasets: in dense data with long frequent patterns, Apriori's candidate explosion is severe; Pincer reduces it by shrinking MFCS using infrequent evidence.

  4. Potentially fewer scans and less counting work: by eliminating large portions of the search space early, support counting can be reduced compared to pure level-wise Apriori.

Note

  • Benefit is strongest when maximal patterns are long and the dataset is dense; for very sparse data, simple Apriori may be competitive due to lower overhead.

UNIT 15: Clustering


Q1. ⭐ Discuss the concept of clustering.

[ Jun 2025 Q1(f) | Frequency: 1 ]

Answer

Clustering is an unsupervised learning technique that groups a set of objects such that objects in the same group (called a cluster) are more similar to each other than to those in other clusters.

Key Characteristics:

  • No predefined labels: Unlike classification, clustering does not use labeled training data

  • Similarity-based: Objects are grouped based on distance or similarity measures

  • Exploratory: Used to discover hidden patterns or structures in data

Formal Definition: Given a dataset $D = \{x_1, x_2, \ldots, x_n\}$, clustering partitions $D$ into $k$ clusters $C = \{C_1, C_2, \ldots, C_k\}$ such that:

  • $C_1 \cup C_2 \cup \ldots \cup C_k = D$

  • $C_i \cap C_j = \varnothing$ for $i \neq j$ (in hard clustering)

Goals of Clustering:

  1. High intra-cluster similarity: Points within a cluster should be close

  2. Low inter-cluster similarity: Points in different clusters should be far apart

Common Distance Measures:

Distance Formula
Euclidean $d(x, y) = \sqrt{\sum_{i=1}^{n}(x_i - y_i)^2}$
Manhattan $d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$
Minkowski $d(x, y) = \left(\sum_{i=1}^{n} |x_i - y_i|^p\right)^{1/p}$
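The three measures in the table can be sketched directly in Python; note that Minkowski with $p=1$ reduces to Manhattan and $p=2$ to Euclidean:

```python
def euclidean(x, y):
    # sqrt of sum of squared coordinate differences
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

def manhattan(x, y):
    # sum of absolute coordinate differences
    return sum(abs(a - b) for a, b in zip(x, y))

def minkowski(x, y, p):
    # generalizes both: p=1 -> Manhattan, p=2 -> Euclidean
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

x, y = (1, 2), (4, 6)          # differences 3 and 4: a 3-4-5 triangle
print(euclidean(x, y))         # 5.0
print(manhattan(x, y))         # 7
print(minkowski(x, y, 2))      # 5.0
```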

Applications:

  • Customer segmentation in marketing

  • Document categorization

  • Image segmentation

  • Anomaly detection

  • Gene expression analysis


Q2. 🔥 Give/List the algorithms for clustering.

[ Jun 2025 Q1(f), Jun 2024 Q1(g), Dec 2023 Q1(a) | Frequency: 3 ]

Answer

Major Clustering Algorithms by Category:

Category Algorithms Key Idea
Partition-based K-Means, K-Medoids (PAM), CLARA, CLARANS Divide data into $k$ non-overlapping partitions
Hierarchical Agglomerative (AGNES), Divisive (DIANA) Build tree of clusters (dendrogram)
Density-based DBSCAN, OPTICS, DENCLUE Group dense regions, identify noise
Grid-based STING, CLIQUE, WaveCluster Quantize space into grid cells
Model-based Gaussian Mixture Models (GMM), EM Fit statistical models to data

Brief Description of Key Algorithms:

  1. K-Means: Partitions data into $k$ clusters by minimizing within-cluster variance; uses mean as centroid

  2. K-Medoids (PAM): Similar to K-Means but uses actual data points as cluster centers; robust to outliers

  3. Agglomerative Hierarchical: Bottom-up approach; starts with individual points and merges closest clusters iteratively

  4. DBSCAN: Groups points based on density (eps, MinPts); can find arbitrarily shaped clusters and identifies outliers

  5. Gaussian Mixture Models: Assumes data is generated from a mixture of Gaussian distributions; soft clustering



Q3. 📌 Give suitable example for clustering.

[ Jun 2024 Q1(g) | Frequency: 1 ]

Answer

Example: Customer Segmentation in Retail

A retail company wants to segment customers based on their purchasing behavior to create targeted marketing strategies.

Dataset Features:

  • Annual Income (in thousands)

  • Spending Score (1-100)

Sample Data:

Customer Income ($K) Spending Score
C1 15 39
C2 16 81
C3 70 75
C4 72 78
C5 75 10
C6 73 12

After K-Means Clustering ($k=3$):

Cluster Customers Characteristics Marketing Strategy
Cluster 1 C1, C2 Low income, varied spending Budget-friendly offers
Cluster 2 C3, C4 High income, high spending Premium products, loyalty rewards
Cluster 3 C5, C6 High income, low spending Value-for-money promotions


Other Clustering Examples:

  • Image segmentation: Grouping pixels by color/intensity

  • Document clustering: Organizing news articles by topic

  • Biological: Grouping genes with similar expression patterns


Q4. ⭐ Differentiate between Hierarchical clustering and Partition-based clustering.

[ Jun 2025 Q2(a) | Frequency: 1 ]

Answer
Aspect Hierarchical Clustering Partition-based Clustering
Approach Builds a tree of clusters (dendrogram) Divides data into flat, non-overlapping partitions
Number of clusters ($k$) Not required in advance; determined by cutting dendrogram Must specify $k$ beforehand
Output Dendrogram (nested hierarchy) Flat partition of exactly $k$ clusters
Types Agglomerative (bottom-up), Divisive (top-down) K-Means, K-Medoids, CLARA
Flexibility Can choose different $k$ from same dendrogram Need to re-run algorithm for different $k$
Time Complexity $O(n^2 \log n)$ or $O(n^3)$ $O(n \cdot k \cdot t)$ where $t$ = iterations
Space Complexity $O(n^2)$ (distance matrix required) $O(n + k)$
Scalability Poor for large datasets Better for large datasets
Reversibility Irreversible (once merged/split, cannot undo) Iterative refinement possible
Cluster Shape Can handle various shapes Tends to find spherical clusters
Sensitivity Less sensitive to initialization Sensitive to initial centroid selection

Diagram:

graph TB %% Hierarchical Clustering (Top) subgraph Hierarchical["Hierarchical Clustering"] H1[Start: n individual clusters] --> H2[Compute distance matrix] H2 --> H3[Merge/Split closest clusters] H3 --> H4[Update distances] H4 --> H5{Single cluster?} H5 -- No --> H3 H5 -- Yes --> H6[Output: Dendrogram] end %% Invisible link to enforce vertical stacking Hierarchical -->| | Partition %% Partition-based Clustering (Bottom) subgraph Partition["Partition-based Clustering"] P1[Select k] --> P2[Initialize k centroids] P2 --> P3[Assign points to nearest] P3 --> P4[Update centroids] P4 --> P5{Converged?} P5 -- No --> P3 P5 -- Yes --> P6[Output: k clusters] end

Q5. 📌 Give a suitable example of Hierarchical clustering.

[ Jun 2025 Q2(a) | Frequency: 1 ]

Answer

Example: Taxonomy of Animals

Hierarchical clustering naturally represents biological classification where organisms are grouped into nested categories.

Dataset: Animals with Features

Animal Has Fur Warm-blooded Lays Eggs Aquatic
Dog Yes Yes No No
Cat Yes Yes No No
Eagle No Yes Yes No
Penguin No Yes Yes Yes
Salmon No No Yes Yes
Shark No No No Yes

Agglomerative Clustering Process:

Step Action Resulting Clusters
0 Initial {Dog}, {Cat}, {Eagle}, {Penguin}, {Salmon}, {Shark}
1 Merge Dog, Cat (most similar) {Dog, Cat}, {Eagle}, {Penguin}, {Salmon}, {Shark}
2 Merge Eagle, Penguin {Dog, Cat}, {Eagle, Penguin}, {Salmon}, {Shark}
3 Merge Salmon, Shark {Dog, Cat}, {Eagle, Penguin}, {Salmon, Shark}
4 Merge {Eagle, Penguin}, {Salmon, Shark} {Dog, Cat}, {Eagle, Penguin, Salmon, Shark}
5 Final merge {All Animals}

Diagram:

graph BT D[Dog] --- M1[Mammals] C[Cat] --- M1 E[Eagle] --- B1[Birds] P[Penguin] --- B1 S[Salmon] --- F1[Fish] Sh[Shark] --- F1 B1 --- NM[Non-Mammals] F1 --- NM M1 --- All[All Animals] NM --- All

Dendrogram Interpretation:

  • Cut at height 1: 6 clusters (individual animals)

  • Cut at height 2: 3 clusters (Mammals, Birds, Fish)

  • Cut at height 3: 2 clusters (Mammals vs Non-Mammals)


Q6. 📌 Give a suitable example of Partition-based clustering.

[ Jun 2025 Q2(a) | Frequency: 1 ]

Answer

Example: Grouping Cities by Climate

Cluster cities based on average temperature and annual rainfall using K-Means.

Dataset:

City Avg Temp (°C) Rainfall (mm)
A (Dubai) 35 50
B (Delhi) 33 45
C (London) 10 200
D (Paris) 12 180
E (Mumbai) 28 100
F (Chennai) 30 110

K-Means with $k=3$:

Step 1: Initialize centroids randomly

  • $\mu_1 = A(35, 50)$, $\mu_2 = C(10, 200)$, $\mu_3 = E(28, 100)$

Step 2: Assign points to nearest centroid

City $d(\mu_1)$ $d(\mu_2)$ $d(\mu_3)$ Cluster
A 0 152.1 50.5 C1
B 5.4 156.7 55.2 C1
C 152.1 0 101.6 C2
D 132.0 20.1 81.6 C2
E 50.5 101.6 0 C3
F 60.2 92.2 10.2 C3

Step 3: Update centroids

  • $\mu_1 = \frac{(35,50)+(33,45)}{2} = (34, 47.5)$ — Hot-Dry

  • $\mu_2 = \frac{(10,200)+(12,180)}{2} = (11, 190)$ — Cold-Wet

  • $\mu_3 = \frac{(28,100)+(30,110)}{2} = (29, 105)$ — Moderate

Final Result:

Cluster Cities Climate Type
C1 Dubai, Delhi Hot and Dry (Desert)
C2 London, Paris Cold and Wet (Temperate)
C3 Mumbai, Chennai Warm and Moderate (Tropical)

Q7. ⭐ Explain the working of partition-based clustering.

[ Dec 2023 Q1(g) | Frequency: 1 ]

Answer

Partition-based clustering divides a dataset of $n$ objects into $k$ non-overlapping clusters, where $k$ is specified in advance.

Objective Function: Minimize the sum of squared distances between each point and its cluster centroid:

$$J = \sum_{i=1}^{k} \sum_{x \in C_i} \|x - \mu_i\|^2$$

where $\mu_i$ is the centroid of cluster $C_i$.

Properties:

  • Each object belongs to exactly one cluster

  • Each cluster has at least one object

  • Clusters are mutually exclusive: $C_i \cap C_j = \varnothing$ for $i \neq j$

General Working Algorithm:

  1. Initialization: Select $k$ initial cluster centers (randomly or using heuristics like K-Means++)

  2. Assignment: Assign each data point to the nearest cluster center

    $$C_i = \{x : \|x - \mu_i\| \leq \|x - \mu_j\|, \forall j \neq i\}$$

  3. Update: Recalculate cluster centers

  • For K-Means: $\mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x$ (mean)
  • For K-Medoids: select the actual data point that minimizes total distance

  4. Convergence Check: Repeat steps 2-3 until:

  • Centroids don't change significantly, OR
  • Maximum iterations reached, OR
  • No point changes cluster

Diagram:

flowchart TB A[Start] --> B[Choose number of clusters k] B --> C[Initialize k cluster centers] C --> D[Calculate distance of each point<br/>to all cluster centers] D --> E[Assign each point to nearest center] E --> F[Recalculate cluster centers] F --> G{Convergence<br/>criteria met?} G -- No --> D G -- Yes --> H[Output k clusters with centers] H --> I[End]

Characteristics:

  • Produces globular (spherical) clusters

  • Sensitive to initialization and outliers

  • Converges to local optimum

  • Works best with numeric data


Q8. 📌 Mention any two methods used for partition-based clustering.

[ Dec 2023 Q1(g) | Frequency: 1 ]

Answer

1. K-Means Algorithm

Aspect Description
Center Type Mean of cluster points (virtual point)
Distance Typically Euclidean distance
Update Rule $\mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x$
Complexity $O(nkt)$, where $n$ = number of points, $k$ = clusters, $t$ = iterations
Pros Fast, simple, scalable to large datasets
Cons Sensitive to outliers, requires $k$ in advance, finds only spherical clusters

2. K-Medoids (PAM - Partitioning Around Medoids)

Aspect Description
Center Type Actual data point (medoid)
Medoid Point that minimizes sum of dissimilarities within cluster
Update Rule Select $m_i$ such that $\sum_{x \in C_i} d(x, m_i)$ is minimized
Complexity $O(k(n-k)^2)$ per iteration
Pros More robust to outliers and noise, works with any distance metric
Cons Computationally expensive for large datasets

Comparison:

Feature K-Means K-Medoids
Center representation Mean (may not be actual point) Actual data point
Outlier sensitivity High Low
Distance measures Euclidean only Any metric
Data types Numeric only Categorical also possible
Speed Faster Slower
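The medoid update rule can be sketched as follows. The toy cluster below is a hypothetical example, chosen so one point is an outlier: the medoid stays on an actual data point, while the mean would be pulled toward the outlier.

```python
import math

def medoid(cluster):
    # the actual data point minimizing the sum of distances to all members
    return min(cluster, key=lambda m: sum(math.dist(m, x) for x in cluster))

cluster = [(0, 0), (1, 1), (2, 0), (10, 10)]   # (10,10) is an outlier
print(medoid(cluster))                          # (1, 1)
# For comparison, the K-Means centroid of the same cluster:
mean = tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
print(mean)                                     # (3.25, 2.75) -- dragged by the outlier
```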

Diagram:

graph LR %% K-Means (Left) subgraph KMeans["K-Means"] A1[Points] --> B1[Calculate Mean] B1 --> C1[Virtual Centroid ★] end %% K-Medoids (Right) subgraph KMedoids["K-Medoids"] A2[Points] --> B2[Find Best Point] B2 --> C2[Actual Medoid ●] end %% Optional invisible edge to align horizontally KMeans --- KMedoids

Q9. ⭐ Explain Hierarchical Clustering with suitable example.

[ Jun 2024 Q5(c) | Frequency: 1 ]

Answer

Hierarchical Clustering creates a tree-like structure (dendrogram) of nested clusters without requiring the number of clusters in advance.

Types:

  1. Agglomerative (Bottom-up): Start with $n$ clusters (each point), merge until one cluster

  2. Divisive (Top-down): Start with one cluster (all points), split until $n$ clusters

Agglomerative Algorithm Steps:

  1. Treat each data point as a single cluster

  2. Compute distance/similarity matrix between all clusters

  3. Merge the two closest clusters

  4. Update distance matrix

  5. Repeat steps 3-4 until single cluster remains

  6. Result: Dendrogram showing merge history


Example:

Data Points: A(1,1), B(1,2), C(4,4), D(5,4)

Step 1: Initial Distance Matrix (Euclidean)

A B C D
A 0 1.0 4.24 5.0
B 1.0 0 3.61 4.47
C 4.24 3.61 0 1.0
D 5.0 4.47 1.0 0

Step 2: Clustering Process (Single Linkage)

Step Minimum Distance Action Clusters
0 Initial {A}, {B}, {C}, {D}
1 $d(A,B)=1.0$ Merge A, B {A,B}, {C}, {D}
2 $d(C,D)=1.0$ Merge C, D {A,B}, {C,D}
3 $d({A,B},{C,D})=3.61$ Merge all {A,B,C,D}

Updated Distance Matrix after Step 1:

{A,B} C D
{A,B} 0 3.61 4.47
C 3.61 0 1.0
D 4.47 1.0 0

(Single linkage: $d({A,B}, C) = \min(d(A,C), d(B,C)) = \min(4.24, 3.61) = 3.61$)

Diagram:

graph BT A["A(1,1)"] --- AB["{A,B}<br/>height=1.0"] B["B(1,2)"] --- AB C["C(4,4)"] --- CD["{C,D}<br/>height=1.0"] D["D(5,4)"] --- CD AB --- ABCD["{A,B,C,D}<br/>height=3.61"] CD --- ABCD

Cutting the Dendrogram:

  • Cut at height 2.0 → 2 clusters: {A,B} and {C,D}

  • Cut at height 0.5 → 4 clusters: {A}, {B}, {C}, {D}
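The merge sequence above can be reproduced with a short agglomerative sketch in pure Python (single linkage, same four points; ties at height 1.0 are broken by scan order):

```python
import math

points = {"A": (1, 1), "B": (1, 2), "C": (4, 4), "D": (5, 4)}

def single_link(ci, cj):
    # single linkage: minimum pairwise distance between two clusters
    return min(math.dist(points[x], points[y]) for x in ci for y in cj)

clusters = [{k} for k in points]   # start: every point is its own cluster
merges = []                        # records (merged cluster, merge height)
while len(clusters) > 1:
    # find the closest pair of clusters
    i, j = min(((i, j) for i in range(len(clusters))
                       for j in range(i + 1, len(clusters))),
               key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]))
    height = round(single_link(clusters[i], clusters[j]), 2)
    merged = clusters[i] | clusters[j]
    merges.append((merged, height))
    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

for merged, height in merges:
    print(sorted(merged), height)
```

The recorded heights (1.0, 1.0, 3.61) are exactly the dendrogram levels in the table above.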


Q10. 🔥 Explain Single linkage with example.

[ Dec 2024 Q2(b) | Frequency: 1 ]

Answer

Single Linkage (MIN / Nearest Neighbor)

Distance between two clusters equals the minimum distance between any pair of points from the two clusters.

$$d(C_i, C_j) = \min_{x \in C_i, y \in C_j} d(x, y)$$

Characteristics:

  • Tends to create elongated, chain-like clusters

  • Susceptible to chaining effect (noise can bridge clusters)

  • Can detect non-elliptical shapes


Example:

Data Points: P1(0,0), P2(1,0), P3(5,0), P4(6,0), P5(3,0)

Initial Distance Matrix:

P1 P2 P3 P4 P5
P1 0 1 5 6 3
P2 1 0 4 5 2
P3 5 4 0 1 2
P4 6 5 1 0 3
P5 3 2 2 3 0

Clustering Steps:

Step Min Distance Merge Clusters
1 $d(P1,P2)=1$ P1, P2 {P1,P2}, {P3}, {P4}, {P5}
2 $d(P3,P4)=1$ P3, P4 {P1,P2}, {P3,P4}, {P5}
3 $d({P1,P2},P5)=2$ {P1,P2}, P5 {P1,P2,P5}, {P3,P4}
4 $d({P1,P2,P5},{P3,P4})=2$ All {P1,P2,P3,P4,P5}

Calculation for Step 3:

$$d(\{P1,P2\}, P5) = \min(d(P1,P5), d(P2,P5)) = \min(3, 2) = 2$$

Calculation for Step 4:

$$d(\{P1,P2,P5\}, \{P3,P4\}) = \min(d(P1,P3), d(P1,P4), d(P2,P3), d(P2,P4), d(P5,P3), d(P5,P4))$$
$$= \min(5, 6, 4, 5, 2, 3) = 2$$
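Both cluster-distance calculations can be checked with a few lines of Python; since the five points lie on the x-axis, distance is just $|x_i - x_j|$:

```python
x = {"P1": 0, "P2": 1, "P3": 5, "P4": 6, "P5": 3}   # x-coordinates

def single_link(ci, cj):
    # minimum pairwise distance between the two clusters
    return min(abs(x[p] - x[q]) for p in ci for q in cj)

print(single_link({"P1", "P2"}, {"P5"}))               # 2  (Step 3)
print(single_link({"P1", "P2", "P5"}, {"P3", "P4"}))   # 2  (Step 4)
```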



Q11. 🔥 Explain Complete linkage with example.

[ Dec 2024 Q2(b) | Frequency: 1 ]

Answer

Complete Linkage (MAX / Farthest Neighbor)

Distance between two clusters equals the maximum distance between any pair of points from the two clusters.

$$d(C_i, C_j) = \max_{x \in C_i, y \in C_j} d(x, y)$$

Characteristics:

  • Produces compact, spherical clusters

  • Sensitive to outliers (one distant point affects merge)

  • Avoids chaining effect


Example:

Data Points: P1(0,0), P2(1,0), P3(5,0), P4(6,0), P5(3,0)

Using same initial distance matrix as Single Linkage.

Clustering Steps:

Step Min Distance Merge Clusters
1 $d(P1,P2)=1$ P1, P2 {P1,P2}, {P3}, {P4}, {P5}
2 $d(P3,P4)=1$ P3, P4 {P1,P2}, {P3,P4}, {P5}
3 $d(P5,{P3,P4})=3$ P5, {P3,P4} {P1,P2}, {P3,P4,P5}
4 $d({P1,P2},{P3,P4,P5})=6$ All {P1,P2,P3,P4,P5}

Calculation for Step 3:

$$d(\{P3,P4\}, P5) = \max(d(P3,P5), d(P4,P5)) = \max(2, 3) = 3$$
$$d(\{P1,P2\}, P5) = \max(d(P1,P5), d(P2,P5)) = \max(3, 2) = 3$$

(Tie-breaking: choose {P3,P4} and P5)

Calculation for Step 4:

$$d(\{P1,P2\}, \{P3,P4,P5\}) = \max(d(P1,P3), d(P1,P4), d(P1,P5), d(P2,P3), d(P2,P4), d(P2,P5))$$
$$= \max(5, 6, 3, 4, 5, 2) = 6$$


Comparison with Single Linkage:

  • Single linkage merged all at distance 2

  • Complete linkage merged all at distance 6

  • Complete linkage creates tighter clusters


Q12. 🔥 Explain Average linkage with example.

[ Dec 2024 Q2(b) | Frequency: 1 ]

Answer

Average Linkage (UPGMA - Unweighted Pair Group Method with Arithmetic Mean)

Distance between two clusters equals the average of all pairwise distances between points from the two clusters.

$$d(C_i, C_j) = \frac{1}{|C_i| \cdot |C_j|} \sum_{x \in C_i} \sum_{y \in C_j} d(x, y)$$

Characteristics:

  • Compromise between single and complete linkage

  • Less susceptible to noise and outliers

  • Computationally more expensive


Example:

Data Points: P1(0,0), P2(1,0), P3(5,0), P4(6,0)

Initial Distance Matrix:

P1 P2 P3 P4
P1 0 1 5 6
P2 1 0 4 5
P3 5 4 0 1
P4 6 5 1 0

Clustering Steps:

Step Action Clusters
1 Merge P1, P2 (dist=1) {P1,P2}, {P3}, {P4}
2 Merge P3, P4 (dist=1) {P1,P2}, {P3,P4}
3 Merge {P1,P2}, {P3,P4} {P1,P2,P3,P4}

Calculating Average Distance for Step 3:

$$d(\{P1,P2\}, \{P3,P4\}) = \frac{d(P1,P3) + d(P1,P4) + d(P2,P3) + d(P2,P4)}{2 \times 2}$$
$$= \frac{5 + 6 + 4 + 5}{4} = \frac{20}{4} = 5$$

Comparison of All Linkage Methods:

Linkage Distance between {P1,P2} and {P3,P4} Formula Used
Single $\min(5,6,4,5) = 4$ Minimum
Complete $\max(5,6,4,5) = 6$ Maximum
Average $\frac{5+6+4+5}{4} = 5$ Mean
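The comparison table can be verified directly; with the points on a line, each cluster is just a list of x-coordinates:

```python
c1, c2 = [0, 1], [5, 6]   # x-coordinates of {P1,P2} and {P3,P4}

dists = [abs(a - b) for a in c1 for b in c2]   # all pairwise distances: 5, 6, 4, 5

single   = min(dists)                # 4
complete = max(dists)                # 6
average  = sum(dists) / len(dists)   # 5.0
print(single, complete, average)
```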



Q13. ⭐ Explain Density-based clustering with suitable example.

[ Dec 2023 Q5(c) | Frequency: 1 ]

Answer

Density-based clustering groups points that are densely packed together, marking points in low-density regions as outliers/noise.

Key Concept: A cluster is a dense region of points separated by low-density regions.

DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

Parameters:

  • $\varepsilon$ (eps): Radius of neighborhood

  • MinPts: Minimum points required to form a dense region

Definitions:

  • $\varepsilon$-neighborhood: $N_\varepsilon(p) = \{q \in D : d(p,q) \leq \varepsilon\}$

  • Core Point: Point $p$ where $|N_\varepsilon(p)| \geq MinPts$

  • Border Point: Not a core point but within $\varepsilon$ of a core point

  • Noise Point: Neither core nor border

Algorithm:

  1. For each unvisited point P:

  • Mark P as visited
  • Find all points within $\varepsilon$ distance ($N_\varepsilon(P)$)
  • If $|N_\varepsilon(P)| \geq MinPts$: create a new cluster and expand it
  • Else: mark P as noise (it may later become a border point)

Example:

Dataset with $\varepsilon = 2$, MinPts = 3

Point Coordinates
A (1, 1)
B (1.5, 1.2)
C (0.8, 0.8)
D (8, 8)
E (8.2, 8.5)
F (7.8, 8.2)
G (20, 20)

Analysis:

Point Neighbors within $\varepsilon=2$ Count (incl. self) Type
A B, C 3 Core
B A, C 3 Core
C A, B 3 Core
D E, F 3 Core
E D, F 3 Core
F D, E 3 Core
G None 1 Noise

Result:

  • Cluster 1: {A, B, C} — bottom-left dense region

  • Cluster 2: {D, E, F} — top-right dense region

  • Noise: {G} — isolated point
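This example can be traced with a minimal DBSCAN sketch in pure Python; `labels` maps each point to a cluster id or `"noise"` (a simplified version of the algorithm, without the bookkeeping a production implementation would use):

```python
import math

eps, min_pts = 2.0, 3
points = {"A": (1, 1), "B": (1.5, 1.2), "C": (0.8, 0.8),
          "D": (8, 8), "E": (8.2, 8.5), "F": (7.8, 8.2), "G": (20, 20)}

def neighbors(p):
    # eps-neighborhood, including the point itself
    return [q for q in points if math.dist(points[p], points[q]) <= eps]

labels, cluster_id = {}, 0
for p in points:
    if p in labels:
        continue
    nbrs = neighbors(p)
    if len(nbrs) < min_pts:
        labels[p] = "noise"            # may later be relabeled as a border point
        continue
    cluster_id += 1                    # p is a core point: start a new cluster
    labels[p] = cluster_id
    seeds = [q for q in nbrs if q != p]
    while seeds:                       # expand the cluster through core points
        q = seeds.pop()
        if labels.get(q) == "noise":
            labels[q] = cluster_id     # border point of this cluster
        if q in labels:
            continue
        labels[q] = cluster_id
        if len(neighbors(q)) >= min_pts:   # q is also core: keep expanding
            seeds.extend(r for r in neighbors(q) if r not in labels)

print(labels)
```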

Diagram:

graph LR %% --- Cluster 1 --- subgraph Cluster1["Cluster 1"] A((A)) --- B((B)) B --- C((C)) A --- C end %% --- Cluster 2 --- subgraph Cluster2["Cluster 2"] D((D)) --- E((E)) E --- F((F)) D --- F end %% --- Noise Point --- G((G<br/>Noise)) style G fill:#ff6666,stroke:#333,stroke-width:1px %% --- Invisible connectors for alignment --- Cluster1 --- Cluster2 --- G

Advantages of DBSCAN:

  • No need to specify number of clusters

  • Can find arbitrarily shaped clusters

  • Robust to outliers (identifies them as noise)

  • Works well with spatial data


Q14. 🔥 Explain K-means clustering algorithm with example.

[ Dec 2024 Q2(a), Jun 2023 Q5(j) | Frequency: 2 ]

Answer

K-Means is a partition-based algorithm that divides $n$ data points into $k$ clusters by minimizing within-cluster variance.

Objective Function:

$$J = \sum_{i=1}^{k} \sum_{x \in C_i} \|x - \mu_i\|^2$$

Algorithm Steps:

  1. Choose $k$ (number of clusters)

  2. Initialize $k$ centroids randomly

  3. Assign each point to nearest centroid

  4. Update centroids as mean of assigned points

  5. Repeat 3-4 until convergence


Example:

Data: P1(2,3), P2(3,3), P3(6,8), P4(7,8), P5(8,7), P6(7,9)

$k=2$, Initial centroids: $\mu_1 = P1(2,3)$, $\mu_2 = P3(6,8)$

Iteration 1 - Assignment (Euclidean Distance):

Point Coordinates $d(\mu_1)$ $d(\mu_2)$ Cluster
P1 (2,3) $\sqrt{0+0}=0$ $\sqrt{16+25}=6.40$ C1
P2 (3,3) $\sqrt{1+0}=1$ $\sqrt{9+25}=5.83$ C1
P3 (6,8) $\sqrt{16+25}=6.40$ $\sqrt{0+0}=0$ C2
P4 (7,8) $\sqrt{25+25}=7.07$ $\sqrt{1+0}=1$ C2
P5 (8,7) $\sqrt{36+16}=7.21$ $\sqrt{4+1}=2.24$ C2
P6 (7,9) $\sqrt{25+36}=7.81$ $\sqrt{1+1}=1.41$ C2

Clusters: $C1 = \{P1, P2\}$, $C2 = \{P3, P4, P5, P6\}$

Update Centroids:

$$\mu_1 = \frac{(2,3)+(3,3)}{2} = (2.5, 3)$$
$$\mu_2 = \frac{(6,8)+(7,8)+(8,7)+(7,9)}{4} = (7, 8)$$

Iteration 2 - Assignment:

Point $d(\mu_1=(2.5,3))$ $d(\mu_2=(7,8))$ Cluster
P1 $\sqrt{0.25+0}=0.5$ $\sqrt{25+25}=7.07$ C1
P2 $\sqrt{0.25+0}=0.5$ $\sqrt{16+25}=6.40$ C1
P3 $\sqrt{12.25+25}=6.10$ $\sqrt{1+0}=1$ C2
P4 $\sqrt{20.25+25}=6.73$ $\sqrt{0+0}=0$ C2
P5 $\sqrt{30.25+16}=6.80$ $\sqrt{1+1}=1.41$ C2
P6 $\sqrt{20.25+36}=7.50$ $\sqrt{0+1}=1$ C2

No change in cluster membership → Converged!

Final Clusters:

  • C1: {P1(2,3), P2(3,3)} with centroid (2.5, 3)

  • C2: {P3(6,8), P4(7,8), P5(8,7), P6(7,9)} with centroid (7, 8)
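The full assign-update loop can be condensed into a small sketch that reproduces this run. (This is a minimal version for illustration: a production implementation would also guard against clusters becoming empty.)

```python
def kmeans(points, centroids, max_iter=100):
    for _ in range(max_iter):
        # Assignment: each point goes to the nearest centroid (squared Euclidean)
        clusters = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
            clusters[j].append(p)
        # Update: new centroid = mean of the assigned points
        new = [tuple(sum(dim) / len(c) for dim in zip(*c)) for c in clusters]
        if new == centroids:          # converged: centroids unchanged
            break
        centroids = new
    return centroids, clusters

pts = [(2, 3), (3, 3), (6, 8), (7, 8), (8, 7), (7, 9)]
centers, clusters = kmeans(pts, [(2, 3), (6, 8)])
print(centers)     # [(2.5, 3.0), (7.0, 8.0)] -- matches the worked example
```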

Diagram:

flowchart TB A[Start] --> B[Select k=2] B --> C["Initialize: μ1=(2,3), μ2=(6,8)"] C --> D[Calculate distances to centroids] D --> E["Assign: C1={P1,P2}, C2={P3,P4,P5,P6}"] E --> F["Update: μ1=(2.5,3), μ2=(7,8)"] F --> G[Recalculate distances] G --> H{Clusters changed?} H -- No --> I[Output final clusters] I --> J[End]

Q15. 🔥 Calculate cluster heads using Manhattan distance and K-means algorithm.

[ Jun 2025 Q2(b) | Frequency: 1 ]

Answer

Manhattan Distance Formula:

$$d(x, y) = \sum_{i=1}^{n} |x_i - y_i| = |x_1 - y_1| + |x_2 - y_2|$$


Problem:

Data Points: A(1,2), B(2,1), C(4,3), D(5,4), E(6,5), F(7,6)

$k=2$, Initial centroids: $\mu_1 = A(1,2)$, $\mu_2 = D(5,4)$


Iteration 1 - Assignment:

Point Coordinates $d_{Manhattan}(\mu_1)$ $d_{Manhattan}(\mu_2)$ Cluster
A (1,2) $|1-1|+|2-2|=0$ $|1-5|+|2-4|=6$ C1
B (2,1) $|2-1|+|1-2|=2$ $|2-5|+|1-4|=6$ C1
C (4,3) $|4-1|+|3-2|=4$ $|4-5|+|3-4|=2$ C2
D (5,4) $|5-1|+|4-2|=6$ $|5-5|+|4-4|=0$ C2
E (6,5) $|6-1|+|5-2|=8$ $|6-5|+|5-4|=2$ C2
F (7,6) $|7-1|+|6-2|=10$ $|7-5|+|6-4|=4$ C2

Clusters after Iteration 1:

  • $C1 = \{A, B\}$

  • $C2 = \{C, D, E, F\}$

Update Centroids:

$$\mu_1 = \frac{(1,2)+(2,1)}{2} = (1.5, 1.5)$$
$$\mu_2 = \frac{(4,3)+(5,4)+(6,5)+(7,6)}{4} = (5.5, 4.5)$$


Iteration 2 - Assignment:

Point Coordinates $d_{Manhattan}(\mu_1)$ $d_{Manhattan}(\mu_2)$ Cluster
A (1,2) $|1-1.5|+|2-1.5|=1$ $|1-5.5|+|2-4.5|=7$ C1
B (2,1) $|2-1.5|+|1-1.5|=1$ $|2-5.5|+|1-4.5|=7$ C1
C (4,3) $|4-1.5|+|3-1.5|=4$ $|4-5.5|+|3-4.5|=3$ C2
D (5,4) $|5-1.5|+|4-1.5|=6$ $|5-5.5|+|4-4.5|=1$ C2
E (6,5) $|6-1.5|+|5-1.5|=8$ $|6-5.5|+|5-4.5|=1$ C2
F (7,6) $|7-1.5|+|6-1.5|=10$ $|7-5.5|+|6-4.5|=3$ C2

No change in cluster membership → Converged!


Final Results:

Cluster Members Cluster Head (Centroid)
C1 A(1,2), B(2,1) (1.5, 1.5)
C2 C(4,3), D(5,4), E(6,5), F(7,6) (5.5, 4.5)
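The same loop with Manhattan distance reproduces the cluster heads above. (Strictly, K-Means with Manhattan distance would use the per-coordinate median in the update step — the K-Medians variant; here we follow the worked solution and use the mean.)

```python
def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

pts = [(1, 2), (2, 1), (4, 3), (5, 4), (6, 5), (7, 6)]   # A..F
mu1, mu2 = (1, 2), (5, 4)                                # initial heads: A and D
for _ in range(100):
    # Assignment by Manhattan distance (ties go to cluster 1)
    c1 = [p for p in pts if manhattan(p, mu1) <= manhattan(p, mu2)]
    c2 = [p for p in pts if manhattan(p, mu1) > manhattan(p, mu2)]
    # Update: mean of each cluster, as in the worked solution
    new1 = tuple(sum(dim) / len(c1) for dim in zip(*c1))
    new2 = tuple(sum(dim) / len(c2) for dim in zip(*c2))
    if (new1, new2) == (mu1, mu2):    # converged
        break
    mu1, mu2 = new1, new2

print(mu1, mu2)    # (1.5, 1.5) (5.5, 4.5)
```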



Q16. 📌 Give example of K-means algorithm.

[ Dec 2023 Q4(c)(ii) | Frequency: 1 ]

Answer

Problem: Cluster the following 2D points into 2 groups using K-means.

Dataset: P1(2,10), P2(2,5), P3(8,4), P4(5,8), P5(7,5), P6(6,4), P7(1,2), P8(4,9)

Parameters: $k=2$, Initial centroids: $\mu_1 = P1(2,10)$, $\mu_2 = P3(8,4)$


Iteration 1 - Assignment:

Point Coordinates $d(\mu_1)$ $d(\mu_2)$ Cluster
P1 (2,10) $0$ $\sqrt{36+36}=8.49$ C1
P2 (2,5) $\sqrt{0+25}=5$ $\sqrt{36+1}=6.08$ C1
P3 (8,4) $8.49$ $0$ C2
P4 (5,8) $\sqrt{9+4}=3.61$ $\sqrt{9+16}=5$ C1
P5 (7,5) $\sqrt{25+25}=7.07$ $\sqrt{1+1}=1.41$ C2
P6 (6,4) $\sqrt{16+36}=7.21$ $\sqrt{4+0}=2$ C2
P7 (1,2) $\sqrt{1+64}=8.06$ $\sqrt{49+4}=7.28$ C2
P8 (4,9) $\sqrt{4+1}=2.24$ $\sqrt{16+25}=6.40$ C1

Clusters: $C1=\{P1,P2,P4,P8\}$, $C2=\{P3,P5,P6,P7\}$

Update Centroids:

$$\mu_1 = \frac{(2,10)+(2,5)+(5,8)+(4,9)}{4} = \frac{(13,32)}{4} = (3.25, 8)$$
$$\mu_2 = \frac{(8,4)+(7,5)+(6,4)+(1,2)}{4} = \frac{(22,15)}{4} = (5.5, 3.75)$$


Iteration 2 - Assignment:

Point $d(\mu_1=(3.25,8))$ $d(\mu_2=(5.5,3.75))$ Cluster
P1 $\sqrt{1.56+4}=2.36$ $\sqrt{12.25+39.06}=7.16$ C1
P2 $\sqrt{1.56+9}=3.25$ $\sqrt{12.25+1.56}=3.72$ C1
P3 $\sqrt{22.56+16}=6.21$ $\sqrt{6.25+0.06}=2.51$ C2
P4 $\sqrt{3.06+0}=1.75$ $\sqrt{0.25+18.06}=4.28$ C1
P5 $\sqrt{14.06+9}=4.80$ $\sqrt{2.25+1.56}=1.95$ C2
P6 $\sqrt{7.56+16}=4.86$ $\sqrt{0.25+0.06}=0.56$ C2
P7 $\sqrt{5.06+36}=6.41$ $\sqrt{20.25+3.06}=4.83$ C2
P8 $\sqrt{0.56+1}=1.25$ $\sqrt{2.25+27.56}=5.46$ C1

No change in cluster membership → Converged!


Final Results:

Cluster Members Centroid Interpretation
C1 P1, P2, P4, P8 (3.25, 8) Upper-left region (high Y)
C2 P3, P5, P6, P7 (5.5, 3.75) Lower-right region (low Y)



Quick Revision: Key Clustering Formulas

Concept Formula
Euclidean Distance $d(x,y) = \sqrt{\sum_{i=1}^{n}(x_i - y_i)^2}$
Manhattan Distance $d(x,y) = \sum_{i=1}^{n} \lvert x_i - y_i \rvert$
K-Means Objective $J = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2$
Centroid Update $\mu_i = \frac{1}{\lvert C_i \rvert} \sum_{x \in C_i} x$
Single Linkage $d(C_i, C_j) = \min_{x \in C_i,\, y \in C_j} d(x,y)$
Complete Linkage $d(C_i, C_j) = \max_{x \in C_i,\, y \in C_j} d(x,y)$
Average Linkage $d(C_i, C_j) = \frac{1}{\lvert C_i \rvert \lvert C_j \rvert} \sum_{x \in C_i} \sum_{y \in C_j} d(x,y)$
DBSCAN Core Point $\lvert N_\varepsilon(p) \rvert \geq \text{MinPts}$

UNIT 16: Machine Learning Programming using Python

(This unit covers practical implementations. Questions related to specific algorithms are mapped to their respective theoretical units above.)

Unit 16 Total: 0 families (all covered in respective algorithm units)