UPSC Interview Bias Analysis — A Statistical Investigation (2020–2025)

Statistical Research Study

Do UPSC Interview Panels Show Caste Bias?

A comprehensive statistical investigation of interview marks across reservation categories, analysing 5,352 candidates over six years (2020–2025).

5,352 Candidates | 6 Years (2020–2025) | 5 Categories | 16+ Statistical Tests

Table of Contents

01 Executive Summary
02 Data & Methodology
03 Descriptive Statistics
04 The Interview Mark Gap
05 Year-over-Year Trend Analysis
06 Hypothesis Testing: Are the Differences Real?
07 Non-Parametric Analysis
08 Written vs Interview: The Compensation Effect
09 Multiple Regression: Isolating the Category Effect
10 Written Score Stratification: Gap at Every Level NEW
11 Gap Stability: Is the Bias Changing Over Time? NEW
12 Precision of Estimates: Confidence Intervals NEW
13 Practical Significance: Real-World Rank Impact NEW
14 Discussion & Interpretation
15 Limitations & Caveats
16 Conclusion

01 Executive Summary

The Union Public Service Commission (UPSC) Civil Services Examination is India’s most prestigious competitive exam. While written scores are evaluated anonymously, the interview — officially called the “Personality Test” — involves a face-to-face assessment by a panel of examiners. This creates a natural question: does the panel’s knowledge of a candidate’s identity and background influence the marks awarded?

This study analyses the final marks of all candidates recommended by UPSC across six consecutive years (2020–2025) to determine whether a statistically significant and practically meaningful gap exists between interview marks awarded to General category candidates versus those from reserved categories (OBC, SC, ST, EWS).

Key Findings at a Glance

Category	n	Mean interview marks	Gap vs General (marks lower)
General	1,844	181.2	(reference)
OBC	1,619	175.4	5.74
SC	864	172.3	8.84
ST	439	171.9	9.26
EWS	586	174.5	6.66

Bottom Line: General category candidates score an average of 5.7 to 9.3 marks higher in interviews than reserved category candidates. This gap is statistically significant (p < 0.001 in every pooled and year-by-year ANOVA, Welch and Kruskal-Wallis test), persists across all six years, and survives within written-score quintiles Q2–Q5 (it reverses in the bottom quintile). The gap cannot be fully explained by differences in written performance. Simulation shows that closing the gap would move affected candidates up by an average of 30–55 rank positions, with dozens of top-100 entries (49 summed across the category-by-year simulations) over the six-year window.

02 Data & Methodology

Data Source

The dataset comprises official UPSC Civil Services Examination final marks for all candidates recommended (selected) from 2020 through 2025, sourced from publicly available UPSC mark-sheets on upsc.gov.in. Each record contains the candidate’s roll number, name, reservation category, written examination total, interview marks, and final total.

Dataset Overview

Year	General	OBC	SC	ST	EWS	Total
2020	263	229	122	61	86	761
2021	244	203	105	60	73	685
2022	345	263	154	72	99	933
2023	347	303	165	86	115	1016
2024	328	315	160	87	109	999
2025	317	306	158	73	104	958
Total	1,844	1,619	864	439	586	5,352

Data Quality

Zero data quality issues: No missing values, no duplicates, all interview marks within valid range [0–275]. The dataset required no imputation or correction.
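A minimal sketch of how these checks could be scripted, assuming each mark-sheet row has been parsed into a dict with hypothetical keys `roll_no`, `category`, `written`, `interview` and `total` (the real column names may differ). The `total == written + interview` comparison is an additional, illustrative consistency check.

```python
REQUIRED = ("roll_no", "category", "written", "interview", "total")  # hypothetical field names
VALID_CATEGORIES = {"General", "OBC", "SC", "ST", "EWS"}

def validate(records, max_interview=275):
    """Return a list of human-readable problems; an empty list means every check passed."""
    problems = []
    seen = set()
    for i, r in enumerate(records):
        missing = [k for k in REQUIRED if r.get(k) is None]
        if missing:
            problems.append(f"row {i}: missing {missing}")
            continue  # remaining checks need every field present
        if r["roll_no"] in seen:
            problems.append(f"row {i}: duplicate roll_no {r['roll_no']}")
        seen.add(r["roll_no"])
        if r["category"] not in VALID_CATEGORIES:
            problems.append(f"row {i}: unknown category {r['category']!r}")
        if not 0 <= r["interview"] <= max_interview:
            problems.append(f"row {i}: interview {r['interview']} outside [0, {max_interview}]")
        if r["written"] + r["interview"] != r["total"]:
            problems.append(f"row {i}: total does not equal written + interview")
    return problems
```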

Statistical Methods Employed

PARAMETRIC One-way ANOVA & Welch’s t-test

Tests whether group means differ significantly. ANOVA compares all five categories; Welch’s t-test compares General vs all reserved combined, robust to unequal variances.

NON-PARAMETRIC Kruskal-Wallis & Mann-Whitney U

Distribution-free alternatives, used because Shapiro-Wilk tests show departures from normality. They compare rank distributions rather than means.

EFFECT SIZE Cohen’s d, η², Rank-Biserial r

Quantify practical magnitude. Cohen’s d: 0.2 = small, 0.5 = medium, 0.8 = large. Cohen’s d is reported with 95% bootstrap confidence intervals.

REGRESSION OLS Multiple Regression

Models interview marks as a function of written marks, category, and year to isolate the independent effect of category. Confidence intervals on all β coefficients.

STRATIFICATION Written-Score Quintile Analysis NEW

Divides candidates into five groups by written score and compares interview marks within each group, so that candidates with similar written performance are compared with each other.

INTERACTION Category × Year Interaction Test NEW

Formally tests whether the gap is stable, widening, or narrowing over the six-year period.

SIMULATION Rank Impact Analysis NEW

Simulates what would happen to rankings if the interview gap were eliminated — connecting statistics to real career outcomes.

03 Descriptive Statistics

Before hypothesis testing, we examine the raw distribution of interview marks across categories.

Pooled Descriptive Statistics (2020–2025)

Category	N	Mean	Median	Std Dev	Q1	Q3	IQR	Skew	95% CI
General	1,844	181.16	182.0	16.73	170	193	23	-0.368	[180.4, 181.9]
OBC	1,619	175.42	176.0	16.41	165	187	22	-0.148	[174.6, 176.2]
SC	864	172.32	173.0	17.62	160	185	25	-0.329	[171.2, 173.5]
ST	439	171.90	171.0	18.03	160	185	25	-0.069	[170.2, 173.6]
EWS	586	174.50	175.0	16.18	165	185	20	-0.179	[173.2, 175.8]

Mean interview marks follow a clear hierarchy: General (181.2) > OBC (175.4) > EWS (174.5) > SC (172.3) > ST (171.9). The General 95% confidence interval [180.4, 181.9] does not overlap the interval of any reserved category.
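The table's columns can be reproduced per category with Python's standard `statistics` module. This is a sketch: the article does not state which quartile convention it used, so `method="inclusive"` (linear interpolation, the same default as NumPy and pandas) is an assumption, the CI uses a normal approximation (mean ± 1.96 × SD/√n), and skewness is omitted for brevity.

```python
import math
from statistics import mean, median, quantiles, stdev

def ci_from_summary(m, sd, n, z=1.96):
    """Normal-approximation 95% CI for a mean: m ± z * sd / sqrt(n)."""
    half = z * sd / math.sqrt(n)
    return m - half, m + half

def describe(marks):
    """One table row: n, mean, median, SD, Q1, Q3, IQR and a 95% CI for the mean."""
    n = len(marks)
    m, sd = mean(marks), stdev(marks)                       # stdev uses the n - 1 denominator
    q1, _, q3 = quantiles(marks, n=4, method="inclusive")   # assumed quartile convention
    return {"n": n, "mean": m, "median": median(marks), "sd": sd,
            "q1": q1, "q3": q3, "iqr": q3 - q1, "ci95": ci_from_summary(m, sd, n)}
```

As a check against the table, `ci_from_summary(181.16, 16.73, 1844)` gives about (180.40, 181.92), which rounds to the General interval [180.4, 181.9].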

[Figure: Mean interview marks by category (pooled 2020–2025), with 95% confidence interval error bars]

04 The Interview Mark Gap

The central question: does a systematic gap exist between General and reserved category interview marks?

[Figure: Interview mark gap, General vs each category; positive values mean General scores higher by that many marks]

At +9.26 marks, the General–ST gap equals 3.4% of the maximum interview score (275). Since nearly all adjacent final ranks are separated by 5 marks or less (Section 13), a systematic 5–9 mark disadvantage is substantial.

Gap by Year

Year	Gen − OBC	Gen − SC	Gen − ST	Gen − EWS	Cohen’s d (Gen vs all reserved)
2020	+5.2	+12.1	+12.4	+9.1	0.518
2021	+5.3	+7.8	+13.4	+7.8	0.475
2022	+4.3	+10.1	+9.4	+3.3	0.402
2023	+7.6	+7.1	+9.7	+8.9	0.474
2024	+7.0	+11.3	+7.4	+6.1	0.463
2025	+6.0	+6.1	+5.2	+5.9	0.377

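Interpretation: The gap persists every year. Cohen’s d (General vs all reserved) ranges from 0.38 to 0.52, a small-to-medium effect. The yearly figures show no consistent shrinking; Section 11 tests this formally and finds significant narrowing only for ST.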
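A sketch of how a gap-by-year table like this can be computed from candidate-level records. The record keys `year`, `category` and `interview` are hypothetical; each cell is simply the General mean minus the category mean for that year.

```python
from collections import defaultdict
from statistics import mean

def gaps_by_year(records, reference="General"):
    """Return {year: {category: mean(reference) - mean(category)}}.

    Positive values mean the reference category averaged higher interview marks that year.
    """
    cells = defaultdict(list)
    for r in records:
        cells[(r["year"], r["category"])].append(r["interview"])
    gaps = {}
    for year in sorted({y for y, _ in cells}):
        ref_mean = mean(cells[(year, reference)])
        gaps[year] = {cat: ref_mean - mean(marks)
                      for (y, cat), marks in cells.items()
                      if y == year and cat != reference}
    return gaps
```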

05 Year-over-Year Trend Analysis

Year-by-year trends reveal whether the gap is growing, shrinking, or stable.

[Figure: Mean interview marks by category, year over year (2020–2025), one line per reservation category]

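All categories trend upward from 2020 to 2025, and General remains the highest-scoring category in every year, although the order among reserved categories shifts (in 2023, for example, SC averaged slightly above OBC). The lines run roughly in parallel, consistent with a structural pattern.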

06 Hypothesis Testing: Are the Differences Real?

6.1 One-Way ANOVA

What it tests: Whether mean interview marks are identical across all five categories.
H₀: μ_General = μ_OBC = μ_SC = μ_ST = μ_EWS

Year	F-Stat	η²	Result
Pooled	60.51	0.0433	SIGNIFICANT
2020	16.56	0.0806	SIGNIFICANT
2021	12.23	0.0671	SIGNIFICANT
2022	14.08	0.0572	SIGNIFICANT
2023	13.30	0.0500	SIGNIFICANT
2024	14.01	0.0534	SIGNIFICANT
2025	7.55	0.0307	SIGNIFICANT

Result: Significant in every year and pooled (all p < 0.001). Pooled η² = 0.043 — category explains ~4.3% of variance in interview marks.
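A sketch of the one-way ANOVA computation using only the standard library (`scipy.stats.f_oneway` returns the same F together with a p-value). It also encodes the identity η² = F·df_b / (F·df_b + df_w), which lets the η² column be checked directly from the F column.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within, eta_squared) for a list of samples."""
    all_vals = [x for g in groups for x in g]
    grand = mean(all_vals)
    means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_b, df_w = len(groups) - 1, len(all_vals) - len(groups)
    f = (ss_between / df_b) / (ss_within / df_w)
    return f, df_b, df_w, ss_between / (ss_between + ss_within)

def eta_squared_from_f(f, df_b, df_w):
    """eta^2 = SS_between / SS_total, rewritten in terms of F and its degrees of freedom."""
    return f * df_b / (f * df_b + df_w)
```

For the pooled row, 5 categories give df_b = 4 and 5,352 − 5 = 5,347 within-group degrees of freedom, and `eta_squared_from_f(60.51, 4, 5347)` returns about 0.0433, matching the table.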

6.2 Welch’s t-test (General vs All Reserved)

What it tests: Whether General mean differs from the combined reserved mean. Robust to unequal variances.

Year	t-Stat	Cohen’s d	Size
Pooled	14.67	0.420	Small–medium
2020	6.66	0.518	Medium
2021	5.86	0.475	Small–medium
2022	6.07	0.402	Small–medium
2023	7.17	0.474	Small–medium
2024	7.05	0.463	Small–medium
2025	5.38	0.377	Small–medium
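Welch’s test does not pool the two variances; each group's mean carries its own standard error, and the degrees of freedom come from the Welch–Satterthwaite approximation. A minimal sketch of the statistic (in practice `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the same t together with a p-value):

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch–Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)   # squared standard error of mean(a)
    vb = variance(b) / len(b)   # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```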

6.3 Tukey HSD Post-Hoc Pairwise Comparisons

Year	Significant Pairs (p ≤ 0.05)
Pooled	(‘General’, ‘OBC’), (‘General’, ‘SC’), (‘General’, ‘ST’), (‘OBC’, ‘SC’), (‘OBC’, ‘ST’)
2020	(‘General’, ‘OBC’), (‘General’, ‘SC’), (‘General’, ‘ST’), (‘OBC’, ‘SC’), (‘OBC’, ‘ST’)
2021	(‘General’, ‘OBC’), (‘General’, ‘SC’), (‘General’, ‘ST’), (‘OBC’, ‘ST’)
2022	(‘EWS’, ‘SC’), (‘General’, ‘OBC’), (‘General’, ‘SC’), (‘General’, ‘ST’), (‘OBC’, ‘SC’)
2023	(‘General’, ‘OBC’), (‘General’, ‘SC’), (‘General’, ‘ST’)
2024	(‘General’, ‘OBC’), (‘General’, ‘SC’), (‘General’, ‘ST’)
2025	(‘General’, ‘OBC’), (‘General’, ‘SC’)

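Key: General–OBC and General–SC are significant in every year; General–ST is significant in every year except 2025. The General–EWS pair does not appear in any row of the reported output.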

07 Non-Parametric Analysis

7.1 Kruskal-Wallis H Test

What it tests: Non-parametric ANOVA equivalent. Compares rank distributions, not means. No normality assumption.

Year	H-Stat	Result
Pooled	226.57	SIGNIFICANT
2020	58.80	SIGNIFICANT
2021	46.53	SIGNIFICANT
2022	47.95	SIGNIFICANT
2023	52.23	SIGNIFICANT
2024	48.99	SIGNIFICANT
2025	31.59	SIGNIFICANT
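A sketch of the Kruskal-Wallis H statistic: pool all marks, rank them with tied values sharing the average rank, then compare each group's rank sum with what equal distributions would produce. Interview marks are whole numbers, so ties are common and the tie correction matters; `scipy.stats.kruskal` applies the same correction.

```python
from itertools import chain

def average_ranks(values):
    """1-based ranks; tied values receive the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # mean of positions i..j, shifted to 1-based
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """H statistic with the standard correction for tied values."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        h += sum(ranks[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over each set of tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    return h / (1 - ties / (n ** 3 - n))
```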

7.2 Mann-Whitney U Tests (General vs Each)

What it tests: Whether a randomly selected General candidate tends to receive a higher interview mark than one from each reserved category. Under the sign convention used here, a negative rank-biserial r means General scores higher, and the probability that a random General candidate outscores a random reserved candidate (ties counted as half) is (1 − r) / 2.

Comparison	U	Rank-Biserial r	Interpretation
Gen vs OBC	1,791,491	-0.200	General scores higher in ~60% of pairings
Gen vs SC	1,021,196	-0.282	General scores higher in ~64% of pairings
Gen vs ST	525,570	-0.298	General scores higher in ~65% of pairings
Gen vs EWS	665,857	-0.232	General scores higher in ~62% of pairings
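The U values and percentages above are mutually consistent when U is computed for the General sample and r = 1 − 2U / (n₁n₂); for example 1 − 2 × 1,791,491 / (1,844 × 1,619) ≈ −0.200. A minimal sketch of that calculation (the helper names are ours, not from the study's code):

```python
def average_rank_map(values):
    """Map each distinct value to the mean of the 1-based ranks it occupies (ties share a rank)."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return {v: (first[v] + last[v]) / 2 for v in first}

def mann_whitney_u(a, b):
    """U for sample `a`: the number of pairs (x in a, y in b) with x > y, ties counted as 1/2."""
    rank = average_rank_map(list(a) + list(b))
    rank_sum_a = sum(rank[v] for v in a)
    return rank_sum_a - len(a) * (len(a) + 1) / 2

def rank_biserial(u, n_a, n_b):
    """r = 1 - 2U / (n_a * n_b); negative when `a` tends to score higher (the table's convention)."""
    return 1 - 2 * u / (n_a * n_b)

def prob_a_higher(u, n_a, n_b):
    """Chance a random member of `a` outscores a random member of `b` (ties as 1/2); equals (1 - r) / 2."""
    return u / (n_a * n_b)
```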

[Figure: Mann-Whitney effect sizes (pooled); |r| is the magnitude of the General advantage in rank comparisons]

08 Written vs Interview: The Compensation Effect

The most revealing analysis. We examine whether the written→interview relationship differs by category.

For General: No significant relationship (slope ≈ 0, p = 0.57). Written and interview marks are uncorrelated, as expected for unbiased assessment.

For reserved categories: Significant negative relationship. Higher written scores predict lower interview marks. This is the “compensation effect.”

Regression Slopes: Written → Interview (Pooled)

Category	Slope	p-value	R²	Meaning
General	0.0050	0.5686	0.000	No relationship (independent)
OBC	-0.1055	0.0000	0.045	+100 written → -10.5 interview
SC	-0.1640	0.0000	0.077	+100 written → -16.4 interview
ST	-0.2389	0.0000	0.147	+100 written → -23.9 interview
EWS	-0.1536	0.0000	0.080	+100 written → -15.4 interview
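Each row is a simple least-squares fit of interview marks on written marks within one category, and the “Meaning” column is the slope multiplied by 100 (for ST, −0.2389 × 100 ≈ −23.9). A minimal sketch of the fit; p-values would additionally need the t distribution, for example from SciPy:

```python
from statistics import mean

def simple_ols(x, y):
    """Slope, intercept and R^2 for y = a + b*x fitted by ordinary least squares."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)   # for a single predictor, R^2 is the squared correlation
    return slope, my - slope * mx, r_squared
```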

[Figure: Regression slopes, written → interview, by category; negative slopes indicate “compensation”, with higher written scores penalised in interview]

Why this matters: For ST candidates, every +100 marks in writing predicts −24 marks in interview. For General candidates: zero effect. This asymmetry is the strongest single indicator of differential treatment. It’s as if interview panels unconsciously cap how high reserved candidates’ totals can go.

09 Multiple Regression: Isolating the Category Effect

Model: Interview = β₀ + β₁(Written) + β₂(Category) + β₃(Year) + ε
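Category enters as dummy variables. The intended reference group is General, but the table below also reports a β for General, and its pooled values behave as if EWS were the omitted category (β_EWS = 0). To read any category's gap versus General, subtract the General coefficient: pooled, OBC 0.52 − 8.05 = −7.53, SC −3.87 − 8.05 = −11.92, ST −4.20 − 8.05 = −12.25 and EWS 0 − 8.05 = −8.05. These match the General-referenced estimates in Section 12.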

Scope	F	Adj. R²	β General	β OBC	β SC	β ST
Pooled	81.22	0.130	8.05	0.52	-3.87	-4.20
2020	13.86	0.078	9.77	4.01	-3.32	-3.73
2021	11.58	0.072	8.64	2.15	-1.27	-7.08
2022	11.28	0.052	3.40	-0.95	-6.88	-6.13
2023	22.82	0.097	11.12	0.67	-1.34	-4.13
2024	31.53	0.133	8.77	-2.70	-9.83	-6.35
2025	14.74	0.067	7.92	0.35	-2.15	-1.79

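Key finding: After controlling for written marks and year, the pooled SC coefficient sits 11.92 marks below General’s (−3.87 vs 8.05) and the ST coefficient 12.25 marks below (−4.20 vs 8.05), in line with the General-referenced estimates of −11.88 and −12.31 in Section 12. In other words, a General and an SC candidate with identical written scores in the same year are predicted to differ by about 12 interview marks.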
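With dummy (treatment) coding and an intercept, changing the omitted category leaves the fitted values unchanged and shifts every category coefficient by the same constant. A sketch of re-expressing the pooled row relative to General, under our assumption (suggested by the numbers, not stated in the table) that EWS was the omitted category:

```python
def rereference(coefs, new_ref):
    """Re-express treatment-coded category coefficients relative to `new_ref`.

    `coefs` must include the current omitted category with coefficient 0.0.
    """
    shift = coefs[new_ref]
    return {cat: round(b - shift, 2) for cat, b in coefs.items() if cat != new_ref}

# Pooled row of the Section 09 table; EWS = 0.0 is an ASSUMPTION about the omitted category.
pooled = {"General": 8.05, "OBC": 0.52, "SC": -3.87, "ST": -4.20, "EWS": 0.0}
print(rereference(pooled, "General"))
# {'OBC': -7.53, 'SC': -11.92, 'ST': -12.25, 'EWS': -8.05}
```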

10 Written Score Stratification: Gap at Every Level

NEW ANALYSIS

Regression controls for written marks statistically. But a more intuitive approach is to directly compare candidates with similar written scores. We divide all 5,352 candidates into quintiles by written marks and compare interview marks within each quintile.

Method: All candidates are ranked by written total and split into five groups of roughly equal size (Q1 = bottom 20%, Q5 = top 20%; actual sizes range from 1,039 to 1,100, presumably because tied scores at the boundaries stay together). Within each quintile, we compare interview marks across categories, so the candidates being compared performed similarly on the written exam.
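A sketch of the stratification step. Assigning quintiles from percentile cut points (rather than slicing a sorted list into exactly equal fifths) is our assumption; it keeps tied written scores in the same quintile, which would explain slightly unequal group sizes. Record keys `written`, `interview` and `category` are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict
from statistics import mean, quantiles

def assign_quintiles(written):
    """Quintile label 1..5 per score, cut at the 20/40/60/80th percentiles.

    A score equal to a cut point goes to the lower group, so tied scores always share a quintile.
    """
    cuts = quantiles(written, n=5, method="inclusive")
    return [bisect_left(cuts, w) + 1 for w in written]

def mean_interview_by_cell(records):
    """{(quintile, category): mean interview mark} across all records."""
    labels = assign_quintiles([r["written"] for r in records])
    cells = defaultdict(list)
    for r, q in zip(records, labels):
        cells[(q, r["category"])].append(r["interview"])
    return {key: mean(v) for key, v in cells.items()}
```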

Important Context: Category Distribution Across Quintiles

Because reserved categories have lower written cut-offs, they are concentrated in lower quintiles while General candidates dominate upper quintiles:

Quintile	Written Range	General (n)	OBC (n)	SC (n)	ST (n)	EWS (n)
Q1	300–743	83	267	428	233	75
Q2	744–763	138	471	222	102	167
Q3	764–780	329	430	109	47	142
Q4	781–800	562	268	69	43	128
Q5	801–932	732	183	36	14	74

Interview Marks by Category Within Each Quintile

Quintile	Category	N	Mean	95% CI	Gap (General − category)
Q1	General	83	163.2	[158.5, 167.9]	—
	OBC	267	187.5	[185.7, 189.3]	-24.3
	SC	428	177.7	[176.3, 179.2]	-14.5
	ST	233	178.3	[176.3, 180.3]	-15.0
	EWS	75	186.5	[182.9, 190.0]	-23.3
Q2	General	138	196.5	[194.2, 198.8]	—
	OBC	471	178.1	[176.9, 179.3]	+18.4
	SC	222	167.9	[165.5, 170.2]	+28.6
	ST	102	165.1	[161.8, 168.4]	+31.4
	EWS	167	179.5	[177.6, 181.5]	+17.0
Q3	General	329	191.2	[190.0, 192.3]	—
	OBC	430	172.0	[170.7, 173.4]	+19.1
	SC	109	165.8	[162.5, 169.2]	+25.3
	ST	47	163.7	[159.0, 168.4]	+27.5
	EWS	142	170.6	[167.9, 173.2]	+20.6
Q4	General	562	180.9	[179.7, 182.1]	—
	OBC	268	167.5	[165.6, 169.4]	+13.4
	SC	69	166.2	[161.3, 171.1]	+14.7
	ST	43	167.1	[161.0, 173.1]	+13.9
	EWS	128	168.1	[165.8, 170.5]	+12.8
Q5	General	732	176.0	[174.8, 177.1]	—
	OBC	183	170.5	[167.7, 173.3]	+5.5
	SC	36	166.8	[161.5, 172.0]	+9.2
	ST	14	157.9	[147.2, 168.6]	+18.1
	EWS	74	169.7	[165.6, 173.7]	+6.3

[Figure: Interview marks by written-score quintile, category comparison. Within each quintile candidates have similar written scores, so the remaining gaps approximate the “interview effect.”]

Statistical Tests Within Each Quintile (Q2–Q5)

We test Q2–Q5; Q1, where the direction of the gap reverses, is discussed separately below. Some cells are small (ST has only 14 candidates in Q5), so those estimates are less precise.

Quintile	Comparison	Gap	Cohen’s d	p-value	Result
Q2	Gen vs OBC	+18.4	1.372	0.000000	SIG
Q2	Gen vs SC	+28.6	1.796	0.000000	SIG
Q2	Gen vs ST	+31.4	2.012	0.000000	SIG
Q2	Gen vs EWS	+17.0	1.273	0.000000	SIG
Q3	Gen vs OBC	+19.1	1.493	0.000000	SIG
Q3	Gen vs SC	+25.3	1.705	0.000000	SIG
Q3	Gen vs ST	+27.5	1.973	0.000000	SIG
Q3	Gen vs EWS	+20.6	1.500	0.000000	SIG
Q4	Gen vs OBC	+13.4	0.890	0.000000	SIG
Q4	Gen vs SC	+14.7	0.828	0.000000	SIG
Q4	Gen vs ST	+13.9	0.796	0.000001	SIG
Q4	Gen vs EWS	+12.8	0.906	0.000000	SIG
Q5	Gen vs OBC	+5.5	0.310	0.000961	SIG
Q5	Gen vs SC	+9.2	0.584	0.000799	SIG
Q5	Gen vs ST	+18.1	1.012	0.000722	SIG
Q5	Gen vs EWS	+6.3	0.380	0.011966	SIG

Critical Finding: In Q2–Q5 (roughly the 20th–100th percentiles of written scores), General candidates receive significantly higher interview marks than every reserved category, even though candidates within a quintile have similar written scores. The effect sizes in Q2–Q3 are much larger than the pooled estimates, with every Cohen’s d above 1.2, a large effect. This is hard to reconcile with the argument that the gap merely reflects written-score differences.

The Q1 Reversal: In Q1 (the bottom 20% of written scores), the pattern reverses: reserved candidates score higher in interviews than General candidates (for example, OBC 187.5 vs General 163.2). A plausible reading is that General candidates in Q1 only just qualified without any cut-off relaxation, while reserved candidates here may have benefited from lower written cut-offs and made up ground with strong interviews. We read this reversal as consistent with the compensation hypothesis: in Q1, where General candidates have the lowest written scores, they receive no “boost”, whereas in Q2–Q5, where reserved candidates have strong written scores, they are systematically marked down.

11 Gap Stability: Is the Bias Changing Over Time?

NEW ANALYSIS

We showed the gap persists across years. But is it widening, narrowing, or stable? A formal test uses Category × Year interaction terms in the regression model.

Method: We add interaction terms (Category × Year) to the regression. If an interaction term is significant, the gap for that category is changing over time. A joint F-test checks if all interactions are significant collectively.

Model: Interview = β₀ + β₁(Written) + β₂(Category) + β₃(Year) + β₄(Category × Year) + ε
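The joint test compares the residual sum of squares (RSS) of the model without interactions to the model with them. A minimal sketch of that F statistic; the parameter count is our own tally of the model above, and the RSS values themselves are not reported in the article.

```python
def nested_f(rss_restricted, rss_full, q, df_full):
    """Joint F for q added terms: ((RSS_r - RSS_f) / q) / (RSS_f / df_full)."""
    return ((rss_restricted - rss_full) / q) / (rss_full / df_full)

# Full model: intercept, written, year, 4 category dummies and 4 Category x Year terms = 11.
N_OBS, N_PARAMS_FULL, Q_INTERACTIONS = 5352, 11, 4
DF_FULL = N_OBS - N_PARAMS_FULL   # 5341, matching the denominator df of the F-test reported below
```

Given both models' RSS, `nested_f(rss_without, rss_with, Q_INTERACTIONS, DF_FULL)` gives the joint statistic, and `scipy.stats.f.sf(F, 4, 5341)` its p-value.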

Joint F-test for All Interaction Terms

F(4, 5341) = 3.9230, p = 0.003492

The joint test is significant (p = 0.0035), meaning some category gaps are changing over time. Let’s examine which ones:

Individual Interaction Terms

Interaction	β (marks/year)	95% CI	p-value	Interpretation
OBC × Year	-0.2855	[-0.9311, 0.3601]	0.3861	No significant change over time NS
SC × Year	+0.7046	[-0.0800, 1.4893]	0.0784	No significant change over time NS
ST × Year	+1.5220	[0.5053, 2.5388]	0.0034	Gap narrowing by 1.52 marks/year SIG
EWS × Year	+0.4541	[-0.4459, 1.3541]	0.3228	No significant change over time NS

Key Findings:

OBC × Year: Not significant. The General–OBC gap is stable.
SC × Year: Borderline (p = 0.078). A slight trend toward narrowing, but not significant at the 5% level.
ST × Year: Significant (p = 0.003, β = +1.52). The General–ST gap is narrowing by about 1.5 marks per year. Taken at face value, closing the 2025 raw gap of 5.2 marks at that rate would take roughly 3–4 more years, though a linear extrapolation from six years of data is speculative.
EWS × Year: Not significant. Stable gap.

The overall F-test is significant, but this appears to be driven largely by the ST term: three of the four category gaps show no significant trend. The bias appears structurally embedded rather than a temporary anomaly, with the partial exception of ST candidates, where slow progress is visible.

12 Precision of Estimates: Confidence Intervals

NEW ANALYSIS

Statistical significance tells us a difference exists. Confidence intervals tell us how precisely we’ve estimated its size: narrow CIs mean precise estimates.

Regression Coefficients with 95% CIs

Reading this table: Each β shows how many marks a category receives relative to General, after controlling for written marks and year. The CI shows the plausible range. If the 95% CI excludes zero, the effect is significant at the 5% level.

Variable	β	95% CI	Std Error
Intercept	233.2745	[223.88, 242.67]	4.7951
Written Total	-0.0665	[-0.08, -0.05]	0.0061
Year (centered)	2.5090	[2.25, 2.77]	0.1326
OBC (vs General)	-7.5046	[-8.62, -6.39]	0.5673
SC (vs General)	-11.8756	[-13.28, -10.47]	0.7153
ST (vs General)	-12.3083	[-14.07, -10.55]	0.8986
EWS (vs General)	-8.0182	[-9.54, -6.50]	0.7739
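The intervals in this table follow from each β and its standard error as β ± 1.96 × SE. With 5,352 observations and 7 parameters (our count: intercept, written, year and 4 category dummies), the residual df is about 5,345, so the t critical value is effectively 1.96. A short check:

```python
def wald_ci(beta, se, z=1.96):
    """95% interval beta ± z * SE (normal approximation to the t critical value)."""
    return beta - z * se, beta + z * se

# (beta, standard error) pairs copied from the regression table.
category_rows = {
    "OBC": (-7.5046, 0.5673),
    "SC": (-11.8756, 0.7153),
    "ST": (-12.3083, 0.8986),
    "EWS": (-8.0182, 0.7739),
}

for name, (beta, se) in category_rows.items():
    lo, hi = wald_ci(beta, se)
    print(f"{name}: [{lo:.2f}, {hi:.2f}], width {hi - lo:.2f} marks")
```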

[Figure: Category β coefficients with 95% confidence intervals, in marks relative to General (negative = disadvantage)]

Cohen’s d with 95% Bootstrap Confidence Intervals

Comparison	Cohen’s d	95% CI	Effect Size
General vs OBC	0.346	[0.279, 0.415]	Small
General vs SC	0.514	[0.432, 0.600]	Medium
General vs ST	0.533	[0.427, 0.644]	Medium
General vs EWS	0.405	[0.310, 0.499]	Small–medium
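A sketch of the effect-size calculation. The standardiser is our inference, not something the article states: the table's d values are reproduced to within about 0.001 by dividing the mean gap by √((s₁² + s₂²)/2) using the Section 03 SDs, rather than by the n-weighted pooled SD. The bootstrap is a plain percentile bootstrap with a fixed seed; the number of resamples the study used is not given, so `reps=2000` is illustrative.

```python
import math
import random
from statistics import mean, variance

def cohens_d(a, b):
    """Mean difference divided by sqrt((s_a^2 + s_b^2) / 2), the averaged-variance standardiser."""
    return (mean(a) - mean(b)) / math.sqrt((variance(a) + variance(b)) / 2)

def d_from_summary(mean_a, sd_a, mean_b, sd_b):
    """Same d computed from published summary statistics."""
    return (mean_a - mean_b) / math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)

def bootstrap_ci(a, b, stat=cohens_d, reps=2000, level=0.95, seed=0):
    """Percentile bootstrap: resample each group with replacement, then take the tail quantiles."""
    rng = random.Random(seed)
    draws = sorted(stat(rng.choices(a, k=len(a)), rng.choices(b, k=len(b))) for _ in range(reps))
    lo_idx = int((1 - level) / 2 * reps)
    hi_idx = int((1 + level) / 2 * reps) - 1
    return draws[lo_idx], draws[hi_idx]
```

For example, `d_from_summary(181.16, 16.73, 175.42, 16.41)` gives about 0.346, the General vs OBC value above.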

Precision Assessment: All confidence intervals are narrow and exclude zero, so the estimates are precise:

OBC: β = −7.50 [−8.62, −6.39], an estimated 6.4 to 8.6 fewer marks than General.
SC: β = −11.88 [−13.28, −10.47], an estimated 10.5 to 13.3 fewer marks.
ST: β = −12.31 [−14.07, −10.55], an estimated 10.6 to 14.1 fewer marks.
EWS: β = −8.02 [−9.54, −6.50], an estimated 6.5 to 9.5 fewer marks.

The narrowest interval width is 2.2 marks (OBC) and the widest is 3.5 marks (ST, due to smaller sample) — all acceptably precise for policy-relevant conclusions.

13 Practical Significance: Real-World Rank Impact

NEW ANALYSIS

Statistics describe magnitude. But how does a 5–12 mark gap translate to career outcomes? In UPSC, final rank determines your service allocation (IAS, IPS, IFS) and cadre posting — decisions that shape an entire career. We simulate the impact.

How Tight Are UPSC Rankings?

Year	Candidates	% of adjacent ranks ≤ 5 marks apart	% ≤ 10 marks apart
2020	761	98.4%	98.6%
2021	685	98.7%	99.1%
2022	933	98.3%	98.9%
2023	1016	99.1%	99.5%
2024	999	99.1%	99.5%
2025	958	98.9%	99.2%

Rankings are razor-thin: roughly 98–99% of consecutive ranks are separated by 5 marks or less. Because totals are packed this closely, a 5-mark shift moves a candidate past everyone whose total falls inside that 5-mark window, which in a field of 685–1,016 candidates can mean dozens of rank positions. That can be enough to change whether someone becomes an IAS officer or gets a less preferred service.
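A sketch of both quantities. The article does not say whether “within 5 marks” means ≤ 5 or < 5; this sketch uses ≤. The second helper counts how many candidates a given shift would leapfrog.

```python
def adjacent_gap_share(totals, threshold):
    """Share of consecutive ranks whose final totals differ by at most `threshold` marks."""
    ordered = sorted(totals, reverse=True)
    gaps = [hi - lo for hi, lo in zip(ordered, ordered[1:])]
    return sum(g <= threshold for g in gaps) / len(gaps)

def candidates_within(totals, target, window):
    """Number of candidates whose total lies in (target, target + window]: those a +window shift would pass."""
    return sum(target < t <= target + window for t in totals)
```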

Rank Impact Simulation

For each year and reserved category, we simulate: “What if this category’s interview marks were increased by the observed gap?” We then recalculate rankings and measure how many positions each candidate improves.

Year	Category	Gap	N	Avg Rank Improvement	Max Improvement	Top-100 Gained
2020	OBC	+5.2	229	18.6	50	4
2020	SC	+12.1	122	48.0	104	1
2020	ST	+12.4	61	66.2	107	0
2020	EWS	+9.1	86	53.0	86	0
2021	OBC	+5.3	203	18.5	44	2
2021	SC	+7.8	105	28.4	70	2
2021	ST	+13.4	60	55.7	107	1
2021	EWS	+7.8	73	37.8	70	4
2022	OBC	+4.3	263	23.8	54	3
2022	SC	+10.1	154	52.6	132	2
2022	ST	+9.4	72	63.3	118	1
2022	EWS	+3.3	99	26.8	46	1
2023	OBC	+7.6	303	45.7	98	4
2023	SC	+7.1	165	45.7	122	1
2023	ST	+9.7	86	59.9	152	1
2023	EWS	+8.9	115	76.4	120	4
2024	OBC	+7.0	315	38.2	83	3
2024	SC	+11.3	160	62.3	147	4
2024	ST	+7.4	87	54.6	104	0
2024	EWS	+6.1	109	51.0	87	3
2025	OBC	+6.0	306	35.1	83	5
2025	SC	+6.1	158	38.0	95	1
2025	ST	+5.2	73	31.5	80	0
2025	EWS	+5.9	104	46.1	79	2
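A sketch of the simulation described above, not the study's own code: add the observed gap to one category's interview marks, re-rank everyone by final total, and summarise how far that category's candidates move. Field names (`id`, `category`, `written`, `interview`) are hypothetical, and ties are broken by `id` only to keep the sketch deterministic; UPSC's real tie-break rules are not modelled.

```python
from statistics import mean

def rank_map(candidates, boost_category=None, boost=0.0):
    """Return {candidate_id: rank}; rank 1 is the highest final total."""
    scored = []
    for c in candidates:
        extra = boost if c["category"] == boost_category else 0.0
        scored.append((c["written"] + c["interview"] + extra, c["id"]))
    scored.sort(key=lambda s: (-s[0], s[1]))   # highest total first, then id
    return {cid: rank for rank, (_, cid) in enumerate(scored, start=1)}

def simulate_gap_closure(candidates, category, gap, top_k=100):
    """Add `gap` interview marks to one category, re-rank, and summarise its members' movement."""
    before = rank_map(candidates)
    after = rank_map(candidates, boost_category=category, boost=gap)
    members = [c["id"] for c in candidates if c["category"] == category]
    moves = [before[i] - after[i] for i in members]   # positive = moved up
    entered = sum(1 for i in members if before[i] > top_k >= after[i])
    return {"avg_improvement": mean(moves), "max_improvement": max(moves), "top_k_gained": entered}
```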

Aggregated Impact (6-Year Summary)

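Computed from the simulation table above: the simple mean of the six yearly average improvements, and the six-year total of top-100 entries.

Category	Avg rank improvement	Top-100 gained (6-year total)
OBC	30.0	21
SC	45.8	11
ST	55.2	3
EWS	48.5	14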

[Figure: Average rank improvement by category if the interview gap were eliminated]

Career Translation:

An average SC candidate would improve by 46 rank positions if the interview gap were closed — enough to potentially change their service allocation.
An average ST candidate would gain 55 positions.
Over six years, the simulation adds 11 SC candidates and 3 ST candidates to the top 100, positions typically associated with IAS/IFS allocation.
Maximum single-candidate impact: up to 147 rank positions for an SC candidate (2024) and 152 for an ST candidate (2023).

This is not a marginal effect. The interview gap systematically pushes reserved category candidates into lower-preference services and postings, compounding into career-long disadvantage.

14 Discussion & Interpretation

The Five Pillars of Evidence

Pillar 1: The Persistent Gap

General candidates score 5.7–9.3 marks higher in interviews. The gap exists every year and survives parametric and non-parametric testing (p < 0.001).

Pillar 2: The Compensation Effect

For General candidates, written and interview marks are uncorrelated. For reserved candidates, the relationship is significantly negative. Scoring well in writing predicts scoring worse in interviews.

Pillar 3: The Category Effect After Controls

After controlling for written marks and year, SC/ST candidates face a roughly 12-mark structural disadvantage (β = −11.9 to −12.3, CIs exclude zero).

Pillar 4: Gap Survives Within-Quintile Stratification NEW

Even when comparing candidates with similar written scores (Q2–Q5), General candidates score significantly higher in interviews. Within-quintile effect sizes in Q2–Q4 are larger than the pooled estimates.

Pillar 5: Real-World Career Impact NEW

With roughly 98–99% of adjacent ranks separated by ≤5 marks, closing the 5–12 mark interview gap would move candidates up 30–55 rank positions on average in the simulation, affecting service allocation and career trajectories.

Possible Explanations

1. Interviewer Bias (Conscious or Unconscious)

UPSC panels see each candidate’s name and Detailed Application Form (DAF). Implicit bias could lead to subtle downgrading. The compensation effect is consistent with a “ceiling effect” where panels resist giving reserved candidates high totals.

2. Socioeconomic Preparedness Differences

Less access to coaching, English training, and interview exposure. Plausible for some of the gap, but it does not explain the compensation effect or why a large gap persists among candidates with comparable written scores (Q2–Q5).

3. Structural/Institutional Factors

The interview format may structurally favour candidates who share cultural markers with panellists — accent, mannerisms, references — embedding systemic disadvantage.

15 Limitations & Caveats

1. Correlation ≠ Causation

Observational study. Cannot definitively prove interviewer bias vs other confounders.

2. Selection Bias

Only recommended candidates. Different cut-offs mean reserved candidates reaching interview have different score distributions.
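To make the selection caveat concrete, here is a small simulation under purely hypothetical assumptions: written and interview marks are drawn independently (the means and SDs are illustrative, only loosely on this dataset's scale), and only the top 10% by total are “selected”. It does not model UPSC's category-wise cut-offs and says nothing about the real data; it only shows that selecting on the total can, by itself, make the written → interview slope negative among selected candidates, which is worth keeping in mind when reading the compensation effect in Section 08.

```python
import random

def slope(x, y):
    """OLS slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def selection_demo(n=20000, keep_share=0.10, seed=42):
    """Independent marks for everyone; return (slope in full pool, slope among the selected)."""
    rng = random.Random(seed)
    written = [rng.gauss(750, 40) for _ in range(n)]     # hypothetical distribution
    interview = [rng.gauss(175, 17) for _ in range(n)]   # hypothetical, drawn independently
    ranked = sorted(range(n), key=lambda i: written[i] + interview[i], reverse=True)
    kept = ranked[: int(n * keep_share)]                 # "recommended" = top share by total
    return (slope(written, interview),
            slope([written[i] for i in kept], [interview[i] for i in kept]))

all_slope, selected_slope = selection_demo()
```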

3. Unmeasured Variables

No data on panel composition, candidate education, medium, optional subject, or attempt number.

4. Effect Size Context

η² ≈ 0.04 means category explains ~4% of the variance in interview marks; the remaining ~96% reflects other factors and noise. The bias operates on margins, but margins that determine careers.

16 Conclusion

This study presents robust statistical evidence that UPSC interview marks are not category-neutral. Across 5,352 candidates over six years, using sixteen statistical approaches:

General category scores 5.7 to 9.3 marks higher in interviews (pooled means).
Gap is statistically significant in every pooled and year-by-year ANOVA, Welch and Kruskal-Wallis test (p < 0.001).
Gap persists across all six years and within written-score quintiles Q2–Q5 (it reverses in Q1).
“Compensation effect”: reserved candidates with higher written marks receive lower interview marks.
After controls, category predicts a 7.5 to 12.3 mark disadvantage, from OBC −7.50 [−8.62, −6.39] to ST −12.31 [−14.07, −10.55].
Closing the gap would move affected candidates up 30–55 rank positions on average in the simulation, affecting service allocation.
Only the ST gap shows significant narrowing (~1.5 marks/year); other gaps are stable.

Recommendation: UPSC should consider reforms to reduce bias potential: anonymising candidate profiles during evaluation, standardising scoring rubrics, diversifying panel composition, conducting regular audits by category, and reducing interview weight in the final score.

Check out the complete code used for the study here.

Skeptical Indian

Skeptical Indian is an independent researcher documenting
caste discrimination through primary sources, Hindu
scriptures, court records, and government data.
CasteFreeIndia.com has published 100+ evidence-based
analyses since 2024.
