Do UPSC Interview Panels Show Caste Bias?
A comprehensive statistical investigation of interview marks across reservation categories, analysing 5,352 candidates over six years (2020–2025).
- 01 Executive Summary
- 02 Data & Methodology
- 03 Descriptive Statistics
- 04 The Interview Mark Gap
- 05 Year-over-Year Trend Analysis
- 06 Hypothesis Testing — Are Differences Real?
- 07 Non-Parametric Analysis
- 08 Written vs Interview: The Compensation Effect
- 09 Multiple Regression — Isolating Category Effect
- 10 Written Score Stratification — Gap at Every Level NEW
- 11 Gap Stability — Is the Bias Changing Over Time? NEW
- 12 Precision of Estimates — Confidence Intervals NEW
- 13 Practical Significance — Real-World Rank Impact NEW
- 14 Discussion & Interpretation
- 15 Limitations & Caveats
- 16 Conclusion
The Union Public Service Commission (UPSC) Civil Services Examination is India’s most prestigious competitive exam. While written scores are evaluated anonymously, the interview — officially called the “Personality Test” — involves a face-to-face assessment by a panel of examiners. This creates a natural question: does the panel’s knowledge of a candidate’s identity and background influence the marks awarded?
This study analyses the final marks of all candidates recommended by UPSC across six consecutive years (2020–2025) to determine whether a statistically significant and practically meaningful gap exists between interview marks awarded to General category candidates versus those from reserved categories (OBC, SC, ST, EWS).
Key Findings at a Glance
Data Source
The dataset comprises official UPSC Civil Services Examination final marks for all candidates recommended (selected) from 2020 through 2025, sourced from publicly available UPSC mark-sheets on upsc.gov.in. Each record contains the candidate’s roll number, name, reservation category, written examination total, interview marks, and final total.
Dataset Overview
| Year | General | OBC | SC | ST | EWS | Total |
|---|---|---|---|---|---|---|
| 2020 | 263 | 229 | 122 | 61 | 86 | 761 |
| 2021 | 244 | 203 | 105 | 60 | 73 | 685 |
| 2022 | 345 | 263 | 154 | 72 | 99 | 933 |
| 2023 | 347 | 303 | 165 | 86 | 115 | 1016 |
| 2024 | 328 | 315 | 160 | 87 | 109 | 999 |
| 2025 | 317 | 306 | 158 | 73 | 104 | 958 |
| Total | 1,844 | 1,619 | 864 | 439 | 586 | 5,352 |
Data Quality
Statistical Methods Employed
PARAMETRIC One-way ANOVA & Welch’s t-test
Tests whether group means differ significantly. ANOVA compares all five categories; Welch’s t-test compares General vs all reserved combined, robust to unequal variances.
NON-PARAMETRIC Kruskal-Wallis & Mann-Whitney U
Distribution-free alternatives essential because Shapiro-Wilk tests show departures from normality. Compare rank distributions rather than means.
EFFECT SIZE Cohen’s d, η², Rank-Biserial r
Quantify practical magnitude. Cohen’s d: 0.2 = small, 0.5 = medium, 0.8 = large. All reported with 95% bootstrap confidence intervals.
REGRESSION OLS Multiple Regression
Models interview marks as a function of written marks, category, and year to isolate the independent effect of category. Confidence intervals on all β coefficients.
STRATIFICATION Written-Score Quintile Analysis NEW
Divides candidates into five groups by written score and compares interview marks within each group, so that candidates are compared only with others of similar written performance.
INTERACTION Category × Year Interaction Test NEW
Formally tests whether the gap is stable, widening, or narrowing over the six-year period.
SIMULATION Rank Impact Analysis NEW
Simulates what would happen to rankings if the interview gap were eliminated — connecting statistics to real career outcomes.
Before hypothesis testing, we examine the raw distribution of interview marks across categories.
Pooled Descriptive Statistics (2020–2025)
| Category | N | Mean | Median | Std Dev | Q1 | Q3 | IQR | Skew | 95% CI |
|---|---|---|---|---|---|---|---|---|---|
| General | 1,844 | 181.16 | 182.0 | 16.73 | 170 | 193 | 23 | -0.368 | [180.4, 181.9] |
| OBC | 1,619 | 175.42 | 176.0 | 16.41 | 165 | 187 | 22 | -0.148 | [174.6, 176.2] |
| SC | 864 | 172.32 | 173.0 | 17.62 | 160 | 185 | 25 | -0.329 | [171.2, 173.5] |
| ST | 439 | 171.90 | 171.0 | 18.03 | 160 | 185 | 25 | -0.069 | [170.2, 173.6] |
| EWS | 586 | 174.50 | 175.0 | 16.18 | 165 | 185 | 20 | -0.179 | [173.2, 175.8] |
Mean interview marks follow a clear hierarchy: General (181.2) > OBC (175.4) > EWS (174.5) > SC (172.3) > ST (171.9). The 95% confidence interval for General does not overlap with that of any reserved category.
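As a sanity check, the confidence intervals in the table can be approximately reproduced from the reported mean, standard deviation, and N. The sketch below assumes a normal approximation (mean ± 1.96 · SD / √N). The source does not state which formula it used, so differences in the last decimal are expected.

```python
import math

def mean_ci(mean, sd, n, z=1.96):
    """Approximate 95% CI for a mean from summary statistics (normal approximation)."""
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

# Pooled (mean, SD, N) values copied from the descriptive statistics table.
summary = {
    "General": (181.16, 16.73, 1844),
    "OBC": (175.42, 16.41, 1619),
    "SC": (172.32, 17.62, 864),
    "ST": (171.90, 18.03, 439),
    "EWS": (174.50, 16.18, 586),
}

for category, (mean, sd, n) in summary.items():
    low, high = mean_ci(mean, sd, n)
    print(f"{category}: [{low:.1f}, {high:.1f}]")
```

Because the intervals depend on √N, the smaller ST group (N = 439) has a visibly wider interval than General (N = 1,844) despite a similar standard deviation.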
The central question: does a systematic gap exist between General and reserved category interview marks?
At +9.26 marks, the General–ST gap represents 3.4% of total interview marks (275). In UPSC, where final rankings are often decided by 1–2 mark margins, a systematic 5–9 mark disadvantage is substantial.
Gap by Year
| Year | Gen − OBC | Gen − SC | Gen − ST | Gen − EWS | Cohen’s d (Gen vs all reserved) |
|---|---|---|---|---|---|
| 2020 | +5.2 | +12.1 | +12.4 | +9.1 | 0.518 |
| 2021 | +5.3 | +7.8 | +13.4 | +7.8 | 0.475 |
| 2022 | +4.3 | +10.1 | +9.4 | +3.3 | 0.402 |
| 2023 | +7.6 | +7.1 | +9.7 | +8.9 | 0.474 |
| 2024 | +7.0 | +11.3 | +7.4 | +6.1 | 0.463 |
| 2025 | +6.0 | +6.1 | +5.2 | +5.9 | 0.377 |
Year-by-year trends reveal whether the gap is growing, shrinking, or stable.
All categories trend upward from 2020 to 2025, but the hierarchy stays unchanged. The lines run in parallel, strongly suggesting a structural pattern.
6.1 One-Way ANOVA
H₀: μGeneral = μOBC = μSC = μST = μEWS
| Year | F-Stat | p-value | η² | Result |
|---|---|---|---|---|
| Pooled | 60.51 | 0.000000 | 0.0433 | SIGNIFICANT |
| 2020 | 16.56 | 0.000000 | 0.0806 | SIGNIFICANT |
| 2021 | 12.23 | 0.000000 | 0.0671 | SIGNIFICANT |
| 2022 | 14.08 | 0.000000 | 0.0572 | SIGNIFICANT |
| 2023 | 13.30 | 0.000000 | 0.0500 | SIGNIFICANT |
| 2024 | 14.01 | 0.000000 | 0.0534 | SIGNIFICANT |
| 2025 | 7.55 | 0.000000 | 0.0307 | SIGNIFICANT |
Result: Significant in every year and pooled (all p < 0.001). Pooled η² = 0.043 — category explains ~4.3% of variance in interview marks.
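The η² values in the table follow from the F statistics and their degrees of freedom, since η² = SS_between / (SS_between + SS_within) and F is the ratio of the corresponding mean squares. A minimal sketch, assuming five categories (so 4 between-group degrees of freedom) and N − 5 within-group degrees of freedom:

```python
def eta_squared_from_f(f_stat, df_between, df_within):
    """Recover eta-squared from a one-way ANOVA F statistic."""
    ss_ratio = f_stat * df_between  # proportional to SS_between / MS_within
    return ss_ratio / (ss_ratio + df_within)

# Pooled sample: 5 categories and 5,352 candidates give df = (4, 5347).
pooled = eta_squared_from_f(60.51, df_between=4, df_within=5352 - 5)
print(f"pooled eta^2 = {pooled:.4f}")

# 2025 cohort: 958 candidates give df = (4, 953).
y2025 = eta_squared_from_f(7.55, df_between=4, df_within=958 - 5)
print(f"2025 eta^2 = {y2025:.4f}")
```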
6.2 Welch’s t-test (General vs All Reserved)
| Year | t-Stat | p-value | Cohen’s d | Size |
|---|---|---|---|---|
| Pooled | 14.67 | 0.000000 | 0.420 | Small–medium |
| 2020 | 6.66 | 0.000000 | 0.518 | Medium |
| 2021 | 5.86 | 0.000000 | 0.475 | Small–medium |
| 2022 | 6.07 | 0.000000 | 0.402 | Small–medium |
| 2023 | 7.17 | 0.000000 | 0.474 | Small–medium |
| 2024 | 7.05 | 0.000000 | 0.463 | Small–medium |
| 2025 | 5.38 | 0.000000 | 0.377 | Small–medium |
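The pooled Welch statistic can be approximated from the descriptive statistics alone. The sketch below first merges the four reserved categories into one group (combining within-group and between-group variation) and then applies Welch’s formula. It works from rounded summary values, so it should land near the reported t = 14.67 without matching it exactly. The helper names are illustrative.

```python
import math

def combine_groups(groups):
    """Merge (n, mean, sd) summaries into a single (n, mean, variance) summary."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    return total_n, grand_mean, (within + between) / (total_n - 1)

def welch_t(n1, mean1, var1, n2, mean2, var2):
    """Welch's t statistic for two groups with unequal variances."""
    return (mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)

# Pooled (N, mean, SD) rows from the descriptive statistics table.
general = (1844, 181.16, 16.73)
reserved = [
    (1619, 175.42, 16.41),  # OBC
    (864, 172.32, 17.62),   # SC
    (439, 171.90, 18.03),   # ST
    (586, 174.50, 16.18),   # EWS
]

n_res, mean_res, var_res = combine_groups(reserved)
t_stat = welch_t(general[0], general[1], general[2] ** 2, n_res, mean_res, var_res)
print(f"reserved mean = {mean_res:.2f}, Welch t = {t_stat:.2f}")
```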
6.3 Tukey HSD Post-Hoc Pairwise Comparisons
| Year | Significant Pairs (p ≤ 0.05) |
|---|---|
| Pooled | (‘General’, ‘OBC’), (‘General’, ‘SC’), (‘General’, ‘ST’), (‘OBC’, ‘SC’), (‘OBC’, ‘ST’) |
| 2020 | (‘General’, ‘OBC’), (‘General’, ‘SC’), (‘General’, ‘ST’), (‘OBC’, ‘SC’), (‘OBC’, ‘ST’) |
| 2021 | (‘General’, ‘OBC’), (‘General’, ‘SC’), (‘General’, ‘ST’), (‘OBC’, ‘ST’) |
| 2022 | (‘EWS’, ‘SC’), (‘General’, ‘OBC’), (‘General’, ‘SC’), (‘General’, ‘ST’), (‘OBC’, ‘SC’) |
| 2023 | (‘General’, ‘OBC’), (‘General’, ‘SC’), (‘General’, ‘ST’) |
| 2024 | (‘General’, ‘OBC’), (‘General’, ‘SC’), (‘General’, ‘ST’) |
| 2025 | (‘General’, ‘OBC’), (‘General’, ‘SC’) |
7.1 Kruskal-Wallis H Test
| Year | H-Stat | p-value | Result |
|---|---|---|---|
| Pooled | 226.57 | 0.000000 | SIGNIFICANT |
| 2020 | 58.80 | 0.000000 | SIGNIFICANT |
| 2021 | 46.53 | 0.000000 | SIGNIFICANT |
| 2022 | 47.95 | 0.000000 | SIGNIFICANT |
| 2023 | 52.23 | 0.000000 | SIGNIFICANT |
| 2024 | 48.99 | 0.000000 | SIGNIFICANT |
| 2025 | 31.59 | 0.000000 | SIGNIFICANT |
7.2 Mann-Whitney U Tests (General vs Each)
| Comparison | U | p-value | Rank-Biserial r | Interpretation |
|---|---|---|---|---|
| Gen vs OBC | 1,791,491 | 0.000000 | -0.200 | General ranks higher ~60% of the time |
| Gen vs SC | 1,021,196 | 0.000000 | -0.282 | General ranks higher ~64% of the time |
| Gen vs ST | 525,570 | 0.000000 | -0.298 | General ranks higher ~65% of the time |
| Gen vs EWS | 665,857 | 0.000000 | -0.232 | General ranks higher ~62% of the time |
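The rank-biserial values and the “ranks higher ~X% of the time” readings in the table are two views of the same quantity. The sketch below assumes the reported U is the statistic for the General group and that the sign convention is r = 1 − 2U / (n₁n₂), which reproduces the tabulated values. The percentage is then U / (n₁n₂), with tied pairs counted as half.

```python
def rank_biserial(u_stat, n1, n2):
    """Rank-biserial r under the convention r = 1 - 2U / (n1 * n2)."""
    return 1 - 2 * u_stat / (n1 * n2)

def probability_of_superiority(u_stat, n1, n2):
    """Share of cross-group pairs where group 1 scores higher (ties count half)."""
    return u_stat / (n1 * n2)

N_GENERAL = 1844
comparisons = {  # category: (U from the table, category N)
    "OBC": (1_791_491, 1619),
    "SC": (1_021_196, 864),
    "ST": (525_570, 439),
    "EWS": (665_857, 586),
}

for category, (u_stat, n_cat) in comparisons.items():
    r = rank_biserial(u_stat, N_GENERAL, n_cat)
    p = probability_of_superiority(u_stat, N_GENERAL, n_cat)
    print(f"Gen vs {category}: r = {r:.3f}, General higher in {p:.0%} of pairs")
```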
The most revealing analysis. We examine whether the written→interview relationship differs by category.
For reserved categories: Significant negative relationship. Higher written scores predict lower interview marks. This is the “compensation effect.”
Regression Slopes: Written → Interview (Pooled)
| Category | Slope | p-value | R² | Meaning |
|---|---|---|---|---|
| General | 0.0050 | 0.5686 | 0.000 | No relationship (independent) |
| OBC | -0.1055 | 0.0000 | 0.045 | +100 written → -10.5 interview |
| SC | -0.1640 | 0.0000 | 0.077 | +100 written → -16.4 interview |
| ST | -0.2389 | 0.0000 | 0.147 | +100 written → -23.9 interview |
| EWS | -0.1536 | 0.0000 | 0.080 | +100 written → -15.4 interview |
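Reference group: EWS, the one category without its own β column. Each β shows how that category differs from EWS after controls. The pooled values agree with the General-referenced coefficients reported later (for example, SC: −11.88 − (−8.02) ≈ −3.87).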
| Scope | F | p | Adj. R² | β General | β OBC | β SC | β ST |
|---|---|---|---|---|---|---|---|
| Pooled | 81.22 | 0.000000 | 0.130 | 8.05 | 0.52 | -3.87 | -4.20 |
| 2020 | 13.86 | 0.000000 | 0.078 | 9.77 | 4.01 | -3.32 | -3.73 |
| 2021 | 11.58 | 0.000000 | 0.072 | 8.64 | 2.15 | -1.27 | -7.08 |
| 2022 | 11.28 | 0.000000 | 0.052 | 3.40 | -0.95 | -6.88 | -6.13 |
| 2023 | 22.82 | 0.000000 | 0.097 | 11.12 | 0.67 | -1.34 | -4.13 |
| 2024 | 31.53 | 0.000000 | 0.133 | 8.77 | -2.70 | -9.83 | -6.35 |
| 2025 | 14.74 | 0.000000 | 0.067 | 7.92 | 0.35 | -2.15 | -1.79 |
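Dummy-variable coefficients depend on which category serves as the baseline, so it helps to be able to move between baselines. The sketch below takes the General-referenced pooled coefficients reported later in this article and re-expresses them against EWS. The result (8.02, 0.51, −3.86, −4.29) is close to the pooled row above (8.05, 0.52, −3.87, −4.20), which suggests, though the source does not confirm it, that this table was fitted with EWS as the baseline. The helper `rebase` is illustrative.

```python
def rebase(coefs, new_reference):
    """Re-express dummy coefficients relative to a different reference category."""
    shift = coefs[new_reference]
    return {cat: beta - shift for cat, beta in coefs.items() if cat != new_reference}

# Pooled coefficients relative to General (General itself is 0 by construction).
vs_general = {
    "General": 0.0,
    "OBC": -7.5046,
    "SC": -11.8756,
    "ST": -12.3083,
    "EWS": -8.0182,
}

vs_ews = rebase(vs_general, "EWS")
for category, beta in vs_ews.items():
    print(f"{category} vs EWS: {beta:+.2f}")
```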
NEW ANALYSIS
Regression controls for written marks statistically. But a more intuitive approach is to directly compare candidates with similar written scores. We divide all 5,352 candidates into quintiles by written marks and compare interview marks within each quintile.
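The stratification step can be sketched as follows. This is a minimal illustration on hypothetical toy records, not the study’s code: it assumes quintile boundaries come from the pooled written-score distribution and that the gap is defined as the General mean minus the category mean within each quintile.

```python
from collections import defaultdict
from statistics import mean, quantiles

def quintile_gaps(records, reference="General"):
    """records: list of (written, interview, category) tuples.

    Returns {(quintile, category): reference mean minus category mean}.
    """
    cuts = quantiles([written for written, _, _ in records], n=5)  # 4 cut points
    by_cell = defaultdict(list)
    for written, interview, category in records:
        quintile = 1 + sum(written > cut for cut in cuts)
        by_cell[(quintile, category)].append(interview)

    gaps = {}
    for (quintile, category), marks in by_cell.items():
        ref_marks = by_cell.get((quintile, reference))
        if category != reference and ref_marks:
            gaps[(quintile, category)] = mean(ref_marks) - mean(marks)
    return gaps

# Hypothetical toy data: odd written scores are General (interview 180),
# even written scores are OBC (interview 170).
toy = [(w, 180 if w % 2 else 170, "General" if w % 2 else "OBC") for w in range(1, 11)]
gaps = quintile_gaps(toy)
print(gaps)
```

A positive value means the reference group scored higher within that quintile; a negative value means the category scored higher.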
Important Context: Category Distribution Across Quintiles
Because reserved categories have lower written cut-offs, they are concentrated in lower quintiles while General candidates dominate upper quintiles:
| Quintile | Written Range | General (n) | OBC (n) | SC (n) | ST (n) | EWS (n) |
|---|---|---|---|---|---|---|
| Q1 | 300–743 | 83 | 267 | 428 | 233 | 75 |
| Q2 | 744–763 | 138 | 471 | 222 | 102 | 167 |
| Q3 | 764–780 | 329 | 430 | 109 | 47 | 142 |
| Q4 | 781–800 | 562 | 268 | 69 | 43 | 128 |
| Q5 | 801–932 | 732 | 183 | 36 | 14 | 74 |
Interview Marks by Category Within Each Quintile
| Quintile | Category | N | Mean | 95% CI | Gap (General − category) |
|---|---|---|---|---|---|
| Q1 | General | 83 | 163.2 | [158.5, 167.9] | — |
| Q1 | OBC | 267 | 187.5 | [185.7, 189.3] | -24.3 |
| Q1 | SC | 428 | 177.7 | [176.3, 179.2] | -14.5 |
| Q1 | ST | 233 | 178.3 | [176.3, 180.3] | -15.0 |
| Q1 | EWS | 75 | 186.5 | [182.9, 190.0] | -23.3 |
| Q2 | General | 138 | 196.5 | [194.2, 198.8] | — |
| Q2 | OBC | 471 | 178.1 | [176.9, 179.3] | +18.4 |
| Q2 | SC | 222 | 167.9 | [165.5, 170.2] | +28.6 |
| Q2 | ST | 102 | 165.1 | [161.8, 168.4] | +31.4 |
| Q2 | EWS | 167 | 179.5 | [177.6, 181.5] | +17.0 |
| Q3 | General | 329 | 191.2 | [190.0, 192.3] | — |
| Q3 | OBC | 430 | 172.0 | [170.7, 173.4] | +19.1 |
| Q3 | SC | 109 | 165.8 | [162.5, 169.2] | +25.3 |
| Q3 | ST | 47 | 163.7 | [159.0, 168.4] | +27.5 |
| Q3 | EWS | 142 | 170.6 | [167.9, 173.2] | +20.6 |
| Q4 | General | 562 | 180.9 | [179.7, 182.1] | — |
| Q4 | OBC | 268 | 167.5 | [165.6, 169.4] | +13.4 |
| Q4 | SC | 69 | 166.2 | [161.3, 171.1] | +14.7 |
| Q4 | ST | 43 | 167.1 | [161.0, 173.1] | +13.9 |
| Q4 | EWS | 128 | 168.1 | [165.8, 170.5] | +12.8 |
| Q5 | General | 732 | 176.0 | [174.8, 177.1] | — |
| Q5 | OBC | 183 | 170.5 | [167.7, 173.3] | +5.5 |
| Q5 | SC | 36 | 166.8 | [161.5, 172.0] | +9.2 |
| Q5 | ST | 14 | 157.9 | [147.2, 168.6] | +18.1 |
| Q5 | EWS | 74 | 169.7 | [165.6, 173.7] | +6.3 |
Statistical Tests Within Each Quintile (Q2–Q5)
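We focus on Q2–Q5, where both General and reserved groups have adequate sample sizes. In Q1, which contains only 83 General candidates, the direction reverses: reserved-category candidates score 14–24 marks higher than General.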
| Quintile | Comparison | Gap | Cohen’s d | p-value | Result |
|---|---|---|---|---|---|
| Q2 | Gen vs OBC | +18.4 | 1.372 | 0.000000 | SIG |
| Q2 | Gen vs SC | +28.6 | 1.796 | 0.000000 | SIG |
| Q2 | Gen vs ST | +31.4 | 2.012 | 0.000000 | SIG |
| Q2 | Gen vs EWS | +17.0 | 1.273 | 0.000000 | SIG |
| Q3 | Gen vs OBC | +19.1 | 1.493 | 0.000000 | SIG |
| Q3 | Gen vs SC | +25.3 | 1.705 | 0.000000 | SIG |
| Q3 | Gen vs ST | +27.5 | 1.973 | 0.000000 | SIG |
| Q3 | Gen vs EWS | +20.6 | 1.500 | 0.000000 | SIG |
| Q4 | Gen vs OBC | +13.4 | 0.890 | 0.000000 | SIG |
| Q4 | Gen vs SC | +14.7 | 0.828 | 0.000000 | SIG |
| Q4 | Gen vs ST | +13.9 | 0.796 | 0.000001 | SIG |
| Q4 | Gen vs EWS | +12.8 | 0.906 | 0.000000 | SIG |
| Q5 | Gen vs OBC | +5.5 | 0.310 | 0.000961 | SIG |
| Q5 | Gen vs SC | +9.2 | 0.584 | 0.000799 | SIG |
| Q5 | Gen vs ST | +18.1 | 1.012 | 0.000722 | SIG |
| Q5 | Gen vs EWS | +6.3 | 0.380 | 0.011966 | SIG |
NEW ANALYSIS
We showed the gap persists across years. But is it widening, narrowing, or stable? A formal test uses Category × Year interaction terms in the regression model.
Model: Interview = β₀ + β₁(Written) + β₂(Category) + β₃(Year) + β₄(Category × Year) + ε
Joint F-test for All Interaction Terms
F(4, 5341) = 3.9230, p = 0.003492
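The degrees of freedom in this F statistic follow from counting parameters. Assuming General is the reference category (the interaction terms listed below cover only OBC, SC, ST, and EWS) and Year enters as a single numeric term, the model can be written out with category dummies $D_{ic}$ as:

$$
\text{Interview}_i = \beta_0 + \beta_1\,\text{Written}_i + \sum_{c} \beta_{2,c}\,D_{ic} + \beta_3\,\text{Year}_i + \sum_{c} \beta_{4,c}\,(D_{ic} \times \text{Year}_i) + \varepsilon_i, \qquad c \in \{\text{OBC}, \text{SC}, \text{ST}, \text{EWS}\}
$$

$$
H_0:\ \beta_{4,\text{OBC}} = \beta_{4,\text{SC}} = \beta_{4,\text{ST}} = \beta_{4,\text{EWS}} = 0
$$

The joint test restricts 4 coefficients, and the full model has $1 + 1 + 4 + 1 + 4 = 11$ parameters, leaving $5352 - 11 = 5341$ residual degrees of freedom, which matches $F(4, 5341)$.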
The joint test is significant (p = 0.0035), meaning some category gaps are changing over time. Let’s examine which ones:
Individual Interaction Terms
| Interaction | β (marks/year) | 95% CI | p-value | Interpretation |
|---|---|---|---|---|
| OBC × Year | -0.2855 | [-0.9311, 0.3601] | 0.3861 | No significant change over time NS |
| SC × Year | +0.7046 | [-0.0800, 1.4893] | 0.0784 | No significant change over time NS |
| ST × Year | +1.5220 | [0.5053, 2.5388] | 0.0034 | Gap narrowing by 1.52 marks/year SIG |
| EWS × Year | +0.4541 | [-0.4459, 1.3541] | 0.3228 | No significant change over time NS |
- OBC × Year: Not significant. The General–OBC gap is stable.
- SC × Year: Borderline (p = 0.078). Slight trend toward narrowing, but not yet significant.
- ST × Year: Significant (p = 0.003, β = +1.52). The General–ST gap is narrowing by about 1.5 marks per year. At this rate, the gap would take another ~4 years to close fully.
- EWS × Year: Not significant. Stable gap.
While the overall F-test is significant due to the ST improvement, three of four category gaps show no significant trend. The bias is structurally embedded, not a temporary anomaly — with the partial exception of ST candidates, where slow progress is visible.
NEW ANALYSIS
Statistical significance tells us a difference exists. Confidence intervals tell us how precisely we’ve estimated its size. Narrow CIs = robust estimates.
Regression Coefficients with 95% CIs
| Variable | β | 95% CI | Std Error | p-value |
|---|---|---|---|---|
| Intercept | 233.2745 | [223.88, 242.67] | 4.7951 | 0.000000 |
| Written Total | -0.0665 | [-0.08, -0.05] | 0.0061 | 0.000000 |
| Year (centered) | 2.5090 | [2.25, 2.77] | 0.1326 | 0.000000 |
| OBC (vs General) | -7.5046 | [-8.62, -6.39] | 0.5673 | 0.000000 |
| SC (vs General) | -11.8756 | [-13.28, -10.47] | 0.7153 | 0.000000 |
| ST (vs General) | -12.3083 | [-14.07, -10.55] | 0.8986 | 0.000000 |
| EWS (vs General) | -8.0182 | [-9.54, -6.50] | 0.7739 | 0.000000 |
Cohen’s d with 95% Bootstrap Confidence Intervals
| Comparison | Cohen’s d | 95% CI | Effect Size |
|---|---|---|---|
| General vs OBC | 0.346 | [0.279, 0.415] | Small |
| General vs SC | 0.514 | [0.432, 0.600] | Medium |
| General vs ST | 0.533 | [0.427, 0.644] | Medium |
| General vs EWS | 0.405 | [0.310, 0.499] | Small–medium |
- OBC: β = −7.50 [−8.62, −6.39] — OBC candidates receive 6.4 to 8.6 fewer marks than General.
- SC: β = −11.88 [−13.28, −10.47] — SC candidates receive 10.5 to 13.3 fewer marks.
- ST: β = −12.31 [−14.07, −10.55] — ST candidates receive 10.6 to 14.1 fewer marks.
- EWS: β = −8.02 [−9.54, −6.50] — EWS candidates receive 6.5 to 9.5 fewer marks.
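The Cohen’s d point estimates in the table above can be reproduced from the pooled means and standard deviations. The source does not state which pooled-SD formula it used. The sketch below assumes the unweighted root-mean-square of the two group SDs, because that choice reproduces the tabulated values to within rounding; a sample-size-weighted pooled SD would give slightly different numbers.

```python
import math

def cohens_d(mean1, sd1, mean2, sd2):
    """Cohen's d with the root-mean-square of the two SDs as the denominator."""
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (mean1 - mean2) / pooled_sd

# Pooled (mean, SD) values from the descriptive statistics table.
general = (181.16, 16.73)
others = {
    "OBC": (175.42, 16.41),
    "SC": (172.32, 17.62),
    "ST": (171.90, 18.03),
    "EWS": (174.50, 16.18),
}

effect_sizes = {cat: cohens_d(*general, *stats) for cat, stats in others.items()}
for category, d in effect_sizes.items():
    print(f"General vs {category}: d = {d:.3f}")
```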
NEW ANALYSIS
Statistics describe magnitude. But how does a 5–12 mark gap translate to career outcomes? In UPSC, final rank determines your service allocation (IAS, IPS, IFS) and cadre posting — decisions that shape an entire career. We simulate the impact.
How Tight Are UPSC Rankings?
| Year | Candidates | Median Gap Between Ranks | % Within 5 Marks of Next | % Within 10 Marks |
|---|---|---|---|---|
| 2020 | 761 | 0.0 | 98.4% | 98.6% |
| 2021 | 685 | 0.0 | 98.7% | 99.1% |
| 2022 | 933 | 0.0 | 98.3% | 98.9% |
| 2023 | 1016 | 0.0 | 99.1% | 99.5% |
| 2024 | 999 | 0.0 | 99.1% | 99.5% |
| 2025 | 958 | 0.0 | 98.9% | 99.2% |
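A table like the one above can be derived from the list of final totals for a year. The sketch below is an assumption about how the columns are defined, not the study’s code: it treats “gap between ranks” as the difference between adjacent totals after sorting, and “% within 5 marks of next” as the share of those adjacent gaps that are 5 marks or less. The totals are hypothetical.

```python
from statistics import median

def rank_tightness(totals, window=5):
    """Return (median adjacent gap, share of adjacent gaps <= window)."""
    ordered = sorted(totals, reverse=True)
    gaps = [high - low for high, low in zip(ordered, ordered[1:])]
    share = sum(gap <= window for gap in gaps) / len(gaps)
    return median(gaps), share

toy_totals = [1000, 998, 998, 990, 989]  # hypothetical final marks
tightness = rank_tightness(toy_totals)
print(tightness)
```

A median gap of 0.0, as reported for every year, means at least half of adjacent candidates are tied on total marks.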
Rank Impact Simulation
For each year and reserved category, we simulate: “What if this category’s interview marks were increased by the observed gap?” We then recalculate rankings and measure how many positions each candidate improves.
| Year | Category | Gap | N | Avg Rank Improvement | Max Improvement | Top-100 Gained |
|---|---|---|---|---|---|---|
| 2020 | OBC | +5.2 | 229 | 18.6 | 50 | 4 |
| 2020 | SC | +12.1 | 122 | 48.0 | 104 | 1 |
| 2020 | ST | +12.4 | 61 | 66.2 | 107 | 0 |
| 2020 | EWS | +9.1 | 86 | 53.0 | 86 | 0 |
| 2021 | OBC | +5.3 | 203 | 18.5 | 44 | 2 |
| 2021 | SC | +7.8 | 105 | 28.4 | 70 | 2 |
| 2021 | ST | +13.4 | 60 | 55.7 | 107 | 1 |
| 2021 | EWS | +7.8 | 73 | 37.8 | 70 | 4 |
| 2022 | OBC | +4.3 | 263 | 23.8 | 54 | 3 |
| 2022 | SC | +10.1 | 154 | 52.6 | 132 | 2 |
| 2022 | ST | +9.4 | 72 | 63.3 | 118 | 1 |
| 2022 | EWS | +3.3 | 99 | 26.8 | 46 | 1 |
| 2023 | OBC | +7.6 | 303 | 45.7 | 98 | 4 |
| 2023 | SC | +7.1 | 165 | 45.7 | 122 | 1 |
| 2023 | ST | +9.7 | 86 | 59.9 | 152 | 1 |
| 2023 | EWS | +8.9 | 115 | 76.4 | 120 | 4 |
| 2024 | OBC | +7.0 | 315 | 38.2 | 83 | 3 |
| 2024 | SC | +11.3 | 160 | 62.3 | 147 | 4 |
| 2024 | ST | +7.4 | 87 | 54.6 | 104 | 0 |
| 2024 | EWS | +6.1 | 109 | 51.0 | 87 | 3 |
| 2025 | OBC | +6.0 | 306 | 35.1 | 83 | 5 |
| 2025 | SC | +6.1 | 158 | 38.0 | 95 | 1 |
| 2025 | ST | +5.2 | 73 | 31.5 | 80 | 0 |
| 2025 | EWS | +5.9 | 104 | 46.1 | 79 | 2 |
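The simulation described before the table can be sketched as follows: add the observed gap to one category’s interview marks, re-rank everyone by total, and record how many positions each member of that category gains. This is a minimal illustration on hypothetical candidates. It assumes final rank is determined by written plus interview marks and that ties keep their original order, neither of which the source spells out.

```python
def rank_improvements(candidates, category, boost):
    """Positions gained by members of `category` if their interview marks rise by `boost`.

    candidates: list of (candidate_id, category, written, interview) tuples.
    """
    def ranks(extra):
        totals = {
            cid: written + interview + (extra if cat == category else 0)
            for cid, cat, written, interview in candidates
        }
        ordered = sorted(totals, key=totals.get, reverse=True)
        return {cid: position for position, cid in enumerate(ordered, start=1)}

    before, after = ranks(0), ranks(boost)
    return {cid: before[cid] - after[cid] for cid, cat, _, _ in candidates if cat == category}

# Hypothetical candidates: (id, category, written, interview).
toy = [
    ("g1", "General", 810, 180),
    ("g2", "General", 805, 180),
    ("o1", "OBC", 808, 175),
    ("g3", "General", 800, 180),
    ("s1", "SC", 803, 175),
]
improvement = rank_improvements(toy, "OBC", boost=5)
print(improvement)
```

Averaging the returned values gives a figure comparable to the “Avg Rank Improvement” column, and counting candidates whose new rank is 100 or better while their old rank was not gives “Top-100 Gained”.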
Aggregated Impact (6-Year Summary)
- An average SC candidate would improve by 46 rank positions if the interview gap were closed — enough to potentially change their service allocation.
- An average ST candidate would gain 55 positions.
- Over six years, an estimated 11 additional SC candidates and 3 additional ST candidates would have entered the top 100 — positions typically associated with IAS/IFS allocation.
- Maximum single-candidate impact: up to 147 rank positions for an SC candidate in a single year.
The Five Pillars of Evidence
Pillar 1: The Persistent Gap
General candidates score 5.7–9.3 marks higher in interviews. The gap exists every year and survives both parametric and non-parametric testing (p < 0.001).
Pillar 2: The Compensation Effect
For General candidates, written and interview marks are independent. For reserved candidates, the relationship is significantly negative. Scoring well in writing predicts scoring worse in interviews.
Pillar 3: The Category Effect After Controls
After controlling for written marks and year, SC/ST face a roughly 12-mark structural disadvantage (β = −11.9 to −12.3, CIs exclude zero).
Pillar 4: Gap Survives Within-Quintile Stratification NEW
Even when comparing candidates within the same written-score quintile (Q2–Q5), General candidates score significantly higher in interviews. Effect sizes within these quintiles are larger than the pooled estimates. In Q1, the lowest quintile, the direction reverses.
Pillar 5: Real-World Career Impact NEW
With 99% of ranks separated by ≤5 marks, the 5–12 mark interview gap shifts candidates by 30–55 rank positions on average, affecting service allocation and career trajectories.
Possible Explanations
1. Interviewer Bias (Conscious or Unconscious)
UPSC panels see each candidate’s name and Detailed Application Form (DAF). Implicit bias could lead to subtle downgrading. The compensation effect is consistent with a “ceiling effect” where panels resist giving reserved candidates high totals.
2. Socioeconomic Preparedness Differences
Less access to coaching, English training, and interview exposure. Plausible for some of the gap, but it does not explain the compensation effect or why the gap within written-score quintiles Q2–Q5 is larger than the pooled gap.
3. Structural/Institutional Factors
The interview format may structurally favour candidates who share cultural markers with panellists — accent, mannerisms, references — embedding systemic disadvantage.
1. Correlation ≠ Causation
Observational study. Cannot definitively prove interviewer bias vs other confounders.
2. Selection Bias
Only recommended candidates. Different cut-offs mean reserved candidates reaching interview have different score distributions.
3. Unmeasured Variables
No data on panel composition, candidate education, medium, optional subject, or attempt number.
4. Effect Size Context
η² ≈ 0.04 means category explains ~4% of variance in interview marks. The remaining ~96% is attributable to other factors. The bias operates on margins, but these are margins that determine careers.
This study presents robust statistical evidence that UPSC interview marks are not category-neutral. Across 5,352 candidates over six years, using sixteen statistical approaches:
- General category scores 5.7 to 9.3 marks higher in interviews (pooled means).
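- Gap is statistically significant in every pooled and yearly test (p < 0.001) and in every Q2–Q5 within-quintile comparison (p < 0.05).
- Gap persists across all six years and within written-score quintiles Q2–Q5; it reverses in Q1, the lowest quintile.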
- “Compensation effect” — reserved candidates with higher written marks receive lower interview marks.
- After controls, category predicts a 7.5 to 12.3 mark disadvantage (95% CIs span −14.1 to −6.4).
- This shifts affected candidates by 30–55 rank positions on average, impacting service allocation.
- Only the ST gap shows significant narrowing (~1.5 marks/year); other gaps are stable.
Check out the complete code used for the study here.


