Research Question: Do multilingual embedding models encode culturally-specific semantic relationships between abstract moral concepts?
Answer: Yes. The geometric relationships between moral concepts vary significantly across languages, suggesting that embedding models absorb cultural values from training data.
The ratio of the justice-mercy distance to the justice-punishment distance varies by up to 21% across languages, and the difference is statistically significant (one-way ANOVA, F=48.0, p<0.0001).
| Language | Ratio* | Interpretation |
|---|---|---|
| 🇯🇵 Japanese | 1.279 | Justice and mercy closely related |
| 🇬🇧 English | 1.290 | Balanced relationship |
| 🇨🇳 Chinese | 1.308 | Moderate separation |
| 🇸🇦 Arabic | 1.476 | Greater conceptual distance |
| 🇮🇳 Hindi | 1.549 | Justice and mercy are distinct concepts |
*Ratio = distance(justiceโmercy) / distance(justiceโpunishment)
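The ratio in the table can be computed directly from the definition above. A minimal sketch, using random 768-d vectors as stand-ins for real model output so it runs without downloading the model (in the actual pipeline the vectors would come from encoding each language's translated terms with the multilingual model):

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine distance = 1 - cosine similarity, matching the study's metric."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def justice_mercy_ratio(emb: dict) -> float:
    """Ratio = distance(justice, mercy) / distance(justice, punishment)."""
    return (cosine_distance(emb["justice"], emb["mercy"])
            / cosine_distance(emb["justice"], emb["punishment"]))

# Stand-in embeddings: in the real pipeline these would come from
# SentenceTransformer("paraphrase-multilingual-mpnet-base-v2").encode(terms).
rng = np.random.default_rng(42)
emb = {term: rng.normal(size=768) for term in ("justice", "mercy", "punishment")}
print(justice_mercy_ratio(emb))
```

A ratio above 1 means justice sits closer to punishment than to mercy in the embedding space, which is why higher values in the table are read as greater justice-mercy separation.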
Statistical Validation: a one-way ANOVA over bootstrap-resampled ratios (n=20 per language) confirms that the cross-language differences are significant (F=48.0, p<0.0001).
Methodology
Model: paraphrase-multilingual-mpnet-base-v2 (768-dimensional embeddings)
Languages: English, Hindi, Japanese, Arabic, Chinese
Concepts: 10 abstract moral terms (justice, mercy, duty, honor, forgiveness, punishment, law, freedom, loyalty, sacrifice)
Metric: Cosine distance = 1 - cosine_similarity
Statistical Test: One-way ANOVA with bootstrap resampling (n=20 per language)
Visualization: Interactive Plotly HTML charts
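The ANOVA step can be sketched as follows. The per-language point estimates are taken from the results table; the Gaussian noise is a hypothetical stand-in for the study's actual bootstrap resampling procedure, which is not specified here:

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)

# Point estimates from the results table.
point_ratios = {"Japanese": 1.279, "English": 1.290, "Chinese": 1.308,
                "Arabic": 1.476, "Hindi": 1.549}

# Hypothetical bootstrap spread: n=20 noisy draws per language.
samples = {lang: r + rng.normal(0.0, 0.02, size=20)
           for lang, r in point_ratios.items()}

# One-way ANOVA across the five language groups.
f_stat, p_value = f_oneway(*samples.values())
print(f"F={f_stat:.1f}, p={p_value:.2e}")
```

With group means this far apart relative to the within-group spread, the F statistic is large and the null hypothesis of equal means is rejected.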
Generated charts:
- Cross-language comparison: shows the 21% variation in concept relationships across languages
- Hindi concept map: visualizes how moral concepts cluster in Hindi embeddings
- Japanese concept map: visualizes how moral concepts cluster in Japanese embeddings
```bash
git clone https://github.com/SShreeya-Das/Embedding-compass.git
cd Embedding-compass
pip install -r requirements.txt
python analysis.py  # if you created this file
```