Notes
I try to put my notes here, so I keep track of my studies :)

🗓 Day 1 - Monday, 10 Feb 2025
🌍 What Is Common Crawl?
Common Crawl is a huge, free snapshot of the public web. A non‑profit updates it every month, storing:
- Billions of HTML pages
- Their cleaned‑up text content
- Extra metadata (links, timestamps, MIME types, …)
Why It Matters
- Track language change – see how words, memes, and topics shift over time.
- Map the web’s link network – study which sites connect and why.
- Train big ML models – use real‑world data instead of tiny toy datasets.
Because each release includes both the raw HTML and a parsed text layer, you can analyze: ...

Here are the notes from some talks I attended at ACL 2025 in Vienna!
Eye-tracking
Why gaze? Eye movements reflect online processing (not just end products), letting us probe difficulty, attention, and strategies during reading. That’s gold for modeling and evaluation. (PubMed)
Data is maturing: Multilingual, multi‑lab efforts (e.g., MECO, MultiplEYE) plus tooling (e.g., pymovements) have made high‑quality datasets and pipelines more accessible. (meco-read.com, multipleye.eu, arXiv)
Models & evals: Gaze can improve certain NLP tasks and can also be used to evaluate systems with behavioral signals (e.g., readability, MT, summarization). But gains are often modest unless modeling is careful or data is task‑aligned.
Open debates: How well LLM surprisal predicts human reading times varies with model size, layers, and populations; adding recency biases can help fit human behavior. (ACL Anthology, dfki.de, ACL Anthology, ACL Anthology)
Eye‑tracking 101 👁️
Some basic concepts: ...
A digital mathematical notebook on embedding spaces: learned maps, similarity metrics, normalization, intrinsic dimensionality, spectra, anisotropy, contrastive geometry, hubness, and practical diagnostics for retrieval and sentence embeddings. ...
A digital mathematical notebook on Geometric Deep Learning: symmetry, invariance, equivariance, sets, graphs, manifolds, geodesics, gauges, and transformers through the same structural lens. ...
A digital mathematical notebook on optimal transport: Monge and Kantorovich formulations, Wasserstein distances, duality, one-dimensional transport, Sinkhorn regularization, barycenters, and the bridge to persistence diagrams.