Shifat Santo
I built a vector database in C++ because I wanted to know how one actually works. I trained a language model because I wanted to understand tokenization. I built a voice AI platform because I wanted to hear latency, not read about it. Most of what I know, I learned by building it wrong first.
Things I built to understand things
Not wrappers. Not tutorials. Systems I wrote from scratch because the only way to really learn something is to build it.

Soniq
Phone rings, AI picks up, handles the whole call. Checks availability, books appointments, takes orders, escalates to a human if it needs to. LiveKit + Deepgram + Cartesia. Had to get it under 200ms end-to-end or the conversation feels broken.

VectorVault
I wanted to understand how vector databases actually work. Not the API: the graph construction, the distance math, the persistence. So I built one. HNSW in C++20 with AVX2 SIMD, mmap, and CRC32 checksums.
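Here's roughly what "CRC32 checksums" buys you on the persistence side, as a Python sketch rather than the C++20 code. The record layout (little-endian id, dimension, float32 components, trailing CRC32) and the function names are assumptions made up for illustration. The real thing also has to handle mmap and the HNSW graph itself, which this skips. The squared-L2 helper is the kind of distance the SIMD path would be vectorizing.

```python
import struct
import zlib


# Hypothetical record layout, not VectorVault's actual format:
#   uint32 id | uint32 dim | dim x float32 | uint32 CRC32 of everything before it
def encode_record(vec_id: int, vec: list[float]) -> bytes:
    body = struct.pack(f"<II{len(vec)}f", vec_id, len(vec), *vec)
    return body + struct.pack("<I", zlib.crc32(body))


def decode_record(buf: bytes) -> tuple[int, list[float]]:
    if len(buf) < 12:
        raise ValueError("record too short")
    body = buf[:-4]
    (stored_crc,) = struct.unpack("<I", buf[-4:])
    if zlib.crc32(body) != stored_crc:
        raise ValueError("checksum mismatch: record is corrupt")
    vec_id, dim = struct.unpack_from("<II", body)
    if len(body) != 8 + 4 * dim:
        raise ValueError("length does not match dimension")
    return vec_id, list(struct.unpack_from(f"<{dim}f", body, 8))


def l2_squared(a: list[float], b: list[float]) -> float:
    # Squared Euclidean distance; skipping the sqrt preserves nearest-neighbor ordering.
    return sum((x - y) ** 2 for x, y in zip(a, b))


rec = encode_record(7, [1.0, 2.0, 3.5])
print(decode_record(rec))  # (7, [1.0, 2.0, 3.5])
```

Flip any single byte in `rec` and `decode_record` raises instead of silently handing back a wrong vector, which is the whole point of checksumming data you read straight off disk.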

Quantum Tunnel
Solves the time-dependent Schrödinger equation on a 3D grid. Split-operator FFT, OpenMP parallelism, two render modes (marching cubes and volume raycasting). Built it because I wanted to see quantum tunneling, not just read the math.
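The split-operator idea fits in a few lines once you drop to one dimension. This is a pure-Python 1D sketch, not the 3D C++/OpenMP solver: units with ħ = m = 1, a periodic grid, and a textbook radix-2 FFT standing in for whatever FFT library a real solver would use. The grid size, barrier height, and wave packet parameters are illustrative assumptions. Each step applies half the potential in position space, the full kinetic term in momentum space, then the other half of the potential.

```python
import cmath
import math


def fft(a, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT. len(a) must be a power of two; the inverse is unnormalized."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def split_operator_step(psi, half_kick, drift):
    """One Strang-split step: half potential kick, full kinetic drift in k-space, half potential kick."""
    psi = [p * h for p, h in zip(psi, half_kick)]     # exp(-i V dt / 2) in position space
    psi_k = [p * d for p, d in zip(fft(psi), drift)]  # exp(-i k^2 dt / 2) in momentum space
    n = len(psi)
    psi = [p / n for p in fft(psi_k, inverse=True)]   # back to position space, normalized
    return [p * h for p, h in zip(psi, half_kick)]


def simulate(n=128, length=40.0, steps=400, dt=0.01, barrier=3.0):
    """Evolves a Gaussian packet toward a square barrier. Returns grid, final wavefunction, spacing."""
    dx = length / n
    x = [-length / 2 + i * dx for i in range(n)]
    # Wavenumbers in FFT order: 0, 1, ..., n/2 - 1, -n/2, ..., -1, scaled by 2*pi/L.
    k = [2 * math.pi * (i if i < n // 2 else i - n) / length for i in range(n)]
    # Illustrative setup: barrier of width ~1 at the origin, packet at x = -5 moving right with k0 = 2.
    potential = [barrier if abs(xi) < 0.5 else 0.0 for xi in x]
    psi = [cmath.exp(-((xi + 5.0) ** 2) / 2 + 2j * xi) for xi in x]
    norm = math.sqrt(sum(abs(p) ** 2 for p in psi) * dx)
    psi = [p / norm for p in psi]
    # Both phase factors are constant in time, so compute them once.
    half_kick = [cmath.exp(-0.5j * v * dt) for v in potential]
    drift = [cmath.exp(-0.5j * kk * kk * dt) for kk in k]
    for _ in range(steps):
        psi = split_operator_step(psi, half_kick, drift)
    return x, psi, dx


x, psi, dx = simulate()
transmitted = sum(abs(p) ** 2 for xi, p in zip(x, psi) if xi > 0.5) * dx
print(f"probability past the barrier: {transmitted:.3f}")
```

Every factor applied is a pure phase and the FFT pair is unitary once normalized, so total probability stays at 1 to rounding error. That is a cheap sanity check worth running on any grid before trusting the pictures.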

nightshift
Sits between your agent and LLM APIs. Compresses context with T5, deduplicates with SHA-256, routes queries through a UCB1 bandit between cheap and expensive models. I built it so I could run research agents overnight without burning my API budget. 141 tests.
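A minimal sketch of two of those pieces, SHA-256 deduplication and UCB1 routing, assuming an exact-match response cache and a reward in [0, 1] that scores each model call. The class names, the reward definition, and the fake numbers are hypothetical, not nightshift's internals, and the T5 compression step isn't shown.

```python
import hashlib
import math


class DedupCache:
    """Caches responses under the SHA-256 hex digest of the exact prompt text."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def key(prompt: str) -> str:
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def get(self, prompt: str):
        return self._store.get(self.key(prompt))  # None on a miss

    def put(self, prompt: str, response: str) -> None:
        self._store[self.key(prompt)] = response


class UCB1Router:
    """UCB1 bandit: each model is an arm; rewards are assumed to lie in [0, 1]."""

    def __init__(self, arms):
        self.arms = list(arms)
        self.counts = {a: 0 for a in self.arms}
        self.totals = {a: 0.0 for a in self.arms}

    def choose(self) -> str:
        for a in self.arms:  # try every arm once before trusting the averages
            if self.counts[a] == 0:
                return a
        t = sum(self.counts.values())

        def ucb(a):
            mean = self.totals[a] / self.counts[a]
            return mean + math.sqrt(2 * math.log(t) / self.counts[a])

        return max(self.arms, key=ucb)

    def update(self, arm: str, reward: float) -> None:
        self.counts[arm] += 1
        self.totals[arm] += reward


# Made-up, fixed rewards standing in for something like "answer quality per dollar".
fake_reward = {"cheap": 0.8, "expensive": 0.3}
router = UCB1Router(["cheap", "expensive"])
for _ in range(500):
    arm = router.choose()
    router.update(arm, fake_reward[arm])
print(router.counts)

cache = DedupCache()
cache.put("summarize paper 42", "cached summary")
print(cache.get("summarize paper 42"))  # cached summary
```

The bandit never stops sampling the weaker arm entirely; the exploration bonus shrinks as counts grow, so it keeps checking occasionally in case that model's reward changes. Most of the real design work is in defining the reward, not the UCB1 math.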

undertone
Hold a hotkey, speak, release. Words typed at cursor. Groq Whisper for speed, local faster-whisper when you want privacy. Published on PyPI because existing voice typing tools were either Mac-only or had garbage latency.

Bengali Tokenizer Research
Tested 14 tokenizers on Bengali text. Multilingual LLM tokenizers are 5-9x less efficient than dedicated ones. Not 'a few percent.' A completely different cost structure. Root cause: one missing Unicode character (Bengali Nukta) accounts for 89.8% of byte-fallback tokens.
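Byte fallback is easy to see with a toy. This is a character-level tokenizer with a hand-picked vocabulary, not any of the 14 tokenizers I tested; the `<0xE0>`-style names mimic the byte tokens that SentencePiece-style tokenizers emit. The function names and the vocabulary are hypothetical. The point: one code point missing from the vocabulary, like the Nukta (U+09BC, three bytes in UTF-8), costs three tokens every single time it appears.

```python
from collections import Counter

NUKTA = "\u09bc"  # BENGALI SIGN NUKTA, encoded in UTF-8 as E0 A6 BC


def tokenize_with_byte_fallback(text, vocab):
    """Toy tokenizer: a character in `vocab` becomes one token; any other character
    becomes one <0xNN> token per UTF-8 byte. Also tallies fallback tokens per character."""
    tokens = []
    fallback_by_char = Counter()
    for ch in text:
        if ch in vocab:
            tokens.append(ch)
        else:
            raw = ch.encode("utf-8")
            tokens.extend(f"<0x{b:02X}>" for b in raw)
            fallback_by_char[ch] += len(raw)
    return tokens, fallback_by_char


def fallback_share(fallback_by_char, ch):
    """Fraction of all byte-fallback tokens caused by one character."""
    total = sum(fallback_by_char.values())
    return fallback_by_char[ch] / total if total else 0.0


# Hypothetical vocabulary with DDA (U+09A1) and KA (U+0995) but no Nukta.
vocab = {"\u09a1", "\u0995"}
text = "\u09a1\u09bc" * 3 + "\u0995"  # DDA + Nukta (renders as RRA) three times, then KA
tokens, fallback = tokenize_with_byte_fallback(text, vocab)
print(len(tokens))                       # 13
print(fallback_share(fallback, NUKTA))   # 1.0
```

Seven characters turn into 13 tokens, and adding the Nukta to the vocabulary brings it back to seven. Measuring the share per character, as `fallback_share` does here, is how a single missing code point shows up as the dominant source of fallback.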
Writing
I built in silence for too long. Now I write about what I build and what breaks along the way.
I Scanned 20 MCP Server Configs. 19 Failed.
These aren't contrived attack scenarios. These are the configs developers copy from documentation and paste into their settings.
A $9/Month Content Pipeline That Does Everything
One git push cross-posts to Dev.to and Hashnode, generates social content, and tracks what went where. I'm an engineer. I automated it.
Why I'm Writing Now
26 repos and 3 followers. That's what building in silence gets you.
What I've actually done
Not claims. Specific things I built, measured, and shipped.
About
CS student at UT Dallas. I spend most of my time building things that probably don't make sense for a student to build: vector databases, language models, voice AI platforms. I do it anyway because using an API without understanding what's behind it bothers me.
I write C++, Rust, Go, Python, and TypeScript. Not because I'm trying to collect languages, but because different problems need different tools. You don't write a Schrödinger equation solver in JavaScript. You don't build a dashboard in C.
Currently looking for internships where I can work on hard problems with people who care about getting them right.