CS @ UT Dallas

Shifat Santo

I built a vector database in C++ because I wanted to know how they actually work. I trained a language model because I wanted to understand tokenization. I built a voice AI platform because I wanted to hear latency, not read about it. Most of what I know, I learned by building it wrong first.

// projects

Things I built to understand things

Not wrappers. Not tutorials. Systems I wrote from scratch because the only way to really learn something is to build it.

Soniq

Phone rings, AI picks up, handles the whole call. Checks availability, books appointments, takes orders, escalates to a human if it needs to. LiveKit + Deepgram + Cartesia. Had to get it under 200ms end-to-end or the conversation feels broken.

Python, TypeScript, LiveKit, Voice AI

VectorVault

I wanted to understand how vector databases actually work. Not the API -- the graph construction, the distance math, the persistence. So I built one. HNSW in C++20 with AVX2 SIMD, mmap, and CRC32 checksums.

C++20, HNSW, SIMD, AVX2
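
The graph search and distance math mentioned above can be sketched in a few lines. This is a minimal illustration in Python, not VectorVault's C++ code: it shows the greedy walk HNSW performs on a single graph layer, with a plain scalar squared-L2 distance standing in for an AVX2 kernel. The function names and the dict-based graph layout are assumptions made for this example.

```python
def l2_squared(a, b):
    """Squared Euclidean distance. The square root is skipped because it
    does not change which candidate is closest."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def greedy_search(vectors, neighbors, entry, query):
    """Walk one graph layer, always moving to a neighbor closer to the query.

    vectors:   dict of node id -> list of floats (hypothetical layout)
    neighbors: dict of node id -> list of adjacent node ids on this layer
    Returns (node, distance) for the node where no neighbor is closer,
    which is a local minimum and not guaranteed to be the true nearest.
    """
    current = entry
    current_dist = l2_squared(vectors[current], query)
    improved = True
    while improved:
        improved = False
        for candidate in neighbors[current]:
            d = l2_squared(vectors[candidate], query)
            if d < current_dist:
                current, current_dist = candidate, d
                improved = True
    return current, current_dist


# Four points on a line, linked as a chain: 0 - 1 - 2 - 3.
vectors = {0: [0.0, 0.0], 1: [1.0, 0.0], 2: [2.0, 0.0], 3: [3.0, 0.0]}
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
node, dist = greedy_search(vectors, neighbors, entry=0, query=[2.9, 0.0])
print(node)  # the walk ends at node 3
```

A full HNSW index repeats this walk from the top layer down and widens it into a beam search on the bottom layer; the sketch covers only the single-layer greedy step.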

Quantum Tunnel

Solves the time-dependent Schrödinger equation on a 3D grid. Split-operator FFT, OpenMP parallelism, two render modes (marching cubes and volume raycasting). Built it because I wanted to see quantum tunneling, not just read the math.

C++, FFTW3, OpenGL, OpenMP
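
For reference, the split-operator method named above is usually written as a symmetric (Strang) splitting of the time-evolution operator. The form below is the standard textbook statement, not taken from the project's code; the symbols are assumptions of this note: \psi is the wavefunction, V the potential, m the particle mass, \Delta t the time step, and \mathcal{F} the Fourier transform that FFTW3 would compute on the grid.

```latex
\psi(\mathbf{r}, t + \Delta t) \approx
e^{-\frac{i}{\hbar} V(\mathbf{r}) \frac{\Delta t}{2}}
\,\mathcal{F}^{-1}\!\left[
  e^{-\frac{i \hbar |\mathbf{k}|^{2}}{2m} \Delta t}
  \,\mathcal{F}\!\left[
    e^{-\frac{i}{\hbar} V(\mathbf{r}) \frac{\Delta t}{2}}
    \,\psi(\mathbf{r}, t)
  \right]
\right]
```

Each factor is a pointwise multiplication, in position space for the potential and in momentum space for the kinetic term, so a step costs two FFTs plus three elementwise passes. The splitting error is of order \Delta t^{3} per step.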

nightshift

Sits between your agent and LLM APIs. Compresses context with T5, deduplicates with SHA-256, routes queries through a UCB1 bandit between cheap and expensive models. I built it so I could run research agents overnight without burning my API budget. 141 tests.

Python, T5, MiniLM, ChromaDB
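
The routing and deduplication ideas above can be sketched with the standard library alone. This is an illustrative sketch, not nightshift's code: the class names, the arm labels, and the assumption that each response is scored with a reward in [0, 1] are all hypothetical. It shows the textbook UCB1 rule (mean reward plus an exploration bonus) and SHA-256 hashing of prompt text as a dedup key.

```python
import hashlib
import math


class UCB1Router:
    """Textbook UCB1 bandit over a fixed set of arms (here, model tiers)."""

    def __init__(self, arms):
        self.counts = {arm: 0 for arm in arms}
        self.totals = {arm: 0.0 for arm in arms}

    def select(self):
        # Try every arm once; the bonus is undefined for an unplayed arm.
        for arm, n in self.counts.items():
            if n == 0:
                return arm
        total_pulls = sum(self.counts.values())

        def score(arm):
            mean = self.totals[arm] / self.counts[arm]
            bonus = math.sqrt(2.0 * math.log(total_pulls) / self.counts[arm])
            return mean + bonus

        return max(self.counts, key=score)

    def update(self, arm, reward):
        # reward is assumed to be in [0, 1], e.g. quality discounted by cost.
        self.counts[arm] += 1
        self.totals[arm] += reward


def content_key(text):
    """SHA-256 hex digest used as an exact-match deduplication key."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class DedupCache:
    """Returns a stored response when the exact same prompt was seen before."""

    def __init__(self):
        self._responses = {}

    def get(self, prompt):
        return self._responses.get(content_key(prompt))

    def put(self, prompt, response):
        self._responses[content_key(prompt)] = response


router = UCB1Router(["cheap", "expensive"])
cache = DedupCache()
arm = router.select()           # first call returns an arm not yet tried
cache.put("summarize this", "stub response")
router.update(arm, reward=0.8)  # hypothetical reward for the stub response
```

Hashing only catches byte-identical prompts; near-duplicates would need an embedding comparison, which this sketch does not attempt.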

undertone

Hold a hotkey, speak, release. Words typed at cursor. Groq Whisper for speed, local faster-whisper when you want privacy. Published on PyPI because existing voice typing tools were either Mac-only or had garbage latency.

Python, PyPI, Whisper

Bengali Tokenizer Research

Tested 14 tokenizers on Bengali text. Multilingual LLM tokenizers are 5-9x less efficient than dedicated ones. Not 'a few percent.' A completely different cost structure. Root cause: one missing Unicode character (Bengali Nukta) accounts for 89.8% of byte-fallback tokens.

Python, NLP, SentencePiece
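
To make the byte-fallback effect concrete, here is a toy sketch. It is not one of the 14 tokenizers tested and it does not reproduce the 89.8% figure; it is a character-level stand-in with a hypothetical two-function API. It assumes the SentencePiece-style convention of emitting one `<0xNN>` token per UTF-8 byte for any character missing from the vocabulary, and uses the Bengali Nukta (U+09BC) as the missing character.

```python
def encode_with_byte_fallback(text, vocab):
    """Toy character-level encoder: known characters map to themselves,
    unknown characters fall back to one token per UTF-8 byte."""
    tokens = []
    for ch in text:
        if ch in vocab:
            tokens.append(ch)
        else:
            tokens.extend(f"<0x{b:02X}>" for b in ch.encode("utf-8"))
    return tokens


def byte_fallback_share(tokens):
    """Fraction of tokens that are byte-fallback tokens."""
    if not tokens:
        return 0.0
    fallback = sum(1 for t in tokens if t.startswith("<0x") and t.endswith(">"))
    return fallback / len(tokens)


YA = "\u09af"      # BENGALI LETTER YA
NUKTA = "\u09bc"   # BENGALI SIGN NUKTA, three bytes in UTF-8

vocab = {YA}                      # hypothetical vocabulary missing the Nukta
tokens = encode_with_byte_fallback(YA + NUKTA, vocab)
print(tokens)                     # one character token, then three byte tokens
print(byte_fallback_share(tokens))
```

One missing three-byte character turns a two-character sequence into four tokens, three of them fallback bytes, which is why a single gap in the vocabulary can dominate the fallback count.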
// proof of work

What I've actually done

Not claims. Specific things I built, measured, and shipped.

200K+ Lines Shipped
306M Params Trained
<200ms Voice Latency
141 Tests Green

I built a voice AI platform
Inbound phone calls hit a SIP trunk, route to LiveKit, and an AI agent handles the conversation. Bookings, orders, escalation. Sub-200ms or it's useless.

I wrote a vector database in C++
Not a wrapper. HNSW graph construction, AVX2 distance functions, mmap persistence with CRC32 checksums. Because I wanted to know how they actually work.

I trained a language model
306M parameters. Bengali. Custom data pipeline, custom tokenizer, pre-training on A100s. Found that one missing Unicode character causes 89.8% of byte-fallback tokens.

I ship across the whole stack
C++ and Rust at the bottom. Python in the middle. TypeScript at the top. The language doesn't matter. Understanding the problem does.
// about

About

CS student at UT Dallas. I spend most of my time building things that probably don't make sense for a student to build: vector databases, language models, voice AI platforms. I do it anyway because using an API without understanding what's behind it bothers me.

I write C++, Rust, Go, Python, and TypeScript. Not because I'm trying to collect languages, but because different problems need different tools. You don't write a Schrödinger equation solver in JavaScript. You don't build a dashboard in C.

Currently looking for internships where I can work on hard problems with people who care about getting them right.

Technical Stack

Languages: C++, Rust, Go, Python, TypeScript
AI / ML: LLMs, RAG, Agents, Whisper, Embeddings, Fine-tuning
Infrastructure: Docker, Linux, CUDA, A100/H100, FastAPI
Frontend: Next.js, React, Tailwind, OpenGL
// contact

Say hi

Looking for internships. Also happy to talk about systems, AI, or whatever you're building.