Back to work

Private/research language model

bangla-llm

Bengali model quality depends on corpus cleaning, tokenizer coverage, contamination control, and training discipline.

What I built

A 306M-parameter Bengali decoder-only model trained from scratch with a curated corpus, Bengali-specific normalization, tokenizer audit, and A100 training run.

Stack

PythonTransformersSentencePieceBF16A100

Status

Research/private direction; secondary