The Gist
Second author on a paper exploring LLMs for node and graph unlearning.
Built experimental pipelines and optimized local model inference under compute
constraints.
Key Win
Achieved a 4x speedup in experiment runtimes by optimizing
CUDA configurations and quantization settings.
Skills
PyTorch, Hugging Face, CUDA, Graph Neural Networks.
Context
Research assistant on projects involving large language models, graph unlearning, and topological graph structures. Primary responsibilities: replicating results from existing papers, implementing experimental pipelines, and supporting ongoing research. Second author on a paper exploring the use of LLMs for node and graph unlearning tasks.
Constraints: limited compute budget, tight publication timelines, no dedicated infrastructure.
Action
- Inference 4x faster: profiled the HF Transformers serving path, identified KV-cache thrash and tokenizer overhead as the bottlenecks, then tuned CUDA configuration and quantization settings.
- Evaluated vLLM: concluded that setup cost outweighed the gains under the research timeline; documented the decision.
- Custom data pipeline: assembled a one-off ingestion stack for unstructured experiment logs using third-party tools.
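The profiling step can be illustrated with a small per-stage timer. This is a minimal sketch, not the harness used in the project: `StageTimer`, `speedup`, and the stand-in `fake_tokenize` / `fake_generate` functions are hypothetical names, and a real run would wrap the actual tokenizer and generation calls instead.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock seconds per named pipeline stage (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def stage(self, name):
        # For GPU work, CUDA kernels launch asynchronously, so a real harness
        # would synchronize the device before each clock read.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start

    def shares(self):
        """Return each stage's fraction of the total measured time."""
        total = sum(self.totals.values())
        if total == 0:
            return {}
        return {name: seconds / total for name, seconds in self.totals.items()}


def speedup(baseline_seconds, optimized_seconds):
    """Ratio of baseline to optimized runtime; 4.0 means four times faster."""
    if optimized_seconds <= 0:
        raise ValueError("optimized_seconds must be positive")
    return baseline_seconds / optimized_seconds


# Stand-ins for the real stages (assumptions, for illustration only).
def fake_tokenize(texts):
    return [text.split() for text in texts]


def fake_generate(token_batches):
    return [len(tokens) for tokens in token_batches]


timer = StageTimer()
with timer.stage("tokenize"):
    batches = fake_tokenize(["remove node 17 from the graph"] * 100)
with timer.stage("generate"):
    outputs = fake_generate(batches)

# Print the share of time spent in each stage.
for name, share in timer.shares().items():
    print(f"{name}: {share:.1%}")
```

Splitting the measurement by stage is what makes a bottleneck such as tokenizer overhead visible; a single end-to-end timing would hide it.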
Lessons
- Sophistication has overhead. Given the constraints, a simpler CUDA setup was the better choice over a more advanced serving framework. Knowing when not to use a tool is as important as knowing how to use it.
- Economic thinking applies to research. Time spent building infrastructure is time not spent running experiments. I learned to evaluate build-vs-buy tradeoffs quickly.
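The build-vs-buy reasoning above can be sketched as a break-even calculation. This is an illustrative model, not the analysis done in the project: the function name `break_even_runs` and all numbers in the usage lines are hypothetical.

```python
import math


def break_even_runs(setup_hours, baseline_hours_per_run, speedup_factor):
    """Number of experiment runs needed before a tool's setup cost is repaid.

    setup_hours: one-time cost of adopting the tool (assumed, estimated up front)
    baseline_hours_per_run: runtime of one experiment without the tool
    speedup_factor: expected runtime ratio, e.g. 2.0 means twice as fast
    """
    if speedup_factor <= 1:
        # No time is saved per run, so the setup cost is never recovered.
        return math.inf
    saved_per_run = baseline_hours_per_run * (1 - 1 / speedup_factor)
    return math.ceil(setup_hours / saved_per_run)


# Hypothetical numbers: 16 hours of setup, 2-hour runs, 2x expected speedup.
runs_needed = break_even_runs(16, 2, 2)
planned_runs = 10  # assumed number of runs left before the deadline
print(runs_needed, "runs to break even;", "adopt" if planned_runs >= runs_needed else "skip")
```

With these made-up inputs the tool would need 16 runs to pay for itself, more than the 10 planned, so skipping it is the cheaper choice. The same arithmetic flips once the number of planned runs or the per-run saving grows.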