Context
Research assistant on projects involving large language models, graph unlearning, and topological graph structures. Primary responsibilities: replicating results from existing papers, implementing experimental pipelines, and supporting ongoing research. Second author on a paper exploring the use of LLMs for node and graph unlearning tasks.
Constraints: limited compute budget, tight publication timelines, no dedicated infrastructure.
Action
Model inference & systems work: Set up local inference for an open-source Hugging Face model on my own machine, working directly with CUDA, VRAM constraints, and memory optimization. Evaluated vLLM as a serving framework for higher throughput, but determined that its setup complexity and the project's time constraints made it impractical; deliberately reverted to a simpler CUDA-based configuration. Tuned quantization settings and achieved a ~4× speedup in experiment runtime.
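The quantization decision above came down to fitting weights in limited VRAM. A back-of-envelope estimate like the following sketch is one way to compare precision levels before committing; the 7B parameter count and the overhead factor are illustrative assumptions, not measurements from the actual project.

```python
# Rough VRAM estimate for choosing a quantization level.
# Assumptions (hypothetical, for illustration): a 7B-parameter model
# and a 1.2x multiplicative overhead for activations / KV cache.

def vram_gb(n_params_billion: float, bits_per_weight: int,
            overhead: float = 1.2) -> float:
    """Approximate GPU memory in GiB for model weights plus overhead."""
    weight_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 2**30

fp16 = vram_gb(7, 16)  # ~15.6 GiB: too large for many consumer GPUs
int4 = vram_gb(7, 4)   # ~3.9 GiB: fits comfortably, 4x smaller
```

Lower-bit weights shrink memory roughly linearly, which is why 4-bit quantization was the lever that made single-GPU local inference practical here.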
Data collection & tooling: Tasked with collecting datasets related to AI conference discourse on X (Twitter)—follower graphs, post content, engagement data. Assessed that building a custom scraping solution would be slow, brittle, and blocked by rate limits and firewalls. Recommended using existing third-party tools. No single tool satisfied all requirements, so I combined multiple tools to assemble the dataset—effectively a "frankensteined" pipeline that prioritized speed and reliability over elegance.
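The core of the "frankensteined" pipeline was reconciling overlapping exports from different tools into one dataset. A minimal sketch of that stitching step, assuming hypothetical record shapes (the "id"/"text" fields and the sample inputs are invented; real tool exports would need per-tool adapters):

```python
# Merge post exports from multiple collection tools, deduplicating by
# post id. The first tool to supply a record wins, on the assumption
# that sources are ordered by trustworthiness.

def merge_exports(*sources):
    merged = {}
    for records in sources:
        for rec in records:
            merged.setdefault(rec["id"], rec)  # keep first occurrence
    return list(merged.values())

# Hypothetical exports from two different tools, with one overlap:
tool_a = [{"id": "1", "text": "LLM unlearning thread"},
          {"id": "2", "text": "conference announcement"}]
tool_b = [{"id": "2", "text": "conference announcement (dup)"},
          {"id": "3", "text": "follower graph dump"}]

dataset = merge_exports(tool_a, tool_b)  # 3 unique posts
```

Dedup-by-id with a source priority order is crude but fast to build, which matched the project's speed-over-elegance tradeoff.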
Lessons
- Sophistication has overhead. A simpler CUDA setup outperformed a more advanced serving framework given the constraints. Knowing when not to use a tool is as important as knowing how.
- Economic thinking applies to research. Time spent building infrastructure is time not spent running experiments. I learned to evaluate build-vs-buy tradeoffs quickly.
- Execution quality matters. Replication work and pipeline implementation aren't glamorous, but they're where most research actually happens. Doing them reliably builds trust.
- Scrappy solutions can be correct solutions. The data pipeline wasn't elegant, but it worked, and the alternative would have taken weeks longer.