Hank Sha

Early-career AI practitioner. I ramp quickly in unfamiliar domains and reason well under uncertainty.

mechanistic interpretability · alignment · applied ML

This is a living document. It replaces a traditional resume and reflects how I think and learn, supported by concrete work.

Updated as I learn. Built in public.

About

I'm an undergraduate senior studying Statistics & Data Science, working across AI research and engineering. I'm interested in problems where modeling, systems, and product constraints intersect—especially in applied ML and LLM-based systems.

I've worked on production ML systems at early-stage startups, contributed to peer-reviewed research, and enjoy operating in ambiguous, fast-moving environments. This site reflects how I think, learn, and evaluate technical tradeoffs.

Learning Philosophy

Skills as investments. I view skills as investments rather than virtues. Every new skill has an opportunity cost in time, energy, and focus, so gaps are often the result of intentional prioritization rather than lack of ability. My learning decisions are guided by expected leverage—how much a skill expands my ability to reason about systems, deliver impact, or adapt to new problem spaces—rather than by checklist completeness.

How I Learn & Ramp

My approach to entering unfamiliar domains. Role-agnostic; applies to research, engineering, and hybrid work.

Skill Map

An honest self-assessment. Gaps are current learning priorities, not apologies.

Strong

  • Python (NumPy, PyTorch, pandas)
  • Technical writing
  • Paper reading & synthesis
  • Rapid domain ramp-up
  • Identifying what matters vs. what doesn't
  • Asking clarifying questions
  • Communicating uncertainty

Moderate

  • Transformer architectures
  • Experiment design
  • Statistical modeling
  • Collaborative research
  • JavaScript / web development
  • Working with ambiguous requirements

Gaps (Learning Now)

  • CUDA / GPU programming
  • Large-scale distributed training
  • Interpretability tooling (TransformerLens, etc.)
  • Publishing in peer-reviewed venues
  • Leading multi-person research projects

Building/Learning in Public

I write to clarify my own thinking. Publishing creates accountability and invites correction from people who know more than I do.

This is intellectual exploration, not expertise. My understanding evolves; older posts may reflect superseded views.

Selected Work

Applications of the above. Each follows: Context (domain, uncertainty, constraints) → Action (what I learned, evaluated, built) → Lessons (judgment, tradeoffs, generalizable insights).

Context

Analyzed Phase II clinical trial data for a major depressive disorder (MDD) drug at a pre-IPO biotech company. Objective: use ML and statistical methods to identify potential gaps, latent signals, or patterns not captured by traditional analyses. I had no prior experience in clinical trials or biostatistics.

Constraints: small trial size, limited data points, high noise, and significant placebo effects—conditions that are common in psychiatric trials and fundamentally limit what ML can detect.

Action

Domain ramp-up: Navigated extensive clinical trial documentation, unfamiliar terminology, trial protocols, and outcome measures. Synthesized enough context to work effectively within weeks.

Exploratory analysis: Conducted deep EDA—placebo response patterns, subgroup analyses, outcome distribution shifts. Compared findings against traditional statistical outputs to evaluate where additional methods might add signal versus noise.
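To make the shape of that analysis concrete, here is a minimal sketch of the kind of change-from-baseline and subgroup summaries involved. The column names, subgroups, and scores are fabricated stand-ins; the actual trial data is confidential and not shown here.

  # Tiny fabricated frame standing in for the real (confidential) trial data;
  # column names, subgroups, and scores are all hypothetical.
  import pandas as pd

  df = pd.DataFrame({
      "arm": ["placebo", "placebo", "drug", "drug", "drug", "placebo"],
      "age_group": ["<40", ">=40", "<40", ">=40", "<40", ">=40"],
      "baseline_score": [28, 31, 30, 27, 33, 29],
      "week8_score": [21, 24, 18, 20, 22, 23],
  })

  # Change from baseline on the (hypothetical) primary outcome
  df["delta"] = df["week8_score"] - df["baseline_score"]

  # Placebo response pattern: distribution of change in the placebo arm
  print(df.loc[df["arm"] == "placebo", "delta"].describe())

  # Simple subgroup view: mean change by treatment arm and age group
  print(df.groupby(["arm", "age_group"])["delta"].agg(["mean", "std", "count"]))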

ML experimentation: Trained multiple classification models to explore whether ML could surface patterns missed by standard analyses. XGBoost performed best among the models tested. However, accuracy and precision were unstable across folds; performance was insufficient for any production or decision-support use. Made a deliberate judgment that these results should not be over-interpreted or presented as actionable.
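A rough sketch of the cross-validated check that surfaces this kind of instability, assuming scikit-learn and the xgboost Python package. The synthetic dataset and model settings below are placeholders for the real (small, noisy) trial features, not the actual configuration.

  # Cross-validated precision on a deliberately small, noisy synthetic dataset,
  # standing in for the trial features; all settings are illustrative only.
  from sklearn.datasets import make_classification
  from sklearn.model_selection import StratifiedKFold, cross_val_score
  from xgboost import XGBClassifier

  X, y = make_classification(n_samples=120, n_features=25, n_informative=3,
                             flip_y=0.3, random_state=0)  # small and noisy on purpose

  model = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.05,
                        eval_metric="logloss")

  cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
  scores = cross_val_score(model, X, y, cv=cv, scoring="precision")

  # On a dataset this small, the spread across folds is as informative as the
  # mean: a large standard deviation is the signal not to treat results as actionable.
  print(f"precision per fold: {scores.round(2)}")
  print(f"mean {scores.mean():.2f} +/- {scores.std():.2f}")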

Lessons

  • Knowing when ML doesn't apply is part of the job. Small, noisy clinical datasets often lack the statistical power for ML to add value. Recognizing this early prevents wasted effort and misleading conclusions.
  • EDA and domain understanding matter more than model complexity. The most useful insights came from careful exploratory analysis, not from algorithmic sophistication.
  • Negative results are still results. Concluding that ML was not appropriate for this dataset was the correct scientific judgment—not a failure.
  • Unfamiliar domains become navigable quickly with focused reading. Clinical trial structure, terminology, and conventions are learnable; the key is knowing what to prioritize.

Context

Research assistant on projects involving large language models, graph unlearning, and topological graph structures. Primary responsibilities: replicating results from existing papers, implementing experimental pipelines, and supporting ongoing research. Second author on a paper exploring the use of LLMs for node and graph unlearning tasks.

Constraints: limited compute budget, tight publication timelines, no dedicated infrastructure.

Action

Model inference & systems work: Set up local inference for an open-source Hugging Face model on my own machine. Worked directly with CUDA, VRAM constraints, and memory optimization. Evaluated vLLM as a serving framework for improved throughput; determined that setup complexity and time constraints made it impractical for this project. Made a deliberate decision to revert to a simpler CUDA-based configuration. Optimized quantization settings and achieved ~4× speedup in experiment runtime.

Data collection & tooling: Tasked with collecting datasets related to AI conference discourse on X (Twitter)—follower graphs, post content, engagement data. Assessed that building a custom scraping solution would be slow, brittle, and blocked by rate limits and firewalls. Recommended using existing third-party tools. No single tool satisfied all requirements, so I combined multiple tools to assemble the dataset—effectively a "frankensteined" pipeline that prioritized speed and reliability over elegance.

Lessons

  • Sophistication has overhead. A simpler CUDA setup outperformed a more advanced serving framework given the constraints. Knowing when not to use a tool is as important as knowing how.
  • Economic thinking applies to research. Time spent building infrastructure is time not spent running experiments. I learned to evaluate build-vs-buy tradeoffs quickly.
  • Execution quality matters. Replication work and pipeline implementation aren't glamorous, but they're where most research actually happens. Doing them reliably builds trust.
  • Scrappy solutions can be correct solutions. The data pipeline wasn't elegant, but it worked, and the alternative would have taken weeks longer.

Context

Part-time scout role at an early-stage fund (pre-seed, seed, early Series A) investing across AI, deep tech, energy, cleantech, and web3. This was a sourcing and light evaluation role—not deal-leading or deep diligence. I did not make or influence investment decisions.

Action

Sourcing & outreach: Sourced startups through university venture clubs, campus events, and founder communities. Conducted high-volume cold outreach via email and LinkedIn. Learned that many student startups were too early for fundraising, and that effective sourcing requires persistence and tolerance for rejection and non-responses.

Evaluation & research: Performed light startup evaluation—value proposition clarity, market opportunity, team background, early traction, competitive landscape. Reviewed pitch decks with a critical but early-stage lens. Summarized findings for partners and answered basic diligence questions when asked.

Internal coordination: Communicated regularly with firm partners. Provided concise updates and timely responses. Helped advise and onboard newer scouts.

Lessons

  • Communication is foundational. Clarity, tone, and responsiveness matter more than I expected. This role was a training ground for professional communication—cold approaches, follow-ups, handling rejection without disengaging.
  • Early-stage evaluation is different. At pre-seed, there's little data; the work is about identifying potential under uncertainty, not validating traction.
  • Cold outreach is a learned skill. I started uncomfortable with it; I improved through repetition. Rejection is normal and not personal.
  • This role pushed me outside my comfort zone socially, which was valuable independent of any specific outcome.

Context

GitRoll is a pre-seed HR-tech startup that scans GitHub repositories to evaluate developer contributions—designed to surface signal beyond resumes and help underrepresented developers get fairer evaluations.

How I got here: As a junior without a summer internship, I attended a demo event at Draper University specifically to network and find opportunities. After initial conversations with the founders, I interviewed but felt I performed poorly due to inexperience. I proposed working unpaid for 1–2 weeks so they could evaluate my contribution before committing. Instead, they gave me a take-home project: evaluate and tune prompts for their LLM-based grading system. Given two weeks, I completed it in one. The work exceeded expectations; I received an offer.

Action

LLM & NLP systems: Led R&D for a uniqueness grading model using LLMs and NLP techniques. The system has been applied to 1.4M+ users. Proposed and implemented improved grading metrics, prompt engineering strategies, and output calibration for consistency.

Model evaluation & cost optimization: Benchmarked LLMs (GPT, Gemini, Claude) for cost and performance tradeoffs. Reduced LLM-related costs by ~13% through model selection and prompt optimization.

Repository analysis & bias reduction: Scanned and analyzed 200+ GitHub repositories. Iterated on grading logic through user collaboration and internal feedback. Improved model performance by ~20% over previous versions. Explicitly worked on bias reduction and fairness-aware evaluation.

Supervised learning & labeling: Manually labeled 216+ GitHub pull request issues. Built classification models achieving ~92% precision. Experimented with word-bank approaches and NLTK-based categorization. Classified issues into 3 impact levels; this logic was integrated into contribution scoring and CURISM metrics.

Static analysis & tooling: Used ast-grep for code linting and analysis. Built a programming language progression roadmap (Python, JavaScript, TypeScript) for developers. Improved internal development velocity by ~30%.

Lessons

  • Opportunities can be created, not just found. Proposing a trial period was uncomfortable but effective. Execution speaks louder than credentials.
  • Early-stage startups reward breadth. I touched LLM systems, labeling, static analysis, and product iteration—all within a few months. Context-switching is the job.
  • Fairness in ML is a design problem, not just a metrics problem. Bias reduction required iterating on grading logic with real users, not just adjusting thresholds.
  • Speed matters at pre-seed. Completing the take-home in half the time signaled more than the work itself.