Hank Sha

About

AI Engineer building real-world, production-grade systems.

I solve ambiguous product and customer problems with scalable AI, backend engineering, and strong technical communication.

Focus

Applied ML & LLM Systems — Building and optimizing production-oriented pipelines.
Engineering & Infrastructure — End-to-end data engineering and deployment.
Technical Communication — Clear translation between technical tradeoffs and product impact.

Learning Philosophy

I prioritize skills based on leverage, not checklists. I focus on high-impact capabilities that expand my system-level reasoning, viewing every new skill as a strategic allocation of time and focus.

How I Learn & Ramp

My approach to quickly entering new domains and delivering working solutions. Built for engineering, product, and customer-facing environments.

Map to System: Move from high-level context to architecture-level understanding of how things actually run.
Build to Understand: Learn by shipping small, functional versions early and iterating fast.
Edge-Case Driven: Actively probe failure points to harden systems for real-world use.
Tight Feedback Loops: Incorporate stakeholder and user feedback to refine direction quickly.
Write to Align: Use concise documentation to clarify tradeoffs and align technical decisions with business goals.

Skill Map

An honest self-assessment. Gaps are current learning priorities, not apologies.

Strong

Python for production ML (NumPy, PyTorch, pandas)
Technical writing & documentation
Rapid domain ramp-up
Signal extraction (identifying core constraints vs. noise)
Structured problem clarification
Communicating tradeoffs & uncertainty

Working Knowledge

LLM & Transformer systems
Experiment design & evaluation pipelines
Statistical modeling
Cross-functional collaboration
JavaScript / lightweight frontend integration
Operating under ambiguous requirements

Currently Expanding

GPU-level optimization (CUDA fundamentals)
Distributed training systems
Advanced LLM interpretability tooling
Technical leadership in multi-person builds

Building/Learning in Public

I write to clarify my own thinking. Publishing creates accountability and invites correction from people who know more than I do.

Selected Writing

5 AI Prompts That Will Seriously Save You 20+ Hours a Week (Free Guide) — Sharing AI prompts in my workflow.
Under the Hood of Neural Networks: Implementing Autograd from Scratch — Technical notes on Autograd implementation.

More writing on Medium →

Competitions

Data4Good Datathon — 1st Place, Civic Engagement & Policy Track — UCSB, 2026. Won the Civic Engagement & Policy track at the Data4Good datathon.

Selected Work

Applications of the above. Each follows: Context → Action → Lessons

The Gist: Second author on research involving LLMs and Graph Unlearning. Built experimental pipelines and optimized local model inference under compute constraints.

Key Win: Achieved a 4× speedup in experiment runtimes by optimizing CUDA configurations and quantization settings.

Skills: PyTorch, Hugging Face, CUDA, Graph Neural Networks.

Context

Research assistant on projects involving large language models, graph unlearning, and topological graph structures. Primary responsibilities: replicating results from existing papers, implementing experimental pipelines, and supporting ongoing research. Second author on a paper exploring the use of LLMs for node and graph unlearning tasks.

Constraints: limited compute budget, tight publication timelines, no dedicated infrastructure.

Action

Model inference & systems work: Set up local inference for an open-source Hugging Face model on my own machine. Worked directly with CUDA, VRAM constraints, and memory optimization. Evaluated vLLM as a serving framework for improved throughput; determined that setup complexity and time constraints made it impractical for this project. Made a deliberate decision to revert to a simpler CUDA-based configuration. Optimized quantization settings and achieved ~4× speedup in experiment runtime.

Data collection & tooling: Tasked with collecting datasets related to AI conference discourse on X (Twitter)—follower graphs, post content, engagement data. Assessed that building a custom scraping solution would be slow, brittle, and blocked by rate limits and firewalls. Recommended using existing third-party tools. No single tool satisfied all requirements, so I combined multiple tools to assemble the dataset—effectively a "frankensteined" pipeline that prioritized speed and reliability over elegance.

Lessons

Sophistication has overhead. A simpler CUDA setup outperformed a more advanced serving framework given the constraints. Knowing when not to use a tool is as important as knowing how.
Economic thinking applies to research. Time spent building infrastructure is time not spent running experiments. I learned to evaluate build-vs-buy tradeoffs quickly.
Execution quality matters. Replication work and pipeline implementation aren't glamorous, but they're where most research actually happens. Doing them reliably builds trust.
Scrappy solutions can be correct solutions. The data pipeline wasn't elegant, but it worked, and the alternative would have taken weeks longer.

The Gist: Developed a commercial lens for evaluating AI and deep-tech startups at pre-seed and seed stages.

Key Win: Built strong technical communication and sourcing skills, bridging deep-tech capabilities with market viability.

Skills: Startup Evaluation, Market Research, Professional Outreach.

Context

Part-time scout role at an early-stage fund (pre-seed, seed, early Series A) investing across AI, deep tech, energy, cleantech, and web3. This was a sourcing and light evaluation role—not deal-leading or deep diligence. I did not make or influence investment decisions.

Action

Sourcing & outreach: Sourced startups through university venture clubs, campus events, and founder communities. Conducted high-volume cold outreach via email and LinkedIn. Learned that many student startups were too early for fundraising, and that effective sourcing requires persistence and tolerance for rejection and non-responses.

Evaluation & research: Performed light startup evaluation—value proposition clarity, market opportunity, team background, early traction, competitive landscape. Reviewed pitch decks with a critical but early-stage lens. Summarized findings for partners and answered basic diligence questions when asked.

Internal coordination: Communicated regularly with firm partners. Provided concise updates and timely responses. Helped advise and onboard newer scouts.

Lessons

Communication is foundational. Clarity, tone, and responsiveness matter more than I expected. This role was a training ground for professional communication—cold approaches, follow-ups, handling rejection without disengaging.
Early-stage evaluation is different. At pre-seed, there's little data; the work is about identifying potential under uncertainty, not validating traction.
Cold outreach is a learned skill. I started uncomfortable with it; I improved through repetition. Rejection is normal and not personal.
This role pushed me outside my comfort zone socially, which was valuable independent of any specific outcome.

The Gist: Applied ML to Phase II clinical trial data for MDD (Major Depressive Disorder). Conducted deep exploratory data analysis to find signals in high-noise environments.

Key Win: Provided a high-integrity technical audit that correctly identified ML as not suitable for the dataset, preventing action on unreliable signals.

Skills: XGBoost, Statistical Modeling, EDA, Biostatistics.

Context

Analyzed Phase II clinical trial data for an MDD drug at a pre-IPO biotech company. Objective: use ML and statistical methods to identify potential gaps, latent signals, or patterns not captured by traditional analyses. I had no prior experience in clinical trials or biostatistics.

Constraints: small trial size, limited data points, high noise, and significant placebo effects—conditions that are common in psychiatric trials and fundamentally limit what ML can detect.

Action

Domain ramp-up: Navigated extensive clinical trial documentation, unfamiliar terminology, trial protocols, and outcome measures. Synthesized enough context to work effectively within weeks.

Exploratory analysis: Conducted deep EDA—placebo response patterns, subgroup analyses, outcome distribution shifts. Compared findings against traditional statistical outputs to evaluate where additional methods might add signal versus noise.

ML experimentation: Trained multiple classification models to explore whether ML could surface patterns missed by standard analyses. XGBoost performed best among the models tested. However, accuracy and precision were unstable across folds; performance was insufficient for any production or decision-support use. Made a deliberate judgment that these results should not be over-interpreted or presented as actionable.

Lessons

Knowing when ML doesn't apply is part of the job. Small, noisy clinical datasets often lack the statistical power for ML to add value. Recognizing this early prevents wasted effort and misleading conclusions.
EDA and domain understanding matter more than model complexity. The most useful insights came from careful exploratory analysis, not from algorithmic sophistication.
Negative results are still results. Concluding that ML was not appropriate for this dataset was the correct scientific judgment—not a failure.
Unfamiliar domains become navigable quickly with focused reading. Clinical trial structure, terminology, and conventions are learnable; the key is knowing what to prioritize.

The Gist: Scaled an LLM-based developer evaluation system to 1.4M+ users. Focused on prompt engineering, cost optimization, and reducing algorithmic bias.

Key Win: Reduced API costs by 13% and improved model performance by 20% through iterative benchmarking of GPT, Gemini, and Claude.

Skills: Python, LLMs, NLP, ast-grep, Supervised Learning.

Context

GitRoll is a pre-seed HR-tech startup that scans GitHub repositories to evaluate developer contributions—designed to surface signal beyond resumes and help underrepresented developers get fairer evaluations.

How I got here: As a junior without a summer internship, I attended a demo event at Draper University specifically to network and find opportunities. After initial conversations with the founders, I interviewed but felt I performed poorly due to inexperience. I proposed working unpaid for 1–2 weeks so they could evaluate my contribution before committing. Instead, they gave me a take-home project: evaluate and tune prompts for their LLM-based grading system. Given two weeks, I completed it in one. The work exceeded expectations; I received an offer.

Action

LLM & NLP systems: Led R&D for a uniqueness grading model using LLMs and NLP techniques. The system has been applied to 1.4M+ users. Proposed and implemented improved grading metrics, prompt engineering strategies, and output calibration for consistency.

Model evaluation & cost optimization: Benchmarked LLMs (GPT, Gemini, Claude) for cost and performance tradeoffs. Reduced LLM-related costs by ~13% through model selection and prompt optimization.

Repository analysis & bias reduction: Scanned and analyzed 200+ GitHub repositories. Iterated on grading logic through user collaboration and internal feedback. Improved model performance by ~20% over previous versions. Explicitly worked on bias reduction and fairness-aware evaluation.

Supervised learning & labeling: Manually labeled 216+ GitHub pull request issues. Built classification models achieving ~92% precision. Experimented with word-bank approaches and NLTK-based categorization. Classified issues into 3 impact levels; this logic was integrated into contribution scoring and CURISM metrics.

Static analysis & tooling: Used ast-grep for code linting and analysis. Built a programming language progression roadmap (Python, JavaScript, TypeScript) for developers. Improved internal development velocity by ~30%.

Lessons

Opportunities can be created, not just found. Proposing a trial period was uncomfortable but effective. Execution speaks louder than credentials.
Early-stage startups reward breadth. I touched LLM systems, labeling, static analysis, and product iteration—all within a few months. Context-switching is the job.
Fairness in ML is a design problem, not just a metrics problem. Bias reduction required iterating on grading logic with real users, not just adjusting thresholds.
Speed matters at pre-seed. Completing the take-home in half the time signaled more than the work itself.

Hank Sha

About

Focus

Learning Philosophy

How I Learn & Ramp

Skill Map

Strong

Working Knowledge

Currently Expanding

Building/Learning in Public

Selected Writing

Competitions

Selected Work

Research Assistant

Context

Action

Lessons

Venture Scout

Context

Action

Lessons

Clinical Data Science Internship

Context

Action

Lessons

AI/ML Engineer Intern

Context

Action

Lessons