How to use open-source, permissive-use data and collect fewer labeled samples for our tasks.
9 patterns including HITL, hard mining, reframing, cascade, data flywheel, business rules layer, and more.
Writing good instructions to achieve high precision and throughput.
Collecting ground truth, data augmentation, cascading heuristics and models, and more.
Pilot & copilot, literature review, methodology review, and timeboxing.
Or why I should write fewer integration tests.
Pushing back on the cult of complexity.
Understanding and spotting patterns to use code and components as intended.
Industry examples, exploration strategies, warm-starting, off-policy evaluation, and more.
Thinking about recsys as interventional vs. observational, and inverse propensity scoring.
What to consider in terms of data, roadmap, role, manager, tooling, etc.
Beyond getting that starting role, how does one continue growing in the field?
Daliana and I had a 2hr chat on all things data science and machine learning.
More than two dozen interviews with ML practitioners sharing their stories and advice.
Why this is the first rule, some baseline heuristics, and when to move on to machine learning.
An overview of system design, candidate retrieval, and ranking, with industry examples.
How to generate labels from scratch with semi, active, and weakly supervised learning.
Building semantic search; how to calculate recall when relevant documents are unknown.
Why real-time RecSys? What does the system design look like in industry? How to build an MVP?
A whirlwind tour of bandits, embedding+MLP, sequences, graph, and user embeddings.
How to go from knowing machine learning to applying it at work to drive impact.
An overview and comparison of the various approaches, with examples from industry search systems.
Mike and I take a philosophical detour on Talk Python and discuss life lessons from machine learning.
Short vs. long-term gain, incremental vs. disruptive innovation, and resume-driven development.
Pointers to think through your methodology and implementation, and the review process.
Access, serving, integrity, convenience, autopilot; use what you need.
Design and architecture, tech stack, methodology, results, and lessons learned.
Why real-time? How have China & US companies built them? How to design & build an MVP?
Data cleaning, transfer learning, overfitting, ensembling, and more.
A personal take on their deliverables and skills, and what it means for the industry and your team.
Setbacks she faced, overcoming them, and how writing changed her life.
Step-by-step walkthrough on the environment, compilers, and installation for ScaNN.
Checking for correct implementation, expected learned behaviour, and satisfactory performance.
Should I switch from a regex-based to ML-based solution on my application?
Why (and why not) be more end-to-end, how to, and Stitch Fix and Netflix's experience.
Part II of the previous write-up, this time on applications and frameworks of Spark in production.
Sharing my notes & practical knowledge from the conference for people who don't have the time.
Can maintaining machine learning in production be easier? I go through some practical tips.
I thought deploying machine learning was hard. Then I had to maintain multiple systems in prod.
Comparing baselines (matrix factorization) against novel approaches using graphs & NLP.
In-depth sharing on how to put machine learning systems into production.
Keynote on how Asia's tech giants scale and their SuperApp strategy.
OMSCS CS7646 (Machine Learning for Trading) - Don't sell your house to trade algorithmically.
How we built an ML system to predict hospitalization costs at admission; sharing at DATAx Conference.
OMSCS CS6601 (Artificial Intelligence) - First, start with the simplest solution, and then add intelligence.
OMSCS CS7641 (Machine Learning) - Revisiting the fundamentals and learning new techniques.
Or how to put machine learning models into production.
Cleaning up text and messing with ASCII (urgh!)
How Lazada ranks products to improve customer experience and conversion at Strata 2016.
Parsing JSON and formatting product titles and categories.
Sharing about my first data science competition at DataScience SG.
20 Jun 2015  ·  1 min  ·  machinelearning