Beyond LLMs: Thriving Frontiers in Traditional Machine Learning & Statistics
The current buzz around Large Language Models (LLMs) might suggest that other areas of machine learning and statistics are fading into the background. However, a recent Hacker News discussion reveals a vibrant landscape where 'traditional' ML and statistical methods continue to thrive and evolve. Professionals shared diverse and impactful projects, underscoring the ongoing relevance of these foundational fields.
Diverse Applications of Traditional ML & Statistics
Several commenters highlighted how traditional techniques are crucial in their respective domains:
- Finance and Business Intelligence: One user works in the hedge fund industry, forecasting individual company performance using alternative data like clickstream, point-of-sale, and payments. This involves significant data cleaning, time series analysis, and domain-specific modeling, primarily with tabular data. Interestingly, LLMs are seen as helpful for specific sub-tasks like entity resolution during data cleaning, but the core predictive models still rely on established approaches.
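A concrete flavor of the time-series side of this work is the seasonal-naive baseline that forecasting teams commonly benchmark against. The sketch below is illustrative only; the function name and the weekly revenue figures are invented, not taken from the discussion:

```python
def seasonal_naive_forecast(history, season_length, horizon):
    """Forecast each future step with the value from one season earlier.

    A standard sanity-check baseline for seasonal tabular series (e.g.,
    weekly point-of-sale data); a real model should beat it before
    being trusted.
    """
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    return [history[-season_length + (h % season_length)]
            for h in range(horizon)]

# Hypothetical weekly revenue index for one company (two 4-week seasons).
weekly = [100, 120, 90, 110, 105, 125, 95, 115]
print(seasonal_naive_forecast(weekly, season_length=4, horizon=4))
# → [105, 125, 95, 115]
```

The appeal of such baselines is that they are trivially auditable, which matters when model outputs feed investment decisions.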
- Privacy-Preserving Data Science: Another active area is differentially private query answering and synthetic data generation for tabular data. This field has seen significant advances, allowing for data analysis and model training while protecting individual privacy.
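The simplest building block of this field is the Laplace mechanism for counting queries. A minimal sketch (the dataset and function names are made up for illustration; real systems layer much more on top, such as privacy budgets and composition accounting):

```python
import random

def laplace_noise(scale, rng=random):
    """Laplace(0, scale) noise, sampled as the difference of two
    exponentials (a standard, numerically safe construction)."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon):
    """Answer a counting query with epsilon-differential privacy.

    A count has sensitivity 1 (adding or removing one person changes it
    by at most 1), so Laplace noise with scale 1/epsilon suffices.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

ages = [34, 29, 51, 47, 38, 62, 45]  # toy tabular column
noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5)
```

Smaller `epsilon` means more noise and stronger privacy; the same mechanism, applied to many queries at once, underpins private synthetic-data generators.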
- Interacting with the Physical World: The application of ML to physical systems is a strong theme. Examples include:
- Developing AI to repair physical machinery, a complex challenge combining 3D understanding with dynamic states.
- Using ML on sensor data (e.g., sound) for condition monitoring of HVAC systems in buildings.
- Running ML inference directly on microcontroller-based sensors, as demonstrated by projects like emlearn.
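To make the condition-monitoring idea concrete, here is a deliberately tiny sketch: flagging audio windows whose energy deviates from a healthy baseline. Everything here (sample values, function names, the threshold rule) is invented for illustration; real systems would use spectral features and a learned model rather than a fixed threshold:

```python
import math

def rms(frame):
    """Root-mean-square energy of one window of audio samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def flag_anomalies(frames, baseline_rms, factor=2.0):
    """Return indices of windows whose energy exceeds a multiple of a
    healthy baseline — a stand-in for real condition monitoring."""
    return [i for i, f in enumerate(frames) if rms(f) > factor * baseline_rms]

healthy = [0.1, -0.1, 0.12, -0.09]        # hypothetical mic samples
worn_bearing = [0.5, -0.6, 0.55, -0.5]
frames = [healthy, healthy, worn_bearing, healthy]
print(flag_anomalies(frames, baseline_rms=rms(healthy)))  # → [2]
```

The appeal for microcontrollers is that features like RMS energy are cheap enough to compute on-device, which is exactly the niche projects like emlearn target.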
- Causal Inference: Understanding cause-and-effect relationships is another critical need; causal methods are being applied to process mining and event logs to determine why an event occurred and how to influence future outcomes.
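The core move in such analyses is adjusting for confounders rather than comparing raw averages. A minimal sketch of backdoor adjustment via stratification — the event-log scenario (does retrying a step help resolve incidents, adjusting for system load?) and all data are hypothetical:

```python
from collections import defaultdict

def adjusted_effect(rows):
    """Estimate a treatment effect adjusting for one confounder.

    rows: (confounder, treated, outcome) tuples. Within each confounder
    stratum, compare treated vs. untreated outcome means, then average
    the per-stratum differences weighted by stratum size.
    """
    strata = defaultdict(lambda: {True: [], False: []})
    for z, t, y in rows:
        strata[z][t].append(y)
    total = len(rows)
    effect = 0.0
    for groups in strata.values():
        if not groups[True] or not groups[False]:
            continue  # no overlap: this stratum cannot be compared
        diff = (sum(groups[True]) / len(groups[True])
                - sum(groups[False]) / len(groups[False]))
        weight = (len(groups[True]) + len(groups[False])) / total
        effect += weight * diff
    return effect

# (system_load, retried?, resolved?) — invented event-log records.
rows = [("high", True, 1), ("high", True, 1), ("high", False, 0),
        ("high", False, 1), ("low", True, 1), ("low", False, 1)]
print(adjusted_effect(rows))  # ≈ 0.33: retrying helps, net of load
```

The naive treated-vs-untreated comparison can point the opposite way when the confounder is ignored (Simpson's paradox), which is why process-mining work leans on this kind of adjustment.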
The Enduring Role of Statistical Modeling
One commenter emphasized that statistical modeling has a distinct ideology from machine learning. Professional statisticians often work with randomized experiment design or observational study designs in fields like the hard sciences, actuarial science, finance (risk management), manufacturing, and polling/census research. The R language was highlighted as a primary tool for serious statisticians, and searching for jobs that require R can surface applications of statistics with no connection to LLMs.
Open Research Frontiers in Classical ML
Despite the advancements in LLMs, many fundamental challenges in classical ML remain active research areas. These include:
- Causal inference: Moving beyond correlation to understand causation.
- Robustness to distribution shifts: Ensuring models perform well when data characteristics change.
- Adversarial resilience: Protecting models from malicious inputs.
- Continual and online learning: Enabling models to adapt to new data without forgetting past knowledge.
- Multi-modal learning: Effectively fusing information from diverse sources like vision, time series, and structured data (beyond just text).
- Interpretability: Making model decisions understandable, especially in high-stakes domains.
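For the distribution-shift item, one widely used practical check is the population stability index (PSI) between a training sample and live data. A minimal stdlib sketch (the data and the 0.2 alarm threshold are illustrative; 0.2 is a common rule of thumb, not a law):

```python
import math

def psi(expected, actual, bins=4):
    """Population stability index between a reference (training) sample
    and a live sample, using quantile bins from the reference."""
    edges = sorted(expected)
    cuts = [edges[int(len(edges) * i / bins)] for i in range(1, bins)]

    def hist(xs):
        counts = [0] * bins
        for x in xs:
            counts[sum(x >= c for c in cuts)] += 1
        # Additive smoothing so empty bins don't produce log(0).
        return [(c + 0.5) / (len(xs) + 0.5 * bins) for c in counts]

    e, a = hist(expected), hist(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

train = list(range(100))
live = [x + 50 for x in range(100)]   # simulated upward drift
print(psi(train, train) < 1e-9, psi(train, live) > 0.2)  # → True True
```

Monitoring like this catches covariate shift cheaply, but it says nothing about why the shift happened — which is where the harder open research problems above begin.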
Tooling and Accessibility
There's also ongoing work to make traditional ML techniques more accessible. One contributor is building a tool to simplify ML on tabular data for tasks like forecasting and imputation, aiming to enable users to quickly create and iteratively improve models with a user-friendly workflow.
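To show the shape of the imputation problem such tools automate, here is a deliberately simple column-mean baseline. This is not the contributor's tool — just an invented sketch of the table-in, table-out workflow, where a real tool would substitute model-based imputation:

```python
def impute_means(rows):
    """Fill missing values (None) in each column with that column's mean.

    A baseline only; model-based imputers predict each missing cell
    from the other columns instead of using a global mean.
    """
    cols = list(zip(*rows))
    means = []
    for col in cols:
        present = [v for v in col if v is not None]
        means.append(sum(present) / len(present) if present else None)
    return [[m if v is None else v for v, m in zip(row, means)]
            for row in rows]

table = [
    [1.0, 10.0],
    [None, 30.0],
    [3.0, None],
]
print(impute_means(table))
# → [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]]
```

Even this trivial version illustrates the iterative workflow the contributor describes: start with a cheap fill, inspect the result, then swap in a stronger model where it matters.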
In conclusion, while LLMs are transformative, the discussion clearly shows that traditional ML and statistics are not only surviving but are essential for solving a wide array of complex problems. They offer robust solutions, particularly for structured and sensor data, and continue to be areas of active research and development. For many applications, these established methods provide the core modeling power, with LLMs sometimes serving as useful auxiliary tools.