- Data mining is a subfield of computer science that blends techniques from statistics, data science, database theory and machine learning.
- Applications of data mining include customer profiling and segmentation, market basket analysis, and anomaly detection.
Data mining does not have a single inventor. Instead, it has evolved over time through contributions from researchers and practitioners across different domains, combining advances in statistics, machine learning, artificial intelligence, and computer science. This post covers some of the key figures and milestones in the history of data mining.
The origins of data mining
John Tukey (1915-2000): An American statistician, Tukey’s contributions to exploratory data analysis (EDA) were groundbreaking. His development of methods for summarising and visualising data provided a crucial foundation for later data mining techniques. Tukey’s work emphasised the importance of looking beyond raw data to understand its underlying structure and patterns.
Early contributions to statistical techniques
As data mining evolved, it drew heavily on statistical methods to analyse and interpret data.
Jerome Friedman, Robert Tibshirani, and Trevor Hastie: This trio of statisticians significantly advanced the field with their work on classification and regression techniques. Friedman co-developed classification and regression trees (CART) and later gradient boosting, Tibshirani introduced the lasso, and together the three wrote “The Elements of Statistical Learning”. These methods became fundamental components of modern data mining and provided the theoretical underpinnings for many techniques used to extract insights from data.
The advent of machine learning
Arthur Samuel (1901-1990): Often credited with coining the term “machine learning,” Samuel worked in the 1950s on a checkers-playing program that improved through experience. His research on programs that learn from data laid the groundwork for many of the algorithms used in data mining today.
Database systems and association rules
The 1990s saw significant advances in database systems and algorithms, which greatly influenced data mining practice.
Rakesh Agrawal, Tomasz Imielinski, and Arun Swami: In 1993 these researchers introduced the problem of mining association rules in large databases. The Apriori algorithm, published by Agrawal and Ramakrishnan Srikant in 1994, became the pioneering practical method for the task. This work allowed businesses and researchers to uncover relationships between items in datasets, such as which products are often bought together. It became a cornerstone of data mining, particularly in market basket analysis.
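To make the idea of "products often bought together" concrete, here is a minimal sketch of an Apriori-style level-wise search in Python. It is an illustration, not the published implementation: the function name `frequent_itemsets`, the `min_support` threshold (an absolute transaction count) and the sample `baskets` are all assumptions made up for this example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: a candidate survives only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count the surviving candidates against the transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_support}
        result.update(current)
        k += 1
    return result


# Hypothetical shopping baskets, invented for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "milk", "eggs"},
]

itemsets = frequent_itemsets(baskets, min_support=3)

# Confidence of the rule "butter -> bread": how often bread appears
# in the baskets that contain butter.
confidence = itemsets[frozenset({"bread", "butter"})] / itemsets[frozenset({"butter"})]
```

The pruning step is what keeps the search tractable: an itemset can only be frequent if every one of its subsets is frequent, so most combinations never need to be counted. In this toy data, every basket containing butter also contains bread, which is the kind of relationship market basket analysis looks for.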
Modern data mining: Formalising the field
As data mining continued to evolve, efforts were made to formalise and standardise its techniques and methodologies.
Jiawei Han and Micheline Kamber: Their influential textbook, “Data Mining: Concepts and Techniques,” has become a staple in the field. By synthesising the methods and applications of data mining into a comprehensive overview of techniques and best practices, Han and Kamber made the subject accessible to students and professionals alike.