- Early contributions by pioneers such as Lawrence Roberts and Bela Julesz established fundamental principles and techniques that have shaped the development of computer vision.
- Computer vision evolved from the foundational theories of the 1970s and 1980s to the advances in neural networks and deep learning that began in the 1990s and 2000s, producing applications and methods that are now integral to modern AI and image processing.
- The 21st century has seen a boom in computer vision: advances in deep learning and neural networks have transformed image classification, object detection and segmentation, and related progress in natural language processing has brought visual understanding and AI closer together.
Computer vision was not invented by a single person. It took shape gradually, over a long period, through the combined efforts of many scholars, researchers, and engineers. The field sits at the intersection of multiple disciplines, including computer science, mathematics, physics, engineering, and neuroscience.
Origin and early development of computer vision
The roots of computer vision can be traced back to the 1950s and 1960s, when the advent and development of electronic computers laid the foundation for image processing and pattern recognition.
Lawrence Roberts
Lawrence Roberts is considered one of the pioneers of computer vision. He introduced many of the basic concepts and techniques of computer vision in his 1963 PhD thesis, Machine Perception of Three-Dimensional Solids. His work dealt with how to extract three-dimensional information from two-dimensional images, one of the central problems of computer vision. Roberts’ research laid the foundation for later research in 3D reconstruction and stereo vision.
Bela Julesz
Bela Julesz was a visual psychologist whose research on random-dot stereograms in the 1960s had a significant impact on computer vision. Julesz showed experimentally that the human visual system can perceive depth from random-dot images alone, a result with important implications for understanding stereopsis and depth perception.
Developments in the 1970s and 1980s
During the 1970s and 1980s, computer vision took shape as a discipline, and many key concepts and techniques were developed and promoted during this period.
David Marr
David Marr is another important figure in the field of computer vision. In the 1970s he proposed a series of theories of visual processing that attempted to explain how the human visual system processes and understands visual information. Marr elaborated on these theories, including a hierarchical model of visual information processing, in his book Vision: A Computational Investigation into the Human Representation and Processing of Visual Information, published posthumously in 1982.
He proposed that visual processing can be divided into three main stages: the primal sketch, the 2.5D sketch, and the 3D model representation. Marr’s work has had a profound impact on both computer vision and neuroscience.
John Hopfield and David Marr
John Hopfield and David Marr’s work on pattern recognition and neural networks has also had a significant impact on computer vision. The Hopfield network, introduced in 1982, was an early neural network model that showed how stored patterns could be recalled from partial or noisy inputs through neural computation. These studies provided a theoretical foundation for image recognition and classification tasks in computer vision.
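To make the idea of pattern recall by neural computation more concrete, here is a minimal sketch of a Hopfield-style network in plain Python. It assumes bipolar units (+1/-1), a simple Hebbian learning rule, and asynchronous updates. The function names `train_hopfield` and `recall` and the two toy patterns are illustrative choices, not taken from any particular paper or library.

```python
def train_hopfield(patterns):
    """Build a weight matrix from bipolar (+1/-1) patterns with the Hebbian rule."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    weights[i][j] += p[i] * p[j] / n
    return weights


def recall(weights, state, max_sweeps=10):
    """Update units one at a time until the state stops changing."""
    state = list(state)
    n = len(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            total = sum(weights[i][j] * state[j] for j in range(n))
            new_value = 1 if total >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state


# Two toy 8-unit patterns (hypothetical data chosen to be orthogonal).
stored = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
weights = train_hopfield(stored)

# The first pattern with its first unit flipped.
noisy = [-1, 1, 1, 1, -1, -1, -1, -1]
print(recall(weights, noisy) == stored[0])  # True
```

In this toy setting the corrupted input settles back to the stored pattern it most resembles, which is the associative-memory behaviour that made the model influential for later recognition work.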
Modern developments in computer vision
Since the 1990s, computer vision has made great strides in algorithms, computational power, and application areas.
Takeo Kanade
Takeo Kanade is a leading scholar in computer vision and robotics. He has developed several important computer vision systems and algorithms, with work spanning facial recognition, stereo vision, and mobile robot navigation. His work has had a wide impact on both academia and industry, and he is a key member of the Computer Science Department and the Robotics Institute at Carnegie Mellon University.
David Forsyth and Jean Ponce
David Forsyth and Jean Ponce are co-authors of Computer Vision: A Modern Approach, an important textbook that covers a wide range of topics from basic theory to practical applications. Widely used in teaching and research, it is considered a classic in the field.
Geoffrey Hinton, Yann LeCun and Yoshua Bengio
Geoffrey Hinton, Yann LeCun, and Yoshua Bengio’s work on neural networks and deep learning revolutionised computer vision. Their work has led to the success of convolutional neural networks (CNNs) in tasks such as image classification, object detection, and semantic segmentation. In particular, AlexNet’s victory in the 2012 ImageNet competition marked a breakthrough in the application of deep learning to computer vision.
Computer vision development boom
Since the beginning of the 21st century, the field of computer vision has entered a boom period and produced many notable results, as shown in the timeline below:
In 2012, AlexNet made a splash in the ImageNet image classification competition, using a deep convolutional neural network (CNN) to beat all other entrants with a top-5 error rate roughly 10 percentage points lower than the runner-up.
In 2014, GoogLeNet and VGGNet (named after Oxford's Visual Geometry Group) built on that success in the ImageNet image classification competition, using deeper and more complex CNN structures to further improve classification performance.
In 2015, ResNet (residual network) set a new record in the ImageNet image classification competition, using residual connections to ease the training of very deep networks and reducing the error rate to below the commonly cited estimate of human-level performance.
In 2016, YOLO (you only look once) and SSD (single shot multibox detector) made a breakthrough in object detection, using a one-stage CNN structure to achieve fast and accurate detection of multiple objects in an image.
In 2017, Mask R-CNN made a breakthrough in instance segmentation, achieving accurate segmentation of multiple objects in an image using a two-stage CNN structure.
In 2018, BERT (bidirectional encoder representations from transformers) made a breakthrough in natural language processing, using a bidirectional transformer structure to achieve a deeper understanding of language and providing a building block for the joint processing of images and text.
In 2019, AlphaStar made a breakthrough in the game StarCraft II, using reinforcement learning and self-play to train agents that outperformed top human players, demonstrating how closely perception and decision-making can be integrated.
In 2020, GPT-3 made a breakthrough in natural language generation, using a 175-billion-parameter transformer structure to generate fluent and coherent text and helping pave the way for later models that translate between images and text.
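The residual connection mentioned in the 2015 entry can be illustrated with a small sketch. It is written in plain Python with lists standing in for tensors, and it simplifies the idea heavily: the block computes a learned correction F(x) and adds it back to its input x. The helper names `relu`, `linear`, and `residual_block` and the two-layer shape of F are assumptions made for illustration. Real ResNet blocks use convolutions and batch normalisation, and apply a further activation after the addition.

```python
def relu(vector):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in vector]


def linear(weights, vector):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def residual_block(x, w1, w2):
    """Return x + F(x), where F is a small two-layer transformation."""
    hidden = relu(linear(w1, x))
    correction = linear(w2, hidden)
    # The skip connection: the input is added back to the learned correction.
    return [xi + ci for xi, ci in zip(x, correction)]


x = [1.0, -2.0]
zero = [[0.0, 0.0], [0.0, 0.0]]

# With all-zero weights the block passes its input through unchanged.
print(residual_block(x, zero, zero))  # [1.0, -2.0]
```

Because the block reduces to the identity when its weights are zero, stacking many such blocks does not have to degrade the signal, which is one common explanation for why residual networks are easier to train at depth.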