A short introduction to computer vision

  • Computer vision, often abbreviated as CV, is a field of study that develops techniques to help computers “see” and understand the content of digital images such as photographs and videos.
  • It uses machine learning — specifically deep learning and convolutional neural networks — to analyse visual data.

Computer vision is a field of AI that utilises machine learning and neural networks to enable computers and systems to extract meaningful information from digital images, videos, and other visual inputs. This enables them to make recommendations or take actions in response to defects or issues they perceive.

What is computer vision?

Computer vision applies machine learning to images and videos to understand media and make decisions based on them. Essentially, it gives software and technology the ability to “see.”

If AI allows computers to think, computer vision enables them to see, observe, and understand. While computer vision operates similarly to human vision, humans have the advantage of contextual experience to distinguish objects, judge distances, detect motion, or identify image anomalies.

How does computer vision work?

Computer vision relies heavily on data. It repeatedly analyses data to discern patterns and ultimately recognise images. For instance, training a computer to identify automobile tires requires feeding it extensive images of tires and related items to learn distinctions and accurately identify tires, especially those without defects. Two key technologies used for this purpose are deep learning and convolutional neural networks (CNNs).

Machine learning employs algorithmic models that allow computers to autonomously learn the context of visual data. Given sufficient data, the computer learns to differentiate between images on its own, rather than through explicit programming for image recognition.
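To make this idea concrete, here is a minimal sketch (not the article's own method) of a model learning to separate two kinds of tiny synthetic “images” purely from examples, with no hand-written recognition rules. The data, the nearest-centroid classifier, and all names here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic flattened 8x8 "images": class 0 is dark overall, class 1 is bright
dark = rng.normal(loc=0.2, scale=0.05, size=(50, 64))
bright = rng.normal(loc=0.8, scale=0.05, size=(50, 64))

X = np.vstack([dark, bright])
y = np.array([0] * 50 + [1] * 50)

# "Learning" here is simply computing one centroid per class from the examples
centroids = np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(image):
    """Assign the class whose learned centroid is closest to the image."""
    dists = np.linalg.norm(centroids - image, axis=1)
    return int(np.argmin(dists))

test_dark = rng.normal(0.2, 0.05, size=64)
test_bright = rng.normal(0.8, 0.05, size=64)
print(predict(test_dark), predict(test_bright))
```

Nothing in the code states what “dark” or “bright” means; the distinction is recovered entirely from the labelled examples, which is the core point of the paragraph above.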

A CNN assists machine learning or deep learning models by breaking down images into tagged or labelled pixels. Using these labels, the CNN performs convolutions—a mathematical operation combining two functions to produce a third—and predicts the content it “sees.” The neural network refines its predictions through iterative convolutions, gradually improving accuracy until its predictions align with reality. In this manner, it perceives or recognises images akin to human perception.
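As a hedged sketch of the convolution step described above (not a full CNN, and not this article's code), the snippet below slides a small kernel over an image and sums element-wise products; the 3×3 vertical-edge kernel and the toy 5×5 image are assumptions chosen for illustration:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image, summing element-wise products."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A tiny 5x5 "image" with a vertical edge: dark left, bright right
image = np.array([
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
], dtype=float)

# A simple vertical-edge kernel (illustrative choice)
kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=float)

print(convolve2d(image, kernel))  # each output row is [0, 27, 27]
```

The output responds strongly (27) exactly where the kernel straddles the dark-to-bright boundary and is zero in the flat region; stacking many learned kernels like this, and adjusting them over training iterations, is what lets a CNN refine its predictions.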


History of computer vision

For approximately 60 years, scientists and engineers have endeavoured to develop methods for machines to perceive and comprehend visual data. Initial experiments in 1959 involved neurophysiologists presenting arrays of images to cats to observe corresponding brain responses.

The 1960s witnessed the emergence of AI as an academic discipline, marking the beginning of efforts to address human vision challenges.

In 1974, optical character recognition (OCR) technology arrived, capable of identifying text regardless of font or typeface. Around the same time, intelligent character recognition (ICR) used neural networks to decipher handwritten text.

In 1982, neuroscientist David Marr established the hierarchical nature of vision and introduced algorithms enabling machines to detect edges, corners, curves, and other fundamental shapes.

By 2000, the focus shifted towards object recognition, culminating in the debut of real-time facial recognition applications in 2001. Throughout the 2000s, the standardisation of tagging and annotating visual datasets gained prominence.


Revel Cheng

Revel Cheng is an intern news reporter at Blue Tech Wave specialising in Fintech and Blockchain. She graduated from Nanning Normal University. Send tips to r.cheng@btw.media.
