Computer Vision Test: Amazon v. Google v. IBM v. Microsoft v. Pinterest
A comparison of major tech companies’ computer vision technologies based on the labels they affixed to 10 photos.
Major tech companies are developing artificial intelligence technology to teach computers to see images the way people do, from detecting individual objects to recognizing entire scenes. Amazon, Google, IBM, Microsoft and Pinterest have been using their computer vision capabilities to help people find products, media companies automatically edit videos and marketers target ads.
But how well can computers actually see? To find out, I took 10 photos (five professionally shot product images and five amateur shots of my own) and ran them through the five companies’ computer vision tools: the computer vision APIs from Amazon, Google, IBM and Microsoft, and Pinterest’s in-app Lens feature. I then compared the labels that each company affixed to each image. Watch the video below to see what they saw.
[youtube]https://youtu.be/-SYchGku4RE[/youtube]
So how well could the computers see?
For a computer, pretty well. All five companies recognized that shoes were shoes, a shirt was a shirt, a dress was a dress, a desk was a desk, a bag was a bag and a couch was a couch (or a sofa). And some were able to get even more specific. Pinterest used the metadata from people’s pins to identify one sneaker as, more specifically, a Vans sneaker. And both IBM and Pinterest picked up on the fact that the couch was a sectional.
For a person, though, the computers might need to update their prescription lenses. Microsoft thought a tomato was an apple, and while Amazon and Pinterest were confident that the tomato was a tomato, they also thought it might be a persimmon or an apple, respectively. And when it came to the pictures I took of my shoes on the floor, Microsoft paid more attention to the floor than to the shoes. IBM also seemed a bit overzealous with some of its results; I don’t know what “jodhpur breeches” are, but my Levi’s were not them, and my messenger bag is not a mailbag, as much as that could be a cool work bag. And as capable as Pinterest was at recognizing that my high-tops were Vans, it thought my running shoes were Nikes, even though they say Hoka across the heel.
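For readers who want to run this kind of comparison themselves, here is a minimal Python sketch of how the labels for one photo could be tallied across companies. It is not the method used for this test, and the function names are made up for illustration. The data covers only what is described above for the tomato photo; Google’s and IBM’s labels for that photo are not stated here, so they are left out rather than guessed.

```python
from collections import Counter

# Labels for the tomato photo, taken only from the description above.
# Google and IBM are omitted because their labels for this photo are not stated.
tomato_labels = {
    "Amazon": ["tomato", "persimmon"],
    "Microsoft": ["apple"],
    "Pinterest": ["tomato", "apple"],
}


def label_votes(labels_by_vendor):
    """Count how many vendors proposed each label (case-insensitive)."""
    votes = Counter()
    for labels in labels_by_vendor.values():
        # Use a set so a vendor repeating a label still counts only once.
        votes.update({label.lower() for label in labels})
    return votes


def vendors_matching(labels_by_vendor, expected):
    """Return the sorted names of vendors whose labels include `expected`."""
    target = expected.lower()
    return sorted(
        vendor
        for vendor, labels in labels_by_vendor.items()
        if target in (label.lower() for label in labels)
    )


votes = label_votes(tomato_labels)
correct = vendors_matching(tomato_labels, "tomato")
print(dict(votes))
print(correct)
```

Counting votes per label shows where the tools agree, while checking against a known answer shows which ones were actually right; the two can differ, as the tomato-versus-apple split suggests.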
Do computers have perfect vision? No, not now, and maybe not ever. But do they have adequate vision? Yeah. It may be ideal for a computer to always be able to identify the brand behind a product — be it a t-shirt, a pair of shoes or a messenger bag — in order to recognize if a person has an affinity for that brand or to show similar products from that brand, through an ad or otherwise. But to know, at least, that a shirt is a shirt and a shoe is a shoe shows how capable computers have become at shedding light on what was once a completely black box.
Opinions expressed in this article are those of the guest author and not necessarily those of MarTech.