I previously wrote a short review of Microsoft’s image recognition and face detection API. A couple of weeks ago, Google announced its Vision API, which provides some similar features. Even though there is no R package or sample code for this API yet, and the API documentation is rather sparse, I thought it could be fun and inspiring to give it a try.
In general, it works like Microsoft’s API: you provide an image, select the kind of analysis you want, and receive a (well) coded response.
To give you an idea of what that looks like for “face detection”, let’s use the same Arnold Schwarzenegger photo.
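The "provide an image, select an analysis" step can be sketched as a request body. This is a minimal sketch, not the code used for this post: the helper name `build_annotate_request` is hypothetical, and the field names and the endpoint mentioned in the comments are assumptions based on Google's public REST documentation, which may differ from the API version available at the time of writing.

```r
# Hypothetical helper: builds the nested list that would be serialised to JSON
# and POSTed to the Vision API (assumed endpoint:
# https://vision.googleapis.com/v1/images:annotate?key=YOUR_API_KEY).
build_annotate_request <- function(image_base64,
                                   feature_type = "FACE_DETECTION",
                                   max_results = 10) {
  list(
    requests = list(
      list(
        # the image is sent inline as a base64-encoded string
        image = list(content = image_base64),
        # one entry per requested analysis, e.g. FACE_DETECTION,
        # LOGO_DETECTION or TEXT_DETECTION
        features = list(list(type = feature_type, maxResults = max_results))
      )
    )
  )
}

# Placeholder string instead of real image bytes
body <- build_annotate_request("<base64-encoded image bytes>", "LOGO_DETECTION")
str(body)
```

To actually send it, the list could be serialised with something like `jsonlite::toJSON(body, auto_unbox = TRUE)` and posted with an HTTP client such as `httr`; both steps are left out here so the sketch runs without extra packages or an API key.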
For the provided image, one receives a table with 2 rows, one for Arnold and one for his wife, and the following columns: “boundingPoly”, “fdBoundingPoly”, “landmarks”, “rollAngle”, “panAngle”, “tiltAngle”, “detectionConfidence”, “landmarkingConfidence”, “joyLikelihood”, “sorrowLikelihood”, “angerLikelihood”, “surpriseLikelihood”, “underExposedLikelihood”, “blurredLikelihood”, “headwearLikelihood”. The following subset of the table shows the results:
| tiltAngle | detectionConfidence | landmarkingConfidence | joyLikelihood | sorrowLikelihood |
|---|---|---|---|---|
| -12.861863 | 0.99996805 | 0.73490918 | VERY_LIKELY | VERY_UNLIKELY |
| -0.25818413 | 0.99998611 | 0.76625621 | VERY_UNLIKELY | VERY_UNLIKELY |
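The likelihood columns come back as text labels rather than numbers, so they are easier to work with as ordered factors. The following is a small sketch using the two rows above; the level order is an assumption based on the labels in Google's documentation, and `as_likelihood` is a hypothetical helper name.

```r
# Assumed ordering of the likelihood labels, from least to most likely
likelihood_levels <- c("UNKNOWN", "VERY_UNLIKELY", "UNLIKELY",
                       "POSSIBLE", "LIKELY", "VERY_LIKELY")

# The two detected faces, with values copied from the table above
faces <- data.frame(
  detectionConfidence = c(0.99996805, 0.99998611),
  joyLikelihood       = c("VERY_LIKELY", "VERY_UNLIKELY"),
  sorrowLikelihood    = c("VERY_UNLIKELY", "VERY_UNLIKELY"),
  stringsAsFactors    = FALSE
)

# Hypothetical helper: turn the text labels into an ordered factor
as_likelihood <- function(x) factor(x, levels = likelihood_levels, ordered = TRUE)

faces$joyLikelihood    <- as_likelihood(faces$joyLikelihood)
faces$sorrowLikelihood <- as_likelihood(faces$sorrowLikelihood)

# Ordered factors support comparisons, e.g. keep faces that are at least "LIKELY" joyful
smiling <- faces[faces$joyLikelihood >= "LIKELY", ]
print(smiling)
```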
Compared to Microsoft’s API, that is not very impressive. So let’s try something else: the API also provides a feature called logo detection. For an image showing a collection of brand logos, it returns:
description | score |
---|---|
Walmart | 0.50977039 |
Coca Cola Shoes | 0.48768377 |
Sainsburys | 0.47962409 |
IKEA | 0.45845419 |
Kellogg’s | 0.454154 |
Disney | 0.44845602 |
Guardian Co Uk | 0.42800492 |
Nintendo | 0.41539443 |
Heinz | 0.41503713 |
Interesting! The results show that some brand logos are correctly detected. However, most logos go unrecognized; even Google’s own brands, Google and YouTube, are not returned. I tried some other images with different brands, and the results are mixed at best. As a quick note: Google’s Vision API is not on par with human recognition.
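Since all scores are fairly low, it can help to look at the result with a cut-off. A quick sketch using the values from the table above; the threshold of 0.45 is an arbitrary assumption chosen for illustration, not a value recommended by the API.

```r
# Logo detection result, copied from the table above
logos <- data.frame(
  description = c("Walmart", "Coca Cola Shoes", "Sainsburys", "IKEA",
                  "Kellogg's", "Disney", "Guardian Co Uk", "Nintendo", "Heinz"),
  score = c(0.50977039, 0.48768377, 0.47962409, 0.45845419,
            0.454154, 0.44845602, 0.42800492, 0.41539443, 0.41503713),
  stringsAsFactors = FALSE
)

# Keep only detections above an (arbitrary) score threshold
min_score <- 0.45
confident <- logos[logos$score >= min_score, ]
print(confident)

# Check whether specific brands were returned at all
c(Google = "Google" %in% logos$description,
  YouTube = "YouTube" %in% logos$description)
```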
Let’s finally test their OCR capabilities by providing the same image of brand logos.
As a result the API returns:
“Tube, Sainsbury’s, Royal Mail, Colgate 4, You, HEINZ BBC, VISA, PEPSI, MARKS, SPENCER, Vodafone, Dove, amazon YAHOO!, twitter, Nintendo, WIKIPEDIA, ISNEp r BlackBerry, Google IKKEA, C2, facebook, Oxfam, BTe, ER the, market, com, dyson, Microsoft, compare, TESCO John Lewis, Walmart, Save money. Live better., AMSUN, orange, CHANEL, SONY, guardian, SkV, MasterCard, BARCLAYS, “
That looks pretty impressive to me. Even though every brand uses its own typeface and colors, most brand names are returned correctly.
In case you want to try it yourself, please see the commented R-code. Unlike with Microsoft, you need to provide billing information even though the first 1000 API calls are supposed to be free…
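One way to compare the two features is to check which of the detected logo names also show up in the OCR text. The sketch below uses a shortened excerpt of the OCR output (with a plain apostrophe) and a simple normalisation: lower-case everything and drop all characters except letters and digits. The helper name `normalise` and the matching rule are assumptions made for illustration.

```r
# Shortened excerpt of the OCR output shown above
ocr_text <- paste(
  "Tube, Sainsbury's, Royal Mail, Colgate 4, You, HEINZ BBC, VISA, PEPSI,",
  "Nintendo, WIKIPEDIA, ISNEp r BlackBerry, Google IKKEA, TESCO John Lewis,",
  "Walmart, SONY, guardian, MasterCard, BARCLAYS"
)

# Logo descriptions returned by logo detection
logo_names <- c("Walmart", "Coca Cola Shoes", "Sainsburys", "IKEA",
                "Kellogg's", "Disney", "Guardian Co Uk", "Nintendo", "Heinz")

# Hypothetical helper: lower-case and strip everything except letters and digits
normalise <- function(x) gsub("[^a-z0-9]", "", tolower(x))

ocr_norm <- normalise(ocr_text)

# A logo counts as "found" if its normalised name occurs anywhere in the OCR text
found <- vapply(normalise(logo_names),
                function(n) grepl(n, ocr_norm, fixed = TRUE),
                logical(1))

overlap <- data.frame(logo = logo_names, in_ocr_text = unname(found))
print(overlap)
```

Exact substring matching is deliberately strict: a misread such as “IKKEA” does not count as a match for IKEA. A fuzzy comparison, for example with base R's `adist()`, would be more forgiving.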