Slava Wagner – SEA, CRO, Data & Forecasting

Image captioning with Microsoft Azure Vision Studio for image search engine optimization

Hundreds of stock images are still sitting on your website, but you haven't gotten around to optimizing them for image search? With Image Captioning on the Microsoft Azure cloud, you can generate image descriptions and tags automatically.

If you want organic reach, you need content: search engines can only generate organic impressions and clicks if your website has content that matches certain search terms. This is why content marketing produces information pages and blog articles with keywords relevant to Google search. However, hundreds or thousands of stock images, infographics and dashboard screenshots quickly pile up. Labeling them with search-engine-relevant titles, descriptions and alt tags often falls by the wayside.

This is understandable, since organic Google traffic from image search often accounts for only about 10% of a company's total organic traffic. That may not sound like much, but it is untapped potential for generating additional clicks from organic Google searches, especially if your organic traffic has stagnated for months despite new content. So if you need a bit of inspiration and want to speed up your SEO captions, here is what you can do: automatically generate descriptions, titles and tags with contextual object and face recognition via Microsoft Azure Cognitive Services Vision Studio.

Main overview in Microsoft Azure Cognitive Services Vision Studio. Here you can access Image Captioning and Common Tags features.

Image captioning with Microsoft Azure Cognitive Services Vision Studio

In 2020, Microsoft released a new application on the Microsoft Azure cloud: Image Captioning. This is a powerful AI image recognition service that can identify objects and people and classify them in their context. All you have to do is upload an image or select it via URL, and the system will analyze it. The result is an automatically generated, coherent text that reflects what can be seen in the image: the objects, the people and the context in which they appear. What is particularly remarkable is the high reliability in recognizing objects and people and in reproducing their contextual relationship. You can either call the service programmatically on the Microsoft Azure cloud or use it in Microsoft Azure Vision Studio without any programming knowledge.
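
For the programmatic route, a minimal Python sketch could look like the following. It assumes the Image Analysis 4.0 REST endpoint (path `imageanalysis:analyze`, API version `2023-10-01`) and the response fields `captionResult` and `tagsResult`. The function names, the endpoint and the key are placeholders that do not come from this article, so check them against your own Azure resource first.

```python
import json
import urllib.parse
import urllib.request

# Assumption: adjust to the API version your resource supports.
API_VERSION = "2023-10-01"


def build_request(endpoint, key, image_url, features=("caption", "tags")):
    """Build the HTTP request for one image URL; nothing is sent yet."""
    query = urllib.parse.urlencode(
        {"api-version": API_VERSION, "features": ",".join(features)}
    )
    url = f"{endpoint.rstrip('/')}/computervision/imageanalysis:analyze?{query}"
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def parse_result(payload):
    """Extract the caption and the tag list from a response dictionary."""
    caption = payload.get("captionResult", {})
    tags = payload.get("tagsResult", {}).get("values", [])
    return {
        "caption": caption.get("text", ""),
        "caption_confidence": caption.get("confidence"),
        "tags": [(t["name"], t["confidence"]) for t in tags],
    }


def analyze(endpoint, key, image_url):
    """Send the request and return the parsed result (needs network access)."""
    with urllib.request.urlopen(build_request(endpoint, key, image_url)) as resp:
        return parse_result(json.load(resp))
```

Splitting request building and response parsing from the network call keeps both parts easy to check without spending any Azure credit.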

Example of contextual image recognition with image captioning in Microsoft Azure Cognitive Services Vision Studio: The caption A person holding a phone was extracted from this image.

In this example, the AI image recognition with Image Captioning generated the caption A laptop with a keyboard.

Another example: Image Captioning described this image as A glass lantern with lights.

Common tags with Microsoft Azure Cognitive Services Vision Studio

There is another important building block for the search engine optimization of your images: the Common Tags application. It extracts the properties of an image as keywords via Microsoft Azure Cognitive Services. This also relies on AI image recognition for objects, faces and contexts. The keyword list is much richer than the single sentence from Image Captioning, since it also includes related terms, generic terms and many objects that appear in the image alongside the main subject.

You can use the tags extracted with Microsoft Azure Cognitive Services to create additional descriptions for your SEO image captions, for example by forming one or more sentences from them for image ALT tags.
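
As a rough illustration of combining both outputs, the following Python sketch merges a caption with additional tags into one ALT text. The function name `build_alt_text` and the 125-character limit are assumptions chosen for this example (a common rule of thumb, not an Azure or Google requirement).

```python
def build_alt_text(caption, tags, max_len=125):
    """Combine a caption with extra tags into one alt text of limited length."""
    text = caption.strip().rstrip(".")
    if text:
        text = text[0].upper() + text[1:]
    seen = set(text.lower().split())
    extras = []
    for tag in tags:
        tag = tag.strip()
        # Skip tags that already appear as a word in the caption
        # or that were added before.
        if not tag or tag.lower() in seen:
            continue
        seen.add(tag.lower())
        extras.append(tag)
    # Drop tags from the end until the result fits into max_len.
    for count in range(len(extras), 0, -1):
        joined = ", ".join(extras[:count])
        candidate = f"{text} ({joined})" if text else joined
        if len(candidate) <= max_len:
            return candidate
    return text[:max_len]


print(build_alt_text("a person holding a phone", ["person", "phone", "hand", "indoor"]))
```

The generated text is best treated as a draft: you would still review it and add the search terms you actually want to rank for.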

With Microsoft Azure Cognitive Services Image Captioning, you can extract relevant keywords from an image using AI-based object recognition. This can be done with the Common Tags application.

In this example, the common tags are displayed together with percentage confidence values under Detected Attributes. These values describe how likely it is that the keyword applies to the image content.
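
The confidence values can be used to keep only the most reliable keywords. The sketch below assumes that each tag is available as a dictionary with the keys `name` and `confidence` (a value between 0 and 1); the function name, the threshold of 0.8 and the limit of 10 are arbitrary choices for illustration.

```python
def select_tags(tags, min_confidence=0.8, limit=10):
    """Return tag names at or above the threshold, most confident first."""
    kept = [t for t in tags if t["confidence"] >= min_confidence]
    kept.sort(key=lambda t: t["confidence"], reverse=True)
    return [t["name"] for t in kept[:limit]]


sample_tags = [
    {"name": "laptop", "confidence": 0.99},
    {"name": "indoor", "confidence": 0.62},
    {"name": "keyboard", "confidence": 0.95},
]
print(select_tags(sample_tags))
```

A lower threshold yields more related and generic terms, while a higher one keeps the list close to the main subject of the image.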

In the Microsoft Azure Cognitive Services Vision Studio, objects and people are recognized based on AI. Direct designations, related designations and generic terms in the image context are output as keywords.

Instructions: How to use Vision Studio via Microsoft Azure Cloud

Step 1 - Create Microsoft Azure Account

To use Microsoft Azure Vision Studio for image captioning and common tags, you first need an account on the Microsoft Azure cloud. You can get one for free with a starting credit of 200 US dollars for 30 days. That is enough to generate texts for the images on your website.

Step 2 - Create Resource Group on Microsoft Azure Cloud

After you create an account with a subscription, make note of your subscription name. This can be Azure subscription 1, for example. Then go to the Resource Groups menu item in the Cloud Console. Click the Create button at the top.

Create a new resource group in the Microsoft Azure Cloud Console.

In the new resource group, select your subscription, set a name for the resource group and, very importantly, select East US or West US (nothing else) as the region under Resource details. The reason: Vision Studio is served from these server locations, among others, and the application will not work if you select a different region.

Select your subscription and name your resource group.


Step 3 - Open the Microsoft Azure Cognitive Services Vision Studio

Now open the Microsoft Azure Cognitive Services Vision Studio and select Add captions to images (Image Captioning) if you want to generate full sentences as image captions, or Extract common tags from images (Common Tags) if you want to extract the most important keywords from an image.

Step 4 - Select Resource Group on Microsoft Azure Cloud

Before you can upload an image, you must first select the resource group that you just created. Confirm that the use of Vision Studio will be billed against your free starting credit on the Microsoft Azure cloud, then select your resource group under Please select a resource.

In the Microsoft Azure Cognitive Services Vision Studio, click Please select a resource.

Now select your Microsoft Azure subscription and click on Create a new resource under Azure Resources.

Create a new resource, which you can then assign to your previously created resource group.


Give your new Azure resource a name and select your Azure subscription (Subscription). Then, under Resource group, select the resource group that you previously created on the Microsoft Azure cloud. For Location, select the server region that you configured for your resource group (that is, East US or West US). Select S0 as the pricing tier; this is the standard billing against your free starting credit. Then click on Create resource. Done! Now you can upload your images and use the Image Captioning and Common Tags features.

Now give the resource a name, select your previously created resource group from the drop-down menu, set the server location (East US or West US) and use S0 as the pricing tier.
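
If you document or script your setup, the settings from steps 2 to 4 can be written down and checked before you create anything. The following Python sketch only validates values locally; it does not call Azure. The class and function names are made up for this example, and the two allowed locations simply reflect the restriction described in this article.

```python
from dataclasses import dataclass

# Assumption taken from this article: only these two regions are used here.
ALLOWED_LOCATIONS = {"eastus", "westus"}


@dataclass(frozen=True)
class VisionResourceSettings:
    name: str
    subscription: str
    resource_group: str
    location: str
    pricing_tier: str = "S0"


def normalize_location(location):
    """Turn a display name such as 'East US' into the short form 'eastus'."""
    return location.replace(" ", "").lower()


def check_settings(settings):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not settings.name.strip():
        problems.append("resource name is empty")
    if not settings.resource_group.strip():
        problems.append("resource group is empty")
    if normalize_location(settings.location) not in ALLOWED_LOCATIONS:
        problems.append(f"location {settings.location!r} is not East US or West US")
    if settings.pricing_tier != "S0":
        problems.append(f"pricing tier {settings.pricing_tier!r} is not S0")
    return problems


settings = VisionResourceSettings(
    name="image-seo-vision",            # hypothetical resource name
    subscription="Azure subscription 1",
    resource_group="image-seo",         # hypothetical resource group name
    location="East US",
)
print(check_settings(settings))
```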

Image SEO with image captioning via Microsoft Vision Studio

How can you automate and systematize your image captions for image SEO so you get results fast?


What does image captioning with Microsoft Azure Vision Studio do for your image SEO?

Image search can account for a good 10% of all organic clicks from Google, and image captions with strong keywords are what make your images findable there.

This means: if you don't get many clicks from Google image search yet, you can generate up to roughly 10% more organic clicks by creating image captions and enriching them with keywords that have a high search volume.

Image captioning with Microsoft Azure Vision Studio helps you to analyze the content of the image based on AI and to create complete sentences and keywords. You can enrich these text modules with keywords and use them for your SEO.
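
To work through many images systematically, it can help to collect the generated captions and tags in a spreadsheet, where they can be reviewed and enriched with keywords. The Python sketch below writes such a sheet as CSV. The column names and the function name are assumptions for illustration, and the input list stands in for results you have already collected.

```python
import csv
import io

FIELDS = ["image_url", "caption", "tags"]


def write_caption_sheet(results, out):
    """Write one CSV row per image so captions can be reviewed and edited."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for item in results:
        writer.writerow({
            "image_url": item["image_url"],
            "caption": item["caption"],
            # Tags go into one cell, separated by semicolons.
            "tags": "; ".join(item.get("tags", [])),
        })


buffer = io.StringIO()
write_caption_sheet(
    [{"image_url": "https://example.com/a.jpg",
      "caption": "A laptop with a keyboard",
      "tags": ["laptop", "keyboard"]}],
    buffer,
)
print(buffer.getvalue())
```

Passing an open file instead of the `StringIO` buffer writes the same content to disk.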

Summary: Microsoft Azure Image Captioning

With the Image Captioning application on the Microsoft Azure Cloud, you have the option of automatically analyzing and labeling images in order to use the text templates for your image search engine optimization. To do this, you should first create a free Azure account and create a resource group on the Azure cloud. You can then upload images to Vision Studio and use Image Captioning to generate a caption that summarizes the content of the image.

Image captioning with Microsoft Azure offers companies two main advantages. First, it simplifies writing captions for your image SEO by summarizing the content of images automatically and quickly. Second, it can help improve image accessibility by making image content available to people who cannot see the image (via screen readers that read out the image's ALT tag). Overall, Microsoft Azure offers a powerful tool for automatically analyzing and labeling images with its Computer Vision service.
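
Once a caption has been reviewed, it still has to end up in the page markup. The following Python sketch renders an `img` element with the caption as ALT text and escapes special characters so that quotation marks in a caption cannot break the HTML. The function name `img_tag` and the example path are hypothetical.

```python
from html import escape


def img_tag(src, alt, title=None):
    """Render an <img> element with safely escaped attribute values."""
    attrs = [
        f'src="{escape(src, quote=True)}"',
        f'alt="{escape(alt, quote=True)}"',
    ]
    if title:
        attrs.append(f'title="{escape(title, quote=True)}"')
    return "<img " + " ".join(attrs) + ">"


print(img_tag("/img/phone.jpg", "A person holding a phone"))
```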

FAQ - Summary of Image Captioning with Microsoft Azure Vision Studio

Here you will find a summary of important and frequently asked questions about image captioning with Microsoft Azure Vision Studio:

With image captioning on the Microsoft Azure Cloud, you can automatically label your images in Vision Studio. People and objects are automatically recognized in the image, as well as the context in which they stand.

With Common Tags on the Microsoft Azure Cloud in Vision Studio you can tag uploaded images. This gives you numerous suitable keywords as well as synonyms and generic terms for the content of your image for your image tags in the area of search engine optimization (SEO).

With the automated image captions via Image Captioning on the Microsoft Azure Cloud, you can make the work of finding SEO captions for your images easier, since you get a complete sentence for your image that you can use as an ALT tag, for example.

With the Microsoft Azure Vision Studio, you can use the Image Captioning function for your images with an Azure account, which gives you an automated text description of your image. You can also automatically tag your images with common tags.
