3 reasons for copywriters to relax: keyword density, TF-IDF, and semantic search

Elena Dryamina

11 years ago

Content is for humans. But humans find content by using search engines.

For each query, Google retrieves the most relevant pages from its databases. As a professional SEO copywriter, you do believe that your webpage is the best one out of all indexed pages. However, Google’s bots might not share your opinion.

To avoid misunderstandings with search engine bots, you try to understand what they take into account when assessing your content’s quality.

But sometimes, while analysing ranking factors, you get confused. Are you targeting the right keywords? Should you add more of them to your post? Oh, might this be considered spam? Should you remove them, then?

Do you want to focus on producing SEO-friendly content without constantly wondering whether it will pass search engine crawlers’ checks? Then read on!

In this post, we will take a closer look at keyword density, TF-IDF and semantic search. These three concepts were my own biggest sources of SEO confusion, so today I’d like to discuss how to approach them.

Should keyword density stay or go?

“Keyword density is an SEO myth. There is no ideal percentage of keywords. Measuring the percentage of keywords is a waste of time.” This is what we hear most often about keyword density, and it sounds quite logical.

Search engines are smart, and they use complicated information retrieval mechanisms. That’s why it would be naive to believe that we can seduce search bots by incorporating more keywords into a webpage.

So, dear SEO copywriters, breathe a deep sigh of relief, and forget about counting keywords! Instead, focus on their placement in your copy.

To make your content more understandable for bots and more eye-catching for humans, make sure that your target keywords appear in:

(Strongly recommended)

Title tag
Meta description
H1 tag

(Optional, if relevant)

Sub-headings (h2, h3, etc.)
ALT texts and names of images
The URL of the webpage

As for keyword usage in the <body> element of your webpage, it is crucial not to overuse terms.

This is where keyword density analysis tools come in handy. They help us avoid keyword stuffing by reporting the count and percentage of a keyword or phrase within a webpage, its body, title, meta description, URL, etc.

Here are a few tools you can use to check whether your copy is overloaded with particular keywords:

Keyword Density Analysis tool by Internet Marketing Ninjas
SEOquake plugin
Keyword Density Analyzer by SEO Book

By the way, if you ever see that your copy has too many “to”, “you”, “and”, don’t worry: these are common function words (often called stop words) that appear in almost any text and say little about its topic.
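To see the kind of numbers a keyword density tool reports, here is a minimal Python sketch. It assumes a deliberately naive tokenizer (lowercase letters only, no stemming) and one common definition of density: the count of a word divided by the total number of words, as a percentage. Real tools may tokenize and count differently; the function name `keyword_density` and the sample copy are made up for illustration.

```python
import re
from collections import Counter

def keyword_density(text: str, top_n: int = 5) -> list[tuple[str, int, float]]:
    """Return the top_n most frequent words as (word, count, density in percent)."""
    words = re.findall(r"[a-z]+", text.lower())  # naive: lowercase letters only, no stemming
    if not words:
        return []
    counts = Counter(words)
    return [(word, n, 100 * n / len(words)) for word, n in counts.most_common(top_n)]

sample_copy = ("A folding bike is a bike that you can fold and carry "
               "to the office, and you can store it at home.")
for word, n, pct in keyword_density(sample_copy):
    print(f"{word:<6} {n}  {pct:.1f}%")
```

In this 22-word sample, each of the five most frequent words appears twice (about 9.1% each), and only “bike” says anything about the topic; the others are exactly the kind of function words mentioned above.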

Should I worry about TF-IDF?

The TF-IDF algorithm goes beyond counting how many times a term appears on a webpage. It determines a term’s significance (or weight) by multiplying its frequency within a specific document by a measure of how rare the term is across the entire collection of documents (the corpus).

Terms that appear in only a small group of documents will have a higher TF-IDF weight than common words such as articles and prepositions.

Summing the TF-IDF scores of all query terms in a search phrase can serve as one of the ranking factors.

Term Frequency (TF) shows a keyword or phrase’s importance within a single document. In its simplest form, it is the raw frequency:

$$\mathrm{TF} = \text{number of times the word appears in the document}$$

In order to normalize the TF value with respect to the document length, term frequency can be divided by the total number of terms in a document:

$$\mathrm{TF} = \frac{\text{number of times the word appears in the document}}{\text{total number of words in the document}}$$

There are other ways to calculate term frequency as well, but in this post we will use the two variants above.
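The two TF variants can be sketched in a few lines of Python. The tokenizer is an assumption (lowercase alphabetic words only, no stemming), and the helper names `tokenize`, `tf_raw` and `tf_normalized` are hypothetical.

```python
import re

def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep only alphabetic words (naive, no stemming)."""
    return re.findall(r"[a-z]+", text.lower())

def tf_raw(term: str, tokens: list[str]) -> int:
    """Variant 1: number of times the term appears in the document."""
    return tokens.count(term)

def tf_normalized(term: str, tokens: list[str]) -> float:
    """Variant 2: raw count divided by the total number of words."""
    return tokens.count(term) / len(tokens) if tokens else 0.0

doc = tokenize("Folding bikes fold. A folding bike is compact.")
print(tf_raw("folding", doc))         # 2
print(tf_normalized("folding", doc))  # 0.25 (2 of 8 words)
print(tf_raw("bike", doc))            # 1 ("bikes" is counted as a different word)
```

Note that this naive tokenizer treats “bike” and “bikes” as different words; whether a real system merges word forms depends on its text processing.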

The Inverse Document Frequency (IDF) represents a term’s significance over a corpus, and it can be calculated based on the following formula:

$$\mathrm{IDF} = \log_{10}\left(\frac{\text{number of documents in the corpus}}{\text{number of documents containing the word}}\right)$$

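Dividing the corpus size by the number of documents that contain the word gives rare terms more weight than frequent ones, and the logarithm (here base 10) dampens that effect so the weight grows slowly instead of in proportion to the ratio. A word that appears in every document gets an IDF of log(1) = 0.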

What did you say?! Logarithm?!!

No worries, guys! I wasn’t a big fan of logarithmic equations in high school either. So, pinky swear: no complex calculations.
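If you’d still like to see what the logarithm does, here is a tiny Python sketch of the IDF formula applied to a hypothetical corpus of 1,000 documents. The function name `idf` is illustrative.

```python
import math

def idf(n_docs: int, n_docs_with_term: int) -> float:
    """Inverse document frequency with a base-10 logarithm."""
    if n_docs_with_term <= 0:
        raise ValueError("the term must appear in at least one document")
    return math.log10(n_docs / n_docs_with_term)

# A hypothetical corpus of 1,000 documents
for label, containing in [("rare term", 10), ("in half the docs", 500), ("in every doc", 1000)]:
    print(f"{label:<17} IDF = {idf(1000, containing):.3f}")
# prints 2.000, 0.301 and 0.000 respectively
```

A term found in every document gets a weight of zero, while the rare term gets the largest weight.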

To get the TF-IDF score, we should simply multiply Term Frequency by Inverse Document Frequency:

Formula 1

$$\text{TF-IDF score} = \left(\text{number of times the word appears in the document}\right) \times \log_{10}\left(\frac{\text{number of documents in the corpus}}{\text{number of documents containing the word}}\right)$$

Formula 2

$$\text{TF-IDF score} = \frac{\text{number of times the word appears in the document}}{\text{total number of words in the document}} \times \log_{10}\left(\frac{\text{number of documents in the corpus}}{\text{number of documents containing the word}}\right)$$
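Putting the pieces together, the sketch below applies Formula 1 and Formula 2 to a made-up corpus of four one-sentence documents and ranks them by the summed TF-IDF of the query terms, the simple ranking approach mentioned earlier. Everything here (the documents, the tokenizer, the helper names, and the choice to give a term that appears in no document an IDF of 0) is an assumption for illustration, not how Google scores pages.

```python
import math
import re

def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep only alphabetic words (naive, no stemming)."""
    return re.findall(r"[a-z]+", text.lower())

def idf(term: str, corpus: list[list[str]]) -> float:
    """log10(documents in corpus / documents containing the term); 0.0 if none contain it."""
    containing = sum(1 for doc in corpus if term in doc)
    return math.log10(len(corpus) / containing) if containing else 0.0

def tfidf_formula1(term: str, doc: list[str], corpus: list[list[str]]) -> float:
    """Formula 1: raw term count times IDF."""
    return doc.count(term) * idf(term, corpus)

def tfidf_formula2(term: str, doc: list[str], corpus: list[list[str]]) -> float:
    """Formula 2: term count divided by document length, times IDF."""
    return doc.count(term) / len(doc) * idf(term, corpus)

def query_score(query, doc, corpus, formula=tfidf_formula1) -> float:
    """Sum the TF-IDF of every query term for one document."""
    return sum(formula(term, doc, corpus) for term in tokenize(query))

docs = [
    "A folding bike folds in seconds. This folding bike fits under a desk.",
    "The best bike for urban cycling is a light road bike.",
    "Buses and trains are the backbone of urban transport.",
    "Folding mechanisms decide how compact a bike becomes.",
]
corpus = [tokenize(d) for d in docs]
query = "folding bike"

for formula in (tfidf_formula1, tfidf_formula2):
    ranking = sorted(range(len(corpus)),
                     key=lambda i: query_score(query, corpus[i], corpus, formula),
                     reverse=True)
    print(formula.__name__, ranking)  # both print [0, 3, 1, 2] for this corpus
```

Here “folding” appears in two of the four documents and “bike” in three, so “folding” gets the higher IDF and contributes more to each score.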

Let’s use these formulas and our imagination to better understand what TF-IDF means.

Imagine we want to write an article using the term “folding bike” and calculate its weight. We also have an imaginary collection of 10,000 webpages related to transport, of which only 800 contain both “folding” and “bike”. So we already have the data for the IDF calculation:

the number of documents in a corpus: 10,000
the number of documents containing a term: 800.

One more imagined condition: three pages from our collection appear at the top of Google’s search results for the query “folding bike”. Logically, the webpage with the highest TF-IDF should pop up in first position, and the one with the lowest score should appear in third position. Let’s check this idea.

For this purpose, let’s create a term count table containing the number of times the words “folding” and “bike” appear in each webpage, as well as the total number of words per article:

Position	URL of a document	folding	bikes	total words
1	https://en.wikipedia.org/wiki/Folding_bicycle	64	33	2324
2	http://www.independent.co.uk/life-style/the-10-best-folding-bikes-8683766.html	17	14	1197
3	http://www.nycewheels.com/compact-bikes.html	31	44	1087

*The SEO Keyword Density checker was used for counting words.

Now, let’s use this data to calculate the TF-IDF score separately for the words “folding” and “bike” in each document, and then sum up the two values.

It’s time to plug the data into our formulas. As promised, the logarithms won’t require any extra effort from you. Just type the calculation into Google’s search bar, which works as a calculator, and you’ll get the TF-IDF almost automagically!

Using Formula 1, we get the following results:

Position	URL	TF-IDF folding	TF-IDF bikes	TF-IDF Sum
1	https://en.wikipedia.org/wiki/Folding_bicycle	70.2022408325	36.1980304293	106.400271262
2	http://www.independent.co.uk/life-style/the-10-best-folding-bikes-8683766.html	18.6474702211	15.3567401821	34.0042104032
3	http://www.nycewheels.com/compact-bikes.html	34.0042104032	48.2640405724	82.2682509756

Surprisingly, the webpage from Independent.co.uk, which has a lower TF-IDF score than the article from Nycewheels.com, appears higher in search results.

Using Formula 2, the Nycewheels.com article gets the highest TF-IDF score, but the post by Independent.co.uk still has the lowest.

Position	URL	TF-IDF folding	TF-IDF bikes	Sum
1	https://en.wikipedia.org/wiki/Folding_bicycle	0.03020750466	0.01557574459	0.04578324925
2	http://www.independent.co.uk/life-style/the-10-best-folding-bikes-8683766.html	0.01557850477	0.01282935687	0.02840786164
3	http://www.nycewheels.com/compact-bikes.html	0.03128262226	0.04440114128	0.07568376354
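To double-check both tables, here is a short Python sketch that recomputes the sums from the term count table. It assumes, as the article’s numbers imply, a single IDF shared by both words (10,000 pages, 800 of which contain both “folding” and “bike”); the dictionary layout and function names are illustrative.

```python
import math

# Term count table from the article: position -> (site, "folding", "bike(s)", total words)
PAGES = {
    1: ("en.wikipedia.org", 64, 33, 2324),
    2: ("independent.co.uk", 17, 14, 1197),
    3: ("nycewheels.com", 31, 44, 1087),
}

# Imaginary corpus: 10,000 pages, 800 of them containing both words,
# so both words share the same IDF.
IDF = math.log10(10_000 / 800)

def sum_formula1(folding: int, bike: int) -> float:
    """Formula 1: raw counts times IDF, summed over the two words."""
    return folding * IDF + bike * IDF

def sum_formula2(folding: int, bike: int, total: int) -> float:
    """Formula 2: counts normalized by article length, times IDF, summed."""
    return (folding / total) * IDF + (bike / total) * IDF

for position, (site, folding, bike, total) in PAGES.items():
    print(f"{position}. {site:<18} "
          f"formula 1: {sum_formula1(folding, bike):.4f}  "
          f"formula 2: {sum_formula2(folding, bike, total):.5f}")

# Positions sorted from highest to lowest TF-IDF sum
rank_f1 = sorted(PAGES, key=lambda p: sum_formula1(*PAGES[p][1:3]), reverse=True)
rank_f2 = sorted(PAGES, key=lambda p: sum_formula2(*PAGES[p][1:4]), reverse=True)
print("Order by formula 1:", rank_f1)  # [1, 3, 2]
print("Order by formula 2:", rank_f2)  # [3, 1, 2]
```

Under both formulas, the page in search position 2 (Independent.co.uk) comes last, which is the mismatch with the imaginary search results discussed above.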

Of course, this is just an imaginary example. We don’t know the actual size of the corpus or the number of relevant documents Google analysed before listing search results. But hypothetically, we can still assume that, because of the low term frequency in the Independent.co.uk document, its TF-IDF score would be much lower than that of the Nycewheels.com post.

Clearly, TF-IDF is just one part of Google’s huge and complex information retrieval and analysis mechanisms, and it doesn’t have a direct impact on rankings.

So don’t stress about your scores; just try to make your content more understandable. How? Semantically, of course!

Semantic research is the answer

With the arrival of the Hummingbird algorithm, the keyword era ended. Since then, only user intent matters. What is the intent behind a user’s search query? This is what you should keep in mind while editing your copy.

Your content should target a specific intent and answer a particular user’s question. Keywords are just tools for modelling your topic and helping bots extract contextual meaning.

For example, there are all kinds of synonyms and related terms for the phrase “folding bikes”:

Synonyms and related types

Folding bicycle

Compact bicycle

Portable bike

Collapsible bike

Electric bike

Mountain bike

Road bike

Antonyms

Non-folding models

Standard bike

Regular bike

General notions

Urban transport

Cycling

Features

Folding mechanisms

Folding speed

Folding ease

Compactness

Compact mode

Folding version

Portability

Off-road riding

Material and elements

Aluminium frame

Disc brakes

Gears

Folding pedals

Steel model

BMX wheels

Upgradeable frame

Brands

DAHON JIFO

BIRDY TOURING DISC

STRIDA SX

And more!

You can combine the same keywords to cover different user needs. Decide what exactly you’d like to cover: the best folding bike; where to buy a folding bike; the cheapest folding bike; folding bikes vs. standard bikes; a folding bike for urban cycling; the best folding bike model for off-road riding; steel vs. aluminium models; or something else.

Once you have a direction, make sure that you fully answer the user’s question with information-rich, well-structured content. This way, the right keywords will come naturally.

Final Thoughts

Ranking factors and search algorithms constantly evolve and might seem confusing. But they all aim to provide users with the best answers. So, if you do the same, relax and create great content:

Instead of stuffing your target keywords all over the webpage, make sure you place them in the title and meta description tags, as well as in headings.
Since less frequent terms might carry more weight, you can try more niche, low-volume keywords for your content.
User intent is everything; keywords are just tools.

What is the most confusing thing in SEO for you? Share it with us in the comments.