TF-IDF, or TF.IDF
Don't be confused; like the artist formerly known as Prince, it goes by more than one name, but it's all the same concept under the hood.
Before taking a closer look at it, you first need to understand why it rose to prominence.
Why Is TF-IDF an In-Demand Ranking Factor?
Few people know that TF-IDF is a concept reportedly used in Google's ranking algorithm, in other words, a factor that can influence how your content ranks over a long period.
There is a rumor about search engines (as with many other rumors) that they focus more on term frequency than on simple keyword counting. It may be right or wrong; we are not putting a certified stamp on it either way.
And this is what makes the significance of TF-IDF clear.
But by now I hope you have an idea of why TF-IDF has risen, so let's move on to our next topic.
What is TF-IDF?
TF-IDF stands for Term Frequency – Inverse Document Frequency, a measure search engines can use to better understand content that might otherwise be undervalued.
The main aim of this blog is to help SEO experts and content creators understand TF-IDF and how a search engine can use this factor to rank content.
According to Onely:
TF-IDF is an information retrieval technique that weighs a term's frequency (TF) and its inverse document frequency (IDF). Each word or term has its respective TF and IDF score.
Now, I know you want to jump to the last topic of this guide, so let's start.
How to Calculate TF-IDF?
The calculation part of TF-IDF is a little clumsy, but don't worry, because here I have put together the simplest guide on the internet.
TF-IDF is made up of two terms:
Term Frequency (TF): The first term is Term Frequency, which is the number of times a word appears in a document divided by the total number of words in that document.
TF = (Number of times the term appears in the document) / (Total number of terms in the document)
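The TF formula above can be sketched in a few lines of Python. This is an illustrative example, not code from any search engine; the function name and the toy document are my own:

```python
from collections import Counter

def term_frequency(term: str, document: list[str]) -> float:
    """TF = count of the term in the document / total terms in the document."""
    counts = Counter(document)
    return counts[term] / len(document)

# A toy document of 10 words in which "cat" appears twice:
doc = "the cat sat on the mat near the other cat".split()
print(term_frequency("cat", doc))  # 2 / 10 = 0.2
```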
Inverse Document Frequency (IDF): The second term is calculated as the logarithm of the number of documents in the corpus divided by the number of documents in which the specific term appears.
IDF = log(Total number of documents / Number of documents with term t in it), where log here is the base-10 logarithm.
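A minimal sketch of the IDF formula, again using names of my own choosing and a base-10 logarithm. The small corpus is just for illustration:

```python
import math

def inverse_document_frequency(term: str, corpus: list[list[str]]) -> float:
    """IDF = log10(total documents / documents containing the term)."""
    docs_with_term = sum(1 for doc in corpus if term in doc)
    return math.log10(len(corpus) / docs_with_term)

# "cat" appears in 2 of these 4 documents, so IDF = log10(4 / 2) ≈ 0.301
corpus = [["cat", "sat"], ["dog", "ran"], ["cat", "nap"], ["bird", "flew"]]
print(inverse_document_frequency("cat", corpus))
```

Note that a rarer term gets a larger IDF: a term found in every document scores log10(1) = 0, so it carries no weight at all.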
Still not clear?
Let’s take an example:
Assume there is a document containing 100 words, in which the word "cat" appears 3 times. The Term Frequency for "cat" is then (3/100) = 0.03.
Now assume we have 10 million documents and the word "cat" appears in one thousand of them. The IDF is then log10(10,000,000 / 1,000) = 4.
So the TF-IDF weight is the product of these quantities: 0.03 × 4 = 0.12.
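The worked example above can be checked directly in Python. The function below is a hypothetical helper of my own, assuming the base-10 logarithm used in the example:

```python
import math

def tf_idf(tf: float, total_docs: int, docs_with_term: int) -> float:
    """TF-IDF weight = TF * log10(total_docs / docs_with_term)."""
    return tf * math.log10(total_docs / docs_with_term)

# The article's example: TF("cat") = 3/100, a corpus of 10 million documents,
# and "cat" appearing in 1,000 of them.
weight = tf_idf(3 / 100, 10_000_000, 1_000)
print(round(weight, 2))  # 0.12
```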