Google’s John Mueller Discusses TF-IDF Algo

    Google’s John Mueller discussed the role of TF-IDF in Google’s algorithm. He discussed what it was and offered a better way to optimize for ranking web pages.

    What is TF-IDF?

    Wikipedia has a concise definition of what TF-IDF is:

    “…tf–idf or TFIDF, short for term frequency–inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection… The TF-IDF value increases proportionally to the number of times a word appears in the document and is offset by the number of documents in the corpus that contain the word, which helps to adjust for the fact that some words appear more frequently in general.”

    The key thing to focus on is that TF-IDF is a metric related to the entire “collection” or “corpus.” That means all the web pages containing a specific word or phrase. In the case of web search, this means that the metric depends on how often the word or phrase appears in every web page that exists online. This is a statistical analysis.

    That part about “some words appear more frequently in general” refers to how TF-IDF discounts commonly used words (such as and, a, and the). Because those words appear in nearly every document, their scores approach zero, which makes TF-IDF useful for removing them from consideration for ranking purposes.
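    As a rough illustration of the definition above, the following minimal sketch computes TF-IDF over a tiny made-up corpus. The three "pages," the whitespace tokenizer, and the helper names (`tokenize`, `tf`, `idf`, `tf_idf`) are assumptions for illustration only. This is the plain textbook formula with a natural-log IDF, one of several common variants, and nothing here reflects how Google computes anything.

```python
import math

# Hypothetical three-page corpus, invented for illustration only.
CORPUS = {
    "page_a": "the cat sat on the mat",
    "page_b": "the dog chased the cat",
    "page_c": "the bird sang a song",
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tf(term, tokens):
    """Term frequency: share of the document's tokens that equal `term`."""
    return tokens.count(term) / len(tokens)


def idf(term, corpus):
    """Inverse document frequency: log(total docs / docs containing `term`)."""
    containing = sum(1 for text in corpus.values() if term in tokenize(text))
    if containing == 0:
        return 0.0
    return math.log(len(corpus) / containing)


def tf_idf(term, doc_id, corpus):
    """TF-IDF of `term` in one document, measured against the whole corpus."""
    return tf(term, tokenize(corpus[doc_id])) * idf(term, corpus)


for word in ("the", "cat", "mat"):
    print(word, round(tf_idf(word, "page_a", CORPUS), 4))
```

    In this toy corpus, "the" appears on every page, so its IDF is log(1) = 0 and its score drops to zero no matter how often it is repeated. "mat" appears on only one page and therefore scores higher than "cat," which appears on two.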

    TF-IDF is used to create statistical averages of the use of words and phrases throughout the web. It’s not the magic content solution that some people have suggested.

    Here is the question Mueller was asked:

    “What are your thoughts on TF-IDF keywords? Does Google use a similar mechanism?

    Should we make use of this to make our content better?”

    John Mueller answered:

    “…TF-IDF keywords is essentially a metric that is used in information retrieval.”

    That reference to “information retrieval” is a reference to the general field of information retrieval. This includes the science of searching through a Gmail inbox. Information retrieval is a broad term.

    Then he said this:

    “With regards to trying to understand which are the relevant words on a page, we use a ton of different techniques from information retrieval. And there’s tons of these metrics that have come out over the years.”

    This is a hint that focusing on a single old metric, one mainly useful for finding “stop words,” is not productive, because Google uses many other techniques.

    TF-IDF and Ranking in Google

    “…My general recommendation here is not to focus on these kinds of artificial metrics… because it’s something where on the one hand you can’t reproduce this metric directly because it’s based on the overall index of all of the content on the web.

    So it’s not that you can kind of like say well, this is what I need to do, because you don’t really have that metric overall.”

    This means that site owners cannot reproduce the TF-IDF values Google would compute, because those values depend on statistics from Google’s entire index of the web.
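    To make the point about corpus dependence concrete, here is a small sketch that scores the same page against two different made-up collections. The page text, both "crawls," and the `tf_idf` helper are hypothetical and use the plain textbook formula. The only claim is that the number changes when the collection changes, so a score computed from a handful of competitor pages says little about a score computed over a web-scale index.

```python
import math


def tf_idf(term, page, corpus):
    """Textbook TF-IDF of `term` in `page`, measured against `corpus` (list of strings)."""
    tokens = page.lower().split()
    tf = tokens.count(term) / len(tokens)
    containing = sum(1 for doc in corpus if term in doc.lower().split())
    return tf * math.log(len(corpus) / containing) if containing else 0.0


# The page being scored (hypothetical).
page = "tf-idf is a metric used in information retrieval"

# Two hypothetical collections that both include the page.
small_crawl = [page, "how to bake bread", "best hiking trails", "used car prices"]
seo_crawl = [page, "tf-idf explained for seo", "is tf-idf a ranking metric", "tf-idf tools compared"]

score_small = tf_idf("tf-idf", page, small_crawl)
score_seo = tf_idf("tf-idf", page, seo_crawl)

print("general crawl:", round(score_small, 4))
print("seo-only crawl:", round(score_seo, 4))
```

    Against the general collection the term is rare and scores well above zero. Against the collection where every page mentions it, the same term on the same page scores exactly zero.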

    John Mueller Recommendations for Ranking Better

    John Mueller went on to describe a better alternative to focusing on TF-IDF:

    “Instead, I would strongly recommend focusing on your website and its users and making sure that what you’re providing is something that Google will in the long term still recognize and continue to use as something valuable.”

    Mueller revealed that this is a very old metric, implying that modern information retrieval has become more sophisticated:

    “The other thing is… this is a fairly old metric and things have evolved quite a bit over the years. …there are lots of other metrics as well.”

    Then he said that focusing on users is a better approach because it’s immune to algorithm changes. Google is focused on delivering the most useful search results. If you focus on useful content, the page will likely remain popular and continue to be shown in Google.

    Here’s what Mueller said:

    “So just blindly focusing on just one kind of theoretical metric and trying to squeeze those words into your pages, I don’t think that’s a useful thing.

    I think that’s very shortsighted thinking because you’re focusing just purely on a search engine where you think that these words have a stronger effect.

    So, don’t just focus on artificially adding keywords. Make sure that you’re doing something where all of the new algorithms will continue to look at your pages and say, well this is really awesome stuff. We should show it more visibly in the search results.”

    TF-IDF and SEO

    • A major use for TF-IDF is finding stop words like a, the, and and.
    • This is an old and basic content metric.
    • There are many other content metrics that are better than the basic and simple TF-IDF metric

    In a world where AI, neural networks and machine learning are the norm, TF-IDF is like a kid’s bike on training wheels compared to a Ferrari.

    Mueller referenced its use for weeding out stop words (i.e. words like and, the, and that). That seems a fitting use for such an old technology. A basic algo like this could very well be limited to contributing to the simple task of identifying stop words.

    We can’t know for sure, but the fact that Mueller mentioned TF-IDF in the context of stop word removal and didn’t mention any other context is meaningful.

    Watch the Google Webmaster Hangout here.

    Screenshots by Author, Modified by Author
