Our client Pickyourtrail shares this interesting case study on how the company solved the problem of Google not crawling their React JS site.
A little background
Pickyourtrail website offers travel solutions/packages and also has a destination guide for travelers.
The Pickyourtrail team came up with this solution because both of their ‘products’ are built on different stacks.
The stacks don’t use Java; instead, they are built on React/Angular with Node at the backend. When they built the product, they shifted to React, even though Java was an established language and Node was still new. While they didn’t have an issue with Java and saw no reason to move out of a Java architecture, they did face a challenge.
Here is the case study, as written by team Pickyourtrail, and we are pretty sure you will connect with it the same way we did 🙂
Google was not crawling all the pages on our site rendered with ReactJS. Take this page as an example: https://pickyourtrail.com/guides/bali is powered by React, and the data for the application depends on our backend APIs.
When we did our research, we found that most articles talked about fetch and render in Google Webmaster Tools. When we tried this solution on our page, it seemed to work, but not entirely: we found that the content of the page was not indexed.
Methods like SSR (server-side rendering) or pre-rendering were the most common ways people solved this issue. In our case, however, these were time-consuming and not feasible, so we decided to come up with our own solution.
It was strange.
We analyzed the pages that were indexed to understand the behavior:
The above GSC screenshot shows the Googlebot view (left) and visitor view (right). It is obvious from this that Googlebot is unable to read/crawl the content rendered based on API response.
So the API call seemed to be creating the issue. Refer to the test below:
The React pages we built fall under the last category: those dependent on an AJAX API data call. The entire page took more than 5 seconds to render completely, since it had 20–25 images, a couple of embedded Google Maps, a carousel, etc.
To ensure we had hit upon the real issue, we tried removing all the images and the embedded Google maps from the page. We also removed some parts of the response payload to reduce the size of the DOM tree.
We tried submitting the page to Google again.
At last, Googlebot was able to see what the user would see. The bot waited exactly 5 seconds and crawled whatever was available in the DOM. Now we knew clearly which way to head: the entire page had to load within 5 seconds at a reasonable internet speed.
And this is how the test worked:
The golden ‘5 seconds’ and how we came to it
Getting the entire page to load in 5 seconds, with its dependence on the API response, was quite a challenge.
We used React Router for routing, and the page functions as below:
- The page loads.
- An API call is made based on the requested URL.
- Content is rendered based on the response received from the API.
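That route → API → render flow can be sketched as a plain function. Everything here is illustrative, not Pickyourtrail's actual code: the `/api/guides/` endpoint, the `loadGuidePage` helper, and the response fields are assumptions, and the fetch implementation is injectable so the API call can be stubbed.

```javascript
// Sketch of the flow above: the router extracts a slug from the URL,
// an API call is made, and content is produced from the response.
// `fetchImpl` is injectable so the API call can be stubbed for testing.
async function loadGuidePage(slug, fetchImpl) {
  // 1. The page loads and the requested URL yields a slug.
  const url = `/api/guides/${encodeURIComponent(slug)}`;

  // 2. An API call is made based on the requested URL.
  const response = await fetchImpl(url);
  const data = await response.json();

  // 3. Content is "rendered" from the response. In the real app this
  //    would be a React render; here we just return what the component
  //    would display.
  return { title: data.title, sections: data.sections };
}

// A stubbed fetch standing in for the real backend API:
const stubFetch = async (url) => ({
  json: async () => ({ title: 'Bali Travel Guide', sections: ['Overview'] }),
});
```

The point of the sketch is that nothing meaningful is in the DOM until that second step resolves, which is exactly what Googlebot's 5-second window constrains.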
Conclusion: Three different ways to approach this problem
Approach 1: Progressively loading the content
The entire page had one API call that gave a relatively large payload with the content. Initially, we had an idea of breaking this into multiple API calls to reduce the size of the payload. This would also load the content in such a way that the DOM tree would be minimal for the first load.
Since Googlebot would see one version of the content and the user another, our SEO (search engine optimization) team said NO to this plan.
Pros: Reduced initial page load time, better UX
Cons: Possibility of a negative SEO implication
Approach 2: Using a “Read more” button
We had another idea: load the main-fold content alone with a quick API call, put a ‘Read more’ call to action, and load the rest of the page’s content with another API call.
This would resolve the issue of cloaking, as the content would be the same for both the bot and the user until the user clicks the call-to-action button. This, however, leads to a very poor user experience, so it was a NO.
Pros: Reduced initial page load time, no risk of SEO penalization for cloaking
Cons: Leads to poor UX
Approach 3: Optimize everything, almost everything
We had to go back to the safe method, plain performance optimization, and this worked for us.
- Reduced the API response payload size.
- Applied the highest possible lossless compression to the images on the page.
- Removed every unwanted line and optimized the CSS.
- Cached the API response (though this didn’t have an impact from a crawlability standpoint).
- Lazy-loaded all the images and progressively loaded the embedded Google Maps.
- Enabled GZIP compression.
The page now loads well under 5 seconds; some images load a little after, but the initial load of the page is well within the ‘less than 5 seconds’ test.
We crossed our fingers and submitted it to Google Webmaster tools to fetch and render.
And it got crawled in a couple of days! 🙂
How did we handle meta title and description?
This was a client-side rendering project, and we had to dynamically rewrite the meta title and description values in index.html for every dynamic page. The create-react-app documentation suggests replacing the meta title and description values at the server, as seen below.
“Generating Dynamic <meta> Tags on the Server
Since server rendering is not supported by Create React App, you might wonder how to make <meta> tags dynamic and reflect the current URL. To solve this, we recommend adding placeholders into the HTML.
Regardless of the backend you use, on the server you can read index.html into memory and replace placeholders such as __OG_TITLE__ and __OG_DESCRIPTION__ with values corresponding to the URL. To ensure these placeholders are safely embedded into HTML, make sure to sanitize the interpolated values.
If you use a Node server, you can even share the route matching logic between the client and the server. The other option is to duplicate it, which works fine in simple cases.”
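As a rough sketch of what the docs describe: the placeholder names mirror the quote above, while `escapeHtml`, `renderIndex`, the inlined template, and the sample values are our own illustration (a real server would read index.html from disk).

```javascript
// Sanitize interpolated values so they are safe to embed into HTML,
// as the create-react-app docs advise.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Replace the __OG_TITLE__ / __OG_DESCRIPTION__ placeholders with
// values chosen for the current URL.
function renderIndex(template, { title, description }) {
  return template
    .replace('__OG_TITLE__', escapeHtml(title))
    .replace('__OG_DESCRIPTION__', escapeHtml(description));
}

// A minimal stand-in for the index.html template:
const template =
  '<title>__OG_TITLE__</title>' +
  '<meta name="description" content="__OG_DESCRIPTION__">';

const html = renderIndex(template, {
  title: 'Bali Travel Guide',
  description: 'Plan your Bali trip',
});
```

Because the replacement happens on the server, the bot sees the final title and description in the very first HTML response, with no dependence on client-side JavaScript.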
This solution, however, was not in our scheme of things, given the limitations of our architecture. So we looked for other options.
Using React Helmet to manage the meta tags
Be cautious when you use React Helmet: it replaces the meta tags only after it is invoked in the client’s browser. Although this happens in a fraction of a second, you need to know that it doesn’t change the meta tag values as they are first served from the server. One best practice while using React Helmet is to render the <Helmet> component as close to the top of the root element’s render method as possible.
React Helmet really worked for us. We were able to change the meta title and description values based on the response from our backend API calls. We ran tests on various meta descriptions as well, and they got crawled without any issues.
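As a rough sketch of how this looks in a component (JSX, assuming react-helmet is installed; the `GuidePage` component and its prop names are our illustration, not Pickyourtrail's actual code):

```jsx
import React from 'react';
import { Helmet } from 'react-helmet';

// Hypothetical guide page. <Helmet> is rendered near the top of the tree
// so the title and description taken from the API response are applied
// as early as possible after the client-side render.
function GuidePage({ guide }) {
  return (
    <div>
      <Helmet>
        <title>{guide.metaTitle}</title>
        <meta name="description" content={guide.metaDescription} />
      </Helmet>
      <h1>{guide.heading}</h1>
    </div>
  );
}

export default GuidePage;
```

Keep in mind the caveat above: these tags are rewritten in the browser, so the HTML first served still carries whatever was in index.html until Helmet runs.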