
In my multisite application, I need to serve a robots.txt file for each site. The implementation goes as follows:

1- Included a RobotsContent property of type textarea within the Start page.

2- Added a handler as given below, together with a web.config entry for the handler (a sketch of that entry follows the handler code).

using System.Linq;
using System.Web;
using EPiServer;
using EPiServer.ServiceLocation;
using EPiServer.Web;

public class RobotsTxtHandler : IHttpHandler
{
    // IHttpHandler instances are created by ASP.NET, so the dependencies are
    // resolved through the service locator rather than constructor injection.
    private readonly ISiteDefinitionRepository _siteDefinitionRepository =
        ServiceLocator.Current.GetInstance<ISiteDefinitionRepository>();
    private readonly IContentLoader _contentLoader =
        ServiceLocator.Current.GetInstance<IContentLoader>();

    public bool IsReusable => false;

    public void ProcessRequest(HttpContext context)
    {
        var uri = context.Request.Url;

        // Resolve the site definition whose host names match the request host
        var currentSite = _siteDefinitionRepository.List()
            .FirstOrDefault(siteDefinition => siteDefinition.Hosts
                .Any(hostDefinition => hostDefinition.Authority.Hostname.Equals(uri.Host)));

        if (currentSite != null)
        {
            var startPage = _contentLoader.Get<StartPage>(currentSite.StartPage);
            var robotsContent = startPage.RobotsContent;

            // Serve the editor-defined robots.txt content for this site
            if (!string.IsNullOrEmpty(robotsContent))
            {
                context.Response.StatusCode = 200;
                context.Response.ContentType = "text/plain";
                context.Response.Write(robotsContent);
                context.Response.End();
            }
        }
    }
}
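For reference, the web.config entry mentioned in step 2 is essentially a handler registration along these lines; the handler name, path, and the type/assembly names here are placeholders for whatever your project actually uses:

<system.webServer>
  <handlers>
    <add name="RobotsTxt" path="robots.txt" verb="GET"
         type="MySite.Web.RobotsTxtHandler, MySite.Web" />
  </handlers>
</system.webServer>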

I am aware there are a few NuGet packages available for handling robots.txt, but for various reasons, and because of the need for more control over this, I created a custom one. The above works as expected.

Referring to https://developers.google.com/search/docs/advanced/robots/create-robots-txt

It mentions that the rules are case-sensitive, that they come in groups (user-agent, allow, disallow), and that these directives are required. With all these rules in place, and this being a free textarea, I can add any random content in it. So is there any validation that I can apply to this? There are online validators available, but is there any way I can validate the text when it is being published?
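For context, this is roughly the kind of content the textarea is expected to hold (the host and paths below are just placeholders):

# group for all crawlers
User-agent: *
Disallow: /private/
Allow: /

Sitemap: https://www.example.com/sitemap.xml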

1 Answer


You can implement an EPiServer validator (IValidate<T>) for your start page type and validate the RobotsContent property in it. Implementations of IValidate<T> are discovered automatically, so nothing needs to be added to the property itself.

using System.Collections.Generic;
using EPiServer.Validation;

public class RobotsContentValidator : IValidate<StartPage>
{
    public IEnumerable<ValidationError> Validate(StartPage startPage)
    {
        // Validate startPage.RobotsContent here, e.g. by using an HttpClient
        // to call the online validation that you mentioned, and yield a
        // ValidationError for every problem found.
        yield break;
    }
}

public class StartPage
{
    // No attribute is needed on the property; the validator above is
    // picked up automatically because it implements IValidate<StartPage>.
    public virtual string RobotsContent { get; set; }
}

If using an online validator is not an option, this could be handled by, for example, a regular expression or a simple line parser inside the Validate method of the validator implementation.
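As a rough sketch of the offline route, the skeleton above could be filled in with a simple line-based check. The directive list is simplified and the rules enforced here (every non-comment line must be a known "directive: value" pair, and at least one User-agent line must exist) are an assumption about how strict you want to be:

using System;
using System.Collections.Generic;
using System.Linq;
using EPiServer.Validation;

public class RobotsContentValidator : IValidate<StartPage>
{
    // Directive names are matched case-insensitively; the case-sensitivity
    // mentioned in Google's docs applies to the rule values (paths).
    private static readonly string[] KnownDirectives =
        { "User-agent", "Allow", "Disallow", "Sitemap", "Crawl-delay" };

    public IEnumerable<ValidationError> Validate(StartPage startPage)
    {
        var content = startPage.RobotsContent;
        if (string.IsNullOrWhiteSpace(content))
        {
            yield break; // empty content simply means no robots.txt is served
        }

        var lines = content.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);
        var hasUserAgent = false;

        for (var i = 0; i < lines.Length; i++)
        {
            var line = lines[i].Trim();
            if (line.Length == 0 || line.StartsWith("#"))
            {
                continue; // blank lines and comments are allowed
            }

            var separatorIndex = line.IndexOf(':');
            var directive = separatorIndex > 0 ? line.Substring(0, separatorIndex).Trim() : null;

            if (directive == null ||
                !KnownDirectives.Contains(directive, StringComparer.OrdinalIgnoreCase))
            {
                yield return new ValidationError
                {
                    Severity = ValidationErrorSeverity.Error,
                    PropertyName = nameof(StartPage.RobotsContent),
                    ErrorMessage = $"Line {i + 1} is not a recognized robots.txt directive: \"{line}\""
                };
            }
            else if (directive.Equals("User-agent", StringComparison.OrdinalIgnoreCase))
            {
                hasUserAgent = true;
            }
        }

        if (!hasUserAgent)
        {
            yield return new ValidationError
            {
                Severity = ValidationErrorSeverity.Error,
                PropertyName = nameof(StartPage.RobotsContent),
                ErrorMessage = "The robots.txt content must contain at least one User-agent line."
            };
        }
    }
}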

Marcus Åberg
  • Thanks for the response, Marcus. Yes, a custom validator is what I have in mind. However, using a third-party API for the validation is where I am skeptical, considering long-term usage. I haven't found one yet, but is there an API from Google that I can use for this? – Farhin Shaikh Apr 13 '22 at 05:53
  • I am not aware of any such API, but I haven't looked for one either. If you do not want to rely on a third-party package/service, you can implement your own validation logic, e.g. by using a regular expression. Why the decision to put the contents of the robots.txt in a property for the editors to edit, if I may ask? – Marcus Åberg Apr 13 '22 at 22:58
  • Having the content as a property is because I have a multisite application and the contents vary on a site-by-site basis; having it as a property also gives the flexibility to add or remove content whenever a change is needed. Also, there is a possibility of human error whether it is edited by a non-technical or technical editor, so I was just looking for some validation to be on the safer side. I hope I got your question right here. – Farhin Shaikh Apr 14 '22 at 07:11