I don’t believe URLs anymore

Had an interesting problem float past me the other day that had me thinking a little outside the CF square.

Situation: We have an existing HTML website, and we are moving it to a CF website.

Problem: We don't want to break any of our existing HTML links. I.e. if we have http://www.mysite.com/page.html, it should stay the same – so no sneaky .cfm extensions.

Interesting Point: It's a standalone CF Instance running on a J2EE server. Hmnnn…

So the obvious starting point is to map the CF Servlet to *.html – much like you could already do within an IIS configuration. Nothing all that interesting there.

Obviously there is now some new CF magick, that allows for the pulling in of content (basically through a single CF Custom tag, that does some processing dependent on the URL).

But given the standard CF setup, that would mean we have to create a CF page for every old html page. That is seriously gonna suck, and is a lot of hard work, for not much payoff.

So of course, I got a' thinking and said – 'wait a minute, why can't we use a servlet to fake the html page, and then pass the URL information to a single CF page via the request scope?'

I.e. do something like this:

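import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class CFForward extends HttpServlet
{
    // ServletContext.getRequestDispatcher() requires a path that starts with "/"
    private static final String FORWARD = "/forward.cfm";

    public void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException
    {
        // grab the servlet path, e.g. "/page.html"
        String servletPath = request.getServletPath();

        /*
        Sets the request scope parameter called 'servletpath' that is accessible from the
        ColdFusion page. Key must be LOWERCASE!!!
        */
        request.setAttribute("servletpath", servletPath);

        // forward on to the ColdFusion page
        this.getServletContext().getRequestDispatcher(FORWARD).forward(request, response);
    }
}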

Then add this Servlet mapping to *.html (and get rid of the CF one) – and presto, it seamlessly drives all *.html requests over to "forward.cfm", along with the relevant details in its request scope.

That AND all the URL and FORM scope data is still there as you would need it.
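For reference, a minimal sketch of what that mapping could look like in web.xml. It assumes the class is deployed as CFForward in the default package; the servlet name is arbitrary, and the real file will have other entries around it.

```xml
<servlet>
    <servlet-name>CFForward</servlet-name>
    <servlet-class>CFForward</servlet-class>
</servlet>

<servlet-mapping>
    <servlet-name>CFForward</servlet-name>
    <url-pattern>*.html</url-pattern>
</servlet-mapping>
```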

(Note: my version pulls in details from the web.xml, and does some other stuff but I figured I'd keep it simple)
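Whatever receives 'servletpath' still has to turn it into something it can look content up by. Here is a rough sketch of that step in plain Java. The ContentKeys class and the contentKeyFor method are hypothetical names made up for illustration, and in practice the same logic could just as easily live in CFML inside forward.cfm.

```java
import java.util.Locale;

// Hypothetical helper, not part of the original servlet.
final class ContentKeys {
    private static final String SUFFIX = ".html";

    private ContentKeys() {}

    // Strips the leading "/" and a trailing ".html" (any case) from a servlet path.
    static String contentKeyFor(String servletPath) {
        String key = servletPath;
        if (key.startsWith("/")) {
            key = key.substring(1);
        }
        if (key.toLowerCase(Locale.ROOT).endsWith(SUFFIX)) {
            key = key.substring(0, key.length() - SUFFIX.length());
        }
        return key;
    }
}
```

For example, ContentKeys.contentKeyFor("/products/dogtoys.html") gives "products/dogtoys".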

So now you have what looks like a static HTML page, which is really a servlet running a ColdFusion page behind the scenes.

Yup, I don't believe what a URL tells me anymore. Where have the days gone when .html was static, .cgi was Perl, and .exe was crazy ;o) You always knew where you stood with a webpage, just by looking at its extension. Now it's all smoke and mirrors.

The only problem I've hit so far – on the website, the web stats package picks up 404 errors; however, with a servlet mapped to *.html, ANY URL that ends with .html will get picked up by the servlet. Where relevant (i.e. no content for that URL), I need to be able to push a 404 error to the server (getting it to the client isn't hard, you don't even have to use a real 404). Not sure how I'm going to do that yet.
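One possible direction, sketched in plain Java so it runs without a servlet container: decide up front whether a path has content, and only forward when it does. NotFoundCheck and statusFor are hypothetical names, and the set of known paths is a stand-in for whatever lookup the real site would use. Inside the servlet, a 404 result would mean calling response.sendError(HttpServletResponse.SC_NOT_FOUND) instead of forwarding. Whether the stats package then sees that 404 depends on which logs it reads, so treat that part as an assumption.

```java
import java.net.HttpURLConnection;
import java.util.Set;

// Hypothetical sketch: knownPaths stands in for the real content lookup.
final class NotFoundCheck {
    private NotFoundCheck() {}

    // Returns 200 when there is content for the path, otherwise 404.
    static int statusFor(String servletPath, Set<String> knownPaths) {
        return knownPaths.contains(servletPath)
                ? HttpURLConnection.HTTP_OK
                : HttpURLConnection.HTTP_NOT_FOUND;
    }
}
```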

Thought it was a nifty idea just the same.

 


Comments

  • Steve Nelson | June 16, 2004

    Why not just change the extension in the webserver?

  • Mark | June 16, 2004

    To recap:

    That was the original idea – however, that means we would have to create a new cfm page for each of the old html files. That’s a serious workload when you have several thousand html files floating around.

    Much better idea to fake the URL, and then pass the relevant information through a *single* cfm page, and then you encapsulate the logic of the system through a single entry point.

    Make sense?

  • Shawn Porter | June 16, 2004

    If you’re using Apache, you can also just do a little mod_rewrite action. The following set of directives tells Apache that, for any request for a filename matching *.html, it should check whether the requested file exists. If it doesn’t and the same filename with a .cfm extension does, it rewrites the request to *.cfm. ISAPI_Rewrite (for IIS) doesn’t support file tests, so a similar rule could only be applied to individual files or globally, not dependent on the existence of a file as you can do with Apache and mod_rewrite.

    RewriteEngine on

    # if the filename ends in .html
    # parse it out,
    # remember that this happened
    # continue processing
    RewriteRule ^(.*)\.html$ $1 [C,E=WasHTML:yes]

    # if filename.html doesn’t exist
    RewriteCond %{REQUEST_FILENAME}.html !-f
    # and filename.cfm does
    RewriteCond %{REQUEST_FILENAME}.cfm -f
    # use filename.cfm
    # and skip the next rule
    RewriteRule ^(.*)$ $1.cfm [S=1]

    # revert to filename.html
    RewriteCond %{ENV:WasHTML} ^yes$
    RewriteRule ^(.*)$ $1.html

  • Craig M. Rosenblum | June 17, 2004

    I’d create a 301 redirect to a new folder, that reads the file name via a cgi variable, then uses that to search for the content.

  • Mark | June 17, 2004

    Shawn – how is that different from mapping all CF requests to files with a .html extension?
    (either through the J2EE container, or whatever web server you are using).

    Craig – Interesting idea! Might have a look into that.

  • Shawn Porter | June 18, 2004

    Mark – I’m not familiar with the other solutions. But with mod_rewrite it works like this….

    Let’s say your actual directory contents look like this:

    /index.cfm
    /new.cfm
    /old.html
    /other.html
    /other.cfm

    Requests will be directed like this:

    (simple requests)
    /new.cfm -> /new.cfm
    /old.html -> /old.html

    (if both *.html and *.cfm exist)
    /other.html -> /other.html
    /other.cfm -> /other.cfm

    (when *.html doesn’t exist but *.cfm does)
    /index.html -> /index.cfm
    /new.html -> /new.cfm

    I run Apache on Linux and in my ~/public_html/ directory I have an .htaccess file with these mod_rewrite directives.

    On my site, these two URLs are both handled by car.cfm

    http://rit.net/~sporter/car.cfm
    http://rit.net/~sporter/car.html

  • Mark | June 18, 2004

    Shawn –

    That is some really nifty stuff.

    The only thing is, we don’t want to have to create .cfm files to replace all the .html files.

    The power of the servlet is that it takes all requests that end with .html, and forwards them to forward.cfm – so:
    /index.html -> forward.cfm
    /products/dogtoys.html -> forward.cfm

    etc., and passes the relevant URL details through the request scope to the CFM page, so it can then use them. (That’s the real nifty bit)

    That’s one of the really nice things about Java with CF – they can share information.

  • Shawn Porter | June 18, 2004

    Oh, I see what you want. That is easy in Apache also…

    # have forward.cfm handle every *.html request
    ScriptAliasMatch \.html$ /home/sporter/public_html/forward.cfm

  • Shawn Porter | June 18, 2004

    I forgot to mention that when using ScriptAliasMatch you will be able to access the originally requested URL in your ColdFusion code by using the variable CGI.REQUEST_URI

  • Mark | June 18, 2004

    Shawn.
    Nice one. Going to pass that on.