Wednesday, February 18, 2009

FileServlet supporting resume and caching and GZIP

Introduction

In the almost two-year-old FileServlet and ImageServlet articles you can find basic examples of a download servlet and an image servlet. They do in fact nothing more than obtain an InputStream of the desired resource/file and write it to the OutputStream of the HTTP response along with a set of important response headers. They do not support download resumes or effective client-side caching.

If you download a big file and hit network problems at 99% of the file, you won't be happy to discover that you need to download the complete file again once the network is back. If a browser decides to check its cached images for changes, it will either send a HEAD request to determine each file's unique identifier and timestamp, or send a conditional GET request and look at the response status. If the image hasn't changed according to the server response, the client won't request the image again, which saves network bandwidth and other effort.

You could delegate the task to the default servlet of the webcontainer/appserver you're using, but most of them don't implement all of the performance enhancements. For example, Tomcat's DefaultServlet does not support the Expires header.

Resume downloads

To enable download resumes, the server has to send at least the Accept-Ranges, ETag and Last-Modified response headers to the client along with the file.

The Accept-Ranges response header with the value "bytes" informs the client that the server supports byte-range requests. With this, the client can request a specific byte range using the Range request header.

The ETag response header should contain a value which represents a unique identifier of the file in question so that both the server and the client can identify the file. You can use a combination of the file name, file size and file modification timestamp for this. Some servers run this combination through an MD5 function to get a 32-character hexadecimal string. But this is not necessarily unique, because two different strings could generate the same MD5 hash, so we won't use it here. The client can send the obtained ETag back to the server for validation using the If-Match or If-Range request headers.
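
As a minimal sketch of the name/size/timestamp combination described above: the class name ETagSketch and its method names are made up for illustration, the concatenation itself is the same one the servlet further below uses.

```java
import java.io.File;

final class ETagSketch {

    // Combine file name, size and modification timestamp into one identifier.
    // Any change to the file's size or timestamp yields a different value.
    static String buildETag(String fileName, long length, long lastModified) {
        return fileName + "_" + length + "_" + lastModified;
    }

    // Convenience overload (hypothetical helper) reading the values from a File.
    static String buildETag(File file) {
        return buildETag(file.getName(), file.length(), file.lastModified());
    }

    public static void main(String[] args) {
        System.out.println(buildETag("report.pdf", 2048L, 1234567890000L));
    }
}
```

No hashing is involved, so two different files can only share an ETag if name, size and timestamp are all identical.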

The Last-Modified response header should contain a date which represents the last modification timestamp of the file as it is at the server side. The client can send the obtained timestamp back to the server for validation using the If-Unmodified-Since or If-Range request headers. Important note: keep in mind that the timestamp accuracy in server-side Java is in milliseconds while the accuracy of the Last-Modified header is in seconds. In Java code you should add 1 second (1000ms) to the value of the If-* request headers to bridge this difference before validation.
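
The one-second correction can be illustrated with a small sketch. The class and method names are hypothetical; the comparison is the same one the servlet below applies to If-Modified-Since, and -1 stands for "header absent", as returned by HttpServletRequest#getDateHeader().

```java
final class LastModifiedSketch {

    // The header value has second precision, the file timestamp has millisecond
    // precision. Adding 1000ms to the header value prevents an unchanged file
    // from looking "newer" just because of the truncated milliseconds.
    static boolean isNotModifiedSince(long ifModifiedSince, long lastModified) {
        return ifModifiedSince != -1 && ifModifiedSince + 1000 > lastModified;
    }

    public static void main(String[] args) {
        long fileTimestamp = 1234567890123L; // Milliseconds, as File#lastModified() returns.
        long headerValue = 1234567890000L;   // Same moment, truncated to whole seconds.
        System.out.println(isNotModifiedSince(headerValue, fileTimestamp));
    }
}
```

Without the added second, the file in this example would be considered modified 123ms after the client's copy, and the server would needlessly resend it.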

Whenever the client sends a partial GET request with a Range request header to the server, the server should inspect the conditional GET request headers (all headers starting with If) and handle them accordingly. Whenever the If-Match or If-Unmodified-Since conditions are negative, the server should send a 412 "Precondition Failed" response back without any content. Whenever the If-Range condition is negative, the server should ignore the Range header and send the full file back. Whenever the Range header is in an invalid format, the server should send a 416 "Requested Range Not Satisfiable" response back without any content.

If a partial GET request with a valid Range header is sent by the client, then the server should send the specific byte range(s) back as a 206 "Partial Content" response.
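
The Range handling can be sketched in isolation as follows. This is a simplified illustration, not the servlet itself: the class RangeHeaderSketch, the long[] {start, end} representation and the null return value (meaning "answer with 416") are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

final class RangeHeaderSketch {

    // Parses a Range header against a file of the given length.
    // Returns inclusive {start, end} pairs, or null if a 416 should be sent.
    static List<long[]> parse(String rangeHeader, long length) {
        if (!rangeHeader.matches("^bytes=\\d*-\\d*(,\\d*-\\d*)*$")) {
            return null; // Invalid format.
        }
        List<long[]> ranges = new ArrayList<long[]>();
        try {
            for (String part : rangeHeader.substring(6).split(",")) {
                int dash = part.indexOf('-');
                long start = toLong(part.substring(0, dash));
                long end = toLong(part.substring(dash + 1));
                if (start == -1) {
                    // Suffix range "-n": the last n bytes of the file.
                    start = Math.max(0, length - end);
                    end = length - 1;
                } else if (end == -1 || end > length - 1) {
                    // Open-ended range "n-" or an end beyond the file.
                    end = length - 1;
                }
                if (start > end) {
                    return null; // Range cannot be satisfied.
                }
                ranges.add(new long[] { start, end });
            }
        } catch (NumberFormatException e) {
            return null; // Number too large for a long.
        }
        return ranges;
    }

    // Empty string means "not specified" and maps to -1.
    private static long toLong(String value) {
        return value.isEmpty() ? -1 : Long.parseLong(value);
    }
}
```

For a file of 100 bytes, "bytes=50-80" yields 50 to 80, "bytes=40-" yields 40 to 99 and "bytes=-20" yields 80 to 99. A single returned range maps to a plain 206 response, several ranges to a multipart/byteranges response.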

Client side caching

The principle is the same as with resume downloads, with the only difference that no Range request header is sent to the server. The server only has to check and validate any conditional GET request headers and respond accordingly. Usually those are the If-None-Match or If-Modified-Since request headers. The client could also send a HEAD request (to which the server should respond exactly like a GET, but completely without content) and evaluate the obtained ETag and Last-Modified response headers itself.

Whenever the If-None-Match or If-Modified-Since conditions are positive, the server should send a 304 "Not Modified" response back without any content. If this happens, then the client is allowed to use the content which is already available in the client side cache.
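
A minimal sketch of the If-None-Match check, assuming the header carries a comma-separated list of ETags or a single "*". The class name is hypothetical; the servlet below performs the same check with a sorted array and a binary search.

```java
final class IfNoneMatchSketch {

    // Returns true if the header contains "*" or the current ETag of the file,
    // in which case the server may answer with 304 "Not Modified".
    static boolean matches(String matchHeader, String eTag) {
        for (String candidate : matchHeader.split(",")) {
            String value = candidate.trim();
            if (value.equals("*") || value.equals(eTag)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String currentETag = "logo.gif_5120_1234567890000";
        System.out.println(matches("old_1_1, logo.gif_5120_1234567890000", currentETag));
    }
}
```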

Furthermore, you can use the Expires response header to inform the client how long it may keep the content in the client-side cache without firing any request for it, not even a HEAD request.
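
To make the Expires value concrete, here is a sketch that computes an expiry time one week ahead and formats it the way an HTTP date header looks on the wire. The class and method names are made up for this example. In a servlet you would normally just call HttpServletResponse#setDateHeader() with the millisecond value and let the container do the formatting.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class ExpiresSketch {

    static final long ONE_WEEK_MS = 604800000L; // 7 * 24 * 60 * 60 * 1000.

    // Moment until which the client may use its cached copy without asking.
    static long expiresAt(long nowMillis) {
        return nowMillis + ONE_WEEK_MS;
    }

    // Formats a timestamp as an HTTP date, which is always expressed in GMT.
    static String toHttpDate(long millis) {
        SimpleDateFormat format =
            new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss zzz", Locale.US);
        format.setTimeZone(TimeZone.getTimeZone("GMT"));
        return format.format(new Date(millis));
    }

    public static void main(String[] args) {
        System.out.println("Expires: " + toHttpDate(expiresAt(System.currentTimeMillis())));
    }
}
```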

GZIP compression

To save more network bandwidth, we could compress text files (text/javascript, text/css, text/xml, text/csv, etcetera) with GZIP. Generally you can save up to 70% of network bandwidth by compressing text files with GZIP. We only need to check if the client accepts GZIP encoding by checking if the Accept-Encoding header contains "gzip". If this is true, and the client is requesting the full file, then the full text file will be compressed. Statistics show that about 90% of the browsers support GZIP.

This is also possible for files other than text, but as those are usually images and other kinds of (large) binary files, it may generate unnecessary overhead to (de)compress them.
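
The following sketch shows both steps in isolation: checking the Accept-Encoding header and compressing content with GZIPOutputStream. The class and its methods are hypothetical helpers, and the header check is deliberately simple: it ignores quality values, so "gzip;q=0" would wrongly count as accepted.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipSketch {

    // Simplified check: true if any Accept-Encoding token equals "gzip".
    static boolean acceptsGzip(String acceptEncoding) {
        if (acceptEncoding == null) {
            return false;
        }
        for (String value : acceptEncoding.split("[,;]")) {
            if (value.trim().equalsIgnoreCase("gzip")) {
                return true;
            }
        }
        return false;
    }

    // Compresses the given bytes in memory.
    static byte[] gzip(byte[] data) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            GZIPOutputStream gzip = new GZIPOutputStream(buffer);
            gzip.write(data);
            gzip.close(); // Closing writes the GZIP trailer.
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Decompresses the given bytes, as the browser would do.
    static byte[] gunzip(byte[] compressed) {
        try {
            GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            int read;
            while ((read = in.read(buffer)) > 0) {
                out.write(buffer, 0, read);
            }
            in.close();
            return out.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the compressed size is only known once the whole content has been written, a servlet that compresses on the fly cannot set a Content-Length header up front.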

The Code

OK, enough boring technical background blah, now on to the code!

This file servlet does everything it should do based on the request headers as described above. It also supports multipart byte requests (the client can send multiple comma-separated ranges in the Range header). The whole thing targets at least Java EE 5 and was developed and tested in Eclipse 3.4 with Tomcat 6. It has been tested with different web browsers (Firefox 2/3, IE 6/7/8, Opera 8/9, Safari 2/3 and Chrome) and also with a plain vanilla Java application using URLConnection.

You can use it for any file types: binary files, text files, images, etcetera. When the requested file is a text file or an image or when its content type is covered by the Accept request header of the client, then it will be displayed inline, otherwise it will be sent as an attachment which will pop up a 'save as' dialogue.

It's almost 485 lines of code, nearly half of which are more or less trivial: comments (read them all though), line breaks in long lines of code and blank lines. You can just copy'n'paste and run it. You're free to make changes whenever needed as long as it's not for commercial use.

/*
 * net/balusc/webapp/FileServlet.java
 *
 * Copyright (C) 2009 BalusC
 *
 * This program is free software: you can redistribute it and/or modify it under the terms of the
 * GNU Lesser General Public License as published by the Free Software Foundation, either version 3
 * of the License, or (at your option) any later version.
 * 
 * This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without
 * even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 * Lesser General Public License for more details.
 * 
 * You should have received a copy of the GNU Lesser General Public License along with this library.
 * If not, see <http://www.gnu.org/licenses/>.
 */

package net.balusc.webapp;

import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPOutputStream;

import javax.servlet.ServletException;
import javax.servlet.ServletOutputStream;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

/**
 * A file servlet supporting resume of downloads and client-side caching and GZIP of text content.
 * This servlet can also be used for images, client-side caching would become more efficient.
 * This servlet can also be used for text files, GZIP would decrease network bandwidth.
 *
 * @author BalusC
 * @link http://balusc.blogspot.com/2009/02/fileservlet-supporting-resume-and.html
 */
public class FileServlet extends HttpServlet {

    // Constants ----------------------------------------------------------------------------------

    private static final int DEFAULT_BUFFER_SIZE = 10240; // ..bytes = 10KB.
    private static final long DEFAULT_EXPIRE_TIME = 604800000L; // ..ms = 1 week.
    private static final String MULTIPART_BOUNDARY = "MULTIPART_BYTERANGES";

    // Properties ---------------------------------------------------------------------------------

    private String basePath;

    // Actions ------------------------------------------------------------------------------------

    /**
     * Initialize the servlet.
     * @see HttpServlet#init().
     */
    public void init() throws ServletException {

        // Get base path (path to get all resources from) as init parameter.
        this.basePath = getServletContext().getRealPath(getInitParameter("basePath"));

        // Validate base path.
        if (this.basePath == null) {
            throw new ServletException("FileServlet init param 'basePath' is required.");
        } else {
            File path = new File(this.basePath);
            if (!path.exists()) {
                throw new ServletException("FileServlet init param 'basePath' value '"
                    + this.basePath + "' does actually not exist in file system.");
            } else if (!path.isDirectory()) {
                throw new ServletException("FileServlet init param 'basePath' value '"
                    + this.basePath + "' is actually not a directory in file system.");
            } else if (!path.canRead()) {
                throw new ServletException("FileServlet init param 'basePath' value '"
                    + this.basePath + "' is actually not readable in file system.");
            }
        }
    }

    /**
     * Process HEAD request. This returns the same headers as GET request, but without content.
     * @see HttpServlet#doHead(HttpServletRequest, HttpServletResponse).
     */
    protected void doHead(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException
    {
        // Process request without content.
        processRequest(request, response, false);
    }

    /**
     * Process GET request.
     * @see HttpServlet#doGet(HttpServletRequest, HttpServletResponse).
     */
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException
    {
        // Process request with content.
        processRequest(request, response, true);
    }

    /**
     * Process the actual request.
     * @param request The request to be processed.
     * @param response The response to be created.
     * @param content Whether the response body should be written (GET) or not (HEAD).
     * @throws IOException If something fails at I/O level.
     */
    private void processRequest
        (HttpServletRequest request, HttpServletResponse response, boolean content)
            throws IOException
    {
        // Validate the requested file ------------------------------------------------------------

        // Get requested file by path info.
        String requestedFile = request.getPathInfo();

        // Check if file is actually supplied to the request URL.
        if (requestedFile == null) {
            // Do your thing if the file is not supplied to the request URL.
            // Throw an exception, or send 404, or show default/warning page, or just ignore it.
            response.sendError(HttpServletResponse.SC_NOT_FOUND);
            return;
        }

        // URL-decode the file name (might contain spaces and on) and prepare file object.
        File file = new File(basePath, URLDecoder.decode(requestedFile, "UTF-8"));

        // Check if file actually exists in filesystem.
        if (!file.exists()) {
            // Do your thing if the file appears to be non-existing.
            // Throw an exception, or send 404, or show default/warning page, or just ignore it.
            response.sendError(HttpServletResponse.SC_NOT_FOUND);
            return;
        }

        // Prepare some variables. The ETag is a unique identifier of the file.
        String fileName = file.getName();
        long length = file.length();
        long lastModified = file.lastModified();
        String eTag = fileName + "_" + length + "_" + lastModified;
        long expires = System.currentTimeMillis() + DEFAULT_EXPIRE_TIME;


        // Validate request headers for caching ---------------------------------------------------

        // If-None-Match header should contain "*" or ETag. If so, then return 304.
        String ifNoneMatch = request.getHeader("If-None-Match");
        if (ifNoneMatch != null && matches(ifNoneMatch, eTag)) {
            response.setStatus(HttpServletResponse.SC_NOT_MODIFIED);
            response.setHeader("ETag", eTag); // Required in 304.
            response.setDateHeader("Expires", expires); // Postpone cache with 1 week.
            return;
        }

        // If-Modified-Since header should be greater than LastModified. If so, then return 304.
        // This header is ignored if any If-None-Match header is specified.
        long ifModifiedSince = request.getDateHeader("If-Modified-Since");
        if (ifNoneMatch == null && ifModifiedSince != -1 && ifModifiedSince + 1000 > lastModified) {
            response.setStatus(HttpServletResponse.SC_NOT_MODIFIED);
            response.setHeader("ETag", eTag); // Required in 304.
            response.setDateHeader("Expires", expires); // Postpone cache with 1 week.
            return;
        }


        // Validate request headers for resume ----------------------------------------------------

        // If-Match header should contain "*" or ETag. If not, then return 412.
        String ifMatch = request.getHeader("If-Match");
        if (ifMatch != null && !matches(ifMatch, eTag)) {
            response.sendError(HttpServletResponse.SC_PRECONDITION_FAILED);
            return;
        }

        // If-Unmodified-Since header should be greater than LastModified. If not, then return 412.
        long ifUnmodifiedSince = request.getDateHeader("If-Unmodified-Since");
        if (ifUnmodifiedSince != -1 && ifUnmodifiedSince + 1000 <= lastModified) {
            response.sendError(HttpServletResponse.SC_PRECONDITION_FAILED);
            return;
        }


        // Validate and process range -------------------------------------------------------------

        // Prepare some variables. The full Range represents the complete file.
        Range full = new Range(0, length - 1, length);
        List<Range> ranges = new ArrayList<Range>();

        // Validate and process Range and If-Range headers.
        String range = request.getHeader("Range");
        if (range != null) {

            // Range header should match format "bytes=n-n,n-n,n-n...". If not, then return 416.
            if (!range.matches("^bytes=\\d*-\\d*(,\\d*-\\d*)*$")) {
                response.setHeader("Content-Range", "bytes */" + length); // Required in 416.
                response.sendError(HttpServletResponse.SC_REQUESTED_RANGE_NOT_SATISFIABLE);
                return;
            }

            // If-Range header should either match ETag or be greater than LastModified. If not,
            // then return full file.
            String ifRange = request.getHeader("If-Range");
            if (ifRange != null && !ifRange.equals(eTag)) {
                try {
                    long ifRangeTime = request.getDateHeader("If-Range"); // Throws IAE if invalid.
                    if (ifRangeTime != -1 && ifRangeTime + 1000 < lastModified) {
                        ranges.add(full);
                    }
                } catch (IllegalArgumentException ignore) {
                    ranges.add(full);
                }
            }

            // If any valid If-Range header, then process each part of byte range.
            if (ranges.isEmpty()) {
                for (String part : range.substring(6).split(",")) {
                    // Assuming a file with length of 100, the following examples return bytes at:
                    // 50-80 (50 to 80), 40- (40 to 99), -20 (last 20 bytes: 80 to 99).
                    long start = sublong(part, 0, part.indexOf("-"));
                    long end = sublong(part, part.indexOf("-") + 1, part.length());

                    if (start == -1) {
                        // Suffix range. Clamp to 0 if more bytes are requested than the file has,
                        // otherwise start would become negative and the seek would fail.
                        start = Math.max(0, length - end);
                        end = length - 1;
                    } else if (end == -1 || end > length - 1) {
                        end = length - 1;
                    }

                    // Check if Range is syntactically valid. If not, then return 416.
                    if (start > end) {
                        response.setHeader("Content-Range", "bytes */" + length); // Required in 416.
                        response.sendError(HttpServletResponse.SC_REQUESTED_RANGE_NOT_SATISFIABLE);
                        return;
                    }

                    // Add range.
                    ranges.add(new Range(start, end, length));
                }
            }
        }


        // Prepare and initialize response --------------------------------------------------------

        // Get content type by file name and set default GZIP support and content disposition.
        String contentType = getServletContext().getMimeType(fileName);
        boolean acceptsGzip = false;
        String disposition = "inline";

        // If content type is unknown, then set the default value.
        // For all content types, see: http://www.w3schools.com/media/media_mimeref.asp
        // To add new content types, add new mime-mapping entry in web.xml.
        if (contentType == null) {
            contentType = "application/octet-stream";
        }

        // If content type is text, then determine whether GZIP content encoding is supported by
        // the browser and expand content type with the one and right character encoding.
        if (contentType.startsWith("text")) {
            String acceptEncoding = request.getHeader("Accept-Encoding");
            acceptsGzip = acceptEncoding != null && accepts(acceptEncoding, "gzip");
            contentType += ";charset=UTF-8";
        } 

        // Else, except for images, determine content disposition. If content type is supported by
        // the browser, then set to inline, else attachment which will pop a 'save as' dialogue.
        else if (!contentType.startsWith("image")) {
            String accept = request.getHeader("Accept");
            disposition = accept != null && accepts(accept, contentType) ? "inline" : "attachment";
        }

        // Initialize response.
        response.reset();
        response.setBufferSize(DEFAULT_BUFFER_SIZE);
        response.setHeader("Content-Disposition", disposition + ";filename=\"" + fileName + "\"");
        response.setHeader("Accept-Ranges", "bytes");
        response.setHeader("ETag", eTag);
        response.setDateHeader("Last-Modified", lastModified);
        response.setDateHeader("Expires", expires);


        // Send requested file (part(s)) to client ------------------------------------------------

        // Prepare streams.
        RandomAccessFile input = null;
        OutputStream output = null;

        try {
            // Open streams.
            input = new RandomAccessFile(file, "r");
            output = response.getOutputStream();

            if (ranges.isEmpty() || ranges.get(0) == full) {

                // Return full file.
                Range r = full;
                response.setContentType(contentType);
                response.setHeader("Content-Range", "bytes " + r.start + "-" + r.end + "/" + r.total);

                if (content) {
                    if (acceptsGzip) {
                        // The browser accepts GZIP, so GZIP the content.
                        response.setHeader("Content-Encoding", "gzip");
                        output = new GZIPOutputStream(output, DEFAULT_BUFFER_SIZE);
                    } else {
                        // Content length is not directly predictable in case of GZIP.
                        // So only add it if there is no means of GZIP, else browser will hang.
                        response.setHeader("Content-Length", String.valueOf(r.length));
                    }

                    // Copy full range.
                    copy(input, output, r.start, r.length);
                }

            } else if (ranges.size() == 1) {

                // Return single part of file.
                Range r = ranges.get(0);
                response.setContentType(contentType);
                response.setHeader("Content-Range", "bytes " + r.start + "-" + r.end + "/" + r.total);
                response.setHeader("Content-Length", String.valueOf(r.length));
                response.setStatus(HttpServletResponse.SC_PARTIAL_CONTENT); // 206.

                if (content) {
                    // Copy single part range.
                    copy(input, output, r.start, r.length);
                }

            } else {

                // Return multiple parts of file.
                response.setContentType("multipart/byteranges; boundary=" + MULTIPART_BOUNDARY);
                response.setStatus(HttpServletResponse.SC_PARTIAL_CONTENT); // 206.

                if (content) {
                    // Cast back to ServletOutputStream to get the easy println methods.
                    ServletOutputStream sos = (ServletOutputStream) output;

                    // Copy multi part range.
                    for (Range r : ranges) {
                        // Add multipart boundary and header fields for every range.
                        sos.println();
                        sos.println("--" + MULTIPART_BOUNDARY);
                        sos.println("Content-Type: " + contentType);
                        sos.println("Content-Range: bytes " + r.start + "-" + r.end + "/" + r.total);

                        // Copy single part range of multi part range.
                        copy(input, output, r.start, r.length);
                    }

                    // End with multipart boundary.
                    sos.println();
                    sos.println("--" + MULTIPART_BOUNDARY + "--");
                }
            }
        } finally {
            // Gently close streams.
            close(output);
            close(input);
        }
    }

    // Helpers (can be refactored to public utility class) ----------------------------------------

    /**
     * Returns true if the given accept header accepts the given value.
     * @param acceptHeader The accept header.
     * @param toAccept The value to be accepted.
     * @return True if the given accept header accepts the given value.
     */
    private static boolean accepts(String acceptHeader, String toAccept) {
        String[] acceptValues = acceptHeader.split("\\s*(,|;)\\s*");
        Arrays.sort(acceptValues);
        return Arrays.binarySearch(acceptValues, toAccept) > -1
            || Arrays.binarySearch(acceptValues, toAccept.replaceAll("/.*$", "/*")) > -1
            || Arrays.binarySearch(acceptValues, "*/*") > -1;
    }

    /**
     * Returns true if the given match header matches the given value.
     * @param matchHeader The match header.
     * @param toMatch The value to be matched.
     * @return True if the given match header matches the given value.
     */
    private static boolean matches(String matchHeader, String toMatch) {
        String[] matchValues = matchHeader.split("\\s*,\\s*");
        Arrays.sort(matchValues);
        return Arrays.binarySearch(matchValues, toMatch) > -1
            || Arrays.binarySearch(matchValues, "*") > -1;
    }

    /**
     * Returns a substring of the given string value from the given begin index to the given end
     * index as a long. If the substring is empty, then -1 will be returned
     * @param value The string value to return a substring as long for.
     * @param beginIndex The begin index of the substring to be returned as long.
     * @param endIndex The end index of the substring to be returned as long.
     * @return A substring of the given string value as long or -1 if substring is empty.
     */
    private static long sublong(String value, int beginIndex, int endIndex) {
        String substring = value.substring(beginIndex, endIndex);
        return (substring.length() > 0) ? Long.parseLong(substring) : -1;
    }

    /**
     * Copy the given byte range of the given input to the given output.
     * @param input The input to copy the given range to the given output for.
     * @param output The output to copy the given range from the given input for.
     * @param start Start of the byte range.
     * @param length Length of the byte range.
     * @throws IOException If something fails at I/O level.
     */
    private static void copy(RandomAccessFile input, OutputStream output, long start, long length)
        throws IOException
    {
        byte[] buffer = new byte[DEFAULT_BUFFER_SIZE];
        int read;

        if (input.length() == length) {
            // Write full range.
            while ((read = input.read(buffer)) > 0) {
                output.write(buffer, 0, read);
            }
        } else {
            // Write partial range.
            input.seek(start);
            long toRead = length;

            while ((read = input.read(buffer)) > 0) {
                if ((toRead -= read) > 0) {
                    output.write(buffer, 0, read);
                } else {
                    output.write(buffer, 0, (int) toRead + read);
                    break;
                }
            }
        }
    }

    /**
     * Close the given resource.
     * @param resource The resource to be closed.
     */
    private static void close(Closeable resource) {
        if (resource != null) {
            try {
                resource.close();
            } catch (IOException ignore) {
                // Ignore IOException. If you want to handle this anyway, it might be useful to know
                // that this will generally only be thrown when the client aborted the request.
            }
        }
    }

    // Inner classes ------------------------------------------------------------------------------

    /**
     * This class represents a byte range.
     */
    protected class Range {
        long start;
        long end;
        long length;
        long total;

        /**
         * Construct a byte range.
         * @param start Start of the byte range.
         * @param end End of the byte range.
         * @param total Total length of the byte source.
         */
        public Range(long start, long end, long total) {
            this.start = start;
            this.end = end;
            this.length = end - start + 1;
            this.total = total;
        }

    }

}

In order to get the FileServlet to work, add the following entries to the Web Deployment Descriptor web.xml:

<servlet>
    <servlet-name>fileServlet</servlet-name>
    <servlet-class>net.balusc.webapp.FileServlet</servlet-class>
    <init-param>
        <param-name>basePath</param-name>
        <param-value>/WEB-INF/resources</param-value>
    </init-param>
</servlet>

<servlet-mapping>
    <servlet-name>fileServlet</servlet-name>
    <url-pattern>/resources/*</url-pattern>
</servlet-mapping>

The basePath value must be relative to the webcontent of your webapplication. You can of course change the value of the basePath parameter and the url-pattern of the servlet-mapping to your taste.

Here are some basic use examples:

<!-- XHTML or JSP -->
<a href="resources/files/foo.exe">download foo.exe</a>
<a href="resources/files/bar.zip">download bar.zip</a>

<img src="resources/images/pic.jpg" />
<img src="resources/images/logo.gif" />

<!-- JSF -->
<h:outputLink value="resources/files/foo.exe">download foo.exe</h:outputLink>
<h:outputLink value="resources/files/bar.zip">download bar.zip</h:outputLink>
<h:outputLink value="resources/files/#{myBean.fileName}">
    <h:outputText value="download #{myBean.fileName}" />
</h:outputLink>

<h:graphicImage value="resources/images/pic.jpg" />
<h:graphicImage value="resources/images/logo.gif" />
<h:graphicImage value="resources/images/#{myBean.imageFileName}" />

Important note: this servlet example does not take the requested file as request parameter, but just as part of the absolute URL, because a certain widely used browser developed by a team in Redmond would take the last part of the servlet URL path as filename in the 'Save As' dialogue instead of the filename supplied in the headers. Using the filename as part of the absolute URL (and thus not as request parameter) will fix this utterly stupid behaviour. As a bonus, the URLs look much nicer without query parameters.

Copyright - GNU Lesser General Public License

(C) February 2009, BalusC

72 comments:

Sathya said...

Hi BalusC,
Thanks for sharing this article.

I tried to use it but it always returns the full document instead of a byte range for a linearized PDF. Based on your code that makes sense, as the request will not contain a 'Range' header when a download link is clicked.
Not sure how to initiate the byte-range streaming. Looks like I am missing a fundamental.

Thanks in advance.
- Satya

BalusC said...

That entirely depends on how the client initiates the request. In this specific case it's Adobe Acrobat which should request the PDF file as byte ranges.

Sathya said...

I have a link on our website that points to FileServlet. When the user clicks on the link it opens a new popup window to display the PDF document. In this scenario, how can Acrobat initiate byte-range requests?

Should I be sending additional params as part of the request URL?

BalusC said...

As said, that depends on the client. You need to open the URL with Adobe Acrobat, not with your webbrowser.

Lilianne E. Blaze said...

Great work, but could you please consider releasing it as LGPL (or Apache License / whatever) so it could be used with non-GPL code?

BalusC said...

Good point. Haven't considered that closely enough. Done so ;)

caegumelo said...

Hey BalusC, you da man! Your articles rock, good coding, I always come by to check what you`re up to.

I have posted something on a SUN forum, would you help a brother out?

http://forums.sun.com/thread.jspa?threadID=5381788&tstart=0

Thanks.

rbulling said...

Thank you for writing and sharing this, this FileServlet was almost exactly what I needed. I've enhanced this to have an extra, optional init parameter, "useServletPath", that makes the FileServlet use the servlet path as part of its file mapping.

So, for example, if you mapped the servlet to both "/static/*" and "/foobar/*" in web.xml, and used "/temp" as the basePath, it would be able to serve up the files "/temp/static/test.txt" and "/temp/foobar/baz.jpg" without having to declare the servlet twice.

Would you like the patch?

BalusC said...

@rbulling: I like to keep it as basic as possible. You're free to make changes :)

@anyone who is interested: today I added GZIP capabilities to the FileServlet. It will GZIP text content when the browser supports it. It will save up to 70% of network bandwidth! Useful if you have large CSS and JS files. In one of my projects the JS file, which actually consists of at least 4 merged JS files (jQuery and plugins) and is already minified, went from ~125KB to ~37KB!

P said...

Good stuff!
I worked today on a task that seemed trivial: just compress data on the fly with ZIP and send the compressed archive to the browser for saving to the file system. I applied the "typical" code for this, then spent over half a day investigating why the file saved by the browser was corrupted. The answer was astonishing: somewhere in the process unwanted content was being added to the servlet response! Instead of getting "raw" binary data, the response was "decorated" with the HTML code of the current page, attached at the end of the stream.
For a data type whose size can be predicted/calculated, it is enough to set: response.setHeader("Content-Length", String.valueOf(file.length()));
This "trims" the output stream, so nothing unwanted can be appended. But in the case of a compressed archive one cannot set the length as in the example above. So the question is how to deal with that. How can I stop this extra content from appearing in the response? It likely depends on the technologies used, and I couldn't find any solution to my problem on the net. Anyway, this happens for JSF 1.2 (myfaces 1.2.3) + Facelets, with tomahawk 1.1.8 running with Spring 2.5 and SWF 2.0. As I remember, it worked in the same manner with older versions as well.

BalusC said...

1) Do not use Writer. Use OutputStream only.
2) Handle it entirely in a Java class (Servlet/Filter/Bean), not in a view page (JSP/JSF).
3) Call response.reset() before and close the response's OutputStream after.
4) When the request is covered by FacesServlet, call FacesContext#responseComplete() to prevent JSF from postprocessing the response (also see the "PDF handling" article somewhere on this blog site).

P said...

It's good that the principles of downloading a file were summarized briefly. They are simple and obvious, and were applied in my original code. But I'm still facing the problem. I even tried a workaround: temporarily saving the zip to the file system, then reading it and getting its size. In this case, although the resulting zip was the same size as the temp file, its content was cut off. Besides, despite calling FacesContext#responseComplete(), JSF still does postprocessing. From the many tests I did, the conclusions are as follows:
1. perhaps something is missing when playing with zip, since text/xml can be saved correctly,
2. JSF ignores FacesContext#responseComplete() (it still enters the RENDER_RESPONSE(6) phase after APPLY_REQUEST_VALUES(2)).

BalusC said...

1) Are you saying that when you write it to a file, it is already corrupt? Did you read/write every byte properly? Did you close the streams?

2) JSF will indeed just kick in the render response phase (to finalize some internals, e.g. releasing the FacesContext and so on), but it will not write anything to the response when you call responseComplete(). If it did, while you have explicitly closed the outputstream, you should have seen an IllegalStateException in the appserver logs.

P said...

Perhaps I'm missing something.
Ad.1) The same code is able to write a correct zip file to the file system when, instead of writing to the response, I write to a file, i.e.:
ZipOutputStream out = new ZipOutputStream(new FileOutputStream("myfile.zip"));
instead of:
ZipOutputStream out = new ZipOutputStream(response.getOutputStream());

I tried several methods of writing to the output stream. The writing seemed to be correct, but when the result was then saved by the browser it was corrupt.

Here goes the code:
public void exportToZipFile(String fileName, String fileExtension, String dataToExport){
    ByteBuffer data = ByteBuffer.allocate(dataToExport.getBytes().length);
    data.put(dataToExport.getBytes());

    String filename = getOutputFileName(fileName) + "." + fileExtension; // Filename suggested in browser Save As dialog
    FacesContext context = FacesContext.getCurrentInstance();
    HttpServletResponse response = (HttpServletResponse)context.getExternalContext().getResponse();
    response.reset();
    response.setContentType("application/zip");
    response.setHeader("Content-Disposition", "attachment; filename=" + filename);

    try {
        // Create the ZIP OutputStream
        ZipOutputStream out = new ZipOutputStream(response.getOutputStream());

        ZipEntry entry = new ZipEntry("ODM.xml");
        out.putNextEntry(entry);
        // Transfer bytes from the file to the ZIP file
        out.write(data.array());
        // Complete the entry
        out.closeEntry();

        // Complete the ZIP file
        out.flush();
        out.close();

        response.flushBuffer();
        context.responseComplete();

    } catch (IOException e) {
    }
}

BalusC said...

You're swallowing the IOException. Never ignore exceptions unless you know exactly what you're doing. Add a throws clause or at least e.printStackTrace(). Another tip: do the closing in a finally block.
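
The two tips above (propagate the exception, close reliably) can be sketched as follows. This is an illustration only and makes no claim to fix the corruption discussed in this thread. The class ZipExportSketch and its methods are hypothetical; try-with-resources is used as the modern equivalent of closing in a finally block, and the target stream is a plain OutputStream, which in a servlet would be assumed to be response.getOutputStream().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration.
final class ZipExportSketch {

    /** Writes a single-entry ZIP; any IOException propagates to the caller instead of being swallowed. */
    static void writeZip(OutputStream target, String entryName, String data) throws IOException {
        // try-with-resources closes the ZIP stream (and writes the central directory)
        // even when an exception is thrown halfway.
        try (ZipOutputStream zip = new ZipOutputStream(target)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(data.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
    }

    /** Reads the first entry back, to verify that the archive is intact. */
    static String readFirstEntry(byte[] zipBytes) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            if (zip.getNextEntry() == null) {
                throw new IOException("ZIP contains no entries");
            }
            ByteArrayOutputStream content = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            for (int n; (n = zip.read(chunk)) != -1;) {
                content.write(chunk, 0, n);
            }
            return new String(content.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

Writing to a ByteArrayOutputStream first and reading the entry back is a cheap way to tell whether the archive is already broken on the server side or only gets damaged on its way through the response.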

P said...

I removed the exception handling intentionally to make the code snippet shorter. For the same reason the code contains only the most important things.
The problem, as it turns out, isn't quite new. I faced it some time ago and then forgot about it. But that time the file to be saved was a text file and the workaround was setting "Content-Length" in the header. I'm wondering where the problem lies. So, if the code follows the principles of dealing with the response, maybe there is some configuration setting affecting the process, or just a bug in the JSF implementation (I do not use the RI)?

Sharad said...

Hi Balu, I was working on video download code for the iPhone and ran into some trouble. I did go through your article. The video download works fine when it's read as a file, but when the same file is hosted on a different domain and is read through HTTP (URLConnection), the video does not work on the iPhone. Any idea?

BalusC said...

A video stream is a blocking stream. You should keep the HTTP connection of URLConnection alive and read it as a blocking stream.

Dev said...

Can you please elaborate on what you mean?
thanks

Ole Christian Rynning said...

Nice Servlet, Thank you! This saved me some coding :)

nodje said...

Hi Balus

nice servlet, I plan to use it to be able to share JS scripts in JARs across apps, to avoid code duplication.

However, if this fits, we'll be using it in our customer's application.
We are not selling the application, but we're getting paid to support and develop it.
How does that match the licensing scheme in use?

BalusC said...

@nodje: You can do so.

nodje said...

thanks Balus.

Actually after a little examination, I realize I need to get access to the classpath, not the file system, to be able to share JAR contained resources.

But your Servlet is still a clever way to do that. I just have to add the code to look into getClass().getResource(...).
I'm struggling to have the FileServlet automatically and transparently look both into filesystem and classpath.

I'm a little wary about the security issues though, I'd be giving access to the entire classpath!!
I assume limiting the lookup into a directory only, such as the 'resources' you use, should be safe for the rest of the classpath below that directory.

What do you think, does it make any sense for you to add this capability to your servlet?

Anyway let me know if you're interested or have other views on the problem.

Happy New Year!

Eric said...

Hi BalusC,

I've created an ant project which creates a binary jar, sources jar and javadoc jar. The project uses your source code with a very small change -- I added a generated serialVersionUID. The project is available on github: http://github.com/t11e/balusc_file_servlet
Initial versions of the three files are available for download from the github project.

Kind regards,
Eric Meyer

BalusC said...

Cool, that's truly no problem :)

Даниел Славов said...

Great stuff! It is very helpful. I have a question. Where should I put some code that deletes the file after it is fully sent to the browser?

Thanks in advance
Daniel

Josh said...
This comment has been removed by the author.
BalusC said...

@Daniel: at end of method, after the close().

@Josh: thanks for the feedback.

Josh said...

[deleted my early comments to consolidate]

A few points.

You say you MUST include the ETag and Last-Modified headers; that's incorrect. Either one will do, ETag being preferred. ETag is the HTTP/1.1 revision of Last-Modified and in reality much more reliable for conditional GET than Last-Modified (since one-second granularity is simply not enough on today's internet).

Also, the chance of two random strings generating the same MD5 hash is so remote that your comment about not using MD5 is pretty far off the mark. Using truncated MD5 strings is a bad idea, but the full MD5 hash is probably better than anything you can come up with on your own.

Your understanding of conditional GET is incorrect. If a server receives a GET with an ETag value that does not match the current representation, then the correct response is a 200 that contains the current representation (and a 304 "Not Modified" with no content body if the ETags match), not a 412 "Precondition Failed" response. The 412 is used during a PUT when the client is trying to replace the representation but does not have the correct ETag of the actual current representation.

Per the HTTP/1.1 spec, a Range header with an invalid format is not cause for a 416. Range headers with an invalid format should be ignored and a normal 200 response should be returned. Only ranges outside the available content range (i.e. the file is 2K and the client wants byte 5000, or the start offset is greater than the end offset) are cause for a 416.
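
The Range handling described in the last point can be sketched in plain Java as below. This is a hedged illustration, not the FileServlet's code: the class RangeHeaderSketch, its Result type and the evaluate method are hypothetical names. One deliberate difference from the comment: RFC 2616 section 14.35.1 treats a last-byte-pos smaller than the first-byte-pos as syntactically invalid, so the sketch ignores the header in that case (200) and reserves 416 for syntactically valid ranges that cannot be satisfied.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration.
final class RangeHeaderSketch {

    static final int OK = 200;
    static final int PARTIAL_CONTENT = 206;
    static final int RANGE_NOT_SATISFIABLE = 416;

    // Syntax only: "bytes=" followed by comma-separated "a-b", "a-" or "-n" specs.
    private static final Pattern SYNTAX = Pattern.compile("bytes=\\d*-\\d*(\\s*,\\s*\\d*-\\d*)*");

    /** Status to send plus the inclusive [start, end] byte ranges to serve (empty unless 206). */
    static final class Result {
        final int status;
        final List<long[]> ranges;

        Result(int status, List<long[]> ranges) {
            this.status = status;
            this.ranges = ranges;
        }
    }

    static Result evaluate(String rangeHeader, long length) {
        Result ignoreHeader = new Result(OK, new ArrayList<long[]>());
        if (rangeHeader == null || !SYNTAX.matcher(rangeHeader).matches()) {
            return ignoreHeader; // missing or malformed header: serve the full file
        }
        List<long[]> ranges = new ArrayList<long[]>();
        for (String part : rangeHeader.substring("bytes=".length()).split(",")) {
            String spec = part.trim();
            int dash = spec.indexOf('-');
            String first = spec.substring(0, dash);
            String last = spec.substring(dash + 1);
            try {
                if (first.isEmpty()) {
                    if (last.isEmpty()) {
                        return ignoreHeader; // a bare "-" is not valid syntax
                    }
                    long suffix = Long.parseLong(last); // "-n" means the last n bytes
                    if (suffix > 0 && length > 0) {
                        ranges.add(new long[] {Math.max(0, length - suffix), length - 1});
                    }
                } else {
                    long start = Long.parseLong(first);
                    long end = last.isEmpty() ? Long.MAX_VALUE : Long.parseLong(last);
                    if (end < start) {
                        return ignoreHeader; // syntactically invalid per RFC 2616
                    }
                    if (start < length) {
                        ranges.add(new long[] {start, Math.min(end, length - 1)});
                    }
                }
            } catch (NumberFormatException e) {
                return ignoreHeader; // digits that do not fit in a long
            }
        }
        return ranges.isEmpty()
            ? new Result(RANGE_NOT_SATISFIABLE, ranges)
            : new Result(PARTIAL_CONTENT, ranges);
    }
}
```

With a 2048-byte file, "bytes=5000-" yields 416, "bytes=0-99" yields 206 with the range 0 to 99, and a malformed header simply results in a normal 200 response.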

Даниел Славов said...

@BalusC

Thanks for your comment.

The problem I have is when using this servlet to send files to iPod/iPhone. It seems the iPhone makes multiple requests to get the full file, and if I delete the file in the finally block it will be deleted after the first request, so the file will not be sent to the iPhone. I need to know somehow that the last chunk has been sent. Any ideas will be appreciated.

Regards,
Daniel

Jake said...

Hi,

We were getting broken pipe exceptions while integrating with a genome visualizer (IGV from Broad) and found your code to be very useful for handling range bytes.

Thanks very much,
Jake@ISB

jcmúzimo said...

Excellent code, my friend, thank you very much. :D

Namita said...

What's the difference between addHeader() and setHeader()?

Namita said...

Hi Balusc,

We are having issues setting the ETag header. First we used setHeader("ETag", value), but containsHeader("ETag") returns false.
Then we tried addHeader("ETag", value), but containsHeader("ETag") still returns false.

Could you please let me know what the issue could be?

Thanks and Regards,
Namita

BalusC said...

@Namita: addHeader() will add the given value to the given header name if it already exists. Otherwise it will just create one. setHeader() will override any existing header on the same given header name.

As to your concrete problem, I don't know what containsHeader() is. Be sure that you check the response headers, not the request headers with containsHeader(). Better yet, use an HTTP traffic debugger which can show the headers being transferred, like Firebug (check the "Net" panel).

Peter J said...

Hi BalusC,
I had just implemented my own downloader to use from Tomcat, based on the DefaultServlet. I like your version because it implements gzip, so I am testing it.

One question, when calling doHead for a full file, the Content-Length is set to 0. Is this conformant with the http spec?

I am inclined to change your code so that Content-Length is returned in that case.

Best regards,
Peter

p.s. coincidentally, I am updating my server code to behave more correctly with IGV.

Ali said...

Hi BalusC,

Many thanks. That saved me lot of time in coding. I was able to successfully integrate this with my JSF portlet code.

Cheers

Ali Syed

healeyb said...

Hi BalusC, I'm trying to apply the caching principles to a filter to send a 304 when the browser has an up to date copy. The chrome browser stores the etag like this: W/"15475-1326469468158. The server side code sent this: styleSheet.css_15475_1326469468158. It looks like I'm trying to compare the right things, but the matches() method doesn't return true. Could this be the quotes? Thanks.

healeyb said...

I see the problem now - glassfish is for some reason overriding my setting of the ETag header. I set Cache-Control in the same place, but ETag gets lost along the way for some reason. I saw a post saying to disable the file cache on the http-listeners, so I did that, but still have the problem. Will bring it up with the g/f people. Thanks. Great servlet.

Avramucz Péter said...

Hi! Great and working code! Could you please change the license as Lilianne E. Blaze asked?

TABS MAN said...

Great code. Yet I still can't understand why uploading a video file is impossible. I got this error:

java.lang.ClassCastException: org.apache.commons.fileupload.FileUploadException cannot be cast to org.apache.commons.fileupload.FileItem
Any help?

JakedUp said...

Hey man, great job on the FileServlet script for Java. It's very complex to say the least!

I come from a PHP background, and I am starting to learn Java. I am pretty good at modularizing scripts so that I can call them dynamically using functions. But Java is a little more complex than PHP, so I am still trying to wrap my head around the code differences. How would I go about making this script a little more dynamic, by allowing the basePath to be called via a function? e.g. Sometimes I would want to download a file from example.com and sometimes foobar.com. Any suggestions you can provide would be awesome!

Roman said...

Thanks for this servlet. Good job!
Very helpful for me.

mindas said...

I have ported this code to my project and observed some strange client behaviour on some Apple platforms. More on this: http://stackoverflow.com/questions/12637728/http-byte-range-protocol-client-behaviour-on-ipad-iphone

I'd appreciate if anyone could comment on these findings.

Karsten Pflum biggest fan said...

Cheers for the article. This saved me hours of research and coding. Cheers to you once more!

Jordan Marinov said...

I had problems with "response.setHeader("Content-Length", String.valueOf(r.length));", instead I had to use "response.setContentLength((int) r.length);". Otherwise great stuff!

TT said...

Note that you should put quotes around the eTag value in the response. So it should look like this:

response.setHeader("ETag", "\"" + eTag + "\"");

Examples from RFC2616:

ETag: "xyzzy"
ETag: W/"xyzzy"
ETag: ""
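
As a hedged sketch of the quoting above and of how a conditional GET check could compare tags: the class ETagSketch and its methods are hypothetical, and splitting If-None-Match on commas is a simplification that assumes the opaque tag itself contains no comma. For If-None-Match on GET, RFC 2616 allows weak comparison, so a W/ prefix added by a browser or proxy is stripped before comparing.

```java
// Hypothetical helper for illustration.
final class ETagSketch {

    /** Wraps the opaque identifier in double quotes, as the ETag header requires. */
    static String quote(String opaqueTag) {
        return "\"" + opaqueTag + "\"";
    }

    /**
     * Returns true if the If-None-Match header value matches the current (already quoted) ETag,
     * in which case a GET can be answered with 304 Not Modified.
     */
    static boolean ifNoneMatchMatches(String ifNoneMatch, String currentETag) {
        if (ifNoneMatch == null) {
            return false;
        }
        String current = stripWeakPrefix(currentETag);
        // Simplification: assumes no comma inside the quoted tag itself.
        for (String candidate : ifNoneMatch.split(",")) {
            String tag = candidate.trim();
            if (tag.equals("*") || stripWeakPrefix(tag).equals(current)) {
                return true;
            }
        }
        return false;
    }

    private static String stripWeakPrefix(String tag) {
        return tag.startsWith("W/") ? tag.substring(2) : tag;
    }
}
```

An unquoted value sent by the server would never match the quoted value a client sends back under this comparison, which is one more reason to quote it consistently on the way out.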

Raghu Ravindra said...

Hi, this code seems to work fine locally when I try it in Apache Tomcat and access localhost:8084, but when I try to access it over the LAN it shows a save popup window ... it is not playing the video in the browser ... any suggestions?

javad sabbagh said...

What a wonderful weblog!
Thank you Bauke.

Atul Dambalkar said...

Hi Bauke,

Thanks for this well-written blog post. I have modified it slightly and used it to develop a Spring view. I will publish that code.

Thanks again,

-Atul

a9237034-883b-11e2-b256-000bcdcb5194 said...

Great work on this code, very useful!

I tried this on my browser and it downloaded the file; the file being an image. How can I get it to just display the image in the browsers instead of downloading it?

Bauke Scholtz said...

Use HTML <img> element.

Lakshya Kumar said...

Hi BalusC, I am using your FileServlet for GZIP compressions and it was working fine until I started using JSF 2 h:outputScript and h:outputStylesheet tags for resource versioning. Now gzip is not working because the URL pattern of FileServlet not matching with the url pattern generated by these JSF tags (h:outputStylesheet etc.). I also tried to change the URL pattern in web.xml from '/resources/*' to '/faces/javax.faces.resource/*' but after this I got 404 error for all the css and js files. Please let me know how to work it out, as I am using primefaces 3.5 lib and due to which my page load time is suffering badly as primefaces includes many js and css files into my page and their sizes are also heavy.

Thanks in advance.
- Sandeep

Bauke Scholtz said...

Lakshya: use GzipResponseFilter of OmniFaces.
http://showcase.omnifaces.org/filters/GzipResponseFilter

Lakshya Kumar said...

Thanks BalusC for your reply..
Please let me know how to use this filter. I have configured it in the web.xml but where do I find jar files library for that filter. Or do I need to copy all the three source files and place those in org.omnifaces.filter package in my application? Also do I need any license to use this filter or it is free to use. Thanks!

Bauke Scholtz said...

Lakshya: just install OmniFaces JAR. See showcase homepage or project homepage for instructions.

Lakshya Kumar said...

Thanks a lot BalusC. Gzip compression is working now with the help of OmniFaces filter. BalusC I have one more problem in my JSF application, can I ask this favor as well from you. Please let me know where can I post that question? in this blog or at stackoverflow? Many thanks!! you are truly genius... :)

Bauke Scholtz said...

Please post questions at Stack Overflow.

Akshay Sahu said...

How should I send a Range value. I cannot see the range header being sent from your HTML code.

And because of this, the code at the server will never get into the if condition present at the line number 139.

And it will always get into the if condition present at the line number 466.

Can you please provide example code with a Range header?

Thanks in Advance,
Akshay Sahu
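
For readers with the same question: plain HTML links do not send a Range header; the browser or a download client adds it on its own when resuming. A minimal client-side sketch with the JDK's HttpURLConnection could look like the following. The class RangeRequestSketch, the method openRange and the URL in the usage note are hypothetical.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper for illustration.
final class RangeRequestSketch {

    /**
     * Prepares (but does not yet send) a GET request for the inclusive byte range start..end.
     * Pass a negative end to ask for everything from start up to the end of the file.
     */
    static HttpURLConnection openRange(String url, long start, long end) throws IOException {
        // openConnection() only creates the connection object; nothing is sent
        // until the response is requested, e.g. via getResponseCode() or getInputStream().
        HttpURLConnection connection = (HttpURLConnection) URI.create(url).toURL().openConnection();
        connection.setRequestProperty("Range", "bytes=" + start + "-" + (end >= 0 ? String.valueOf(end) : ""));
        return connection;
    }
}
```

For example, openRange("http://localhost:8080/files/test.txt", 100, 199) asks for 100 bytes. If the server honours the header, getResponseCode() returns 206 (HttpURLConnection.HTTP_PARTIAL) and the response carries a Content-Range header; a server that ignores the header answers 200 with the full file. Tools such as curl can send the same header for quick testing.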

Broc Seib said...

Hey BalusC,

Should the 304 status codes be sent using response.sendError() rather than setHeader() et al.?

I noticed some odd behavior: 1) my client got the 304 just fine, but 2) my tomcat server was logging hits to my "/last-ditch" error page per a catch-all error page config in my web.xml, i.e.,


/last-ditch-error-page

Broc Seib said...

Shoot, I forgot that blogger doesn't like html-ish angle brackets... Let me try once more before giving up. :-)

Here is my catch-all error page config from my web.xml:

<error-page>
<location>/last-ditch-error-page</location>
</error-page>

Broc Seib said...

>
> 1nn/2nn/3nn responses are not errors.
>

To clarify, I believe two places in your code above (where it returns 304) should use setStatus() rather than sendError().

Steve Boutilier said...

Thanks BalusC. I needed a servlet that could serve up wav files from a database with partial content support so they work in the HTML5 audio tag. With a little modification, this worked perfectly!

cgullcharlie said...

Hi BalusC, can you please clarify the license detail " You're free to make changes whenever needed as long as it's not for commercial use."
If we want to make changes and we want to use commercially, what then?

Michele (nikhes) said...

Hi, I have a problem only with Firefox.

Firefox does not send the "If-None-Match" or "If-Modified-Since" header.

Please help ;(

Shasha said...

Hi BalusC

Could you please post the Java code which can upload, say, an XML file using the resume option?

Sha

kavita dixit said...

Hi BalusC,

Could you please share the Java client code for this FileServlet?

mukesh said...

Hi BalusC,
Thank you so so so much......

Thanks for sharing this article.

qrs said...

Has anyone experienced any problems using this code with PDFs and IE?

What we are seeing is the download proceeding normally in chunks and then being interrupted by the server presumably due to a bad byte range or other header issue.

We've seen this problem with some PDF files (not all) and several different versions of IE including IE 9.