Raw file handling

Please note that this is an experimental development direction with expected incompatible changes in future releases. Please contact us if you plan to use this extension library to discuss your use case.

Storing and retrieving raw content through the REST API is essential to integrate the MadFast REST server as a non-static, mutable component.

Main components of the raw file handling:

Creating raw files

Launch the MadFast server (from the distribution's root directory).

bin/gui.sh -port 8085

Create content to be posted:

echo "Hello, World" > hello.txt
echo "<html><head><title>Hello</title></head><body>Hello, World</body></html>" > hello.html

Use curl to add these files by sending a multipart/form-data POST request to the rest/experimental-rawfiles endpoint. The response, an ExperimentalRawFileInfo, is the JSON resource descriptor of the newly created resource.

curl -X POST -F file=@hello.txt http://localhost:8085/rest/experimental-rawfiles | python -m json.tool
{
    "contenttype": "application/octet-stream",
    "description": "Uploaded from file hello.txt",
    "name": "hello.txt",
    "size": 13,
    "time": 0,
    "url": "rest/experimental-rawfiles/hello.txt"
}

The resource is created with the default application/octet-stream content type.
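Scripts that upload a file usually need the url field of the returned descriptor. The following is a minimal, dependency-free sketch of extracting it with sed. The helper name descriptor_url is hypothetical (it is not part of MadFast), and the pattern assumes that the value contains no escaped quotes. A real JSON parser would be more robust.

```shell
# descriptor_url: hypothetical helper, not part of the MadFast distribution.
# Reads an ExperimentalRawFileInfo JSON descriptor on stdin and prints the
# value of its "url" field. Works on both compact and pretty-printed JSON.
# Assumption: the value contains no escaped double quotes.
descriptor_url() {
    sed -n 's/.*"url"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}
```

It could be used as: curl -s -X POST -F file=@hello.txt http://localhost:8085/rest/experimental-rawfiles | descriptor_url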

Accessing raw files

Raw file resources can be fetched with curl from the rest/experimental-rawfiles/{res}/raw endpoint:

curl -i http://localhost:8085/rest/experimental-rawfiles/hello.txt/raw
HTTP/1.1 200 OK
Date: Thu, 16 May 2019 20:38:20 GMT
Content-Type: application/octet-stream
content-disposition: attachment; filename = hello.txt
Content-Length: 13
Server: Jetty(9.4.15.v20190215)

Hello, World

The -i option makes curl print the HTTP response headers as part of the output. Note the presence of the content-disposition header.

Skip content-disposition header

With endpoint rest/experimental-rawfiles/{res}/raw-nocd the content-disposition HTTP header will not be attached:

curl -i http://localhost:8085/rest/experimental-rawfiles/hello.txt/raw-nocd
HTTP/1.1 200 OK
Date: Thu, 16 May 2019 20:45:09 GMT
Content-Type: application/octet-stream
Content-Length: 13
Server: Jetty(9.4.15.v20190215)

Hello, World
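The raw and raw-nocd responses shown so far differ only in the content-disposition header. A small sketch of checking for that header in a script follows. The helper name has_content_disposition is hypothetical. It reads the output of curl -i on stdin and only inspects the header section.

```shell
# has_content_disposition: hypothetical helper for illustration.
# Reads an HTTP response as printed by `curl -i` on stdin and succeeds
# (exit status 0) if a Content-Disposition header is present.
# Header names are matched case-insensitively.
has_content_disposition() {
    # Stop at the first blank line (possibly just a carriage return)
    # so that the response body is never scanned.
    sed '/^[[:space:]]*$/q' | grep -qi '^content-disposition[[:space:]]*:'
}
```

For example, curl -s -i http://localhost:8085/rest/experimental-rawfiles/hello.txt/raw | has_content_disposition is expected to succeed, while the same check against the raw-nocd endpoint is expected to fail.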

An equivalent endpoint, rest/experimental-rawfiles-content/{res}, is also available. It serves the raw file contents under a URL structure that conforms more closely to HTTP conventions:

curl -i http://localhost:8085/rest/experimental-rawfiles-content/hello.txt
HTTP/1.1 200 OK
Date: Thu, 16 May 2019 20:45:44 GMT
Content-Type: application/octet-stream
Content-Length: 13
Server: Jetty(9.4.15.v20190215)

Hello, World
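The three access endpoints described above can be summarized as a small helper that prints the URLs for a given resource name. This is only a sketch: rawfile_urls and the BASE_URL variable are illustrative names, and the default host and port match the example server started earlier.

```shell
# rawfile_urls: illustrative helper, not part of MadFast.
# $1: raw file resource name, for example hello.txt
# BASE_URL is an assumed variable; it defaults to the example server address.
rawfile_urls() {
    base="${BASE_URL:-http://localhost:8085}"
    printf '%s\n' \
        "$base/rest/experimental-rawfiles/$1/raw" \
        "$base/rest/experimental-rawfiles/$1/raw-nocd" \
        "$base/rest/experimental-rawfiles-content/$1"
}
```

The first URL sends the content-disposition header, the other two do not. To fetch all three: for u in $(rawfile_urls hello.txt); do curl -s -i "$u"; done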

Accessing from browser

The same URLs can also be opened in a browser.

Note that when the content-disposition header is set, the browser saves the content as a file. Without this header the browser tries to render the content in place. In this case the browser typically cannot render it, since the content type is the default application/octet-stream.

Specifying details

We can specify the content type and description too (see documentation of rest/experimental-rawfiles endpoint POST request):

curl -X POST \
    -F contenttype=text/plain \
    -F "description=This is a text file" \
    -F file=@hello.txt \
    http://localhost:8085/rest/experimental-rawfiles | python -m json.tool
{
    "contenttype": "text/plain",
    "description": "This is a text file",
    "name": "hello.txt-1",
    "size": 13,
    "time": 0,
    "url": "rest/experimental-rawfiles/hello.txt-1"
}

The resource is created with name hello.txt-1. If we request the file with no content-disposition header we can now view it from the browser on URL http://localhost:8085/rest/experimental-rawfiles/hello.txt-1/raw-nocd or on URL http://localhost:8085/rest/experimental-rawfiles-content/hello.txt-1.

See https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Disposition for further details on the content-disposition header.

Modifying raw files

With a PUT request sent to rest/experimental-rawfiles/{res} (where {res} is the name of the raw file resource to be modified) we can specify the resource name explicitly. An existing file with that name is overwritten:

curl -X PUT \
    -F contenttype=text/plain \
    -F "description=Text file with proper (text/plain) content type" \
    -F file=@hello.txt \
    http://localhost:8085/rest/experimental-rawfiles/hello.txt | python -m json.tool
{
    "contenttype": "text/plain",
    "description": "Text file with proper (text/plain) content type",
    "name": "hello.txt",
    "size": 13,
    "time": 0,
    "url": "rest/experimental-rawfiles/hello.txt"
}
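The examples so far use POST when the server should choose the resource name (as with hello.txt-1) and PUT when the name is given explicitly. A sketch of wrapping this choice in a script follows. The function names rawfile_target and upload_rawfile and the BASE_URL variable are hypothetical. Only the curl invocations shown in this document are assumed about the server.

```shell
# rawfile_target: hypothetical helper that prints "METHOD URL".
# $1: optional resource name. Empty or missing: POST to the collection,
#     so the server picks the name. Otherwise: PUT to the named resource.
rawfile_target() {
    base="${BASE_URL:-http://localhost:8085}"
    if [ -n "${1:-}" ]; then
        printf 'PUT %s/rest/experimental-rawfiles/%s\n' "$base" "$1"
    else
        printf 'POST %s/rest/experimental-rawfiles\n' "$base"
    fi
}

# upload_rawfile: hypothetical wrapper around the curl calls shown above.
# $1: local file, $2: content type, $3: optional resource name
# Runs in a subshell so its variables do not leak into the caller.
upload_rawfile() (
    target=$(rawfile_target "${3:-}")
    method=${target%% *}   # text before the first space
    url=${target#* }       # text after the first space
    # --fail makes curl exit with a non-zero status on HTTP errors.
    curl -sS --fail -X "$method" \
        -F "contenttype=$2" \
        -F "file=@$1" \
        "$url"
)
```

For example, upload_rawfile hello.txt text/plain hello.txt overwrites hello.txt, while upload_rawfile hello.txt text/plain leaves the naming to the server.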

Deleting raw files

A DELETE request sent to rest/experimental-rawfiles/{res} removes the raw file:

curl -i -X DELETE http://localhost:8085/rest/experimental-rawfiles/hello.txt
HTTP/1.1 204 No Content
Date: Thu, 16 May 2019 20:49:57 GMT
Server: Jetty(9.4.15.v20190215)

As shown, the DELETE request returns HTTP status 204 No Content with an empty body.
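Since a successful DELETE has no response body, a script has to rely on the status code. The sketch below treats 204 as success and anything else as failure. The names is_deleted_status, delete_rawfile and BASE_URL are hypothetical, and the status returned for a missing resource is not covered by this document, so no specific error code is assumed.

```shell
# is_deleted_status: hypothetical helper.
# Succeeds only for the 204 status documented above.
is_deleted_status() {
    [ "$1" = "204" ]
}

# delete_rawfile: hypothetical wrapper around the DELETE call shown above.
# $1: raw file resource name. Prints nothing; the exit status tells
# whether the server answered with 204 No Content.
delete_rawfile() (
    # -o /dev/null discards the body, -w prints only the status code.
    status=$(curl -s -o /dev/null -w '%{http_code}' -X DELETE \
        "${BASE_URL:-http://localhost:8085}/rest/experimental-rawfiles/$1")
    is_deleted_status "$status"
)
```

For example: delete_rawfile hello.txt && echo "removed"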

Choosing proper content type

Uploading the HTML file with different content types:

# Content to be uploaded
echo "<html><head><title>Hello</title></head><body>Hello, World</body></html>" > hello.html

# Use improper (text/plain) content type
curl -X PUT \
    -F contenttype=text/plain \
    -F "description=HTML file with improper (text/plain) content type" \
    -F file=@hello.html \
    http://localhost:8085/rest/experimental-rawfiles/hello.html-as-text | python -m json.tool

# Use proper (text/html) content type
curl -X PUT \
    -F contenttype=text/html \
    -F "description=HTML file with proper (text/html) content type" \
    -F file=@hello.html \
    http://localhost:8085/rest/experimental-rawfiles/hello.html-as-html | python -m json.tool
{
    "contenttype": "text/plain",
    "description": "HTML file with improper (text/plain) content type",
    "name": "hello.html-as-text",
    "size": 72,
    "time": 0,
    "url": "rest/experimental-rawfiles/hello.html-as-text"
}
{
    "contenttype": "text/html",
    "description": "HTML file with proper (text/html) content type",
    "name": "hello.html-as-html",
    "size": 72,
    "time": 0,
    "url": "rest/experimental-rawfiles/hello.html-as-html"
}

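The difference becomes visible when the two resources are opened in a browser without the content-disposition header (for example through the raw-nocd endpoint): hello.html-as-html is typically rendered as a web page, while hello.html-as-text is shown as plain HTML source.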
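When many files are uploaded from a script, the contenttype field can be derived from the file extension. The mapping below is a minimal sketch and not something MadFast provides. The function name guess_content_type and the chosen extensions are assumptions, and unknown extensions fall back to application/octet-stream, the same default the server uses when no content type is given.

```shell
# guess_content_type: hypothetical helper for illustration.
# $1: file name. Prints a content type based on the file extension.
guess_content_type() {
    case "$1" in
        *.html|*.htm) echo "text/html" ;;
        *.txt)        echo "text/plain" ;;
        *.json)       echo "application/json" ;;
        *)            echo "application/octet-stream" ;;
    esac
}
```

It can be combined with the upload command, for example: curl -X PUT -F "contenttype=$(guess_content_type hello.html)" -F file=@hello.html http://localhost:8085/rest/experimental-rawfiles/hello.html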

View raw files on WebUI

Raw files are presented on the default index page of the WebUI <http://localhost:8085/>. Note that the following screenshots were taken before deleting hello.txt:

Index page

Contents of individual files can be displayed:

Raw file content

And metadata:

Raw file metadata

Reading raw files on server startup

It is possible to read raw files on server startup with command line option -rawfile <SPEC>:

# Write a file
echo "<html><head><title>Hello</title></head><body>Hello, World</body></html>" > hello.html
# Launch server, read the written file with multiple options
bin/gui.sh -port 8085 \
    -rawfile -file:hello.html:-name:read_with_default_opts \
    -rawfile "-file:hello.html:-name:read_as_text:-contenttype:text/plain:-description:File read with text/plain content type" \
    -rawfile "-file:hello.html:-name:read_as_html:-contenttype:text/html:-description:File read with text/html content type"
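The <SPEC> values used above follow a colon-separated pattern of option names and values. A sketch of assembling such a value from its parts follows. The helper name rawfile_spec is hypothetical, and only the four sub-options that appear in the example (-file, -name, -contenttype, -description) are used. How values containing a colon should be written is not covered here, so the helper assumes that none of its arguments contain one.

```shell
# rawfile_spec: hypothetical helper that builds a -rawfile <SPEC> value.
# $1: file path, $2: resource name,
# $3: optional content type, $4: optional description
# Assumption: no argument contains a colon.
# Runs in a subshell so the spec variable does not leak into the caller.
rawfile_spec() (
    spec="-file:$1:-name:$2"
    if [ -n "${3:-}" ]; then
        spec="$spec:-contenttype:$3"
    fi
    if [ -n "${4:-}" ]; then
        spec="$spec:-description:$4"
    fi
    printf '%s\n' "$spec"
)
```

For example: bin/gui.sh -port 8085 -rawfile "$(rawfile_spec hello.html read_as_html text/html "File read with text/html content type")"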

Please note that command line option -additionalresourcedir <DIRECTORY> also provides a mechanism to serve additional static files. See document REST API / Web UI for similarity searches for details.

Security considerations

The ability to upload or modify arbitrary raw content over the REST API could introduce security risks in certain deployments. See document REST API security considerations for additional details on the options to mitigate these risks using server feature flag options.