
This question has two parts:

  1. By default, what URL protocols are considered valid for specifying resources to Cypher's LOAD CSV command?

    • So far, I've successfully loaded CSV files into Neo4j using the http and file protocols. A comment on this unrelated question indicates that ftp works as well, but I haven't tried it because I have no use case for it.
  2. What practical options do I have to configure non-standard URI protocols? I'm running into a Neo.TransientError.Statement.ExternalResourceFailure with the message "Invalid URL specified (unknown protocol)". Other than digging into the Neo4j source, is there any way to modify this validation/setting, provided that the host machine is capable of resolving the resource with the specified protocol?

smartcaveman

2 Answers

  1. Neo4j relies on the capabilities of the JVM. According to https://docs.oracle.com/javase/7/docs/api/java/net/URL.html the default protocols are:

    http, https, ftp, file, jar

    Please note that file URLs are interpreted from the server's point of view and not from the client side (a common source of confusion).

  2. To use custom URLs you need to understand how the JVM deals with them. The Javadoc for the URL class explains an approach that uses a system property (java.protocol.handler.pkgs) to register custom URL handlers. It should be enough to set this system property in neo4j-wrapper.conf and drop the jar file containing your handler classes into the plugins folder. (Note: I have not validated this approach myself, but I'm pretty confident that it will work.)
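
A quick way to see which protocols a given JVM accepts is to ask the URL class directly. The following is only a sketch: the class name ProtocolCheck and the protocol name "resource" are made up for illustration, and it checks the JVM only, not any additional restrictions Neo4j may apply on top. It relies on the documented behaviour that the URL constructor throws MalformedURLException for an unknown protocol.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.List;

// Hypothetical helper, not part of Neo4j: reports whether this JVM has a
// URLStreamHandler registered for a protocol.
class ProtocolCheck {

    static boolean isSupported(String protocol) {
        try {
            // This constructor only looks up the handler; nothing is opened.
            // (The URL constructors are deprecated since Java 20 but still work.)
            new URL(protocol, "localhost", "/");
            return true;
        } catch (MalformedURLException e) {
            // Thrown when no handler is known ("unknown protocol").
            return false;
        }
    }

    public static void main(String[] args) {
        // "resource" is a made-up custom protocol used for comparison.
        for (String p : List.of("http", "https", "ftp", "file", "jar", "resource")) {
            System.out.println(p + " -> " + isSupported(p));
        }
    }
}
```

Running this inside the same JVM configuration as the server (same system properties and classpath) should show whether a custom handler has actually been picked up.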

Stefan Armbruster

Here is a complete example that implements a custom URLStreamHandler to handle a resource protocol. The class must be named Handler, and the last segment of its package name must be the protocol name (in this case, resource).

src/main/java/com/example/protocols/resource/Handler.java:

package com.example.protocols.resource;

import java.io.FileNotFoundException;
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;

public class Handler extends URLStreamHandler {
    private final ClassLoader classLoader;

    public Handler() {
        this.classLoader = getClass().getClassLoader();
    }

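    @Override
    protected URLConnection openConnection(URL url) throws IOException {
        // ClassLoader resource names must not start with '/', but the path of
        // a URL such as resource:///data.csv does, so strip leading slashes.
        String name = url.getPath().replaceFirst("^/+", "");
        URL resource = classLoader.getResource(name);
        if (resource == null) {
            throw new FileNotFoundException("Resource file not found: " + name);
        }
        return resource.openConnection();
    }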
}
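
To try a handler like this outside Neo4j, before any system property is involved, it can be passed straight to the URL constructor. The sketch below is self-contained, so it uses a stand-in handler (ClasspathHandler, a hypothetical name) rather than the class above. The stand-in resolves against the system class loader and strips leading slashes from the path, since ClassLoader lookups expect names without one; both choices are my assumptions. It reads java/lang/Object.class only because that resource is always available on the classpath.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;

// Hypothetical demo class; names here are illustrative only.
class ResourceHandlerDemo {

    // Stand-in for a classpath-backed handler.
    static class ClasspathHandler extends URLStreamHandler {
        @Override
        protected URLConnection openConnection(URL url) throws IOException {
            // resource:/a/b.csv and resource:a/b.csv should both map to "a/b.csv".
            String name = url.getPath().replaceFirst("^/+", "");
            URL resource = ClassLoader.getSystemResource(name);
            if (resource == null) {
                throw new FileNotFoundException("Resource file not found: " + name);
            }
            return resource.openConnection();
        }
    }

    // Passing the handler explicitly bypasses the protocol lookup, so no
    // system property is needed for this test.
    static URL resourceUrl(String spec) throws MalformedURLException {
        return new URL(null, spec, new ClasspathHandler());
    }

    // Opens the URL and returns the first byte of the resource.
    static int firstByte(String spec) throws IOException {
        try (InputStream in = resourceUrl(spec).openStream()) {
            return in.read();
        }
    }

    public static void main(String[] args) throws IOException {
        int b = firstByte("resource:/java/lang/Object.class");
        System.out.println("first byte: 0x" + Integer.toHexString(b));
    }
}
```

If this works but LOAD CSV still rejects the URL, the problem is more likely in handler registration or in Neo4j's own URL validation than in the handler itself.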

From here, we need to set the system property java.protocol.handler.pkgs to include the base package com.example.protocols so that the protocol is registered. This can be done statically in a Neo4j ExtensionFactory. Since the class gets loaded by Neo4j, we know that the static block will be executed. We also need to provide our own URLAccessRule, since Neo4j by default only allows use of a few select protocols. This can also happen in the ExtensionFactory.

src/main/java/com/example/protocols/ProtocolInitializerFactory.java:

package com.example.protocols;

import org.neo4j.annotations.service.ServiceProvider;
import org.neo4j.graphdb.security.URLAccessRule;
import org.neo4j.kernel.extension.ExtensionFactory;
import org.neo4j.kernel.extension.ExtensionType;
import org.neo4j.kernel.extension.context.ExtensionContext;
import org.neo4j.kernel.lifecycle.Lifecycle;
import org.neo4j.kernel.lifecycle.LifecycleAdapter;

@ServiceProvider
public class ProtocolInitializerFactory extends ExtensionFactory<ProtocolInitializerFactory.Dependencies> {

    private static final String PROTOCOL_HANDLER_PACKAGES = "java.protocol.handler.pkgs";

    private static final String PROTOCOL_PACKAGE = ProtocolInitializerFactory.class.getPackageName();

    static {
        String currentValue = System.getProperty(PROTOCOL_HANDLER_PACKAGES, "");
        if (currentValue.isEmpty()) {
            System.setProperty(PROTOCOL_HANDLER_PACKAGES, PROTOCOL_PACKAGE);
        } else if (!currentValue.contains(PROTOCOL_PACKAGE)) {
            System.setProperty(PROTOCOL_HANDLER_PACKAGES, currentValue + "|" + PROTOCOL_PACKAGE);
        }
    }

    public interface Dependencies {
        URLAccessRule urlAccessRule();
    }

    public ProtocolInitializerFactory() {
        super(ExtensionType.DATABASE, "ProtocolInitializer");
    }

    @Override
    public Lifecycle newInstance(ExtensionContext context, Dependencies dependencies) {
        URLAccessRule urlAccessRule = dependencies.urlAccessRule();
        return LifecycleAdapter.onInit(() -> {
            URLAccessRule customRule = (config, url) -> {
                if ("resource".equals(url.getProtocol())) { // Check the protocol name
                    return url; // Optionally, you can validate the URL here and throw an exception if it is not valid or should not be allowed access
                }
                return urlAccessRule.validate(config, url);
            };
            context.dependencySatisfier().satisfyDependency(customRule);
        });
    }
}
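
One detail of the static block worth a closer look is how the package is appended to java.protocol.handler.pkgs, whose entries are separated by "|". A substring check would treat an existing entry such as com.example.protocols2 as if com.example.protocols were already registered. The sketch below compares whole entries instead; HandlerPackages and its method names are hypothetical and not part of any Neo4j or JDK API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for maintaining the java.protocol.handler.pkgs value.
class HandlerPackages {

    static final String PROPERTY = "java.protocol.handler.pkgs";

    // Returns the property value with pkg added, unless it is already
    // present as a whole entry. Entries are separated by '|'.
    static String withPackage(String current, String pkg) {
        List<String> entries = new ArrayList<>();
        if (current != null) {
            for (String entry : current.split("\\|")) {
                String trimmed = entry.trim();
                if (!trimmed.isEmpty()) {
                    entries.add(trimmed);
                }
            }
        }
        if (!entries.contains(pkg)) {
            entries.add(pkg);
        }
        return String.join("|", entries);
    }

    // Applies the merged value to the running JVM.
    static void register(String pkg) {
        System.setProperty(PROPERTY, withPackage(System.getProperty(PROPERTY), pkg));
    }

    public static void main(String[] args) {
        register("com.example.protocols");
        System.out.println(System.getProperty(PROPERTY));
    }
}
```

Keeping the merge logic in a pure function such as withPackage makes it easy to unit test without touching global JVM state.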

After setting this up, follow the guide to packaging these classes as a Neo4j plugin and drop it into your database's plugins directory.

Admittedly, needing to override the default URLAccessRule feels a little bit shady. It may be better to simply implement the URLStreamHandler, and use another CSV loading method like APOC's apoc.load.csv. This will not require overriding the URLAccessRule, but it will require setting the Java system property java.protocol.handler.pkgs.

Ivan G.