
I am trying to use this example of Crate with Common Crawl: https://github.com/crate/crate-commoncrawl
I have set up Crate and created the table schema using the instructions from the example. I am accessing Crate at the URL http://localhost:4200/_plugin/crate-admin, as I am working on my own system.

The only issue I am facing is with the COPY statement. Here is that line:

COPY commoncrawl FROM 'ccrawl://cr8.is/1WSiodP';

It fails with an exception. Here is the error and its trace:

COPY ERROR (0.000 sec)
Error!

SQLActionException[MalformedURLException: unknown protocol: ccrawl] 

Error Trace:

SQLActionException: INTERNAL_SERVER_ERROR 5000 MalformedURLException: unknown protocol: ccrawl
    at java.net.URL.<init>(URL.java:600)
    at java.net.URL.<init>(URL.java:490)
    at java.net.URL.<init>(URL.java:439)
    at java.net.URI.toURL(URI.java:1089)
    at io.crate.operation.collect.files.URLFileInput.getStream(URLFileInput.java:52)
    at io.crate.operation.collect.files.FileReadingCollector.readLines(FileReadingCollector.java:228)
    at io.crate.operation.collect.files.FileReadingCollector.doCollect(FileReadingCollector.java:205)
    at io.crate.operation.collect.MapSideDataCollectOperation$1$1.run(MapSideDataCollectOperation.java:135)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

I am using Ubuntu 16.04. Please help; I do not understand what the problem is.

Jaffer Wilson

1 Answer


Looks like the crate-commoncrawl plugin was not installed correctly. See https://github.com/crate/crate-commoncrawl#build--install.
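To illustrate why the trace ends in `java.net.URL.<init>`, here is a minimal, standalone Java sketch (not Crate code; the class and method names are hypothetical). It assumes only standard JDK behaviour: converting a URI to a URL fails with `MalformedURLException` when the JVM has no stream handler registered for the scheme. That would be consistent with the node not having loaded anything that handles `ccrawl`, though it does not by itself prove where the installation went wrong.

```java
import java.net.MalformedURLException;
import java.net.URI;

// Hypothetical demo class, not part of Crate or the crate-commoncrawl plugin.
class UnknownProtocolDemo {

    // Returns null if the URI converts to a URL, otherwise the exception message.
    static String tryToUrl(String spec) {
        try {
            // Same call that appears in the stack trace: URI.toURL()
            URI.create(spec).toURL();
            return null;
        } catch (MalformedURLException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // No handler for "ccrawl" exists in a plain JVM, so this reports the
        // same "unknown protocol: ccrawl" message seen in the question.
        System.out.println(tryToUrl("ccrawl://cr8.is/1WSiodP"));
        // "http" has a built-in handler, so no exception is raised here.
        System.out.println(tryToUrl("http://localhost:4200/"));
    }
}
```

If the same message appears inside Crate, the likely next step is checking whether the plugin jar was actually picked up at startup.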

Sebastian Utz
  • I have tried these steps too, but I do not know what the problem is. I built the jar and copied it to the plugin folder of Crate. Please let me know if I did something wrong. I used the link `http://localhost:4200/_plugin/crate-admin` as I am working on my own system. I would like to know whether this is the right URL or not. – Jaffer Wilson Dec 02 '16 at 07:03
  • Did you check Crate's startup logs to see whether the ccrawl plugin is listed under the loaded plugins? If not, the installation does not seem to have worked; otherwise the plugin seems to be broken. I'd open an issue in both cases at https://github.com/crate/crate-commoncrawl with detailed information (e.g. log output, steps to reproduce, etc.) – Sebastian Utz Dec 02 '16 at 10:29
  • Sure. I look forward to hearing from you, Sebastian. I would like to know why this problem occurred. Do keep me updated. Thanks – Jaffer Wilson Dec 02 '16 at 11:16