
tl;dr: Here's a repo containing the problem.


Cassandra and HDFS both use guava internally, but neither of them shades the dependency for various reasons. Because the versions of guava aren't binary compatible, I'm finding NoSuchMethodErrors at runtime.

I've tried to shade guava myself in my build.sbt:

val HadoopVersion =  "2.6.0-cdh5.11.0"

// ...

val hadoopHdfs = "org.apache.hadoop" % "hadoop-hdfs" % HadoopVersion
val hadoopCommon = "org.apache.hadoop" % "hadoop-common" % HadoopVersion
val hadoopHdfsTest = "org.apache.hadoop" % "hadoop-hdfs" % HadoopVersion % "test" classifier "tests"
val hadoopCommonTest = "org.apache.hadoop" % "hadoop-common" % HadoopVersion % "test" classifier "tests"
val hadoopMiniDFSCluster = "org.apache.hadoop" % "hadoop-minicluster" % HadoopVersion % Test

// ...

assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.common.**" -> "shade.com.google.common.@1").inLibrary(hadoopHdfs).inProject,
  ShadeRule.rename("com.google.common.**" -> "shade.com.google.common.@1").inLibrary(hadoopCommon).inProject,
  ShadeRule.rename("com.google.common.**" -> "shade.com.google.common.@1").inLibrary(hadoopHdfsTest).inProject,
  ShadeRule.rename("com.google.common.**" -> "shade.com.google.common.@1").inLibrary(hadoopCommonTest).inProject,
  ShadeRule.rename("com.google.common.**" -> "shade.com.google.common.@1").inLibrary(hadoopMiniDFSCluster).inProject
)

assemblyJarName in assembly := s"${name.value}-${version.value}.jar"

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", "MANIFEST.MF") => MergeStrategy.discard
  case _ => MergeStrategy.first
}

but the runtime exception persists (ha -- it's a cassandra joke, people).

The specific exception is

[info] HdfsEntitySpec *** ABORTED ***
[info]   java.lang.NoSuchMethodError: com.google.common.base.Objects.toStringHelper(Ljava/lang/Object;)Lcom/google/common/base/Objects$ToStringHelper;
[info]   at org.apache.hadoop.metrics2.lib.MetricsRegistry.toString(MetricsRegistry.java:406)
[info]   at java.lang.String.valueOf(String.java:2994)
[info]   at java.lang.StringBuilder.append(StringBuilder.java:131)
[info]   at org.apache.hadoop.ipc.metrics.RetryCacheMetrics.<init>(RetryCacheMetrics.java:46)
[info]   at org.apache.hadoop.ipc.metrics.RetryCacheMetrics.create(RetryCacheMetrics.java:53)
[info]   at org.apache.hadoop.ipc.RetryCache.<init>(RetryCache.java:202)
[info]   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initRetryCache(FSNamesystem.java:1038)
[info]   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:949)
[info]   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:796)
[info]   at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1040)
[info]   ...

How can I properly shade guava to stop the runtime errors?
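
In the meantime, a useful sanity check is asking the JVM which jar a conflicting class was actually loaded from; pointing this at `com.google.common.base.Objects` inside the failing test JVM shows which guava won on the classpath. This is a minimal sketch using only standard JDK reflection, nothing specific to this repo:

```scala
// Minimal sketch: find the jar (or directory) a class was loaded from.
object WhichJar {
  def locate(cls: Class[_]): Option[String] =
    // Some classes (e.g. JDK core classes) carry no CodeSource, hence the Option
    Option(cls.getProtectionDomain.getCodeSource).map(_.getLocation.toString)

  def main(args: Array[String]): Unit = {
    // e.g. WhichJar.locate(classOf[com.google.common.base.Objects])
    println(locate(getClass))
  }
}
```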

erip
  • Can you post the error that you're seeing? – Tim Moore Dec 20 '17 at 19:46
  • I think you also need to rename in the Guava library itself. – Tim Moore Dec 20 '17 at 19:53
  • You probably want to follow a process similar to this one: http://manuzhang.github.io/2016/10/15/shading.html – Tim Moore Dec 20 '17 at 19:54
  • @TimMoore edited the post. Trying to shade guava more generally now. – erip Dec 20 '17 at 20:02
  • @TimMoore No luck shading `"com.google.**"`. – erip Dec 20 '17 at 20:43
  • From what I know, shading with assembly has some limitations; I encourage you to have a look at this alternative shading plugin in Coursier https://github.com/coursier/coursier/tree/master/sbt-shading/src. – Jorge Dec 21 '17 at 10:44
  • @JorgeVicenteCantero It's not documented. – erip Dec 21 '17 at 13:49
  • LOL. You can read code, my friend. The very user of that plugin is in the coursier build, if you want an example. The problem with assembly is that it cannot shade things transitively, which coursier can IIRC. Worth it. – Jorge Dec 21 '17 at 15:51
  • You are getting the error when running tests? The shading will occur only when building a fat jar, not during regular compile. – lev Dec 22 '17 at 03:22
  • @lev that's right. I've tried `sbt test` and `sbt assembly:test`. – erip Dec 22 '17 at 11:45
  • @JorgeVicenteCantero I tried to use coursier but I haven't quite figured it out. [Here](https://gist.github.com/erip/aeb32a9b81006ef93b57b7671cb68d16) is my attempt. – erip Dec 22 '17 at 13:24

1 Answer

The shading rules only apply when you build a fat jar; they are not applied during other sbt tasks such as `compile` or `test`.

If you want to shade some library inside your hadoop dependencies, you can create a new project with only the hadoop dependencies, shade the libraries, and publish a fat jar with all of the shaded hadoop dependencies.

This is not a perfect solution: all of the dependencies in the new hadoop jar will be "unknown" to whoever uses it, and you will need to handle conflicts manually.
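
Concretely, handling conflicts manually means the consuming project has to keep the original (unshaded) hadoop artifacts off its classpath itself. A hypothetical sketch of the consumer's build.sbt (the `shaded-hadoop` coordinates and version here are assumptions, untested):

```scala
// Hypothetical consumer build.sbt: depend on the published fat jar...
libraryDependencies += "com.example" % "shaded-hadoop" % "0.1.0"  // assumed coordinates

// ...and if anything else drags the unshaded hadoop artifacts back in
// transitively, exclude them explicitly:
excludeDependencies ++= Seq(
  ExclusionRule("org.apache.hadoop", "hadoop-hdfs"),
  ExclusionRule("org.apache.hadoop", "hadoop-common")
)
```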

Here is the code that you will need in your build.sbt to publish a fat hadoop jar (using your code and sbt assembly docs):

val HadoopVersion =  "2.6.0-cdh5.11.0"

val hadoopHdfs = "org.apache.hadoop" % "hadoop-hdfs" % HadoopVersion
val hadoopCommon = "org.apache.hadoop" % "hadoop-common" % HadoopVersion
val hadoopHdfsTest = "org.apache.hadoop" % "hadoop-hdfs" % HadoopVersion classifier "tests"
val hadoopCommonTest = "org.apache.hadoop" % "hadoop-common" % HadoopVersion classifier "tests"
val hadoopMiniDFSCluster = "org.apache.hadoop" % "hadoop-minicluster" % HadoopVersion 

lazy val fatJar = project
  .enablePlugins(AssemblyPlugin)
  .settings(
    libraryDependencies ++= Seq(
        hadoopHdfs,
        hadoopCommon,
        hadoopHdfsTest,
        hadoopCommonTest,
        hadoopMiniDFSCluster
    ),
    assemblyShadeRules in assembly := Seq(
      ShadeRule.rename("com.google.common.**" -> "shade.@0").inAll
    ),
    assemblyMergeStrategy in assembly := {
      case PathList("META-INF", "MANIFEST.MF") => MergeStrategy.discard
      case _ => MergeStrategy.first
    },
    artifact in (Compile, assembly) := {
      val art = (artifact in (Compile, assembly)).value
      art.withClassifier(Some("assembly"))
    },
    addArtifact(artifact in (Compile, assembly), assembly),
    crossPaths := false, // Do not append Scala versions to the generated artifacts
    autoScalaLibrary := false, // This forbids including Scala related libraries into the dependency
    skip in publish := true
  )

lazy val shaded_hadoop = project
  .settings(
    name := "shaded-hadoop",
    packageBin in Compile := (assembly in (fatJar, Compile)).value
  )

I haven't tested it, but that is the gist of it.
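
A side note on the rename patterns: `@0` expands to the entire matched class name and `@1` to the part captured by the first wildcard, so `"shade.@0"` above and the question's `"shade.com.google.common.@1"` produce the same renamed class for this pattern. A plain-Scala toy illustrating the expansion (not sbt-assembly's real implementation, just the idea):

```scala
object ShadePatternDemo {
  // Toy expansion of the rename pattern "com.google.common.**":
  // @0 is the whole matched name, @1 is the part matched by "**".
  def rename(className: String, result: String): Option[String] = {
    val prefix = "com.google.common."
    if (className.startsWith(prefix)) {
      val captured = className.stripPrefix(prefix)
      Some(result.replace("@0", className).replace("@1", captured))
    } else None  // classes outside the pattern are left alone
  }

  def main(args: Array[String]): Unit = {
    val cls = "com.google.common.base.Objects"
    // "shade.@0" keeps the full original name under a new root:
    println(rename(cls, "shade.@0"))
    // -> Some(shade.com.google.common.base.Objects)
    // "shade.com.google.common.@1" rebuilds the package explicitly:
    println(rename(cls, "shade.com.google.common.@1"))
    // -> Some(shade.com.google.common.base.Objects)
  }
}
```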


I'd also like to point out another issue I noticed: your merge strategy might cause you problems, since the default applies different strategies to some of the files (see the default strategy in the sbt-assembly docs).
I would recommend using something like this to preserve the original strategy for everything that is not `deduplicate`:

assemblyMergeStrategy in assembly := {
  entry: String => {
    val strategy = (assemblyMergeStrategy in assembly).value(entry)
    if (strategy == MergeStrategy.deduplicate) MergeStrategy.first
    else strategy
  }
}
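
If specific files still need special treatment, explicit cases can go in front of that fallback. For instance, hadoop registers its `FileSystem` implementations in `META-INF/services/org.apache.hadoop.fs.FileSystem`, and merging that file with `MergeStrategy.filterDistinctLines` keeps every implementation registered. An untested sketch:

```scala
assemblyMergeStrategy in assembly := {
  // hadoop's FileSystem service registry: concatenate distinct lines so
  // all filesystem implementations from all jars survive the merge
  case PathList("META-INF", "services", "org.apache.hadoop.fs.FileSystem") =>
    MergeStrategy.filterDistinctLines
  case entry =>
    val strategy = (assemblyMergeStrategy in assembly).value(entry)
    if (strategy == MergeStrategy.deduplicate) MergeStrategy.first
    else strategy
}
```
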
lev
  • I haven't had time to build the fat jar until today. This was an extremely useful answer and put me on the right track. The only change I needed to make was in the merge strategy for the `fatJar`. I needed to add `case PathList("META-INF", "services", "org.apache.hadoop.fs.FileSystem") => MergeStrategy.filterDistinctLines` so hdfs can maintain all of the different filesystems. – erip Feb 09 '18 at 12:46
  • And that pattern came from [this](https://stackoverflow.com/a/23810585/2883245) answer's comments. – erip Feb 09 '18 at 12:57