In my Java application I have an implementation of a file-system layer, where my File class is a wrapper around Hadoop FileSystem methods. I am upgrading the connector from hadoop3-1.9.17 to hadoop3-2.2.8, and I am using the shaded jar of the new version.
My File class has methods such as write, read, etc.:
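class File {
    private Path path;       // org.apache.hadoop.fs.Path; FileSystem.create/open take a Path, not a String
    private FileSystem fs;   // org.apache.hadoop.fs.FileSystem
}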
Here is how my write method is implemented:
    @Override
    public OutputStream write(boolean overwriteIfExists) throws IOException {
        return fs.create(path, overwriteIfExists);
    }
And my read method:
    @Override
    public InputStream read() throws IOException {
        return fs.open(path);
    }
I have a performance test case that I run against the above file-system implementation, which uses org.apache.hadoop.fs.FileSystem.
The test creates many threads. Each thread creates an instance of the File class with a specific path (e.g. gs://some-bucket/objectX), and each thread runs the same operations: read, rename, check-exists, etc.
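For readers who want to reproduce this kind of measurement, here is a minimal sketch of a multi-threaded timing harness. It is not my actual test code: the class name OpTimer and the method averageMillis are hypothetical, and the operation is passed in as a plain Callable so the sketch does not depend on Hadoop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, for illustration only.
final class OpTimer {
    /** Runs op once on each of `threads` threads and returns the mean elapsed time in milliseconds. */
    static double averageMillis(int threads, Callable<?> op) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Wrap the operation so each task reports its own elapsed time in nanoseconds.
            Callable<Long> timed = () -> {
                long start = System.nanoTime();
                op.call();
                return System.nanoTime() - start;
            };
            List<Future<Long>> results = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                results.add(pool.submit(timed));
            }
            long totalNanos = 0;
            for (Future<Long> result : results) {
                totalNanos += result.get(); // rethrows any failure from the task
            }
            return totalNanos / 1e6 / threads;
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real run the Callable would wrap a call such as file.read() or an exists check, so that only the file-system operation itself is timed.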
I ran the same tests several times on both versions of the Hadoop connector, and the new one (2.2.8) shows a slower overall execution time (almost 2x the old connector's time).
Below is a comparison of the average execution time for each operation with each connector version:
Operation   hadoop3-1.9.17   hadoop3-2.2.8   Slowdown (new / old)
READ        4542.71          10171.26        ~2.2x
RENAME      1347.75          4483.27         ~3.3x
EXISTS      47.23            1538.74         ~33x
CREATE      570.1            1539.81         ~2.7x
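The slowdown factors can be recomputed directly from the averages above by dividing the new time by the old one. The following is a small sketch of that arithmetic; the class name Slowdown is hypothetical, and the numbers are copied from the table.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class Slowdown {
    /** Ratio of new to old average time; a value above 1 means the new version is slower. */
    static double ratio(double oldAvg, double newAvg) {
        return newAvg / oldAvg;
    }

    public static void main(String[] args) {
        // Averages copied from the table: {hadoop3-1.9.17, hadoop3-2.2.8}.
        Map<String, double[]> averages = new LinkedHashMap<>();
        averages.put("READ", new double[] {4542.71, 10171.26});
        averages.put("RENAME", new double[] {1347.75, 4483.27});
        averages.put("EXISTS", new double[] {47.23, 1538.74});
        averages.put("CREATE", new double[] {570.1, 1539.81});

        for (Map.Entry<String, double[]> entry : averages.entrySet()) {
            double[] times = entry.getValue();
            System.out.printf(Locale.ROOT, "%-7s %.1fx%n", entry.getKey(), ratio(times[0], times[1]));
        }
    }
}
```

EXISTS stands out by an order of magnitude compared with the other operations, which is why I suspect something beyond general overhead.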
I have checked this GitHub issue and tried to follow its recommendations for tuning performance through the configuration parameters, but I did not see any improvement.
Are there any guidelines on parameter configuration to improve the times of the above operations?
Or might this performance issue be caused by some incompatibility among the jars on my classpath? Even though I am using the shaded jar, can other jars interfere?
Here is a list of jars I have in my class path:
- gcs-connector-hadoop3-2.2.8-shaded.jar
- google-extensions-0.7.1.jar
- google-api-client-1.32.2.jar
- google-http-client-apache-v2-1.40.1.jar
- proto-google-common-protos-2.7.3.jar
- google-http-client-1.41.8.jar
- google-oauth-client-1.33.3.jar
- google-http-client-jackson2-1.40.1.jar
- grpc-google-cloud-storage-v2-2.2.2-alpha.jar
- google-http-client-gson-1.41.8.jar
- google-cloud-monitoring-1.82.0.jar
- google-cloud-core-http-2.5.4.jar
- proto-google-cloud-storage-v2-2.2.2-alpha.jar
- google-api-client-jackson2-1.32.2.jar
- google-api-services-iamcredentials-v1-rev20210326-1.32.1.jar
- google-oauth-client-java6-1.27.0.jar
- google-cloud-core-grpc-2.5.4.jar
- google-http-client-appengine-1.34.2.jar
- google-cloud-core-2.5.4.jar
- google-auth-library-credentials-1.7.0.jar
- google-cloud-storage-1.106.0.jar
- proto-google-iam-v1-1.2.3.jar
- google-api-services-storage-v1-rev20211018-1.32.1.jar
- google-auth-library-oauth2-http-1.7.0.jar
- proto-google-cloud-monitoring-v3-1.64.0.jar
- grpc-services-1.43.2.jar
- grpc-netty-shaded-1.43.2.jar
- grpc-alts-1.43.2.jar
- grpc-stub-1.43.2.jar
- grpc-census-1.43.2.jar
- grpc-protobuf-1.43.2.jar
- grpc-api-1.43.2.jar
- grpc-xds-1.43.2.jar
- grpc-core-1.43.2.jar
- grpc-protobuf-lite-1.43.2.jar
- grpc-context-1.43.2.jar
- opencensus-contrib-grpc-metrics-0.31.0.jar
- grpc-auth-1.43.2.jar
- gax-grpc-2.7.1.jar
- grpc-grpclb-1.43.2.jar
- api-common-2.1.4.jar
- gax-2.7.1.jar
- gax-httpjson-0.73.0.jar
- util-2.2.8.jar
- util-hadoop-hadoop3-2.2.8.jar
- auto-value-annotations-1.9.jar
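One way to check whether jars on the classpath overlap is to scan them for class files that appear in more than one jar. The sketch below does this with java.util.jar.JarFile only; the class name DuplicateClassFinder is hypothetical, and the jar paths are assumed to be passed as command-line arguments. A duplicate does not prove a conflict, but it shows where two jars could shadow each other.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper, for illustration only.
final class DuplicateClassFinder {
    /** Maps each .class entry found in two or more of the given jars to the jars that contain it. */
    static Map<String, List<String>> findDuplicates(List<String> jarPaths) throws IOException {
        Map<String, List<String>> owners = new TreeMap<>();
        for (String jarPath : jarPaths) {
            try (JarFile jar = new JarFile(jarPath)) {
                Enumeration<JarEntry> entries = jar.entries();
                while (entries.hasMoreElements()) {
                    String name = entries.nextElement().getName();
                    // module-info.class legitimately exists in many jars, so skip it.
                    if (name.endsWith(".class") && !name.endsWith("module-info.class")) {
                        owners.computeIfAbsent(name, key -> new ArrayList<>()).add(jarPath);
                    }
                }
            }
        }
        // Keep only the entries that are present in more than one jar.
        owners.values().removeIf(jars -> jars.size() < 2);
        return owners;
    }

    public static void main(String[] args) throws IOException {
        findDuplicates(Arrays.asList(args))
            .forEach((cls, jars) -> System.out.println(cls + " -> " + jars));
    }
}
```

Running it with every jar from the list above as arguments would print each class that is provided by more than one jar, together with the jars that provide it.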