2

If we use 6 mappers in Sqoop to import data from Oracle, how many connections will be established between Sqoop and the source?

Will it be a single connection, or 6 connections, one for each mapper?

Ram Ghadiyaram
smisra3

3 Answers

2

As per the Sqoop docs:

Likewise, do not increase the degree of parallelism higher than that which your database can reasonably support. Connecting 100 concurrent clients to your database may increase the load on the database server to a point where performance suffers as a result.

That means all the mappers will make concurrent connections.

Also keep in mind that if your table has only 2 records, Sqoop will use only 2 mappers, not all 6.
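The point about small tables can be illustrated with a simplified range-splitting sketch. This is not Sqoop's actual splitter, and Sqoop's real split boundaries and counts may differ. The class `SplitSketch` and the method `splitRanges` are hypothetical names, and the sketch assumes an integer split column with contiguous values. The only idea it shows is that a key range holding fewer distinct values than the requested number of mappers cannot be cut into that many non-empty ranges.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not taken from the Sqoop code base.
final class SplitSketch {

    /**
     * Cuts the inclusive key range [min, max] into at most `mappers`
     * non-empty, inclusive [low, high] ranges of near-equal size.
     */
    static List<long[]> splitRanges(long min, long max, int mappers) {
        if (mappers < 1 || max < min) {
            throw new IllegalArgumentException("need mappers >= 1 and max >= min");
        }
        long values = max - min + 1;             // distinct keys, assuming no gaps
        long count = Math.min(values, mappers);  // cannot have more ranges than keys
        long base = values / count;
        long extra = values % count;             // the first `extra` ranges get one more key

        List<long[]> ranges = new ArrayList<>();
        long low = min;
        for (long i = 0; i < count; i++) {
            long size = base + (i < extra ? 1 : 0);
            long high = low + size - 1;
            ranges.add(new long[] {low, high});
            low = high + 1;
        }
        return ranges;
    }

    public static void main(String[] args) {
        System.out.println(splitRanges(1, 2, 6).size());   // prints 2
        System.out.println(splitRanges(1, 600, 6).size()); // prints 6
    }
}
```

With keys 1 to 2 and 6 requested mappers, only 2 ranges come out, so only 2 tasks would have work to do and only 2 would need a connection.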

Check my other answer to understand the concept of the number of mappers in a Sqoop command.

EDIT:

All the mappers will make inactive connections as JDBC client programs. The active connections (the ones that actually fire SQL queries) will then be shared among multiple mappers.

Run the Sqoop import command in --verbose mode and you will see logs like:

DEBUG manager.OracleManager$ConnCache: Got cached connection for jdbc:oracle:thin:@192.xx.xx.xx:1521:orcl/dev

DEBUG manager.OracleManager$ConnCache: Caching released connection for jdbc:oracle:thin:@192.xx.xx.xx:1521:orcl/dev

Check the getConnection and recycle methods of OracleManager's ConnCache for more details.
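The log lines above suggest a cache that hands out a stored connection for a given connect string and takes it back on release. A minimal sketch of that pattern follows. It is not Sqoop's code: the class `ConnCacheSketch`, its `openedCount` helper, and the example key are assumptions made for illustration, and a type parameter stands in for `java.sql.Connection` so the sketch runs without a database.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical keyed connection cache; names are illustrative, not Sqoop's.
final class ConnCacheSketch<C> {
    private final Map<String, C> cached = new HashMap<>();
    private final Function<String, C> opener; // opens a new connection for a key
    private int opened = 0;

    ConnCacheSketch(Function<String, C> opener) {
        this.opener = opener;
    }

    /** Returns a cached connection for the key if one was recycled, else opens a new one. */
    synchronized C getConnection(String key) {
        C conn = cached.remove(key);
        if (conn != null) {
            return conn;            // "Got cached connection"
        }
        opened++;
        return opener.apply(key);
    }

    /** Stores a released connection so a later caller in the same JVM can reuse it. */
    synchronized void recycle(String key, C conn) {
        // A production cache would also close any connection this put displaces.
        cached.put(key, conn);      // "Caching released connection"
    }

    synchronized int openedCount() {
        return opened;
    }

    public static void main(String[] args) {
        ConnCacheSketch<Object> cache = new ConnCacheSketch<>(k -> new Object());
        String key = "jdbc:oracle:thin:@example-host:1521:orcl/dev"; // made-up key
        Object first = cache.getConnection(key);
        cache.recycle(key, first);
        Object second = cache.getConnection(key);
        System.out.println(first == second);     // prints true
        System.out.println(cache.openedCount()); // prints 1
    }
}
```

Note that a cache like this lives inside one JVM. Whether it is shared between map tasks depends on where it is called from, which is the question raised in the comments below.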

Community
Dev
  • The question is about the number of connections when you use 6 mappers. It's not about the number of mappers and its side effects, which are not relevant! – Ram Ghadiyaram Jul 14 '16 at 04:04
  • @RamPrasadG It is clearly mentioned that all the mappers will make concurrent connections. I hope you can understand what that means. Also, I removed the extra information from the docs section. – Dev Jul 14 '16 at 04:11
  • OK, you mean that irrespective of the number of mappers there will be one connection. The question is how many connections; would your answer be one in that case? – Ram Ghadiyaram Jul 14 '16 at 04:15
  • I think this answer is relevant, since it not only mentions that each mapper will make a single connection to the database (which is the answer to the original question), but also explains the relationship between the number of records and the number of mappers. – Jaime Caffarel Jul 14 '16 at 07:14
  • @JaimeCr: "all the mappers will make concurrent connection" means the connection is shared between all mappers. Where he said one mapper, one connection is beyond me. Please also note that the answer was edited after my comment. – Ram Ghadiyaram Jul 14 '16 at 07:31
  • Sure, I'm just referring to the answer as it is right now, not as it was. In my opinion, "all the mappers will make concurrent connections" means that each mapper will have its own connection (all of them concurrent). Maybe it is not crystal clear (or it wasn't before the edit), and it should have included "6 connections" as part of the answer, like you did, but I would not consider the answer "not relevant", IMHO. – Jaime Caffarel Jul 14 '16 at 08:04
  • @JaimeCr I added more details after observing connections with running sqoop commands. – Dev Jul 14 '16 at 08:44
  • I understand that the referred methods from grepcode implement connection pooling or a connection cache (other Java examples include Apache DBCP). But where is the caller in the mapper of MapReduce? IMHO the mapper's setup method should open the connection and its cleanup method should release it. Also, connectionMap.put(key, conn); in the recycle method means that you are using multiple connections which are in connectionMap, and when you release a connection it will sit in connectionMap... – Ram Ghadiyaram Jul 14 '16 at 09:13
0

It probably depends on the manager, but I guess all of them are likely to create one connection per mapper. Take DirectPostgresqlManager: it creates one connection per mapper through psql COPY TO STDOUT. Please take a look at the managers under Sqoop Managers.

yusufaytas
-1

Each map task gets its own DB connection, so in your case 6 mappers means 6 connections. Please visit github/sqoop to see how it is implemented.

-m specifies the number of map tasks that run as part of the job, so more mappers means more connections.
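The "one connection per map task" claim can be pictured with a small simulation. It makes loud assumptions: threads stand in for map tasks (real map tasks usually run in separate JVMs on different nodes), a counter stands in for open JDBC connections, and `MapperConnectionSketch` and `peakConnections` are hypothetical names, not anything from Sqoop or Hadoop.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical simulation: each "mapper" holds its own "connection" while it works.
final class MapperConnectionSketch {

    /** Runs `mappers` concurrent tasks and returns the peak number of open connections. */
    static int peakConnections(int mappers) throws InterruptedException {
        AtomicInteger open = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        CountDownLatch allOpened = new CountDownLatch(mappers);
        ExecutorService pool = Executors.newFixedThreadPool(mappers);

        for (int i = 0; i < mappers; i++) {
            pool.submit(() -> {
                int now = open.incrementAndGet();        // "open" this task's connection
                peak.accumulateAndGet(now, Math::max);   // record the high-water mark
                allOpened.countDown();
                try {
                    // Keep the connection until every task has opened its own.
                    allOpened.await(5, TimeUnit.SECONDS);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    open.decrementAndGet();              // "close" the connection
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(peakConnections(6)); // prints 6
    }
}
```

Under these assumptions the database side sees as many simultaneous sessions as there are running mappers, which is why the docs warn against raising the parallelism beyond what the database can support.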

Ram Ghadiyaram