
Which is more efficient? Is there a downside to using open() -> write() -> close() compared to using logger.info()?

PS. We are accumulating query logs for a university, so there's a chance this becomes big data eventually (query logs run between 3GB and 9GB per day, and the system will run 24/7 for its entire lifetime). I would appreciate a detailed explanation of how the two approaches differ in time efficiency and error-proneness.
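
For reference, the two approaches being compared look roughly like this (the file name and message are placeholders):

```python
import logging

logger = logging.getLogger(__name__)

def log_with_open(message):
    # Manual approach: open, write, and close the file on every call.
    with open("query.log", "a") as f:
        f.write(message + "\n")

def log_with_logging(message):
    # Standard-library approach: configured handlers manage the file.
    logger.info(message)
```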

anobilisgorse
  • I highly doubt that one application writing to a log file is considered "big data" unless you are generating megabytes of text per second. – OneCricketeer Apr 24 '16 at 05:18
  • If you think this is big data, you are in for a shock if you ever get to work with actual big data. – Burhan Khalid Apr 24 '16 at 05:21
  • @cricket_007 I'm sorry if I'm overstating it; it's just that we accumulate up to roughly 9GB of query logs per day, and this engine of ours will be used by our university constantly from the day it's integrated. – anobilisgorse Apr 24 '16 at 05:26
  • @BurhanKhalid Yes, I understand. But at 9GB of data per day, running 24/7, I'd guess we would reach that scale within a few years. (Although we might delete the unnecessary logs due to limited storage.) – anobilisgorse Apr 24 '16 at 05:27
  • @AngeloSembrano: 9GB per day is not "big data" (not that there's a specific definition, but you know what I mean). All of your data can fit on a single hard drive, uncompressed, for the next 2 years. – Blender Apr 24 '16 at 05:28
  • @AngeloSembrano In a previous job we logged an average of 80,000 queries per second. I think you'll be alright. – Kirk Strauser Apr 24 '16 at 05:28
  • @Blender There are single 6.5 TB drives? – OneCricketeer Apr 24 '16 at 05:29
  • @cricket_007: You can get an 8TB hard drive for like $250. – Blender Apr 24 '16 at 05:30
  • @Blender Oops, I keep forgetting how large single HDDs are getting :) – OneCricketeer Apr 24 '16 at 05:32
  • Big Data usually refers to the complexity of the data; it very rarely means the size of the data - although generally the two have a strong correlation. Big Data simply means a data set so large **or complex** that traditional tools cannot parse it or extract any meaningful information from it. – Burhan Khalid Apr 24 '16 at 05:32

2 Answers

Use the method that more closely describes what you're trying to do. Are you making log entries? Use logger.*. If (and only if!) that becomes a performance issue, then change it. Until then, it's an optimization you don't know you'll ever need.

Pros for logging:

  • It's semantic. When you see logging.info(...), you know you're writing a log message.
  • It's idiomatic. This is how you write Python logs.
  • It's efficient. Maybe not extremely efficient, but it's so thoroughly used that it has lots of nice optimizations (like not running string interpolation on log messages that won't be emitted because of their log level; see the sketch after this list).
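
To make that last point concrete, here is a minimal sketch of the lazy interpolation the logging module does for you (the message contents are illustrative):

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# The '%s' interpolation only happens if DEBUG messages are actually
# emitted; at INFO level this line costs almost nothing.
logger.debug("query took %s ms", 42)

# By contrast, an f-string is always evaluated before the call,
# even when the message is ultimately discarded.
logger.debug(f"query took {42} ms")
```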

Cons for logging:

  • It's not as much fun as inventing your own solution (which will invariably turn into an unfeatureful, poorly tested, less efficient version of logging).

Until you know that it's not efficient enough, I highly recommend you use it. Again, you can always replace it later if data proves that it's not sufficient.
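
For the volumes described in the question, a starting configuration might look something like this sketch; the file name and rotation settings are placeholder assumptions:

```python
import logging
from logging.handlers import TimedRotatingFileHandler

# Rotate the query log at midnight, keeping two weeks of history.
handler = TimedRotatingFileHandler("query.log", when="midnight", backupCount=14)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

logger = logging.getLogger("queries")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.info("SELECT * FROM students WHERE id = %s", 123)
```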

Kirk Strauser

It is always better to use a built-in facility unless you run into problems with it.

So, use the built-in logging module. It is proven, tested, and very flexible - something you cannot achieve with open() -> f.write() -> close().
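
As a rough illustration of that flexibility (the handler choices here are assumptions for the example), the logging call sites stay identical while the destination is changed purely in configuration:

```python
import logging
import sys

logger = logging.getLogger("queries")
logger.setLevel(logging.INFO)

# During development, send logs to stderr...
logger.addHandler(logging.StreamHandler(sys.stderr))

# ...in production, also (or instead) write to a file.
# Either way, no logging call sites need to change.
logger.addHandler(logging.FileHandler("queries.log"))

logger.info("delivered to every configured handler at once")
```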

Burhan Khalid