
I have a Python app based on Django that I run in a Docker container. The app browses the file system, analyzes some XML files, extracts the embedded source code, and exports it into separate files.

The app should run a Java jar file that performs static code analysis on the files generated by the Django web app.

I thought of isolating the two parts of the platform: the Python Django part runs in one container, and the jar file (an open-source tool) runs in another, Alpine-based container.

Now I want to continue development and have the Django app run the tool, through a command, on each generated file that contains source code.

  • Should I create another Django wrapper around the jar file that exposes some endpoints, so that the first container can run it? The wrapper could possibly handle a GET request that uses eval() to run the tool.
  • Is there another way I could enhance this architecture?

Edit: The tool I'm using: https://github.com/AbletonDevTools/groovylint

magsul

1 Answer


There is nothing wrong with the way you have the tool set up now, without Docker, and I would keep doing what you're doing unless you're trying to address some specific problem with it.

When you write out the problem description as "browse the file system", "exports into files", "run a jar file", this isn't an architecture that works well in Docker. Sharing files between containers is tricky (it requires startup-time options, and there are lurking permission issues), one container can't directly run a command in another, and one container can't start another container without being trusted with unrestricted root-level access over the whole system.

Rebuilding this into a form that works well with Docker would involve some re-engineering that's kind of secondary to the actual problem you're trying to solve. Two typical approaches:

  1. Instead of the Java tool being a tool that's run once and exits, make it a long-running process with an HTTP interface. The Django front-end would HTTP POST a file it wants processed, and get the results back in the HTTP response.

  2. Deploy a job queue like RabbitMQ. When the Django application has a file it wants processed, it posts it into the job queue. Wrap the Java tool in a worker that reads from the request queue, does its processing, and writes to a response queue. The Django application is then able to pick up the response.
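A minimal sketch of option 1 using only the Python standard library, assuming the tool container has Python available next to the jar. The tool side accepts a POSTed file, saves it to a private temp file, runs the tool on it, and returns the tool's output; the Django side posts the file contents, so no shared volume is needed. Everything specific here is hypothetical: the `tool_command` (`java -jar /opt/tool/tool.jar <file>`), the `.groovy` suffix, port 8080, the `analyzer` hostname (e.g. a Compose service name), and the choice of 422 for a nonzero exit and 504 for a timeout.

```python
import subprocess
import tempfile
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


# --- Runs in the tool container -------------------------------------------

class AnalyzeHandler(BaseHTTPRequestHandler):
    # Hypothetical command line; the real jar path and arguments depend on
    # the tool. The uploaded file's path is appended as the last argument.
    tool_command = ["java", "-jar", "/opt/tool/tool.jar"]

    def _reply(self, status: int, body: bytes) -> None:
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        source = self.rfile.read(length)
        # Save the uploaded source to a private temp file the tool can read.
        with tempfile.NamedTemporaryFile(suffix=".groovy") as tmp:
            tmp.write(source)
            tmp.flush()
            try:
                result = subprocess.run(
                    [*self.tool_command, tmp.name],
                    capture_output=True,
                    timeout=60,
                )
            except subprocess.TimeoutExpired:
                self._reply(504, b"analysis timed out\n")
                return
        # Assumption: exit code 0 means success; anything else becomes a 422.
        status = 200 if result.returncode == 0 else 422
        self._reply(status, result.stdout + result.stderr)


def serve(host: str = "0.0.0.0", port: int = 8080) -> None:
    ThreadingHTTPServer((host, port), AnalyzeHandler).serve_forever()


# --- Runs in the Django container -----------------------------------------

def analyze_remote(source: bytes, url: str = "http://analyzer:8080/") -> tuple[int, str]:
    """POST one generated source file and return (HTTP status, tool output)."""
    request = urllib.request.Request(
        url,
        data=source,
        method="POST",
        headers={"Content-Type": "application/octet-stream"},
    )
    try:
        with urllib.request.urlopen(request, timeout=90) as response:
            return response.status, response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # Non-2xx statuses (e.g. the 422 above) are raised as HTTPError.
        return err.code, err.read().decode("utf-8", "replace")
```

In the tool container `serve()` would be the entrypoint; in a Django view you would call something like `analyze_remote(path.read_bytes())` for each generated file. The Java process is still started once per request here, which keeps the sketch simple; a wrapper that kept a JVM warm would be faster but needs cooperation from the tool itself.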

The last option is "most industrial" -- if you needed to process thousands of files at a time, scale up and down the number of Java workers you had based on the workload, and generally run this as a production network service, it's a good way to go. That doesn't sound like the scale you're aiming for.
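A RabbitMQ setup can't be shown self-contained in a few lines, so this sketch swaps in Python's in-process `queue.Queue` and threads as a stand-in, purely to show the shape of option 2: the producer enqueues a job carrying the file contents plus an id, workers run the tool and push results onto a response queue, and "scaling" is just the number of workers. In a real deployment the two queues would be RabbitMQ queues (through a client library such as pika) and each worker would be its own container. `TOOL_COMMAND` and the `.groovy` suffix are hypothetical.

```python
import queue
import subprocess
import tempfile
import threading
import uuid
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str
    source: bytes  # file contents travel with the job; no shared volume needed


@dataclass
class Result:
    job_id: str
    returncode: int
    output: str


# In-process stand-ins for the RabbitMQ request and response queues.
request_queue = queue.Queue()
response_queue = queue.Queue()

# Hypothetical tool command; the temp file's path is appended as the last argument.
TOOL_COMMAND = ["java", "-jar", "/opt/tool/tool.jar"]


def worker(command: list[str]) -> None:
    """Pull jobs, run the tool on each, and push results until told to stop."""
    while True:
        job = request_queue.get()
        if job is None:  # sentinel value: shut this worker down
            break
        with tempfile.NamedTemporaryFile(suffix=".groovy") as tmp:
            tmp.write(job.source)
            tmp.flush()
            proc = subprocess.run([*command, tmp.name], capture_output=True, text=True)
        response_queue.put(Result(job.job_id, proc.returncode, proc.stdout + proc.stderr))


def start_workers(count: int, command: list[str] = TOOL_COMMAND) -> list[threading.Thread]:
    """Scaling up or down means changing how many workers consume the queue."""
    threads = [threading.Thread(target=worker, args=(command,), daemon=True) for _ in range(count)]
    for thread in threads:
        thread.start()
    return threads


def submit(source: bytes) -> str:
    """Django side: enqueue a file and return an id to match the result later."""
    job_id = str(uuid.uuid4())
    request_queue.put(Job(job_id, source))
    return job_id
```

The job id is what lets the Django side match an answer on the shared response queue back to the file it submitted, since results can come back in any order once several workers are running.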

Finally: you should never eval() anything, and you should especially never eval() something you're getting from an HTTP request. That's a massive security disaster waiting to happen. (Anyone who can reach your service would be able to read or write any file your service could, and potentially even change the code the service is running; this could very easily become a scenario where someone takes over your entire system.)
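Wherever the tool ends up being launched (the current non-Docker setup, or inside a worker like those sketched above), the safe alternative to `eval()` is to pass `subprocess.run` an argument list with no shell, and to treat any file name that came from a request as untrusted. This sketch assumes a hypothetical `OUTPUT_DIR` where the Django app writes extracted files and the same hypothetical `TOOL_COMMAND`; it rejects names that resolve outside that directory or don't name an existing file.

```python
import subprocess
from pathlib import Path

# Hypothetical locations; adjust to where the app writes files and where the jar lives.
OUTPUT_DIR = Path("/data/extracted")
TOOL_COMMAND = ["java", "-jar", "/opt/tool/tool.jar"]


def resolve_generated_file(name: str, base: Path = OUTPUT_DIR) -> Path:
    """Map an untrusted file name to an existing file strictly inside base."""
    base = base.resolve()
    candidate = (base / name).resolve()  # collapses "..", follows symlinks
    if not candidate.is_relative_to(base) or not candidate.is_file():
        raise ValueError(f"not a generated file: {name!r}")
    return candidate


def run_tool(
    name: str, base: Path = OUTPUT_DIR, command: list[str] = TOOL_COMMAND
) -> subprocess.CompletedProcess:
    path = resolve_generated_file(name, base)
    # Argument list, no shell: the path is a single argv entry and is never
    # interpreted as Python code or shell syntax.
    return subprocess.run([*command, str(path)], capture_output=True, text=True, timeout=60)
```

A name like `"../../etc/passwd"` raises `ValueError`, and a file literally named `a b;echo pwned.groovy` is passed through as one argument rather than running `echo`. (`Path.is_relative_to` needs Python 3.9 or newer.)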

David Maze