
Just started a new gig where I manage SAS programs which gather data, apply suppressions, and ultimately create a list of contact information for campaigns to be distributed to customers.

Pardon the ignorance of my question in advance, but here it goes.

The team currently stores all of its code on a Unix server and executes and tests it via PuTTY sessions. Everything else is done by hand: creating directories, merging code from the weekly code-run folders (branches) back into the master code directory, and so on. The quality check process is also manual: code is sent to another team, which then uses Subversion or Beyond Compare to review the diffs.

With my limited knowledge of GitHub and other repository tools (SVN, Bitbucket), it seems that one of them would be an excellent way to streamline this process. I'd like someone to validate my thinking, confirm I'm on the right track, and tell me whether this is even possible.

  1. Implement GitHub or Bitbucket, then migrate the existing "Master Code" folders from the server to the repo master branches.
  2. When a campaign needs to run weekly, create a new branch accordingly (such as 09022016, 09092016, etc.) and then make any necessary changes.
    1. Commit changes to the respective weekly branch, and then trigger a code review in something such as Crucible. Is this possible?
    2. Once approved, deploy changes to the Unix server (in a folder with the same name as the branch in the repo) so that the .SAS or .ksh files can be executed. Is this doable? What would need to be used for this to happen?
    3. If changes need to be utilized for future campaign executions, merge the code from the branch back to the master.
    4. Repeat for all future campaign executions.

Does this make sense? Am I missing something? I was previously a developer at another organization, but never did any heavy coding, plus all these processes were already in place, so they just worked.

Sorry for the long winded question. Any help would be greatly appreciated.

Edit: More simplified questions...

  - Can I use GitHub to replace storing SAS programs in traditional file directories? Does this make more sense?
  - Can I then trigger a deployment from GitHub to a Unix server upon check-in/approval?
  - If I choose to, could that deployment from GitHub to Unix also initiate a SAS program to kick off an entire campaign?
  - If I remember correctly, I can go into Crucible, select the commits/code I want to review, and then initiate a code review with the appropriate individuals, so this doesn't need to be automated. Right?

user5442184
  • Way too opinion-based for Stack Overflow. So many different ways to go about this. – Charlie Fish Sep 09 '16 at 17:21
  • Thanks for the feedback, perhaps you could make a recommendation for a better place to ask this question then? – user5442184 Sep 09 '16 at 17:31
  • @CharlieFish I don't see why this is necessarily opinion-based, though it needs some revision. – Joe Sep 09 '16 at 18:35
  • @user5568219 I would make some revisions to make sure this is asking directly answerable question without opinion, but I think mostly this is okay as is. "Can you use X with Y to do Z" is okay; "Is this the best way to do XYZ" is not (that's an opinion). – Joe Sep 09 '16 at 18:36
  • @Joe Well I don't really understand the question then. Personally it sounds like OP was posting this as more of a discussion as opposed to an actual question which makes it opinion based. Doesn't seem like there is a specific issue or problem. – Charlie Fish Sep 09 '16 at 18:40
  • OP is asking whether he can use SVN/github to develop in SAS (whether it's possible to/is a reasonable way to develop). That's answerable and a programming question. He's also asking if that workflow is a reasonable workflow for using git/svn. Again, I think answerable. – Joe Sep 09 '16 at 18:49
  • There are about seven questions here, all along the lines of "can/how do I do continuous deployment with SAS and Git and Unix". The simplest answer is yes, you can, but I think this would be better as several, more specific questions. – david25272 Sep 12 '16 at 22:22

1 Answer


SAS is definitely usable with source control. If you're using Enterprise Guide 7.1 as your IDE, you actually have direct in-IDE integration with git if you want (and the IDE saves projects as mini-git repositories).

Prior to that version, or if you're not using EG, you can use the source control flavor of your choice with .sas files directly and manage them just like code in any other programming language. What you describe is roughly what I'd recommend, with some fine-tuning depending on how you do things and how your servers, etc. are set up.

The main difference you'll probably see with SAS versus something like C or Java is that you may not be able to test your code locally, because you may not have a local SAS install. You will need either a SAS test server or a test branch on your SAS server that is kept entirely separate from your prod branch (and possibly a dev branch too) and treated as a different server, even if it physically isn't. SAS costs a lot, so it's harder to justify separate dev/test/prod servers.

For example, what I do on my projects:

/prod/PROJECT/R##/<code> - <code> is coming from SVN via pull
/test/PROJECT/R##/<code> - <code> is committed to SVN from here, and every round has a R## folder.  All of them are synced to the same trunk, separate branches.

I actually do my development on test because of the nature of my work: I need PII to develop, and PII can't live on dev. A 3-tier setup would obviously be better, but the process would be similar. When I do things properly, I develop in the test branch, commit, and then pull that to prod.
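To make the test-to-prod flow concrete, here is a minimal sketch in Python. It assumes both round folders are SVN working copies and that rounds are numbered with two digits (R01, R12); the function names `round_dir`, `promotion_plan`, and `run_plan` are made up for illustration and are not part of any tool. It only builds the list of commands, so you can inspect the plan before running anything.

```python
import subprocess
from pathlib import PurePosixPath


def round_dir(tier: str, project: str, round_no: int) -> PurePosixPath:
    """Build a path such as /test/PROJECT/R07 (two-digit rounds are an assumption)."""
    if tier not in ("test", "prod"):
        raise ValueError(f"unknown tier: {tier!r}")
    return PurePosixPath("/") / tier / project / f"R{round_no:02d}"


def promotion_plan(project: str, round_no: int, message: str):
    """Return (working directory, command) steps: commit in test, then update prod."""
    return [
        (round_dir("test", project, round_no), ["svn", "commit", "-m", message]),
        (round_dir("prod", project, round_no), ["svn", "update"]),
    ]


def run_plan(plan) -> None:
    """Execute each step in its directory; stop on the first failing command."""
    for cwd, argv in plan:
        subprocess.run(argv, cwd=str(cwd), check=True)
```

Calling `run_plan(promotion_plan("PROJECT", 7, "Round 7 changes"))` on the server would do the commit and the update in order. The same shape would work with Git by swapping in the equivalent `git` commands.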

One interesting question has two valid answers: whether or not to keep a separate folder for each round/week in PROD. You can go either way depending on your use case. If it's necessary or useful to have immediate access to each round's (each week's, in your case) programs, keep them separated.

If it's not, meaning you run that week's campaign and are then done with the program, it may be better to keep the main run in a single folder that is overwritten each week with the newest code. That way you know the newest code is always in place, and that a pull from SVN gets you up-to-date code.


One final note. If you're running the same code each week except for changing some details that are data-driven, consider having a constant codebase (other than of course non-week based improvements) and having those data-driven details stored in a database. Obvious data-driven elements: the email addresses, the date, text fills in the email.

Even if the text of the email entirely changes each week (say this is the Subway email blast each week that sends a new coupon, for example), the SAS code can be 100% identical every pull. Your database contains a few fields/tables/etc. that store the email addresses, the email body, the fills, whatever, and you have something in SAS that identifies what are the current ones (perhaps just the newest, or perhaps something based on the date you're running the code), pulls them out, and runs.

In that context, there's no reason to have weekly branches or anything like that; you only have branches when there's a change in the program, like if you change the query that pulls email addresses, or add the functionality to have a link in the text, or whatever. That way you avoid having programming changes when possible and mostly just have the data change - which is easier to QC. (Of course, perhaps the data is generated by a program, which also will need QCing and branches and such, but that program also shouldn't need to be changed every week, right?)
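As a rough illustration of the "identify the current ones" step, here is a small sketch. It is written in Python only to keep it short and runnable; in practice the same lookup would live in your SAS code against the real table. The `CampaignConfig` fields and the `current_config` function are hypothetical names, and the rule used (newest row whose effective date is on or before the run date) is just one of the options mentioned above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class CampaignConfig:
    effective_date: date  # first run date this row applies to (assumed column)
    subject: str          # email subject for that week
    body: str             # email body or text fills for that week


def current_config(rows: Iterable[CampaignConfig], run_date: date) -> Optional[CampaignConfig]:
    """Pick the newest row effective on or before run_date, or None if there is none."""
    eligible = [r for r in rows if r.effective_date <= run_date]
    return max(eligible, key=lambda r: r.effective_date, default=None)


rows = [
    CampaignConfig(date(2016, 9, 2), "Week of 9/2 coupon", "Body for 9/2"),
    CampaignConfig(date(2016, 9, 9), "Week of 9/9 coupon", "Body for 9/9"),
]
picked = current_config(rows, date(2016, 9, 5))
```

Here a run on 2016-09-05 picks the 9/2 row, and a run on or after 2016-09-09 picks the 9/9 row, with no change to the program itself. Only the table contents change from week to week.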

Joe
  • Regarding costs- licences for non-PRD servers are sometimes discounted, depending on the arrangements your company has in place, but I agree that SAS is still some of the more expensive software out there. – user667489 Sep 09 '16 at 21:53