If you need to know the user agent for one case, there's little you can do other than whitelisting the User-Agent: header in the relevant CloudFront cache behavior.
CloudFront caches responses against the request headers it forwards to the origin. The net result is that a cached response that was obtained by forwarding a request with

User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36

will not be considered usable by CloudFront for serving a future request with

User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.85 Safari/537.36

even though, for all practical purposes, it's the same browser.
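To make the effect concrete, here is a minimal Python sketch of the idea, not CloudFront's actual implementation. It assumes a simplified cache key made of the path plus the values of the whitelisted headers; the function name cache_key and the path /images/logo.png are made up for illustration.

```python
from typing import Dict, Iterable, Tuple

# The two User-Agent strings from the example above.
UA_CHROME_39 = (
    "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36"
)
UA_CHROME_40 = (
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/40.0.2214.85 Safari/537.36"
)


def cache_key(path: str, headers: Dict[str, str], whitelist: Iterable[str]) -> Tuple:
    """Simplified model (an assumption): key = path + whitelisted header values."""
    # Header names are case-insensitive, so normalize them before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    forwarded = tuple(
        (name.lower(), normalized.get(name.lower(), ""))
        for name in sorted(whitelist, key=str.lower)
    )
    return (path, forwarded)


first = {"User-Agent": UA_CHROME_39}
second = {"User-Agent": UA_CHROME_40}

# Nothing whitelisted: both requests map to the same cached object.
same_without = cache_key("/images/logo.png", first, []) == cache_key(
    "/images/logo.png", second, []
)

# User-Agent whitelisted: the two requests no longer share a cached object.
same_with = cache_key("/images/logo.png", first, ["User-Agent"]) == cache_key(
    "/images/logo.png", second, ["User-Agent"]
)

print(same_without, same_with)  # True False
```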
When you configure CloudFront to cache based on one or more headers and the headers have more than one possible value, CloudFront forwards more requests to your origin server for the same object. This slows performance and increases the load on your origin server. If your origin server returns the same object regardless of the value of a given header, we recommend that you don't configure CloudFront to cache based on that header.
— http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/header-caching.html
The phrase "configure CloudFront to cache based on one or more headers" is synonymous with "whitelisting headers" to be forwarded to the origin.
This is the reason to forward only as much as you need. Forwarding more hurts your cache hit ratio, in this case because of the variation in User-Agent: strings. You don't get the full benefit of the edge caches, the origin server processes more requests, and more bandwidth is used between the origin and CloudFront... but there isn't really an alternative. CloudFront doesn't charge anything for storage in the edge caches, so the only cost difference will be whatever arises from those other factors.
The phrase "slows performance" (above) doesn't mean CloudFront gets slower -- it only refers to the reduced likelihood of any particular request being a cache hit, because of the variation in possible header values.
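As a rough illustration of that reduced likelihood, the sketch below simulates a single edge cache serving one object. Everything in it is an assumption for demonstration purposes: the ExampleBrowser/N strings are invented, the cache never expires anything, and real traffic is spread across many edge locations.

```python
from typing import List


def hit_ratio(user_agents: List[str], vary_on_user_agent: bool) -> float:
    """Fraction of requests served from a toy cache for one object."""
    cache = set()
    hits = 0
    for ua in user_agents:
        # When User-Agent is whitelisted, it becomes part of the cache key.
        key = ("/images/logo.png", ua if vary_on_user_agent else None)
        if key in cache:
            hits += 1
        else:
            cache.add(key)  # cache miss: the request goes to the origin
    return hits / len(user_agents)


# 100 requests spread evenly over 20 distinct (made-up) User-Agent strings.
requests = [f"ExampleBrowser/{n % 20}" for n in range(100)]

print(hit_ratio(requests, vary_on_user_agent=False))  # 0.99
print(hit_ratio(requests, vary_on_user_agent=True))   # 0.8
```

Each individual response is served just as fast in both runs; the difference is only in how many requests have to go back to the origin.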
Incidentally, the behavior of CloudFront is correct in this regard, since a varying User-Agent: header can mean a varied response, as indeed you've indicated it does.
Judicious use of path patterns and multiple cache behaviors is the key to getting the best possible use of the CDN cache, given the circumstance you've described. Whitelisting User-Agent: only on the path patterns that need it, such as /images/* (which is, of course, a path I just made up), would be advisable.

This same advice applies to cookies and query strings as well as headers. For path patterns where you don't need cookies or query strings, don't enable forwarding for them. Otherwise, cached responses will only be served to users who present the same cookie, or for requests where both the path and the query string match a cached response, so there would not be many cache hits.
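The per-path approach can be sketched in Python as follows. This is a model under stated assumptions, not CloudFront configuration: BEHAVIORS, whitelist_for, and the ExampleBrowser strings are hypothetical, and fnmatchcase only approximates CloudFront's path pattern matching (it shares the * and ? wildcards and case sensitivity, but also accepts [...] character sets).

```python
from fnmatch import fnmatchcase
from typing import Dict, Tuple

# Hypothetical ordered cache behaviors: (path pattern, whitelisted headers).
# The first matching pattern wins; "*" stands in for the default behavior.
BEHAVIORS = [
    ("/images/*", ("User-Agent",)),
    ("*", ()),
]


def whitelist_for(path: str) -> Tuple[str, ...]:
    """Return the whitelisted headers of the first behavior matching the path."""
    for pattern, headers in BEHAVIORS:
        if fnmatchcase(path, pattern):
            return headers
    return ()


def cache_key(path: str, request_headers: Dict[str, str]) -> Tuple:
    """Simplified key: path plus only the headers whitelisted for that path."""
    forwarded = tuple(
        (name, request_headers.get(name, "")) for name in whitelist_for(path)
    )
    return (path, forwarded)


requests = [
    ("/images/logo.png", {"User-Agent": "ExampleBrowser/1"}),
    ("/images/logo.png", {"User-Agent": "ExampleBrowser/2"}),
    ("/css/site.css", {"User-Agent": "ExampleBrowser/1"}),
    ("/css/site.css", {"User-Agent": "ExampleBrowser/2"}),
]

# Each distinct key corresponds to one object fetched from the origin.
distinct_keys = {cache_key(path, headers) for path, headers in requests}
print(len(distinct_keys))  # 3
```

The image is cached once per user agent, while the stylesheet is cached once and shared by every browser, which is the split you want when only one part of the site varies by User-Agent:.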