Even relatively complex operations like, say, "convert this document into a PDF" basically only have two useful states: either it worked, or something specific failed, at which point just tell me that thing.
Now, independent software like web servers or databases can have useful logs because they have completely independent interfaces with the outside world. But I call libraries; they don’t call me.
Any library can do a bad job here; that doesn’t come down to logging vs error messages.
Errors are massaged for the reader - a database access library will know that a DNS error occurred and that this is why it cannot connect to the specified datastore (the first step for debugging). The service layer caller does not need to know that there is a DNS error; it just needs to know that the specified datastore is uncontactable (and then it can move on to the appropriate resilience strategy: retry that same datastore, fall back to a different datastore, or tell the API that it cannot complete the call at all).
The caller can then decide what to do (typically say "Well, I tried, but nothing's happening, have yourself a merry 500").
It makes no sense for the Service level to know the details of why the database access layer could not connect, no more than it makes any sense for the database access layer to know why there is a DNS configuration error - the database access just needs to log the reasons (for humans to investigate), and tell the caller (the service layer) that it could not do the task it was asked to do.
If the service layer is told that the database access layer encountered a DNS problem, what is it going to do?
Nothing, the best it can do is log (tell the humans monitoring it) that a DB access call (to a specific DB service layer) failed, and try something else, which is a generic strategy, one that applies to a host of errors that the database call could return.
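A rough sketch of that split in Python, in case it helps. Everything here is made up for illustration (`DatastoreUnavailable`, `connect`, `fetch_with_fallback`, and the injectable `resolver` parameter are not from any real database library): the access layer logs the DNS detail for humans and raises one generic error, and the service layer applies the same fallback strategy whatever the cause was.

```python
import logging
import socket

log = logging.getLogger("dbaccess")


class DatastoreUnavailable(Exception):
    """Generic failure the service layer can act on without knowing the cause."""


def connect(host, port, resolver=socket.getaddrinfo):
    """Hypothetical database-access call; only the name lookup is modelled."""
    try:
        resolver(host, port)
    except socket.gaierror as exc:
        # The DNS detail goes to the log, for the humans who will investigate.
        log.error("DNS lookup failed for %s:%s: %s", host, port, exc)
        # The caller only learns that the datastore cannot be reached.
        raise DatastoreUnavailable(f"datastore {host} is uncontactable") from exc
    return f"connection to {host}:{port}"


def fetch_with_fallback(hosts, port=5432, resolver=socket.getaddrinfo):
    """Service layer: one generic strategy covers every cause of unavailability."""
    for host in hosts:
        try:
            return connect(host, port, resolver)
        except DatastoreUnavailable:
            log.warning("datastore %s failed, trying the next one", host)
    raise DatastoreUnavailable("no datastore reachable")
```

Note that `fetch_with_fallback` never inspects why the connection failed, which is the point: a timeout or an auth failure mapped to the same exception would get the same treatment.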
I'll accept that it is a security problem; why would it be a serious security problem? Any configuration detail that the client learns from an error is unlikely to be one that is exploitable anyway, and if it is (for example, the client gets told "could not connect to 192.168.1.139:5432"), then you have bigger problems than sending error messages to clients.
What sort of example did you have in mind that makes this a serious security problem?
Technical infrastructure details: Database types, versions, server configurations
File paths and directory structures: Enabling directory traversal attacks
Programming logic: Including code snippets that expose application behavior
Sensitive credentials: Database connection strings, usernames, passwords
Software versions: Allowing attackers to identify known vulnerabilities
The impact of this vulnerability is significant. Error messages can expose not just that a system runs PHP, but that it runs a specific, unsupported version, providing attackers with a clear exploitation path.
Security researchers have documented numerous instances where verbose error messages enabled breaches:
Dating App Vulnerability (2016): Tinder’s login system displayed error messages indicating whether specific email addresses were registered, enabling brute-force attacks to identify valid accounts.
Password Manager Leak (2019): A popular password manager’s login form disclosed through error messages whether email addresses were registered with the service, facilitating targeted attacks.
Government Agency Breach (2020): A major US government agency’s website displayed error messages revealing whether specific usernames existed in the system, enabling attackers to enumerate valid accounts.
[1] https://medium.com/@instatunnel/security-misconfiguration-th...
I mean, sure, it's a security issue, but on a scale of 1-10, with 1 being "security issue, we'll fix in next point release" and 10 being "All-hands until this emergency patch goes out, and we keep the system offline while fixing it", this is definitely a 1.
Secondly, this barely counts as a security issue; some systems I worked on recently required error messages to tell the user how to fix the error they got. You don't simply say (for example) "attachment not found", you say "Field $FIELD is empty. This is a mandatory field" or similar.
There are still plenty of secure systems out there that will direct the user to create an account if an unregistered user attempts to log in.
It's a trade-off in usability: some places go the "Authentication failed (but we won't tell you why)" route, and others go the "Click here to sign up" route.
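To make the two routes concrete, here is a toy comparison in Python. The user store, the function names, and the message strings are all invented for the example, and the unsalted SHA-256 is only there to keep it short (it is not how passwords should be stored). The generic route returns the same answer for an unknown account and a wrong password; the helpful route tells the two apart, which is exactly what makes enumeration possible.

```python
import hashlib
import hmac

# Hypothetical in-memory user store: email -> SHA-256 hex digest of the password.
# Illustration only; a real system would use a salted, slow password hash.
USERS = {"alice@example.com": hashlib.sha256(b"correct horse").hexdigest()}

GENERIC_FAILURE = "Authentication failed"


def login_generic(email, password):
    """Same response whether the account is missing or the password is wrong."""
    stored = USERS.get(email, "")
    supplied = hashlib.sha256(password.encode()).hexdigest()
    ok = hmac.compare_digest(stored, supplied)
    return "OK" if ok else GENERIC_FAILURE


def login_helpful(email, password):
    """Friendlier, but the response reveals whether the address is registered."""
    if email not in USERS:
        return "No account for this address. Click here to sign up."
    return login_generic(email, password)
```

With `login_generic`, probing `bob@example.com` and `alice@example.com` with a bad password gives identical output; with `login_helpful` it does not.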
Jesus no.
Aside from this now being an argument on semantics, someone enumerating every customer/user account you have is serious.
It opens the door for privacy leaks and targeted attacks (like password attempts, phishing, or account lockouts).
If you don't want to take that seriously, thank you for your honesty, I will ensure that I never have an account on any service you work on.
Second, it can lose information about at what exact time and in what exact order things happened. For example, cleanup operations during stack unwinding can also produce log messages, and then it’s not clear anymore that the original error happened before those.
Even when you include a timestamp at each level, that’s often not sufficient to establish a unique ordering, unless you add some sort of unique counter.
It gets even more complicated when exceptions are escalated across thread boundaries.
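A small sketch of the counter idea, under the assumption that a single process-wide counter is acceptable. `SequencedError`, `record`, and the `events` list are stand-ins invented for the example, not any logging library's API. The error is stamped when it happens, the cleanup message is stamped during unwinding, and the error is only reported later at the top level, yet sorting by the stamp restores the real order.

```python
import itertools

_counter = itertools.count(1)
events = []  # stand-in for a log sink: (sequence number, message) in arrival order


def record(message, seq=None):
    """Append a message, stamping it now unless an earlier stamp is supplied."""
    events.append((next(_counter) if seq is None else seq, message))


class SequencedError(Exception):
    def __init__(self, message):
        super().__init__(message)
        self.seq = next(_counter)  # stamped at the moment the error occurs


def write_report():
    try:
        raise SequencedError("disk full")
    finally:
        record("cleanup: removed temp file")  # logged during stack unwinding


def main():
    try:
        write_report()
    except SequencedError as exc:
        record(f"error: {exc}", seq=exc.seq)  # reported late, stamped early
    return sorted(events)
```

In arrival order the cleanup line comes first; in sequence order the error does. Across threads or processes a single counter like this is no longer enough, which is the complication mentioned above.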
Personally I don't mind it... the whole "$outer: $inner" convention naturally lends to messages that still parse in my brain and actually include the details in a pretty natural way. Something like:
"Error starting up: Could not connect to database: Could not read database configuration: Could not open config file: Permission denied"
Tells me the config file for the database has broken permissions. Because the permission denied error caused a failure opening the config file, which caused a failure reading the database configuration, which caused a failure connecting to the database, which caused an error starting up. It's deterministic in that for "$outer: $inner", $inner always caused $outer.
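For what it's worth, the convention falls out of exception chaining fairly naturally. A minimal Python sketch, where `AppError`, `describe`, and the four functions are hypothetical names chosen to mirror the message above: each layer wraps the failure below it with `raise ... from`, and the formatter walks `__cause__` to join the messages as "$outer: $inner".

```python
class AppError(Exception):
    """Hypothetical application error used at every layer of the example."""


def describe(exc):
    """Join an exception and its chain of causes as '$outer: $inner: ...'."""
    parts = []
    while exc is not None:
        parts.append(str(exc))
        exc = exc.__cause__
    return ": ".join(parts)


def open_config_file():
    try:
        raise PermissionError("Permission denied")  # stand-in for a failing open()
    except OSError as exc:
        raise AppError("Could not open config file") from exc


def read_database_config():
    try:
        open_config_file()
    except AppError as exc:
        raise AppError("Could not read database configuration") from exc


def connect_to_database():
    try:
        read_database_config()
    except AppError as exc:
        raise AppError("Could not connect to database") from exc


def start_up():
    try:
        connect_to_database()
    except AppError as exc:
        raise AppError("Error starting up") from exc
```

Calling `describe` on the exception raised by `start_up()` reproduces the message quoted above, with every `$inner` being the direct cause of its `$outer`.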
Maybe it's just experience though, in a sense that it takes a lot of time and familiarity for someone to actually prefer the above. Non-technical people probably hate such messages and I don't necessarily blame them.
Then catch the exception on the backup path and wrap it in a custom exception that conveys to the handler the fact that you were on the backup path. Then throw the new exception.
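Roughly like this, as a sketch. `BackupPathError` and `fetch` are invented names, and `primary` and `backup` are just callables passed in for the example. If the backup succeeds, the caller gets its result; only when the backup also fails does the handler receive an exception that says it came from the backup path and carries both failures.

```python
class BackupPathError(Exception):
    """Raised when the primary path failed and the backup path failed too."""

    def __init__(self, primary_error, backup_error):
        super().__init__(
            f"backup path failed: {backup_error} "
            f"(primary failed first: {primary_error})"
        )
        self.primary_error = primary_error
        self.backup_error = backup_error


def fetch(primary, backup):
    """Try the primary callable, then the backup; wrap a backup failure."""
    try:
        return primary()
    except Exception as primary_error:
        try:
            return backup()
        except Exception as backup_error:
            # The handler can tell from the type that we were on the backup path.
            raise BackupPathError(primary_error, backup_error) from backup_error
```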
Should it throw an exception for that to let you know, or should it gracefully fall back so your service stays alive? The middle ground is leaving a log and chugging along; your proposition throws that out of the window.
There are a lot of libraries that have non-idempotent actions. There are a lot of inputs that can be problematic to log, too.
I guess in those cases standard practice is for lib to return a detailed error yeah.
As far as traces go, trying to solve issues that depend on external systems is indeed a tall order for your code. Isn't it beyond the scope of the thing being programmed?
It is very, very common that the code that you have written isn't even the code that executes. It gets modified by enterprise anti virus or "endpoint security". All too often I see "File.Open" calls report that the caller has access, when what has actually happened is that AV has intercepted the call, blocked it improperly, and returned a 0-byte file that exists (even though there is actually a larger file there) instead of saying the file cannot be opened.
I will never, in a million years, be granted access to attach a debugger to such a client computer. In fact, they will not even initially disclose that they are using anti virus. They will just say the machine is set up per company policy and that your software doesn't work, fix it. The assumption is always that your software is to blame and they give you nearly nothing, except for the logs.
The only way I ever get this solved in a reasonable amount of time is by looking at verbose logs, determining that the scenario they have described is impossible, explaining which series of log messages is not able to occur, yet occurred on their system, and asking them to investigate further. Usually this ends up being closed with a resolution like "Checked SuperProtectPro360 logs and found it was writing infernal error logs at the same time as using the software. Adjusted the monitoring settings and problem is now resolved."
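As an illustration of the kind of verbose logging that makes the "this sequence is impossible" argument possible, here is a hedged Python sketch. `read_checked` and `demo` are invented names, and whether a given AV product would actually show up as a mismatch between the reported size and the bytes read is an assumption, not something I can vouch for. The idea is only to log both numbers so a contradiction is visible after the fact.

```python
import logging
import os
import tempfile

log = logging.getLogger("fileio")


def read_checked(path):
    """Read a file and log enough detail to spot a contradictory result later."""
    size_before = os.stat(path).st_size
    with open(path, "rb") as fh:
        data = fh.read()
    consistent = len(data) == size_before
    log.debug("read %s: stat size=%d, bytes read=%d", path, size_before, len(data))
    if not consistent:
        # Should not happen for an unchanged file, so say so loudly.
        log.warning("%s: open succeeded but stat size %d != %d bytes read",
                    path, size_before, len(data))
    return data, consistent


def demo():
    """Write a small temp file and read it back through read_checked."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "sample.bin")
        with open(path, "wb") as fh:
            fh.write(b"hello")
        return read_checked(path)
```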
Either way logging the input (file name) is notably not sufficient for debugging if the file can change between invocations. The action can be idempotent and still be affected by other changes in the system.
> trying to solve issues that depend on external systems is indeed a tall order for your code. Isn't it beyond the scope of the thing being programmed.
If my program is broken I need it fixed regardless of why it’s broken. The specific example here of a file changing is likely to manifest as flakiness that’s impossible to diagnose without detailed logs from within the library.
I will say that error handling and logging in general is one of my weak points, but I made a comment about my approach so far being dbg/pdb based, attaching a debugger and creating breakpoints and prints ad-hoc rather than writing them in code. I'm sure there are reasons why it isn't used as much and logging in code is so much more common, but I have faith that it's a path worth specializing in.
Back to the file reading example, for a non-idempotent function. Considering we are using an encapsulating approach we have to split ourselves into 3 roles. We can be the IO library writer, we can be the calling code writer, and we can be an admin responsible for the whole product. I think a common trap engineers fall for is trying to keep all of the "global" context (or as much as they can handle) at all times.
In this case of course we wouldn't be writing the non-idempotent library, so that's not a hat we wear, and we do not quite care about the innards of the function and its state; rather, we have a well defined set of errors that are part of the interface of the function (EINVAL, EACCES, EEXIST).
In this sense we respect the encapsulation boundaries and are provided the information necessary by the library. If we ever need to dive into the actual library code, first the encapsulation is broken and we are dealing with a leaky abstraction, second we just dive into the library code, (or the filesystem admin logs themselves).
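From the calling-code hat, that interface looks something like the following Python sketch. `create_exclusive`, `demo`, and the returned strings are made up for illustration; `os.open`, the `O_*` flags, and the `errno` constants are the standard library's. The caller branches only on the documented error codes and lets anything else propagate.

```python
import errno
import os
import tempfile


def create_exclusive(path):
    """Create a file that must not already exist, reacting only to known errors."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except OSError as exc:
        if exc.errno == errno.EEXIST:
            return "exists"
        if exc.errno == errno.EACCES:
            return "denied"
        if exc.errno == errno.EINVAL:
            return "invalid"
        raise  # anything outside the expected set is not ours to interpret
    os.close(fd)
    return "created"


def demo():
    """Create the same path twice inside a throwaway directory."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "lockfile")
        return create_exclusive(path), create_exclusive(path)
```

The second call hits EEXIST, and the caller handles it without knowing anything about how the filesystem arrived at that answer.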
It's not precisely the type of responsibility that can be handled at design time and in code anyways; when we code we are wearing the calling-module programmer hat. We cannot think of everything that the sysadmin might need at the time of experiencing an error, so we have to assume that they will be sufficiently armed with other tools to gather the necessary information. And thank god for that! Checking /proc/fs, looking at crash dumps, and attaching to processes with dbg will yield far better info than relying on whatever print statements you somehow added to your program.
Anyways at least that's my take on the specific example of glibc-like implementations of POSIX file operations like open(). I'm sure the implications may change for other non-idempotent functions, but at some point, talking about specifics is a bit more productive than talking in the abstract.