I always get distressed when I read about the latest example of someone cracking a commercial system, stealing credit card numbers, etc. There's absolutely no excuse for this! We in the industry should have the knowledge and skills to preclude the possibility of anyone hacking into back-end systems. Those of us who've been at it for a number of years have learned the architectures and techniques for protecting information. I can only surmise that the people responsible for some of these systems have not had the benefit of experience. I don't personally know anyone who would make the kind of obvious mistakes we read about on a frighteningly frequent basis.

Firewalls are powerful tools which can be used to limit the port numbers which are exposed to the Internet. In the case of a WWW server, the only port which should be accessible is port 80, plus port 443 if you're using HTTPS. Some servers expose a wide variety of ports and protocols by default. While I consider this to be a huge mistake from an architectural standpoint, a firewall can at least prevent external access to those ports. And while internal servers are reasonably safe from casual hacking, external-facing servers need to be hardened against every possible attack.

I should admit my bias at the outset. I don't consider Microsoft's products to be very secure. One simply has to look at the frequency of security updates to the underlying software and applications. I wouldn't even dream of using IIS as an external server, for example. All I have to do is look at my Apache webserver logs to see the typical attacks which are designed to exploit security holes in IIS. Microsoft programmers also don't seem to have learned how to handle buffer overruns. That's one of the advantages of using Java over languages such as C and C++. The platform manages string storage automatically rather than requiring the programmer to allocate and manage a fixed-length buffer.
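To make that concrete, here's a minimal sketch of the contrast. In C, a strcpy() into an undersized buffer silently writes past the end; the Java equivalents either grow their storage or fail loudly:

    // A minimal sketch of why classic buffer overruns aren't an issue in Java.
    public class NoOverrun {
        public static void main(String[] args) {
            // Strings and builders grow as needed; there's no fixed-length
            // buffer for an attacker to overflow.
            StringBuilder buffer = new StringBuilder();
            String input = "arbitrarily long, attacker-supplied input ...";
            buffer.append(input); // capacity expands automatically
            System.out.println("buffered " + buffer.length() + " chars");

            // Even explicit arrays are bounds-checked: this throws an
            // ArrayIndexOutOfBoundsException instead of corrupting memory.
            char[] fixed = new char[8];
            try {
                fixed[8] = 'x';
            } catch (ArrayIndexOutOfBoundsException e) {
                System.out.println("out-of-bounds write rejected: " + e);
            }
        }
    }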

I use an Apache/Tomcat combination as my front-line server. While it doesn't offer full J2EE functionality, it's a high-performance HTTP server combined with the reference implementation of a servlet container. You can use JDBC to interface with a database, or RMI/IIOP to communicate with a back-end J2EE server. And that's one of the keys to designing a secure network architecture: using a second firewall between your front-end and back-end servers, and limiting the ports which are exposed, ensures that there's no direct access to the back-end servers from the 'net. Front-end servers typically reside in what is commonly referred to as the "demilitarized zone", or DMZ. Here's a diagram:

    [Diagram: Internet -> external firewall -> front-end servers (DMZ) -> internal firewall -> back-end appserver]
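To make the JDBC option concrete, here's a minimal sketch of the kind of parameterized query a front-end servlet might run. The URL, credentials, and table are illustrative assumptions; in a real deployment the connection would come from a container-managed DataSource with pooling rather than from DriverManager.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class CatalogQuery {
        public static void main(String[] args) throws SQLException {
            // Placeholder URL and credentials for the sake of the example.
            String url = "jdbc:mysql://db.internal:3306/shop";
            try (Connection conn = DriverManager.getConnection(url, "app", "secret");
                 // PreparedStatement parameters also guard against SQL injection.
                 PreparedStatement ps = conn.prepareStatement(
                         "SELECT name, price FROM products WHERE id = ?")) {
                ps.setLong(1, 42L);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString("name") + ": "
                                + rs.getDouble("price"));
                    }
                }
            }
        }
    }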

In this architecture, the external firewall would typically pass only requests for ports 80 and 443. Since the webserver only needs to communicate with the appserver, only port 900 (the port the appserver listens on, in this example) would need to be enabled on the internal firewall. And because the front-end servers should be equipped with dual network interfaces, it's easy to also limit access by IP address on the internal firewall. Finally, on robust systems such as Red Hat Linux, you can use iptables to serve as the firewall. Since it filters packets in the kernel, at the network layer, you can actually combine the firewall and front-end functionality in a single server. The disadvantage is that you can't use load balancing in that scenario.
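If you go the iptables route, the rules might look something like the sketch below. This is only illustrative: the interface name eth0 and the older state-matching syntax are assumptions, and a production rule set would need additional housekeeping (logging, anti-spoofing, and so on).

    # A default-deny input policy on the combined firewall/front-end host
    # (eth0 is assumed to be the Internet-facing interface).
    iptables -P INPUT DROP
    # Allow replies to connections this host initiated, plus loopback traffic.
    iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
    iptables -A INPUT -i lo -j ACCEPT
    # Only HTTP and HTTPS are accepted from the outside world.
    iptables -A INPUT -i eth0 -p tcp --dport 80 -j ACCEPT
    iptables -A INPUT -i eth0 -p tcp --dport 443 -j ACCEPT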

Load balancing is another complex topic and actually impacts how one designs an application. There are two types of load balancers: stateful and stateless. With the first, the load balancer examines the source IP address of incoming packets and routes requests from a given client to the same server every time (so-called "sticky" sessions). This can make it easier for developers to maintain session data on a single server. The drawback is that if a server goes down, its session data is lost and the users routed to that machine will have to log in to the application again. What I consider to be a better approach is one which permits users to access any front-line server without regard to session persistence.

And this is where load balancing impacts the application architecture. J2EE servers have the ability to persist session data to a database. Whether via URL rewriting or cookies, the client maintains a session reference, and any server can retrieve the associated data from the back-end database. If a developer opts to maintain a lot of data in the session then there's obviously going to be a performance impact with my preferred approach. The solution is simple, however: keep your session data to a minimum. The caching capabilities of a modern RDBMS should be able to serve up the data in a fraction of a second. What I like about stateless load balancers is the ability to add front-end servers according to load, or take them down for required maintenance, with no impact to users.
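Here's a sketch of what "keep your session data to a minimum" looks like in servlet code: store only a small key (a user ID, say) in the HttpSession and fetch everything else per request, so any front-end server can handle the next request. The class and helper names are illustrative, not from any particular application.

    import java.io.IOException;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import javax.servlet.http.HttpSession;

    public class AccountServlet extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws ServletException, IOException {
            // Keep the session footprint tiny: a single key, not a graph
            // of objects, so persisted sessions stay cheap to move around.
            HttpSession session = req.getSession();
            Long userId = (Long) session.getAttribute("userId");
            if (userId == null) {
                resp.sendRedirect("/login");
                return;
            }
            resp.getWriter().println("Hello, " + lookupNameFromDatabase(userId));
        }

        // Hypothetical helper; in a real application this would be a JDBC
        // query against the shared back-end database.
        private String lookupNameFromDatabase(Long userId) {
            return "user-" + userId;
        }
    }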

So now we have no direct access to the internal servers from the 'net. They're blocked by both port number and IP address. Using a private (RFC 1918) address range such as 192.168.x.x on the internal network means that nothing coming from the 'net can masquerade as one of your front-end servers. And since only ports 80 and 443 are allowed in through the external firewall, outsiders couldn't reach port 900 on the internal firewall anyway. One of the dangers of not limiting access via the external firewall is that other protocols could be used to breach security on your front-end servers. Skilled hackers could then inject traffic on the second network interface to gain access to the back-end servers.

Security needs to be an overarching concern when building web applications. From using HTTPS when requesting personally identifiable information to using encryption to store credit card numbers [1], people have a right to expect that a company is doing everything in its power to secure their information. While some might suggest that the hackers are more capable than the custodians of information, I would disagree. We have learned much about ways to secure information. Tools and techniques exist to prevent access to sensitive data. We need to ensure that we apply them intelligently, or else run the risk of a loss of faith in the entire e-commerce industry.
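As an illustration of the encryption point, here's a minimal sketch using the standard javax.crypto API with AES in GCM mode (my choice for the example; no particular cipher is prescribed above). Key management, which is the genuinely hard part and a major focus of PCI DSS, is deliberately elided here.

    import java.security.SecureRandom;
    import javax.crypto.Cipher;
    import javax.crypto.KeyGenerator;
    import javax.crypto.SecretKey;
    import javax.crypto.spec.GCMParameterSpec;

    public class CardVault {
        // AES in GCM mode provides both confidentiality and integrity.
        public static byte[] encrypt(SecretKey key, byte[] plaintext) throws Exception {
            byte[] iv = new byte[12];               // standard 96-bit GCM nonce
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
            byte[] ciphertext = cipher.doFinal(plaintext);
            // Prepend the IV so the record can later be decrypted.
            byte[] record = new byte[iv.length + ciphertext.length];
            System.arraycopy(iv, 0, record, 0, iv.length);
            System.arraycopy(ciphertext, 0, record, iv.length, ciphertext.length);
            return record;
        }

        public static void main(String[] args) throws Exception {
            // For illustration only: a real key must come from a proper
            // key-management system, never be generated ad hoc like this.
            KeyGenerator kg = KeyGenerator.getInstance("AES");
            kg.init(256);
            SecretKey key = kg.generateKey();
            byte[] record = encrypt(key, "4111111111111111".getBytes());
            System.out.println("stored " + record.length + " encrypted bytes");
        }
    }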

NOTES:

  1. The Payment Card Industry (PCI) has strict rules regarding the use and storage of payment card information. While some might consider them onerous, I fully support their approach. It's why I prefer to use an Internet Payment Processor (IPP) for payment processing. The cost of ensuring compliance with the PCI DSS (Data Security Standard) can be formidable, as in up to $1 million. I'd prefer to put the onus on a third party, even if it means slightly higher per-transaction fees.

Other Thoughts

I could probably write an entire article about this topic, but I'll add it here for your consideration. As I mentioned above, I don't consider IIS to be a suitable host for "industrial strength" applications. I don't even consider it appropriate for purely internal applications. And that's why I'm perpetually surprised by the number of job postings I see these days which require both J2EE and .NET/ASP. As far as I'm concerned, you either commit to a platform-agnostic solution like J2EE or you choose the proprietary Microsoft approach. I've found that it's often a question of personal philosophy. That could explain why nobody I know personally has embraced both. You're either in one camp or the other, so requiring both doesn't make a great deal of sense to me.

Part of the problem, as I see it anyway, is that some companies have chosen the proprietary approach because it's quick and easy, not to mention cheap. Let's face it: VB programmers are a dime a dozen. But applications written in Visual Basic are simply not what I would choose to support critical business functions. Java doesn't let you make the typical C-language mistakes (assignment versus comparison in an if statement), and type declarations such as ArrayList<String> let the compiler catch incorrect assignments. The more mistakes you can catch at the development phase, the less likelihood of serious application errors at the testing and deployment stages.
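Both checks are easy to demonstrate. In the sketch below, each commented-out line is rejected by the Java compiler, which is exactly the point:

    import java.util.ArrayList;
    import java.util.List;

    public class CompileTimeChecks {
        public static void main(String[] args) {
            int x = 1, y = 2;
            // if (x = y) { }         // won't compile: an int is not a boolean
            if (x == y) {
                System.out.println("equal");
            }

            List<String> names = new ArrayList<>();
            names.add("Alice");
            // names.add(42);         // won't compile: 42 is not a String
            System.out.println(names.get(0).toUpperCase()); // no cast needed
        }
    }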

Finally, it surprises me how many companies require skills in technologies which I consider to be obsolete. Perl was the original language used to generate dynamic content on webservers, via CGI scripts. It was also used by UNIX system administrators for generating reports. It's not a strongly typed language, so it's easy to make mistakes. PHP originally stood for Personal Home Page and was designed along similar lines. ColdFusion was an attempt to improve the situation but ultimately couldn't support applications at the enterprise scale, IMHO. The volume of concurrent requests on the 'net these days can tax even server-side solutions like J2EE. But at least J2EE is scalable and, with appropriate knowledge of how to architect solutions, can support incredible request volumes.

Of course not everyone will agree with these conclusions. Some may reasonably suggest that solutions like WebSphere and WebLogic are too expensive, even though OSS alternatives like GlassFish are readily available. My response would likely be that these same people might try to use something like MySQL rather than a solid RDBMS solution like Oracle or DB2. There's a big difference between designing solutions which will serve hundreds or thousands of requests per day as opposed to millions or tens of millions. If your website is vital to your company and your cash flow, and has to run 24x7x365, then you should select the most robust solutions available. As always, YMMV.

April 29th, 2009

Updated January 14th, 2017

Copyright 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017 by Phil Selby. All rights reserved.