
Archive for 2008

Pac-Man crashed the SIFMA 2008 party. His message: “I will help you save on your electricity bill”

Thursday, June 19th, 2008

June 10th, New York City: At about 100 degrees, this was probably one of the hottest spring days in 2008. The SIFMA (Securities Industry and Financial Markets Association) technology management exhibit was just opening up, and to keep all the suit-wearing businessmen cool, the hotel’s air conditioners were throwing many BTUs away…

This reminded me of why I was there, standing at the AMD booth, demonstrating computation running on an AMD FireStream graphics card…

It all started almost a year ago, when most of our Wall Street customers asked us whether we could help them program GPUs (Graphics Processing Units), more widely known as ‘accelerated graphics cards’.

Their interest is pretty simple - financial computations require a LOT of computing power. And with a traditional CPU-based approach like a grid, a LOT of computing power requires a LOT of electrical power, which at the end of the day turns into heat that the air conditioning system has to remove…

It is anticipated that ‘accelerated computing’, based on hardware derived from what are commonly known as ‘graphics cards’, is the best chance to save a lot of those BTUs…

Why?

First, GPUs can accelerate computations by a huge factor.

Pac-Man was released in 1980, and by today’s standards, moving a flat yellow circle across the screen is no longer considered a technological achievement. In fact, 3dfx Inc. revolutionized gaming in 1995 by introducing the first ‘consumer accelerated graphics card’, delivering ‘mind bending graphics’. At the same time, Microsoft introduced the first version of its DirectX API, now the leading reference in game development. Thirteen years later, GPUs can create three-dimensional images more than 200 times faster than regular CPUs.

It is not a stretch to draw a parallel between the Black-Scholes paper (published in 1973, introducing basic option pricing concepts) and Pac-Man. After all, both created a new industry. But while the ‘accelerated computing’ revolution happened in 1995 for video gaming, the revolution in financial and general-purpose accelerated computing is happening today.

AMD and nVIDIA are the first to introduce dedicated cards that are no longer limited to graphics and linear algebra, but can also run full double-precision C-like programs on extremely large data sets. At the same time, general-purpose APIs are becoming available. Preliminary tests show 10X - 40X improvements for some applications. That is still a bit shy of the acceleration we see in the graphics world, but remember that this is first-generation hardware.
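To make the idea of running C-like programs over large data sets concrete, here is a minimal C++ sketch of one typical financial workload: pricing a batch of European call options with the closed-form Black-Scholes formula (no dividends). It is plain CPU code, not a GPU kernel, and the names (`bs_call`, `price_batch`, `OptionInput`) are illustrative assumptions. The point is that each option is priced independently of every other, which is the property that lets this kind of loop be spread across the many threads of a GPU.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Standard normal cumulative distribution function, via the complementary error function.
double norm_cdf(double x) {
    return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// Black-Scholes price of a European call option (no dividends).
// S: spot, K: strike, r: risk-free rate, sigma: volatility, T: years to expiry.
double bs_call(double S, double K, double r, double sigma, double T) {
    const double sqrtT = std::sqrt(T);
    const double d1 = (std::log(S / K) + (r + 0.5 * sigma * sigma) * T) / (sigma * sqrtT);
    const double d2 = d1 - sigma * sqrtT;
    return S * norm_cdf(d1) - K * std::exp(-r * T) * norm_cdf(d2);
}

// Hypothetical input record for one option.
struct OptionInput { double spot, strike, rate, vol, years; };

// Each iteration reads only its own input and writes only its own output,
// so the loop is "embarrassingly parallel": a GPU could assign one thread per option.
std::vector<double> price_batch(const std::vector<OptionInput>& in) {
    std::vector<double> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const OptionInput& o = in[i];
        assert(o.vol > 0.0 && o.years > 0.0);
        out[i] = bs_call(o.spot, o.strike, o.rate, o.vol, o.years);
    }
    return out;
}
```

For example, `bs_call(100.0, 100.0, 0.05, 0.2, 1.0)` (at the money, 5% rate, 20% volatility, one year) comes out at roughly 10.45.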

Second, GPUs can save power.

Even at a 20X improvement, a single GPU delivers the performance of 20 CPU cores, while consuming less than 150 watts… And picture the size difference between the 8-core server I was using at the SIFMA show and the AMD FireStream card (about a quarter of a shoe box).
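Taking the figures above at face value (a 20X speedup standing in for 20 CPU cores, and roughly 150 W as an upper bound for the card), the power budget per core-equivalent works out to:

$$\frac{150\ \text{W}}{20\ \text{core-equivalents}} = 7.5\ \text{W per core-equivalent (at most)}$$

Whether that is a net saving depends on the per-core power draw of the CPUs being replaced, a figure the post does not give.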

SIFMA 2008 was the perfect opportunity to confirm that ‘accelerated computing’ is the future. Still, the overall feeling is that in most cases GPU development is slowed down by an industry whose APIs and hardware are still maturing, even though people are seriously investigating it.

Hedging the investment made in any single one of these new technologies is still the main concern; people are a bit reluctant to put all their eggs in one basket.

History offers a lesson here. When developing video games in 1995, the API of choice was Glide, published by the leading (and only) vendor in the accelerated 3D card market: 3dfx Inc. But “By 2000, the improved performance of Direct3D and OpenGL on the average personal computer, coupled with the huge variety of new 3D cards on the market, the widespread support of these standard APIs by the game developer community and the closure of 3dfx, would make Glide obsolete.” (source: Wikipedia)

I almost forgot! This was really my reason for being there.

I have been helping with many of the parallelization assessments for Wall Street and non-Wall Street customers, evaluating current implementations of CPU-intensive and non-CPU-intensive applications to see how GPUs and other techniques, such as multi-threading and service grids, can improve throughput and reduce latency. With the support of our development team, we can provide solutions to quickly evaluate, code and test GPU implementations (on multiple APIs) and multi-core implementations.

You will be surprised at some of the results…

Why would SOA become the dominant architecture for software development?

Wednesday, June 18th, 2008

In a recent blog post, Alex Cameron of EDS talks about SOA becoming the dominant architecture for software development. I can definitely see how this could be true. Software development seems to have settled on certain styles of programming languages for a reason. As Java and C++ made it possible to separate interface from implementation, developers realized they could more easily use another developer’s work without having to know what was going on under the covers. Companies and managers saw that they could more efficiently manage and control large projects with various teams interacting with each other. This led to easier-to-understand software, more productive development teams, and even simpler documentation, since the interface was a great guide to what the component did.

So what is the extension of that? Not only would that developer like to use someone else’s work without knowing anything about it, but they also want to have access to work done on other OSes, on different hardware, in different languages, and all without having to understand the details. So the previous model of finding a .h file or some other class description in the appropriate programming language would be replaced by a search of WSDLs for the functionality needed. No longer would the developer be limited by language, platform, or in some cases, even geography or affiliation.
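The ‘.h file’ model described above can be sketched in a few lines of C++: client code is written against an abstract interface, and the concrete implementation stays hidden behind a factory function. All names here (`PriceService`, `make_price_service`, `position_value`) are hypothetical, and the pricing logic is a placeholder. In the SOA picture, a WSDL document plays the role that the abstract class plays here, with the added benefit of being independent of language and platform.

```cpp
#include <cassert>
#include <memory>
#include <string>

// --- Interface: all a client developer needs to know (traditionally a .h file) ---
class PriceService {
public:
    virtual ~PriceService() = default;
    virtual double price(const std::string& symbol) const = 0;
};

// Factory declared alongside the interface; callers never see the concrete type.
std::unique_ptr<PriceService> make_price_service();

// --- Implementation: can change freely without touching client code ---
namespace {
class FixedPriceService : public PriceService {
public:
    double price(const std::string& symbol) const override {
        // Hypothetical placeholder standing in for a real pricing engine.
        return symbol == "ABC" ? 42.0 : 0.0;
    }
};
}  // namespace

std::unique_ptr<PriceService> make_price_service() {
    return std::unique_ptr<PriceService>(new FixedPriceService());
}

// --- Client: written purely against the interface ---
double position_value(const PriceService& svc, const std::string& symbol, int shares) {
    assert(shares >= 0);
    return svc.price(symbol) * shares;
}
```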

Rogue Wave / AMD partnership for Multi-core CPU and GPU

Wednesday, June 11th, 2008

By Patrick Leonard, VP Engineering & Product Strategy

Expansion of our Relationship

Rogue Wave and AMD have a long-standing partnership to advance C++ software development on AMD’s Opteron CPU platform. I’m excited that our two companies have recently announced an expansion of that relationship to make it easier for software applications to take advantage of the additional computing power available on multi-core CPUs and on GPUs (graphics processing units).

For several years, increased performance from all hardware vendors has come largely from additional “cores” instead of faster clock speeds. This provides significant additional processing power, but most existing software cannot take full advantage of it without significant modification. This is called the “Multi-core Dilemma”.
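To illustrate what significant modification can mean even for a trivial loop, here is a minimal C++11 sketch (the function name is hypothetical) that splits a summation across several `std::thread` workers, each summing one contiguous chunk before the partial sums are combined. A plain sequential loop would use only one core no matter how many are available. On many toolchains this needs to be compiled with `-pthread`. Note that the additions happen in a different order than in a sequential loop, which for non-integer data can change the floating-point result slightly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sums `data` using `workers` threads; each thread handles one contiguous chunk.
double parallel_sum(const std::vector<double>& data, unsigned workers) {
    assert(workers > 0);
    std::vector<double> partial(workers, 0.0);  // one slot per thread, so no shared writes
    std::vector<std::thread> threads;
    const std::size_t chunk = (data.size() + workers - 1) / workers;  // ceiling division

    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(data.size(), w * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        threads.emplace_back([&data, &partial, w, begin, end] {
            partial[w] = std::accumulate(data.begin() + static_cast<std::ptrdiff_t>(begin),
                                         data.begin() + static_cast<std::ptrdiff_t>(end), 0.0);
        });
    }
    for (std::thread& t : threads) t.join();  // wait for every chunk to finish

    // Combine the partial sums; this order differs from a sequential loop.
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}
```

In practice the worker count might come from `std::thread::hardware_concurrency()`, falling back to 1 when that function returns 0.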

Challenge and Opportunity

The Multi-core Dilemma is both a challenge and an opportunity, and both will grow rapidly as the number of cores and threads continues to increase. A typical GPU already has 128 threads. For applications that lend themselves to parallel processing, this can mean a significant gain in throughput.

Although GPUs have the potential for even greater processing power than their CPU counterparts (for certain applications), there are additional challenges as well:
1. Developer productivity - use of the software tools requires special training.
2. Portability - software written for one vendor’s GPUs will not run on other GPUs or on CPUs.

Our partnership is designed to address both of these issues, and to close the gap between hardware and software that has been widening over the past few years.

Although both companies are committed to broadly applicable solutions, our initial focus is on the financial services industry, where much of the activity is already happening.

What are your experiences with multi-core CPU and / or GPU? Please post a response with your thoughts.

Matrix multiply in parallel - is a different result ok?

Monday, June 9th, 2008

By Patrick Leonard, VP Engineering & Product Strategy

When moving a production application from one system to another, extensive testing is generally done to ensure, among other things, that results from the new system agree with expected results from the old system. This is true whether changing operating systems, hardware, or anything else.

For example, many financial services firms have moved from Unix systems to Linux for a variety of good reasons. When moving quantitative analysis applications, they had to verify - to multiple significant digits - that calculations done on Linux would not be different from what they got in the old system.

Different is not always wrong - sometimes a different new result is “more correct” - but it takes effort and time to verify that and make sure.

Now many companies are moving from sequential processing to parallel processing, which can actually be a bit trickier. Certain mathematical algorithms produce different results in a parallel environment than in a sequential one. This may have nothing to do with the implementation; it is often just the nature of floating-point numbers.

Matrix multiplication is an example of this. Because floating-point addition is not associative, multiplying matrices in parallel can produce a different outcome: the multiplications and subsequent additions are necessarily done in a different order.

Here is an example (thanks to David Haney):

Given two 4 x 4 matrices (A and B), you would normally calculate the element at row 0, column 0 of the product as:


(A00 * B00) + (A01 * B10) + (A02 * B20) + (A03 * B30)

If you change the order of operations though, like the following (note the parens):


((A00 * B00) + (A01 * B10)) + ((A02 * B20) + (A03 * B30))

Then you might see different results, depending on how the floating-point rounding turns out. You probably won’t see much skew at this scale (especially if all of the numbers are roughly the same magnitude), but with a 1024 x 1024 matrix, you would probably start seeing some variation.
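The effect is easy to reproduce. This minimal C++ sketch computes the same row-times-column sum from the example in both orders. The input values are contrived (chosen so that adding 1 to 1e16 is lost to rounding in double precision) to make the difference visible with only four terms, and the function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Left-to-right order, as a sequential loop does: (((a0*b0 + a1*b1) + a2*b2) + a3*b3)
double dot_sequential(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == 4 && b.size() == 4);
    double sum = 0.0;
    for (std::size_t k = 0; k < 4; ++k) sum += a[k] * b[k];
    return sum;
}

// Pairwise order, as two parallel workers might do: (a0*b0 + a1*b1) + (a2*b2 + a3*b3)
double dot_pairwise(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == 4 && b.size() == 4);
    const double left = a[0] * b[0] + a[1] * b[1];
    const double right = a[2] * b[2] + a[3] * b[3];
    return left + right;
}
```

With a row of {1e16, 1.0, -1e16, 1.0} and a column of all 1.0, the sequential order returns 1.0 and the pairwise order returns 0.0 under standard IEEE 754 double arithmetic (for example, on x86-64 without `-ffast-math`). Neither is "wrong"; they are two legitimate roundings of the same mathematical sum.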

There are algorithms for breaking up a matrix multiply that produce results identical to the sequential version while still executing at least partially in parallel, but from what we’ve seen, those methods appear less efficient than algorithms that do some amount of reordering.
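One simple way to keep results identical to the sequential version is sketched here in C++: distribute whole output elements (here, whole rows) across threads, but compute every element with the same k = 0..n-1 summation order the sequential loop uses. Because no individual sum is split or reordered, the parallel result matches the sequential one exactly, assuming the compiler evaluates `cell` the same way on both paths (no `-ffast-math`). The trade-off is that each element’s inner sum stays sequential. The names are illustrative, and this is not necessarily one of the algorithms the post alludes to.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// One output element, always summed in the same order k = 0, 1, ..., n-1.
double cell(const Matrix& A, const Matrix& B, std::size_t i, std::size_t j) {
    double sum = 0.0;
    for (std::size_t k = 0; k < B.size(); ++k) sum += A[i][k] * B[k][j];
    return sum;
}

Matrix multiply_sequential(const Matrix& A, const Matrix& B) {
    assert(!A.empty() && !B.empty() && A[0].size() == B.size());
    Matrix C(A.size(), std::vector<double>(B[0].size()));
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < B[0].size(); ++j) C[i][j] = cell(A, B, i, j);
    return C;
}

// Rows are split across threads (thread w takes rows w, w + workers, ...),
// but each element keeps the sequential summation order.
Matrix multiply_rows_parallel(const Matrix& A, const Matrix& B, unsigned workers) {
    assert(workers > 0 && !A.empty() && !B.empty() && A[0].size() == B.size());
    Matrix C(A.size(), std::vector<double>(B[0].size()));
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&A, &B, &C, w, workers] {
            for (std::size_t i = w; i < A.size(); i += workers)
                for (std::size_t j = 0; j < B[0].size(); ++j) C[i][j] = cell(A, B, i, j);
        });
    }
    for (std::thread& t : pool) t.join();
    return C;
}
```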

The outcome, although different, may not be any less “correct”. But that difference may have business consequences that need to be planned for. Regardless of the software programming model and technology used to go parallel, this is something to be mindful of.

Release at Any Time: the Documentation Perspective

Tuesday, June 3rd, 2008

At Rogue Wave, the trend is towards agile development, with frequent releases of new features between major product releases. To this end, we maintain an impressive infrastructure of nightly automated testing of a large code base across a daunting number of platform, compiler, and database combinations. The system includes extensive reporting of test results against code quality baselines, regression analysis, and ongoing fixing of priority bugs. The goal is to maintain the code base at a high level of quality such that we can release at any time with confidence.

As a documentation person, the good news is that Rogue Wave has always valued documentation highly, and considers good documentation an important part of the product. The challenge is that documentation must therefore strive for the same level of consistent, release-at-any-time quality.

== Getting There with Process Automation ==

When I realized that the documentation team either needed to match the agility and automation of the development teams or risk becoming less relevant, I could take comfort in the fact that documentation already had considerable process automation in place. For many years at Rogue Wave, a conversion architecture has made it easy to reconvert FrameMaker source documents into the release formats at any time. An added feature of this process was extensive reporting of formatting and linking problems found during the conversions.

The first step was to create infrastructure to support automated nightly conversion runs. The biggest obstacle was automating up-to-date PDF creation, since PDF was the one main distribution format still created manually. A utility called FrameScript solved that problem. With a little more creative jiggling, we reached a state where all documentation could be converted nightly, with the conversion error reporting neatly summarized on a single Web page.

So far so good, but what customers expect to see is not an amorphous bundle of document files. They expect a well-formed document distribution, with convenient access points to the information. So we next devised a process for defining a manifest of everything that needed to be in a given product distribution, and a script to act on that manifest to create document distributions exactly as we expected to deliver to customers. Naturally we added some testing, too, resulting in a nightly distribution quality report.
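The post does not show the manifest format or the script that acts on it, so the following C++ sketch is purely illustrative. It assumes a hypothetical manifest listing one relative file path per line (blank lines and `#` comments ignored) and reports every entry missing from a built distribution. The existence check is passed in as a function so the sketch stays independent of any file system API; in practice it could wrap `std::filesystem::exists` (C++17).

```cpp
#include <cassert>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical manifest format: one relative path per line; blank lines and
// lines starting with '#' are ignored. Returns the entries for which
// `exists` reports false, in manifest order.
std::vector<std::string> missing_from_distribution(
        const std::string& manifest_text,
        const std::function<bool(const std::string&)>& exists) {
    assert(static_cast<bool>(exists));  // a real existence check must be supplied
    std::vector<std::string> missing;
    std::istringstream in(manifest_text);
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;  // skip blanks and comments
        if (!exists(line)) missing.push_back(line);
    }
    return missing;
}
```

A nightly report would then flag the distribution as incomplete whenever the returned list is non-empty.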

== Document Health ==

All well and good, but all of this automation counts for very little without a commitment and a process to maintain good document quality, which we choose to call document health. Scripts are very non-judgemental, which is the inspiration for the old saying about the consequences of feeding them garbage. So while we in documentation were emulating the automated processes of our development colleagues, we were also adopting their scrum-based agile methodology. As they work on a feature, we work beside them on its documentation. Critically, we also continually monitor the reports that come out of our nightly automation and try to keep the errors at or near zero. This works quite well for the incremental changes expected in an established, stable product, but not as well for the major revisions and refactorings that are the inevitable burden of dynamically changing newer products.

Even if the picture is sometimes less than completely rosy, there is no arguing with the vision. When it is going well, this approach gloriously meets its intended goal. The document distributions that are created each night are exactly the documentation we intend to release. If the document set is reasonably stable and we are on top of the errors, we truly can on any given day publish a document distribution to release engineering and be proud to have it given to our customers.

It doesn’t get any better than that.

Life after CORBA

Tuesday, June 3rd, 2008

I have been involved in distributed computing for a number of years, and recognize that Service Oriented Architecture is just another approach to getting distributed applications to work together. Previous generations include things like RPCs, DCE (http://en.wikipedia.org/wiki/Distributed_computing_environment), CORBA (http://www.omg.org/), etc. The advantage of SOA lies in the fact that the underlying standards, i.e., XML, SOAP, etc., are broadly accepted across the industry, so interoperability between vendor products is much more real now than it has ever been.

I happen to have a good deal of experience in the CORBA world, having worked for Visigenic Software both before and after it was acquired by Borland. CORBA was an effective tool for connecting distributed objects, providing both language and platform neutrality. This was true so long as your platform was not Windows, because then you had to deal with COM/DCOM and the world of COM/CORBA bridges. This split between Microsoft and the rest of the world was a key issue that ultimately limited the proliferation of CORBA, but not before it was broadly adopted, particularly in the Telco and Financial Services industries. You still see many implementations of CORBA in what are now referred to as legacy applications, but not as much in newly developed systems.

Many of our SourcePro C++ customers also use CORBA ORBs, most often Orbix. We are finding that many customers have applications that use older versions of Orbix that are no longer supported, and yet they continue to pay significant maintenance fees on those licenses. One customer explained that they feel at risk every time they touch the application, because if something breaks, they have no good avenue to seek help. This is not the ideal that IT strives for: it is both expensive and risky. The good news is that for many customers, there is a better alternative that is easy to put in place.

In many cases, ORBs were used essentially as a communications mechanism between remote applications, perhaps handling the mediation between C++ and Java applications. Today, this problem can easily be solved using a Web services approach. Rogue Wave has a product known as HydraExpress that can easily turn a C++ application into a service. For CORBA users the paradigm is familiar: this product can take WSDL (remember IDL?) as input and generate stubs and skeletons for a Web services client or server. There are open source tools (http://search.cpan.org/~perrad/CORBA-XMLSchemas-0.41/idl2wsdl.pl) that help you convert IDL to WSDL, which is the key step in the process. It is not always that simple, but often it is darn close. Once complete, you have an application based on modern standards that gives you more flexibility and less risk at significantly lower cost. Sounds pretty good to me.

The problems inherent in ticket distribution

Monday, June 2nd, 2008

I recently bought tickets to an upcoming once-in-a-lifetime concert event. The tickets were being sold through an online ticket distributor which seems to have a firm hold on the market for ticket distribution. There were quite a few people trying for these tickets and I was expecting lots of problems. Here in Colorado we lived through the World Series ticket fiasco of 2007 and I was expecting nothing less for this one. I anticipated slow page loads, having to refresh often, being dependent on luck to get through, and ultimately I expected to come away with no tickets.

However, I was pleasantly surprised. The site never failed. It let me specify the tickets I was looking for, searched, and then presented me with seats I could buy as well as an option to search again. Then came the surcharges: a $14.50 convenience charge on each ticket, a $4.50 processing charge, a $2.50 delivery charge, and a $4.00 facility charge on each ticket. Granted, the facility charge probably goes to the venue itself, but the rest go to the distributor. It made me think: how can they charge so much without driving customers away? I would certainly use someone else if there were an option.

All we have to do is look at that 2007 World Series to see that the process isn’t that easy. You have to manage an impossible amount of traffic arriving in a very short time. Seats have to be held and assigned in the order in which requests come in, without giving the same seats away twice. There has to be a process for holding and releasing tickets. And you need a scalable pool of servers to handle everything from small venues where 30 tickets might be sold up to events with 300,000 tickets for a series, all of which might sell out within an hour or two.
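The core requirement described above (never give the same seat away twice, with holds that can be released) can be sketched in C++ with a mutex-protected inventory. This is a single-process illustration with hypothetical names (`SeatInventory`, `hold`, `release`); it says nothing about how Hydra implements scaling or failover, which the post does not describe, and it omits converting a hold into a sale. Note also that `std::mutex` does not guarantee first-come, first-served ordering, so honoring the arrival order of requests would require an explicit queue on top of this.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory seat inventory for one event. Every operation takes
// the same mutex, so two concurrent requests can never both hold one seat.
class SeatInventory {
public:
    explicit SeatInventory(const std::vector<std::string>& seats)
        : available_(seats.begin(), seats.end()) {}

    // Try to hold a specific seat; returns false if it is unknown or already held.
    bool hold(const std::string& seat) {
        std::lock_guard<std::mutex> lock(mu_);
        if (available_.erase(seat) == 0) return false;
        held_.insert(seat);
        return true;
    }

    // Return a held seat to the pool (e.g. when a hold times out).
    bool release(const std::string& seat) {
        std::lock_guard<std::mutex> lock(mu_);
        if (held_.erase(seat) == 0) return false;
        available_.insert(seat);
        return true;
    }

    std::size_t available_count() const {
        std::lock_guard<std::mutex> lock(mu_);
        return available_.size();
    }

private:
    mutable std::mutex mu_;
    std::set<std::string> available_;
    std::set<std::string> held_;
};
```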

This use case is tailor-made for Hydra. Hydra allows for scalability, maintaining proper order, failover in case of a server crash, and will take advantage of the extra processing power allowed by multi-core hardware. Hydra will also allow new servers to come online to handle an increase in demand with minimal configuration change and no disruption to existing services that are being provided. This way, idle servers can be assigned to high demand areas when needed, and can be moved back and forth between projects or events as volume changes. Hydra will simplify the real difficulties of ticket distribution and let someone work on the business model and user experience rather than the technological difficulties.

Next task: Take on the ticket distribution market!

What is going on with this GPU stuff?

Friday, April 25th, 2008

By Patrick Leonard, VP Engineering & Product Strategy

There is a lot of buzz in the industry today about using graphics cards for general computing, a.k.a. GP-GPU. Essentially, as increases in CPU clock speed have slowed down, we have all been scrambling to go parallel. CPU vendors have introduced dual-core, quad-core and more to increase performance, creating what we at Rogue Wave termed the Multi-core Dilemma.

In addition, many people have been looking at specialty hardware for additional threads. One of the most interesting options, which has gained a lot of attention lately, is the graphics processing unit (GPU). GPUs have been used for years as graphics accelerators for video games and other media.

Recently people have been using them for general-purpose computing since they are so good at crunching numbers (after all, rendering graphics is all about advanced math). This led to the term GP-GPU (general-purpose computing on graphics processing units). It also happens that this hardware is highly parallel. A typical consumer-grade graphics card has about 128 threads. That’s a lot of calculations in a small space - no wonder they are so attractive. And for certain applications, it’s not uncommon to see anywhere from a 10x to 30x throughput increase over a dual-core CPU.

However, the software development environment for GPUs has several problems:
1. GPU hardware is difficult to program. This is improved from a few years ago, but it’s still much more difficult than a typical CPU environment and lacks the robust tools we’re all used to.
2. APIs for this hardware are proprietary to each hardware vendor. We regularly hear this cited as an inhibitor to adopting GP-GPU.

And new development isn’t the only concern - probably more important right now is making existing code run on this hardware without rewriting everything.
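One common way to contain the vendor lock-in described above is to write application code against a small vendor-neutral interface and keep each vendor-specific API behind its own implementation, with a portable CPU fallback. The C++ sketch below shows only that fallback; `ComputeBackend`, `CpuBackend`, `make_default_backend`, `scale_and_add` and the `axpy` operation are hypothetical names chosen for illustration, not any vendor’s or Rogue Wave’s actual API. A GPU backend would be another subclass wrapping a vendor toolkit, and that subclass is the only code that would need to change if the vendor or API did.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical vendor-neutral interface: application code depends only on this.
class ComputeBackend {
public:
    virtual ~ComputeBackend() = default;
    virtual std::string name() const = 0;
    // y[i] = a * x[i] + y[i], a typical data-parallel kernel.
    virtual void axpy(double a, const std::vector<double>& x, std::vector<double>& y) const = 0;
};

// Portable reference implementation that runs on any CPU.
class CpuBackend : public ComputeBackend {
public:
    std::string name() const override { return "cpu"; }
    void axpy(double a, const std::vector<double>& x, std::vector<double>& y) const override {
        assert(x.size() == y.size());
        for (std::size_t i = 0; i < x.size(); ++i) y[i] = a * x[i] + y[i];
    }
};

// A real selector might probe for a supported GPU and return a vendor-specific
// subclass; this sketch always falls back to the CPU implementation.
std::unique_ptr<ComputeBackend> make_default_backend() {
    return std::unique_ptr<ComputeBackend>(new CpuBackend());
}

// Application code: unaware of which backend does the work.
void scale_and_add(const ComputeBackend& backend, double a,
                   const std::vector<double>& x, std::vector<double>& y) {
    backend.axpy(a, x, y);
}
```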

All of this gets to the heart of parallel computing in general. The progress of software development depends on our ability to do parallel computing, and to do it well. GP-GPU programming is a window into the world of challenges - and opportunities - that lie ahead.

Architecting your Concurrency Model

Friday, March 7th, 2008

By Patrick Leonard, VP Engineering & Product Strategy

Abstraction of concurrency from software application logic

(a.k.a: “I feel the need… the need for - Concurrency…”)

From Monolith to Component Architectures

In the olden days of computing, everything was combined into a single lump of software - including operating system functions, application logic, data, user presentation. After some time, we realized that creating a separation between the hardware and the software application would be useful, and the operating system was born. Some time later, we realized that managing the data was a distinct task, and databases became a separate entity. Some time after that, we decided that it would be a good idea to split out the user interface, and the era of client/server began.

And so it went: the software industry continued to evolve its architectures into more componentized and modular arrangements.

Architecting for Concurrent Computing

Now that multi-core architectures are common and the need for concurrency in software architectures is well understood, the next question is how best to architect our applications and how best to structure development organizations to support them. For the moment, let’s talk about the development organization. In the past, concurrency was a task limited to edge cases, but now ubiquitous multi-core hardware is making it much more common.

There are two ways to handle this. First, train all your software developers to be experts in concurrency (yikes…). Second, have concurrency specialists focus on making your applications parallel. Since training all of your software developers to be experts in concurrency seems daunting, the second option would seem better. But if concurrency is now required throughout a significant portion of your application architecture, how can only a few engineers or architects be responsible for it?

The answer lies in abstracting the concurrency model from the application logic. While this may not be possible for all aspects of your concurrency model, it certainly should be for some - especially for well-defined services (in the SOA definition of the word). Services can be run in multiple instances across multiple threads and multiple cores (even multiple servers) to achieve a significant degree of concurrency without rewriting the code inside the service.

To the extent that you can do this, you can let your application developers focus on application logic and your concurrency experts focus on concurrency. In addition, you can gain quite a bit more flexibility in your application architecture. The more your concurrency is abstracted, the easier it is to change without affecting the application logic. It’s really just an extension of the idea of loose coupling.
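A minimal C++ sketch of that separation, assuming a service that is a pure function of its request: the application developer writes `score_request` with no knowledge of threads, while a separately owned `run_concurrently` decides how many worker threads execute it. Both names and the squaring "logic" are placeholders. Changing the concurrency strategy (more threads, a thread pool, or eventually multiple servers) touches only the runner, not the service.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Application logic: a single-threaded, side-effect-free service.
// (Placeholder computation; a real service would do business work here.)
int score_request(int request) { return request * request; }

// Concurrency logic, owned separately: runs any such service over a batch of
// requests on `workers` threads without modifying the service itself.
std::vector<int> run_concurrently(const std::function<int(int)>& service,
                                  const std::vector<int>& requests,
                                  unsigned workers) {
    assert(workers > 0);
    std::vector<int> results(requests.size());
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        // Strided split: worker w handles requests w, w + workers, w + 2*workers, ...
        pool.emplace_back([&service, &requests, &results, w, workers] {
            for (std::size_t i = w; i < requests.size(); i += workers)
                results[i] = service(requests[i]);  // each slot is written by one thread only
        });
    }
    for (std::thread& t : pool) t.join();
    return results;
}
```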

The Concurrency Model

This means that the application developer does not have to be the primary owner of the concurrency model. The application developer can focus on the application, and a concurrency expert can design the concurrency model, just as a DBA does with the data model. I wouldn’t be surprised if we start to see something like a Concurrency Model Architect (CMA?).

Anyway, the whole thing relies on being able to separate your concurrency model from your application logic. More on that later.

C++ in 2008?

Sunday, February 10th, 2008

Joe Pruit posted a blog entry responding to some thoughts from Rogue Wave on C++ in 2008. We know that not everybody sees C++ the way we do, but we issued the press release to challenge some of the conventions of how developers and architects perceive C++. Joe’s comments are worth reading, and also worth responding to.

Enterprise applications do widely use Java and .NET, no question. Managed languages are the right tool for the job for a wide variety of applications. C++, however (along with C and other native languages), continues to have a solid, and in many cases growing, presence in several areas:

    High performance: for applications that require low latency and/or low memory usage, a large number of architects are choosing C++ over managed languages. These are common in Financial Services, Military and many other applications. An interesting recent example is the team from Carnegie Mellon that won $2 million in the DARPA Urban Challenge for building an intelligent robotic car. According to the project lead, “Everything we did was written in C++.”
    Embedded: For embedded and mobile devices, one of the fastest growing areas in computing, C++ is the language of choice over both Java and .NET. Lower memory, tighter power and heat limits and other requirements make C++ a natural choice for optimized application development. According to the Gartner Dataquest report ‘User Survey Analysis: Embedded Software Development Tools and RTOS, North America, 2006’, “For application development, C and C++ are the most popular development languages. Surprisingly, Java usage dropped in 2006.” (Daya Nadamuni, 13 September 2008)
    Existing apps: And, of course there are billions of lines of existing mission critical applications built on C++ in enterprises around the world.

These are almost all mission critical, and many are also legacy. In my experience, many if not most mission critical applications are also legacy. My favorite quote on this: years ago, one of my dev managers quipped that the definition of legacy is “anything that has gone into production…”

As far as the overall viability of the language, there are a few interesting points worth considering:

    1. C++ developers are commanding strong salaries, in many cases higher than developers with Java and .NET. I don’t know for sure, but I suspect it’s a combination of C++ resurgence and a smaller number of C++ developers available.
    2. Universities are starting to reconsider their move to teaching computer science students in Java. Many universities continue to teach in C++ so that students have a solid foundational understanding of how systems work. Some professors contend that teaching Java has contributed to a decline in computer science skills.
    3. The language has a vibrant (and broad) standards community and continues a solid evolution. The C++0X standards effort is creating the next version of the language. The proposed enhancements include modern concepts from Java and elsewhere, but still maintain what makes C++ unique and different.

Although Java and .NET continue to have significant mindshare for mainstream applications, C++ is the language of choice for many architects in new development projects - probably more than most people think.