
Archive for the 'News and Analysis' Category

Rogue Wave / AMD partnership for Multi-core CPU and GPU

Wednesday, June 11th, 2008

By Patrick Leonard, VP Engineering & Product Strategy

Expansion of our Relationship

Rogue Wave and AMD have a long-standing partnership to advance C++ software development on AMD’s Opteron CPU platform. I’m excited that our two companies have recently announced an expansion of that relationship to make it easier for software applications to take advantage of the additional computing power available on multi-core CPUs and on GPUs (graphics processing units).

For several years, increased performance from all hardware vendors has largely come from additional “cores” instead of faster clock speeds. This provides significant additional processing power, but most existing software cannot take adequate advantage of it without significant modification. This is called the “Multi-core Dilemma”.

Challenge and Opportunity

The Multi-core Dilemma is both a challenge and an opportunity that will increase rapidly as the number of cores and threads continues to increase. A typical GPU already has 128 threads. For applications that lend themselves to parallel processing, this can mean a significant gain in throughput.

Although GPUs have the potential for even greater processing power than their CPU counterparts (for certain applications), there are additional challenges as well:
1. Developer productivity - use of the software tools requires special training.
2. Portability - software written for one vendor’s GPU will not run on other GPUs or on CPUs.

Our partnership is designed to address both of these issues, and to close the gap between hardware and software that has been widening over the past few years.

Although both companies are committed to broadly applicable solutions, our initial focus is on the financial services industry, where much of the activity is already happening.

What are your experiences with multi-core CPU and / or GPU? Please post a response with your thoughts.

Matrix multiply in parallel - is a different result ok?

Monday, June 9th, 2008

By Patrick Leonard, VP Engineering & Product Strategy

When moving a production application from one system to another, extensive testing is generally done to ensure, among other things, that results from the new system agree with expected results from the old system. This is true whether changing operating systems, hardware, or anything else.

For example, many financial services firms have moved from Unix systems to Linux for a variety of good reasons. When moving quantitative analysis applications, they had to verify - to multiple significant digits - that calculations done on Linux would not be different from what they got in the old system.

Different is not always wrong - sometimes a different new result is “more correct” - but it takes effort and time to verify that and make sure.

Now many companies are moving from sequential processing to parallel processing. This can actually be a bit trickier. Certain mathematical algorithms calculate differently in a parallel environment than in a sequential environment. This may not have anything to do with the implementation; it is often just the nature of the numbers.

Matrix multiplication is an example of this. Because floating-point addition is not associative, multiplying matrices in parallel can produce a different outcome: the multiplications and subsequent additions are necessarily done in a different order, so the intermediate results are rounded differently.

Here is an example (thanks to David Haney):

Given two 4 x 4 matrices (A and B), you would normally calculate the result in 0,0 as:


(A00 * B00) + (A01 * B10) + (A02 * B20) + (A03 * B30)

If you change the order of operations though, like the following (note the parens):


((A00 * B00) + (A01 * B10)) + ((A02 * B20) + (A03 * B30))

Then you might see different results, depending on how the floating point rounding turns out. You probably won’t see much skew at this scale (especially if all of the numbers are roughly the same magnitude), but if you were dealing with a 1024 x 1024 matrix, you would probably start seeing some variation.
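The effect can be reproduced with a few lines of Python. This is only a minimal sketch: the function names and the input values are made up for illustration, and the values are deliberately chosen with very different magnitudes so that the rounding difference shows up even with four terms. It assumes standard IEEE 754 double-precision floats.

```python
import math

def dot_sequential(row, col):
    """Accumulate the products left to right, as a simple sequential loop would."""
    acc = 0.0
    for a, b in zip(row, col):
        acc += a * b
    return acc

def dot_pairwise(row, col):
    """Sum each half independently, then combine (as two parallel workers might)."""
    mid = len(row) // 2
    return dot_sequential(row[:mid], col[:mid]) + dot_sequential(row[mid:], col[mid:])

# Hypothetical row of A and column of B (illustrative values only).
row = [1e16, 1.0, -1e16, 1.0]
col = [1.0, 1.0, 1.0, 1.0]

print(dot_sequential(row, col))                       # 1.0
print(dot_pairwise(row, col))                         # 0.0
print(math.fsum(a * b for a, b in zip(row, col)))     # 2.0 (correctly rounded sum)
```

Neither ordering matches the correctly rounded sum here, which is the point: the two groupings are equally legitimate, yet they disagree with each other.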

There are some algorithms for breaking up a matrix multiply that keep the results equivalent to the sequential version while still executing at least part of the code in parallel. From what we’ve seen, though, those methods look less efficient than algorithms that do some amount of reordering.

The outcome, although different, may not be any less “correct”. But that difference may have business consequences that need to be planned for. Regardless of the software programming model and technology used to go parallel, this is something to be mindful of.

Release at Any Time: the Documentation Perspective

Tuesday, June 3rd, 2008

At Rogue Wave, the trend is towards agile development, with frequent releases of new features between major product releases. To this end, we maintain an impressive infrastructure of nightly automated testing of a large code base across a daunting number of platform, compiler, and database combinations. The system includes extensive reporting of test results against code quality baselines, regression analysis, and ongoing fixing of priority bugs. The goal is to maintain the code base at a high level of quality such that we can release at any time with confidence.

For a documentation person, the good news is that Rogue Wave has always valued documentation highly, and considers good documentation an important part of the product. The challenge is that documentation must therefore strive for the same level of consistent, release-at-any-time quality.

Getting There with Process Automation

When I realized that the documentation team either needed to match the agility and automation of the development teams or risk becoming less relevant, I could take comfort in the fact that documentation already had in place considerable process automation. For many years at Rogue Wave, a conversion architecture has supported the ability to reconvert FrameMaker source documents into the release formats easily and at any time. An added feature of this process was extensive reporting of formatting and linking problems found during the conversions.

The first step was to create infrastructure to support automated nightly conversion runs. The biggest obstacle was automating up-to-date PDF creation, the one main distribution format that was still created manually. A utility called FrameScript was the solution to that problem. With a little more creative jiggling, we reached a state where all documentation could be converted nightly, and the conversion error reporting neatly summarized on a single point of access Web page.

So far so good, but what customers expect to see is not an amorphous bundle of document files. They expect a well-formed document distribution, with convenient access points to the information. So we next devised a process for defining a manifest of everything that needed to be in a given product distribution, and a script to act on that manifest to create document distributions exactly as we expected to deliver to customers. Naturally we added some testing, too, resulting in a nightly distribution quality report.

Document Health

All well and good, but all of this automation counts for very little without a commitment and a process to maintain good document quality — what we choose to call document health. Scripts are very non-judgemental, which is the inspiration for the old saying about the consequences of feeding them garbage. So while we in documentation were emulating the automated processes of our development colleagues, we were also adopting their scrum-based agile methodology. As they work on a feature, we work beside them on its documentation. Critically, we also continually monitor the reports that come out of our nightly automation, and attempt to keep the errors at or near zero. This works quite well with the incremental changes expected with an established, stable product, not quite so well with the major revisions and refactorings that are the inevitable burden with dynamically changing newer products.

Even if the picture is sometimes less than completely rosy, there is no arguing with the vision. When it is going well, this approach gloriously meets its intended goal. The document distributions that are created each night are exactly the documentation we intend to release. If the document set is reasonably stable and we are on top of the errors, we truly can on any given day publish a document distribution to release engineering and be proud to have it given to our customers.

It doesn’t get any better than that.

Life after CORBA

Tuesday, June 3rd, 2008

I have been involved in distributed computing for a number of years, and recognize that Service Oriented Architecture is just another approach to getting distributed applications to work together. Previous generations include things like RPCs, DCE (http://en.wikipedia.org/wiki/Distributed_computing_environment), CORBA (http://www.omg.org/), etc. The advantage of SOA lies in the fact that the underlying standards, i.e. XML, SOAP, etc., are broadly accepted across the industry, so interoperability between vendor products is much more real now than it has ever been previously.

I happen to have a good deal of experience in the CORBA world, having worked for Visigenic Software both before and after it was acquired by Borland. CORBA was an effective tool for connecting distributed objects, providing both language and platform neutrality. This was true so long as your platform was not Windows, because then you had to deal with COM/DCOM and the world of COM/CORBA bridges. This split between Microsoft and the rest of the world was a key issue that ultimately limited the proliferation of CORBA, but not before it was broadly adopted, particularly in the Telco and Financial Services industries. You still see many implementations of CORBA in what are now referred to as legacy applications, but not as much in newly developed systems.

Many of our SourcePro C++ customers also use CORBA ORBs, most often Orbix. What we are finding is that many customers have applications that use older versions of Orbix that are no longer supported, and yet they continue to pay significant maintenance fees on those licenses. One customer explained that they feel they are at risk every time they touch the application, because if something breaks, they have no good avenue to seek help. This is not the ideal that IT strives for, i.e. it is both expensive and risky. The good news is that for many customers, there is a better alternative that is easy to put in place.

In many cases, ORBs were used essentially as a communications mechanism between remote applications, maybe handling the mediation between C++ and Java applications. Today, this problem can easily be solved using a Web services approach. Rogue Wave has a product known as HydraExpress that has the capability to easily turn a C++ application into a service. For CORBA users the paradigm is familiar. This product can take WSDL (remember IDL?) as an input and generate stubs and skeletons for a Web services client or server. There are open source tools (http://search.cpan.org/~perrad/CORBA-XMLSchemas-0.41/idl2wsdl.pl) that help you to convert IDL to WSDL, which is the key step in the process. It is not always that simple, but often it is darn close. Once complete, you have an application based upon modern standards that gives you more flexibility and less risk at significantly less cost. Sounds pretty good to me.

The problems inherent in ticket distribution

Monday, June 2nd, 2008

I recently bought tickets to an upcoming once-in-a-lifetime concert event. The tickets were being sold through an online ticket distributor which seems to have a firm hold on the market for ticket distribution. There were quite a few people trying for these tickets and I was expecting lots of problems. Here in Colorado we lived through the World Series ticket fiasco of 2007 and I was expecting nothing less for this one. I anticipated slow page loads, having to refresh often, being dependent on luck to get through, and ultimately I expected to come away with no tickets.

However, I was pleasantly surprised. The site never failed. It allowed me to specify the tickets I was looking for, then it searched, and then I was presented with seats that I could buy as well as an option to search again. Then came the surcharges: $14.50 convenience on each ticket; $4.50 Processing; $2.50 delivery; and a $4.00 Facility charge on each ticket. Granted the facility charge is probably from the venue itself, but the rest go to the distributor. It made me think: How can they charge so much without driving customers away? I would certainly use someone else if there was an option.

All we have to do is look at that 2007 World Series to see that the process isn’t that easy. You have to manage an enormous amount of traffic that arrives in a very short amount of time. Seats have to be held and assigned in the order in which the requests come in without giving the same seats away twice. There has to be a process for holding and releasing tickets. And you have to have a scalable server workforce to handle anything from the small venues where 30 tickets might be sold up through events where you might have 300,000 tickets for a series, all of which might sell out within an hour or two.
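To make the hold-and-release requirement concrete, here is a minimal single-process sketch in Python. Everything in it (the `SeatInventory` class, its method names, the seat labels) is hypothetical and is not taken from any ticketing system or from Hydra. A lock serializes requests so that no seat can be handed out twice; strict arrival ordering and hold timeouts would need more machinery than is shown here.

```python
import threading
from collections import deque

class SeatInventory:
    """Hypothetical in-memory seat pool (illustration only)."""

    def __init__(self, seats):
        self._lock = threading.Lock()
        self._available = deque(seats)  # seats are handed out in listed order
        self._held = {}                 # seat -> buyer currently holding it
        self._sold = {}                 # seat -> buyer who purchased it

    def hold(self, buyer, count):
        """Atomically reserve `count` seats, or none if too few remain."""
        with self._lock:
            if count > len(self._available):
                return []
            seats = [self._available.popleft() for _ in range(count)]
            for seat in seats:
                self._held[seat] = buyer
            return seats

    def release(self, buyer, seats):
        """Return seats held by `buyer` to the available pool."""
        with self._lock:
            for seat in seats:
                if self._held.get(seat) == buyer:
                    del self._held[seat]
                    self._available.append(seat)

    def purchase(self, buyer, seats):
        """Convert a hold into a sale; fails if any seat is not held by `buyer`."""
        with self._lock:
            if any(self._held.get(seat) != buyer for seat in seats):
                return False
            for seat in seats:
                del self._held[seat]
                self._sold[seat] = buyer
            return True
```

With 50 buyers racing for 30 seats, each seat ends up with at most one holder, and a released seat becomes available to the next request.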

This use case is tailor-made for Hydra. Hydra allows for scalability, maintaining proper order, failover in case of a server crash, and will take advantage of the extra processing power allowed by multi-core hardware. Hydra will also allow new servers to come online to handle an increase in demand with minimal configuration change and no disruption to existing services that are being provided. This way, idle servers can be assigned to high demand areas when needed, and can be moved back and forth between projects or events as volume changes. Hydra will simplify the real difficulties of ticket distribution and let someone work on the business model and user experience rather than the technological difficulties.

Next task: Take on the ticket distribution market!

What is going on with this GPU stuff?

Friday, April 25th, 2008

By Patrick Leonard, VP Engineering & Product Strategy

There is a lot of buzz in the industry today about the use of graphics cards for general computing, a.k.a. GP-GPU. Essentially, as increases in CPU clock speeds have slowed down, we have all been scrambling to go parallel. CPU vendors have introduced dual-core, quad-core and more to increase performance, introducing what we at Rogue Wave termed the Multi-core Dilemma.

In addition, many people have been looking at specialty hardware for additional threads. One of the most interesting ones that has gained a lot of attention lately is the graphics processing unit (GPU). These have been used for years as graphics accelerators for video games and other media.

Recently people have been using them for general purpose computing since they are so good at crunching numbers (after all, rendering graphics is all about advanced math). This led to the term GP-GPU (general purpose graphics processing unit). It also happens that this hardware is very parallel. A typical consumer grade graphics card has about 128 threads. That’s a lot of calculations in a small space - no wonder they are so attractive. And for certain applications, it’s not uncommon to see anywhere from a 10x to 30x throughput increase over a dual core CPU.

However, the software development environment for GPUs has several problems:
1. GPU hardware is difficult to program. This has improved from a few years ago, but it’s still much more difficult than a typical CPU environment and lacks the robust tools we’re all used to.
2. APIs for this hardware are proprietary to the vendor hardware. This is something we hear on a regular basis as an inhibitor to adopting GP-GPU.

And new development isn’t the only thing - probably more important right now is how to make existing code run here without rewriting everything.

All of this gets to the heart of parallel computing in general. The progress of software development in general depends on our ability to do parallel computing, and do it well. GP-GPU programming is a window into the world of challenges - and opportunities - that lie ahead.

Architecting your Concurrency Model

Friday, March 7th, 2008

By Patrick Leonard, VP Engineering & Product Strategy

Abstraction of concurrency from software application logic

(a.k.a: “I feel the need… the need for - Concurrency…”)

From Monolith to Component Architectures

In the olden days of computing, everything was combined into a single lump of software - including operating system functions, application logic, data, user presentation. After some time, we realized that creating a separation between the hardware and the software application would be useful, and the operating system was born. Some time later, we realized that managing the data was a distinct task, and databases became a separate entity. Some time after that, we decided that it would be a good idea to split out the user interface, and the era of client/server began.

So it went: the software industry continued to evolve its architectures into more componentized and modular arrangements.

Architecting for Concurrent Computing

Now that multi-core architectures are common and the need for concurrency in software architectures well understood, the next question is how best to architect our applications and how best to structure development organizations to support them. For the moment, let’s talk about the development organization. In the past, concurrency was a task limited to edge cases, but now ubiquitous multi-core hardware is making it much more common.

There are two ways to handle this. First, train all your software developers to be experts in concurrency (yikes…). Second, have concurrency specialists focus on making your applications parallel. Since training all of your software developers to be experts in concurrency seems daunting, it would seem that the second option is better. But if concurrency is now required throughout a significant portion of your application architecture, how can only a few engineers or architects be responsible for it?

The answer lies in abstracting the concurrency model from the application logic. While this may not be possible for all aspects of your concurrency model, it certainly should be for some - especially for well-defined services (in the SOA definition of the word). Services can be run in multiple instances across multiple threads and multiple cores (even multiple servers) to achieve a significant degree of concurrency without rewriting the code inside the service.

To the extent that you can do this, you can allow your application developers to focus on application logic and your concurrency experts to focus on concurrency. In addition, you can gain quite a bit more flexibility to your application architecture. The more your concurrency is abstracted, the easier it is to change without affecting the application logic. It’s really just an extension of the idea of loose coupling.
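As a rough illustration of this separation, the Python sketch below keeps the service logic free of any threading code and puts the concurrency model in a separate function. The names `quote_service` and `run_service`, and the request format, are invented for this example; it is a sketch of the idea, not a description of any Rogue Wave product.

```python
from concurrent.futures import ThreadPoolExecutor

def quote_service(request):
    """Application logic only: no threads, locks, or pools appear in here."""
    symbol, quantity, unit_price = request
    return symbol, quantity * unit_price

def run_service(service, requests, workers=4):
    """Concurrency model: owned separately and changeable without touching the service."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the requests
        return list(pool.map(service, requests))

requests = [("AAA", 10, 7.5), ("BBB", 2, 120.0), ("CCC", 100, 1.25)]
print(run_service(quote_service, requests))
# [('AAA', 75.0), ('BBB', 240.0), ('CCC', 125.0)]
```

If the concurrency owner later decides that processes suit the workload better than threads, swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` inside `run_service` is the only change; `quote_service` stays as it is.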

The Concurrency Model

This means that the application developer does not have to be the primary owner of the concurrency model. The application developer is able to focus on the application, and a concurrency expert can design the concurrency model, just as a DBA does with the data model - I wouldn’t be surprised if we start to see something like a Concurrency Model Architect (CMA?).

Anyway, the whole thing relies on being able to separate your concurrency model from your application logic. More on that later.

C++ in 2008?

Sunday, February 10th, 2008

Joe Pruit posted a blog entry responding to some thoughts from Rogue Wave on C++ in 2008. We know that not everybody sees C++ in the same way we do, but we issued the press release to challenge some of the conventions of how developers and architects perceive C++. Joe’s comments are worth reading, and are also worth responding to.

Enterprise applications do widely use Java and .NET, no question. Managed languages are the right tool for the job for a wide variety of applications. C++, however (along with C and other native languages), continues to have a solid, and in many cases growing, presence in several areas:

    High performance: for applications that require low latency and/or low memory usage, a large number of architects are choosing C++ over managed languages. These are common in Financial Services, Military and many other applications. An interesting recent example is the team from Carnegie Mellon that won $2 million in the DARPA Urban Challenge for building an intelligent robotic car. According to the project lead, “Everything we did was written in C++.”
    Embedded: For embedded and mobile devices, one of the fastest growing areas in computing, C++ is the language of choice over both Java and .NET. Lower memory, tighter power and heat limits and other requirements make C++ a natural choice for optimized application development. According to the Gartner Dataquest report ‘User Survey Analysis: Embedded Software Development Tools and RTOS, North America, 2006’, “For application development, C and C++ are the most popular development languages. Surprisingly, Java usage dropped in 2006.” (Daya Nadamuni, 13 September 2008)
    Existing apps: And, of course there are billions of lines of existing mission critical applications built on C++ in enterprises around the world.

These are almost all mission critical, and many are also legacy. In my experience, many if not most mission critical applications are also legacy. My favorite quote on this: years ago, one of my dev managers quipped that the definition of legacy is “anything that has gone into production…”

As far as the overall viability of the language, there are a few interesting points worth considering:

    1. C++ developers are commanding strong salaries, in many cases higher than developers with Java and .NET. I don’t know for sure, but I suspect it’s a combination of C++ resurgence and a smaller number of C++ developers available.
    2. Universities are starting to reconsider their move to teaching computer science students in Java. Many universities continue to teach in C++ so that students have a solid foundational understanding of how systems work. Some professors contend that teaching Java has contributed to a decline in computer science skills.
    3. The language has a vibrant (and broad) standards community and continues to evolve steadily. The C++0x standards effort is creating the next version of the language. The proposed enhancements include modern concepts from Java and elsewhere, but still maintain what makes C++ unique and different.

Although Java and .NET continue to have significant mindshare for mainstream applications, C++ is the language of choice for many architects in new development projects - probably more than most people think.

Second Year at OOP Conference

Friday, February 1st, 2008


It was our second year participating in the OOP conference in Munich last week.  This conference focuses on aspects of modern software engineering, so as you can imagine, it is heavily infused with SOA technologies.  I gave two talks, one about modernizing C++ applications, and the other on creating a service grid.  In my talks I focused on the challenges of maintaining performance characteristics when moving from a monolithic application to an SOA, while also dealing with distributed computing issues that may not have been previously relevant.  Both drew a good number of people and several of them came by our booth later to discuss their situation and how it would fit in with what I discussed. 

I was a bit surprised that a good number of people said they have problems dealing with large XML files.  We call this VLM (Very Large Messaging) and we are very familiar with the issues surrounding it.  The most common issue has to do with memory consumption.  Typical DOM based parsers use about 6 times the size of the original XML file in memory.  When you start dealing with large files, that quickly becomes a problem.  The other main issue people are having is rooted in performance.  Parsing large files is time consuming, especially when unexpected things like garbage collection kick in.  We have been solving this problem for a while now with our HydraSDO for XML product.  It is very fast because it is written in C++, it also provides a native Java API that uses shared memory so as not to impact performance, and it uses a non-extractive, indexed parsing model whose memory footprint is about 1.5 times the size of the original XML document.
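For readers who want to see the memory issue in miniature, the sketch below uses a different and much simpler technique than the indexed parsing model described above: incremental (streaming) parsing with Python’s standard `xml.etree.ElementTree.iterparse`, which avoids holding every element’s contents in memory at once. It is not how HydraSDO for XML works; the element name `order`, the attribute `amount`, and the function name are made up for the example.

```python
import io
import xml.etree.ElementTree as ET

def count_and_sum(xml_stream, tag, attr):
    """Stream through the document, keeping only running totals in memory."""
    count, total = 0, 0.0
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if elem.tag == tag:
            count += 1
            total += float(elem.get(attr))
            # Free this element's contents; the emptied element itself
            # stays attached to its parent.
            elem.clear()
    return count, total

# Small synthetic document standing in for a very large file.
body = b"".join(b'<order amount="%d"/>' % i for i in range(1, 1001))
doc = io.BytesIO(b"<orders>" + body + b"</orders>")
print(count_and_sum(doc, "order", "amount"))  # (1000, 500500.0)
```

The trade-off is that streaming gives up random access to the document, which is the kind of access an indexed model is meant to preserve.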

I’m excited for the opportunity to help these people with the large XML file problems they’re having.  If you’re having similar issues, let me know and we can get you some quick help.

HydraSDO for Databases Launched

Tuesday, January 29th, 2008

I am very happy to announce the General Availability of HydraSDO for Databases 1.0.  We already have a “Java Edition” so this completes the set by providing support for the C++ SDO API.  The Service Data Object specification is particularly important for database access because it is the only industry standard API designed specifically for accessing data in Service Oriented Environments. HydraSDO for Databases provides a Data Access Service (DAS) for relational data that is built on SourcePro DB. This of course means that it is very, very fast – we expect it to be the fastest available in the market. It also means that existing SourcePro DB users can confidently migrate their database applications to a Service Oriented Environment. The database tools provided mean that there is no database access programming required.  HydraSDO for Databases also integrates seamlessly with HydraSCA. This is particularly important because it can be expected that relational data will be a core element of applications built using a Service Component Architecture, just as it has been in older application architectures such as client-server. HydraSDO for Databases is freely available for evaluation at our Download Center.