Author Archive

Parallel Computing Discussion at ACM in Chicago

Friday, June 27th, 2008

Recently I had the opportunity to address the Chicago chapter of the ACM (Association for Computing Machinery) on the subject of parallel computing. In addition to giving me the opportunity to make a Star Wars reference or two, it was a very interesting conversation on the subject. About a quarter of the attendees were from the financial services industry, and close to half were interested in using GPUs.

The main point of my talk was to give an overview of current trends in the industry and to discuss a model for parallel computing in software development. Most parallel programming has focused on the task and data level, with tools like OpenMP and MPI. Data parallelism continues to be very important, but I suggest that there is a higher level of granularity in parallel computing - Service Parallelism.

Service Parallelism is essentially the intersection of SOA and parallel computing. Rather than taking a loop or function inside a program and making it parallel, you take a whole service and run multiple instances of that service (loops can be parallel too, they are just running inside the service).

There are several advantages to this:

  • If you already have services there is little to no recoding required.
  • Changing service parallelism means a change in configuration, not in code, so ongoing maintenance is much easier.
  • Service parallelism separates the parallel aspect of the application from the logic so your application developers don’t have to be experts in the parallel model.
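As a minimal sketch of the idea (not Hydra's actual mechanism), the Python below runs several instances of one service from a configuration value. The names `price_service`, `CONFIG` and `dispatch` are hypothetical, and a thread pool stands in for whatever runtime would really host the service instances.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical service: its logic contains no threading or locking code.
def price_service(request: dict) -> float:
    return request["quantity"] * request["unit_price"]

# Hypothetical deployment configuration. Changing the degree of
# parallelism means editing this value, not the service above.
CONFIG = {"instances": 4}

def dispatch(requests, config=CONFIG):
    """Fan requests out across `instances` concurrent copies of the service."""
    with ThreadPoolExecutor(max_workers=config["instances"]) as pool:
        # pool.map returns results in the same order as the requests.
        return list(pool.map(price_service, requests))

requests = [{"quantity": q, "unit_price": 2.5} for q in range(1, 6)]
results = dispatch(requests)
```

Calling `dispatch(requests, {"instances": 1})` gives the same results with a single instance, which is the point: the parallel aspect lives in configuration and in the dispatcher, not in the service logic.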

If you are interested in a review of industry trends on multi-core CPU, GPU, and ideas for software parallelism, take a look at the slides on the Rogue Wave web site.

Intel’s ‘Ct’

Wednesday, June 25th, 2008

Intel recently announced that they are working on a new programming language specifically designed for multi-core CPU hardware - called ‘Ct’. Ct is ‘C’ for throughput, and is essentially the C programming language with extensions.

It is similar in many ways to CUDA from nVidia and Brook+ from AMD, although Ct is for CPUs and CUDA & Brook+ are for GPUs (see earlier post re: GPUs). This is likely to be a good thing for software developers who are working on getting existing and yet-to-be-written software to scale appropriately on multi-core hardware.

Ct uses the combination of a compiler and runtime to take much of the burden of parallelism from the software developer. For example, the basic tasking unit is a ‘future’, which can be executed now or later and receives data consistency guarantees from the runtime. You can find details on how it will work on Intel’s site.
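To illustrate the general notion of a future (this is an analogy using Python's standard `concurrent.futures` module, not Ct syntax, which is described on Intel's site), the sketch below submits a computation that the runtime may start now or later and collects its value only when it is needed. The `dot` function is a made-up example workload.

```python
from concurrent.futures import ThreadPoolExecutor

# Made-up workload: a dot product of two small vectors.
def dot(xs, ys):
    return sum(x * y for x, y in zip(xs, ys))

with ThreadPoolExecutor(max_workers=2) as pool:
    # submit() returns a Future right away; the pool decides when to run it.
    future = pool.submit(dot, [1, 2, 3], [4, 5, 6])
    # result() blocks until the computation has finished, then returns its value.
    value = future.result()
```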

It does, however, highlight again the split that has occurred in hardware design - all vendors are going multi-core/multi-thread, but some are taking more of a homogeneous CPU approach, and some are taking a more heterogeneous GPU (accelerator) approach.

For software engineers, this means productivity challenges (“how do I get my existing code to run on GPUs, how do I get it to scale on multi-core CPUs”) as well as portability issues (“I don’t really want to maintain code written in CUDA, Brook+ and Ct, even though they are all variants of C”). This is all related to the Multi-core Dilemma that I have written about previously on the Intel Blog site and elsewhere.

Rogue Wave’s ‘Hydra’ product uses Service Parallelism to address the Multi-core Dilemma on CPUs, and we have worked with Intel a great deal on this, as it is complementary to Ct and other Intel technologies like TBB.

We are also working with both nVidia and AMD on Project “Gazelle” to address GPUs. “Gazelle” will generate optimized code for nVidia and AMD GPUs, and could do the same for Intel Ct in the future to ease migration for existing applications.

Rogue Wave / AMD partnership for Multi-core CPU and GPU

Wednesday, June 11th, 2008

Expansion of our Relationship

Rogue Wave and AMD have a long-standing partnership to advance C++ software development on AMD’s Opteron CPU platform. I’m excited that our two companies have recently announced an expansion of that relationship to make it easier for software applications to take advantage of the additional computing power available on multi-core CPUs and on GPUs (graphical processing units).

For several years, increased performance from all hardware vendors has largely come from additional “cores” instead of faster clock speeds. This provided significant additional processing power, but most existing software doesn’t adequately take advantage without significant modification. This is called the “Multi-core Dilemma”.

Challenge and Opportunity

The Multi-core Dilemma is both a challenge and an opportunity that will increase rapidly as the number of cores and threads continues to increase. A typical GPU already has 128 threads. For applications that lend themselves to parallel processing, this can mean a significant gain in throughput.

Although GPUs have the potential for even greater processing power than their CPU counterparts (for certain applications), there are additional challenges as well:
1. Developer productivity - use of the software tools requires special training.
2. Portability - software written for one vendor’s GPUs will not run on another vendor’s GPUs or on CPUs.

Our partnership is designed to address both of these issues, and to close the gap between hardware and software that has been widening over the past few years.

Although both companies are committed to broadly applicable solutions, our initial focus is on the financial services industry, where much of the activity is already happening.

What are your experiences with multi-core CPU and / or GPU? Please post a response with your thoughts.

Matrix multiply in parallel - is a different result ok?

Monday, June 9th, 2008

When moving a production application from one system to another, extensive testing is generally done to ensure, among other things, that results from the new systems agree with expected results from the old system. This is true whether changing operating systems, hardware, or anything else.

For example, many financial services firms have moved from Unix systems to Linux for a variety of good reasons. When moving quantitative analysis applications, they had to verify - to multiple significant digits - that calculations done on Linux would not be different from what they got in the old system.

Different is not always wrong - sometimes a different new result is “more correct” - but it takes effort and time to verify that and make sure.
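A check of this kind can be sketched in a few lines of Python. The helper name `agree` and the choice of a relative tolerance of 10^-digits as a stand-in for "agreement to that many significant digits" are assumptions made for illustration, not a description of any firm's actual validation process.

```python
import math

def agree(old_result: float, new_result: float, sig_digits: int = 10) -> bool:
    """Return True if the two results match to roughly `sig_digits` significant digits."""
    return math.isclose(old_result, new_result, rel_tol=10.0 ** -sig_digits)

# A difference in the last bit passes a 10-digit check...
tiny_difference_ok = agree(0.6000000000000001, 0.6)
# ...while a difference in the fourth digit fails a 6-digit check.
large_difference_ok = agree(1.0, 1.001, sig_digits=6)
```

The number of digits required is a business decision; the code only makes that decision explicit and repeatable.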

Now many companies are moving from sequential processing to parallel processing. This can actually be a bit trickier. Certain mathematical algorithms calculate differently in a parallel environment than in a sequential environment. This may not have anything to do with the implementation; it is often just the nature of floating-point arithmetic.

Matrix multiplication is an example of this. Because floating-point addition is not associative, multiplying matrices in parallel can produce a different outcome: the individual products are the same, but the subsequent additions are necessarily done in a different order.

Here is an example (thanks to David Haney):

Given two 4 x 4 matrices (A and B), you would normally calculate the result in 0,0 as:


(A00 * B00) + (A01 * B10) + (A02 * B20) + (A03 * B30)

If you change the order of operations though, like the following (note the parens):


((A00 * B00) + (A01 * B10)) + ((A02 * B20) + (A03 * B30))

Then you might see different results, depending on how the floating point rounding turns out. You probably won’t see much skew at this scale (especially if all of the numbers are roughly the same magnitude), but if you were dealing with a 1024 x 1024 matrix, you would probably start seeing some variation.
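The effect can be reproduced with one row and one column. The values below are chosen deliberately (an assumption for illustration, not data from a real application) so that the magnitudes differ widely and the rounding becomes visible in standard double precision.

```python
# One row of A and one column of B, with widely differing magnitudes.
row = [1e8, 1.0, -1e8, 1.0]
col = [1e8, 1.0, 1e8, 1.0]

# The four products are identical in both orderings.
p = [a * b for a, b in zip(row, col)]   # 1e16, 1.0, -1e16, 1.0

# Sequential order: ((p0 + p1) + p2) + p3
sequential = ((p[0] + p[1]) + p[2]) + p[3]

# Pairwise order, as two parallel workers might compute it:
# (p0 + p1) + (p2 + p3)
pairwise = (p[0] + p[1]) + (p[2] + p[3])

results_differ = sequential != pairwise
```

Here the sequential order yields 1.0 and the pairwise order yields 0.0, while the mathematically exact sum is 2.0. Neither ordering is "right"; each simply loses different low-order bits when a small term is added to a large one.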

There are some algorithms for breaking up a matrix multiply that preserve results equivalent to the sequential calculation while still executing at least partially in parallel, but from what we’ve seen those methods are less efficient than algorithms that do some amount of reordering.

The outcome, although different, may not be any less “correct”. But that difference may have business consequences that need to be planned for. Regardless of the software programming model and technology used to go parallel, this is something to be mindful of.

What is going on with this GPU stuff?

Friday, April 25th, 2008

There is a lot of buzz in the industry today about the use of graphics cards for general computing, a.k.a. GP-GPU. Essentially, as increases in CPU clock speeds have slowed down, we have all been scrambling to go parallel. CPU vendors have introduced dual-core, quad-core and more to increase performance, introducing what we at Rogue Wave termed the Multi-core Dilemma.

In addition, many people have been looking at specialty hardware for additional threads. One of the most interesting ones that has gained a lot of attention lately is the graphical processing unit (GPU). These have been used for years as graphics accelerators for video games and other media.

Recently people have been using them for general purpose computing since they are so good at crunching numbers (after all, rendering graphics is all about advanced math). This led to the term GP-GPU (general purpose graphical processing unit). It also happens that this hardware is very parallel. A typical consumer grade graphics card has about 128 threads. That’s a lot of calculations in a small space - no wonder they are so attractive. And for certain applications, it’s not uncommon to see a 10x to 30x throughput increase over a dual core CPU.

However, the software development environment for GPUs has several problems:
1. GPU hardware is difficult to program. This is improved from a few years ago, but it’s still much more difficult than a typical CPU environment and lacks the robust tools we’re all used to.
2. APIs for this hardware are proprietary to the vendor hardware. This is something we hear on a regular basis as an inhibitor to adopting GP-GPU.

And new development isn’t the only thing - probably more important right now is how to make existing code run here without rewriting everything.

All of this gets to the heart of parallel computing in general. The progress of software development in general depends on our ability to do parallel computing, and do it well. GP-GPU programming is a window into the world of challenges - and opportunities - that lie ahead.

Architecting your Concurrency Model

Friday, March 7th, 2008

Abstraction of concurrency from software application logic

(a.k.a: “I feel the need… the need for - Concurrency…”)

From Monolith to Component Architectures

In the olden days of computing, everything was combined into a single lump of software - including operating system functions, application logic, data, user presentation. After some time, we realized that creating a separation between the hardware and the software application would be useful, and the operating system was born. Some time later, we realized that managing the data was a distinct task, and databases became a separate entity. Some time after that, we decided that it would be a good idea to split out the user interface, and the era of client/server began.

So it went: the software industry continued to evolve its architectures into more componentized and modular arrangements.

Architecting for Concurrent Computing

Now that multi-core architectures are common and the need for concurrency in software architectures well understood, the next question is how best to architect our applications and how best to structure development organizations to support them. For the moment, let’s talk about the development organization. In the past, concurrency was a task limited to edge cases, but now ubiquitous multi-core hardware is making it much more common.

There are two ways to handle this. First, train all your software developers to be experts in concurrency (yikes…). Second, have concurrency specialists focus on making your applications parallel. Since training all of your software developers to be experts in concurrency seems daunting, it would seem that the second option is better. But if concurrency is now required throughout a significant portion of your application architecture, how can only a few engineers or architects be responsible for it?

The answer lies in abstracting the concurrency model from the application logic. While this may not be possible for all aspects of your concurrency model, it certainly should be for some - especially for well-defined services (in the SOA definition of the word). Services can be run in multiple instances across multiple threads and multiple cores (even multiple servers) to achieve a significant degree of concurrency without rewriting the code inside the service.

To the extent that you can do this, you can allow your application developers to focus on application logic and your concurrency experts to focus on concurrency. In addition, you can gain quite a bit more flexibility to your application architecture. The more your concurrency is abstracted, the easier it is to change without affecting the application logic. It’s really just an extension of the idea of loose coupling.
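One way to picture this separation is sketched below in Python. All names (`validate_order`, `run_sequential`, `make_threaded`, `process_batch`) are hypothetical, and the thread pool is only one of many possible concurrency models; the point is that the model is passed in rather than written into the application logic.

```python
from concurrent.futures import ThreadPoolExecutor

# Application logic: owned by the application developer, no concurrency code.
def validate_order(order_id: int) -> str:
    return f"order-{order_id}: ok"

# Concurrency models: owned by the concurrency specialist.
def run_sequential(service, items):
    return [service(item) for item in items]

def make_threaded(workers: int):
    """Build a concurrency model that runs the service on `workers` threads."""
    def run(service, items):
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(service, items))
    return run

# The two are coupled only here, and the model can be swapped freely.
def process_batch(items, concurrency_model=run_sequential):
    return concurrency_model(validate_order, items)

sequential_result = process_batch(range(3))
threaded_result = process_batch(range(3), make_threaded(4))
```

Switching from `run_sequential` to `make_threaded(4)` changes how the work is scheduled without touching `validate_order`, which is the loose coupling described above.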

The Concurrency Model

This means that the application developer does not have to be the primary owner of the concurrency model. The application developer is able to focus on the application, and a concurrency expert can design the concurrency model, just as a DBA does with the data model - I wouldn’t be surprised if we start to see something like a Concurrency Model Architect (CMA?).

Anyway, the whole thing relies on being able to separate your concurrency model from your application logic. More on that later.

C++ in 2008?

Sunday, February 10th, 2008

Joe Pruit posted a blog responding to some thoughts from Rogue Wave on C++ in 2008. We know that not everybody sees C++ in the same way we do, but we issued the press release to challenge some of the conventions of how developers and architects perceive C++. Joe’s comments are worth reading, and are also worth responding to.

Enterprise applications do widely use Java and .NET, no question. Managed languages are the right tool for the job for a wide variety of applications. C++, however (along with C and other native languages), continues to have a solid and, in many cases, growing presence in several areas:

  • High performance: For applications that require low latency and/or low memory usage, a large number of architects are choosing C++ in favor of managed languages. These are common in Financial Services, Military and many other applications. An interesting recent example is the team from Carnegie Mellon that won $2 million in the DARPA Urban Challenge for building an intelligent robotic car. According to the project lead, “Everything we did was written in C++.”
  • Embedded: For embedded and mobile devices, one of the fastest growing areas in computing, C++ is the language of choice over both Java and .NET. Lower memory, tighter power and heat limits and other requirements make C++ a natural choice for optimized application development. According to the Gartner Dataquest report ‘User Survey Analysis: Embedded Software Development Tools and RTOS, North America, 2006’, “For application development, C and C++ are the most popular development languages. Surprisingly, Java usage dropped in 2006.” (Daya Nadamuni, 13 September 2008)
  • Existing apps: And, of course, there are billions of lines of existing mission critical applications built on C++ in enterprises around the world.

These are almost all mission critical, and many are also legacy. In my experience, many if not most mission critical applications are also legacy. My favorite quote on this: years ago, one of my dev managers once quipped that the definition of legacy is “anything that has gone into production…”

As far as the overall viability of the language, there are a few interesting points worth considering:

    1. C++ developers are commanding strong salaries, in many cases higher than developers with Java and .NET. I don’t know for sure, but I suspect it’s a combination of C++ resurgence and a smaller number of C++ developers available.
    2. Universities are starting to reconsider their move to teaching computer science students in Java. Many universities continue to teach in C++ so that students have a solid foundational understanding of how systems work. Some professors contend that teaching Java has contributed to a decline in computer science skills.
    3. The language has a vibrant (and broad) standards community and continues a solid evolution. The C++0X standards effort is creating the next version of the language. The proposed enhancements include modern concepts from Java and elsewhere, but still maintain what makes C++ unique and different.

Although Java and .NET continue to have significant mindshare for mainstream applications, C++ is the language of choice for many architects in new development projects - probably more than most people think.

HydraSDO for Databases Launched

Tuesday, January 29th, 2008

I am very happy to announce the General Availability of HydraSDO for Databases 1.0.  We already have a “Java Edition” so this completes the set by providing support for the C++ SDO API.  The Service Data Object specification is particularly important for database access because it is the only industry standard API designed specifically for accessing data in Service Oriented Environments. HydraSDO for Databases provides a Data Access Service (DAS) for relational data that is built on SourcePro DB. This of course means that it is very, very fast – we expect it to be the fastest available in the market. It also means that existing SourcePro DB users can confidently migrate their database applications to a Service Oriented Environment. The database tools provided mean that there is no database access programming required.  HydraSDO for Databases also integrates seamlessly with HydraSCA. This is particularly important because it can be expected that relational data will be a core element of applications built using a Service Component Architecture, just as it has been in older application architectures such as client-server. HydraSDO for Databases is freely available for evaluation at our Download Center. 

HydraSDO for XML 2.2 Launched

Friday, January 25th, 2008

I am pleased to announce the General Availability of HydraSDO for XML 2.2. As the product matures and gains more widespread use, some important use cases are emerging: 

Parsing Very Large XML Documents - The industry trend of increasingly large XML documents has resulted in unexpected problems with applications slowing down due to slow parsing times and increased memory usage. HydraSDO for XML includes an XML parser that has been designed to quickly parse very large XML documents, which provides an immediate boost in application performance. Using very large XML documents to share data between applications is sometimes referred to as Very Large Messaging - VLM. As a rough guide, an XML document is considered large when it is over 10 MB and very large when it is over 100 MB. The XML schema complexity is also, of course, an important factor. The problem with most XML parsers is that, unlike HydraSDO for XML, they do not scale linearly.

Efficient XML Parser Memory Usage - One of the special characteristics of the XML parser that is shipped with HydraSDO for XML is that it is optimized for low memory usage. This can provide an immediate performance enhancement for some applications as well as generally reducing hardware resource requirements. In extreme cases, for very large XML documents, it can prevent applications from crashing due to unexpectedly excessive memory usage during parsing.  As a rough guide, depending on the XML document complexity, HydraSDO for XML uses about half the memory of a typical parser.

Standardized Access for Custom Data Formats - Writing data access code for custom data formats can be time consuming and can require specialist knowledge and skills. The HydraSDO SDK can be used with HydraSDO for XML to develop the necessary custom DAS for reading and writing custom data. Providing the single, standard SDO API in both Java and C++ for disparate custom data formats saves development time and costs.   

SDO DataObject Streaming - With HydraSDO for XML, you only have to parse an XML document once to take advantage of the ability to stream the SDO DataObject between computers. Using this capability, you remove the need to parse the document repeatedly while maintaining the same simple XPath-based API to work with the data (this feature is called Distributed SDO).  

Sharing Data Between Java and C++ Applications -  HydraSDO for XML uses the SDO DataObject to efficiently share data in memory between Java and C++ applications. Depending on the nature of the shared data, the feature works well for tightly coupled applications where it significantly reduces the memory footprint compared with other methods such as SOAP or CORBA (this feature is called Shared Memory Access - SMA).

Vertical Industry XML Document Handling - The internal HydraSDO for XML test suite includes a wide range of standard industry XML documents, particularly from financial services. The test suite is exceptionally extensive because Rogue Wave Software has a long history of providing high performance XML parsers aimed at enterprise developers who need support for specific industry XML formats.  

Service Data Objects (SDO) Standard - SDO is the industry standard for data access in Service Oriented Architectures (SOA). The SDO standard provides access to disparate data formats through the common SDO API, which is available in both Java and C++.  SDO is the data access standard for Service Component Architecture.
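On the memory point made under "Efficient XML Parser Memory Usage" above: how the HydraSDO parser works internally is not described here, so the following is only a generic illustration, using Python's standard library, of why parser design affects memory on very large documents. It processes elements incrementally instead of keeping the whole tree populated. The `trade` element name and the sample document are invented for the example.

```python
import io
import xml.etree.ElementTree as ET

def count_trades(stream) -> int:
    """Count <trade> elements while keeping little of the document in memory."""
    count = 0
    # iterparse yields each element as soon as its closing tag has been read.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "trade":
            count += 1
            # Drop the element's attributes, text and children; only an
            # empty shell remains attached to the root.
            elem.clear()
    return count

# Invented sample document with 1000 repeated records.
document = io.BytesIO(b"<trades>" + b"<trade id='1'/>" * 1000 + b"</trades>")
total = count_trades(document)
```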

HydraSDO for XML 2.2 is available for download at the Download Center.

SourcePro Edition 10 Launched

Tuesday, December 18th, 2007

I’m very pleased to announce the General Availability of SourcePro Edition 10, a very special event for Rogue Wave Software.

SourcePro is our flagship product and the main reason why we can claim to be the market leader in enterprise C++. We are launching the tenth version of SourcePro - a significant milestone that very few application development products reach. It’s a testament to the quality of the product and a demonstration of Rogue Wave’s long term commitment to the product, especially given the recent market trends moving in favor of C++  - most notably, new language standards coming in C++0X and giving C++ the same status as Java in new SOA industry standards such as SCA and SDO.

The full GA version of the product is available for customers now and the evaluation version will be available on our Web site in February.