John Ratcliff's Code Suppository

A place where I insert my code into the anus of the Internet.

 

Friday, December 09, 2011

New Implementation of the HACD Algorithm : Julio Jerez


So, it turns out that Julio Jerez, author of the Newton Physics Engine, did his own implementation of Khaled Mamou's HACD algorithm (hierarchical approximate convex decomposition).

The big problem with Khaled's version is that it was extremely slow due to how he went about performing a distance calculation.  Julio's implementation does not suffer from this, and is much, much faster.

In addition, I added my own merge-hulls code as an optional post process step.

You can download the new version from this Google Code Project page.

Once you are synced you can try running say the complex segmented bug model.

Run: TestHACD hornbug.obj

This will produce a 58 convex hull model saved out to the file 'ConvexDecomposition.obj'

If you want one that is even more precision you can try something like:

TestHACD hornbug.obj -c 0.1

This sets the concavity value to 0.1 from the default of 0.2.

This will produce 73 convex hulls.

However, if you want to run the post-process merge phase, you can reduce the number of hulls by the following.

Run: TestHACD hornbug.obj -m 40

That will recursively collapse the convex hulls (starting with the smallest to the largest) until there are no more than 40 hulls left.

Take a look at the resulting file 'ConvexDecomposition.obj' and you will see how nicely the convex hulls were collapsed.

This version is *extremely* fast and can be used even on meshes with high vert counts without taking forever.

While, in general, Julio's implementation seems to be working great, we have found a couple of edge cases it fails on.

In the project you will find two wavefront OBJ files, 'box-thick.obj' and 'cylinder.obj'.

'Box-thick.obj' is a hollowed out box, which Khaled's implementation handles correctly, but Julio's fails on.  Same thing for the cylinder.

Hopefully we can get this tracked down soon. 

To use this source code in your own project, just copy the sources from the sub-directory 'hacd'.  The rest of your application only needs access to the folder 'hacd\public'.

Modify the header file 'PlatformConfigHACD.h' to change the global memory allocation routines if you want them to vector to your own methods.

Thursday, November 03, 2011

GatherPictures : A console app to gather all JPEG files into one location without duplicates


Today I'm providing a little console application that I wrote for my own personal needs.  On the theory that if I found it useful, probably some other people might find it useful too, I'm making it available here.


The tool is provided as a ZIP file.  It includes both the source code and the Windows console application.  It is designed to be run from the command line, so if you are a Windoze user, this might not be the tool for you.

So, here is the problem I was trying to solve.  I think many other people may have this problem too.  Over the years you transfer files from various memory cards, digital cameras or Iphones.  You have them on a bunch of different places on your hard drive.  You have them on your computer, you have them on your wife's computer you have them on old CD-ROMs or external USB drives.

Then, one day, you would like to simply get a copy of all the photographs you have scattered around all these locations into one spot, without any binary duplicates.  This tool does not do an inexact match, it won't detect images which are the same, but at different resolution or have been cropped or rotated.  There are commercial tools that can perform this operation fairly well.  Personally I use Visual Similarity Duplicate Image Finder.

This tool also does not do image resizing, there are many commercial tools that can do that as well.

What the tool is designed to do is collect all of my pictures, from many different sources, into one single flat directory so that I can then, easily, run these commercial tools on them to create a nice clean set of non-duplicate images ready to be copied onto a memory card to put into a digital picture frame.

There is nothing particularly special or magical about this source code.  It's nothing I'm proud of in any particular way, it's just something I hacked up real quick and seems to work.

Now, here is how you use the tool.  Let's say you have a bunch of pictures in a directory called 'MyPhotographs' and under that directory are many sub-directories.  Now, let's say you have another directory called 'WifesPhotographs' which may, or may not, have some of the same pictures duplicated.

The way to use this tool is you create a folder where you want to 'gather' these images into.

Next, go to that directory and at the command prompt and run the GatherPictures.exe (should be placed somewhere that your default search path can find):

GatherPictures c:\MyPhotographs

This will find all JPG files in that directory and all sub-directories and copy them into your current directory giving each image a unique flat file name.

For example, if you had an image 'c:\MyPhotographs\vacation\img001.jpg'

It would get copied as the file name 'MyPhotographs_vacation_img001.jpg'

If after running 'GatherPictures' against 'c:\MyPhotographs' you run it a second time, then it will detect that all the files are the same and it won't copy anything.  On the other hand, if new photographs get added to that directory, it will detect the new ones and pick them up.

You can continue to run 'GatherPictures' against any directory and it will avoid copying any binary duplicates.

Once you have 'gathered' all of your pictures you may find that you have gigs and gigs of photographs.  For example, I had about 50gb of photographs in 38,000 files.  One of the issues I ran into is that when I tried to run some image processing tools on this directory they crashed because they had never been designed to handle that many images.  Also, there is the use case where you might want to break these up into say 8gb or 16gb chunks suitable for copying onto a memory card.

I added a new feature to 'GatherPictures' which will take all of the pictures in the source directory and copy them into sub-directories of a specified size limit.  The original pictures are left alone.

The usage for this is:

GatherPictures -split

So, if you wanted to split all files into 1gb sub-folders you would use:

GatherPictures -split 1024

(A gigabyte is 1,024 megabytes)

For those who are interested, here is how the code works.

At start up the first thing it does is scan all JPG files in the current directory.  It examines the size of each file and places it into a hash map.  It also builds a hash map of all file names.

Next, after scanning all of the files in the source directory, it does the following.

First, it makes sure the file name doesn't match one that has already been gathered.  It uses the previously built hash map to speed this up.  If it matches, it skips it.

If the file name doesn't match it then looks up the binary file size against the hash map of files by size.  If there is already one, or more, files of that exact size it then tests to see if it is or is not duplicate.

First, it reads the source file into memory and computes a 32 bit CRC.  Next it compares that CRC to the CRC of the previously registered files.  If it is the first time the file has been checked, it has to read the file into memory to compute the CRC.  However, it caches the CRC so this is only done once.  The next time it encounters a file of this same size, it can just use the cached CRC for an early reject.

If the CRCs match only then does it do an exact binary memory compare to see if the two files are, in fact, exactly identical.

Let me know if you find the tool useful or can think of improvements.  As I said, my goal was not to replace the functionality of commercial duplicate image finder tools but, rather, I just wanted a way to collate all my image files into one single location giving each file a unique file name based on it's original source directory location.

Sunday, August 28, 2011

Don't look a gift horse in the mouth



My friend Charles Bloom likes to rant on his blog, and I often envy him.  He lets his inner id loose in a way few others do. Personally I don't let out a good rant that often but, tonight I will.

Now, earlier in the evening I must admit that I had a few drinks and I also admit that as a rule most of my public rants are fueled by exactly that kind of behavior. But, tonight I don't think that has anything to do with it.

This blog is called the 'code suppository' for a reason. It is the place where I insert my computer software into the anus of the internet.

Now, I release open source code for a number of reasons. Let me take this opportunity to say what they are and in order of importance.

(1) The number one reason is that I do it for myself. Like many programmers I have my 'bag of tricks' code snippets which I use over and over again. No one wants to keep writing the same parser,
or similar piece of utility code over and over again. Many long time C++ programmers accumulate a personal library of goodies over the years and, frankly we publish the source for ourselves as much as anyone else. Chances are I personally go back to the code snippets hosted on this site probably more often than anyone else.  I never want to ever again have to write from scratch how to compute the nearest intersection point between a line segment and another arbitrary 3d point in space.  I never want to write again form scratch code to read a .INI file, handle key-value spec properties, parse an XML file, etc. etc.  So, I post those code snippets primarily for my own use.  If anyone else finds them useful, that is just a bonus.  This is, I believe, pretty much what my friend Sean Patrick Barret does.  Sean is an incredibly generous guy who has released more useful free source code snippets used by more people than almost anyone else out there.  For example, I think Sean's PNG reader/writer code has probably found its way into literally thousands and thousands of software projects.

(2) The second but, to me, the most important reason that I publish open code source is out of a sense of moral obligation. Over my 30+ years as a professional software developer I have benefited greatly hundreds, if not thousands, of times from the open source work of others. And when I say 'open source I don't mean that bullshit GPL garbage. I mean really, truly free open source conforming to the BSD/MIT license. Truly free source code in every way with with no restrictions on it other than "you can't sue me if it doesn't work."

(3) The third reason I release stuff open source is in the hopes that some other programmer might find bugs in the code and contribute fixes and/or improvements.  There is no better example of this than when Khaled Mamou took my approximate convex decomposition algorithm and used it as inspiration for his dramatically improved hierarchical convex decomposition code HACD.  It gives me a tremendous amount of satisfaction that my work inspired someone to do something even better.  And, guess what, I'm returning the favor by optimizing HACD so that it will run several orders of magnitude faster than it does today.  This process of using open source to work together as a community, helping one another out, is something I find very gratifying to be a part of.  Or, at least *used* to find gratifying...

(4) The fourth and final reason that I release open source code is for educational purposes.  I think of it as a service to the community and also as an educational exercise.  Over the years, when I have tried to get something useful in terms of available open source code I have often run into major challenges trying to extract just the specific piece of functionality that I want.  Many software engineers write their code on top of rather massive template libraries; especially for data structures and mathematics.  So, if you want to get just a particular routine (like say computing the best fit plane equation to a collection of 3d samples), you cannot simply get just that routine.  Instead, you have to adopt 50 to 100 thousand lines of source code, including a massive library and template framework to get access to that functionality.  This isn't a problem if you want their whole library, but it is is a problem if you just want that one routine or want to refactor it to work with your own data structures and math template library.

My goal is to help educate the public in what I personally believe is the ideal way to share source code with others over the Internet; as 'code snippets'.  With most of my releases I try to set a good example and educate others.  I have received enough positive feedback over the years to know that a lot of people really do appreciate it.

Now, it's time to get back to the rant.


I have edited this post from it's original and will replace that content with one simple request.

If someone gives away free open source, and you have questions, feedback, or suggestions for improvements, please send a polite and friendly email to the author.  Chances are that author will try to be of help.  If you send the author a rude and insulting email, that's probably not going to be very helpful or put them in a mood to help you.

Also, if you don't get an answer to your email, don't be upset right away.  Like many people on the Internet, authors of open source work can receive hundreds of email a day, and it's easy for one of them to either get marked as spam or the person intends to give a reply, but it slips through the cracks.

And, as your mother always said, if you have nothing nice to say, you can always say nothing at all.

Thursday, August 18, 2011

Raycasting using AABB (Axis Aligned Bounding Volume) trees

Note: This photograph has nothing to do with this post; I just love showing off my new puppy Winston!

Today I am releasing an arguably somewhat useful code snippet which can preform high-speed raycasting against an arbitrary triangle mesh.  Now, this isn't particularly unique.  There are many libraries on the internet for doing this; however this may be the only one presented as an actual 'code snippet'; meaning just a single header file and a single relatively modestly sized CPP implementation.

If you need a robust general purpose raycasting and collision system I strongly recommend you use OPCODE written by my friend Pierre Terdiman.  OpCode is a phenomenal library that is extremely high performance.  You can also use the PhysX SDK from NVIDIA; which has a version of OPCODE internally which is also very high performance as well.

However, OPCODE does a whole lot more than just simple raycasting against a triangle mesh; it has a lot of other features and is distributed as about 100 source files comprising tens of thousands of lines of code.  Again, this isn't an issue if you are working on a large project and want to add the library in.

All that said, sometimes you are working on a small utility program or tool and you just need a quick and dirty way to raycast against a triangle mesh to speed up an algorithm.  In this case, you might find my implementation useful since you only need to add two small source files to your project.

This code snippet might also serve as a nice educational tool for students trying to understand how to build and use an axis-aligned-bounding-volume AABB tree.

You can download the source code from this Google Code project page.


RaycastMesh

The project includes a sample application which will load a 3d mesh, perform a million raycast operations, and save out a PNG image file showing the results.

However, to include it into your own project you need only the file 'RaycastMesh.h' and 'RaycastMesh.cpp'




Monday, July 11, 2011

Revision to TestHACD


After making my post last week Khaled Mamou has made some bug fixes to his HACD implementation that takes extended mesh connectivity into account.  In the previous version, if you ran convex decomposition on a single input mesh that was actually comprised of say 10 separate unique sub-meshes (I like to call them 'mesh islands') then it would always produce a minimum of 10 convex hulls.

Khaled fixed this by adding a new option called 'connectivity distance' which will merge mesh islands if they touch within this distance.

I integrated Khaled's changes into my depot as well, and also removed my hack solution which performed hull merging to get around this previous problem.

Right now my code and Khaled's are in sync, but my code drop is significantly refactored since I made it optional to build and run it either with or without the standard template library.  Perhaps we can combine our efforts sometime in the future but, for now, I will just continue to manually integrate changes.

My current check-in adds support for my MeshImporter plugins which can import a wide variety of graphics file formats.  I will have a new MeshImporter plugin that will support COLLADA soon. This is a windows only feature; for non-windows developers there is an option to just compile in Wavefront OBJ support.

Also, my current code drop has the beginnings of the automatic skeleton generation code.  As I find time this week, I will try to complete it.  This code will take an arbitrary triangle mesh, perform HACD on it, and then infer a skeleton at the points wherever two convex hulls touch each other.  As a final option it can auto-generate a skinned mesh from the original non-skinned artwork.  This will allow you to automatically create a ragdoll or psuedo-softbody from any arbitrary mesh.

Here is the link to my depot on Google Code.  TestHACD

Here is the link to Khaled Mamou's official copy of HACD on SourceForge.

Thursday, July 07, 2011

HACD : Hierarchical Approximate Convex Decomposition by Khaled Mamou Sample Application


I have just created and made available a test application for people to use/experiment with Khaled Mamou's excellent HACD system.

TestHACD on Google Code

Due to requirements that I have at my job, I had to refactor the code so that it did not use any of the standard template library (STL) and 100% of all memory allocations had to be trapped by a user provided allocator. (i.e. no global new/delete, malloc, free).

One of the more challenging things to factor out regarding the STL was the use of the STL set and map containers.  To accomplish this what I did was refactor a copy of the EASTL (written by Paul Pedriana).  Even though Paul's EASTL is a dramatically smaller code base than the full STL implementation, I went a step further and refactored his code so that I extracted *only* the STL map/set containers and nothing else.  I then put this refactored code into just three small source files where all memory allocations and data types could be trapped and revectored.

There is a single header file in the provided project 'PxSimpleTypes.h'.  You can modify the data types and memory allocation macros in this one header file to make the source conform to the coding standard of your own game engine, or at least it provides a single easy location to do some global find/replace on the source to make it conform to your internal standards.

Khaled's algorithm produces absolutely amazing results much better than my version ever did.  However, one thing I have yet to be able to get his to do is produce just one our two hulls.

With the provided sample mesh no matter what I set the concavity threshold to, no matter how high, I still get 14 convex hulls.  I don't know if this is a limitation of his algorithm or I am not exposing all of the correct parameters at this time.

If you wish to make improvements to this code, please let me know and I will add you to the Google Code project so you can make changes.  Also, sometime in the next week, I am going to revise the code to auto-generate a skeletal deformed mesh from the HACD output.  I will add this to the TestHACD open source project.

Be aware that Khaled's algorithm can be extremely CPU intensive.  With triangle meshes with very large vertex counts; it could possibly take an extremely lengthy period of time.

Saturday, June 04, 2011

Overly Optimistic?



I just had someone post a comment on the blog about my overly optimistic view of my new project. I figured I would post my answers so everyone else interested could read them as well.

>>overly optimistic/ambitious?

That's not an unfair assessment.  However, I do have confidence in a few things.  I'm confident that I can do this project; based on the fact that I have successfully completed similar projects in the past, more than once.

I am also confident that I am managing technical risk.

For example, one of my first requirements is not to get even any game play working.  My first requirement is to get a server written that can manage 1,000 simultaneous connections and to test it with bot scripts which actually create 1,000 connections and move characters around the world.

The 'M' in MMO stands for 'massive' and if I can't achieve that, then nothing about the game client really matters.

The key to success, in my view, is to provide a unique truly massive gaming experience; the kind you could only ever get, to date, from 'Planetside' and do better.  Having been the lead software engineer on 'Planetside' I would like to think I know a little bit about this.

>>have you already prioritized a list of features so that you know what to drop when you don't meet deadlines?

Yes.  I'm using Agile/Scrum methodology and Rally to track everything.

>>(and have designed play tests to know that these features aren't make-or-break?)

Well, we aren't doing the project that way.  This project will be in a continuous state of open beta from the day we have enough server infrastructure in place and two people can walk around the game world.  The game will never officially 'ship' in that sense; but rather features will continuously be added, and game design refined through play test over time.

This is the only way any truly successful multiplayer game gets developed.  If you do not always have a playable game, during the entire development process, you are taking an absurd risk in my opinion.

And, while we will never officially 'ship' the game, we will officially start charging people money for beta testing.

However, even then we have a lot of flexibility.  One of the key things to test is the monetization of the game, and what works best.  So monetization will be play tested and balanced just like everything else.

My design is that, from the stand point of the game mechanics, you will never play the game for 'free'.  What will be free are credits to play the game, which I will hand out with abandon to anyone who is willing to help us test and tune the development process.

Once the game reaches a level that it is truly fun to play, and I feel worth charging money for, at that point I will just stop giving away so many free credits.

For example, let's say you want to beta test.  Well, I can give you a $100 worth of credits to play the game (enough credits to play the game a ridiculous number of hours for five months).

Once those credits run out, if that guy is still in beta test, he is going to want some more credits to keep playing the game.

If I haven't created a game that is enough fun to accomplish this, then I will have failed.

However, this game is starting with a known, proven, and completely addictive game design.  So, my confidence level is fairly high that I can find a decent sized group of people who will enjoy playing it and be willing to pay a modest amount of money for the privilege. I'm not talking about charging them some absurd amount of money in subscription fees, or annoying the shit out of them with micro-transactions and/or advertising.  My initial plan is simply to charge very modest some of money per time played.  Something roughly on the order of 25 cents for half an hour.  But, this too, is all subject to change, refinement, and tuning as testing plays out.

This game is going to provide a unique gaming experience you simply cannot get anywhere else in any other form.

>>anyway, good luck with it all! Keen to see your demo in 2 weeks

Thanks, looking forward to getting it out!

Friday, June 03, 2011

Progress update





I have made some big decisions to move this project forward in the past 24 hours; and these decisions involved committing thousands of dollars out of my own savings to get things off the ground.

First, I have formed a limited liability corporation, under the name 'Duatiu LLC', which will have 10,000 shares of stock to distribute to partners.

Second, I filed a trademark application on the name 'Duatiu'.  I expect no difficulties since the name is a coined term and doesn't show up anywhere on any search engines.

Third, I have come to an agreement with partner Shawn Presser to work on Duatiu *full time* for the indefinite future.  Shawn is being reimbursed for his time spent on this project as a combination of shares of stock and a modest amount of cash to help defray his own personal expenses.

This is a pretty big deal; having a talented software engineer able to work on the project full time should go a long way towards pushing the game to the next level.

Our current ball park milestones are as follows:

  • Within Two Weeks : A downloadable demo of the Duatiu game world.  You will be able to fly a free camera and walk an avatar around the game environment.  The visual production values will be close to the final product.  Using precomputed radiosity, nice diffuse textures, and a high resolution detail texture, we expect the game environment itself to be very visually stunning.  The game engine is being written exclusively in C++ for OpenGL.  While it will initially run on Windows; it may at a future time run on a wide array of tablet operating systems.  (Perhaps of interest to the programmers who follow this blog, I will be releasing my radiosity lightmapping tool completely open source at the same time.)
  • Within Four to Six Weeks : Be able to connect to a server, hosted on Amazon.com, and walk around the game environment and see avatars representing other playrs.
  • Within Three Months : Be able to play a simple basic game involving calling in supplies, operating a shield, and shooting human opponents; all in a multiplayer environment.
  • Within Six Months : Functional AI agents running on the server; more of the actual game play implemented.

These are just back of the napkin estimates, but it does give you a general ideal of the scope and time frame we are thinking about.  I'm extremely excited that Shawn has committed to working full time on this project.  I expect good things to come from this soon!

Thursday, June 02, 2011

My new project 'Duatiu' is actually getting some traction after just 24 hours of announcing it!


Well, I have already received quite a bit of positive feedback in the past 24 hours about this project.  First, a number of my friends in the industry have promised to help me out and that has been amazing!

So far I have commitments of time from:

Shawn Presser : Who is writing the core graphics engine and has already been working on the project for several months.  Shawn's contributions to this project have already been amazing.  We are currently working towards our first milestone to provide a demo that will allow anyone to run a character around one of our game world environments sometime in the next couple of weeks.

Jared Freedman, An excellent game designer, developer, long time friend, and all around good guy.

Billy Zelsnack : Billy is a well known figure in the game industry and a long time friend.  He has volunteered to help with the game project when he can.

Charles Bloom : It helps to have good friends in the game industry and Charles, even though he is one of the top guys in the world he still took the time yesterday just to whip up a little bit of code I needed to solve a problem.  I see shares of stock in Charles's future.  Plus, he's a fellow Porsche fan so we have that in common.

John Miles :  What has John Miles done on this project?  Nothing.  Does John Miles even know that this project exists?  Probably not.  Have I even asked John Miles if he will work on this project?  No I haven't.  But, as a life long friend, I'm sure he will help me out when I need it.

Wajih-Halim : A recent college graduate in computer science who wishes to help with graphics special effects on the game.

Ben Hesketh : Runs his own successful game company and contacted me from this website because he thinks the project sounds interesting.  Ben has volunteered to contribute on the server code.  The website for Ben's company, Compass Engine, can be found here.  And a link to his recent game 'Bounty Island' can be found here.

Robert Sitton : Robert is a long time game industry veteran and was one of my key sidekicks when developing the game 'Planetside' for Sony Online Entertainment.

Joe Bizz : Musician and sound engineer and fellow UFO enthusiast.  So how that networking shit works?

RAD Game Tools : The guys at Rad Game Tools are long time friends of mine, Jeff, Mitch, David, Brian, Sean, and others.  They have always been extremely helpful in supporting me and my projects and it looks like I may be able use their latest product 'Iggy' to accelerate the development process for this game.  'Iggy' is an amazing product written by the the no less amazing Sean Patrick Barrett (one of the Gods of the open source community).  'Iggy' is an embedded Adobe flash engine that virtually eliminates the need to do any custom user interface programming.  The entire user interface for a game becomes a series of Flash/ActionScript/FLEX data assets.  This should greatly accelerate the development of the user interface for this game.  Of course, that means that now, instead of looking for a GUI programmer, I need an expert who knows how to use these tools; but that seems like a much simpler hurdle to leap.

This is a really good start to the virtual team, but I still need some more artists to join the project.  Hopefully when people see the caliber of people involved they will realize they are missing the boat if they don't hop on soon.

Tuesday, May 31, 2011

Duatiu ; A Massively Multiplayer Online First Person Shooter written by.....you?



For those who follow this blog, please check out the website and postings about a new project I'm trying to get off the ground.  Feedback, but more importantly, partners wanted.

Friday, May 20, 2011

HACD : Hierarchical Approximate Convex Decomposition by Khaled Mamou






Many people come to this weblog because they are searching on terms for 'Convex Decomposition'.  This blog post is to announce that there is a much, much, better open source implementation than the one I did.  I suppose I can take some tiny credit as my work appears to have inspired the authors of the new algorithm but that's not really saying much.  All credit where credit is due and the open source implementation released by Khaled Mamou is vastly superior to the brute force approach that mine took.  His is much more elegant and also solves a lot of extremely hard problems; such as objects with holes in them (something my algorithm could not deal with).  His also does not suffer with the problem mine had with excessive recursion depths producing hollow interiors.

You can find the paper for this algorithm at this location:

ftp://ftp.elet.polimi.it/users/Stefano.Tubaro/ICIP_USB_Proceedings_v2/pdfs/0003501.pdf


You can download the source to HACD from this location on SourceForge

http://sourceforge.net/projects/hacd/

With this announcement I will no longer be supporting my legacy ConvexDecomposition code in any way.  I am switching to using HADC in all of my projects.  I recommend you do the same.

Congratulations to Khaled and Faouzi who have done a fantastic job on an important problem.

One thing I will volunteer to the community is that sometime soon, I will create an open source tool using HACD that will auto-generate skeletal deformed meshes.

If you simply look at these various screenshots you will notice that the convex decomposition naturally produces a logical skeleton for an arbitrary piece of geometry.  Obviously, that skeleton isn't perfect compared to what an artist would create but it is certainly a useful approximate skeleton.

From previous work I have done on the 'create dynamics' project, I know that you can not only synthesize a skeleton but you can also auto-generate skinned bone weightings.  If you then create physical collision shapes which are constrained at these joint locations you can get an extremely convincing looking softbody object; with accurate collisions, in a very CPU cheap fashion.


























Thursday, December 02, 2010

K-Means Clustering Algorithm

I haven't uploaded a new code snippet in quite a while, not because I haven't been coding, I've just been coding in a certain unnamed behemoth game engine that doesn't lend itself to writing anything in 'snippet' form.
However, I am working on a new project for a short period of time and yesterday I needed to solve a problem that, apparently, lots of people have solved before.   I have a large set of data points (in this case representing objects at different scales) and I needed to reduce it to a smaller subset of the (N) number of most significant sizes.

This is a common problem, and many developers have seen it when trying to reduce a color-space.  (For example reducing an image with say 16k colors to only 256 colors used to be a common thing people needed to do).

One of the more elegant solutions to this problem is by building an octree; at least one version of this algorithm is called 'Wu Quantization'.  This was popularly presented in Graphics Gems.
What I needed yesterday was something very quick and dirty and, that turned out to be the K-means clustering algorithm.  You can read about it here on Wikipedia and here is another link comparing it to a different algorithm.

The algorithm itself is very straightforward.

Let's say you have 100 input data points, but you want to reduce them to say 10 clusters where the center of each cluster represents the mean of the surrounding points.

Step #1 : You seed the 10 clusters with the 10 sample points taken from the input data.

Step #2 : You set an temp array of counts to zero and temp points to zero for each cluster you are building.

Step #3 : You iterate through each of the input data points.  For each point you figure out which of the current clusters it is closest to.  You then add that point to the new set of centroids and increment a counter.

Step #4 : You accumulate a total distance error of each point from its nearest cluster.

Step #5 : You compute the average center position for each of the centroids and assign them as your new clusters.

Step #6 : You compute the difference between the distance error on this pass and the distance error on the previous pass.  If the difference in the error is less than a tolerance passed in, you are done.  If the error is still converging, then you go back to step #2 and continue the loop.

My implementation also added the ability to prune the results.  Pruning is needed if all of the input points are at the same location, or there are fewer input points than clusters requested.

Another reason to prune the output is in the following example.  Let's say you have 100 data points that you want to put into 10 clusters.  However, all of those data points are right next to each other.

The standard K-means algorithm will still produce 10 clusters, since it is basically unit-less.  However, this implementation allows you to specify a tolerance threshold that says if two clusters are practically next to each other, simply merge them.

The main reason I am posting this code snippet is because of the absolutely abhorrent code snippet I found on the internet.

Click on this link to see the C implementation version that I found on the Internet.  Clear as mud right?

kmeans.c

Oh yeah, I would like to also add this.  There is a link to a C++ implementation on the Wikipedia page.  I downloaded that link and it uses not only the STL but BOOST as well!

Yikes.  Mine has a vanilla C style function interface, accepts pointers to 1d, 2d, 3d, or 4d arrays of data points as either a const float pointer or const double pointer and uses no STL, or BOOST, or any other fluffy nonsense.


Now, here is the header file and C++ for the version that I wrote.  I hope you will agree it is a lot easier to understand and use.  As always, feedback welcome.


kmeans.h
kmeans.cpp

Sunday, April 18, 2010

Understanding a .MAP file


My friend Andy Finkenstadt helped me figure this out.  Apparently the contents of a .MAP file are *not* in sorted order.  I will write a follow-up to this post later.

Saturday, March 27, 2010

Memory Tracking Utility


Today's code snippet is a very useful tool.  The default version is provided as a 32 bit windows DLL.  There is no reason the code couldn't be refactored to be used on other platforms, but that wasn't a requirement for my own personal use.

This tool is used to detect and identify memory leaks when an application exits or, more commonly, used to simply identify where all of your memory allocations are coming from.

What's nice about using this tool as a DLL, rather than source code, is that it doesn't directly affect any of your existing allocation code.

This code does *not* trap memory allocations.  You must already be doing that yourself.  What it does is allow you to track and report on those allocations.

Ideally your application would trap every single new/delete malloc/free performed.  This can be done in a variety of different ways, either by replacing the global memory allocator, or by inheriting a base allocator class.

I strongly recommend that every class in your code base inherit a default allocator class.  Then, in every location where you used to do a 'new' operator, replace that with a macro expansion which calls an inherited new operator which accepts class name, file name, and line number.

The value of the Memory Tracking report is only as good as the data you feed into it.  If you feed into it the class name, source code filename and line number for every allocation then the report will be able to give you some tremendous information identify when, how, and where, all of your memory allocations are occurring.

One problem with tracking all memory allocations is with templates.  When you do a lot of template metaprogramming it becomes difficult to know where the source allocations are occurring.  Sometimes it is useful to add additional parameters to your templates so you can better identify where the allocations are coming from.

So, here is how you actually use the MemoryTracker DLL in your application.

You need three things:

MemoryTracker.h
MemoryTrackerBinding.cpp
MemoryTracker.dll

The header file 'MemoryTracker.h' defines the pure virtual interface to the memory tracking utility.  It also declares a single global variable pointer to the Memory tracking interface and a routine to initialize the DLL.

In your code, before you do any memory allocations, initialize the MemoryTracker.dll by calling 'NVSHARE::createMemoryTracker("MemoryTracker.dll').   You should pass the name of the DLL on disk, including the full path location if necessary.

The routine 'createMemoryTracker' is in 'MemoryTrackerBinding.cpp' which will perfom a load library call on the DLL and initialize it; also setting the global variable 'gMemoryTracker'.

Once the memory tracker DLL is loaded, you can now track all memory allocations performed by your project.

First, set the logging detal level.  The options in 'setLogLevel' are:

logEveryAllocation : This will produce an enormous amount of data, and is not recommended unless you really need to track every single allocation at startup.

logEveryFrame : This will produce a summary of allocations and frees each frame.  For this to work you need to call the method 'trackFrame' once per logical 'frame' in your application.  Again, this feature is not recommended unless you are specifically wanting this data.

There are three methods for tracking your memory allocations.  They are:

'trackAlloc', 'trackRealloc', and 'trackFree'.

The enum 'MemoryType' designates what was the type/purpose of each allocation.  This will identify if it was a 'new' 'delete', a 'new array' or 'delete array', or a 'global new', etc.  The difference between MT_NEW and MT_GLOBAL_NEW, is to indicate whether the new operator was trapped or not.

When you track an allocation you can specify the class name, or simply a tag for the purpose/type of allocation, and the file name and line number of the original C++ source code.  You can do this by using the compiler preprocessor directive __FILE__ and __LINE__.

It is very important to note that these string pointers must be persistent for the lifetime of your application!  You cannot, and should not, ever pass stack or temporary variables in as the parameters!

One of the most valuable options when tracking memory is 'context'.  The 'context' allows you to organize your report based on its overall meaning within your application.  A good example of this is if you are working with a piece of middleware, like say the PhysX SDK, which allows you to provide a user allocator override.   Every time this SDK does a callback to your application for memory you can pass in as the context field 'PhysX'.  Then, when you generate the memory report, you will have all of the memory used by PhyX organized separately from the memory allocated by other parts of your application.

When you application exits, and after all memory allocated should have been freed, you can call the method 'detectMemoryLeaks'.  You pass it the filename of the HTML report you want saved.  Also you pass the option 'reportAllLeaks'.  If 'reportAllLeaks' is true then every single piece of allocated memory which was never freed will be reported in detail. 

In addition to using 'detectMemoryLeaks' to generate a report on exit, you may also call it at any time in your application simply to get a general report of overall memory usage.

If you use it for this purpose it is strongly recommended that you set 'reportAllLeaks' to false, otherwise you could generate a massive data report.

The real value of this utility is the report itself.  The output is sorted by class name, source code and line number, as well as context.  It comes with totals and subtotals and is some very nicely formatted HTML tables as output.  When I get a chance I will upload a sample report for you to take a look at.

The MemoryTracker utility is an open source project hosted on Google Code.  You an find it at the following link.

If you don't care about the source code to the memory tracker tool but simply want to download the tool try these links.

MemoryTracker.h
MemoryTrackerBinding.cpp
MemoryTracker.dll

Thursday, March 25, 2010

Any SIMD optimization specialist looking for some easy consulting work?


Anyone in the game industry who has expertise at optimizing SIMD code for PC (SSE/3D Now,etc.) Xbox-360 Altivec, and PS3 SPU, looking for some easy money doing a short term gig, let me know.

I have a handful of vanilla C++ routines which need to be optimized for these platforms using a shared math library.  It's not that I can't do it myself, it's that I am already over-committed on my time for a number of other projects.  Apparently I have the ability to pay a contractor to do some of the work, so this could be easy money for somebody who really enjoys doing low-level optimization work.

Send me an email if you are interested.

Sunday, March 21, 2010

MapFile : A tool to analyze .MAP files produced by Visual Studio and report the amount of memory being used by data and code


It really annoys me that I had to write today's code snippet.  I fully expected to be able to find an existing tool that could accomplish this basic task but after literally a couple of hours of searching the Internet for something I gave up.  In the end I had to write my own tiny console application and I am making it available here so nobody else feels like they have to write it again.

Also, if anyone reading this is really good at web stuff, it would be nice to have a website that would let you upload a .MAP file and then be able to immediately view the HTML content.

The data in this version is pretty complete.  For some reason when I parse all of the address data found in the .MAP file it doesn't add up to the exact byte count that is in the sections summary.  Maybe somebody from Microsoft can explain why that is.

The data could probably be a little better organized and it could really benefit from some hyperlinks.  That said, still all of the data you need is there.

A .MAP file is an ASCII text file that can be produced by the linker phase in Visual Studio.  Under Linux the GCC compiler can also produce a .MAP file but I have no idea if the two formats are compatible.  For my needs I was only concerned about builds that were done with Visual Studio.  This works even if you are building for XBOX as well.

A .MAP file is essentially a 'memory map' of the executable.  It describes by memory address every piece of code and data in the resulting executable file.

Unfortunately it it not very human readable.  The function names in the map file are 'decorated'; meaning they have been scrambled/compressed into a machine only readable format.  A DLL that comes with Windows called 'dbghelp.dll' can 'undecorate' these strings into their human readable form.

The MapFile console application will read in a .MAP file and output an HTML file called 'output.html' that contains tables representing all of the code and data sections in the executable.

The tables are sorted either by object name or by function name.

An executable is broken up into various 'section's and the HTML output is likewise broken up the same way.

You might ask, why would you need this tool?  Well, I suppose that is a fair question.  Let's say you are targeting an Xbox-360 console machine and your executable size is 10 megabytes.  That's just huge.  So, you are going to want to know where all of that data coming from.  Is it code and, if so, from what source files and what functions?  Is it data and, if so, what kind of data is it and where is it located?

I have set up a Google Project to host this tool.  You can find it at the following link:

MapFile on Google Code

Another reason you might want to download the source code to this tool is that it demonstrates how to use two of my most powerful code snippets.  The first is the 'InPlaceParser', which is an extremely high speed text parser and the second is 'HtmlTable' which is a code snippet that will output nicely formated HTMLTables with headers, sorted columns, and totals.

Both code snippets are really useful.

Tuesday, February 23, 2010

PhysX Rocket for 2.8.3


The physics demo application, PhysX Rocket, has not been officially released in some time.  However, I frequently get questions from people who still enjoy playing with it as a tinker toy.  It can also be useful as a physics scripting application.

Overall, Rocket is really showing its age.  It is very crufty, especially in terms of the source code.

However, if you want a copy to play with, here is the link.  It is simply a ZIP file.  It does not have an installer, and does not include source code.

www.amillionpixels.us/rocket.zip

NvCoreDump : Easily perform an NxuStream XML compatible core dump with PhysX 2.8.3


NvCoreDump is a windows 32 bit DLL which allows any PhysX 2.8.3 based project to perform an NxuStream XML compatible core dump in a single function call.

Rather than including all of the NxuStream source code in your application, instead you can simply demand load this tiny DLL and save it out.  The value, purpose, and benefit is to take simply add the ability to export the contents of the current PhysX SDK in any application by simply adding this tiny code snippet and the DLL.

This project is hosted on Google Code at the following location:

https://code.google.com/p/nvcoredump/

Simply add the source files 'NvCoreDump.h' and 'NvCoreDump.cpp' into your project.  Next add the DLL 'NvCoreDump283.dll' to your binary/executable directory.

Somewhere in your project, typically when someone types a command at the console, simply call the method 'nvCoreDump' passing a pointer to the 2.8.3 PhysX SDK and a string pointer of the file name you want it saved as.

The file will be saved in XML format as NxuStream.  NxuStream is a reflective object model representation of all of the internal data of the PhysX SDK, including scenes, meshes, constraints, clothing, and even fluids.

PhysX2Obj : Convert Physics Scenes into a Wavefront OBJ file


Today's code snippet might prove useful to anyone who works with the PhysX SDK.  This code has been compiled and tested with PhysX 2.8.3 but should work with earlier versions of the SDK with little or no modification.

In any PhysX application simply include the header file 'PhysX2Obj.h' and invoke the single method 'exportPhysX2Obj'.

You pass in a pointer to the scene you wish to export, the file name to save it as, and a bool indicating whether or not you want to export the data in world space or object space.

If you select world space, you will simply get one large Wavefront OBJ file with all of the static geometry in your game level.  You can then import this Wavefront OBJ into 3D Studio Max or other tools for testing purposes.

If you select object space, it will export all of the geometry in object space and annotate the file, with comments, indicating where each object should be instantiated in the world.

Some of you might be familiar with my previous work on NxuStream; which is an XML file format that allows one to capture the state of a PhysX SDK scene.  However, that format required a large code base and could only be used in the context of the PhysX SDK.

This tool is not a complete 'core dump' like NxuStream was; instead it is used primarily to export the gross game level geometry of an application so that it can be reviewed or debugged in another application.

I have set up a Google Code page to host this code snippet.  You can find it here:

http://code.google.com/p/physx2obj/

You can also download the source directly using these two links.  However, the Google Code project will always be the most current copy.

PhysX2Obj.cpp
PhysX2Obj.h

Tomorrow I am going to write a code snippet 'Obj2PhysX' which will read one of these object files, either in world space or object space, and instantiate it on the PhysX SDK.  The code will be written in a generic enough fashion that you can also instantiate the data through some other API if you prefer.

Thursday, December 03, 2009

Two more compression libraries added to the test compression tool



A contributor, Mārtiņš Možeiko, just added support for two more compression libraries. They are LZMA and FastLZ. Click on the picture above to see the performance differences between the various compression libraries.

As you can see it is pretty much a tie between several of the fast LZW style compressors. I would recommend you use the one which has the most liberal license and the cleanest looking source code.

I might, sometime soon, and some additional test cases. The current test case is a 10mb XML file; which I thought was pretty representative of a big ASCII file you might hope to get some good compression out of. I should also add some binary test cases as well.

Thanks all for the contributions. And, once again, here is the link to the compression test library.

Wednesday, December 02, 2009

Compression Test



Back in March of 2008 I made a posting about comparing various open source compression libraries. You can read the original post at this link.

I don't get a lot of feedback about this coding website. That doesn't bother me because I upload stuff here as much for myself as anyone else. Over the years I have taken advantage of a lot of source code other developers have written and posted on the Internet. It is for this reason that I feel a sincere debt that I should contribute back as often as I can.

I use the various code snippets and projects I host on this site for my own personal needs all of the time so I don't really need feedback that other people find them useful for me to continue uploading stuff.

Nevertheless, it is quite gratifying when someone does give me feedback that they found something useful. The most gratifying experience of all is when someone made bug fixes and/or improvements to something I have posted; that is the ultimate reward.

Yesterday I received an email from Anthony Whitaker who updated the compression test app to support a new format that is faster than MiniLZO and does not have a restrictive license.

Anthony also wanted me to mention that "The LZMA SDK from 7zip is also in the freeware domain. And 7zip seems quite fast (relative to Rar and Winzip)"

The new format that Anthony added is Marc Lehmann's LibLZF. This is a very impressive tiny library which is extremely fast and lightweight. It is hands down the winner of all of the compression libraries already added to the test framework.

If you feel you have a better one, feel free to modify the compression test library to support your format. Let me know if you want to be a project member of the Google Code site.

The official location of the compression test library is located here on Google Code.

For the record, the compression test library now supports:

CRYPTOPP
MINILZO
ZLIB
BZIP
and now LIBLZF

Monday, November 30, 2009

Convex Decomposition Library now available on Google Code




***NOTICE ** NOTICE ** My convex decomposition library is now deprecated.

Please see this blog post for further details.

Today I refactored my convex decomposition code into the form of a 'drop in' library that people can use. Here is a link to the Google Code repository.

You need only include a single header file 'NvConvexDecomposition.h' to use the tool.

What is convex decomposition?

Convex decomposition is the process of taking an arbitrarily complex concave triangle mesh and approximating it as a collection of convex objects.

See the image above to for an example of how an approximate convex decomposition can, depending on the recursion depth and other factors, accurately represent a collision model for a triangle mesh. There are no hard and cold set of parameters to use for all triangle meshes; how detailed you make the approximate decomposition is always going be dependent on your specific use case.

Why do I need it?

Most physics engines will not allow you to represent arbitrary triangle meshes as dynamic objects. Even if the physics/collision engine does support arbitrary triangle mesh collision, it always comes at a major cost in terms of performance and memory.

Therefore, if you have complex triangle meshes that you wish to be realistically simulated in a dynamic world the best solution is to substitute it for a collection of compound convex hulls.

The physics engine will see the collection of convex hulls for collision while the graphics displays the original source triangle mesh for the viewer. Depending on how accurately your compound convex hulls match the source graphics you may, or may not, see some small artifacts. The choice of how detailed to make the approximation is almost always a design decision based on your specific use case.

How do I use it?

I am providing the Convex Decomposition library as a relatively small collection of source code that resides in its own namespace. The source code is C++ and is in the namespace 'CONVEX_DECOMPOSITION'. Even though there are a number of CPP and header files you only need to include a single header file to use the library.

Simply include 'NvConvexDecomposition.h'

Create an instance of the library:

CONVEX_DECOMPOSITION::iConvexDecomposition *ic = CONVEX_DECOMPOSITION::createConvexDecomposition();


Next, add the triangles from the original raw triangle mesh you wish to approximate.

You add them one triangle at a time, passing in three const float pointers, each representing a single X,Y,Z co-ordinate in the original triangle mesh.

Example:

for (NxU32 i=0; iaddTriangle( tris[i].p0, tris[i].p1, tris[i].p2 );


Once you have added all of the triangles you can compute the convex decomposition as follows:

ic->computeConvexDecomposition(skinWidth,
decompositionDepth,
maxHullVertices,
concavityThresholdPercent,
mergeThresholdPercent,
volumeSplitThresholdPercent,
useInitialIslandGeneration,
useIslandGeneration,
useBackgroundThreads);


The convex decomposition process can take a very long time. For this reason the default behavior is to perform the computation in a background thread.

You will have to query: 'isComputeComplete' to know when it has finished.

If you want to interrupt the computation before it is done you can call 'cancelCompute'

To find out how many convex hulls were produced you call: 'getHullCount'

You can then iterate through the hull results and grab each one as needed: 'getConvexHullResult'

When you are finished using the library, simply call:

CONVEX_DECOMPOSITION::releaseConvexDecomposition(ic);

What are the current limitations of this implementation?

The code is functional but could use the following improvements:

(1) The convex hull generator, originally written by Stan Melax, could use some major code cleanup.

(2) The code to remove T-junctions appears to have a bug in it. This code was working fine before, but I haven't had time to debug why it stopped working.

(3) Island generation once the mesh has been split is currently disabled due to the fact that the Remove Tjunctions functionality has a bug in it.

(4) The code to perform a raycast against a triangle mesh does not currently use any acceleration data structures.

(5) When a split is performed, the surface that got split is not 'capped'. This causes a problem if you use a high recursion depth on your convex decomposition. It will cause the object to be modelled as if it had a hollow interior. A lot of work was done to solve this problem, but it hasn't been integrated into this code drop yet.

When will these limitations be fixed?

It is hard to say. The current version is functional and I don't foresee having a lot of time to work on this any time soon. If a specific need arises I may get back to it sooner. I definitely want to add the acceleration structures to the raycast routine; which will speed things up. I also really want to know why the remove tjunctions code has a bug in it, since that was working fine before.

What is all of this source code, what is it for?

As I said before, you only need to include 'NvConvexDecomposition.h' and nothing else to use the library. I will, however, document what the other pieces of source code are and what purpose they serve.

  • NvConcavityVolume.h / .cpp : Computes the 'volume of concavity' of a triangle mesh. It does this by first computing the convex hull for this triangle mesh. Next, it examines every single individual triangle in the original mesh and 'projects it' onto both the convex hull and on itself. If the projection hits the hull, then that extruded triangle volume is added as part of the 'total concavity volume'. It the projection hits the mesh itself, it first determines if it hit an interior or exterior face. If it hits an interior face, the volume is rejected. Otherwise, it adds to the volume of concavity the projection up to the point of self-intersection. This would be easier to explain with some drawings and pictures but I don't have time to make those up right now.
  • NvConvexDecomposition.h / .cpp : This is the main public interface to the convex decomposition library. This is the only header your application should need to include directly.
  • NvFloatMath.h / .cpp / .inl : This contains a collection of common and useful math library routines which work of of either const float or const double pointers.
  • NvHashMap.h : This single header file contains support for both vector and hashmap container template classes. The reason this header file is used is because some of the game engine integrations that I work on do not allow the use of the STL at all.
  • NvMeshIslandGeneration.h / .cpp : This is a code snippet which walks the topology of a triangle mesh, effectively flood filling it along the edges, to compute all of the independent mesh islands. Let's take the simple case. You pass in a single triangle mesh which actually contains two separate boxes. By walking the topology of the mesh the NvMeshIslandGeneration code snippet would immediately identify these as two distinctly unique triangle meshes.
  • NvRayCast.h / .cpp : This code snippet provides based support to cast a ray against an arbitrary triangle mesh. Ideally it would use acceleration structures (such as an axis-aligned bounding volume tree). The current version does not yet have this optimization built in.
  • NvRemoveTjunctions.h / .cpp : This code snippet is designed to identify and remove T-junctions in the triangle mesh. After a triangle mesh has been split by a plane it is common for t-junctions to be introduced. However, if you wish to continue to walk the topology of the mesh, either to identify mesh islands or to generate capped ends, there can be no t-junctions allowed in the mesh. This routine was not working properly when I put the library together so is currently disabled.
  • NvSimpleTypes.h : This is a tiny header file that creates a set of platform and compiler independent standard defines for integer and floating point data types.
  • NvStanHull.h / .cpp : Contains the convex hull generation code written originally by Stan Melax.
  • NvThreadConfig.h / .cpp : This is a small code snippet that provides a platform independent abstraction for creating threads and mutexes.
  • NvUserMemAlloc.h : This is a simple header file that controls how memory allocation is performed by this library. The default behavior is to use malloc/free and global new/delete. However, you can modify this header file to redirect memory allocations performed by this library to vector off to your application specific memory allocator if needed.
  • wavefront.h / cpp : This is a tiny code snippet to read a Wavefront OBJ triangle mesh file. It is not considered part of the convex decomposition library and is only included for the sample application.
  • main.cpp : This is a basic sample application that shows how to use the convex decomposition library by loading a triangle mesh from a Wavefront OBJ and uses command line switches to experiment with different parameters.

How does the convex decomposition algorithm work?

The convex decomposition algorithm works as follows.

  1. The input mesh has any possible t-junctions removed to make sure that it has a clean topology. (Currently disabled)
  2. The input mesh is broken up into any unique mesh islands. This is collection of triangles which all share edges. (Currently only enabled on the initial triangle mesh)
  3. For each mesh island we do the following steps.
  4. We compute the overall volume of the triangle mesh.
  5. Make sure we have not exceeded our maximum decomposition depth, if we have, we are done.
  6. We compute the volume of the current part of the mes we are working on. We then calculate what percentage the volume of this piece is, relative to the volume of the entire object. It the volume is too small (based on the input parameter 'volumeSplitThresholdPercent' then we are done with this piece.
  7. Next we compute the 'volume of concavity'. This is done by building a convex hull around this piece of the mesh and then projecting the volume of each triangle on the original mesh onto either the mesh itself or the surrounding convex hull. The sum total is represented as the 'volume of concavity'.
  8. We compute what percentage the volume of concavity is relative ot the volume of the surrounding convex hull. This percentage is then compared against the input parameter 'concavityThresholdPercent'. If the volume percentage of concavity is below this threshold we consider it 'good enough' and accept this piece as is.
  9. If we have determined that the mesh piece is too concave we split it in half. First we compute the best fit OBB (object oriented box) around the convex hull of this piece. We create a split plane exactly half way along the long axis. (Other techniques for performing approximate convex decomposition have focused a lot of work finding just the absolutely perfect split plane to use. As you will see later, this algorithm doesn't require that.)
  10. Once we have split the mesh by the plan we need to cap the ends so that the two pieces do not have holes in them. If we do not cap the ends then, as we recurse more deeply we will end up producing a hollow interior. It is important to note that this feature is not yet implemented! A lot of work has been done on this process and quite likely the library will be revised to support this feature at some time in the future.
  11. Having split the mesh at the mid-point we just keep recursively calling the routine until we have exceeded our maximum recursion depth and/or other limits specified.
  12. The final step is critical. Since we split each piece on the long axis of its orientation there is no guarantee that we split at exactly the correct location. We make up for this by actually splitting the source mesh many more times that is necessary. The final clean-up phase called 'merging' corrects for this.
  13. For every hull we have generated we try combining it with every other hull in the decomposition. We compute the volume of the two hulls separately and compare that to the volume of the two hulls combined. If the difference in volume is below the parameter 'mergeThresholdPercent' than we combine the two hulls into one. We continue this process until we can no longer merge any hulls. In a brute force manner this technique causes the merged hulls to converge towards the optimal split boundaries.

Thursday, November 19, 2009

ThreadFrac revised



Grrr...I went to run my copy of 'ThreadFrac' that was uploaded in a couple of places and it wouldn't run due to side-by-side assembly issues or some such nonsense.

I rebuilt it with an embedded manifest and it should run now.

Let me know if you have a problem; requires Windows with DX9 to run.

The reason I was looking for ThreadFrac is that I want to start messing with the new 3d equation called 'Mandelbulb'.

I'm going to write a real-time multi-threaded raytracer and add it to my ThreadFrac program when I get a chance to work on.

If you haven't seen the 'Mandelbulb' yet, check it out.

Here is the link to the revised executable for ThreadFrac.

Thursday, November 05, 2009

A test application for the MeshImport library and showcasing EZMesh



I spent some time this morning and created a really nice sample program to demonstrate how to use the MeshImport plugins. (Here is a link to the formal documentation for the EZ-Mesh file format. It is a work in progress but should be finished up tomorrow)

The MeshImport plugin library provides an easy mechanism to both import and export fully skeletal deformed and animated mesh content.

It supports multiple graphics file formats but I am strongly recommending the EZMesh graphics file format.

I describe the reasons for this in the enclosed readme.

The readme can be found at this link: MeshImportReadme.txt

An example of an EZ-Mesh file which describes a skeleton, an animation, and a deformed mesh, can be found at the following link. In only 34 lines of XML it is able to describe all of the components necessary to a skeletal deformed (and animated) mesh in a highly human readable format. Let's see COLLADA (or any other graphics file format) try doing that.

EXPORT.EZM

To download the sample application, with all of the necessary DLL components, use this link:

MeshImportTest.zip

You can browse the sample demonstration source code here, so see how easy it is to actually use the MeshImport library.

main.cpp

Documentation for the Ez-Mesh file format.

Here are some more important links:

Here is the link to a Google Code project containing the Mesh Import test application.

If you are interested in building the plugins, or perhaps writing a new mesh importer. I have recently updated the source in the Google Code project.

The Google Code project for the Mesh Import libraries is located here.

Once you sync, to build the project load the solution file in:

\MeshImport\compiler\vc8\MeshConvert.sln

The FBX importer is not built by default, because it is dependent on the FBX SDK from Autodesk which you must obtain here. None of the other importers contain any external dependencies.

The solution file for the FBX importer is located in:

\MeshImport\compiler\vc8\MeshImportFBX.sln

You will need to change the include paths to point to wherever you have the FBX SDK on your machine.

Friday, October 23, 2009

Bug Fixes : Improvements to 'FastXml'



I made a couple of bug fixes and improvements to the 'FastXml' code snippets. 'FastXml' is a very tiny code snippet (less than 300 lines of code) that will parse an XML file *in place*. It is very fast and easy to use.

The 'in place' behavior is what makes it extremely powerful. When it calls back with elements, attributes, and data, those pointers are *persistent for the lifetime of the object*! That means you do not need to do any string copying and can cache those pointers directly if it speeds up your processing code.

The bugs which were fixed are as follows:

(1) The original version had a bug that if you had an element with no attributes or data it would fail. Example:

(2) The original version could not interpret XML comments. It now does, and will call you back with the comment data. Example:

The feature improvement is that it will now call you back when an element is closed. This is sometimes necessary when parsing data with hierarchical state. Not typically necessary if you are just parsing the XML data as a linear stream.

I have updated the source files and they are located here:

FastXml.h
FastXml.cpp

Tuesday, September 29, 2009

Memory allocations are not 'bad' Debunking the myth



I recently had a conversation with an engineer at a game company where he expressed annoyance that his development team was using that damned STL (STLPort in particular). I asked why and he felt that way. His response was that he couldn't stand how much memory allocation that the STL was doing. He also expressed annoyance that the performance of STL map was so slow and his programmers kept using it.

This raises a couple of points. First of all, there is rarely a reason to use STL map. The STL map/set is a red-black tree. The only reason you would ever use it is if you needed to maintain a set of objects in perfectly sorted order. This use case doesn't come up a lot, though, and most people are using it just to do key/value lookups. However, if all you want is a key/value lookup you should instead be using the hash_map.

What I really want to address in this post is the myth that 'memory allocations' are 'bad'.

What does someone mean by 'bad'?

I think what the mean is as follows:

  • Memory allocations are slow
  • Memory allocations cause fragmentation
  • Memory allocations cause thread contention
  • Memory allocations have way too much overhead
  • Memory allocations are not cache-coherent
Developers often hold these assumptions, but are they really true?

What if none of these things were true? Then, do you really have a valid complaint about how many allocations the STL is doing, or your other container classes?

With the micro-allocator I presented in the previous post, none of these criticisms are really valid.

Myth #1 : Memory allocations are slow : With the micro-allocator I presented memory allocations, and frees, are extraordinarily fast. A memory allocation simply involves returning a free pointer from a linked list 99.9999% of the time.

One reason people sometimes think that memory allocators are slow is...because they are *when running in a debugger in a debug build*. It is important to measure real performance outside of the debugger and in a release build when considering the cost of your memory allocation system.

Myth #2 : Memory allocations cause fragmentation : The design of my micro-allocator is that you give it a budget which is equal to an estimate of the maximum amount of memory needed for micro-allocations by your application. This is allocated a single time and carved up in power of two multiples. The allocator works within this single fixed block of memory and, so long as you do not exceed the budget estimate, it will never range outside this block of memory. No fragmentation, in the sense of memory being returned from all over the address space, will occur.

Myth #3 : Memory allocations cause thread contention : Last year when I was working on my JobSwarm code snippet I spent some time making sure it worked with a lockless queue. Once I had this working I tried to create a benchmark that would show the performance benefit of a lockless queue versus a standard mutex lock. What I found, is that it was very difficult to create a scenario where any significant performance distinction could be measured. It turns out, that unless you have an application which is spending almost all of its time in contention, you can't even quantify a benefit. Add to this the fact that my micro-allocator is so fast, usually just taking a few instruction cycles, any contention from a thread lock is going to be very small unless the only thing your application does *is* memory allocation in multiple threads simultaneously.

All that said, ideally a micro-allocator should entirely avoid or minimize thread contention. There are commercial micro-allocators that do this. The most well known is The Hoard Memory Allocator. Hoard is specifically designed to be highly efficient in multi-threaded environments. Also, as we move to more and more cores and, therefore more and more simultaneous threads, minimizing thread contention is important.

For my current implementation of the micro-allocator I just threw a mutex lock around the allocation and free routines. This was a quick and lazy approach to the problem though, as I said before, I doubt it is even an issue in most scenarios. That said, I would like to optimize it to be lockless, or near lockless, in the future. (It appears that I can download an evaluation copy of Hoard so sometime in the next week or two I will try to run benchmarks between it and my micro-allocator and I will report the results.)

Myth #4 : Memory allocations have way too much overhead : Most memory allocators will stick a header block in front of every allocation. This header block will often include linked list pointers and possibly a sentinel in a debug build.

The micro-allocator I presented, however, does not introduce any additional overhead. If you allocate 8 bytes, it takes exactly 8 bytes of memory. If you allocate 16 bytes, it takes exactly 16 bytes, etc. Now, if you allocate 14 bytes, it will, in fact, return a block of memory that is actually 16 bytes in length (the nearest power of two) thus 'wasting' the 2 bytes in between.

Perhaps the micro-allocator could be considered as having 'too much overhead' in the sense that it is a power of two allocator and always rounds up to the nearest power of two boundary. However, since it works primarily in a fixed budget of memory for small allocations, this is probably acceptable.

Myth #5 : Memory allocations are not cache-coherent : My micro-allocator is a power of two allocator and, especially since it requires no header block, always returns cache aligned blocks of memory. Any allocations of small objects in a sequence will naturally end up sequentially in the same area of memory.

All allocations 16 bytes to 256 bytes in size will be perfectly aligned on a page boundary. 8 byte allocations will lie on an 8 byte boundary or, every other one, 16 bytes.

Conclusion

The point I am trying to make here is that the concerns people have about using the STL, or other containers which do per-object allocations, are largely irrelevant *if* you have a high-speed micro-allocator under the hood.

There is no need to make special containers and special small object allocators if you have confidence that your core allocator already has these behaviors built right in.

Over the course of the next few months I plan to be doing some optimizations on console platforms using the micro-allocator I wrote and will be reporting any performance numbers I come up with.

So far I have only tested it extensively on the PC and it really doesn't show any performance difference between standard malloc/free which, as I pointed out earlier, is surprisingly fast anyway.

The lesson to be learned is that you shouldn't complain about the 'cost' of lots of tiny memory allocations until you know for fact that your application is really feeling a legitimate cost. If you find that there is a significant cost by using the standard library malloc/free, don't resort to writing a bunch of custom allocators and template code until you have evaluated a high performance general purpose small object allocator like Hoard, Doug Lea's malloc, or my own micro allocator I have posted here.