Conditioning Scripts
My engine relies on hashed strings as identifiers, uh, a lot more than is probably appropriate. I know I've worked with engineers who would be utterly furious with me for that. Actually, furious about a lot more than just that, because engineers are sort of notoriously catty, but I mean specifically all of the string hashing.
Hashing all of these strings every frame, especially through the Lua scripting layer, seemed deeply unnecessary when the strings themselves never change. A nice initial solution was to pre-process all of my scripts in my resource conditioning pipeline.
Basic Approach
I’ve got hundreds of calls to my hashing function using static strings all throughout the script sources like this:
SomeModule.SomeFunctionINeed( Hash32( 'some_free-form_ID of varying levels of quality' ) );
Now, that's not so bad on its own, but hundreds or thousands of those calls each frame add up to a lot of thrashing about for no particular reason.
During resource compilation I open up the script files, search for all instances of Hash32( 'somestring' ), and replace each one with the resulting hash value. The runtime then just sees a 32-bit integer constant in its place.
Before:
function UpdateReferences( world )
_directors = {}
World.Push( world );
_directors.ship = Tags.Find( Hash32( 'ship' ) );
_directors.camera = Tags.Find( Hash32( 'camera' ) );
World.Pop();
end
After:
function UpdateReferences( world )
_directors = {}
World.Push( world );
_directors.ship = Tags.Find( 381975909 );
_directors.camera = Tags.Find( 3546554283 );
World.Pop();
end
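A minimal sketch of that substitution pass as a stand-alone Python conditioning step. The post doesn't say which function Hash32 uses, so CRC-32 from Python's zlib stands in for it here (an assumption; its values will not match the numbers in the After listing). HASH_CALL, stand_in_hash32, and condition_script are hypothetical names. Only calls whose argument is a single quoted literal without escapes are rewritten; anything else is left for the runtime to hash.

```python
import re
import zlib

# Matches Hash32( 'literal' ) or Hash32( "literal" ) with optional inner whitespace.
# Literals containing quotes or backslash escapes are deliberately not matched.
HASH_CALL = re.compile(r"""Hash32\(\s*(['"])([^'"\\]*)\1\s*\)""")


def stand_in_hash32(text: str) -> int:
    """Hypothetical stand-in for the engine's Hash32: CRC-32 of the UTF-8 bytes."""
    return zlib.crc32(text.encode("utf-8"))


def condition_script(source: str, hash_fn=stand_in_hash32) -> str:
    """Replace each literal Hash32( '...' ) call with its precomputed integer value."""
    return HASH_CALL.sub(lambda m: str(hash_fn(m.group(2))), source)
```

Whatever function the pipeline uses has to be bit-for-bit identical to the runtime Hash32, otherwise the baked constants won't match hashes the runtime computes for strings it still hashes itself.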
A step further would be to intern all of the strings as they’re seen to create a nice reverse mapping of hash->string for debugging output, etc. That seemed a bit too industrious for me, so I’m going to be waiting for Future Ryan to get fed up with raw numeric hash values in the debugging output. He’s a good sport and he’ll certainly solve this for us.
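If that day comes, the conditioning pass is a natural place to build the table, since it already sees every literal. The following is a hypothetical sketch, not something the engine does today: build_reverse_map is an illustrative name, and CRC-32 from zlib stands in for the real Hash32. Building the table at this stage also catches two different strings that hash to the same value, which runtime hashing would silently accept.

```python
import re
import zlib
from typing import Dict, Iterable

# Literal Hash32( '...' ) / Hash32( "..." ) calls without escapes or embedded quotes.
HASH_CALL = re.compile(r"""Hash32\(\s*(['"])([^'"\\]*)\1\s*\)""")


def stand_in_hash32(text: str) -> int:
    """Hypothetical stand-in for the engine's Hash32: CRC-32 of the UTF-8 bytes."""
    return zlib.crc32(text.encode("utf-8"))


def build_reverse_map(sources: Iterable[str], hash_fn=stand_in_hash32) -> Dict[int, str]:
    """Map each hash back to the literal that produced it, failing loudly on collisions."""
    table: Dict[int, str] = {}
    for source in sources:
        for match in HASH_CALL.finditer(source):
            text = match.group(2)
            value = hash_fn(text)
            existing = table.setdefault(value, text)
            if existing != text:
                raise ValueError(f"hash collision: {existing!r} and {text!r} -> {value}")
    return table
```

The resulting table could be written out next to the built data and loaded only in debug builds to turn raw numbers back into names in log output.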
Nice Side-Effect
The original scripts are still completely valid, because Hash32 is still available at runtime. This doesn't matter much for production, but my project looks for scripts outside of the built data first, to make rapid script work a little less painful. Since the unprocessed scripts still run, that mechanism stays extremely straightforward: the raw scripts are just in a PhysFS mount point at a higher priority than the processed ones.
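PhysFS searches its mounted directories in order and returns the first match; PHYSFS_mount( newDir, mountPoint, appendToPath ) puts a directory at the front of the search path when appendToPath is zero. The Python sketch below only imitates that first-match-wins lookup to show why the raw scripts override the built ones. resolve and the directory names are hypothetical, not PhysFS calls.

```python
from pathlib import Path
from typing import Optional, Sequence


def resolve(relative_path: str, search_path: Sequence[Path]) -> Optional[Path]:
    """Return the file from the first root (in priority order) that contains relative_path."""
    for root in search_path:
        candidate = root / relative_path
        if candidate.is_file():
            return candidate
    return None


# Hypothetical layout: raw, unconditioned scripts searched ahead of the built data.
SEARCH_PATH = [Path("source/scripts_raw"), Path("build/data")]
```

Because the raw scripts still call Hash32 at runtime, a script picked up from the raw directory behaves the same as its conditioned copy, just with the hashing cost paid per call.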
The Alternative(s)
A popular alternative to using raw strings in a project seems to be a source file somewhere that maps raw strings to symbolic names for use in code. I can tell you I've never seen that work out in practice. It is nice to corral all of your string typos into a single place, sure, but inevitably at least a handful of those strings will change out from under the variable names created for them, and all of a sudden nothing is as it seems. On top of that, you've blown out your compile times by creating a file that rebuilds the entire engine if you touch it at all. Just… just don't be that person, OK?
For C/C++ sources, there are ways to get the hash function to actually run at compile time, such as making it a constexpr function in C++11 and later.
Where this doesn’t help
Of course, this doesn't help at all when a variable is passed in or the string is built up from other strings, but the basic approach could be extended to try to capture some of those cases. The scripts could be loaded into a special lua_State with some jerry-rigging in it to do more analysis or make other helper function calls.
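Short of a special lua_State, one cheap extension is to fold calls whose argument is a concatenation of literals only, such as Hash32( 'ship' .. '_main' ), and to report every Hash32 call still left so it can be reviewed. This is a hypothetical regex-based sketch, not something the post describes: fold_hash_calls is an illustrative name, CRC-32 from zlib stands in for Hash32, and Lua long strings ([[...]]) and escaped quotes are ignored.

```python
import re
import zlib
from typing import List, Tuple

# One quoted Lua literal without escapes or embedded quotes.
LITERAL = r"(?:'[^'\"\\]*'|\"[^'\"\\]*\")"
# Hash32( literal .. literal .. ... ) where every operand is a literal.
HASH_CONCAT = re.compile(r"Hash32\(\s*(" + LITERAL + r"(?:\s*\.\.\s*" + LITERAL + r")*)\s*\)")
ANY_HASH_CALL = re.compile(r"Hash32\s*\(")


def stand_in_hash32(text: str) -> int:
    """Hypothetical stand-in for the engine's Hash32: CRC-32 of the UTF-8 bytes."""
    return zlib.crc32(text.encode("utf-8"))


def fold_hash_calls(source: str, hash_fn=stand_in_hash32) -> Tuple[str, List[int]]:
    """Fold all-literal Hash32 arguments; return the new source and lines still calling Hash32."""

    def replace(match) -> str:
        # Strip the quotes from each piece and join them, as Lua's .. would at runtime.
        pieces = re.findall(LITERAL, match.group(1))
        return str(hash_fn("".join(piece[1:-1] for piece in pieces)))

    folded = HASH_CONCAT.sub(replace, source)
    leftovers = [number for number, line in enumerate(folded.splitlines(), start=1)
                 if ANY_HASH_CALL.search(line)]
    return folded, leftovers
```

Calls that mix in variables, like Hash32( prefix .. 'x' ), are left alone and show up in the leftover line list instead.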
Other Situations
I've been using a similar approach for my audio patch scripts (which are written in Pure Data), scrubbing through them and modifying the paths to other patches and audio files so they match what the engine and libpd expect.
Additionally, I recently modified the engine to build a dependency map at resource compilation time, recording which resources rely on which others. Each resource is scrubbed by its own resource processor script, which outputs that resource's dependencies.
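The post doesn't show the processor scripts or the map's format, so the following is only a sketch of the idea. Each processor is assumed to be a function from a resource's text to the names it references, keyed by file extension; build_dependency_map and dependents_of are hypothetical names. Inverting the map is one example of what it could answer, such as which resources are affected when one changes.

```python
from collections import defaultdict, deque
from typing import Callable, Dict, List, Set

# Hypothetical processor interface: resource text in, referenced resource names out.
Processor = Callable[[str], List[str]]


def build_dependency_map(resources: Dict[str, str],
                         processors: Dict[str, Processor]) -> Dict[str, List[str]]:
    """Run each resource through the processor registered for its extension."""
    dep_map: Dict[str, List[str]] = {}
    for name, text in resources.items():
        processor = processors.get(name.rsplit(".", 1)[-1])
        # Resources without a processor are treated as leaves with no dependencies.
        dep_map[name] = processor(text) if processor else []
    return dep_map


def dependents_of(changed: str, dep_map: Dict[str, List[str]]) -> Set[str]:
    """Every resource that depends on `changed`, directly or transitively."""
    reverse: Dict[str, Set[str]] = defaultdict(set)
    for name, deps in dep_map.items():
        for dep in deps:
            reverse[dep].add(name)
    seen: Set[str] = set()
    queue = deque([changed])
    while queue:
        for parent in reverse[queue.popleft()]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen
```

With processors that simply split their input on whitespace, a level that references a ship model, which in turn references a texture, would report both the model and the level as dependents of that texture.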
A post about the resource pipeline will have to wait for another time.
Now, I know this was pretty dry and image-less, so here's an ad that I thought was funny: