CRT-Royale

CRT-Royale is a collection of shaders developed by TroggleMonkey.

It is quite an advanced shader that requires a bit of documentation in order to get the most out of it.
Contents

Basics

crt-royale is a highly customizable CRT shader for Retroarch and other programs supporting the libretro Cg shader standard. It uses a number of nonstandardized extensions like sRGB FBO's, mipmapping, and runtime shader parameters, but hopefully it will run without much of a fuss on new implementations of the standard as well.

Customizable Parameters

There are a huge number of parameters. Among the things you can customize:

How to Customize

There are two major ways to customize the shader:

Preset Versions

You may also note that there are two major versions of the shader preset:

There's a lot to play around with, and I encourage everyone using this shader to read through the user-settings.h file to learn about the parameters. Before loading the shader, be sure to read the next section, entitled...

Frequently Asked Questions

Why is the shader crashing when I load it!?

Do you get C6001 or C6002 errors with integrated graphics, like Intel HD 4000? If so, please try one of the following .cgp presets:

These load .cg wrappers that #define INTEGRATED_GRAPHICS_COMPATIBILITY_MODE (also available in user-settings.h) before loading the main .cg shader files.

Integrated graphics compatibility mode will disable these three features, which currently require more registers or instructions than Intel GPU's allow:

Using Intel-specific .cgp files is equivalent to #defining INTEGRATED_GRAPHICS_COMPATIBILITY_MODE in your user-settings.h. Out of the box, user-settings.h is configured for maximum configurability and compatibility with dedicated nVidia and AMD/ATI GPU's. Compatibility mode is disabled by default to avoid silently degrading quality for AMD/ATI and nVidia users, so Intel-specific .cgp's are a convenient way for Intel users to play with the shader without editing text files.

I've tested this solution on Intel HD 4000 graphics, and it should work for that GPU at least, but please let me know if you're still having problems!

Why is everything so slow!?

Out of the box, this will be a problem for all but monster GPU's. The default user-settings.h file disables any features and optimizations which might cause compilation failure on AMD/ATI GPU's. Despite the name of the options, this is not a problem with your card or drivers; it's a shortcoming in the Cg shader compiler's nVidia-centric profile setups.

Uncommenting the following #define macros at the top of user-settings.h will help performance a good deal on compatible nVidia cards:

1
2
3
4
#define DRIVERS_ALLOW_DERIVATIVES
#define DRIVERS_ALLOW_DYNAMIC_BRANCHES
#define ACCOMODATE_POSSIBLE_DYNAMIC_LOOPS #define DRIVERS_ALLOW_TEX2DLOD
#define DRIVERS_ALLOW_TEX2DBIAS

A few of these warrant some elaboration. First, derivatives:

Dynamic branches allow the shader to avoid performing computations that it doesn't need (but might have, given different runtime options). Without them, the shader has to either let the GPU evaluate every possible codepath and select a result, or make a "best guess" ahead of time. The full phosphor bloom suffers most from not having dynamic branches, because the shader doesn't know how big of a blur to use until it knows your phosphor mask dot pitch...which you set at runtime if shader parameters are enabled.

If RUNTIME_PHOSPHOR_BLOOM_SIGMA is commented out (faster), this won't matter: The shader will just select the blur size and standard deviation suitable for the mask_triad_size_desired_static setting in user-settings.cgp. It will be fast, but larger triads won't blur enough, and smaller triads will blur more than they need to. However, if RUNTIME_PHOSPHOR_BLOOM_SIGMA is enabled, the shader will calculate an optimal standard deviation and try to use the right blur size for it...but using an "if standard deviation is such and such" condition would be prohibitively slow without dynamic branches. Instead, the shader uses the largest and slowest blur the user lets it use (to cover the widest range of triad sizes and standard deviations), according to these macros:

1
2
3
4
#define PHOSPHOR_BLOOM_TRIADS_LARGER_THAN_3_PIXELS
//#define PHOSPHOR_BLOOM_TRIADS_LARGER_THAN_6_PIXELS
//#define PHOSPHOR_BLOOM_TRIADS_LARGER_THAN_9_PIXELS
//#define PHOSPHOR_BLOOM_TRIADS_LARGER_THAN_12_PIXELS

The more you have uncommented, the larger the triads you can blur, but the slower runtime sigmas will be if your GPU can't use dynamic branches. By default, triads up to 6 pixels wide will be bloomed perfectly, and a little beyond that (8 should be fine), but going too far beyond that will create blocking artifacts in the blur due to an insufficient support size.

The tex2Dlod function allows the shader to disables anisotropic filtering, which can get confused when we're manually tiling the texture coordinates for a small resized phosphor mask tile (it creates nasty seam artifacts). There are several ways the shader can deal with this: The cheapest is to use tex2Dlod to tile the output of MASK_RESIZE across the screen...and the slower alternatives either require derivatives or force the shader to draw 2 tiles to MASK_RESIZE in each direction, thereby reducing your maximum allowed dot pitch by half.

According to nVidia's Cg language standard library page, tex2Dbias requires the fp30 profile, which doesn't work on ATI/AMD cards...but you might actually have mixed results. This can be used as a substitute for tex2Dlod at times, so it's worth trying even on ATI.

#Why is everything still so slow?!?

For maximum quality and configurability out of the box, almost all shader parameters are enabled by default (except for the disproportionately expensive runtime subpixel offsets). Some are more expensive than others. Commenting the following macro disables all shader parameters:

1
#define RUNTIME_SHADER_PARAMS_ENABLE

Commenting these macros disables selective shader parameters:

1
2
3
4
5
6
7
#define RUNTIME_PHOSPHOR_BLOOM_SIGMA
#define RUNTIME_ANTIALIAS_WEIGHTS
//#define RUNTIME_ANTIALIAS_SUBPIXEL_OFFSETS
#define RUNTIME_SCANLINES_HORIZ_FILTER_COLORSPACE
#define RUNTIME_GEOMETRY_TILT
#define RUNTIME_GEOMETRY_MODE
#define FORCE_RUNTIME_PHOSPHOR_MASK_MODE_TYPE_SELECT

Note that all shader parameters will still show up in your GUI list, and the disabled ones simply won't work.

Finally, there are a lot of other options enabled by default that carry serious performance penalties. For instance, the default antialiasing filter is a cubic filter, because it's the most configurable, but it's also quite slow if RUNTIME_ANTIALIAS_WEIGHTS is #defined. A lot of the static true/false options have a significant influence, and the shader is faster if the red subpixel offset (from which the blue one is calculated as well) is zero...even if it's a static value, because RUNTIME_ANTIALIAS_SUBPIXEL_OFFSETS is commented out. To avoid any confusion, I should also clarify now that subpixel offsets are separate from scanline beam convergence offsets.

To quickly see how much performance you can get from other settings, you can temporarily replace your user-settings.h with one of:

Why won't my shader bloom my phosphors enough?

First, see the discussion about dynamic branching above, in 1. If you don't have dynamic branches, you can either uncomment the lines that let the shader pessimistically use larger blurs than it's guaranteed to need (which is slow), or...you can just use crt-royale-fake-bloom.cgp, which doesn't have this problem. :)

Why can't I make my phosphors any bigger?

By default, the phosphor mask is Lanczos-resized in earlier passes to your specified dot pitch (mask_sample_mode = 0). This gives a much sharper result than mipmapped hardware sampling (mask_sample_mode = 1), but it can be much slower without taking proper care: If the input mask tile (containing 8 phosphor triads by default) is large, like 512x512, and you try to resize it to 24x24 for 3x3 pixel triads, the resizer has to take 128 samples in each pass/direction (the max allowed) for a 3-lobe Lanczos filter. This can be very slow, so I made the output of MASK_RESIZE very small by default: Just 1/16th of the viewport size in each direction. The exact limit scales with your viewport size, and it should be reasonable, but the restrictions can get tighter if we can't use tex2Dlod and have to fit two whole tiles (16 phosphor triads with default 8-triad tiles) into the MASK_RESIZE pass for compatibility with anisotropic filtering (long story).

If you want bigger phosphor triads, you have two options:

.cgp files are allowed to pass more information to .cg shaders (which would require major updates to the cg2glsl script).

Why can't I make my phosphors any smaller than 2 pixels per triad?

This is controlled by mask_min_allowed_triad_size in your user-settings.h file. Set it to 1.0 instead of 2.0 (anything lower than 1 is pointless), and you're set. It defaults to 2.0 to make mask resizing twice as fast when dynamic branches aren't allowed. Some people may want to be able to fade the phosphors away entirely to get a more PVM-like scanlined image though, so change it to 1.0 for that (or get a higher-resolution display ;)).

Note: This setting should be obsolete soon. I have some ideas for more sophisticated mask resampling that I just don't have a spare few hours to implement yet.

I am not running integrated graphics. Why am I getting errors?

First recheck the top of your user-settings.h to make sure incompatible driver options are commented out (disabled). If they're all disabled and you're still having problems, you've probably found a bug. There are bound to be a number of them with certain setting combinations, and there might even be a few individual settings I broke more recently than I tested them. My contact information is up top, so let me know!

Why am I getting banding in dark colors? Or, why won't mipmapping work?

crt-royale uses features like sRGB and mipmapping, which are not available in the latest Retroarch release (1.0.0.2) at the time of this writing.

You may get banding in dark colors if your platform or Retroarch version doesn't support sRGB FBO's, and mask_sample_mode 1 will look awful without mipmapping. I expect most platforms capable of running this shader at full speed will support sRGB FBO's, but if yours doesn't, please let me know, and I'll include a note about it.

Alternately, setting levels_autodim_temp too low will cause precision loss and banding.

How do I set geometry/curvature/etc.?

If RUNTIME_SHADER_PARAMS_ENABLE and RUNTIME_GEOMETRY_MODE are both #defined (not commented out) in user-settings.cgp, you can find these options in your shader parameters (in Retroarch's RGUI for instance) under e.g. geom_mode. Otherwise, you can set the corresponding e.g. geom_mode_static options in user-settings.h.

Why don't my shader parameters stick?

This is a bit confusing, at least in the version of Retroarch I'm using. In the Shader Options menu, Parameters (Current) controls what's on your screen right now, whereas Parameters (RGUI) seems to control what gets saved to a shader preset (in your base shaders directory) with Save As Shader Preset.

Why did you slow the shader down with all of these features I don't want? Why didn't you make the defaults more to my liking?

The default settings tend to best match flat ~13" slot mask TV's with sharp scanlines. Real CRT's however vary a lot in their characteristics (and many are softer in more ways than one), so it's impossible to make the default settings look like everyone's favorite CRT. Moreover, it's impossible to decide which of the slower features and options are superfluous:

Some people love curvature, and some people hate it. Some people love scanlines, and some people hate them. Some people love phosphors, and some people hate them. Some people love interlacing support, and some people hate it. Some people love sharpness, and some people hate it. Some people love convergence error, and some people hate it. The one thing you hate the most is probably someone else's most critical feature. This is why there are so many options, why the shader is so complicated, and why it's impossible to please everyone out of the box...unfortunately.

That said, if you spend some time tweaking the settings, you're bound to get a picture you like. Once you've made up your mind, you can save the settings to a user-settings.h file and disable shader parameters and other slow options to get the kind of performance you want.

Why didn't you include a shader preset with NTSC support? Why didnt' you include more canned presets with different options? Why can't I select from one of several user settings files without manual file renaming?

I do plan on adding a version that uses the NTSC shader for the first two passes, but it will take a bit of work, because there are several NTSC shader versions as it is. It's easy enough to combine the HALATION_BLUR passes into a one-pass blur from blurs/blur9x9fast.cg, but I'm not sure yet just how much modification the NTSC shader passes themselves might need for best results.

I originally wanted NTSC support to be included out-of-the-box, but I'd also like to release the shader ASAP, so it'll have to wait.

As for other canned presets, that's a little more complicated: I DO intend on creating more canned presets, but the combinatorial explosion of major codepath options in this shader is too overwhelming to be as exhaustive as I'd like. When I get the time, I'll add what I can to make this more user-friendly. In the meantime, I'll start adding a few different default versions of the user settings file and put them in a subdirectory for people to manually place in the main directory and rename to "user-settings.h."

However, the libretro Cg shader specification (and the Cg to GLSL compiler) does not currently allow .cgp files to pass any static settings to the source files. This presents a huge problem, because it means that in order to create a new preset with different options, I also have to create duplicate files for EVERY single .cg pass for every permutation, not just the .cgp. I plan on creating a number of skeleton wrapper .cg files in a subdirectory (which set a few options and then include the main .cg file for the pass), but it'll be a while yet. In the meantime, I'd rather let people play with what's already done than keep it hidden on my hard drive.

Why do so many values in user_settings.h have a _static suffix?

The "_static" suffix is there to prevent naming conflicts with runtime shader parameters: The shader usually uses a version without the suffix, which is assigned either the value of the "_static" version or the runtime shader parameter version. If a value in user-settings.h doesn't have a "_static" suffix, it's usually because it's a static compile-time option only, with no corresponding runtime version. Basically, you can ignore the suffix. :)

Are there any broken settings I should be aware of? What if I want to change settings in the .cgp file?

As far as I know, all of the options in user-settings.h and the runtime shader parameters are pretty robust, with a few caveats:

There is a broken and unimplemented option in derived-settings-and-constants.h, but users shouldn't need to mess around in there anyway. (It's related to the more efficient phosphor mask resampling I want to implement.)

However, the .cgp files are another story: They are pretty brittle, especially when it comes to their interaction with user-cgp-constants.h. Be aware that the shader passes rely on scale types and sizes in your .cgp file being exactly what they expect. Do not change any scale types from the defaults, or you'll get artifacts under certain conditions. You can change the BLOOM_APPROX and MASK_RESIZE scale values (not scale types), but you must update the associated constant in user-cgp-constants.h to let the .cg shader files know about it, and the implications may reach farther than you expect. Similarly, if you plan on changing an LUT texture, make sure you update the associated constants in user-cgp-constants.h. In short, if you plan on changing anything in a .cgp file, you'll want to read it thoroughly first, especially the "IMPORTANT" section at the top.

What are the most common dot pitches for CRT televisions? What kind of resolution would I need for a real shadow mask?

The most demanding CRT we're ever likely to emulate is a Sony PVM-20M4U:

For 3-pixel triads, we would need about 6k UHD resolution. A BVM-20F1U has similar requirements.

However, common slot masks are far more similar to the kind of image this shader will produce at 900p, 1080p, 1200p, and 1440p:

The included EDP shadow mask starts looking very good with ~6-pixel triads, so it may take nearly 4k resolution to make it a particularly compelling option. However, it's possible to make smaller shadow masks on a pixel-by-pixel basis and tile them at a 1:1 ratio (mask_sample_mode = 2). I may include a mask like this in a future update.

Is this phosphor bloom realistic?

Probably not:

Realistically, the "phosphor bloom" blurs bright phosphors significantly more than your eyes would bloom the brighter phosphors on a real CRT. This extra blurring however is necessary to distribute enough brightness to nearby pixels that we can amplify the overall brightness to that of the original source after applying the phosphor mask. If you're interested, there are more comments on the subject at the top of the fragment shader in crt-royale-bloom-approx.cg.

On the subject of the phosphor bloom: I intended to include some exposition about the math behind the brightpass calculation (and the much more complex and thorough calculation I originally used to blur the minimal amount necessary, which turned out to be inferior in practice), but that document isn't release-ready at the moment. Sorry Hyllian. ;)
So what do you plan on adding in the future?

I'd like to add these relatively soon:

Maybe's: