Drawing Many Large Sprites with the Pixel Game Engine (AKA Achieving Parallax with the PGE)

Please note: this article refers to version 1 of the Pixel Game Engine (PGE). Version 2 will support far more efficient large-sprite rendering out of the box. However, some of the information here may still interest others who follow, especially anyone wanting to manipulate sprites themselves.

History

A few weeks ago (at the time of writing) I started work on my third and most ambitious (so far) Pixel Game Engine project: a rough-and-ready conversion of the arcade game Scramble. It was at about this point that I realised I didn’t properly recall how Scramble worked, so instead I decided to write a fairly generic side-scroller, taking some ideas from Scramble and some from Defender, and mixing in mechanics from the various side-scrolling games I have played over the years. And what, of course, does any side-scroller need? Parallax!

First Implementation

My initial implementation was a little 640×480-pixel game engine in which I broke the scene into four parts: a static background (or sky) painted directly; a background that moved at half the rate of the play area; a play area that would hold essentially all the sprites; and a foreground that moved at twice the rate of the play area.

These were drawn in back-to-front order (the traditional painter’s algorithm): the sky was painted in NORMAL mode, and all the other layers in MASK mode.

This was implemented fairly quickly. The sky was a simple sprite, the same size as the main window (640×480), drawn at 0,0. The background and foreground layers were “periodic” (1920×480) sprites drawn with DrawPartialSprite, with the x offset based on a multiple of the land position (a variable I use to store how far the player has travelled into the world). At this stage the play area was simply a rectangle and a ship sprite, but that was enough to get the idea. However, even this simple setup was starting to show performance issues.
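As a rough illustration of how per-layer offsets for a periodic layer can be worked out, here is a minimal sketch (not the article’s engine code). It multiplies the land position by a layer’s scroll rate, wraps the result into the layer’s width, and returns the one or two strips needed to cover the screen when the layer wraps around. The `ParallaxLayer`, `DrawPiece`, `WrapMod` and `PiecesForLayer` names are hypothetical; in an engine, each strip would correspond to one DrawPartialSprite call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one scrolling layer (not an olc:: type).
struct ParallaxLayer {
    int32_t width;  // width of the periodic sprite, e.g. 1920
    float rate;     // scroll rate relative to the play area, e.g. 0.5f or 2.0f
};

// One horizontal strip to copy: source x, destination x and width in pixels.
struct DrawPiece {
    int32_t srcX;
    int32_t dstX;
    int32_t width;
};

// Wrap v into [0, m), including for negative v.
int32_t WrapMod(int32_t v, int32_t m) {
    return ((v % m) + m) % m;
}

// Returns the one or two strips of a periodic layer that cover a screen
// screenW pixels wide. Assumes screenW <= layer.width.
std::vector<DrawPiece> PiecesForLayer(const ParallaxLayer& layer, float landPos, int32_t screenW) {
    assert(layer.width > 0 && screenW <= layer.width);
    int32_t ox = WrapMod(static_cast<int32_t>(landPos * layer.rate), layer.width);
    int32_t first = std::min(screenW, layer.width - ox);  // run to the sprite's right edge
    std::vector<DrawPiece> pieces;
    pieces.push_back({ox, 0, first});
    if (first < screenW) {
        // The layer wrapped: continue from the sprite's left edge.
        pieces.push_back({0, first, screenW - first});
    }
    return pieces;
}
```

For example, with a land position of 800, a 0.5-rate background starts at source x 400 and needs one 640-pixel strip, while a 2.0-rate foreground starts at x 1600 and needs two 320-pixel strips. Drawing the layers back to front with these offsets gives the painter’s-algorithm layering described above.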

First Speed Hurdle and First Optimization

Even with these relatively few, simple layers, performance was starting to drop. Debug mode was basically unusable, and even release mode was dropping towards the 30-40 fps mark. It was clear that some optimizations were required.

The first and simplest optimization was to the static sky layer. I had a sprite with exactly the same dimensions as the area it was being drawn to; surely there had to be an easy optimization there, and indeed there was.

Poking under the hood of the PGE, you start to understand that everything you can draw to is just a sprite, and a sprite is little more than a thin wrapper around an array of pixels (each pixel being a 4-byte structure, one byte each for the red, green, blue and alpha values, 0-255). The DrawSprite command loops over that array, setting each pixel of the target based on the corresponding pixel in the source. That is a huge amount of work when we know we simply want the draw target’s array to be an exact copy of the source’s. So, instead of doing anything clever, we can just use memcpy to bulk-copy the source sprite onto the draw target. The following is the modification I made to the olcPixelGameEngine code:

void PixelGameEngine::DrawSprite(int32_t x, int32_t y, Sprite *sprite, uint32_t scale)
{
	if (sprite == nullptr)
		return;

	if (scale > 1)
	{
		//code removed for clarity
	}
	else if ((nPixelMode == Pixel::NORMAL) && (x == 0) && (y == 0) && (sprite->width == pDrawTarget->width) && (sprite->height == pDrawTarget->height))
	{
		// Fast path: the sprite exactly covers the draw target in NORMAL mode,
		// so the target's pixel array becomes a straight copy of the sprite's
		olc::Pixel* srcpix = sprite->GetData();
		olc::Pixel* dstpix = pDrawTarget->GetData();
		memcpy(dstpix, srcpix, sizeof(Pixel) * sprite->width * sprite->height);
	}
	else
	{
		//original unscaled per-pixel drawing code (unchanged, removed for clarity)
	}
}

Although this code has since been removed and its logic moved into the ScanLineSprite class (see below), this optimization alone was enough to get performance back above 60 fps, which at the time was the aim.
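The idea behind this fast path can be shown outside the engine. This is a small self-contained sketch, not the PGE’s code: the `Pixel` and `Surface` types are hypothetical stand-ins for olc::Pixel and olc::Sprite, and `CopyPerPixel` / `CopyFullTarget` are illustrative names. It shows that when the source exactly covers the target and every pixel should overwrite unchanged, one memcpy gives the same result as a per-pixel loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-in for olc::Pixel: four bytes, red, green, blue, alpha.
struct Pixel {
    uint8_t r, g, b, a;
};

// Stand-in for olc::Sprite: a width, a height and a flat row-major pixel array.
struct Surface {
    int32_t width, height;
    std::vector<Pixel> data;
    Surface(int32_t w, int32_t h) : width(w), height(h), data(std::size_t(w) * std::size_t(h)) {}
};

// Per-pixel copy, one read and one write per pixel (what a generic draw at 0,0 amounts to).
void CopyPerPixel(const Surface& src, Surface& dst) {
    for (int32_t y = 0; y < src.height && y < dst.height; y++)
        for (int32_t x = 0; x < src.width && x < dst.width; x++)
            dst.data[std::size_t(y) * dst.width + x] = src.data[std::size_t(y) * src.width + x];
}

// Fast path: when the source exactly covers the target, the whole array is one block.
void CopyFullTarget(const Surface& src, Surface& dst) {
    assert(src.width == dst.width && src.height == dst.height);
    std::memcpy(dst.data.data(), src.data.data(), sizeof(Pixel) * src.data.size());
}
```

The saving comes from replacing one operation per pixel with a single bulk copy, which is only valid when nothing needs masking or blending (NORMAL mode, same size, drawn at the origin).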

Scanline Sprites

Optimizing the general MASKed case

Performance was now above my minimum threshold for playability, but it was still far from perfect, and I knew one thing: I was going to want many more layers of parallax, and I was going to want some large boss sprites on screen. Further optimization was clearly needed, but how? The answer (or at least the answer I chose to implement) is something I call ScanLineSprites.

What is a scan line sprite?

Put simply, it is a class derived from the normal olc::Sprite that holds one extra array, with one element per row of the sprite (so its length equals the sprite’s height). Each element is itself an array of “ScanLineDrawAreas” (SLDAs). An SLDA stores a position on the x axis (strictly, this could be derived from the lengths alone, since the first SLDA always starts at 0), a length, and a flag saying whether the SLDA is ON or OFF. OFF marks an area of the sprite that would fail the mask test; ON marks an area that passes it and therefore needs to be drawn. Knowing how long each area is lets us again use memcpy as a fast way of drawing in MASK mode, and when an area has nothing to draw we can jump straight to the next area that does.
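Building the SLDAs for a single row can be sketched as follows. It mirrors the idea behind RecalculateScanLines in the code later in this article, but works on a plain vector of alpha values so it runs without the engine; the `BuildRuns` name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Same shape as the article's ScanLineDrawArea: start x, run length, and whether it is drawn.
struct ScanLineDrawArea {
    int x;
    int length;
    bool on;
};

// Split one row into alternating ON/OFF runs. A pixel is ON when its alpha is
// at least maskCutoff (a cutoff of 255 treats only fully opaque pixels as ON).
std::vector<ScanLineDrawArea> BuildRuns(const std::vector<uint8_t>& alphas, int maskCutoff) {
    assert(!alphas.empty());
    const int width = static_cast<int>(alphas.size());
    std::vector<ScanLineDrawArea> runs;
    bool inLine = alphas[0] >= maskCutoff;
    int start = 0;
    for (int x = 1; x < width; x++) {
        bool on = alphas[x] >= maskCutoff;
        if (on != inLine) {  // state changed: close the current run
            runs.push_back({start, x - start, inLine});
            inLine = on;
            start = x;
        }
    }
    runs.push_back({start, width - start, inLine});  // close the final run
    return runs;
}
```

For example, a row with alphas {0, 0, 255, 255, 255, 0} and a cutoff of 255 becomes three runs: OFF at x=0 (length 2), ON at x=2 (length 3) and OFF at x=5 (length 1). Drawing that row is then one 3-pixel copy rather than six separate mask tests.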

The Drawing Routines

Two drawing routines were added to the ScanLineSprite class: DrawOnTarget, which roughly replaces DrawSprite, and DrawPeriodicOnTarget, which replaces DrawPartialSprite (though it only handles the periodic, wrap-around case). These let me simply swap out my existing DrawSprite and DrawPartialSprite calls. It would have been possible to replace those functions directly in my engine, but I chose to be more explicit.
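The core of DrawPeriodicOnTarget is filling one destination row from a source row that wraps around. The sketch below isolates that step using plain uint32_t pixels; `DrawPeriodicRow` is a hypothetical name, and unlike the full routine it only implements MASK behaviour (ON runs are copied with memcpy, OFF runs are skipped).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

struct ScanLineDrawArea {
    int x;       // first pixel of the run
    int length;  // number of pixels in the run
    bool on;     // true if the run passes the mask test
};

// Write w pixels into dst from a periodic source row srcWidth pixels wide,
// starting at source column ox and wrapping at the right edge.
void DrawPeriodicRow(const uint32_t* srcRow, int srcWidth,
                     const std::vector<ScanLineDrawArea>& runs,
                     uint32_t* dst, int ox, int w) {
    assert(srcWidth > 0 && !runs.empty());
    int lookForX = ((ox % srcWidth) + srcWidth) % srcWidth;  // wrap, even for negative ox
    // Find the run containing lookForX (runs are sorted by x).
    std::size_t idx = 0;
    while (idx + 1 < runs.size() && runs[idx + 1].x <= lookForX) idx++;
    int offset = lookForX - runs[idx].x;  // how far into that run we start
    int left = w;
    while (left > 0) {
        const ScanLineDrawArea& run = runs[idx];
        int toCopy = std::min(run.length - offset, left);
        if (run.on) {
            std::memcpy(dst, srcRow + run.x + offset, toCopy * sizeof(uint32_t));
        }
        dst += toCopy;  // OFF runs leave the destination untouched
        left -= toCopy;
        idx = (idx + 1) % runs.size();  // after the last run, wrap to the first
        offset = 0;
    }
}
```

With a 4-pixel source row {1, 2, 3, 4} in which only columns 1 and 2 are ON, drawing 6 pixels from ox = 2 into a row of 9s reads source columns 2, 3, 0, 1, 2, 3 and yields {3, 9, 9, 2, 3, 9}. As in the article’s code, the starting run is found with a linear scan.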

Conclusions

The Speed-Ups

With the ScanLineSprite class in place, I was able to achieve 80+ layers of full-screen parallax, plus many large enemy sprites on screen, at ~100 fps on my machine, dropping to 60 fps when using ALPHA blending mode on all layers. This compares with single-digit figures (~5 fps) when using the original sprite drawing routines. And while it is hard to give an exact overdraw figure, without the optimizations this would be equivalent to the PGE writing ~2.5 billion pixels per second.
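A rough back-of-the-envelope check of that figure, assuming each of the 80 layers covers the full 640×480 window every frame at 100 fps and ignoring the enemy sprites:

```latex
640 \times 480 \times 80 \times 100\ \text{fps} = 2\,457\,600\,000 \approx 2.5 \times 10^{9}\ \text{pixels/s}
```

That is roughly 24.6 million pixel writes per frame, which is why skipping OFF runs and copying ON runs in bulk makes such a difference.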

Summary

One of the more fun parts of using an engine, whether the Pixel Game Engine or any other, is getting to grips with how it fits together. I wholeheartedly encourage anyone using this engine to try to push it outside its comfort zone: the understanding you gain by optimizing it for things it was not originally intended to do is invaluable for understanding the tools you work with (as well as a lot of fun).

In-Game Breakdown

The Code

Please note: I intend to put this project on GitHub, at which point I will update this article (or post a comment with a link). In the meantime, anyone who wishes is welcome to use this code as they see fit (though I take no responsibility for the results of doing so).

//ScanlineSprite.h
//Disclaimer
//The code herein is free to use and modify as you see fit.
//However, the author takes no responsibility for any adverse effects of using this code, whether as intended or not.

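#pragma once
#include <algorithm>  // std::min
#include <cstring>    // memcpy
#include <string>
#include <utility>    // std::move
#include <vector>
#include "olcPixelGameEngine.h"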

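// One horizontal run of pixels within a single row of the sprite
struct ScanLineDrawArea
{
	int x;       // first pixel of the run
	int length;  // number of pixels in the run
	bool on;     // true if the run passes the mask test and must be drawn
};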

class ScanLineSprite : public olc::Sprite
{
private:
	std::vector<std::vector<ScanLineDrawArea>> mScanLines;  // one list of runs per row
	bool mNeedsRecalculate = true;
public:

	// Rebuild the ON/OFF runs for every row. A pixel is ON when its alpha >= maskCutoff.
	void RecalculateScanLines(int maskCutoff)
	{
		mNeedsRecalculate = false;
		mScanLines.clear();
		mScanLines.reserve(height);
		for (int y = 0; y < height; y++)
		{
			bool inLine = (GetPixel(0, y).a >= maskCutoff);
			int start = 0;
			std::vector<ScanLineDrawArea> currLine;
			for (int x = 0; x < width; x++)
			{
				// A change of state closes the current run and starts a new one
				if (inLine != (GetPixel(x, y).a >= maskCutoff)) {
					currLine.push_back(ScanLineDrawArea{ start, x - start, inLine });
					inLine = !inLine;
					start = x;
				}
			}
			// Close the final run of the row
			currLine.push_back(ScanLineDrawArea{ start, width - start, inLine });
			mScanLines.push_back(std::move(currLine));
		}
	}

	ScanLineSprite(std::string sImageFile, int maskCutoff = 255) : Sprite(sImageFile)
	{
		RecalculateScanLines(maskCutoff);
	}

	ScanLineSprite(std::string sImageFile, olc::ResourcePack* pack, int maskCutoff = 255) : Sprite(sImageFile, pack)
	{
		RecalculateScanLines(maskCutoff);
	}

	ScanLineSprite(int32_t w, int32_t h) : Sprite(w, h)
	{
		//If you draw onto a ScanLineSprite directly, call RecalculateScanLines before drawing it;
		//otherwise the runs are rebuilt lazily on first draw with a cutoff of 255
		mNeedsRecalculate = true;
	}

	// Roughly replaces DrawSprite: a full-target NORMAL draw becomes one memcpy,
	// everything else goes through the scan-line path
	void DrawOnTarget(olc::Sprite* target, int32_t x, int32_t y, olc::Pixel::Mode mode = olc::Pixel::MASK)
	{
		if ((mode == olc::Pixel::NORMAL) && (x == 0) && (y == 0) && (width == target->width) && (height == target->height))
		{
			olc::Pixel* srcpix = GetData();
			olc::Pixel* dstpix = target->GetData();
			memcpy(dstpix, srcpix, sizeof(olc::Pixel) * width * height);
		}
		else
		{
			DrawPeriodicOnTarget(target, x, y, 0, 0, width, height, mode);
		}
	}

	// Replaces DrawPartialSprite for periodic (wrap-around) sprites: draws a w x h window
	// of this sprite, starting at source (ox, oy), at target position (x, y)
	void DrawPeriodicOnTarget(olc::Sprite* target, int32_t x, int32_t y, int32_t ox, int32_t oy, int32_t w, int32_t h, olc::Pixel::Mode mode = olc::Pixel::MASK)
	{
		if (mNeedsRecalculate) {
			RecalculateScanLines(255);
		}

		if ((x >= target->width) || (y >= target->height))
		{
			return;
		}
		// Clip against the left and top edges of the target
		if (x < 0) {
			w = w + x;
			ox = ox - x;
			x = 0;
		}
		if (y < 0) {
			h = h + y;
			oy = oy - y;
			y = 0;
		}
		// Clip against the right and bottom edges of the target
		w = std::min(w, target->width - x);
		h = std::min(h, target->height - y);
		if ((w <= 0) || (h <= 0))
		{
			return;
		}
		olc::Pixel* basetargetData = target->GetData();
		olc::Pixel* basesourceData = GetData();
		// Wrap the horizontal offset once (the + width keeps negative offsets in range)
		int lookForX = ((ox % width) + width) % width;
		for (int j = 0; j < h; j++) {
			int sourcescanline = (((oy + j) % height) + height) % height;
			// Take a reference: copying the run list for every row would be slow
			const std::vector<ScanLineDrawArea>& slda = mScanLines[sourcescanline];
			// Find the run that contains lookForX (runs are sorted by x)
			size_t startIndex = 0;
			for (size_t i = 0; i < slda.size(); i++)
			{
				if (slda[i].x <= lookForX) {
					startIndex = i;
				}
				else {
					break;
				}
			}
			int left = w;
			int offset = lookForX - slda[startIndex].x;
			olc::Pixel* dst = basetargetData + ((y + j) * target->width) + x;
			while (left > 0)
			{
				olc::Pixel* src = basesourceData + (sourcescanline * width) + slda[startIndex].x + offset;
				int toCopy = std::min(slda[startIndex].length - offset, left);
				// OFF runs are skipped unless drawing in NORMAL mode. In ALPHA mode only ON runs
				// are blended, so build the runs with a low maskCutoff (e.g. 1) if partially
				// transparent pixels should be drawn at all
				if (slda[startIndex].on || mode == olc::Pixel::NORMAL)
				{
					if (mode != olc::Pixel::ALPHA)
					{
						memcpy(dst, src, toCopy * sizeof(olc::Pixel));
					}
					else
					{
						for (int i = 0; i < toCopy; i++)
						{
							const olc::Pixel d = dst[i];
							const olc::Pixel p = src[i];
							if (p.a == 255)
							{
								dst[i] = p;
							}
							else
							{
								// "Over" blend: out = src * alpha + dst * (1 - alpha)
								float srcAlpha = p.a / 255.0f;
								float c = 1.0f - srcAlpha;
								dst[i].r = (uint8_t)(srcAlpha * p.r + c * d.r);
								dst[i].g = (uint8_t)(srcAlpha * p.g + c * d.g);
								dst[i].b = (uint8_t)(srcAlpha * p.b + c * d.b);
								//we don't really use the target's alpha so this could be removed
								dst[i].a = (uint8_t)(srcAlpha * p.a + c * d.a);
							}
						}
					}
				}
				dst += toCopy;
				left -= toCopy;
				startIndex = (startIndex + 1) % slda.size();  // wrap to the first run
				offset = 0;
			}
		}
	}
};