
dictionary – C# Memory Concerns-Exceptionshub

Posted by: admin February 24, 2020

Questions:

I work with cellular automata. My repo for my work is here. The basic structure is

1) A grid of
2) cells, which may have
3) agents.

The agents act according to a set of rules, and typically one assigns “states” to the agents (agents in different states follow different rules). One relatively well-known CA is Conway’s Game of Life.

I’m trying to expand things a bit more and incorporate other types of “properties” in my CAs, mainly to simulate various phenomena (imagine an animal agent that consumes a plant agent, with the plant’s “biomass” or what have you decreasing).

To do this, I’m using a standard Dictionary, with strings as keys and a struct called CAProperty as values. The struct is as follows:

public struct CAProperty
{
    public readonly string name;
    public readonly dynamic value;
    //public readonly Type type;
    public CAProperty(string name, dynamic value)
    {
        this.name = name;
        this.value = value;
    }
}

(Note: I previously had the “type” field to enable accurate typing at runtime, but I removed it while trying to solve the issue in this post. It will need to be added back in, though.)

This is all well and good. However, I’m trying to work with large grid sizes: 100×100, 1000×1000, even 5000×5000, which is 25 million cells (and agents). That would be 25 million dictionaries.

[Image: Visual Studio memory snapshot]

See the image: a memory snapshot from Visual Studio for a 4000×4000 grid, or 16 million agents (I tried 5000×5000, but Visual Studio wouldn’t let me take a snapshot).

On the right, the debugger clearly reports 8 GB of memory usage (in a release build I saw 6875 MB). However, when I add up everything in the third column of the snapshot, I arrive at less than 4 GB.

Why is there such a dramatic discrepancy between the total memory usage and the size of objects stored in memory?

Additionally: how might I optimize memory usage (mainly the dictionaries – is there another collection with similar behavior but lower memory usage)?

Edit: I have a class for each of the three “components” (Grid, Cell, Agent). They all inherit from a common base class, CAEntity. All are shown below.

    public abstract class CAEntity
    {
        public CAEntityType Type { get; }
        public Dictionary<string, dynamic> Properties { get; private set; }

        public CAEntity(CAEntityType type)
        {
            this.Type = type;
        }

        public CAEntity(CAEntityType type, Dictionary<string, dynamic> properties)
        {
            this.Type = type;
            if(properties != null)
            {
                this.Properties = new Dictionary<string, dynamic>(properties);
            }
        }
    }

    public class CAGraph:CAEntity
    {
        public ValueTuple<ushort, ushort, ushort> Dimensions { get; }
        public CAGraphCell[,,] Cells { get;}
        List<ValueTuple<ushort, ushort, ushort>> AgentCells { get; set; }
        List<ValueTuple<ushort, ushort, ushort>> Updates { get; set; }
        public CA Parent { get; private set; }
        public GridShape Shape { get; }
        //List<double> times = new List<double>();
        //System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();

        public CAGraph (CA parent, ValueTuple<ushort, ushort, ushort> size, GridShape shape):base(CAEntityType.Graph)
        {
            this.Parent = parent;
            this.Shape = shape;
            AgentCells = new List<ValueTuple<ushort, ushort, ushort>>();
            Updates = new List<ValueTuple<ushort, ushort, ushort>>();
            Dimensions = new ValueTuple<ushort, ushort, ushort>(size.Item1, size.Item2, size.Item3);
            Cells = new CAGraphCell[size.Item1, size.Item2, size.Item3];
            for (ushort i = 0; i < Cells.GetLength(0); i++)
            {
                for (ushort j = 0; j < Cells.GetLength(1); j++)
                {
                    for (ushort k = 0; k < Cells.GetLength(2); k++)
                    {
                        Cells[i, j, k] = new CAGraphCell(this, new ValueTuple<ushort, ushort, ushort>(i, j, k));
                    }
                }
            }
        }

        public CAGraph(CA parent, ValueTuple<ushort, ushort, ushort> size, GridShape shape, List<ValueTuple<ushort, ushort, ushort>> agents, CAGraphCell[,,] cells, Dictionary<string, dynamic> properties) : base(CAEntityType.Graph, properties)
        {
            Parent = parent;
            Shape = shape;
            AgentCells = agents.ConvertAll(x => new ValueTuple<ushort, ushort, ushort>(x.Item1, x.Item2, x.Item3));
            Updates = new List<ValueTuple<ushort, ushort, ushort>>();
            Dimensions = new ValueTuple<ushort, ushort, ushort>(size.Item1, size.Item2, size.Item3);
            Cells = new CAGraphCell[size.Item1, size.Item2, size.Item3];
            for (ushort i = 0; i < size.Item1; i++)
            {
                for (ushort j = 0; j < size.Item2; j++)
                {
                    for (ushort k = 0; k < size.Item3; k++)
                    {
                        //if(i == 500 && j == 500)
                        //{
                        //    Console.WriteLine();
                        //}
                        Cells[i, j, k] = cells[i, j, k].Copy(this);
                    }
                }
            }
        }
    }

    public class CAGraphCell:CAEntity
    {
        public CAGraph Parent { get; set; }
        public CAGraphCellAgent Agent { get; private set; }
        public ValueTuple<ushort, ushort, ushort> Position { get; private set; }
        //private Tuple<ushort, ushort, ushort>[][] Neighbors { get; set; }
        //System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();

        public CAGraphCell(CAGraph parent, ValueTuple<ushort, ushort, ushort> position):base(CAEntityType.Cell)
        {
            this.Parent = parent;
            this.Position = position;
            //this.Neighbors = new Tuple<ushort, ushort, ushort>[Enum.GetNames(typeof(CANeighborhoodType)).Count()][];
        }

        public CAGraphCell(CAGraph parent, ValueTuple<ushort, ushort, ushort> position, Dictionary<string, dynamic> properties, CAGraphCellAgent agent) :base(CAEntityType.Cell, properties)
        {
            this.Parent = parent;
            this.Position = position;
            if(agent != null)
            {
                this.Agent = agent.Copy(this);
            }
        }
    }

    public class CAGraphCellAgent:CAEntity
    {
        // have to change...this has to be a property? Or no, it's a CAEntity which has a list of CAProperties.
        //public int State { get; set; }
        public CAGraphCell Parent { get; set; }
        //System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();

        public CAGraphCellAgent(CAGraphCell parent, ushort state):base(CAEntityType.Agent)
        {
            this.Parent = parent;
            AddProperty(("state", state));
        }

        public CAGraphCellAgent(CAGraphCell parent, Dictionary<string, dynamic> properties) :base(CAEntityType.Agent, properties)
        {
            this.Parent = parent;
        }
    }

Answers:

It sounds like your problem is that your representation of your agents (using dictionaries) consumes too much memory. If so, the solution is to find a more compact representation.

Since you’re working in an object-oriented language, the typical solution would be to define an Agent class, possibly with subclasses for different types of agents, and use instance variables to store the state of each agent. Then your CA grid will be an array of Agent instances (or possibly nulls for vacant cells). This will be a lot more compact than using dictionaries with string keys.
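As a rough illustration of that difference (a sketch added here, not part of the original answer), the code below allocates many one-entry `Dictionary<string, object>` instances and many instances of a tiny class holding the same value in a field, then compares the managed-heap growth reported by `GC.GetTotalMemory(true)`. `FieldAgent` and `MemoryComparison` are hypothetical names. The exact byte counts depend on the runtime and bitness, so only the ratio is meaningful. A `dynamic` value is stored as `object` at runtime, so a `Dictionary<string, dynamic>` should behave like the `object` version here, including boxing of value types such as `ushort`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical agent that keeps its state in a plain field.
public sealed class FieldAgent
{
    public ushort State;
    public FieldAgent(ushort state) => State = state;
}

public static class MemoryComparison
{
    // Approximate managed-heap bytes retained by `count` objects built by `factory`.
    private static long MeasureBytes(int count, Func<ushort, object> factory)
    {
        var keepAlive = new object[count];   // allocated before measuring, keeps results reachable
        long before = GC.GetTotalMemory(forceFullCollection: true);
        for (int i = 0; i < count; i++)
            keepAlive[i] = factory((ushort)(i % 4));
        long after = GC.GetTotalMemory(forceFullCollection: true);
        GC.KeepAlive(keepAlive);             // ensure nothing is collected before `after`
        return after - before;
    }

    // One dictionary per agent, holding a single boxed "state" entry.
    public static long DictionaryBytes(int count) =>
        MeasureBytes(count, s => new Dictionary<string, object> { ["state"] = s });

    // One small object per agent, with the state stored in a field.
    public static long FieldBytes(int count) =>
        MeasureBytes(count, s => new FieldAgent(s));
}
```

Calling both methods with, say, 100,000 agents should show the dictionary version using several times more memory, before any other per-agent properties are added.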

Also, I would recommend not storing the position of the agent on the grid as part of the agent’s state, but passing it as a parameter to any methods that need it. Not only does this save a bit of memory just by itself, but it also allows you to place references to the same Agent instance at multiple cells on the grid to represent multiple identical agents. Depending on how often such identical agents occur in your CA, this may save a huge amount of memory.

Note that, if you modify the state of such a reused agent instance, the modification will obviously affect all agents on the grid represented by that instance. For that reason, it may be a good idea to make your Agent objects immutable and just create a new one whenever the agent’s state changes.
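In C# 9 or later, one convenient way to get such immutable agents is a `record`: its `with` expression produces a modified copy rather than mutating the original. The `Rabbit` type and its `Hunger` property below are hypothetical, chosen only to show that replacing the agent in one cell leaves other cells that share the old instance untouched.

```csharp
using System;

// Hypothetical immutable agent: its state is init-only, so a single instance can be
// referenced from many grid cells without a change in one cell leaking into another.
public sealed record Rabbit(byte Hunger)
{
    // Returns a modified copy instead of mutating this instance.
    public Rabbit Hungrier() =>
        this with { Hunger = (byte)Math.Min(Hunger + 1, byte.MaxValue) };
}

public static class SharedAgentDemo
{
    public static Rabbit[,] Run()
    {
        var grid = new Rabbit[2, 1];
        var rabbit = new Rabbit(0);
        grid[0, 0] = rabbit;   // both cells share one instance
        grid[1, 0] = rabbit;

        // "Change" the rabbit in cell (0, 0) by replacing it with a new instance;
        // cell (1, 0) still holds the original, unchanged rabbit.
        grid[0, 0] = grid[0, 0].Hungrier();
        return grid;
    }
}
```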

You might also want to consider maintaining a cache (e.g. a set) of Agent instances already on the grid so that you can easily check if a new agent might be identical with an existing one. Whether this will actually do any good depends on your specific CA model — with some CA you might be able to handle de-duplication sufficiently well even without such a cache (it’s perfectly OK to have some duplicate Agent objects), while for others there might simply not be enough identical agents to make it worthwhile. Also, if you do try this, note that you’ll need to either design the cache to use weak references (which can be tricky to get right) or periodically purge and rebuild it to avoid old Agent objects lingering in the cache even after they’ve been removed from the grid.
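A minimal sketch of the “purge and rebuild” variant of such a cache, assuming the agents have value equality (as C# records do). `AgentCache` and `Plant` are hypothetical names, and the weak-reference variant is not shown.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical de-duplication ("interning") cache for immutable agents.
// Relies on value equality, e.g. the equality that C# records generate.
public sealed class AgentCache<T> where T : class
{
    private readonly Dictionary<T, T> cache = new();

    // Returns an existing equal instance if one is cached, otherwise caches and returns `agent`.
    public T Intern(T agent)
    {
        if (cache.TryGetValue(agent, out T? existing)) return existing;
        cache[agent] = agent;
        return agent;
    }

    public int Count => cache.Count;

    // The cache holds strong references, so agents removed from the grid stay alive
    // until this is called; purge periodically and let the cache refill from live agents.
    public void Purge() => cache.Clear();
}

// Hypothetical immutable agent used with the cache.
public sealed record Plant(byte Biomass);
```

Usage would be along the lines of `grid.SetAgentAt(x, y, cache.Intern(new Plant(5)))`, so that equal plants end up sharing one instance.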


Addendum based on your comment below, which I’ll quote here:

Imagine an environment where the temperature varies seasonally (so a property on the graph). There are land and water cells (so properties on cells), and in low enough temperatures the water cells become frozen so animal agents can use them to cross over between land locations. Imagine those animal agents hunt other animal agents to eat them (so properties on the agents). Imagine the animal agents that get eaten eat trees (so other agents with properties), and tend to eat young saplings (limiting tree growth), thereby limiting their own growth (and that of the carnivore agents).

OK, so let’s sketch out the classes you’d need. (Please excuse any syntax errors; I’m not really a C# programmer and I haven’t actually tested this code. Just think of it as C#-like pseudocode or something.)

First of all, you will obviously need a bunch of agents. Let’s define an abstract superclass for them:

public abstract class Agent {
    public abstract void Act(Grid grid, int x, int y, double time);
}

Our CA simulation (which, for simplicity, I’m going to assume to be stochastic, i.e. one where the agents act one at a time in a random order, like in the Gillespie algorithm) is basically going to involve repeatedly picking a random cell (x, y) on the grid, checking if that cell contains an agent, and if so, calling Act() on that agent. (We’ll also need to update any time-dependent global state while we’re doing that, but let’s leave that for later.)

The Act() methods for the agents will receive a reference to the grid object, and can call its methods to make changes to the state of nearby cells (or even get a reference to the agents in those cells and call their methods directly). This could involve e.g. removing another agent from the grid (because it just got eaten), adding a new agent (reproduction), changing the acting agent’s location (movement) or even removing that agent from the grid (e.g. because it starved or died of old age). For illustration, let’s sketch a few agent classes:

public class Sapling : Agent {
    private static readonly double MATURATION_TIME = 10;  // arbitrary time delay

    private double birthTime;  // could make this a float to save memory
    public Sapling(double time) => birthTime = time;

    public override void Act(Grid grid, int x, int y, double time) {
        // if the sapling is old enough, it replaces itself with a tree
        if (time >= birthTime + MATURATION_TIME) {
            grid.SetAgentAt(x, y, Tree.INSTANCE);
        }
    }
}

public class Tree : Agent {
    public static readonly Tree INSTANCE = new Tree();

    public override void Act(Grid grid, int x, int y, double time) {
        // trees create saplings in nearby land cells
        (int x2, int y2) = grid.RandomNeighborOf(x, y);
        if (grid.GetAgentAt(x2, y2) == null && grid.CellTypeAt(x2, y2) == CellType.Land) {
            grid.SetAgentAt(x2, y2, new Sapling(time));
        }
    }
}

For the sake of brevity, I’ll leave the implementation of the animal agents as an exercise. Also, the Tree and Sapling implementations above are kind of crude and could be improved in various ways, but they should at least illustrate the concept.

One thing worth noting is that, to minimize memory usage, the agent classes above have as little internal state as possible. In particular, the agents don’t store their own location on the grid, but receive it as arguments to the Act() method. Since omitting the location made my Tree class completely stateless, I went ahead and used the same global Tree instance for all trees on the grid! While this won’t always be possible, when it is, it can save a lot of memory.

Now, what about the grid? A basic implementation (ignoring the different cell types for a moment) would look something like this:

public class Grid {
    private readonly int width, height;
    private readonly Agent?[,] agents;

    public Grid(int w, int h) {
        width = w;
        height = h;
        agents = new Agent?[w, h];
    }

    // TODO: handle grid edges
    public Agent? GetAgentAt(int x, int y) => agents[x, y];
    public void SetAgentAt(int x, int y, Agent? agent) => agents[x, y] = agent;
}
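The Tree class calls grid.RandomNeighborOf(x, y) and the main loop calls grid.GetRandomLocation(), but neither is defined in these sketches, and edge handling is left as a TODO. One possible sketch, assuming toroidal (wrap-around) edges and an 8-cell Moore neighborhood (both assumptions, not something the answer specifies); `GridTopology` is a hypothetical helper that a Grid could hold and forward those two calls to.

```csharp
using System;

// Hypothetical helper for a width x height grid whose edges wrap around (a torus).
public sealed class GridTopology
{
    private readonly int width, height;
    private readonly Random rng;

    public GridTopology(int width, int height, Random rng)
    {
        this.width = width;
        this.height = height;
        this.rng = rng;
    }

    // Uniformly random cell on the grid.
    public (int x, int y) GetRandomLocation() => (rng.Next(width), rng.Next(height));

    // Uniformly random one of the 8 surrounding cells, wrapping at the edges.
    public (int x, int y) RandomNeighborOf(int x, int y)
    {
        int dx, dy;
        do
        {
            dx = rng.Next(-1, 2);   // -1, 0 or 1 (upper bound is exclusive)
            dy = rng.Next(-1, 2);
        } while (dx == 0 && dy == 0);   // reject the cell itself
        return (Wrap(x + dx, width), Wrap(y + dy, height));
    }

    // Maps any integer coordinate into [0, size), also for negative values.
    private static int Wrap(int v, int size) => ((v % size) + size) % size;
}
```

Wrapping avoids special cases at the border. A bounded grid would instead clamp or reject out-of-range neighbors, which changes how often edge cells are picked.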

Now, what about the cell types? You have a couple of ways to handle those.

One way would be to make the grid store an array of Cell objects instead of agents, and have each cell store its state and (possibly) an agent. But for optimizing memory use it’s probably better to just have a separate 2D array storing the cell types, something like this:

public enum CellType : byte { Land, Water, Ice }

public class Grid {
    private readonly Random rng = new Random();
    private readonly int width, height;
    private readonly Agent?[,] agents;
    private readonly CellType[,] cells;  // TODO: init in constructor?

    private float temperature = 20;  // global temperature in Celsius

    // ...

    public CellType CellTypeAt(int x, int y) {
        CellType type = cells[x,y];
        if (type == CellType.Water && temperature < 0) return CellType.Ice;
        else return type;
    }
}

Note how the CellType enum is byte-based, so the array storing the cell types needs 1 byte per cell instead of the 4 bytes per cell an int-based enum would use.
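To put rough numbers on this for the 5000×5000 grid from the question, here is a small sketch that computes element storage only (array headers and the agent objects themselves are ignored). `IntCellType` is a hypothetical int-based enum included for comparison, and `GridMemoryEstimate` is a hypothetical helper name.

```csharp
using System;
using System.Runtime.InteropServices;

public enum CellType : byte { Land, Water, Ice }
public enum IntCellType { Land, Water, Ice }   // hypothetical int-based variant, for comparison

public static class GridMemoryEstimate
{
    // Bytes per element of an enum array = size of the enum's underlying type.
    public static int EnumElementSize(Type enumType) =>
        Marshal.SizeOf(Enum.GetUnderlyingType(enumType));

    // Approximate element storage of a width x height array (array header ignored).
    public static long ArrayBytes(int width, int height, int bytesPerElement) =>
        (long)width * height * bytesPerElement;
}
```

For 5000×5000 this gives 25,000,000 bytes (about 25 MB) for CellType and 100,000,000 bytes for IntCellType. The `Agent?[,]` array costs one reference per cell, i.e. `ArrayBytes(5000, 5000, IntPtr.Size)`, which is 200,000,000 bytes in a 64-bit process.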

Now, let’s finally look at the main CA simulation loop. At its most basic, it could look like this:

Grid grid = new Grid(width, height);
grid.SetAgentAt(width / 2, height / 2, Tree.INSTANCE);

// average simulation time per loop iteration, assuming that each
// actor on the grid acts once per unit time step on average
double dt = 1.0 / (width * height);  // 1.0, not 1: integer division would make dt zero

for (double t = 0; t < maxTime; t += dt) {
    (int x, int y) = grid.GetRandomLocation();
    Agent? agent = grid.GetAgentAt(x, y);
    if (agent != null) agent.Act(grid, x, y, t);
    // TODO: update temperature here?
}

(Technically, to correctly implement the Gillespie algorithm, the simulation time increment between iterations should be an exponentially distributed random number with mean dt, not a constant increment. However, since only the actor on one of the width * height cells is chosen in each iteration, the number of iterations between actions by the same actor is geometrically distributed with mean width * height, and multiplying this by dt = 1 / (width * height) gives an excellent approximation for an exponential distribution with mean 1. Which is a long-winded way of saying that in practice using a constant time step is perfectly fine.)
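If you do want the exact Gillespie-style increment, one standard option is inverse transform sampling: if U is uniform on [0, 1), then −ln(1 − U) · dt is exponentially distributed with mean dt. A minimal sketch; `ExponentialStep` is a hypothetical helper name.

```csharp
using System;

public static class ExponentialStep
{
    // Exponentially distributed time increment with the given mean, via inverse
    // transform sampling. 1 - NextDouble() lies in (0, 1], so Log never receives 0.
    public static double Sample(Random rng, double mean) =>
        -Math.Log(1.0 - rng.NextDouble()) * mean;
}
```

In the loop above, `t += dt` would then become `t += ExponentialStep.Sample(rng, dt)`.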

Since this is getting long enough, I’ll let you continue from here. I’ll just note that there are plenty of ways to further expand and/or optimize the algorithm I’ve sketched above.

For example, you could speed up the simulation by maintaining a list of all grid locations that contain a live actor and randomly sampling actors from that list (but then the time step would have to be 1 divided by the length of that list, instead of 1 / (width * height)). Also, you may decide that you want some actors to get more frequent chances to act than others; while the simple way to do that is just to use rejection sampling (i.e. have the actor only do something if rng.NextDouble() < prob for some prob between 0 and 1), a more efficient way would be to maintain multiple lists of locations depending on the type of the actor there.
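One way to keep such a list cheap to update is to pair it with a dictionary from location to list index and remove entries by swapping the last entry into the freed slot, so adding, removing and uniform sampling are all O(1). A sketch under that assumption; `LiveLocations` is a hypothetical name, and keeping it in sync (e.g. calling Add/Remove from Grid.SetAgentAt) is left to the Grid.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical index of occupied grid cells with O(1) add, remove and uniform sampling.
public sealed class LiveLocations
{
    private readonly List<(int x, int y)> list = new();
    private readonly Dictionary<(int x, int y), int> indexOf = new();

    public int Count => list.Count;

    public void Add(int x, int y)
    {
        if (indexOf.ContainsKey((x, y))) return;   // already tracked
        indexOf[(x, y)] = list.Count;
        list.Add((x, y));
    }

    // Removes a location by moving the last entry into its slot.
    public void Remove(int x, int y)
    {
        if (!indexOf.TryGetValue((x, y), out int i)) return;
        var last = list[list.Count - 1];
        list[i] = last;
        indexOf[last] = i;
        list.RemoveAt(list.Count - 1);
        indexOf.Remove((x, y));                    // also correct when (x, y) was the last entry
    }

    // Uniformly random live location; callers must check Count > 0 first.
    public (int x, int y) Sample(Random rng) => list[rng.Next(list.Count)];

    // Mean time step per iteration when each live actor acts once per unit time on average.
    public double TimeStep => 1.0 / list.Count;
}
```

Since Count changes as actors are born and die, TimeStep should be read at each iteration rather than computed once.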