
C# – Split a comma-separated list that contains some simple strings and some JSON

Posted by: admin, February 24, 2020

Questions:

Here’s a weird one. I’m given an ill-conceived input string that is a list of simple strings OR JSON blobs, separated by commas. e.g.:

string input = "{<some JSON object>},Normal Text,Some-Other-String-Without-Commas,{JSON_3},...,{JSON_n}";

And I have to break this into two lists – a list of JSON strings, and a list of non-JSON strings.

The nice thing is that the non-JSON strings are known to contain no special characters (no commas, and no curly braces that might be mistaken for JSON). The not-nice thing is that the JSON blobs (all of which will start with { and end with }) will obviously contain plenty of commas.


The “obvious” solution (using String.Split):

List<string> split = input.Split(',').ToList();

would of course also split on the commas inside the JSON objects ({}) themselves.
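To make the failure concrete, here is a minimal sketch. The input string and the `NaiveSplitDemo` class are made up for illustration and are not part of the real data.

```csharp
using System;

public static class NaiveSplitDemo
{
    // The "obvious" approach: split on every comma, wherever it appears.
    public static string[] Split(string input) => input.Split(',');

    public static void Main()
    {
        // Hypothetical input: one JSON object with two properties, then one plain string.
        string input = "{\"a\":1,\"b\":2},Normal Text";

        foreach (string piece in Split(input))
            Console.WriteLine(piece);

        // The comma between the two properties is treated as a separator,
        // so the JSON object comes back in two fragments (three pieces in total).
    }
}
```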


I was considering a manual approach – walking the string character-by-character and only splitting out a new element if the count of { is equal to the count of }. Something like:

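List<string> blobs = new List<string>();
int start = 0, nestingLevel = 0;
for (int i = 0; i < input.Length; i++)
{
    if (input[i] == '{') nestingLevel++;
    else if (input[i] == '}') nestingLevel--;
    else if (input[i] == ',' && nestingLevel == 0)
    {
        // Top-level comma: everything since the last split is one element.
        blobs.Add(input.Substring(start, i - start));
        start = i + 1;
    }
}
// The final element has no trailing comma, so add it after the loop.
blobs.Add(input.Substring(start));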
// Trivial TODO: split blobs into JSON and non-JSON by checking if the first character is '{'

(Note: above definitely contains bugs)
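The "trivial TODO" of separating the two kinds of element could look roughly like the sketch below. `BlobPartitioner` and `Partition` are hypothetical names, and the sketch relies on the stated guarantee that plain strings never contain curly braces.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BlobPartitioner
{
    // Separates already-split elements into JSON and plain strings
    // by looking only at the first character of each element.
    public static (List<string> Json, List<string> Plain) Partition(IEnumerable<string> blobs)
    {
        // Materialize once so the source sequence is not enumerated twice.
        List<string> all = blobs.ToList();

        List<string> json = all.Where(b => b.StartsWith("{", StringComparison.Ordinal)).ToList();
        List<string> plain = all.Where(b => !b.StartsWith("{", StringComparison.Ordinal)).ToList();
        return (json, plain);
    }
}
```

Usage would be something like `var (json, plain) = BlobPartitioner.Partition(blobs);`.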

This approach probably fails to handle a myriad of things that might appear in JSON. For example, the characters { and } may ‘benignly’ appear in JSON inside a string (between quotation marks). But if I start counting quotation marks, I might encounter escaped quotation marks (\"), which should not be counted. And if I check for escape characters, I had better make sure they themselves are not escaped (\\). What a nightmare. I would prefer not to end up writing a full-fledged JSON parser myself.
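For what it is worth, the quote and escape bookkeeping described above may need less than a full parser: two boolean flags on top of the nesting counter. The following is only a sketch under the stated assumptions (every JSON blob is a well-formed object, and plain strings contain no commas or braces). `StringAwareSplitter` is a hypothetical name, and the code does not validate the JSON in any way.

```csharp
using System;
using System.Collections.Generic;

public static class StringAwareSplitter
{
    public static List<string> Split(string input)
    {
        var parts = new List<string>();
        int start = 0, depth = 0;
        bool inString = false, escaped = false;

        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (inString)
            {
                if (escaped) escaped = false;         // this character was escaped; skip it
                else if (c == '\\') escaped = true;   // the next character is escaped
                else if (c == '"') inString = false;  // unescaped quote ends the string
            }
            else if (c == '"' && depth > 0) inString = true; // string literals only occur inside JSON
            else if (c == '{') depth++;
            else if (c == '}') depth--;
            else if (c == ',' && depth == 0)
            {
                // Top-level comma outside any string: element boundary.
                parts.Add(input.Substring(start, i - start));
                start = i + 1;
            }
        }

        parts.Add(input.Substring(start)); // trailing element has no comma after it
        return parts;
    }
}
```

Because a backslash only marks the single character after it as escaped, a pair such as \\ cancels itself out and a following quotation mark still closes the string.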


I had also considered adding JSON array brackets ([]) to either end of the string and letting a JSON serializer deserialize it as a JSON array, then re-serializing each of the array elements one at a time:

List<string> JsonBlobs = Newtonsoft.Json.Linq.JArray.Parse("[" + input + "]").Select(t => t.ToString()).ToList();

The only problem with this is that any JSON deserializer I’ve encountered will not handle random non-JSON strings within a list of objects.

My guess is that the ideal solution will need to be a hybrid between the above two solutions. The following monstrosity comes to mind:

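// Requires: using System.Collections.Generic; using Newtonsoft.Json; using Newtonsoft.Json.Linq;
List<string> blobs = new List<string>();
int start = 0;
bool in_json_land = false;
for (int i = 0; i < input.Length; i++)
{
    if (input[i] == '{' && !in_json_land) in_json_land = true;
    else if (input[i] == '}' && in_json_land)
    {
        try
        {
            // Include the closing brace in the candidate substring.
            JToken blob = JToken.Parse(input.Substring(start, i - start + 1));
            blobs.Add(blob.ToString());
            in_json_land = false;
            start = i + 1;
        }
        catch (JsonReaderException) { /* Must not have encountered the end of the JSON yet... */ }
    }
    else if (input[i] == ',' && !in_json_land)
    {
        // Skip the empty segment between a parsed JSON blob and the comma that follows it.
        if (i > start) blobs.Add(input.Substring(start, i - start));
        start = i + 1;
    }
}
// Add the trailing plain-text element, if any.
if (start < input.Length) blobs.Add(input.Substring(start));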

Any better suggestions?

Answers: