Strings (std::string) Demystified in C++ (including concatenation)

What is a String?

A string is a series of characters of varying length. Strings are all around us: words like dog and cat are strings, and so are whole sentences, file paths and even the empty string.
Needless to say, strings are one of the most important concepts in programming.

Before we dive in, I would like to address those experienced only in high-level languages such as Java, Python or JavaScript. C has no built-in string type; text is handled as null-terminated character arrays. Fortunately, the C++ standard library provides users with a string class, std::string. C++ developers probably let out a sigh of relief, since they didn’t have to write their own implementations of the equality operator or the size() method.

Below is a very simple example of using the std::string class.
#include <iostream>
#include <string>
using namespace std;                    // Purely for demonstration purposes. Don't do this in the real world.
 
int main() {
    string firstString = "bob";
    cout << firstString.length() << endl;      // 3
    return 0;
}

Why should I use Strings?

The std::string class provides a robust API that has been thoroughly tested. Consider this: if we built our own string, we would have to do some fairly intense low-level programming. Even after all that, chances are our string class would not be up to par (performance-wise).

Fastest way to append Strings

In this experiment (from stackoverflow.com), the following methods were used to append strings in a for loop that iterated 100,000 times.
  • "+" operator, e.g.
    l_czTempStr = s1 + "Test data2" + "Test data3";
  • "+=" operator, e.g.
    l_czTempStr += "Test data2";
  • append() method, e.g.
    l_czTempStr.append("Test data2");
  • std::ostringstream, e.g.
    std::ostringstream oss;
    oss << "Test data1";

The test results are quite interesting. The + operator and ostringstream were by far the slowest. No contest. In that test, append() and the += operator were the fastest. The code was compiled with

clang++ -std=c++11 -O3 -DVER=1 -Wall -pedantic -pthread main.cpp
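The loop from such an experiment might look roughly like the sketch below. The function names and the "Test data" payload are made up for illustration and this is not the original benchmark code; it only shows that all four techniques build the same text, so any difference between them is purely one of cost.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helpers: each one builds the same text with a different technique.

std::string buildWithPlus(std::size_t iterations) {
    std::string result;
    for (std::size_t i = 0; i < iterations; ++i) {
        result = result + "Test data";   // creates a new string, then assigns it back
    }
    return result;
}

std::string buildWithPlusEquals(std::size_t iterations) {
    std::string result;
    for (std::size_t i = 0; i < iterations; ++i) {
        result += "Test data";           // appends to the existing string
    }
    return result;
}

std::string buildWithAppend(std::size_t iterations) {
    std::string result;
    for (std::size_t i = 0; i < iterations; ++i) {
        result.append("Test data");      // same effect as +=
    }
    return result;
}

std::string buildWithStream(std::size_t iterations) {
    std::ostringstream oss;
    for (std::size_t i = 0; i < iterations; ++i) {
        oss << "Test data";              // formatted insertion into the stream buffer
    }
    return oss.str();                    // copy the accumulated text out as a string
}
```

To compare them yourself, call each helper with a large iteration count and time the calls with a clock of your choice; the numbers will vary by compiler and standard library.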

In general, it is fine to use either append() or +=, but the general rule of thumb is: do not append using the + operator. The ostringstream use case is situational. It should only be used when you need to convert non-string values to strings.
int number = 1337;
std::ostringstream oss;
oss << "Adding numbers to strings: " << number;          // number is converted.

String Concatenation behind the scenes

If it wasn’t made clear in previous sections, concatenation is the process of joining two strings end to end to form a new, longer string. We are able to concatenate strings in different ways, e.g. using the += and + operators.

Concatenation with the + operator

You might be surprised by what happens behind the scenes. There is a good reason why string concatenation is expensive (even in high-level languages). Each time we concatenate with +, we are not modifying an existing value. Instead, all the characters of the string on the left-hand side are copied into a new string, and the characters of the right-hand side are copied in after them. Performance-wise (time and memory), this becomes costly when we are dealing with many strings.
According to cppreference, the + operator returns a string containing characters from lhs followed by the characters from rhs.

Under the covers, a std::string manages a one-dimensional character array ending with a null terminator ‘\0’. When we concatenate two named strings with +, we are creating a new string and copying the contents of both the first and second argument into its array. (Since C++11, a temporary operand can hand over its buffer instead, but a named left-hand string is still copied.) Needless to say, concatenation via the + operator is unnecessarily costly in most situations.
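To make the "new string" point concrete, here is a minimal sketch. The helper name is hypothetical; it simply checks that the result of + holds the combined text while both operands are left exactly as they were.

```cpp
#include <string>

// Hypothetical helper: joins two strings with + and reports whether
// the operands were left untouched.
bool plusLeavesOperandsUnchanged() {
    std::string lhs = "Hello, ";
    std::string rhs = "world";

    // joined is a separate object holding copies of the characters of lhs and rhs.
    std::string joined = lhs + rhs;

    return joined == "Hello, world" && lhs == "Hello, " && rhs == "world";
}
```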

Concatenation with the += operator

In the previous section, we already established that the += operator is much faster. But why? Because we are simply appending the right-hand side to the existing left-hand side string. As a result, NO new string object is being created here! The existing buffer is only reallocated when the result no longer fits in its current capacity.

Don’t believe what I just said? Take a look at cppreference yourself! It states that += appends additional characters to the string. Take note that += returns *this, which is a reference to the current string object (not a pointer). As a result, this operator is chainable.
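A small sketch of that behaviour, using hypothetical helper names: the first function checks that the reference returned by += refers to the very same object, and the second chains two appends. Note the parentheses in the chained version; without them the expression would be parsed from right to left and would not compile.

```cpp
#include <string>

// Hypothetical helper: true if += hands back the string it was called on.
bool plusEqualsReturnsSameObject() {
    std::string s = "a";
    std::string& result = (s += "b");   // += returns a reference to s
    return &result == &s;
}

// Hypothetical helper: chains two appends on the same string.
std::string chainedPlusEquals() {
    std::string s = "a";
    (s += "b") += "c";                  // both appends modify s in place
    return s;
}
```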

Concatenation via append()

According to cppreference, append() is almost identical to the += operator. It returns *this and also appends additional characters to the string. However, the main difference is that the function has several overloads. For example, unlike the += operator, append() also accepts a starting position and a character count as its second and third parameters. This enables users to append part of a string. Therefore, append() can be more flexible. On the downside, however, an excessive number of overloads can be a source of confusion.
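Here is a short sketch of a few of those overloads. The function name and the sample text are made up for illustration; the overloads themselves (substring, count plus character, and iterator range) are part of the standard std::string interface.

```cpp
#include <string>

// Hypothetical helper that exercises several append() overloads.
std::string appendExamples() {
    const std::string source = "Hello, world";
    std::string out;

    out.append(source, 0, 5);                      // position 0, count 5   -> "Hello"
    out.append(1, ' ');                            // one copy of ' '       -> "Hello "
    out.append(source.begin() + 7, source.end());  // iterator range        -> "Hello world"
    out.append(3, '!');                            // three copies of '!'   -> "Hello world!!!"

    return out;
}
```

Because each call returns *this, the four calls could also be chained as out.append(...).append(...) if you prefer that style.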

Concatenation via stringstream

To those coming from a Java or C# background, beware! Here I was, thinking I was being efficient by adding strings to the string stream and converting it into a string when it was time to use it. In case you didn’t get it … newsflash: stringstreams are not the equivalent of StringBuilders. I too was deceived when starting out with C++. Let us scrutinize this class and examine what is really going on behind the scenes.

In the ostringstream example earlier, we used the << operator, which inserts formatted data. The ostringstream class is used for output, istringstream for input, and stringstream for both. As the name suggests, a stringstream processes streams of data, i.e. streams of characters. Behind the scenes, it converts non-string data (e.g. an int) to text. Internally, a stringstream uses a growable buffer. In other words, memory is reallocated as the buffer grows, and every insertion also goes through the stream’s formatting machinery, which is why it tends to be slower than appending to a string directly.
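The situation where a stream earns its keep is mixing text with non-string values. The sketch below uses a hypothetical describeScore helper to show the usual pattern: insert everything with <<, then call str() to get the finished std::string.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: formats a name and an integer score into one string.
std::string describeScore(const std::string& name, int score) {
    std::ostringstream oss;
    oss << name << " scored " << score << " points";   // score is converted to text
    return oss.str();                                  // returns a copy of the buffer's contents
}
```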

Ways to create strings

There are many ways to create strings in C++. The examples used in this section are from C++ Primer (5th Edition). Although it is quite a thick book, C++ Primer is a great resource for those who are willing to put in the time and effort to sift through its contents. As a result, I highly recommend it to those who are serious about C++.
  1. string s1;                    // default initialization; s1 is an empty string
  2. string s2 = s1;               // s2 is a copy of s1
  3. string s2(s1);                // same as above.
  4. string s3("Hello");           // s3 is a copy of the string literal. Does not include null.
  5. string s3 = "hiya";           // Equivalent to above, s3 is a copy of the string literal
  6. string s4(10, 'c');           // s4 is "cccccccccc"
In a later post, I will be covering C++ copy constructors. For the purposes of this tutorial, a copy constructor is a constructor that can be called with an argument of the same type as the object being created. In the case above, it would be a constructor with a string object passed to it.
Examples two and three use the copy constructor.
Important points to take away
  • When a string literal is supplied, all characters supplied minus the null terminator are copied into the newly created instance.
  • When supplied with count and char, the result contains that many copies of the character.
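The takeaways above can be checked with a small sketch. The helper name is hypothetical; it builds strings in the ways listed and verifies their contents and sizes.

```cpp
#include <string>

// Hypothetical helper: constructs strings in several ways and checks the results.
bool constructionExamplesHold() {
    std::string s1;                 // default initialization: empty string
    std::string s2 = s1;            // copy of s1
    std::string s3("Hello");        // copy of the literal, without the null terminator
    std::string s4(10, 'c');        // ten copies of 'c'

    return s1.empty()
        && s2 == s1
        && s3.size() == 5           // 'H','e','l','l','o'; the '\0' is not counted
        && s4 == "cccccccccc";
}
```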

What to avoid

Avoid concatenating strings using the + operator like the plague. You might be saying to yourself  “not this again”. But it is THAT important! A common example of the aforementioned plague is the code on the following line.
std::cout << "Troll source " + aStr + ". Dont append like this" << std::endl;
Instead, it should be written out like this.
std::cout << "Not troll " << aStr << ". Do it like this" << std::endl;

 

Companion Types

The string class has a few companion types. They enable library types to be used in a machine-independent manner. The type size_type is an example. Although we don’t know its exact type, we know that it is an unsigned integer type. As the name suggests, it is used to store a string’s size.

E.g. string::size_type size = testStr.size();
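A typical place where size_type shows up is an index-based loop. The sketch below uses a hypothetical countChar helper; using size_type for both the counter and the index avoids mixing signed and unsigned values when comparing against size().

```cpp
#include <string>

// Hypothetical helper: counts how many times target occurs in text.
std::string::size_type countChar(const std::string& text, char target) {
    std::string::size_type count = 0;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        if (text[i] == target) {
            ++count;
        }
    }
    return count;
}
```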

String Comparison

Strings are the primary method of storing and representing textual data. Cases will arise where strings need to be processed depending on how they compare to one another. Comparison is, therefore, a fundamental operation.

Using equality operators (==, !=)

Unlike in Java, we can use == or != to compare the contents of two strings. In Java, the equality operators (i.e. == and !=) check whether the two references point at the same object in memory, so we use the built-in equals() method to compare the values of two strings instead. Below are some basic gotchas. Note that at least one operand must be a std::string; comparing two bare string literals with == compares pointers, not text.

  • String comparisons are case sensitive, e.g.
    std::string("test") == "Test"    // false
  • The null terminator does not count as a character, e.g.
    std::string("test") == "test\0"  // true
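The gotchas above can be tried out with the sketch below. Both helper names are hypothetical. The second helper shows one caveat worth knowing: if you pass an explicit length to the constructor, a '\0' is stored as an ordinary character and then does take part in the comparison.

```cpp
#include <string>

// Hypothetical helper: compares the contents of two std::string objects.
bool sameText(const std::string& a, const std::string& b) {
    return a == b;
}

// Hypothetical helper: builds a 5-character string whose last character is '\0'.
std::string withEmbeddedNull() {
    return std::string("test\0", 5);   // explicit length keeps the '\0'
}
```

With these, sameText("test", "Test") is false and sameText("test", "test\0") is true, because building a std::string from a literal stops at the first null. withEmbeddedNull() has size 5 and therefore does not compare equal to "test".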

Using relational operators (<, <=, >, >=)

Because the explanation in C++ Primer is so good, I am going to borrow from it again. The original wording is quite dense, so I am going to paraphrase it to make it as clear as possible.
  1. If the two strings have different lengths and every character in the shorter string is equal to the corresponding character of the longer string
    • Result: the shorter string is less than the longer one.
  2. If any characters at corresponding positions in the two strings differ
    • Result of the string comparison = result of comparing the first pair of characters at which the strings differ.

Case 1: Two strings have different length

Let us visit the first point: two strings of different length. Consider two strings: “hell” and “hello”. Note that every character in the shorter string is equal to the corresponding character in the longer one. Therefore, hell < hello is true. Finally, let us look at one more case: two strings that differ both in length and in corresponding characters. Consider “herr” and “hello”. Since the corresponding characters are not all equal, we compare the first pair of characters that differ, which are ‘r’ and ‘l’ at index 2. If we compare ‘l’ with ‘r’ (their ASCII code values), 'l' < 'r' is true. Therefore, “hello” is less than “herr”. For the example below, we will assume that each variable is a std::string holding the same text as its name.

std::cout << (hello < herr) << std::endl;    // 1, which is equivalent to true.

Case 2: Two strings have same length

Let us take a look at the two strings “Hello” and “Herro”. They are the same length, so we have to use point two. The two strings are definitely not equal. Therefore, we compare the first pair of characters at which the text differs, which are once again ‘l’ and ‘r’ at index 2. As examined in the previous example, 'l' < 'r' is true, so “Hello” is less than “Herro”.
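The two rules can be written out by hand, which makes a handy cross-check against the built-in < operator. This is only a sketch with hypothetical helper names, and it assumes plain ASCII text; it is not how the standard library implements the comparison.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: index of the first position where a and b differ,
// or the length of the shorter string if no such position exists.
std::string::size_type firstMismatch(const std::string& a, const std::string& b) {
    const std::string::size_type limit = std::min(a.size(), b.size());
    std::string::size_type i = 0;
    while (i < limit && a[i] == b[i]) {
        ++i;
    }
    return i;
}

// Hypothetical helper: applies the two rules directly (assumes ASCII text).
bool lessByRules(const std::string& a, const std::string& b) {
    const std::string::size_type limit = std::min(a.size(), b.size());
    const std::string::size_type i = firstMismatch(a, b);
    if (i < limit) {
        return a[i] < b[i];         // rule 2: first differing pair decides
    }
    return a.size() < b.size();     // rule 1: the shorter string is less
}
```

For the examples in this section, lessByRules("hell", "hello"), lessByRules("hello", "herr") and lessByRules("Hello", "Herro") are all true, matching what < reports for the same std::string values.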

Key points

If the content is too overwhelming, below is a list of points that you can take away from this post.
  • Do not concatenate using the + operator. Use += or append().
  • Only use stringstream when you need to convert other data types (e.g. int) to strings.
  • String literals, e.g. “Hello”, are not of type std::string. They are character arrays.
  • The append() method offers many useful overloads for more concise string manipulation.

Best Way to Understand Strings

There is an overused saying that practice makes perfect. The saying is overused because, more often than not, it proves true. Practice, practice, practice! Especially to beginners, I cannot stress this enough. In software development, head knowledge will only get you so far. All of it is meaningless unless you are able to utilize the knowledge when writing code. And what better way to gain that experience than by writing code and using strings? Reading code is great, don’t get me wrong. But reading alone will only get you so far. At some point, you will need to write in order to solidify the knowledge that your brain is processing.

For absorbing new knowledge, it is important to always be mindful of that piece of knowledge when writing code. For example, with the + vs +=, a good way to apply the knowledge is to go through your existing code. Replace all instances of + with += where applicable. Think of when it would be a good idea and when it would not be. Assess whether the code appending strings with + can be optimized. One possible method may be to build up your block of text with append().

Conclusion

In conclusion, strings are an indispensable part of any program. Therefore, I highly recommend that everybody get acquainted with them as soon as possible. The concepts surrounding strings are quite elementary. As a result, strings can be understood and even mastered in time.
In case you are wondering, yes, in this post, the amount of source code examples is quite sparse. I have tried to keep it as simple as possible, but if you want additional source code examples, please leave me a message.

About the Author Jay

I am a programmer currently living in Seoul, South Korea. I created this blog as an outlet to express what I know / have been learning in text form for retaining knowledge and also to hopefully help the wider community. I am passionate about data structures and algorithms. The back-end and databases is where my heart is at.
