Discussion:
std::string does not have a reference counter
Boris Vidolov
2003-09-30 09:55:50 UTC
Hello all,

Maybe you have noticed that the new std::string object is not reference
counted. Consider this code:

string s = "This is one loooooooooong string..................";
string t = s;

In this case the whole long buffer is copied into the buffer of variable t,
which is significantly slower than the old version of std::string, where we
simply incremented the reference counter and set t to point to the same
string. This technique makes copying very fast and is widely used (I
personally have created about 20 different classes implementing the
refcounting technique).
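
To make the copy cost concrete, here is a minimal sketch of a reference-counted string handle. `RcString` is a hypothetical class for illustration, not the old Visual C++ implementation or any other vendor's code. Copying shares the buffer and bumps a plain `int` counter, so it costs the same however long the string is; because the counter is not atomic, the sketch is only safe in single-threaded code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical reference-counted string handle (a sketch, not a real
// std::string). The counter is a plain int: single-threaded use only.
class RcString {
    struct Rep {
        int refs;          // how many RcString objects share this buffer
        std::size_t len;   // length without the terminating '\0'
        char* data;        // heap buffer holding the characters
    };
    Rep* rep_;

    // Drop one reference; free the buffer when the last owner goes away.
    void release() {
        if (--rep_->refs == 0) {
            delete[] rep_->data;
            delete rep_;
        }
    }

public:
    explicit RcString(const char* s) : rep_(new Rep) {
        rep_->refs = 1;
        rep_->len = std::strlen(s);
        rep_->data = new char[rep_->len + 1];
        std::memcpy(rep_->data, s, rep_->len + 1);
    }

    // Copying is O(1) however long the string is: share and count.
    RcString(const RcString& other) : rep_(other.rep_) { ++rep_->refs; }

    RcString& operator=(const RcString& other) {
        ++other.rep_->refs;  // take the new reference first (self-assignment safe)
        release();
        rep_ = other.rep_;
        return *this;
    }

    ~RcString() { release(); }

    std::size_t size() const { return rep_->len; }
    char operator[](std::size_t i) const {  // read-only access
        assert(i < rep_->len);
        return rep_->data[i];
    }
    int use_count() const { return rep_->refs; }
    bool shares_buffer_with(const RcString& o) const { return rep_ == o.rep_; }
};
```

With this, `RcString t = s;` touches only the counter, which is the behavior described above. The sketch deliberately offers no mutation; writable access is where the complications start.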



The question is why this mechanism was dropped in the latest version of
Microsoft Visual C++. A colleague of mine joked that this way we could
compile C++ code using std::string, then C# code, whose strings are possibly
refcounted, and then say: see, the C++ code is even slower than the C# one!
C# is even faster than C++ :) This would be a good marketing trick, but
behind the scenes, what is the real reason for removing the refcounted
internal object from std::string?



Regards,

Boris
André Pönitz
2003-09-30 10:18:49 UTC
Post by Boris Vidolov
Maybe you have noticed that the new std::string object is not reference counted.
string s = "This is one loooooooooong string..................";
string t = s;
In this case the whole long buffer is copied into the buffer of variable t,
which is significantly slower than the old version of std::string, where we
simply incremented the reference counter and set t to point to the same
string. This technique makes copying very fast and is widely used (I
personally have created about 20 different classes implementing the
refcounting technique).
Reference counting is hard to get right in the presence of 'dangling
references' as handed out by 'operator[]', and it doesn't buy as much as
one may think.

You have to un-share the string, e.g. as soon as operator[] is called on
a non-const string, so you often end up doing the copy anyway _and_
paying the refcount overhead.
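
The extra bookkeeping can be sketched as follows, assuming a hypothetical single-threaded `CowString`. A non-const `operator[]` must first un-share the buffer (a full copy if anyone else holds it), and because the returned `char&` can outlive the call, the buffer is also marked unshareable so that later copies are made eagerly instead of aliasing that reference. The `shareable` flag is one common fix for the dangling-reference problem, not necessarily what any particular library did.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical single-threaded copy-on-write string (a sketch).
class CowString {
    struct Rep {
        int refs;        // number of CowString objects sharing this buffer
        bool shareable;  // false once a char& into the buffer has escaped
        std::size_t len;
        char* data;
    };
    Rep* rep_;

    static Rep* make(const char* s, std::size_t len) {
        Rep* r = new Rep;
        r->refs = 1;
        r->shareable = true;
        r->len = len;
        r->data = new char[len + 1];
        std::memcpy(r->data, s, len);
        r->data[len] = '\0';
        return r;
    }
    // Share r if allowed; otherwise someone may hold a char&, so copy now.
    static Rep* acquire(Rep* r) {
        if (r->shareable) {
            ++r->refs;
            return r;
        }
        return make(r->data, r->len);
    }
    void release() {
        if (--rep_->refs == 0) {
            delete[] rep_->data;
            delete rep_;
        }
    }
    // Give this object a private buffer before writing (the "copy" in COW).
    void unshare() {
        if (rep_->refs > 1) {
            Rep* fresh = make(rep_->data, rep_->len);
            release();
            rep_ = fresh;
        }
    }

public:
    explicit CowString(const char* s) : rep_(make(s, std::strlen(s))) {}
    CowString(const CowString& o) : rep_(acquire(o.rep_)) {}
    CowString& operator=(const CowString& o) {
        Rep* r = acquire(o.rep_);  // take the new rep before dropping the old one
        release();
        rep_ = r;
        return *this;
    }
    ~CowString() { release(); }

    // Read-only access can keep sharing.
    char operator[](std::size_t i) const {
        assert(i < rep_->len);
        return rep_->data[i];
    }
    // Writable access: un-share, then mark the buffer so it is never shared
    // again while the returned reference might still be in use.
    char& operator[](std::size_t i) {
        assert(i < rep_->len);
        unshare();
        rep_->shareable = false;
        return rep_->data[i];
    }

    bool shares_buffer_with(const CowString& o) const { return rep_ == o.rep_; }
};
```

Note that merely calling the non-const `operator[]` on a shared string costs a full copy, even if the caller only reads the character.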

Moreover, most strings are 'short', so the usual 'short string
optimization' where strings up to a certain size are not stored on the
heap but in the object itself yields pretty good performance.
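
A minimal sketch of the short string optimization, assuming a hypothetical `SsoString` with a 16-byte inline buffer (15 characters plus the terminator). Real std::string implementations pack the fields more tightly and choose their own thresholds. Short strings never touch the heap, so copying them is just a small memcpy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical short-string-optimized string (a sketch). Strings shorter
// than kInline characters are stored inside the object; longer ones go
// to the heap. kInline = 16 is an illustrative choice.
class SsoString {
    static constexpr std::size_t kInline = 16;  // bytes, including the '\0'
    std::size_t len_;
    char* ptr_;           // points at buf_ (short) or at a heap block (long)
    char buf_[kInline];

    bool on_heap() const { return ptr_ != buf_; }

    // Allocate only when the text does not fit inline.
    void init(const char* s, std::size_t len) {
        char* p = (len < kInline) ? buf_ : new char[len + 1];
        std::memcpy(p, s, len);
        p[len] = '\0';
        ptr_ = p;
        len_ = len;
    }

public:
    explicit SsoString(const char* s) { init(s, std::strlen(s)); }
    // Always a deep copy, but short copies never touch the allocator.
    SsoString(const SsoString& o) { init(o.ptr_, o.len_); }
    SsoString& operator=(const SsoString& o) {
        if (this != &o) {
            if (on_heap()) delete[] ptr_;
            ptr_ = buf_;     // fall back to a valid empty state
            len_ = 0;
            buf_[0] = '\0';
            init(o.ptr_, o.len_);
        }
        return *this;
    }
    ~SsoString() { if (on_heap()) delete[] ptr_; }

    std::size_t size() const { return len_; }
    const char* c_str() const { return ptr_; }
    bool uses_heap() const { return on_heap(); }
};
```

The trade-off is a larger object and a branch on every access in exchange for no allocation, and no shared state between threads, for the common short case.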

Have you done any benchmarking on a 'real world application', and do you
have hard numbers showing that non-refcounted strings are slower, or do you
just have a 'bad feeling'?

Andre'
tom_usenet
2003-09-30 10:33:15 UTC
On Tue, 30 Sep 2003 12:55:50 +0300, "Boris Vidolov"
Post by Boris Vidolov
Hello all,
Maybe you have noticed that the new std::string object is not reference counted.
string s = "This is one loooooooooong string..................";
string t = s;
In this case the whole long buffer is copied into the buffer of variable t,
which is significantly slower than the old version of std::string, where we
simply incremented the reference counter and set t to point to the same
string. This technique makes copying very fast and is widely used (I
personally have created about 20 different classes implementing the
refcounting technique).
However, in multithreaded code the use of reference counting is a
major pessimization. Since the string has to be usable for
multithreaded code, they chose to remove reference counting. Other
standard library implementations have also taken this step to avoid
the complication of maintaining separate implementations for single
and multithreaded code.
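
To see where the multithreading cost comes from, here is a sketch of a thread-safe reference-counted handle, assuming a hypothetical immutable `SharedText` and a hypothetical helper `copy_concurrently`, built on C++11 `std::atomic` (which postdates this thread). Every copy and every destruction becomes an atomic read-modify-write on a counter that other cores may also be touching. A copy-on-write string pays this on every copy and additionally has to make its un-share logic race-free.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical immutable, thread-safe refcounted handle (a sketch). Copies
// and destructions each do an atomic read-modify-write on the shared counter.
class SharedText {
    struct Rep {
        std::atomic<int> refs{1};
        const std::string text;  // never modified, so readers need no lock
        explicit Rep(std::string t) : text(std::move(t)) {}
    };
    Rep* rep_;

    void release() {
        // acq_rel: whichever thread drops the last reference sees all
        // earlier uses of the Rep before deleting it.
        if (rep_->refs.fetch_sub(1, std::memory_order_acq_rel) == 1) delete rep_;
    }

public:
    explicit SharedText(std::string t) : rep_(new Rep(std::move(t))) {}
    SharedText(const SharedText& o) : rep_(o.rep_) {
        // Relaxed is enough here: the source object already holds a reference.
        rep_->refs.fetch_add(1, std::memory_order_relaxed);
    }
    SharedText& operator=(const SharedText& o) {
        o.rep_->refs.fetch_add(1, std::memory_order_relaxed);  // new ref first
        release();
        rep_ = o.rep_;
        return *this;
    }
    ~SharedText() { release(); }

    const std::string& str() const { return rep_->text; }
    // Only meaningful when no other thread is copying concurrently.
    int use_count() const { return rep_->refs.load(std::memory_order_relaxed); }
};

// Hypothetical helper: copy `shared` many times from several threads at once.
inline void copy_concurrently(const SharedText& shared, int threads, int copies) {
    std::vector<std::thread> workers;
    for (int i = 0; i < threads; ++i) {
        workers.emplace_back([&shared, copies] {
            for (int j = 0; j < copies; ++j) {
                SharedText copy = shared;  // atomic increment here...
                assert(copy.str() == shared.str());
            }                              // ...and atomic decrement here
        });
    }
    for (auto& w : workers) w.join();
}
```

A single-threaded build could use a plain `int` instead, which is exactly the "separate implementations" burden the vendors chose to avoid.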
Post by Boris Vidolov
The question is why this mechanism was dropped in the latest version of
Microsoft Visual C++. A colleague of mine joked that this way we could
compile C++ code using std::string, then C# code, whose strings are possibly
refcounted, and then say: see, the C++ code is even slower than the C# one!
C# is even faster than C++ :) This would be a good marketing trick, but
behind the scenes, what is the real reason for removing the refcounted
internal object from std::string?
They've chosen a string that is fast in multithreaded code and very
fast for short strings (<32 characters IIRC). However, if you want a
string that is optimal for longish strings in single-threaded code,
then you have to look elsewhere, e.g.
http://www.cuj.com/documents/s=7994/cujcexp1906alexandr/

What are you using std::string for that has proven too slow? You might
be better off using a rope class or similar. There are numerous
articles available about the problems of std::string trying to do two
jobs at the same time (be an immutable string and a mutable string, like
String and StringBuffer in Java).
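
One way to give each of those two jobs its own type is sketched below, assuming a hypothetical `ImmutableString` that shares its characters through `std::shared_ptr<const std::string>` (C++11) and uses a plain `std::string` as the mutable builder, roughly mirroring Java's String and StringBuffer. This is a swapped-in technique, not the one from the linked article. Because the shared text is never modified, there is no copy-on-write and no writable `operator[]` to worry about, although the `shared_ptr` counter is still updated atomically.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical immutable string value (a sketch). Copies share one
// std::string through shared_ptr<const std::string>; nothing can modify it.
class ImmutableString {
    std::shared_ptr<const std::string> data_;

public:
    explicit ImmutableString(std::string s)
        : data_(std::make_shared<std::string>(std::move(s))) {}

    // The implicitly generated copy operations just copy the shared_ptr.
    const std::string& str() const { return *data_; }
    std::size_t size() const { return data_->size(); }
    char operator[](std::size_t i) const {  // read-only: no char& escapes
        assert(i < data_->size());
        return (*data_)[i];
    }
    bool shares_buffer_with(const ImmutableString& o) const { return data_ == o.data_; }

    // "Modification" builds text in a mutable std::string (the StringBuffer
    // role) and returns a new value; *this is left untouched.
    ImmutableString append(const std::string& tail) const {
        std::string buffer = *data_;
        buffer += tail;
        return ImmutableString(std::move(buffer));
    }
};
```

Code that builds text works on a `std::string` and converts once at the end; code that only passes names around copies the cheap immutable value.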

Tom
Thore B. Karlsen
2003-09-30 12:14:11 UTC
On Tue, 30 Sep 2003 12:55:50 +0300, "Boris Vidolov"
Post by Boris Vidolov
Hello all,
Maybe you have noticed that the new std::string object is not reference counted.
string s = "This is one loooooooooong string..................";
string t = s;
In this case the whole long buffer is copied into the buffer of variable t,
which is significantly slower than the old version of std::string, where we
simply incremented the reference counter and set t to point to the same
string. This technique makes copying very fast and is widely used (I
personally have created about 20 different classes implementing the
refcounting technique).
The question is why this mechanism was dropped in the latest version of
Microsoft Visual C++.
The refcounted version didn't work. It was not thread-safe.

Check out these articles:

http://www.gotw.ca/gotw/043.htm
http://www.gotw.ca/gotw/044.htm
http://www.gotw.ca/gotw/045.htm
--
Be seeing you.
Boris Vidolov
2003-09-30 15:09:22 UTC
Wow,
good answers,
thank you very much for the quick response.
It is not a matter of a bad feeling; I simply know the price of allocations,
especially allocations of small blocks of data.
The good thing about the new implementation is that those allocations
do not occur for strings whose length is <= 16.

For a long time I worked on a huge CAD system. In its source we relied
on std::string being refcounted, which led us to write things like
this:

string GetName() const { return msName; }

instead of:

const string& GetName() const { return msName; }

We used the first variant not only because we were lazy, but also because in
some situations, when using the const reference to the string, we indirectly
ended up executing code like this:

msName = msName;

Unfortunately, some compilers' implementations of the string class (e.g.
the old Borland C++ 5.02) had problems with this code. It caused
access violations, because it first decremented the reference counter of
msName, then incremented the counter of the just-freed object (it was freed
as a result of the decrement). Assuming that copying a std::string object is
fast, we changed a lot of code to return string instead of const string&. We
also rarely used operator [], but quite often used operator +.
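
The self-assignment crash described above comes down to the order of operations in a refcounted `operator=`. Here is a minimal sketch, assuming a hypothetical handle `RcName` and a hypothetical `Part` class whose const-reference getter lets `SetName(GetName())` turn into `msName = msName`. Taking the new reference before dropping the old one keeps that safe; the crashing order appears only in a comment, since running it would be undefined behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical refcounted handle (a sketch). A naive operator= written as
//     release(); rep_ = other.rep_; ++rep_->refs;
// frees the buffer first when this == &other and the count is 1, then
// increments a counter in freed memory: the access violation described above.
class RcName {
    struct Rep { int refs; char* data; };
    Rep* rep_;

    void release() {
        assert(rep_->refs > 0);
        if (--rep_->refs == 0) { delete[] rep_->data; delete rep_; }
    }

public:
    explicit RcName(const char* s) : rep_(new Rep) {
        rep_->refs = 1;
        std::size_t n = std::strlen(s) + 1;
        rep_->data = new char[n];
        std::memcpy(rep_->data, s, n);
    }
    RcName(const RcName& o) : rep_(o.rep_) { ++rep_->refs; }
    RcName& operator=(const RcName& o) {
        ++o.rep_->refs;   // 1) take the new reference first
        release();        // 2) then drop the old one; safe even if &o == this
        rep_ = o.rep_;
        return *this;
    }
    ~RcName() { release(); }

    const char* c_str() const { return rep_->data; }
    int use_count() const { return rep_->refs; }
};

// Hypothetical class: a const-reference getter makes indirect
// self-assignment easy, since part.SetName(part.GetName()) does msName = msName.
class Part {
    RcName msName;
public:
    explicit Part(const char* name) : msName(name) {}
    const RcName& GetName() const { return msName; }
    void SetName(const RcName& name) { msName = name; }
};
```

With a correct `operator=`, returning `const string&` is safe again; returning by value only hid the bug at the cost of a copy per call once strings stopped being refcounted.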

Now we are afraid that the code we wrote is slow.
This is the reason for this discussion. Anyway, I am leaving the project now,
so my colleagues will test this.

Regards,
Boris
Stephen Howe
2003-09-30 17:28:18 UTC
Post by Boris Vidolov
The question is why this mechanism was dropped in the latest version of
Microsoft Visual C++. A colleague of mine joked that this way we could
compile C++ code using std::string, then C# code, whose strings are possibly
refcounted, and then say: see, the C++ code is even slower than the C# one!
C# is even faster than C++ :) This would be a good marketing trick, but
behind the scenes, what is the real reason for removing the refcounted
internal object from std::string?
AFAIK, all the major compiler vendors have removed reference counting
(copy-on-write: COW) from std::string. Why? Because it is practically
impossible to get good performance from COW strings in multithreaded code.

Have a look here:
http://www.gotw.ca/gotw/045.htm

And while there are benchmarks that show COW beating non-COW, there are
others showing the reverse. It really depends on what programmers do with
their strings.

Stephen Howe
