Skip to content

Why should we allow leaking temporaries in C++23, since we can delete them with "deducing this&&"? #2276

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
igormcoelho opened this issue May 28, 2025 · 8 comments

Comments

@igormcoelho
Copy link

Sorry if this is not the right place to propose some ideas for cpp core guidelines, but I could not find any other place to begin a discussion. This proposal tries to create a new rule that prevents common dangling pointer errors with c++23 "deducing this".

In C++, it is typical to have dangling pointer errors in std::string code using c_str(), and it seems that tooling is able to warn them in some situations, like for std::string with clang Wdangling-gsl:

std::string f1() {
  std::string s{"abc1"};
  return s;
}
// ....
const char* s1 = f1().c_str(); // clang warns with Wdangling-gsl: Object backing the pointer will be destroyed at the end of the full-expression

If user ignores the warning of Wdangling-gsl or does not use -fsanitize=address, then it may cause unexpected problems. It also seems to be hardcoded to detect only the case of c_str() from std::string, as it does not issue warnings on similar cases. Consider MyString:

struct MyString {
  std::unique_ptr<char[]> ptr;
  int n;
  explicit MyString(const char* s) {
    this->n = std::strlen(s);
    ptr = std::make_unique<char[]>(n + 1);
    std::strcpy(ptr.get(), s);
  }

  // this is equally dangerous, but will not trigger any warning!
  const char* c_str() { return ptr.get(); }

  int length() const { return n; }
};

MyString f2() {
  MyString s{"abc2"};
  return s;
}
// ...
const char *s2 = f2().c_str();  // no warning here! but problem is the same...

Since c++23 we have "deducing this" (https://wg21.link/P0847R7), I believe we are finally able to delete these dangerous dangling operations, when we are certain they will fail:

  // const char* c_str() { return ptr.get(); }
  const char* c_str(this MyString& self) { return self.ptr.get(); }
  const char* c_str(this MyString&&) = delete;

This way, this flawed code fails to compile:

  const char *s2 = f2().c_str(); // error: Call to deleted member function 'c_str'

I could not find such discussions in the P0847R7 paper itself, and also could not find any guideline here that incentives people to delete this&& methods, specially when resources are leaked.

For example, the length() operation does not seem to be so evil as c_str(), since it only leaks a copy of the size:

std::cout << "s2 len=" << f2().length() << std::endl;  // this is fine! s2 len=4

So, there's no need to delete ALL functions that are rvalued on this, only the ones that leaks internal resources. I found this on cpp core guidelines:

What to do with leaks out of temporaries? : p = (s1 + s2).c_str();

If this is still an open issue, maybe C++ std::string should implement this C++23 strategy for c_str(), and other similar cases, like vector iterators, etc, and while this does not happen, could core guidelines incentive this strategy? Or is there some unknown or serious drawback on enforcing that?

Another similar case that can be solved with this strategy is related to for-range, that also dangles:

class Resource {
public:
  const auto &items() { return data; }
private:
  std::vector<const char *> data{"A", "B", "C"};
};
// ....
  for (auto &x : getResource().items()) // PROBLEM!
    std::cout << x << std::endl;

A recent solution is to do use c++20 range init instead:

  for (auto r = getResource(); auto &x : r.items()) // Fine!
    std::cout << x << std::endl;

However, I believe the real problem here is the items() function that it's inherently flawed, and should disallow leaking internal resources when rvalued:

  const auto &items(this Resource &self) { return self.data; }
  const auto &items(this Resource &&) = delete;
  // const auto &items() { return data; }

This correctly generates an error, even with classic for-range:

  for (auto &x : getResource().items()) // error: Call to deleted member function 'items'
    std::cout << x << std::endl;

Thanks for your time, and I hope this advice can help others, or even improving the c++ STL someday.

@jwakely
Copy link
Contributor

jwakely commented May 28, 2025

Since c++23 we have "deducing this" (https://wg21.link/P0847R7), I believe we are finally able to delete these dangerous dangling operations, when we are certain they will fail:

  // const char* c_str() { return ptr.get(); }
  const char* c_str(this MyString& self) { return self.ptr.get(); }
  const char* c_str(this MyString&&) = delete;

Maybe I'm missing something, but your overload set doesn't allow calling c_str() on a const lvalue, and doesn't seem to have any benefit compared to this (which has been possible since C++11):

const char* c_str() const { return ptr.get(); }
void c_str() && = delete;

This way, this flawed code fails to compile:

  const char *s2 = f2().c_str(); // error: Call to deleted member function 'c_str'

But what about this perfectly fine code?

printf("%s", f2().c_str());

Isn't that also prevented?

@igormcoelho
Copy link
Author

Thanks for the interesting comments @jwakely, C++ is indeed quite creepy sometimes...

But what about this perfectly fine code?
printf("%s", f2().c_str());

I believe that a safe language, or at least some safe guidelines, should not consider this as a safe and valid code... I think this only works because of lifetime extension, as a corner case, not some regular case (although I'm not fully certain of this).
Wouldn't it be better to force the user to first store the object from f2() locally before using some dangerously dangling c_str() function? Maybe that's the price for safety. And it would only occur with c_str(), not .length() and other less-dangerous functions.

Maybe I'm missing something, but your overload set doesn't allow calling c_str() on a const lvalue, and doesn't seem to have any benefit compared to this (which has been possible since C++11):

It's interesting this && notation after function from C++11, I didn't know about it before!
However, I could not make this overload work:

  const char *c_str() const { return ptr.get(); }
  const char *c_str() && = delete; // Error: Cannot overload a member function with ref-qualifier '&&' with a member function without a ref-qualifierclang(ref_qualifier_overload)

If there's a way to do that on C++11 it would be much better than depending on c++23 features indeed.

@jwakely
Copy link
Contributor

jwakely commented May 28, 2025

Thanks for the interesting comments @jwakely, C++ is indeed quite creepy sometimes...

But what about this perfectly fine code?
printf("%s", f2().c_str());

I believe that a safe language, or at least some safe guidelines, should not consider this as a safe and valid code... I think this only works because of lifetime extension, as a corner case, not some regular case (although I'm not fully certain of this).

No, it is perfectly regular and it has nothing to do with lifetime extension. It's literally just normal lifetime rules, no extension involved whatsoever.

The temporary returned by f2() is alive until the end of the full expression (i.e. the ; effectively), so it is not destroyed until after printf has returned. So the pointer returned by c_str() is valid for as long as it needs to be.

Wouldn't it be better to force the user to first store the object from f2() locally before using some dangerously dangling c_str() function?

No.

There is nothing wrong with using temporaries this way, the language guarantees that it's safe.

For another example:

void use_string(std::string_view);

use_string(f1());

Should this be prevented? The function creates a string_view that refers to the same pointer as f1().c_str(). It's guaranteed to work because the std::string returned by f1() remains valid until after use_string returns. Why should a local variable by needed to make this "safe"?

std::string useless = f1();
use_string(useless);

Maybe that's the price for safety. And it would only occur with c_str(), not .length() and other less-dangerous functions.

c_str() is not dangerous when used on an object that is within its lifetime. It's only a problem when the pointer it returns outlives the object that it points to. That doesn't happen in the examples above.

Maybe I'm missing something, but your overload set doesn't allow calling c_str() on a const lvalue, and doesn't seem to have any benefit compared to this (which has been possible since C++11):

It's interesting this && notation after function from C++11, I didn't know about it before! However, I could not make this overload work:

  const char *c_str() const { return ptr.get(); }
  const char *c_str() && = delete; // Error: Cannot overload a member function with ref-qualifier '&&' with a member function without a ref-qualifierclang(ref_qualifier_overload)

If there's a way to do that on C++11 it would be much better than depending on c++23 features indeed.

The first one should be const& qualified (although I think compilers should accept it without that, but none of them implement that rule yet):

const char *c_str() const& { return ptr.get(); }
const char *c_str() && = delete; 

@igormcoelho
Copy link
Author

igormcoelho commented May 28, 2025

The first one should be const& qualified (although I think compilers should accept it without that, but none of them implement that rule yet):

Thanks for that, now this works on C++11!

Regarding the examples, it's getting even more interesting with std::string_view, that's also kind of flawed too.

Should this be prevented?

Note that the example you mentioned involves passing a std::string to std::string_view (not .c_str()), but even in this case, Yes, it must be prevented, if following the same reasoning of this post:

void use_string(std::string_view);
use_string(f1()); // this is bad!

This badness is of a similar nature as the one mentioned in the guidelines:

What to do with leaks out of temporaries? : p = (s1 + s2).c_str();

This badness looks like this one:

 std::string s1{"X"};
 std::string s2{"Y"};
 std::string_view sv3{s1 + s2}; // this is bad!

Note that current linter, at least for clang, it cannot detect this error.
To better understand the implications of this, I extended an example to another class called MyStringView:

struct MyStringView {
  const char *ptr;

  MyStringView(const char *_ptr) : ptr{_ptr} {}

  MyStringView(MyStringView &&other) : ptr{other.ptr} {}

  MyStringView(MyStringView &other) : ptr{other.ptr} {}

  MyStringView(MyString &&other) = delete; // evil
  // MyStringView(MyString &&other) : ptr{other.ptr.get()} {}

  MyStringView(MyString &other) : ptr{other.ptr.get()} {}

  friend std::ostream &operator<<(std::ostream &os, const MyStringView &me) {
    os << me.ptr;
    return os;
  }
};

My conclusion is that std::string_view should also never build upon a dangling std::string&&, as this is also risky:

MyStringView(MyString &&other) = delete; // evil

So, another conclusion could be that we should never create views upon dangling objects, only using them to std::move.
As a complete example (since it took some time for me to build it):

struct MyString {
  std::unique_ptr<char[]> ptr;
  int n;
  MyString(const MyString &other) {
    this->n = other.n;
    ptr = std::make_unique<char[]>(n + 1);
    std::strcpy(ptr.get(), other.ptr.get());
  }

  explicit MyString(const char *s) {
    this->n = std::strlen(s);
    ptr = std::make_unique<char[]>(n + 1);
    std::strcpy(ptr.get(), s);
  }

  // const char *c_str(this const MyString &self) { return self.ptr.get(); }
  // const char *c_str(this MyString &&) = delete;
  const char *c_str() const & { return ptr.get(); }
  const char *c_str() && = delete;

  int length() const { return n; }

  MyString &operator+=(const MyString &str) {
    this->append(str);
    return *this;
  }

  MyString &append(const MyString &str) {
    auto new_uptr = std::make_unique<char[]>(n + str.n + 1);
    std::strcpy(new_uptr.get(), ptr.get());
    std::strcpy(new_uptr.get() + n, str.ptr.get());
    this->n = n + str.n + 1;
    this->ptr = std::move(new_uptr);
    return *this;
  }
};

MyString operator+(const MyString &lhs, const MyString &rhs) {
  MyString result{lhs}; // copy lhs
  result += rhs;        // append rhs
  return result;
}

This correctly fails to compile:

  MyString ms1{"X"};
  MyString ms2{"Y"};
  MyString ms3 = ms1 + ms2;
  std::cout << ms3.c_str() << std::endl;
  MyStringView msv1{ms1};       // ok
  MyStringView msv3{ms1 + ms2}; // error
  std::cout << msv3 << std::endl;

In short, two sides of the same coin:
1.Never leak internals of temporaries when in rvalue&& method
2. Never build views upon internals of temporary rvalue&& objects

Finally,

It's only a problem when the pointer it returns outlives the object that it points to.

Surely, this is a trade between ergonomics and safety, so a warning is typically better than an error, unless the developer does not make mistakes of this kind. This is a common error. Maybe newer library developers can use these advices to enforce future safer code, not breaking existing std::string and std::string_view, which is likely impossible to change, but at least for simpler cases of views not risking the same problems that currently exist on std::string_view and related types.

@jwakely
Copy link
Contributor

jwakely commented May 28, 2025

But value category is not lifetime

@igormcoelho
Copy link
Author

igormcoelho commented May 28, 2025

This text is very interesting, thanks for sharing! Almost the same discussion here... author describes a String class containing a std::string and an operator std::string_view, and a TokenIterator class containing a const char*. Then it realizes that disabling operator std::string_view && resolves some issues (the same as here). Then it disables String&& constructor at TokenIterator to prevent taking a view from a corpse (same as here). I only don't understand why he went so far, to just label it "The unfortunate “solution”", since it actually resolved the issues above, same as here.
To show that "it doesn't work", then he used an obviously bad approach, which is to use const std::string& instead of a std::string_view, and then the code breaks! But code didn't break because of the prohibited && operations, it broke because const std::string& is known to be bad!

Even here in the guidelines this is enforced:

  • SL.str.2: Use std::string_view or gsl::span to refer to character sequences

http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rstr-view

  • F.16: For “in” parameters, pass cheaply-copied types by value and others by reference to const

http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#f16-for-in-parameters-pass-cheaply-copied-types-by-value-and-others-by-reference-to-const

However, the example itself should be improved, IMHO, because it contradicts itself:

void f1(const string& s);  // OK: pass by reference to const; always cheap
void f2(string s);         // bad: potentially expensive
void f3(int x);            // OK: Unbeatable
void f4(const int& x);     // bad: overhead on access in f4()

Why is "const string&" considered OK here, instead of std::string_view, since std::string_view is obviously cheap and value-based?

Finally, since passing "const string&" and "const char*" are similar in cost and behave almost the same, std::string_view should replace both... at least string_view is advised to replace "const char*" on C.49: Prefer initialization to assignment in constructors.

At this point, I just assume const std::string& is a bad approach compared to std::string_view, so if std::string_view was used, code would be correct.

--

I don't mean, like the author did, to call it our “dangling-reference-proof” String class, because nothing in C++ can protect us from badness that happens outside our scope. The intention here is to show that it is inherently bad to leak internal references on this&& and to build upon rvalue constructors of other classes to make views.

So, a simple change on the authors' code, according to these Cpp Core Guidelines, easily fixes the issue:

// TokenIterator make_iterator(const std::string &haystack) {
TokenIterator make_iterator(std::string_view haystack) {
  return TokenIterator(haystack);
}

I just added a std::string_view on the constructor of TokenIterator:

struct TokenIterator {
  const char *s_;
  explicit TokenIterator(std::string_view s) : s_(s.data()) {} // CORRECT ONE! 
  // I didn't even remove this bad one below, but C++ is "smart enough" to 
  //      do the right thing, if you at least give it the correct opportunity ;)
  explicit TokenIterator(const std::string &s) : s_(s.data()) {}
  explicit TokenIterator(const std::string &&) = delete;
  friend std::ostream &operator<<(std::ostream &os, const TokenIterator &me) {
    os << me.s_;
    return os;
  }
};

Then test() runs correctly, instead of crashing:

void test() {
  auto tk = make_iterator("hello world");
  std::cout << tk << std::endl; // compiles runs fine
}

Taking view data from a const std::string& is just as bad as taking view data from &&, since both could be dangling anyway.
So, from the text perspective, it's not fair to do some wrong thing and blame another proposal, which is completely uncorrelated with the issue generated in the end.


Just to be crystal clear that the author wrote a very nice text, just to do a silly bug in the end, to try to contradict all the good ideas presented before (and unrelated to the bug). What he wrote, instead of TokenIterator, consider a class SillyBug:

struct SillyBug {
  const char *s_;
  explicit SillyBug(const std::string &s) : s_{s.data()} {}
  friend std::ostream &operator<<(std::ostream &os, const SillyBug &me) {
    os << me.s_;
    return os;
  }
};
//
  SillyBug bug{"bug"};
  std::cout << bug << std::endl; // bad 

This code breaks because of flawed const string& (and wouldn't with std::string_view) and its unrelated to the && approach discussed before... in fact, this bug demonstrates (again!) how dangerous is to construct things based on dangling stuff like const string& or string&&.

@jwakely
Copy link
Contributor

jwakely commented May 29, 2025

This text is very interesting, thanks for sharing! Almost the same discussion here... author describes a String class containing a std::string and an operator std::string_view, and a TokenIterator class containing a const char*. Then it realizes that disabling operator std::string_view && resolves some issues (the same as here). Then it disables String&& constructor at TokenIterator to prevent taking a view from a corpse (same as here). I only don't understand why he went so far, to just label it "The unfortunate “solution”", since it actually resolved the issues above, same as here.

Because the "solution" actually causes other problems, and doesn't solve all lifetime bugs, and that's because value category is not lifetime. It is a mistake to think "lvalue good, rvalue bad". It's not that simple, and trying to solve lifetime issues by using C++ features that can only tell the difference between lvalues and rvalues is misguided. Because value category is not lifetime.

To show that "it doesn't work", then he used an obviously bad approach, which is to use const std::string& instead of a std::string_view, and then the code breaks! But code didn't break because of the prohibited && operations, it broke because const std::string& is known to be bad!

You seem to be ignoring the first problem, that print_it(string_view) no longer works. Because the "solution" prevents using any temporary strings, even when their lifetime is longer than the function invocation that uses the string_view.

Anyway, it's naive to say that const std::string& is "bad", that's far too simplistic. What if I have an API that requires a null terminated string? Then using std::string_view doesn't work. What if I need to call some other API (maybe pre-C++17) that takes a const std::string& argument? Then using std::string_view requires another allocation to create a std::string that copies the string_view, but that might be wasteful if I already have a std::string that can be passed without allocating a new copy.

Even here in the guidelines this is enforced:

* **SL.str.2: Use std::string_view or gsl::span to refer to character sequences**

But also SL.str.1: Use std::string to own character sequences.

At this point, I just assume const std::string& is a bad approach compared to std::string_view, so if std::string_view was used, code would be correct.

No, the TokenIterator would still have all the same issues with dangling.

I don't mean, like the author did, to call it our “dangling-reference-proof” String class, because nothing in C++ can protect us from badness that happens outside our scope. The intention here is to show that it is inherently bad to leak internal references on this&& and to build upon rvalue constructors of other classes to make views.

So, a simple change on the authors' code, according to these Cpp Core Guidelines, easily fixes the issue:

// TokenIterator make_iterator(const std::string &haystack) {
TokenIterator make_iterator(std::string_view haystack) {
  return TokenIterator(haystack);
}

I just added a std::string_view on the constructor of TokenIterator:

struct TokenIterator {
  const char *s_;
  explicit TokenIterator(std::string_view s) : s_(s.data()) {} // CORRECT ONE! 

What makes you think this is correct? Previously s_ was guaranteed to point to a null-terminated string, but with this change, it isn't guaranteed. When does the token iterator stop reading from s_?

// I didn't even remove this bad one below, but C++ is "smart enough" to
// do the right thing, if you at least give it the correct opportunity ;)
explicit TokenIterator(const std::string &s) : s_(s.data()) {}
explicit TokenIterator(const std::string &&) = delete;
friend std::ostream &operator<<(std::ostream &os, const TokenIterator &me) {
os << me.s_;
return os;
}
};


Then test() runs correctly, instead of crashing:

For one specific, trivial example. But your change doesn't help at all for plenty of other uses:

auto make_foo_iterator() {
  std::string s = "foo bar baz";
  return TokenIterator(s);
}

Your changes don't help at all here, neither does preventing construction from a temporary string, because there is no temporary here, and neither does construction from string_view:

auto make_foo_iterator() {
  std::string s = "foo bar baz";
  return TokenIterator(std::string_view(s)); // still bad
}
auto make_foo_iterator() {
  char s[] = "foo bar baz";
  return TokenIterator(std::string_view(s)); // also still bad
}

Taking view data from a const std::string& is just as bad as taking view data from &&, since both could be dangling anyway.

And so can a string_view.

@igormcoelho
Copy link
Author

igormcoelho commented May 29, 2025

What makes you think this is correct? Previously s_ was guaranteed to point to a null-terminated string, but with this change, it isn't guaranteed. When does the token iterator stop reading from s_?

The author didn't even implement the complete TokenIterator class...

 class TokenIterator {
    const char *s_;
public:
    explicit TokenIterator(const std::string& s) : s_(s.data()) {}
    auto& operator++() {
        // complicated logic to advance `s_` through the string
    }
};

If the size is necessary here, it can be used as a marker as well.

Anyway, it's naive to say that const std::string& is "bad", that's far too simplistic.

There are differences of course to a string_view, but in the context of this proposal is clear that const string& and string&& have equivalent problems, that's what I'm pointing out here... it makes no sense to block string&& while allowing const string& to do exactly the same damage.

You seem to be ignoring the first problem, that print_it(string_view) no longer works. Because the "solution" prevents using any temporary strings, even when their lifetime is longer than the function invocation that uses the string_view.

Note that the authors considers this "reasonable", as I don't, as pointed out above... I'll copy from the article:

void print_it(std::string_view param) {
    std::cout << param << std::endl;
}

void test() {
    String hello("hello");
    String world("world");
    print_it(hello + world);
}
/*
See, here the expression hello + world yields a temporary String object, 
and we’re trying to pass that String to a function that takes a string_view parameter. 
There’s nothing intrinsically wrong with this code. */

As pointed out above there is something intrinsically wrong with this code :

 std::string s1{"X"};
 std::string s2{"Y"};
 std::string_view sv3{s1 + s2}; // this is bad!

And so can a string_view. (dangle)

It can dangle only if its input is already dangling, but it cannot do what SillyBug does... promoting a fine static/global const char* into a dangling const string& then into some "valid" local std::string&&, to just take a view from it again "as it was" good, and then crash. This is specifically caused by const string& interaction, not with value-type string_view.

Your changes don't help at all here, neither does preventing construction from a temporary string, because there is no temporary here, and neither does construction from string_view: ... std::string s = "foo bar baz"; return TokenIterator(std::string_view(s)); // still bad

My proposed changes cannot solve this, you are correct, but it is not meant to solve all problems (since this case the string would not be a string&&). But it does solve the known problem also pointed out in the guidelines, which is a common problem:

What to do with leaks out of temporaries? : p = (s1 + s2).c_str();

That's the focus of the proposal. And I agree with you that some valid ergonomics is lost, specially this one: printf("%s", f2().c_str()); that good programmers will know that will work... but if forcing this to become a local variable allows reducing problems, I think it's a good tradeoff, specially on a language that faces increasing criticism for lack of safety.

It's hard to change standard stuff like std::string and std::string_view, but maybe, at least this could help issuing Warnings for users that they should pay attention to these situations, as currently the linters are unable to.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants