Is it just me, or does it seem like everyone who develops a programming language puts in specific pieces of syntactic sugar just to give people obscure details to debate?

Recently a post has been making the rounds that explains exactly what ||= does in Ruby, and the long and short of it is that a ||= b is the same as typing a || a = b.

I’m not really against sugar by itself, but why does it have to be so hard to understand how high-level languages handle memory? There’s a small but significant difference between a || a = b and a = a || b: the first assigns only when a is nil or false, while the second assigns every time. Most people were using the functionality without knowing which one they were getting (though they were getting the more performant of the two).
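Here’s a rough sketch of that difference. Local variables don’t make it visible, so this uses a made-up Settings class (my own invention, not from the post that was going around) whose setter counts how often it gets called.

```ruby
# Hypothetical class for illustration: it counts how often its setter runs.
class Settings
  attr_reader :name, :writes

  def initialize
    @name = nil
    @writes = 0
  end

  def name=(value)
    @writes += 1
    @name = value
  end
end

config = Settings.new
config.name ||= "john"   # name is nil, so the setter runs
config.name ||= "smith"  # name is already set, so the setter is skipped
short_circuit_writes = config.writes

config.name = config.name || "smith"  # the setter runs even though nothing changes
always_assign_writes = config.writes

puts short_circuit_writes  # 1
puts always_assign_writes  # 2
puts config.name           # john
```

The value ends up the same either way; the only thing that changes is how many times the assignment actually happens.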

Now maybe, you’d argue, we don’t need to understand how memory is allocated as part of a high level language. I disagree, but I can see why people would argue it.

So what are the things that I wish people knew about their memory allocation in Ruby?

String Concatenation: You’re doing it wrong

+= is the devil. That’s right, I’m calling out everyone (including myself) who has ever used += to do string concatenation. Why? Well let’s do a little experiment. If you don’t already know what the results are going to be, play along with me and start up IRB.

s = "john"

s.object_id

s += "smith"

s.object_id

You’ve probably figured out where I’m going with this already. += creates a new string containing "johnsmith" and points s at it, which is why the two object_id values differ. You’re not truly concatenating; you’re copying the original string and modifying the copy.
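If comparing object ids by eye feels fiddly, here is a small sketch of the same experiment that holds on to the first string and asks Ruby directly whether the two are the same object. The variable name original is mine, and the unary plus is only there so the string is mutable even when string literals are frozen.

```ruby
s = +"john"     # unary plus gives a mutable string even if literals are frozen
original = s    # hold on to the first string object
s += "smith"    # shorthand for s = s + "smith", which builds a new string

puts s                   # johnsmith
puts original            # john
puts s.equal?(original)  # false: s now names a different object
```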

But the real surprise?

Okay, so I started this by saying there’s a difference between a ||= b and a = a || b, and that is only sort of true. Yes, in the second form you’re performing an assignment every time (setting a to a whenever a is already set), but did you know that reassigning a variable to itself does not create a new object in memory?

And did you know that when you set one variable to another, nothing is copied? You’re actually creating a second reference (a pointer, if you like) to the original object in memory.
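To check the self-assignment claim, here’s a quick sketch (variable names are mine) that records the object id before and after running a = a || b.

```ruby
a = +"john"
id_before = a.object_id

a = a || "smith"   # a is truthy, so a is simply assigned to itself
id_after = a.object_id

puts id_before == id_after  # true: no new string was created
puts a                      # john
```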

Let’s run another experiment.

s = "John"

s.object_id

d = s

d.object_id

See that? Both ids are the same. Good. Now, need some real proof that how these objects are handled in memory can matter to you?

s << "smith"

puts s

puts d

Yeah, so, when I said you were doing concatenation wrong, I meant it. Both puts calls print "Johnsmith", because << modified the one string that s and d share. There’s concatenation and there’s copying. Sometimes you want to copy, so make sure you know which one you mean to use.
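Where this tends to bite in practice is method arguments, since a method receives a reference to the caller’s object. A minimal sketch, with two helper methods I made up for the purpose:

```ruby
# Hypothetical helpers for illustration only.
def append_in_place(name)
  name << "smith"   # mutates the caller's string
end

def append_as_copy(name)
  name += "smith"   # rebinds the local variable to a new string
  name
end

shared = +"John"
append_in_place(shared)
puts shared   # Johnsmith

kept = +"John"
result = append_as_copy(kept)
puts kept     # John
puts result   # Johnsmith
```

Neither version is wrong; it depends on whether the caller expects its string to change.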

Using Dup:

So what about when you need (or want) an exact copy of an object, and not a pointer to the original object? That’s where a method called dup comes into play.

Same basic experiment.

s = "john"

s.object_id

d = s.dup

d.object_id

s << "smith"

puts d

As you can see, dup created a new in-memory copy of our string, which meant that when we modified the original string the copy was left untouched (puts d still prints "john"). This is especially important if you need to heavily modify a string or array but want to maintain an original copy of the data used.
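The same idea with an array, as a small sketch: take a dup, beat up the copy, and the original order survives. The variable names are just for illustration.

```ruby
scores = [3, 1, 2]
working = scores.dup   # a separate array object holding the same values

working.sort!          # in-place sort touches only the copy
working << 4

p scores   # [3, 1, 2]
p working  # [1, 2, 3, 4]
```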

This isn’t just about strings

Arrays, Hashes, Strings, Integers: these can all be handled either through a shared reference or through a dup. Because dup hangs off of Object, nearly everything in Ruby responds to it (immutable values such as Integers have nothing to copy, so on recent Ruby versions dup simply hands back the same object). Assignment itself never copies anything; what matters is whether the method you call returns a new object or the original one, and without understanding which is happening it is very easy to get lost.
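A rough sketch of dup across a few core types. The Integer line assumes Ruby 2.4 or later, where dup on an Integer returns the object itself; older versions raised a TypeError there.

```ruby
samples = [[1, 2], { a: 1 }, +"john"]

# For each sample: its class, whether the dup has equal contents,
# and whether the dup is the very same object.
results = samples.map do |original|
  copy = original.dup
  [original.class, copy == original, copy.equal?(original)]
end

results.each do |klass, same_value, same_object|
  puts "#{klass}: same value? #{same_value}, same object? #{same_object}"
end

# Assumption: Ruby 2.4+, where Integer#dup returns self.
integer_is_shared = 42.dup.equal?(42)
puts "Integer: same object? #{integer_is_shared}"
```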

s = "john"

d = s.reverse

I just created a reversed copy of my original string in d; s itself is unchanged.

s = "john"

d = s.reverse!

And now I’ve reversed my string in place, and because reverse! returns the original object, d is a pointer to it: acting to modify d will also modify s.
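Both cases side by side, as a small sketch (the names copy and same_string are mine):

```ruby
s = +"john"

copy = s.reverse          # new string; s is untouched at this point
same_string = s.reverse!  # reverses s in place and returns s itself

puts copy.equal?(s)         # false
puts same_string.equal?(s)  # true

same_string << "!"          # modifying through one name...
puts s                      # nhoj!  ...shows up through the other
puts copy                   # nhoj
```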

Oh But the Fun Doesn’t Stop

Gotchas are always my favorite part. They’re when you think you’ve got something worked out, but you really don’t.

s = ["john"]

s.object_id

d = s.dup

d.object_id

s[0].reverse!

puts d

I bet you thought that just because you had a copy of the array, you also had a copy of the things inside that array. Nope, you had a new array object containing the same in-memory objects. How’s that for unexpected?

This means you can modify the array s (add items, remove items) without affecting array d, but if you modify objects within that array, you modify them within both s and d.
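Here’s a short sketch that puts both kinds of change next to each other: one to the array’s structure, one to an element inside it.

```ruby
s = [+"john"]
d = s.dup

s << "jane"      # structural change: only s gets the new element
p s              # ["john", "jane"]
p d              # ["john"]

s[0].reverse!    # element change: the string is shared by both arrays
p d              # ["nhoj"]

puts s[0].equal?(d[0])  # true: same string object in both arrays
```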

Because in the end, it’s shallow

Dup in this case performs what’s called a shallow copy, copying only the array structure into a new array object while the elements themselves are still shared with the original. To perform a deep copy we need to resort to the far less performant Marshal module.

s = ["John"]

d = Marshal.load(Marshal.dump(s))

s[0].object_id

d[0].object_id

Now Marshal is essentially data serialization and deserialization, and there are a lot of reasons not to use it, performance being a big one. However, so far as I know it is the easiest way to create a deep copy like the one we were looking for here.
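For comparison, here’s a rough sketch with a nested array. The deep_copy helper is a name I made up that wraps the Marshal round trip, and map(&:dup) is thrown in as a cheaper option that only goes one level down. The last few lines show one of the reasons to be wary of Marshal: it can’t serialize everything, and a lambda is one example.

```ruby
# Hypothetical helper wrapping the Marshal round trip.
def deep_copy(object)
  Marshal.load(Marshal.dump(object))
end

s = [+"John", [+"nested"]]
one_level = s.map(&:dup)   # dups each element, but not what those elements contain
deep = deep_copy(s)        # copies everything, all the way down

s[0] << "smith"
s[1][0] << "!"

p s          # ["Johnsmith", ["nested!"]]
p one_level  # ["John", ["nested!"]]
p deep       # ["John", ["nested"]]

# Marshal cannot dump a Proc, so a structure holding one cannot be copied this way.
proc_copy_failed = false
begin
  deep_copy([-> { 1 }])
rescue TypeError
  proc_copy_failed = true
end
puts proc_copy_failed  # true
```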

Conclusion

Now I haven’t touched on clone (copying tainted and frozen states) or any of the sometimes significant memory management issues that can occur when dealing with various types of collections, and there’s enough material there to fill a few pages at least. For that I’ll leave it to others who are far more versed in the black arts of object states.
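Just to give a taste of the clone side of things, here’s a minimal sketch of the two differences I’m fairly sure about: clone keeps the frozen state and any singleton methods, while dup drops both. The shout method is something I made up for the demonstration.

```ruby
original = String.new("john")

# Hypothetical singleton method, defined on this one string only.
def original.shout
  upcase
end

original.freeze

cloned = original.clone
dupped = original.dup

puts cloned.frozen?              # true:  clone keeps the frozen state
puts dupped.frozen?              # false: dup gives back a mutable copy
puts cloned.respond_to?(:shout)  # true:  clone keeps singleton methods
puts dupped.respond_to?(:shout)  # false: dup does not
```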

It’s enough to point out that memory management details, like how we reference objects in memory and how we use them, are very much a concern for any developer. Even a developer who spends his or her days in Ruby.