We all know that String is immutable in Java, but check the following code:
import java.lang.reflect.Field;

String s1 = "Hello World";
String s2 = "Hello World";
String s3 = s1.substring(6);
System.out.println(s1); // Hello World
System.out.println(s2); // Hello World
System.out.println(s3); // World

Field field = String.class.getDeclaredField("value");
field.setAccessible(true);
char[] value = (char[]) field.get(s1);
value[6] = 'J';
value[7] = 'a';
value[8] = 'v';
value[9] = 'a';
value[10] = '!';

System.out.println(s1); // Hello Java!
System.out.println(s2); // Hello Java!
System.out.println(s3); // World
Why does this program behave like this? And why are the values of s1 and s2 changed, but not s3?
String is immutable*, but this only means you cannot change it using its public API.
What you are doing here is circumventing the normal API, using reflection. In the same way, you can change the values of enums, change the lookup table used in Integer autoboxing, etc.
Now, the reason s1 and s2 change value is that they both refer to the same interned string. The compiler does this (as mentioned by other answers).
The fact that s3 does not change was actually a bit surprising to me, as I thought it would share the value array (it did in earlier versions of Java, before Java 7u6). However, looking at the source code of String, we can see that the value character array for a substring is actually copied (using Arrays.copyOfRange(..)). This is why it goes unchanged.
You can install a SecurityManager to prevent malicious code from doing such things. But keep in mind that some libraries depend on these kinds of reflection tricks (typically ORM tools, AOP libraries, etc.).
*) I initially wrote that Strings aren't really immutable, just "effectively immutable". This might be misleading in the current implementation of String, where the value array is indeed marked private final. It's still worth noting, though, that there is no way to declare an array in Java as immutable, so care must be taken not to expose it outside its class, even with the proper access modifiers.
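To illustrate the last point, here is a minimal sketch using a hypothetical Holder class (not part of the JDK): final only fixes the array reference, so an accessor that hands out the internal array lets callers change the object's state, while an accessor that returns a defensive copy does not.

```java
public class FinalArrayDemo {

    // Hypothetical class used only for illustration.
    static final class Holder {
        // 'final' fixes the reference, not the contents of the array.
        private final char[] value;

        Holder(String s) {
            this.value = s.toCharArray(); // toCharArray() returns a fresh copy
        }

        // Leaky accessor: exposes the internal array itself.
        char[] leakyValue() {
            return value;
        }

        // Safe accessor: returns a defensive copy.
        char[] safeValue() {
            return value.clone();
        }

        @Override
        public String toString() {
            return new String(value);
        }
    }

    static String mutateThroughLeak() {
        Holder h = new Holder("Hello");
        h.leakyValue()[0] = 'J'; // compiles fine: the array is mutable
        return h.toString();
    }

    static String mutateThroughCopy() {
        Holder h = new Holder("Hello");
        h.safeValue()[0] = 'J';  // only the copy changes
        return h.toString();
    }

    public static void main(String[] args) {
        System.out.println(mutateThroughLeak()); // Jello
        System.out.println(mutateThroughCopy()); // Hello
    }
}
```

String itself follows the second pattern: methods such as toCharArray() return a copy, which is why its internal array can only be reached through reflection.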
As this topic seems overwhelmingly popular, here’s some suggested further reading: Heinz Kabutz’s Reflection Madness talk from JavaZone 2009, which covers a lot of the issues in the OP, along with other reflection… well… madness.
It covers why this is sometimes useful. And why, most of the time, you should avoid it. 🙂
In Java, if two String variables are initialized to the same literal, the same reference is assigned to both variables:
String test1 = "Hello World";
String test2 = "Hello World";
System.out.println(test1 == test2); // true
That is the reason the comparison returns true. The third string is created using substring(), which makes a new string instead of pointing to the same one.
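A minimal runnable sketch of the identity checks described above. The method names are hypothetical, and == is used deliberately to compare references rather than contents. The substring result being a distinct object reflects OpenJDK's behaviour rather than a language guarantee.

```java
public class LiteralIdentityDemo {

    // Two variables initialized from the same literal share one pooled object.
    static boolean literalsShareReference() {
        String test1 = "Hello World";
        String test2 = "Hello World";
        return test1 == test2;
    }

    // substring(6) builds a new String at runtime (in OpenJDK), so it is a
    // different object from the "World" literal even though the text matches.
    static boolean substringIdenticalToLiteral() {
        String s3 = "Hello World".substring(6);
        return s3 == "World";
    }

    static boolean substringEqualToLiteral() {
        String s3 = "Hello World".substring(6);
        return s3.equals("World");
    }

    public static void main(String[] args) {
        System.out.println(literalsShareReference());      // true
        System.out.println(substringIdenticalToLiteral()); // false
        System.out.println(substringEqualToLiteral());     // true
    }
}
```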
When you access a string's value field using reflection, you get a reference to the actual backing array:
Field field = String.class.getDeclaredField("value");
field.setAccessible(true);
So a change made through this reference changes every string that points to that array, but since s3 was created as a new string by substring(), it does not change.
You are using reflection to circumvent the immutability of String – it’s a form of “attack”.
There are lots of examples you can create like this (e.g. you can even instantiate a Void object), but it doesn't mean that String is not "immutable".
There are use cases where this type of code may be used to your advantage and be “good coding”, such as clearing passwords from memory at the earliest possible moment (before GC).
Depending on the security manager, you may not be able to execute your code.
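For the password use case, the usual way to avoid needing reflection at all is to keep the secret in a char[] rather than a String and overwrite it as soon as it is no longer needed; java.io.Console.readPassword() returns a char[] for this reason. A minimal sketch, where readSecret() and check() are hypothetical stand-ins for a real input source and a real verification step:

```java
import java.util.Arrays;

public class WipeSecretDemo {

    // Hypothetical input source; a real program might use Console.readPassword().
    static char[] readSecret() {
        return new char[] {'s', '3', 'c', 'r', '3', 't'};
    }

    // Hypothetical placeholder for real verification (e.g. against a hash).
    static boolean check(char[] secret) {
        return secret.length > 0;
    }

    static char[] useAndWipe() {
        char[] secret = readSecret();
        try {
            check(secret);
        } finally {
            Arrays.fill(secret, '\0'); // overwrite the secret in place
        }
        return secret; // returned only so the wipe can be observed
    }

    public static void main(String[] args) {
        char[] wiped = useAndWipe();
        System.out.println(new String(wiped).chars().allMatch(c -> c == 0)); // true
    }
}
```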
You are using reflection to access the "implementation details" of the String object. Immutability is a feature of the public interface of an object.
Visibility modifiers and final (i.e. immutability) are not a measure against malicious code in Java; they are merely tools to protect against mistakes and to make the code more maintainable (one of the big selling points of the system). That is why you can access internal implementation details like the backing char array for Strings via reflection.
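On Java 9 and later, String stores its contents in a byte[] (compact strings), and from Java 16 the java.lang package is strongly encapsulated, so the question's snippet generally fails on a modern JDK with a ClassCastException or an InaccessibleObjectException. The same bypass can still be shown on your own code. This sketch uses a hypothetical Greeting class as a stand-in for String:

```java
import java.lang.reflect.Field;

public class ReflectionBypassDemo {

    // Hypothetical stand-in for String: a private final array that the
    // public API never modifies.
    static final class Greeting {
        private final char[] value = "Hello World".toCharArray();

        @Override
        public String toString() {
            return new String(value);
        }
    }

    static String tamper(Greeting g) {
        try {
            Field field = Greeting.class.getDeclaredField("value");
            field.setAccessible(true);            // suppress the 'private' access check
            char[] value = (char[]) field.get(g); // the real backing array, not a copy
            "Java!".getChars(0, 5, value, 6);     // overwrite indices 6..10 in place
            return g.toString();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Greeting g = new Greeting();
        System.out.println(g);         // Hello World
        System.out.println(tamper(g)); // Hello Java!
    }
}
```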
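The second effect you see is that all Strings change while it looks like you only change s1. It is a property of Java String literals that they are automatically interned, i.e. cached: two String literals with the same value will actually be the same object. When you create a String with new, it will not be interned automatically, so it is a distinct object. Whether that avoids this effect depends on whether the new object gets its own backing array: new String(char[]) copies the array, whereas new String(String) in recent JDKs may reuse the original's array.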
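A minimal sketch of the difference between a literal, a String created with new, and the result of intern(). The comparisons use == on purpose, to test object identity; the method names are hypothetical.

```java
public class InternDemo {

    // 'new' always creates a distinct object, even for identical text.
    static boolean newStringIsLiteral() {
        String literal = "Hello World";
        String created = new String("Hello World");
        return created == literal;
    }

    // intern() returns the pooled instance, which is the literal itself.
    static boolean internedIsLiteral() {
        String literal = "Hello World";
        String created = new String("Hello World");
        return created.intern() == literal;
    }

    public static void main(String[] args) {
        System.out.println(newStringIsLiteral()); // false
        System.out.println(internedIsLiteral());  // true
    }
}
```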
#substring until recently (Java 7u6) worked in a similar way, which would have explained the behaviour in the original version of your question. It didn't create a new backing char array but reused the one from the original String; it just created a new String object that used an offset and a length to present only a part of that array. This generally worked as Strings are immutable, unless you circumvent that. This property of #substring also meant that the whole original String couldn't be garbage collected while a shorter substring created from it still existed.
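The two substring strategies can be modeled with a toy class. This is a simplified sketch, not the JDK source: ToyString, sharedSubstring() and copiedSubstring() are hypothetical names, and overwriting the array in place stands in for the reflection hack from the question.

```java
import java.util.Arrays;

public class SubstringModels {

    static final class ToyString {
        final char[] value; // backing array (mutable, like the one reached via reflection)
        final int offset;   // start index into value
        final int count;    // number of chars used

        ToyString(char[] value, int offset, int count) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }

        static ToyString of(String s) {
            char[] v = s.toCharArray();
            return new ToyString(v, 0, v.length);
        }

        // Pre-7u6 style: share the backing array, remember offset and length.
        ToyString sharedSubstring(int begin) {
            return new ToyString(value, offset + begin, count - begin);
        }

        // 7u6+ style: copy just the needed range into a new array.
        ToyString copiedSubstring(int begin) {
            char[] copy = Arrays.copyOfRange(value, offset + begin, offset + count);
            return new ToyString(copy, 0, copy.length);
        }

        @Override
        public String toString() {
            return new String(value, offset, count);
        }
    }

    static String[] demo() {
        ToyString s1 = ToyString.of("Hello World");
        ToyString shared = s1.sharedSubstring(6);
        ToyString copied = s1.copiedSubstring(6);

        // Simulate the reflection hack: overwrite the backing array in place.
        "Java!".getChars(0, 5, s1.value, 6);

        return new String[] {s1.toString(), shared.toString(), copied.toString()};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo())); // [Hello Java!, Java!, World]
    }
}
```

The shared variant sees the modification, while the copied one keeps "World". The shared variant is also what kept the whole original array reachable for as long as the small substring was alive.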
As of current Java and the current version of your question, there is no strange behaviour of #substring.
String immutability holds from the interface's perspective. You are using reflection to bypass the interface and directly modify the internals of the String instances.
s1 and s2 are both changed because they are both assigned to the same "intern" String instance. You can find out a bit more about that part from this article about string equality and interning. You might be surprised to find out that in your sample code, s1 == s2 returns true.
Which version of Java are you using? From Java 1.7.0_06, Oracle changed the internal representation of String, especially how substring works.
Quoting from Oracle Tunes Java’s Internal String Representation:
In the new paradigm, the String offset and count fields have been removed, so substrings no longer share the underlying char[] value.
With this change, it may happen without reflection (???).
There are really two questions here:
- Are strings really immutable?
- Why is s3 not changed?
To point 1: Except for ROM there is no immutable memory in your computer. Nowadays even ROM is sometimes writable. There is always some code somewhere (whether it’s the kernel or native code sidestepping your managed environment) that can write to your memory address. So, in “reality”, no they are not absolutely immutable.
To point 2: This is because substring is probably allocating a new string instance, which is likely copying the array. It is possible to implement substring in such a way that it won’t do a copy, but that doesn’t mean it does. There are tradeoffs involved.
For example, should holding a reference to reallyLargeString.substring(reallyLargeString.length() - 2) cause a large amount of memory to be held alive, or only a few bytes?
That depends on how substring is implemented. A deep copy will keep less memory alive, but it will run slightly slower. A shallow copy will keep more memory alive, but it will be faster. Using a deep copy can also reduce heap fragmentation, as the string object and its buffer can be allocated in one block, as opposed to 2 separate heap allocations.
In any case, it looks like your JVM chose to use deep copies for substring calls.
To add to @haraldK's answer: this is a security hack that could have a serious impact on the app.
The first issue is that it modifies a constant string stored in the String pool. When a string is declared as String s = "Hello World";, it is placed into a special object pool for potential reuse. The compiler emits a reference to that pooled constant, so once someone modifies the pooled string at runtime, every place in the code that uses the same literal sees the modified version. This can produce bugs that show up far from where the modification happened.
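A small sketch of why the modification spreads: the same literal used in a different class resolves to the very same pooled String object, so tampering with it in one place is visible everywhere. Messages is a hypothetical class standing in for code elsewhere in the application.

```java
public class PoolAcrossClassesDemo {

    // Hypothetical class: code in another part of the application that
    // happens to use the same literal.
    static final class Messages {
        static String greeting() {
            return "Hello World";
        }
    }

    static boolean sameObjectAcrossClasses() {
        String local = "Hello World";
        return local == Messages.greeting(); // both literals resolve to one pooled String
    }

    public static void main(String[] args) {
        System.out.println(sameObjectAcrossClasses()); // true
    }
}
```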
There was another issue I experienced when I was implementing a heavy computation over such risky strings. A bug occurred roughly once in 1,000,000 runs of the computation, which made the result nondeterministic. I was able to find the problem by switching off the JIT: with the JIT turned off, I always got the same result. My guess is that this String hack broke some of the assumptions the JIT optimizations rely on.
According to the concept of pooling, all String literals with the same value point to the same object. Therefore s1 and s2, both initialized with the literal "Hello World", point to the same memory location (say M1).
On the other hand, s3 contains "World", which substring() created as a new object, so it points to a different memory location (say M2).
Now the value of s1 is changed through its char[] value array, so the contents at memory location M1, which both s1 and s2 point to, are changed.
As a result, both s1 and s2 show the modified value.
But the contents at location M2 remain unaltered, so s3 still contains the original value.
s3 does not change because, in Java, substring internally copies the characters into a new value array (using Arrays.copyOfRange()).
s1 and s2 are the same because in Java they both refer to the same interned string. It’s by design in Java.
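String is immutable, but through reflection you're allowed to bypass the access checks on its private fields. You have not redefined the String class; you have reached past its public API and mutated its internal state at runtime. In the same way, reflection lets you read and write private fields or call private methods, although it cannot change their declared modifiers.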
The sin is the line field.setAccessible(true);, which says to violate the public API by allowing access to a private field. That's a giant security hole, which can be locked down by configuring a security manager.
The phenomena in the question are implementation details that you would never see without that dangerous line of code violating the access modifiers via reflection. Clearly two (normally) immutable strings can share the same char array. Whether a substring shares the same array depends on whether it can and whether the developer chose to share it. Normally these are invisible implementation details that you should not need to know about unless you shoot the access modifier through the head with that line of code.
It is simply not a good idea to rely upon such details which cannot be experienced without violating the access modifiers using reflection. The owner of that class only supports the normal public API and is free to make implementation changes in the future.
Having said all that, the line of code is really very useful when you have a gun held to your head forcing you to do such dangerous things. Using that back door is usually a code smell indicating that you need to upgrade to better library code where you don't have to sin. Another common use of that dangerous line of code is to write a "voodoo framework" (ORM, injection container, …). Many folks get religious about such frameworks (both for and against them), so I will avoid inviting a flame war by saying nothing other than that the vast majority of programmers don't have to go there.
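String literals were historically pooled in the permanent area of the JVM heap memory (since Java 7 the pool lives in the main heap, and Java 8 removed PermGen altogether). So yes, String really is immutable and cannot be changed after being created, at least not through its public API.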
In the HotSpot JVM before Java 8, there were three types of heap memory:
1. Young generation
2. Old generation
3. Permanent generation.
When an object is created, it goes into the young generation heap area, while the PermGen area held, among other things, the String pool.
You can find more details here: How Garbage Collection works in Java.
You can get a clear view of the question "Why the String class is designed to be immutable" by reading the detailed explanation here.
Exploring the source of the String class gives a clear view of how it is designed to be immutable.