PHP, Hell or Heaven: Data types

Hell or Heaven?
PHP has multiple data types, but the main difference from other languages (especially lower-level languages such as C++) is that you can assign any value to any variable. This is also called type juggling. Therefore, in PHP the type of a variable is determined by its content, not declared by the developer or fixed by the way the variable is set up. For example, you could do the following in PHP:
$var = 1;   // integer
$var = 'a'; // string
$var = 3.0; // float
Note that the type changes in the example above: from an integer to a string and finally to a float. In statically typed languages you have to declare a type, and that type cannot change. Now, to get back to the topic (is PHP hell or heaven?), we have to answer a very important question: do we like this?
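To watch the type change, PHP's built-in gettype() reports the type of whatever a variable currently holds. A minimal sketch (note that gettype() reports floats as "double" for historical reasons):

```php
<?php
// Record the type of $var after each assignment from the example above.
$types = [];

$var = 1;
$types[] = gettype($var);   // "integer"

$var = 'a';
$types[] = gettype($var);   // "string"

$var = 3.0;
$types[] = gettype($var);   // "double" (gettype's name for float)

echo implode(' -> ', $types), PHP_EOL; // integer -> string -> double
```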
For beginners I think it’s great. You can start by learning basic skills such as control structures without having to worry about the types of variables. On the other hand, more restrictive rules will (possibly) make you a better programmer: when you can’t juggle with types, you won’t. Anyway, as you get better at PHP you will probably think more carefully about what you assign to variables, so this becomes a bit less important. For instance, I use phpDoc in every large project I work on, so I document the types of parameters like this:
/**
 * Will output name and value.
 *
 * @param string $name
 * @param int $value
 * @return void
 */
function test($name, $value)
{
    echo $name . ': ' . $value;
}
As you can see, I documented the parameter types in the docblock. Nevertheless, nothing stops me from passing other types: I could just call test(23, 1). One possible solution is to use assertions (see also Assertions, protection against yourself). If we combine phpDoc with assertions, we get:
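To make that concrete, here is a sketch that calls the docblock-only version of test() with the wrong argument types. Output buffering (ob_start()/ob_get_clean()) is used only to capture what echo prints; PHP raises no error, because to the engine a docblock is just a comment.

```php
<?php
/**
 * Will output name and value.
 *
 * @param string $name
 * @param int $value
 * @return void
 */
function test($name, $value)
{
    echo $name . ': ' . $value;
}

// The docblock says string/int, but nothing enforces it:
ob_start();
test(23, 1);            // int instead of string: accepted silently
$output = ob_get_clean();

echo $output, PHP_EOL;  // 23: 1
```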
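/**
 * Will output name and value.
 *
 * @param string $name
 * @param int $value
 * @return void
 */
function test($name, $value)
{
    // Pass the expression itself: string assertions are deprecated since PHP 7.2
    // and no longer evaluated in PHP 8, where a non-empty string always passes.
    assert(is_string($name) && is_int($value));
    echo $name . ': ' . $value;
}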
The function above only accepts arguments of the documented types, at least while assertions are enabled. That is good: if we accidentally pass wrong values, we get an error. It is also very useful for people who don’t know how your code works but (have to) use it. Then again, if we look at another language (C++ in this example), we see how it can be done much better, and more easily:
#include <iostream>
#include <string>

void test(std::string name, int value)
{
    std::cout << name << ": " << value;
}
Taking all this into account, I think PHP fails here. I like being strict, and better type handling would be an improvement. It would also relieve us from writing all kinds of boring value checks.
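For comparison, these are the kind of hand-written value checks meant here. A minimal sketch, assuming you want the checks to stay active even in production, where assertions are commonly disabled (zend.assertions = -1); the function name testChecked is hypothetical.

```php
<?php
/**
 * Will output name and value.
 *
 * @param string $name
 * @param int $value
 * @return void
 * @throws InvalidArgumentException if an argument has the wrong type
 */
function testChecked($name, $value)
{
    // Hand-written checks: unlike assert(), these cannot be switched off.
    if (!is_string($name)) {
        throw new InvalidArgumentException('$name must be a string, got ' . gettype($name));
    }
    if (!is_int($value)) {
        throw new InvalidArgumentException('$value must be an integer, got ' . gettype($value));
    }
    echo $name . ': ' . $value;
}

try {
    testChecked(23, 1);
} catch (InvalidArgumentException $e) {
    $error = $e->getMessage();
    echo $error, PHP_EOL; // $name must be a string, got integer
}
```

This is exactly the boilerplate a typed signature would make unnecessary.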
Is there a solution? Yes and no. Since PHP 5 you can use type hinting for arrays and objects: in the function signature you can declare, for these two kinds of types only, what type a parameter must have:
function example1(array $param1)
{
}

function example2(Class_Of_Some_Kind $param1)
{
}
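A minimal sketch of what those two hints enforce. Class_Of_Some_Kind is just the placeholder name from the example above, given an empty body here so the script runs. On PHP 7 and later a violation throws a TypeError that you can catch; on PHP 5 it was a catchable fatal error instead.

```php
<?php
class Class_Of_Some_Kind
{
}

function example1(array $param1)
{
    return count($param1);
}

function example2(Class_Of_Some_Kind $param1)
{
    return get_class($param1);
}

echo example1([1, 2, 3]), PHP_EOL;                // 3
echo example2(new Class_Of_Some_Kind()), PHP_EOL; // Class_Of_Some_Kind

// Passing a string where an array is required is rejected by the engine.
$rejected = false;
try {
    example1('not an array');
} catch (TypeError $e) {
    $rejected = true;
}
var_dump($rejected); // bool(true)
```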
The above is a good start (I use it very often), but you cannot yet use it for integers, floats and other scalar types. For those we have to do something else. I recently found an experimental extension that offers more possibilities for type handling: SPL Type Handling. This extension offers five classes you can use for type handling:
$integer = new SplInt(12);
$float   = new SplFloat(0.2);
$boolean = new SplBool(true);
$string  = new SplString('hello');
The fifth class, SplEnum, is abstract, so you can only use it by extending it. The same rule applies to all of these classes: if you try to assign an invalid value, an UnexpectedValueException is thrown. I have not yet used these classes in any of my projects, but I am definitely going to check them out. I don’t think they are useful for every variable, but they could be a nice addition to models. You can find more information about how to use them on the documentation page:
http://php.net/manual/en/book.spl-types.php
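SPL Types is a PECL extension, so it may not be available for your PHP version. As a rough illustration of the behaviour described above, here is a hypothetical userland class, IntValue, that mimics the idea: it only accepts integers and throws UnexpectedValueException otherwise. It is not the extension's API; the class and method names are made up for this sketch.

```php
<?php
/**
 * Hypothetical userland imitation of an integer-only value object.
 * NOT the SPL Types API: class and method names are invented for illustration.
 */
final class IntValue
{
    /** @var int */
    private $value;

    public function __construct($value)
    {
        $this->set($value);
    }

    public function set($value)
    {
        // Reject anything that is not a real int (no juggling of "12" or 12.0).
        if (!is_int($value)) {
            throw new UnexpectedValueException('Value not an integer: ' . var_export($value, true));
        }
        $this->value = $value;
    }

    public function get()
    {
        return $this->value;
    }
}

$integer = new IntValue(12);
echo $integer->get(), PHP_EOL; // 12

try {
    $integer->set('hello');
} catch (UnexpectedValueException $e) {
    echo $e->getMessage(), PHP_EOL; // Value not an integer: 'hello'
}
echo $integer->get(), PHP_EOL; // still 12, the invalid value was never stored
```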
As I said, I haven’t used them yet, so I can’t say anything about their performance. If they are slow or consume too much memory, they are useless. When I know more, I will post my findings here. For now, I think I’ve told you enough about type handling in PHP. It is time for you to decide! Do you like the way types are handled in PHP? Is type juggling useful, or is it hell? What do you think about this new extension? Let me know!
I’m going to be a comment whore today as type handling is such a contentious area in PHP.
Most of the time, PHP’s type handling is irrelevant to me as I cast everything I set that will need a comparison in future, and cast when comparing as well:
$i = (int) $var;
if ((int) $i === 1) blah();
Over-cautious? Probably, but as a beginner I was bitten by, and have since shied away from, the surprises that occur when false == 0 (I very rarely use the loose equality operator == because it can produce results I don’t expect!).
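A small sketch of the surprises mentioned above, and of how the cast-then-compare-strictly habit avoids them:

```php
<?php
// Loose comparison (==) juggles types before comparing.
var_dump(false == 0);     // bool(true)
var_dump('' == null);     // bool(true)

// Strict comparison (===) also compares types, so no juggling happens.
var_dump(false === 0);    // bool(false)

// The defensive pattern from the comment: cast first, then compare strictly.
$var = '1';               // e.g. a value read from user input
$i = (int) $var;
var_dump($i === 1);       // bool(true)
```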
As you say, I think PHP is pretty good for beginners for this reason, but it can produce confusing and unexpected results if you’re inexperienced. Personally, I think the flexibility of type juggling is a great strength of PHP’s, because you can never rely on users’ input, and the last thing you want is code that expects an integer and crashes because a user entered a string you didn’t anticipate!
On that last part I partially disagree. If you add a try-catch block, you can easily handle errors, don’t you think? And when you don’t get any errors, you know for sure that the type of a variable is what you expect.
That is also the power of these new classes: in my opinion they are not meant for validation but for storage. So if you really want an integer in a function, you could do this:
function test(SplInt $var) { }
Nevertheless, you’re absolutely right that you shouldn’t use them for raw user input; I would validate the input first with Zend_Filter or something similar.
Anyway, for someone like me who writes almost exclusively PHP CLI applications and likes OOP, they could be useful (and fun).
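The comment suggests Zend_Filter; as a framework-free stand-in, this sketch uses PHP's built-in filter_var() with FILTER_VALIDATE_INT, then throws an exception so the caller can handle bad input in a try-catch block. The helper name parseIntInput is hypothetical.

```php
<?php
/**
 * Hypothetical helper: turn raw user input into an int or throw.
 *
 * @param string $raw
 * @return int
 * @throws UnexpectedValueException
 */
function parseIntInput($raw)
{
    // filter_var() returns the integer on success and false on failure.
    $value = filter_var($raw, FILTER_VALIDATE_INT);
    if ($value === false) {
        throw new UnexpectedValueException('Not an integer: ' . $raw);
    }
    return $value;
}

foreach (['42', '4.2', 'abc'] as $raw) {
    try {
        $value = parseIntInput($raw);
        echo 'ok: ', $value, PHP_EOL;
    } catch (UnexpectedValueException $e) {
        echo 'rejected: ', $e->getMessage(), PHP_EOL;
    }
}
```

Once input has passed a check like this, it is safe to hand it to code that expects a real integer.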
No, absolutely. What I was saying is that because PHP doesn’t care what type a variable is, that can be both a blessing and a curse. Obviously, you’re correct: in most cases the logical way to handle things like this is through exceptions and filtering (a lot of the time it’s relatively easy to assume a numeric value where a user has entered a string, for example). One thing I would say, however, is that throwing and handling exceptions can be an odd concept to get your head round (speaking as someone who cut their teeth on procedural PHP as an introduction to programming in general, then made their way to OOP), but that’s not to say it’s impossible, or that it isn’t totally worth it!
I can’t say I don’t envy you for mostly writing CLI stuff – I love scripting with PHP on CLI. Have you done anything using compiled PHP in that context? Would love to read about that.
Writing something about PHP CLI is indeed a good idea. I tried compiling PHP, but objects were not supported yet, so it didn’t work for me.
However, the newest version of PHP has a new feature called PHAR files, which can be used to ‘compile’ multiple files into one. The downside is that you still have to run that file through the PHP interpreter:
> php file.phar
I’m going to gather some more resources about PHAR files and compiling, and see what comes of it.
$a = 'car';  // $a is a string
$a[0] = 'b'; // $a is still a string
echo $a;     // echoes 'bar'
(string) "false" == (int) 0 // evaluates to true in PHP 5 and 7; since PHP 8 it evaluates to false
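A runnable version of the two examples above, with straight quotes. The second comparison is version dependent: PHP 8 changed comparisons between numbers and non-numeric strings so that the number is converted to a string instead, which is why the sketch checks PHP_VERSION_ID rather than hard-coding one result.

```php
<?php
$a = 'car';       // $a is a string
$a[0] = 'b';      // string offsets can be written like array elements
echo $a, PHP_EOL; // bar

// "false" is a non-numeric string.
// PHP 5/7: "false" is converted to the number 0, so this is true.
// PHP 8+:  0 is converted to the string "0" instead, so this is false.
$looseResult = ((string) "false" == (int) 0);
var_dump($looseResult);
```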
This is why dynamic typing could be considered bad. However, assuming the developer knows the type-juggling rules of the language, this rarely poses a problem, and it provides a large degree of freedom that would otherwise only be available through casts.
Not that forcing casts is a bad thing. Essentially, when variable operations happen, most scripting languages (PHP is a scripting language) check whether the data type the operation implies matches the data type actually stored. If not, the engine works out what the value is according to the language’s conversion rules and performs the cast. In C++ or any other statically typed language, the programmer would normally specify the cast.
The issue here is speed, which any scripting language is going to struggle with. Assigning an integer value to an integer variable is extremely fast in C++, because no checking or casting happens at runtime: the types are checked at compile time. In PHP, the same operation is slower, because the engine has to track the type of every value at runtime and check it when the value is used. A further performance hit occurs if the value is, for example, a string, because the engine then has to work out how to convert it, taking into account both the context and the value itself.
None of this prevents or encourages bad programming, in my opinion. Someone struggling with types in C++ could just cast every variable in every situation to make the code compile, but that leads to horrible bugs and maintainability problems, not to mention a performance hit. You can suck at programming equally well in PHP and C++.
All that said, the examples at the top can actually save time and be useful. Replacing characters in a string by accessing them by index, like array elements, can shorten code, producing what some may consider a more elegant solution. Also, evaluating 0 as false is not unusual: other programming languages return '1' for true when a boolean is accessed as a string or integer, and '0' for false, so why shouldn’t it work the other way round? It’s a matter of opinion, really.
In the end, you’ll find that most scripting languages are designed to be flexible with variables, prizing programmer productivity over performance. That trade-off is usually the right one, except where performance is paramount, which generally means media processing or heavy math, neither of which is usually handled by scripting languages.
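To illustrate the difference between the engine converting a value for you and the programmer specifying the cast, a minimal PHP sketch:

```php
<?php
// Implicit conversion: the engine sees numeric strings in an arithmetic context.
$sum = '5' + 3;
var_dump($sum);          // int(8)

$mixed = '1.5' + 1;
var_dump($mixed);        // float(2.5)

// Explicit cast: the programmer states the target type.
$count = (int) '42';
var_dump($count);        // int(42)

$text = (string) 3;
var_dump($text);         // string(1) "3"
```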
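For reference, this is how PHP itself converts booleans, which differs slightly from the '1'/'0' convention mentioned above: true becomes "1", but false becomes the empty string when cast to a string.

```php
<?php
var_dump((string) true);   // string(1) "1"
var_dump((string) false);  // string(0) ""
var_dump((int) true);      // int(1)
var_dump((int) false);     // int(0)

// And the other way round: 0, "0" and "" are all falsy.
var_dump((bool) 0);        // bool(false)
var_dump((bool) '0');      // bool(false)
var_dump((bool) '');       // bool(false)
var_dump((bool) 'false');  // bool(true): a non-empty string other than "0"
```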