Dynamic data and SQL statements Published the 2019-06-02 > Edit: Added a more complete guide to proper anti-injection measures, > thanks to [Tari R. Alfaro's comment](https://dev.to/tarialfaro/comment/bfp9). We often need to make SQL requests to work with dynamically-provided content. For that, there is the "prepare" mechanism. From the [PHP documentation](https://www.php.net/manual/en/pdo.prepared-statements.php), it allows one to "prepare" SQL requests. This is not only provided by PDO, virtually every SQL tool have prepared statements, as "prepare" is a standard RDBMS mechanism. [prepared statements]: https://phpdelusions.net/sql_injection > If you want a more in-depth explanation of "What are prepared statements", > make sure to check out [this article][prepared statements]. ## Bad idea: directly insert dynamic data in a SQL request As seen in the `htmlspecialchars` example, there's lots of occurences on which we see dynamically-inserted data (like the example below). ```php query("SELECT * FROM users WHERE username = $user"); ``` This creates a few issues. - RDBMS won't be able to properly optimize the request - They also won't be able to pre-validate the content type of the field - This allows for very easy [SQL injections](https://en.wikipedia.org/wiki/SQL_injection). ## Good idea: Using the prepare mechanism to securely and efficiently pass dynamic data *I won't go into detail on how preparing statements is a benefit, see [the article linked above][prepared statements] for that.* Preparing statements is a very easy thing to do. - Create a request with placeholders instead of your values. Documentation for your SQL library will give you the placeholders to use. - Execute that request, passing data that should be used instead of those placeholders. Another approach is to manually bind each value before executing. In PHP, both approaches are very simple. ```php prepare("SELECT * FROM users WHERE username = ?"); $req->execute([$username]); // Example with execute-time data passing and named placeholder $req = $db->prepare("SELECT * FROM users WHERE username = :username"); $req->execute([ 'username' => $username ]); // Example with manual binding and named placeholder before execute $req = $db->prepare("SELECT * FROM users WHERE username = :username"); $req->bindParam('username', $username); $req->execute(); ``` ### Additional steps to go through The part above this one will give you a good base to work on, but to really make things as foolproof as possible, we need to have a few other tweaks and bits. #### Preparation emulation First of all, and because the preparation mechanism is a real *database construct*, we need to disable what is called "emulation" (which consists in the PDO library simulating the preparation mechanism, for DBMSs that don't have a decent preparation mechanism). To *do* that, we need to set a PDO configuration, `PDO::EMULATE_PREPARES`, to `false`. ```php $db->setAttribute(PDO::EMULATE_PREPARES, false); ``` #### Data validation A golden rule of data handling is "never trust the user". To properly handle form submission, you need another step *before* trying to even imagine inserting data into your database: validation. You won't "format" data, you won't change anything, but, for every bit of info that you received, you'll take it, and compare it against a set of rules, to make sure everything is as expected. Sounds complicated? It isn't. For a native PHP only solution, you have the `filter_var` method to work with. [validate]: https://www.php.net/manual/en/filter.filters.validate.php As the [documentation](https://www.php.net/manual/en/function.filter-var.php) shows, you have [*a lot* of different filters and rules][validate] you can use to make sure that you are receiving data you expected. Too bothersome? There are *a lot* of libraries that can greatly simplify that for you, like [this library (`siriusphp/validation`)](https://github.com/siriusphp/validation) . #### Final note for MySQL Remember, folks, that if you want to store UTF-8-encoded data in your MySQL DBMS, you need to use the type `utf8mb4`, which is the *real* UTF-8 type, instead of using `utf8`, which **is not** the real UTF-8. The `utf8` format is only encoded on 3 bytes, instead of 4, which excludes *a lot* of characters.