PHP: Bad Idea / Good Idea - Artemis' blog

File-type verification

Artemis — Mon, 25 May 2020 00:00:00 +0200

When allowing users to upload files, it's important to make sure that they are in the expected format(s).

There are lots of solutions on the Internet, but some are just plain awful: bad practices that are only still around due to misconceptions and bad habits.

Bad idea: Using the extension to check a file's format

For everyone used to knowing that a .jpeg file is an image, a .pptx file is a presentation, and an .html file is an HTML file, here's some big news: your computer doesn't give a fuck about the thing at the end of the file name.

You can call your image thisisnotanimage.mp4 and it won't change the fact that it is an image.

What does that mean?

Well, simply that if you validate the file type by checking the extension, and someone wants to upload, for example, a malicious PHP file, they can simply add an "accepted" extension.

You'd then receive a file named virus.php.png, and you'd gladly accept it!

Solutions based on file extension should simply be dropped.

Good idea: Using the mime-type to check a file's format

The first question to ask is "what does format mean?".

Basically, if the file respects a certain structure, which is tied to a certain format specification, it is recognized to be of said format.

The current "best-practice" solution to check this "structure" is the MIME type mechanism.

Basically, a MIME type is a label for a format (image/png, for example), and detecting it means checking a set of expected offsets and values for that format.

For example, a very naïve way to check if an image is a PNG file is to check if the file is at least 4 bytes long, and to check if the first 4 bytes are the byte 0x89 followed by the characters PNG.

Since a PNG file is expected to start this way, any file starting this way would be considered a valid PNG file.

Note that it's a dumbed-down rule, to keep the example simple.
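
A slightly fuller version of that naïve check can be sketched as follows. It compares the start of the file against the full 8-byte PNG signature (the byte 0x89, the letters PNG, then CR LF, 0x1A, LF). The has_png_signature() name is made up for this illustration, and a matching signature still says nothing about the rest of the file.

```php
<?php
// Sketch only: has_png_signature() is a hypothetical helper, not a PHP built-in.
function has_png_signature(string $path): bool
{
    // The 8-byte signature every PNG file starts with.
    $signature = "\x89PNG\r\n\x1a\n";

    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return false; // Unreadable file: treat it as "not a PNG".
    }
    $header = fread($handle, strlen($signature));
    fclose($handle);

    return $header === $signature;
}
```

In practice you'd rather rely on the MIME detection shipped with PHP than maintain such rules by hand.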

So, for PHP, how do you actually check a file's format?

You have several ways, the simplest being mime_content_type($filename);.

As an example, if the file you want to test is available at a path stored in $path, the following code would return the mime-type.


$mimetype = mime_content_type($path);
// The file is a PNG file, the mimetype is image/png
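
To turn that detection into an actual upload check, one option is to compare the detected type against an allow-list. The sketch below uses the finfo class from the same fileinfo extension; is_allowed_upload() and the avatar field name are assumptions made for the example.

```php
<?php
// Sketch only: is_allowed_upload() and its allow-list are hypothetical.
function is_allowed_upload(string $path, array $allowedTypes): bool
{
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $detected = $finfo->file($path); // e.g. "image/png", or false on failure

    // Strict comparison: only exact matches from the allow-list pass.
    return $detected !== false && in_array($detected, $allowedTypes, true);
}

// Usage ('avatar' is an assumed form field name):
// $ok = is_allowed_upload($_FILES['avatar']['tmp_name'], ['image/png', 'image/jpeg']);
```

Checking the temporary file itself, rather than anything the browser sent along with it, is the point here: the detected type comes from the content.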

E-mail validation

Artemis — Sun, 03 Nov 2019 00:00:00 +0100

When a user provides an email, how can you be sure that it's a valid email?

E-mail addresses follow a complicated format, which can be a pain in the ass: depending on how you validate the e-mail, you may leave out some users.

Now, for validation, there are two approaches: being lenient and being restrictive.

Bad idea: Using Regexes to validate an e-mail

Using a regex may be one of the most common choices among people who are unaware of the problems this approach causes.

Most regexes found on the Internet aim to be restrictive: they'll try to match a "common" e-mail as closely as possible, producing a lot of false negatives.

Good idea: Being lenient, and using e-mail validation instead of filtering

Before the introduction of UTF-8 in domain names, there were some well-tested methods to verify that a string matches the format of an e-mail.

For example, in PHP, the filter_var function with the FILTER_VALIDATE_EMAIL filter is perfect for this need.
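
As a minimal sketch of that built-in check: filter_var() returns the validated string on success and false on failure. The is_valid_email() wrapper is a made-up name for the example.

```php
<?php
// Sketch only: is_valid_email() is a hypothetical wrapper around filter_var().
function is_valid_email(string $email): bool
{
    // filter_var() returns the string itself when valid, false otherwise.
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}
```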

But with the diversity of formats, instead of being more and more restrictive, which produces hellish code and more test constraints, why not be more lenient?

The concept is simple: Check that the e-mail contains two strings separated by an @, which kind of look like an e-mail, and directly send this e-mail a confirmation link.

Not only will you verify that the e-mail is valid, but you'll also manage to check that it's an existing e-mail account!
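
The lenient check itself can be sketched in a few lines. The looks_like_email() name is hypothetical, and sending the confirmation link (the part that really validates the address) is left out.

```php
<?php
// Sketch only: looks_like_email() is a hypothetical helper.
function looks_like_email(string $input): bool
{
    // Use the last "@": a quoted local part may itself contain one.
    $at = strrpos($input, '@');

    // There must be at least one character on each side of the "@".
    return $at !== false && $at > 0 && $at < strlen($input) - 1;
}
```

Anything that passes gets a confirmation e-mail; anything that doesn't can be rejected right away without excluding unusual but valid addresses.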

Global variables

Artemis — Thu, 29 Aug 2019 00:00:00 +0200

Globals are variables defined at the top level of PHP scripts.

They can be accessed from within a function by explicitly using global $var.

Bad idea: Using global variables and the `global` keyword

Using globals means that your entire code is tied to some top-level variables, which means that:

the variable name cannot change
the variable content can change at any time
you can't have a function working on different instances

A common example is the following.


$db = get_db();

function get_articles() {
    global $db;
    return $db->query("SELECT id, title, author FROM articles")
        ->fetchAll(PDO::FETCH_ASSOC);
}

$articles = get_articles();

Good idea: Using parameters, or even classes

Following the example above, the most direct change you can make is simply passing $db as a parameter.

The snippet in the previous example then becomes the following.


$db = get_db();

function get_articles($db_instance) {
    // Explicitly changed name to show difference between
    // the global $db and the local.
    return $db_instance->query("SELECT id, title, author FROM articles")
        ->fetchAll(PDO::FETCH_ASSOC);
}

$articles = get_articles($db);

However, in this example, we can clearly see that the function will always interact with an instance of our DB class.

That means that, since it always works on the same object, it can become a method of a full class.


class ArticleRepository {
    // Declared explicitly: dynamic properties are deprecated since PHP 8.2.
    private $db;

    public function __construct($db) {
        $this->db = $db;
    }

    public function get_articles() {
        return $this->db->query("SELECT id, title, author FROM articles")
            ->fetchAll(PDO::FETCH_ASSOC);
    }
}

$repo = new ArticleRepository(get_db());
$articles = $repo->get_articles();

Identification and Authentication

Artemis — Sun, 18 Aug 2019 00:00:00 +0200

Identification is establishing who a given user claims to be, whereas authentication is actually confirming that this claim is true.

Those are two strictly different notions.

An email, or a username, is made to identify a user.
A password is made to authenticate this person.

I'll take the example of a users table with username and password.

Bad idea: Using passwords for identification

When you have your users table in your database, you have their username in clear text, so you can identify them (find the row associated with them).

What you mustn't do, however, is try to identify them based on their password.

Remember: a password is an authentication mechanism, not an identification one.

The following SQL request to try to log in a user is then inherently wrong.


// $username and $password contain the cleartext username and password
$stmt = $db->prepare(
    "SELECT id FROM users WHERE username = ? AND password = ?");
$stmt->execute([$username, $password]);

Not only is it wrong because it uses the password as an identification mechanism (as opposed to an authentication mechanism), but it also forces the developer to ignore password storage standards: for the comparison to work in SQL, the password has to be stored using an unsuitable, and insecure, mechanism.

Yes, I'm really hammering the difference between identification and authentication, as it's a core concept here.

Good idea: Only using the identifier for identification

As we saw, a user is identified by their username in our example above.

The good solution is then to try to find a row identified by this username, and only then to verify the password against the stored secure value.

As an example, the snippet below demonstrates a proper mechanism.


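// $username and $password contain the cleartext username and password
$stmt = $db->prepare("SELECT id, password FROM users WHERE username = ?");
// execute() returns a boolean, not the statement, so it can't be chained.
$stmt->execute([$username]);
// fetch() returns false when no row matches the username.
$row = $stmt->fetch(PDO::FETCH_ASSOC);
$can_log_in = $row !== false && password_verify($password, $row['password']);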

The $can_log_in variable will be set to true if the right username/password combination has been entered, and false otherwise.

Note that for the example's sake, we omit error handling for the request, which obviously shouldn't be omitted on a real website.
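
To make the two steps visible without a database, here is a small sketch where an array keyed by username stands in for the users table. The can_log_in() function and the sample data are assumptions made for the illustration.

```php
<?php
// Sketch only: $users stands in for the users table, keyed by username.
function can_log_in(array $users, string $username, string $password): bool
{
    // Identification: find the row for this username.
    if (!isset($users[$username])) {
        return false;
    }
    // Authentication: check the password against the stored hash.
    return password_verify($password, $users[$username]['password']);
}

$users = [
    'alice' => ['id' => 1, 'password' => password_hash('correct horse', PASSWORD_DEFAULT)],
];
```

The password never takes part in finding the row; it's only used once the row has been found.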

Password storing

Artemis — Mon, 03 Jun 2019 00:00:00 +0200

User authentication is crucial for every system that has user accounts.

Even (especially) for small websites and businesses, it is critical to always make sure to follow good practices for maximum security.

We're not talking about some hardcore stuff, though, as the OWASP cheatsheet demonstrates.

Bad idea: plain-text storing and unfit mechanisms

Every technique below is a bad idea, resulting in very poor security.

Storing passwords in plain text. Is it really necessary to explain?
Ciphering passwords, as a password should never be deciphered.
Using plain hash mechanisms, as a plain hash algorithm is not made to protect passwords, only to generate a fingerprint of some data (you can throw away md5/sha1).
Changing the encoding of the password, like Base64. You're not protecting anything, it's basically plaintext.
Stacking hash algorithms together. You'll only increase the collision risk, and it's still not made for this purpose.

As a developer, you mustn't re-develop security mechanisms like password protection. Home-made security gives you no guarantee that your system is to be trusted, unlike provided and well-audited mechanisms, which are "almost" guaranteed safe.

If you had the technical knowledge to do so in a proper and secure way, you would be working in security anyways!

Good idea: Using the provided mechanisms, or using dedicated libraries

As a golden rule, a secure password should never be seen by anyone.

So, how do you actually do that?

Well, you won't store the password, but a derived value, commonly called a hash. Note, as this can be confusing, that the value contains more than the plain hash: it also records the algorithm, its cost settings, and a random salt, and a lot more is done behind the scenes.

In PHP, the password_hash function does the job for you.

You'll store the value generated by the following code in your database.


// Example
$hashed_value_to_store = password_hash($password, PASSWORD_DEFAULT);

As of today, the best algorithm is PASSWORD_ARGON2ID (it requires a PHP build with Argon2 support).

As much as possible, you should follow the following order of preference to choose the algorithm you're gonna use. Note that you can change at any time, it won't break your website, since each stored value records the algorithm that produced it.

PASSWORD_ARGON2ID: The best choice, available from PHP 7.3 onwards

PASSWORD_ARGON2I: Second best choice, if you must maintain a legacy system

PASSWORD_DEFAULT: Third best choice, as it'll evolve towards the "current best algorithm" when you update PHP

PASSWORD_BCRYPT: Fourth best choice, to avoid if possible (using the first two instead)
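
One way to switch algorithms without breaking existing accounts is to upgrade each stored value the next time its owner logs in, using password_needs_rehash. The verify_and_upgrade() helper below is a hypothetical sketch; saving the new value to the database is left to the caller.

```php
<?php
// Sketch only: verify_and_upgrade() is a hypothetical helper.
// Returns [bool $ok, ?string $newHash]; $newHash is non-null when the stored
// value should be replaced in the database.
function verify_and_upgrade(string $password, string $storedHash, $algo = PASSWORD_DEFAULT): array
{
    if (!password_verify($password, $storedHash)) {
        return [false, null];
    }
    // True when the stored value was made with another algorithm or other options.
    if (password_needs_rehash($storedHash, $algo)) {
        return [true, password_hash($password, $algo)];
    }
    return [true, null];
}
```

The upgrade has to happen at login time because that's the only moment the cleartext password is available.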

But how do you actually verify that the password the user is providing during login is the one they entered during registration?

Before you answer, no, you won't make another password_hash and compare both results.

For that, every library provides a function, and PHP provides the nifty password_verify function.

You can simply use it like the following example.


// Example, password comes from the login form, and hashed_value comes from the
// saved entry in database
$will_be_true_if_matches = password_verify($password, $hashed_value);
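
Why hashing again and comparing can't work is easy to see in a short experiment: every password_hash call generates its own random salt. The variable names below are only for the illustration.

```php
<?php
$password = 'correct horse battery staple';

// Each call generates a fresh random salt, so the two values differ...
$first = password_hash($password, PASSWORD_DEFAULT);
$second = password_hash($password, PASSWORD_DEFAULT);
$hashes_differ = $first !== $second;

// ...yet both verify, because the salt is stored inside the value itself.
$both_verify = password_verify($password, $first)
    && password_verify($password, $second);
```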

Dynamic data and SQL statements

Artemis — Sun, 02 Jun 2019 00:00:00 +0200

Edit: Added a more complete guide to proper anti-injection measures, thanks to Tari R. Alfaro's comment.

We often need to make SQL requests to work with dynamically-provided content.

For that, there is the "prepare" mechanism.

From the PHP documentation, it allows one to "prepare" SQL requests.

This is not only provided by PDO: virtually every SQL tool has prepared statements, as "prepare" is a standard RDBMS mechanism.

If you want a more in-depth explanation of "What are prepared statements", make sure to check out this article.

Bad idea: directly insert dynamic data in a SQL request

As seen in the htmlspecialchars example, there are lots of occurrences of dynamically-inserted data (like the example below).


// Example
$user = $_POST['username'];
$db->query("SELECT * FROM users WHERE username = $user");

This creates a few issues.

The RDBMS won't be able to properly optimize the request.
It also won't be able to pre-validate the content type of the field.
It allows for very easy SQL injections.

Good idea: Using the prepare mechanism to securely and efficiently pass dynamic data

I won't go into detail on how preparing statements is a benefit, see the article linked above for that.

Preparing statements is a very easy thing to do.

Create a request with placeholders instead of your values. Documentation for your SQL library will give you the placeholders to use.
Execute that request, passing data that should be used instead of those placeholders. Another approach is to manually bind each value before executing.

In PHP, both approaches are very simple.


// Example with execute-time data passing and unnamed placeholder
$req = $db->prepare("SELECT * FROM users WHERE username = ?");
$req->execute([$username]);
// Example with execute-time data passing and named placeholder
$req = $db->prepare("SELECT * FROM users WHERE username = :username");
$req->execute([
    'username' => $username
]);
// Example with manual binding and named placeholder before execute
$req = $db->prepare("SELECT * FROM users WHERE username = :username");
$req->bindParam('username', $username);
$req->execute();
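
To see what the placeholder buys you, here is a self-contained sketch against an in-memory SQLite database. It assumes the pdo_sqlite extension is available; the table and its rows are made up for the example.

```php
<?php
// Sketch only: an in-memory SQLite database stands in for the real one
// (assumes the pdo_sqlite extension is available).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)");
$db->exec("INSERT INTO users (username) VALUES ('alice'), ('bob')");

// A classic injection attempt, as it could arrive from a form.
$username = "' OR '1'='1";

$req = $db->prepare("SELECT * FROM users WHERE username = ?");
$req->execute([$username]);
$rows = $req->fetchAll(PDO::FETCH_ASSOC);
// $rows is empty: the input was compared as a plain string, not run as SQL.
```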

Additional steps to go through

The part above this one will give you a good base to work on, but to really make things as foolproof as possible, we need to have a few other tweaks and bits.

Preparation emulation

First of all, because preparation is a real database mechanism, we need to disable what is called "emulation": PDO simulating prepared statements itself, for DBMSs that don't have a decent preparation mechanism.

To do that, we need to set a PDO attribute, PDO::ATTR_EMULATE_PREPARES, to false.

$db->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);

Data validation

A golden rule of data handling is "never trust the user".

To properly handle form submission, you need another step before trying to even imagine inserting data into your database: validation.

You won't "format" data, you won't change anything, but, for every bit of info that you received, you'll take it, and compare it against a set of rules, to make sure everything is as expected.

Sounds complicated? It isn't.

For a native, PHP-only solution, you have the filter_var function to work with.

As the documentation shows, you have a lot of different filters and rules you can use to make sure that you are receiving data you expected.
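
As a minimal sketch of such a rule, here is an integer field checked against a range with FILTER_VALIDATE_INT. The validate_age() name and the 18 to 120 range are assumptions made for the example.

```php
<?php
// Sketch only: validate_age() and the 18-120 range are illustrative.
function validate_age($input)
{
    // Returns the integer when valid, false otherwise; nothing gets "fixed".
    return filter_var($input, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 18, 'max_range' => 120],
    ]);
}
```

Note the strict comparison you need on the caller's side: check the result with === false, since a valid integer could itself be falsy under other rules.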

Too bothersome? There are a lot of libraries that can greatly simplify that for you, like this library (siriusphp/validation).

Final note for MySQL

Remember, folks, that if you want to store UTF-8-encoded data in your MySQL DBMS, you need to use the utf8mb4 character set, which is the real UTF-8, instead of using utf8, which is not the real UTF-8.

The utf8 character set only stores up to 3 bytes per character, instead of 4, which excludes a lot of characters (emoji, for example).
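
The byte counts are easy to check from PHP, since strlen counts bytes rather than characters. The DSN at the end is a hypothetical example (host and database name are placeholders) showing where the character set is requested for a PDO connection.

```php
<?php
// strlen() counts bytes, so it shows how many bytes UTF-8 needs per character.
$e_acute = "\u{00E9}";   // é
$euro    = "\u{20AC}";   // €
$emoji   = "\u{1F600}";  // 😀

$bytes = [strlen($e_acute), strlen($euro), strlen($emoji)]; // [2, 3, 4]

// Hypothetical DSN: host and dbname are placeholders.
$dsn = 'mysql:host=localhost;dbname=app;charset=utf8mb4';
```

The 4-byte character is the one that a 3-byte character set can't store.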

htmlspecialchars()

Artemis — Sat, 01 Jun 2019 00:00:00 +0200

From the PHP documentation, it converts special characters to HTML entities.

Bad idea: Using `htmlspecialchars` for "clearing" input

This function is made to transform HTML-related characters to their HTML entity counterparts, not to "clean" data before a save operation, e.g. a SQL INSERT.

We see a lot of htmlspecialchars usage for saving data into a database, which is definitely not a good thing.


// Example
$username = htmlspecialchars($_POST['username']);
$db->query("SELECT * FROM users WHERE username = '$username';");

Not only will this fail to properly prevent SQL injections, but you'll also end up modifying the stored data. You cannot revert the data to "not HTML special chars" in a reliable way.

This means that, by using htmlspecialchars here, you can't provide any "edit" system, as you won't be able to allow the user to edit the original message.

Good idea: Using `htmlspecialchars` to sanitize user-generated content

As said before, this function is made to be used when outputting content to a page. It's tasked with replacing every HTML-related character with its HTML entity counterpart.

For example, if you have a forum or a comment space, you can use this function to avoid XSS flaws.


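<?php
// Example (the markup inside the comment is an illustrative XSS payload)
$comment = 'This is a comment <script>alert("XSS")</script> to test XSS';
?>
<!-- ... -->
<?= htmlspecialchars($comment); ?>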
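
To see exactly what ends up in the page, here is a small sketch with the flags and encoding spelled out. The e() shorthand is a made-up helper, not a PHP built-in.

```php
<?php
// Sketch only: e() is a hypothetical shorthand, not a PHP built-in.
function e(string $value): string
{
    // ENT_QUOTES also escapes single quotes, which matters inside attributes.
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

$comment = 'Nice post <script>alert("XSS")</script>';
$safe = e($comment);
// $safe is: Nice post &lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;
```

The browser displays the tag as text instead of running it, and the original comment stays untouched in the database.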