<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
  <channel>
    <title>PHP: Bad Idea / Good Idea - Artemis' blog</title>
    <link>https://www.artemix.org</link>
    <description>PHP: Bad Idea / Good Idea - Artemis' blog</description>
    <atom:link xmlns:atom="http://www.w3.org/2005/Atom" href="https://www.artemix.org/blog/feeds/php-bad-idea-good-idea.xml" type="application/rss+xml" rel="self"/>
    <language>en_US</language>
    <pubDate>Fri, 31 Mar 2023 16:57:47 +0200</pubDate>
    <lastBuildDate>Fri, 31 Mar 2023 16:57:47 +0200</lastBuildDate>
    <ttl>3600</ttl>
    <item>
      <title><![CDATA[File-type verification]]></title>
      <link>https://www.artemix.org/blog/file-type-verification</link>
      <description><![CDATA[<p>When allowing users to upload files, it's important to make sure that they
are in the expected format(s).</p>
<p>There are lots of solutions on the Internet, but some are just plain awful: bad
practices that are only still around due to misconceptions and bad habits.</p>
<h2>Bad idea: Using the extension to check a file's format</h2>
<p>For everyone used to knowing that a <code>.jpeg</code> file is an image, a <code>.pptx</code> file is a
presentation file, and an <code>.html</code> file is an HTML file, here's some big news:
your computer <em>doesn't give a fuck</em> about the thing at the end of the file name.</p>
<p>You can call your image <code>thisisnotanimage.mp4</code> and it won't change the fact
that <em>it is</em> an image.</p>
<p>What does that mean?</p>
<p>Well, simply that if you validate the file type by checking the extension, and
if someone wants to upload, for example, a malicious PHP file, they can simply add
an &quot;accepted&quot; extension.</p>
<p>You'd then receive a file named <code>virus.php.png</code>, and you'd gladly accept it!</p>
<p>Solutions based on file extension should simply be dropped.</p>
<h2>Good idea: Using the mime-type to check a file's format</h2>
<p>The first question to ask is &quot;what does <em>format</em> mean?&quot;.</p>
<p>Basically, if the file respects a certain structure, which is tied to a certain
format specification, it is recognized to be of said format.</p>
<p>The current &quot;best-practice&quot; solution to check this &quot;structure&quot; is the
&quot;Mime-type&quot; mechanism.</p>
<p>Basically, a mime-type is detected by looking for a set of expected offsets and values (often called &quot;magic numbers&quot;) for a given format.</p>
<p>For example, a very naïve way to check if an image is a PNG file is to check
if the file is <em>at least</em> 4 bytes long, and to check if the first 4 bytes are
the byte <code>0x89</code> followed by the characters <code>PNG</code>.</p>
<p>A PNG file being expected to have this format, that means that any file
following this format would be considered as a valid PNG file.</p>
<blockquote>
<p>Note that it's a dumbed-down rule, to keep the example simple.</p>
</blockquote>
<p>So, for PHP, how do you actually check a file's format?</p>
<p>You have several ways, the simplest being
<a href="https://secure.php.net/manual/en/function.mime-content-type.php"><code>mime_content_type($filename);</code></a>.</p>
<p>As an example, if the file you want to test is available at a path stored in
<code>$path</code>, the following code would return the mime-type.</p>
<pre><code class="language-php"><span class="hljs-meta">&lt;?php</span>
$mimetype = mime_content_type($path);
<span class="hljs-comment">// If the file is a PNG file, $mimetype is "image/png"</span></code></pre>]]></description>
      <content:encoded><![CDATA[<p>When allowing users to upload files, it's important to make sure that they
are in the expected format(s).</p>
<p>There are lots of solutions on the Internet, but some are just plain awful: bad
practices that are only still around due to misconceptions and bad habits.</p>
<h2>Bad idea: Using the extension to check a file's format</h2>
<p>For everyone used to knowing that a <code>.jpeg</code> file is an image, a <code>.pptx</code> file is a
presentation file, and an <code>.html</code> file is an HTML file, here's some big news:
your computer <em>doesn't give a fuck</em> about the thing at the end of the file name.</p>
<p>You can call your image <code>thisisnotanimage.mp4</code> and it won't change the fact
that <em>it is</em> an image.</p>
<p>What does that mean?</p>
<p>Well, simply that if you validate the file type by checking the extension, and
if someone wants to upload, for example, a malicious PHP file, they can simply add
an &quot;accepted&quot; extension.</p>
<p>You'd then receive a file named <code>virus.php.png</code>, and you'd gladly accept it!</p>
<p>Solutions based on file extension should simply be dropped.</p>
<h2>Good idea: Using the mime-type to check a file's format</h2>
<p>The first question to ask is &quot;what does <em>format</em> mean?&quot;.</p>
<p>Basically, if the file respects a certain structure, which is tied to a certain
format specification, it is recognized to be of said format.</p>
<p>The current &quot;best-practice&quot; solution to check this &quot;structure&quot; is the
&quot;Mime-type&quot; mechanism.</p>
<p>Basically, a mime-type is detected by looking for a set of expected offsets and values (often called &quot;magic numbers&quot;) for a given format.</p>
<p>For example, a very naïve way to check if an image is a PNG file is to check
if the file is <em>at least</em> 4 bytes long, and to check if the first 4 bytes are
the byte <code>0x89</code> followed by the characters <code>PNG</code>.</p>
<p>A PNG file being expected to have this format, that means that any file
following this format would be considered as a valid PNG file.</p>
<blockquote>
<p>Note that it's a dumbed-down rule, to keep the example simple.</p>
</blockquote>
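<p>As a rough sketch of that naïve rule (slightly stricter, since it compares the full 8-byte PNG signature instead of only the first 4 bytes), the check could look like the following. The <code>looks_like_png</code> helper is a made-up name for this example, not a PHP built-in.</p>
<pre><code class="language-php">&lt;?php
// Hypothetical helper for illustration only.
// Returns true when the file starts with the 8-byte PNG signature.
function looks_like_png(string $path): bool
{
    if (!is_file($path) || !is_readable($path)) {
        return false;
    }

    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return false;
    }

    // A file shorter than 8 bytes yields a shorter string,
    // so the comparison below simply fails.
    $header = fread($handle, 8);
    fclose($handle);

    return $header === "\x89PNG\r\n\x1a\n";
}</code></pre>
<p>This only proves that the file <em>starts</em> like a PNG file, which is exactly why it's a dumbed-down rule.</p>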
<p>So, for PHP, how do you actually check a file's format?</p>
<p>You have several ways, the simplest being
<a href="https://secure.php.net/manual/en/function.mime-content-type.php"><code>mime_content_type($filename);</code></a>.</p>
<p>As an example, if the file you want to test is available at a path stored in
<code>$path</code>, the following code would return the mime-type.</p>
<pre><code class="language-php"><span class="hljs-meta">&lt;?php</span>
$mimetype = mime_content_type($path);
<span class="hljs-comment">// If the file is a PNG file, $mimetype is "image/png"</span></code></pre>]]></content:encoded>
      <guid isPermaLink="false">file-type-verification</guid>
      <pubDate>Mon, 25 May 2020 00:00:00 +0200</pubDate>
      <author>Artemis</author>
    </item>
    <item>
      <title><![CDATA[E-mail validation]]></title>
      <link>https://www.artemix.org/blog/e-mail-validation</link>
      <description><![CDATA[<p>When a user provides an email, how can you be sure that it's a valid email?</p>
<p>The fact that an email is a complicated format can be a pain in the ass,
because depending on how you validate the e-mail, you may leave out some users.</p>
<p>Now, for validation, there are two approaches: being lenient and being
restrictive.</p>
<h2>Bad idea: Using Regexes to validate an e-mail</h2>
<p>Using a regex may be one of the most common choices for people who are unaware
of the problems caused by this approach.</p>
<p>Most regexes found on the Internet aim to be restrictive: they'll try
to match a &quot;common&quot; e-mail as closely as possible, producing a lot of
false negatives.</p>
<h2>Good idea: Being lenient, and using e-mail validation instead of filtering</h2>
<p>Before the introduction of internationalized (non-ASCII) domain names, there were some well-tested
methods to verify that a string matches the format of an e-mail.</p>
<p>For example, in PHP, the
<a href="https://www.php.net/manual/en/function.filter-var.php"><code>filter_var</code></a>
function, used with the <code>FILTER_VALIDATE_EMAIL</code> filter, fits this need well.</p>
<p>But with the diversity of formats, instead of being more and more restrictive,
which produces hellish code and more test constraints,
why not be more lenient?</p>
<p>The concept is simple: check that the e-mail contains two strings separated by
an <code>@</code>, so that it roughly looks like an e-mail, and directly send a
confirmation link to this address.</p>
<p>Not only will you verify that the e-mail is valid, but you'll also manage to check
that it's an existing e-mail account!</p>]]></description>
      <content:encoded><![CDATA[<p>When a user provides an email, how can you be sure that it's a valid email?</p>
<p>The fact that an email is a complicated format can be a pain in the ass,
because depending on how you validate the e-mail, you may leave out some users.</p>
<p>Now, for validation, there are two approaches: being lenient and being
restrictive.</p>
<h2>Bad idea: Using Regexes to validate an e-mail</h2>
<p>Using a regex may be one of the most common choices for people who are unaware
of the problems caused by this approach.</p>
<p>Most regexes found on the Internet aim to be restrictive: they'll try
to match a &quot;common&quot; e-mail as closely as possible, producing a lot of
false negatives.</p>
<h2>Good idea: Being lenient, and using e-mail validation instead of filtering</h2>
<p>Before the introduction of internationalized (non-ASCII) domain names, there were some well-tested
methods to verify that a string matches the format of an e-mail.</p>
<p>For example, in PHP, the
<a href="https://www.php.net/manual/en/function.filter-var.php"><code>filter_var</code></a>
function, used with the <code>FILTER_VALIDATE_EMAIL</code> filter, fits this need well.</p>
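<p>As a quick sketch, assuming the <code>FILTER_VALIDATE_EMAIL</code> filter is the one you want, the call could look like this (the sample addresses are made up for the example).</p>
<pre><code class="language-php">&lt;?php
// Made-up sample values, for illustration only.
$candidates = ['user@example.com', 'not-an-email'];

$results = [];
foreach ($candidates as $candidate) {
    // filter_var returns the filtered value on success, and false on failure.
    $results[$candidate] = filter_var($candidate, FILTER_VALIDATE_EMAIL) !== false;
}
// 'user@example.com' is accepted, 'not-an-email' is rejected.</code></pre>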
<p>But with the diversity of formats, instead of being more and more restrictive,
which produces hellish code and more test constraints,
why not be more lenient?</p>
<p>The concept is simple: check that the e-mail contains two strings separated by
an <code>@</code>, so that it roughly looks like an e-mail, and directly send a
confirmation link to this address.</p>
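<p>A minimal sketch of that lenient check could look like the following. The <code>looks_like_email</code> helper is a made-up name, and actually sending the confirmation link is left out, since it depends on your mailing setup.</p>
<pre><code class="language-php">&lt;?php
// Hypothetical helper for illustration only.
// Accepts anything made of a non-empty part, an "@", and another non-empty part.
function looks_like_email(string $input): bool
{
    // Position of the last "@" in the string, or false when there is none.
    $at = strrpos($input, '@');

    if ($at === false || $at === 0 || $at === strlen($input) - 1) {
        // No "@" at all, nothing before it, or nothing after it.
        return false;
    }

    return true;
}</code></pre>
<p>Everything else is left for the confirmation e-mail to prove.</p>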
<p>Not only will you verify that the e-mail is valid, but you'll also manage to check
that it's an existing e-mail account!</p>]]></content:encoded>
      <guid isPermaLink="false">e-mail-validation</guid>
      <pubDate>Sun, 03 Nov 2019 00:00:00 +0100</pubDate>
      <author>Artemis</author>
    </item>
    <item>
      <title><![CDATA[Global variables]]></title>
      <link>https://www.artemix.org/blog/global-variables</link>
      <description><![CDATA[<p>Globals are variables defined at the top level of PHP scripts.</p>
<p>They can be accessed from within a function by explicitly using <code>global $var</code>.</p>
<h2>Bad idea: Using global variables and the <code>global</code> keyword</h2>
<p>Using globals means that your <em>entire</em> code is tied to some top-level variables,
which means that:</p>
<ul>
<li>the variable name cannot change</li>
<li>the variable content can change at any time</li>
<li>you can't have a function working on different instances</li>
</ul>
<p>A common example we see is the following.</p>
<pre><code class="language-php"><span class="hljs-meta">&lt;?php</span>
$db = get_db();

<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">get_articles</span><span class="hljs-params">()</span> </span>{
    <span class="hljs-keyword">global</span> $db;
    <span class="hljs-keyword">return</span> $db-&gt;query(<span class="hljs-string">"SELECT id, title, author FROM articles"</span>)
        -&gt;fetchAll(PDO::FETCH_ASSOC);
}

$articles = get_articles();</code></pre>
<h2>Good idea: Using parameters, or even classes</h2>
<p>Following the example above, the most direct change you can make is simply passing
<code>$db</code> as a parameter.</p>
<p>The snippet in the previous example then becomes the following.</p>
<pre><code class="language-php"><span class="hljs-meta">&lt;?php</span>
$db = get_db();

<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">get_articles</span><span class="hljs-params">($db_instance)</span> </span>{
    <span class="hljs-comment">// Explicitly changed name to show difference between</span>
    <span class="hljs-comment">// the global $db and the local.</span>
    <span class="hljs-keyword">return</span> $db_instance-&gt;query(<span class="hljs-string">"SELECT id, title, author FROM articles"</span>)
        -&gt;fetchAll(PDO::FETCH_ASSOC);
}

$articles = get_articles($db);</code></pre>
<p>However, in this example, we can clearly see that the function will always
interact with an instance of our DB class.</p>
<p>That means that, since it always depends on that instance, it can become a method of a full class that holds it.</p>
<pre><code class="language-php"><span class="hljs-meta">&lt;?php</span>
<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">ArticleRepository</span> </span>{
    <span class="hljs-comment">// Declared explicitly: dynamic properties are deprecated since PHP 8.2.</span>
    <span class="hljs-keyword">private</span> $db;

    <span class="hljs-keyword">public</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">__construct</span><span class="hljs-params">($db)</span> </span>{
        <span class="hljs-keyword">$this</span>-&gt;db = $db;
    }

    <span class="hljs-keyword">public</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">get_articles</span><span class="hljs-params">()</span> </span>{
        <span class="hljs-keyword">return</span> <span class="hljs-keyword">$this</span>-&gt;db-&gt;query(<span class="hljs-string">"SELECT id, title, author FROM articles"</span>)
            -&gt;fetchAll(PDO::FETCH_ASSOC);
    }
}

$repo = <span class="hljs-keyword">new</span> ArticleRepository(get_db());
$articles = $repo-&gt;get_articles();</code></pre>]]></description>
      <content:encoded><![CDATA[<p>Globals are variables defined at the top level of PHP scripts.</p>
<p>They can be accessed from within a function by explicitly using <code>global $var</code>.</p>
<h2>Bad idea: Using global variables and the <code>global</code> keyword</h2>
<p>Using globals means that your <em>entire</em> code is tied to some top-level variables,
which means that:</p>
<ul>
<li>the variable name cannot change</li>
<li>the variable content can change at any time</li>
<li>you can't have a function working on different instances</li>
</ul>
<p>A common example we see is the following.</p>
<pre><code class="language-php"><span class="hljs-meta">&lt;?php</span>
$db = get_db();

<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">get_articles</span><span class="hljs-params">()</span> </span>{
    <span class="hljs-keyword">global</span> $db;
    <span class="hljs-keyword">return</span> $db-&gt;query(<span class="hljs-string">"SELECT id, title, author FROM articles"</span>)
        -&gt;fetchAll(PDO::FETCH_ASSOC);
}

$articles = get_articles();</code></pre>
<h2>Good idea: Using parameters, or even classes</h2>
<p>Following the example above, the most direct change you can make is simply passing
<code>$db</code> as a parameter.</p>
<p>The snippet in the previous example then becomes the following.</p>
<pre><code class="language-php"><span class="hljs-meta">&lt;?php</span>
$db = get_db();

<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">get_articles</span><span class="hljs-params">($db_instance)</span> </span>{
    <span class="hljs-comment">// Explicitly changed name to show difference between</span>
    <span class="hljs-comment">// the global $db and the local.</span>
    <span class="hljs-keyword">return</span> $db_instance-&gt;query(<span class="hljs-string">"SELECT id, title, author FROM articles"</span>)
        -&gt;fetchAll(PDO::FETCH_ASSOC);
}

$articles = get_articles($db);</code></pre>
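<p>To illustrate the &quot;different instances&quot; point, here's a small sketch where the same function is reused on two separate databases. It assumes the <code>pdo_sqlite</code> extension is available, and the <code>make_test_db</code> helper and its sample rows are made up for the example.</p>
<pre><code class="language-php">&lt;?php
function get_articles($db_instance) {
    return $db_instance-&gt;query("SELECT id, title, author FROM articles")
        -&gt;fetchAll(PDO::FETCH_ASSOC);
}

// Hypothetical helper: builds a throwaway in-memory database with one article.
function make_test_db(string $title): PDO {
    // Each "sqlite::memory:" connection gets its own, separate database.
    $db = new PDO('sqlite::memory:');
    $db-&gt;setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $db-&gt;exec("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, author TEXT)");
    $stmt = $db-&gt;prepare("INSERT INTO articles (title, author) VALUES (?, ?)");
    $stmt-&gt;execute([$title, 'Artemis']);
    return $db;
}

// The same function works on two unrelated instances, no global involved.
$blog_articles = get_articles(make_test_db('First post'));
$docs_articles = get_articles(make_test_db('Install guide'));</code></pre>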
<p>However, in this example, we can clearly see that the function will always
interact with an instance of our DB class.</p>
<p>That means that, since it always depends on that instance, it can become a method of a full class that holds it.</p>
<pre><code class="language-php"><span class="hljs-meta">&lt;?php</span>
<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">ArticleRepository</span> </span>{
    <span class="hljs-comment">// Declared explicitly: dynamic properties are deprecated since PHP 8.2.</span>
    <span class="hljs-keyword">private</span> $db;

    <span class="hljs-keyword">public</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">__construct</span><span class="hljs-params">($db)</span> </span>{
        <span class="hljs-keyword">$this</span>-&gt;db = $db;
    }

    <span class="hljs-keyword">public</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">get_articles</span><span class="hljs-params">()</span> </span>{
        <span class="hljs-keyword">return</span> <span class="hljs-keyword">$this</span>-&gt;db-&gt;query(<span class="hljs-string">"SELECT id, title, author FROM articles"</span>)
            -&gt;fetchAll(PDO::FETCH_ASSOC);
    }
}

$repo = <span class="hljs-keyword">new</span> ArticleRepository(get_db());
$articles = $repo-&gt;get_articles();</code></pre>]]></content:encoded>
      <guid isPermaLink="false">global-variables</guid>
      <pubDate>Thu, 29 Aug 2019 00:00:00 +0200</pubDate>
      <author>Artemis</author>
    </item>
    <item>
      <title><![CDATA[Identification and Authentication]]></title>
      <link>https://www.artemix.org/blog/identification-and-authentication</link>
      <description><![CDATA[<p>Identification is establishing who a given user claims to be,
whereas authentication is actually confirming that this claim is true.</p>
<p>Those are two strictly different notions.</p>
<ul>
<li>An email, or a username, is made to identify a user.</li>
<li>A password is made to authenticate this person.</li>
</ul>
<blockquote>
<p>I'll take the example of a users table with username and password.</p>
</blockquote>
<h2>Bad idea: Using passwords for identification</h2>
<p>When you have your users table in your database, you have their username in
clear text, so you can identify them
(find the corresponding row associated to them).</p>
<p>What you mustn't do, however, is to try to identify them based on
<em>their password</em>.</p>
<p>Remember: a password is an <strong>authentication</strong> mechanism,
not an <strong>identification</strong> one.</p>
<p>The following SQL request to try to log in a user is then inherently wrong.</p>
<pre><code class="language-php"><span class="hljs-meta">&lt;?php</span>
<span class="hljs-comment">// $username and $password contain the cleartext username and password</span>
$stmt = $db-&gt;prepare(
    <span class="hljs-string">"SELECT id FROM users WHERE username = ? AND password = ?"</span>);
$stmt-&gt;execute([$username, $password]);</code></pre>
<p>Not only is it wrong because it uses the password as an identification mechanism
(as opposed to an authentication mechanism), but it also forces the developer
to ignore password storing standards: for the comparison to work in SQL, the password
has to be stored using an unsuitable, and insecure, mechanism.</p>
<blockquote>
<p>Yes, I'm really hammering the difference between identification and
authentication, as it's a core concept here.</p>
</blockquote>
<h2>Good idea: Only using the identifier for identification</h2>
<p>As we saw, a user is identified by their username in our example above.</p>
<p>The good solution is then to try to find a row identified by this username,
and only then to verify the password against the stored secure value.</p>
<p>As an example, the snippet below demonstrates a proper mechanism.</p>
<pre><code class="language-php"><span class="hljs-meta">&lt;?php</span>
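<span class="hljs-comment">// $username and $password contain the cleartext username and password</span>
$stmt = $db-&gt;prepare(<span class="hljs-string">"SELECT id, password FROM users WHERE username = ?"</span>);
<span class="hljs-comment">// execute() returns a boolean, so the row is fetched in a separate call</span>
$stmt-&gt;execute([$username]);
$row = $stmt-&gt;fetch(PDO::FETCH_ASSOC);
<span class="hljs-comment">// fetch() returns false when no user matches the username</span>
$can_log_in = $row !== <span class="hljs-keyword">false</span> &amp;&amp; password_verify($password, $row[<span class="hljs-string">'password'</span>]);</code></pre>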
<p>The <code>$can_log_in</code> variable will be set to true if the right username/password
combination has been entered, and false otherwise.</p>
<blockquote>
<p>Note that for the example's sake, we omit error handling for the request,
which obviously shouldn't be left out on a real website.</p>
</blockquote>]]></description>
      <content:encoded><![CDATA[<p>Identification is establishing who a given user claims to be,
whereas authentication is actually confirming that this claim is true.</p>
<p>Those are two strictly different notions.</p>
<ul>
<li>An email, or a username, is made to identify a user.</li>
<li>A password is made to authenticate this person.</li>
</ul>
<blockquote>
<p>I'll take the example of a users table with username and password.</p>
</blockquote>
<h2>Bad idea: Using passwords for identification</h2>
<p>When you have your users table in your database, you have their username in
clear text, so you can identify them
(find the corresponding row associated to them).</p>
<p>What you mustn't do, however, is to try to identify them based on
<em>their password</em>.</p>
<p>Remember: a password is an <strong>authentication</strong> mechanism,
not an <strong>identification</strong> one.</p>
<p>The following SQL request to try to log in a user is then inherently wrong.</p>
<pre><code class="language-php"><span class="hljs-meta">&lt;?php</span>
<span class="hljs-comment">// $username and $password contain the cleartext username and password</span>
$stmt = $db-&gt;prepare(
    <span class="hljs-string">"SELECT id FROM users WHERE username = ? AND password = ?"</span>);
$stmt-&gt;execute([$username, $password]);</code></pre>
<p>Not only is it wrong because it uses the password as an identification mechanism
(as opposed to an authentication mechanism), but it also forces the developer
to ignore password storing standards: for the comparison to work in SQL, the password
has to be stored using an unsuitable, and insecure, mechanism.</p>
<blockquote>
<p>Yes, I'm really hammering the difference between identification and
authentication, as it's a core concept here.</p>
</blockquote>
<h2>Good idea: Only using the identifier for identification</h2>
<p>As we saw, a user is identified by their username in our example above.</p>
<p>The good solution is then to try to find a row identified by this username,
and only then to verify the password against the stored secure value.</p>
<p>As an example, the snippet below demonstrates a proper mechanism.</p>
<pre><code class="language-php"><span class="hljs-meta">&lt;?php</span>
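<span class="hljs-comment">// $username and $password contain the cleartext username and password</span>
$stmt = $db-&gt;prepare(<span class="hljs-string">"SELECT id, password FROM users WHERE username = ?"</span>);
<span class="hljs-comment">// execute() returns a boolean, so the row is fetched in a separate call</span>
$stmt-&gt;execute([$username]);
$row = $stmt-&gt;fetch(PDO::FETCH_ASSOC);
<span class="hljs-comment">// fetch() returns false when no user matches the username</span>
$can_log_in = $row !== <span class="hljs-keyword">false</span> &amp;&amp; password_verify($password, $row[<span class="hljs-string">'password'</span>]);</code></pre>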
<p>The <code>$can_log_in</code> variable will be set to true if the right username/password
combination has been entered, and false otherwise.</p>
<blockquote>
<p>Note that for the example's sake, we omit error handling for the request,
which obviously shouldn't be left out on a real website.</p>
</blockquote>]]></content:encoded>
      <guid isPermaLink="false">identification-and-authentication</guid>
      <pubDate>Sun, 18 Aug 2019 00:00:00 +0200</pubDate>
      <author>Artemis</author>
    </item>
    <item>
      <title><![CDATA[Password storing]]></title>
      <link>https://www.artemix.org/blog/password-storing</link>
      <description><![CDATA[<p>User authentication is crucial for every system that requires user
accounts.</p>
<p>Even (<em>especially</em>) for small websites and businesses, it is <em>critical</em> to
always make sure to follow good practices for maximum security.</p>
<p>We're not talking about some hardcore stuff, though, as
<a href="https://github.com/OWASP/CheatSheetSeries/blob/master/cheatsheets/Password_Storage_Cheat_Sheet.md">the OWASP cheatsheet</a> demonstrates.</p>
<h2>Bad idea: plain-text storing and unfit mechanisms</h2>
<p>Every technique below is a bad idea, resulting in very poor security.</p>
<ul>
<li>Storing passwords in plain text: is it really necessary to explain?</li>
<li>Ciphering passwords, as a password should <em>never</em> be deciphered.</li>
<li>Using plain hash functions, as a general-purpose hash algorithm is not made to protect
passwords, only to generate a fingerprint of some data (you can throw away md5/sha1).</li>
<li>Changing the encoding of the password, like base 64. You're not protecting
anything, it's basically plaintext here.</li>
<li>Stacking hash algorithms together: you'll only increase the collision risk, and
it's still not made for this purpose.</li>
</ul>
<p>As a developer, you <em>mustn't</em> re-develop security mechanisms like password
protection.
Home-made security gives you no guarantee that your system is to be trusted,
unlike provided and well-audited mechanisms, which are &quot;almost&quot; guaranteed safe.</p>
<blockquote>
<p>If you had the technical knowledge to do so in a proper and secure way, you
would be working in security anyways!</p>
</blockquote>
<h2>Good idea: Using the provided mechanisms, or using dedicated libraries</h2>
<p>As a golden rule, a secure password should <em>never</em> be seen by anyone.</p>
<p>So, how do you actually do that?</p>
<p>Well, you won't store the password, but a derived value, commonly called a hash
(note, as this can be confusing, that the value contains more than the plain
hash, a lot more is done behind the scenes).</p>
<p><img src="/image/hash_format" alt="Image displaying the password_hash format" /></p>
<p>In PHP, the <a href="https://www.php.net/password-hash"><code>password_hash</code></a> function does
the job for you.</p>
<p>You'll store the value generated by the following code in your database.</p>
<pre><code class="language-php"><span class="hljs-meta">&lt;?php</span>
<span class="hljs-comment">// Example</span>
$hashed_value_to_store = password_hash($password, PASSWORD_DEFAULT);</code></pre>
<blockquote>
<p>As of today, the best algorithm is <code>PASSWORD_ARGON2ID</code>.</p>
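<p>As much as possible, you should use the following order of preference to choose the algorithm you're gonna use.
Note that you can change at any time, it <em>won't break your website</em>: each stored value records the algorithm that produced it, so existing passwords can still be verified.</p>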
<ul>
<li><code>PASSWORD_ARGON2ID</code>: The best choice, available from PHP 7.3 onwards</li>
<li><code>PASSWORD_ARGON2I</code>: Second best choice, if you must maintain a legacy system</li>
<li><code>PASSWORD_DEFAULT</code>: Third best choice, as it'll evolve towards the &quot;current best algorithm&quot; when you update PHP</li>
<li><code>PASSWORD_BCRYPT</code>: Fourth best choice, to avoid if possible (using the first two instead)</li>
</ul>
</blockquote>
<p>But how do you actually verify that the password the user is providing during
login is the one they entered during registration?</p>
<p>Before you answer, <strong>no</strong>, you won't make another <code>password_hash</code> and compare
both results.</p>
<p>For that, every library provides a function, and PHP provides the nifty
<a href="https://www.php.net/manual/en/function.password-verify.php">password_verify</a>
function.</p>
<p>You can simply use it like the following example.</p>
<pre><code class="language-php"><span class="hljs-meta">&lt;?php</span>
<span class="hljs-comment">// Example, password comes from the login form, and hashed_value comes from the</span>
<span class="hljs-comment">// saved entry in database</span>
$will_be_true_if_matches = password_verify($password, $hashed_value);</code></pre>]]></description>
      <content:encoded><![CDATA[<p>User authentication is crucial for every system that requires user
accounts.</p>
<p>Even (<em>especially</em>) for small websites and businesses, it is <em>critical</em> to
always make sure to follow good practices for maximum security.</p>
<p>We're not talking about some hardcore stuff, though, as
<a href="https://github.com/OWASP/CheatSheetSeries/blob/master/cheatsheets/Password_Storage_Cheat_Sheet.md">the OWASP cheatsheet</a> demonstrates.</p>
<h2>Bad idea: plain-text storing and unfit mechanisms</h2>
<p>Every technique below is a bad idea, resulting in very poor security.</p>
<ul>
<li>Storing passwords in plain text: is it really necessary to explain?</li>
<li>Ciphering passwords, as a password should <em>never</em> be deciphered.</li>
<li>Using plain hash functions, as a general-purpose hash algorithm is not made to protect
passwords, only to generate a fingerprint of some data (you can throw away md5/sha1).</li>
<li>Changing the encoding of the password, like base 64. You're not protecting
anything, it's basically plaintext here.</li>
<li>Stacking hash algorithms together: you'll only increase the collision risk, and
it's still not made for this purpose.</li>
</ul>
<p>As a developer, you <em>mustn't</em> re-develop security mechanisms like password
protection.
Home-made security gives you no guarantee that your system is to be trusted,
unlike provided and well-audited mechanisms, which are &quot;almost&quot; guaranteed safe.</p>
<blockquote>
<p>If you had the technical knowledge to do so in a proper and secure way, you
would be working in security anyways!</p>
</blockquote>
<h2>Good idea: Using the provided mechanisms, or using dedicated libraries</h2>
<p>As a golden rule, a secure password should <em>never</em> be seen by anyone.</p>
<p>So, how do you actually do that?</p>
<p>Well, you won't store the password, but a derived value, commonly called a hash
(note, as this can be confusing, that the value contains more than the plain
hash, a lot more is done behind the scenes).</p>
<p><img src="/image/hash_format" alt="Image displaying the password_hash format" /></p>
<p>In PHP, the <a href="https://www.php.net/password-hash"><code>password_hash</code></a> function does
the job for you.</p>
<p>You'll store the value generated by the following code in your database.</p>
<pre><code class="language-php"><span class="hljs-meta">&lt;?php</span>
<span class="hljs-comment">// Example</span>
$hashed_value_to_store = password_hash($password, PASSWORD_DEFAULT);</code></pre>
<blockquote>
<p>As of today, the best algorithm is <code>PASSWORD_ARGON2ID</code>.</p>
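<p>As much as possible, you should use the following order of preference to choose the algorithm you're gonna use.
Note that you can change at any time, it <em>won't break your website</em>: each stored value records the algorithm that produced it, so existing passwords can still be verified.</p>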
<ul>
<li><code>PASSWORD_ARGON2ID</code>: The best choice, available from PHP 7.3 onwards</li>
<li><code>PASSWORD_ARGON2I</code>: Second best choice, if you must maintain a legacy system</li>
<li><code>PASSWORD_DEFAULT</code>: Third best choice, as it'll evolve towards the &quot;current best algorithm&quot; when you update PHP</li>
<li><code>PASSWORD_BCRYPT</code>: Fourth best choice, to avoid if possible (using the first two instead)</li>
</ul>
</blockquote>
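<p>As a sketch of how such a change can be rolled out, PHP's <code>password_needs_rehash</code> function tells you whether a stored value was made with the settings you now want. The example below only raises the bcrypt cost so that it runs on any PHP build (swapping in another algorithm constant works the same way), and it relies on <code>password_verify</code>, which is introduced just below. The sample password, variable names, and cost values are assumptions made for the example.</p>
<pre><code class="language-php">&lt;?php
// Assumption for this sketch: the stored value was created earlier with a lower bcrypt cost.
$password = 'correct horse battery staple';
$stored_hash = password_hash($password, PASSWORD_BCRYPT, ['cost' =&gt; 10]);

// The settings you want to move to.
$target_algorithm = PASSWORD_BCRYPT;
$target_options = ['cost' =&gt; 12];

$needs_upgrade = false;
if (password_verify($password, $stored_hash)) {
    // Old values still verify, because each one records its own algorithm and settings.
    $needs_upgrade = password_needs_rehash($stored_hash, $target_algorithm, $target_options);
    if ($needs_upgrade) {
        // On a real website, save this new value in place of the old one.
        $stored_hash = password_hash($password, $target_algorithm, $target_options);
    }
}</code></pre>
<p>Doing this at login time is convenient, as it's the only moment you have the cleartext password at hand.</p>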
<p>But how do you actually verify that the password the user is providing during
login is the one they entered during registration?</p>
<p>Before you answer, <strong>no</strong>, you won't make another <code>password_hash</code> and compare
both results.</p>
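<p>To see why, here's a small sketch (the sample password is made up): hashing the same password twice gives two different values, since a fresh random salt is generated on every call.</p>
<pre><code class="language-php">&lt;?php
// Made-up sample password, for illustration only.
$password = 'correct horse battery staple';

$first_hash = password_hash($password, PASSWORD_DEFAULT);
$second_hash = password_hash($password, PASSWORD_DEFAULT);

// Each call uses its own random salt, so the two values differ
// and a plain comparison can never be used to check a password.
$hashes_are_identical = ($first_hash === $second_hash); // false</code></pre>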
<p>For that, every library provides a function, and PHP provides the nifty
<a href="https://www.php.net/manual/en/function.password-verify.php">password_verify</a>
function.</p>
<p>You can simply use it like the following example.</p>
<pre><code class="language-php"><span class="hljs-meta">&lt;?php</span>
<span class="hljs-comment">// Example, password comes from the login form, and hashed_value comes from the</span>
<span class="hljs-comment">// saved entry in database</span>
$will_be_true_if_matches = password_verify($password, $hashed_value);</code></pre>]]></content:encoded>
      <guid isPermaLink="false">password-storing</guid>
      <pubDate>Mon, 03 Jun 2019 00:00:00 +0200</pubDate>
      <author>Artemis</author>
    </item>
    <item>
      <title><![CDATA[Dynamic data and SQL statements]]></title>
      <link>https://www.artemix.org/blog/dynamic-data-and-sql-statements</link>
      <description><![CDATA[<blockquote>
<p>Edit: Added a more complete guide to proper anti-injection measures,
thanks to <a href="https://dev.to/tarialfaro/comment/bfp9">Tari R. Alfaro's comment</a>.</p>
</blockquote>
<p>We often need to make SQL requests to work with dynamically-provided content.</p>
<p>For that, there is the &quot;prepare&quot; mechanism.</p>
<p>From the <a href="https://www.php.net/manual/en/pdo.prepared-statements.php">PHP documentation</a>,
it allows one to &quot;prepare&quot; SQL requests.</p>
<p>This isn't specific to PDO: virtually every SQL tool has prepared
statements, as &quot;prepare&quot; is a standard RDBMS mechanism.</p>
<blockquote>
<p>If you want a more in-depth explanation of &quot;What are prepared statements&quot;,
make sure to check out <a href="https://phpdelusions.net/sql_injection">this article</a>.</p>
</blockquote>
<h2>Bad idea: directly insert dynamic data in a SQL request</h2>
<p>As seen in the <code>htmlspecialchars</code> example, there are lots of occurrences in which
we see dynamically-inserted data (like the example below).</p>
<pre><code class="language-php"><span class="hljs-meta">&lt;?php</span>
<span class="hljs-comment">// Example</span>
$user = $_POST[<span class="hljs-string">'username'</span>];
$db-&gt;query(<span class="hljs-string">"SELECT * FROM users WHERE username = $user"</span>);</code></pre>
<p>This creates a few issues.</p>
<ul>
<li>The RDBMS won't be able to properly optimize the request</li>
<li>It also won't be able to pre-validate the content type of the field</li>
<li>This allows for very easy
<a href="https://en.wikipedia.org/wiki/SQL_injection">SQL injections</a>.</li>
</ul>
<h2>Good idea: Using the prepare mechanism to securely and efficiently pass dynamic data</h2>
<p><em>I won't go into detail on how preparing statements is a benefit, see
<a href="https://phpdelusions.net/sql_injection">the article linked above</a> for that.</em></p>
<p>Preparing statements is a very easy thing to do.</p>
<ul>
<li>Create a request with placeholders instead of your values.
Documentation for your SQL library will give you the placeholders to use.</li>
<li>Execute that request, passing data that should be used instead of those
placeholders.
Another approach is to manually bind each value before executing.</li>
</ul>
<p>In PHP, both approaches are very simple.</p>
<pre><code class="language-php"><span class="hljs-meta">&lt;?php</span>
<span class="hljs-comment">// Example with execute-time data passing and unnamed placeholder</span>
$req = $db-&gt;prepare(<span class="hljs-string">"SELECT * FROM users WHERE username = ?"</span>);
$req-&gt;execute([$username]);
<span class="hljs-comment">// Example with execute-time data passing and named placeholder</span>
$req = $db-&gt;prepare(<span class="hljs-string">"SELECT * FROM users WHERE username = :username"</span>);
$req-&gt;execute([
    <span class="hljs-string">'username'</span> =&gt; $username
]);
<span class="hljs-comment">// Example with manual binding and named placeholder before execute</span>
$req = $db-&gt;prepare(<span class="hljs-string">"SELECT * FROM users WHERE username = :username"</span>);
$req-&gt;bindParam(<span class="hljs-string">'username'</span>, $username);
$req-&gt;execute();</code></pre>
<h3>Additional steps to go through</h3>
<p>The part above this one will give you a good base to work on, but to really
make things as foolproof as possible,
we need to have a few other tweaks and bits.</p>
<h4>Preparation emulation</h4>
<p>First of all, and because the preparation mechanism is a real
<em>database construct</em>,
we need to disable what is called &quot;emulation&quot; (which consists in the PDO library
simulating the preparation mechanism, for DBMSs that don't have a decent
preparation mechanism).</p>
<p>To <em>do</em> that, we need to set a PDO attribute, <code>PDO::ATTR_EMULATE_PREPARES</code>,
to <code>false</code>.</p>
<pre><code class="language-php">$db-&gt;setAttribute(PDO::ATTR_EMULATE_PREPARES, <span class="hljs-keyword">false</span>);</code></pre>
<h4>Data validation</h4>
<p>A golden rule of data handling is &quot;never trust the user&quot;.</p>
<p>To properly handle form submission, you need another step <em>before</em> trying to
even imagine inserting data into your database: validation.</p>
<p>You won't &quot;format&quot; data, you won't change anything, but, for every bit of info
that you received, you'll take it, and compare it against a set of rules,
to make sure everything is as expected.</p>
<p>Sounds complicated? It isn't.</p>
<p>For a native, PHP-only solution, you have the <code>filter_var</code>
function to work with.</p>
<p>As the <a href="https://www.php.net/manual/en/function.filter-var.php">documentation</a>
shows, you have <a href="https://www.php.net/manual/en/filter.filters.validate.php"><em>a lot</em> of different filters and rules</a>
you can use to make sure that you are receiving the data you expect.</p>
<p>Too bothersome? There are <em>a lot</em> of libraries that can greatly simplify that
for you, like
<a href="https://github.com/siriusphp/validation">this library (<code>siriusphp/validation</code>)</a>.</p>
<h4>Final note for MySQL</h4>
<p>Remember, folks, that if you want to store UTF-8-encoded data in your
MySQL DBMS, you need to use the <code>utf8mb4</code> character set, which is the <em>real</em> UTF-8,
instead of <code>utf8</code>, which <strong>is not</strong> the real UTF-8.</p>
<p>The <code>utf8</code> character set only stores up to 3 bytes per character, instead of 4, which excludes <em>a lot</em>
of characters (emoji, for example).</p>]]></description>
      <content:encoded><![CDATA[<blockquote>
<p>Edit: Added a more complete guide to proper anti-injection measures,
thanks to <a href="https://dev.to/tarialfaro/comment/bfp9">Tari R. Alfaro's comment</a>.</p>
</blockquote>
<p>We often need to make SQL requests to work with dynamically-provided content.</p>
<p>For that, there is the &quot;prepare&quot; mechanism.</p>
<p>From the <a href="https://www.php.net/manual/en/pdo.prepared-statements.php">PHP documentation</a>,
it allows one to &quot;prepare&quot; SQL requests.</p>
<p>This isn't specific to PDO: virtually every SQL tool has prepared
statements, as &quot;prepare&quot; is a standard RDBMS mechanism.</p>
<blockquote>
<p>If you want a more in-depth explanation of &quot;What are prepared statements&quot;,
make sure to check out <a href="https://phpdelusions.net/sql_injection">this article</a>.</p>
</blockquote>
<h2>Bad idea: directly insert dynamic data in a SQL request</h2>
<p>As seen in the <code>htmlspecialchars</code> example, there are lots of occurrences in which
we see dynamically-inserted data (like the example below).</p>
<pre><code class="language-php"><span class="hljs-meta">&lt;?php</span>
<span class="hljs-comment">// Example</span>
$user = $_POST[<span class="hljs-string">'username'</span>];
$db-&gt;query(<span class="hljs-string">"SELECT * FROM users WHERE username = $user"</span>);</code></pre>
<p>This creates a few issues.</p>
<ul>
<li>The RDBMS won't be able to properly optimize the request</li>
<li>It also won't be able to pre-validate the content type of the field</li>
<li>This allows for very easy
<a href="https://en.wikipedia.org/wiki/SQL_injection">SQL injections</a>.</li>
</ul>
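<p>To make that last point concrete, here's a small sketch of what the database
ends up receiving when someone types a carefully chosen &quot;username&quot; (the value
below is a made-up example).</p>
<pre><code class="language-php"><span class="hljs-meta">&lt;?php</span>
<span class="hljs-comment">// Hypothetical value an attacker could type in the "username" field</span>
$user = <span class="hljs-string">"'' OR 1 = 1"</span>;
<span class="hljs-comment">// Same string building as in the example above</span>
$sql = <span class="hljs-string">"SELECT * FROM users WHERE username = $user"</span>;
echo $sql;
<span class="hljs-comment">// SELECT * FROM users WHERE username = '' OR 1 = 1</span></code></pre>
<p>The condition is now always true, so the request matches every row of the
table instead of a single user.</p>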
<h2>Good idea: Using the prepare mechanism to securely and efficiently pass dynamic data</h2>
<p><em>I won't go into detail on how preparing statements is a benefit, see
<a href="https://phpdelusions.net/sql_injection">the article linked above</a> for that.</em></p>
<p>Preparing statements is a very easy thing to do.</p>
<ul>
<li>Create a request with placeholders instead of your values.
Documentation for your SQL library will give you the placeholders to use.</li>
<li>Execute that request, passing data that should be used instead of those
placeholders.
Another approach is to manually bind each value before executing.</li>
</ul>
<p>In PHP, both approaches are very simple.</p>
<pre><code class="language-php"><span class="hljs-meta">&lt;?php</span>
<span class="hljs-comment">// Example with execute-time data passing and unnamed placeholder</span>
$req = $db-&gt;prepare(<span class="hljs-string">"SELECT * FROM users WHERE username = ?"</span>);
$req-&gt;execute([$username]);
<span class="hljs-comment">// Example with execute-time data passing and named placeholder</span>
$req = $db-&gt;prepare(<span class="hljs-string">"SELECT * FROM users WHERE username = :username"</span>);
$req-&gt;execute([
    <span class="hljs-string">'username'</span> =&gt; $username
]);
<span class="hljs-comment">// Example with manual binding and named placeholder before execute</span>
$req = $db-&gt;prepare(<span class="hljs-string">"SELECT * FROM users WHERE username = :username"</span>);
$req-&gt;bindParam(<span class="hljs-string">'username'</span>, $username);
$req-&gt;execute();</code></pre>
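<p>If you want to try it out without a database server, here's a minimal sketch
using an in-memory SQLite database. It assumes the <code>pdo_sqlite</code> extension is
enabled, and the table and rows are made up for the example.</p>
<pre><code class="language-php"><span class="hljs-meta">&lt;?php</span>
<span class="hljs-comment">// Assumption: pdo_sqlite is available; the users table only exists for this demo</span>
$db = <span class="hljs-keyword">new</span> PDO(<span class="hljs-string">'sqlite::memory:'</span>);
$db-&gt;setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db-&gt;exec(<span class="hljs-string">"CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)"</span>);
$db-&gt;exec(<span class="hljs-string">"INSERT INTO users (username) VALUES ('alice'), ('bob')"</span>);
$req = $db-&gt;prepare(<span class="hljs-string">"SELECT * FROM users WHERE username = ?"</span>);
<span class="hljs-comment">// A normal lookup finds the matching row</span>
$req-&gt;execute([<span class="hljs-string">'alice'</span>]);
var_dump(count($req-&gt;fetchAll())); <span class="hljs-comment">// int(1)</span>
<span class="hljs-comment">// An injection attempt is only ever treated as an (unknown) username</span>
$req-&gt;execute([<span class="hljs-string">"'' OR 1 = 1"</span>]);
var_dump(count($req-&gt;fetchAll())); <span class="hljs-comment">// int(0)</span></code></pre>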
<h3>Additional steps to go through</h3>
<p>The part above this one will give you a good base to work on, but to really
make things as foolproof as possible,
we need to have a few other tweaks and bits.</p>
<h4>Preparation emulation</h4>
<p>First of all, and because the preparation mechanism is a real
<em>database construct</em>,
we need to disable what is called &quot;emulation&quot; (which consists in the PDO library
simulating the preparation mechanism, for DBMSs that don't have a decent
preparation mechanism).</p>
<p>To <em>do</em> that, we need to set a PDO attribute, <code>PDO::ATTR_EMULATE_PREPARES</code>,
to <code>false</code>.</p>
<pre><code class="language-php">$db-&gt;setAttribute(PDO::ATTR_EMULATE_PREPARES, <span class="hljs-keyword">false</span>);</code></pre>
<h4>Data validation</h4>
<p>A golden rule of data handling is &quot;never trust the user&quot;.</p>
<p>To properly handle form submission, you need another step <em>before</em> trying to
even imagine inserting data into your database: validation.</p>
<p>You won't &quot;format&quot; data, you won't change anything, but, for every bit of info
that you received, you'll take it, and compare it against a set of rules,
to make sure everything is as expected.</p>
<p>Sounds complicated? It isn't.</p>
<p>For a native, PHP-only solution, you have the <code>filter_var</code>
function to work with.</p>
<p>As the <a href="https://www.php.net/manual/en/function.filter-var.php">documentation</a>
shows, you have <a href="https://www.php.net/manual/en/filter.filters.validate.php"><em>a lot</em> of different filters and rules</a>
you can use to make sure that you are receiving the data you expect.</p>
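<p>As a small sketch of what that looks like (the field values are hard-coded,
made-up examples instead of real form data), <code>filter_var</code> returns the value
when it passes the rule, and <code>false</code> when it doesn't.</p>
<pre><code class="language-php"><span class="hljs-meta">&lt;?php</span>
<span class="hljs-comment">// Hypothetical form fields, hard-coded here instead of read from $_POST</span>
$email = <span class="hljs-string">'user@example.com'</span>;
$age = <span class="hljs-string">'42'</span>;
<span class="hljs-comment">// A valid value is returned as-is, an invalid one gives false</span>
var_dump(filter_var($email, FILTER_VALIDATE_EMAIL)); <span class="hljs-comment">// string(16) "user@example.com"</span>
var_dump(filter_var(<span class="hljs-string">'not an email'</span>, FILTER_VALIDATE_EMAIL)); <span class="hljs-comment">// bool(false)</span>
<span class="hljs-comment">// An integer within a range; a valid value comes back as a real int</span>
$options = [<span class="hljs-string">'options'</span> =&gt; [<span class="hljs-string">'min_range'</span> =&gt; 0, <span class="hljs-string">'max_range'</span> =&gt; 150]];
var_dump(filter_var($age, FILTER_VALIDATE_INT, $options)); <span class="hljs-comment">// int(42)</span>
var_dump(filter_var(<span class="hljs-string">'200'</span>, FILTER_VALIDATE_INT, $options)); <span class="hljs-comment">// bool(false)</span></code></pre>
<p>Since a valid value can itself be falsy (<code>0</code>, for example), compare the
result with <code>=== false</code> rather than using it directly in a condition.</p>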
<p>Too bothersome? There are <em>a lot</em> of libraries that can greatly simplify that
for you, like
<a href="https://github.com/siriusphp/validation">this library (<code>siriusphp/validation</code>)</a>.</p>
<h4>Final note for MySQL</h4>
<p>Remember, folks, that if you want to store UTF-8-encoded data in your
MySQL DBMS, you need to use the <code>utf8mb4</code> character set, which is the <em>real</em> UTF-8,
instead of <code>utf8</code>, which <strong>is not</strong> the real UTF-8.</p>
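<p>A quick way to tell which characters are at stake is to count the bytes of
their UTF-8 encoding, as in the sketch below: anything that needs 4 bytes won't
fit in a <code>utf8</code> column. The three characters are arbitrary examples.</p>
<pre><code class="language-php"><span class="hljs-meta">&lt;?php</span>
<span class="hljs-comment">// "\u{...}" escapes are encoded as UTF-8 by PHP (7.0 and later),</span>
<span class="hljs-comment">// and strlen() counts bytes, not characters</span>
var_dump(strlen(<span class="hljs-string">"\u{E9}"</span>)); <span class="hljs-comment">// int(2), the letter é</span>
var_dump(strlen(<span class="hljs-string">"\u{20AC}"</span>)); <span class="hljs-comment">// int(3), the euro sign</span>
var_dump(strlen(<span class="hljs-string">"\u{1F600}"</span>)); <span class="hljs-comment">// int(4), an emoji</span></code></pre>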
<p>The <code>utf8</code> character set only stores up to 3 bytes per character, instead of 4, which excludes <em>a lot</em>
of characters (emoji, for example).</p>]]></content:encoded>
      <guid isPermaLink="false">dynamic-data-and-sql-statements</guid>
      <pubDate>Sun, 02 Jun 2019 00:00:00 +0200</pubDate>
      <author>Artemis</author>
    </item>
    <item>
      <title><![CDATA[htmlspecialchars()]]></title>
      <link>https://www.artemix.org/blog/htmlspecialchars</link>
      <description><![CDATA[<p>From the <a href="https://www.php.net/manual/en/function.htmlspecialchars.php">PHP documentation</a>,
it converts special characters to HTML entities.</p>
<h2>Bad idea: Using <code>htmlspecialchars</code> for &quot;cleaning&quot; input</h2>
<p>This function is made to transform HTML-related characters to their HTML entity
counterparts, <em>not</em> to &quot;clean&quot; data before a save operation, e.g. a SQL
<code>INSERT</code> (<a href="https://www.artemix.org/blog/dynamic-data-and-sql-statements">1</a>).</p>
<p>We see a lot of <code>htmlspecialchars</code> usage for saving data into a database,
which is definitely not a good thing.</p>
<pre><code class="language-php"><span class="hljs-meta">&lt;?php</span>
<span class="hljs-comment">// Example</span>
$username = htmlspecialchars($_POST[<span class="hljs-string">'username'</span>]);
$db-&gt;query(<span class="hljs-string">"SELECT * FROM users WHERE username = '$username';"</span>);</code></pre>
<p>Not only will this fail to properly prevent SQL injections, but you'll also end up
modifying the data in a non-reversible way.
You <em>cannot</em> revert the data to &quot;not HTML special chars&quot; in a reliable way.</p>
<p>This means that, by using <code>htmlspecialchars</code> here, you can't provide any &quot;edit&quot;
system, as you won't be able to allow the user to edit the <em>original</em> message.</p>
<h2>Good idea: Using <code>htmlspecialchars</code> to sanitize user-generated content</h2>
<p>As said before, this function is made to be used when outputting content to a
page.
It's tasked with replacing any HTML-related character with its HTML entity
counterpart.</p>
<p>For example, if you have a forum or a comment space, you can use this function to
avoid <a href="https://en.wikipedia.org/wiki/Cross-site_scripting">XSS</a> flaws.</p>
<pre><code class="language-php"><span class="hljs-meta">&lt;?php</span>
<span class="hljs-comment">// Example</span>
$comment = <span class="hljs-string">'This is a comment &lt;script src="badstuff.js"&gt;&lt;/script&gt; to test XSS'</span>;
<span class="hljs-meta">?&gt;</span>
<span class="hljs-comment">// ...</span>
&lt;article&gt;<span class="hljs-meta">&lt;?</span>= htmlspecialchars($comment); <span class="hljs-meta">?&gt;</span>&lt;/article&gt;</code></pre>]]></description>
      <content:encoded><![CDATA[<p>From the <a href="https://www.php.net/manual/en/function.htmlspecialchars.php">PHP documentation</a>,
it converts special characters to HTML entities.</p>
<h2>Bad idea: Using <code>htmlspecialchars</code> for &quot;cleaning&quot; input</h2>
<p>This function is made to transform HTML-related characters to their HTML entity
counterparts, <em>not</em> to &quot;clean&quot; data before a save operation, e.g. a SQL
<code>INSERT</code> (<a href="https://www.artemix.org/blog/dynamic-data-and-sql-statements">1</a>).</p>
<p>We see a lot of <code>htmlspecialchars</code> usage for saving data into a database,
which is definitely not a good thing.</p>
<pre><code class="language-php"><span class="hljs-meta">&lt;?php</span>
<span class="hljs-comment">// Example</span>
$username = htmlspecialchars($_POST[<span class="hljs-string">'username'</span>]);
$db-&gt;query(<span class="hljs-string">"SELECT * FROM users WHERE username = '$username';"</span>);</code></pre>
<p>Not only will this fail to properly prevent SQL injections, but you'll also end up
modifying the data in a non-reversible way.
You <em>cannot</em> revert the data to &quot;not HTML special chars&quot; in a reliable way.</p>
<p>This means that, by using <code>htmlspecialchars</code> here, you can't provide any &quot;edit&quot;
system, as you won't be able to allow the user to edit the <em>original</em> message.</p>
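<p>Here's a small sketch of the problem, using a made-up comment: once the
escaped text is what you have stored, escaping it again at output time (which
you still have to do) shows the entities themselves to your visitors.</p>
<pre><code class="language-php"><span class="hljs-meta">&lt;?php</span>
<span class="hljs-comment">// Hypothetical comment typed by a user</span>
$original = <span class="hljs-string">'Tom &amp; Jerry &lt;3'</span>;
<span class="hljs-comment">// What ends up in the database if you escape before saving</span>
$saved = htmlspecialchars($original);
echo $saved; <span class="hljs-comment">// Tom &amp;amp; Jerry &amp;lt;3</span>
<span class="hljs-comment">// Escaping again when outputting mangles what the user sees</span>
echo htmlspecialchars($saved); <span class="hljs-comment">// Tom &amp;amp;amp; Jerry &amp;amp;lt;3</span></code></pre>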
<h2>Good idea: Using <code>htmlspecialchars</code> to sanitize user-generated content</h2>
<p>As said before, this function is made to be used when outputting content to a
page.
It's tasked with replacing any HTML-related character with its HTML entity
counterpart.</p>
<p>For example, if you have a forum or a comment space, you can use this function to
avoid <a href="https://en.wikipedia.org/wiki/Cross-site_scripting">XSS</a> flaws.</p>
<pre><code class="language-php"><span class="hljs-meta">&lt;?php</span>
<span class="hljs-comment">// Example</span>
$comment = <span class="hljs-string">'This is a comment &lt;script src="badstuff.js"&gt;&lt;/script&gt; to test XSS'</span>;
<span class="hljs-meta">?&gt;</span>
<span class="hljs-comment">// ...</span>
&lt;article&gt;<span class="hljs-meta">&lt;?</span>= htmlspecialchars($comment); <span class="hljs-meta">?&gt;</span>&lt;/article&gt;</code></pre>]]></content:encoded>
      <guid isPermaLink="false">htmlspecialchars</guid>
      <pubDate>Sat, 01 Jun 2019 00:00:00 +0200</pubDate>
      <author>Artemis</author>
    </item>
  </channel>
</rss>
