Wrangling WordPress

Fri, 23 Oct 2015

In my previous post, I admitted to switching back to WordPress. Here is what I learned and some of the pros and cons I’ve encountered so far.

The Famous Five Minute Install

WordPress prides itself on its famous five minute install, so I figured even if I decided to abort the migration immediately after installation, I’d not have wasted much of my time. Indeed, the steps are simple and it takes almost no time to download and unzip the source, generate some database credentials, edit the sample configuration and activate a new Apache VirtualHost. The longest part of the installation was impatiently waiting for the new DNS zone to propagate.

Well, excluding the time it took for me to realise that the reason the domain was displaying the default VirtualHost was that it had been configured for :443, not :80. Oops…

I followed the prompts, created a user and I was automatically logged in to my new blog. Tada! I was done.

A Brief Note on WordPress Security

One of the main reasons I’d been less inclined to install WordPress was its reputation for poor security. I can’t quantify whether this is because the code base is actually hacky or poor1, or whether it is merely a victim of its own popularity and its code base is under more forensic scrutiny from would-be attackers.

I suspect it’s a little of both, but end-users themselves are often to blame for the repercussions. WordPress and its plugins are frequently updated (almost annoyingly so) and yet outdated versions of WordPress litter the web, ripe for the picking. Keeping on top of these updates is a first and critical step towards maintaining security.

Judging by my Apache access logs, the main threat to a WordPress installation that isn’t particularly sensitive is the automated brute forcing of administrator account logins2. For some reason, in 2015, out-of-the-box WordPress has no ability to throttle or temporarily deny a user access to the wp-admin login page following multiple failed logins, despite this appearing to be a major attack vector.

Now, when I set up my first VPS, I found several helpful guides published by Linode. One such guide, for server security, described the installation and configuration of fail2ban, an excellent tool that monitors various system logs and drops traffic from IP addresses that appear to be acting suspiciously. Helpfully, someone has written a WordPress fail2ban plugin, which uses syslog’s LOG_AUTH facility to write to the system’s auth log. An additional rule (called a “jail”) is then added to fail2ban’s jail configuration (jail.conf, or preferably jail.local), specifying a filter (provided by the plugin) that is responsible for parsing the relevant log lines and flagging suspicious behaviour, in this case failed login attempts, which trigger a ban.
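The core idea behind a jail (count failures per IP within a time window, and ban once a threshold is crossed) can be sketched in a few lines of Python. This is not fail2ban’s code: the log-line wording matched below is an assumption modelled loosely on the plugin’s auth-log messages, and `max_retry`/`find_time` only mirror the spirit of fail2ban’s `maxretry` and `findtime` jail options.

```python
import re
from collections import defaultdict, deque

# Assumed log-line wording; the real plugin's messages may differ.
FAILURE = re.compile(r"Authentication failure for \S+ from (?P<ip>\d{1,3}(?:\.\d{1,3}){3})")


def find_bans(events, max_retry=3, find_time=600):
    """Return IPs with at least max_retry failures inside any find_time-second window.

    events: iterable of (unix_timestamp, log_line) pairs in chronological order.
    """
    recent = defaultdict(deque)  # ip -> timestamps of its recent failures
    banned = set()
    for ts, line in events:
        match = FAILURE.search(line)
        if not match:
            continue
        ip = match.group("ip")
        window = recent[ip]
        window.append(ts)
        # Forget failures that have slid out of the time window.
        while ts - window[0] > find_time:
            window.popleft()
        if len(window) >= max_retry:
            banned.add(ip)
    return banned


if __name__ == "__main__":
    log = [
        (0, "Authentication failure for admin from 203.0.113.7"),
        (5, "Authentication failure for admin from 203.0.113.7"),
        (9, "Authentication failure for admin from 203.0.113.7"),
        (30, "Authentication failure for sam from 198.51.100.2"),
    ]
    print(find_bans(log))  # {'203.0.113.7'}
```

The sliding window is what separates a bot hammering the login page from a human who mistypes a password once a week: three failures spread over days never accumulate.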

For a small-fry blog like mine, where intrusion offers an assailant no real gain beyond my inconvenience (though I suppose automatically pwning any box you can for a botnet is desirable), I would imagine (and hope) that deploying fail2ban in this fashion will effectively eliminate the risk from the most likely source: automated scanning tools. Indeed, in just over four days since setting up the jail, 50 IPs have been banned for failing to enter a valid administration password. Of course, it probably helps that my credentials were generated by a password manager and so are less likely to be guessed by brute force anyway.

IP banning aside, the WordPress Codex also has a nice article on hardening WordPress installations, which covers a few other topics, like using Apache RewriteRules to protect the wp-includes directory, ensuring correct file permissions, securing wp-config.php and limiting database privileges3. It also briefly name-drops OSSEC: an open source intrusion detection system that happens to be pretty awesome.
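For the file-permission part of that advice, a quick audit can be scripted. This is a minimal sketch, assuming two simple rules of thumb: no file should be writable by “others”, and wp-config.php should not be readable by “others”. The exact modes the Codex recommends are worth checking in the article itself, and the install path in the usage line is hypothetical.

```python
import stat
from pathlib import Path


def audit_permissions(root):
    """Flag world-writable files, and a world-readable wp-config.php, under root."""
    problems = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        mode = stat.S_IMODE(path.stat().st_mode)
        if mode & stat.S_IWOTH:
            problems.append((str(path), oct(mode), "world-writable"))
        if path.name == "wp-config.php" and mode & stat.S_IROTH:
            problems.append((str(path), oct(mode), "wp-config.php readable by others"))
    return problems


if __name__ == "__main__":
    # Hypothetical install location; point this at your own WordPress root.
    for problem in audit_permissions("/var/www/example-blog"):
        print(*problem)
```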

Replicating Functionality

Equipped with a bare-bones WordPress blog, complete with an example post and comment from “Mr. WordPress”, I first set myself the task of replicating the functionality provided by my GitHub Pages blog. After all, if I really was going to migrate, I’d need to confirm that my new posts could be afforded like-for-like functionality with the old ones.

Syntax Highlighting

At the top of my agenda was syntax highlighting. Code snippets are a vital part of sharing tips and fixes, as well as describing how exactly computations were allowed to go badly wrong. I briefly searched around for appropriate plugins before settling on Crayon Syntax Highlighter. Installation should have been simple, but the automatic plugin installation failed repeatedly with a vague permissions error.

I spent the best part of half an hour going around in circles, altering various directory and file permissions and groups, even going so far as to touch the files myself, to no avail. The Apache error log provided more detail, but nothing helpful:

PHP Warning:  file_put_contents(ssh2.sftp://Resource id #167/█████████████████████████████████████████/wp-content/upgrade/crayon-syntax-highlighter/crayon-syntax-highlighter/langs/c#/statement.txt): failed to open stream: operation failed in class-wp-filesystem-ssh2.php on line 181

After checking I was able to install other plugins, I had a hunch and looked for a bug to confirm my suspicions: I think the installation fails because the hash symbol in the file path (the c# language directory) is interpreted as a special character. After some digging, I found an old bug that describes this exact issue, but it was reported as fixed back in 2012. I’d wasted enough time here, so at the risk of leaving the mystery unsolved, I opted to just get up and running: I gave up on the automated installation and downloaded and unzipped the plugin to the necessary directory myself.
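Why would a hash symbol matter? In URL syntax, `#` starts the fragment, so everything after it is no longer part of the path. The ssh2.sftp:// path in the log above is a URL, and the snippet below uses Python’s `urllib.parse` as an analogy to show the effect; it is not what WordPress or PHP’s ssh2 wrapper does internally, and the host and path are made up in the same shape as the logged one.

```python
from urllib.parse import quote, urlsplit

# A made-up path in the same shape as the one in the error log.
url = "ssh2.sftp://example-host/wp-content/upgrade/crayon-syntax-highlighter/langs/c#/statement.txt"

parts = urlsplit(url)
print(parts.path)      # /wp-content/upgrade/crayon-syntax-highlighter/langs/c
print(parts.fragment)  # /statement.txt

# Percent-encoding the path ('#' becomes '%23') keeps the directory name intact.
safe_url = "ssh2.sftp://example-host" + quote(
    "/wp-content/upgrade/crayon-syntax-highlighter/langs/c#/statement.txt"
)
print(urlsplit(safe_url).path)
# /wp-content/upgrade/crayon-syntax-highlighter/langs/c%23/statement.txt
```

If the writer treats the unencoded path this way, it ends up trying to write to a directory called `c` with the rest of the path silently discarded, which would fit a vague “failed to open stream” error.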

Footnotes

FD Footnotes is a clean and simple plugin for footnotes. Installation was trivial but unfortunately the syntax was not the same as used in my previous posts, which will undoubtedly add complexity for post transfer later.

Comments

I added comments to my previous blog with Disqus. This proved convenient as migration was simply a case of installing and configuring the WordPress Disqus plugin. As I was using a temporary URL, I appended this to the list of trusted hosts on the Disqus administration page.

Markdown Support (and more…)

I’ve become accustomed to writing with Markdown and was disappointed to find that the WordPress editor does not support it by default. A brief Google search suggested the Jetpack plugin: a disturbingly overpowered plugin from the same people who host wordpress.com. Along with Markdown support, features include additional visitor statistics, social content sharing buttons, enhanced security, a LaTeX plugin (very handy), shortlinks and a subscription system.

Duplicating Content

When I migrated Vic’s blog from wordpress.com, an importer plugin made the process relatively hassle-free, and I was hoping to find a similar plugin for importing my Jekyll posts. I found an RSS importer that failed to work with my atom.xml, and another plugin that could parse the website directly also failed to import posts. I turned to scripting, and the best I could find was a hacky-looking PHP script that, while useful, would still need manual editing to handle images, code samples, footnotes and intra-blog links. In the end, I decided to just re-create each post manually. How hard could it be?
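Intra-blog links are one of the more mechanical parts of that manual editing. Jekyll writes them with its `{% post_url YYYY-MM-DD-slug %}` tag, and the permalinks in this feed follow a `/YYYY/MM/DD/slug/` pattern, so a regular expression can rewrite one into the other. This is a sketch, assuming every post keeps its Jekyll slug and the WordPress permalink structure stays date-and-name based; neither is guaranteed.

```python
import re

# Jekyll's {% post_url YYYY-MM-DD-slug %} tag, with flexible whitespace inside the braces.
POST_URL = re.compile(r"\{%\s*post_url\s+(\d{4})-(\d{2})-(\d{2})-([\w-]+)\s*%\}")


def convert_post_urls(text):
    """Rewrite Jekyll post_url tags as /YYYY/MM/DD/slug/ paths.

    Assumes WordPress uses a date-and-name permalink structure and that
    each post kept its Jekyll slug.
    """
    return POST_URL.sub(lambda m: "/{}/{}/{}/{}/".format(*m.groups()), text)


print(convert_post_urls("See [my last post]({% post_url 2015-10-19-welcome-back-wordpress %})."))
# See [my last post](/2015/10/19/welcome-back-wordpress/).
```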

The process was simple for my small quote and image posts but a very painful experience for my full-fledged technical posts. Copying the Markdown directly, I was surprised to find that code snippets and footnotes worked without intervention, but newline characters were incorrectly interpreted as paragraph breaks and there was a mess of ampersands, greater and less than symbols incorrectly rendered in the text as their corresponding HTML entities. I needed something more clever.
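In hindsight, both problems described here (hard-wrapped lines being treated as breaks, and literal HTML entities) are mechanical enough that a small pre-processing pass might have saved some effort. The function below is a hypothetical sketch, not something used in this migration: it assumes paragraphs are separated by blank lines and code is fenced with triple backticks, and it does not handle numbered lists or list items that wrap onto several lines.

```python
import html

# Lines starting with these are kept on their own line (headings, bullets, quotes, tables).
BLOCK_MARKERS = ("#", "- ", "* ", "> ", "|")


def tidy_for_wordpress(markdown):
    """Unwrap hard-wrapped paragraphs and decode HTML entities outside fenced code."""
    out, paragraph, in_code = [], [], False

    def flush():
        # Emit the paragraph collected so far as a single line.
        if paragraph:
            out.append(html.unescape(" ".join(paragraph)))
            paragraph.clear()

    for line in markdown.splitlines():
        if line.startswith("```"):
            flush()
            in_code = not in_code
            out.append(line)
        elif in_code:
            out.append(line)  # code is left untouched
        elif not line.strip():
            flush()
            out.append("")
        elif line.lstrip().startswith(BLOCK_MARKERS):
            flush()
            out.append(html.unescape(line))
        else:
            paragraph.append(line.strip())
    flush()
    return "\n".join(out)
```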

I tried using pandoc to convert the Markdown to HTML, but this destroyed code blocks by wrapping their contents in span tags for syntax highlighting. I opted instead to convert to a slightly different flavour of Markdown that appeared more acceptable to the visual editor. Posts still required a significant amount of intervention to correct mistakes in the new formatting and to update images and links.

pandoc --atx-headers --normalize -f markdown -t markdown_github+footnotes in.md > out.md
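With around 40 posts, running the command above by hand gets tedious. A small Python wrapper can loop over a directory instead; this sketch uses pandoc’s `-o` option in place of the shell redirect, and the output directory name in the usage line is hypothetical. The flags are the 2015-era ones quoted above: newer pandoc releases have removed `--normalize` and replaced `--atx-headers` (and the `markdown_github` format), so adjust them for the installed version.

```python
import subprocess
from pathlib import Path


def pandoc_command(src, dst):
    """Build the pandoc invocation quoted above for a single post."""
    return ["pandoc", "--atx-headers", "--normalize",
            "-f", "markdown", "-t", "markdown_github+footnotes",
            str(src), "-o", str(dst)]


def convert_all(src_dir, dst_dir):
    """Run pandoc over every .md file in src_dir, writing results into dst_dir."""
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(src_dir).glob("*.md")):
        subprocess.run(pandoc_command(src, out / src.name), check=True)


if __name__ == "__main__":
    convert_all("_posts", "converted")  # "converted" is a hypothetical output directory
```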

Errors in parsing frequently caused code snippets to appear without formatting and it took me a while to notice spurious escape characters preceding underscores, less and greater than symbols. Switching between the Visual and Text editor modes just once would mangle all indentation inside code blocks.
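The spurious escapes could, in principle, be stripped automatically. This sketch removes a backslash directly before `_`, `<` or `>` outside fenced code blocks; it assumes triple-backtick fences and ignores inline code spans. Some escapes in Markdown are legitimate (a `\_` can stop text being read as emphasis), so the output would still need a read-through.

```python
import re

# A backslash immediately before _, < or >.
SPURIOUS_ESCAPE = re.compile(r"\\([_<>])")


def strip_spurious_escapes(markdown):
    """Remove backslashes before _, < and > outside fenced code blocks."""
    out, in_code = [], False
    for line in markdown.splitlines():
        if line.startswith("```"):
            in_code = not in_code
        elif not in_code:
            line = SPURIOUS_ESCAPE.sub(r"\1", line)
        out.append(line)
    return "\n".join(out)
```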

I spent the best part of five hours converting and correcting just under 40 posts. I regret this course of action.

Initial Thoughts

Content migration aside, I’ll enumerate a few positives and negatives I’ve encountered over my first few days of use:

Likes

Media insertion
As mentioned in my last post, embedding media was previously a pain. I almost exclusively used the GitHub web interface to author posts, which doesn’t currently allow arbitrary file uploads. Uploading an image meant making a GitHub commit from my local machine, and I’d have to craft the image tags myself. The WordPress editor, on the other hand, has a nifty upload and image manager tool. It was also quite easy to fix broken images after importing my content.

Better web based visual editor
GitHub’s web editor is handy but not intended for this purpose; the WordPress editor affords more writing-specific functionality when authoring posts.

Linking to previous posts
Jekyll allowed for intra-blog links with a special post tag, which was useful (as one doesn’t need to write the full HTML for an <a> tag) but not completely intuitive, as I didn’t know the names of my previous posts off the top of my head. Here I can click the create link button and select from my post list, or provide an external URL.

Categories and tagging
I post several different categories of information to my blog and would like to demarcate them more obviously for readers. Tags also provide a handy way to move through previous posts that cover similar topics in the same category. Whilst this is more than possible with Jekyll via numerous plugins, GitHub Pages supports only a very narrow subset of the available plugins.

Plugins
There is a whole world of WordPress plugins available for a variety of tasks. Plugins are typically quick and simple to install and configure.

Post preview
I can preview and save posts without needing to commit half finished drafts.

Automatic URLs, slugs, pretty permalinks
These are just a few bits and bobs that are taken care of automatically for me, making the authoring process just that little bit more streamlined.

Dislikes

Tables
The editor does not seem to support tables by default, so I’ll have to choose one of the many plugins. Markdown tables are correctly parsed for post display; they are just not friendly to create and edit in the editor.

Plugins
Plugin organisation and management leaves a lot to be desired. Searches don’t offer any sorting or filtering, and there are usually many, many plugins that achieve the same task with varying degrees of support and success.

Themes and templates
Editing themes is quite a frustrating endeavour, requiring edits to various PHP and CSS files. The default themes use too many @media CSS rules for screens of various sizes, which I find makes it harder to keep the interface uniform across devices when changing attributes.

No Markdown
I was disappointed that a plugin was needed for Markdown, especially as the most recommended solution is quite a vast plugin that unnecessarily integrates my blog with a wordpress.com account for various other features. Even with Markdown support, there is no syntax highlighting for it in the editor.

Less Control
WordPress obscures a lot of the markup process and I have already found it putting tags where I don’t want them with no way to circumvent it. There is an awful lot of crap in the headers and footers, though part of this is for improved indexing in search engines which is one of the reasons I switched in the first place.

Post migration
I said content migration aside, but it was such an awful and frustrating process that I encourage you to think twice about how you will get your old posts imported if you are planning to do this yourself.

Initial Conclusion

Sidetracked by how simple the installation process was, I’d clearly underestimated the work that was required post-install for actually getting the blog migrated with the same functionality, content and design.

That said, overall, I think I’m happy with the migration currently: the pros narrowly outweigh the cons, and hopefully most of the effort expended is a one-time-only initial set-up deal. At the very least, it’s easier to get things out of WordPress than in. I’m looking forward to a more streamlined authoring process from this point on.

Though, I think if I’d known how painful content migration was going to be (I expected having to do some conversion and maybe manual input of metadata, but was not expecting the conversion to be so flaky), I would definitely have thought twice about whether my time could have been better spent on something else. With all this in mind, I’d like to suggest a new installation tagline for the WordPress team:

WordPress: Five minutes to install, a lifetime to configure.

  1. Though, it is written in PHP, a language generally regarded as having a very low barrier to entry.
  2. Typically originating from China…
  3. Although this often proves problematic in practice, as plugins often require structural permissions, such as the creation of tables.
Welcome back, WordPress

Mon, 19 Oct 2015

I’d fallen in and out of love with WordPress years ago. Undoubtedly, it revolutionised the process of keeping (and more importantly, actually maintaining) a blog and expanded to serve as a reasonable stand-in for a simple few-page website for the lazy and design-illiterate, too.

I switched to running my own WordPress installation after finding myself frustrated at the lack of tools offered by early-00s Blogger. Yet over time, as development rapidly continued, WordPress acquired piles of features to cater to every girl and her cat. I felt like it had lost sight of just being a great blogging platform. WordPress was doing too much; it had become a bloated machine that did far more than I needed to just keep a blog. Perhaps worst of all, its popularity (and codebase) had made it an excellent attack vector for script-kiddies and true black-hats alike. It was time to find something else to play with.

WordPress in its natural habitat

A common question that surfaces from time to time on various mediums I frequent is “What blog should I use?”. I don’t think there is a real right answer (certainly not for the general case) and to an extent I think one might as well be asking “What colour is the bike shed?”. But the answer I would typically settle on to sum up my past experiences was:

“Anything but WordPress.”

My last use of WordPress was ended rather abruptly1 when the RAID controller for my first VPS malfunctioned catastrophically and I lost everything2. Setting up my first unmanaged VPS prompted me to re-think my options. By this point, I’d settled for a manually curated static personal site. I’d also truly fallen for Python, and any web applications I needed to write were built atop the Django framework3. For the longest time I got away without even having to install PHP.

During my first year of university I enjoyed Posterous, until it closed down and I dabbled in one platform after another. For longer than I should have, I used tumblr, mainly to keep track of my undergraduate thesis progress and even the humble beginnings of my PhD research. I liked its simple user interface and concept of different “post types”, but in the end it unsurprisingly fell well short of the functionality needed to communicate both academic and technical writing.

It is an incredible resource for reaction GIFs, though.

In the market for something new, a friend of mine suggested I try out Jekyll: a generator that effortlessly transforms Markdown into a static website. The best part? GitHub can generate and serve these websites for you, for free.

With a little bit of fiddling, I was up and running with what is probably the best looking blog I’ve ever attached my name to. More impressively, with hardly any configuration, I had syntax highlighting, fancy tables and footnotes too.

This set-up has served me well since January. After the initial faff of templating and the usual battle of wills against CSS, the authoring and maintenance process has been frictionless. One just commits a Markdown post and GitHub takes care of the rest.

But the novelty of simplicity has begun to wear off and I’ve found myself wanting additional functionality. I want categories and tagging back. It’s not that Jekyll can’t do this; rather, arbitrary Jekyll plugins cannot be executed on GitHub’s Pages platform4. Sure, I could install Ruby, mess around with gems and generate my own Jekyll-powered blog5, but most of the attraction of this set-up was that no installation was necessary, and the automation.

A longing for categorisation was not my only problem. I also despised writing tables in Markdown, and my media-embedding workflow left a lot to be desired: I had to commit each image to Git (often from another device) and manually craft its URL for the image tag in the Markdown text. I also missed captions, comments (which I did achieve, by embedding Disqus), and search.

Speaking of search, I also found my personal website performing less well in searches for my name and relevant descriptors after pointing it directly at the GitHub pages hosted blog with a CNAME. I’m yet to come to a satisfactory conclusion on why this is the case but suspect it is one of:

  • Index entries existed both for samstudio8.github.io and samnicholls.net and I had not remembered to use Webmaster Tools6 to “move” indexes from the former to the latter.
  • Confusion caused by suddenly setting samstudio8.github.io’s CNAME to samnicholls.net prevented indexing of one or both domains. Indeed, for some time afterward, dead results to documents hosted on the previous static website plagued rankings and very new blog posts failed to appear without manual submission for indexing.
  • My Google-fu SEO-speak was just plain weaker on my no-frills Jekyll template, causing either a loss in score for keywords such as my name, or reducing the index frequency of the site, which would explain the lack of new posts populating results.

And so, it appeared that once again I was on the market for a new blog platform.

Recently, my partner wished to move from her wordpress.com hosted blog to something she had more control over. After brief consideration of available options it seemed sensible to “stick with what we know”, primarily as importing WordPress to WordPress was a mostly trivial endeavour that could be accomplished via a plugin, which was especially useful given the number of embedded photos and videos.

The blog in question appears well indexed and has gained popularity for specific niche queries, such as tourists in Spain looking for where to swim near Torrevieja. It even turns a tiny AdSense profit.

Clearly jealous, as my blog was not even able to rank for a word that I invented without intervention, I’ve come full circle, betrayed my vow to never use WordPress again and am trying it out for myself, once more.

Welcome back, WordPress.

  1. Through no fault of its own, I should add.
  2. It was around this time that I learned a tough lesson about backups: don’t trust anyone else to do them for you.
  3. One of the few projects I can’t help but fanboy over.
  4. An obviously sensible choice by GitHub that I’m not sore about.
  5. Which I had actually done in the past to make a static site for the short lived OAOSIDL project.
  6. Apparently now called Search Console.
Ghostbusting

Sun, 03 May 2015

Shortly after setting up this blog, I embedded Google Analytics tracking, primarily because I like numbers but also in the hope of discovering that at least one other person who isn’t me or one of my supervisors is interested in my adventures. It’s also great writing practice and gives me the chance to properly think through the things that I am doing to avoid looking wrong on the internet.

I was already in the habit of spamming links to my posts via various social networks so it wasn’t a long wait for the warm, fuzzy feeling of confirmation that people were actually reading my work. Or at the very least clicking on it.

However after a few days, I noticed several strange entries amongst my lovely numbers1:

| Source | Sessions | % of Referrals | Bounce Rate | Pages / Session | Avg. Session Duration |
|---|---|---|---|---|---|
| pornhub-forum.uni[dot]me | 90 | 37.50% | 97.78% | 1.02 | 00:00:05 |
| free-share-buttons[dot]com | 79 | 32.92% | 8.86% | 1.91 | 00:01:24 |
| site4.free-share-buttons[dot]com | 25 | 10.42% | 0.00% | 2.00 | 00:01:32 |
| site3.free-share-buttons[dot]com | 18 | 7.50% | 0.00% | 2.00 | 00:01:27 |
| site2.free-share-buttons[dot]com | 17 | 7.08% | 0.00% | 2.00 | 00:01:28 |
| forum.topic62206786.darodar[dot]com | 7 | 2.92% | 0.00% | 3.00 | 00:00:00 |
| Get-Free-Traffic-Now[dot]com | 4 | 1.62% | 100.00% | 1.00 | 00:00:00 |

Curses. All my non-social referrals are ghost referrals! Disreputable publishers liberally use spambots to remotely execute Google Analytics tracking scripts2 so as to appear to be providing a stream of referrals to your website. Though, I’m unsure of the aim of this apparent data pollution3 attack. Beyond a poor attempt at driving confused hostmasters to the sources to increase organic traffic, I don’t really see what the benefit to the executor is4. Perhaps the sites attempt to install malware on, or capture more valuable information from, unsuspecting visitors’ machines.

Oddly, page-specific metrics (such as landing and exit pages) are polluted too. Bots copy the hostname of the referral source to the page name of the false hit, giving hostmasters the impression that something more worrying is afoot. It’s easy to forget that falsification of data is a possibility, especially when one is not responsible for the collection and management of the data.

None of this is particularly important or bothersome unless, like me, you like numbers, and numbers that are wrong are upsetting. So how can normality be restored? These spambots target indiscriminately and remotely, leaving them unaware of the actual target and thus with no option but to spoof the hostname of the hit (or leave the field unset), whereas the hostname of a genuine hit matches that of the website being tracked.

A helpful blogpost details how to set up a simple filter in your Google Analytics control panel to remove future5 spurious data by ignoring hits which fail to provide a valid expected hostname6. The remaining 52.25% of my traffic appears genuine, hooray!
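The logic of such a valid-hostname filter, and why escaping matters, can be mirrored in Python. This is only an illustration: Google Analytics applies its own filters server-side, the list of valid hostnames here is hypothetical, and the hit records are made-up dictionaries rather than real Analytics data.

```python
import re

# Hypothetical list of hostnames the tracking code legitimately runs on.
VALID_HOSTS = ["samnicholls.net", "www.samnicholls.net"]

# re.escape turns "." into "\." so it matches a literal dot, not any character;
# ^ and $ stop longer hostnames that merely contain a valid one from slipping through.
HOST_FILTER = re.compile("^(" + "|".join(re.escape(h) for h in VALID_HOSTS) + ")$")


def genuine(hits):
    """Keep hits whose hostname exactly matches one of VALID_HOSTS."""
    return [hit for hit in hits if HOST_FILTER.match(hit.get("hostname") or "")]


hits = [
    {"hostname": "samnicholls.net", "source": "twitter.com"},
    {"hostname": "", "source": "free-share-buttons[dot]com"},
    {"hostname": "samnicholls.net.example", "source": "darodar[dot]com"},
    {"hostname": "samnichollsXnet", "source": "spoofed"},
]
print([hit["source"] for hit in genuine(hits)])  # ['twitter.com']
```

Without `re.escape`, the unescaped dot would happily match `samnichollsXnet`, which is exactly the kind of mistake the footnote warns about.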


tl;dr

  • Spambots performed a seemingly pointless data pollution attack on my Google Analytics records.
  • One should always be as suspicious of data as possible, especially if it was collected by somebody else.
  • I like numbers. I really don’t like people messing with my numbers.

  1. I haven’t censored the source column as it may be potentially useful to others having the same problem3
  2. Presumably by guessing or spidering Google Analytics tracking IDs. 
  3. Electing to use “pollution” over “poison” here as the result is less directly toxic and more of confusing annoyance. 
  4. Although by mentioning the domains here perhaps I’ve done exactly what they wanted… 
  5. If you are as fussy as me when it comes to data, that same blog has another helpful post which offers a method to clean up historic data too. 
  6. Be sure to correctly escape the regular expression! 