Source code


301 redirects, and the importance of keeping SERP mojo

Sunday, November 16, 2008, 11:59 PM
Source Code by John (Article #229)

My current big project, starting to wind down a bit, is a redesign of the store website for GroundhogStuff.com. For those who don't read my blog much, I live in Punxsutawney, PA. Yes, the town with the groundhog. But, whistlepigs are not the topic -- stop distracting me!

This site has been running a ShopPal system for about four years and has accumulated several hundred products for sale during that time. We had to transfer all these products over from the old ShopPal system to the new custom system I built.

Of course, because I am a geek, I slapped together a script using cURL to log in to the existing ShopPal system, load the existing product pages from the admin, dump those into a browser and dump the browser stuff into files using ScrapBook. The reason for the browser fun is that ShopPal isn't particularly friendly to cURL, and seems to require a running browser before it will serve up many admin pages. There's probably a way to get around it, but the cURL + ScrapBook solution was what jumped into my mind the fastest.

So, once I had all these pages saved, I parsed them and dumped the contents into the MySQL database running the inventory system for the website.

But, I also wanted to preserve the old URLs and redirect them to the new sources.

Now, a ShopPal URL will go something like this one:
/index.cfm/fa/items.main/parentcat/9099/subcatid/0/id/196629

Kinda cumbersome, right... So, I wrote a bit of code to parse the incoming URLs for the old ShopPal structure and redirect those, based on copies of the files from the admin, to the new URLs. If you click on the link above, you'll see that it dumps you to a kinda WordPress-y style mod_rewrite URL.

Now, here's the big thing to remember... if you want to keep your Google mojo, you gotta use a full 301 header to redirect GoogleBot to the new URL structure. 302s are bad, bad, bad.

So, for example, the underlying PHP code for this particular solution ends up being something like this:

// Double quotes are required here: PHP does not expand variables inside single quotes.
header("Location: /shop/$cat/{$id}_$atitle.htm", TRUE, 301);

This lets GoogleBot know that the new URL structure is permanent and that it can ignore the old URL from here forward.
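For anyone who wants to see the parsing half of that job, here is a rough sketch rather than the production code. The function names, the $catalog lookup array and its 'cat' and 'atitle' keys are all assumptions made up for illustration; the real lookup would come from the product database.

```php
<?php
// Hypothetical helper: pull the product id out of an old ShopPal-style path.
// Returns null when the path does not match the old structure.
function shoppal_product_id($path)
{
    $pattern = '#^/index\.cfm/fa/items\.main/parentcat/\d+/subcatid/\d+/id/(\d+)$#';
    if (preg_match($pattern, $path, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Hypothetical helper: build the new-style URL from its parts.
function new_product_url($cat, $id, $atitle)
{
    return "/shop/$cat/{$id}_$atitle.htm";
}

// Map an old path to its new URL, or null if the product is unknown.
// $catalog is an assumed array keyed by the old ShopPal product id.
function redirect_target($path, array $catalog)
{
    $id = shoppal_product_id($path);
    if ($id === null || !isset($catalog[$id])) {
        return null;
    }
    return new_product_url($catalog[$id]['cat'], $id, $catalog[$id]['atitle']);
}

// Typical use at the top of a front controller:
// $target = redirect_target($_SERVER['REQUEST_URI'], $catalog);
// if ($target !== null) { header('Location: ' . $target, TRUE, 301); exit; }
```

Keeping the lookup separate from the header() call makes it easy to check the mapping without actually firing redirects.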

More importantly, if preserving your SERP rankings in Google is important, the 301 is considered the one and only way to do the job. Google does not accept 302s to pass PR and SERP mojo, because that would allow people to game the system, as the 302ed page would still retain its rank. In PR land, you gotta give something (the old URL structure's PR) to gain something (the new URL's PR).

It's as simple as that.

If you want a little bit more information, check out this article by Matt Cutts about 302 redirects and their general suboptimal-ness.



Hitting your website with a pipe

Saturday, September 6, 2008, 12:44 AM
Source Code by John (Article #227)

UPDATE: I cannot for the life of me get a pipe past my scrubbing script. And I'm not going to break working code just to illustrate a point. So, when you see the word [PIPE], assume it means the broken vertical bar character...

Here's a cool little character: the pipe. It's SHIFT + backslash on your keyboard.

That little guy is a pipe. And with the rising interest in using server-side Linux command-line apps to handle a variety of tasks -- for example, most video upload sites are using ffmpeg at some point in their process -- the pipe is of rising importance.

In Linux, a pipe allows you to pass the results of one command-line app to another command-line app. To pick an obvious example, you can combine LS and GREP like so...

ls [PIPE] grep 'flv'

This command lists the contents of the current directory, then dumps those contents to the grep command, which picks out all the FLV files. And you can pipe commands on and on endlessly. It's a pretty cool trick when you get right down to it. Imagine being able to batch process images, OCR them, move them to a new directory, rename them, staple them and mail them to Mars. That's what pipes do.

But, from a website standpoint, pipes can represent an awful danger. Because, if you're invoking something like PHP's EXEC function without scrubbing your inputs, it is possible for someone to pipe a new command right into the command you are EXECing.

Think about it: it is a gateway for someone to execute a server-side command with full read-write access at least within the current directory.

Ouch.

Suddenly something soooo awesome becomes sooooo suckie, right?

Well, not really.

For the most part, if you attack input scrubbing as a least-privilege problem, it's not a big deal. Using a function like PREG_REPLACE, it isn't hard to remove pipes and leave whatever characters you do need for the command to execute.
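As a rough sketch of what that scrubbing can look like: the version below goes a step further than stripping pipes and keeps only a short whitelist of characters, then quotes the result with escapeshellarg. The function names and the whitelist itself are assumptions for illustration; pick the characters your own command actually needs.

```php
<?php
// Hypothetical scrubber: keep only letters, digits, dot, underscore and hyphen.
// Pipes, semicolons, ampersands, slashes, spaces and quotes are all dropped.
function scrub_shell_input($input)
{
    return preg_replace('/[^A-Za-z0-9._-]/', '', $input);
}

// Build the directory-listing command with a scrubbed, quoted search term.
// The only pipe left in the command is the one written here on purpose.
function build_listing_command($term)
{
    return 'ls | grep ' . escapeshellarg(scrub_shell_input($term));
}

// Typical use:
// exec(build_listing_command($_GET['ext']), $lines);
```

The whitelist is the least-privilege part: anything not explicitly allowed never reaches the shell.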

Yeah, I know, I'm a bit cavalier in my desire to allow such inputs. And it isn't something the beginner PHP coder should pursue. But, there's just so much cool stuff you can run on Linux. Why leave all those toys on the shelf when a little aggressive scrubbing is all it takes to bring them out of the toybox?



Friggin objects nested in PHP arrays

Monday, August 18, 2008, 10:45 PM
Source Code by John (Article #225)

Anyone who visits my blog much knows I'm not the world's biggest fan of using frameworks or even someone else's code. Yeah, I've tinkered with WordPress a bit, but that's mostly to drag out what really makes it effective from an SEO standpoint.

I am currently working on a project that makes use of the US Postal Service's Web API Tools. The US Postal Service requires that you jerk around with their shockingly limited testbed server before going whole hog onto their production server. In the interest of not pulling my hair out with experimenting on this very limited testbed server, I opted to Google some of the API code.

But, the code itself dumps objects nested inside arrays that don't have very useful keys. OK, find some more code, right? Wrong. A lot of the API-ready code doesn't seem to work. Hahaha. So, a crappy solution is better than no solution, right?

Proceeding with the crappy solution, I had to dig the objects out of the array. This means going through the array recursively -- something PHP does piss poorly at best -- and identifying the objects and then converting them to more usable variables.

One function helps tremendously in this task: get_object_vars.

Get_object_vars is a PHP function that loads an object's accessible properties into an array. It's not always the most useful function, since obviously well coded objects should actually be simple and easy to use. A good object should work like $whatever->subwhatever and away you go, right? But, not all objects are good objects. Especially when they are nested in arrays that have few or no identifying keys and the keys change dramatically based on what they're loading.

That's where something like get_object_vars comes in. You've dumped your whole array and you run into a roadblock. It's an object. Instead of desperately trying to make the object work, which can be hard to do if it isn't your code and the code is poorly documented, you can just dump the object to an array and then go through it recursively.
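Here is a minimal sketch of that idea, not the code from the project. The function name is made up, and it assumes you are happy to lose the objects entirely and work with plain nested arrays afterwards. Note that get_object_vars called from outside a class only sees public properties.

```php
<?php
// Hypothetical helper: walk a value recursively and turn every object
// it finds into a plain array of its public properties.
function objects_to_arrays($value)
{
    if (is_object($value)) {
        $value = get_object_vars($value);
    }
    if (is_array($value)) {
        foreach ($value as $key => $item) {
            $value[$key] = objects_to_arrays($item);
        }
    }
    return $value;
}

// Typical use:
// $plain = objects_to_arrays($apiResponse);
// print_r($plain);
```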

Is that good form for coding? Nope. It's terrible form.

But, one thing you find out about coding for a living, especially in a meat market like web design, is that whatever gets the job done is right. Especially if you can ensure it doesn't have any security flaws.

I've seen lots of well-formed object-oriented code that just doesn't work well. Hell, look at Digg's recent problem with unscrubbed inputs!

I think this is one of the reasons I hate the religion of so-called good coding. Well-formed code can still be garbage. And poorly-formed code can still run through hackers like a tank. Now, that's a little more of a rant than the topic deserves, so I'll leave that at that.

Whatever the case and whatever your stance, get_object_vars is a handy tool.



PHP's similar text function

Friday, June 13, 2008, 11:40 AM
Source Code by John (Article #215)

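The similar_text function in PHP is an intriguing, if not 100% useful, function. The function works like so...

$sim = similar_text($a, $b, $p);

This returns the number of matching characters in $sim and fills a variable, $p, with the percentage similarity between the two strings. (The third argument is already taken by reference, so don't put an ampersand in front of it at the call; newer versions of PHP reject that.)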

The fundamental flaw of this function is that it has real memory and performance issues when comparing medium-size strings, anywhere from 10,000 characters and up depending on the amount of memory in your server. This is because the function itself, on the binary level, is built to run recursively. So, the eventual memory load is huge.

Oddly, if you explode your two strings in PHP and loop through them in PHP, just using similar_text to compare single-word strings, similar_text works very well. What I do is break my strings up into all the words, using the explode function, and then go through all the words one pair at a time, scoring the percent similarity and the number of hits.
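A rough sketch of the word-by-word approach looks something like this. It is an illustration, not the script I run: the function name, the 80 percent hit threshold and the returned array keys are all assumptions you should tune to your own data.

```php
<?php
// Hypothetical scorer: compare two strings one word at a time.
// For each word in $a, find its best-matching word in $b, count it as a
// "hit" if the match is at least $threshold percent, and average the scores.
function word_similarity($a, $b, $threshold = 80.0)
{
    $wordsA = explode(' ', strtolower($a));
    $wordsB = explode(' ', strtolower($b));
    $hits = 0;
    $total = 0.0;
    foreach ($wordsA as $wa) {
        $best = 0.0;
        foreach ($wordsB as $wb) {
            similar_text($wa, $wb, $percent);
            if ($percent > $best) {
                $best = $percent;
            }
        }
        if ($best >= $threshold) {
            $hits++;
        }
        $total += $best;
    }
    return ['hits' => $hits, 'average' => $total / count($wordsA)];
}

// Typical use:
// $score = word_similarity('grounhog festival', 'Groundhog Festival');
```

Because each similar_text call only ever sees one short word, the memory blowup never gets a chance to happen.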

While you have to work around the function a bit, it is a very good function. It allows you to build scripts that will deliver quality search, with some adjustment for misspellings. I use it for a function that compares items from RSS feeds saved in a DB. This allows me to create a similar news items function. It could also be easily integrated for a similar items function in an ecommerce shopping cart.




PHP's fgetcsv function sucks

Monday, May 19, 2008, 12:23 AM
Source Code by John (Article #203)

Fgetcsv is a function that has shipped with PHP since sometime in version 4. It allows you to parse a spreadsheet saved in Comma Separated Values format.

But, it comes with an ugly flaw: it doesn't automatically belch out null values.

Now, if you are using CSVs to create databases in MySQL, that sucks big time. Because every iteration of an INSERT command in MySQL has to have the right number of data inserts to correspond to all the fields you request.

Yes, I am aware that the LOAD command works more efficiently from a DBA standpoint. But, for my purposes, I'm letting folks upload CSVs, and there is no viable way to allow them to use the LOAD command. So, I am stuck with CREATE and INSERT to provide the simplest and most accessible approach.

Yeah, I probably should have written the script in Perl. But, I like to keep a project as close to a basic framework as possible. Plus, I'm just not exceedingly fond of diving into Perl.

But, you'd think such a basic function would have been integrated with the notion in mind that CSV files are rarely consistent beyond the fact they have commas in them.

My workaround is simple. The first line of these CSV files is supposed to have the titles for the fields in it. Count all the fields in the first line, and then iterate out some blank values into the empty fields whenever a row comes up short.

The project itself uses slightly under 100 fields. It is large enough that importing the spreadsheets from Excel is too cumbersome and eventually runs the server into an out of memory error.

And, with 100 fields being dynamically inserted from mismatched content over perhaps 35,000 or so entries, you're pretty much begging MySQL to get catty about something.

Now, here's the code that the main PHP website suggests using...

$row = 1;
$handle = fopen("test.csv", "r");
while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
    $num = count($data);
    echo "$num fields in line $row:\n";
    $row++;
    for ($c=0; $c < $num; $c++) {
        echo $data[$c] . "\n";
    }
}
fclose($handle);


The primary flaw in this code is that it assumes each line will be consistently populated with content. Anyone who has ever handled CSV files will tell you that ain't the truth at all.

But, if you add this little bit to the code...

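$row = 0;
$c = 0;
while (($data = fgetcsv($handle, 4096, ',', '"')) !== FALSE) {
    $row++;
    // Remember the column count from the header row (row 1) only.
    if ($row == 1 and $c == 0) $num = count($data);
    // ... the rest of the loop uses $num as the expected column count ...
}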

Something funny happens when you go to iterate all those cells into the fields in MySQL. Now you have empty sets that can be used to insert null values into MySQL. Haha.

Now you won't get those MySQL errors griping about not having a match at row 1 or whatever. Because now if the CSV doesn't give you a piece of content, you still have a proper column count that allows you to stop and say, "Well, if I'm missing data, let's use a conditional structure to insert a null value into MySQL."
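To make the idea concrete, here is a minimal sketch of padding short rows out to the header's column count. It is not the project code: the function name is made up, array_pad stands in for the conditional structure described above, and I'm assuming the first line of the file always holds the field titles.

```php
<?php
// Hypothetical helper: read a CSV from an open handle, using the first line
// as the field titles, and pad every short row with NULLs so each row has
// exactly one value per title. Extra trailing cells are cut off.
function read_padded_csv($handle)
{
    $header = null;
    $rows = [];
    while (($data = fgetcsv($handle, 4096, ',', '"', '\\')) !== false) {
        if ($header === null) {
            $header = $data; // the header row sets the expected column count
            continue;
        }
        $num = count($header);
        $data = array_slice(array_pad($data, $num, null), 0, $num);
        $rows[] = array_combine($header, $data);
    }
    return $rows;
}

// Typical use:
// $handle = fopen('test.csv', 'r');
// foreach (read_padded_csv($handle) as $row) { /* build the INSERT */ }
// fclose($handle);
```

Every row now lines up with the field list, so the INSERT always gets the right number of values, with NULL wherever the CSV came up empty.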

Haha. Stupid row count error begone!



Some working code for the latitude-longitude conversion project

Thursday, February 21, 2008, 10:39 PM
Source Code by John (Article #186)

An email came in tonight from someone who had some trouble applying the information from the post about building a system that converts latitudes and longitudes into X-Y co-ordinates for use with images generated by GD with PHP.

In the interest of promoting a little more applicability, I'm going to post the code that generates an actual PNG file from the map database.

This loads all the co-ordinates into three arrays. One for the rivers in the county. Another for the roads. A third for the county political boundaries.

Then it goes through the arrays, and draws the lines onto the image. The line colors and thicknesses are based on the type of river or stream listed from the original shapefiles.

This probably doesn't make things any clearer, but it at least gives you a look at proven production code.

And, for the critics... please refer to the post where I admit that I am very lazy about constructing anything into a usable object class. Yeah, I know the function is significantly more recursive than it needs to be and could be done better with some OOP. I just don't care, because the code works.

Anyhow... I hope this gives some folks who were looking at the lat-long post something with a little more meat to chew on.


Download getmap.zip

PHP metaphone function

Thursday, January 24, 2008, 10:53 AM
Source Code by John (Article #175)

echo metaphone("Garage");

Output: KRJ

The metaphone function is very likely a function you have never seen unless you have dug pretty deep into the PHP manual. Of course, there is one very obvious reason for that: it isn't the most useful command in the world.

About the only real use I've seen for it is in building a quick spell checker function. Other than that, I could imagine a cryptographic purpose for it, perhaps as a salt in a hash. I've seen some search algorithms that use it, but I'm not certain that a search that yields matches for "sun" and "son" as winners is a winner itself.
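If you want to see the "sun" and "son" problem for yourself, a tiny sketch like this will do it. The function name is made up for illustration; metaphone itself is the stock PHP function.

```php
<?php
// Hypothetical helper: two words "sound alike" when their metaphone keys match.
function sounds_alike($a, $b)
{
    return metaphone($a) === metaphone($b);
}

// Typical use:
// if (sounds_alike($typed, $dictionaryWord)) { /* offer it as a suggestion */ }
```

That is handy for spelling suggestions and exactly why it makes for a loose search: anything with the same consonant skeleton counts as a match.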

But, it is fun to dig into the obscure functions and see what is hiding there.



An object class for converting dates in PHP

Wednesday, January 23, 2008, 12:57 AM
Source Code by John (Article #172)

One of my really, really bad weaknesses as a programmer is a tendency to not tidy up commonly-used code into object classes. Of course, one of the primary reasons for this is that object-oriented programming is not necessary for anything to function. Another reason is that I tend to be a tinkerer. I like to dip in and out of the code constantly rechecking the functionality. I rarely have a sustained bout of steady coding.

But, I thought it would be a fun bit of teaching to show you the construction of an object-oriented piece of PHP code. The code I chose is my old date conversion code. This code converts dates from either SQL (yyyy-mm-dd) or regular (mm/dd/yyyy) format to SQL, regular or full English expression (Saturday, February 2, 2008).
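As a taste of the idea, here is a minimal sketch of such a class. It is not the code in the download: the class and method names are assumptions, and it leans on PHP's built-in DateTime rather than hand-rolled string handling.

```php
<?php
// Hypothetical date converter: accepts SQL (yyyy-mm-dd) or regular
// (mm/dd/yyyy) input and can express the date in any of three formats.
class DateConv
{
    private $date;

    public function __construct($input)
    {
        // A slash means regular format; otherwise assume SQL format.
        $format = (strpos($input, '/') !== false) ? 'm/d/Y' : 'Y-m-d';
        // The leading "!" resets the time fields so only the date is kept.
        $date = DateTime::createFromFormat('!' . $format, $input);
        if ($date === false) {
            throw new InvalidArgumentException("Unrecognized date: $input");
        }
        $this->date = $date;
    }

    public function toSql()
    {
        return $this->date->format('Y-m-d');
    }

    public function toRegular()
    {
        return $this->date->format('m/d/Y');
    }

    public function toEnglish()
    {
        return $this->date->format('l, F j, Y');
    }
}

// Typical use:
// $d = new DateConv('02/02/2008');
// echo $d->toEnglish();
```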

READ MORE ...


Download dateconv-oop.zip

Converting latitude-longitude into x-y coordinates for images

Tuesday, January 22, 2008, 2:33 AM
Source Code by John (Article #171)

UPDATE: You can find a full working piece of code for this project in this later post.

Right now I'm tinkering with building a map system out of the ARC Shapefile datasets available on the internet. These are the files that the Census and the USGS and such use to actually map pretty much everything in the United States.

Not surprisingly, these maps present a challenge. Foremost is sheer size and processing power. For the sake of my early experiments, I'm limiting my efforts to local maps (Jefferson County, Pennsylvania) and converting those maps into three things: database entries for dynamic mapping, vector graphics for re-use and raster graphics for web use and for a long-standing project I have to rebuild Google Maps from scratch.

Assembling all this data presents another challenge: converting the map data into images and vector files. Particularly a bitch is converting latitude and longitude, which have an origin that is to the lower right of the images I intend to generate, into typical graphics co-ordinates, which have an origin at the upper left of the images that will be generated.
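The origin flip can be sketched with a simple linear mapping like the one below. This is an illustration under assumptions, not the project code: the function name and the bounding-box array keys are made up, and it treats latitude and longitude as a flat grid, which is only reasonable for something as small as a county.

```php
<?php
// Hypothetical converter: map a latitude/longitude pair onto pixel
// co-ordinates for an image covering the given bounding box.
// $box needs 'north', 'south', 'east' and 'west' edges in degrees.
function latlong_to_xy($lat, $lon, array $box, $width, $height)
{
    // Longitude grows to the east, the same direction x grows.
    $x = ($lon - $box['west']) / ($box['east'] - $box['west']) * ($width - 1);
    // Latitude grows to the north, but y grows downward, so measure
    // from the northern edge to flip the axis.
    $y = ($box['north'] - $lat) / ($box['north'] - $box['south']) * ($height - 1);
    return [(int) round($x), (int) round($y)];
}

// Typical use with a GD image 800 pixels wide and 600 tall:
// list($x, $y) = latlong_to_xy($lat, $lon, $box, 800, 600);
```

The northwest corner of the box lands on pixel (0, 0) and the southeast corner lands on the bottom-right pixel, which is the flip described above.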

READ MORE ...



Additional approaches to large file support in PHP/MySQL environments

Saturday, January 12, 2008, 2:32 AM
Source Code by John (Article #168)

In recent weeks a number of folks have come to the blog from search engines looking for approaches to handling large files in a PHP / MySQL environment. In a previous article, I had covered how to use the PHP.ini and my.cnf files to increase upload sizes.

However, once you have applied those solutions, sometimes the upload issue isn't a matter of file size permissions.

READ MORE ...



© 2010 Pro Content and Design. All rights reserved.



Welcome!

Wonder where to start with your web design business?

This blog follows along with my efforts to build and grow a website design business, Pro Content and Design.

The goal of this blog is to fill in blanks that may be empty as you get your business rolling.

This blog, particularly the source code section, is not intended for beginners. If you are not comfortable with databases, Ajax, DOM objects and other advanced methods, I strongly suggest you go take a look over at W3 Schools before even reading -- let alone tinkering with -- any of the code here.

I hope this blog has some value to web designers as they attempt to get their businesses going.

Good luck, and happy reading.

Thank you,
John Crawford
Pro Content and Design
