Most recent articles


It's awesome when your joke takes off

Monday, July 14, 2008, 10:49 PM
Thoughts by John (Article #221)

First, some background: today this guy was posting on Digg.com looking for people to stress test his 486 Linux box. He's running something called WebLua, plus Lighttpd. The website is http://foureightysix.go-beyond.org.

Overall a pretty cool project, so I have no issue sending some linkage love out into the world to a total stranger. He lists the top referrers and how many hits from them and what percentage of overall hits they constitute.

Of course, because I'm a dweeb who reads crap like XKCD and such, my first idea was this: referrers are notoriously deceptive. It isn't hard even with Firefox and a few plugins to change your referring URL to whatever you want.

If you know Linux well, you should know cURL, which allows you to send and load data across a number of protocols, including HTTP.
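
For the curious, here's roughly what the trick looks like. With cURL itself it is the --referer (or -e) option. The sketch below does the same thing with Python's standard library, and it only builds the request rather than sending it. The build_request helper name is mine, not anything from his site.

```python
import urllib.request


def build_request(url, referrer):
    """Build (but do not send) a GET request carrying a hand-picked Referer header."""
    # The header really is spelled "Referer" in HTTP (a typo that stuck).
    return urllib.request.Request(url, headers={"Referer": referrer})


req = build_request("http://foureightysix.go-beyond.org", "http://www.fbi.gov")
print(req.get_header("Referer"))

# To actually send it, you would call: urllib.request.urlopen(req, timeout=10)
```

The server has no way to check the header, which is exactly why referrer stats are so easy to game.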

Now, I'm not the first person to realize that the referrer on his page could be tricked out.

I was the first to realize that cryptic is better than some form of accusation of homosexuality or some hardware driven joke (386isbetter.com is a good joke).

I went for www.fbi.gov as my referrer. When I let the joke go, it was in the middle of the pack with like 150 hits, all mine. When I came back, I was impressed to see 12,000 more hits! Haha! Joke delivered + joke shared + joke repeated = joke accomplished!!!

6.45% of the folks hitting the stress test thought fbi.gov was funny enough.

I only mention this, because internet humor tends to be more than a tiny bit boyish, usually tossing about an indictment of someone's gender, sexuality or their preferred hardware. It's funnier to hint lightly that the FBI is watching you.

Just my small contribution to a slightly dumber world.

UPDATE: 7:59 AM

His system is finally showing some slowdown. I think it is probably fair to say a lot of that is the product of having only 16 MB of RAM on a system running non-stop.

Understandably, most of the inefficiencies in a web server stem from the scripting languages. PHP, while easy-to-use and easy-to-prototype, isn't always a blazing fast system. And that was his point using something like WebLua.

It's kind of a "let's get back to basics" message.



Domain name generator, plus WHOIS and PageRank features

Friday, July 11, 2008, 11:53 PM
Website Design by John (Article #220)

I've been tinkering around with a domain name generator the last few weeks. The first fairly workable version is now available at GeneratedNames.com. The site's primary function is to leverage an English language dictionary to allow webmasters to find new and relevant domain names and register them.

The site also offers a combination PageRank check + WHOIS lookup. I added this largely because I got tired of looking up both PR and WHOIS data at separate sources.

The site still has a way to go. It is heavily dependent on a recursive function that grows fairly fast with the number of words you decide to look up. So, there are still some speed issues with larger queries. Of course, if PHP has one really ugly aspect, it is that recursion is painfully slow.
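
To give a feel for how fast that grows, here is a minimal sketch in Python (not the site's PHP, and not its actual recursive function). It assumes the remix step is "join every ordered pick of words together", which is my guess at the simplest version. The remix name, the max_parts parameter and the word list are all made up for illustration.

```python
from itertools import permutations


def remix(words, max_parts=2, tld=".com"):
    """Join every ordered selection of 1..max_parts distinct words into a candidate name."""
    names = []
    for size in range(1, max_parts + 1):
        for combo in permutations(words, size):
            names.append("".join(combo) + tld)
    return names


words = ["make", "base", "table", "sheet"]
for parts in (1, 2, 3):
    # Print how many candidates come out as longer combinations are allowed.
    print(parts, len(remix(words, max_parts=parts)))
```

Four words already give 40 candidates once three-word names are allowed, and every one of those would still need a WHOIS check.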

Anyhoo... that's what I been doin'.



What I've been working on lately

Tuesday, July 1, 2008, 1:30 PM
Other Stuff by John (Article #219)

I have recently produced two new websites for clients and am working on two new sites of my own.

The two for clients are DuBoisBride.com and Royers219AutoSales.com. Neither site is very glitzy, but both are function-driven. DuBoisBride.com is a redesign to improve the functionality of the registration system for an annual bridal expo. Royers219AutoSales.com, as the name suggests, is a car dealer website.

The two sites I am currently working on for my own reasons are MakeABase.com and GeneratedNames.com.

MakeABase.com

MakeABase.com is intended to allow users to embed database tables in their own web pages with no coding, similar to how you just copy and paste some JavaScript in order to embed Google Maps on your own web pages. I set up a demo of an embedded DB table running on a GeoCities website. And, yes, it gets around the cross-site scripting issue without using any hacks or zero-day stuff. It is 100% valid, safe JavaScript that does the job. It isn't a true AJAX app, because the only good way to handle the system was to ditch both being asynchronous and working over ActiveX.

The process of building a new DB is simple. Take a spreadsheet (right now it supports Excel XLS and Comma Separated Values CSV file formats), upload it and then copy and paste the JavaScript into your web page.
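
Here is a minimal sketch of the general idea, written in Python rather than the site's own code. It assumes (my guess, not a description of MakeABase internals) that the pasted JavaScript is a plain script include whose response writes an HTML table into the page. The function name csv_to_embed_script is hypothetical.

```python
import csv
import html
import io
import json


def csv_to_embed_script(csv_text):
    """Turn CSV text into a JavaScript snippet that writes an HTML table into the page."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    parts = ["<table>"]
    for index, row in enumerate(rows):
        tag = "th" if index == 0 else "td"  # treat the first row as the header
        cells = "".join(f"<{tag}>{html.escape(cell)}</{tag}>" for cell in row)
        parts.append(f"<tr>{cells}</tr>")
    parts.append("</table>")
    # json.dumps produces a properly quoted JavaScript string literal.
    return "document.write(" + json.dumps("".join(parts)) + ");"


script = csv_to_embed_script("name,price\nWidget <A>,5\n")
print(script)
```

Served from an external script URL, output like this loads across domains the same way any third-party script include does, with no XMLHttpRequest involved.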

GeneratedNames.com

GeneratedNames.com is a website that uses a dictionary to help you generate domain names for your website. You enter your preferred keywords and then it gives you a series of definitions. You select the definitions that most closely match your intent, and then it fires off a list of related words. You select the words you like and then it remixes them to generate a list of potential domain names. You can then check to see if a domain name is registered. If it isn't registered, then it offers the option to register it through GoDaddy.

GeneratedNames.com still has a way to go before it is really production ready. MakeABase.com is closing in on being ready to test with other users.

Of course, the bitch with MakeABase is that if it generates any interest, it isn't going to take long for it to outstrip the server it is now running on.



Even my spam tells Soviet Russia jokes

Sunday, June 29, 2008, 6:35 PM
Thoughts by John (Article #218)

Yup, when your inbox pulls a Russian reversal, it is time to wonder. Consider this gem:

Subj: in sov r bot tests you

Message:
Text

Obviously some guy is testing a spam bot. But, still, you gotta give credit where credit is due. He could have passed a message saying, "Test." and left it at that.

And, yeah, I'm pretty sure there is a social engineering element where he is using a worn phrase to improve his ability to classify sends as positive. The thing is, I still fell for it.

I read some of my spam every week, just to see what the trends are. Everyone still remembers where they were when they read their first spam message inexplicably full of pulled quotes from Jonathan Livingston Seagull.

And the thing is, I'm only ever going to read so much of my spam, even if I am checking trends. I mean, come on! It's frackin spam.

So, I give congratulations to the guy for using a subject that ensured that of all the spam spilling from all the web onto my server, it was his spam I read.

After all, in Soviet Russia, spam reads you.



Did a similar text function bite Yahoo in the ass?

Friday, June 20, 2008, 11:29 AM
Thoughts by John (Article #217)

I was reading Talking Points Memo today (yeah, I am a Democrat, and have been so since I turned 18 in 1996, which makes me better than you, because I was a Dem when it wasn't cool) and they had an article that jumped right out at me as a tiny software not-quite-glitch.

The basics: it is an article about Barack Obama opting out of public funding for the 2008 general election. Fair enough. But, the randomly pulled photo right next to it was one with Osama bin Laden and Ayman al-Zawahiri in it.

I think Yahoo is running some variant of the similar_text function, and that it just bit them in the ass. Depending on how you treat it, and what version you're using, you can pull false positives.

One of the things I do with my similar_text algorithms is to score something for not-quite matches. I score them a significant amount lower than 100% matches, but they're still scored.

It is easy to see how a less than 100% match could produce a false positive between Osama and Obama. If you run a quick and dirty test on PHP's similar_text function you will pull an 80% hit.
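
To show where the 80% comes from, here is a rough Python port of the approach similar_text takes as I understand it: find the longest common run, then recurse on what is left on either side. It is a sketch for illustration, not PHP's source, and the function names are mine.

```python
def common_run(a, b):
    """Return (start_a, start_b, length) of the first longest common substring."""
    best = (0, 0, 0)
    for i in range(len(a)):
        for j in range(len(b)):
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k > best[2]:
                best = (i, j, k)
    return best


def similar_chars(a, b):
    """Count matching characters: the longest run plus the runs to its left and right."""
    i, j, length = common_run(a, b)
    if length == 0:
        return 0
    return (length
            + similar_chars(a[:i], b[:j])
            + similar_chars(a[i + length:], b[j + length:]))


def similarity_percent(a, b):
    """Matching characters, doubled, as a percentage of the combined length."""
    if not a and not b:
        return 0.0
    return similar_chars(a, b) * 2 * 100.0 / (len(a) + len(b))


print(similar_chars("Osama", "Obama"), similarity_percent("Osama", "Obama"))
```

"ama" matches (3 characters), then "O" matches to the left of it (1 more). That is 4 matching characters out of 5 on each side, hence 80%.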

I'd imagine because Yahoo is trying so hard to always match a photo to a story, they're playing pretty fast and loose with the algorithm. One thing I've learned playing with articles that have huge swathes of non-matching text is that you have to have a fairly low threshold to score a hit. For example, you can weight non-matches too heavily, because odds are high that 60-95% of the text, depending on length and subject matter, will not match.

It is not a great defense. It may not even be a defense. It is an advisory about how badly the similar_text function can bite you.



Copyright bullies

Saturday, June 14, 2008, 2:34 AM
Thoughts by John (Article #216)

Sometimes you wonder if members of the old economy even remotely get what has happened to the world while they were off pretending the end is not near.

Case in point: the AP has apparently developed a fetish for issuing DMCA takedown notices to some blogs for citing portions of AP articles and linking back to the originating articles. The longest citation was, get this, 79 words. In essence, the AP is conducting a frontal assault on fair use and fair comment, because it is too stupid to understand that an ethically correct backlink with a limited citation is a net gain for the AP.

Ponder that for a second.

What the AP is actually citing is a variant of "hot news misappropriation". This is a branch of unfair competition law, not copyright law. The problem with citing hot news misappropriation is that it isn't all that rough of a law. Basically, if you're not outright free-riding on someone else's efforts, you're in the clear!! For example, Motorola successfully defended itself against a misappropriation claim in 1997 for retransmitting key details of NBA games.

The other half of this equation is the fair use and fair comment doctrines of copyright law. Fair use basically says you have the right to reproduce a limited portion of a copyrighted work for expository purposes within a derivative work. So, for example, if I am writing about Abraham Lincoln, I have the right to cite, within reasonable lengths, books about Abe Lincoln. If I am discussing the war in Iraq, I have the right to cite AP articles about the war.

Fair comment is actually a bit stronger for the accused. Fair comment says that when you are discussing a particular matter, you have the right to cite a fairly significant portion of a copyrighted work in order to illustrate a point you are commenting about.

Now, both fair use and fair comment are meant to be 100% compatible with fair trade practices. You can't quote 198 pages of a 200 page book and claim you're just illustrating a point in fair comment. Although, it should be noted that the shorter a work is, the greater the percentage you are allowed to cite. For example, you can cite all of a piece of ad copy. Nike couldn't sue you for saying that quoting "Just do it" is cribbing 100% of its copyrighted work.

I get why the RIAA, the music industry stooges, can't get the point. The music industry is the legitimate front of an outright criminal racket, largely operated by elements of the Italian-American mafia. It would be out of character for guys like Tommy Mottola to "get it".

But, you might think folks who regularly sue government and corporations to open up information would be a little more receptive to the general idea of expanding public discourse.

Apparently not.

News outlets are already losing their taste for dealing with the AP, as demonstrated by the fact that the eight largest newspapers in Ohio have started their own news sharing service that provides, surprise, ethical attribution and backlinking where possible.

The AP is starting to stink a lot like RIAA, isn't it?

I get the notion that the AP may be fighting for its very existence. News, especially the newspaper business, is increasingly diluted by the internet. Newspapers themselves have a much stronger incentive to drop the AP altogether and simply report local news. A lot of papers don't, because they don't have a real business plan for selling ads on fewer pages.

Of course, local advertisers are, themselves, too dumb to catch on that they're being had by folks with decreasing readership. You'd be horrified to discover how many newspapers are including their weekly freebie rags' circulation in with the regular circulation to generate a circulation figure that is often inflated by more than 100%.

These folks are shoveling a lot of bullshit in order to make ends meet. And if you think for a second they won't do something desperate and stupid to keep those ends meeting, you'll be unpleasantly surprised.

Of course, they could modernize. But, why bother doing that, when you can just keep reprinting vanilla crap from the AP wire.

Newspaper folks are a dreadful bunch. Especially at newspapers that are more than 50% AP copy. It turns out, people need proximity to their subject in order to have enthusiasm.

The truth is, with the internet, the AP is utterly pointless. De facto national news sources, such as the New York Times, are readily available. What we need is more local and a ton less AP.

In fact, if the AP went under tomorrow, the average person would barely notice. Well, maybe they'd notice that their local newspaper became about 1,000% better!

The truth is, most newspapers need to cut the AP out altogether. Seek a business relationship with other regional newspapers. Burn the AP to a crisp.

The news in general would improve significantly to show for it.



PHP's similar text function

Friday, June 13, 2008, 11:40 AM
Source Code by John (Article #215)

The similar_text function in PHP is an intriguing, if not 100% useful, function. The function works like so...

$sim = similar_text($a, $b, $p); // $p is filled in by reference; no & at the call site

This returns the number of matching characters in $sim and fills a variable, $p, with the percentage similarity between the two strings.

The fundamental flaw of this function is that it has real memory and performance issues when comparing medium-size strings, anywhere from 10,000 characters and up depending on the amount of memory in your server. This is because the function itself, on the binary level, is built to run recursively. So, the eventual memory load is huge.

Oddly, if you explode your two strings in PHP and run through them recursively in PHP just using similar_text to compare single-word strings, similar_text works very well. What I do is break my strings up into all the words, using the explode function, and then go through all the words recursively, scoring the percent similarity and the number of hits.
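
Here is a minimal sketch of that word-by-word approach, in Python rather than PHP. Two loud assumptions: difflib's SequenceMatcher stands in for similar_text (it is a comparable "matching characters over total length" ratio, not the same function), and the 75% hit threshold is a number I picked for illustration. It also uses a plain loop instead of recursion.

```python
from difflib import SequenceMatcher


def word_score(a, b):
    """Percent similarity of two words (difflib stand-in for similar_text)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100


def compare_texts(text_a, text_b, threshold=75.0):
    """Return (hits, average best score) for the words of text_a against text_b."""
    words_a, words_b = text_a.split(), text_b.split()
    if not words_a or not words_b:
        return 0, 0.0
    hits, total = 0, 0.0
    for word in words_a:
        # Score each word against its closest counterpart in the other text.
        best = max(word_score(word, other) for other in words_b)
        total += best
        if best >= threshold:
            hits += 1
    return hits, total / len(words_a)


print(compare_texts("weather feed", "weather feed"))
print(compare_texts("Obama", "Osama"))
```

Because every comparison is between two short words, no single call ever sees a 10,000-character string. The threshold is the knob that decides how forgiving the match is toward misspellings.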

While you have to work around the function a bit, it is a very good function. It allows you to build scripts that will deliver quality search, with some adjustment for misspellings. I use it for a function that compares items from RSS feeds saved in a DB. This allows me to create a similar news items function. It could also be easily integrated for a similar items function in an ecommerce shopping cart.




Know your web toys: Google static maps

Thursday, June 12, 2008, 10:21 AM
Website Design by John (Article #212)

Here's something you may not have been aware of: Google Static Maps.

It serves up Google maps, without the JavaScript. This is ideal for websites that for some reason cannot push their users into deploying JavaScript (I guess that would mostly be warez and hacking sites, 'cause I can't imagine who else would still have lots of users scared of JS).

OK, maybe a more practical use would be that you have one map you want your users to look at, and you don't want them wandering off. I could also imagine a use for desktop apps that retrieve data from your server.
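
Since the whole thing boils down to building an image URL, here is a small Python sketch of that. It assumes the current Static Maps endpoint and its center, zoom, size and key parameters as I know them, which may not match what the service looked like when this was written. The static_map_url helper, the example town and the YOUR_API_KEY placeholder are mine.

```python
import html
from urllib.parse import urlencode

# Assumed endpoint; check Google's documentation before relying on it.
STATIC_MAPS_ENDPOINT = "https://maps.googleapis.com/maps/api/staticmap"


def static_map_url(center, zoom=13, size="400x300", api_key="YOUR_API_KEY"):
    """Build a URL for a single fixed map image."""
    params = {"center": center, "zoom": zoom, "size": size, "key": api_key}
    return STATIC_MAPS_ENDPOINT + "?" + urlencode(params)


url = static_map_url("Punxsutawney, PA")
# Escape the ampersands so the URL is valid inside an HTML attribute.
print(f'<img src="{html.escape(url)}" alt="Map">')
```

The result is a plain img tag, so the map works with JavaScript switched off and the user can't pan away from the spot you picked.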

Anyhoo...

There is also now a Google Maps for Flash tool.

Pretty cool stuff, especially when you consider that Google Maps is the only really successful API tool in use on the internet.



Useful advice from Google on Google

Wednesday, June 4, 2008, 12:12 PM
Website Design by John (Article #211)

From Matt Cutts's website.

It is a video, and it is a good rundown of everything your website should be doing to fight web spamming (fake links, fake sites, etc.).

By far his most useful advice: make spammers spend more time on each spam.

I'm still not a big fan of CAPTCHA (those dumb little images with a word in them). In my experience, a disabled submit button + a JavaScript to enable it alongside the user agreement works better. Especially when integrated with requiring registration.

Now, the one thing I'll say on the subject is that what people often call web spam is often just being bitchy about marketing tactics. I only sweat certifiable spam. If a local advertiser uses the free classifieds page to push some obvious multi-level marketing mojo, I actually don't get uppity about deleting that as long as the guy pimps the thing out there as straight MLM.

One of the big takeaways from the Matt Cutts video is the one thing I love about Google. Google is one of the few companies that gets that you have to go out to your user base and just plain lay out what it is you are asking of them. Skip the BS and just say, "Hey, we need you guys gunning for web spam, too. If your sites are filtering this crap, it makes our job easier."

Google, when it is well integrated with sites and users, is an ecosystem. Sites provide good content that allows users to find what they're looking for and come back to the sites.

It's a good thing to be more inclusive.



Sorting out the National Weather Service XML feed

Tuesday, June 3, 2008, 12:19 PM
Website Design by John (Article #210)

I was tinkering yesterday with building a 7-day forecast from the data that can be had off the National Weather Service XML feeds. Let me tell you this: those feeds are frakkin cumbersome. Wow.

First off, the NWS feed is built to work with their own code using PHP/SOAP. Which, hey, I have nothing against, but not if that is all the feed is built to do.

The NWS is typical of what is wrong with a lot of government open data projects. Now, it isn't as poorly designed as the FCC's database -- because that was built to just be a pain, because you have to POST your request in before you can get the data. On the other hand, once you get the data out, the FCC's system is dead easy, since you can pull it all into an XLS or CSV format and from there do whatever you need to export it to a database.

The NWS data is not only inconsistent, the keying on it is inconsistent. Hahaha.

Check it out, here:

Depending on how many hours of data the NWS has, the keys for the data change. But, it doesn't note which pieces of data the keys actually go to. So, for example, there may be 57 pieces of data for the Dew Point that day, and it will correspond to one of the k-Xh-Xn keys. But, it only matches the keys to hours and days in SQL format + GMT deviation.

So, you actually have to demuck from the feed how long the strings are for the data you want before you can match them to a key and thereby to a day.

For example, in order to pull the hourly predicted temperature, you have to find the XML category for that, identify it, load all its data into an array, and then count them. You also have to load all the keys for all the data. And you have to put them in an array and count those, too. And then you have to create your key-to-data matches by matching the counts against the key of the same length.
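
The count-matching steps above can be sketched like so. This is Python, not the PHP I actually used, and the XML is a tiny made-up fragment shaped loosely like the feed: the element names, keys and numbers are illustrative assumptions, not real NWS output. The match_values_to_times name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up, heavily simplified sample for illustration only.
SAMPLE = """<dwml><data>
  <time-layout>
    <layout-key>k-p24h-n2-1</layout-key>
    <start-valid-time>2008-06-03T08:00:00-04:00</start-valid-time>
    <start-valid-time>2008-06-04T08:00:00-04:00</start-valid-time>
  </time-layout>
  <time-layout>
    <layout-key>k-p3h-n3-2</layout-key>
    <start-valid-time>2008-06-03T11:00:00-04:00</start-valid-time>
    <start-valid-time>2008-06-03T14:00:00-04:00</start-valid-time>
    <start-valid-time>2008-06-03T17:00:00-04:00</start-valid-time>
  </time-layout>
  <parameters>
    <temperature type="hourly">
      <value>68</value><value>72</value><value>70</value>
    </temperature>
  </parameters>
</data></dwml>"""


def match_values_to_times(xml_text, tag):
    """Pair a parameter's values with the time layout that has the same count."""
    root = ET.fromstring(xml_text)
    # Step 1: load all the values for the category we want.
    values = [v.text for v in root.find(f".//parameters/{tag}").findall("value")]
    # Step 2: load every key with its list of times, indexed by how many times it has.
    layouts = {}
    for layout in root.iter("time-layout"):
        times = [t.text for t in layout.findall("start-valid-time")]
        layouts.setdefault(len(times), (layout.findtext("layout-key"), times))
    # Step 3: the layout whose count equals the value count is the match.
    key, times = layouts[len(values)]
    return key, list(zip(times, values))


key, pairs = match_values_to_times(SAMPLE, "temperature")
print(key, pairs)
```

One weak spot in this approach: if two layouts happen to have the same number of entries, counting alone can't tell them apart, and this sketch just keeps the first one it sees.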

I don't have a problem with doing that. But, holy shit is that a lot of effort to get at what is supposed to be easily publicly available data!!

At some point I will clean up the code and post it here. If someone has a strong interest in it, use the contact page and I will post it.




© 2010 Pro Content and Design. All rights reserved.



Welcome!

Wonder where to start with your web design business?

This blog follows along with my efforts to build and grow a website design business, Pro Content and Design.

The goal of this blog is to fill in blanks that may be empty as you get your business rolling.

This blog, particularly the source code section, is not intended for beginners. If you are not comfortable with databases, Ajax, DOM objects and other advanced methods, I strongly suggest you go take a look over at W3 Schools before even reading -- let alone tinkering with -- any of the code here.

I hope this blog has some value to web designers as they attempt to get their businesses going.

Good luck, and happy reading.

Thank you,
John Crawford
Pro Content and Design

Books


I highly recommend Art of the Start if you have no idea where to start with marketing.

My Main Website
Pro Content and Design

Websites I have built
PunxsyPage: local free classifieds website

Farm N Land: low-cost real estate listing website

Groundhog Festival: for the local summer festival

Weather Discovery Center

My Webapps
TV Stations Transmitter Database

Google PageRank Checker