Solr Stopwords & Synonyms Collection

by Jason on May 22, 2011

As mentioned in my last post, I have been working with Solr extensively for the last few months and I am currently in the process of refining my schema.

This refinement has inevitably led me to look into the appropriate stopwords and synonyms for my implementation. A few frustrated Google searches for other people’s stopwords & synonyms have come up short.

So, instead of giving up, I decided to start my own public Git repository!

I am asking the community to submit their stopwords and synonyms. Feel free to create branches for different languages or for different industry implementations (for instance, I can imagine the stopwords & synonyms for a library would differ from those of a Twitter search engine!).

If anyone would like to join this effort and/or has any suggestions to further this discussion, I’m all ears!

How to reindex a Solr Database

by Jason on May 22, 2011

Over the past few months I’ve ventured into new territories such as Hadoop MapReduce, Amazon Web Services, and the topic of this post, Solr.

My experience with Solr has been amazing. The learning curve for this database is VERY light. In the past I’ve attempted to work with Cassandra and Amazon’s Key/Value Pair database, but both suffered from complexity/learning curve issues, limited database drivers, and in Amazon’s case, a lack of sufficient documentation.

Inevitably, after working with Solr for a little while, you’ll think to yourself, “I really need to tweak this field (analyzer, filter, etc)”. If you’re like me, you’ll begin with trial & error. You’ll modify the schema.xml file, re-deploy it, restart the server…. nothing happened? I still see the exact same data. WTF?!

Disappointment sets in when you realize that you have to re-index your data. You read it in the forums but don’t really know what it means. If you’re like me, you’ll frantically start looking around for a Re-Index button in the Solr Admin, but you won’t find one.

So, I’m here to explain.

There are two methods to re-index your data:

  1. Re-run whatever process(es) initially processed your data set. For me, this wasn’t an option. I am currently gathering several gigabytes of data from a variety of sources and I’m not going to hold on to all of it.
  2. Query Solr and re-insert the results. Any field marked stored="true" in your schema.xml comes back in its original form, ready to re-insert (reindex). Fields that aren’t stored can’t be recovered this way.
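
To make the second method concrete, here is a minimal sketch of the query-and-re-insert loop. It is a Python illustration only, not the PHP script itself, and it rests on several assumptions: SOLR_URL is a hypothetical address, the unique key field is assumed to be called id, and the Solr version is assumed to accept a JSON array of documents on /update (older releases may need a different update handler). Sorting by the unique key keeps the paging stable while documents are being rewritten.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical address; change it to match your own install.
SOLR_URL = "http://localhost:8983/solr"

# Fields Solr generates itself (assumption); they should not be sent back.
GENERATED_FIELDS = {"score", "_version_"}


def clean_doc(doc):
    """Return a copy of a stored document without Solr-generated fields."""
    return {k: v for k, v in doc.items() if k not in GENERATED_FIELDS}


def fetch_page(start, rows):
    """Fetch one page of stored documents, ordered by the (assumed) id field."""
    params = urllib.parse.urlencode(
        {"q": "*:*", "wt": "json", "sort": "id asc", "start": start, "rows": rows}
    )
    with urllib.request.urlopen(f"{SOLR_URL}/select?{params}") as resp:
        return json.load(resp)["response"]["docs"]


def post_docs(docs):
    """Send documents back so they are analyzed again with the new schema."""
    req = urllib.request.Request(
        f"{SOLR_URL}/update?commit=true",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req).close()


def reindex(fetch=fetch_page, post=post_docs, rows=500):
    """Page through every document and re-insert it; returns the count."""
    start = total = 0
    while True:
        docs = fetch(start, rows)
        if not docs:
            return total
        post([clean_doc(d) for d in docs])
        total += len(docs)
        start += rows
```

Calling reindex() walks the whole index 500 documents at a time. One caveat worth checking in your own schema: if a stored field is also the destination of a copyField, re-sending it will likely duplicate its values, so you may want to add such fields to GENERATED_FIELDS.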

For those interested, my company has allowed me to open-source my PHP script that will help you to re-index your Solr database.

Have a look

PHP-APNS fails to send push notifications

by Jason on June 23, 2010

Lately I’ve been heavily involved in iPhone development. Not so much on the Objective-C side, but more on developing the supporting APIs and setting up a push notification system.

We decided to use PHP-APNS for the server-side Push Notification system. Their set of scripts (along with a few keepalive shell scripts) power our Push Notification server in a scalable fashion.

The problem

Once our app was approved and sold, we switched our certificates from development to production. Suddenly, we weren’t able to send push notifications to more than one phone! This was worrying.

We did the normal thing of isolating who owned which device token (so we weren’t sending pushes to strangers!), and tested, tested, tested.

The solution

It turns out that the problem was with our development device tokens. As soon as I stopped attempting to send Push Notifications to the device tokens used during the development period, everyone else was able to receive Push Notifications just fine!

I wanted to post this online because I searched all over the place and found no solution. Several people asked the question, but none received an answer. Here is the answer!

Fixing Snow Leopard’s Wireless Dropping Issue

by Jason on April 4, 2010

Ever since upgrading to Snow Leopard I’ve had major issues with my wireless connectivity. I recently hooked up an old Mac Mini to my TV for the sole purpose of running Boxee and these wireless issues have plagued me.

I tried all of the “fixes” mentioned on forum posts including:

  1. Manually setting IP address & DNS in Network Settings
  2. Flushing DNS
  3. Restarting Airport (doesn’t help if you can’t connect in the first place)
  4. Resetting PRAM

All of the solutions would temporarily fix the problem but usually within 10 minutes the connection would drop again.

So, my solution was to write a simple bash script that checks for network connectivity and, if there is none, flushes the DNS cache and restarts AirPort.

Simply place this script on your system and create a cron job to execute the script every minute.


#!/bin/bash

#ping once (the host goes between the quotes) and pull the
#"packets received" count out of the summary line
count=$(ping -c 1 '' | grep 'received' | awk -F',' '{ print $2 }' | awk '{ print $1 }')

#an empty count (ping failed outright) is treated as zero packets received
if [ "${count:-0}" -eq 0 ]; then
        #no internet
        #make sure connectivity isn't currently being restarted
        if [ ! -f /Users/[insert_your_user_here]/ ]; then

                #create pid file so script doesn't continuously run
                #when being restarted
                echo "Creating PID"
                touch /Users/[insert_your_user_here]/;

                #log event
                echo "No Internet " `date` >> /Users/[insert_your_user_here]/network_monitor.log;

                #flush DNS
                echo "Flushing DNS"
                dscacheutil -flushcache;

                #restart airport (power it off first so "on" actually cycles it)
                echo "Restarting Airport"
                /usr/sbin/networksetup -setairportpower en1 off;
                /usr/sbin/networksetup -setairportpower en1 on;

                #sleep for five seconds
                echo "Sleeping"
                sleep 5;

                #clear pid file
                echo "Clearing PID"
                rm -f /Users/[insert_your_user_here]/;
        fi
else
        #internet is good
        echo "Internet Live " `date` >> /Users/[insert_your_user_here]/network_monitor.log;
fi
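
The fiddly part of the script is its first line, which digs the "packets received" count out of ping's summary. For anyone who would rather do that check outside of bash, here is a small Python sketch of the same parsing. The function name is made up for illustration, and the pattern assumes the usual summary wording ("1 packets received" on a Mac, "1 received" on Linux).

```python
import re


def packets_received(ping_output):
    """Return the received-packet count from ping's summary, or 0 if absent."""
    match = re.search(r"(\d+)\s+(?:packets\s+)?received", ping_output)
    return int(match.group(1)) if match else 0
```

A result of 0 covers both "no replies" and "ping printed no summary at all", which matches what the script treats as no internet.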

PHP File Streaming with cat and passthru

by Jason on November 9, 2009

The problem

This weekend, I needed to figure out how to make an on-request, downloadable backup of an entire MySQL database (for Magento, in case you were wondering).
At first, I attempted the simplest approach, which was to perform the backup via system() or exec(), and then serve the file as a download using either file_get_contents() or fopen()/fread(), etc. This wouldn’t work because PHP would run out of memory when reading the file. I even tried stream_get_contents() with the same result.


I’m sure there are other ways of getting this done, but my approach has been working flawlessly so far and I thought I’d share with you.

The solution

Here is the entire code segment which performs the backup, bzips the SQL file, then serves it as a download using passthru and cat.

		//Define where backup will go
		$backup_folder = getenv("DOCUMENT_ROOT") . $settings['base_url'] . '/backup/';
		$backup_file = 'aps_backup_' . date("Y-m-d-H-i-s")  . '.sql.bz2';
		$backupFile = $backup_folder . $backup_file;
		//Perform the database backup
		$command = "mysqldump --opt -h$dbhost -u$dbuser -p$dbpass $dbname | bzip2 > $backupFile";
		//Note: this writes the database password to the error log
		error_log("Executing: $command");
		//Run the dump; this returns once the bzipped file has been written
		system($command);
		//Serve file as download
		header("Content-type: application/octet-stream");
		header("Content-Disposition: attachment; filename=\"$backup_file\"");
		//cat writes the file straight to the output, so PHP never holds it in memory
		passthru('cat ' . escapeshellarg($backupFile));

As you can see, the database backup command is executed, and the output is bzipped and stored in the “backup” folder.
Once the command has completed, the headers are set up to serve a file download, and using the passthru command, we simply cat the file.
This effectively streams the file as a download, which avoids the memory limit errors I was receiving when trying to open/read/serve the large file.
Obviously, this wouldn’t work on Windows, since it relies on cat and bzip2 being available.
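
The idea behind passthru and cat (hand the file over in pieces instead of reading all of it into memory) isn't specific to PHP. Here is a minimal sketch of the same idea in Python, purely as an illustration; the function name and the 64 KB chunk size are arbitrary choices of mine.

```python
def stream_file(path, out, chunk_size=64 * 1024):
    """Copy a file to `out` in fixed-size chunks; returns bytes written."""
    written = 0
    with open(path, "rb") as src:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            written += len(chunk)
    return written
```

Only one chunk is in memory at a time, so the size of the backup no longer matters. Any writable binary stream works as `out`, for example sys.stdout.buffer or an HTTP response body.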

I’m curious if anyone else has been faced with this problem, and what your solution(s) were?

PHPMailer Inline String Attachment

by Jason on September 9, 2009

Recently, while using PHPMailer, I needed the ability to have an inline string attachment. This functionality is not present as of version 5.0.2, so I wrote my own.

For those who don’t know, PHPMailer has a number of ways you can create email attachments:

  • Standard Attachment – works as you’d expect
  • Inline Attachment – attachments you can reference in the message body. For instance, if you attach an image, you can then reference that image using the IMG tag in the body
  • String Attachment – builds an attachment from blob data (usually stored in a database)

Inline attachments and String attachments rock, but there was no combination of the two. So, I wrote one. Simply add this block of code to your phpmailer class and you’re good to go.

  /**
   * Adds an inline string or binary attachment (non-filesystem) to the list.
   * This method can be used to attach ascii or binary data,
   * such as a BLOB record from a database.
   * @param string $string String attachment data.
   * @param string $cid Content ID of the attachment.  Use this to identify
   *        the Id for accessing the image in an HTML form.
   * @param string $filename Name of the attachment.
   * @param string $encoding File encoding (see $Encoding).
   * @param string $type File extension (MIME) type.
   * @return void
   */
  public function AddInlineStringAttachment($string, $cid, $filename, $encoding = 'base64', $type = 'application/octet-stream') {
    // Append to $attachment array
    $this->attachment[] = array(
      0 => $string,
      1 => $filename,
      2 => $filename,
      3 => $encoding,
      4 => $type,
      5 => true,  // isStringAttachment
      6 => 'inline',
      7 => $cid
    );
  }

WordPress API

by Jason on July 5, 2009

Recently I’ve gotten into WordPress plugin development. I’ve created a few plugins for clients that include widgets, settings pages, and hooks into several actions. However, one thing I’ve not yet been able to figure out is how to encrypt settings page options. Has anyone done this?

There are currently a few ways to create settings pages. You can either write out the entire form, or you can create sections and fields using the API. Each option, however, utilizes get_option() to show the current value.

This is fine; however, there is no API function (that I’ve found) that runs BEFORE an option is updated. The only one I have right now is update_option_(the_option)(), but this runs AFTER the update has already happened. So, if I wanted to encrypt those options on the server side, I would end up in an infinite loop: saving the encrypted value would trigger update_option_(the_option)() again, over and over.

Has anyone ever encrypted settings page option values before? If so, how do you go about doing it properly?

I’m sure I can hack together a solution that encrypts & decrypts the values using Javascript on the form, but that just seems dirty to me.

I’d be interested to hear if anyone else has had this problem.

Javascript Minifier

by Jason on June 14, 2009

Last week, I created a simple online tool for minifying Javascript snippets or entire Javascript files.


Check it out

Kaltura WordPress Plugin – Switching Partner IDs

by Jason on May 26, 2009

I am using the Kaltura All-In-One-Video WordPress Plugin on a project these days and I’ve come across a situation where I needed to switch my Partner ID to a new one (we purchased a paid account).

However, there is currently no way of doing this in the interface or by editing any of the plugin files… weird.

After calling their support and getting no answer, I found my own solution and I’m posting it here in hopes that it may help you.

This involves deleting a few rows from the WordPress database, so I’d recommend doing a backup before any of this.

Simply running the following SQL statement on your WordPress database will remove all Kaltura configurations:

DELETE FROM wp_options WHERE option_name LIKE 'kaltura%';

After running this statement, log in to the admin area of WordPress and you will be prompted to create a Partner ID (or specify an existing one).

Hope this helps!

Blueprint CSS Framework

by Jason on April 30, 2009

For the past few projects I’ve worked on, I’ve opted to use the Blueprint CSS Framework.

Blueprint offers several things that truly make a developer happy:

  1. Small footprint
  2. Instant cross-browser compatibility
  3. Simple grid-based framework that allows for the creation of simple to complex web designs
  4. The grid is completely customizable.
  5. Did I mention cross-browser compatibility?

I’ve always considered myself fairly fluent with CSS, but nothing is more frustrating than wasting a few hours getting things to line up properly in all browsers when you could be spending that time making the product more stable or adding new features.

Blueprint CSS uses a CSS Reset style, and then sets very nice (yet configurable) defaults for fonts, font sizes, alignment, grid, etc. For those who aren’t aware, a CSS Reset style is essentially a few lines of cleverly crafted CSS that removes the defaults browsers set for things like font sizes, alignment, and a variety of tags. This dramatically improves one’s ability to make a site cross-browser compatible, but requires that you manually set options you aren’t used to setting. Luckily, Blueprint does this for you.

Ok, that’s enough support for Blueprint today. I’m not saying everyone should use it, but I’d highly recommend it. It saved me a ton of time and I’ve always been extremely happy with the results.