PHP Search this Website Application that Works
Foreword: This is a PHP search engine program for your own website.
By: Chrysanthus Date Published: 26 Aug 2018
Introduction
The application is a single PHP file, which you place in a directory of your site (preferably the home directory).
Code Segments
The file has the following code segments:
- HTML Code Strings
- Global Variables
- Placing of Useful Keywords into Array
- The scanTree() Recursive Function
HTML Code Strings
This code segment has the top and bottom HTML code. In this segment, replace the title, "Searching Title goes Here" with that of your choice.
Global Variables
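The recursive function, scanTree(), keeps its state in global variables: the current directory level, the comeback indexes of the directories already visited, and the flag that records whether anything has been found. Because they are global, their values survive from one recursive call to the next.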
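As a minimal sketch of the idea (the function countDown() and the variable $visits are hypothetical and not part of the program), a variable declared with the global keyword keeps its value across every level of recursion, whereas an ordinary local variable would start afresh in each call:

```php
<?php
// Hypothetical counter, shared by every recursive call.
$visits = 0;

function countDown($n)
{
    global $visits;        // bind to the variable defined outside the function
    $visits = $visits + 1;
    if ($n > 0)
        countDown($n - 1); // the next call sees the updated $visits
}

countDown(3);
echo $visits; // 4: one increment for each call, with n = 3, 2, 1 and 0
?>
```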
Placing of Useful Keywords into Array
The HTTP POST method is used to send the keyword phrase from the browser to the web server. At the server, this code segment removes the non-keywords such as prepositions. The keywords are placed into an array.
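The filtering step can be sketched as follows. The helper name extractKeywords() and the short non-keyword list are assumptions made for illustration, and the case-insensitive matching is a choice of this sketch rather than something the program does:

```php
<?php
// Hypothetical helper: returns the useful keywords of a phrase.
function extractKeywords($phrase, $nonKeywords)
{
    foreach ($nonKeywords as $word)
    {
        // \b marks a word boundary, so "in" is removed but "input" is kept
        $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
        $phrase = preg_replace($pattern, '', $phrase);
    }
    // collect the words that remain
    preg_match_all('/\b\w+\b/', $phrase, $matches);
    return $matches[0];
}

$keywords = extractKeywords("input of the form in PHP", array("of", "the", "in"));
print_r($keywords); // the array holds "input", "form" and "PHP"
?>
```

The call to preg_quote() matters when a word contains characters that are special in a regular expression, such as the plus signs in "c++".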
The scanTree() Recursive Function
This code segment searches the directory tree for HTML files (files whose names end in .htm), beginning from the current working directory. It checks each HTML file for the presence of any of the keywords. If your site is big, the search will take some time.
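For comparison, here is a simplified sketch of a recursive directory search. It is not the author's scanTree(): it passes full paths instead of changing the working directory, it returns the matching file paths instead of printing links, and it also accepts names ending in .html. The helper name findMatchingFiles() is hypothetical:

```php
<?php
// Hypothetical helper: returns the paths of .htm/.html files under $dir
// whose content contains at least one of the keywords.
function findMatchingFiles($dir, $keywords)
{
    $found = array();
    foreach (scandir($dir) as $item)
    {
        if ($item === '.' || $item === '..')
            continue;                  // skip the self and parent entries
        $path = $dir . '/' . $item;
        if (is_dir($path))
        {
            // descend into the subdirectory and merge its results
            $found = array_merge($found, findMatchingFiles($path, $keywords));
        }
        elseif (preg_match('/\.html?$/i', $item))
        {
            $content = file_get_contents($path);
            foreach ($keywords as $keyword)
            {
                if (stripos($content, $keyword) !== false)
                {
                    $found[] = $path;
                    break;             // one matching keyword is enough
                }
            }
        }
    }
    return $found;
}

// Usage: search from the current working directory.
$matches = findMatchingFiles(getcwd(), array("form", "PHP"));
foreach ($matches as $file)
    echo $file . "\n";
?>
```

Because each call works on its own $dir argument and returns its own result array, this version needs no global variables. The price is that the whole result list is held in memory before anything is printed.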
The Complete Program
Here is the complete program. Replace the value of the variable, $domainURL, with yours (like http://www.google.com). You are free to copy and modify the complete code and use it for any purpose. The complete code is:
<?php
$pageTop = "<!DOCTYPE HTML>
<html lang='en'>
<head>
<title>Searching Title goes Here</title>
</head>
<body >
<article>";
$pageBottom = "</article>
</body>
</html>";
echo $pageTop; //top of page without data
echo "<h1>Search Result</h1>";
//obtain the search string
$searchStr = isset($_POST['searchStr']) ? trim($_POST['searchStr']) : ""; //empty when nothing was posted
if ($searchStr == "")
{
echo "<strong>Search string is empty!</strong>";
}
else
{
//remove the non-keywords using regex
$nonKeywords = array("about", "along", "among", "before", "after", "by", "for", "in", "from", "on", "of", "since", "to", "until", "till", "up", "with", "between", "the", "a", "an", "while", "whereas", "since", "as", "for", "therefore", "but", "and", "or", "I", "you", "he", "she", "we", "they", "me", "him", "her", "us", "them", "my", "your", "his", "her", "our", "their", "mine", "yours", "hers", "ours", "theirs", "some", "few", "many", "much", "little");
$arrLength = count($nonKeywords); // no. of elements in the array
$newSearchStr = $searchStr; //search string after removing non-keywords
for ($i=0; $i<$arrLength; ++$i)
{
$newSearchStr = preg_replace("/\b$nonKeywords[$i]\b/", "", $searchStr); //\b is a word boundary
$searchStr = $newSearchStr;
}
//place each word of search string into an array
$searchStrArr = array();
preg_match_all("/c\+\+|\b\w+\b/i", $newSearchStr, $searchStrArr);
}
$foundSomething = false;
$dr = '.';
$level = 0;
$noItems = 0;
$us = array(0); //array of indexes to scan already visited directory
$u = 0;
$begin = 0;
$j = 0;
$goingUp = false;
chdir('.');
$iPath = getcwd();
$domainURL = 'http://localhost';
//search entire site
function scanTree($path, $begin)
{
global $foundSomething, $searchStrArr, $level, $begin, $j, $noItems, $us, $u, $goingUp, $iPath, $domainURL;
$arrDir = scandir($path);
$noItems = count($arrDir);
//to end recursion
if (($level === 0)&&($goingUp === true)&&(count($us) === 0))
{
$noItems = 0;
$begin = 0;
}
if ($begin == $noItems)
{
array_pop($us);
$indx = count($us) - 1;
$u = $us[$indx];
}
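for ($j=$begin; $j<$noItems; ++$j) //loop header missing from the listing; reconstructed from the variables used below
{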
if ($arrDir[$j] === '.')
continue;
if (($arrDir[$j] === '..')&&($noItems == 2))
{
continue;
}
elseif ($arrDir[$j] === '..')
continue;
if (is_dir($arrDir[$j]))
{
if ($level === 0)
$us[0] = $j + 1; //reset comeback index for topmost directory
if (($goingUp === true)&&($level !== 0))
{
array_pop($us);
$goingUp = false;
}
elseif ($level === 0)
$goingUp = false;
if ($y === null)
$y = $j + 1; //for the very first (top) scan
$currPath = $path . '/' . $arrDir[$j] . '/';
chdir($currPath);
$level = $level + 1;
$u = $j + 1;
$us[] = $u;
$begin = 0;
scanTree(getcwd(), 0);
}
else
{
if (preg_match("/\.htm$/", $arrDir[$j]))
{
$fileStr = file_get_contents($arrDir[$j]);
$fileStr = preg_replace('/^\s*|\s*$/', '', $fileStr); #remove leading and trailing whitespaces
$title = '';
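preg_match("/<title>.+<\/title>/", $fileStr, $title);
$titl = isset($title[0]) ? $title[0] : "<title>" . $arrDir[$j] . "</title>"; //fall back to the file name when there is no title
$titl = preg_replace("/<title>/", '<strong>', $titl);
$titl = preg_replace("/<\/title>/", '</strong><br>', $titl);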
$keyStrArrD = array();
for ($k=0; $k<count($searchStrArr[0]); ++$k)
{
$keyArr;
$keyStrArr;
$keyStrArr1 = array();
$keyStrArrNoT = array();
$regex0 = preg_quote($searchStrArr[0][$k], '/'); //escape regex metacharacters, e.g. the plus signs in c++
if (preg_match_all("/$regex0/", $fileStr, $keyArr))
{
for ($l=0; $l<count($keyArr[0]); ++$l)
{
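$regex1 = preg_quote($keyArr[0][$l], '/'); //escape regex metacharacters in the matched text
preg_match_all("/.{0,66}$regex1.{0,66}/", $fileStr, $keyStrArr);
if (isset($keyStrArr[0][$l])) //overlapping contexts can yield fewer snippets than matches
array_push($keyStrArr1, $keyStrArr[0][$l]);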
}
for ($m=0; $m<count($keyStrArr1); ++$m)
{
$strNoD = preg_replace("/<.+>/", '', $keyStrArr1[$m]);
$strNoD = preg_replace("/<\/.+>/", '', $strNoD);
array_push($keyStrArrNoT, $strNoD);
}
for ($n=0; $n<count($keyStrArrNoT); ++$n)
{
if (preg_match("/\w/", $keyStrArrNoT[$n]))
{
$strD = '. . . ' . $keyStrArrNoT[$n] . ' . . . ';
array_push($keyStrArrD, $strD);
}
}
}
}
if (count($keyStrArrD) > 0)
{
$foundSomething = true;
$ePath = $path;
$ePath = str_replace($iPath, '', $ePath);
echo "<a href='$domainURL" . $ePath . '/' . $arrDir[$j] . "'>$titl</a>";
for ($q=0; $q<count($keyStrArrD); ++$q)
{
echo $keyStrArrD[$q];
}
echo '<br><br>';
}
}
}
{
$arrDirPresent = scandir($path);
$usefulDirPrsnt = 'No';
for ($w=0; $w<count($arrDirPresent); ++$w)
{
if (($arrDir[$w] === '.')||($arrDir[$w] === '..'))
continue;
if (is_dir($arrDirPresent[$w]))
$usefulDirPrsnt = 'Yes';
}
if ($usefulDirPrsnt == 'Yes')
{
array_pop($us);
$inx = count($us) - 1;
$u = $us[$inx];
}
}
}
if ($level > 0)
{
$goingUp = true;
chdir('..');
$level = $level - 1;
$currPath = getcwd();
$begin = $u;
scanTree($currPath, $u);
}
}
if (!empty($searchStrArr[0])) //scan only when there is at least one keyword
scanTree($dr, 0);
if ($foundSomething === false)
{
echo "<strong>No match found! </strong>";
echo "<br>Go to: <a href='categories.htm'>Categories</a>";
}
echo $pageBottom; //bottom of page without data
?>
Chrys