30

The facts:

  • there is a website
  • this website is accessible via www.example.org
  • there is an EC2 instance which very likely keeps the website
  • the server is Apache
  • the server OS is Ubuntu
  • I have full access to the server (and sudo privileges)
  • the server is a huge mess

The problem is I have no idea where to - simply put - find the index.html/index.php which gets loaded.

How do I figure out where to find the website's PHP and HTML code? Is there a systematic approach to this problem?

Raffael

6 Answers

55

First of all, check which websites are hosted on the server:

# apachectl -t -D DUMP_VHOSTS

Once you have found the site, look for the DocumentRoot option in the corresponding configuration file. For example:

# apachectl -t -D DUMP_VHOSTS
VirtualHost configuration:
wildcard NameVirtualHosts and _default_ servers:
*:80                   is a NameVirtualHost
         default server 192.168.88.87 (/etc/httpd/conf.d/192.168.88.87.conf:1)
         port 80 namevhost 192.168.88.87 (/etc/httpd/conf.d/192.168.88.87.conf:1)
         port 80 namevhost gl-hooks.example.net (/etc/httpd/conf.d/hooks.conf:1)
                 alias example.net
                 alias www.example.net

Suppose you want to know where the website example.net resides:

# grep DocumentRoot /etc/httpd/conf.d/hooks.conf
    DocumentRoot /vhosts/gl-hooks.example.net/

# cd /vhosts/gl-hooks.example.net/
# ls -la
total 4484
drwxr-xr-x  6 apache apache    4096 Feb 10 11:59 .
drwxr-xr-x 14 root   root      4096 Feb 23 08:54 ..
-rw-r--r--  1 root   root      1078 Dec 19 09:31 favicon.ico
-rw-r--r--  1 apache apache     195 Dec 25 14:51 .htaccess
-rw-r--r--  1 apache apache      98 Dec  7 10:52 index.html
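The lookup above can be scripted. The following is a minimal sketch, not a definitive tool: vhost_conf is a hypothetical helper name, and it assumes the name-based DUMP_VHOSTS layout shown in this answer ("port N namevhost NAME (FILE:LINE)" followed by indented "alias" lines). Other Apache versions or non-name-based vhosts may print a different layout.

```shell
# vhost_conf HOST (hypothetical helper): read `apachectl -t -D DUMP_VHOSTS`
# output on stdin and print the "file:line" of the vhost whose name or
# alias equals HOST.
vhost_conf() {
    awk -v host="$1" '
        $3 == "namevhost" {             # "port 80 namevhost NAME (FILE:LINE)"
            conf = $5
            gsub(/[()]/, "", conf)      # strip the surrounding parentheses
            if ($4 == host) { print conf; exit }
        }
        $1 == "alias" && $2 == host { print conf; exit }
    '
}

# Demo on the sample output from this answer.
vhost_conf www.example.net <<'EOF'
*:80                   is a NameVirtualHost
         default server 192.168.88.87 (/etc/httpd/conf.d/192.168.88.87.conf:1)
         port 80 namevhost 192.168.88.87 (/etc/httpd/conf.d/192.168.88.87.conf:1)
         port 80 namevhost gl-hooks.example.net (/etc/httpd/conf.d/hooks.conf:1)
                 alias example.net
                 alias www.example.net
EOF
# prints: /etc/httpd/conf.d/hooks.conf:1
```

On a live server the input would come from something like apachectl -t -D DUMP_VHOSTS 2>&1 | vhost_conf www.example.org, run as root.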

Also be on the lookout for aliases and redirects/rewrites

You should also pay attention to any Alias directives. For example, with the following settings:

<VirtualHost *:80>
   ServerName example.net
   ServerAlias www.example.net
   ...
   DocumentRoot /vhosts/default/public_html/
   Alias /api/ /vhosts/default/public_api/
   ...
</VirtualHost>

When you access http://example.net/some.file.html, Apache looks for the file in /vhosts/default/public_html/, whereas for http://example.net/api/some.file.html it looks in /vhosts/default/public_api/.

As for rewrites/redirects, especially programmatic ones (where the redirect is triggered by PHP code), I don't think there is an easy way to find such cases.
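The static cases, at least, can be listed mechanically. This is a rough sketch under stated assumptions: url_mappings is a hypothetical helper name, the directive list is a selection of common mod_alias and mod_rewrite directives rather than an exhaustive one, and grep -H assumes GNU or BSD grep. It cannot see redirects issued by PHP code.

```shell
# url_mappings FILE... (hypothetical helper): print, with file name and line
# number, each directive that can make Apache serve a URL from somewhere
# other than the DocumentRoot.
url_mappings() {
    grep -HnEi '^[[:space:]]*(DocumentRoot|Alias|AliasMatch|ScriptAlias|ScriptAliasMatch|Redirect|RedirectMatch|RewriteRule|RewriteBase)[[:space:]]' "$@"
}

# Usage (paths taken from the example above):
#   url_mappings /etc/httpd/conf.d/hooks.conf
#   url_mappings /vhosts/gl-hooks.example.net/.htaccess
```

Running it against both the vhost config and any .htaccess file in the document root covers the two places where such directives usually live.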

ALex_hha
3

Try using find

find / -type f \( -iname "*index.html*" -o -iname "*index.php*" \) 2> /dev/null

Otherwise, assuming Apache was installed from the Ubuntu repositories, look in /etc/apache2/sites-available, e.g.

grep -niR "thedomainname" /etc/apache2/sites-available

If the website has an Apache vhost defined, this may locate its config file. Then look in that file for "DocumentRoot", which should tell you the location of the source code.
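Both steps can be combined. The sketch below assumes the Ubuntu layout and defaults to /etc/apache2/sites-enabled (the vhosts that are actually active, as suggested in the comments below); docroot_for is a hypothetical helper name. The domain is used as a regular expression, so a dot matches any character.

```shell
# docroot_for DOMAIN [CONF_DIR] (hypothetical helper): print the DocumentRoot
# lines of every config file under CONF_DIR that mentions DOMAIN in a
# ServerName or ServerAlias directive.
docroot_for() {
    domain=$1
    conf_dir=${2:-/etc/apache2/sites-enabled}
    # -R follows the symlinks that sites-enabled normally consists of
    grep -RliE "^[[:space:]]*Server(Name|Alias)[[:space:]].*${domain}" "$conf_dir" |
    while IFS= read -r conf; do
        grep -HniE '^[[:space:]]*DocumentRoot[[:space:]]' "$conf"
    done
}

# Usage:
#   docroot_for www.example.org
#   docroot_for www.example.org /etc/apache2/sites-available
```

Unlike a find over the whole filesystem, this reads only a handful of small config files, so it finishes immediately even on a busy server.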

the_velour_fog
  • 1
    well ... I "did" that - it took 2 hours, the server almost stopped responding and I found 67 index.html's and almost as many index.php's. So that approach isn't doing it for me. – Raffael Feb 29 '16 at 11:00
  • 2
    It's a really bad idea to use find in such case – ALex_hha Feb 29 '16 at 11:06
  • 1
    And probably you should use sites-enabled instead – ALex_hha Feb 29 '16 at 11:11
  • 1
    index.html isn't a great file to hunt for. There's a few CMSs out there which put one in each directory in case directory listings are not turned off in Apache so it will always load a blank page instead of showing the directory contents. – gabe3886 Feb 29 '16 at 16:29
  • @the_velour_fog Not that it's super critical here, but the `-type f` is only applying to `-iname "*index.html*"` in your command. Should be `-type f \( -iname "*index.html*" -o -name "*index.php*" \)` –  Mar 01 '16 at 22:23
  • @BroSlow thanks for pointing that out - updated answer. – the_velour_fog Mar 01 '16 at 22:39
2

Another method, which can be useful for debugging a website (or any process, for that matter), is to use lsof (which may not be on your PATH; it is commonly found at /sbin/lsof).

lsof -p [PID] lists all the files the given process has open, which can be useful to see exactly what is being used (this includes your html/php files, as well as log files and libraries the site needs).
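The raw listing is noisy, so it may help to reduce it to plain file paths. A small sketch, assuming lsof's field output mode (lsof -p PID -Fn, where each file name arrives on a line prefixed with "n"); open_paths is a hypothetical helper name.

```shell
# open_paths (hypothetical helper): read `lsof -p PID -Fn` output on stdin and
# print the distinct absolute paths, leaving out shared libraries.
open_paths() {
    sed -n 's/^n//p' |                  # keep only the name fields
        grep '^/' |                     # drop pipes, sockets and the like
        grep -Ev '\.so(\.[0-9]+)*$' |   # drop shared libraries
        sort -u
}

# Usage, with 1234 standing in for an Apache worker PID:
#   lsof -p 1234 -Fn | open_paths
```

Log file paths in the result often contain the vhost name, which is another hint at which config is in play.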

Centimane
1

Please go to

cd /etc/apache2/sites-available/

Here you will find your configuration file (e.g. 000-default.conf).

Open this file (or your own configuration file) using

vi 000-default.conf

There you will find the DocumentRoot directive. Its value is the directory that holds your website's code.

This is the default config file. You will likely find other config files in the same directory, so check those as well.

1

I have no idea where to ... find the index.html/index.php which gets loaded.

Look for page source files

One approach is to browse the site to find a more unique page - let's say newcontactform.php - ideally one that is unlikely to appear in other sites hosted by the same server.

You can then try

locate newcontactform.php

If that fails, follow up with

find / -name newcontactform.php

This should produce a manageably small list of candidates.

You can then inspect the files, do diffs and if necessary try small changes (e.g. insert an HTML comment) to verify that the file indeed produces the page.

Find the configs

Sometimes config files are evident in the output of the ps command. Worst case is ps -ef | grep -E 'apache|httpd', but more creative use of ps options might be worth exploring.

You can look for httpd.conf in the typical locations for Ubuntu and for the Apache httpd project (which may differ) or just use locate and find as above.

Sometimes the main config file refers to other config files for vhosts. You can work this out by identifying the main config file and reading its Include directives.
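Apache can also report its main config file itself. The sketch below parses the output of apachectl -V (apache2ctl -V on Ubuntu), which includes HTTPD_ROOT and SERVER_CONFIG_FILE lines; main_conf is a hypothetical helper name.

```shell
# main_conf (hypothetical helper): read `apachectl -V` output on stdin and
# print the path of the main config file.
main_conf() {
    awk -F'"' '
        /HTTPD_ROOT=/         { root = $2 }
        /SERVER_CONFIG_FILE=/ { file = $2 }
        END {
            if (file == "") exit 1
            if (file ~ /^\//) print file           # already absolute
            else              print root "/" file  # relative to HTTPD_ROOT
        }
    '
}

# Usage:
#   conf=$(apache2ctl -V 2>/dev/null | main_conf)
#   grep -nEi '^[[:space:]]*Include(Optional)?[[:space:]]' "$conf"
```

The second command lists the Include and IncludeOptional lines, which point at the directories holding the vhost files.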

Chronic cases

Sometimes, old servers run a variety of webserver daemons. In that case it can take a while to find them all and work out where their config files are. A combination of the techniques above should eventually succeed.

You can find what programs are listening on port 80 etc using netstat -lntp. Often, locating the binaries is a useful pointer to a directory tree that contains the config files.
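As a rough sketch of that last step, the following filters netstat -lntp output down to the processes bound to one port. listeners is a hypothetical helper name, and the column positions assume the usual Linux net-tools layout (local address in column 4, state in column 6, PID/program in column 7).

```shell
# listeners PORT (hypothetical helper): read `netstat -lntp` output on stdin
# and print each distinct "PID/program" listening on PORT.
listeners() {
    awk -v port="$1" '
        $6 == "LISTEN" && $4 ~ (":" port "$") { print $7 }
    ' | sort -u
}

# Usage (run as root so the PID/program column is filled in):
#   netstat -lntp | listeners 80
#   readlink -f /proc/1234/exe    # 1234 stands in for a PID printed above
```

The readlink call resolves the PID to the binary's full path, which is the pointer to the installation tree mentioned above.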

RedGrittyBrick
1

You can check the vhost for the domain you are looking for in the web server's (Apache) configuration file, httpd.conf (most probably located in /etc/). Simply open the file and scroll through it until you find the VirtualHost directive for your domain. There you will see the DocumentRoot directive, which names your website's document root directory, the place where you will find the application's files.