
php – Tips for refactoring a 20K lines library

Posted by: admin April 23, 2020


I’ve already awarded a 100-point bounty to mario’s answer, but might start a second 100-point bounty if I see good new answers coming in. This is why I’m keeping the question open and will not choose a final answer, despite having awarded the bounty to mario.

This might seem like a simple question (study the code and refactor) but I’m hoping those with lots more experience can give me some solid advice.

The library is an open-source, 20,000-line library that’s all in a single file, and I haven’t written it myself. The code looks badly written, and the single file is an even bigger problem, because it freezes Eclipse for at least half a minute every time I want to make a change. That is one of the reasons I think it’s worth refactoring this library into smaller classes.

So aside from reading the code and trying to understand it, are there common (or not so common) tips when refactoring a library such as this? What do you advise to make my life a little easier?

Thanks to everyone for your comments.

Answers:

A few generic principles apply:

  1. Divide and conquer. Split the file into smaller, logical libraries and function groupings. You will learn more about the library this way, and make it easier to understand and test incrementally.

  2. Remove duplication. Look for repeated functions and concepts, and replace them with standard library functions, or centralized functions within the library.

  3. Add consistency. Smooth out parameters and naming.

  4. Add unit tests. This is the most important part of refactoring a library. Use PHPUnit (the PHP counterpart of Java's JUnit) or something similar, and add tests you can use to verify both that the functions are correct and that their behavior has not changed.

  5. Add docs. Document your understanding of the consistent, improved library as you write your tests.
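Steps 4 and 5 might look like the following in plain PHP, without a test framework. This is a minimal sketch. `legacy_slugify()` is a hypothetical stand-in for one function of the library. The expected values are whatever the current code returns, recorded before any refactoring (a characterization test). In a real project the same checks would usually live in PHPUnit test methods.

```php
<?php
/**
 * Hypothetical stand-in for one function of the 20K-line library.
 *
 * @param string $title Free-form title text.
 * @return string Lower-case, dash-separated slug.
 */
function legacy_slugify(string $title): string
{
    $slug = strtolower(trim($title));
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);
    return trim($slug, '-');
}

/**
 * Runs every recorded input through $fn and reports outputs that differ
 * from the values captured before refactoring.
 *
 * @param callable              $fn    Function under test.
 * @param array<string, string> $cases Map of input => expected output.
 * @return string[] Failure messages (empty when every case passes).
 */
function check_characterization(callable $fn, array $cases): array
{
    $failures = [];
    foreach ($cases as $input => $expected) {
        $actual = $fn($input);
        if ($actual !== $expected) {
            $failures[] = sprintf('%s: expected %s, got %s',
                var_export($input, true), var_export($expected, true), var_export($actual, true));
        }
    }
    return $failures;
}

// Outputs captured from the current, unrefactored library.
$cases = [
    'Hello World'       => 'hello-world',
    '  PHP 7 & MySQL  ' => 'php-7-mysql',
    '---'               => '',
];

$failures = check_characterization('legacy_slugify', $cases);
echo $failures === [] ? "all cases pass\n" : implode("\n", $failures) . "\n";
```

Re-run the same cases after every refactoring step. Any non-empty result means behavior changed.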


If the code is badly written, it likely contains a lot of cloned (copy-and-paste) code. Finding and removing the clones would then likely make it much more maintainable, as well as reducing its size.

There are a variety of clone detectors, including these specifically for PHP:

ranked from least to most capable (IMHO, and with my strong personal self-interest in CloneDR) in terms of their qualitatively different ability to detect interesting clones.
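Here is a deliberately naive sketch of what such a tool looks for; it is not a substitute for the detectors above. It does three things:

* normalizes whitespace,
* slides a window of N consecutive non-blank lines over the source,
* reports windows that occur more than once.

Real detectors work on tokens or syntax trees and also catch clones with renamed variables. The function name `find_line_clones()` and the window size are assumptions made for illustration.

```php
<?php
/**
 * Reports groups of identical N-line windows (after whitespace normalization).
 *
 * @param string $source PHP source code to scan.
 * @param int    $window Number of consecutive non-blank lines per window.
 * @return array<int, int[]> One group per repeated window, holding the
 *                           1-based line numbers where each copy starts.
 */
function find_line_clones(string $source, int $window = 3): array
{
    // Keep non-blank lines, remembering their original line numbers.
    $lines = [];
    foreach (preg_split('/\R/', $source) as $i => $line) {
        $normalized = preg_replace('/\s+/', ' ', trim($line));
        if ($normalized !== '') {
            $lines[] = [$i + 1, $normalized];
        }
    }

    // Hash every window and record the line where it starts.
    $seen = [];
    for ($i = 0; $i + $window <= count($lines); $i++) {
        $text = implode("\n", array_column(array_slice($lines, $i, $window), 1));
        $seen[md5($text)][] = $lines[$i][0];
    }

    // Only windows that occur at least twice are clone candidates.
    return array_values(array_filter($seen, fn(array $starts) => count($starts) > 1));
}

$src = <<<'PHP'
function a($x) {
    $y = $x * 2;
    return $y + 1;
}

function b($x) {
    $y = $x * 2;
    return $y + 1;
}
PHP;

print_r(find_line_clones($src, 2));
```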

If the code is badly written, a lot of it might be dead. It would be worthwhile to find out which parts execute in practice and which do not. A test coverage tool can give you good insight into this, even in the absence of tests (you simply exercise your program by hand). Code that the coverage tool reports as executed obviously isn’t dead; code that doesn’t execute might be worth further investigation to see whether you can remove it. A test coverage tool is also useful for telling you how much of the code your unit tests exercise, as suggested by another answer. Finally, a test coverage tool can help you find where some of the functionality lives: exercise the functionality from the outside, and whatever code the tool reports as executed is probably relevant.

Our PHP Test Coverage Tool can collect test coverage data.
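As a freely available alternative, Xdebug's code coverage functions can collect per-line data. The sketch below assumes the Xdebug extension is installed (in Xdebug 3, with `xdebug.mode=coverage`). `summarize_unexecuted()` and the commented-out `run_my_program()` are hypothetical names.

How the flags affect the report:

* With `XDEBUG_CC_UNUSED`, lines that could have run but did not are reported as -1.
* With `XDEBUG_CC_DEAD_CODE`, lines Xdebug considers dead code are marked -2.

The helper lists both kinds per file.

```php
<?php
/**
 * Extracts never-executed lines from an Xdebug-style coverage array
 * (file => [line => status]; 1 = executed, -1 = not executed, -2 = dead code).
 *
 * @param array<string, array<int, int>> $coverage
 * @return array<string, int[]> file => sorted unexecuted line numbers
 */
function summarize_unexecuted(array $coverage): array
{
    $result = [];
    foreach ($coverage as $file => $lines) {
        $missed = array_keys(array_filter($lines, fn(int $status) => $status < 0));
        if ($missed !== []) {
            sort($missed);
            $result[$file] = $missed;
        }
    }
    return $result;
}

// Collect real data only when Xdebug is loaded; otherwise skip.
if (function_exists('xdebug_start_code_coverage')) {
    xdebug_start_code_coverage(XDEBUG_CC_UNUSED | XDEBUG_CC_DEAD_CODE);
    // run_my_program();  // hypothetical: exercise the library by hand or via tests
    $coverage = xdebug_get_code_coverage();
    xdebug_stop_code_coverage();
    print_r(summarize_unexecuted($coverage));
}
```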


If it’s an open-source library, ask the developers. First, it’s very likely that someone has already attempted a restructured version. And very occasionally, the big bloated version of something was actually auto-generated from a more modular version.

I actually do that sometimes for one of my applications which is strictly pluginized, and allows a simple cat */*.php > monolithic.php, which eases distribution and handling. So ask if that might be the case there.

If you really want to restructure it, then use the time-proven incremental extension structure. Split the class library into multiple files by segregating the original class. Split every ~2000 lines, and name the first part library0.php:

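 <?php // library0.php: the first ~2000 lines of the original class
 class library0 {
     public $var1, $var2, $var3, $var4;
     function method1() { /* ...original body... */ }
     function method2() { /* ... */ }
     function method3() { /* ... */ }
     function method4() { /* ... */ }
     function method5() { /* ... */ }
 }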

The next part simply continues from there and holds the next few methods:

 <?php // library1.php
 require_once 'library0.php';
 class library1 extends library0 {
     function method6() { /* ... */ }
     function method7() { /* ... */ }
     function method8() { /* ... */ }
 }

Do so until you have separated them all. Give the last file the real name, library.php, and class library extends library52 { should do it. That’s so ridiculously simple that a regex script should be able to do it.
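The mechanical part could look roughly like the following sketch. It assumes you have already cut the class into two pieces:

* its property header,
* an ordered list of method source chunks (for example by hand, or with a regex keyed on `function`).

`chain_parts()` and its parameters are hypothetical. It starts a new part once the current one reaches about `$maxLines` lines, and each part `require_once`s its parent.

```php
<?php
/**
 * Generates library0.php ... libraryN.php (each class extending the previous
 * one) plus library.php, which keeps the original class name.
 *
 * @param string   $name     Original class name, e.g. "library".
 * @param string   $header   Property declarations; they go into the first part.
 * @param string[] $methods  Method source chunks, in their original order.
 * @param int      $maxLines Start a new part once the current one has this many lines.
 * @return array<string, string> file name => PHP source
 */
function chain_parts(string $name, string $header, array $methods, int $maxLines = 2000): array
{
    // Group consecutive methods into parts of roughly $maxLines lines each.
    $parts = [[]];
    $lines = 0;
    foreach ($methods as $method) {
        if ($lines >= $maxLines) {
            $parts[] = [];
            $lines = 0;
        }
        $parts[count($parts) - 1][] = $method;
        $lines += substr_count($method, "\n") + 1;
    }

    $files = [];
    foreach ($parts as $i => $chunk) {
        $class = $name . $i;
        $prev = $name . ($i - 1);
        $head = $i === 0
            ? "class {$class} {\n{$header}\n"
            : "require_once __DIR__ . '/{$prev}.php';\nclass {$class} extends {$prev} {\n";
        $files["{$class}.php"] = "<?php\n" . $head . implode("\n", $chunk) . "\n}\n";
    }

    // The last, empty class carries the real name, so callers need no changes.
    $last = $name . (count($parts) - 1);
    $files["{$name}.php"] = "<?php\nrequire_once __DIR__ . '/{$last}.php';\n"
        . "class {$name} extends {$last} {\n}\n";
    return $files;
}

$files = chain_parts('library', '    public $var1, $var2;',
    ['    function method1() {}', '    function method2() {}', '    function method3() {}'], 2);
echo implode("\n", array_keys($files)), "\n";
```

Writing the returned strings to disk with `file_put_contents()` gives files that load through the final `library.php`.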

Now obviously, there are no memory savings here, and splitting it up like that buys you nothing in terms of structure. With 20,000 lines, however, it’s difficult to get a quick overview and a sensible grouping right the first time. So start with an arbitrary restructuring in lieu of an obvious plan. From there you could very well sort the methods, put the least useful code into the last file, and use the lighter base classes whenever they suffice. You’ll need a dependency chart, however, to see whether this is workable; otherwise errors might only surface at runtime.
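A rough dependency chart can be extracted with PHP's own tokenizer (`token_get_all()`). This sketch only records direct `$this->method(` calls inside each method body. It misses:

* static calls,
* callbacks,
* dynamic calls,
* by-reference function declarations.

The name `method_calls()` is hypothetical. Before moving a method into a later part, check that no method in an earlier part calls it.

```php
<?php
/**
 * Maps each named method/function in $source to the $this->method() calls
 * made directly inside its body.
 *
 * @return array<string, string[]> method name => unique called method names
 */
function method_calls(string $source): array
{
    // Drop whitespace and comments so neighbouring tokens are easy to inspect.
    $tokens = array_values(array_filter(token_get_all($source),
        fn($t) => !is_array($t) || !in_array($t[0], [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT], true)));
    $graph = [];
    $pending = null;   // name seen after "function"; body not opened yet
    $current = null;   // method whose body we are inside
    $depth = 0;        // current brace depth
    $bodyDepth = 0;    // brace depth of $current's opening brace

    foreach ($tokens as $i => $t) {
        $id = is_array($t) ? $t[0] : $t;
        if ($id === T_FUNCTION && $current === null) {
            $next = $tokens[$i + 1] ?? null;
            if (is_array($next) && $next[0] === T_STRING) {   // named, not a closure
                $pending = $next[1];
                $graph[$pending] = [];
            }
        } elseif ($id === '{' || $id === T_CURLY_OPEN || $id === T_DOLLAR_OPEN_CURLY_BRACES) {
            $depth++;
            if ($pending !== null && $id === '{') {
                [$current, $pending, $bodyDepth] = [$pending, null, $depth];
            }
        } elseif ($id === '}') {
            if ($current !== null && $depth === $bodyDepth) {
                $current = null;   // closing brace of the method body
            }
            $depth--;
        } elseif ($id === ';' && $pending !== null) {
            $pending = null;       // abstract or interface method: no body
        } elseif ($current !== null && $id === T_VARIABLE && $t[1] === '$this') {
            $arrow = $tokens[$i + 1] ?? null;
            $name = $tokens[$i + 2] ?? null;
            if (is_array($arrow) && $arrow[0] === T_OBJECT_OPERATOR
                && is_array($name) && $name[0] === T_STRING && ($tokens[$i + 3] ?? null) === '(') {
                $graph[$current][] = $name[1];
            }
        }
    }
    return array_map(fn(array $calls) => array_values(array_unique($calls)), $graph);
}

$src = <<<'PHP'
<?php
abstract class library {
    function a() { return $this->b() + $this->c(); }
    function b() { if (true) { return $this->c(); } return 0; }
    function c() { return "{$this->name}"; }
    abstract function d();
}
PHP;

print_r(method_calls($src));
```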

(I haven’t tried this approach with a huge project like that. But arbitrarily splitting something into three parts, and then reshuffling it for sensibility did work out. That one time.)


  • I assume you are planning to break the library up into thematically relevant classes. Definitely consider using autoloading. It’s the best thing since sliced bread, and makes inter-dependencies easy to handle.

  • Document the code using phpDoc compatible comments from the start.
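Here is a minimal autoloader registered with `spl_autoload_register()`. It assumes each class ends up in its own file named after the class (e.g. `classes/library0.php`, a layout chosen here for illustration). The docblock also shows the phpDoc tags mentioned above. For larger projects, Composer's PSR-4 autoloader does the same job without hand-written code.

```php
<?php
/**
 * Registers an autoloader that maps a class name to "<dir>/<ClassName>.php".
 *
 * @param string $dir Directory holding one file per class.
 * @return void
 */
function register_class_dir(string $dir): void
{
    spl_autoload_register(function (string $class) use ($dir): void {
        // Namespaced names such as Foo\Bar map to the sub-path Foo/Bar.php.
        $file = $dir . DIRECTORY_SEPARATOR
              . str_replace('\\', DIRECTORY_SEPARATOR, $class) . '.php';
        if (is_file($file)) {
            require $file;   // runs only while $class is not yet defined
        }
    });
}

register_class_dir(__DIR__ . '/classes');
// From here on, `new library0()` would load classes/library0.php on demand.
```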


Calling Side Approach

If you know the library’s use is limited to a particular class, module, or project, it can be easier to approach the problem from the calling side. You can then do the following to clean up the code and refactor it. The point of approaching from the calling side is that there may be very few calls into the library: the fewer the calls, the less code (potentially) is actually used in the lib.

Write the Calling Side Tests

Write a test that mimics the calls that are done against the library.
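If it is not obvious which calls the calling side makes, one way to find out is to temporarily hand it a logging proxy instead of the real library object. The sketch uses PHP's `__call()` magic method. `CallRecorder` and `LegacyLib` are hypothetical names. The proxy only sees public instance method calls, not static calls or property access. The recorded calls and results can then be turned into the calling-side test.

```php
<?php
/** Hypothetical stand-in for the big library class. */
class LegacyLib
{
    public function add(int $a, int $b): int
    {
        return $a + $b;
    }

    public function greet(string $name): string
    {
        return "Hello, $name";
    }
}

/** Forwards every method call to the wrapped object and logs it. */
class CallRecorder
{
    /** @var array<int, array{method: string, args: array, result: mixed}> */
    public $log = [];
    private $target;

    public function __construct(object $target)
    {
        $this->target = $target;
    }

    // PHP calls __call() for any method that CallRecorder itself does not expose.
    public function __call(string $method, array $args)
    {
        $result = $this->target->$method(...$args);
        $this->log[] = ['method' => $method, 'args' => $args, 'result' => $result];
        return $result;
    }
}

// Hand the recorder to the calling code instead of the real object.
$lib = new CallRecorder(new LegacyLib());
$lib->add(2, 3);
$lib->greet('world');

foreach ($lib->log as $call) {
    echo $call['method'], ' ', json_encode($call['args']), ' => ', json_encode($call['result']), "\n";
}
```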

Bury the Dead Code

If there is a lot of dead code, this will be a huge win. Trace the actual calls into the library and remove everything else. Run the test and verify.
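Suppose you have a call graph of the library (which method calls which) and the list of entry points the calling side uses. Then "everything else" is whatever a breadth-first search from those entry points cannot reach. Below is a sketch with a hard-coded, hypothetical graph. Anything it reports is only a removal candidate, because dynamic calls such as `$this->$name()` or `call_user_func()` do not show up in a static graph.

```php
<?php
/**
 * Returns the methods in $graph that cannot be reached from $entryPoints.
 *
 * @param array<string, string[]> $graph       method => methods it calls
 * @param string[]                $entryPoints methods called from outside
 * @return string[] unreachable (dead-code candidate) methods, in graph order
 */
function unreachable_methods(array $graph, array $entryPoints): array
{
    $reached = [];
    $queue = $entryPoints;
    while ($queue !== []) {
        $method = array_shift($queue);
        if (isset($reached[$method])) {
            continue;   // already visited
        }
        $reached[$method] = true;
        foreach ($graph[$method] ?? [] as $callee) {
            $queue[] = $callee;
        }
    }
    return array_values(array_filter(array_keys($graph), fn(string $m) => !isset($reached[$m])));
}

$graph = [
    'render'    => ['escape', 'layout'],
    'layout'    => ['escape'],
    'escape'    => [],
    'oldExport' => ['legacyCsv'],
    'legacyCsv' => [],
];
print_r(unreachable_methods($graph, ['render']));
```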

Refactor What’s Left

Since you have the tests, it should be much easier to refactor (or even replace) the code in the library. You can then apply the standard refactoring rules (i.e. de-duplication, simplification, consolidation, etc.).


Apart from what has already been stated, I suggest having a look at Martin Fowler’s Catalog of Refactorings, based on his book. The page also contains a large number of additional sources that are useful for understanding how refactoring should be approached. A more detailed catalog listing can be found at sourcemaking. Note that not all of these techniques and patterns can be applied to PHP code.

There are also a lot of useful tools to assist you with refactoring (and in general) at http://phpqatools.org. Use them to analyze your code for things like dead or duplicated code, high cyclomatic complexity, frequently executed code and so on. Not only will this give you a better overview of your code, it will also tell you which portions of your code are critical (and better left untouched in the beginning) and which could be candidates for refactoring.
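Here is a rough approximation of cyclomatic complexity using `token_get_all()`. It starts at 1 and adds one for each branching keyword or boolean operator. This is a simplified count: ternaries and `??` are ignored, and dedicated tools apply more careful rules. `cyclomatic_complexity()` is a hypothetical helper that scores whatever snippet you give it, so feed it one function at a time.

```php
<?php
/**
 * Approximates McCabe cyclomatic complexity as 1 + number of decision points.
 * $code must start with "<?php" so that it is tokenized as PHP.
 */
function cyclomatic_complexity(string $code): int
{
    // Counted decision points; ternaries and ?? are not counted in this sketch.
    $decisionTokens = [T_IF, T_ELSEIF, T_FOR, T_FOREACH, T_WHILE, T_CASE, T_CATCH,
                       T_BOOLEAN_AND, T_BOOLEAN_OR, T_LOGICAL_AND, T_LOGICAL_OR];
    $complexity = 1;
    foreach (token_get_all($code) as $token) {
        if (is_array($token) && in_array($token[0], $decisionTokens, true)) {
            $complexity++;
        }
    }
    return $complexity;
}

$code = '<?php if ($a && $b) { foo(); } elseif ($c) { bar(); } foreach ($xs as $x) { baz(); }';
echo cyclomatic_complexity($code), "\n";
```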

Whatever you do, do write unit tests. You have to make sure you are not breaking code when refactoring. If the library is not unit-tested yet, add a test before you change any code. If you find you cannot write a test for a portion of code you want to change, check whether a smaller refactoring somewhere else would make it easier. If not, do not attempt the refactoring until you can.


  1. Write tests for the library such that all lines of the code are covered (i.e. 100% coverage).
  2. Use TDD. Start from the higher-level modules and refactor (top-to-bottom approach).
  3. Run the tests mentioned in step 1 and verify them against the results of step 2.

I understand that 100% coverage (as mentioned in step 1) does not necessarily mean that all the features have been covered, but at least we make sure that the output of the current system is the same as the output of the new system.
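The comparison in step 3 can be automated as a differential (golden-master) check: feed the same inputs to the old and the refactored code and report any input where they disagree. `old_price()` and `new_price()` are hypothetical stand-ins. With a real library you would load both versions under different names or namespaces.

```php
<?php
/** Original implementation (hypothetical stand-in). */
function old_price(float $net, string $country): float
{
    if ($country === 'DE') { return round($net * 1.19, 2); }
    if ($country === 'FR') { return round($net * 1.20, 2); }
    return round($net, 2);
}

/** Refactored implementation that must behave identically. */
function new_price(float $net, string $country): float
{
    $rate = ['DE' => 1.19, 'FR' => 1.20][$country] ?? 1.0;
    return round($net * $rate, 2);
}

/**
 * Runs both implementations on every argument list and returns the mismatches.
 *
 * @param array<int, array> $inputs list of argument lists
 * @return array<int, array{args: array, old: mixed, new: mixed}>
 */
function differential_check(callable $old, callable $new, array $inputs): array
{
    $mismatches = [];
    foreach ($inputs as $args) {
        $a = $old(...$args);
        $b = $new(...$args);
        if ($a !== $b) {
            $mismatches[] = ['args' => $args, 'old' => $a, 'new' => $b];
        }
    }
    return $mismatches;
}

// Cross product of sample amounts and countries.
$inputs = [];
foreach ([0.0, 9.99, 100.0, 1234.5] as $net) {
    foreach (['DE', 'FR', 'US'] as $country) {
        $inputs[] = [$net, $country];
    }
}
$mismatches = differential_check('old_price', 'new_price', $inputs);
echo count($mismatches), " mismatches out of ", count($inputs), " inputs\n";
```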


A good book that answers your question with a lot of examples and details is: Working Effectively with Legacy Code, by Michael Feathers.


First of all, consider using a different IDE – Eclipse is notoriously terrible in terms of performance. Komodo is way faster. So is PhpStorm.

In terms of making the refactoring easier, I’d first try to identify the high-level picture – what functions are there? Are there classes? Can you put those classes into separate files just to start with?
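PHP's tokenizer can answer those first questions without running the library. This sketch lists the classes, interfaces, traits and functions declared outside any braces, with their line numbers; `declared_symbols()` is a hypothetical helper. It has these limits:

* methods are skipped,
* conditionally declared functions are skipped,
* braced `namespace { }` blocks are not handled.

```php
<?php
/**
 * Lists classes, interfaces, traits and functions declared at brace depth 0.
 *
 * @return array<int, array{kind: string, name: string, line: int}>
 */
function declared_symbols(string $source): array
{
    $tokens = array_values(array_filter(token_get_all($source),
        fn($t) => !is_array($t) || $t[0] !== T_WHITESPACE));
    $kinds = [T_CLASS => 'class', T_INTERFACE => 'interface', T_TRAIT => 'trait', T_FUNCTION => 'function'];
    $symbols = [];
    $depth = 0;

    foreach ($tokens as $i => $t) {
        if ($t === '{' || (is_array($t) && in_array($t[0], [T_CURLY_OPEN, T_DOLLAR_OPEN_CURLY_BRACES], true))) {
            $depth++;
        } elseif ($t === '}') {
            $depth--;
        } elseif (is_array($t) && isset($kinds[$t[0]])) {
            $next = $tokens[$i + 1] ?? null;
            // Only declarations outside any braces count, so methods are skipped;
            // requiring a name skips closures, anonymous classes and Foo::class.
            if ($depth === 0 && is_array($next) && $next[0] === T_STRING) {
                $symbols[] = ['kind' => $kinds[$t[0]], 'name' => $next[1], 'line' => $t[2]];
            }
        }
    }
    return $symbols;
}

$src = "<?php\nfunction helper() { return 1; }\nclass Parser {\n    public function parse() { return helper(); }\n}\ninterface Renderable {}\n";
foreach (declared_symbols($src) as $s) {
    printf("%-9s %-12s line %d\n", $s['kind'], $s['name'], $s['line']);
}
```

On the real library you would pass `file_get_contents()` of the 20K-line file instead of `$src`.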



Refactoring depends on your goals and the type of solution. This book will help you understand the basic concepts of good code.


If your problem includes the headache of manually placing the functions in different files, then the strategy below may help.

Get your library file into a PHP variable:

$code = file_get_contents('path/to/your/library.php');

Strip the PHP open and close tags:

$code = str_replace('<?php', '', $code);
$code = str_replace('?>', '', $code);

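Separate all the functions:

$code_array = preg_split('/^\s*function\s+/m', $code); // split only where a line starts with "function", not on words like function_exists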

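Now the bodies and names of all the functions are in an array. Create a separate file for each function in a folder named ‘functions’:

is_dir('functions') || mkdir('functions');
array_shift($code_array); // drop whatever came before the first function
foreach ($code_array as $function) {
    $funcTemp = explode('(', $function, 2); // getting function name
    $function_name = trim($funcTemp[0]);
    $function_text = '<?php function ' . $function;
    file_put_contents('functions/' . $function_name . '.php', $function_text);
}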


Now all the functions of your library are in separate files in a common folder, and each file is named after its function. You can easily look up your functions in a folder view and apply your strategies to manage them. (This assumes plain top-level functions; methods inside a class need a different approach.)

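You can also implement a __call() method in a wrapper class to keep the same calling format, loading each function file on demand:

public function __call($name, $params) {
    require_once 'functions/' . $name . '.php'; // load the split-out function
    return call_user_func_array($name, $params);
}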

Hope it helps 🙂


Usually, a general rule of thumb is to remove repeated code. Also make sure to have useful documentation. If you’re using Java, Javadoc is very useful; for PHP, the equivalent is phpDoc-style comments, which tools such as phpDocumentor turn into documentation.