
php regex to extract multiple matches from string

Posted by: admin, July 12, 2020

Questions:

I’m trying to write a PHP regex that extracts several values from one string. Here is an excerpt from the file contents (the real file contains hundreds of these groupings):

part "C28"
{ type       : "1AB010050093",
  %cadtype   : "1AB010050094",
  shapeid    : "2_1206",
  descr      : "4700.0000 pFarad 10.00 % 100.0 - VE5-VS3",
  insclass   : "CP6A,CP6B",
  gentype    : "RECT_032_016_006",
  machine    : "SMT",
  %package   : "080450E",
  %_item_number: "508",
  %_Term_Seq : "" }
part "C29"
{ type       : "1AB008140029",
  shapeid    : "2_1206",
  descr      : "150.0000 pFarad 5.00 % 100.0 Volt NP0 CERAMIC CAPACITOR",
  insclass   : "CP6A,CP6B",
  gentype    : "RECT_032_016_006",
  machine    : "SMT",
  %package   : "080450E",
  %_item_number: "3",
  %_Term_Seq : "" }

As you can see, the excerpt contains two groupings with the same structure. I need to search through the whole file and extract the following from each grouping:

  • string after the word “part” — which would be “C28” or “C29”
  • string after the “type” property — which would be “1AB010050093” or “1AB008140029”

So, essentially, I need to get all the part references and associated types out of this file…and I’m not sure the best way to go about doing this.

Please let me know if more info is needed to help… thanks in advance!

Answers:

Description

This expression will:

  • capture the part name into a named group called ref
  • capture the values of the type and descr fields
  • put the captured type value into a named group called partnumber
  • allow the fields to appear in any order in the body
  • treat the descr field as optional and capture it only if it exists; the (?: ... )? group around the descr lookahead makes it optional

Note that this is a single expression split across several lines, so you’ll need to use the x option to make the regex engine ignore the white space.

^part\s"(?P<ref>[^"]*)"[^{]*{
(?:(?=[^}]*\sdescr\s*:\s+"(?P<descr>[^"]*)"))?
(?=[^}]*\stype\s*:\s+"(?P<partnumber>[^"]*)")


PHP Code Example:

Input Text

part "C28"
{ type       : "1AB010050093",
  %cadtype   : "1AB010050094",
  shapeid    : "2_1206",
  descr      : "4700.0000 pFarad 10.00 % 100.0 - VE5-VS3",
  insclass   : "CP6A,CP6B",
  gentype    : "RECT_032_016_006",
  machine    : "SMT",
  %package   : "080450E",
  %_item_number: "508",
  %_Term_Seq : "" }
part "C29"
{ type       : "1AB008140029",
  shapeid    : "2_1206",
  descr      : "150.0000 pFarad 5.00 % 100.0 Volt NP0 CERAMIC CAPACITOR",
  insclass   : "CP6A,CP6B",
  gentype    : "RECT_032_016_006",
  machine    : "SMT",
  %package   : "080450E",
  %_item_number: "3",
  %_Term_Seq : "" }
part "C30"
{ type       : "1AB0081400 30",
  shapeid    : "2_1206 30",
  insclass   : "CP6A,CP6B 30",
  gentype    : "RECT_032_016_006 30",
  machine    : "SMT 30",
  %package   : "080450E 30 ",
  %_item_number: "3 30 ",
  %_Term_Seq : "30" }

Code

<?php
$sourcestring="your source string";
preg_match_all('/^part\s"(?P<ref>[^"]*)"[^{]*{
(?:(?=[^}]*\sdescr\s*:\s+"(?P<descr>[^"]*)"))?
(?=[^}]*\stype\s*:\s+"(?P<partnumber>[^"]*)")/imsx',$sourcestring,$matches);
echo "<pre>".print_r($matches,true);
?>

Matches

$matches Array:
(
[ref] => Array
    (
        [0] => C28
        [1] => C29
        [2] => C30
    )

 [descr] => Array
    (
        [0] => 4700.0000 pFarad 10.00 % 100.0 - VE5-VS3
        [1] => 150.0000 pFarad 5.00 % 100.0 Volt NP0 CERAMIC CAPACITOR
        [2] => 
    )

[partnumber] => Array
    (
        [0] => 1AB010050093
        [1] => 1AB008140029
        [2] => 1AB0081400 30
    )

)
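The result above is grouped by capture name, so the reference and type of one part sit in separate arrays. If you would rather have one record per part, the following is a minimal sketch that runs the same expression with the PREG_SET_ORDER flag. The variable names $sets and $parts and the shortened sample input are assumptions made for illustration, not part of the original answer.

```php
<?php
// Shortened sample based on the input text above (assumption: same layout as the real file).
$sourcestring = <<<'TXT'
part "C28"
{ type       : "1AB010050093",
  %cadtype   : "1AB010050094",
  descr      : "4700.0000 pFarad 10.00 % 100.0 - VE5-VS3",
  %_Term_Seq : "" }
part "C30"
{ type       : "1AB0081400 30",
  shapeid    : "2_1206 30",
  %_Term_Seq : "30" }
TXT;

// Same expression as in the answer; the x flag lets it span several lines.
$pattern = '/^part\s"(?P<ref>[^"]*)"[^{]*{
(?:(?=[^}]*\sdescr\s*:\s+"(?P<descr>[^"]*)"))?
(?=[^}]*\stype\s*:\s+"(?P<partnumber>[^"]*)")/imsx';

// PREG_SET_ORDER returns one array per match instead of one array per group.
preg_match_all($pattern, $sourcestring, $sets, PREG_SET_ORDER);

$parts = [];
foreach ($sets as $set) {
    $parts[$set['ref']] = [
        'type'  => $set['partnumber'],
        // descr is optional: store null when the field is missing.
        'descr' => ($set['descr'] ?? '') !== '' ? $set['descr'] : null,
    ];
}

print_r($parts);
```

With this sample, $parts is keyed by part reference (C28 and C30), and the descr entry for C30 is null because that part has no descr field.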

Answer:

Assuming every group has the same structure, you can use this pattern:

preg_match_all('~([^"]++)"[^{"]++[^"]++"([^"]++)~', $subject, $matches);
print_r($matches);
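To see what the two capture groups hold, here is a small sketch that runs the pattern on a shortened sample and pairs group 1 (the part reference) with group 2 (the type). The sample text and the name $typeByRef are assumptions for illustration. Note that the pattern relies on the type field being the first quoted value after the opening brace.

```php
<?php
// Shortened sample (assumption: type is always the first field inside the braces).
$subject = <<<'TXT'
part "C28"
{ type       : "1AB010050093",
  %cadtype   : "1AB010050094",
  shapeid    : "2_1206",
  %_Term_Seq : "" }
part "C29"
{ type       : "1AB008140029",
  shapeid    : "2_1206",
  %_Term_Seq : "" }
TXT;

preg_match_all('~([^"]++)"[^{"]++[^"]++"([^"]++)~', $subject, $matches);

// $matches[1] holds the references, $matches[2] the matching types.
$typeByRef = array_combine($matches[1], $matches[2]);

print_r($typeByRef);
```

For this sample, $typeByRef maps C28 to 1AB010050093 and C29 to 1AB008140029. The other quoted values are skipped because the pattern only succeeds where a { appears between two quoted strings.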

EDIT:

Note: if you have more information to extract, you can easily transform your data into JSON, for example:

$data = <<<LOD
part "C28"
{ type       : "1AB010050093",
  %cadtype   : "1AB010050094",
  shapeid    : "2_1206",
  descr      : "4700.0000 pFarad 10.00 % 100.0 - VE5-VS3",
  insclass   : "CP6A,CP6B",
  gentype    : "RECT_032_016_006",
  machine    : "SMT",
  %package   : "080450E",
  %_item_number: "508",
  %_Term_Seq : "" }
part "C29"
{ type       : "1AB008140029",
  shapeid    : "2_1206",
  descr      : "150.0000 pFarad 5.00 % 100.0 Volt NP0 CERAMIC CAPACITOR",
  insclass   : "CP6A,CP6B",
  gentype    : "RECT_032_016_006",
  machine    : "SMT",
  %package   : "080450E",
  %_item_number: "3",
  %_Term_Seq : "" }
LOD;
$trans = array( "}\n"   => '}, ' , 'part'  => ''    ,
                "\"\n{" => ':{"' , ':'     => '":'  ,
                "\",\n" => '","' );

$data = str_replace(array_keys($trans), $trans, $data);
$data = preg_replace('~\s*+"\s*+~', '"', $data);
$json_data =json_decode('{"'.substr($data,1).'}');

foreach ($json_data as $key=>$value) {
    echo '<br/><br/>part: ' . $key . '<br/>type: ' . $value->type;    
}
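The snippet above does not check whether the converted text is valid JSON. As a hedged variation, the sketch below wraps the same replacements in a hypothetical helper, partsToArray() (the name is an assumption, not part of the original answer), which normalises line endings and throws if decoding fails. It shares the limits of the original approach: a colon inside a value, or the word "part" inside a field, would break the conversion.

```php
<?php
// Hypothetical helper: same replacements as above, plus basic error checking.
function partsToArray(string $data): array
{
    // Assumption: input may use \r\n line endings or end with a newline.
    $data = str_replace(["\r\n", "\r"], "\n", trim($data));

    $trans = array( "}\n"   => '}, ' , 'part'  => ''    ,
                    "\"\n{" => ':{"' , ':'     => '":'  ,
                    "\",\n" => '","' );

    $data = str_replace(array_keys($trans), array_values($trans), $data);
    $data = preg_replace('~\s*+"\s*+~', '"', $data);

    // Decode to associative arrays and fail loudly on malformed JSON.
    $decoded = json_decode('{"' . substr($data, 1) . '}', true);
    if (json_last_error() !== JSON_ERROR_NONE) {
        throw new RuntimeException('Conversion failed: ' . json_last_error_msg());
    }
    return $decoded;
}

// Usage with a shortened sample.
$sample = <<<'TXT'
part "C28"
{ type       : "1AB010050093",
  shapeid    : "2_1206" }
part "C29"
{ type       : "1AB008140029",
  %_Term_Seq : "" }

TXT;

$result = partsToArray($sample);
foreach ($result as $ref => $fields) {
    echo 'part: ' . $ref . ', type: ' . $fields['type'] . "\n";
}
```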