Home » excel » powerquery – Excel Power Query/M – Dynamically Combine a Number of Connections Based on the Number of Pages Available on a Website

powerquery – Excel Power Query/M – Dynamically Combine a Number of Connections Based on the Number of Pages Available on a Website

Posted by: admin May 14, 2020 Leave a comment

Questions:

I’ve only started to dip my toes into the world of M (I bought a book and everything whee!), so I’m hoping one of you brilliant people out there can help me 🙂

I use Power Query to pull data from tables on a website across multiple pages

Lucky for me, the URL for them ends in “page=1”, etc.

I have loaded each page as a separate connection (connection only) and have a master sheet that has all of them appended together. Each connection is named “Page 1”, “Page 2”, etc.

The original code is:

let
Source = Table.Combine({#"Page 1", #"Page 2", #"Page 3", #"Page 4", #"Page 5", #"Page 6", #"Page 7", #"Page 8", #"Page 9", #"Page 10", #"Page 11", #"Page 12", #"Page 13"})
in
Source

The number of pages that have data varies from time to time and I want to figure out a way to dynamically load only up to the number of pages available

(It turns out this website repeats the last page if you put a page number higher than the max pages, so it’s duplicating them in my master table because I’m loading 30 pages)

So in the example that there’s only 8 pages with data, I want it to load:

let
Source = Table.Combine({#"Page 1", #"Page 2", #"Page 3", #"Page 4", #"Page 5", #"Page 6", #"Page 7", #"Page 8"})
in
Source

Lucky again for me, the “Page 1 of 8” is on its own line on the website, so I can easily parse out the “8” with Power Query

A quick search brought up this solution

This seems like a good plan!
I can learn some M!

But I don’t want to use a separate file, so searching more I found a different solution that included using Excel.CurrentWorkbook(){[Name="TabelName"]}[Content]

So I combined it with the above to get:

Source = Excel.CurrentWorkbook(){[Name="TableName"]}[Content],
CombineTable = Table.Combine(Source[Column1])
in
CombineTable

And I made a table named TableName with ‘Page 1’ through ‘Page 30’ down, surrounded it with an IF statement so it’ll be blank in that cell if it’s greater than the number of pages

Threw this into the Advanced Editor
It saw ‘Page 1’, but couldn’t make a table out of it

I have an inkling there’s an easier way to do this, but my Google-Fu is failing me here. It seems like it should be a really easy solution haha

I’d be eternally thankful if anybody can point me in the direction of an elegant solution 🙂

-EDIT-
The code for each connection (the URL is an internal business URL, so I’ve just replaced it with URL):

let
Source = Web.Page(Web.Contents("https://URL&page=1")),
Data0 = Source{0}[Data],
#"Changed Type" = Table.TransformColumnTypes(Data0,{{"", type text}, {"Policy number", type text}, {"Creation date and time", type text}, {"Deadline", type text}, {"Case Ref", type text}, {"Lock User", type text}}),
#"Removed Bottom Rows" = Table.RemoveLastN(#"Changed Type",2),
#"Changed Type1" = Table.TransformColumnTypes(#"Removed Bottom Rows",{{"Creation date and time", type datetime}, {"Deadline", type datetime}}),
#"Removed Columns" = Table.RemoveColumns(#"Changed Type1",{""})
in
#"Removed Columns"

And here’s the code to get the page number:

let
Source = Web.Page(Web.Contents("https:URL&page=1")),
Data0 = Source{0}[Data],
#"Changed Type" = Table.TransformColumnTypes(Data0,{{"", type text}, {"Policy number", type text}, {"Creation date and time", type text}, {"Deadline", type text}, {"Case Ref", type text}, {"Lock User", type text}}),
#"Removed Top Rows" = Table.Skip(#"Changed Type",21),
#"Removed Other Columns" = Table.SelectColumns(#"Removed Top Rows",{"Creation date and time"}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Removed Other Columns", "Creation date and time", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"Creation date and time.1", "Creation date and time.2", "Creation date and time.3", "Creation date and time.4"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Creation date and time.1", type text}, {"Creation date and time.2", Int64.Type}, {"Creation date and time.3", type text}, {"Creation date and time.4", Int64.Type}}),
#"Removed Other Columns1" = Table.SelectColumns(#"Changed Type1",{"Creation date and time.4"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Other Columns1",{{"Creation date and time.4", "I&W - Retail - Pages"}})
in
#"Renamed Columns"

How to&Answers:

It’s hard to test this without access to your actual URL.

But we can consolidate everything into one query by generating a list of pages, querying each page with a function to return the data we want, then combining that data.

let
    SourceURL = "https://URL",

    LastPageNumber = (MyURL) =>
        let
            Source = Web.Page(Web.Contents(MyURL & "&page=1")),
            WebData = Source{0}[Data],
            #"Filtered Page Text" = Table.SelectRows(WebData, each Text.StartsWith([Creation date and time], "Page ")),
            #"Last Page Number" = try Number.FromText(Text.AfterDelimiter(#"Filtered Page Text"{0}[Creation date and time], " ", {0, RelativePosition.FromEnd})) otherwise 8
        in
            #"Last Page Number",

    PageNumbers = List.Numbers(1, LastPageNumber(SourceURL)),
    #"Table from List" = Table.FromList(List.Transform(PageNumbers, each Number.ToText(_)), Splitter.SplitByNothing(), {"Page"}),

    GetWebData = (MyURL, PageNumber) =>
        let
            Source = Web.Page(Web.Contents(MyURL & "&page=" & PageNumber)),
            WebData = Source{0}[Data],
            #"Removed Bottom Rows" = Table.RemoveLastN(WebData,2),
            #"Changed Type" = Table.TransformColumnTypes(#"Removed Bottom Rows",{{"", type text}, {"Policy number", type text}, {"Creation date and time", type datetime}, {"Deadline", type datetime}, {"Case Ref", type text}, {"Lock User", type text}})
        in
            #"Changed Type",

    #"Retrieve Web Data" = Table.AddColumn(#"Table from List", "WebData", each try GetWebData(SourceURL, [Page]) otherwise #table({""},{})),
    #"Combined Web Data" = Table.Combine(#"Retrieve Web Data"[WebData])
in
    #"Combined Web Data"

Change the SourceURL value as appropriate.

The first function LastPageNumber filters the Creation date and time column for a value beginning “Page”, and extracts the value after the last space – this gives us the 8 from “Page 1 of 8”.

Now we can generate a list of numbers from 1 to the last page number. Then the second function GetWebData does the retrieval of the web data for each specified page number, and puts those tables returned into a column.

Finally, we combine the web data tables returned using Table.Combine