Skip to content

Running a stored procedure only once after all records in a CSV file have been saved to the database #445

@mickvikt

Description

@mickvikt

Let's say we have a data pipeline like this:

        var stream = contextStream
            .ToSqlCommand("Create tables", Queries.CreateTables)
            .CrossApplyFolderFiles("Get data file", "data.csv", true)
            .CrossApplyTextFile("Parse data", FlatFileDefinition.Create(
                i => new
                {
                    code = i.ToColumn(0),
                    name = i.ToColumn(1)
                }).IsColumnSeparated(','))
            .SqlServerSave("Populate data table", o => o
                .SeekOn(i => code)
                .ToTable("[dbo].[some_table]")
            )
	   .ToSqlCommand("Run a stored procedure", Queries.RunStoredProcedure);

A CSV is parsed, its records are saved to a table on a SQL Server database and then I want to run a stored procedure which depends on that data.

If I try to run it and the CSV file contains 5 rows, stored procedure is being run 5 times, and I only need to run it once.

I tried to create a second data pipeline to run the stored procedure like this:

        var stream = contextStream
            .ToSqlCommand("Create tables", Queries.CreateTables)
            .CrossApplyFolderFiles("Get data file", "data.csv", true)
            .CrossApplyTextFile("Parse data", FlatFileDefinition.Create(
                i => new
                {
                    code = i.ToColumn(0),
                    name = i.ToColumn(1)
                }).IsColumnSeparated(','))
            .SqlServerSave("Populate data table", o => o
                .SeekOn(i => code)
                .ToTable("[dbo].[some_table]")
            );

       contextStream
	   .ToSqlCommand("Run a stored procedure", Queries.RunStoredProcedure);         

but when I try to execute it, stored procedure is not being run after whole CSV import is complete and rows are stored in the table, it's being run somewhere in between and the stored procedure can't get the data it needs since it has not been saved yet.

How can I signal to the pipeline, that I need to run the stored procedure just once?
I tried to create a separate DefineProcess method for running SP, but that seems really clumsy.

What would you advice?
Thank you.

Metadata

Metadata

Assignees

Labels

documentationDocumentation may be added or completed on the portalquestionFurther information is requested

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions