Why CSV imports often go wrong
CSV feels simple because it is universal. In reality, it carries all the classic ambiguities: poorly named columns, incomplete addresses, inconsistent banner spellings, duplicates and missing values.
If those issues are ignored early on, they contaminate the entire analysis. The time saved at the start is lost later in manual corrections, cross-checks and recalculations.
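A quick health check at import time catches most of these problems before they reach the analysis. Here is a minimal sketch using only the standard library; the column names (`outlet_id`, `banner`, `address`) and the sample rows are assumptions for illustration.

```python
import csv
import io
from collections import Counter

def quick_health_check(csv_text, key_column):
    """Flag duplicate keys and missing values before any analysis runs."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    key_counts = Counter(row[key_column] for row in rows)
    duplicates = {k: n for k, n in key_counts.items() if n > 1}
    missing = Counter()
    for row in rows:
        for col, val in row.items():
            if val is None or val.strip() == "":
                missing[col] += 1  # empty cell counts as a missing value
    return {"rows": len(rows), "duplicates": duplicates, "missing": dict(missing)}

# Hypothetical sample: same outlet_id twice, one missing address.
sample = """outlet_id,banner,address
A1,Carrefour,12 rue X
A1,carrefour,
A2,Leclerc,5 av Y
"""
report = quick_health_check(sample, "outlet_id")
```

Running this on the sample reports one duplicated `outlet_id` and one missing address, exactly the kind of issue that is cheap to fix now and expensive to fix after the calculations.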
Map the right columns from the start
A strong import starts with explicit mapping. You need to know which columns identify the outlet, the banner, the address, the role in the transaction and the metric used for calculation.
This should not be treated as a purely technical form-filling step. It is a quality-control stage: if the mapping is clear, every downstream calculation becomes far more reliable.
- separate business identifiers from display fields
- plan for banner and group resolution
- validate critical columns before running calculations
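The three points above can be sketched as a small mapping-and-validation step. The raw header names (`StoreID`, `Enseigne`, `CA_kEUR`, ...) are assumptions for illustration, not a fixed schema.

```python
import csv
import io

# Explicit mapping from raw CSV headers to canonical field names.
COLUMN_MAP = {
    "StoreID": "outlet_id",        # business identifier
    "Enseigne": "banner",          # display / grouping field
    "Adresse": "address",
    "Role": "transaction_role",
    "CA_kEUR": "revenue",          # metric used for calculation
}
# Critical columns that must be present and non-empty before calculating.
REQUIRED = {"outlet_id", "banner", "revenue"}

def map_columns(csv_text):
    """Rename raw headers to canonical names, then validate critical fields."""
    reader = csv.DictReader(io.StringIO(csv_text))
    mapped = [
        {COLUMN_MAP[k]: v for k, v in row.items() if k in COLUMN_MAP}
        for row in reader
    ]
    for line_no, row in enumerate(mapped, start=2):  # header is line 1
        empty = REQUIRED - {k for k, v in row.items() if v and v.strip()}
        if empty:
            raise ValueError(f"line {line_no}: missing critical field(s) {sorted(empty)}")
    return mapped

sample = "StoreID,Enseigne,Adresse,Role,CA_kEUR\nS1,Leclerc,1 rue A,acquirer,120\n"
rows = map_columns(sample)
```

Failing loudly here, before any computation, is the point: a row with an empty banner or metric is rejected with its line number rather than silently skewing the results.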
Normalise without overengineering
The goal is not to build a full data-engineering pipeline. The goal is to obtain a coherent dataset for a fast and accurate concentration analysis.
The best approach is to correct the inconsistencies that matter most: banners, roles, addresses and core metrics. Beyond that, sophistication should remain proportionate to the case.
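In that spirit, a proportionate normalisation can be as simple as an alias table for banners plus a decimal fix for the metric. The alias entries below are hypothetical examples, not a reference list.

```python
# Targeted normalisation: fix only the inconsistencies that would
# distort the concentration figures. Alias table is illustrative.
BANNER_ALIASES = {
    "carrefour market": "Carrefour",
    "carrefour": "Carrefour",
    "e.leclerc": "Leclerc",
    "leclerc": "Leclerc",
}

def normalise_banner(raw):
    """Map known spelling variants to one canonical banner name."""
    key = raw.strip().casefold()
    return BANNER_ALIASES.get(key, raw.strip())  # unknown banners pass through

def normalise_row(row):
    """Normalise the fields that matter: banner and the core metric."""
    row = dict(row)
    row["banner"] = normalise_banner(row["banner"])
    # Accept a comma decimal separator, common in European exports.
    row["revenue"] = float(str(row["revenue"]).replace(",", "."))
    return row
```

Anything not covered by the alias table passes through unchanged, which keeps the step cheap and makes unexpected values visible instead of silently "fixed".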
Preserve the cleanup trail
Every material correction should remain visible or recoverable: banner grouping, outlet exclusion, role reassignment or store-format reclassification. Without that traceability, the import becomes a black box.
A strong workflow leaves a clean record of what was changed. That is what makes the analysis reproducible, easier to explain to clients and less exposed to silent errors.
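One lightweight way to keep that record is to route every correction through a small audit log, so each change carries its before/after values and a reason. A minimal sketch, with hypothetical field names:

```python
from dataclasses import dataclass, field

@dataclass
class AuditLog:
    """Keeps every material correction visible and replayable."""
    entries: list = field(default_factory=list)

    def record(self, row_id, field_name, before, after, reason):
        if before != after:  # only material changes are logged
            self.entries.append({
                "row": row_id, "field": field_name,
                "before": before, "after": after, "reason": reason,
            })

def apply_correction(log, row, row_id, field_name, new_value, reason):
    """Change a value through the log, never directly."""
    log.record(row_id, field_name, row.get(field_name), new_value, reason)
    row[field_name] = new_value
    return row

log = AuditLog()
outlet = {"banner": "carrefour"}
apply_correction(log, outlet, "S1", "banner", "Carrefour", "banner grouping")
```

The log can then be exported alongside the results: anyone reviewing the analysis can see that outlet S1's banner was regrouped, by whom the rule was applied and why, instead of discovering a silent discrepancy with the source file.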